How Identity Resolution Works
This article provides a technical explanation of how identity resolution operates within the Untitled platform, including the phases of resolution, validation logic, and matching safeguards.
Introduction
The system is designed to maintain high confidence at internet scale while balancing match rate and precision.
Identity resolution within Untitled is built on:
Deterministic signal triangulation
Distributed validation across large datasets
Cross-device HEM anchoring
Ongoing feedback loops
Confidence reinforcement over time
Accuracy is not binary. It is signal-weighted and environment-dependent. In production environments with sufficient traffic and activation, matching accuracy typically falls within the 90–95% range.
Overview of the Resolution Framework

Identity resolution follows a three-phase framework:
Verification — Establishing deterministic linkage between digital identifiers and a primary identity marker
Efficacy — Expanding that linkage through the Identity Graph and enrichment layers
Accuracy — Reinforcing and improving record confidence through activation feedback loops
Each phase builds on the prior one.
Phase 1: Verification

Objective
Establish a deterministic linkage between a browser event and a primary identity marker.
Primary Identity Marker
The system uses Hashed Email Addresses (HEMs) as the foundational identity anchor within the Identity Graph.
HEMs enable:
Cross-device linkage
Offline-to-online reconciliation
Deterministic record reinforcement
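The role of a HEM as a stable anchor can be sketched in a few lines. This is an illustrative example, not the platform's documented implementation: the normalization rule (lowercase, trimmed) and the SHA-256 digest are common industry conventions for hashed emails, assumed here for demonstration.

```python
import hashlib

def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest (a HEM).

    Lowercasing and trimming before hashing is a common industry
    convention, assumed here; it is not a documented Untitled requirement.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same person's email yields the same HEM regardless of device,
# channel, or formatting, which is what makes it usable as an anchor.
assert hash_email("User@Example.com ") == hash_email("user@example.com")
```

Because the digest is deterministic, a HEM observed in an offline CRM export and one observed in a browser session can be matched directly, enabling the cross-device and offline-to-online linkage described above.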
Signals Captured When the Tag Fires
When the Untitled Identity Tag executes, the system observes:
IP address
Device characteristics
Browser characteristics
Cookie identifiers
Timestamped engagement activity
Full specifications for these signals are documented in a separate article.
Triangulation Process
The system attempts to triangulate:
Observed IP address
Device-to-browser relationship
Cookie identifier
Known HEM associations
Validation includes:
Comparing observed IP against known IP footprint history
Validating device-to-HEM relationships
Timestamp alignment across signals
If sufficient deterministic alignment exists, a valid linkage is formed.
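The validation checks above can be sketched as a simple rule set. All names, thresholds, and data shapes below are hypothetical; in particular, the "two of three checks" rule stands in for whatever the platform treats as "sufficient deterministic alignment."

```python
from dataclasses import dataclass, field

@dataclass
class BrowserEvent:
    ip: str
    device_id: str
    cookie_id: str
    timestamp: float  # seconds since epoch

@dataclass
class HemRecord:
    hem: str
    known_ips: set = field(default_factory=set)      # IP footprint history
    known_devices: set = field(default_factory=set)  # device-to-HEM links

def is_valid_linkage(event: BrowserEvent, record: HemRecord,
                     last_seen: float, max_gap: float = 30 * 86400) -> bool:
    """Return True when enough deterministic checks align.

    The two-of-three quorum and 30-day timestamp window are
    illustrative values, not platform parameters.
    """
    checks = [
        event.ip in record.known_ips,             # known IP footprint
        event.device_id in record.known_devices,  # device-to-HEM relationship
        event.timestamp - last_seen <= max_gap,   # timestamp alignment
    ]
    return sum(checks) >= 2
```

A single agreeing signal is not enough to form a linkage under this sketch; ambiguity is resolved only when multiple independent checks point at the same HEM.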
Why Signal Density Matters
The more frequently a visitor is observed, the higher the system's confidence in their resolved identity becomes.
Accuracy improves:
Proportionally with site scale
Logarithmically with session frequency
In practical terms: More tag fires → More triangulation → Higher confidence.
Low-traffic environments provide fewer signals, increasing the likelihood of ambiguity.
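The logarithmic relationship between session frequency and confidence can be illustrated with a toy model. The base score, gain, and cap below are invented for demonstration and are not platform values.

```python
import math

def confidence(sessions: int, base: float = 0.5, gain: float = 0.1) -> float:
    """Toy model: confidence grows logarithmically with session count,
    capped at 1.0. All parameters are illustrative."""
    return min(1.0, base + gain * math.log1p(sessions))

# More tag fires -> higher confidence, with diminishing returns:
assert confidence(2) < confidence(50) < 1.0
```

The diminishing-returns shape is the point: the jump from two sessions to ten matters far more than the jump from fifty to one hundred, which is why low-traffic environments sit in the ambiguous low end of the curve.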
Phase 2: Efficacy

Objective
Expand and enrich the verified identity through the Identity Graph.
Once a valid HEM linkage is established, the system maps online identifiers to:
B2C demographic data
B2B firmographic data
Cross-device associations
Behavioral attributes
Distributed Validation Model
To maintain record confidence at scale, the system relies on:
First-party and third-party cookie sync partnerships
Email engagement IP footprint validation
Mobile movement data (MAID/HEM associations)
Log-level timestamp validation
Rather than correcting individual records in isolation, the system relies on distributed signal consensus. This avoids overfitting to single data points and instead maintains majority confidence alignment across signals.
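Distributed signal consensus can be sketched as a majority vote across independent sources. The quorum value below is an assumption for illustration; the platform's actual consensus logic is not specified here.

```python
from collections import Counter
from typing import Optional

def consensus_value(observations: list, quorum: float = 0.6) -> Optional[str]:
    """Keep an attribute value only when a qualified majority of
    independent sources agree. The 60% quorum is illustrative."""
    if not observations:
        return None
    value, count = Counter(observations).most_common(1)[0]
    return value if count / len(observations) >= quorum else None

# Three sources agree, one disagrees -> the majority value wins:
assert consensus_value(["hem_a", "hem_a", "hem_a", "hem_b"]) == "hem_a"
# A 50/50 split stays unresolved rather than overfitting to one signal:
assert consensus_value(["hem_a", "hem_b"]) is None
```

Leaving a split unresolved is the "avoids overfitting" behavior described above: no single data point can flip a record on its own.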
The Self-Healing Mechanism
Records are continuously evaluated for:
Signal reinforcement
Drift detection
Stale or invalid attributes
If signals weaken or contradict prior associations, confidence scoring adjusts accordingly. This distributed reinforcement model enables ongoing recalibration without sacrificing match scale.
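The self-healing adjustment can be modeled as an asymmetric nudge on a per-record confidence score. The gain and decay rates below are invented for illustration; only the direction of the adjustment reflects the behavior described above.

```python
def update_confidence(score: float, reinforced: bool,
                      gain: float = 0.1, decay: float = 0.2) -> float:
    """Nudge a record's confidence toward 1.0 when signals reinforce it,
    and decay it when signals weaken or contradict prior associations.
    Gain and decay rates are illustrative, not platform values."""
    if reinforced:
        return min(1.0, score + gain * (1.0 - score))
    return max(0.0, score - decay * score)
```

Applied continuously, reinforcement asymptotically approaches full confidence while contradiction erodes it, so a record that drifts out of alignment loses weight without being hard-deleted.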
Phase 3: Accuracy

Objective
Increase confidence through real-world activation and client-side validation.
The highest level of resolution occurs when:
Records are activated in marketing channels
A conversion or transaction occurs
First-party data confirms identity
When this feedback is ingested:
Confidence improves locally
Enrichment becomes more precise
Similar audience modeling improves
This creates a reinforcing loop: Activation → Validation → Confidence Improvement → Better Future Resolution
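The reinforcing loop can be sketched as repeated confidence updates driven by activation outcomes. The lift factor and the choice to leave unconfirmed activations unchanged are simplifying assumptions for this sketch.

```python
def run_feedback_loop(confidence: float, conversions: list, lift: float = 0.15) -> float:
    """Each confirmed conversion (first-party validation) pushes confidence
    toward 1.0; unconfirmed activations leave it unchanged in this sketch.
    The lift factor is illustrative."""
    for converted in conversions:
        if converted:
            confidence += lift * (1.0 - confidence)
    return confidence

# Activation -> validation -> confidence improvement:
assert run_feedback_loop(0.7, [True, False, True]) > 0.7
```

Each pass through the loop shrinks the remaining uncertainty multiplicatively, which is why records that convert repeatedly resolve with the highest confidence.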
Deterministic vs Probabilistic Matching
Current State
The system is primarily deterministic and rule-based.
Strengths:
High confidence in observed signals
Reduced reliance on speculative modeling
Strong performance in high-signal environments
Limitation:
In low-volume settings, deterministic logic may produce spurious linkages from partial signals.
Upcoming Enhancements
We are rolling out limited probabilistic modeling designed to:
Evaluate statistical likelihood that multiple records belong to the same individual
Reduce deterministic false positives
Improve ambiguity handling
Return only records meeting defined confidence thresholds
Trade-off:
Slightly reduced attribute fill in smaller datasets
Meaningfully higher precision overall
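Returning only records above a confidence threshold can be sketched as a simple filter over scored candidates. The scores, record names, and 0.9 threshold below are hypothetical.

```python
def filter_by_confidence(candidates: dict, threshold: float = 0.9) -> list:
    """Return only candidate records whose match likelihood meets the
    defined confidence threshold. The 0.9 value is illustrative."""
    return [rec for rec, p in candidates.items() if p >= threshold]

# Higher precision at the cost of fewer returned records:
scores = {"rec_1": 0.97, "rec_2": 0.85, "rec_3": 0.92}
assert filter_by_confidence(scores) == ["rec_1", "rec_3"]
```

Raising the threshold is exactly the trade-off described above: fewer filled attributes in small datasets, but meaningfully higher precision on what is returned.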
Identity resolution is always a balance between match rate and confidence.
What Can Undermine Accuracy?
Identity resolution can be impacted by:
Low traffic volume (<1K monthly uniques)
Inconsistent visitor sessions
Heavy post-resolution filtering
Small export subsets
QA or Dev environments
These conditions reduce signal density and increase ambiguity.