Data Quality & Identity Resolution
This article explains how identity resolution works within the Untitled platform, what impacts data quality (DQ), and how to think about trade-offs when activating records.
Internet-scale identity data is inherently probabilistic at the margins. While our system is built to maintain high confidence levels, understanding how and why mismatches occur is critical to evaluating performance properly.
The goal is not theoretical perfection. The goal is maximizing reliable match rates while maintaining strong confidence thresholds that support positive unit economics in marketing activation.
What Is Identity Resolution?
Identity resolution is the process of connecting anonymous website visitors to a real individual or business profile.

Standalone websites typically identify only 1–2% of traffic using first-party cookies or user-submitted data. Untitled expands that capability by leveraging:
A distributed Identity Graph
Deterministic validation signals
Ongoing data feedback loops
This allows for identification of a materially larger percentage of anonymous traffic, provided sufficient signal density exists.
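To make the mechanism concrete, here is a minimal sketch of resolving a visitor against a toy identity graph. All names here (IDENTITY_GRAPH, resolve, the signal keys) are illustrative assumptions for the sketch, not the platform's actual API or data model.

```python
# Toy identity graph: a deterministic signal observation -> known profile ID.
# (Illustrative structure only; the real graph is distributed and far larger.)
IDENTITY_GRAPH = {
    ("hashed_email", "a1b2c3"): "profile-001",
    ("device_id", "dev-42"): "profile-001",
    ("ip", "203.0.113.7"): "profile-002",
}

def resolve(signals):
    """Return the profile matched by the strongest available signal,
    checking higher-confidence signal types first."""
    for signal_type in ("hashed_email", "device_id", "ip"):
        value = signals.get(signal_type)
        if value is not None:
            profile = IDENTITY_GRAPH.get((signal_type, value))
            if profile is not None:
                return profile
    return None  # anonymous: no graph entry for any observed signal

# A session carrying a known hashed email resolves deterministically;
# a session with no recognized signals stays anonymous.
print(resolve({"hashed_email": "a1b2c3"}))  # profile-001
print(resolve({"ip": "198.51.100.9"}))      # None
```

The ordering in resolve reflects the idea that some signal types carry more deterministic weight than others, so they are consulted first.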
Expected Accuracy Levels
Across production deployments, identity matching accuracy typically falls in the 90–95% range.
Important context:
Internet-scale datasets are never 100% perfect.
A small percentage of mismatches is expected in any large identity system.
Accuracy improves with traffic scale and engagement frequency.
Small filtered exports can appear less accurate than overall system averages.
Accuracy is influenced by signal density, not just matching logic.
What Impacts Data Quality?
Several variables influence match confidence:
1. Traffic Volume
Sites with any of the following characteristics may show more variability in match confidence:
Fewer than ~1,000 monthly unique visitors
A newly launched property
A QA or dev environment
Performance becomes materially more stable above ~5,000 monthly uniques.
2. Session Frequency
Accuracy improves logarithmically with engagement frequency.
More repeat sessions and longer engagement windows create stronger triangulation between:
IP address
Device
Browser
Hashed email identifier
More signals = higher confidence.
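The triangulation idea can be sketched as a simple weighted score: each agreeing signal contributes to overall match confidence. The weights and the scoring function below are assumptions for illustration, not production values.

```python
# Illustrative signal weights; the actual values used in production are
# not published here and these numbers are assumptions for the sketch.
SIGNAL_WEIGHTS = {
    "hashed_email": 0.6,  # strongest deterministic signal
    "device": 0.2,
    "browser": 0.1,
    "ip": 0.1,
}

def match_confidence(observed_signals):
    """Sum the weights of signals that agree on the same profile,
    capped at 1.0 and rounded for readability."""
    score = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in observed_signals)
    return round(min(score, 1.0), 3)

# One weak signal vs. full triangulation:
print(match_confidence({"ip"}))                                       # 0.1
print(match_confidence({"ip", "device", "browser", "hashed_email"}))  # 1.0
```

The point of the sketch is the shape of the relationship, not the numbers: each additional corroborating signal moves a match further from ambiguity.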
3. Filtering & Export Size
Applying heavy demographic or firmographic filters and then exporting extremely small subsets can create the perception of lower accuracy. Small samples amplify anomalies.
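A quick simulation shows why tiny filtered exports look noisier than the system average. Assume a matcher that is 93% accurate overall (an assumed figure within the article's stated 90–95% range) and compare observed accuracy in a large sample against the worst of many small exports.

```python
import random

random.seed(7)  # fixed seed so the simulation is repeatable
TRUE_ACCURACY = 0.93  # assumed system-wide accuracy for the sketch

def observed_accuracy(sample_size):
    """Simulate an export of `sample_size` records and return the
    fraction that were correct matches."""
    hits = sum(random.random() < TRUE_ACCURACY for _ in range(sample_size))
    return hits / sample_size

large = observed_accuracy(10_000)                       # stays near 0.93
small = min(observed_accuracy(25) for _ in range(50))   # worst of 50 tiny exports

# A single 25-record export can easily dip well below the system average,
# even though nothing about the matcher changed. Small samples amplify noise.
```

Nothing about the matcher differs between the two cases; the only variable is sample size, which is the article's point about heavily filtered exports.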
Deterministic Matching Today
The current system relies primarily on deterministic matching and rule engines.
This approach:
Performs strongly in high-traffic environments
Avoids excessive modeling assumptions
Prioritizes observed validation over speculative inference
However, in low-signal environments, deterministic systems can produce what we call extraneous linkages — technically valid but lower-confidence matches formed from partial signals.

We are actively rolling out probabilistic signal filtering upgrades to reduce false positives and improve confidence scoring when data is ambiguous.
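The idea behind such filtering can be sketched as a confidence floor: low-confidence linkages are routed out of activation rather than treated as resolved identities. The threshold, record shapes, and function names below are illustrative assumptions, not the platform's actual implementation.

```python
CONFIDENCE_FLOOR = 0.7  # assumed cutoff for activation-grade matches

# Candidate matches as a rule engine might emit them; the middle record
# is an "extraneous linkage" formed from partial signals.
candidate_matches = [
    {"profile": "profile-001", "confidence": 0.95},  # full triangulation
    {"profile": "profile-002", "confidence": 0.40},  # partial-signal linkage
    {"profile": "profile-003", "confidence": 0.82},
]

def filter_matches(matches, floor=CONFIDENCE_FLOOR):
    """Keep matches at or above the confidence floor; everything else is
    excluded from activation rather than surfaced as a resolved identity."""
    return [m for m in matches if m["confidence"] >= floor]

activated = filter_matches(candidate_matches)
# Only the two high-confidence matches survive; the technically valid but
# low-confidence linkage is suppressed before it can pollute activation.
```

This is the trade-off the upgrade targets: accept a slightly smaller activated set in exchange for fewer false positives in downstream channels.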
The Feedback Loop Effect
The highest level of accuracy occurs when:
Records are activated in marketing channels
Users convert or transact
First-party data is fed back into the platform
As activation increases, local accuracy improves.
This creates a reinforcing cycle:
More activation → More signal → Better resolution → Higher confidence
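The reinforcing cycle above can be sketched as a confidence update: when an activated record converts, the first-party confirmation raises that identity's confidence. The update rule here (moving part of the way toward 1.0 on each confirmation) is an illustrative assumption, not the platform's actual scoring logic.

```python
def feedback_update(confidence, converted, lift=0.5):
    """Move confidence toward 1.0 on a confirmed conversion;
    leave it unchanged otherwise. `lift` is an assumed parameter."""
    if converted:
        return confidence + lift * (1.0 - confidence)
    return confidence

# A record activated at 0.80 confidence gains confidence with each
# confirmed first-party conversion, without ever exceeding 1.0.
c = 0.80
c = feedback_update(c, converted=True)   # ≈ 0.90
c = feedback_update(c, converted=True)   # ≈ 0.95
c = feedback_update(c, converted=False)  # unchanged
```

Each pass through the loop tightens the match, which is why locally activated segments tend to outperform the global average over time.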

The Strategic Perspective
When evaluating data quality, the relevant question is not: “Is this 100% perfect?”
The relevant questions are:
Is the majority of the dataset reliable?
Is the cost of activation justified by the confidence level?
Are we applying appropriate filters for higher-cost channels?
Are we leveraging intent signals to increase precision?
Identity resolution at scale is about disciplined confidence management and unit economics. The following articles in this section dive deeper into this topic.