How Identity Resolution Works

This article provides a technical explanation of how identity resolution operates within the Untitled platform, including the phases of resolution, validation logic, and matching safeguards.

Introduction

The system is designed to maintain high confidence at internet scale while balancing match rate and precision.

Identity resolution within Untitled is built on:

  • Deterministic signal triangulation

  • Distributed validation across large datasets

  • Cross-device HEM anchoring

  • Ongoing feedback loops

  • Confidence reinforcement over time

Accuracy is not binary. It is signal-weighted and environment-dependent. In production environments with sufficient traffic and activation, matching accuracy typically falls within the 90–95% range.


Overview of the Resolution Framework

Identity resolution follows a three-phase framework:

  1. Verification — Establishing deterministic linkage between digital identifiers and a primary identity marker

  2. Efficacy — Expanding that linkage through the Identity Graph and enrichment layers

  3. Accuracy — Reinforcing and improving record confidence through activation feedback loops

Each phase builds on the prior one.


Phase 1: Verification

Objective

Establish a deterministic linkage between a browser event and a primary identity marker.

Primary Identity Marker

The system uses Hashed Email Addresses (HEMs) as the foundational identity anchor within the Identity Graph.

HEMs enable:

  • Cross-device linkage

  • Offline-to-online reconciliation

  • Deterministic record reinforcement
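HEM construction is typically a normalization step followed by a cryptographic hash. A minimal sketch, assuming lowercase/trim normalization and SHA-256 (a common HEM convention; the platform's exact normalization rules are not specified here):

```python
import hashlib

def hem(email: str) -> str:
    """Normalize an email address and hash it into a HEM.

    Assumes whitespace-trim + lowercase normalization and SHA-256,
    a common industry convention; actual platform rules may differ.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same person yields the same anchor regardless of device or formatting:
same = hem("Jane.Doe@example.com") == hem(" jane.doe@example.com ")  # True
```

Because the hash is deterministic, the same email observed on a phone, a laptop, or in an offline CRM file resolves to the same anchor, which is what enables cross-device linkage and offline-to-online reconciliation.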

Signals Captured When the Tag Fires

When the Untitled Identity Tag executes, the system observes:

  • IP address

  • Device characteristics

  • Browser characteristics

  • Cookie identifiers

  • Timestamped engagement activity

Full specifications for these signals can be found in this article.

Triangulation Process

The system attempts to triangulate:

  • Observed IP address

  • Device-to-browser relationship

  • Cookie identifier

  • Known HEM associations

Validation includes:

  • Comparing observed IP against known IP footprint history

  • Validating device-to-HEM relationships

  • Timestamp alignment across signals

If sufficient deterministic alignment exists, a valid linkage is formed.
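The validation steps above can be sketched as a set of deterministic checks that must all pass before a linkage is formed. The field names and 24-hour skew window below are illustrative assumptions, not the platform's actual schema:

```python
from datetime import datetime, timedelta

def triangulate(event, known_ips, device_hems, max_skew=timedelta(hours=24)):
    """Return the linked HEM if all deterministic signals align, else None.

    `event` is a dict with 'ip', 'device_id', 'hem_candidate', 'timestamps'.
    Field names are illustrative, not the platform's actual schema.
    """
    checks = [
        # Observed IP appears in the candidate HEM's known IP footprint
        event["ip"] in known_ips.get(event["hem_candidate"], set()),
        # Device has a known relationship to the candidate HEM
        event["hem_candidate"] in device_hems.get(event["device_id"], set()),
        # Signal timestamps align within the allowed window
        max(event["timestamps"]) - min(event["timestamps"]) <= max_skew,
    ]
    return event["hem_candidate"] if all(checks) else None
```

Requiring every check to pass (rather than a weighted score) is what makes this phase deterministic: a linkage is either fully supported by observed signals or not formed at all.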


Why Signal Density Matters

The more frequently a visitor is observed, the higher the system's confidence in the linkage becomes.

Accuracy improves:

  • Proportionally with site scale

  • Logarithmically with session frequency

In practical terms: More tag fires → More triangulation → Higher confidence.

Low-traffic environments provide fewer signals, increasing the likelihood of ambiguity.
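The logarithmic growth described above can be illustrated with a saturating confidence function. The formula and the tuning constant `k` are assumptions chosen only to show diminishing returns per additional session, not documented platform parameters:

```python
import math

def confidence(tag_fires: int, k: float = 0.25) -> float:
    """Illustrative confidence score in [0, 1) that grows logarithmically
    with the number of observations. `k` is an assumed tuning constant."""
    return 1.0 - 1.0 / (1.0 + k * math.log1p(tag_fires))

# Each additional order of magnitude of tag fires raises confidence
# by a smaller increment -- the shape of the curve, not exact values,
# is the point of this sketch.
scores = {n: round(confidence(n), 3) for n in (1, 10, 100)}
```

This is also why low-traffic environments struggle: with only a handful of tag fires, the curve never climbs far from its starting point.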


Phase 2: Efficacy

Objective

Expand and enrich the verified identity through the Identity Graph.

Once a valid HEM linkage is established, the system maps online identifiers to:

  • B2C demographic data

  • B2B firmographic data

  • Cross-device associations

  • Behavioral attributes

Distributed Validation Model

To maintain record confidence at scale, the system relies on:

  • First-party and third-party cookie sync partnerships

  • Email engagement IP footprint validation

  • Mobile movement data (MAID/HEM associations)

  • Log-level timestamp validation

Rather than correcting individual records in isolation, the system relies on distributed signal consensus. This avoids overfitting to single data points and instead maintains majority confidence alignment across signals.
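Majority consensus across independent sources can be sketched as follows. The source names and return shape are illustrative assumptions:

```python
from collections import Counter

def consensus_attribute(observations):
    """Pick the attribute value supported by a majority of independent
    signal sources, rather than trusting any single record.

    `observations` maps source name -> observed value (illustrative schema).
    Returns (value, support_ratio), or (None, 0.0) if no majority exists.
    """
    counts = Counter(observations.values())
    value, votes = counts.most_common(1)[0]
    ratio = votes / len(observations)
    return (value, ratio) if ratio > 0.5 else (None, 0.0)
```

A single contradictory source (say, a stale cookie sync) cannot flip an attribute on its own; it only lowers the support ratio, which is the sense in which the model avoids overfitting to individual data points.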


The Self-Healing Mechanism

Records are continuously evaluated for:

  • Signal reinforcement

  • Drift detection

  • Stale or invalid attributes

If signals weaken or contradict prior associations, confidence scoring adjusts accordingly. This distributed reinforcement model enables ongoing recalibration without sacrificing match scale.
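This recalibration can be sketched as an asymmetric confidence update, with contradiction penalized more heavily than reinforcement rewards. The rates are assumed tuning constants, not documented platform values:

```python
def adjust_confidence(score, signal_agrees, reinforce=0.05, decay=0.15):
    """Nudge a record's confidence up on signal reinforcement and down on
    contradiction, clamped to [0, 1]. Rates are illustrative assumptions;
    decaying faster than reinforcing biases toward drift detection."""
    if signal_agrees:
        return min(1.0, score + reinforce)
    return max(0.0, score - decay)
```

Applied over a stream of observations, the same mechanism both hardens well-supported records and quietly retires stale ones, which is what "self-healing" amounts to in practice.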


Phase 3: Accuracy

Objective

Increase confidence through real-world activation and client-side validation.

The highest level of resolution occurs when:

  • Records are activated in marketing channels

  • A conversion or transaction occurs

  • First-party data confirms identity

When this feedback is ingested:

  • Confidence improves locally

  • Enrichment becomes more precise

  • Similar audience modeling improves

This creates a reinforcing loop: Activation → Validation → Confidence Improvement → Better Future Resolution
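Feedback ingestion can be sketched as confidence boosts weighted by the strength of the activation event. The event names and weights below are illustrative assumptions, not documented platform values:

```python
def ingest_feedback(record, event):
    """Apply one activation feedback event to a resolved record
    (illustrative schema: `record` is a dict with a 'confidence' key).

    Stronger real-world evidence earns a larger boost; a first-party
    identity confirmation outweighs a mere ad activation.
    """
    boost = {
        "activated": 0.02,               # record used in a marketing channel
        "converted": 0.10,               # a transaction occurred
        "first_party_confirmed": 0.20,   # client data confirms identity
    }
    updated = dict(record)  # leave the original record untouched
    updated["confidence"] = min(1.0, updated["confidence"] + boost[event])
    return updated
```

Each ingested event feeds the next resolution cycle, which is the loop the section describes: stronger confirmation, larger boost, better future matching.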


Deterministic vs Probabilistic Matching

Current State

The system is primarily deterministic and rule-based.

Strengths:

  • High confidence in observed signals

  • Reduced reliance on speculative modeling

  • Strong performance in high-signal environments

Limitation:

  • In low-volume settings, deterministic logic may produce spurious linkages from partial signals.


Upcoming Enhancements

We are rolling out limited probabilistic modeling designed to:

  • Evaluate statistical likelihood that multiple records belong to the same individual

  • Reduce deterministic false positives

  • Improve ambiguity handling

  • Return only records meeting defined confidence thresholds

Trade-off:

  • Slightly reduced attribute fill in smaller datasets

  • Meaningfully higher precision overall

Every resolution decision is a balance between match rate and confidence.
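The confidence-threshold behavior described above can be sketched as a simple filter. The 0.8 default is an illustrative assumption, not a documented platform setting:

```python
def filter_matches(candidates, threshold=0.8):
    """Keep only candidate matches whose modeled same-person probability
    meets the confidence threshold.

    `candidates` is a list of (record_id, probability) pairs; the schema
    and the default threshold are illustrative assumptions.
    """
    return [rid for rid, p in candidates if p >= threshold]

# Raising the threshold trades match rate for precision:
pairs = [("r1", 0.95), ("r2", 0.62), ("r3", 0.81)]
kept_default = filter_matches(pairs)       # ['r1', 'r3']
kept_strict = filter_matches(pairs, 0.9)   # ['r1']
```

This makes the stated trade-off concrete: a stricter threshold drops borderline records (lower fill in small datasets) in exchange for higher precision in what remains.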


What Can Undermine Accuracy?

Identity resolution can be impacted by:

  • Low traffic volume (<1K monthly uniques)

  • Inconsistent visitor sessions

  • Heavy post-resolution filtering

  • Small export subsets

  • QA or Dev environments

These conditions reduce signal density and increase ambiguity.
