Hashed Email (HEM)

Definition

A hashed email (HEM) is a fixed-length string produced by running a normalized email address through a one-way cryptographic hash function (e.g., SHA-256). The hash cannot be inverted directly, but because email addresses are guessable, an unsalted hash can be matched against a dictionary of candidate addresses. Resistance to that kind of recovery depends on added safeguards such as salts or HMAC keys.

Relation to marketing

HEM is widely used as a privacy-preserving match key across platforms and datasets. It lets marketers link first-party records to ad platforms, publishers, measurement systems, and clean rooms without transmitting raw, personally identifiable email addresses. Typical uses include identity resolution, audience onboarding, cross-channel frequency control, suppression, measurement, and attribution in environments where third-party cookies and mobile IDs are constrained.

How to calculate

Inputs and decisions

  • Normalization policy: Define how you standardize emails before hashing. At minimum, trim whitespace and lowercase. Decide (and document) whether to apply domain-specific canonicalization (e.g., handling “+tag” or dot rules); apply it only if both sides use exactly the same rules, because any difference produces a different hash and a missed match.
  • Hash function: Prefer SHA-256. Avoid MD5 for new implementations.
  • Protection against guessing: Because the space of plausible email addresses is guessable, an unsalted hash can be matched by hashing candidate addresses and comparing. Use HMAC-SHA-256 with a secret key, or salted hashing, when you do not need open cross-party interoperability. For open-ecosystem matches (e.g., to ad platforms), unsalted SHA-256 is often required; mitigate the risk with contractual and process controls.
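
One way to make the normalization decision explicit is to encode it as a small, versioned policy object. The sketch below is illustrative only: the NormalizationPolicy class, its field names, and the choice of which domains ignore dots are all assumptions, not rules taken from any platform's specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizationPolicy:
    """Hypothetical policy record; field names are illustrative assumptions."""
    version: str = "v1"
    strip_plus_tag: bool = False              # drop "+tag" from the local part
    strip_dots_for: frozenset = frozenset()   # domains where local-part dots are ignored


def normalize_email(raw: str, policy: NormalizationPolicy = NormalizationPolicy()) -> str:
    # Baseline that every policy applies: trim whitespace and lowercase.
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not a usable email address")
    # Optional canonicalization; only safe if the counterpart applies the same rules.
    if policy.strip_plus_tag:
        local = local.split("+", 1)[0]
    if domain in policy.strip_dots_for:
        local = local.replace(".", "")
    if not local:
        raise ValueError("local part is empty after canonicalization")
    return f"{local}@{domain}"


baseline = normalize_email("  Jane.Doe+News@Example.COM ")
strict_policy = NormalizationPolicy(
    version="v2", strip_plus_tag=True, strip_dots_for=frozenset({"gmail.com"})
)
canonical = normalize_email("Jane.Doe+News@Gmail.com", strict_policy)
```

Keeping the version string alongside each stored HEM makes it possible to tell which rules produced a given hash if the policy changes later.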

Minimal, widely interoperable workflow

  1. Normalize: email_normalized = trim(lowercase(email_raw))
  2. Hash: hem = SHA-256(email_normalized)
  3. Encode as hex; store/transmit the hex string.
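
The three steps above can be sketched in Python with the standard hashlib module. The function name hashed_email is hypothetical, and encoding the normalized address as UTF-8 before hashing is an assumption that should be confirmed with the receiving party.

```python
import hashlib


def hashed_email(email_raw: str) -> str:
    # Step 1: normalize (trim surrounding whitespace, lowercase).
    email_normalized = email_raw.strip().lower()
    # Step 2: hash the bytes of the normalized address (UTF-8 is an assumption).
    digest = hashlib.sha256(email_normalized.encode("utf-8"))
    # Step 3: encode as a 64-character hex string.
    return digest.hexdigest()


hem_a = hashed_email("  Jane.Doe@Example.com ")
hem_b = hashed_email("jane.doe@example.com")
```

Both calls yield the same string because normalization removes the differences in case and whitespace before hashing.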

More privacy-protective workflow (when both sides can coordinate)

  1. Normalize as above.
  2. Compute HMAC: hem = HMAC-SHA-256(secret_key, email_normalized)
    (Rotate keys; use per-partner keys to prevent linkage across partners.)
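
A minimal sketch of the keyed variant, using Python's standard hmac module. The partner names and the in-memory key dictionary are hypothetical; in practice the keys would come from a key management system rather than being generated in the script.

```python
import hashlib
import hmac
import secrets


def hmac_hashed_email(email_raw: str, secret_key: bytes) -> str:
    # Same normalization as the unsalted workflow.
    email_normalized = email_raw.strip().lower()
    # Keyed hash: without the key, candidate emails cannot be hashed and compared.
    mac = hmac.new(secret_key, email_normalized.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


# Hypothetical per-partner keys, generated here only for demonstration.
partner_keys = {
    "partner_a": secrets.token_bytes(32),
    "partner_b": secrets.token_bytes(32),
}

hem_for_a = hmac_hashed_email("jane.doe@example.com", partner_keys["partner_a"])
hem_for_b = hmac_hashed_email("jane.doe@example.com", partner_keys["partner_b"])
```

Because each partner has a different key, the same address produces unrelated values for partner_a and partner_b, so the two partners cannot link their files to each other by HEM.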

Verification

  • Hash a known test email using your policy and confirm equality with counterpart outputs. Version and log your normalization and hashing policy so future changes don’t break matches.
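
The check described above can be automated with a small set of agreed test emails. In this sketch the policy label, the test addresses, and the simulated partner outputs are all made up for illustration; real partner values would be exchanged out of band.

```python
import hashlib

POLICY_VERSION = "sha256-trim-lower-v1"  # hypothetical label, logged with each run


def hem(email_raw: str) -> str:
    return hashlib.sha256(email_raw.strip().lower().encode("utf-8")).hexdigest()


def find_mismatches(test_emails, partner_hems):
    """Return the test emails whose local HEM differs from the partner's value."""
    return [e for e in test_emails if hem(e) != partner_hems.get(e)]


test_emails = ["Test.User@Example.com", "plain@example.org"]

# Simulated partner that follows the same policy.
aligned_partner = {e: hem(e) for e in test_emails}

# Simulated partner that forgot to lowercase before hashing.
misaligned_partner = {
    e: hashlib.sha256(e.strip().encode("utf-8")).hexdigest() for e in test_emails
}

ok = find_mismatches(test_emails, aligned_partner)
bad = find_mismatches(test_emails, misaligned_partner)
```

Only the mixed-case address is flagged for the misaligned partner, which points directly at the lowercase step as the policy difference.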

How to utilize

  • Audience onboarding: Convert customer lists to HEM and upload to platforms that accept hashed identifiers to build or refresh custom audiences.
  • Identity resolution: Use HEM as a deterministic key to join records across CRM, CDP, analytics, and ad platforms.
  • Suppression and compliance: Share HEM-based suppression lists with partners without exposing raw emails.
  • Frequency and reach management: Coordinate caps and deduplicate exposure across channels by matching HEMs in clean rooms.
  • Attribution and measurement: Join impression/click logs to conversion files via HEM inside a privacy-safe environment.
  • Look-alike modeling and enrichment: Seed modeling with HEM-mapped audiences where platforms support it.
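
Several of these uses reduce to set operations on HEMs. The sketch below illustrates suppression and overlap counting with plain Python sets; the function names and sample addresses are hypothetical, and it assumes both sides hash with unsalted SHA-256 over trimmed, lowercased addresses.

```python
import hashlib


def hem(email_raw: str) -> str:
    return hashlib.sha256(email_raw.strip().lower().encode("utf-8")).hexdigest()


def apply_suppression(audience_emails, suppression_hems):
    """Keep only audience members whose HEM is not in the suppression set."""
    suppressed = set(suppression_hems)
    return [e for e in audience_emails if hem(e) not in suppressed]


def overlap(hems_a, hems_b):
    """HEMs present in both files, e.g. for a reach or overlap report."""
    return set(hems_a) & set(hems_b)


# Our first-party list (raw emails stay on our side).
audience = ["ann@example.com", "Bob@Example.com", "cy@example.com"]

# A partner's suppression file, received as HEMs only.
partner_suppression = [hem("bob@example.com")]

mailable = apply_suppression(audience, partner_suppression)
shared = overlap([hem(e) for e in audience], partner_suppression)
```

The partner never sees the raw audience list, and the local side never sees the raw addresses behind the suppression file; only the hashed values are compared.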

Comparison to similar approaches

Approach | What it is | Strengths | Limitations | Typical use
---------|------------|-----------|-------------|------------
Hashed Email (HEM) | One-way hash of normalized email | Deterministic match, privacy-preserving vs raw email, cookie-independent | Vulnerable to guessing if unsalted; requires consistent normalization | Onboarding, suppression, identity resolution
Raw Email | Plaintext email address | Highest match rate and portability | High privacy and regulatory risk; restricted sharing | Internal systems, consented communications
Phone Hash | Hash of phone number | Useful where phones are prevalent | Formatting variance; similar guessability risks | Onboarding, identity resolution
MAID | Mobile advertising ID (IDFA/GAID) | Built for ads; device-level | Availability declining; consent constraints | Mobile attribution (where allowed)
First-Party Cookie/ID | Site/app-scoped identifier | Strong within a domain/app | Poor cross-site portability | On-site personalization, analytics
Publisher/Platform UID | Proprietary user IDs | High match within ecosystem | Walled-garden lock-in | Within a single platform
Clean Room Join Keys | Encrypted/computed keys for joins | Strong privacy with compute-in-place | Setup overhead; requires partners | Measurement, reach, overlap

Best practices

  • Document normalization policy and keep it consistent; version any change.
  • Prefer SHA-256; avoid legacy MD5 for new work.
  • Use HMAC or salted hashes when cross-party interoperability is not required; use per-partner keys.
  • Minimize raw PII exposure: Hash at the edge; restrict access to plaintext emails to the smallest set of services and people.
  • Encrypt in transit and at rest; implement key management with rotation and audit trails.
  • Store only what you need: Retain normalized plaintext only if operationally required; otherwise keep just the HEM.
  • Validate inputs: Check that each value is a syntactically valid email address (e.g., per the RFC 5322 address format) before hashing; reject nulls and placeholders.
  • Multi-hash strategies: When interoperating broadly, you may store both SHA-256 (hex) and HMAC-SHA-256 variants, clearly labeled.
  • Contractual controls: Specify hashing policies, allowed uses, retention, and deletion SLAs with partners.
  • Testing and monitoring: Maintain test vectors, spot-check match rates, and alert on unexpected drift.

Future trends

  • Post-cookie ecosystem: Broader reliance on deterministic, consented identifiers such as HEM for audience building and measurement.
  • Clean room adoption: More joins will occur via privacy-preserving computation with ephemeral, encrypted match keys rather than sharing raw HEMs.
  • Per-partner cryptography: Migration from unsalted SHA-256 to HMAC or derived keys by partner to reduce linkage risk.
  • Stronger compliance expectations: Clearer regulatory guidance on pseudonymous identifiers, with tighter consent, purpose limits, and retention controls.
  • Interoperable frameworks: Growth of standardized schemas and normalization policies to reduce match friction across platforms.