Definition
A data warehouse is a centralized, governed repository that stores integrated, historical data from multiple operational systems for querying, analysis, and reporting. It emphasizes consistent schemas, conformed dimensions, and reproducible metrics optimized for analytics rather than transaction processing.
Relation to marketing
For marketing teams, a data warehouse provides a single environment to join channel spend, campaign performance, web and app behavior, CRM, and revenue data. It enables consistent KPI definitions (e.g., CPA, ROAS, LTV), cohort analysis, multi-touch attribution inputs, and executive dashboards that align marketing with sales and finance.
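The KPI acronyms above reduce to simple ratios. As a minimal illustration (the function names and sample figures below are assumptions for demonstration, not standard definitions; in practice these formulas should live in the warehouse's semantic layer):

```python
def cpa(spend: float, conversions: int) -> float:
    """Cost per acquisition: total spend divided by conversions."""
    return spend / conversions if conversions else float("inf")

def roas(revenue: float, spend: float) -> float:
    """Return on ad spend: attributed revenue per dollar of spend."""
    return revenue / spend if spend else 0.0

# Hypothetical month: $5,000 spend, 125 conversions, $12,500 attributed revenue
print(cpa(5_000, 125))      # 40.0 -> $40 per acquisition
print(roas(12_500, 5_000))  # 2.5  -> $2.50 returned per $1 spent
```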
How to calculate
While a data warehouse is not a metric itself, planning one typically involves a few practical calculations (a worked sketch in code follows this list):
- Storage sizing: Estimated monthly data volume × retention period × replication factor.
- Compute sizing (baseline): Estimate average daily query concurrency, classify queries by complexity (light/medium/heavy), and map that workload profile to a warehouse service class or node size.
- Cost modeling (cloud): (Storage GB × storage rate) + (Compute hours × hourly rate) + data egress + optional features (e.g., caching, serverless ingestion).
- Refresh windows: Source update frequency + transformation duration + validation buffers, sized to meet the SLAs/SLOs of downstream dashboards.
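A back-of-envelope sketch tying these formulas together (all volumes and rates are placeholder assumptions; substitute your provider's actual pricing):

```python
# Storage sizing: monthly volume x retention x replication.
monthly_ingest_gb = 200        # estimated new data per month (assumption)
retention_months = 36
replication_factor = 2
storage_gb = monthly_ingest_gb * retention_months * replication_factor

# Cost modeling: storage + compute + egress (placeholder rates).
storage_rate = 0.023                       # $/GB-month
compute_hours, compute_rate = 300, 3.00    # hours/month, $/hour
egress_gb, egress_rate = 50, 0.09          # GB/month, $/GB

monthly_cost = (storage_gb * storage_rate
                + compute_hours * compute_rate
                + egress_gb * egress_rate)
print(f"Storage: {storage_gb:,} GB; est. monthly cost: ${monthly_cost:,.2f}")

# Refresh window: source landing + transformation + validation buffer
# must fit inside the SLA promised to downstream dashboards.
landing_min, transform_min, validation_min, sla_min = 30, 45, 15, 120
assert landing_min + transform_min + validation_min <= sla_min, "window exceeds SLA"
```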
How to utilize
- Centralize and model: Ingest source data (ads, web events, CRM, ecommerce, email, call center), standardize keys, and build conformed dimensions (customer, account, product, campaign).
- Define metrics once: Create a semantic layer or metrics store to encode shared definitions (e.g., Qualified Lead, Opportunity, Net Revenue, Active Subscriber); see the sketch after this list.
- Serve downstream needs: Feed BI dashboards, campaign performance views, pipeline and forecasting, experimentation readouts, and data science features.
- Support governance: Enforce data quality tests, lineage, PII handling, and access controls that satisfy legal and security requirements.
- Enable self-serve: Publish certified marts for marketers and RevOps, while exposing curated tables for analysts and data scientists.
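To make "define metrics once" concrete, here is a minimal sketch of a metrics registry. The metric names, table names (e.g., gold.fct_orders), and SQL-like expressions are illustrative assumptions, not any particular tool's syntax; dedicated semantic layers offer richer, governed versions of the same idea:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str    # SQL-like expression over a curated table (illustrative)
    source_table: str
    description: str

# Single source of truth consumed by BI tools, notebooks, and APIs.
METRICS = {
    "net_revenue": Metric(
        name="net_revenue",
        expression="SUM(gross_revenue) - SUM(refunds)",
        source_table="gold.fct_orders",            # hypothetical curated table
        description="Revenue net of refunds, in USD.",
    ),
    "qualified_leads": Metric(
        name="qualified_leads",
        expression="COUNT(DISTINCT lead_id)",
        source_table="gold.fct_qualified_leads",   # hypothetical
        description="Leads that met the agreed qualification criteria.",
    ),
}

def to_sql(metric_name: str) -> str:
    """Render a registered metric as a query against its source table."""
    m = METRICS[metric_name]
    return f"SELECT {m.expression} AS {m.name} FROM {m.source_table}"

print(to_sql("net_revenue"))
```

Because every tool renders metrics from the same registry, a change to a definition propagates everywhere at once instead of drifting across dashboards.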
Comparison to similar approaches
Aspect | Data Warehouse | Data Lake | Lakehouse | Data Mart | Operational Data Store (ODS)
---|---|---|---|---|---
Primary goal | Analytics with structured, governed data | Low-cost storage for raw/semi-structured data | Unified lake + warehouse features | Subject-specific analytics subset | Operational reporting near real time |
Typical schema | Star/snowflake, strongly typed | Schema-on-read, flexible | ACID tables with open formats | Denormalized, business-facing | Near source schema |
Data quality | High; validated and curated | Variable; raw to lightly processed | Moderate to high with table formats | High for the specific domain | Medium; freshness prioritized |
Best for marketers | Consistent KPIs, dashboards, attribution inputs | Landing raw exports, experimentation sandboxes | Blending raw + curated at scale | Team-focused reporting (e.g., paid media) | Near-real-time ops views (e.g., leads queue) |
Query performance | High and predictable | Variable; depends on engines | High with caching/indices | High for scoped workloads | Optimized for operational queries |
Best practices
- Start with business outcomes: Align models to revenue, pipeline, customer growth, and retention questions.
- Adopt data contracts: Define SLAs, schemas, and expectations with source owners to reduce breakage.
- Model for understandability: Use conformed dimensions and clear, stable naming; avoid leaking source quirks into analytics tables.
- Codify transformations: Use version-controlled pipelines with tests for schema, nulls, duplicates, uniqueness, and referential integrity (example after this list).
- Manage PII: Minimize attributes in analytics tables; apply hashing or tokenization (sketch after this list); enforce purpose-based access.
- Layered architecture: Land raw data (bronze), standardize and join (silver), and publish curated marts/semantic models (gold).
- Document and certify: Publish metric definitions, lineage, and data dictionaries; badge trusted datasets.
- Design for concurrency: Size compute and caching for peak dashboard loads; isolate heavy workloads with resource groups/warehouses.
- Plan refresh SLAs: Match ingestion and transformation cadence to reporting needs (e.g., hourly spend, daily LTV updates).
- Monitor and alert: Track pipeline health, freshness, and KPI anomalies; maintain runbooks for recovery.
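To illustrate the "codify transformations" item, the sketch below runs basic null, uniqueness, and referential-integrity checks in plain Python; the table and column names are hypothetical, and in practice a testing framework built into the pipeline tooling would own these checks:

```python
def check_not_null(rows, column):
    """Fail if any row has a NULL/None in the given column."""
    bad = [r for r in rows if r.get(column) is None]
    assert not bad, f"{len(bad)} rows have NULL {column}"

def check_unique(rows, column):
    """Fail on duplicate values (primary-key style test)."""
    values = [r[column] for r in rows]
    assert len(values) == len(set(values)), f"duplicates in {column}"

def check_references(child_rows, fk, parent_rows, pk):
    """Fail if a foreign key points at a missing parent row."""
    parents = {r[pk] for r in parent_rows}
    orphans = [r for r in child_rows if r[fk] not in parents]
    assert not orphans, f"{len(orphans)} orphaned rows via {fk}"

# Hypothetical rows from a customers dimension and an orders fact.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 2}]

check_not_null(orders, "customer_id")
check_unique(orders, "order_id")
check_references(orders, "customer_id", customers, "customer_id")
print("all checks passed")
```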
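For the PII guidance above, a common pattern is to replace raw identifiers with keyed hashes before data lands in analytics tables, so records can still be joined without exposing the raw value. Here is a minimal sketch using Python's standard hashlib/hmac; the hard-coded key is purely illustrative and should come from a secrets manager in practice:

```python
import hashlib
import hmac

# Illustrative only: load this key from a secrets manager, never source code.
PEPPER = b"replace-with-secret-from-vault"

def pseudonymize(email: str) -> str:
    """Return a stable, non-reversible token for an email address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(PEPPER, normalized, hashlib.sha256).hexdigest()

# Normalization makes the token stable across formatting differences.
print(pseudonymize("Ada@example.com") == pseudonymize(" ada@example.com "))  # True
```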
Future trends
- Metrics layers and headless BI: Centralized, queryable metric definitions reduce drift across tools.
- Lakehouse adoption: Open table formats (e.g., Iceberg/Delta/Hudi) bring ACID and performance to object storage while keeping warehouse-like governance.
- Real-time and streaming: Incremental models and change data capture (CDC) shorten time-to-insight for campaign pacing and journey triggers.
- Privacy-by-design analytics: Computation on masked data, consent-aware joins, and differential privacy for audience insights.
- AI-assisted engineering: Automated data quality testing, lineage inference, and semantic model generation accelerate development.
- Cost governance: Workload-aware autoscaling, serverless consumption models, and FinOps practices keep spend aligned with value.
Related terms
- Data Lake
- Lakehouse
- Data Mart
- ETL (Extract, Transform, Load)
- ELT (Extract, Load, Transform)
- Metrics Layer / Semantic Layer
- Change Data Capture (CDC)
- Star Schema
- Conformed Dimensions
- Business Intelligence (BI)
- Reverse ETL