I’m Kayla. I model data. I ship dashboards. I also break stuff and fix it fast. Last winter, I rebuilt our data warehouse model for our growth team and finance folks. (For the blow-by-blow on that rebuild, here’s the full story.) I thought it would take two weeks. It took six. You know what? I’d do it again.
I used Snowflake for compute, dbt for transforms, Fivetran for loads, and Looker for BI. My model was a simple star. Mostly. I also kept a few history tables like Data Vault hubs and satellites for the messy parts. If you're still comparing star, snowflake, and vault patterns, my notes on trying multiple data warehouse models might help. That mix kept both speed and truth, which sounds cute until refunds hit on a holiday sale. Then you need it. Still sorting out the nuances between star and snowflake designs? The detailed breakdown in this star-vs-snowflake primer lays out the pros and cons.
Let me explain what worked, what hurt, and the real stuff in the middle.
What I Picked and Why
- Warehouse: Snowflake (medium warehouse most days; small at night)
- Transforms: dbt (tests saved my butt more than once)
- Loads: Fivetran for Shopify, Stripe, and Postgres
- BI: Looker (semantic layer helped keep one version of “revenue”)
I built a star schema with one big fact per process: orders, sessions, and ledger lines. I used small dimension tables for people, products, dates, and devices. When fields changed over time (like a customer’s region), I used SCD Type 2. Fancy name, simple idea: keep history.
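If SCD Type 2 is new to you, dbt snapshots do most of the heavy lifting. Here's a minimal sketch of how one can be wired up; the source and column names are stand-ins, not my exact project.

```sql
-- snapshots/dim_customer_snapshot.sql
-- Minimal SCD Type 2 sketch via a dbt snapshot. Source/column names are stand-ins.
-- The 'check' strategy writes a new versioned row whenever region changes.
{% snapshot dim_customer_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='check',
        check_cols=['region']
    )
}}

select
    customer_id,
    region,
    email
from {{ source('shopify', 'customers') }}

{% endsnapshot %}
```

dbt adds dbt_valid_from and dbt_valid_to for you; downstream I rename those to valid_from / valid_to and derive is_current as "valid_to is null".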
Real Example: Sales
I started with sales. Not because it’s easy, but because it’s loud.
- Fact table: fact_orders
  - Grain: one row per order line
  - Keys: order_line_id (surrogate), order_id, customer_key, product_key, date_key
  - Measures: revenue_amount, tax_amount, discount_amount, cost_amount
  - Flags: is_refund, is_first_order, is_subscription
- Dim tables:
  - dim_customer (SCD2): customer_key, customer_id, region, first_seen_at, last_seen_at, is_current
  - dim_product: product_key, product_id, category, sku
  - dim_date: date_key, day, week, month, quarter, year
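For flavor, here's roughly how the fact comes together as a dbt model. The staging names (stg_shopify__order_lines and friends) are stand-ins for whatever your sources look like, and the point-in-time join into the SCD2 dim is the part worth copying.

```sql
-- models/marts/fact_orders.sql (a sketch; staging model names are stand-ins)
with order_lines as (
    select * from {{ ref('stg_shopify__order_lines') }}
),

customers as (
    select * from {{ ref('dim_customer_history') }}   -- SCD2 rows with valid_from / valid_to
),

products as (
    select * from {{ ref('dim_product') }}
)

select
    ol.order_line_id,                                          -- the grain: one row per order line
    ol.order_id,
    c.customer_key,
    p.product_key,
    to_char(ol.ordered_at::date, 'YYYYMMDD')::int as date_key,
    ol.revenue_amount,
    ol.tax_amount,
    ol.discount_amount,
    ol.cost_amount,
    ol.is_refund,
    ol.is_first_order,
    ol.is_subscription
from order_lines ol
left join customers c
    on  ol.customer_id = c.customer_id
    and ol.ordered_at >= c.valid_from                          -- point-in-time join into the SCD2 dim
    and ol.ordered_at <  coalesce(c.valid_to, '9999-12-31'::timestamp)
left join products p
    on ol.product_id = p.product_id
```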
What it did for us:
- A weekly revenue by region query dropped from 46 seconds to 3.2 seconds (the query is sketched right after this list).
- The finance team matched Stripe gross to within $82 on a $2.3M month. That was a good day.
- We fixed “new vs repeat” by using customer_key + first_order_date. No more moving targets.
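That weekly revenue-by-region pull is nothing clever, which is the point. Roughly this; the net-revenue math and the year filter are illustrative:

```sql
-- Weekly revenue by region: the kind of query a star makes cheap.
select
    d.year,
    d.week,
    c.region,
    sum(f.revenue_amount - f.discount_amount) as net_revenue
from fact_orders f
join dim_date     d on f.date_key     = d.date_key
join dim_customer c on f.customer_key = c.customer_key
where d.year = year(current_date)
group by d.year, d.week, c.region
order by d.year, d.week, c.region
```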
If you’re working in a larger org, here’s a candid look at what actually works for an enterprise data warehouse based on hands-on tests.
Pain I hit:
- Late refunds. They landed two weeks after the sale and split across line items. I added a refunds table and a model that flips is_refund and reverses revenue_amount (sketched after this list). Clean? Yes. Fun? No.
- Tax rules. We sell in Canada and the US. I added a dim_tax_region map, then cached it. That removed a repeated join that had been costing us 15% more credits than it should have.
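The refund reversal model from the first bullet looks roughly like this: one negative row per refunded line, which gets union all'd onto the sale rows so revenue nets out on its own. Table and column names here are stand-ins.

```sql
-- models/intermediate/int_refund_reversals.sql (sketch; names are stand-ins)
-- One reversal row per refunded order line; fact_orders then union all's
-- these onto the original sale rows, keeping the same grain.
select
    ol.order_line_id,
    ol.order_id,
    ol.customer_id,
    ol.product_id,
    r.refunded_at                    as ordered_at,      -- the refund lands on its own date
    -r.refunded_amount               as revenue_amount,  -- reverse the money
    -coalesce(r.refunded_tax, 0)     as tax_amount,
    0                                as discount_amount,
    0                                as cost_amount,
    true                             as is_refund
from {{ ref('stg_shopify__order_lines') }}  ol
join {{ ref('stg_shopify__refund_lines') }} r
    on ol.order_line_id = r.order_line_id
```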
Real Example: Web Events
Marketing wanted “the full journey.” I built two facts.
- fact_events: one row per raw event (page_view, add_to_cart, purchase)
  - device_key, customer_key (nullable), event_ts, event_name, url_path
- fact_sessions: one row per session
  - session_id, customer_key, device_key, session_start, session_end, source, medium, campaign
I stitched sessions by sorting events by device + 30-minute gaps. Simple rule, tight code. When a user logged in mid-session, I backfilled customer_key for that session. Small touch, big win.
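The stitching itself is plain window-function SQL: order events per device, open a new session whenever the gap tops 30 minutes, then number the sessions with a running sum. A sketch against the fact_events columns above:

```sql
-- Sessionization sketch: new session after a 30-minute gap per device.
with ordered_events as (
    select
        device_key,
        customer_key,
        event_ts,
        event_name,
        lag(event_ts) over (partition by device_key order by event_ts) as prev_ts
    from fact_events
),

flagged as (
    select
        *,
        case
            when prev_ts is null
              or datediff('minute', prev_ts, event_ts) > 30
            then 1
            else 0
        end as is_new_session
    from ordered_events
)

select
    *,
    -- running count of session starts = per-device session number
    sum(is_new_session) over (
        partition by device_key
        order by event_ts
        rows between unbounded preceding and current row
    ) as session_number
from flagged
```

The mid-session login backfill is then just a max(customer_key) over (partition by device_key, session_number).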
What it gave us:
- “Ad spend to checkout in 24 hours” worked with one Looker explore.
- We saw mobile sessions run 20% longer on weekends. So we moved push alerts to Sunday late morning. CTR went up 11%. Not magic. Just good timing.
What bit me:
- Bots. I had to filter junk with a dim_device blocklist and a rule for 200+ page views in 5 minutes. Wild, but it cut fake traffic by a lot.
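The bot rule is blunt on purpose. Roughly this, assuming the blocklist lives as an is_blocked flag on dim_device (that flag name is my shorthand here, not gospel):

```sql
-- Bot filtering sketch: drop blocklisted devices plus anything hitting
-- 200+ page views inside a 5-minute bucket (fixed buckets, close enough).
with views_per_window as (
    select
        device_key,
        time_slice(event_ts, 5, 'MINUTE') as window_start,
        count(*)                          as page_views
    from fact_events
    where event_name = 'page_view'
    group by 1, 2
),

bot_devices as (
    select distinct device_key
    from views_per_window
    where page_views >= 200
)

select e.*
from fact_events e
left join dim_device  d on e.device_key = d.device_key
left join bot_devices b on e.device_key = b.device_key
where coalesce(d.is_blocked, false) = false   -- manual blocklist
  and b.device_key is null                    -- rate-based rule
```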
Real Example: The Ledger
Finance is picky. They should be.
- fact_gl_lines: one row per journal line from NetSuite
  - journal_id, line_number, account_key, cost_center_key, amount, currency, posted_at
- dim_account, dim_cost_center (SCD2)
We mapped Shopify refunds to GL accounts with a mapping table. I kept it in seeds in dbt so changes were versioned. Monthly close went from 2.5 days to 1.5 days because the trial balance matched on the first run. Not perfect, but close.
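The mapping itself is a small CSV checked in as a dbt seed, so every account change goes through a pull request. The model that uses it is a one-join affair; column names here are stand-ins.

```sql
-- models/marts/fact_refund_gl.sql (sketch; seed and column names are stand-ins)
-- seeds/refund_gl_mapping.csv carries something like:
--   refund_reason, gl_account_code, cost_center_code
select
    r.refund_id,
    r.refunded_at,
    m.gl_account_code,
    m.cost_center_code,
    -r.refunded_amount as amount
from {{ ref('stg_shopify__refund_lines') }} r
left join {{ ref('refund_gl_mapping') }} m
    on r.refund_reason = m.refund_reason
```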
What Rocked
- The star model was easy to teach. New analysts shipped a revenue chart on day two.
- dbt tests caught null customer_keys after a Fivetran sync hiccup. Red light, quick fix, no blame. (The test is sketched after this list.)
- Looker’s measures and views kept revenue one way. No more four dashboards, five numbers.
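That null customer_key catch came from dbt's stock not_null test. Written out as a singular test, the same check is a few lines of SQL (a sketch, not my exact file):

```sql
-- tests/assert_fact_orders_has_customer_keys.sql
-- dbt fails the test if this query returns any rows.
select
    order_line_id,
    order_id
from {{ ref('fact_orders') }}
where customer_key is null
```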
What Hurt (And How I Fixed It)
- Too many tiny dims. I “snowflaked” early. Then I merged dim_category back into dim_product. Fewer joins, faster queries.
- SCD2 bloat. Customer history grew fast. I added monthly snapshots and kept only current rows in the main dim. History moved to a history view (sketched after this list).
- Time zones. We sell cross-border. I locked all facts to UTC and rolled out a dim_date_local per region for reports. Set it once, breathe easy.
- Surrogate keys vs natural keys. I kept natural ids for sanity, but used surrogate keys for joins. That mix saved me on backfills.
- I’ve also weighed when an ODS beats a warehouse; see my field notes on ODS vs Data Warehouse for the trade-offs.
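The SCD2 split above ends up as two thin models over the same snapshot: the main dim keeps only current rows, the history model keeps everything. A sketch, with stand-in names:

```sql
-- models/marts/dim_customer.sql (sketch): current rows only, so everyday joins stay small.
-- dim_customer_history selects from the snapshot and derives
-- is_current as (valid_to is null).
select
    customer_key,
    customer_id,
    region,
    valid_from,
    valid_to,
    is_current
from {{ ref('dim_customer_history') }}
where is_current
```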
Cost and Speed (Real Numbers)
- Snowflake credits: average 620/month; spikes on backfills to ~900
- 95th percentile query time on main explores: 2.1 seconds
- dbt Cloud runtime: daily full run ~38 minutes; hourly incremental jobs 2–6 minutes
- BigQuery test run (I tried it for a week): similar speed, cheaper for ad-hoc, pricier for our chatty BI. We stayed with Snowflake. Running Snowflake in a heavily regulated environment turned up similar pros and cons in this hospital case study.
A Few Rules I’d Tattoo on My Laptop
- Name the grain. Put it in the table description. Repeat it.
- Write the refund story before you write revenue.
- Keep a date spine. It makes time math easy and clean. (There's a sketch after this list.)
- Store money in cents. Integers calm nerves.
- Add is_current, valid_from, valid_to to SCD tables. Future you says thanks.
- Document three sample queries per fact. Real ones your team runs.
- Keep a small “business glossary” table. One row per metric, with a plain note.
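A date spine is just one unbroken row per day that everything else joins to. In Snowflake it's a few lines; the ten-year range below is arbitrary. (dbt_utils also ships a date_spine macro if you'd rather not hand-roll it.)

```sql
-- A simple date spine: one row per day, no gaps. Range is arbitrary.
with days as (
    select row_number() over (order by seq4()) - 1 as day_offset
    from table(generator(rowcount => 3653))        -- ~10 years of days
)
select
    dateadd(day, day_offset, '2018-01-01'::date)                           as date_day,
    to_char(dateadd(day, day_offset, '2018-01-01'::date), 'YYYYMMDD')::int as date_key
from days
```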
If you need hands-on examples of these rules in action, BaseNow curates open data-model templates you can copy and tweak.
Who This Fits (And Who It Doesn’t)
- Good fit: small to mid teams (2–20 data folks) who need trust and speed.
- Not great: pure product analytics with huge event volume and simple questions. A wide table in BigQuery might be enough there. Before you pick, here’s my hands-on take on data lakes vs data warehouses.
- Heavy compliance or wild source changes? Add a Data Vault layer first, then serve a star to BI. It slows day one, but saves month six. For a head-to-head look at when a vault or
