I’m Kayla, and yes, I did this myself. I took our old warehouse and moved it to a new stack. It took months. It also saved my nights and my nerves.
If you’re after the full blow-by-blow, here’s the longer story of how I actually modernized our data warehouse from the ground up.
You know what pushed me? A 2 a.m. page. Our old SQL Server job died again. Finance needed numbers by 8. I stared at a red SSIS task for an hour. I said, “We can’t keep doing this.” So we changed. A lot.
Where I started (and why I was tired)
We had:
- SQL Server on a loud rack in the closet
- SSIS jobs that ran at night
- CSV files on an old FTP box
- Tableau on top, with angry filters
Loads took 6 to 8 hours. A bad CSV would break it all. I watched logs like a hawk. I felt like a plumber with a leaky pipe.
That messy starting point is exactly why I keep a laminated copy of my go-to rules for a data warehouse taped to my monitor today.
What I moved to (and what I actually used)
I picked tools I’ve used with my own hands:
- Snowflake on AWS for the warehouse
- Fivetran for connectors (Salesforce, NetSuite, Zendesk)
- dbt for models and tests
- Airflow for job runs
- Looker for BI
Picking that stack sometimes felt like speed-dating: scrolling through feature “profiles,” testing chemistry in short bursts, and committing only when it clicked.
When the stack was in place, I sat down and built a data-warehouse data model that could grow without toppling over.
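To show the shape I mean (these are made-up tables, not our real model): a skinny fact table joined to a few conformed dimensions, so growth means adding tables instead of endlessly widening them.

```sql
-- A minimal star-schema sketch (hypothetical names, not our real model).
-- Note: Snowflake accepts PRIMARY KEY / FOREIGN KEY but does not enforce them;
-- they still document the joins for BI tools and for humans.
create table dim_customer (
    customer_key   integer identity primary key,
    customer_id    varchar,          -- natural key from the source system
    customer_name  varchar,
    state_code     varchar
);

create table fct_orders (
    order_id       varchar,
    customer_key   integer references dim_customer (customer_key),
    order_date     date,
    order_total    number(18, 2)
);
```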
I set Snowflake to auto-suspend after 5 minutes. That one switch later saved us real money. I’ll explain.
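The switch itself is one statement. The warehouse name below is just a stand-in for whatever yours is called, and AUTO_SUSPEND is measured in seconds:

```sql
-- Stand-in warehouse name; AUTO_SUSPEND is in seconds, so 300 = 5 minutes.
alter warehouse transform_wh set
    warehouse_size = 'SMALL'
    auto_suspend   = 300
    auto_resume    = true;
```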
First real win: Salesforce in 30 minutes (then, oops)
Fivetran pulled Salesforce in about 30 minutes. That part felt like magic. But I hit API limits by noon. Sales yelled. So I moved the sync to the top of the hour and set “high-volume objects” to 4 times a day. No more limit errors. I learned to watch Fivetran logs like I watch coffee brew—steady is good.
Like any cautious engineer, I’d already told myself “I tried a data-warehouse testing strategy and it saved me more than once,” so the next step was obvious—tests everywhere.
dbt saved me from bad data (and my pride)
I wrote dbt tests for “not null” on state codes. Day one, the test failed. Why? Two states had blank codes in NetSuite. People were shipping orders with no state. We fixed the source. That tiny test kept a big mess out of Looker.
I also built incremental models. One table dropped from 6 hours to 40 minutes. Later I used dbt snapshots for “who changed what and when” on customers. That’s SCD Type 2; in plain words, it tracks history.
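Here’s the rough shape of both, with made-up model and column names. The real not-null check was a one-line generic test in YAML; written as a dbt singular test in SQL, the same idea looks like this:

```sql
-- tests/assert_state_code_present.sql (hypothetical name)
-- A dbt singular test: any rows returned mean the test fails.
select *
from {{ ref('stg_netsuite_customers') }}
where state_code is null
```

And the incremental pattern that took one table from 6 hours to 40 minutes looked roughly like this:

```sql
-- models/marts/fct_orders.sql (hypothetical name)
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    state_code,
    order_total,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- on incremental runs, scan only rows newer than what's already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```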
I did mess up names once. I renamed a dbt model. Twelve Looker dashboards broke. I learned to use stable view names and point Looker there. New names live inside. Old names live on. Peace.
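The trick is just a thin view layer. Names below are examples, not our real ones: Looker points at a reporting view that never changes, and the view points at whatever the current dbt model happens to be called.

```sql
-- Looker only ever sees analytics.reporting.orders (hypothetical names).
-- When the underlying dbt model gets renamed, only this view definition changes.
create or replace view analytics.reporting.orders as
select * from analytics.marts.fct_orders;
```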
Since then, I’ve reminded every new hire that I test data warehouses so I can actually sleep at night.
Airflow: the flaky friend I still keep
Airflow ran my jobs in order. Good. But I pushed a big data frame into XCom. Bad. The task died. So I switched to writing a small file to S3 and passed the path. Simple. Stable. I also set SLAs so I got a ping if a job ran long. Not fun, but helpful.
Snowflake: fast, but watch the meter
Snowflake ran fast for us. I loved zero-copy clone. I cloned prod to a test area in seconds. I tested a risky change at 4 p.m., shipped by 5. Time Travel also saved me when I deleted a table by mistake. I rolled it back in a minute, and my heart rate went back down.
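Both of those moments boil down to a line or two of SQL. Database and table names here are placeholders:

```sql
-- Zero-copy clone: a full test copy of prod in seconds (placeholder names).
create database analytics_test clone analytics;

-- Time Travel: bring back a table dropped by mistake...
undrop table analytics.marts.fct_orders;

-- ...or query it as it looked 30 minutes ago.
select * from analytics.marts.fct_orders at(offset => -60*30);
```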
Now the part that stung: we once left a Large warehouse running all weekend. Credits burned like a bonfire. After that, I set auto-suspend to 5 minutes and picked Small by default. We turn on Medium only when a big report needs it. We also used resource monitors with alerts. The bill got sane again.
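Resource monitors are also just a bit of SQL. The quota, thresholds, and names below are examples, not our real numbers:

```sql
-- Example numbers and names, not our real ones (needs ACCOUNTADMIN).
create resource monitor monthly_cap with
    credit_quota    = 100
    frequency       = monthly
    start_timestamp = immediately
    triggers
        on 80 percent do notify      -- ping us before it hurts
        on 100 percent do suspend;   -- stop the warehouse at the cap

alter warehouse transform_wh set resource_monitor = monthly_cap;
```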
If you wonder how Snowflake fares in high-stakes environments, here’s how I ran our hospital’s data warehouse on Snowflake—spoiler: heartbeats mattered even more there.
A quick detour: Redshift, then not
Years back, I tried Redshift at another job. It worked fine for a while. But we fought with vacuum, WLM slots, and weird queue stuff when folks ran many ad hoc queries. Concurrency got tough. For this team, I picked Snowflake. If you live in AWS and love tight control, Redshift can still be fine. For us, Snowflake felt simple and fast.
I’ve also watched many teams debate the merits of ODS vs Data Warehouse like it’s a Friday-night sport. Pick what fits your latency and history needs, not the loudest opinion.
Real, everyday results
- Finance close went from 5 days to 3. Less hair-pulling.
- Marketing got near real-time cohorts. They ran campaigns the same day.
- Data freshness moved from nightly to every 15 minutes for key tables.
- Support saw a customer’s full history in one place. Fewer “let me get back to you” calls.
We shipped a simple “orders by hour” dashboard that used to be a weekly CSV. It updated every 15 minutes. Folks clapped. Not loud, but still.
Teams later asked why we landed on this design; the short answer is that I tried different data-warehouse models before betting on this one.
Governance: the part I wanted to skip but couldn’t
Roles in Snowflake confused me at first. I made a “BUSINESS_READ” role with a safe view. I masked emails and phone numbers with tags. Legal asked for 2-year retention on PII. I set a task to purge old rows. I also added row-level filters for EU data. Simple rules, less risk. Boring? Maybe. Needed? Yes.
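To make that concrete without leaking our real setup, here is the rough shape of the masking and the purge task, with made-up table, column, role, and schedule names. (Masking policies need Snowflake Enterprise edition or above.)

```sql
-- Mask emails for everyone outside an allow-listed role (names are made up).
create masking policy mask_email as (val string) returns string ->
    case when current_role() in ('PII_READER') then val
         else '*** masked ***'
    end;

-- We attached policies through tags; column-level attachment shown here for brevity.
alter table analytics.marts.dim_customers
    modify column email set masking policy mask_email;

-- Nightly task to honor the 2-year PII retention rule.
create task purge_old_pii
    warehouse = transform_wh
    schedule  = 'USING CRON 0 3 * * * UTC'
as
    delete from analytics.marts.dim_customers
    where created_at < dateadd(year, -2, current_date());

-- Tasks start suspended until you resume them.
alter task purge_old_pii resume;
```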
Those guardrails might feel dull, but they’re exactly what actually works for an enterprise data warehouse when the auditors come knocking.
Stuff that annoyed me
- Surprise costs from ad hoc queries. A giant SELECT can chew through credits. We now route heavy work to a separate warehouse with a quota.
- Looker PDTs took forever one Tuesday at 9 a.m. I moved that build to 5 a.m. and cut it in half by pushing joins into dbt.
- Fivetran hit a weird NetSuite schema change. A column type flipped. My model broke. I added a CAST in staging (sketch just below) and set up a Slack alert for schema drift.
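That staging fix is worth showing because it is so small. Names are made up, but the idea is to pin the types you depend on as early as possible:

```sql
-- models/staging/stg_netsuite_orders.sql (hypothetical name)
-- Pin types in staging so upstream schema drift breaks here, loudly,
-- instead of deep inside a mart or a dashboard.
select
    order_id,
    cast(order_total as number(18, 2)) as order_total,
    cast(order_date  as date)          as order_date
from {{ source('netsuite', 'orders') }}
```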
What I’d do again (and what I wouldn’t)
I’d do:
- Start with one source, one model, one dashboard. Prove it.
- Use dbt tests from day one. Even the simple ones.
- Keep stable view names for BI. Change under the hood, not on the surface.
- Turn on auto-suspend. Set Small as the default warehouse.
- Tag PII and write it down. Future you will say thanks.
I wouldn’t:
- Let folks query prod on the biggest warehouse “just for a minute.”
- Rename core fields without a deprecation plan.
- Pack huge objects into Airflow XCom. Keep it lean.
If your team looks like mine
We were 6 people: two analytics engineers, one data engineer, one analyst, one BI dev, and me. If that sounds like you, this stack fits:
- Fivetran + dbt + Snowflake + Airflow + Looker
For more practical guidance
