Blog

Thoughts on data, pipelines & AI.

Practical insights on data engineering, the modern data stack, AI automation, and building systems that actually scale.

Why Your Data Pipeline Is Slower Than It Should Be (And How to Fix It)

Most slow pipelines aren't a compute problem; they're a design problem. After building pipelines at a Fortune 500 company that cut execution time from hours to minutes, here's what I've learned about the real culprits.

dbt vs. Raw SQL: When to Use Each (And When to Mix Both)

dbt transformed how analytics engineers write transformations — but it's not always the right tool. Here's my framework for deciding when raw SQL wins.

Building a Real-Time Data Pipeline with Kafka and PySpark on GCP

A step-by-step walkthrough of the architecture behind ingesting 10,000+ live data points per minute using Spark Structured Streaming, Kafka, and BigQuery.

AI Agents for Data Teams: What's Actually Useful in 2026

Everyone's talking about AI agents. But which ones are actually production-ready for data teams? I break down what's working, what's hype, and what I'm building with.

Snowflake vs. BigQuery in 2026: A Practical Comparison

I've built production pipelines on both. Here's an honest, hands-on comparison — not a spec sheet, but real tradeoffs that matter when you're choosing a warehouse.

Data Governance Without the Bureaucracy: Lessons from Fintech

Good governance doesn't have to slow you down. Here's how we enforced lineage, validation, and compliance at scale without killing engineering velocity.