Blog — Ramiz Khatib | Data Engineering & AI

Featured · May 2026

Why Your Data Pipeline is Slower Than It Should Be (And How to Fix It)

Most slow pipelines aren't a compute problem; they're a design problem. After building pipelines that cut execution time from hours to minutes at a Fortune 500 company, here's what I've learned about the real culprits.

Data Pipelines · Performance · dbt


Apr 2026 · 6 min read

dbt vs. Raw SQL: When to Use Each (And When to Mix Both)

dbt transformed how analytics engineers write transformations — but it's not always the right tool. Here's my framework for deciding when raw SQL wins.

dbt · SQL

Apr 2026 · 8 min read

Building a Real-Time Data Pipeline with Kafka and PySpark on GCP

A step-by-step walkthrough of the architecture behind ingesting 10,000+ live data points per minute using Spark Structured Streaming, Kafka, and BigQuery.

Kafka · PySpark · GCP

Mar 2026 · 5 min read

AI Agents for Data Teams: What's Actually Useful in 2026

Everyone's talking about AI agents. But which ones are actually production-ready for data teams? I break down what's working, what's hype, and what I'm building with.

AI Agents · LLMs

Mar 2026 · 7 min read

Snowflake vs BigQuery in 2026: A Practical Comparison

I've built production pipelines on both. Here's an honest, hands-on comparison — not a spec sheet, but real tradeoffs that matter when you're choosing a warehouse.

Snowflake · BigQuery · Cloud

Feb 2026 · 4 min read

Data Governance Without the Bureaucracy: Lessons from Fintech

Good governance doesn't have to slow you down. Here's how we enforced lineage, validation, and compliance at scale without killing engineering velocity.

Governance · Fintech

Thoughts on data, pipelines & AI.