Senior Data Engineer · 6+ Years · Bengaluru, India

Saket Kumar

I build open, cost-efficient data platforms — Snowflake, Databricks, and BigQuery — with a focus on compute cost, query latency, and CI/CD release cycles. Currently New Relic's founding data engineer on the Product Rating platform.

Snowflake · Databricks · BigQuery · Apache Iceberg · dbt · Apache Spark · Airflow

See projects → Get in touch

01
Live · v0.1.1 on PyPI · Apache-2.0

dbt-polyglot

Compile-time SQL-dialect transpiler for dbt. Run models authored in Snowflake, BigQuery, Redshift, T-SQL, or DuckDB on Spark / Databricks unchanged — via sqlglot transpilation with a Spark correctness fix-up layer. Battle-tested inside my New Relic Snowflake → lakehouse migration.
- Python
- sqlglot
- dbt-core
- Spark
- PyPI
View on PyPI → Source (GitHub) →

02
58 modules · Docker · uv

lakehouse-lab

A self-authored, laptop-safe Data Engineering production-challenges curriculum. 58 modules across Spark performance (skew, OOM, AQE), Apache Iceberg & Delta Lake correctness, Kafka + Structured Streaming, Debezium CDC, dbt quality with Great Expectations, and Airflow — with a full incident-simulator capstone.
- Spark
- Iceberg
- Delta Lake
- Kafka
- Debezium
- dbt
- Airflow
Explore the repo →

I'm a Senior Data Engineer with 6+ years of experience designing and scaling cloud-native data platforms. I've delivered $45K+ in annual Snowflake credit savings on the flagship ELT pipeline alone and 50% dbt model speed-ups, and I'm currently leading the migration of 440+ dbt models from Snowflake to a fully open-source lakehouse: Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, and Airflow 3.

I gravitate toward the "unsexy" wins: cost curves, dialect gaps, correctness fixtures, migration safety. I open-source what I'd want to find, not what I'd want to ship. Multi-cloud (AWS · GCP · Azure), GDPR/PII-aware by design.

Off-hours: cosmology, open-source data tooling, and slowly writing more than I read.

Jul 2024 – Present · Bengaluru, India
Senior Data Engineer · New Relic

Founding Member — Product Rating Data (India)
- Open Lakehouse Migration (Snowflake → Iceberg): Leading migration of ELT pipeline (440+ dbt models) to a fully open-source lakehouse — Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, Airflow 3. Automated model-parity validation + mismatch-spike dashboard for zero-downtime cutover.
- Snowflake Cost & Performance: Refactored critical dbt models — $45.2K annual credit savings, 50% faster queries, 100% data parity.
- CI/CD for Data: Jenkins + GitHub framework (linting, BDD, dbt Cloud triggers, quality gates, Slack) — 85% cut in deployment lead time.
- Zero-Downtime Migration & Monetization Modeling: Migrated a 10 GB/hour billing pipeline from Airflow 1.0 to dbt Cloud + Snowflake. Rating engine supported 15+ new product SKUs (Feb 2025).
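A minimal sketch of what a model-parity check like the one mentioned above could look like. This is illustrative only and not the validation framework used in the migration: SQLite stands in for both the legacy and migrated engines, and the table names and helpers (`table_fingerprint`, `check_parity`, `build_demo`) are hypothetical. It compares row counts plus an order-independent checksum, which is one common way to confirm two copies of a model hold the same rows.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return (row_count, checksum) for a table, independent of row order.

    Assumption: `table` is a trusted identifier (it is interpolated into SQL).
    """
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    # Hash each row on its own, then sort the digests so row order is irrelevant.
    digests = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(rows), combined


def check_parity(conn, legacy_table, migrated_table):
    """Compare two tables and report whether counts and checksums agree."""
    legacy_count, legacy_sum = table_fingerprint(conn, legacy_table)
    migrated_count, migrated_sum = table_fingerprint(conn, migrated_table)
    return {
        "row_count_match": legacy_count == migrated_count,
        "checksum_match": legacy_sum == migrated_sum,
    }


def build_demo():
    """Build an in-memory database with hypothetical demo tables."""
    conn = sqlite3.connect(":memory:")
    for name in ("legacy_orders", "migrated_orders", "drifted_orders"):
        conn.execute(f"CREATE TABLE {name} (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO legacy_orders VALUES (?, ?)", [(1, 10.0), (2, 20.5)]
    )
    # Same rows in a different order: should still count as a match.
    conn.executemany(
        "INSERT INTO migrated_orders VALUES (?, ?)", [(2, 20.5), (1, 10.0)]
    )
    # One amount differs: same row count, different checksum.
    conn.executemany(
        "INSERT INTO drifted_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)]
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(check_parity(demo, "legacy_orders", "migrated_orders"))
    print(check_parity(demo, "legacy_orders", "drifted_orders"))
```

Loading every row into memory only suits small tables. On real warehouse-scale models the same idea would more likely be pushed down into SQL aggregates, with the per-model results feeding whatever dashboard tracks mismatches.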

Jun 2022 – May 2024 · Bengaluru, India
Data Engineer · Falabella

LATAM e-commerce — Falabella.cl third-party marketplace
- Fast Shipping Tags: Data product on BigQuery/DataProc/Pub-Sub for 4M+ SKUs at 97% accuracy — drove 50% lift in platform conversion.
- Serverless Ingestion Framework: Cloud Functions + Federated Queries — 80% faster pipeline setup, killed Compute Engine overhead.
- Airflow Observability at Scale: Custom monitoring for 2,000+ DAGs — automated alerts, root-cause analytics, 60% MTTR improvement, 99.9% availability.

Oct 2020 – Jun 2022 · Hyderabad, India
Specialist Programmer · Infosys

Clients: Walmart & Five Below
- Enterprise ETL on Databricks & Spark (Walmart): Spark-Scala + PySpark ETL on Databricks and GCP DataProc, GCS → BigQuery with automated data-quality gates. Spark Structured Streaming for near-real-time.
- PII Encryption Framework (Five Below): Cross-org AES-256/PGP framework in Python/PySpark/Scala/Java/PGPy, processing 80 GB+ files for end-to-end PII compliance.

Warehouses & Lakehouses

Databricks
Snowflake
BigQuery
Delta Lake
Apache Iceberg
dbt (Core & Cloud)

Big Data, Streaming & Orchestration

Apache Spark
PySpark
Spark Structured Streaming
Kafka
Debezium (CDC)
Airflow
ETL / ELT
Great Expectations

Cloud & Infrastructure

AWS (Glue, S3, Athena, EMR)
GCP (BigQuery, DataProc, Pub/Sub, Cloud Functions)
Azure
Kubernetes
Docker
Terraform
MongoDB
Linux

Languages & Frameworks

Python
SQL
NoSQL
Scala
Shell
FastAPI
Pandas
Pytest
REST APIs

DevOps, BI & Governance

Jenkins
GitHub Actions
Prometheus / Grafana
Looker Studio
Tableau
GDPR / PII
AES-256 / PGP
Data Quality Gates

Certifications

dbt Fundamentals · dbt Labs (2024)
Azure Data Fundamentals DP-900 · Microsoft (2021)
Deep Learning Specialization · DeepLearning.AI (2020)
Machine Learning · Andrew Ng, Stanford / Coursera (2019, 95%)
Python Advanced · Cutshort (2023)
AWS AI & ML Scholarship · Udacity (2026)

Jul 5, 2026
break your pipeline (before prod does it for you)
58 broken pipelines, one laptop, zero cloud bill. the DE curriculum for people who've never been on-call yet.…

Jul 5, 2026
A dbt project. Snowflake to Spark. Zero rewrites.
How I built dbt-polyglot, a compile-time SQL-dialect transpiler that lets you migrate a dbt project from Snowflake to Spark without editing a single .sql file.…

Read all posts on saket.whiz.pub →

I'm always up for talking about lakehouses, dbt patterns, migration battle-scars, or interesting job problems. Fastest reach — email or LinkedIn.

Open-source projects I ship & maintain.

Data platforms should be fast, cheap, and boring — in that order.

Where I've built things.

The stack I actually use.