Senior Data Engineer · 6+ Years · Bengaluru, India

Saket Kumar

I build open, cost-efficient data platforms on Snowflake, Databricks, and BigQuery, with a focus on lowering compute cost, query latency, and CI/CD release cycle time. Currently the founding data engineer on New Relic's Product Rating platform.

Snowflake · Databricks · BigQuery · Apache Iceberg · dbt · Apache Spark · Airflow

01
Live · v0.1.1 on PyPI · Apache-2.0

dbt-polyglot

Compile-time SQL-dialect transpiler for dbt. Run models authored in Snowflake, BigQuery, Redshift, T-SQL, or DuckDB on Spark / Databricks unchanged — via sqlglot transpilation with a Spark correctness fix-up layer. Battle-tested inside my New Relic Snowflake → lakehouse migration.
- Python
- sqlglot
- dbt-core
- Spark
- PyPI
View on PyPI → · Source (GitHub) →
02
58 modules · Docker · uv

lakehouse-lab

A self-authored, laptop-safe Data Engineering production-challenges curriculum. 58 modules across Spark performance (skew, OOM, AQE), Apache Iceberg & Delta Lake correctness, Kafka + Structured Streaming, Debezium CDC, dbt quality with Great Expectations, and Airflow — with a full incident-simulator capstone.
- Spark
- Iceberg
- Delta Lake
- Kafka
- Debezium
- dbt
- Airflow
Explore the repo →

I'm a Senior Data Engineer with 6+ years designing and scaling cloud-native data platforms. I've delivered $45K+ in annual Snowflake credit savings on the flagship ELT pipeline alone and 50% dbt model speed-ups, and I'm currently leading the migration of 440+ dbt models from Snowflake to a fully open-source lakehouse: Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, and Airflow 3.

I gravitate toward the "unsexy" wins: cost curves, dialect gaps, correctness fixtures, migration safety. I open-source what I'd want to find, not what I'd want to ship. Multi-cloud (AWS · GCP · Azure), GDPR/PII-aware by design.

Off-hours: cosmology, open-source data tooling, and slowly writing more than I read.

Jul 2024 – Present · Bengaluru, India
Senior Data Engineer · New Relic

Founding Member — Product Rating Data (India)
- Open Lakehouse Migration (Snowflake → Iceberg): Leading migration of ELT pipeline (440+ dbt models) to a fully open-source lakehouse — Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, Airflow 3. Automated model-parity validation + mismatch-spike dashboard for zero-downtime cutover.
- Snowflake Cost & Performance: Refactored critical dbt models — $45.2K annual credit savings, 50% faster queries, 100% data parity.
- CI/CD for Data: Jenkins + GitHub framework (linting, BDD, dbt Cloud triggers, quality gates, Slack) — 85% cut in deployment lead time.
- Zero-Downtime Migration & Monetization Modeling: Migrated a 10 GB/hour billing pipeline from Airflow 1.0 to dbt Cloud + Snowflake. Rating engine supported 15+ new product SKUs (Feb 2025).
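The model-parity validation mentioned above can be illustrated with a minimal sketch. This is not the validator used in the migration; it is a hypothetical, in-memory version that assumes both result sets fit in Python lists of tuples. The names `row_fingerprint` and `parity_report` are invented for illustration. A production check would more likely push the hashing and counting down into the Snowflake and Spark engines.

```python
import hashlib
from collections import Counter


def row_fingerprint(row: tuple) -> str:
    """Hash one row; repr() keeps None, ints, floats, and strings distinguishable."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def parity_report(legacy_rows, migrated_rows) -> dict:
    """Compare two result sets as multisets, so row order does not matter."""
    legacy = Counter(row_fingerprint(r) for r in legacy_rows)
    migrated = Counter(row_fingerprint(r) for r in migrated_rows)
    # Counter subtraction keeps only positive counts, which respects duplicates.
    missing = sum((legacy - migrated).values())  # rows only in the legacy output
    extra = sum((migrated - legacy).values())    # rows only in the migrated output
    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing_in_migrated": missing,
        "extra_in_migrated": extra,
        "parity": missing == 0 and extra == 0,
    }


# Hypothetical sample data: same row count, but one duplicate was lost
# and one unexpected row appeared after migration.
legacy = [(1, "a", 10.0), (2, "b", None), (2, "b", None)]
migrated = [(2, "b", None), (1, "a", 10.0), (3, "c", 5.0)]
report = parity_report(legacy, migrated)
print(report)
```

Comparing multisets rather than only row counts matters here: in the sample data both sides have three rows, yet the report still flags one missing and one extra row.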
Jun 2022 – May 2024 · Bengaluru, India
Data Engineer · Falabella

LATAM e-commerce — Falabella.cl third-party marketplace
- Fast Shipping Tags: Data product on BigQuery/DataProc/Pub-Sub for 4M+ SKUs at 97% accuracy — drove 50% lift in platform conversion.
- Serverless Ingestion Framework: Cloud Functions + Federated Queries, giving 80% faster pipeline setup and eliminating Compute Engine overhead.
- Airflow Observability at Scale: Custom monitoring for 2,000+ DAGs — automated alerts, root-cause analytics, 60% MTTR improvement, 99.9% availability.
Oct 2020 – Jun 2022 · Hyderabad, India
Specialist Programmer · Infosys

Clients: Walmart & Five Below
- Enterprise ETL on Databricks & Spark (Walmart): Spark-Scala + PySpark ETL on Databricks and GCP DataProc, GCS → BigQuery with automated data-quality gates. Spark Structured Streaming for near-real-time.
- PII Encryption Framework (Five Below): Cross-org AES-256/PGP framework in Python/PySpark/Scala/Java/PGPy, processing 80 GB+ files for end-to-end PII compliance.

Warehouses & Lakehouses

Databricks
Snowflake
BigQuery
Delta Lake
Apache Iceberg
dbt (Core & Cloud)

Big Data, Streaming & Orchestration

Apache Spark
PySpark
Spark Structured Streaming
Kafka
Debezium (CDC)
Airflow
ETL / ELT
Great Expectations

Cloud & Infrastructure

AWS (Glue, S3, Athena, EMR)
GCP (BigQuery, DataProc, Pub/Sub, Cloud Functions)
Azure
Kubernetes
Docker
Terraform
MongoDB
Linux

Languages & Frameworks

Python
SQL
NoSQL
Scala
Shell
FastAPI
Pandas
Pytest
REST APIs

DevOps, BI & Governance

Jenkins
GitHub Actions
Prometheus / Grafana
Looker Studio
Tableau
GDPR / PII
AES-256 / PGP
Data Quality Gates

Certifications

dbt Fundamentals · dbt Labs (2024)
Azure Data Fundamentals DP-900 · Microsoft (2021)
Deep Learning Specialization · DeepLearning.AI (2020)
Machine Learning · Andrew Ng, Stanford / Coursera (2019, 95%)
Python Advanced · Cutshort (2023)
AWS AI & ML Scholarship · Udacity (2026)

Read all posts on saket.whiz.pub →

I'm always up for talking about lakehouses, dbt patterns, migration battle-scars, or interesting job problems. Fastest reach — email or LinkedIn.

Open-source projects I ship & maintain.

Data platforms should be fast, cheap, and boring — in that order.

Where I've built things.

The stack I actually use.