Senior Data Engineer · 6+ Years · Bengaluru, India

Saket Kumar

I build open, cost-efficient data platforms — Snowflake, Databricks, and BigQuery — with a focus on compute cost, query latency, and CI/CD release cycles. Currently New Relic's founding data engineer on the Product Rating platform.

Snowflake · Databricks · BigQuery · Apache Iceberg · dbt · Apache Spark · Airflow
Featured Work

Open-source projects I ship & maintain.

  1.
    Live · v0.1.1 on PyPI · Apache-2.0

    dbt-polyglot

    Compile-time SQL-dialect transpiler for dbt. Run models authored in Snowflake, BigQuery, Redshift, T-SQL, or DuckDB on Spark / Databricks unchanged — via sqlglot transpilation with a Spark correctness fix-up layer. Battle-tested inside my New Relic Snowflake → lakehouse migration.

    • Python
    • sqlglot
    • dbt-core
    • Spark
    • PyPI
  2.
    58 modules · Docker · uv

    lakehouse-lab

    A self-authored, laptop-safe Data Engineering production-challenges curriculum. 58 modules across Spark performance (skew, OOM, AQE), Apache Iceberg & Delta Lake correctness, Kafka + Structured Streaming, Debezium CDC, dbt quality with Great Expectations, and Airflow — with a full incident-simulator capstone.

    • Spark
    • Iceberg
    • Delta Lake
    • Kafka
    • Debezium
    • dbt
    • Airflow
About

Data platforms should be fast, cheap, and boring — in that order.

I'm a Senior Data Engineer with 6+ years designing and scaling cloud-native data platforms. I've delivered $45K+ in annual Snowflake credit savings on the flagship ELT pipeline alone, 50% dbt model speed-ups, and I'm currently leading migration of 440+ dbt models from Snowflake to a fully open-source lakehouse — Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, and Airflow 3.

I gravitate toward the "unsexy" wins: cost curves, dialect gaps, correctness fixtures, migration safety. I open-source what I'd want to find, not what I'd want to ship. Multi-cloud (AWS · GCP · Azure), GDPR/PII-aware by design.

Off-hours: cosmology, open-source data tooling, and slowly writing more than I read.

Experience

Where I've built things.

  1. Jul 2024 – Present · Bengaluru, India

    Senior Data Engineer · New Relic

    Founding Member — Product Rating Data (India)

    • Open Lakehouse Migration (Snowflake → Iceberg): Leading migration of ELT pipeline (440+ dbt models) to a fully open-source lakehouse — Iceberg on S3, Spark Thrift compute, dbt-spark, Project Nessie catalog, Airflow 3. Automated model-parity validation + mismatch-spike dashboard for zero-downtime cutover.
    • Snowflake Cost & Performance: Refactored critical dbt models — $45.2K annual credit savings, 50% faster queries, 100% data parity.
    • CI/CD for Data: Jenkins + GitHub framework (linting, BDD, dbt Cloud triggers, quality gates, Slack) — 85% cut in deployment lead time.
    • Zero-Downtime Migration & Monetization Modeling: Migrated a 10 GB/hour billing pipeline from Airflow 1.0 to dbt Cloud + Snowflake. Rating engine supported 15+ new product SKUs (Feb 2025).
  2. Jun 2022 – May 2024 · Bengaluru, India

    Data Engineer · Falabella

    LATAM e-commerce — Falabella.cl third-party marketplace

    • Fast Shipping Tags: Data product on BigQuery, Dataproc, and Pub/Sub covering 4M+ SKUs at 97% accuracy; drove a 50% lift in platform conversion.
    • Serverless Ingestion Framework: Cloud Functions + Federated Queries — 80% faster pipeline setup, killed Compute Engine overhead.
    • Airflow Observability at Scale: Custom monitoring for 2,000+ DAGs — automated alerts, root-cause analytics, 60% MTTR improvement, 99.9% availability.
  3. Oct 2020 – Jun 2022 · Hyderabad, India

    Specialist Programmer · Infosys

    Clients: Walmart & Five Below

    • Enterprise ETL on Databricks & Spark (Walmart): Spark-Scala + PySpark ETL on Databricks and GCP Dataproc, loading GCS → BigQuery with automated data-quality gates. Spark Structured Streaming for near-real-time pipelines.
    • PII Encryption Framework (Five Below): Cross-org AES-256/PGP framework in Python/PySpark/Scala/Java/PGPy, processing 80 GB+ files for end-to-end PII compliance.
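
The automated model-parity validation mentioned for the Snowflake → Iceberg cutover can be illustrated with a small sketch. This is not the production framework: the function names (`row_digest`, `parity_report`) and the sample rows are hypothetical, and it compares in-memory result sets, whereas a real check on 440+ models would more likely push counts and checksums down to each engine. The idea it shows is comparing two outputs as order-independent multisets of row hashes.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence

NULL_MARKER = "\x00NULL"  # assumption: keeps NULL distinct from an empty string


def row_digest(row: Sequence[object]) -> str:
    """Hash one row after rendering every value as text."""
    rendered = "\x1f".join(NULL_MARKER if v is None else str(v) for v in row)
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


def parity_report(legacy: Iterable[Sequence[object]],
                  migrated: Iterable[Sequence[object]]) -> dict:
    """Compare two result sets as multisets of row digests (row order is ignored)."""
    legacy_counts = Counter(row_digest(r) for r in legacy)
    migrated_counts = Counter(row_digest(r) for r in migrated)
    missing = legacy_counts - migrated_counts      # rows only in the legacy output
    unexpected = migrated_counts - legacy_counts   # rows only in the migrated output
    return {
        "legacy_rows": sum(legacy_counts.values()),
        "migrated_rows": sum(migrated_counts.values()),
        "missing_rows": sum(missing.values()),
        "unexpected_rows": sum(unexpected.values()),
        "parity": not missing and not unexpected,
    }


# Hypothetical sample data: same rows, different order.
snowflake_rows = [(1, "basic", 10.0), (2, "pro", 25.0)]
iceberg_rows = [(2, "pro", 25.0), (1, "basic", 10.0)]
print(parity_report(snowflake_rows, iceberg_rows)["parity"])  # True
```

Because values are compared as rendered text, type differences between engines (for example `10` versus `10.0`) would show up as mismatches; a real check would need to normalise types first.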
Toolkit

The stack I actually use.

Warehouses & Lakehouses

  • Databricks
  • Snowflake
  • BigQuery
  • Delta Lake
  • Apache Iceberg
  • dbt (Core & Cloud)

Big Data, Streaming & Orchestration

  • Apache Spark
  • PySpark
  • Spark Structured Streaming
  • Kafka
  • Debezium (CDC)
  • Airflow
  • ETL / ELT
  • Great Expectations

Cloud & Infrastructure

  • AWS (Glue, S3, Athena, EMR)
  • GCP (BigQuery, Dataproc, Pub/Sub, Cloud Functions)
  • Azure
  • Kubernetes
  • Docker
  • Terraform
  • MongoDB
  • Linux

Languages & Frameworks

  • Python
  • SQL
  • NoSQL
  • Scala
  • Shell
  • FastAPI
  • Pandas
  • Pytest
  • REST APIs

DevOps, BI & Governance

  • Jenkins
  • GitHub Actions
  • Prometheus / Grafana
  • Looker Studio
  • Tableau
  • GDPR / PII
  • AES-256 / PGP
  • Data Quality Gates

Certifications

  • dbt Fundamentals · dbt Labs (2024)
  • Azure Data Fundamentals DP-900 · Microsoft (2021)
  • Deep Learning Specialization · DeepLearning.AI (2020)
  • Machine Learning · Andrew Ng, Stanford / Coursera (2019, 95%)
  • Python Advanced · Cutshort (2023)
  • AWS AI & ML Scholarship · Udacity (2026)
Writing

Notes from the field.

Read all posts on saket.whiz.pub →
Get in touch

The best inbox is a short one.

I'm always up for talking about lakehouses, dbt patterns, migration battle scars, or interesting job problems. The fastest way to reach me is email or LinkedIn.