Data Engineer

VIVEK KUMAR

Cloud-native data pipelines on Azure & AWS.
Medallion Lakehouse · PySpark · Databricks · Kafka

vivekbamauli@gmail.com

Azure

Databricks

Kafka

PySpark

10M+ Records/Day

40% Processing Time ↓

90% Manual Cut

0

% Reliability

8M+ Records/Run (AWS)

Work History

Experience

Dataminerz Innovative Solutions

Data Engineer Intern

Jan 2026 – May 2026 · Noida, Uttar Pradesh

⬡ Project 1 — Guidepoint · Medallion Architecture (Azure / Databricks)

Architected a Medallion (Bronze → Silver → Gold) pipeline on Azure Databricks that runs 40% faster than the legacy batch processes
Implemented a Lakehouse on ADLS Gen2 that ingests 10M+ records/day from structured and semi-structured sources using PySpark
Built ETL workflows in PySpark and SQL, improving transformation efficiency by 35% across the Silver and Gold layers
Delivered analytics-ready Gold datasets to 3+ stakeholder groups via Azure Synapse at 50+ GB daily throughput
Reduced pipeline change-request turnaround by 25% through cross-functional collaboration
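
As a rough illustration of the Bronze → Silver → Gold flow described above, here is a minimal sketch in plain Python standing in for PySpark. The row schema (id, region, amount) and the function names to_silver and to_gold are hypothetical and are not taken from the Guidepoint project.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean and deduplicate raw Bronze rows (hypothetical schema: id, region, amount)."""
    seen_ids = set()
    silver = []
    for row in bronze_rows:
        # Drop rows that have no business key.
        if row.get("id") is None:
            continue
        # Drop rows whose amount is missing or cannot be parsed as a number.
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        # Keep only the first occurrence of each id.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        silver.append(
            {
                "id": row["id"],
                "region": str(row.get("region", "")).strip().upper(),
                "amount": amount,
            }
        )
    return silver


def to_gold(silver_rows):
    """Aggregate Silver rows into an analytics-ready total per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)
```

In a Spark job the same steps would typically be expressed as DataFrame filters, a dedup on the key, and a group-by aggregation; the layering idea is the same.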

⬡ Project 2 — Structurely · AWS Medallion Architecture Pipeline

Built a Landing → Bronze → Silver → Gold pipeline ingesting Salesforce CRM and MongoDB data into Aurora PostgreSQL at 8M+ records/run
Developed 4 AWS Glue PySpark jobs (full ingestion, cleaning, transformation, loading), cutting manual handling by 90%
Designed a metadata-driven control plane on Aurora PostgreSQL with pipeline_config and pipeline_audit tables and 5 PL/pgSQL functions
Configured multi-layer Parquet storage on S3, reducing storage cost by 30% versus CSV
Set up Glue Crawlers to update the Glue Data Catalog automatically after every S3 write, saving 5+ hrs/week of manual schema management
Orchestrated zero-touch daily execution via AWS Glue Workflow and EventBridge, with all secrets stored in Secrets Manager
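
The metadata-driven control plane mentioned above can be sketched roughly as follows. This is an assumption-laden illustration: it uses an in-memory SQLite database in place of Aurora PostgreSQL, the table names pipeline_config and pipeline_audit come from the bullet above, and every column name, the runners mapping, and both function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def init_control_plane(conn):
    """Create the two control tables (column names are illustrative assumptions)."""
    conn.executescript(
        """
        CREATE TABLE pipeline_config (
            pipeline_name TEXT PRIMARY KEY,
            source_system TEXT NOT NULL,
            is_enabled    INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE pipeline_audit (
            run_id         INTEGER PRIMARY KEY AUTOINCREMENT,
            pipeline_name  TEXT NOT NULL,
            status         TEXT NOT NULL,
            rows_processed INTEGER,
            finished_at    TEXT NOT NULL
        );
        """
    )


def run_enabled_pipelines(conn, runners):
    """Run every enabled pipeline and record one audit row per run.

    runners maps a source_system value to a callable that returns a row count.
    """
    configs = conn.execute(
        "SELECT pipeline_name, source_system FROM pipeline_config "
        "WHERE is_enabled = 1 ORDER BY pipeline_name"
    ).fetchall()
    for name, source in configs:
        try:
            rows, status = runners[source](), "SUCCESS"
        except Exception:
            # A failure is recorded in the audit table instead of stopping the loop.
            rows, status = None, "FAILED"
        conn.execute(
            "INSERT INTO pipeline_audit "
            "(pipeline_name, status, rows_processed, finished_at) "
            "VALUES (?, ?, ?, ?)",
            (name, status, rows, datetime.now(timezone.utc).isoformat()),
        )
    conn.commit()
```

The point of the design is that adding or disabling a pipeline becomes a row change in pipeline_config rather than a code change, while pipeline_audit keeps a history of each run.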

Technical Arsenal

Skills

Languages

Python · SQL · PySpark

Data Engineering & ETL

Medallion Architecture · Lakehouse · ETL Design · Incremental Load · Watermark Mgmt · CDC / SCD

Azure Stack

Azure Databricks · ADLS Gen2 · Azure Synapse · Azure Data Factory · Azure SQL DB · Key Vault · Event Hubs · Blob Storage

AWS Stack

AWS Glue · S3 · Lambda · EventBridge · Aurora PostgreSQL · Secrets Manager · CloudWatch · EC2

Big Data & Streaming

Apache Spark · Apache Kafka · Apache Flink · Apache Airflow · Debezium CDC · Apache Iceberg · HDFS

Databases & Analytics

PostgreSQL · MongoDB · ClickHouse · MySQL · dbt · Power BI · Grafana

DevOps & Tools

Docker · Git / GitHub · Linux · Bash · VS Code

PySpark / Big Data: 90%

Azure Stack: 87%

AWS Stack: 82%

Python / SQL: 92%

Kafka / Streaming: 78%

Built Things

Projects

01 / Azure · Lakehouse

Guidepoint Medallion Architecture

Enterprise-grade Bronze → Silver → Gold pipeline on Azure Databricks ingesting 10M+ records/day. Full Lakehouse on ADLS Gen2 with Delta Lake. Analytics-ready outputs served through Azure Synapse.

Databricks · ADLS Gen2 · PySpark · Azure Synapse · Delta Lake

02 / AWS · Medallion

Structurely AWS Data Pipeline

Metadata-driven Medallion pipeline from Salesforce CRM & MongoDB → Aurora PostgreSQL ingesting 8M+ records/run. Glue PySpark jobs, S3 Parquet, EventBridge orchestration.

AWS Glue · S3 · EventBridge · Aurora PostgreSQL · PySpark

03 / Batch · Big Data

Retail Analytics Platform

End-to-end analytics pipeline with Docker, Airflow, Spark, HDFS, EC2, and Power BI dashboards. Production-grade batch orchestration with full monitoring.

Docker · Airflow · Spark · HDFS · EC2 · Power BI

04 / Real-time · B.Tech Final Year

SENTINEL — Disaster Monitoring

Real-time pipeline ingesting weather, earthquake & disaster data via OpenWeatherMap, USGS, GDACS APIs. WebSocket-powered live dashboard with Chart.js visualizations.

WebSocket · Chart.js · OpenWeatherMap · USGS · GDACS

05 / Streaming

Real-Time E-Commerce Analytics

Large-scale streaming with Kafka KRaft, Debezium CDC, Apache Flink, Iceberg, ClickHouse OLAP, Airflow, dbt, and Grafana dashboards — fully containerized on Docker + EC2.

Kafka · Debezium · Flink · Iceberg · ClickHouse · dbt

06 / MongoDB · AWS

MongoDB Atlas → AWS Pipeline

Python incremental extraction from MongoDB Atlas to S3 with PostgreSQL audit tracking. Replicated with AWS-native services: Glue, Lambda, EventBridge, Secrets Manager, and CloudWatch.

MongoDB Atlas · AWS Lambda · Glue · S3 · CloudWatch
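
A watermark-based incremental extraction of the kind this project describes might look roughly like the sketch below. It is illustrative only: a plain list of dicts stands in for a MongoDB collection, and the updated_at field and the extract_incremental function are hypothetical names, not details from the project.

```python
from datetime import datetime, timezone


def extract_incremental(documents, last_watermark):
    """Return the documents changed after last_watermark and the advanced watermark.

    documents is any iterable of dicts carrying an 'updated_at' datetime
    (an assumed field name). If nothing changed, the watermark is unchanged.
    """
    changed = sorted(
        (doc for doc in documents if doc["updated_at"] > last_watermark),
        key=lambda doc: doc["updated_at"],
    )
    new_watermark = changed[-1]["updated_at"] if changed else last_watermark
    return changed, new_watermark


# Example: only documents modified after the stored watermark are picked up.
docs = [
    {"_id": "a", "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"_id": "b", "updated_at": datetime(2026, 1, 3, tzinfo=timezone.utc)},
]
batch, watermark = extract_incremental(docs, datetime(2026, 1, 2, tzinfo=timezone.utc))
```

In a setup like this, the new watermark would usually be written to the audit table only after the batch has landed in S3, so that a failed run is retried from the old watermark.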

Get In Touch