Data Engineer
VIVEK KUMAR
Cloud-native data pipelines on Azure & AWS.
Medallion Lakehouse · PySpark · Databricks · Kafka
Azure
Databricks
Kafka
PySpark
Work History
Experience
Dataminerz Innovative Solutions
Data Engineer Intern
Jan 2026 – May 2026 · Noida, Uttar Pradesh
⬡ Project 1 — Guidepoint · Medallion Architecture (Azure / Databricks)
- Architected a Medallion (Bronze → Silver → Gold) pipeline on Azure Databricks that ran 40% faster than the legacy batch processes
- Implemented Lakehouse on ADLS Gen2 ingesting 10M+ records/day from structured & semi-structured sources using PySpark
- Built ETL workflows with PySpark & SQL achieving 35% improvement in transformation efficiency across Silver and Gold layers
- Delivered analytics-ready Gold datasets for 3+ stakeholder groups via Azure Synapse — 50+ GB daily throughput
- Reduced pipeline change-request turnaround by 25% through cross-functional collaboration
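The Bronze → Silver → Gold layering in Project 1 can be sketched in plain Python. This is an illustrative stand-in for the PySpark jobs, not the project's code: the record fields (`id`, `amount`, `region`) and the function names are hypothetical, and a real pipeline would likely use DataFrame operations instead of lists of dicts.

```python
from collections import defaultdict

def to_bronze(raw_rows):
    """Bronze: keep raw records unchanged, only tagging the layer."""
    return [dict(row, _layer="bronze") for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: drop malformed rows, cast types, de-duplicate on id."""
    seen, silver = set(), []
    for row in bronze_rows:
        try:
            rec = {
                "id": int(row["id"]),
                "amount": float(row["amount"]),
                "region": str(row["region"]).strip().upper(),
            }
        except (KeyError, TypeError, ValueError):
            continue  # malformed record is skipped
        if rec["id"] in seen:
            continue  # duplicate id is skipped
        seen.add(rec["id"])
        silver.append(rec)
    return silver

def to_gold(silver_rows):
    """Gold: aggregate cleaned rows into an analytics-ready summary."""
    totals = defaultdict(float)
    for rec in silver_rows:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)

# Hypothetical sample input with one duplicate and one bad value.
raw = [
    {"id": "1", "amount": "10.5", "region": " north "},
    {"id": "1", "amount": "10.5", "region": "north"},
    {"id": "2", "amount": "oops", "region": "south"},
    {"id": "3", "amount": "4.5", "region": "North"},
]
silver = to_silver(to_bronze(raw))
gold = to_gold(silver)
print(gold)
```

Each layer only reads from the one before it, so a bad Silver rule can be fixed and replayed from Bronze without re-ingesting the sources.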
⬡ Project 2 — Structurely · AWS Medallion Architecture Pipeline
- Built Landing → Bronze → Silver → Gold pipeline ingesting Salesforce CRM & MongoDB into Aurora PostgreSQL — 8M+ records/run
- Developed 4 AWS Glue PySpark jobs: full ingestion, cleaning, transformation, loading — 90% less manual handling
- Designed metadata-driven control plane on Aurora PostgreSQL with pipeline_config & pipeline_audit tables and 5 PL/pgSQL functions
- Configured S3 multi-layer Parquet storage — 30% storage cost reduction vs CSV
- Set up Glue Crawlers to auto-update Glue Data Catalog after every S3 write — saving 5+ hrs/week of manual schema management
- Orchestrated zero-touch daily execution via AWS Glue Workflow + EventBridge, with all secrets stored in Secrets Manager
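A metadata-driven control plane like the one in Project 2 might look roughly like the sketch below. It uses SQLite from the Python standard library as a stand-in for Aurora PostgreSQL, and Python functions in place of the PL/pgSQL functions. Only the table names `pipeline_config` and `pipeline_audit` come from the project description; every column, function name, and sample row is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipeline_config (
    pipeline_name TEXT PRIMARY KEY,
    source_system TEXT NOT NULL,
    target_table  TEXT NOT NULL,
    is_active     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE pipeline_audit (
    run_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    pipeline_name TEXT NOT NULL REFERENCES pipeline_config(pipeline_name),
    status        TEXT NOT NULL,
    rows_loaded   INTEGER
);
""")

# Hypothetical config rows: one active pipeline, one disabled.
conn.executemany(
    "INSERT INTO pipeline_config VALUES (?, ?, ?, ?)",
    [("sf_accounts", "salesforce", "silver.accounts", 1),
     ("mongo_leads", "mongodb", "silver.leads", 0)],
)

def active_pipelines(conn):
    """Read which pipelines should run from config, not from code."""
    rows = conn.execute(
        "SELECT pipeline_name FROM pipeline_config "
        "WHERE is_active = 1 ORDER BY pipeline_name"
    )
    return [r[0] for r in rows]

def start_run(conn, name):
    """Open an audit row and return its run id."""
    cur = conn.execute(
        "INSERT INTO pipeline_audit (pipeline_name, status) "
        "VALUES (?, 'RUNNING')", (name,)
    )
    return cur.lastrowid

def finish_run(conn, run_id, rows_loaded):
    """Close the audit row with the final status and row count."""
    conn.execute(
        "UPDATE pipeline_audit SET status = 'SUCCESS', rows_loaded = ? "
        "WHERE run_id = ?", (rows_loaded, run_id)
    )

for name in active_pipelines(conn):
    run_id = start_run(conn, name)
    finish_run(conn, run_id, rows_loaded=100)  # placeholder row count

audit = conn.execute(
    "SELECT pipeline_name, status, rows_loaded FROM pipeline_audit"
).fetchall()
print(audit)
```

Driving the jobs from a config table means a source can be enabled or disabled with a row update instead of a code change.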
Technical Arsenal
Skills
Languages
Data Engineering & ETL
Azure Stack
AWS Stack
Big Data & Streaming
Databases & Analytics
DevOps & Tools
Built Things
Projects
01 / Azure · Lakehouse
Guidepoint Medallion Architecture
Enterprise-grade Bronze → Silver → Gold pipeline on Azure Databricks ingesting 10M+ records/day. Full Lakehouse on ADLS Gen2 with Delta Lake. Synapse analytics-ready outputs.
02 / AWS · Medallion
Structurely AWS Data Pipeline
Metadata-driven Medallion pipeline from Salesforce CRM & MongoDB → Aurora PostgreSQL ingesting 8M+ records/run. Glue PySpark jobs, S3 Parquet, EventBridge orchestration.
03 / Batch · Big Data
Retail Analytics Platform
End-to-end analytics pipeline with Docker, Airflow, Spark, HDFS, EC2, and Power BI dashboards. Production-grade batch orchestration with full monitoring.
04 / Real-time · B.Tech Final Year
SENTINEL — Disaster Monitoring
Real-time pipeline ingesting weather, earthquake & disaster data via OpenWeatherMap, USGS, GDACS APIs. WebSocket-powered live dashboard with Chart.js visualizations.
05 / Streaming
Real-Time E-Commerce Analytics
Large-scale streaming with Kafka KRaft, Debezium CDC, Apache Flink, Iceberg, ClickHouse OLAP, Airflow, dbt, and Grafana dashboards — fully containerized on Docker + EC2.
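As a rough illustration of the CDC step, the sketch below applies Debezium-style change events to an in-memory table. Debezium change events carry an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) with `before` and `after` row images; the `orders` table, the `id` key, and the sample events are hypothetical, and other op codes are deliberately not handled here.

```python
def apply_change(table, event, key="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]          # new row image
        table[row[key]] = row
    elif op == "d":
        table.pop(event["before"][key], None)  # old row image holds the key
    else:
        raise ValueError(f"unhandled op: {op!r}")
    return table

# Hypothetical event stream for an orders table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

orders = {}
for event in events:
    apply_change(orders, event)
print(orders)
```

Replaying the events in order leaves only the current state of each row, which is the shape a downstream OLAP table typically needs.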
06 / MongoDB · AWS
MongoDB Atlas → AWS Pipeline
Python incremental extraction from MongoDB Atlas to S3 with PostgreSQL audit tracking. Replicated with AWS-native: Glue, Lambda, EventBridge, Secrets Manager, CloudWatch.
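Incremental extraction with audit tracking is commonly built around a watermark. The sketch below shows that idea in plain Python, assuming each document has an `updated_at` timestamp. The field name, the in-memory `source` list, and the `audit_log` list are hypothetical stand-ins for the MongoDB collection and the PostgreSQL audit table.

```python
from datetime import datetime, timezone

def extract_incremental(documents, last_watermark):
    """Return documents changed after last_watermark and the new watermark."""
    batch = [d for d in documents if d["updated_at"] > last_watermark]
    new_watermark = max(
        (d["updated_at"] for d in batch), default=last_watermark
    )
    return batch, new_watermark

def ts(day):
    """Helper for building sample timestamps."""
    return datetime(2025, 1, day, tzinfo=timezone.utc)

# Hypothetical source documents.
source = [
    {"_id": "a", "updated_at": ts(1)},
    {"_id": "b", "updated_at": ts(2)},
    {"_id": "c", "updated_at": ts(3)},
]

audit_log = []
watermark = ts(1)  # last value recorded by the previous run

# First run picks up only documents newer than the stored watermark.
batch, watermark = extract_incremental(source, watermark)
audit_log.append({"rows": len(batch), "watermark": watermark})

# A second run with no new changes extracts nothing and keeps the watermark.
batch_again, watermark = extract_incremental(source, watermark)
audit_log.append({"rows": len(batch_again), "watermark": watermark})
print(audit_log)
```

Storing the watermark with each audit entry makes a failed run safe to retry, since the next run starts from the last recorded value.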
Get In Touch
Contact
Let's Build Together.
Email
vivekbamauli@gmail.com
Phone
+91 97195 90617
LinkedIn
linkedin.com/in/vivek-kumar-5494892ab
Location
Noida / Hathras, Uttar Pradesh, India
Education
B.Tech — Computer Science & Engineering
B.S.A. College of Engineering & Technology, Mathura
AKTU · 2022 – 2026
Certifications
SQL Certification · HCL GUVI · 2025 ✓
DP-203: Azure Data Engineer Associate · In Progress
AWS Certified Cloud Practitioner · In Progress
Open To