Parth Shrivastava
Senior Data Engineer
Pune, IN
About
Highly accomplished Senior Data Engineer with over 4 years of experience designing and optimizing robust, scalable data pipelines in cloud-native environments. Proven track record of building real-time and batch ETL frameworks using PySpark, SQL, and Delta Lake on Azure Databricks and AWS, cutting execution times by up to 60% and processing 1 TB of data daily. Adept at applying advanced data engineering techniques to improve performance, reduce costs, and deliver high-quality data solutions for complex analytics and business intelligence needs.
Work
Pune, Maharashtra, India
→
Summary
Designed and optimized end-to-end ETL pipelines across cloud environments, ensuring efficient data processing and significant cost reduction.
Highlights
Designed and optimized end-to-end ETL pipelines on Databricks using PySpark and Delta Lake, processing 1 TB of data daily across Azure Data Lake Storage (ADLS) and AWS S3.
Optimized complex SQL queries and Spark jobs through partitioning, broadcast joins, and skew mitigation, cutting execution times by up to 60%.
Reduced storage and compute costs by implementing partitioning, bucketing, and Z-Ordering in Databricks and by applying lifecycle policies in AWS S3 and Azure Blob Storage.
Collaborated with cross-functional teams to migrate critical datasets from on-premises systems to a hybrid Azure cloud environment, ensuring compliance with data governance and security standards.
Partnered with business stakeholders, analysts, and data scientists to translate reporting requirements into efficient data models and pipelines.
→
Summary
Contributed to the development of AI-powered solutions for biomedical signal processing, focusing on arrhythmia detection and classification.
Highlights
Preprocessed and segmented over 50K ECG signals using NumPy and SciPy for arrhythmia detection, preparing the data for model training and analysis.
Built deep learning models (1D CNNs and Temporal Convolutional Networks) that achieved a 94% F1-score on arrhythmia classification.
Deployed batch scoring pipelines to automate arrhythmia detection on new ECG data, improving diagnostic efficiency.
Awards
Rising Star Award
Awarded By
Mediaocean Pvt. Ltd.
Recognized for outstanding performance and significant contributions at Mediaocean Pvt. Ltd.
All India Rank 12, Robocon 2019
Awarded By
IIT Delhi
Placed 12th nationally at Robocon 2019, a national robotics competition, competing as Team Automatons with two robots at IIT Delhi.
Skills
Programming Languages
Python, SQL, Java.
Data Frameworks & Tools
Databricks, Spark (PySpark), Pandas, Delta Lake, Apache Airflow.
Cloud & Infrastructure
AWS (S3), Azure (Data Lake Storage Gen2, Event Hubs, Blob Storage).
Operating Systems
Windows, Linux.
Data & Analytics
ETL Pipelines, Data Lineage, Data Governance, Data Quality, Machine Learning, Automation, Data Testing.
DevOps & CI/CD
Git, Docker, Jenkins, CI/CD Pipelines.