Data Engineer with expertise in building and optimizing robust ETL pipelines, data warehouses, and analytics solutions using Azure, Python, and SQL. Proven ability to reduce processing times by up to 95%, improve reporting accuracy by 20%, and deliver data-driven insights for business and R&D initiatives. Adept at transforming complex data into actionable intelligence and ensuring data quality for critical decision-making across diverse industries.
Data Engineer
Hyderabad, Telangana, India
Summary
Currently leading the design and optimization of incremental ETL pipelines and data warehousing solutions to deliver reliable data ingestion and business intelligence reporting for key business operations.
Highlights
Designed and optimized incremental ETL pipelines using Azure Data Factory (ADF), Databricks, ADLS, Python, and SQL, enabling reliable daily ingestion of ~1 GB of data across multiple pipelines.
Built and maintained a Snowflake data warehouse, implementing star schemas (fact and dimension tables) to enable efficient business intelligence reporting.
Automated ETL workflows using ADF Dataflows, Copy Activities, and Notebooks to transform raw data from databases, Excel, CSV, and IBM sources into structured formats.
Provided production support by resolving pipeline failures, debugging daily jobs, and performing root cause analysis for data mismatches raised through ServiceNow.
Partnered with Business Analysts to deliver accurate and timely financial insights by applying data validation and quality checks, improving reporting accuracy by 20% and reducing manual data fixes.
Data Scientist Intern
Pune, Maharashtra, India
Summary
Developed scalable data processing pipelines and interactive dashboards, contributing to advanced data analysis and integration for R&D initiatives.
Highlights
Developed a scalable data processing pipeline, reducing execution time by 85% through optimized methodologies.
Built interactive Streamlit dashboards, enhancing data visualization, accessibility, and user engagement for key stakeholders.
Utilized geospatial data libraries, open-source tools, and APIs for advanced data manipulation and feature extraction, generating critical insights.
Collaborated with R&D engineers to integrate data insights into vehicle dynamics and tire mechanics research, supporting product innovation.
Bachelor of Engineering (B.E.)
Computer Science
Grade: 8.37
Issued By
Microsoft
Issued By
Zach Wilson
Python, SQL, Java, C/C++.
Microsoft Azure (Data Factory, Databricks, Synapse, ADLS, Fabric).
Snowflake, MySQL, SQL Server.
ADF Dataflows, Copy Activity, Apache Spark, PySpark.
Git, GitHub.
Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, Streamlit, Geospatial Libraries.
Data Modeling, Data Structures and Algorithms, Problem Solving, Dashboard Development (Power BI).
CodeChef 4-Star Coder, LeetCode, Kaggle.
Road Classification via Geospatial Telematics Data
Summary
Engineered a comprehensive pipeline to preprocess raw telematics data, integrating APIs and optimization strategies to accelerate ML model training, and built dashboards for visual analysis.
Summary
Constructed an end-to-end ETL pipeline for transforming Google Timeline data into a structured model, leveraging Azure Data Factory, Databricks, Synapse Analytics, Power BI, and Flask.