About
Data Engineer with a Bachelor of Technology from IIT (BHU) Varanasi, specializing in building and optimizing scalable data pipelines and distributed data lake architectures. Experienced with AWS services, Apache data tooling, and Python, with measurable efficiency gains including a 30% reduction in data migration time. Strong problem-solving record, shown by a top 0.5% rank in JEE Advanced and 2nd place in the Accordion Gen AI Hackathon for an AI-powered staffing recommendation system.
Work
Bengaluru, Karnataka, India
Summary
Engineered and maintained ETL pipelines on AWS covering data ingestion, transformation, and loading.
Highlights
Developed a Python script leveraging boto3 and concurrent.futures to optimize data migration between Amazon S3 buckets, achieving a 30% reduction in transfer time.
Created and optimized stored procedures in Amazon Redshift for complex data operations, significantly enhancing performance and scalability.
Maintained robust ETL pipelines using AWS Glue, Amazon Redshift, and Amazon S3, ensuring seamless data ingestion, transformation, and loading processes.
Bengaluru, Karnataka, India
Summary
Leading the development and optimization of data pipelines and data lake architecture for scalable analytics.
Highlights
Built and scheduled data pipelines using Apache Airflow to ingest data from Google Sheets, REST APIs, and MongoDB into Trino tables, ensuring reliable data availability.
Implemented Debezium and Kafka for real-time change data capture from MongoDB collections, centralizing data into the core platform.
Contributed to an in-house data architecture built on Apache Iceberg, Amazon S3, and Trino, optimizing query performance for analytics at scale.
Supported data transformation workflows using Apache Spark for efficient batch processing within a distributed data lake environment.
Awards
2nd Rank, Accordion Gen AI Hackathon 2024
Awarded By
Accordion
Awarded for developing an innovative AI-powered staffing recommendation system that leveraged NLP and embedding-based similarity search.
Codeforces Expert (Max rating 1727, Global Rank 675)
Awarded By
Codeforces
Achieved Codeforces Expert status with a max rating of 1727 and a Global Rank of 675 in Codeforces Round 927, having solved 350+ Data Structures & Algorithms problems.
Top 0.5% Rank, JEE Advanced 2020
Awarded By
JEE Advanced
Ranked in the top 0.5% among over 1 million candidates in the JEE Advanced 2020 examination, demonstrating strong aptitude in science and engineering.
Languages
English
Skills
Programming Languages
Python, SQL, C++.
Big Data & Data Engineering
Apache Airflow, Apache Spark, Kafka, Apache Iceberg, Debezium, Trino, ETL, Data Pipelines, Data Lake, Distributed Systems, Data Warehousing.
Cloud Platforms & Databases
AWS Glue, Amazon Redshift, Amazon S3, boto3, MySQL, FAISS.
Web Frameworks & DevOps
Django, Flask, Streamlit, Docker, RabbitMQ.
Artificial Intelligence & Machine Learning
NLP, Embedding-based Similarity Search, AI-powered Recommendation Systems.