Rajat Gupta
AWS, GCP & Databricks Certified Lead Data Engineer
New Delhi, IN
About
Highly accomplished Lead Data Engineer with 11 years of experience driving data initiatives for Fortune 500 banking, finance, and telecom clients. AWS, GCP, and Databricks certified, I specialize in designing and implementing robust big data solutions, cloud architectures, and scalable ETL pipelines. My expertise spans data governance, real-time processing, and performance optimization, consistently delivering high-impact results in complex, multi-cloud environments.
Work
Dubai, Dubai, UAE
Summary
Led data architecture and engineering for Zahid Group, focusing on enhancing data discovery, lineage, and governance through advanced schema strategies and Databricks-driven transformations.
Highlights
Developed comprehensive schema strategies and metadata layers, including a Data Catalog, to enhance data discovery, lineage, and governance for Zahid Group.
Utilized Databricks and Apache Spark to cleanse and transform complex datasets, implementing a medallion architecture for structured and semi-structured data.
Streamlined data operations and automation by building reusable utilities, reducing manual effort and improving efficiency.
Gurugram, Haryana, India
Summary
Architected and implemented robust data ingestion and transformation pipelines on GCP for American Express, ensuring efficient data flow and business logic processing.
Highlights
Created scalable data ingestion pipelines from diverse RDBMS databases to a centralized storage layer, orchestrated by Cloud Composer DAGs.
Implemented complex business-logic transformations in Databricks Spark, orchestrated through Cloud Composer jobs to support critical financial operations.
Remote, Maharashtra, India
Summary
Led the development of a new data lake for financial data processing, designing serverless architectures and optimizing CI/CD pipelines to support debt and recovery workloads.
Highlights
Built an event-based framework utilizing Cloud Functions for near real-time processing of financial data, leveraging Databricks workflows for efficiency.
Designed and implemented a serverless architecture with reusable components, adopted by multiple product teams to process financial data related to debt and recovery.
Contributed to Python Spark notebooks, SQL queries, and workflows for daily ETL data loads, ensuring high data quality and availability.
Automated CI/CD pipelines for code deployment and migration across Dev, QA, and Production environments using Terraform and Jenkins Groovy scripts, reducing deployment time and errors.
Collaborated with business stakeholders to gather requirements and translate them into technical user stories, facilitating seamless data onboarding from multiple source teams.
Remote, Scotland, United Kingdom
Summary
Developed a new data lake on AWS, creating efficient data ingestion and ETL pipelines to support critical data processing for the bank's operations.
Highlights
Created data ingestion pipelines from various RDBMS databases to AWS S3, leveraging AWS DMS for efficient data transfer and integration.
Developed robust ETL data pipelines using PySpark, Glue Jobs, Lambda, DynamoDB, Athena, and S3 with Parquet files, optimizing data processing for the new data lake.
Built new infrastructure in AWS using Terraform and a CI/CD Pipeline with Bitbucket and TeamCity, enhancing deployment efficiency and reliability.
Gurugram, Haryana, India
Summary
Led the design and implementation of ETL pipelines on AWS cloud for ZS Associates (Amgen), focusing on data transformation, warehousing, and performance optimization.
Highlights
Led requirements gathering, design, and implementation for ETL pipelines on AWS cloud, ensuring alignment with business needs for data integration.
Developed and designed reusable libraries, frameworks, and utilities for ETL data pipelines, data transformations, warehousing, and validations.
Optimized performance for PySpark data pipelines, significantly improving data load times and overall system efficiency for daily reports on AWS Databricks.
Greater Noida, Uttar Pradesh, India
Summary
Managed real-time data ingestion via Kafka and spearheaded data transformation and migration projects for major clients like Citigroup and Vodafone.
Highlights
Handled real-time data ingestion via Kafka with Spark Streaming for large-scale data processing, ensuring timely and efficient data availability.
Translated complex functional and technical requirements into detailed high and low-level designs, facilitating effective project execution.
Integrated HSM API with Kafka to provide hardware and software-level 256-bit encryption for secure debit/credit card transactions, enhancing data security.
Played a key role in data transformation using Spark scripts for structured and semi-structured data, improving data quality and usability.
Performed data analysis by implementing various machine learning algorithms via Spark ML, tuning hyperparameters with GridSearchCV for enhanced model performance.
Gurugram, Haryana, India
Summary
Developed PySpark scripts for tariff plan regression and managed data ingestion for new data lakes, supporting critical KPIs and improving end-user satisfaction.
Highlights
Developed PySpark scripts to calculate traditional and ad-hoc KPIs from structured and semi-structured data, providing critical insights for tariff plan regression.
Managed data ingestion with Sqoop and performed data cleaning and manipulation with Spark scripts, ensuring data quality and readiness for analysis.
Performed ETL from the ENIQ Oracle database to HDFS using Sqoop and processed data with Hive scripts for the new LTE network data lake, supporting critical KPIs.
Gurugram, Haryana, India
Summary
Developed a new dashboard to accommodate all KPIs for 3G and 4G networks, migrating from a legacy PHP system to Java.
Highlights
Migrated a legacy dashboard written in PHP to a new Java-based dashboard, enhancing performance and scalability for 3G and 4G network KPIs.
New Delhi, Delhi, India
Summary
Developed data entity beans and established relationships for the Auto Giant Organization portal, enhancing data management and application functionality.
Highlights
Developed Data Entity Beans (POJOs) and established relationships between them, improving data integrity and application architecture for the portal.
Education
Awards
Intel® Edge AI Scholarship
Awarded By
Intel & Udacity
Awarded for excellence in Edge AI.
80% Scholarship for PGD in Data Science
Awarded By
Swades Foundation (NGO)
Awarded a significant scholarship for Post Graduate Diploma in Data Science.
Publications
Skills
Big Data Tools
HDFS, MapReduce, Sqoop, Hive, Pig, Impala, Oozie, ZooKeeper, Spark, Kafka, Airflow, dbt.
Cloud Platforms
AWS (EC2, S3, RDS, Redshift, EMR, SNS, DynamoDB, SageMaker, Glue, DMS, Lambda, SQS, Kinesis), Azure (ADLS, ADF), GCP (BigQuery, Bigtable, Cloud SQL, Dataproc, Cloud Composer, Kubernetes, Dataform).
Databases
Oracle, MySQL, Teradata, SQL Server, PostgreSQL.
DevOps & CI/CD
Docker, Kubernetes, Terraform, Jenkins.
Data Platforms & Distributions
Cloudera, Databricks, Confluent Kafka, AWS EMR.
IDEs & Developer Tools
Eclipse, IBM RAD, PyCharm, IntelliJ IDEA, Jupyter Notebook, IBM Rational Team Concert, Autosys.
Programming Languages
Core Java, Python, MATLAB, C++, PHP.
Build Tools, Web Services & Collaboration
Maven, Gradle, sbt, REST API, Bitbucket, Git, Jira.
Python Packages
scikit-learn, NumPy, Pandas, SciPy, Plotly, Pyplot, Beautiful Soup, Matplotlib, Math, Random, Seaborn, statsmodels.
Machine Learning
Linear/Logistic Regression, SVM, Random Forests, PCA, K-Means, Hierarchical Clustering, Time Series, ANN.