MEGHNA

Data Engineer - AI/ML

Jersey City, New Jersey, US

About

Results-driven Data Engineer and Machine Learning specialist with over 5 years of experience designing, deploying, and optimizing large-scale ML systems for personalization, fraud detection, and credit risk. Skilled in ranking algorithms, Python, TensorFlow, PyTorch, and distributed computing frameworks, and has built production systems that serve 100M+ predictions daily and deliver measurable gains in critical business KPIs. Proven ability to build scalable data pipelines, enforce robust data governance, and collaborate cross-functionally to drive innovation in complex data environments.

Work

Financial Client

Senior Data Engineer - AI/ML

New York, New York, US

Jan 2024 → May 2024

Summary

Led the architecture and deployment of real-time ML systems for critical financial applications, focusing on personalization and fraud detection while driving significant improvements in customer engagement and operational efficiency.

Highlights

Architected and deployed real-time ML systems for personalization and fraud detection, serving 100M+ predictions daily and improving customer engagement by 21% and fraud detection accuracy by 15%.

Designed scalable data pipelines with Apache Spark, Kafka, and Airflow on Azure Synapse and Databricks to process high-velocity financial data, reducing ETL latency by 78% (from 45 minutes to under 10 minutes).

Implemented deep learning models for credit risk scoring in TensorFlow, using Spark and Ray for distributed computing, improving decision accuracy by 18%.

Conducted A/B testing with experiment logging to optimize personalization algorithms, increasing conversion rates by 12% through iterative ranking model improvements.

Built and optimized SQL queries on Snowflake and Redshift for feature stores, ensuring data integrity and reducing query times by 38%.

Developed generative AI tools using Hugging Face Transformers for compliance automation, ensuring explainability and adherence to Dodd-Frank and Basel III requirements.

Led MLOps initiatives with MLflow, Docker, and Kubernetes, automating model retraining and drift detection, achieving 99.9% uptime for production ML systems.

Created Tableau dashboards to visualize KPIs (credit exposure, fraud alerts), improving executive decision-making efficiency by 30% while ensuring 99.8% data quality and GDPR/HIPAA compliance.

IBM

Associate Systems Engineer - Data and Analytics

Bangalore, India

Jan 2021 → Dec 2022

Summary

Developed and optimized ML-powered recommendation systems and scalable ETL pipelines for e-commerce, contributing to significant improvements in personalization, query performance, and data governance.

Highlights

Developed ML-powered recommendation systems using collaborative filtering and learning-to-rank algorithms with scikit-learn and PyTorch, serving 50M+ daily predictions and improving e-commerce personalization by 22%.

Built scalable ETL pipelines with Airflow and Spark, ingesting 1.5 TB+ of transaction data per month into Snowflake and BigQuery and improving query performance by 40%.

Designed A/B testing frameworks to evaluate ranking model performance, with experiments logged in MLflow, contributing to a 15% increase in conversion rates.

Optimized SQL queries and data models (Star/Snowflake schemas) in BigQuery for real-time analytics, reducing latency for customer behavior insights by 30%.

Deployed ML models in production using Docker and Kubernetes, ensuring scalability and reliability for high-traffic e-commerce platforms.

Collaborated with cross-functional teams to align ML solutions with business goals, delivering predictive analytics for demand forecasting and inventory optimization.

Built Tableau and Power BI dashboards to visualize KPIs (customer lifetime value, sales trends), reducing decision-making time by 25%.

Implemented data governance and lineage frameworks, improving traceability and compliance with retail data standards.

Accenture

Data Integration Engineer

Hyderabad, India

Jun 2019 → Dec 2020

Summary

Engineered robust ETL pipelines and developed ranking algorithms for healthcare analytics, focusing on patient outcome prediction, data accessibility, and compliance.

Highlights

Engineered ETL pipelines with PySpark and AWS Glue to process 500K+ healthcare records, enabling ML model training for patient outcome prediction and improving query performance by 35%.

Developed ranking algorithms for healthcare analytics using scikit-learn, supporting personalized care recommendations and increasing care quality scores by 20%.

Optimized SQL queries on Redshift for dimensional data models, reducing processing times by 30% and enabling real-time analytics for claims processing.

Implemented A/B testing to evaluate ML model performance, logging experiments to ensure reproducibility and compliance with HIPAA standards.

Built Power BI dashboards to visualize healthcare KPIs (readmission rates, claims turnaround), improving data accessibility by 20%.

Leveraged AWS S3 and Lambda for scalable data ingestion, ensuring GDPR/HIPAA compliance and reducing manual interventions by 40%.

Collaborated with clinical teams to integrate HL7/FHIR data, enabling seamless data exchange for ML-driven insights.

Education

Saint Louis University

St. Louis, Missouri, US

Master's

Computer and Information Sciences

Malla Reddy Engineering College

Hyderabad, India

Bachelor's

Electronics and Communication Engineering

Certificates

Gen AI Solutions

Issued By

ambariCloud

Certified Azure Fundamentals AZ-900

Issued By

Microsoft

IBM Certified Advocate Cloud V1

Issued By

IBM

IBM Certified Data Science Foundations

Issued By

IBM

Azure Spark Databricks Essential Training

Issued By

Microsoft

Learning BigQuery


Skills

Programming & ML Frameworks

Python, pandas, NumPy, scikit-learn, TensorFlow, PyTorch, Hugging Face Transformers, SQL, Java, Scala, JavaScript.

Machine Learning & AI

Ranking Algorithms, Collaborative Filtering, Learning-to-Rank, Deep Learning, LLMs, Transformers, Generative AI, A/B Testing, Causal Inference, Explainable AI.

Big Data & Distributed Computing

Apache Spark, PySpark, Ray, Hadoop (HDFS, Hive), Databricks.

Data Engineering & ETL

Apache Airflow, AWS Glue, Azure Data Factory, Apache Kafka, Spark Structured Streaming.

Data Warehouses & Databases

Snowflake, AWS Redshift, Google BigQuery, Azure Synapse, PostgreSQL, MongoDB.

Cloud Platforms

AWS (S3, Lambda, EMR, Redshift, Athena), Azure (Data Lake, Synapse), Google Cloud (BigQuery, Dataflow).

MLOps & CI/CD

MLflow, Airflow, Docker, Kubernetes, Git, Jenkins, Azure DevOps.

Data Modeling & Governance

Star/Snowflake Schemas, Data Lakes, Data Quality (Talend, Informatica), Data Lineage, GDPR/HIPAA Compliance, PII Masking.

Visualization & Reporting

Tableau, Power BI, Plotly.

Compliance & Regulations

HIPAA, GDPR, AML, KYC, Dodd-Frank, Basel III.

Projects

Diabetes Prediction Tool with Personalized Recommendations

Summary

Developed a Python-based ETL pipeline to transform 10,000+ synthetic patient records, enabling accurate diabetes prediction and visualization for healthcare analytics. The project focused on generating personalized recommendations and making results interpretable for stakeholders.