Sahil Jain

Data Scientist | Machine Learning & AI | Python | Generative AI
Roorkee, India

About

Analytical, results-driven Data Scientist specializing in Machine Learning and Generative AI, currently pursuing a B.Tech at IIT Roorkee. Uses Python, SQL, and ML techniques to turn complex data into actionable insights, as shown by an internship that improved reporting efficiency by approximately 50% and by several AI-powered projects. Eager to build data-driven solutions and contribute to advances in AI.

Work

Top Trove Foundation
Data Analyst Intern

Remote, India

Summary

Spearheaded data analysis and visualization efforts for a non-profit, leveraging Python and SQL to derive actionable insights and enhance reporting efficiency.

Highlights

Processed and analyzed data for over 1,200 beneficiaries using Python and SQL, delivering concise summary reports that facilitated strategic, data-driven decision-making for key organizational stakeholders.

Developed interactive Power BI dashboards, linked to MySQL databases, to track donations, volunteer hours, and student enrollment, boosting reporting efficiency and data accessibility by approximately 50%.

Conducted comprehensive Exploratory Data Analysis (EDA) with Python (Pandas, Matplotlib, Seaborn) to uncover key behavioral and demographic factors impacting user engagement patterns.

Experimented with Scikit-learn ML pipelines, including classification and clustering, to model and predict user engagement, providing insights for targeted outreach strategies.

Education

Indian Institute of Technology, Roorkee
Roorkee, Uttarakhand, India

B.Tech

Metallurgical and Materials Engineering

Modern Delhi Public School
Faridabad, Haryana, India

12th Grade

High School

Grade: 93%

Modern Delhi Public School
Faridabad, Haryana, India

10th Grade

High School

Grade: 96.6%

Awards

Top 1% JEE Advanced Rank Holder

Awarded By

JEE Advanced

Secured an All India Rank of 7229 in JEE Advanced, demonstrating exceptional analytical thinking, problem-solving, and high performance under pressure in one of the world's most competitive exams.

National Mental Math Champion

Awarded By

Abacus 10th National Competition

Won the champion title at the Abacus 10th National Competition in Japanese Soroban and mental maths, showcasing advanced mental calculation abilities.

Global Rank 68 in Codechef Starters 188 (Rated)

Awarded By

Codechef

Achieved a global rank of 68 (top 0.5%) among 15,000+ participants in Codechef Starters 188, an algorithmic programming contest, demonstrating strong competitive programming skills.

Languages

English

Certificates

Generative AI using Google Gemini

Issued By

Google

Career Essentials in Data Analysis

Issued By

Microsoft

Learning Data Analytics

Issued By

LinkedIn

Skills

Programming & Scripting

Python, SQL (MySQL), C++, NumPy, Pandas, scikit-learn, SHAP, Data Structures & Algorithms.

Machine Learning & AI

Supervised Learning, Unsupervised Learning, Regression, Classification, Gradient Boosting, Clustering, Dimensionality Reduction, Feature Engineering, Handling Imbalanced Datasets, Hyperparameter Tuning, Pipeline Building, TF-IDF Embeddings, Large Language Models (LLMs), Prompt Engineering, Model Evaluation & Validation, Neural Networks.

Data Analysis & Statistical Modeling

EDA, Hypothesis Testing, Root Cause Analysis, Anomaly Detection, Cross-Validation.

Data Visualization & Dashboarding

Matplotlib, Seaborn, Plotly, Power BI, MS Excel, DAX, Power Query.

Database & ETL

SQL, MySQL (Window Functions), MongoDB, Data Modeling, Handling Structured and Unstructured Data.

Tools & Platforms

Jupyter Notebook, Google Colab, Streamlit, GitHub, VS Code, PyCharm, Microsoft Office Suite.

Projects

Flight Delay Prediction and Root Cause Analysis using Random Forest Pipeline

Summary

Built a Random Forest pipeline to predict flight delays and identify their root causes, providing operational insights for airlines.

AI-Powered Story Teller using Gemini and Streamlit

Summary

Built an interactive, real-time story generation application leveraging Google Gemini API and Streamlit, offering personalized narratives.

Posts Recommendation System using Natural Language Processing (NLP)

Summary

Designed and implemented an NLP-based recommendation system to suggest personalized posts to users by leveraging TF-IDF embeddings and engagement data.

Morning Buddy – AI-Powered Daily Companion using Gemini and Streamlit

Summary

Engineered a multi-API Streamlit application integrating Gemini LLM with real-time weather, news, and personalized itinerary generation.
