Sasanka Jana

Data Scientist
Kolkata, IN.

About

Data Scientist with an M.Tech in Computer Science from ISI Kolkata, specializing in machine learning, deep learning, and large language models. Experienced in building scalable, end-to-end data workflows and AI solutions on cloud platforms such as Databricks and Azure. Focused on extracting insights from data and using modern cloud technologies to support data-driven decision-making and automation.

Education

Indian Statistical Institute, Kolkata
Kolkata, West Bengal, India

Master of Technology

Computer Science

Grade: 76.9%

Midnapore College, Vidyasagar University
Midnapore, West Bengal, India

Master of Science

Mathematics

Grade: 7.71 CGPA

Work

Exsegen Genomics Research Private Limited

Research Analyst

Hyderabad, Telangana, India

Summary

Led genomic data science initiatives, developing advanced toolkits and applying machine learning to classify complex biological data for precision diagnostics.

Highlights

Developed a comprehensive Python toolkit, "Galen-Kit," integrating modules for ETL, EDA, ML, and reporting, streamlining genomic workflows and data analysis.

Engineered and deployed "Galen-CLI," a command-line interface that streamlines bioinformatics pipeline execution for research teams.

Analyzed large-scale transcriptomic and clinical datasets (TCGA, CGGA, Exsegen) using differential expression analysis, PCA, and pathway enrichment to classify oligodendroglioma subgroups.

Built and validated machine learning models that predict the subgroups of previously unclassified samples, improving precision in cancer classification and diagnostics.

Processed 763 clinical samples and over 18,000 methylation probes to identify four distinct medulloblastoma molecular subtypes, supporting targeted therapeutic strategies.

Built ML models on methylation profiles for molecular variant identification, using rigorous validation to confirm their performance and reliability.

Hindalco Industries Limited

Data Science Intern

Kolkata, West Bengal, India

Summary

Optimized industrial operations and improved efficiency by developing data pipelines and conducting root cause analysis using scalable cloud infrastructure.

Highlights

Built data pipelines that streamlined operational workflows and produced actionable analytics for business improvements.

Conducted in-depth investigations into boiler tube leakages at the Muri plant, leveraging coal quality and water chemistry reports to identify critical root causes.

Utilized Oracle Cloud Infrastructure (OCI) for scalable data analysis, processing large datasets to derive insights and support decision-making.

Collaborated with cross-functional teams to integrate data solutions and presented key findings to senior stakeholders to inform strategic decisions.

Awards

Qualified GATE 2024 in Data Science and Artificial Intelligence (DA)

Awarded By

Graduate Aptitude Test in Engineering (GATE)

Achieved qualification in the Graduate Aptitude Test in Engineering (GATE) 2024 for Data Science and Artificial Intelligence (DA), demonstrating strong foundational knowledge and aptitude in the field.

Qualified GATE 2022 in Mathematics (MA)

Awarded By

Graduate Aptitude Test in Engineering (GATE)

Achieved qualification in the Graduate Aptitude Test in Engineering (GATE) 2022 for Mathematics (MA), showcasing robust analytical and problem-solving skills.

Certificates

Complete Generative AI Course with LangChain and Hugging Face

Issued By

Verified

Databricks Certified Data Engineer Associate

Issued By

Databricks

Supervised Machine Learning: Regression and Classification by Stanford University

Issued By

Stanford University

Skills

Programming

Python, C, SQL (MySQL), Bash, CLI Automation, Galen-CLI.

Data Science

Machine Learning, Deep Learning, NLP, LLMs, RAG, Predictive Modeling, Statistical Modeling.

Data Analytics

EDA, ETL, Power BI, Plotly, Seaborn, pandas, NumPy, Matplotlib.

Cloud Platforms

Azure Databricks, AWS S3, Oracle Cloud Infrastructure (OCI).

Libraries & Tools

PyTorch, TensorFlow, LangChain, Hugging Face, OpenAI, Streamlit, Gradio.

Deployment

MLOps, Git, GitHub, Quarto.

Projects

Dissertation: Enhancing Confidence Calibration in Long-Tailed Recognition

Summary

Addressed class imbalance and model miscalibration in deep neural networks by developing tailored augmentation strategies and a novel approach, AWABS, to improve calibration and performance.

AI Agent & LLM Project: Conversational RAG With PDF Uploads and Chat History

Summary

Developed a web-based RAG application, built with LangChain and Hugging Face models, that answers natural language queries on uploaded PDFs and maintains context across interactions.

AI Agent & LLM Project: Trend Analysis Using AI Agent

Summary

Developed a scalable AI system using transformer-based embeddings for dynamic semantic clustering and topic discovery to detect, group, and track trending customer review topics.

Model Deployment Experience: AI and LLM Application Deployment

Summary

Deployed AI and LLM-based applications using Streamlit and Hugging Face Spaces for real-time interaction and semantic search capabilities.

Power BI Project: Pizza Sales KPI Dashboard

Summary

Built an interactive Power BI dashboard to monitor pizza sales performance by tracking key KPIs.

Power BI Project: Sales & Profit Performance Dashboard

Summary

Designed a dynamic Power BI dashboard to analyze sales trends, profit performance, and state-wise distribution.

AI Agent & LLM Project: Q&A With ChatGROQ

Summary

Developed and deployed a Q&A application that uses the Gemma2-9b-It model served through Groq with LangChain for response generation, presented through a clean Streamlit user interface.