Lekshmi Thulasidharan

Data Scientist | AI/ML/NLP Specialist
Madison, WI, US

About

PhD candidate with over 5 years of experience in data science, specializing in AI, machine learning, and natural language processing. Builds and deploys ML solutions using Python, TensorFlow, and Hugging Face, delivering production-ready tools and leading research initiatives.

Work

Auxillium Health

Data Scientist Intern

Remote, US

Summary

Currently developing and optimizing AI-powered conversational agents for healthcare applications, focusing on robust data retrieval and model refinement.

Highlights

Built a Retrieval-Augmented Generation (RAG)-based chatbot for Wound Tele.AI Pro, leveraging LlamaIndex, ChromaDB, and Hugging Face to provide evidence-backed answers to wound care queries.

Developed a semantic chunking and preprocessing pipeline for dense passage retrieval across PubMed articles.

Evaluated 5+ embedding models and chunking strategies on 50+ wound-related queries, improving precision@5 by 17%.

Prototyped a local QA tool using Haystack, ChromaDB, Hugging Face, and Streamlit for efficient offline testing and user validation.

Collaborated with wound care experts to validate chatbot output and refine domain alignment for patient use cases.

University of Wisconsin – Madison

Graduate Researcher

Madison, WI, US

Summary

Led an 8-member astrophysics research team, applying data science and statistical modeling to analyze large-scale stellar datasets.

Highlights

Led the team in authoring a peer-reviewed study on the vertical kinematics of the Milky Way using Gaia survey data.

Queried, cleaned, and integrated 500,000+ stellar records using SQL and Python; applied bootstrapping, correlation analysis, and hypothesis testing to extract vertical motion patterns.

Validated findings at greater than 3-sigma significance, demonstrating end-to-end execution from data engineering to insight generation.

Tata Institute of Fundamental Research

Summer Data Intern

Mumbai, Maharashtra, India

Summary

Engineered features and applied machine learning models to classify particles from a large-scale simulated detector dataset, achieving 85% accuracy.

Highlights

Engineered features from a 500k-instance simulated Belle detector dataset, applying a Boosted Decision Tree model to classify low-momentum muons from background particles.

Achieved 85% accuracy in classifying low-momentum muons, significantly improving particle identification.

Education

University of Wisconsin – Madison
Madison, WI, US

PhD

Physics

GPA: 3.82/4.00

Courses

Foundations of Data Science

Data Management in Data Science

Theory and Methods of Mathematical Statistics

Skills

Languages

Python, SQL.

Frameworks & Libraries

Pandas, NumPy, Scikit-learn, TensorFlow, Hugging Face Transformers, Streamlit, Power BI (basic), Git, LlamaIndex, Haystack, ChromaDB.

ML & GenAI

Deep Learning, NLP, Retrieval-Augmented Generation (RAG), LLM Fine-Tuning, Embeddings, Vector Search.

Tools & Platforms

Docker, Google Cloud, Snowflake, dbt, Airbyte, GitHub.

Data Handling & Modeling

Data Wrangling, Data Cleaning, Feature Engineering, ETL Pipelines, Statistical Modeling.

Projects

PathoPredictX-GC: Auditing and Interpretability Tool for Gastric Cancer Histopathology

Summary

Developed and deployed an auditing and interpretability tool that uses deep learning to classify tumor microenvironment tissue images in gastric cancer histopathology, with the aim of enhancing diagnostic trust.

Cloud-Based ELT Pipeline and Trading Analytics

Summary

Designed and implemented a cloud-based ELT pipeline that integrates multiple data sources into Snowflake for trading performance analytics.