Evaluating SAE interpretability without explanations
Published by
ArXiv
Interpretability Researcher at EleutherAI. PhD in Theoretical and Applied Mechanics.
Interpretability Researcher
Summary
Leads research in AI interpretability, developing methods to understand the behavior of complex models.
Highlights
Developed interpretability methods that use sparse autoencoders (SAEs) to improve the transparency of AI models.
Developed automated interpretability tools in support of AI safety research.
Authored and co-authored publications (3 in 2025, 2 in 2024) on SAE interpretability, LLM feature interpretation, and the transferability of interpretability methods.
PostDoc
Rome, Lazio, Italy
Summary
Conducted research on memristive behavior, with a focus on nanophysics and materials science.
Highlights
Led experimental and computational research on memristive behavior, investigating hydrophobic gating, nanofluidics, and water intrusion in hydrophobic materials.
Authored and co-authored publications, including a paper in Nature Communications on neuromorphic applications of hydrophobically gated memristive nanopores.
Contributed to the understanding of material interactions at the nanoscale.
PhD Researcher (Theoretical and Applied Mechanics)
Rome, Lazio, Italy
Summary
Completed a PhD in Theoretical and Applied Mechanics, focusing on computational approaches to materials science.
Highlights
Awarded PhD summa cum laude for a thesis on computational approaches to the study of intrusion in hydrophobic materials.
Developed and implemented computational models to simulate and analyze water intrusion in hydrophobic materials.
Published research in Communications Physics on the impact of secondary channels on wetting properties, contributing to the understanding of nanoscale fluid dynamics.
Collaborated with Professor Alberto Giacomello, applying theoretical mechanics to problems in materials science.
PhD
Theoretical and Applied Mechanics
Grade: Summa Cum Laude
Master's Degree
Physics
Bachelor's Degree
Physics
Published by
ArXiv
Summary
Evaluating SAE interpretability without explanations
Published by
ArXiv
Summary
Transcoders Beat Sparse Autoencoders for Interpretability
Published by
ArXiv
Summary
Sparse Autoencoders Trained on the Same Data Learn Different Features
Published by
ICML
Summary
Automatically Interpreting Millions of Features in Large Language Models
Published by
AAAI
Summary
Does Transformer Interpretability Transfer to RNNs?
Published by
Nature Communications
Summary
Hydrophobically gated memristive nanopores for neuromorphic applications
Published by
Communications Physics
Summary
The impact of secondary channels on the wetting properties of interconnected hydrophobic nanopores
Mechanistic Interpretability, Sparse Autoencoders, Automated Interpretability Tools, AI Safety, Large Language Models (LLMs), Transformer Interpretability, RNN Interpretability.
Nanofluidics, Memristive Behavior, Hydrophobic Gating, Theoretical Modeling, Scientific Computing, Data Analysis, Simulation, Materials Science.
Scientific Research, Experimental Design, Problem-Solving, Peer Review, Academic Writing, Publication, Collaborative Research.
Python, Computational Modeling, LaTeX.