Spanish-F5 Open Source TTS Model
Summary
Independently developed and released the leading open-source Text-to-Speech system for Spanish by finetuning the F5-TTS architecture.
github.com/jpgallegoar | huggingface.co/jpgallegoar
AI Engineer with intensive, full-time experience in generative AI across speech, language, and video. Created Spanish-F5, the leading open-source Spanish TTS model (over 40k downloads), which led directly to a CTO role. Seeking a challenging AI/ML role that draws on deep hands-on skills in PyTorch, Hugging Face, MLOps, and model development.
Technology Consultant
Highlights
Developed and implemented IT solutions for a major multinational retailer, contributing to the optimization of business operations and system performance.
Unity Developer & VR Researcher
Highlights
Led Unity development for SAFEDUCA, a Virtual Reality simulation teaching road safety to children, contributing to research on VR effectiveness in education.
Implemented a dynamic scenario editor for VIRESTREEP, an EU research project enhancing Vulnerable Road User safety.
Engaged in diverse projects including 3D modeling (Autodesk 3ds Max) and point cloud integration in VR (CloudCompare, Unity).
Data Analyst (AI Computer Vision)
Highlights
Contributed to the creation and training of AI models for computer vision, supporting the development of safer autonomous vehicle technologies.
Chief Technology Officer (CTO) & Lead AI Developer
Summary
Spearheaded end-to-end AI development and infrastructure, leading technical strategy and execution for generative speech and language models.
Highlights
Designed, trained, and deployed advanced text-to-speech and voice models, including the VoicePowered V1 (DiT-based) and VoicePowered V2 (Transformer/Autoregressive) architectures for high-quality voice generation.
Implemented Reinforcement Learning (RL) techniques, guided by subjective audio evaluation ("good ear") and identity matching, to significantly enhance model naturalness, consistency, and long-form audio generation stability.
Developed and integrated core features (podcast mode, voice chat) into the open-source F5-TTS program, contributing directly to the codebase.
Created custom voice and accent LoRAs for fine-grained model control.
Engineered and optimized inference pipelines for diverse AI models; set up and managed local and cloud inference infrastructure, specializing in serverless GPU platforms (Runpod, Replicate) for cost and performance efficiency.
Developed robust inference scripts, functionalities, and APIs.
Contributed to generative AI beyond speech: finetuned Llama models and optimized LLM inference using vLLM.
Worked with and finetuned advanced generative video models (LTXV, CogVideoX, HunyuanVideo, Wan2.1), and developed ComfyUI nodes and workflows.
Integrated and deployed inference for existing music models.
Experienced with generative image models.
Managed full data lifecycle: dataset creation, hosting, curation, automated filtering, preprocessing (audio, video, text, including EQ, normalization, base64 management), and post-processing.
Implemented MLOps practices including Docker image creation and release management, serverless machine setup and maintenance, and training monitoring (loss, learning rate, grad_norm) with Weights & Biases.
Functioned as the primary technical expert, communicating project capabilities and technical details to clients and stakeholders.
Developed novel implementations informed by parallel research and by problems encountered during model development.
Bachelor's Degree
Informatics Engineering (Computer Science)
Awarded By
LTXV Paris AI Video Hackathon
Collaborated in a team of 3 to develop an award-winning AI video solution, leveraging skills in generative video model finetuning and rapid prototyping.
Awarded By
BeTech Innovative Design Competition, UPM
Native
Native
Docker, Git, Weights & Biases, Serverless (Runpod, Replicate), ComfyUI (Node Development), Conda/Miniconda, Linux/VM Environments.
Audio (EQ, Normalization, Base64), Video, Text, Curation, Filtering.
Unity, Autodesk 3ds Max, Relux, CloudCompare.
Frameworks & Libraries: PyTorch, TensorFlow, Hugging Face (Transformers, Diffusers), vLLM, Librosa, Optuna.
Techniques: Deep Learning, Large Language Models (LLMs), LLM Finetuning (Llama), Text-to-Speech (TTS), Automatic Speech Recognition (ASR), Reinforcement Learning (RL), Generative Video, Generative Image Models, Computer Vision, LoRAs, Hyperparameter Tuning, Model Merging & Pruning.
Architectures: Transformers, CNNs, DiT-based Models, Autoregressive Models.
Python (Advanced), C#, Java, JavaScript (React), C, SQL (MySQL, PostgreSQL), HTML.
Summary
Independently designed, trained, and evaluated a CNN model for early Parkinson's detection from audio, achieving high accuracy and receiving top academic honors.