Juan Pablo Gallego Van Megroot

AI Engineer
Madrid, ES
github.com/jpgallegoar | huggingface.co/jpgallegoar

About

AI Engineer with intensive, full-time experience in Generative AI across speech, language, and video. Quick to acquire new expertise and to take projects from conception to impact: created Spanish-F5, the leading open-source Spanish TTS model (over 40k downloads), which led directly to a CTO role. Seeking a challenging AI/ML role that draws on deep hands-on skills in PyTorch, Hugging Face, MLOps, and model development.

Work

Deloitte

Technology Consultant

Highlights

Developed and implemented IT solutions for a major multinational retailer, contributing to the optimization of business operations and system performance.

CeDInt - Universidad Politécnica de Madrid

Unity Developer & VR Researcher

Highlights

Led Unity development for SAFEDUCA, a Virtual Reality simulation teaching road safety to children, contributing to research on VR effectiveness in education.

Implemented a dynamic scenario editor for VIRESTREEP, an EU research project enhancing Vulnerable Road User safety.

Engaged in diverse projects including 3D modeling (Autodesk 3ds Max) and point cloud integration in VR (CloudCompare, Unity).

Perceptive Automata

Data Analyst (AI Computer Vision)

Highlights

Contributed to the creation and training of AI models for computer vision, supporting the development of safer autonomous vehicle technologies.

VoicePowered.ai

Chief Technology Officer (CTO) & Lead AI Developer

Summary

Spearheaded end-to-end AI development and infrastructure, leading technical strategy and execution for generative speech and language models.

Highlights

Designed, trained, and deployed advanced text-to-speech and voice models, including: VoicePowered V1 (DiT-based) and VoicePowered V2 (Transformer/Autoregressive) architectures for high-quality voice generation.

Implemented Reinforcement Learning (RL) techniques, guided by subjective audio evaluation ("good ear") and identity matching, to significantly enhance model naturalness, consistency, and long-form audio generation stability.

Developed and integrated core features (podcast mode, voice chat) into the open-source F5-TTS program, contributing directly to the codebase.

Created custom voice and accent LoRAs for fine-grained model control.

Engineered and optimized inference pipelines for diverse AI models: set up and managed local and cloud inference infrastructure, specializing in serverless GPU platforms (Runpod, Replicate) for cost and performance efficiency.

Developed robust inference scripts, functionalities, and APIs.

Contributed to generative AI beyond speech: finetuned Llama models and optimized LLM inference using vLLM.

Worked with and finetuned advanced generative video models (LTXV, CogVideoX, HunyuanVideo, Wan2.1), and developed ComfyUI nodes and workflows.

Integrated and deployed inference for existing music models.

Gained experience with generative image models.

Managed full data lifecycle: dataset creation, hosting, curation, automated filtering, preprocessing (audio, video, text, including EQ, normalization, base64 management), and post-processing.

Implemented MLOps practices including Docker image creation and release management, serverless machine setup and maintenance, and training monitoring (loss, learning rate, grad_norm) with Weights & Biases.

Functioned as the primary technical expert, communicating project capabilities and technical details to clients and stakeholders.

Developed novel implementations informed by parallel research and by problems encountered during model development.

Education

Universidad Politécnica de Madrid

Bachelor's Degree

Informatics Engineering (Computer Science)

Awards

1st Place Winner

Awarded By

LTXV Paris AI Video Hackathon

Collaborated in a team of 3 to develop an award-winning AI video solution, leveraging skills in generative video model finetuning and rapid prototyping.

2nd Place Winner

Awarded By

BeTech Innovative Design Competition, UPM

Languages

Spanish

Native

English

Native

Certificates

Cambridge C2 English Proficiency
Microsoft Certified: Azure AI Fundamentals

Skills

MLOps & Tools

Docker, Git, Weights & Biases, Serverless (Runpod, Replicate), ComfyUI (Node Development), Conda/Miniconda, Linux/VM Environments.

Data Processing

Audio (EQ, Normalization, Base64), Video, Text, Curation, Filtering.

Other Software

Unity, Autodesk 3ds Max, Relux, CloudCompare.

AI/ML

Frameworks & Libraries: PyTorch, TensorFlow, Hugging Face (Transformers, Diffusers), vLLM, Librosa, Optuna.
Techniques: Deep Learning, Large Language Models (LLMs), LLM Finetuning (Llama), Text-to-Speech (TTS), Automatic Speech Recognition (ASR), Reinforcement Learning (RL), Generative Video, Generative Image Models, Computer Vision, LoRAs, Hyperparameter Tuning, Model Merging & Pruning.
Architectures: Transformers, CNNs, DiT-based Models, Autoregressive Models.

Programming Languages

Python (Advanced), C#, Java, JavaScript (React), C, SQL (MySQL, PostgreSQL), HTML.

Projects

Spanish-F5 Open Source TTS Model

Summary

Independently developed and released the leading open-source Text-to-Speech system for Spanish by finetuning the F5-TTS architecture.

Academic Final Thesis (Honors - 9.8/10): "Parkinson's Disease Detection Through Voice Recordings using Convolutional Neural Networks (CNNs)."

Summary

Independently designed, trained, and evaluated a CNN model for early Parkinson's detection from audio, achieving high accuracy and receiving top academic honors.