Developed a novel benchmark for complex visual reasoning, inspired by ZeroBench, with an automated question-generation pipeline for training and evaluating vision-language models on challenging multi-step reasoning tasks.
Contributed to a reinforcement learning-based training framework designed to ground models' reasoning in visual inputs, reducing their reliance on textual cues.
Built an evaluation system for next-generation vision-language models, establishing a rigorous benchmark for visual-language understanding.