MetroSense: Vision-Language Assistant for Navigation Aid in Urban Metro Systems
Summary
Developed a web-based vision-language assistance platform that helps visually impaired individuals navigate the complex Delhi Metro system. Fine-tuned a YOLOv11 object detection model on a custom annotated dataset with targeted augmentations, achieving 65.1% mAP@50 for identifying environmental elements from real-time image captures. Integrated the Llama 3.2 90B Vision model for Visual Question Answering, engineered with context-rich few-shot prompting and tuned decoding parameters to reach a BERTScore F1 of 0.85, delivering semantically accurate, context-aware, voice-synthesized responses to user queries for improved safety and autonomy.
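
The detection stage can be reproduced in outline with the Ultralytics API, which supports YOLO11 fine-tuning and reports mAP@50 after validation. The sketch below is illustrative only: the checkpoint size, the dataset config name metro.yaml, and the augmentation values are assumptions, not the project's actual settings.

```python
# Minimal fine-tuning sketch using the Ultralytics YOLO API.
# "metro.yaml" and the augmentation values are illustrative placeholders,
# not the project's actual dataset config or hyperparameters.
from ultralytics import YOLO

# Start from a pretrained YOLO11 checkpoint and fine-tune it on the
# custom annotated metro dataset described above.
model = YOLO("yolo11s.pt")
model.train(
    data="metro.yaml",   # dataset config: class names, train/val image paths
    epochs=100,
    imgsz=640,
    hsv_v=0.4,           # brightness jitter for varied station lighting
    degrees=10.0,        # small rotations for handheld camera tilt
    fliplr=0.5,          # horizontal flips
)

# Evaluate on the validation split; metrics.box.map50 is the mAP@50 figure.
metrics = model.val()
print(f"mAP@50: {metrics.box.map50:.3f}")
```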

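For the VQA stage, a hedged sketch of the few-shot prompting and decoding setup follows, assuming an OpenAI-compatible endpoint serving Llama 3.2 90B Vision. The base URL, model id, example Q&A pair, and decoding values are placeholders, not the project's actual configuration.

```python
# Sketch of the few-shot VQA call against an OpenAI-compatible endpoint
# hosting Llama 3.2 90B Vision. base_url, model id, the example Q&A, and
# the decoding parameters below are illustrative placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="https://api.example-provider.com/v1", api_key="...")

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Context-rich system prompt with a few-shot Q&A example grounding the
# model in the metro-navigation domain before the user's live query.
SYSTEM = (
    "You are a navigation assistant for visually impaired riders of the "
    "Delhi Metro. Answer concisely, describing obstacles, signage, and "
    "directions relative to the camera.\n\n"
    "Example:\nQ: Where is the nearest gate?\n"
    "A: An AFC gate is about two meters ahead, slightly to your left."
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",  # provider-specific id
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": [
            {"type": "text", "text": "What is in front of me?"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/jpeg;base64,{encode_image('frame.jpg')}"}},
        ]},
    ],
    temperature=0.2,   # low temperature for consistent, grounded answers
    top_p=0.9,
    max_tokens=150,
)
print(response.choices[0].message.content)
```

The low temperature and capped token budget favor short, repeatable answers, which matters when the text is spoken aloud to the user.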
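The BERTScore F1 cited in the summary can be computed with the bert-score package; the candidate and reference answers below are illustrative stand-ins for the project's evaluation pairs.

```python
# Sketch of the response-quality evaluation with BERTScore. The
# candidate/reference pairs are illustrative, not project data.
from bert_score import score

candidates = ["The AFC gate is two meters ahead on your left."]        # model answers
references = ["An AFC gate is about two meters in front, left side."]  # ground truth

# F1 is the semantic-similarity score reported in the summary.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.2f}")
```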