Subtitle Search and Retrieval System
Summary
• Built a subtitle text retrieval system that combines SimHash, phonetic hashing, and OpenAI embeddings to support multi-strategy search.
• Implemented fast approximate search in Redis (SimHash and phonetic keys) with a semantic fallback using pgvector in PostgreSQL.
• Developed a preprocessing pipeline for text cleaning, chunking, and embedding with OpenAI's Ada model, and designed a layered fallback (SimHash → Phonetic → Embedding) for robust query recall.
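The SimHash tier can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual code. It assumes word-level tokens, a 64-bit token hash taken from MD5, and a banded bucket layout: the fingerprint is split into four 16-bit blocks so that any two fingerprints within Hamming distance 3 share at least one block. Redis is replaced by an in-memory dict. `SimHashIndex`, `band_keys`, and the `simhash:{band}:{block}` key format are hypothetical names.

```python
import hashlib
import re
from collections import defaultdict

BITS = 64  # fingerprint width (assumed)


def tokenize(text: str) -> list[str]:
    # Assumed normalization: lowercase, keep word characters only.
    return re.findall(r"\w+", text.lower())


def _hash64(token: str) -> int:
    # Stable 64-bit token hash: first 8 bytes of the MD5 digest.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """64-bit SimHash: each token votes +1/-1 on every bit position."""
    votes = [0] * BITS
    for token in tokenize(text):
        h = _hash64(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def band_keys(fingerprint: int, bands: int = 4) -> list[str]:
    # Split the fingerprint into `bands` equal blocks. By the pigeonhole
    # principle, fingerprints within Hamming distance < bands share at
    # least one identical block, so each block can act as a bucket key.
    width = BITS // bands
    mask = (1 << width) - 1
    return [f"simhash:{i}:{(fingerprint >> (i * width)) & mask:x}" for i in range(bands)]


class SimHashIndex:
    """In-memory stand-in for a hypothetical Redis set-per-bucket layout."""

    def __init__(self, max_distance: int = 3):
        self.max_distance = max_distance
        self.buckets: dict[str, set[str]] = defaultdict(set)
        self.fingerprints: dict[str, int] = {}

    def add(self, doc_id: str, text: str) -> None:
        fp = simhash(text)
        self.fingerprints[doc_id] = fp
        for key in band_keys(fp, self.max_distance + 1):
            self.buckets[key].add(doc_id)

    def query(self, text: str) -> list[str]:
        fp = simhash(text)
        candidates: set[str] = set()
        for key in band_keys(fp, self.max_distance + 1):
            candidates |= self.buckets.get(key, set())
        # Verify bucket candidates with an exact Hamming-distance check.
        return sorted(
            d for d in candidates if hamming(fp, self.fingerprints[d]) <= self.max_distance
        )
```

In a Redis-backed version, each bucket could become a set that is written with `SADD` and read with `SUNION` over the query's band keys. Fingerprints could be kept in a hash for the verification step.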
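The summary does not say which phonetic algorithm backs the phonetic tier. The sketch below uses American Soundex purely as an illustrative stand-in. `phonetic_key` is a hypothetical helper that maps a subtitle line to one code per word, so that transcription slips between homophones such as "there" and "their" produce the same lookup key.

```python
import re

# American Soundex digit classes; vowels, H, W, and Y have no code.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # H and W do not separate letters with the same code; vowels do.
        if c not in "HW":
            prev = digit
    return code.ljust(4, "0")


def phonetic_key(text: str) -> str:
    # One Soundex code per ASCII word, joined into a single lookup key.
    return " ".join(soundex(w) for w in re.findall(r"[A-Za-z]+", text))
```

For example, `phonetic_key("Meet me there")` and `phonetic_key("meat me their")` both yield `"M300 M000 T600"`, so either spelling would land on the same key. Soundex only encodes consonant classes, which makes it coarse. A finer algorithm such as Metaphone would reduce false collisions.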
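The cleaning and chunking steps of the preprocessing pipeline might look something like this minimal sketch. Several details are assumptions rather than facts from the summary: the choice to strip inline markup such as `<i>` and bracketed sound cues such as `[MUSIC]`, the word-window chunking, and the default `size=50` / `overlap=10`. The call that sends each chunk to the Ada embedding model is omitted.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")      # inline markup such as <i>...</i>
CUE_RE = re.compile(r"\[[^\]]*\]")   # bracketed sound cues such as [MUSIC]
SPACE_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    # Drop markup and sound cues, then collapse runs of whitespace.
    text = TAG_RE.sub(" ", text)
    text = CUE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the last word
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of embedding some words twice.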
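The layered fallback (SimHash → Phonetic → Embedding) amounts to trying each tier in order and stopping at the first one that returns hits. The sketch below assumes that each tier is a plain function from query to document ids. The embedding tier is a brute-force cosine ranking over in-memory vectors, with a caller-supplied `embed` function standing in for the OpenAI call. The tier names, `k=3`, and the `min_score=0.8` threshold are illustrative assumptions.

```python
import math
from typing import Callable, Sequence

# A tier maps a query string to matching document ids (empty list = miss).
Tier = Callable[[str], list[str]]


def layered_search(query: str, tiers: Sequence[tuple[str, Tier]]) -> tuple[str, list[str]]:
    """Try tiers in order; return (tier name, hits) from the first that matches."""
    for name, tier in tiers:
        hits = tier(query)
        if hits:
            return name, hits
    return "none", []


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; defined as 0.0 when either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_embedding_tier(
    embed: Callable[[str], Sequence[float]],
    vectors: dict[str, Sequence[float]],
    k: int = 3,
    min_score: float = 0.8,
) -> Tier:
    # Brute-force ranking; a stand-in for a pgvector nearest-neighbour query.
    def search(query: str) -> list[str]:
        q = embed(query)
        ranked = sorted(((cosine(q, v), doc_id) for doc_id, v in vectors.items()), reverse=True)
        return [doc_id for score, doc_id in ranked[:k] if score >= min_score]

    return search
```

Ordering the tiers from cheapest to most expensive means that the embedding call and vector search only run for queries the hash-based tiers miss. In PostgreSQL with pgvector, the last tier would typically be a query ordered by `embedding <=> query_vector` with a `LIMIT`. Because `<=>` returns cosine distance (1 minus cosine similarity), a similarity threshold becomes a maximum distance.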