NanoGPT
Research
Summary
Tech Stack: PyTorch, Python
---------------------------

Developed a 128M-parameter GPT model from scratch in PyTorch, exploring core attention mechanisms and positional encoding. Optimized training for memory efficiency using gradient checkpointing and mixed precision, and experimented with decoding strategies such as top-k sampling and temperature scaling to improve text coherence. Illustrative sketches of these components follow.
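
A minimal sketch of the attention and positional-encoding components, assuming causal multi-head self-attention with learned position embeddings; the hyperparameters (`n_embd=512`, `n_head=8`, `block_size=1024`, vocab size 50257) are illustrative assumptions, not the project's actual settings:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Scaled dot-product attention with a causal mask, split into heads."""
    def __init__(self, n_embd=512, n_head=8, block_size=1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head, self.head_dim = n_head, n_embd // n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim) for per-head attention
        q = q.view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        # causal mask: position i attends only to positions <= i
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

# Learned positional encoding: token and position embeddings are summed
# before the transformer blocks (one common choice among several).
tok_emb = nn.Embedding(50257, 512)   # vocab_size x n_embd (illustrative)
pos_emb = nn.Embedding(1024, 512)    # block_size x n_embd
idx = torch.randint(0, 50257, (1, 16))
x = tok_emb(idx) + pos_emb(torch.arange(16))
y = CausalSelfAttention()(x)         # -> shape (1, 16, 512)
```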
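
A minimal sketch of the two memory optimizations, gradient checkpointing and fp16 mixed precision, assuming a hypothetical model exposing `embed`, a list of transformer blocks in `blocks`, a final layer norm `ln_f`, and a `head` projection; all of these names are illustrative, not the project's actual API:

```python
import torch
from torch.utils.checkpoint import checkpoint

scaler = torch.cuda.amp.GradScaler()  # scales the loss so fp16 grads don't underflow

def forward_with_checkpointing(model, idx):
    # Recompute each block's activations during the backward pass
    # instead of storing them, trading extra compute for memory.
    x = model.embed(idx)
    for block in model.blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return model.head(model.ln_f(x))

def train_step(model, optimizer, idx, targets):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in fp16 where safe, fp32 elsewhere.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = forward_with_checkpointing(model, idx)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, then steps
    scaler.update()
    return loss.item()
```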

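A minimal sketch of the decoding experiments, showing temperature scaling and top-k filtering during autoregressive generation; `model` and the default hyperparameter values are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=100, temperature=0.8, top_k=50):
    for _ in range(max_new_tokens):
        logits = model(idx)[:, -1, :]     # logits for the last position only
        logits = logits / temperature     # <1 sharpens, >1 flattens the distribution
        if top_k is not None:
            # Mask out everything outside the k most likely tokens.
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_id), dim=1)  # append and feed back in
    return idx
```

Lower temperatures and smaller top-k values make output more conservative and repetitive; raising either increases diversity at the cost of coherence, which is the trade-off the experiments above explore.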