NanoChat: Train Your Own GPT-2 Language Model for Under $100

Andrej Karpathy created nanochat to train a GPT-2 class model for about 48 dollars. Learn how anyone can build their own AI language model with this open-source project.

This article was produced with AI assistance and reviewed by a named GenZ NewZ editor before publication.

What is NanoChat?

NanoChat is an open-source project by Andrej Karpathy, the AI researcher who previously led AI and Autopilot work at Tesla and was a founding member of OpenAI. According to the project documentation, this experimental harness is designed to be the simplest way to train a large language model from scratch. The project aims to make LLM training accessible to anyone with a budget under 100 dollars.

The framework covers all major stages of LLM development: tokenization, pretraining, fine-tuning, evaluation, and inference. It also includes a familiar ChatGPT-like web interface for chatting with your trained model. As described in the GitHub repository, it is minimal, hackable, and runs on a single GPU node.

How Cheap is It Really?

Training a GPT-2 sized model in 2019 cost approximately $43,000, a figure cited in the project documentation. With nanochat, you can now train a model with GPT-2 level capability for about $48, which covers roughly 2 hours on an 8XH100 GPU node. On spot instances, the cost can drop to around $15.

The project maintains a leaderboard tracking what it calls the "GPT-2 speedrun": the wall-clock time required to train a model to GPT-2 grade capability, as measured by the DCLM CORE score. According to the leaderboard data, the current best time is 2.02 hours, achieved by switching to the NVIDIA ClimbMix dataset. This represents roughly an 80x improvement in efficiency since 2019.
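
The cost figures above can be checked with quick arithmetic. The sketch below uses only the numbers quoted in this article ($43,000, $48, $15, 2 hours, 8 GPUs). The function names are illustrative, and the per-GPU hourly rate is a derived estimate, not a quoted price. Note that the cost ratio it computes is a different measure from the 80x efficiency figure, which the article does not break down.

```python
# Back-of-the-envelope check of the cost figures quoted in this article.
# All inputs come from the text above; the names are illustrative only.

GPT2_COST_2019 = 43_000.0   # approximate 2019 training cost, USD
NANOCHAT_COST = 48.0        # approximate on-demand cost, USD
NANOCHAT_SPOT_COST = 15.0   # approximate spot-instance cost, USD
TRAIN_HOURS = 2.0           # approximate wall-clock training time
NUM_GPUS = 8                # one 8XH100 node


def cost_reduction(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


def implied_gpu_hour_rate(total_cost: float, hours: float, gpus: int) -> float:
    """Return the implied price per GPU-hour for a single run."""
    return total_cost / (hours * gpus)


if __name__ == "__main__":
    on_demand = cost_reduction(GPT2_COST_2019, NANOCHAT_COST)
    spot = cost_reduction(GPT2_COST_2019, NANOCHAT_SPOT_COST)
    rate = implied_gpu_hour_rate(NANOCHAT_COST, TRAIN_HOURS, NUM_GPUS)
    print(f"On-demand run: about {on_demand:.0f}x cheaper than in 2019")
    print(f"Spot run: about {spot:.0f}x cheaper than in 2019")
    print(f"Implied on-demand rate: ${rate:.2f} per GPU-hour")
```

On these numbers the cost falls by a factor of several hundred, and the $48 figure implies about $3 per GPU-hour.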

You can learn more about transformer architecture from the original Google research paper titled "Attention Is All You Need."

How Does It Work?

The entire pipeline is contained in a single script called runs/speedrun.sh. Users boot up an 8XH100 GPU node from cloud providers like Lambda, run the training script, and then interact with their model through a web UI. According to the setup instructions, the process is designed to be as straightforward as possible.

A single parameter, depth (the number of layers in the transformer), determines all other hyperparameters automatically. For GPT-2 capability, you need a depth of approximately 26. The system derives width, number of heads, learning rate, and other settings from that value.
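
To make the "one dial" idea concrete, here is a minimal sketch of deriving a model shape from depth alone. It is not the project's actual code. The constants (64 units of width per layer, a head size of 128) and the names ModelConfig and derive_config are assumptions chosen for illustration, and learning-rate scaling is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    depth: int   # number of transformer layers
    width: int   # model (embedding) dimension
    heads: int   # number of attention heads


def derive_config(depth: int, aspect_ratio: int = 64, head_dim: int = 128) -> ModelConfig:
    """Derive width and head count from depth alone.

    aspect_ratio and head_dim are assumed constants for illustration,
    not values confirmed by the article.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    target_width = depth * aspect_ratio
    # Round the width up to a multiple of head_dim so that it splits
    # evenly across attention heads.
    heads = max(1, -(-target_width // head_dim))  # ceiling division
    width = heads * head_dim
    return ModelConfig(depth=depth, width=width, heads=heads)


if __name__ == "__main__":
    for d in (12, 26):
        print(derive_config(d))
```

Under these assumed constants, changing depth from 12 to 26 scales the whole model consistently without touching any other setting, which is the design choice the article describes.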

Why This Matters

NanoChat democratizes AI education and research. Instead of relying on API access to closed models like ChatGPT, anyone can understand and experiment with the fundamental technology behind modern AI. This represents a massive shift in how we think about AI development.

The barriers to entry have dropped dramatically. Researchers can quickly test modifications by training smaller 12-layer models in about 5 minutes. As noted in the project discussions, this makes it possible for students and hobbyists to learn about LLMs without expensive infrastructure.

Getting Started

To try nanochat, you will need access to a GPU node. Boot one up from cloud providers like Lambda, activate your Python environment, and run: bash runs/speedrun.sh

You may wish to do this in a screen session as this will take approximately 2-3 hours to run. According to the documentation, once it is done, you can talk to it via the ChatGPT-like web UI by running: python -m scripts.chat_web
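
For readers who prefer to script the two steps, the sketch below wraps the commands quoted above in a small Python helper. The helper name run_pipeline and its dry_run flag are illustrative and not part of nanochat. It assumes you are in the repository root with your Python environment already active.

```python
import subprocess
from typing import List, Sequence

# Commands quoted in the article; run them from the repository root.
TRAIN_CMD = ["bash", "runs/speedrun.sh"]
CHAT_CMD = ["python", "-m", "scripts.chat_web"]


def run_pipeline(commands: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the commands
    are only rendered as strings so they can be inspected first.
    """
    rendered = []
    for cmd in commands:
        rendered.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the chat UI is never started after a failed training run.
            subprocess.run(list(cmd), check=True)
    return rendered


if __name__ == "__main__":
    for line in run_pipeline([TRAIN_CMD, CHAT_CMD]):
        print(line)
```

Calling run_pipeline([TRAIN_CMD, CHAT_CMD], dry_run=False) would start the multi-hour training run and then the web server, which keeps running until you stop it, so a screen session is still advisable.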

Once the web UI is running, you can start chatting with your very own GPT-2 class model. For questions, check the GitHub discussions or join the Discord channel.

The Future of Accessible AI

NanoChat represents a significant milestone in making AI technology transparent and accessible. As hardware costs continue to drop and optimization techniques improve, we can expect even more affordable AI training options. For students, researchers, and hobbyists, this opens up unprecedented opportunities to learn about and experiment with cutting-edge AI technology.

The project demonstrates that with the right tools, understanding complex AI systems does not require a massive budget or corporate resources. It is a testament to how open-source collaboration continues to democratize technology. According to Karpathy, the goal is to make AI education accessible to everyone.

Learn more about Andrej Karpathy's work and other AI research on his GitHub profile.
