Build vs Buy AI: When Custom Models Beat Off-the-Shelf
Karpathy's MicroGPT shows building AI from scratch is teachable — but should your business do it? Here's a practical framework for choosing build vs buy in 2026.
Andrej Karpathy just published MicroGPT — a walkthrough of building a small language model from scratch in a few hundred lines of code. It's a brilliant educational resource. It also raises a question that every business leader should be asking in 2026: when does it actually make sense to build your own AI, and when should you just use what's already available?
The Build-vs-Buy Question Is Back
For most of software history, the build-vs-buy debate had a clear answer: buy (or use open source) unless you have a genuine competitive moat in that capability. Custom CRM? Salesforce. Custom ERP? SAP. Custom authentication? Auth0.
AI is scrambling that calculus. OpenAI, Anthropic, and Google offer powerful foundation models via API. But a growing number of companies are discovering that off-the-shelf models have real limitations when applied to specialized domains — and the cost of those limitations, at scale, is significant.
According to a 2025 a16z survey, 60% of enterprises running AI in production reported that generic foundation models required substantial prompt engineering and fine-tuning to reach acceptable accuracy on domain-specific tasks. For many, the fine-tuning cost approached the cost of training a smaller custom model outright.
When Off-the-Shelf Wins
Most businesses should start here. Off-the-shelf models from major providers are:
- Fast to integrate — an API key and a few lines of code, not months of ML engineering
- Continuously improving — you get capability upgrades without maintaining a training pipeline
- Reliable at scale — billion-dollar infrastructure you don't have to operate
- Good enough for general tasks — summarization, classification, code generation, customer support triage
If your use case is general-purpose and your data isn't highly specialized, a well-prompted GPT-4 or Claude call is almost certainly the right answer. The productivity gain from getting started in a week beats the theoretical accuracy gain from a custom model you'll finish in six months.
When Custom Models Start to Make Sense
There are specific conditions where building custom — whether that means fine-tuning, training from scratch, or running open-source models locally — starts to win on the economics:
- High query volume with predictable tasks — If you're running 10 million classifications per month, a fine-tuned smaller model can cost an order of magnitude less than API calls to a frontier model
- Proprietary or sensitive data — Healthcare, legal, and financial data that can't leave your infrastructure requires self-hosted models
- Domain-specific accuracy requirements — Medical coding, legal contract review, and industrial defect detection are cases where generic models plateau and custom training delivers the last 15–20% of accuracy that matters
- Latency-critical applications — Real-time inference (sub-100ms) often requires smaller, specialized models running on your own hardware
- Competitive moat — If your AI capability is the product, not just a feature, owning the model matters for defensibility
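The volume argument above is easy to make concrete with a back-of-envelope break-even model. All of the prices and token counts below are illustrative placeholders, not quotes from any provider — the point is the shape of the comparison, not the specific numbers:

```python
def monthly_api_cost(queries: int, tokens_per_query: int, price_per_mtok: float) -> float:
    """Cost of serving every query through a pay-per-token API."""
    return queries * tokens_per_query * price_per_mtok / 1_000_000

def monthly_selfhost_cost(gpu_hours: float, price_per_gpu_hour: float, fixed_ops: float) -> float:
    """Cost of a self-hosted fine-tuned model: GPU rental plus fixed ops overhead."""
    return gpu_hours * price_per_gpu_hour + fixed_ops

# Illustrative assumptions: 10M classifications/month at ~500 tokens each,
# $5 per million tokens via API; versus one GPU running 24/7 at $2/hour
# plus $3,000/month of ops overhead for the custom model.
api_cost = monthly_api_cost(10_000_000, 500, 5.0)
custom_cost = monthly_selfhost_cost(24 * 30, 2.0, 3_000.0)
print(f"API: ${api_cost:,.0f}/mo  self-hosted: ${custom_cost:,.0f}/mo")
```

At these assumed numbers the custom model wins by roughly 5×, but the crossover depends entirely on your real volumes, token counts, and ops burden — which is exactly why this analysis is worth running before committing to either path.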
The Middle Ground: Fine-Tuning
For most companies sitting between "GPT API is good enough" and "we need to train from scratch," fine-tuning is the practical answer. You take an open-source foundation model (Llama 3, Mistral, Qwen) and train it on your domain data for a fraction of the cost of training from scratch.
The economics are increasingly favorable. A fine-tuning run that would have cost $50,000 in 2023 costs under $5,000 today on commodity GPU hardware. The tooling has matured significantly — LoRA, QLoRA, and PEFT frameworks make it accessible to teams without deep ML research backgrounds.
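Much of that cost drop comes from parameter-efficient methods like LoRA, which freeze the base model's weights and train only small low-rank adapter matrices. A rough sketch of where the savings come from, using an illustrative weight-matrix shape and rank (not any particular model's real dimensions):

```python
d, k = 4096, 4096   # hypothetical shape of one frozen weight matrix W
r = 8               # LoRA rank: the size of the low-rank bottleneck

full_finetune_params = d * k      # updating W directly trains every entry
lora_params = r * (d + k)         # LoRA trains only A (r x k) and B (d x r)

# The adapter's effective update is B @ A, which has the same d x k shape
# as W but is parameterized by far fewer trainable values.
savings = full_finetune_params / lora_params
print(f"full: {full_finetune_params:,}  lora: {lora_params:,}  ratio: {savings:.0f}x")
```

Multiply that per-matrix ratio across every adapted layer and the GPU-memory and compute savings follow directly, which is what frameworks like LoRA, QLoRA, and PEFT package up in practice.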
The catch: you still need engineers who understand the process, can evaluate model quality rigorously, and know how to deploy and serve a custom model in production. That's not a small requirement.
How UData Helps
UData helps companies navigate exactly this decision — and then execute on whichever path makes sense. We've built production AI systems using frontier APIs, fine-tuned open-source models for specialized domains, and designed hybrid architectures that route queries intelligently between model tiers based on complexity and cost.
What we bring to the table:
- Honest assessment of whether your use case actually needs custom AI (often it doesn't)
- Fine-tuning pipelines for domain-specific accuracy improvements
- Self-hosted model deployment for data-sensitive environments
- Cost modeling to determine the break-even point between API calls and custom infrastructure
- Dedicated ML engineers embedded in your team for long-term AI development
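The tiered routing mentioned above reduces to a simple policy: try the cheap model first, and escalate to a frontier model only when the cheap tier is unsure. The model names, the confidence threshold, and the toy confidence heuristic below are all hypothetical placeholders for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str
    cost_per_query: float
    run: Callable[[str], tuple[str, float]]  # returns (answer, confidence)

def route(query: str, cheap: Tier, frontier: Tier, threshold: float = 0.8):
    """Try the cheap tier first; escalate if its confidence falls below threshold."""
    answer, confidence = cheap.run(query)
    if confidence >= threshold:
        return answer, cheap.name
    answer, _ = frontier.run(query)
    return answer, frontier.name

# Toy stand-ins: the small model is confident on short queries, unsure on long ones.
small = Tier("small-finetuned", 0.0001,
             lambda q: ("small-answer", 0.9 if len(q) < 40 else 0.3))
large = Tier("frontier-api", 0.01,
             lambda q: ("frontier-answer", 0.99))

easy = route("classify this ticket", small, large)
hard = route("a much longer, ambiguous query " * 5, small, large)
```

In production the confidence signal would come from the model itself (logprobs, a calibrated classifier, or task-specific checks), but the economics are the same: the frontier model is only paid for on the fraction of traffic that actually needs it.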
Conclusion
Karpathy's MicroGPT is a reminder that the fundamentals of AI are more accessible than ever. But accessible doesn't mean every business should be training models. The right framework is simple: start with off-the-shelf, measure where it fails, and only build custom where the gap justifies the investment. Most companies that skip step two end up either over-engineering or under-performing.
If you're at the point of asking the question seriously, you probably need engineers who've made this decision before. That experience is worth a lot more than it might appear.