AI · Automation · Software Development · Machine Learning
March 1, 2026

Build vs Buy AI: When Custom Models Beat Off-the-Shelf

Karpathy's MicroGPT shows building AI from scratch is teachable — but should your business do it? Here's a practical framework for choosing build vs buy in 2026.

Dmytro Serebrych · SEO & Lead of Production · 5 min read

Andrej Karpathy just published MicroGPT — a walkthrough of building a small language model from scratch in a few hundred lines of code. It's a brilliant educational resource. It also raises a question that every business leader should be asking in 2026: when does it actually make sense to build your own AI, and when should you just use what's already available?

The Build-vs-Buy Question Is Back

For most of software history, the build-vs-buy debate had a clear answer: buy (or use open source) unless you have a genuine competitive moat in that capability. Custom CRM? Salesforce. Custom ERP? SAP. Custom authentication? Auth0.

AI is scrambling that calculus. OpenAI, Anthropic, and Google offer powerful foundation models via API. But a growing number of companies are discovering that off-the-shelf models have real limitations when applied to specialized domains — and the cost of those limitations, at scale, is significant.

According to a 2025 a16z survey, 60% of enterprises running AI in production reported that generic foundation models required substantial prompt engineering and fine-tuning to reach acceptable accuracy on domain-specific tasks. For many, the fine-tuning cost approached the cost of training a smaller custom model outright.

When Off-the-Shelf Wins

Most businesses should start here. Off-the-shelf models from major providers are fast to integrate — an API key and a few lines of code, not months of ML engineering. They improve continuously without you maintaining a training pipeline. They run on billion-dollar infrastructure you don't have to operate.

They're also good enough for the majority of general tasks: summarization, classification, code generation, customer support triage. If your use case is general-purpose and your data isn't highly specialized, a well-prompted GPT-4o or Claude call is almost certainly the right answer. The productivity gain from getting started in a week beats the theoretical accuracy gain from a custom model you'll finish in six months.

This is also the lower-risk path. You can prototype, validate business value, and only then decide whether to invest in something more bespoke. Many companies that skipped this step wasted months on custom infrastructure before confirming the use case even worked.

When Custom Models Start to Make Sense

There are specific conditions where building custom — whether that means fine-tuning, training from scratch, or running open-source models locally — starts to win on economics and performance:

  • High query volume with predictable tasks — If you're running 10 million classifications per month, a fine-tuned smaller model can cost 10× less than API calls to a frontier model
  • Proprietary or sensitive data — Healthcare, legal, and financial data that can't leave your infrastructure requires self-hosted models
  • Domain-specific accuracy requirements — Medical coding, legal contract review, and industrial defect detection are cases where generic models plateau and custom training delivers the last 15–20% of accuracy that matters
  • Latency-critical applications — Real-time inference (sub-100ms) often requires smaller, specialized models running on your own hardware
  • Competitive moat — If your AI capability is the product, not just a feature, owning the model matters for defensibility
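
The volume argument above reduces to a simple break-even calculation. Here's a minimal sketch — all prices (cost per 1k frontier-model calls, GPU hourly rate, throughput per GPU) are hypothetical placeholders, not vendor quotes:

```python
import math

def api_cost(queries_per_month: int, usd_per_1k_queries: float) -> float:
    """Pay-as-you-go cost of calling a hosted frontier model."""
    return queries_per_month / 1_000 * usd_per_1k_queries

def self_hosted_cost(queries_per_month: int, gpu_usd_per_hour: float,
                     queries_per_gpu_hour: int) -> float:
    """Cost of serving a fine-tuned model on always-on GPUs sized for the load."""
    gpu_hours_needed = queries_per_month / queries_per_gpu_hour
    gpus = max(1, math.ceil(gpu_hours_needed / (30 * 24)))  # GPUs running 24/7
    return gpus * gpu_usd_per_hour * 30 * 24

# Hypothetical prices: $5 per 1k frontier calls vs a $1.50/h GPU serving 20k queries/h.
for volume in (100_000, 1_000_000, 10_000_000):
    api = api_cost(volume, usd_per_1k_queries=5.0)
    hosted = self_hosted_cost(volume, gpu_usd_per_hour=1.50, queries_per_gpu_hour=20_000)
    print(f"{volume:>10,} queries/mo  API ${api:>9,.0f}  self-hosted ${hosted:>7,.0f}")
```

At low volume the API wins because the self-hosted floor (one always-on GPU) dominates; at millions of predictable queries per month, the curves cross hard in the other direction.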

These aren't hypothetical. Companies in legal tech, medical imaging, and industrial automation are already running custom models in production — not because it's fashionable, but because the ROI math is clear. See how this connects to real-world automation projects that UData has delivered.

The Middle Ground: Fine-Tuning

For most companies sitting between "GPT API is good enough" and "we need to train from scratch," fine-tuning is the practical answer. You take an open-source foundation model — Llama 3, Mistral, Qwen — and train it on your domain data for a fraction of the cost of training from scratch.

The economics are increasingly favorable. A fine-tuning run that would have cost $50,000 in 2023 costs under $5,000 today on commodity GPU hardware. The tooling has matured significantly — LoRA, QLoRA, and PEFT frameworks make it accessible to teams without deep ML research backgrounds.
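
The arithmetic behind why LoRA-style fine-tuning is so much cheaper is worth seeing once: instead of updating a full weight matrix, you train two thin low-rank matrices. A toy calculation for a single projection layer (dimensions chosen to resemble a Llama-class model, but illustrative only):

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when updating a full weight matrix W (d_out x d_in)."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA freezes W and trains two thin matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

# One 4096x4096 projection at LoRA rank 8:
full = full_finetune_params(4096, 4096)   # 16,777,216 weights
lora = lora_params(4096, 4096, rank=8)    # 65,536 weights
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
```

Training well under 1% of the parameters is what pushes fine-tuning runs onto commodity GPU hardware — optimizer state, gradients, and checkpoints all shrink proportionally.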

The catch: you still need engineers who understand the process, can evaluate model quality rigorously, and know how to deploy and serve a custom model in production. You also need to build the evaluation harness — without rigorous benchmarking, fine-tuning runs can silently regress on edge cases while appearing to improve on average metrics. That's not a small requirement, and it's where most self-serve fine-tuning efforts stumble.
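
The silent-regression problem is easiest to see with a minimal per-slice evaluation harness. The data and tolerance below are toy values for illustration — the point is that an aggregate metric can improve while a rare-but-important slice quietly collapses:

```python
from collections import defaultdict

def accuracy_by_slice(examples):
    """examples: list of (slice_name, correct). Returns per-slice and overall accuracy."""
    hits, totals = defaultdict(int), defaultdict(int)
    for slice_name, correct in examples:
        totals[slice_name] += 1
        hits[slice_name] += int(correct)
    per_slice = {s: hits[s] / totals[s] for s in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_slice, overall

def regressions(baseline, candidate, tolerance=0.02):
    """Slices where the candidate dropped more than `tolerance` below baseline."""
    return {s for s in baseline if candidate.get(s, 0.0) < baseline[s] - tolerance}

# Toy run: the fine-tuned model looks better on average but regresses on the rare 'edge' slice.
base = [("common", True)] * 80 + [("common", False)] * 10 + [("edge", True)] * 9 + [("edge", False)]
cand = [("common", True)] * 89 + [("common", False)] * 1 + [("edge", True)] * 5 + [("edge", False)] * 5

base_slices, base_overall = accuracy_by_slice(base)
cand_slices, cand_overall = accuracy_by_slice(cand)
print(cand_overall > base_overall)            # average improved (0.94 vs 0.89)
print(regressions(base_slices, cand_slices))  # but 'edge' fell from 0.90 to 0.50
```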

A Simple Decision Framework

The practical question isn't "should we build AI?" — it's "which layer should we own?" Here's how to think about it:

Scenario → Recommended approach:

  • General tasks, low volume → Off-the-shelf API
  • High volume, predictable task → Fine-tuned open-source model
  • Sensitive data, can't use external APIs → Self-hosted open-source model
  • Core product differentiator → Custom training or fine-tuning
  • Real-time, sub-100ms latency → Specialized smaller model on own hardware
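
For a first pass, the framework can even be encoded as a triage function. The precedence order and thresholds here are illustrative assumptions, not policy — real decisions also weigh engineering capacity and cost modeling:

```python
def recommend(volume_per_month: int, sensitive_data: bool,
              core_differentiator: bool, latency_ms_budget: float) -> str:
    """First-pass build-vs-buy triage. Hard constraints (data, latency) come first,
    then strategic and economic factors. Thresholds are illustrative."""
    if sensitive_data:
        return "self-hosted open-source model"
    if latency_ms_budget < 100:
        return "specialized smaller model on own hardware"
    if core_differentiator:
        return "custom training or fine-tuning"
    if volume_per_month >= 1_000_000:
        return "fine-tuned open-source model"
    return "off-the-shelf API"

print(recommend(50_000, False, False, 2_000))      # off-the-shelf API
print(recommend(10_000_000, False, False, 2_000))  # fine-tuned open-source model
```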

How UData Helps

UData helps companies navigate exactly this decision — and then execute on whichever path makes sense. Through our automation and AI services, we've built production AI systems using frontier APIs, fine-tuned open-source models for specialized domains, and designed hybrid architectures that route queries intelligently between model tiers based on complexity and cost.
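
The tier-routing idea can be sketched in a few lines. Production routers usually use a small trained classifier; the word-count heuristic and tier names below are purely illustrative:

```python
def estimate_complexity(query: str) -> float:
    """Crude proxy for query difficulty: longer, question-dense queries score
    higher, capped at 1.0. A real router would use a small classifier instead."""
    words = len(query.split())
    questions = query.count("?")
    return min(1.0, words / 200 + 0.2 * questions)

def route(query: str, cheap_threshold: float = 0.3) -> str:
    """Send simple queries to a cheap small model, complex ones to a frontier tier."""
    return "small-model-tier" if estimate_complexity(query) < cheap_threshold else "frontier-tier"

print(route("Summarize this ticket."))  # small-model-tier
```

The economic effect comes from the traffic distribution: if most queries are simple, the expensive tier only sees the tail that actually needs it.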

What we bring to the table:

  • Honest assessment of whether your use case actually needs custom AI (often it doesn't)
  • Fine-tuning pipelines for domain-specific accuracy improvements
  • Self-hosted model deployment for data-sensitive environments
  • Cost modeling to determine the break-even point between API calls and custom infrastructure
  • Dedicated ML engineers embedded in your team for long-term AI development

We don't sell a default recommendation. The right answer depends on your volume, your data sensitivity, your latency requirements, and your engineering capacity. We figure that out with you, then build accordingly.

Conclusion

Karpathy's MicroGPT is a reminder that the fundamentals of AI are more accessible than ever. But accessible doesn't mean every business should be training models. The right framework is simple: start with off-the-shelf, measure where it fails, and only build custom where the gap justifies the investment. Most companies that skip step two end up either over-engineering or under-performing.

If you're at the point of asking the question seriously, you probably need engineers who've made this decision before. That experience is worth a lot more than it looks. Talk to UData and let's figure out the right path for your use case.

Contact us
