AI · Automation · DevOps · Software Development
March 9, 2026

Sandboxing AI Agents in Production: Security You Can't Skip | UData Blog

Running AI agents locally cuts costs — but without sandboxing, a single prompt injection can compromise your entire system. Here's how to secure AI agents in production.

5 min read

A new open-source project called Agent Safehouse landed on Hacker News this week with a simple premise: AI agents that run locally need native sandboxing, and most teams aren't providing it. The project sparked a thread that exposed a blind spot in how production AI systems are being deployed — security is an afterthought, and the consequences are real.

The Problem With Unsandboxed AI Agents

Local AI agents — systems that run on your own infrastructure, call tools, read files, and execute code — have become a practical option for companies that want automation without cloud API costs or data residency concerns. The economics are compelling. The security posture, for most implementations, is not.

When an AI agent runs without sandboxing, it typically operates with the same system permissions as the process that launched it. That means:

  • A prompt injection attack in user-supplied input can direct the agent to read files it should never touch
  • A malicious tool response can instruct the agent to execute arbitrary commands
  • An LLM hallucination that generates a shell command gets executed against your real filesystem
  • Credentials present in the environment are visible to every tool the agent calls

According to a 2025 analysis by Trail of Bits, 73% of production AI agent deployments they audited had at least one critical permission boundary violation — where the agent could access resources outside the scope of its intended task. In 41% of cases, this was exploitable via crafted input without any authenticated access.

These aren't theoretical risks. They're the documented failure mode of agentic systems built for capability first and security second.

What Sandboxing Actually Means for AI Agents

Sandboxing an AI agent means constraining what it can access and do, independent of what the LLM decides to try. The constraint is enforced at the OS or runtime level — not in the prompt, and not in the agent's own logic. Agents cannot be trusted to self-limit; the boundary has to be external.

The practical layers of a well-sandboxed agent:

Filesystem Isolation

The agent should only see the files it needs. Chroot jails, Linux namespaces, or container-based filesystem mounts can enforce this. If your document-processing agent has no legitimate reason to read /etc/ or your SSH keys, it should be architecturally impossible for it to do so — not merely unlikely.
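Alongside OS-level isolation (not as a substitute for it), a runtime path check can catch traversal attempts before they ever reach the filesystem. A minimal Python sketch, assuming a hypothetical sandbox root at `/srv/agent-workspace`:

```python
from pathlib import Path

# Hypothetical mount point for the agent's working files.
SANDBOX_ROOT = Path("/srv/agent-workspace")

def resolve_in_sandbox(user_path: str) -> Path:
    """Resolve a path and refuse anything outside the sandbox root.

    Handles '..' traversal and absolute paths: joining an absolute
    path replaces the root entirely, so it also fails the check.
    """
    candidate = (SANDBOX_ROOT / user_path).resolve()
    if not candidate.is_relative_to(SANDBOX_ROOT):
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return candidate
```

This is defense in depth only: the architectural guarantee still comes from the namespace, chroot, or container mount underneath it.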

Network Policy

Most agents need outbound network access to call APIs or retrieve data. Very few need unrestricted outbound access. Egress filtering via firewall rules or container network policies limits the blast radius if an agent is manipulated into making unexpected external calls. An agent that processes internal documents should not be able to exfiltrate them to an arbitrary endpoint.
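Real egress enforcement belongs at the network layer (firewall rules or container network policies), but an application-level allowlist adds a second check. A sketch with hypothetical hostnames:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this agent legitimately calls.
ALLOWED_HOSTS = {"api.internal.example.com", "search.example.com"}

def check_egress(url: str) -> str:
    """Refuse any outbound request to a host not explicitly allowed."""
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress blocked for host: {host!r}")
    return url
```

Call this before every outbound request a tool makes; a rejection here is also a high-signal event worth logging.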

Process and Syscall Restrictions

Tools like seccomp (Linux), Seatbelt (macOS — the mechanism Agent Safehouse builds on), and gVisor restrict which system calls a process can make. An agent that never needs to spawn child processes should not be able to. An agent that never writes to disk should be blocked from doing so at the kernel level, not just by convention.
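A full seccomp-bpf policy is beyond a blog snippet, but the Linux prerequisite for unprivileged seccomp filters is small enough to show: the one-way `no_new_privs` flag, set via `prctl`. A sketch (Linux only; option numbers from `<linux/prctl.h>`):

```python
import ctypes
import sys

PR_SET_NO_NEW_PRIVS = 38  # prctl option numbers from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

def lock_down_privileges():
    """On Linux, set the one-way no_new_privs flag that unprivileged
    seccomp filters require; returns None on other platforms."""
    if sys.platform != "linux":
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        return False
    # Verify: once set, the flag cannot be cleared for this process.
    return libc.prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1
```

After this, `execve` can never grant the process new privileges (no setuid escalation), which is what makes it safe for the kernel to honor a seccomp filter installed by an unprivileged agent process.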

Credential Isolation

Environment variables are a common vector for credential exposure in agentic systems. Secrets should be injected at the tool level, scoped to the minimum necessary permissions, and rotated per-session where possible. The agent itself should never have access to credentials beyond what its currently-executing tool requires.
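One way to realize this is to inject secrets as call-time arguments rather than process-wide environment variables. A sketch with a hypothetical in-memory store (in production this would be a vault or secrets manager):

```python
# Hypothetical per-tool secret store -- in production, a vault or
# secrets manager, not an in-memory dict with demo values.
SECRET_STORE = {
    "billing_api": {"BILLING_API_KEY": "demo-not-a-real-key"},
    "web_search": {"SEARCH_TOKEN": "demo-not-a-real-token"},
}

def call_tool(tool_fn, tool_name, **kwargs):
    """Inject only this tool's secrets, at call time, as an argument --
    never into the process-wide environment the whole agent shares."""
    secrets = SECRET_STORE.get(tool_name, {})
    return tool_fn(secrets=secrets, **kwargs)

def billing_tool(secrets):
    # This tool sees its own key and nothing else.
    return sorted(secrets)
```

The billing tool never sees the search token, and a compromised tool response can only leak the one credential in scope for that call.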

Execution Timeouts and Resource Limits

Unbounded agent execution is a denial-of-service risk — intentional or not. Hard limits on CPU time, memory, and wall-clock duration ensure that a stuck or manipulated agent cannot consume resources indefinitely. This is basic but frequently missing from production deployments.
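On POSIX systems, hard CPU and memory caps can be applied to a tool subprocess before it executes, with a wall-clock timeout on top. A minimal sketch (the limits shown are illustrative defaults, not recommendations):

```python
import resource
import subprocess
import sys

def run_tool_limited(cmd, cpu_seconds=5, mem_bytes=256 * 2**20,
                     wall_seconds=10):
    """Run a tool subprocess under hard CPU, memory, and wall-clock
    limits (POSIX only: preexec_fn runs in the child before exec)."""
    def set_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(cmd, preexec_fn=set_limits,
                          timeout=wall_seconds, capture_output=True)
```

A run that hits the wall-clock limit raises `subprocess.TimeoutExpired`; one that exhausts its CPU allotment is killed by the kernel. Either way, the failure is bounded and observable instead of silent.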

The Tradeoff: Sandboxing Adds Complexity

None of this is free. Sandboxing requires understanding exactly what your agent needs to do — which means writing and maintaining an explicit capability manifest. It adds configuration surface area. It can break agent functionality if the constraints are too tight. And it requires engineers who understand both the AI layer and the systems security layer.

That last point is where most teams get stuck. AI engineers and security engineers rarely collaborate closely, and agentic systems sit directly at their intersection. The result is production deployments that are secure by the AI team's definition (the prompt says not to do bad things) but insecure by any meaningful systems standard.

The complexity is real, but it's manageable. The teams that get this right treat sandboxing as part of the architecture from day one — not a hardening pass after the system is already running.

Minimum Viable Security for Production AI Agents

If you have AI agents running in production today without explicit sandboxing, here's a pragmatic starting point:

  • Containerize immediately: Even a basic Docker container with a non-root user and a read-only filesystem mount is significantly better than running directly on the host. It's not a complete solution, but it eliminates a large class of easy exploits.
  • Audit tool permissions: List every tool your agent can call and the permissions each one requires. Remove any tool that isn't actively used. Scope credentials to the minimum necessary access.
  • Add egress filtering: Define the set of external endpoints your agent legitimately calls. Block everything else. This is a one-afternoon configuration change with significant security impact.
  • Implement execution limits: Set hard timeouts on agent runs. Alert on runs that exceed expected duration. A stuck agent is a signal worth investigating.
  • Log everything: Every tool call, every external request, every file access. You can't investigate what you didn't record.
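The last point can be as simple as a decorator around every tool function, emitting one structured record per call. A sketch (the logger name and record fields are choices, not a standard):

```python
import functools
import json
import logging
import time

audit = logging.getLogger("agent.audit")

def audited(tool_fn):
    """Log every tool call: name, arguments, outcome, and duration."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        outcome = "error"
        try:
            result = tool_fn(*args, **kwargs)
            outcome = "ok"
            return result
        finally:
            # Emitted even when the tool raises, so failures are recorded too.
            audit.info(json.dumps({
                "tool": tool_fn.__name__,
                "args": repr((args, kwargs)),
                "outcome": outcome,
                "duration_ms": round((time.monotonic() - start) * 1000, 2),
            }))
    return wrapper

@audited
def read_document(path):
    return f"contents of {path}"
```

Structured JSON records make the log queryable after an incident: which tool, with what input, at what time, and whether it succeeded.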

How UData Helps

UData builds production AI automation with security as a first-class requirement. We design agentic systems with explicit sandboxing from the start — filesystem isolation, network policies, credential scoping, and execution limits — not as a hardening pass after the fact.

We work with companies that are:

  • Deploying AI agents in production and need security architecture review before they go live
  • Running unsandboxed agents today and need a remediation plan that doesn't break existing functionality
  • Building new agentic automation and want engineers who have secured these systems before
  • Operating in regulated environments where AI agent security is a compliance requirement, not just a best practice

We bring engineers who understand both the AI layer and the systems security layer — because production agentic systems require both.

Conclusion

AI agents are becoming a standard part of how companies automate work. The security implications of running them without sandboxing are not theoretical — they are documented, exploitable, and increasingly well-understood by attackers. The tools to sandbox agents exist and are mature. The gap is in applying them deliberately, from the start, as a design constraint rather than an afterthought.

The Agent Safehouse project is a useful reference implementation. But the core principle generalizes: constrain what your agents can do at the system level, because you cannot rely on the LLM to constrain itself. Build that boundary in early, and maintain it as the system grows.

Contact us
