April 4, 2026 · 8 min read

Why We Built OnCell

Every AI agent startup duct-tapes 7 services together. We built the primitive that replaces all of them.

Anup Singh
Founder, oncell.ai

I built JustCopy.ai — an AI coding agent with 15,000 users. Each user has a project. The agent reads their code, plans changes, edits files, runs tests, and pushes updates. It works.

What didn't work was the infrastructure underneath it.

The duct tape

To give each user their own environment, I wired together:

ECS          → compute (run the agent)
EFS          → storage (user's project files)
DynamoDB     → state (project metadata, conversation history)
Step Functions → orchestration (multi-step agent loop)
S3           → assets (uploaded files, generated images)
IAM          → isolation (separate each user's data)
Custom code  → crash recovery (retry on failure)

Seven services. Seven bills. Seven failure modes. And it took three months to get it working.

The worst part? The isolation was fake. Every user's files sat on the same EFS volume. Separated by directory paths and IAM policies. One misconfigured policy — one missing path prefix in a role — and User A can read User B's source code.

I talked to other founders building coding agents. Every single one had the same setup. ECS or Lambda or Cloud Run for compute. S3 or EFS for storage. Pinecone or Weaviate for search. Temporal or Step Functions for orchestration. The same duct tape. The same pain.

The real problem

These services were built before AI agents existed. They're general-purpose cloud primitives designed for web apps — stateless request handlers that read from a database and return JSON.

AI agents are different:

Web request:
  Request → read DB → compute → write DB → respond
  Duration: 100ms
  State: in the database

AI agent:
  Request → read files → search codebase → plan changes →
  edit file 1 → edit file 2 → run tests → fix failures →
  edit file 3 → run tests again → commit → push
  Duration: 5-30 minutes
  State: everywhere (files, memory, search index, conversation)

An agent needs to read files fast (not over the network from S3). It needs to search a codebase in milliseconds (not a round-trip to Pinecone). It needs to survive crashes mid-task (not restart from scratch after 20 minutes of work). And it needs to do all of this in isolation — each customer's code must be physically separated from every other customer's code.

No combination of existing services gives you this. You can get close by duct-taping them together. But "close" means slow, fragile, and insecure.

The cell

What if each customer got their own computer? Not a shared Lambda. Not a shared filesystem. Their own isolated compute environment with their code, their database, their search index, and their agent runtime — all on the same machine.

from oncell import Cell

cell = Cell("acme-corp")

# Everything is local to this cell
await cell.shell("git clone https://github.com/acme/app /work")
await cell.search.index("/work/src")
results = await cell.search.query("auth middleware")
await cell.db.set("last_search", results)
await cell.shell("npm test")

That's OnCell. Each customer gets a cell — an isolated environment backed by NVMe storage. Compute, storage, database, vector search, and durable orchestration all share the same disk. No network hops. No integration code. The NVMe is the integration layer.

What makes it different

Local NVMe, not network storage. Your agent reads the customer's codebase at 7 GB/s from local SSD, not 500 MB/s from EFS or 100 MB/s from S3. grep across a 10 GB monorepo in milliseconds, not seconds.
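The bandwidth gap compounds quickly at repo scale. A rough calculation, using only the illustrative throughput figures above (actual numbers vary by instance type and access pattern):

```python
# Back-of-envelope: time to sequentially scan a 10 GB repo at each
# storage tier's read bandwidth (figures are illustrative, from the text).
REPO_GB = 10

def scan_seconds(bandwidth_mb_per_s: float) -> float:
    """Seconds to read REPO_GB at the given bandwidth in MB/s."""
    return REPO_GB * 1024 / bandwidth_mb_per_s

for name, mbps in [("local NVMe", 7 * 1024), ("EFS", 500), ("S3", 100)]:
    print(f"{name:>10}: {scan_seconds(mbps):7.1f} s")
```

About 1.4 seconds from local NVMe versus 20 seconds from EFS and over 100 seconds from S3, and an agent pays that tax on every search pass, not once.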

Physical isolation, not IAM policies. Each cell runs in a gVisor sandbox — a userspace kernel that intercepts every system call. Cell A physically cannot access Cell B's filesystem, memory, or network. Not because of a policy. Because of the kernel.

Durable execution, not retry-from-scratch. Every await is a checkpoint. If the machine crashes after your agent has edited 15 files and passed 8 tests, it resumes from test #9. Not from git clone. The LLM tokens for steps 1-8 are not re-spent.
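The checkpoint-per-step idea can be sketched in a few lines. This is a minimal illustration, not OnCell's API (which checkpoints at every await, not just between named steps): each step's result is journaled to disk, so a restarted run replays the journal and skips completed work.

```python
import json
import tempfile
from pathlib import Path

def run_durable(journal_path, steps):
    """Run named steps, journaling each result so a restarted run
    skips work that already completed. Minimal sketch only: the
    run_durable helper is hypothetical, not OnCell's orchestrator."""
    journal = Path(journal_path)
    done = json.loads(journal.read_text()) if journal.exists() else {}
    for name, fn in steps:
        if name in done:                       # finished before the crash
            continue
        done[name] = fn()                      # run the step...
        journal.write_text(json.dumps(done))   # ...then checkpoint it
    return done

# Demo: the second call replays from the journal and re-runs nothing.
executed = []
path = Path(tempfile.mkdtemp()) / "journal.json"
steps = [
    ("clone", lambda: executed.append("clone") or "ok"),
    ("test",  lambda: executed.append("test") or "passed"),
]
first = run_durable(path, steps)
second = run_durable(path, steps)   # simulated restart
print(executed)  # each step ran exactly once
```

The property that matters: work (and LLM spend) already done before a crash is never repeated.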

Co-located, not distributed. The agent, the files, the database, and the search index are all on the same NVMe. No serialization between services. No API calls to stitch them together. They share a filesystem.
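The shape of co-location is easy to demonstrate with standard tools. In this sketch (illustrative only: not OnCell's primitives), one local directory stands in for a cell, and the files, the key-value state, and a toy "search" all live on the same disk, so every query is a local read with no service in between.

```python
import sqlite3
import tempfile
from pathlib import Path

# One local directory stands in for the "cell".
cell_dir = Path(tempfile.mkdtemp(prefix="demo-cell-"))

# The customer's code sits on the same disk...
(cell_dir / "auth.ts").write_text("export function authMiddleware() {}")

# ...as the key-value state...
db = sqlite3.connect(cell_dir / "state.db")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
db.execute("INSERT INTO kv VALUES ('last_task', 'index')")
db.commit()

# ...and "search" is just reading local files, no network hop in between.
hits = [p.name for p in cell_dir.glob("*.ts")
        if "authMiddleware" in p.read_text()]
print(hits)  # ['auth.ts']
```

Nothing gets serialized, sent over a wire, or authenticated between the three; they are reads and writes against one filesystem.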

The economics

Cells auto-pause when idle. A paused cell keeps its NVMe state (repo, index, database) but uses zero CPU or RAM. Cost: $0.001/hr. When the customer comes back, the cell wakes in ~200ms — everything is still there.

The architecture borrows from Google Borg's resource model: overcommit compute on each host, reclaim from idle cells, bin-pack active workloads. Most cells are paused most of the time. A single host runs 50 cells but only 5-8 are active at any moment. This gives us 90% gross margin while charging developers less than they'd pay stitching services together.
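The margin math follows from the overcommit ratio. A back-of-envelope sketch, where only the 50-cells-per-host density and the ~10-15% active fraction come from the post; the host cost and hourly price below are hypothetical placeholders, not OnCell pricing:

```python
# Assumed inputs (placeholders, not real pricing):
HOST_COST_PER_HR = 0.60     # hypothetical all-in cost of one NVMe host
PRICE_PER_ACTIVE_HR = 1.00  # hypothetical price per active cell-hour
# From the post:
CELLS_PER_HOST = 50         # cells packed onto one host
ACTIVE_FRACTION = 0.12      # ~6 of 50 cells active at any moment

revenue_per_host_hr = CELLS_PER_HOST * ACTIVE_FRACTION * PRICE_PER_ACTIVE_HR
gross_margin = 1 - HOST_COST_PER_HR / revenue_per_host_hr
print(f"revenue/host-hr = ${revenue_per_host_hr:.2f}, margin = {gross_margin:.0%}")
```

The lever is the active fraction: because paused cells cost almost nothing to keep, one host's cost is amortized across many paying customers while serving only a handful at a time.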

What developers do

from oncell import Cell, Step

cell = Cell("acme-corp")

# Clone and index (runs once at setup)
await cell.shell("git clone https://github.com/acme/app /work")
await cell.search.index("/work/src", glob="**/*.ts")

# Run a coding task with durable orchestration
orch = cell.orchestrator("task")
async for event in orch.stream([
    Step("search", lambda: cell.search.query("auth middleware")),
    Step("plan",   lambda ctx: llm("Plan changes", context=ctx["search"])),
    Step("edit",   lambda ctx: cell.shell(ctx["plan"].command)),
    Step("test",   lambda: cell.shell("npm test")),
    Step("push",   lambda: cell.shell("git push")),
]):
    print(event)
    # {"step": "search", "status": "done", "progress": 0.2}
    # {"step": "plan",   "status": "done", "progress": 0.4}
    # ...

Six primitives: cell.shell, cell.store, cell.db, cell.search, cell.journal, cell.orchestrator. Each works standalone. Together, they share the same NVMe. No glue code. No duct tape.

What's next

We're starting with coding agents — the use case where isolation matters most (customer source code) and local performance matters most (grepping monorepos, running test suites). But the cell is a general primitive. Any AI agent that needs per-customer isolation with co-located data can run on OnCell.

If you're building a coding agent and you're tired of the duct tape, we'd love to talk.

Get early access
We're onboarding developers building coding agents.
oncell.ai