NVMe Is the Integration Layer
When compute, storage, database, and search share the same disk, integration becomes a non-problem.
The hardest part of building an AI agent isn't the agent logic. It's wiring the infrastructure together. The agent needs to read files, search code, write to a database, run tests, and call an LLM — and every one of these touches a different service over the network.
Traditional agent architecture:
Agent → network → S3 (read files)
Agent → network → Pinecone (search code)
Agent → network → DynamoDB (read/write state)
Agent → network → ECS task (run tests)
Agent → network → OpenAI (call LLM)
Five network hops per agent turn.
Each hop: ~10-50ms latency + serialization + failure modes.

We asked: what if all of these shared the same disk?
The insight
An NVMe SSD on an EC2 i4i instance reads at 7 GB/s with sub-millisecond latency. That's 100x faster than S3 and 14x faster than EBS. When you put the customer's files, the vector index, the database, and the execution journal on the same NVMe drive, something interesting happens: integration becomes a non-problem.
OnCell cell architecture:
NVMe SSD (1.875 TB, 7 GB/s)
├── /work/ customer's repo (files)
├── /index/ vector embeddings (SQLite)
├── /data/ key-value + SQL (SQLite)
└── /journal/ durable execution log (append-only)
Agent process reads/writes all of these as local files.
Zero network hops. Zero serialization. Zero API calls between services.

The agent runs git clone — files land on NVMe. The search engine indexes those same files from NVMe. The database stores metadata on the same NVMe. The journal logs every step to the same NVMe. Everything is a local file operation.
What this means in practice
Search is sub-10ms, not 50ms. The vector index is a SQLite file on the same disk as the code. Query embeddings, read chunks, return results — all local I/O. No network round-trip to Pinecone. No cold start on a managed service.
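The query path can be sketched in a few lines. This is not OnCell's actual index format; it is a minimal illustration, assuming embeddings are stored as JSON blobs in a SQLite file on local disk and scored brute-force in-process (index_chunk and query are hypothetical names):

```python
import json
import math
import os
import sqlite3
import tempfile

# Stands in for /index/ on the cell's NVMe drive.
db_path = os.path.join(tempfile.mkdtemp(), "index.sqlite")
db = sqlite3.connect(db_path)
db.execute("CREATE TABLE IF NOT EXISTS chunks (path TEXT, vec TEXT)")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def index_chunk(path, embedding):
    """Store one chunk's embedding; just a local SQLite insert."""
    db.execute("INSERT INTO chunks VALUES (?, ?)", (path, json.dumps(embedding)))

def query(embedding, k=3):
    """Scan, score, and rank; every read is local disk I/O."""
    rows = db.execute("SELECT path, vec FROM chunks").fetchall()
    scored = [(cosine(embedding, json.loads(v)), p) for p, v in rows]
    scored.sort(reverse=True)
    return [{"path": p, "score": round(s, 2)} for s, p in scored[:k]]
```

A real index would use an approximate-nearest-neighbor structure rather than a full scan, but the point stands: the whole query is in-process reads against one local file, with no service in the path.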
File operations are instant. grep across a 10 GB monorepo? Milliseconds on NVMe. Seconds on EFS. The difference between an agent that feels responsive and one that feels stuck.
The database and the files are consistent. When the agent writes a file and updates the database, both writes go to the same disk. No distributed transaction. No eventual consistency. No "the file is on S3 but the metadata in DynamoDB hasn't propagated yet."
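The same-disk write path can be sketched under simple assumptions: the file is fsynced before the metadata row is committed in a local SQLite transaction. write_with_metadata is an illustrative helper, not an OnCell API:

```python
import os
import sqlite3
import tempfile

root = tempfile.mkdtemp()  # stands in for the cell's NVMe mount
meta = sqlite3.connect(os.path.join(root, "data.sqlite"))
meta.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER)")

def write_with_metadata(rel_path, content):
    """Write the file, make it durable, then record metadata, all on one disk."""
    path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # file is durable before metadata claims it exists
    with meta:  # one local SQLite transaction, no distributed commit protocol
        meta.execute("INSERT OR REPLACE INTO files VALUES (?, ?)",
                     (rel_path, len(content)))
```

The ordering gives a simple invariant: if the metadata row exists, the file it describes is already on disk. Achieving the same guarantee across S3 and DynamoDB requires a saga or outbox pattern.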
The journal is free. Durable execution means writing every step result to a log. On a network database, that's a write per step — 10-50ms each. On local NVMe, it's a local append + fsync — microseconds. Durability costs nothing when storage is local.
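The append path is small enough to sketch directly. append_step and replay are illustrative names, not OnCell's API; the shape is one local append plus one fsync per step, and a replay that skips already-completed steps on restart:

```python
import json
import os
import tempfile

# Stands in for /journal/ on the cell's NVMe drive.
journal_path = os.path.join(tempfile.mkdtemp(), "journal.log")

def append_step(step, result):
    """Durably record one step result: a local append plus an fsync."""
    line = json.dumps({"step": step, "result": result})
    with open(journal_path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # microseconds on local NVMe, no network round-trip

def replay():
    """On restart, recover every journaled step result."""
    state = {}
    with open(journal_path) as f:
        for line in f:
            entry = json.loads(line)
            state[entry["step"]] = entry["result"]
    return state
```

This is the whole durability story: the orchestrator checks replay() before running a step, and re-runs only what has no journaled result.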
The composability
Because all primitives share the same filesystem, they compose naturally:
cell = Cell("acme-corp")
# Shell writes files to NVMe
await cell.shell("git clone https://github.com/acme/app /work")
# Search indexes the same files from NVMe
await cell.search.index("/work/src", glob="**/*.ts")
# Search queries return paths on the same NVMe
results = await cell.search.query("auth middleware")
# → [{"path": "src/auth/middleware.ts", "content": "...", "score": 0.94}]
# DB stores references to those same files
await cell.db.set("last_search", results)
# Shell reads the same file search found
content = await cell.store.read("src/auth/middleware.ts")
# Orchestrator ties it all together — journal on the same NVMe
orch = cell.orchestrator("task")
result = await orch.run([
    Step("find", lambda: cell.search.query("auth")),
    Step("read", lambda ctx: cell.store.read(ctx["find"][0]["path"])),
    Step("test", lambda: cell.shell("npm test")),
])

No glue code. No serialization between services. No "pass the S3 URL to the search service which passes the result ID to the database." They all read and write the same files on the same disk.
The speed hierarchy
Reading a 100 MB file:
NVMe (local, oncell) 14ms ← agent reads this
EBS gp3 (network SSD) 200ms (14x slower)
EFS (network filesystem) 1,000ms (71x slower)
S3 (object storage) 500ms (36x slower, plus TTFB)
Vector search (10K chunks):
SQLite on NVMe (oncell) 3ms ← agent searches this
Pinecone (managed) 50ms (17x slower)
Weaviate (self-hosted) 20ms (7x slower)
Database write:
SQLite on NVMe (oncell) 0.1ms ← agent writes this
DynamoDB 10ms (100x slower)
RDS Postgres 5ms (50x slower)

These aren't theoretical numbers. They're the difference between an agent that takes 30 seconds per task and one that takes 3 seconds. Multiply by 100 tasks per day per customer, and the user experience gap is enormous.
Why nobody does this
The cloud trained us to separate compute from storage. S3 for files. RDS for databases. ElastiCache for cache. Each behind an API, each independently scalable. This is the right architecture for web apps — stateless compute, durable storage, scale them separately.
It's the wrong architecture for AI agents. Agents are stateful. They read and write constantly during a task. The data they touch (code, index, state) belongs to one customer and is only needed during that customer's session. Separating it across services adds latency, complexity, and failure modes for zero benefit.
Co-locating everything on NVMe only works if you accept one constraint: the data lives where the compute lives. If the host dies, the data on that NVMe is gone. This is why OnCell snapshots to S3 on every pause — the NVMe is the fast path, S3 is the safety net.
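The pause-time snapshot can be sketched as packing the cell's four directories into one archive; snapshot is an illustrative helper, and the actual upload is left to an S3 client such as boto3 (not shown):

```python
import os
import tarfile

def snapshot(cell_root, out_dir):
    """Pack the cell's NVMe state into one archive for upload to S3."""
    archive = os.path.join(out_dir, "snapshot.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for sub in ("work", "index", "data", "journal"):
            path = os.path.join(cell_root, sub)
            if os.path.isdir(path):
                tar.add(path, arcname=sub)
    # Upload step (hypothetical client call, outside this sketch):
    #   s3.upload_file(archive, bucket, f"cells/{cell_id}/snapshot.tar.gz")
    return archive
```

Because files, index, database, and journal all live under one root, the snapshot is a single consistent unit: restoring it onto a fresh host's NVMe brings back the whole cell, not four services that must be re-synchronized.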
The primitive
OnCell exposes six primitives to developers. All six share the same NVMe:
cell.shell(cmd) Run commands — reads/writes NVMe files
cell.store Read/write files — directly on NVMe
cell.db Key-value + SQL — SQLite on NVMe
cell.search Vector search — SQLite + embeddings on NVMe
cell.journal Durable checkpoints — append-only log on NVMe
cell.orchestrator(name) Multi-step workflows — journal on NVMe

The developer doesn't think about integration. They use the primitives they need. The NVMe does the rest.