Knowledge & RAG · December 26, 2025 · 8 min read

RAG You Set Up by Dragging a PDF Into a Browser

Standing up a RAG pipeline shouldn't be a project. In Matrix you create a corpus, drag in files, and your agent can search them — no plumbing.

By Matrix Team

Ask anyone who's built one: standing up a RAG pipeline is rarely the fun part. You glue a parser to a chunker, a chunker to an embedding API, an embedding API to a vector database, the vector database to a retriever, and the retriever to a prompt template. Then you re-glue half of it the first time someone uploads a PDF instead of a Markdown file. By the time retrieval works, you've built a small distributed system whose only job is "let the model read the docs."

Retrieval shouldn't be a project. In Matrix it's two HTTP calls or, if you prefer the dashboard, literally dragging a file into a browser. Here's exactly what happens between the drop and the answer.

The whole pipeline is two POSTs

A Knowledge corpus is a first-class entity, just like an Agent or a Tool. You create one, then you feed it files. That's the entire setup surface.

# 1. Create the corpus
curl -X POST https://your-host/api/orgs/{slug}/knowledge \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "properties": {
      "key": "product-handbook",
      "name": "Product Handbook",
      "kind": "FILES"
    }
  }'

# 2. Drop a file into it (multipart; {id} is the id of the corpus created in step 1)
curl -X POST https://your-host/api/orgs/{slug}/knowledge/{id}/files \
  -H "Authorization: Bearer $JWT" \
  -F "file=@handbook.pdf"

That second call is the one that does all the work. There is no third call to wire up an index, no embeddings.create loop you maintain, no vector-store schema to migrate. The dashboard route is the same pipeline behind a drop zone: /admin/knowledge → New knowledge → drag a file into the drawer.

.md, .txt, .html, and .pdf all work. The corpus doesn't care which you send.

What the upload actually does

When that multipart POST lands, KnowledgeIngestService runs a fixed, boring, reliable sequence. Boring is the point — every file goes through the same path.

1. Parse

The service picks a parser by file type:

HTML → Jsoup, so you get the readable text, not the tag soup.
PDF → PDFBox.
Everything else → treated as UTF-8 text.

You don't declare the type. The service handles it. Drop a PDF and a Markdown file into the same corpus and they coexist.
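The dispatch described above can be sketched as a single function keyed on the file name. This is a minimal Python illustration, not Matrix's KnowledgeIngestService: it assumes the type is inferred from the extension (the post doesn't say whether the service checks the extension, the Content-Type header, or the bytes), uses the standard library's html.parser as a stand-in for Jsoup, and leaves PDF as a stub because PDFBox is a Java library. All helper names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def parse_html(raw: bytes) -> str:
    extractor = _TextExtractor()
    extractor.feed(raw.decode("utf-8", errors="replace"))
    extractor.close()
    return extractor.text()


def parse_pdf(raw: bytes) -> str:
    # Placeholder: Matrix uses PDFBox (Java); plug in a PDF library of your choice here.
    raise NotImplementedError("PDF extraction is out of scope for this sketch")


def parse_file(filename: str, raw: bytes) -> str:
    """Pick a parser by (assumed) file extension; anything else is decoded as UTF-8 text."""
    suffix = Path(filename).suffix.lower()
    if suffix in (".html", ".htm"):
        return parse_html(raw)
    if suffix == ".pdf":
        return parse_pdf(raw)
    return raw.decode("utf-8", errors="replace")
```

Because the fallback branch decodes everything else as UTF-8, a .md or .txt file needs no special handling.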

2. Chunk

Parsed text is split into ~2,000-character chunks with 200 characters of overlap. The overlap matters: a sentence of up to 200 characters that straddles a chunk boundary still lands intact in at least one chunk, so retrieval doesn't slice it in half and lose the answer. (Both sizes are corpus properties, chunkSize and chunkOverlap, if you want to tune them.)
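A fixed-stride character window is the simplest way to get this behavior. The sketch below assumes that shape; Matrix's actual splitter isn't shown in this post and may snap to whitespace or sentence boundaries, which is presumably why the size is quoted as approximate. The function name is hypothetical; the defaults mirror chunkSize and chunkOverlap from the text.

```python
def chunk_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With a stride of 2,000 minus 200 = 1,800 characters, a 5,000-character document yields three chunks starting at offsets 0, 1,800, and 3,600, and every 200-character passage sits wholly inside at least one of them.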

Chunk size is a real trade-off, and it's worth understanding rather than just accepting the default. Make chunks too small and a single retrieved chunk lacks the surrounding context the model needs to reason; make them too large and each one dilutes the relevant sentence with noise, so cosine similarity gets fuzzy and the right chunk doesn't rank first. The ~2,000-character default is a sensible starting point for prose documents such as handbooks, policies, and specs, which is what most corpora contain. If your documents are dense and structured (tables, short clauses), tightening chunkSize while keeping the overlap can sharpen retrieval. But the honest advice is: ship the default first, look at what search_knowledge actually returns, and tune only if you can point at a miss.

3. Embed

Each chunk is embedded via text-embedding-005 into a 768-dimensional vector, through the same GeminiEmbeddingBackend that powers agent memory. One embedding model, one code path, for both knowledge and memory.

4. Store

Each chunk becomes a KnowledgeChunk row, and its embedding is written directly onto the underlying :Entity node as a native property.

This last step is the quiet trick. The embedding doesn't go into a separate vector database — it lands on the same Neo4j node as everything else, and it's indexed by the same entity_node_embedding HNSW index that powers memory recall. KnowledgeVectorStore just filters by entityType=KnowledgeChunk at query time.

The consequence: adding RAG added zero new infrastructure. No new index, no new migration, no second datastore to operate, back up, or keep in sync. If you've read Embed-on-Write, Recall-on-Read, this is the same HNSW index doing double duty. One vector index, two features.

As chunks land, the service stamps chunkCount (and entityCount, if GraphRAG is on) back onto the corpus so you can see ingestion progress.
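To make "one vector index, two features" concrete, here is an in-memory stand-in for the graph. Everything in it is hypothetical: a Python list plays the role of the :Entity nodes covered by entity_node_embedding, and the property names (entityType, knowledgeId, embedding, sourceRef, chunkCount) are modeled on the names in the text rather than taken from Matrix's schema. It shows the storage shape only, not HNSW indexing.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical stand-in for the graph: every entity lives in one collection."""
    entities: list[dict] = field(default_factory=list)

    def add_entity(self, entity_type: str, embedding: list[float], **props) -> dict:
        # The embedding is just another property on the node, not a row in a separate store.
        node = {"entityType": entity_type, "embedding": embedding, **props}
        self.entities.append(node)
        return node

    def nodes_of_type(self, entity_type: str) -> list[dict]:
        # Query-time filter over the single shared collection.
        return [n for n in self.entities if n["entityType"] == entity_type]


def store_chunks(graph: Graph, corpus: dict, chunks: list[tuple[str, list[float], str]]) -> None:
    """Write each (text, embedding, source_ref) as a KnowledgeChunk node and stamp chunkCount."""
    for text, embedding, source_ref in chunks:
        graph.add_entity("KnowledgeChunk", embedding,
                         knowledgeId=corpus["id"], text=text, sourceRef=source_ref)
        corpus["chunkCount"] = corpus.get("chunkCount", 0) + 1  # progress visible per chunk
```

store_chunks never creates a second collection: knowledge chunks and memory entities share one, and the entityType filter at read time is the only thing that tells them apart.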

Attach it, and the tool wires itself

Here's where the "no pipeline" claim earns its keep. You don't write a retriever. You don't add a search function to your agent. You attach the corpus.

In the agent drawer, open the Knowledge picker and select the corpus. That's it. The moment an agent has any Knowledge attached, AgentToolSurface auto-attaches a tool:

search_knowledge(knowledge_key, query, top_k)

The tool's description enumerates the corpora the agent can search, so the model knows what's available and which knowledge_key to pass. There's no glue between "I uploaded docs" and "the agent can search them" — composition handles it. (That auto-wiring is its own story: see Auto-Wired Retrieval.)
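The auto-attach rule amounts to a function from an agent's attached corpora to a tool definition. In the sketch below, only the tool name and its three parameters come from the post; the JSON-Schema-style layout, the enum on knowledge_key, the description wording, and the function name are assumptions for illustration.

```python
from typing import Optional


def search_knowledge_tool(corpora: list[dict]) -> Optional[dict]:
    """Build a search_knowledge tool definition, or return None when no corpus is attached."""
    if not corpora:
        return None  # no Knowledge attached, so no tool is added
    listing = "; ".join(f"{c['key']} ({c['name']})" for c in corpora)
    return {
        "name": "search_knowledge",
        # The description enumerates the searchable corpora so the model can pick one.
        "description": f"Search one knowledge corpus. Available corpora: {listing}.",
        "parameters": {
            "type": "object",
            "properties": {
                "knowledge_key": {"type": "string", "enum": [c["key"] for c in corpora]},
                "query": {"type": "string"},
                "top_k": {"type": "integer", "minimum": 1},
            },
            "required": ["knowledge_key", "query", "top_k"],
        },
    }
```

Listing the keys as an enum means a schema-aware model can only name a corpus that is actually attached, and the description gives it the human-readable names to choose between.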

When the model calls it, retrieval is corpus-scoped and exact-cosine ranked: the query is embedded, compared against the chunks in that corpus, and the top matches come back as ranked chunks. Each result carries a sourceRef, so the agent can cite where a fact came from instead of asserting it from thin air.
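Corpus-scoped, exact-cosine retrieval boils down to a filter followed by a full sort. This Python sketch assumes each chunk is a dict with hypothetical knowledgeKey, text, sourceRef, and embedding fields, and that the query has already been embedded with the same model as the chunks; it scores every chunk in the named corpus rather than consulting an approximate index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_knowledge(chunks: list[dict], knowledge_key: str,
                     query_vec: list[float], top_k: int) -> list[dict]:
    """Score only the named corpus's chunks by exact cosine and return the top_k with sources."""
    scored = [(cosine(query_vec, c["embedding"]), c)
              for c in chunks if c["knowledgeKey"] == knowledge_key]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"text": c["text"], "sourceRef": c["sourceRef"], "score": s}
            for s, c in scored[:top_k]]
```

Because the filter runs before scoring, a chunk from another corpus can never displace a result, however similar its vector.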

Corpus-scoping matters more than it sounds. An agent with three corpora — a handbook, a policy doc, a product spec — searches the one the model named, not a global pool where the handbook's chunks crowd out the policy doc's. Retrieval stays sharp because it stays scoped. In a flat global index, the corpus with the most chunks tends to dominate the top-k for every query simply because it has more shots on goal; scoping by knowledge_key removes that bias entirely. The model decides which body of knowledge a question belongs to, then searches only there.

It also means the model can be deliberate about where it looks. A question about a refund clause is a policy-doc question; a question about a config flag is a product-spec question. Because the tool description enumerates the corpora, the model can pick the right one — and if it picks wrong, it can call search_knowledge again with a different knowledge_key rather than wading through one giant undifferentiated pile.

A note on where the work happens

One operational detail worth internalizing if you run Matrix on Cloud Run: file processing is transient. Parsing, chunking, and the temporary file handling happen in /tmp, which is per-container and ephemeral — it vanishes when the container recycles.

That's fine, because nothing durable lives there. The output of ingestion — the KnowledgeChunk rows and their embeddings — is written to Neo4j, which is the durable store. /tmp is scratch space for the parse step, not a cache you depend on. Upload, ingest, persist to the graph; the working files were never the product.

The rule of thumb across the platform: durable data lives in Neo4j, /tmp is scratch. Knowledge ingestion follows it.
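The scratch-versus-durable split maps onto a familiar pattern: stage the upload in a temporary directory that disappears when ingestion returns, and hand only derived output to durable storage. A minimal sketch with Python's tempfile; the persist callback stands in for the Neo4j write, the parse step is reduced to a UTF-8 read, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ingest_upload(filename: str, raw: bytes, persist: Callable[[str], None]) -> Path:
    """Stage an upload in scratch space, persist the derived text, and discard the scratch dir."""
    # TemporaryDirectory honors $TMPDIR and otherwise defaults to /tmp on Linux.
    with tempfile.TemporaryDirectory() as scratch:
        staged = Path(scratch) / Path(filename).name  # .name drops any client-supplied directories
        staged.write_bytes(raw)
        text = staged.read_text(encoding="utf-8", errors="replace")  # stand-in for the parse step
        persist(text)  # the only durable output
    # The with-block has exited, so the scratch directory and the staged file are gone.
    return staged
```

Nothing downstream ever reads from the scratch path, so losing it when a container recycles costs nothing.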

Why this shape

Most teams treat retrieval as a separate subsystem because their tools force them to — a vector database over here, an embedding service over there, an orchestration layer holding them together. Matrix collapses that because of one design decision made early: everything is an entity on one graph, and the vector index is part of the graph.

So a Knowledge corpus is an entity. A chunk is an entity with an embedding. The agent's retriever is a tool that composes in automatically. There's no subsystem to stand up because there's no subsystem — it's the same primitives you already use for everything else, pointed at documents.

When retrieval isn't enough — when "find the relevant chunks" misses facts that only emerge from how entities relate across chunks — you flip one flag and ingestion also builds an entity/relation graph. That's GraphRAG in One Flag, and it rides this exact same upload path: same POST /knowledge/{id}/files, one extra extraction pass per chunk.

Takeaway

A working RAG pipeline in Matrix is: POST /knowledge to create a corpus, POST /knowledge/{id}/files to drop in a PDF (or .md, .txt, .html), then attach the corpus to an agent. Ingestion parses (Jsoup / PDFBox / UTF-8), chunks at ~2,000 chars with 200 overlap, embeds with text-embedding-005 (768d), and stores the chunks and their embeddings on Neo4j nodes covered by the same HNSW index your memory already uses: no new infra, no migration. The agent automatically gets a corpus-scoped search_knowledge tool with cited results. You wrote no retriever, no chunker, no embedding loop.

Get retrieval running in five minutes

Spin up a workspace, create a corpus, and drag in the first PDF you have lying around. Then attach it to an agent and ask it a question only that document can answer.

Create a workspace and try it on the live platform.
Read docs/ARCHITECTURE.md — the Knowledge + built-in toolbox section — for the full ingestion and retrieval internals.
Then turn it up: read GraphRAG in One Flag, and see how the auto-wired retrieval surface composes the search tool with everything else your agent does.

If you can drag a PDF into a browser, you can ship retrieval.

