Instead of embedding documents and doing similarity search, build a tree from the document and let an LLM navigate it level by level, like scanning a table of contents.
Source: @TheVixhal | Original post | Date: March 26, 2026
The Idea
Traditional RAG embeds document chunks into vectors and retrieves by similarity. Vectorless RAG skips embeddings entirely. Instead, it parses a document into a hierarchical tree (sections, subsections, leaves), summarizes each node bottom-up, then at query time an LLM navigates the tree top-down to find the right leaf.
This mirrors how humans search: open the table of contents, find the right chapter, drill into the section, read the relevant paragraph.
How It Works
Index Time (runs once)
- Parse into a tree. Send the document to an LLM and ask it to split the text into top-level sections. Any section over ~300 words is split again into subsections. Short sections become leaves; long sections become inner nodes with children.
- Summarize bottom-up. Post-order traversal. Leaves get summaries of their raw text. Inner nodes get summaries built from their children’s summaries. Root ends up with a summary of the whole document.
- Serialize to JSON. Save the tree. Build once, reuse forever.
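The three index-time steps can be sketched in a few functions. This is a minimal, offline-runnable sketch: `split_sections` and `summarize` are stand-ins for the LLM calls the real pipeline would make, and the 300-word threshold comes from the implementation notes below.

```python
MAX_LEAF_WORDS = 300  # sections above this get split further

def split_sections(text):
    # Stand-in for the LLM splitter; here we just split on blank lines.
    return [s.strip() for s in text.split("\n\n") if s.strip()]

def summarize(text):
    # Stand-in for an LLM summary call (e.g. a cheap model like gpt-4o-mini).
    return text[:80]

def build_tree(title, text, depth=0):
    node = {"title": title, "depth": depth, "children": [], "content": None}
    sections = split_sections(text)
    if len(text.split()) <= MAX_LEAF_WORDS or len(sections) <= 1:
        node["content"] = text  # short section -> leaf keeps its raw text
    else:
        for i, sec in enumerate(sections):
            node["children"].append(build_tree(f"{title}.{i}", sec, depth + 1))
    return node

def summarize_bottom_up(node):
    # Post-order traversal: summarize children first, then the node itself.
    if node["content"] is not None:
        node["summary"] = summarize(node["content"])
    else:
        for child in node["children"]:
            summarize_bottom_up(child)
        node["summary"] = summarize(" ".join(c["summary"] for c in node["children"]))
    return node
```

The resulting dict is directly `json.dump`-able, which is the whole index: build once, reuse forever.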
Query Time (runs per question)
- Navigate top-down. Start at the root. Show the LLM the summaries of all children and ask "which branch most likely contains the answer?" Move to that child and repeat until reaching a leaf.
- Generate answer. Pass the leaf’s raw text as context to the LLM with the original question.
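The query-time loop above reduces to repeated branch selection. In this sketch, `choose_branch` stands in for the LLM call that picks a child from its summaries; it uses keyword overlap only so the example runs offline, and the node shape (a dict with `title`, `summary`, `children`, `content`) is assumed from the index-time description.

```python
def choose_branch(question, children):
    # Real version: show the LLM each child's summary and ask
    # "which branch most likely contains the answer?"
    def overlap(child):
        return len(set(question.lower().split()) & set(child["summary"].lower().split()))
    return max(range(len(children)), key=lambda i: overlap(children[i]))

def navigate(tree, question):
    node = tree
    path = [node["title"]]
    while node.get("children"):  # descend one level per LLM call
        node = node["children"][choose_branch(question, node["children"])]
        path.append(node["title"])
    return node, path  # leaf's raw text becomes the answer context
```

Returning `path` alongside the leaf is what makes retrieval explainable: you can log exactly which branch was chosen at each level.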
Tree Structure
root (summary of entire document)
|-- Section A (leaf, short content)
|-- Section B (inner node, long content split further)
| |-- Subsection B.1 (leaf)
| |-- Subsection B.2 (leaf)
|-- Section C (leaf, short content)
Each node stores: title, raw content (leaves only), LLM-generated summary, depth, children.
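One possible node shape matching those fields (the class and field names here are illustrative, not from the reference implementation):

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Node:
    title: str
    summary: str = ""                  # LLM-generated, filled in bottom-up
    depth: int = 0
    content: Optional[str] = None      # raw text, leaves only
    children: List["Node"] = field(default_factory=list)

# The whole tree serializes to plain JSON -- the entire index is one file.
root = Node("root", "whole-document summary", 0, None,
            [Node("Section A", "short section", 1, "raw text of A")])
blob = json.dumps(asdict(root))
```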
When to Use This vs Vector RAG vs GraphRAG
| Approach | Retrieval Method | Best For |
|---|---|---|
| Vector RAG | Embedding similarity search | Large corpora, fuzzy matching, multi-document retrieval |
| GraphRAG | Graph traversal + embeddings | Relational domains, dependency chains, coverage questions |
| Vectorless/Tree RAG | LLM-guided tree navigation | Single structured documents, hierarchical content, low-infra setups |
Vectorless RAG works well when:
- The document has natural hierarchical structure (manuals, policies, textbooks)
- You want zero infrastructure beyond an LLM API (no vector DB, no embedding pipeline)
- The corpus is small enough that tree indexing with LLM calls is feasible
- You need deterministic, explainable retrieval paths (you can trace exactly which branches were chosen)
Vectorless RAG struggles when:
- The corpus spans many documents without clear hierarchy
- Questions need synthesis across multiple unrelated sections
- The LLM picks wrong branches due to vague summaries (cascading errors)
- Documents are very large, making the tree deep and navigation expensive in LLM calls
Key Tradeoffs
Strengths. No embedding model needed. No vector DB to maintain. Retrieval path is fully explainable (you can log which branches were chosen at each level). Index is a plain JSON file.
Weaknesses. Every navigation step is an LLM call, so query latency scales with tree depth. Wrong branch selection at any level means wrong answer (no fallback). Summaries must be high quality or the whole system degrades. Not designed for cross-section synthesis.
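The latency point can be made concrete: with branching factor b and N leaves, navigation costs roughly log_b(N) LLM calls per query, plus one call to generate the answer. A back-of-envelope sketch (illustrative numbers, not from the source):

```python
def nav_depth(num_leaves, branching):
    # Smallest depth d such that branching**d >= num_leaves.
    d, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= branching
        d += 1
    return d

def calls_per_query(num_leaves, branching=4):
    # One navigation call per level, plus one final answer-generation call.
    return nav_depth(num_leaves, branching) + 1
```

So a 64-leaf document at branching factor 4 costs about four sequential LLM calls per question, which is where the depth-scaling latency comes from.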
Implementation Notes
The reference implementation uses ~200 lines of Python. Key design decisions:
- Subsection threshold: 300 words. Below this, sections stay as leaves. Above, they get split further.
- Summarization model: Can use a cheaper/faster model (e.g. gpt-4o-mini) since summaries are short.
- Navigation model: Also works with a smaller model since the task is just “pick a number.”
- Index format: Plain JSON. No database required.
Connection to Compound Engineering
This pattern is interesting as a zero-infrastructure retrieval layer. For internal tools or single-document Q&A (company handbook, product spec, legal policy), it eliminates the vector DB dependency entirely. The tradeoff is more LLM calls at query time, but for infrequent queries on well-structured documents, that can be worth it.
It also demonstrates a broader principle: sometimes the simplest retrieval method is just structured navigation with good summaries, not learned representations.
Related
- GraphRAG for Production Engineer Agents – Graph-structured alternative for relational domains
- Progressive Disclosure of Context – Same idea of loading only what’s relevant, level by level
- Hierarchical Context Patterns – Organizing context in layers
- Information Theory for Coding Agents – Signal efficiency in retrieval
