Instead of embedding documents and doing similarity search, build a tree from the document and let an LLM navigate it level by level, like scanning a table of contents.
Source: @TheVixhal | Original post | Date: March 26, 2026
The Idea
Traditional RAG embeds document chunks into vectors and retrieves by similarity. Vectorless RAG skips embeddings entirely. Instead, it parses a document into a hierarchical tree (sections, subsections, leaves), summarizes each node bottom-up, then at query time an LLM navigates the tree top-down to find the right leaf.
This mirrors how humans search: open the table of contents, find the right chapter, drill into the section, read the relevant paragraph.
How It Works
Index Time (runs once)
- Parse into a tree. Send the document to an LLM and ask it to split the text into top-level sections. Any section over ~300 words is split again into subsections. Short sections become leaves; long sections become inner nodes with children.
- Summarize bottom-up. Post-order traversal. Leaves get summaries of their raw text. Inner nodes get summaries built from their children’s summaries. Root ends up with a summary of the whole document.
- Serialize to JSON. Save the tree. Build once, reuse forever.
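The three index-time steps can be sketched in a few functions. This is a minimal, offline-runnable sketch: `split_sections` and `summarize` are stand-ins for the LLM calls the real pipeline would make, and the 300-word threshold comes from the implementation notes below.

```python
MAX_LEAF_WORDS = 300  # sections above this get split further

def split_sections(text):
    # Stand-in for the LLM splitter; here we just split on blank lines.
    return [s.strip() for s in text.split("\n\n") if s.strip()]

def summarize(text):
    # Stand-in for an LLM summary call (e.g. a cheap model like gpt-4o-mini).
    return text[:80]

def build_tree(title, text, depth=0):
    node = {"title": title, "depth": depth, "children": [], "content": None}
    sections = split_sections(text)
    if len(text.split()) <= MAX_LEAF_WORDS or len(sections) <= 1:
        node["content"] = text  # short section -> leaf keeps its raw text
    else:
        for i, sec in enumerate(sections):
            node["children"].append(build_tree(f"{title}.{i}", sec, depth + 1))
    return node

def summarize_bottom_up(node):
    # Post-order traversal: summarize children first, then the node itself.
    if node["content"] is not None:
        node["summary"] = summarize(node["content"])
    else:
        for child in node["children"]:
            summarize_bottom_up(child)
        node["summary"] = summarize(" ".join(c["summary"] for c in node["children"]))
    return node
```

The resulting dict is directly `json.dump`-able, which is the whole index: build once, reuse forever.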
Query Time (runs per question)
- Navigate top-down. Start at the root. Show the LLM the summaries of all children and ask "which branch most likely contains the answer?" Move to that child and repeat until reaching a leaf.
- Generate answer. Pass the leaf’s raw text as context to the LLM with the original question.
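The query-time loop above reduces to repeated branch selection. In this sketch, `choose_branch` stands in for the LLM call that picks a child from its summaries; it uses keyword overlap only so the example runs offline, and the node shape (a dict with `title`, `summary`, `children`, `content`) is assumed from the index-time description.

```python
def choose_branch(question, children):
    # Real version: show the LLM each child's summary and ask
    # "which branch most likely contains the answer?"
    def overlap(child):
        return len(set(question.lower().split()) & set(child["summary"].lower().split()))
    return max(range(len(children)), key=lambda i: overlap(children[i]))

def navigate(tree, question):
    node = tree
    path = [node["title"]]
    while node.get("children"):  # descend one level per LLM call
        node = node["children"][choose_branch(question, node["children"])]
        path.append(node["title"])
    return node, path  # leaf's raw text becomes the answer context
```

Returning `path` alongside the leaf is what makes retrieval explainable: you can log exactly which branch was chosen at each level.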
Tree Structure
root (summary of entire document)
|-- Section A (leaf, short content)
|-- Section B (inner node, long content split further)
| |-- Subsection B.1 (leaf)
| |-- Subsection B.2 (leaf)
|-- Section C (leaf, short content)
Each node stores: title, raw content (leaves only), LLM-generated summary, depth, children.
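One possible node shape matching those fields (the class and field names here are illustrative, not from the reference implementation):

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Node:
    title: str
    summary: str = ""                  # LLM-generated, filled in bottom-up
    depth: int = 0
    content: Optional[str] = None      # raw text, leaves only
    children: List["Node"] = field(default_factory=list)

# The whole tree serializes to plain JSON -- the entire index is one file.
root = Node("root", "whole-document summary", 0, None,
            [Node("Section A", "short section", 1, "raw text of A")])
blob = json.dumps(asdict(root))
```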
When to Use This vs Vector RAG vs GraphRAG
| Approach | Retrieval Method | Best For |
|---|---|---|
| Vector RAG | Embedding similarity search | Large corpora, fuzzy matching, multi-document retrieval |
| GraphRAG | Graph traversal + embeddings | Relational domains, dependency chains, coverage questions |
| Vectorless/Tree RAG | LLM-guided tree navigation | Single structured documents, hierarchical content, low-infra setups |
Vectorless RAG works well when:
- The document has natural hierarchical structure (manuals, policies, textbooks)
- You want zero infrastructure beyond an LLM API (no vector DB, no embedding pipeline)
- The corpus is small enough that tree indexing with LLM calls is feasible
- You need deterministic, explainable retrieval paths (you can trace exactly which branches were chosen)
Vectorless RAG struggles when:
- The corpus spans many documents without clear hierarchy
- Questions need synthesis across multiple unrelated sections
- The LLM picks wrong branches due to vague summaries (cascading errors)
- Documents are very large, making the tree deep and navigation expensive in LLM calls
Key Tradeoffs
Strengths. No embedding model needed. No vector DB to maintain. Retrieval path is fully explainable (you can log which branches were chosen at each level). Index is a plain JSON file.
Weaknesses. Every navigation step is an LLM call, so query latency scales with tree depth. Wrong branch selection at any level means wrong answer (no fallback). Summaries must be high quality or the whole system degrades. Not designed for cross-section synthesis.
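The latency point can be made concrete: with branching factor b and N leaves, navigation costs roughly log_b(N) LLM calls per query, plus one call to generate the answer. A back-of-envelope sketch (illustrative numbers, not from the source):

```python
def nav_depth(num_leaves, branching):
    # Smallest depth d such that branching**d >= num_leaves.
    d, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= branching
        d += 1
    return d

def calls_per_query(num_leaves, branching=4):
    # One navigation call per level, plus one final answer-generation call.
    return nav_depth(num_leaves, branching) + 1
```

So a 64-leaf document at branching factor 4 costs about four sequential LLM calls per question, which is where the depth-scaling latency comes from.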
Implementation Notes
The reference implementation uses ~200 lines of Python. Key design decisions:
- Subsection threshold: 300 words. Below this, sections stay as leaves. Above, they get split further.
- Summarization model: Can use a cheaper/faster model (e.g. gpt-4o-mini) since summaries are short.
- Navigation model: Also works with a smaller model since the task is just “pick a number.”
- Index format: Plain JSON. No database required.
Connection to Compound Engineering
This pattern is interesting as a zero-infrastructure retrieval layer. For internal tools or single-document Q&A (company handbook, product spec, legal policy), it eliminates the vector DB dependency entirely. The tradeoff is more LLM calls at query time, but for infrequent queries on well-structured documents, that can be worth it.
It also demonstrates a broader principle: sometimes the simplest retrieval method is just structured navigation with good summaries, not learned representations.
Related
- GraphRAG for Production Engineer Agents – Graph-structured alternative for relational domains
- Progressive Disclosure of Context – Same idea of loading only what’s relevant, level by level
- Hierarchical Context Patterns – Organizing context in layers
- Information Theory for Coding Agents – Signal efficiency in retrieval
