RAG · Open Source · Free · Active · Local hardware · intermediate · ~30 min setup

Vectorless RAG with PageIndex

Build high-accuracy RAG without embeddings, chunking, or a vector DB.

by Shilpa Mitra · verified 10d ago · v1.0.0

Best for: single-document or small-corpus QA where you want strong accuracy without the operational complexity of embeddings and a vector DB.

pageindex@0.4 · ollama@0.5 · deepseek-v4

PageIndex parses your PDF into a tree, which you save to disk so later queries can reuse it.

# Parse the PDF into a tree index, then cache it on disk for later queries.
from pageindex import PageIndex
idx = PageIndex.from_pdf('./corpus.pdf')
idx.save('./pageindex_cache/')
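To make "a tree" concrete, here is a minimal sketch of what such an index might look like. It assumes each node is a section with a title and a page range; the `Node` class, the `outline` helper, and the sample report are hypothetical illustrations, not PageIndex's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a hypothetical document tree (not PageIndex's real schema)."""
    title: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Flatten the tree into indented outline lines, depth-first."""
    lines = [f"{'  ' * depth}{node.title} (pp. {node.start_page}-{node.end_page})"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Invented titles and page numbers, for illustration only.
report = Node("Quarterly Report", 1, 40, [
    Node("Financial Highlights", 2, 9, [
        Node("Revenue", 3, 5),
        Node("Operating Expenses", 6, 9),
    ]),
    Node("Risk Factors", 10, 40),
])

print("\n".join(outline(report)))
```

Because every node keeps its page range, an answer drawn from a node can point back to specific pages.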

Ask a question via the index.

# Reload the cached index and answer the question with a model served by Ollama.
from pageindex import PageIndex
idx = PageIndex.load('./pageindex_cache/')
print(idx.query('What was Q3 revenue?', model='ollama/deepseek-v4'))
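One way to picture a query without embeddings is as a walk down the tree: at each level, pick the child section that looks most relevant, then read only the pages under the final node. The sketch below is a toy under stated assumptions. `TREE`, `score`, and `navigate` are hypothetical names, and plain word overlap stands in for the relevance judgment that a language model would make in a real system.

```python
import re

# Hypothetical tree: nested dicts with a title, a page range, and child sections.
TREE = {
    "title": "Quarterly Report", "pages": (1, 40), "children": [
        {"title": "Financial Highlights: revenue and expenses", "pages": (2, 9), "children": [
            {"title": "Q3 Revenue", "pages": (3, 5), "children": []},
            {"title": "Operating Expenses", "pages": (6, 9), "children": []},
        ]},
        {"title": "Risk Factors", "pages": (10, 40), "children": []},
    ],
}


def words(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question, node):
    """Stand-in for an LLM relevance judgment: count words shared with the title."""
    return len(words(question) & words(node["title"]))


def navigate(question, node):
    """Descend from the root, following the best-scoring child at each level."""
    path = [node["title"]]
    while node["children"]:
        best = max(node["children"], key=lambda child: score(question, child))
        if score(question, best) == 0:
            break  # no child looks relevant; stop at the current section
        node = best
        path.append(node["title"])
    return path, node["pages"]


path, pages = navigate("What was Q3 revenue?", TREE)
print(" > ".join(path), "| pages", pages)
```

The returned page range is what would be handed to the model as context, which is also what lets the answer cite a page or section.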

Last passed: 10d ago (verified)

financebench-q3 · rubric · timeout 120s · max $0.1
Judge: claude-sonnet-4-5
Rubric: PASS if (1) the answer contains a specific dollar figure, (2) it cites a page or section, and (3) the figure matches the ground truth within ±0.1B.
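The three rubric conditions can be approximated with a deterministic check, which is handy as a quick local sanity test before paying for a judge call. This is a rough sketch and not the judge's actual logic: the regular expressions, the `passes_rubric` name, and the sample answers and figures are all assumptions made for illustration.

```python
import re

# Dollar amounts such as "$4.2 billion", "$4.2B", or "$850 million".
DOLLAR = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(billion|million|B|M)\b", re.IGNORECASE)
# Citations such as "page 12", "p. 3", "pp. 3", "Section 2".
CITATION = re.compile(r"\b(?:pages?|pp?\.|sec(?:tion)?\.?)\s*\d+", re.IGNORECASE)


def to_billions(amount, unit):
    """Normalize a parsed amount to billions of dollars."""
    return amount if unit.lower() in ("b", "billion") else amount / 1000.0


def passes_rubric(answer, truth_billions, tolerance=0.1):
    """Return True only if all three rubric conditions hold."""
    figures = [to_billions(float(a), u) for a, u in DOLLAR.findall(answer)]
    has_figure = bool(figures)                            # (1) specific dollar figure
    has_citation = CITATION.search(answer) is not None    # (2) page or section cited
    # (3) some figure is within the tolerance; the epsilon absorbs float rounding
    matches = any(abs(f - truth_billions) <= tolerance + 1e-9 for f in figures)
    return has_figure and has_citation and matches


# Hypothetical answers scored against a hypothetical ground truth of $4.25B.
print(passes_rubric("Q3 revenue was $4.2 billion (page 12).", 4.25))
print(passes_rubric("Q3 revenue was $4.2 billion.", 4.25))
```

An LLM judge can accept phrasings these patterns miss, so a failure here is a prompt to look closer, not a verdict.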

Reported accuracy: 98.7% on FinanceBench in the original benchmark.