RAG · Open Source · Free · Active · Local hardware · intermediate · ~30 min setup
Vectorless RAG with PageIndex
Build high-accuracy RAG without embeddings, chunking, or a vector DB.
Intended Use
Single-document or small-corpus QA where you want strong accuracy without the ops complexity of embeddings + a vector DB.
Not for
- Corpora over ~10K documents (no index sharding)
- Sub-second retrieval requirements
The Stack
Tested Against
- pageindex@0.4
- ollama@0.5
- deepseek-v4
Side effects & data flow
- Network: none, local only
- Writes: ./pageindex_cache/
- Credentials: none required
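Because the build step persists the tree to ./pageindex_cache/, later runs can skip the PDF parse. A minimal load-or-build sketch follows. It assumes the `from_pdf`/`save`/`load` calls shown in the Steps behave as written there; the helper name `load_or_build` and the "non-empty directory means a valid cache" rule are assumptions of this sketch, not PageIndex features. The index class is passed in as a parameter so the helper can be exercised without the library installed.

```python
import os


def load_or_build(pdf_path, cache_dir, index_cls):
    """Return a saved index from cache_dir if one exists, else build, save, and return it.

    index_cls must expose from_pdf(path) and load(path) class methods and a save(path)
    instance method, matching the PageIndex calls in the Steps section (assumed API).
    """
    # Assumption: a non-empty cache directory means a previous save() succeeded.
    if os.path.isdir(cache_dir) and os.listdir(cache_dir):
        return index_cls.load(cache_dir)

    # Cache miss: parse the PDF (the slow step) and persist the tree for next time.
    idx = index_cls.from_pdf(pdf_path)
    os.makedirs(cache_dir, exist_ok=True)
    idx.save(cache_dir)
    return idx
```

With the library installed, this would be called as `load_or_build('./corpus.pdf', './pageindex_cache/', PageIndex)`. The cache is not invalidated when the PDF changes, so delete ./pageindex_cache/ to force a rebuild.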
Steps
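1. Install PageIndex
Install the PageIndex library.
    pip install pageindex
2. Build the index
PageIndex parses your PDF into a hierarchical tree of sections, similar to a table of contents, and saves it to ./pageindex_cache/.
    from pageindex import PageIndex

    # Parse the PDF into a section tree, then persist it for reuse by later queries.
    idx = PageIndex.from_pdf('./corpus.pdf')
    idx.save('./pageindex_cache/')
3. Query
Load the saved index and ask a question. The model string routes the query to deepseek-v4 served by your local Ollama instance.
    from pageindex import PageIndex

    # Reload the saved tree instead of re-parsing the PDF.
    idx = PageIndex.load('./pageindex_cache/')
    print(idx.query('What was Q3 revenue?', model='ollama/deepseek-v4'))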
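The Steps don't show how the tree is used at query time. As a rough illustration of the general idea behind vectorless retrieval (not PageIndex's actual implementation), the sketch below walks a table-of-contents tree from the root, choosing one child section per level. `Node`, `navigate`, and `keyword_chooser` are hypothetical names, and the keyword-overlap scorer is only a placeholder for the LLM call (for example deepseek-v4 via Ollama) that would do the choosing in practice.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One section in a table-of-contents-style tree (hypothetical structure)."""
    title: str
    pages: Tuple[int, int]  # first and last page of the section
    summary: str = ""
    children: List["Node"] = field(default_factory=list)


# A chooser sees the question and the candidate children and returns one (or None to stop).
Chooser = Callable[[str, List[Node]], Optional[Node]]


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_chooser(question: str, children: List[Node]) -> Optional[Node]:
    """Stand-in for an LLM call: pick the child whose title+summary shares the most words."""
    q = _words(question)

    def overlap(node: Node) -> int:
        return len(q & _words(node.title + " " + node.summary))

    best = max(children, key=overlap)
    return best if overlap(best) > 0 else None


def navigate(root: Node, question: str, choose: Chooser) -> Node:
    """Descend one level at a time until a leaf is reached or the chooser stops."""
    node = root
    while node.children:
        nxt = choose(question, node.children)
        if nxt is None:
            break
        node = nxt
    return node


report = Node("Annual Report", (1, 120), children=[
    Node("Risk Factors", (5, 30)),
    Node("Financial Statements", (31, 90), "income statement balance sheet revenue", children=[
        Node("Q2 Results", (40, 45), "quarterly revenue and margins"),
        Node("Q3 Results", (46, 52), "quarterly revenue and margins"),
    ]),
])

hit = navigate(report, "What was Q3 revenue?", keyword_chooser)
print(hit.title, hit.pages)  # prints: Q3 Results (46, 52)
```

Because the result is a section with a page range rather than an anonymous chunk, an answer built from it can cite pages directly, which is one of the things the eval rubric checks for.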
Eval, 1 fixture
Last passed: verified 10d ago
financebench-q3 · rubric · timeout 120s · max $0.10
Judge: claude-sonnet-4-5
Rubric: PASS if (1) the answer contains a specific dollar figure, (2) it cites a page or section, and (3) the figure matches the ground truth within ±$0.1B.
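The judge applies this rubric with an LLM. A cheap deterministic pre-check of the same three criteria can flag obvious failures before spending judge calls. The sketch below is hypothetical (it is not part of the eval harness), and its regexes are an assumption that only covers common forms such as `$4.2 billion`, `$4.2B`, `$4,180 million`, `page 47`, `p. 12`, and `section 7`.

```python
import re
from typing import List

# Dollar amounts such as "$4.2 billion", "$4.2B", "$850 million", "$850M".
# Assumption: figures are written with a leading "$"; other formats are not matched.
_MONEY = re.compile(
    r"\$\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*(billion|bn|b|million|mn|m)?\b",
    re.IGNORECASE,
)
# Page or section references such as "page 47", "pages 46-52", "p. 12", "section 7".
_CITATION = re.compile(r"\b(?:pages?|pp?\.|sections?|sec\.)\s*\d", re.IGNORECASE)


def dollar_figures_in_billions(answer: str) -> List[float]:
    """Extract every dollar figure in the answer, normalized to billions."""
    figures = []
    for number, unit in _MONEY.findall(answer):
        value = float(number.replace(",", ""))
        unit = unit.lower()
        if unit in ("billion", "bn", "b"):
            figures.append(value)
        elif unit in ("million", "mn", "m"):
            figures.append(value / 1_000)
        else:  # no unit: treat as plain dollars
            figures.append(value / 1_000_000_000)
    return figures


def rubric_precheck(answer: str, truth_billions: float, tol: float = 0.1) -> bool:
    """Approximate the rubric's three PASS conditions without an LLM judge."""
    figures = dollar_figures_in_billions(answer)
    has_figure = bool(figures)                     # (1) a specific dollar figure
    cites_source = bool(_CITATION.search(answer))  # (2) a page or section citation
    within_tol = any(abs(f - truth_billions) <= tol for f in figures)  # (3) within ±0.1B
    return has_figure and cites_source and within_tol
```

A failing pre-check is only a hint, since the regexes miss some phrasings; the LLM judge remains the authority on PASS or FAIL.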
Results
98.7% accuracy on FinanceBench, as reported in the original PageIndex benchmark.