RAG · Open Source · Free · Active · Local hardware · intermediate · ~30 min setup

Vectorless RAG with PageIndex

Build high-accuracy RAG without embeddings, chunking, or a vector DB.

by Shilpa Mitra · verified 10d ago · v1.0.0

Intended Use

Single-document or small-corpus QA where you want strong accuracy without the ops complexity of embeddings + a vector DB.

Not for

  • Corpora over ~10K documents (no index sharding)
  • Sub-second retrieval requirements

The Stack

Tested Against

pageindex@0.4 · ollama@0.5 · deepseek-v4

Side effects & data flow

Network
none, local only
Writes
./pageindex_cache/
Credentials
none required

Steps

  1. Install PageIndex

    Install the PageIndex library.

    pip install pageindex
  2. Build the index

    PageIndex parses your PDF into a tree.

    from pageindex import PageIndex

    # Parse the PDF into a tree index, then persist it for the query step.
    idx = PageIndex.from_pdf('./corpus.pdf')
    idx.save('./pageindex_cache/')
  3. Query

    Ask a question via the index.

    from pageindex import PageIndex

    # Reload the saved index and answer with the local model served by Ollama.
    idx = PageIndex.load('./pageindex_cache/')
    print(idx.query('What was Q3 revenue?', model='ollama/deepseek-v4'))
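
To make "parses your PDF into a tree" concrete, here is a minimal, self-contained sketch of tree-based retrieval. It is an illustration only, not PageIndex's internals: the `Node` class, the `find_section` function, and the toy report are all hypothetical, and the keyword-overlap scorer is a stand-in for whatever model-driven selection the library performs when you pass `model=`.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One section of a document: a title, a page span, and child sections."""
    title: str
    start_page: int
    end_page: int
    children: List["Node"] = field(default_factory=list)


def find_section(node: Node, keywords: Set[str]) -> Node:
    """Walk down from `node`, at each level following the child whose title
    shares the most words with the query. Stop when no child matches."""
    best, best_score = None, 0
    for child in node.children:
        score = len(keywords & set(child.title.lower().split()))
        if score > best_score:  # strict '>' keeps the first child on ties
            best, best_score = child, score
    return node if best is None else find_section(best, keywords)


# Hypothetical table-of-contents tree for a made-up report.
root = Node("Annual Report", 1, 40, [
    Node("Business Overview", 2, 10),
    Node("Quarterly Revenue", 11, 30, [
        Node("Q2 Revenue", 12, 15),
        Node("Q3 Revenue", 16, 19),
    ]),
    Node("Risk Factors", 31, 40),
])

hit = find_section(root, {"q3", "revenue"})
print(hit.title, hit.start_page, hit.end_page)
```

The point of the sketch is the shape of the lookup: retrieval narrows to a page range by descending a section hierarchy rather than by comparing embedding vectors, which is why no chunking or vector DB appears in the steps above.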

Eval, 1 fixture

Last passed: verified 10d ago
  • financebench-q3 · rubric · timeout 120s · max $0.1

    Judge: claude-sonnet-4-5
    Rubric: PASS if (1) the answer contains a specific dollar figure, (2) it cites a page or section, and (3) the figure matches the ground truth within ±0.1B.
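
The three rubric conditions can be approximated with a deterministic pre-check before spending judge budget. The sketch below is an assumption-laden stand-in, not the judge itself: `passes_rubric` and both regular expressions are hypothetical, figures are assumed to be written in billions (for example `$4.2B`), citations are assumed to look like "page 17" or "Section 3", and the sample answers and ground-truth value are made up.

```python
import re

# Assumed figure format: "$4.2B", "$4.2 bn", "$4.2 billion" (case-insensitive).
DOLLAR_B = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(?:b|bn|billion)\b", re.IGNORECASE)
# Assumed citation format: "page 17" or "section 3" (numbered only).
CITATION = re.compile(r"\b(?:page|section)\s+\d+", re.IGNORECASE)


def passes_rubric(answer: str, truth_billions: float, tol: float = 0.1) -> bool:
    """Return True only if all three rubric conditions hold."""
    figure = DOLLAR_B.search(answer)
    if figure is None:
        return False  # (1) no specific dollar figure
    if CITATION.search(answer) is None:
        return False  # (2) no page or section cited
    # (3) figure within tolerance; small epsilon absorbs float rounding
    return abs(float(figure.group(1)) - truth_billions) <= tol + 1e-9


# Made-up answers and ground truth, for illustration only.
print(passes_rubric("Q3 revenue was $4.2B (page 17).", truth_billions=4.25))
print(passes_rubric("Q3 revenue was strong (page 17).", truth_billions=4.25))
```

A check like this is stricter about wording than an LLM judge would be, so it is better suited to catching obvious failures early than to replacing the rubric.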

Results

98.7% on FinanceBench, as reported in the original benchmark.