Chat with 1,000-page documents — without losing context — PDFPilot4U Blog

## The naive approach

Throw the whole PDF into the context window. GPT-4 Turbo gives you 128k tokens, which is roughly 300 pages. Beyond that, you truncate and pray.

That approach fails at 500+ pages, because everything past the window is cut before the model sees it. It also wastes money: you're paying for tokens that rarely contribute to the answer.
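
To put rough numbers on that, here is a back-of-the-envelope sketch. It uses only the 128k-token and 300-page figures quoted above; the function names are made up for illustration, and real token counts vary a lot with layout and language.

```python
# Figures taken from the text above: a 128k-token window holds roughly 300 pages.
CONTEXT_WINDOW_TOKENS = 128_000
PAGES_PER_WINDOW = 300


def estimated_tokens(pages: int) -> int:
    """Estimate a document's token count from its page count."""
    tokens_per_page = CONTEXT_WINDOW_TOKENS / PAGES_PER_WINDOW  # about 427
    return round(pages * tokens_per_page)


def fraction_visible(pages: int) -> float:
    """Fraction of the document that fits in a single context window."""
    return min(1.0, CONTEXT_WINDOW_TOKENS / estimated_tokens(pages))


if __name__ == "__main__":
    for pages in (300, 500, 1000):
        print(pages, estimated_tokens(pages), f"{fraction_visible(pages):.0%}")
```

Under these assumptions a 1,000-page document is roughly 427k tokens, so a single 128k window sees about 30% of it.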

What we do instead

1. Extract + chunk at ingest time

When you upload a PDF, we:

Parse every page with PyMuPDF
Split into sentence-aware chunks of ~512 tokens
Respect page boundaries so citations stay accurate
Add 25% overlap between chunks for continuity

For a 1,000-page document, that's ~1,500-2,500 chunks.
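
The chunking step could look something like the sketch below. It is not our production code: it assumes page text has already been extracted (for example with PyMuPDF's `page.get_text()`), counts whitespace-separated words as a stand-in for model tokens, and uses a naive regex sentence splitter. `Chunk`, `chunk_page` and `chunk_document` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number, kept so citations stay accurate
    text: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate model tokens.
    return len(text.split())


def chunk_page(page: int, text: str, max_tokens: int = 512,
               overlap: float = 0.25) -> list[Chunk]:
    chunks: list[Chunk] = []
    current: list[str] = []
    budget = int(max_tokens * overlap)  # tokens to repeat in the next chunk
    for sentence in split_sentences(text):
        if current and count_tokens(" ".join(current + [sentence])) > max_tokens:
            chunks.append(Chunk(page, " ".join(current)))
            # Carry whole trailing sentences forward, up to the overlap budget.
            carried: list[str] = []
            for prev in reversed(current):
                if count_tokens(" ".join([prev] + carried)) > budget:
                    break
                carried.insert(0, prev)
            current = carried
        # A single sentence longer than max_tokens is kept whole, not split.
        current.append(sentence)
    if current:
        chunks.append(Chunk(page, " ".join(current)))
    return chunks


def chunk_document(pages: list[str]) -> list[Chunk]:
    # Chunk each page on its own so no chunk spans a page boundary.
    return [c for i, text in enumerate(pages, start=1) for c in chunk_page(i, text)]
```

Because the overlap is made of whole sentences, the repeated portion is at most 25% of the budget rather than exactly 25%.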

2. Embed + store in pgvector

Each chunk gets a 1,536-dim OpenAI embedding. We store them in Postgres with an HNSW index, so vector similarity search runs in 2-5ms.
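
For readers who have not used pgvector, the storage side might be set up roughly as follows. The table and column names, the choice of cosine distance, and the psycopg-style `%(name)s` placeholders are all assumptions for this sketch; only the 1,536 dimensions and the HNSW index come from the description above.

```python
# Illustrative schema: names are hypothetical, vector(1536) matches the embedding size.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    document  text NOT NULL,
    page      integer NOT NULL,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT document, page, content, embedding <=> %(query)s::vector AS distance
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format a Python list as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

A database driver would execute `SEARCH_SQL` with `query` set to `to_vector_literal(embedding)` and `k` set to the number of chunks wanted.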

3. Hierarchical retrieval at query time

When you ask a question:

Expand your query into 3-5 sub-queries using an LLM (catches paraphrase)
Retrieve top-10 chunks for each sub-query (30-50 chunks)
Deduplicate + re-rank by relevance score
Return top-10 final chunks with page numbers

In our eval, this improves recall by ~15% over naive top-k retrieval.
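
The query-time steps above can be sketched as a single function. The LLM expansion and the vector search are passed in as callables, so nothing here claims to be a specific API. Re-ranking by the best score seen for each chunk is a simple stand-in for whatever relevance scoring a real system would use, and `Hit` and `retrieve` are illustrative names.

```python
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    chunk_id: int
    page: int
    score: float  # higher means more relevant


def retrieve(
    question: str,
    expand: Callable[[str], list[str]],       # LLM query expansion (hypothetical)
    search: Callable[[str, int], list[Hit]],  # vector search (hypothetical)
    per_query_k: int = 10,
    final_k: int = 10,
) -> list[Hit]:
    best: dict[int, Hit] = {}
    for sub_query in expand(question):
        for hit in search(sub_query, per_query_k):
            kept = best.get(hit.chunk_id)
            # Deduplicate: keep only the best-scoring copy of each chunk.
            if kept is None or hit.score > kept.score:
                best[hit.chunk_id] = hit
    # Re-rank the merged candidates and keep the top final_k.
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:final_k]
```

Each returned `Hit` still carries its page number, which the citation step depends on later.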

4. Generate with Claude Sonnet

The final answer is generated by Claude with the retrieved context. Our prompt is explicit about citations:

Answer based ONLY on the provided excerpts. Cite every claim as [Document Name, Page X]. If the answer isn't in the context, say so.
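
One plausible way to assemble that prompt is shown below. The instruction text is the one quoted above. The excerpt layout and the `build_prompt` helper are assumptions for illustration, and the call to the model itself is left out.

```python
INSTRUCTIONS = (
    "Answer based ONLY on the provided excerpts. "
    "Cite every claim as [Document Name, Page X]. "
    "If the answer isn't in the context, say so."
)


def build_prompt(question: str, excerpts: list[tuple[str, int, str]]) -> str:
    """Build the prompt from (document name, page number, text) triples."""
    # Label each excerpt in the same format the model is asked to cite in.
    blocks = [f"[{doc}, Page {page}]\n{text}" for doc, page, text in excerpts]
    return (
        f"{INSTRUCTIONS}\n\nExcerpts:\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )
```

Labeling excerpts in the citation format itself gives the model an exact string to copy, which makes the later verification step simpler.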

5. Verify citations before rendering

A Python regex extracts the [Document Name, Page X] markers from Claude's output. For each one, we check that the cited document and page actually appear among the chunks we retrieved. Hallucinated citations are dropped, not rendered.
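
A minimal version of that check might look like this. The pattern assumes document names contain no commas or square brackets, and `verify_citations` is an illustrative name; the real regex may well differ.

```python
import re

# Matches markers such as [Lease Agreement, Page 12].
CITATION = re.compile(r"\[([^\[\],]+),\s*Page\s+(\d+)\]")


def verify_citations(answer: str, retrieved: set[tuple[str, int]]) -> str:
    """Drop every citation marker that does not match a retrieved chunk."""

    def keep_or_drop(match: re.Match) -> str:
        key = (match.group(1).strip(), int(match.group(2)))
        return match.group(0) if key in retrieved else ""

    cleaned = CITATION.sub(keep_or_drop, answer)
    # Tidy the spacing left behind by removed markers.
    cleaned = re.sub(r" {2,}", " ", cleaned)
    return re.sub(r"\s+([.,;])", r"\1", cleaned).strip()
```

Here `retrieved` is the set of (document name, page) pairs for the chunks that were actually sent to the model.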

Persistent memory

Your chat remembers the last 20 messages, capped so context stays fresh. Files attached to the conversation persist across turns: if you uploaded a file at message 1, the agent can still reference "the contract from earlier" at message 15.
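
The difference between the two lifetimes can be shown with a small sketch: messages live in a bounded queue, while attachments live in a separate map that is never trimmed. The `Conversation` class and its fields are hypothetical, and only the cap of 20 comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_MESSAGES = 20  # cap taken from the text above


@dataclass
class Conversation:
    # Bounded history: appending past the cap silently drops the oldest message.
    messages: deque = field(default_factory=lambda: deque(maxlen=MAX_MESSAGES))
    # Attachments (file name -> document id) are kept for the whole conversation.
    files: dict[str, str] = field(default_factory=dict)

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def attach(self, name: str, document_id: str) -> None:
        self.files[name] = document_id
```

With this layout, a file attached at message 1 stays reachable even after message 1 itself has dropped out of the history.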

The result

1,000-page contracts: ~8 seconds to first token, 98% citation accuracy
Technical books: works end-to-end if the table of contents is extractable
Scanned-only PDFs: runs OCR first, then the same pipeline

Try it at /chat. Drop a big doc and see.
