Opentrace supports four retrieval strategies that determine how the system finds relevant document chunks when you ask a question. Each strategy offers different trade-offs between speed, accuracy, and comprehensiveness.
**Basic** is the simplest and fastest strategy. Your question is converted into an embedding vector and compared against all document chunk embeddings using cosine similarity (a minimal sketch follows the table below).
| Pros | Cons |
|---|---|
| Fast — single search pass | Misses keyword-exact matches |
| Great for semantic/conceptual queries | Less effective for precise terminology |
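To make the mechanics concrete, here is a minimal sketch of this kind of vector-only retrieval using NumPy. The function name and array shapes are illustrative assumptions, not Opentrace's actual internals.

```python
import numpy as np

def basic_search(query_vec: np.ndarray, chunk_vecs: np.ndarray, top_k: int = 5):
    """Rank chunks by cosine similarity to the query embedding.

    query_vec has shape (d,); chunk_vecs has shape (n_chunks, d).
    """
    # Normalise both sides so a plain dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                      # one similarity score per chunk
    top = np.argsort(-sims)[:top_k]   # best matches first
    return [(int(i), float(sims[i])) for i in top]
```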
**Hybrid** combines vector similarity search with full-text keyword search (PostgreSQL `tsvector`), then merges the two ranked lists using Reciprocal Rank Fusion (RRF).
| Pros | Cons |
|---|---|
| Catches both semantic and exact keyword matches | Slightly slower (two searches + fusion) |
| Configurable vector/keyword weights | More parameters to tune |
The default weights are `vector_weight: 0.7` and `keyword_weight: 0.3`; adjust them in the project's RAG settings.
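For illustration, such settings might be represented like this. Only the two weight keys reflect the documented defaults; the surrounding structure and other key names are assumptions, not Opentrace's actual settings schema.

```python
# Hypothetical settings sketch: only the two weight values come from the
# documented defaults above; the structure and other keys are illustrative.
rag_settings = {
    "retrieval_strategy": "hybrid",
    "vector_weight": 0.7,    # weight on the vector-search ranking in RRF
    "keyword_weight": 0.3,   # weight on the keyword-search ranking in RRF
}
```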
**Multi-Query** has the LLM generate N variations of your original query, runs a vector search for each variation, then fuses all results using RRF. This casts a wider semantic net (a sketch of the expansion step follows the table below).
| Pros | Cons |
|---|---|
| Catches different phrasings and angles | Slower — N searches + LLM call |
| Excellent for complex or ambiguous questions | Higher API cost (more embeddings) |
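A minimal sketch of the expansion step, assuming the LLM is exposed as a plain text-in/text-out callable and returns one variation per line; the prompt wording and parsing are illustrative, not Opentrace's actual implementation.

```python
from typing import Callable, List

def expand_query(question: str, generate: Callable[[str], str], n: int = 3) -> List[str]:
    """Ask an LLM for n alternative phrasings of a question.

    `generate` is any text-in/text-out LLM call; one variation per output
    line is assumed here, not Opentrace's actual prompt format.
    """
    prompt = (
        f"Rewrite the following question in {n} different ways, "
        f"one per line, keeping its original meaning:\n\n{question}"
    )
    variations = [ln.strip() for ln in generate(prompt).splitlines() if ln.strip()]
    # Always search the original question too, alongside the rephrasings.
    return [question] + variations[:n]
```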
**Multi-Query Hybrid** is the most comprehensive strategy. It generates N query variations, runs hybrid search (vector + keyword) for each, then fuses everything with RRF (see the combined sketch after the table below).
| Pros | Cons |
|---|---|
| Maximum recall — best for important questions | Slowest and most expensive |
| Combines all search modalities | May return redundant results |
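Putting the pieces together, the flow can be sketched as follows. Here `vector_search` and `keyword_search` stand in for whatever backends are actually used, and the inlined fusion is the weighted RRF formula described in the next section.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def multi_query_hybrid(
    queries: List[str],                         # original question + variations
    vector_search: Callable[[str], List[str]],  # returns chunk ids, best-first
    keyword_search: Callable[[str], List[str]],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    k: int = 60,
) -> List[str]:
    """Run both searches for every query variation, then fuse with weighted RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for q in queries:
        for ranked, weight in ((vector_search(q), vector_weight),
                               (keyword_search(q), keyword_weight)):
            # Weighted RRF contribution; see the formula in the next section.
            for rank, chunk_id in enumerate(ranked, start=1):
                scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```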
Which strategy should I use? Start with Basic for speed. Switch to Hybrid if you need exact keyword matching. Use Multi-Query variants for complex research questions where thoroughness matters more than speed.
When combining results from multiple search methods, Opentrace uses RRF — a ranking algorithm that merges multiple ranked lists into a single list. Each result's score is calculated as:
score(d) = Σᵢ weightᵢ / (k + rankᵢ(d)), summed over every ranked list i that contains document d
Here rankᵢ(d) is d's position in list i (starting at 1), weightᵢ is that list's weight, and k is a constant (typically 60) that prevents top-ranked results from dominating any single list. This produces a balanced ranking that respects both vector similarity and keyword relevance.
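In code, the fusion might look like this minimal sketch (the chunk-id lists are illustrative):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, weights=None, k=60):
    """Merge ranked lists of chunk ids with (weighted) Reciprocal Rank Fusion.

    ranked_lists: lists of chunk ids, each ordered best-first.
    weights: one weight per list; defaults to 1.0 each.
    """
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ranked, weight in zip(ranked_lists, weights):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Worked example: chunk "a" is 1st in the vector list and 3rd in the keyword
# list, so with the default weights it scores 0.7/61 + 0.3/63 ≈ 0.0163.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]], weights=[0.7, 0.3])
```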
After retrieval, the top chunks (limited by `final_context_size`) are selected, structured into three categories, and passed to the LLM along with your question and citation metadata.