Opentrace supports four retrieval strategies that determine how the system finds relevant document chunks when you ask a question. Each strategy offers different trade-offs between speed, accuracy, and comprehensiveness.
**Basic** is the simplest and fastest strategy. Your question is converted into an embedding vector and compared against all document chunk embeddings using cosine similarity (a minimal sketch follows the table below).
| Pros | Cons |
|---|---|
| Fast — single search pass | Misses keyword-exact matches |
| Great for semantic/conceptual queries | Less effective for precise terminology |
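To make the mechanics concrete, here is a minimal sketch of this kind of vector-only retrieval using NumPy. The function name and array shapes are illustrative assumptions, not Opentrace's actual internals.

```python
import numpy as np

def basic_search(query_vec: np.ndarray, chunk_vecs: np.ndarray, top_k: int = 5):
    """Rank chunks by cosine similarity to the query embedding.

    query_vec has shape (d,); chunk_vecs has shape (n_chunks, d).
    """
    # Normalise both sides so a plain dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                      # one similarity score per chunk
    top = np.argsort(-sims)[:top_k]   # best matches first
    return [(int(i), float(sims[i])) for i in top]
```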
**Hybrid** combines vector similarity search with full-text keyword search (PostgreSQL `tsvector`), then merges the two ranked lists using Reciprocal Rank Fusion (RRF).
| Pros | Cons |
|---|---|
| Catches both semantic and exact keyword matches | Slightly slower (two searches + fusion) |
| Configurable vector/keyword weights | More parameters to tune |
The default weights are `vector_weight: 0.7` and `keyword_weight: 0.3`; adjust them in the project's RAG settings.
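For illustration, such settings might be represented like this. Only the two weight keys reflect the documented defaults; the surrounding structure and other key names are assumptions, not Opentrace's actual settings schema.

```python
# Hypothetical settings sketch: only the two weight values come from the
# documented defaults above; the structure and other keys are illustrative.
rag_settings = {
    "retrieval_strategy": "hybrid",
    "vector_weight": 0.7,    # weight on the vector-search ranking in RRF
    "keyword_weight": 0.3,   # weight on the keyword-search ranking in RRF
}
```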
**Multi-Query** has the LLM generate N variations of your original query, runs a vector search for each variation, then fuses all results using RRF. This casts a wider semantic net (a sketch of the expansion step follows the table below).
| Pros | Cons |
|---|---|
| Catches different phrasings and angles | Slower — N searches + LLM call |
| Excellent for complex or ambiguous questions | Higher API cost (more embeddings) |
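A minimal sketch of the expansion step, assuming the LLM is exposed as a plain text-in/text-out callable and returns one variation per line; the prompt wording and parsing are illustrative, not Opentrace's actual implementation.

```python
from typing import Callable, List

def expand_query(question: str, generate: Callable[[str], str], n: int = 3) -> List[str]:
    """Ask an LLM for n alternative phrasings of a question.

    `generate` is any text-in/text-out LLM call; one variation per output
    line is assumed here, not Opentrace's actual prompt format.
    """
    prompt = (
        f"Rewrite the following question in {n} different ways, "
        f"one per line, keeping its original meaning:\n\n{question}"
    )
    variations = [ln.strip() for ln in generate(prompt).splitlines() if ln.strip()]
    # Always search the original question too, alongside the rephrasings.
    return [question] + variations[:n]
```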
**Multi-Query Hybrid** is the most comprehensive strategy. It generates N query variations, runs hybrid search (vector + keyword) for each, then fuses everything with RRF (see the combined sketch after the table below).
| Pros | Cons |
|---|---|
| Maximum recall — best for important questions | Slowest and most expensive |
| Combines all search modalities | May return redundant results |
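Putting the pieces together, the flow can be sketched as follows. Here `vector_search` and `keyword_search` stand in for whatever backends are actually used, and the inlined fusion is the weighted RRF formula described in the next section.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def multi_query_hybrid(
    queries: List[str],                         # original question + variations
    vector_search: Callable[[str], List[str]],  # returns chunk ids, best-first
    keyword_search: Callable[[str], List[str]],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    k: int = 60,
) -> List[str]:
    """Run both searches for every query variation, then fuse with weighted RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for q in queries:
        for ranked, weight in ((vector_search(q), vector_weight),
                               (keyword_search(q), keyword_weight)):
            # Weighted RRF contribution; see the formula in the next section.
            for rank, chunk_id in enumerate(ranked, start=1):
                scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```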
Which strategy should I use? Start with Basic for speed. Switch to Hybrid if you need exact keyword matching. Use Multi-Query variants for complex research questions where thoroughness matters more than speed.
When combining results from multiple search methods, Opentrace uses RRF — a ranking algorithm that merges multiple ranked lists into a single list. Each result's score is calculated as:
score(d) = Σᵢ weightᵢ / (k + rankᵢ(d)), summed over every ranked list i that contains document d
Here rankᵢ(d) is d's position in list i (starting at 1), weightᵢ is that list's weight, and k is a constant (typically 60) that prevents top-ranked results from dominating any single list. This produces a balanced ranking that respects both vector similarity and keyword relevance.
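In code, the fusion might look like this minimal sketch (the chunk-id lists are illustrative):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, weights=None, k=60):
    """Merge ranked lists of chunk ids with (weighted) Reciprocal Rank Fusion.

    ranked_lists: lists of chunk ids, each ordered best-first.
    weights: one weight per list; defaults to 1.0 each.
    """
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ranked, weight in zip(ranked_lists, weights):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Worked example: chunk "a" is 1st in the vector list and 3rd in the keyword
# list, so with the default weights it scores 0.7/61 + 0.3/63 ≈ 0.0163.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]], weights=[0.7, 0.3])
```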
After retrieval, the top chunks (limited by `final_context_size`) are selected, structured into three categories, and passed to the LLM along with your question and citation metadata.