Knowledge Base

The Knowledge Base is where you build the foundation for your AI assistant. It consists of all the documents and web pages you add to a project, which are then processed and made searchable.

Document Sources

You can add content to your knowledge base in two ways:

File Upload

Upload files directly from your computer using drag-and-drop or the file picker. Supported formats:

PDF — including scanned documents (extracted via OCR)
DOCX / DOC — Microsoft Word documents
HTML / HTM — static web pages

Files are uploaded to AWS S3 via a presigned URL, then processed asynchronously by the ingestion pipeline.
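As a rough illustration of the client side of this step, the sketch below checks a file name against the supported formats and builds an upload request for a presigned URL. The helper names (`is_supported`, `build_upload_request`) are hypothetical, and the choice of an HTTP PUT is an assumption: this page does not say whether the presigned URL is issued for PUT or POST.

```python
from pathlib import Path
from urllib.request import Request

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".html", ".htm"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def build_upload_request(presigned_url: str, data: bytes, content_type: str) -> Request:
    """Build (but do not send) an HTTP PUT request for a presigned URL.

    Assumption: the URL was presigned for PUT. A presigned POST would
    need a multipart form body instead.
    """
    return Request(
        presigned_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the actual upload. Building the request separately keeps the sketch easy to inspect without network access.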

Web URLs

Paste any public URL and Opentrace will crawl the page using ScrapingBee, extracting all visible content. This is useful for adding blog posts, documentation pages, or any publicly accessible web content.
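Because only public URLs can be crawled, it can help to sanity-check a URL before submitting it. The sketch below is a simple heuristic, not Opentrace's own validation: the function name `looks_public` is hypothetical, and the check does not resolve DNS names or contact the page.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_public(url: str) -> bool:
    """Heuristic: True if the URL is http(s) and not obviously private."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name, which this sketch does not resolve.
        return True
    # Literal IP addresses must be globally routable.
    return ip.is_global
```

For example, `looks_public("http://192.168.1.10/admin")` returns `False`, while a normal blog or documentation URL passes.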

How Content Becomes Searchable

After you add a document or URL, it passes through the four stages of the ingestion pipeline:

Partitioning — extracting text, tables, and images from the raw document
Chunking — splitting content into manageable pieces (max 3,000 characters)
Summarising — generating AI summaries for chunks containing tables or images
Vectorization — creating 1,536-dimensional vector embeddings for semantic search
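To make the chunking stage concrete, here is a minimal sketch of one way to split text under the 3,000-character limit. It assumes paragraphs are separated by blank lines and packs them greedily. The actual pipeline may split on different boundaries, and `chunk_text` is a hypothetical name.

```python
MAX_CHARS = 3000  # chunk size limit stated above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that exceeds the limit on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The piece does not fit: close the current chunk, start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps related sentences together, which tends to give each chunk a more coherent meaning than cutting at a fixed character offset.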

Once processing is complete, the document's chunks are stored in PostgreSQL with pgvector and are immediately searchable via the chat interface.
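Semantic search over stored embeddings amounts to ranking chunks by how close their vectors are to the query vector. The sketch below illustrates the idea in plain Python using cosine similarity. The choice of cosine similarity is an assumption (this page does not name the distance metric), the names are hypothetical, and in practice the ranking would run inside PostgreSQL through pgvector rather than in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

Real embeddings here would have 1,536 dimensions; the function works the same way for vectors of any matching length.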

Managing Your Knowledge Base

The Knowledge Base has two tabs:

Documents Tab — view all uploaded files and URLs, their processing status, and click to inspect individual chunks
Settings Tab — configure RAG strategy, embedding model, search parameters, and reranking options

Real-Time Processing Status

After you upload a document, it displays a live status indicator, refreshed by polling every 2 seconds, that moves through these states:

uploading → queued → partitioning → chunking → summarising → vectorization → completed

You can click on any document to see detailed information about each processing stage.
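The status flow above can be sketched as a small polling loop. This is an illustration only: `fetch_status` stands in for whatever call returns a document's current status, and the `max_polls` cutoff and error handling are assumptions, since this page does not describe failure states.

```python
import time
from typing import Callable

# Status values in the order listed above.
STAGES = [
    "uploading", "queued", "partitioning", "chunking",
    "summarising", "vectorization", "completed",
]
POLL_INTERVAL_SECONDS = 2


def wait_until_completed(
    fetch_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 300,
) -> list[str]:
    """Poll until the status is 'completed'; return the distinct statuses seen."""
    seen: list[str] = []
    for _ in range(max_polls):
        status = fetch_status()
        if status not in STAGES:
            raise ValueError(f"unexpected status: {status!r}")
        # Record a status only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "completed":
            return seen
        sleep(POLL_INTERVAL_SECONDS)
    raise TimeoutError("document did not complete within max_polls polls")
```

Because polling samples the status at intervals, fast stages may never be observed, so a caller should not expect to see every state in the sequence.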
