After uploading a file or adding a URL, your content goes through a multi-stage processing pipeline. This page explains each status and what it means.
| Stage | Status | Description |
|---|---|---|
| 1 | uploading | File is being uploaded to S3 storage. For URLs, the page is being crawled. |
| 2 | queued | Upload complete. A Celery background task has been created and is waiting for a worker. |
| 3 | partitioning | The Unstructured library is parsing the document to extract text, tables, and images. |
| 4 | chunking | Extracted content is being split into semantically coherent chunks (max 3,000 characters each). |
| 5 | summarising | Chunks containing tables or images are being summarised by GPT-4o for better search results. |
| 6 | vectorization | Chunks are being converted to 1,536-dimensional embedding vectors for similarity search. |
| 7 | completed | All processing is done. The document is now fully searchable. |

| Status | Description |
|---|---|
| failed | An error occurred during processing. The error message is stored and can be viewed by clicking the document. |

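As a rough illustration of how the statuses in the tables above relate to one another, the following Python sketch models them as an enum. The class name `DocumentStatus` and the helpers `stage_number` and `is_terminal` are hypothetical names chosen for this example; only the status strings and their order come from the tables.

```python
from enum import Enum
from typing import Optional


class DocumentStatus(str, Enum):
    """Status values as listed in the tables (names here are illustrative)."""
    UPLOADING = "uploading"
    QUEUED = "queued"
    PARTITIONING = "partitioning"
    CHUNKING = "chunking"
    SUMMARISING = "summarising"
    VECTORIZATION = "vectorization"
    COMPLETED = "completed"
    FAILED = "failed"


# Stages 1 to 7 in the order a document moves through them.
PIPELINE_ORDER = [
    DocumentStatus.UPLOADING,
    DocumentStatus.QUEUED,
    DocumentStatus.PARTITIONING,
    DocumentStatus.CHUNKING,
    DocumentStatus.SUMMARISING,
    DocumentStatus.VECTORIZATION,
    DocumentStatus.COMPLETED,
]

# Statuses after which no further processing happens.
TERMINAL_STATUSES = {DocumentStatus.COMPLETED, DocumentStatus.FAILED}


def stage_number(status: DocumentStatus) -> Optional[int]:
    """Return the 1-based stage number, or None for 'failed' (not a stage)."""
    if status in PIPELINE_ORDER:
        return PIPELINE_ORDER.index(status) + 1
    return None


def is_terminal(status: DocumentStatus) -> bool:
    """True once the document is either searchable or has failed."""
    return status in TERMINAL_STATUSES
```

For example, `stage_number(DocumentStatus("chunking"))` returns `4`, matching the stage column in the first table.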
The frontend polls the document status every 2 seconds while processing is active, so the status badge updates automatically with no page refresh needed.
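The polling behaviour can be sketched as a simple loop. This is a minimal Python illustration, not the frontend's actual code: `poll_until_done`, `fetch_status`, and `on_update` are hypothetical names, and `fetch_status` stands in for whatever request returns the current status string. Only the 2-second interval and the status values come from this page.

```python
import time
from typing import Callable, Optional

# Polling stops once one of these statuses is reached.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    on_update: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until a terminal status is returned.

    on_update is invoked only when the status changes, which is when a
    status badge would need to be redrawn.
    """
    last_status = None
    while True:
        status = fetch_status()
        if status != last_status:
            if on_update is not None:
                on_update(status)
            last_status = status
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting in real time.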
During the summarisation stage, the UI shows progress like “Processing chunk 3 of 12…” for granular visibility.
If the embedding step fails (e.g., due to OpenAI rate limits), the pipeline retries with exponential backoff.
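The general shape of retrying with exponential backoff is sketched below. This page does not state the pipeline's actual retry count or delays, so `max_attempts`, `base_delay`, and `max_delay` are assumed values for illustration, and `retry_with_backoff` is a hypothetical helper rather than the pipeline's real function.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,        # assumed value, not taken from the pipeline
    base_delay: float = 1.0,      # assumed value: seconds before the first retry
    max_delay: float = 30.0,      # assumed value: upper bound on any single wait
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

With these assumed defaults, the waits between attempts are 1, 2, 4, and 8 seconds. Production implementations often add random jitter to each delay so that many workers do not retry at the same instant.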