---
layout: 'page'
uri: '/configuration/semantic-search'
position: 2
slug: 'configuration-semantic-search'
parent: 'configuration'
navTitle: 'Semantic search'
title: 'Semantic search configuration'
description: 'OpenAI embedding variables that power the vectorize command and semantic search: API key, base URL, embedding model, and chunk size.'
---

# Semantic search configuration

Documan can embed your documentation so it can be searched by meaning, not just keywords. These variables configure the OpenAI embeddings used by the [`vectorize`](/getting-started/commands) command and by semantic search at runtime. Both the [MCP server](/ai-integration/mcp-setup) and the [AI chat](/configuration/ai-chat) assistant rely on them.

**Important:** Any change to environment variables requires a server restart to take effect.

`DOCUMAN_OPENAI_API_KEY` is required for semantic search; the rest have sensible defaults.

## DOCUMAN_OPENAI_API_KEY

**Required for:** Semantic search (vectorize + MCP search)
**Default:** None

Your OpenAI API key. Used for two purposes:

1. **Vectorize command**: generates embeddings for all documents (one-time or after changes)
2. **MCP search at runtime**: converts each search query into an embedding to find relevant documents

The key must be available both during `vectorize` and at runtime when serving MCP search requests.

**Example:**

```bash
DOCUMAN_OPENAI_API_KEY=sk-...
```

## DOCUMAN_OPENAI_BASE_URL

**Required for:** Custom or self-hosted embeddings endpoint
**Default:** `https://api.openai.com/v1`

Base URL for the embeddings API. Override it to route embedding requests through a proxy, Azure OpenAI, or a self-hosted OpenAI-compatible embeddings server. The endpoint must implement the OpenAI `/embeddings` API.

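Before pointing Documan at a custom endpoint, it can help to confirm that the endpoint answers an OpenAI-style embeddings request. The following is a minimal sketch, not part of Documan: the helper names `embeddings_url` and `check_embeddings_endpoint` are hypothetical, and it assumes the request path is `<base URL>/embeddings` with Bearer-token authentication, as in the OpenAI API. Gateways that authenticate differently (Azure OpenAI, for example, typically expects an `api-key` header) would need different headers.

```bash
# Sketch only: these helper functions are hypothetical and not shipped with Documan.

# Print the embeddings URL derived from the configured base URL.
# Falls back to the documented default and drops one trailing slash.
# Assumption: the request path is "<base URL>/embeddings", as in the OpenAI API.
embeddings_url() {
  local base
  base="${DOCUMAN_OPENAI_BASE_URL:-https://api.openai.com/v1}"
  printf '%s/embeddings\n' "${base%/}"
}

# Send one small embedding request and print only the HTTP status code.
# Assumption: the endpoint accepts "Authorization: Bearer <key>".
check_embeddings_endpoint() {
  local model
  model="${DOCUMAN_OPENAI_EMBEDDING_MODEL:-text-embedding-3-small}"
  curl -sS -o /dev/null -w '%{http_code}\n' "$(embeddings_url)" \
    -H "Authorization: Bearer ${DOCUMAN_OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${model}\", \"input\": \"ping\"}"
}
```

With the variables exported and the functions loaded into your shell, running `check_embeddings_endpoint` should print `200` for a compatible endpoint. A `401` or `404` usually points to a wrong key or base URL.
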
**Example:**

```bash
DOCUMAN_OPENAI_BASE_URL=https://your-proxy.example/openai/v1
```

## DOCUMAN_OPENAI_EMBEDDING_MODEL

**Required for:** Vectorize command
**Default:** `text-embedding-3-small`

The OpenAI embedding model to use for vectorization. Available options:

| Model                    | Dimensions | Notes                               |
|--------------------------|------------|-------------------------------------|
| `text-embedding-3-small` | 1536       | Recommended, best price/performance |
| `text-embedding-3-large` | 3072       | Highest quality                     |
| `text-embedding-ada-002` | 1536       | Legacy model                        |

The `text-embedding-3-small` model is extremely cost-effective: $1 covers approximately 62,500 pages of text. See [OpenAI Embeddings pricing](https://platform.openai.com/docs/guides/embeddings#embedding-models) for current rates.

**Example:**

```bash
DOCUMAN_OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```

**Note:** Changing the embedding model requires re-running `./documan vectorize` to regenerate all embeddings.

## DOCUMAN_CHUNK_MAX_LEN

**Required for:** Vectorize command
**Default:** `250` (minimum: 100)

Maximum length (in characters) of each text chunk during vectorization. Chunks are semantic units used for vector search.

- **Smaller chunks** = More precise search results, finer granularity
- **Larger chunks** = More context per result, but less precise matching

**Example:**

```bash
DOCUMAN_CHUNK_MAX_LEN=500
```

## Related configuration

- [General](/configuration/general): project name, paths, port, AI discovery surfaces, and license key
- [AI chat](/configuration/ai-chat): the Anthropic-powered chat assistant

---

[← General](/configuration/general.md) | [AI chat →](/configuration/ai-chat.md)