---
layout: 'page'
uri: '/configuration/ai-chat'
position: 3
slug: 'configuration-ai-chat'
parent: 'configuration'
navTitle: 'AI chat'
title: 'AI chat configuration'
description: 'Configure the AI chat assistant — a grounded, multi-agent chat with your documentation that answers with verified citations. Full reference for all DOCUMAN_CHAT_* environment variables.'
---

# AI chat configuration

Documan ships an optional **AI chat assistant** — a "chat with your documentation"
experience that answers visitor questions grounded in your pages, with verified
citations back to the source. A lead agent works over the same documentation tools
your [MCP server](/ai-integration/mcp-setup) exposes, and can escalate a broad,
multi-part question to several research sub-agents running in parallel. For a feature
overview, see [AI chat](/ai-integration/ai-chat).

The assistant is **off by default** and uses its **own Anthropic API key**,
completely independent of the OpenAI key used for [semantic search](/configuration/semantic-search).
All settings are environment variables, and any change requires a server restart to
take effect.


## Enabling the assistant

Three variables are **required** to turn the chat on. Until all three are set, the
chat endpoints stay disabled and the chat UI stays hidden.


### DOCUMAN_CHAT_ENABLED

**Required.** Master switch. Set to `true` to enable the chat assistant.

```bash
DOCUMAN_CHAT_ENABLED=true
```


### DOCUMAN_CHAT_ANTHROPIC_API_KEY

**Required.** A dedicated Anthropic API key for the chat. This is **separate** from
`DOCUMAN_OPENAI_API_KEY`, which is used only for embeddings and search; the chat
never uses that key.

```bash
DOCUMAN_CHAT_ANTHROPIC_API_KEY=sk-ant-...
```


### DOCUMAN_CHAT_DEFAULT_MODEL

**Required.** The fallback Anthropic model used for any agent role not overridden
in `DOCUMAN_CHAT_AGENTS_JSON`.

```bash
DOCUMAN_CHAT_DEFAULT_MODEL=claude-sonnet-4-6
```
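
Because the chat stays off until all three variables are set, a quick pre-flight check before restarting the server can save a round trip. The sketch below is a hypothetical helper, not part of Documan; it only reports which of the three names are empty or unset in the current shell and does not check that their values are valid.

```bash
# Hypothetical helper (not shipped with Documan): list the required chat
# variables that are empty or unset in the current environment.
missing_chat_vars() {
  missing=""
  for name in DOCUMAN_CHAT_ENABLED DOCUMAN_CHAT_ANTHROPIC_API_KEY DOCUMAN_CHAT_DEFAULT_MODEL; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space; prints an empty line when nothing is missing.
  printf '%s\n' "${missing# }"
}
```

Calling `missing_chat_vars` in the shell that launches the server prints the names still to be set, or an empty line when all three are present.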


## Models and reasoning

### DOCUMAN_CHAT_DEFAULT_EFFORT

**Default:** `high`

Fallback reasoning effort, one of `off`, `low`, `medium`, `high`, `xhigh`, `max`.
Higher effort means deeper reasoning at higher cost and latency.
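
As an illustration, a site that prefers faster, cheaper answers might lower the fallback. The value `medium` here is an arbitrary pick from the list above, not a recommendation.

```bash
DOCUMAN_CHAT_DEFAULT_EFFORT=medium
```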


### DOCUMAN_CHAT_AGENTS_JSON

**Default:** none (every role uses the defaults above)

Per-role model configuration as a JSON map of `role -> {model, effort, maxTokens,
maxIterations}`. The roles are `lead` (the orchestrator) and `research` (the
sub-agents); a `default` entry applies to any role you omit.

A common setup runs a strong lead and a cheaper research tier:

```bash
DOCUMAN_CHAT_AGENTS_JSON={"lead":{"model":"claude-opus-4-8","effort":"high","maxTokens":32000,"maxIterations":8},"research":{"model":"claude-haiku-4-5","effort":"low","maxTokens":8000,"maxIterations":4}}
```

Research fan-out costs roughly 10–15× the tokens of a single answer, so a cheaper
research model is recommended.
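
If you set this variable from a shell (an `export` line or a sourced script) rather than an env file, unquoted double quotes are removed by the shell and the JSON arrives malformed. A minimal sketch, assuming a POSIX shell: wrapping the value in single quotes keeps it intact. The `default` entry is illustrative and reuses model names from this page; `research` is omitted here, so it would fall back to `default`.

```bash
# Single quotes stop the shell from removing the JSON's double quotes.
export DOCUMAN_CHAT_AGENTS_JSON='{"default":{"model":"claude-haiku-4-5","effort":"low","maxTokens":8000,"maxIterations":4},"lead":{"model":"claude-opus-4-8","effort":"high","maxTokens":32000,"maxIterations":8}}'

# Print the value back to confirm the quotes survived.
printf '%s\n' "$DOCUMAN_CHAT_AGENTS_JSON"
```

How quotes are treated inside an env file depends on whatever loads that file, so check the printed value in the environment the server actually runs in.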


## Endpoints

By default the chat talks to the public Anthropic API. You can instead point it at a
proxy, a gateway, or a self-hosted Anthropic-compatible endpoint.


### DOCUMAN_CHAT_ANTHROPIC_BASE_URL

**Default:** `https://api.anthropic.com/v1`

Base URL for the Anthropic Messages API. Override it to route chat traffic through
a proxy or a compatible gateway.

```bash
DOCUMAN_CHAT_ANTHROPIC_BASE_URL=https://your-gateway.example/anthropic/v1
```

The embeddings endpoint used for semantic search has its own override,
`DOCUMAN_OPENAI_BASE_URL` — see [Semantic search](/configuration/semantic-search).


### DOCUMAN_CHAT_ANTHROPIC_VERSION

**Default:** `2023-06-01`

The `anthropic-version` header. Override only if a newer API version is required.


### DOCUMAN_CHAT_ANTHROPIC_BETA

**Default:** none

Optional `anthropic-beta` header value(s) for enabling Anthropic beta features.
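
Before restarting Documan against a new endpoint, it can help to send one small request by hand with the same base URL, key, and version header. The sketch below uses the public Messages API shape (`x-api-key` and `anthropic-version` headers, a `POST` to `/messages`). Two things are assumptions: that Documan appends `/messages` to the base URL, and that your gateway accepts the same headers. Both function names are hypothetical.

```bash
# Hypothetical helper: build the Messages URL from the configured base URL.
# Assumes /messages is appended to the base URL; this page does not confirm it.
chat_messages_url() {
  base="${DOCUMAN_CHAT_ANTHROPIC_BASE_URL:-https://api.anthropic.com/v1}"
  # Strip one trailing slash so the path is not doubled.
  printf '%s/messages\n' "${base%/}"
}

# Hypothetical helper: send one tiny request using the chat's key, model,
# and version header. Needs curl and network access.
check_chat_endpoint() {
  curl -sS "$(chat_messages_url)" \
    -H "x-api-key: ${DOCUMAN_CHAT_ANTHROPIC_API_KEY}" \
    -H "anthropic-version: ${DOCUMAN_CHAT_ANTHROPIC_VERSION:-2023-06-01}" \
    -H "content-type: application/json" \
    -d "{\"model\":\"${DOCUMAN_CHAT_DEFAULT_MODEL}\",\"max_tokens\":16,\"messages\":[{\"role\":\"user\",\"content\":\"ping\"}]}"
}
```

Run `check_chat_endpoint` with the chat variables exported. A JSON message in the response suggests the endpoint, key, and model name line up; an error body usually names the part that does not. If you set `DOCUMAN_CHAT_ANTHROPIC_BETA`, add a matching `-H "anthropic-beta: ..."` line.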


## System prompt

### DOCUMAN_CHAT_SYSTEM_PROMPT_PATH

**Default:** none (a built-in prompt is used)

Path to an optional Markdown file with your own system prompt — set the assistant's
persona, tone, and scope (for example, "only answer questions about Product X;
politely decline anything else"). Your prompt is **prepended** to a
server-controlled instruction block that enforces grounding and citations, so a
custom prompt can never weaken those guarantees.

```bash
DOCUMAN_CHAT_SYSTEM_PROMPT_PATH=/etc/documan/chat-prompt.md
```
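
A prompt file can be only a few lines long. The sketch below writes an illustrative one to the current directory and points the variable at it; the file location and the wording of the prompt are examples, not defaults.

```bash
# Write an example prompt file. Path and wording are illustrative only.
cat > ./chat-prompt.md <<'EOF'
You are the documentation assistant for Product X.

- Only answer questions about Product X.
- Politely decline anything else.
- Keep answers short and friendly.
EOF

# Point the chat at the file using an absolute path.
export DOCUMAN_CHAT_SYSTEM_PROMPT_PATH="$PWD/chat-prompt.md"
```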


## Limits and budgets

These settings cap cost and concurrency and limit abuse on a public endpoint. The
defaults are safe for a low-traffic docs site; raise or lower them to fit your traffic
and budget.

| Variable | Default | Purpose |
|---|---|---|
| `DOCUMAN_CHAT_MAX_SUBAGENTS` | `3` | Max research sub-agents per run (`0` disables fan-out — single-agent mode). |
| `DOCUMAN_CHAT_MAX_CONCURRENT` | `4` | Global cap on concurrent in-flight LLM calls. |
| `DOCUMAN_CHAT_RATE_LIMIT_PER_MIN` | `10` | Chat requests allowed per minute, per visitor IP. |
| `DOCUMAN_CHAT_RUN_TIMEOUT_SECONDS` | `90` | Hard wall-clock timeout for a single run. |
| `DOCUMAN_CHAT_TOKEN_BUDGET` | `120000` | Hard token budget for a single run. |
| `DOCUMAN_CHAT_DAILY_TOKEN_BUDGET` | `5000000` | Aggregate daily token budget across all runs — a kill-switch that temporarily disables chat once exceeded. |
| `DOCUMAN_CHAT_MAX_INPUT_CHARS` | `4000` | Maximum length of a single visitor question. |

**Example:**

```bash
DOCUMAN_CHAT_MAX_SUBAGENTS=3
DOCUMAN_CHAT_DAILY_TOKEN_BUDGET=2000000
```
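
To get a feel for how the per-run and daily budgets interact, divide one by the other. The sketch below is a rough worst-case estimate using a hypothetical helper. It assumes every run spends its whole per-run budget and that the daily counter is the sum of run totals; this page does not spell out the exact accounting.

```bash
# Hypothetical helper: how many runs fit in the daily budget if each one
# spends its entire per-run budget (integer division rounds down).
full_runs_per_day() {
  daily_budget=$1
  run_budget=$2
  echo $(( daily_budget / run_budget ))
}

full_runs_per_day 5000000 120000   # defaults from the table: prints 41
full_runs_per_day 2000000 120000   # daily budget lowered as above: prints 16
```

Most runs should use far less than the per-run cap, so real capacity is likely higher. The figure is a floor for sizing the daily budget against expected traffic.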


## Security notes

- The chat uses a **separate Anthropic key** and never shares the OpenAI embeddings key.
- Answers are **grounded**: the assistant may only cite pages it actually read, and citation titles come from the server, never the model.
- Hidden and unpublished pages are **never** served to the assistant.
- On a public, anonymous endpoint the **daily token budget** is the backstop against runaway cost — keep it set.


## Related configuration

- [General](/configuration/general) — project name, paths, port, AI discovery surfaces, and license key
- [Semantic search](/configuration/semantic-search) — OpenAI embeddings for `vectorize` and semantic search

---

[← Semantic search](/configuration/semantic-search.md) | [🚢 Deployment →](/deployment.md)