# Redmine Semantic Index

Local semantic index service for a recent Redmine Helpdesk sample. V1 uses
OpenAI `text-embedding-3-small` embeddings and Qdrant vectors, with Redmine as
the first source adapter.

For deploy, validation, and rollback steps, see
`docs/semantic_index_deployment_runbook.md`.

## Configuration

Copy `.env.example` to `.env` and set local secrets there. Do not commit `.env`.

Required for live use:

- `OPENAI_API_KEY`
- `QDRANT_URL`
- `REDMINE_URL`
- `REDMINE_API_KEY`

Optional:

- `QDRANT_API_KEY`
- `QDRANT_COLLECTION`
- `REDMINE_PROJECT_IDENTIFIER`
- `REDMINE_SAMPLE_LIMIT`
- `SEMANTIC_INDEX_API_KEY`
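
To catch a missing setting before starting the service, a small pre-flight check along these lines may help. This is only a sketch: the variable names come from the lists above, but `missing_required` is a hypothetical helper, not part of `semantic_index`, and it reads the process environment only (how the service itself loads `.env` is not covered here).

```python
import os

# Names copied from the "Required for live use" list above.
REQUIRED_VARS = ("OPENAI_API_KEY", "QDRANT_URL", "REDMINE_URL", "REDMINE_API_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_required()` with no arguments checks the current environment and returns an empty list when every required value is present.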

## HTTP

Install runtime dependencies in your chosen environment:

```sh
pip install openai qdrant-client fastapi uvicorn
```

Run:

```sh
uvicorn semantic_index.app:app --host 127.0.0.1 --port 8787
```

Endpoints:

- `GET /health`
- `POST /sources/redmine/backfill-sample`
- `POST /sources/redmine/refresh`
- `POST /search`
- `GET /documents/{id}`
- `GET /projects`

If `SEMANTIC_INDEX_API_KEY` is set, pass `Authorization: Bearer <key>`.

Search response shape is shared by HTTP, MCP, and the Python client:

```json
{
  "query": "candidate follow up",
  "filters": {"project_identifier": "hiring", "limit": 5},
  "results": [
    {
      "id": "redmine:issue:123:chunk:0",
      "score": 0.72,
      "snippet": "Candidate follow up...",
      "payload": {},
      "citation": {
        "source": "redmine",
        "doc_type": "issue",
        "issue_id": 123,
        "project_identifier": "hiring",
        "url": "http://redmine/issues/123"
      }
    }
  ]
}
```
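
Because the shape is shared, one piece of consuming code can serve all three entry points. The sketch below turns a response into citation lines; it assumes only the fields shown in the JSON above, and `format_citation` and `summarize` are hypothetical names, not part of the package.

```python
def format_citation(result):
    """Render one entry of `results` as a single citation line."""
    citation = result.get("citation", {})
    return "[{:.2f}] {} {} #{} ({}): {}".format(
        result.get("score", 0.0),
        citation.get("source", "unknown"),
        citation.get("doc_type", "unknown"),
        citation.get("issue_id", "?"),
        citation.get("project_identifier", "?"),
        citation.get("url", ""),
    )


def summarize(response):
    """Return citation lines for every result, highest score first."""
    ranked = sorted(
        response.get("results", []),
        key=lambda result: result.get("score", 0.0),
        reverse=True,
    )
    return [format_citation(result) for result in ranked]
```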

HTTP examples:

```sh
curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"candidate follow up","project_identifier":"hiring","limit":5}' \
  http://127.0.0.1:8787/search

curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
  http://127.0.0.1:8787/projects
```

## Python Client

Use the client in-process when running from this repo/environment:

```python
from semantic_index.client import SemanticIndexClient

client = SemanticIndexClient.local()
results = client.search("callum@safetagtracking.com", project_identifier="customer-service", limit=5)
document = client.get_document(results["results"][0]["id"])
```

Use HTTP mode from another local program:

```python
from semantic_index.client import SemanticIndexClient

client = SemanticIndexClient(base_url="http://127.0.0.1:8787", api_key="...")
results = client.search("candidate follow up", project_identifier="hiring", limit=5)
```

## Backfill

Refresh the configured Redmine sample from the command line:

```sh
python3 -m semantic_index --backfill-redmine-sample --limit 50
```

When `REDMINE_PROJECT_IDENTIFIER` is set, the rebuild deletes and replaces only
indexed Redmine documents for that project. Without a project identifier, it
rebuilds the Redmine source sample for the collection.

Refresh a balanced multi-project sample:

```sh
python3 -m semantic_index --backfill-redmine-projects \
  --projects customer-service,hiring,todo-jason,sales-inbox,business-development,dock-scheduling,prep-standardization \
  --per-project-limit 100
```

Use project-specific limits when Customer Service should stay larger than the
internal project sample:

```sh
python3 -m semantic_index --backfill-redmine-projects \
  --project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100
```
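
The `--project-limits` value is a comma-separated list of `name=limit` pairs. If you generate or validate that string from a script, a parser along these lines may be useful. It is a sketch of the format as it appears in the commands above, not the CLI's actual parser, and `parse_project_limits` is a hypothetical name.

```python
def parse_project_limits(spec):
    """Parse 'name=limit,name=limit' into a dict of positive integers.

    Illustrative only: the real CLI may accept or reject inputs differently.
    """
    limits = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, separator, raw_limit = item.partition("=")
        name = name.strip()
        if not separator or not name:
            raise ValueError(f"expected name=limit, got {item!r}")
        limit = int(raw_limit)  # raises ValueError on non-numeric limits
        if limit <= 0:
            raise ValueError(f"limit for {name!r} must be positive, got {limit}")
        limits[name] = limit
    return limits
```

The resulting dict has the same shape as the `project_limits` object in the HTTP refresh request shown under Rolling Refresh.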

Multi-project backfill rebuilds each project scope independently. Non-Helpdesk
projects are indexed as ordinary Redmine issues and journals; they are not
expected to have Helpdesk contact metadata.

## Rolling Refresh

Use rolling refresh for routine updates after an initial backfill:

```sh
python3 -m semantic_index --refresh-redmine-projects \
  --project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100 \
  --dry-run
```

Dry-run reports what would change without calling OpenAI or writing to Qdrant.
Remove `--dry-run` to apply the refresh.

The refresh maps each recent Redmine issue to stable document IDs, reads the
existing Qdrant payloads for that issue, and compares `source_hash` values.
Only new or changed documents are embedded and upserted. Unchanged documents
are left alone, and stale documents for refreshed issues are deleted without
embedding. Use `--force-rebuild` only when you explicitly want to re-embed
matching documents.
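
The comparison described above can be pictured as a three-way split per issue. The sketch below is a simplified model of that decision, assuming each side is available as a mapping from document ID to `source_hash`; `plan_refresh` is a hypothetical function and the service's own data structures may differ.

```python
def plan_refresh(fresh_hashes, indexed_hashes, force_rebuild=False):
    """Decide what to do with the documents of one refreshed issue.

    fresh_hashes:   {document_id: source_hash} built from live Redmine data.
    indexed_hashes: {document_id: source_hash} read from existing Qdrant payloads.
    Both mappings are assumptions made for this illustration.
    """
    embed, skip = [], []
    for doc_id, source_hash in fresh_hashes.items():
        # New documents have no indexed hash; changed ones have a different hash.
        if force_rebuild or indexed_hashes.get(doc_id) != source_hash:
            embed.append(doc_id)
        else:
            skip.append(doc_id)
    # Indexed documents the issue no longer produces are stale: delete, no embedding.
    delete = [doc_id for doc_id in indexed_hashes if doc_id not in fresh_hashes]
    return {"embed": embed, "skip": skip, "delete": delete}
```

In this model a dry run would simply report the three lists, while an applied run would embed and upsert `embed`, leave `skip` untouched, and delete `delete`.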

The default local state file is `.cache/semantic_index/refresh_state.json`.
After a successful refresh, later runs skip issues older than the previous
success timestamp minus `--overlap-minutes` unless `--force-rebuild` is used.
Override the state file location with `--state-path`:

```sh
python3 -m semantic_index --refresh-redmine-projects \
  --project-limits customer-service=500 \
  --state-path /tmp/semantic-refresh-state.json
```
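
The skip rule tied to the state file amounts to a cutoff timestamp. The sketch below shows the arithmetic only. It assumes "older" refers to an issue's last-updated time and does not model the state file's contents; `refresh_cutoff` and `should_skip` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def refresh_cutoff(last_success, overlap_minutes, force_rebuild=False):
    """Return the cutoff timestamp, or None when nothing should be skipped.

    last_success is the previous successful refresh time, or None on a first run.
    """
    if force_rebuild or last_success is None:
        return None
    return last_success - timedelta(minutes=overlap_minutes)


def should_skip(issue_updated_on, cutoff):
    """Skip an issue only when a cutoff exists and the issue predates it."""
    return cutoff is not None and issue_updated_on < cutoff
```

For example, with a previous success at 12:00 UTC and an overlap of 30 minutes, issues last updated before 11:30 UTC would be skipped and anything newer would be re-examined. The overlap window exists so that updates landing close to the previous run are not missed.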

The HTTP endpoint exposes the same behavior:

```sh
curl -sS -X POST http://127.0.0.1:8787/sources/redmine/refresh \
  -H 'Content-Type: application/json' \
  -d '{"project_limits":{"customer-service":500},"dry_run":true}'
```

For production-style operation, use the wrapper script. It defaults to dry-run
and writes timestamped logs under `.cache/semantic_index/logs`:

```sh
semantic_index/refresh.sh
semantic_index/refresh.sh --apply
```

For a quick smoke check of the wrapper path:

```sh
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=5' semantic_index/refresh.sh
```

Override project limits, state path, or log location through environment
variables:

```sh
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=500,hiring=200' \
SEMANTIC_INDEX_LOG_DIR=/var/log/semantic-index \
SEMANTIC_INDEX_STATE_PATH=/var/lib/semantic-index/refresh_state.json \
semantic_index/refresh.sh --apply
```

Do not schedule `--force-rebuild`. Force rebuilds should stay manual because
they intentionally re-embed unchanged documents.

## MCP Stdio

```sh
python3 -m semantic_index --mcp-stdio
```

Tools:

- `semantic_search`
- `semantic_get_document`
- `semantic_list_projects`
- `semantic_backfill_redmine_sample`
- `semantic_refresh_redmine`

For agent workflows, list projects first when the user has not named a project,
search broadly or with `project_identifier` when known, then call
`semantic_get_document` for any promising result. Treat returned citations and
Redmine URLs as the authoritative references. Backfill tools are operational and
should not be part of normal search behavior.
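
The same search-then-fetch pattern can be sketched with the Python client. The example uses only `search` and `get_document`, which appear in the Python Client section. It assumes `project_identifier` may be omitted for a broad search, and both `search_then_fetch` and the `min_score` threshold are hypothetical additions for illustration.

```python
def search_then_fetch(client, query, project_identifier=None, limit=5, min_score=0.0):
    """Search, then fetch the full document behind each promising result.

    `client` is anything exposing search(...) and get_document(id), such as
    SemanticIndexClient. Returns a list of {"citation", "document"} pairs.
    """
    search_kwargs = {"limit": limit}
    if project_identifier:
        # Narrow the search only when the project is known.
        search_kwargs["project_identifier"] = project_identifier
    response = client.search(query, **search_kwargs)

    hits = []
    for result in response.get("results", []):
        if result.get("score", 0.0) < min_score:
            continue  # not promising enough to fetch
        hits.append(
            {
                "citation": result.get("citation", {}),
                "document": client.get_document(result["id"]),
            }
        )
    return hits
```

Keeping the citation next to the fetched document makes it straightforward to cite the Redmine URL rather than the snippet text.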

## Inspection CLI

Use the inspect commands before larger backfills to see what is already indexed
or preview what Redmine would produce without writing to Qdrant.

```sh
python3 -m semantic_index inspect count --source redmine --project customer-service
python3 -m semantic_index inspect list --limit 20 --source redmine --project customer-service
python3 -m semantic_index inspect search "order status" --limit 5 --project customer-service
python3 -m semantic_index inspect search "customer@example.com" --limit 5 --project customer-service
python3 -m semantic_index inspect show redmine:issue:39778:chunk:0
python3 -m semantic_index inspect preview-redmine --limit 10 --project customer-service
python3 -m semantic_index inspect audit --source redmine --project customer-service --limit 500
python3 -m semantic_index inspect compare-redmine --project customer-service --limit 20
python3 -m semantic_index inspect smoke-search --project customer-service
```

`count`, `list`, `show`, and `preview-redmine` do not call OpenAI.
`search` embeds the query text. List/search output shows snippets by default;
pass `--full-text` when you need the full indexed text.
`audit` summarizes indexed document coverage without calling OpenAI.
`compare-redmine` previews live Redmine chunks and compares them to indexed
Qdrant documents without writing to Qdrant. `smoke-search` runs known search
checks and calls OpenAI for query embeddings. Pass `--json` to `audit`,
`compare-redmine`, or `smoke-search` for machine-readable output.
For mixed project samples, run `audit` without `--project` to see project-level
counts and Helpdesk-contact coverage separately from ordinary internal issues.

For Helpdesk tickets, Redmine issue ingestion expects
`/issues/:id.json?include=journals,helpdesk` to return `helpdesk_ticket`
metadata with an expanded contact. See
`docs/redmine_issue_api_helpdesk_include.md` for the Redmine API patch notes.
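
To check whether a Redmine instance returns that metadata, it can help to build the request URL and inspect the decoded JSON. The sketch below makes several assumptions that the patch notes should confirm: that `helpdesk_ticket` sits under the top-level `issue` object and that the expanded contact appears under a `contact` key. `issue_url` and `has_expanded_contact` are hypothetical helpers.

```python
from urllib.parse import urlencode


def issue_url(base_url, issue_id, include=("journals", "helpdesk")):
    """Build the issue URL with the include list described above."""
    # safe="," keeps the comma-separated include list readable in the URL.
    query = urlencode({"include": ",".join(include)}, safe=",")
    return f"{base_url.rstrip('/')}/issues/{issue_id}.json?{query}"


def has_expanded_contact(payload):
    """Return True when the decoded response carries an expanded contact.

    Assumed layout: {"issue": {"helpdesk_ticket": {"contact": {...}}}}.
    """
    ticket = payload.get("issue", {}).get("helpdesk_ticket")
    return isinstance(ticket, dict) and isinstance(ticket.get("contact"), dict)
```

Fetching the URL with the `X-Redmine-API-Key` header set to `REDMINE_API_KEY` and passing the decoded body to `has_expanded_contact` gives a quick yes/no answer for a single ticket.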

## Qdrant

For local Docker-hosted Qdrant:

```sh
docker run -p 6333:6333 -p 6334:6334 -v qdrant_storage:/qdrant/storage qdrant/qdrant
```

Create snapshots with Qdrant's snapshot API or mounted storage tooling before
destructive maintenance. The default collection name is
`redmine_semantic_sample`.