Redmine Semantic Index
Local semantic index service for a recent Redmine Helpdesk sample. V1 uses
OpenAI text-embedding-3-small embeddings and Qdrant vectors, with Redmine as
the first source adapter.
For deploy, validation, and rollback steps, see
docs/semantic_index_deployment_runbook.md.
Configuration
Copy .env.example to .env and set local secrets there. Do not commit .env.
Required for live use:
OPENAI_API_KEY
QDRANT_URL
REDMINE_URL
REDMINE_API_KEY
Optional:
QDRANT_API_KEY
QDRANT_COLLECTION
REDMINE_PROJECT_IDENTIFIER
REDMINE_SAMPLE_LIMIT
SEMANTIC_INDEX_API_KEY
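Before starting the service, it can help to confirm that the required settings are present. The following is a minimal sketch, not part of this package: `missing_required` is a hypothetical helper, and it assumes the values from .env have already been exported into the process environment (it does not read .env itself).

```python
import os

# Names copied from the "Required for live use" list above.
REQUIRED_VARS = ("OPENAI_API_KEY", "QDRANT_URL", "REDMINE_URL", "REDMINE_API_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_required()` with no arguments checks the current environment; an empty list means all four required settings are set.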
HTTP
Install runtime dependencies in your chosen environment:
pip install openai qdrant-client fastapi uvicorn
Run:
uvicorn semantic_index.app:app --host 127.0.0.1 --port 8787
Endpoints:
GET /health
POST /sources/redmine/backfill-sample
POST /search
GET /documents/{id}
GET /projects
If SEMANTIC_INDEX_API_KEY is set, pass Authorization: Bearer <key>.
Search response shape is shared by HTTP, MCP, and the Python client:
{
"query": "candidate follow up",
"filters": {"project_identifier": "hiring", "limit": 5},
"results": [
{
"id": "redmine:issue:123:chunk:0",
"score": 0.72,
"snippet": "Candidate follow up...",
"payload": {},
"citation": {
"source": "redmine",
"doc_type": "issue",
"issue_id": 123,
"project_identifier": "hiring",
"url": "http://redmine/issues/123"
}
}
]
}
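Because the response shape is the same across HTTP, MCP, and the Python client, one small function can turn any search response into a list of citations. This is a sketch only: `summarize_results` is a hypothetical helper, and `sample` simply repeats the example response above.

```python
def summarize_results(response):
    """Return (score, issue_id, url) tuples from a search response dict.

    Hypothetical helper; missing keys become None rather than raising.
    """
    rows = []
    for result in response.get("results", []):
        citation = result.get("citation", {})
        rows.append((result.get("score"), citation.get("issue_id"), citation.get("url")))
    return rows


# The example response shown above, as a Python dict.
sample = {
    "query": "candidate follow up",
    "filters": {"project_identifier": "hiring", "limit": 5},
    "results": [
        {
            "id": "redmine:issue:123:chunk:0",
            "score": 0.72,
            "snippet": "Candidate follow up...",
            "payload": {},
            "citation": {
                "source": "redmine",
                "doc_type": "issue",
                "issue_id": 123,
                "project_identifier": "hiring",
                "url": "http://redmine/issues/123",
            },
        }
    ],
}

for score, issue_id, url in summarize_results(sample):
    print(f"{score:.2f}  #{issue_id}  {url}")
```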
HTTP examples:
curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
-H "Content-Type: application/json" \
-d '{"query":"candidate follow up","project_identifier":"hiring","limit":5}' \
http://127.0.0.1:8787/search
curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
http://127.0.0.1:8787/projects
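The same search call can be made from Python with only the standard library. The sketch below mirrors the first curl example; `build_search_request` is a hypothetical helper, and the request body fields are the ones shown in that example.

```python
import json
import urllib.request


def build_search_request(base_url, query, api_key=None, **filters):
    """Build a POST /search request equivalent to the curl example above.

    Hypothetical helper; `filters` carries fields such as
    project_identifier and limit. The request is built but not sent.
    """
    body = json.dumps({"query": query, **filters}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when SEMANTIC_INDEX_API_KEY is set on the server.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=body,
        headers=headers,
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body with `json.load`.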
Python Client
Use the client in-process when running from this repo/environment:
from semantic_index.client import SemanticIndexClient
client = SemanticIndexClient.local()
results = client.search("callum@safetagtracking.com", project_identifier="customer-service", limit=5)
document = client.get_document(results["results"][0]["id"])
Use HTTP mode from another local program:
from semantic_index.client import SemanticIndexClient
client = SemanticIndexClient(base_url="http://127.0.0.1:8787", api_key="...")
results = client.search("candidate follow up", project_identifier="hiring", limit=5)
Backfill
Refresh the configured Redmine sample from the command line:
python3 -m semantic_index --backfill-redmine-sample --limit 50
When REDMINE_PROJECT_IDENTIFIER is set, the rebuild deletes and replaces only
indexed Redmine documents for that project. Without a project identifier, it
rebuilds the Redmine source sample for the collection.
Refresh a balanced multi-project sample:
python3 -m semantic_index --backfill-redmine-projects \
--projects customer-service,hiring,todo-jason,sales-inbox,business-development,dock-scheduling,prep-standardization \
--per-project-limit 100
Use project-specific limits when Customer Service should stay larger than the internal project sample:
python3 -m semantic_index --backfill-redmine-projects \
--project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100
Multi-project backfill rebuilds each project scope independently. Non-Helpdesk projects are indexed as ordinary Redmine issues and journals; they are not expected to have Helpdesk contact metadata.
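The --project-limits value is a comma-separated list of project=limit pairs. As an illustration of that format (not the CLI's own parser), the hypothetical helper below turns such a string into a dict, which is also the shape the refresh endpoint accepts as project_limits.

```python
def parse_project_limits(spec):
    """Parse 'customer-service=500,hiring=200' into {'customer-service': 500, ...}.

    Hypothetical helper for illustration; raises ValueError on malformed pairs.
    """
    limits = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"expected project=limit, got {item!r}")
        limits[name] = int(value)  # int() raises ValueError on non-numeric limits
    return limits
```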
Rolling Refresh
Use rolling refresh for routine updates after an initial backfill:
python3 -m semantic_index --refresh-redmine-projects \
--project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100 \
--dry-run
Dry-run reports what would change without calling OpenAI or writing to Qdrant.
Remove --dry-run to apply the refresh.
The refresh maps each recent Redmine issue to stable document IDs, reads the
existing Qdrant payloads for that issue, and compares source_hash values.
Only new or changed documents are embedded and upserted. Unchanged documents
are left alone, and stale documents for refreshed issues are deleted without
embedding. Use --force-rebuild only when you explicitly want to re-embed
matching documents.
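The comparison described above can be sketched as a pure function over two mappings of document ID to source_hash. This is a simplified model under stated assumptions, not the package's implementation: `plan_refresh` and its return keys are hypothetical, and the hash values are placeholders.

```python
def plan_refresh(indexed_hashes, current_hashes, force_rebuild=False):
    """Decide what to embed, leave alone, and delete for one refreshed issue.

    indexed_hashes: doc id -> source_hash currently stored in Qdrant.
    current_hashes: doc id -> source_hash computed from Redmine now.
    Hypothetical helper; only the decision logic is modelled here.
    """
    embed, unchanged = [], []
    for doc_id, source_hash in current_hashes.items():
        if force_rebuild or indexed_hashes.get(doc_id) != source_hash:
            embed.append(doc_id)      # new or changed: embed and upsert
        else:
            unchanged.append(doc_id)  # same hash: leave alone
    # Indexed documents that the issue no longer produces are stale.
    delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_hashes]
    return {"embed": embed, "unchanged": unchanged, "delete": delete}


indexed = {
    "redmine:issue:123:chunk:0": "aaa",
    "redmine:issue:123:chunk:1": "bbb",
    "redmine:issue:123:chunk:2": "ccc",
}
current = {
    "redmine:issue:123:chunk:0": "aaa",  # unchanged
    "redmine:issue:123:chunk:1": "zzz",  # changed
    "redmine:issue:123:chunk:3": "ddd",  # new
}
plan = plan_refresh(indexed, current)
```

In this example only chunk 1 and chunk 3 would be embedded, chunk 0 is untouched, and chunk 2 is deleted without any embedding call.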
The default local state file is .cache/semantic_index/refresh_state.json.
After a successful refresh, later runs skip issues older than the previous
success timestamp minus --overlap-minutes unless --force-rebuild is used.
Override the state file path with --state-path:
python3 -m semantic_index --refresh-redmine-projects \
--project-limits customer-service=500 \
--state-path /tmp/semantic-refresh-state.json
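The skip rule (previous success timestamp minus --overlap-minutes) can be sketched as below. The function names are hypothetical, and the sketch assumes "older" refers to an issue's last-updated time; the layout of the state file itself is not modelled.

```python
from datetime import datetime, timedelta, timezone


def refresh_cutoff(last_success, overlap_minutes):
    """Return the earliest update time that a refresh still considers."""
    return last_success - timedelta(minutes=overlap_minutes)


def should_skip(issue_updated_on, last_success, overlap_minutes, force_rebuild=False):
    """Return True when an issue predates the cutoff and can be skipped.

    Hypothetical helper; with no previous success or with force_rebuild,
    nothing is skipped.
    """
    if force_rebuild or last_success is None:
        return False
    return issue_updated_on < refresh_cutoff(last_success, overlap_minutes)


last_success = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
old_issue = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
recent_issue = datetime(2024, 1, 1, 11, 45, tzinfo=timezone.utc)
```

With a 30-minute overlap the cutoff is 11:30, so `old_issue` is skipped while `recent_issue` is re-examined even though it predates the last success. The overlap window exists to catch issues updated while the previous run was in progress.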
The HTTP endpoint exposes the same behavior:
curl -sS -X POST http://127.0.0.1:8787/sources/redmine/refresh \
-H 'Content-Type: application/json' \
-d '{"project_limits":{"customer-service":500},"dry_run":true}'
For production-style operation, use the wrapper script. It defaults to dry-run
and writes timestamped logs under .cache/semantic_index/logs:
semantic_index/refresh.sh
semantic_index/refresh.sh --apply
For a quick smoke check of the wrapper path:
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=5' semantic_index/refresh.sh
Override project limits, state path, or log location through environment variables:
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=500,hiring=200' \
SEMANTIC_INDEX_LOG_DIR=/var/log/semantic-index \
SEMANTIC_INDEX_STATE_PATH=/var/lib/semantic-index/refresh_state.json \
semantic_index/refresh.sh --apply
Do not schedule --force-rebuild. Force rebuilds should stay manual because
they intentionally re-embed unchanged documents.
MCP Stdio
python3 -m semantic_index --mcp-stdio
Tools:
semantic_search
semantic_get_document
semantic_list_projects
semantic_backfill_redmine_sample
semantic_refresh_redmine
For agent workflows, list projects first when the user has not named a project,
search broadly or with project_identifier when known, then call
semantic_get_document for any promising result. Treat returned citations and
Redmine URLs as the authoritative references. Backfill tools are operational and
should not be part of normal search behavior.
Inspection CLI
Use the inspect commands before larger backfills to see what is already indexed or preview what Redmine would produce without writing to Qdrant.
python3 -m semantic_index inspect count --source redmine --project customer-service
python3 -m semantic_index inspect list --limit 20 --source redmine --project customer-service
python3 -m semantic_index inspect search "order status" --limit 5 --project customer-service
python3 -m semantic_index inspect search "customer@example.com" --limit 5 --project customer-service
python3 -m semantic_index inspect show redmine:issue:39778:chunk:0
python3 -m semantic_index inspect preview-redmine --limit 10 --project customer-service
python3 -m semantic_index inspect audit --source redmine --project customer-service --limit 500
python3 -m semantic_index inspect compare-redmine --project customer-service --limit 20
python3 -m semantic_index inspect smoke-search --project customer-service
count, list, show, and preview-redmine do not call OpenAI.
search embeds the query text. List/search output shows snippets by default;
pass --full-text when you need the full indexed text.
audit summarizes indexed document coverage without calling OpenAI.
compare-redmine previews live Redmine chunks and compares them to indexed
Qdrant documents without writing to Qdrant. smoke-search runs known search
checks and calls OpenAI for query embeddings. Pass --json to audit,
compare-redmine, or smoke-search for machine-readable output.
For mixed project samples, run audit without --project to see project-level
counts and Helpdesk-contact coverage separately from ordinary internal issues.
For Helpdesk tickets, Redmine issue ingestion expects
/issues/:id.json?include=journals,helpdesk to return helpdesk_ticket
metadata with an expanded contact. See
docs/redmine_issue_api_helpdesk_include.md for the Redmine API patch notes.
Qdrant
For local Docker-hosted Qdrant:
docker run -p 6333:6333 -p 6334:6334 -v qdrant_storage:/qdrant/storage qdrant/qdrant
Create snapshots with Qdrant's snapshot API or mounted storage tooling before
destructive maintenance. The default collection name is
redmine_semantic_sample.