# Redmine Semantic Index

Local semantic index service for a recent Redmine Helpdesk sample. V1 uses OpenAI `text-embedding-3-small` embeddings and Qdrant vectors, with Redmine as the first source adapter.

For deploy, validation, and rollback steps, see `docs/semantic_index_deployment_runbook.md`.

## Configuration

Copy `.env.example` to `.env` and set local secrets there. Do not commit `.env`.

Required for live use:

- `OPENAI_API_KEY`
- `QDRANT_URL`
- `REDMINE_URL`
- `REDMINE_API_KEY`

Optional:

- `QDRANT_API_KEY`
- `QDRANT_COLLECTION`
- `REDMINE_PROJECT_IDENTIFIER`
- `REDMINE_SAMPLE_LIMIT`
- `SEMANTIC_INDEX_API_KEY`

## HTTP

Install runtime dependencies in your chosen environment:

```sh
pip install openai qdrant-client fastapi uvicorn
```

Run:

```sh
uvicorn semantic_index.app:app --host 127.0.0.1 --port 8787
```

Endpoints:

- `GET /health`
- `POST /sources/redmine/backfill-sample`
- `POST /search`
- `GET /documents/{id}`
- `GET /projects`

If `SEMANTIC_INDEX_API_KEY` is set, pass the key as a bearer token in the `Authorization` header.

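
As a minimal sketch of the bearer-token convention, the following builds (but does not send) a `POST /search` request using only the Python standard library. The helper name `build_search_request` is hypothetical, and the JSON body fields are assumed to match the request used in this README's HTTP examples.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # host and port from the uvicorn command


def build_search_request(query, project_identifier=None, limit=5, api_key=None):
    """Build a POST /search request; the bearer header is added only if a key is given."""
    body = {"query": query, "limit": limit}
    if project_identifier:
        body["project_identifier"] = project_identifier
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_search_request(
    "candidate follow up",
    project_identifier="hiring",
    api_key=os.environ.get("SEMANTIC_INDEX_API_KEY"),
)
# To send it against a running service:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the header logic easy to check without a running service.
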
Search response shape is shared by HTTP, MCP, and the Python client:

```json
{
  "query": "candidate follow up",
  "filters": {"project_identifier": "hiring", "limit": 5},
  "results": [
    {
      "id": "redmine:issue:123:chunk:0",
      "score": 0.72,
      "snippet": "Candidate follow up...",
      "payload": {},
      "citation": {
        "source": "redmine",
        "doc_type": "issue",
        "issue_id": 123,
        "project_identifier": "hiring",
        "url": "http://redmine/issues/123"
      }
    }
  ]
}
```

HTTP examples:

```sh
curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"candidate follow up","project_identifier":"hiring","limit":5}' \
  http://127.0.0.1:8787/search

curl -sS -H "Authorization: Bearer $SEMANTIC_INDEX_API_KEY" \
  http://127.0.0.1:8787/projects
```

## Python Client

Use the client in-process when running from this repo/environment:

```python
from semantic_index.client import SemanticIndexClient

client = SemanticIndexClient.local()
results = client.search("callum@safetagtracking.com", project_identifier="customer-service", limit=5)
document = client.get_document(results["results"][0]["id"])
```

Use HTTP mode from another local program:

```python
from semantic_index.client import SemanticIndexClient

client = SemanticIndexClient(base_url="http://127.0.0.1:8787", api_key="...")
results = client.search("candidate follow up", project_identifier="hiring", limit=5)
```

## Backfill

Refresh the configured Redmine sample from the command line:

```sh
python3 -m semantic_index --backfill-redmine-sample --limit 50
```

When `REDMINE_PROJECT_IDENTIFIER` is set, the rebuild deletes and replaces only indexed Redmine documents for that project. Without a project identifier, it rebuilds the Redmine source sample for the collection.

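
The scoping rule for rebuilds can be sketched as a Qdrant-style delete filter. This is an illustration only: the function name `rebuild_delete_filter` and the payload keys `source` and `project_identifier` are assumptions (the keys mirror the citation fields in the search response), not the service's confirmed implementation.

```python
def rebuild_delete_filter(project_identifier=None):
    """Return a Qdrant-style filter selecting the documents a rebuild would replace."""
    # Every rebuild is limited to the Redmine source (assumed payload key: "source").
    must = [{"key": "source", "match": {"value": "redmine"}}]
    if project_identifier:
        # With a project set, narrow the scope to that project only
        # (assumed payload key: "project_identifier").
        must.append(
            {"key": "project_identifier", "match": {"value": project_identifier}}
        )
    return {"must": must}


print(rebuild_delete_filter("hiring"))  # project-scoped rebuild
print(rebuild_delete_filter())          # whole Redmine source sample
```
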
Refresh a balanced multi-project sample:

```sh
python3 -m semantic_index --backfill-redmine-projects \
  --projects customer-service,hiring,todo-jason,sales-inbox,business-development,dock-scheduling,prep-standardization \
  --per-project-limit 100
```

Use project-specific limits when Customer Service should stay larger than the internal project sample:

```sh
python3 -m semantic_index --backfill-redmine-projects \
  --project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100
```

Multi-project backfill rebuilds each project scope independently. Non-Helpdesk projects are indexed as ordinary Redmine issues and journals; they are not expected to have Helpdesk contact metadata.

## Rolling Refresh

Use rolling refresh for routine updates after an initial backfill:

```sh
python3 -m semantic_index --refresh-redmine-projects \
  --project-limits customer-service=500,hiring=200,todo-jason=200,sales-inbox=100,business-development=100,dock-scheduling=100,prep-standardization=100 \
  --dry-run
```

Dry-run reports what would change without calling OpenAI or writing to Qdrant. Remove `--dry-run` to apply the refresh.

The refresh maps each recent Redmine issue to stable document IDs, reads the existing Qdrant payloads for that issue, and compares `source_hash` values. Only new or changed documents are embedded and upserted. Unchanged documents are left alone, and stale documents for refreshed issues are deleted without embedding. Use `--force-rebuild` only when you explicitly want to re-embed matching documents.

The default local state file is `.cache/semantic_index/refresh_state.json`. After a successful refresh, later runs skip issues older than the previous success timestamp minus `--overlap-minutes` unless `--force-rebuild` is used.

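
The comparison and cutoff rules described above can be sketched roughly as follows. This is not the service's actual code: `plan_refresh`, `refresh_cutoff`, and the plain `{doc_id: source_hash}` dictionaries are hypothetical stand-ins for whatever the refresh reads from Redmine and Qdrant.

```python
from datetime import datetime, timedelta, timezone


def plan_refresh(fresh, indexed, force_rebuild=False):
    """Classify the document IDs of one issue.

    fresh:   {doc_id: source_hash} computed from the current Redmine issue
    indexed: {doc_id: source_hash} read from the existing Qdrant payloads
    """
    # New or changed documents (or everything, when forcing a rebuild) get embedded.
    embed = {
        doc_id for doc_id, digest in fresh.items()
        if force_rebuild or indexed.get(doc_id) != digest
    }
    # Documents whose hash matches are left alone.
    unchanged = set(fresh) - embed
    # Indexed documents the issue no longer produces are deleted without embedding.
    stale = set(indexed) - set(fresh)
    return {
        "embed": sorted(embed),
        "unchanged": sorted(unchanged),
        "delete": sorted(stale),
    }


def refresh_cutoff(last_success, overlap_minutes):
    """Issues last updated before this moment are skipped on the next run."""
    return last_success - timedelta(minutes=overlap_minutes)


fresh = {"redmine:issue:123:chunk:0": "aaa", "redmine:issue:123:chunk:1": "bbb"}
indexed = {
    "redmine:issue:123:chunk:0": "aaa",  # same hash: unchanged
    "redmine:issue:123:chunk:1": "old",  # different hash: re-embed
    "redmine:issue:123:chunk:2": "ccc",  # no longer produced: delete
}
print(plan_refresh(fresh, indexed))
print(refresh_cutoff(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 30))
```

In this sketch, `last_success` stands in for the previous success timestamp stored in the state file.
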
Override the state file path with:

```sh
python3 -m semantic_index --refresh-redmine-projects \
  --project-limits customer-service=500 \
  --state-path /tmp/semantic-refresh-state.json
```

The HTTP endpoint exposes the same behavior:

```sh
curl -sS -X POST http://127.0.0.1:8787/sources/redmine/refresh \
  -H 'Content-Type: application/json' \
  -d '{"project_limits":{"customer-service":500},"dry_run":true}'
```

For production-style operation, use the wrapper script. It defaults to dry-run and writes timestamped logs under `.cache/semantic_index/logs`:

```sh
semantic_index/refresh.sh
semantic_index/refresh.sh --apply
```

For a quick smoke check of the wrapper path:

```sh
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=5' semantic_index/refresh.sh
```

Override project limits, state path, or log location through environment variables:

```sh
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=500,hiring=200' \
SEMANTIC_INDEX_LOG_DIR=/var/log/semantic-index \
SEMANTIC_INDEX_STATE_PATH=/var/lib/semantic-index/refresh_state.json \
semantic_index/refresh.sh --apply
```

Do not schedule `--force-rebuild`. Force rebuilds should stay manual because they intentionally re-embed unchanged documents.

## MCP Stdio

```sh
python3 -m semantic_index --mcp-stdio
```

Tools:

- `semantic_search`
- `semantic_get_document`
- `semantic_list_projects`
- `semantic_backfill_redmine_sample`
- `semantic_refresh_redmine`

For agent workflows, list projects first when the user has not named a project, search broadly or with `project_identifier` when known, then call `semantic_get_document` for any promising result. Treat returned citations and Redmine URLs as the authoritative references. Backfill tools are operational and should not be part of normal search behavior.

## Inspection CLI

Use the inspect commands before larger backfills to see what is already indexed or preview what Redmine would produce without writing to Qdrant.

```sh
python3 -m semantic_index inspect count --source redmine --project customer-service
python3 -m semantic_index inspect list --limit 20 --source redmine --project customer-service
python3 -m semantic_index inspect search "order status" --limit 5 --project customer-service
python3 -m semantic_index inspect search "customer@example.com" --limit 5 --project customer-service
python3 -m semantic_index inspect show redmine:issue:39778:chunk:0
python3 -m semantic_index inspect preview-redmine --limit 10 --project customer-service
python3 -m semantic_index inspect audit --source redmine --project customer-service --limit 500
python3 -m semantic_index inspect compare-redmine --project customer-service --limit 20
python3 -m semantic_index inspect smoke-search --project customer-service
```

`count`, `list`, `show`, and `preview-redmine` do not call OpenAI. `search` embeds the query text. List/search output shows snippets by default; pass `--full-text` when you need the full indexed text.

`audit` summarizes indexed document coverage without calling OpenAI. `compare-redmine` previews live Redmine chunks and compares them to indexed Qdrant documents without writing to Qdrant. `smoke-search` runs known search checks and calls OpenAI for query embeddings. Pass `--json` to `audit`, `compare-redmine`, or `smoke-search` for machine-readable output.

For mixed project samples, run `audit` without `--project` to see project-level counts and Helpdesk-contact coverage separately from ordinary internal issues.

For Helpdesk tickets, Redmine issue ingestion expects `/issues/:id.json?include=journals,helpdesk` to return `helpdesk_ticket` metadata with an expanded contact. See `docs/redmine_issue_api_helpdesk_include.md` for the Redmine API patch notes.

## Qdrant

For local Docker-hosted Qdrant:

```sh
docker run -p 6333:6333 -p 6334:6334 -v qdrant_storage:/qdrant/storage qdrant/qdrant
```

Create snapshots with Qdrant's snapshot API or mounted storage tooling before destructive maintenance.
The default collection name is `redmine_semantic_sample`.
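
As a rough sketch of the snapshot step, the following builds (but does not send) a request to Qdrant's REST snapshot endpoint, `POST /collections/{collection_name}/snapshots`, using only the standard library. The helper name `build_snapshot_request` is hypothetical, and the fallback URL `http://127.0.0.1:6333` is an assumption based on the Docker port mapping above.

```python
import os
import urllib.request


def build_snapshot_request(qdrant_url, collection, api_key=None):
    """Build a POST request that asks Qdrant to snapshot one collection."""
    headers = {}
    if api_key:
        # Qdrant reads its API key from the "api-key" header.
        headers["api-key"] = api_key
    url = f"{qdrant_url.rstrip('/')}/collections/{collection}/snapshots"
    return urllib.request.Request(url, headers=headers, method="POST")


request = build_snapshot_request(
    os.environ.get("QDRANT_URL", "http://127.0.0.1:6333"),  # assumed local default
    os.environ.get("QDRANT_COLLECTION", "redmine_semantic_sample"),
    api_key=os.environ.get("QDRANT_API_KEY"),
)
print(request.get_method(), request.full_url)
# To create the snapshot against a running Qdrant instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
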