Add semantic-index service, deployment assets, and tests

Jason Thistlethwaite
2026-05-04 09:50:03 -04:00
parent faad70872b
commit b305544f63
42 changed files with 5059 additions and 0 deletions
@@ -0,0 +1,182 @@
# Semantic Index Pre-Deployment Validation
Validation date: `2026-04-25`
This document records the LAN pre-deployment checks run against the semantic
index as of the validation date. It does not include secrets.
## Deploy Unit
Semantic-index deployable files are documented in:
- `dist/semantic-index-v1-predeployment-20260425T150000Z.MANIFEST.md`
- `docs/semantic_index_deployment_runbook.md`
The following known worktree changes are unrelated to the semantic-index deploy
unit and should not be mixed into the semantic-index release package:
- `redMCP/README.md`
- `redMCP/app/McpDispatcher.php`
- `redMCP/app/RedmineClient.php`
- `redMCP/composer.json`
- `redMCP/bin/test-redmine-structure.php`
- `TODO.md`
## Local Verification
Passed:
```sh
.venv/bin/python -m py_compile semantic_index/*.py
.venv/bin/python -m unittest discover -s tests/semantic_index
bash -n semantic_index/refresh.sh
```
Observed semantic test result:
```text
Ran 65 tests in 1.041s
OK
```
## LAN Redmine Preview
Passed:
```sh
.venv/bin/python -m semantic_index inspect preview-redmine \
--project customer-service \
--limit 5
```
Observed:
- Helpdesk issue chunks include contact id, name, email, and company metadata.
- Issue `39779` includes Callum Mackeonis and `callum@safetagtracking.com`.
- Journals are present as separate indexed documents.
- Contact documents are present as separate indexed documents.
## Qdrant Audit
Passed:
```sh
.venv/bin/python -m semantic_index inspect audit --source redmine --limit 5000 --json
```
Observed:
```text
total_documents=2947
doc_type contact=714
doc_type issue=1208
doc_type journal=1025
project business-development=66
project customer-service=1684
project dock-scheduling=63
project hiring=409
project prep-standardization=25
project sales-inbox=192
project todo-jason=508
contact_metadata=2232
helpdesk_contact_metadata=2232/2232
attachments=0
```
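The audit counts above can be cross-checked mechanically. The following is a minimal sketch, assuming the audit output is available in the `key=value` text form shown above; `parse_audit` and `check_audit` are hypothetical helpers, not part of `semantic_index`. It verifies that each grouped breakdown (`doc_type`, `project`) sums to `total_documents`.

```python
from collections import defaultdict

# Audit output copied from the observed result above.
AUDIT_TEXT = """\
total_documents=2947
doc_type contact=714
doc_type issue=1208
doc_type journal=1025
project business-development=66
project customer-service=1684
project dock-scheduling=63
project hiring=409
project prep-standardization=25
project sales-inbox=192
project todo-jason=508
contact_metadata=2232
helpdesk_contact_metadata=2232/2232
attachments=0
"""


def parse_audit(text):
    """Split audit lines into plain scalars and grouped counts.

    Lines such as 'doc_type issue=1208' become groups['doc_type']['issue'];
    lines such as 'total_documents=2947' become scalars['total_documents'].
    """
    scalars = {}
    groups = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        if " " in key:
            group, name = key.split(" ", 1)
            groups[group][name] = int(value)
        else:
            # Kept as text because some values are ratios like '2232/2232'.
            scalars[key] = value
    return scalars, dict(groups)


def check_audit(text):
    """Return a list of problems; an empty list means the breakdowns add up."""
    scalars, groups = parse_audit(text)
    total = int(scalars["total_documents"])
    problems = []
    for group, counts in groups.items():
        subtotal = sum(counts.values())
        if subtotal != total:
            problems.append(f"{group} counts sum to {subtotal}, expected {total}")
    return problems


print(check_audit(AUDIT_TEXT) or "audit counts are consistent")
```

For the figures recorded here, both the `doc_type` and `project` breakdowns sum to 2947, so the check reports no problems.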
## HTTP Validation
Passed:
```sh
curl -sS http://127.0.0.1:8787/health
```
Observed:
```json
{"status":"ok"}
```
With `SEMANTIC_INDEX_API_KEY` configured, an unauthenticated request to
`/projects` was rejected as unauthorized, as expected.
Authenticated `/projects` passed and returned the expected seven projects:
```text
business-development
customer-service
dock-scheduling
hiring
prep-standardization
sales-inbox
todo-jason
```
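The project check above could be repeated on the target host with a small script. This is a sketch under stated assumptions: the header name `X-API-Key` and a response body that is a JSON list of project identifiers are both guesses, not confirmed by this document, and `fetch_projects` and `compare_projects` are hypothetical helpers.

```python
import json
import urllib.request

# The seven projects listed above.
EXPECTED_PROJECTS = {
    "business-development",
    "customer-service",
    "dock-scheduling",
    "hiring",
    "prep-standardization",
    "sales-inbox",
    "todo-jason",
}


def compare_projects(returned, expected=EXPECTED_PROJECTS):
    """Return (missing, unexpected) project identifiers, each sorted."""
    returned = set(returned)
    expected = set(expected)
    return sorted(expected - returned), sorted(returned - expected)


def fetch_projects(base_url, api_key, header_name="X-API-Key"):
    """Fetch /projects from the service.

    ASSUMPTIONS: the header name and the response shape (a JSON list of
    project identifiers) are illustrative; adjust both to the real service.
    """
    request = urllib.request.Request(
        f"{base_url}/projects", headers={header_name: api_key}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Offline demonstration with a made-up response; no network call is made.
missing, unexpected = compare_projects(["customer-service", "hiring", "archive"])
print("missing:", missing)
print("unexpected:", unexpected)
```

Against a live service, the call would look like `compare_projects(fetch_projects("http://127.0.0.1:8787", api_key))`, with both returned lists expected to be empty.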
HTTP search passed:
```sh
semantic_index/search.sh "goods return" customer-service 3
```
Observed:
- Top result was `redmine:issue:39779:chunk:0`.
- Citation included project `customer-service`.
- Citation included contact id `1890`, contact name, contact email, and Redmine
URL.
## Refresh Validation
Passed safe dry-run smoke check:
```sh
SEMANTIC_INDEX_PROJECT_LIMITS='customer-service=5' semantic_index/refresh.sh
```
Observed:
```text
mode=dry-run
issues=5
scanned_issues=5
detail_fetched_issues=0
skipped_issues=5
would_embed_documents=0
embedded_documents=0
```
This confirms the refresh state prefilter skips old issues before the Redmine
detail fetch and before embedding: all 5 scanned issues were skipped, with no
detail fetches and no documents embedded.
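To illustrate the kind of prefilter described here, the following is a minimal sketch and not the actual `semantic_index` implementation. It assumes the refresh state maps each issue id to the `updated_on` timestamp seen at the last refresh, and that an issue is skipped when its current `updated_on` is no newer than the stored one. The function name, state layout, and issue ids are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prefilter(issues, state):
    """Decide which issues need a detail fetch.

    issues: list of dicts with 'id' and 'updated_on' (from a list query).
    state:  dict mapping issue id -> 'updated_on' stored by the last refresh
            (ASSUMED layout, for illustration only).
    Returns (ids_to_fetch, counters).
    """
    to_fetch = []
    counters = {"scanned_issues": 0, "skipped_issues": 0}
    for issue in issues:
        counters["scanned_issues"] += 1
        seen = state.get(issue["id"])
        if seen is not None and parse_ts(issue["updated_on"]) <= parse_ts(seen):
            # Unchanged since the last refresh: no detail fetch, no embedding.
            counters["skipped_issues"] += 1
            continue
        to_fetch.append(issue["id"])
    return to_fetch, counters


# Illustrative data: five issues, all already recorded in the state.
state = {i: "2026-04-20T12:00:00Z" for i in range(1, 6)}
issues = [{"id": i, "updated_on": "2026-04-20T12:00:00Z"} for i in range(1, 6)]
to_fetch, counters = prefilter(issues, state)
print(to_fetch, counters)
```

With every issue already present in the state, nothing is queued for a detail fetch, which mirrors the `skipped_issues=5` and `detail_fetched_issues=0` pattern in the dry-run output.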
## Qdrant Validation
Read-only collection check passed:
```text
collection=redmine_semantic_sample
status=green
vector_size=1536
distance=Cosine
points_count=2947
update_queue.length=0
```
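The collection check can also be expressed as a comparison against expected values. This sketch assumes the response shape of Qdrant's collection-info endpoint (`GET /collections/{name}`) with a single unnamed vector configuration; collections with named vectors nest the size and distance one level deeper. The helper names are hypothetical, and the sample response is built from the values recorded above rather than captured from the server.

```python
# Expected values taken from the read-only check above.
EXPECTED = {
    "status": "green",
    "vector_size": 1536,
    "distance": "Cosine",
    "points_count": 2947,
}


def summarize_collection(info):
    """Flatten the fields of interest from a collection-info response.

    ASSUMPTION: a single unnamed vector config under config.params.vectors.
    """
    result = info["result"]
    vectors = result["config"]["params"]["vectors"]
    return {
        "status": result["status"],
        "vector_size": vectors["size"],
        "distance": vectors["distance"],
        "points_count": result["points_count"],
    }


def collection_mismatches(info, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    actual = summarize_collection(info)
    return {
        field: (actual[field], want)
        for field, want in expected.items()
        if actual[field] != want
    }


# Sample response assembled from the values in this report (not a capture).
sample = {
    "result": {
        "status": "green",
        "points_count": 2947,
        "config": {"params": {"vectors": {"size": 1536, "distance": "Cosine"}}},
    }
}
print(collection_mismatches(sample) or "collection matches expectations")
```

A non-empty result is worth investigating before a backfill; a changed `points_count` on its own may simply mean a refresh has run since this validation.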
Read-only snapshot listing endpoint responded successfully:
```text
/collections/redmine_semantic_sample/snapshots
result=[]
```
No snapshot was created during this validation.
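Because the listing was empty, a snapshot still has to be created or confirmed before any production backfill. The sketch below shows one way to check a snapshot listing for a recent entry. It assumes each entry carries `name` and `creation_time` fields, as in Qdrant's snapshot listing response, and that `creation_time` is an ISO 8601 string without a timezone; the one-hour window and the snapshot name are arbitrary illustrations.

```python
from datetime import datetime, timedelta


def recent_snapshots(listing, now, max_age=timedelta(hours=1)):
    """Return names of snapshots created within max_age of 'now'.

    ASSUMPTIONS: 'listing' is the parsed JSON of the snapshots endpoint, and
    'creation_time' is a timezone-naive ISO 8601 string comparable to 'now'.
    """
    names = []
    for snapshot in listing.get("result", []):
        created = snapshot.get("creation_time")
        if not created:
            continue  # Entry without a timestamp: cannot be confirmed as recent.
        age = now - datetime.fromisoformat(created)
        if timedelta(0) <= age <= max_age:
            names.append(snapshot["name"])
    return names


now = datetime(2026, 4, 25, 15, 0, 0)

# The listing observed in this validation: no snapshots at all.
print(recent_snapshots({"result": []}, now))

# A hypothetical listing after a snapshot has been taken.
hypothetical = {
    "result": [
        {"name": "example.snapshot", "creation_time": "2026-04-25T14:45:00"}
    ]
}
print(recent_snapshots(hypothetical, now))
```

An empty return value would be a reason to hold the backfill until a snapshot exists.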
## Remaining Pre-Deployment Items
- Decide final target host paths for logs and refresh state.
- Decide how the service will be supervised: manual `uvicorn`, a systemd
service, or another supervisor.
- Create or confirm a Qdrant snapshot immediately before production backfill.
- Package only the semantic-index deploy unit, keeping unrelated `redMCP`
worktree changes out of the release.
- Keep scheduled refresh disabled until manual dry-run and `--apply` logs are
reviewed on the target host.