Hybrid Retrieval (SPEC-05)
The retrieval surface that every research consumer calls. Rule-based classifier + 4 routes + token-budgeted serializer.
Classifier decision tree
Hard constraints:
- p95 latency < 5ms
- Deterministic (same query → same route)
- Auditable (rule_id logged in memory.retrieval.hybrid)
The classifier is validated against an 80-query labeled fixture (apps/research/memory-store/services/tests/fixtures/classifier_test_set.json) and must reach ≥ 90% accuracy on it.
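One way to meet the determinism and auditability constraints is an ordered rule list evaluated first-match-wins, where every decision carries the id of the rule that fired. The sketch below is illustrative only: the patterns and the rule ids `AGG-1`, `TXT-1`, and `GRF-1` are hypothetical, and treating `DEF-1` (the id shown in the audit example) as the default rule is an assumption.

```python
import re
from typing import NamedTuple


class Classification(NamedTuple):
    route: str
    rule_id: str


# Hypothetical rules, checked in order; the first match wins.
# Patterns are precompiled so a lookup stays cheap and repeatable.
_RULES = [
    ("AGG-1", re.compile(r"^\s*how many\b", re.IGNORECASE), "aggregation"),
    ("TXT-1", re.compile(r"^\s*why\b", re.IGNORECASE), "text"),
    ("GRF-1", re.compile(r"\bpatient\b", re.IGNORECASE), "graph"),
]


def classify(query: str) -> Classification:
    """Return the route and the id of the rule that selected it."""
    for rule_id, pattern, route in _RULES:
        if pattern.search(query):
            return Classification(route, rule_id)
    # Assumed default: no rule matched, so fall through to HYBRID.
    return Classification("hybrid", "DEF-1")
```

Because the rules are pure functions of the query string, the same query always yields the same route, and the returned rule id is what would be written to the audit event.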
The 4 routes
| Route | Strategy | When used |
|---|---|---|
| GRAPH (STRUCTURAL) | $graphLookup up to depth 4 over memory_relationships | Entity-to-entity walks, patient-context queries |
| AGGREGATION | $match on taxonomy_category + shallow 1-hop $graphLookup + count-sort | "How many patients with X have Y?" |
| TEXT (EXPLANATORY) | Atlas Search BM25 + 1-hop entity enrichment | "Why does metformin affect HbA1c?" |
| HYBRID | graph ∪ BM25 over memory_claims with dedupe + BM25 limit 15 | Default + 2-entity queries |
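The HYBRID row combines graph results with BM25 results, deduplicates them, and caps BM25 at 15 hits. A minimal sketch of that merge step follows, assuming each hit is a dict carrying a `claim_id` key (a hypothetical field name) and that graph hits take precedence over BM25 duplicates; the actual ordering and dedupe key are not stated in this document.

```python
from typing import Iterable

BM25_LIMIT = 15  # cap taken from the HYBRID row of the route table


def merge_hybrid(
    graph_hits: Iterable[dict],
    bm25_hits: Iterable[dict],
    key: str = "claim_id",  # hypothetical dedupe key
) -> list[dict]:
    """Union graph and BM25 hits, keeping the first occurrence of each key."""
    merged: list[dict] = []
    seen: set = set()
    capped = list(bm25_hits)[:BM25_LIMIT]
    for hit in [*graph_hits, *capped]:
        if hit[key] in seen:
            continue  # already contributed by an earlier hit
        seen.add(hit[key])
        merged.append(hit)
    return merged
```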
Context serializer
Renders two-tier output with token budgets and PHI redaction:
=== PATIENT GRAPH ===
Patient: PAT_001 → Condition: Type 2 Diabetes → Medication: Metformin
Patient: PAT_001 → Lab: HbA1c (7.2%)
...
=== SUPPORTING TEXT ===
[redacted] (claim text with pii_fields stripped)
Clinical note excerpt 1...
Clinical note excerpt 2...
...
Budgets (tokens):
- graph tier: 400
- text tier: 300–800 depending on route
- Total capped per route
When a tier exceeds its budget, its output is truncated and the response carries a truncated: true flag. Token counts use tiktoken for exact values and fall back to whitespace splitting when tiktoken is unavailable.
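The budget and fallback behavior can be sketched as follows. This is an illustration under assumptions, not the serializer's code: the `cl100k_base` encoding is a guess (the document does not name one), and the helper names `make_token_counter` and `fit_lines` are hypothetical.

```python
from typing import Callable


def whitespace_count(text: str) -> int:
    """Approximate token count used when tiktoken cannot be loaded."""
    return len(text.split())


def make_token_counter() -> Callable[[str], int]:
    """Prefer tiktoken for exact counts; otherwise fall back to whitespace."""
    try:
        import tiktoken

        # Assumed encoding; loading it can also fail without network access.
        enc = tiktoken.get_encoding("cl100k_base")
    except Exception:
        return whitespace_count
    return lambda text: len(enc.encode(text))


def fit_lines(
    lines: list[str],
    budget: int,
    count: Callable[[str], int] = whitespace_count,
) -> tuple[list[str], bool]:
    """Keep whole lines until the budget is hit; return (kept, truncated)."""
    kept: list[str] = []
    used = 0
    for line in lines:
        cost = count(line)
        if used + cost > budget:
            return kept, True  # this line and everything after it is dropped
        kept.append(line)
        used += cost
    return kept, False
```

For the graph tier this would be called as `fit_lines(graph_lines, 400, make_token_counter())`, with the returned flag feeding the `truncated` field.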
HybridRetriever service
Composes classify → dispatch → serialize → audit:
class HybridRetriever:
    async def retrieve(
        self, query: str, *,
        patient_id: str | None = None,
        depth: int = 3,
        force_route: str | None = None,  # bypass classifier
        actor_id: str | None = None,
        auth_source: str | None = None,
    ) -> RetrievalResult:
        ...
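The classify → dispatch → serialize → audit composition can be pictured as a short pipeline over injected callables. The sketch below is a simplified stand-in, not the body of `HybridRetriever.retrieve`: `SketchResult`, `retrieve_sketch`, the callable signatures, and the `"FORCED"` rule id are all hypothetical.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class SketchResult:
    route: str
    classifier_rule_id: str
    context: str
    retrieval_latency_ms: float


async def retrieve_sketch(query, *, classify, routes, serialize, audit,
                          force_route=None):
    """Run the four stages in order and emit one audit event."""
    started = time.perf_counter()
    if force_route is not None:
        # Bypass the classifier; "FORCED" is a placeholder rule id.
        route, rule_id = force_route, "FORCED"
    else:
        route, rule_id = classify(query)
    rows = await routes[route](query)   # dispatch to the selected route
    context = serialize(rows)           # render the budgeted context
    latency_ms = (time.perf_counter() - started) * 1000.0
    audit({
        "route": route,
        "classifier_rule_id": rule_id,
        "force_route_used": force_route is not None,
    })
    return SketchResult(route, rule_id, context, latency_ms)
```

Passing the stages in as callables keeps each one testable in isolation, which is one plausible reason to structure the service this way.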
Response shape (RetrievalResult):
{
"context": AssembledContext, # two-tier rendered
"query_type": QueryType,
"route": RetrievalStrategy, # graph / aggregation / text / hybrid
"classifier_rule_id": str,
"classifier_confidence": float,
"retrieval_latency_ms": float,
"graph_edges_count": int,
"text_chunks_count": int,
"entities_referenced": list[str], # NOT logged — count-only in audit
"consent_filtered_count": int,
}
HTTP endpoint
POST /api/memory/retrieve
Authorization: Bearer <service-account-jwt>
Request:
{
"query": "What medications is patient on?",
"patient_id": "PAT_001", // FR-013d: required for non-admin
"depth": 3, // 1-4 per @Field(ge=1, le=4)
"force_route": null // optional bypass
}
Response 200: RetrievalResult
Response 400: Pydantic validation
Response 403: patient_id missing for non-admin
Response 422: invalid force_route or depth
Response 429: rate-limit (100/min per IP default)
Response 503: HYBRID_RETRIEVAL_ENABLED=false or Mongo error
Rate-limit: @limiter.limit("100/minute") via slowapi.
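For callers that do not use the SDK, the request above can be assembled with the standard library alone. This is a sketch under assumptions: `build_retrieve_request` is a hypothetical helper, the base URL in the usage note is a placeholder, and the client-side depth check only mirrors the documented 1-4 range rather than replacing server validation.

```python
import json
import urllib.request


def build_retrieve_request(base_url, token, query, patient_id=None,
                           depth=3, force_route=None):
    """Build (but do not send) a POST /api/memory/retrieve request."""
    if not 1 <= depth <= 4:
        # Fail early instead of waiting for the server to reject the depth.
        raise ValueError("depth must be between 1 and 4")
    body = {
        "query": query,
        "patient_id": patient_id,
        "depth": depth,
        "force_route": force_route,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/memory/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` for the non-200 responses listed above; its `code` attribute carries the status, so a caller can branch on 403, 429, or 503.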
SDK
import asyncio

from patientrx_memory_sdk import MemoryClient, AsyncMemoryClient


async def main() -> None:
    async with AsyncMemoryClient(base_url=..., token=...) as client:
        result = await client.retrieve(
            query="medications for PAT_001",
            patient_id="PAT_001",
            depth=3,
        )
    # Handle MemoryRetrievalError on 503 → legacy BM25 fallback


asyncio.run(main())
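The 503 handling mentioned in the comment above could look roughly like this. The sketch defines its own stand-in `MemoryRetrievalError` so that it runs without the SDK; where the real exception lives and what it carries are not specified here, and `retrieve_with_fallback` and `legacy_bm25` are hypothetical names.

```python
import asyncio


class MemoryRetrievalError(Exception):
    """Stand-in for the SDK error named above; the real class may differ."""


async def retrieve_with_fallback(primary, legacy_bm25, query):
    """Try hybrid retrieval first; on failure, use the legacy BM25 path."""
    try:
        return await primary(query)
    except MemoryRetrievalError:
        # Hybrid retrieval is disabled or unavailable (for example a 503).
        return await legacy_bm25(query)
```

In real use, `primary` would wrap `client.retrieve(...)` and `legacy_bm25` would wrap whatever pre-hybrid search the consumer already has.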
Audit
Every retrieval emits memory.retrieval.hybrid:
{
"action": "memory.retrieval.hybrid",
"source": "memory-store",
"patient_id": "PAT_001",
"authSource": "standing",
"payload": {
"query_type": "HYBRID",
"route": "hybrid",
"classifier_rule_id": "DEF-1",
"classifier_confidence": 0.85,
"retrieval_latency_ms": 42.0,
"graph_edges_count": 12,
"text_chunks_count": 5,
"total_tokens": 1247,
"consent_filtered_count": 0,
"entities_referenced_count": 42, # count-only per FR-SPEC-05-015
"depth": 3,
"force_route_used": false,
"outcome": "success"
}
}
The entities_referenced list is never logged. Downstream consumers that need entity IDs store them envelope-encrypted in llm_call_logs.metadata.
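The count-only rule can be enforced at the point where the audit payload is built, so the entity list never reaches the logger. A minimal sketch, assuming the result is available as a plain dict with the RetrievalResult fields; `build_audit_payload` is a hypothetical helper, and fields not present on RetrievalResult (such as total_tokens) are left out.

```python
def build_audit_payload(result: dict, *, depth: int,
                        force_route_used: bool) -> dict:
    """Copy audit-safe fields and replace the entity list with its length."""
    return {
        "query_type": result["query_type"],
        "route": result["route"],
        "classifier_rule_id": result["classifier_rule_id"],
        "classifier_confidence": result["classifier_confidence"],
        "retrieval_latency_ms": result["retrieval_latency_ms"],
        "graph_edges_count": result["graph_edges_count"],
        "text_chunks_count": result["text_chunks_count"],
        "consent_filtered_count": result["consent_filtered_count"],
        # Count only: the ids themselves are deliberately not copied.
        "entities_referenced_count": len(result["entities_referenced"]),
        "depth": depth,
        "force_route_used": force_route_used,
        "outcome": "success",
    }
```

Building the payload from an explicit allow-list of keys, rather than copying the result and deleting one field, means a newly added sensitive field is excluded by default.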