
Hybrid Retrieval (SPEC-05)

The retrieval surface that every research consumer calls: a rule-based classifier, four retrieval routes, and a token-budgeted serializer.

Classifier decision tree

Hard constraints:

  • p95 latency < 5ms
  • Deterministic (same query → same route)
  • Auditable (the matching rule's ID is logged as classifier_rule_id in the memory.retrieval.hybrid audit event)

Classification accuracy is measured on an 80-query labeled fixture (apps/research/memory-store/services/tests/fixtures/classifier_test_set.json) and must be ≥ 90%.
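A minimal Python sketch of the kind of rule-based classifier these constraints describe: an ordered list of regex rules where the first match wins, which keeps routing deterministic and gives every decision a rule ID to log. The rule IDs AGG-1, TXT-1, and STR-1, their patterns, and the fixture keys query and expected_route are hypothetical; only DEF-1 and its 0.85 confidence come from the audit example on this page, and the STRUCTURAL/EXPLANATORY query types are read off the route table.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    query_type: str
    route: str
    rule_id: str
    confidence: float


# Hypothetical ordered rules: (rule_id, pattern, query_type, route, confidence).
# Rules are tried top to bottom and the first match wins, so the same query
# always lands on the same route, and the rule_id records why.
RULES = [
    ("AGG-1", re.compile(r"\bhow many\b|\bcount\b", re.IGNORECASE), "AGGREGATION", "aggregation", 0.9),
    ("TXT-1", re.compile(r"^\s*(why|explain)\b", re.IGNORECASE), "EXPLANATORY", "text", 0.9),
    ("STR-1", re.compile(r"\bPAT_\d+\b"), "STRUCTURAL", "graph", 0.9),
]
# Fallback when no rule matches; DEF-1 / 0.85 mirror the audit example.
DEFAULT = Classification("HYBRID", "hybrid", "DEF-1", 0.85)


def classify(query: str) -> Classification:
    for rule_id, pattern, query_type, route, confidence in RULES:
        if pattern.search(query):
            return Classification(query_type, route, rule_id, confidence)
    return DEFAULT


def accuracy(fixture: list[dict]) -> float:
    """Fraction of fixture items whose predicted route equals the labeled route."""
    if not fixture:
        return 0.0
    hits = sum(classify(item["query"]).route == item["expected_route"] for item in fixture)
    return hits / len(fixture)
```

Matching a few precompiled patterns is typically far below a 5 ms budget, but the real rule set and its measured latency are not shown here.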

The 4 routes

| Route | Strategy | When used |
| --- | --- | --- |
| GRAPH (STRUCTURAL) | $graphLookup up to depth 4 over memory_relationships | Entity-to-entity walks, patient-context queries |
| AGGREGATION | $match on taxonomy_category + shallow 1-hop $graphLookup + count-sort | "How many patients with X have Y?" |
| TEXT (EXPLANATORY) | Atlas Search BM25 + 1-hop entity enrichment | "Why does metformin affect HbA1c?" |
| HYBRID | graph ∪ BM25 over memory_claims with dedupe + BM25 limit 15 | Default route; 2-entity queries |
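The GRAPH and HYBRID strategies can be sketched as follows. The field names entity_id, source_id, target_id, and _id are assumptions, as is the mapping from the API's depth (1-4) to MongoDB's zero-based maxDepth; the BM25 hits are assumed to arrive already sorted by score.

```python
from __future__ import annotations


def graph_lookup_stage(depth: int) -> dict:
    """Build a $graphLookup stage that walks memory_relationships for `depth` hops.

    Field names are hypothetical. MongoDB's maxDepth is zero-based
    (maxDepth=0 is a single, non-recursive hop), so `depth` hops map to
    maxDepth=depth - 1 in this sketch.
    """
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    return {
        "$graphLookup": {
            "from": "memory_relationships",
            "startWith": "$entity_id",         # hypothetical field on the seed documents
            "connectFromField": "target_id",   # hypothetical edge fields
            "connectToField": "source_id",
            "as": "edges",
            "maxDepth": depth - 1,
            "depthField": "hop",               # 0 for first-hop edges
        }
    }


def merge_hybrid(graph_hits: list[dict], bm25_hits: list[dict], bm25_limit: int = 15) -> list[dict]:
    """Union graph hits with the top `bm25_limit` BM25 hits, deduplicated by `_id`.

    Graph hits keep their order and come first. The BM25 limit is applied
    before deduplication, so a BM25 hit that duplicates a graph hit still
    uses up one of the `bm25_limit` slots.
    """
    merged: list[dict] = []
    seen: set = set()
    for hit in graph_hits + bm25_hits[:bm25_limit]:
        if hit["_id"] not in seen:
            seen.add(hit["_id"])
            merged.append(hit)
    return merged
```

Whether the real service applies the limit before or after deduplication is not stated; the sketch picks "before" so the BM25 tier never contributes more than 15 candidates.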

Context serializer

Renders two-tier output with token budgets and PHI redaction:

=== PATIENT GRAPH ===
Patient: PAT_001 → Condition: Type 2 Diabetes → Medication: Metformin
Patient: PAT_001 → Lab: HbA1c (7.2%)
...

=== SUPPORTING TEXT ===
[redacted] (claim text with pii_fields stripped)
Clinical note excerpt 1...
Clinical note excerpt 2...
...
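The two-tier layout can be sketched as below. This assumes each graph path arrives as a list of (kind, label) pairs and that pii_fields on a claim names other keys whose values must not appear in the rendered text; both shapes are guesses, not the service's actual data model.

```python
from __future__ import annotations


def render_path(path: list[tuple[str, str]]) -> str:
    """Render one graph path as 'Kind: label → Kind: label → ...'."""
    return " → ".join(f"{kind}: {label}" for kind, label in path)


def redact(claim: dict) -> str:
    """Replace the value of every field named in claim['pii_fields'] with [redacted]."""
    text = claim["text"]
    for field_name in claim.get("pii_fields", []):
        value = claim.get(field_name)
        if value:
            text = text.replace(str(value), "[redacted]")
    return text


def render_context(paths: list[list[tuple[str, str]]], claims: list[dict]) -> str:
    """Assemble the graph tier, a blank separator line, then the redacted text tier."""
    lines = ["=== PATIENT GRAPH ==="]
    lines += [render_path(p) for p in paths]
    lines += ["", "=== SUPPORTING TEXT ==="]
    lines += [redact(c) for c in claims]
    return "\n".join(lines)
```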

Budgets (tokens):

  • graph tier: 400
  • text tier: 300–800 depending on route
  • Total capped per route

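When a tier exceeds its budget, it is truncated and the response carries a truncated: true flag. Token counts come from tiktoken for exact counts; when tiktoken is unavailable, a whitespace-split count is used as a fallback.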
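A sketch of per-tier budgeting with the tiktoken-or-whitespace fallback described above. The cl100k_base encoding, the Tier type, and fit_to_budget are assumptions for illustration; the service's actual encoding and truncation boundary are not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass

try:
    import tiktoken
    _ENC = tiktoken.get_encoding("cl100k_base")  # assumed encoding name
except Exception:  # tiktoken missing, or its BPE file cannot be fetched
    _ENC = None


@dataclass
class Tier:
    text: str
    tokens: int
    truncated: bool


def fit_to_budget(text: str, budget: int, enc=_ENC) -> Tier:
    """Trim `text` to at most `budget` tokens and report whether it was cut."""
    if enc is not None:
        # disallowed_special=() encodes special-token strings as ordinary text
        ids = enc.encode(text, disallowed_special=())
        if len(ids) <= budget:
            return Tier(text, len(ids), False)
        return Tier(enc.decode(ids[:budget]), budget, True)
    # Whitespace fallback: each whitespace-separated word counts as one token.
    # Untruncated text is returned as-is; truncated text is re-joined with single spaces.
    words = text.split()
    if len(words) <= budget:
        return Tier(text, len(words), False)
    return Tier(" ".join(words[:budget]), budget, True)


GRAPH_BUDGET = 400  # graph tier budget from the list above
```

A response-level flag would then be graph_tier.truncated or text_tier.truncated, assuming the flag is raised when either tier is cut.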

HybridRetriever service

Composes classify → dispatch → serialize → audit:

class HybridRetriever:
    async def retrieve(
        self,
        query: str,
        *,
        patient_id: str | None = None,
        depth: int = 3,
        force_route: str | None = None,  # bypass classifier
        actor_id: str | None = None,
        auth_source: str | None = None,
    ) -> RetrievalResult:
        """Classify (unless force_route is set), dispatch, serialize, and audit."""
        ...
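The classify → dispatch → serialize → audit composition can be illustrated with injected callables. SketchRetriever, SketchResult, and the FORCED rule ID are hypothetical stand-ins, not the real HybridRetriever; the point is the ordering and the fact that only the entity count reaches the audit sink.

```python
from __future__ import annotations

import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    context: str
    route: str
    classifier_rule_id: str
    retrieval_latency_ms: float
    entities_referenced: list[str] = field(default_factory=list)


class SketchRetriever:
    """Hypothetical classify -> dispatch -> serialize -> audit composition."""

    def __init__(self, classify, routes, serialize, audit):
        self._classify = classify    # query -> (route, rule_id)
        self._routes = routes        # route name -> async fetch(query, patient_id, depth)
        self._serialize = serialize  # hits -> rendered context string
        self._audit = audit          # async sink for the audit event

    async def retrieve(self, query: str, *, patient_id=None, depth: int = 3, force_route=None):
        start = time.perf_counter()
        # 1. classify, unless the caller forces a route
        if force_route is not None:
            route, rule_id = force_route, "FORCED"  # hypothetical rule id
        else:
            route, rule_id = self._classify(query)
        if route not in self._routes:
            raise ValueError(f"unknown route: {route}")
        # 2. dispatch to the route's fetcher
        hits = await self._routes[route](query, patient_id, depth)
        # 3. serialize into the context string
        context = self._serialize(hits)
        entities = sorted({e for h in hits for e in h.get("entities", [])})
        latency_ms = (time.perf_counter() - start) * 1000
        result = SketchResult(context, route, rule_id, latency_ms, entities)
        # 4. audit with counts only; the entity ids stay out of the event
        await self._audit({
            "route": route,
            "classifier_rule_id": rule_id,
            "retrieval_latency_ms": latency_ms,
            "entities_referenced_count": len(entities),
            "force_route_used": force_route is not None,
        })
        return result


if __name__ == "__main__":
    async def _graph(query, patient_id, depth):
        return [{"text": "Patient: PAT_001 → Medication: Metformin", "entities": ["Metformin"]}]

    async def _print_event(event):
        print(event)

    retriever = SketchRetriever(lambda q: ("graph", "STR-1"), {"graph": _graph},
                                lambda hits: "\n".join(h["text"] for h in hits), _print_event)
    print(asyncio.run(retriever.retrieve("medications for PAT_001", patient_id="PAT_001")))
```

Keeping routes in a dict means an invalid force_route fails with a single lookup before any database work, which is one plausible way to back the 422 response listed for the endpoint.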

Response shape (RetrievalResult):

{
"context": AssembledContext, # two-tier rendered
"query_type": QueryType,
"route": RetrievalStrategy, # graph / aggregation / text / hybrid
"classifier_rule_id": str,
"classifier_confidence": float,
"retrieval_latency_ms": float,
"graph_edges_count": int,
"text_chunks_count": int,
"entities_referenced": list[str], # NOT logged — count-only in audit
"consent_filtered_count": int,
}

HTTP endpoint

POST /api/memory/retrieve
Authorization: Bearer <service-account-jwt>

Request:
{
"query": "What medications is patient on?",
"patient_id": "PAT_001", // FR-013d: required for non-admin
"depth": 3, // 1-4 per @Field(ge=1, le=4)
"force_route": null // optional bypass
}

Response 200: RetrievalResult
Response 400: Pydantic validation
Response 403: patient_id missing for non-admin
Response 422: invalid force_route or depth
Response 429: rate-limit (100/min per IP default)
Response 503: HYBRID_RETRIEVAL_ENABLED=false or Mongo error

Rate-limit: @limiter.limit("100/minute") via slowapi.
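The request rules above can be summarized as a plain function. check_request is hypothetical: in the service these checks live in the Pydantic model and the endpoint, and both the order in which they run and the 400-for-missing-query case are assumptions.

```python
from __future__ import annotations

# Route names from the RetrievalResult "route" comment.
VALID_ROUTES = {"graph", "aggregation", "text", "hybrid"}


def check_request(body: dict, *, is_admin: bool) -> int:
    """Return the status code the documented rules imply for a request body."""
    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        return 400  # assumption: a malformed body is reported as a validation error
    depth = body.get("depth", 3)
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 4:
        return 422  # depth outside @Field(ge=1, le=4)
    route = body.get("force_route")
    if route is not None and route not in VALID_ROUTES:
        return 422
    if body.get("patient_id") is None and not is_admin:
        return 403  # FR-013d: patient_id required for non-admin callers
    return 200
```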

SDK

import asyncio

from patientrx_memory_sdk import MemoryClient, AsyncMemoryClient


async def main() -> None:
    async with AsyncMemoryClient(base_url=..., token=...) as client:
        result = await client.retrieve(
            query="medications for PAT_001",
            patient_id="PAT_001",
            depth=3,
        )
        # Handle MemoryRetrievalError on 503 → legacy BM25 fallback


asyncio.run(main())
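One way to implement the fallback noted in the SDK comment. MemoryRetrievalError is defined locally as a stand-in, because the SDK's import path for it and the status attribute used here are not documented on this page; legacy_bm25 is likewise a hypothetical callable for the legacy search.

```python
from __future__ import annotations

import asyncio


class MemoryRetrievalError(Exception):
    """Local stand-in for the SDK error; the `status` attribute is an assumption."""

    def __init__(self, status: int):
        super().__init__(f"memory retrieval failed with HTTP {status}")
        self.status = status


async def retrieve_with_fallback(client, legacy_bm25, query: str, patient_id: str):
    """Try hybrid retrieval first; on a 503 fall back to the legacy BM25 search."""
    try:
        return await client.retrieve(query=query, patient_id=patient_id, depth=3)
    except MemoryRetrievalError as exc:
        if exc.status != 503:
            raise  # only 503 triggers the fallback in this sketch; other statuses propagate
        return await legacy_bm25(query)


if __name__ == "__main__":
    class _DownClient:
        async def retrieve(self, **kwargs):
            raise MemoryRetrievalError(503)

    async def _legacy(query: str):
        return {"route": "legacy_bm25", "query": query}

    print(asyncio.run(retrieve_with_fallback(_DownClient(), _legacy, "medications", "PAT_001")))
```

Re-raising everything except 503 keeps 403 and 422 visible to the caller instead of silently masking them with legacy results.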

Audit

Every retrieval emits a memory.retrieval.hybrid audit event:

{
"action": "memory.retrieval.hybrid",
"source": "memory-store",
"patient_id": "PAT_001",
"authSource": "standing",
"payload": {
"query_type": "HYBRID",
"route": "hybrid",
"classifier_rule_id": "DEF-1",
"classifier_confidence": 0.85,
"retrieval_latency_ms": 42.0,
"graph_edges_count": 12,
"text_chunks_count": 5,
"total_tokens": 1247,
"consent_filtered_count": 0,
"entities_referenced_count": 42, # count-only per FR-SPEC-05-015
"depth": 3,
"force_route_used": false,
"outcome": "success"
}
}

The entities_referenced list is never logged. Downstream consumers that need entity ids store them envelope-encrypted in llm_call_logs.metadata.
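A sketch of how the audit event could be assembled from a RetrievalResult so that entity ids never reach the log. build_audit_event and its keyword arguments are hypothetical; the field names match the example event above. Copying an explicit allow-list of fields, rather than deleting entities_referenced from a full copy, means a field added to RetrievalResult later is not logged by accident.

```python
from __future__ import annotations

# RetrievalResult fields copied verbatim into the audit payload (allow-list).
COPIED_FIELDS = (
    "query_type",
    "route",
    "classifier_rule_id",
    "classifier_confidence",
    "retrieval_latency_ms",
    "graph_edges_count",
    "text_chunks_count",
    "consent_filtered_count",
)


def build_audit_event(
    result: dict,
    *,
    patient_id: str | None,
    auth_source: str | None,
    depth: int,
    force_route_used: bool,
    total_tokens: int,
    outcome: str = "success",
) -> dict:
    """Build a memory.retrieval.hybrid event; entity ids are reduced to a count."""
    payload = {name: result[name] for name in COPIED_FIELDS}
    payload.update(
        total_tokens=total_tokens,
        entities_referenced_count=len(result["entities_referenced"]),
        depth=depth,
        force_route_used=force_route_used,
        outcome=outcome,
    )
    return {
        "action": "memory.retrieval.hybrid",
        "source": "memory-store",
        "patient_id": patient_id,
        "authSource": auth_source,
        "payload": payload,
    }
```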