Hybrid Retrieval (SPEC-05)

The retrieval surface that every research consumer calls. It combines a rule-based classifier, 4 routes, and a token-budgeted serializer.

Classifier decision tree

Hard constraints:

p95 latency < 5ms
Deterministic (same query → same route)
Auditable (rule_id logged in memory.retrieval.hybrid)

Accuracy: ≥ 90% on the 80-query labeled fixture (apps/research/memory-store/services/tests/fixtures/classifier_test_set.json).
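
The constraints above (deterministic, auditable, cheap to evaluate) can be illustrated with a minimal sketch of an ordered, first-match-wins rule list. The rule set itself is not shown in this document, so the patterns and the rule ids AGG-1, TXT-1, and GRF-1 below are hypothetical; only DEF-1 and the four route names come from this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str         # hypothetical id; this is what would be logged for audit
    pattern: re.Pattern  # hypothetical trigger pattern
    route: str           # one of: graph, aggregation, text, hybrid


# Ordered list: the first matching rule wins, so a given query
# always resolves to the same route and the same rule_id.
RULES = [
    Rule("AGG-1", re.compile(r"^\s*how many\b", re.I), "aggregation"),
    Rule("TXT-1", re.compile(r"^\s*(why|how does)\b", re.I), "text"),
    Rule("GRF-1", re.compile(r"\bPAT_\d+\b"), "graph"),
]


def classify(query: str) -> tuple[str, str]:
    """Return (route, rule_id) for a query; falls back to the hybrid default."""
    for rule in RULES:
        if rule.pattern.search(query):
            return rule.route, rule.rule_id
    return "hybrid", "DEF-1"
```

Because the rules are plain precompiled regexes evaluated in a fixed order, there is no model call or I/O on the classification path, which is one plausible way to stay inside a tight latency budget.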

The 4 routes

Route	Strategy	When used
GRAPH (STRUCTURAL)	`$graphLookup` up to depth 4 over `memory_relationships`	Entity-to-entity walks, patient-context queries
AGGREGATION	`$match` on `taxonomy_category` + shallow 1-hop `$graphLookup` + count-sort	"How many patients with X have Y?"
TEXT (EXPLANATORY)	Atlas Search BM25 + 1-hop entity enrichment	"Why does metformin affect HbA1c?"
HYBRID	graph ∪ BM25 over `memory_claims` with dedupe + BM25 limit 15	Default + 2-entity queries
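
As a rough illustration of two rows in the table, the sketch below builds a `$graphLookup` stage and merges graph and BM25 hits with dedupe. The field names (`entity_id`, `source_id`, `target_id`, `_id`) are assumptions, as is the reading that "depth" counts hops; the actual schema of `memory_relationships` and `memory_claims` is not given here.

```python
def graph_lookup_stage(depth: int = 4) -> dict:
    """Build a $graphLookup stage over memory_relationships (field names assumed)."""
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    return {
        "$graphLookup": {
            "from": "memory_relationships",
            "startWith": "$entity_id",        # assumed field on the input documents
            "connectFromField": "target_id",  # assumed edge field
            "connectToField": "source_id",    # assumed edge field
            "as": "walk",
            # MongoDB's maxDepth is zero-based: 0 means a single hop.
            "maxDepth": depth - 1,
            "depthField": "hops",
        }
    }


def merge_hybrid(graph_hits: list, bm25_hits: list, bm25_limit: int = 15) -> list:
    """Union graph hits with the top BM25 hits, dropping duplicates by _id."""
    seen = set()
    merged = []
    for hit in [*graph_hits, *bm25_hits[:bm25_limit]]:
        if hit["_id"] not in seen:
            seen.add(hit["_id"])
            merged.append(hit)
    return merged
```

Graph hits are placed first here so that a claim found by both strategies keeps its graph-derived position; the real ordering policy may differ.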

Context serializer

Renders two-tier output with token budgets and PHI redaction:

=== PATIENT GRAPH ===
Patient: PAT_001 → Condition: Type 2 Diabetes → Medication: Metformin
Patient: PAT_001 → Lab: HbA1c (7.2%)
...

=== SUPPORTING TEXT ===
[redacted] (claim text with pii_fields stripped)
Clinical note excerpt 1...
Clinical note excerpt 2...
...

Budgets (tokens):

graph tier: 400
text tier: 300–800 depending on route
Total capped per route

Output that exceeds its budget is truncated, and the response carries a truncated: true flag. Token counts are exact when tiktoken is available; otherwise the serializer falls back to whitespace-split counts.
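
The budgeting and fallback behaviour could look roughly like the sketch below. The function names and the `cl100k_base` encoding are assumptions; this page does not say which tiktoken encoding the serializer uses.

```python
def load_encoder():
    """Return a tiktoken encoder, or None when tiktoken cannot be used."""
    try:
        import tiktoken
        return tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption
    except Exception:  # package missing or encoding data unavailable
        return None


def count_tokens(text: str, encoder=None) -> int:
    """Exact count with an encoder, whitespace-split count without one."""
    if encoder is not None:
        return len(encoder.encode(text))
    return len(text.split())


def truncate_to_budget(text: str, budget: int, encoder=None) -> tuple[str, bool]:
    """Return (text, truncated) with text cut to at most `budget` tokens."""
    if count_tokens(text, encoder) <= budget:
        return text, False
    if encoder is not None:
        return encoder.decode(encoder.encode(text)[:budget]), True
    return " ".join(text.split()[:budget]), True
```

A caller would typically run `encoder = load_encoder()` once at startup and apply `truncate_to_budget` per tier, for example with a budget of 400 for the graph tier.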

HybridRetriever service

Composes classify → dispatch → serialize → audit:

class HybridRetriever:
    async def retrieve(
        self, query: str, *,
        patient_id: str | None = None,
        depth: int = 3,
        force_route: str | None = None,  # bypass classifier
        actor_id: str | None = None,
        auth_source: str | None = None,
    ) -> RetrievalResult:
        ...
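
To make the classify → dispatch → serialize → audit order concrete, here is a stripped-down sketch with the four stages injected as callables. `SketchRetriever` and the `"FORCED"` rule id are hypothetical stand-ins, not the real service, and consent filtering, patient scoping, and error handling are left out.

```python
import asyncio
import time


class SketchRetriever:
    """Hypothetical stand-in showing the order of the four stages."""

    def __init__(self, classify, routes, serialize, audit):
        self._classify = classify    # query -> (route, rule_id)
        self._routes = routes        # route name -> async handler(query)
        self._serialize = serialize  # raw hits -> rendered context
        self._audit = audit          # audit dict -> None

    async def retrieve(self, query, *, force_route=None):
        started = time.perf_counter()
        if force_route is not None:
            # Bypass the classifier entirely; the rule id here is made up.
            route, rule_id = force_route, "FORCED"
        else:
            route, rule_id = self._classify(query)
        hits = await self._routes[route](query)  # dispatch
        context = self._serialize(hits)          # serialize
        result = {
            "context": context,
            "route": route,
            "classifier_rule_id": rule_id,
            "retrieval_latency_ms": (time.perf_counter() - started) * 1000,
        }
        self._audit({                            # audit
            "route": route,
            "classifier_rule_id": rule_id,
            "force_route_used": force_route is not None,
        })
        return result
```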

Response shape (RetrievalResult):

{
  "context": AssembledContext,   # two-tier rendered
  "query_type": QueryType,
  "route": RetrievalStrategy,     # graph / aggregation / text / hybrid
  "classifier_rule_id": str,
  "classifier_confidence": float,
  "retrieval_latency_ms": float,
  "graph_edges_count": int,
  "text_chunks_count": int,
  "entities_referenced": list[str],  # NOT logged — count-only in audit
  "consent_filtered_count": int,
}

HTTP endpoint

POST /api/memory/retrieve
Authorization: Bearer <service-account-jwt>

Request:
{
  "query": "What medications is patient on?",
  "patient_id": "PAT_001",        // FR-013d: required for non-admin
  "depth": 3,                     // 1-4 per Field(ge=1, le=4)
  "force_route": null             // optional bypass
}

Response 200: RetrievalResult
Response 400: Pydantic validation
Response 403: patient_id missing for non-admin
Response 422: invalid force_route or depth
Response 429: rate-limit (100/min per IP default)
Response 503: HYBRID_RETRIEVAL_ENABLED=false or Mongo error

Rate-limit: @limiter.limit("100/minute") via slowapi.
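
For callers that do not use the SDK, the request above can be assembled with the standard library alone. This is a sketch under assumptions: `build_retrieve_request` is a hypothetical helper, and treating 429 and 503 as the retryable statuses is one reading of the response list, not a documented policy.

```python
import json
import urllib.request

# Assumption: only rate-limit and service-unavailable responses are worth retrying.
RETRYABLE_STATUSES = frozenset({429, 503})


def build_retrieve_request(base_url, token, query, *, patient_id=None,
                           depth=3, force_route=None):
    """Build (but do not send) a POST /api/memory/retrieve request."""
    if not 1 <= depth <= 4:
        raise ValueError("depth must be between 1 and 4")
    body = {
        "query": query,
        "patient_id": patient_id,
        "depth": depth,
        "force_route": force_route,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/memory/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`; a non-2xx status surfaces as `urllib.error.HTTPError`, whose `code` attribute can be checked against `RETRYABLE_STATUSES`.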

SDK

import asyncio

from patientrx_memory_sdk import MemoryClient, AsyncMemoryClient


async def main() -> None:
    # base_url and token are left elided; supply real values.
    async with AsyncMemoryClient(base_url=..., token=...) as client:
        result = await client.retrieve(
            query="medications for PAT_001",
            patient_id="PAT_001",
            depth=3,
        )
        # Handle MemoryRetrievalError on 503 → legacy BM25 fallback


asyncio.run(main())
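
The fallback mentioned in the comment could be wrapped in a small helper like the one below. This is only a sketch: `MemoryRetrievalError` is redefined locally as a stand-in because this page does not show where the SDK exports it, and `retrieve_with_fallback` and the fallback callable are hypothetical.

```python
import asyncio


class MemoryRetrievalError(Exception):
    """Local stand-in for the SDK exception named in the comment above."""


async def retrieve_with_fallback(primary, fallback, **kwargs):
    """Try the hybrid retrieval first; on MemoryRetrievalError use the fallback."""
    try:
        return await primary(**kwargs)
    except MemoryRetrievalError:
        # e.g. the service answered 503: fall back to the legacy BM25 path
        return await fallback(**kwargs)
```

With the real SDK, `primary` would be `client.retrieve` and `fallback` whatever coroutine performs the legacy BM25 lookup.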

Audit

Every retrieval emits memory.retrieval.hybrid:

{
  "action": "memory.retrieval.hybrid",
  "source": "memory-store",
  "patient_id": "PAT_001",
  "authSource": "standing",
  "payload": {
    "query_type": "HYBRID",
    "route": "hybrid",
    "classifier_rule_id": "DEF-1",
    "classifier_confidence": 0.85,
    "retrieval_latency_ms": 42.0,
    "graph_edges_count": 12,
    "text_chunks_count": 5,
    "total_tokens": 1247,
    "consent_filtered_count": 0,
    "entities_referenced_count": 42,  # count-only per FR-SPEC-05-015
    "depth": 3,
    "force_route_used": false,
    "outcome": "success"
  }
}

The entities_referenced list is never logged. Downstream consumers that need entity ids store them envelope-encrypted in llm_call_logs.metadata.
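
The count-only rule can be enforced at the point where the audit event is built, as in the sketch below. `build_audit_event` and `COPIED_FIELDS` are hypothetical names; the event shape follows the example above, and the result is assumed to be a plain dict.

```python
# Fields copied verbatim from the retrieval result into the audit payload.
# entities_referenced is deliberately absent from this list.
COPIED_FIELDS = (
    "query_type", "route", "classifier_rule_id", "classifier_confidence",
    "retrieval_latency_ms", "graph_edges_count", "text_chunks_count",
    "consent_filtered_count",
)


def build_audit_event(result, *, patient_id, auth_source, depth,
                      force_route_used, total_tokens):
    """Build a memory.retrieval.hybrid event that logs only the entity count."""
    payload = {name: result[name] for name in COPIED_FIELDS}
    payload["total_tokens"] = total_tokens
    # Log how many entities were referenced, never which ones.
    payload["entities_referenced_count"] = len(result["entities_referenced"])
    payload["depth"] = depth
    payload["force_route_used"] = force_route_used
    payload["outcome"] = "success"
    return {
        "action": "memory.retrieval.hybrid",
        "source": "memory-store",
        "patient_id": patient_id,
        "authSource": auth_source,
        "payload": payload,
    }
```

Using an explicit allow-list of copied fields means a new field added to the result does not reach the audit log unless someone opts it in.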
