Knowledge Lifecycle — Add / Change / Delete
The knowledge graph (entities, relationships, claims) is append-only by design — "delete" is expressed as a state transition (retracted / archived), never a hard destroy. This preserves the audit chain (FR-023, Principle IV) while still letting operators pull bad data out of circulation.
Three mechanisms are available, ordered by typical use:
- SDK (MemoryClient from patientrx-memory-sdk) — how agents + research-engine services write.
- HTTP (memory-store REST API, port 6401) — how apps/api and custom integrations write.
- MCP tools — read-only; cannot add, change, or delete.
Every write emits an audit event via AuditClient.append() (hash-chained, append-only).
Write paths at a glance
1. Add
Add an entity (concept, problem, intervention, outcome, population)
HTTP: POST /api/memory/entities
{
"entity_type": "problem",
"name": "Type 2 diabetes with stage 3 CKD",
"taxonomy_category": "problem",
"ontology_codes": [{"system": "ICD-10", "code": "E11.22"}],
"patientId": "<uuid or null for public ontology>",
"origin": "human"
}
SDK: ingest-batch shape (MemoryClient.ingest_batch([entity], [], [])) — the SDK does not expose a single-entity create_entity() method; batch with one item is the normal path.
Auth: @require_scope('memory.write') + consent check. Patient-scoped entities require either patient-self, standing consent, or admin bypass.
Audit: memory.entity.create — payload carries entity id + type + patientId; never the raw name.
Add a claim (evidence statement about an entity)
HTTP: POST /api/memory/claims
{
"entity_id": "<entity uuid>",
"claim_text": "Evidence from NEJM 2024 study of 1,200 patients ...",
"supports": ["<research_source id>"],
"pii_fields": [],
"patientId": null,
"decay_at": "2026-10-22T00:00:00Z",
"supersedes": "<optional existing claim id to mark superseded atomically>"
}
Lifecycle state on insert: pending (the Critic will promote to active asynchronously — see agents).
Auth: @require_scope('knowledge.write') + entity referential-integrity check (422 if entity_id doesn't exist).
Audit: memory.claim.create — or memory.claim.supersede when supersedes is set.
Add a relationship (e.g., intervention --treats-> problem)
HTTP: POST /api/memory/relationships
{
"source_id": "<entity uuid>",
"target_id": "<entity uuid>",
"rel_type": "treats",
"attributes": {"evidence_class": "RCT"},
"patientId": null
}
Auth: @require_scope('memory.relationship.write') + referential-integrity on both ids.
Audit: memory.relationship.created.
Add in bulk (ingest)
SDK: MemoryClient.ingest_batch(entities, claims, relationships, chunk_size=500) — tolerates partial failure (FR-043-005), returns per-item outcomes.
HTTP: POST /api/memory/ingest
Audit: one memory.ingest.batch event with counts only; per-item events still emit.
2. Change
There are two flavors of change: content (rare, only when the content was wrong at write time) and lifecycle state (common — promote, supersede, retract).
Change lifecycle state
Every entity and every claim has a lifecycle_state field whose transitions are machine-validated. Invalid transitions return HTTP 422.
HTTP (claim): PATCH /api/memory/claims/{claim_id}/state
{
"target_state": "superseded",
"supersedes": ["<new claim id>"]
}
HTTP (entity): PATCH /api/memory/entities/{entity_id}/state
{"target_state": "retracted"}
Claim transitions (enforced in claim_repository.validate_claim_transition()):
Entity transitions (enforced in entity_repository.validate_transition()):
| From \ To | active | superseded | retracted | archived |
|---|---|---|---|---|
| pending | ✓ | — | ✓ | ✓ |
| active | — | ✓ | ✓ | ✓ |
| superseded | — | — | — | ✓ |
| retracted | — | — | — | ✓ |
| archived | — | — | — | (terminal) |
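The entity transition table above can be read as a small state machine. The following is a minimal sketch of that idea, not the code in entity_repository.validate_transition(); the names ALLOWED_TRANSITIONS, InvalidTransition, check_transition, and is_allowed are hypothetical and exist only for illustration.

```python
# Allowed entity lifecycle transitions, transcribed from the table above.
ALLOWED_TRANSITIONS = {
    "pending": {"active", "retracted", "archived"},
    "active": {"superseded", "retracted", "archived"},
    "superseded": {"archived"},
    "retracted": {"archived"},
    "archived": set(),  # terminal: nothing leaves this state
}


class InvalidTransition(ValueError):
    """Hypothetical error type; the real API reports this case as HTTP 422."""


def check_transition(from_state: str, to_state: str) -> None:
    """Raise InvalidTransition unless the table permits from_state -> to_state."""
    allowed = ALLOWED_TRANSITIONS.get(from_state)
    if allowed is None:
        raise InvalidTransition(f"unknown state: {from_state!r}")
    if to_state not in allowed:
        raise InvalidTransition(f"{from_state} -> {to_state} is not allowed")


def is_allowed(from_state: str, to_state: str) -> bool:
    """Boolean convenience wrapper around check_transition."""
    try:
        check_transition(from_state, to_state)
    except InvalidTransition:
        return False
    return True
```

Keeping the table as data makes it easy to compare against the documented matrix row by row; for example, is_allowed("archived", "active") is False because archived is terminal.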
Auth on PATCH claims/{id}/state: @require_scope('memory.claim.admin') — admin-only. A non-admin caller receives 403 + a memory.govern.unauthorized_access_attempt audit event.
Auth on PATCH entities/{id}/state: memory.write + entity ACL check. Retirement attempts (active → retracted / archived) additionally emit memory.govern.retire_attempt with {actor_id, entity_id, action_detail: "approved"|"denied"}.
Audit: memory.claim.state_transition / memory.entity.state_transition — payload includes from_state, to_state, actor_id (salted hash), patientId, and the supersedes list when set.
Change content
When a claim's text is wrong (as opposed to its conclusion being outdated), emit a new claim and mark the old one superseded atomically:
POST /api/memory/claims
{
"entity_id": "<same entity id>",
"claim_text": "<corrected text>",
"supports": ["<source ids>"],
"supersedes": "<old claim id>"
}
This produces two audit rows — memory.claim.create for the new claim and memory.claim.supersede for the old — both hash-chained so the correction is tamper-evident.
Direct PATCH of claim_text_encrypted or entity name is not exposed. If truly needed (e.g., a PHI accident), route through /hipaa-incident-response — the incident response skill records the justification, performs a compensating write, and documents the chain in the incident log.
3. Delete
There is no hard delete. "Delete" is always one of:
| Intent | Target state | Emits |
|---|---|---|
| Rejected by Critic / bad evidence | retracted | memory.claim.state_transition (to retracted) |
| Superseded by newer claim | superseded + new claim | memory.claim.supersede |
| Broken provenance (source retracted) | retracted | Librarian memory.govern.broken_provenance |
| Decayed / eroded over time | archived | Replicator memory.govern.eroded |
| Operator pulled from circulation | retracted → archived | memory.entity.state_transition x2 |
| Quarantined claim rejected by operator | retracted | Operator drain — memory.claim.state_transition |
Why no hard delete?
- Principle IV (Audit by Default) — the audit chain (seq, prevHash, eventHash via RFC 8785 JCS + SHA-256) is append-only. A hard delete would break the chain and trigger a Sev 0 alert from the hourly AuditVerificationJob.
- Supersession traceability — get_claim_chain() walks backwards through supersedes links; a missing parent is treated as tamper.
- HIPAA 164.316(b) — 6-year retention of audit-relevant artifacts.
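To see why removing a row breaks the audit chain, here is a minimal sketch of a hash-chained log using the seq / prevHash / eventHash fields named above. It is an illustration under assumptions, not the AuditClient implementation: the GENESIS value, the event layout, and the function names are hypothetical, and json.dumps with sorted keys only approximates RFC 8785 JCS (full JCS also pins down number and string serialization).

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prevHash for the first event


def canonical(obj: dict) -> bytes:
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def append_event(chain: list, event_type: str, payload: dict) -> dict:
    """Append an event whose hash covers its seq, prevHash, type, and payload."""
    prev_hash = chain[-1]["eventHash"] if chain else GENESIS
    body = {"seq": len(chain) + 1, "prevHash": prev_hash, "type": event_type, "payload": payload}
    event = dict(body, eventHash=hashlib.sha256(canonical(body)).hexdigest())
    chain.append(event)
    return event


def verify(chain: list) -> bool:
    """Check sequence numbers, back-links, and each event's own hash."""
    prev_hash = GENESIS
    for expected_seq, event in enumerate(chain, start=1):
        body = {k: v for k, v in event.items() if k != "eventHash"}
        if event["seq"] != expected_seq or event["prevHash"] != prev_hash:
            return False
        if hashlib.sha256(canonical(body)).hexdigest() != event["eventHash"]:
            return False
        prev_hash = event["eventHash"]
    return True


chain = []
append_event(chain, "memory.claim.create", {"claim_id": "c1"})
append_event(chain, "memory.claim.state_transition", {"claim_id": "c1", "to_state": "retracted"})
append_event(chain, "memory.claim.create", {"claim_id": "c2"})

intact = verify(chain)
after_hard_delete = verify(chain[:1] + chain[2:])  # drop the middle event
```

With the middle event removed, the third event's seq and prevHash no longer line up with its predecessor, so verification fails. A state transition, by contrast, only appends.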
Retention does eventually dispose of data: entities + claims follow the patient's data-retention policy; correlator / replicator / librarian findings have a 180-day TTL. Disposition is logged as memory.govern.retention_disposed, never as a raw DB delete.
4. Confidence & freshness — how they're determined
Confidence and freshness are the two signals that decide whether a claim keeps driving retrieval or gets pulled for review. They are computed by four cooperating mechanisms — two per-claim scorers and two background jobs.
4.1 Confidence score
Claim.confidence_score is a 0.0..1.0 float persisted on the document. Two code paths write it:
- At ingest — the Researcher sets an initial confidence_score based on the source and its own retrieval-route score. Stored in memory_claims.confidence_score at POST /api/memory/claims time.
- At Critic promotion — the Critic agent (apps/research/research-engine/services/agents/critic/experiment.py) prompts the LLM against the claim + its cited sources and returns a CriticVerdict whose score maps to one of: promote / supersede / reject / quarantine. The LLM's numeric confidence is passed through to drive the verdict.
Tie-break rule — when two claims conflict on the same entity, CriticJob._lower_confidence(a, b) (apps/research/memory-store/services/jobs/critic.py) picks the lower-confidence one as the loser and emits memory.claim.state_transition to superseded. Ties fall back to t_created (older wins).
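The tie-break rule can be sketched as a pure function. This is an illustration, not the body of CriticJob._lower_confidence(); the ClaimRef type is hypothetical, and "older wins" is read here as "the older claim is kept, so the newer one is the loser", which is an assumption about the intended meaning.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimRef:
    claim_id: str
    confidence_score: float
    t_created: datetime


def lower_confidence(a: ClaimRef, b: ClaimRef) -> ClaimRef:
    """Return the losing claim (the one to mark superseded)."""
    if a.confidence_score != b.confidence_score:
        return a if a.confidence_score < b.confidence_score else b
    # Tie on confidence: the older claim wins, so the newer one loses.
    # If the timestamps are also equal, this sketch arbitrarily returns b.
    return a if a.t_created > b.t_created else b


old = ClaimRef("c-old", 0.8, datetime(2024, 1, 1, tzinfo=timezone.utc))
new = ClaimRef("c-new", 0.8, datetime(2025, 1, 1, tzinfo=timezone.utc))
weak = ClaimRef("c-weak", 0.4, datetime(2023, 1, 1, tzinfo=timezone.utc))
```

The result does not depend on argument order: lower_confidence(old, new) and lower_confidence(new, old) both pick c-new, while c-weak loses to either of the others on score alone.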
Shipped today (spec 063, PR #189):
- confidence_score stored on every claim and read by CriticJob._lower_confidence() for pairwise tie-breaks.
- Critic LLM verdict drives promote / supersede / reject / quarantine.
- Source authority tiers — SourceTier enum + canonical SOURCE_TIER_MAP in patientrx_contracts.enums. Tier-1 (NEJM · JAMA · Cochrane · NCCN), tier-2 (PubMed · Elsevier · Wiley · LWW · ClinicalTrials.gov · ClinVar · DrugBank), tier-3 (patient-note · open-web · clinical-narrative). Unknown connector → tier-3 (fail-safe low). Critic prompt rule #8 weights tier-1 > tier-2 > tier-3 and forbids tier-3 → tier-1 supersession.
- Critic-verdict score persists onto the claim — PromotionPipeline._apply_verdict forwards verdict.score as confidence_score on promote + supersede via the memory-store PATCH endpoint, applied atomically with the state-transition transaction.
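The tier lookup and its fail-safe default might look roughly like the sketch below. The enum member names, the connector key spellings, and the tier_for / may_supersede helpers are assumptions for illustration; the actual definitions live in patientrx_contracts.enums, and the supersession restriction is described above as a Critic prompt rule rather than a code check.

```python
from enum import IntEnum


class SourceTier(IntEnum):
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


# Keys are illustrative connector ids; the real SOURCE_TIER_MAP may spell them differently.
SOURCE_TIER_MAP = {
    **dict.fromkeys(["nejm", "jama", "cochrane", "nccn"], SourceTier.TIER_1),
    **dict.fromkeys(
        ["pubmed", "elsevier", "wiley", "lww", "clinicaltrials.gov", "clinvar", "drugbank"],
        SourceTier.TIER_2,
    ),
    **dict.fromkeys(["patient-note", "open-web", "clinical-narrative"], SourceTier.TIER_3),
}


def tier_for(connector: str) -> SourceTier:
    """Unknown connectors fall back to tier-3 (fail-safe low)."""
    return SOURCE_TIER_MAP.get(connector.strip().lower(), SourceTier.TIER_3)


def may_supersede(new_source: str, old_source: str) -> bool:
    """False when a tier-3 source would supersede a tier-1 claim."""
    blocked = tier_for(new_source) == SourceTier.TIER_3 and tier_for(old_source) == SourceTier.TIER_1
    return not blocked
```

Defaulting unknown connectors to the lowest tier means a misconfigured or new source can never outrank an established one by accident.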
4.2 Freshness score
Freshness today is a boolean per-field staleness check, not a numeric score. It lives in apps/research/memory-store/services/completeness/computer.py:
def _is_stale(claim, freshness_days: int, now: datetime) -> bool:
    age_days = (now - claim.t_created).days
    return age_days > freshness_days
freshness_days is declared per field, per screen profile (Spec 016 Screen Profile freshnessRules). Example: a vitals reading might carry freshness_days: 1 on an inpatient screen and freshness_days: 30 on a clinic follow-up screen.
The completeness scorer folds stale fields into a 0..1 screen-level score:
screen_completeness = (present_fields + 0.5 × stale_fields) / total_required_fields
So a stale field is half-present — it still exists, but it's discounted until a refresh lands.
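A worked instance of the completeness formula, as a minimal sketch. It assumes that present_fields counts only fields that are present and not stale, and that missing fields contribute zero; the Claim type, the field names, and the screen_completeness signature are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Claim:
    t_created: datetime


def is_stale(claim: Claim, freshness_days: int, now: datetime) -> bool:
    # Same rule as _is_stale above.
    return (now - claim.t_created).days > freshness_days


def screen_completeness(fields: dict, rules: dict, now: datetime) -> float:
    """fields maps each required field name to its active claim, or None if missing."""
    if not fields:
        return 0.0
    present = stale = 0
    for name, claim in fields.items():
        if claim is None:
            continue  # a missing field contributes nothing
        if is_stale(claim, rules[name], now):
            stale += 1
        else:
            present += 1
    return (present + 0.5 * stale) / len(fields)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fields = {
    "vitals": Claim(now - timedelta(days=3)),   # stale under a 1-day rule
    "hba1c": Claim(now - timedelta(days=10)),   # fresh under a 30-day rule
    "egfr": Claim(now - timedelta(days=20)),    # fresh under a 30-day rule
    "medications": None,                        # missing
}
rules = {"vitals": 1, "hba1c": 30, "egfr": 30, "medications": 30}
score = screen_completeness(fields, rules, now)  # (2 + 0.5 * 1) / 4
```

Here two fresh fields, one stale field, and one missing field out of four required give (2 + 0.5) / 4 = 0.625.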
What determines staleness per claim:
| Input | Source |
|---|---|
| claim.t_created | write time (Mongo field) |
| claim.t_invalid | supersede / retract time (Mongo field) |
| freshness_days | screen profile YAML (apps/api/src/modules/screen-profiles/registry/profiles/*.ts) |
| lifecycle_state gate | only active claims enter the staleness check; superseded / retracted / archived are skipped |
Shipped today: per-field boolean staleness per screen profile; half-weight in the completeness score.
Planned (follow-up): a numeric claim-level freshness score (0..1) combining source age, source authority tier, and supersession depth into a single value. Today the boolean staleness check + FreshnessConfidenceJob's age-based gate (see §4.4) together cover the operational need; the single-number scorer is a reporting/UI convenience, not a gap in enforcement.
4.3 Decay / TTL job
apps/research/memory-store/services/jobs/decay.py — runs on a scheduler configured in jobs_config.yml:
- Interval: 3600s (1 hour) by default
- Scan: memory_claims.find({"decay_at": {"$lt": now}, "lifecycle_state": {"$in": ["active", "pending"]}})
- Action:
  - PHI-bearing claims → claim_repo.anonymize_expired() (per-field redaction per pii_fields[], not a delete, per FR-042-012)
  - Hypotheses + non-PHI relationships → hard delete (they carry no audit weight)
- Audit events: memory.job.decay.started / .anonymized / .deleted / .completed / .error (FR-042-009)
decay_at is set at claim ingest time by the Researcher — a clinical-narrative claim might carry a 2-year decay_at; a research-article claim might carry 10 years. It is a privacy-retention horizon, not a confidence decay.
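The scan-then-anonymize step can be illustrated in memory, without a database. This sketch assumes that pii_fields lists the names of the fields to redact and that redaction replaces the value with a marker; the select_expired and anonymize helpers and the sample documents are hypothetical, and the real claim_repo.anonymize_expired() may behave differently.

```python
from datetime import datetime, timezone

SWEEPABLE_STATES = {"active", "pending"}


def select_expired(claims: list, now: datetime) -> list:
    """In-memory equivalent of the Scan filter above."""
    return [c for c in claims if c["decay_at"] < now and c["lifecycle_state"] in SWEEPABLE_STATES]


def anonymize(claim: dict, marker: str = "[redacted]") -> dict:
    """Return a copy with every field named in pii_fields replaced by a marker."""
    out = dict(claim)
    for field in claim.get("pii_fields", []):
        if field in out:
            out[field] = marker
    return out


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
past = datetime(2025, 6, 1, tzinfo=timezone.utc)
future = datetime(2030, 1, 1, tzinfo=timezone.utc)
claims = [
    {"_id": "c1", "lifecycle_state": "active", "decay_at": past,
     "pii_fields": ["claim_text"], "claim_text": "patient-reported note", "entity_id": "e1"},
    {"_id": "c2", "lifecycle_state": "active", "decay_at": future,
     "pii_fields": [], "claim_text": "trial summary", "entity_id": "e2"},
    {"_id": "c3", "lifecycle_state": "archived", "decay_at": past,
     "pii_fields": [], "claim_text": "old note", "entity_id": "e3"},
]
expired = select_expired(claims, now)
swept = [anonymize(c) for c in expired]
```

Only c1 is selected: c2 has not reached its decay horizon and c3 is already archived. The swept copy keeps its id and entity link, which is what lets the audit trail stay intact after redaction.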
4.4 Re-research triggers — Replicator + FreshnessConfidenceJob
Two complementary jobs decide when an active claim gets re-surfaced to the Critic.
Replicator (verdict-driven, weekly)
apps/research/research-engine/services/workers/replicator_job.py — runs weekly on a sample of aged active claims:
- Re-fetches the cited sources (via the 12 trusted-source connectors).
- Scores each claim as one of still_supported / eroded / retracted_source / broken_provenance.
- On eroded or retracted_source → calls _enqueue_recritique_task(), which writes a new research_tasks row with:
  {
    "agent_role": "critic",
    "research_question": "<claim text>",
    "entity_context": ["<entity ids>"],
    "metadata": {"source": "replicator", "original_claim_id": "..."}
  }
- The next Critic tick dequeues and re-evaluates; typical outcome is a supersede-or-retract transition.
Audit event: memory.govern.eroded / memory.govern.retracted_source + research.task.enqueued.
FreshnessConfidenceJob (threshold-driven, hourly)
apps/research/research-engine/services/workers/freshness_confidence_job.py — spec 063 FR-063-003. Runs hourly, scans memory_claims for:
state == "active"
AND (confidence_score < confidence_threshold OR t_created < now - max_age_sec)
AND freshness_recheck_requested_at NOT in backoff window
For each match it enqueues a Critic re-task via TaskQueueManager (same shape as the Replicator's re-task payload) with metadata.trigger = "freshness_confidence" and trigger_reason in {below_confidence, aged, below_confidence_and_aged}. The job then stamps freshness_recheck_requested_at = now on the claim so perpetually-marginal claims don't re-enqueue every tick.
Defaults (configurable in services/config/worker_config.yml):
| Setting | Default | Meaning |
|---|---|---|
| freshness_confidence_enabled | false (opt-in) | Flip to true after reviewing thresholds |
| freshness_confidence_interval_sec | 3600 (1h) | Tick cadence |
| freshness_confidence_threshold | 0.5 | Claims with confidence_score strictly below trigger a re-check |
| freshness_confidence_max_age_sec | 7,776,000 (90d) | Age at which even a high-confidence claim gets pro-forma re-checked |
| freshness_confidence_recheck_backoff_sec | 604,800 (7d) | Per-claim cool-off between re-enqueues |
| freshness_confidence_sample_limit | 50 | Per-tick cap to bound LLM cost |
Audit events: research.freshness_confidence.tick (per-tick summary with counts + thresholds) + research.freshness_confidence.enqueued (per claim, with reason + confidence_score + task_id). Metadata-only — no claim text or PHI in either payload.
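The selection rule and the three trigger_reason values can be expressed as one predicate. This is a sketch using the default thresholds from the table above, not the job's actual query; the trigger_reason function and the dict-shaped claim are hypothetical, and the state field is written as lifecycle_state to match the rest of this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CONFIDENCE_THRESHOLD = 0.5
MAX_AGE = timedelta(seconds=7_776_000)        # 90 days
RECHECK_BACKOFF = timedelta(seconds=604_800)  # 7 days


def trigger_reason(claim: dict, now: datetime) -> Optional[str]:
    """Return why a claim should be re-checked, or None if it should be left alone."""
    if claim["lifecycle_state"] != "active":
        return None
    last = claim.get("freshness_recheck_requested_at")
    if last is not None and now - last < RECHECK_BACKOFF:
        return None  # still inside the per-claim cool-off window
    low = claim["confidence_score"] < CONFIDENCE_THRESHOLD  # strictly below
    aged = claim["t_created"] < now - MAX_AGE
    if low and aged:
        return "below_confidence_and_aged"
    if low:
        return "below_confidence"
    if aged:
        return "aged"
    return None


now = datetime(2025, 6, 1, tzinfo=timezone.utc)


def make(score: float, age_days: int, last_recheck_days: Optional[int] = None) -> dict:
    """Build a sample active claim for the examples below."""
    claim = {"lifecycle_state": "active", "confidence_score": score,
             "t_created": now - timedelta(days=age_days)}
    if last_recheck_days is not None:
        claim["freshness_recheck_requested_at"] = now - timedelta(days=last_recheck_days)
    return claim
```

For example, make(0.3, 10) yields below_confidence, make(0.9, 120) yields aged, and make(0.3, 10, last_recheck_days=2) yields None because the claim was re-enqueued two days ago and the backoff is seven.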
Shipped today (spec 063): both the verdict-driven Replicator path and the threshold-driven FreshnessConfidenceJob. The job is opt-in (freshness_confidence_enabled: false by default) so operators can review settings before turning it on.
4.5 Supersession & compaction
apps/research/memory-store/services/repositories/claim_repository.py:
- supersede_transaction() — atomic Motor transaction: inserts new claim + flips old claim's lifecycle_state to superseded + sets t_invalid = now + appends the new claim id to the old claim's superseded_by[].
- get_claim_chain(claim_id) — walks supersedes[] backward through the chain so the UI can show the full provenance trail of a corrected claim.
- No separate compaction job — t_invalid is the soft-tombstone; claims remain queryable for 6-year audit retention, filtered out of default retrieval by lifecycle_state="active".
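The backward walk over supersedes links can be sketched against an in-memory dict. This is an illustration rather than the repository method: the BrokenChain error, the claims_by_id shape, and the chain_is_intact helper are hypothetical, and each claim is assumed to carry a supersedes list of the claim ids it replaced.

```python
class BrokenChain(LookupError):
    """Hypothetical error for a supersedes link that points at a missing claim."""


def get_claim_chain(claims_by_id: dict, claim_id: str) -> list:
    """Return claim ids from claim_id back to the oldest ancestor."""
    chain, seen, frontier = [], set(), [claim_id]
    while frontier:
        current = frontier.pop(0)
        if current in seen:
            continue  # guard against cycles and shared ancestors
        if current not in claims_by_id:
            raise BrokenChain(current)  # a missing parent is treated as tamper
        seen.add(current)
        chain.append(current)
        frontier.extend(claims_by_id[current].get("supersedes", []))
    return chain


def chain_is_intact(claims_by_id: dict, claim_id: str) -> bool:
    """True when every ancestor reachable from claim_id still exists."""
    try:
        get_claim_chain(claims_by_id, claim_id)
    except BrokenChain:
        return False
    return True


claims = {
    "c1": {"supersedes": []},
    "c2": {"supersedes": ["c1"]},
    "c3": {"supersedes": ["c2"]},
}
without_c2 = {k: v for k, v in claims.items() if k != "c2"}
```

Walking from c3 returns c3, c2, c1 in that order. With c2 physically removed, the walk fails at the missing parent, which is why superseded claims are kept and only filtered out of default retrieval.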
4.6 What "confidence & freshness" means at query time
The retrieval surface (POST /api/memory/retrieve, Spec 060) filters on:
- lifecycle_state == "active" — superseded / retracted / archived never surface.
- acl.roles + ResolvedScope — consent + role gates.
- decay_at > now — the serializer drops any claim whose decay horizon passed but the decay job hasn't swept yet.
- Implicit freshness — the caller passes a screen_profile and the retriever uses its freshnessRules to mark stale fields in the response metadata so the UI can show a "data from N days ago" chip.
No numeric confidence threshold is applied at query time. Enforcement happens asynchronously via FreshnessConfidenceJob (§4.4), which re-enqueues marginal claims for Critic re-evaluation so the next retrieval sees either a refreshed confidence_score or a transition to superseded/retracted.
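As a small illustration of the point above, the lifecycle and decay gates can be written as a predicate that never looks at confidence_score. The is_retrievable helper is hypothetical, ACL and consent checks are deliberately left out, and decay_at is assumed to always be set.

```python
from datetime import datetime, timezone


def is_retrievable(claim: dict, now: datetime) -> bool:
    """Lifecycle and decay gates only; ACL and consent checks are out of scope here."""
    return claim["lifecycle_state"] == "active" and claim["decay_at"] > now


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
later = datetime(2027, 1, 1, tzinfo=timezone.utc)
earlier = datetime(2025, 1, 1, tzinfo=timezone.utc)

low_confidence_active = {"lifecycle_state": "active", "decay_at": later, "confidence_score": 0.1}
high_confidence_retracted = {"lifecycle_state": "retracted", "decay_at": later, "confidence_score": 0.95}
active_but_decayed = {"lifecycle_state": "active", "decay_at": earlier, "confidence_score": 0.9}
```

A low-confidence active claim still surfaces, while a high-confidence retracted claim does not; confidence affects retrieval only indirectly, through the state transitions that re-evaluation produces.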
Shipped vs. planned — honest status
| Signal | Status | File |
|---|---|---|
| confidence_score persisted on claim | ✅ shipped | memory_claims.confidence_score |
| Critic LLM verdict → promote / supersede / reject / quarantine | ✅ shipped | research-engine/agents/critic/experiment.py |
| Critic verdict.score persists onto the claim post-promotion | ✅ shipped (spec 063) | research-engine/workers/promotion_pipeline.py |
| Source authority tiers (NEJM tier-1, PubMed tier-2, patient-note tier-3) | ✅ shipped (spec 063) | patientrx_contracts/enums.py |
| Per-field _is_stale boolean | ✅ shipped | memory-store/completeness/computer.py |
| Numeric 0..1 freshness scorer | 📋 planned | — (boolean + FreshnessConfidenceJob cover enforcement; scorer is UI convenience) |
| TTL decay job (hourly, anonymizes PHI) | ✅ shipped | memory-store/jobs/decay.py |
| Replicator re-research on eroded / retracted_source | ✅ shipped | research-engine/workers/replicator_job.py |
| Auto-trigger re-research on low-freshness / low-confidence | ✅ shipped (spec 063, opt-in) | research-engine/workers/freshness_confidence_job.py |
| Supersession atomic txn + t_invalid tombstone | ✅ shipped | memory-store/repositories/claim_repository.py |
End-to-end example — retract a claim
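A minimal sketch of the retraction call, built from the endpoint and payload shapes documented in section 2. The host, the Bearer-token header, and the build_retract_request helper are assumptions for illustration; only the port (6401), the path, and the target_state field come from this page. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6401"  # port from this page; the host is an assumption


def build_retract_request(claim_id: str, token: str) -> urllib.request.Request:
    """Build the PATCH that moves a claim to the retracted state."""
    body = json.dumps({"target_state": "retracted"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/memory/claims/{claim_id}/state",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_retract_request("claim-123", token="<admin token>")
# To send it against a running memory-store: urllib.request.urlopen(req)
```

Per the rules above, the caller needs the memory.claim.admin scope (a non-admin receives 403), an invalid transition returns 422, and a successful call emits memory.claim.state_transition. The claim itself remains stored and simply stops surfacing in default retrieval.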
Required consents, roles, and scopes — cheat sheet
| Operation | HTTP | Scope | Role gate | Audit |
|---|---|---|---|---|
| Add entity | POST /entities | memory.write | writer | memory.entity.create |
| Add claim | POST /claims | knowledge.write | writer | memory.claim.create / supersede |
| Add relationship | POST /relationships | memory.relationship.write | writer | memory.relationship.created |
| Bulk ingest | POST /ingest | memory.ingest.batch | writer | memory.ingest.batch |
| Change entity state | PATCH /entities/{id}/state | memory.write | writer | memory.entity.state_transition (+ memory.govern.retire_attempt on retire) |
| Change claim state | PATCH /claims/{id}/state | memory.claim.admin | admin only | memory.claim.state_transition |
| Supersede claim | POST /claims w/ supersedes | knowledge.write | writer | memory.claim.supersede |
| Retract / archive | state transition (no hard delete) | as above | as above | as above |
| Read / retrieve | POST /retrieve, GET /entities/{id} | memory.read / memory.graph.read | consent-scoped | memory.retrieval.hybrid (count-only) |
| MCP tools | Claude Code | read-only | — | memory.mcp.tool_invoked |
Never do this
- Never call MongoDB.deleteOne() on memory_* collections from application code — always go through a state transition. The change-stream pipelines (Critic promotion, Replicator decay) assume every disappearance is preceded by a state_transition audit row.
- Never mutate claim_text_encrypted or entity name in place — always supersede via a new claim, so the correction is in the audit chain and callers reading get_claim_chain() can see the correction.
- Never skip the ACL / consent check by using the MCP or the repository directly — the HTTP / SDK path is the only one that runs AclEnforcer + ResolvedScope + AuditClient in the correct order.
- Never emit a write from the Researcher agent without a supports[] source — the Critic will retract it at the next SLA tick; better to fail fast.
- Never bypass the state machine with a direct updateOne({lifecycle_state: ...}) — validate_transition() is the only legal gate.
Where to read more
- Schemas + ER diagram: Knowledge graph →
- Critic + Librarian + Replicator behaviors: Agents →
- Retrieval (read path): Hybrid retrieval →
- Audit chain mechanics: Audit chain →
- Consent model: Consent model →