Knowledge Lifecycle — Add / Change / Delete

The knowledge graph (entities, relationships, claims) is append-only by design — "delete" is expressed as a state transition (retracted / archived), never a hard destroy. This preserves the audit chain (FR-023, Principle IV) while still letting operators pull bad data out of circulation.

Three mechanisms are available, ordered by typical use:

  1. SDK (MemoryClient from patientrx-memory-sdk) — how agents + research-engine services write.
  2. HTTP (memory-store REST API, port 6401) — how apps/api and custom integrations write.
  3. MCP tools — read-only; cannot add, change, or delete.

Every write emits an audit event via AuditClient.append() (hash-chained, append-only).

Write paths at a glance

1. Add

Add an entity (concept, problem, intervention, outcome, population)

HTTP: POST /api/memory/entities

{
  "entity_type": "problem",
  "name": "Type 2 diabetes with stage 3 CKD",
  "taxonomy_category": "problem",
  "ontology_codes": [{"system": "ICD-10", "code": "E11.22"}],
  "patientId": "<uuid or null for public ontology>",
  "origin": "human"
}

SDK: ingest-batch shape (MemoryClient.ingest_batch([entity], [], [])) — the SDK does not expose a single-entity create_entity() method; batch with one item is the normal path.

Auth: @require_scope('memory.write') + consent check. Patient-scoped entities require either patient-self, standing consent, or admin bypass.

Audit: memory.entity.create — payload carries entity id + type + patientId; never the raw name.

Add a claim (evidence statement about an entity)

HTTP: POST /api/memory/claims

{
  "entity_id": "<entity uuid>",
  "claim_text": "Evidence from NEJM 2024 study of 1,200 patients ...",
  "supports": ["<research_source id>"],
  "pii_fields": [],
  "patientId": null,
  "decay_at": "2026-10-22T00:00:00Z",
  "supersedes": "<optional existing claim id to mark superseded atomically>"
}

Lifecycle state on insert: pending (the Critic will promote to active asynchronously — see agents).

Auth: @require_scope('knowledge.write') + entity referential-integrity check (422 if entity_id doesn't exist).

Audit: memory.claim.create — or memory.claim.supersede when supersedes is set.

Add a relationship (e.g., intervention --treats-> problem)

HTTP: POST /api/memory/relationships

{
  "source_id": "<entity uuid>",
  "target_id": "<entity uuid>",
  "rel_type": "treats",
  "attributes": {"evidence_class": "RCT"},
  "patientId": null
}

Auth: @require_scope('memory.relationship.write') + referential-integrity on both ids.

Audit: memory.relationship.created.

Add in bulk (ingest)

SDK: MemoryClient.ingest_batch(entities, claims, relationships, chunk_size=500) — tolerates partial failure (FR-043-005), returns per-item outcomes.

HTTP: POST /api/memory/ingest

Audit: one memory.ingest.batch event with counts only; per-item events still emit.

2. Change

There are two flavors of change: content (rare, only when the content was wrong at write time) and lifecycle state (common — promote, supersede, retract).

Change lifecycle state

Every entity and every claim has a lifecycle_state field whose transitions are machine-validated. Invalid transitions return HTTP 422.

HTTP (claim): PATCH /api/memory/claims/{claim_id}/state

{
  "target_state": "superseded",
  "supersedes": ["<new claim id>"]
}

HTTP (entity): PATCH /api/memory/entities/{entity_id}/state

{"target_state": "retracted"}

Claim transitions are enforced in claim_repository.validate_claim_transition() and entity transitions in entity_repository.validate_transition(); both gate the matrix below, and any transition outside it returns HTTP 422:

| From \ To | active | superseded | retracted | archived |
| --- | --- | --- | --- | --- |
| pending | ✓ | | ✓ | |
| active | | ✓ | ✓ | ✓ |
| superseded | | | | |
| retracted | | | | ✓ |
| archived (terminal) | | | | |
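A minimal sketch of that gate. The transition map here is inferred from this page, not copied from the repositories, so treat it as illustrative:

```python
# Illustrative transition map; the real gates live in
# claim_repository.validate_claim_transition() and
# entity_repository.validate_transition().
ALLOWED = {
    "pending":    {"active", "retracted"},              # Critic promote / reject
    "active":     {"superseded", "retracted", "archived"},
    "superseded": set(),                                 # transitions not documented here
    "retracted":  {"archived"},
    "archived":   set(),                                 # terminal
}

def validate_transition(from_state: str, to_state: str) -> None:
    """Raise ValueError (surfaced to callers as HTTP 422) on an illegal move."""
    if to_state not in ALLOWED.get(from_state, set()):
        raise ValueError(f"invalid lifecycle transition {from_state} -> {to_state}")
```

The map is data, not branching logic, so adding a state is a one-line change and the validator stays trivially auditable.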

Auth on PATCH claims/{id}/state: @require_scope('memory.claim.admin') — admin-only. A non-admin caller receives 403 + a memory.govern.unauthorized_access_attempt audit event.

Auth on PATCH entities/{id}/state: memory.write + entity ACL check. Retirement attempts (active → retracted / archived) additionally emit memory.govern.retire_attempt with {actor_id, entity_id, action_detail: "approved"|"denied"}.

Audit: memory.claim.state_transition / memory.entity.state_transition — payload includes from_state, to_state, actor_id (salted hash), patientId, and the supersedes list when set.

Change content

When a claim's text is wrong (as opposed to its conclusion being outdated), emit a new claim and mark the old one superseded atomically:

POST /api/memory/claims
{
  "entity_id": "<same entity id>",
  "claim_text": "<corrected text>",
  "supports": ["<source ids>"],
  "supersedes": "<old claim id>"
}

This produces two audit rows: memory.claim.create for the new claim and memory.claim.supersede for the old — both hash-chained so the correction is tamper-evident.

Direct PATCH of claim_text_encrypted or entity name is not exposed. If truly needed (e.g., a PHI accident), route through /hipaa-incident-response — the incident response skill records the justification, performs a compensating write, and documents the chain in the incident log.
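An in-memory sketch of what the atomic correction amounts to (the real version is supersede_transaction(), a Motor transaction in claim_repository.py; a plain dict stands in for Mongo here):

```python
from datetime import datetime

def supersede(claims: dict, old_id: str, new_claim: dict, now: datetime) -> None:
    """Insert the corrected claim and soft-tombstone the old one in one step."""
    claims[new_claim["id"]] = new_claim           # new claim enters (pending)
    old = claims[old_id]
    old["lifecycle_state"] = "superseded"         # state flip, never a delete
    old["t_invalid"] = now                        # soft-tombstone timestamp
    old.setdefault("superseded_by", []).append(new_claim["id"])
```

In production the four writes sit inside one transaction so a crash can never leave the new claim visible without the old one marked superseded.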

3. Delete

There is no hard delete. "Delete" is always one of:

| Intent | Target state | Emits |
| --- | --- | --- |
| Rejected by Critic / bad evidence | retracted | memory.claim.state_transition (to retracted) |
| Superseded by newer claim | superseded + new claim | memory.claim.supersede |
| Broken provenance (source retracted) | retracted | Librarian memory.govern.broken_provenance |
| Decayed / eroded over time | archived | Replicator memory.govern.eroded |
| Operator pulled from circulation | retracted → archived | memory.entity.state_transition × 2 |
| Quarantined claim rejected by operator | retracted | Operator drain — memory.claim.state_transition |

Why no hard delete?

  • Principle IV (Audit by Default) — the audit chain (seq, prevHash, eventHash via RFC 8785 JCS + SHA-256) is append-only. A hard delete would break the chain and trigger a Sev 0 alert from the hourly AuditVerificationJob.
  • Supersession traceability — get_claim_chain() walks backwards through supersedes links; a missing parent is treated as tamper.
  • HIPAA 164.316(b) — 6-year retention of audit-relevant artifacts.
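The tamper-evidence mechanism can be sketched in a few lines. json.dumps with sorted keys stands in for full RFC 8785 JCS canonicalization, and the row shape is illustrative:

```python
import hashlib
import json

def event_hash(payload: dict, prev_hash: str) -> str:
    """Hash the payload together with the previous row's hash (chain link)."""
    canonical = json.dumps({"prev": prev_hash, **payload},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_chain(rows: list) -> bool:
    """Recompute every link; any edit or deletion breaks all later hashes."""
    prev = "0" * 64  # genesis
    for row in rows:
        if row["eventHash"] != event_hash(row["payload"], prev):
            return False
        prev = row["eventHash"]
    return True
```

This is why a hard delete is detectable: removing any row changes the expected prevHash of every row after it, which the hourly verification job would flag.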

Retention eventually disposes: entities + claims ride the patient's data-retention policy; correlator / replicator / librarian findings have a 180-day TTL. Disposition is logged as memory.govern.retention_disposed, never as a raw DB delete.

4. Confidence & freshness — how they're determined

Confidence and freshness are the two signals that decide whether a claim keeps driving retrieval or gets pulled for review. They are computed by four cooperating mechanisms — two per-claim scorers and two background jobs.

4.1 Confidence score

Claim.confidence_score is a 0.0..1.0 float persisted on the document. Two code paths write it:

  1. At ingest — the Researcher sets an initial confidence_score based on the source and its own retrieval-route score. Stored in memory_claims.confidence_score at POST /api/memory/claims time.
  2. At Critic promotion — the Critic agent (apps/research/research-engine/services/agents/critic/experiment.py) prompts the LLM against the claim + its cited sources and returns a CriticVerdict that resolves to one of promote / supersede / reject / quarantine; the LLM's numeric confidence is passed through to drive the verdict.

Tie-break rule — when two claims conflict on the same entity, CriticJob._lower_confidence(a, b) (apps/research/memory-store/services/jobs/critic.py) picks the lower-confidence one as the loser and emits memory.claim.state_transition to superseded. Ties fall back to t_created (older wins).
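The tie-break reduces to a small pure function. This is a sketch of the behaviour described above, not the code from critic.py; it assumes claims expose confidence_score and t_created:

```python
from datetime import datetime

def lower_confidence(a: dict, b: dict) -> dict:
    """Return the losing claim: lower confidence, or the newer one on a tie
    (older wins, per the tie-break rule)."""
    if a["confidence_score"] != b["confidence_score"]:
        return a if a["confidence_score"] < b["confidence_score"] else b
    return a if a["t_created"] > b["t_created"] else b
```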

Shipped today (spec 063, PR #189):

  • confidence_score stored on every claim and read by CriticJob._lower_confidence() for pairwise tie-breaks.
  • Critic LLM verdict drives promote / supersede / reject / quarantine.
  • Source authority tiers — SourceTier enum + canonical SOURCE_TIER_MAP in patientrx_contracts.enums. Tier-1 (NEJM · JAMA · Cochrane · NCCN), tier-2 (PubMed · Elsevier · Wiley · LWW · ClinicalTrials.gov · ClinVar · DrugBank), tier-3 (patient-note · open-web · clinical-narrative). Unknown connector → tier-3 (fail-safe low). Critic prompt rule #8 weights tier-1 > tier-2 > tier-3 and forbids tier-3 → tier-1 supersession.
  • Critic-verdict score persists onto the claim — PromotionPipeline._apply_verdict forwards verdict.score as confidence_score on promote + supersede via the memory-store PATCH endpoint, applied atomically with the state-transition transaction.

4.2 Freshness score

Freshness today is a boolean per-field staleness check, not a numeric score. It lives in apps/research/memory-store/services/completeness/computer.py:

def _is_stale(claim, freshness_days: int, now: datetime) -> bool:
    age_days = (now - claim.t_created).days
    return age_days > freshness_days

freshness_days is declared per field, per screen profile (Spec 016 Screen Profile freshnessRules). Example: a vitals reading might carry freshness_days: 1 on an inpatient screen and freshness_days: 30 on a clinic follow-up screen.

The completeness scorer folds stale fields into a 0..1 screen-level score:

screen_completeness = (present_fields + 0.5 × stale_fields) / total_required_fields

So a stale field is half-present — it still exists, but it's discounted until a refresh lands.
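A runnable restatement of the two pieces above. The argument shapes are illustrative; the real scorer in computer.py works over claim documents and screen-profile rules:

```python
from datetime import datetime

def is_stale(t_created: datetime, freshness_days: int, now: datetime) -> bool:
    """Same check as _is_stale above, on bare timestamps."""
    return (now - t_created).days > freshness_days

def screen_completeness(fresh: int, stale: int, total_required: int) -> float:
    """Stale fields count at half weight, per the formula above."""
    return (fresh + 0.5 * stale) / total_required
```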

What determines staleness per claim:

| Input | Source |
| --- | --- |
| claim.t_created | write time (Mongo field) |
| claim.t_invalid | supersede / retract time (Mongo field) |
| freshness_days | screen profile YAML (apps/api/src/modules/screen-profiles/registry/profiles/*.ts) |
| lifecycle_state gate | only active claims enter the staleness check; superseded / retracted / archived are skipped |

Shipped today: per-field boolean staleness per screen profile; half-weight in the completeness score.

Planned (follow-up): a numeric claim-level freshness score (0..1) combining source age, source authority tier, and supersession depth into a single value. Today the boolean staleness check + FreshnessConfidenceJob's age-based gate (see §4.4) together cover the operational need; the single-number scorer is a reporting/UI convenience, not a gap in enforcement.

4.3 Decay / TTL job

apps/research/memory-store/services/jobs/decay.py — runs on a scheduler configured in jobs_config.yml:

  • Interval: 3600s (1 hour) by default
  • Scan: memory_claims.find({"decay_at": {"$lt": now}, "lifecycle_state": {"$in": ["active", "pending"]}})
  • Action:
    • PHI-bearing claims → claim_repo.anonymize_expired() (per-field redaction per pii_fields[], not a delete, per FR-042-012)
    • Hypotheses + non-PHI relationships → hard delete (they carry no audit weight)
  • Audit events: memory.job.decay.started / .anonymized / .deleted / .completed / .error (FR-042-009)

decay_at is set at claim ingest time by the Researcher — a clinical-narrative claim might carry a 2-year decay_at; a research-article claim might carry 10 years. It is a privacy-retention horizon, not a confidence decay.
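The PHI branch of the decay action can be sketched as follows. This is illustrative: the real claim_repo.anonymize_expired() operates per-field on Mongo documents, and the "[REDACTED]" marker is an assumption, not the production value:

```python
def anonymize_expired(claim: dict) -> dict:
    """Redact only the fields named in pii_fields[]; the row itself survives
    so the audit chain stays intact (per FR-042-012, not a delete)."""
    for field in claim.get("pii_fields", []):
        if field in claim:
            claim[field] = "[REDACTED]"
    return claim
```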

4.4 Re-research triggers — Replicator + FreshnessConfidenceJob

Two complementary jobs decide when an active claim gets re-surfaced to the Critic.

Replicator (verdict-driven, weekly)

apps/research/research-engine/services/workers/replicator_job.py — runs weekly on a sample of aged active claims:

  1. Re-fetches the cited sources (via the 12 trusted-source connectors).

  2. Scores each claim as one of still_supported / eroded / retracted_source / broken_provenance.

  3. On eroded or retracted_source → calls _enqueue_recritique_task() which writes a new research_tasks row with:

    {
      "agent_role": "critic",
      "research_question": "<claim text>",
      "entity_context": ["<entity ids>"],
      "metadata": {"source": "replicator", "original_claim_id": "..."}
    }
  4. The next Critic tick dequeues and re-evaluates; typical outcome is a supersede-or-retract transition.

Audit event: memory.govern.eroded / memory.govern.retracted_source + research.task.enqueued.

FreshnessConfidenceJob (threshold-driven, hourly)

apps/research/research-engine/services/workers/freshness_confidence_job.py — spec 063 FR-063-003. Runs hourly, scans memory_claims for:

state == "active"
AND (confidence_score < confidence_threshold OR t_created < now - max_age_sec)
AND freshness_recheck_requested_at NOT in backoff window

For each match it enqueues a Critic re-task via TaskQueueManager (same shape as the Replicator's re-task payload) with metadata.trigger = "freshness_confidence" and trigger_reason in {below_confidence, aged, below_confidence_and_aged}. The job then stamps freshness_recheck_requested_at = now on the claim so perpetually-marginal claims don't re-enqueue every tick.

Defaults (configurable in services/config/worker_config.yml):

| Setting | Default | Meaning |
| --- | --- | --- |
| freshness_confidence_enabled | false (opt-in) | Flip to true after reviewing thresholds |
| freshness_confidence_interval_sec | 3600 (1h) | Tick cadence |
| freshness_confidence_threshold | 0.5 | Claims with confidence_score strictly below this value trigger a re-check |
| freshness_confidence_max_age_sec | 7,776,000 (90d) | Age at which even a high-confidence claim gets a pro-forma re-check |
| freshness_confidence_recheck_backoff_sec | 604,800 (7d) | Per-claim cool-off between re-enqueues |
| freshness_confidence_sample_limit | 50 | Per-tick cap to bound LLM cost |

Audit events: research.freshness_confidence.tick (per-tick summary with counts + thresholds) + research.freshness_confidence.enqueued (per claim, with reason + confidence_score + task_id). Metadata-only — no claim text or PHI in either payload.

Shipped today (spec 063): both the verdict-driven Replicator path and the threshold-driven FreshnessConfidenceJob. The job is opt-in (freshness_confidence_enabled: false by default) so operators can review settings before turning it on.

4.5 Supersession & compaction

apps/research/memory-store/services/repositories/claim_repository.py:

  • supersede_transaction() — atomic Motor transaction: inserts new claim + flips old claim's lifecycle_state to superseded + sets t_invalid = now + appends the new claim id to the old claim's superseded_by[].
  • get_claim_chain(claim_id) — walks supersedes[] backward through the chain so the UI can show the full provenance trail of a corrected claim.
  • No separate compaction job — t_invalid is the soft-tombstone; claims remain queryable for 6-year audit retention, filtered out of default retrieval by lifecycle_state="active".
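The backward walk can be sketched in-memory. A single-id supersedes link is used here for brevity, and a dict stands in for the Mongo collection; a link that points at a missing row is treated as tamper:

```python
def get_claim_chain(claims: dict, claim_id: str) -> list:
    """Walk supersedes links backward from claim_id to the chain root."""
    chain, current = [], claim_id
    while current is not None:
        claim = claims.get(current)
        if claim is None:
            # A dangling parent means a row vanished: tamper, not normal state.
            raise RuntimeError(f"broken chain: missing parent {current}")
        chain.append(claim)
        current = claim.get("supersedes")
    return chain
```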

4.6 What "confidence & freshness" means at query time

The retrieval surface (POST /api/memory/retrieve, Spec 060) filters on:

  1. lifecycle_state == "active" — superseded / retracted / archived never surface.
  2. acl.roles + ResolvedScope — consent + role gates.
  3. decay_at > now — the serializer drops any claim whose decay horizon passed but the decay job hasn't swept yet.
  4. Implicit freshness — the caller passes a screen_profile and the retriever uses its freshnessRules to mark stale fields in the response metadata so the UI can show a "data from N days ago" chip.

No numeric confidence threshold is applied at query time. Enforcement happens asynchronously via FreshnessConfidenceJob (§4.4), which re-enqueues marginal claims for Critic re-evaluation so the next retrieval sees either a refreshed confidence_score or a transition to superseded/retracted.
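Gates 1 and 3 of the list above reduce to a simple predicate (sketch only: consent/ACL gate 2 and the screen-profile staleness annotation of gate 4 are omitted, and field names follow this page):

```python
from datetime import datetime

def retrievable(claim: dict, now: datetime) -> bool:
    """Lifecycle filter plus the not-yet-swept decay horizon."""
    return claim["lifecycle_state"] == "active" and claim["decay_at"] > now
```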

Shipped vs. planned — honest status

| Signal | Status | File |
| --- | --- | --- |
| confidence_score persisted on claim | ✅ shipped | memory_claims.confidence_score |
| Critic LLM verdict → promote / supersede / reject / quarantine | ✅ shipped | research-engine/agents/critic/experiment.py |
| Critic verdict.score persists onto the claim post-promotion | ✅ shipped (spec 063) | research-engine/workers/promotion_pipeline.py |
| Source authority tiers (NEJM tier-1, PubMed tier-2, patient-note tier-3) | ✅ shipped (spec 063) | patientrx_contracts/enums.py |
| Per-field _is_stale boolean | ✅ shipped | memory-store/completeness/computer.py |
| Numeric 0..1 freshness scorer | 📋 planned | — (boolean + FreshnessConfidenceJob cover enforcement; scorer is UI convenience) |
| TTL decay job (hourly, anonymizes PHI) | ✅ shipped | memory-store/jobs/decay.py |
| Replicator re-research on eroded / retracted_source | ✅ shipped | research-engine/workers/replicator_job.py |
| Auto-trigger re-research on low-freshness / low-confidence | ✅ shipped (spec 063, opt-in) | research-engine/workers/freshness_confidence_job.py |
| Supersession atomic txn + t_invalid tombstone | ✅ shipped | memory-store/repositories/claim_repository.py |

End-to-end example — retract a claim
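A minimal walkthrough assembled from the endpoints documented above (ids illustrative; an admin credential is assumed):

  1. Call PATCH /api/memory/claims/{claim_id}/state with body {"target_state": "retracted"}. The caller needs memory.claim.admin; a non-admin gets 403 plus a memory.govern.unauthorized_access_attempt audit event.
  2. The repository runs validate_claim_transition() (HTTP 422 on an illegal move), flips lifecycle_state to retracted, and stamps t_invalid.
  3. AuditClient.append() emits memory.claim.state_transition carrying from_state, to_state, and the salted actor_id hash.
  4. The next POST /api/memory/retrieve no longer surfaces the claim, via the lifecycle_state == "active" filter (§4.6); the row itself remains queryable for the 6-year audit retention window.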

Required consents, roles, and scopes — cheat sheet

| Operation | HTTP | Scope | Role gate | Audit |
| --- | --- | --- | --- | --- |
| Add entity | POST /entities | memory.write | writer | memory.entity.create |
| Add claim | POST /claims | knowledge.write | writer | memory.claim.create / supersede |
| Add relationship | POST /relationships | memory.relationship.write | writer | memory.relationship.created |
| Bulk ingest | POST /ingest | memory.ingest.batch | writer | memory.ingest.batch |
| Change entity state | PATCH /entities/{id}/state | memory.write | writer | memory.entity.state_transition (+ memory.govern.retire_attempt on retire) |
| Change claim state | PATCH /claims/{id}/state | memory.claim.admin | admin only | memory.claim.state_transition |
| Supersede claim | POST /claims w/ supersedes | knowledge.write | writer | memory.claim.supersede |
| Retract / archive | state transition (no hard delete) | as above | as above | as above |
| Read / retrieve | POST /retrieve, GET /entities/{id} | memory.read / memory.graph.read | consent-scoped | memory.retrieval.hybrid (count-only) |
| MCP tools | Claude Code | read-only | | memory.mcp.tool_invoked |

Never do this

  • Never call MongoDB.deleteOne() on memory_* collections from application code — always go through a state transition. The change-stream pipelines (Critic promotion, Replicator decay) assume every disappearance is preceded by a state_transition audit row.
  • Never mutate claim_text_encrypted or entity name in place — always supersede via a new claim, so the correction is in the audit chain and callers reading get_claim_chain() can see the correction.
  • Never skip the ACL / consent check by using the MCP or the repository directly — the HTTP / SDK path is the only one that runs AclEnforcer + ResolvedScope + AuditClient in the correct order.
  • Never emit a write from the Researcher agent without a supports[] source — the Critic will retract it at the next SLA tick; better to fail fast.
  • Never bypass the state machine with a direct updateOne({lifecycle_state: ...}) — validate_transition() is the only legal gate.

Where to read more