Data Model
One Health persists all data to a single MongoDB Atlas cluster (database: patientrx). Collections are split into two tiers: primary-tier (the core product: patients, encounters, notes, consent, audit) and research-tier (knowledge graph, agents, decision graph).
Primary-tier collections
| Collection | Key fields | PHI encrypted? |
|---|---|---|
| users | (idpProvider, idpSubject) | ✓ display name, email |
| patients | mrn, dob, addr, phone, insurance | ✓ all |
| encounters | patientId, type, observations | ✓ narrative |
| notes | encounterId, text, attachments | ✓ text, attachments |
| care_teams | patientId, members[], delegation_type | partial |
| referrals | patientId, from_provider, to_specialist, status, reason_text | ✓ reason_text |
| consents | patientId, scope, granted_at, revoked_at | no (metadata only) |
| audit_events | seq, prevHash, eventHash, actor, action, payload | per-action |
| organizations | type, members[], affiliations[] | no |
| family_groups | patientId, members[], minor_accounts[] | partial |
| notifications | recipientId, senderId, type, body | ✓ body |
| ehr_imports | external_id, fhir_resource_type, imported_at | per-field |
| refresh_sessions | user_id, issued_at, last_used | no (opaque ids) |
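The seq, prevHash, and eventHash fields on audit_events suggest a hash-chained, tamper-evident log. The following is a minimal sketch of that idea, not the project's implementation. It assumes SHA-256 over a canonical JSON serialization, an all-zero genesis hash, and seq starting at 1; none of these are confirmed here, and the function names are hypothetical.

```python
import hashlib
import json

# Assumption: the first event points at an all-zero sentinel hash.
GENESIS_HASH = "0" * 64


def compute_event_hash(seq, prev_hash, actor, action, payload):
    """Hash the event fields in a canonical (sorted-key) JSON form."""
    body = json.dumps(
        {"seq": seq, "prevHash": prev_hash, "actor": actor,
         "action": action, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain, actor, action, payload):
    """Append an event whose prevHash links to the previous eventHash."""
    seq = len(chain) + 1
    prev_hash = chain[-1]["eventHash"] if chain else GENESIS_HASH
    event = {
        "seq": seq,
        "prevHash": prev_hash,
        "actor": actor,
        "action": action,
        "payload": payload,
        "eventHash": compute_event_hash(seq, prev_hash, actor, action, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain):
    """Return True only if every seq, link, and hash is consistent."""
    prev_hash = GENESIS_HASH
    for expected_seq, event in enumerate(chain, start=1):
        if event["seq"] != expected_seq or event["prevHash"] != prev_hash:
            return False
        recomputed = compute_event_hash(
            event["seq"], event["prevHash"], event["actor"],
            event["action"], event["payload"],
        )
        if recomputed != event["eventHash"]:
            return False
        prev_hash = event["eventHash"]
    return True
```

Because each eventHash covers the previous one, editing any stored event invalidates every later link, which is what makes a unique, gap-free seq useful for verification.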
Research-tier collections
| Collection | Schema | Purpose |
|---|---|---|
| memory_entities | Entity | Patient / provider / hospital / foundation / organization nodes + taxonomy_category |
| memory_relationships | Relationship | Typed edges (generalization_of, mechanism_for, contradiction_of, combination_of, refinement_of, amplification_of) |
| memory_claims | Claim | Facts with lifecycle_state + supports[] + pii_fields[] |
| research_sources | — | 12 trusted-source registry |
| research_tasks | ResearchTask | Queue items (3 priority tiers) |
| orchestration_candidates | CandidateHypothesis | 6-phase lifecycle |
| orchestration_runs | OrchestrationRun | Full DETECTED → DISPATCHED → COMPLETED flow |
| correlator_findings | — | Pattern-match results (180d TTL) |
| replicator_findings | — | Aged-claim re-research |
| librarian_findings | — | Provenance validation |
| experiment_traces | Experiment | Chain-of-thought + retrieval context (encrypted) |
| decision_graph_nodes | — | 15 node types |
| candidate_outcomes | — | HCP decisions → DPO training |
| llm_call_logs | — | LLM call history, envelope-encrypted content, 30-day anonymization |
| retrieval_audit | — | Per-request classifier + route + latency (count-only entities) |
Enum catalog
Canonical definitions in packages/patientrx-contracts/patientrx_contracts/enums.py:
EntityType · LifecycleState · Origin · PriorityTier · ResearchDepth · TaskStatus · CandidateStatus · RunStatus · ConfidenceLevel · ModelTier · CriticVerdict · StakeholderDecision · TransformationType · AmplificationType · QualityTier · RoutingReason · DetectorType · DecisionNodeType · ContributionType · QueryType · RetrievalStrategy
Indexes
Every PHI-serving collection has a compound (patientId, updatedAt) index. audit_events has a unique index on seq. Retrieval routes use the taxonomy_category index (idx_taxonomy_state). Full-text search uses the Atlas Search indexes entities_fts and claims_fts.
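As an illustration, the compound and unique indexes described above could be declared as data and applied through PyMongo's `Collection.create_index`. This is a sketch under assumptions: the list of PHI-serving collections is an illustrative subset, ascending sort order is a guess, and `index_plan` and `apply_indexes` are hypothetical helpers. Atlas Search indexes are managed separately and are not covered here.

```python
# Same value as pymongo.ASCENDING; defined locally so the plan can be
# built and inspected without a driver installed.
ASCENDING = 1

# Assumption: an illustrative subset of the PHI-serving collections.
PHI_COLLECTIONS = ["encounters", "care_teams", "referrals"]


def index_plan():
    """Return (collection, keys, options) tuples for the indexes above."""
    plan = []
    for name in PHI_COLLECTIONS:
        keys = [("patientId", ASCENDING), ("updatedAt", ASCENDING)]
        plan.append((name, keys, {}))
    # audit_events: seq must be unique.
    plan.append(("audit_events", [("seq", ASCENDING)], {"unique": True}))
    return plan


def apply_indexes(db, plan):
    """Apply the plan to a pymongo Database (or any object where
    db[name].create_index(keys, **options) is available)."""
    for collection, keys, options in plan:
        db[collection].create_index(keys, **options)
```

With a real deployment, `db` would be something like `MongoClient(uri)["patientrx"]`; keeping the plan as plain data makes it easy to review in code review or compare against the live index list.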
Encryption envelope
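Every PHI field stored in MongoDB carries its own encryption envelope: the field is encrypted with a data-encryption key (DEK), and the DEK is wrapped by a key-encryption key (KEK). KEK rotation (via /hipaa-rotate-key) re-wraps every DEK without touching the ciphertext. Plaintext PHI never lands in logs, backups (via codec), or eval exports.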
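The re-wrap property can be shown with a small sketch. It is not the project's codec: the envelope field names and helper functions are hypothetical, and the cipher is a toy HMAC-SHA256 keystream used only so the example runs on the standard library. A real implementation would use an authenticated cipher such as AES-GCM and a managed KEK.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT for production): XOR data with an
    HMAC-SHA256 counter-mode keystream. Encrypt and decrypt are the same."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(
            hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        )
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_field(plaintext: bytes, kek: bytes) -> dict:
    """Encrypt one field under a fresh DEK, then wrap the DEK with the KEK."""
    dek = secrets.token_bytes(32)
    data_nonce = secrets.token_bytes(16)
    wrap_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": _keystream_xor(dek, data_nonce, plaintext),
        "data_nonce": data_nonce,
        "wrapped_dek": _keystream_xor(kek, wrap_nonce, dek),
        "wrap_nonce": wrap_nonce,
    }


def decrypt_field(envelope: dict, kek: bytes) -> bytes:
    """Unwrap the DEK with the KEK, then decrypt the field."""
    dek = _keystream_xor(kek, envelope["wrap_nonce"], envelope["wrapped_dek"])
    return _keystream_xor(dek, envelope["data_nonce"], envelope["ciphertext"])


def rotate_kek(envelope: dict, old_kek: bytes, new_kek: bytes) -> dict:
    """Re-wrap the DEK under the new KEK; ciphertext is copied unchanged."""
    dek = _keystream_xor(old_kek, envelope["wrap_nonce"], envelope["wrapped_dek"])
    wrap_nonce = secrets.token_bytes(16)
    return {
        **envelope,
        "wrapped_dek": _keystream_xor(new_kek, wrap_nonce, dek),
        "wrap_nonce": wrap_nonce,
    }
```

Rotation only rewrites the small wrapped-DEK portion of each envelope, so its cost scales with the number of envelopes rather than the volume of encrypted data.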