Medical Knowledge Representation
Creating an open-data, validated knowledge graph mapping disease concepts to medications using standardized medical terminologies
Mission: Create a high-quality, explainable knowledge graph of disease-medication relationships for use in GraphRAG-powered medical AI systems where transparency, validation, and consistency are critical. Synapse MKR prioritizes validated knowledge engineering of relationships—the biggest missing piece from biomedical ontologies—starting with disease-medication relationships.
Synapse MKR addresses a fundamental challenge in medical AI: the need for structured, verifiable knowledge rather than opaque neural network predictions. By mapping standardized disease concepts to medications with explicit relationship types, we enable:
Traditional large language models (LLMs) face critical limitations in healthcare:
GraphRAG (Graph-based Retrieval Augmented Generation) solves these issues by:
Here are a few ways this knowledge graph could power better healthcare.
"Remember your medications, rediscover your conditions"
Reverse medication-to-condition lookup is only possible with structured knowledge graphs. LLMs alone cannot reliably infer which conditions a patient likely has from their medication list without hallucination risk.
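The reverse lookup described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the tuple layout and the function names `build_reverse_index` and `candidate_conditions` are assumptions, and the edges are taken from the sample rows shown later in this page.

```python
from collections import defaultdict

# Hypothetical edge tuples: (disease_abbrev, med_name, relationship_type).
EDGES = [
    ("HTN", "Lisinopril", "MANAGES_CHRONIC"),
    ("HTN", "Amlodipine", "MANAGES_CHRONIC"),
    ("T2DM", "Metformin", "MANAGES_CHRONIC"),
    ("HLD", "Atorvastatin", "MANAGES_CHRONIC"),
]


def build_reverse_index(edges):
    """Map each medication name to the set of conditions it is linked to."""
    index = defaultdict(set)
    for disease, med, _rel_type in edges:
        index[med].add(disease)
    return index


def candidate_conditions(med_list, index):
    """Return {condition: [supporting medications]} for a medication list."""
    support = defaultdict(list)
    for med in med_list:
        # .get() avoids creating empty entries for unknown medications.
        for disease in sorted(index.get(med, ())):
            support[disease].append(med)
    return dict(support)


index = build_reverse_index(EDGES)
likely = candidate_conditions(["Lisinopril", "Metformin", "Amlodipine"], index)
```

Each candidate condition carries the medications that support it, so the suggestion stays auditable rather than inferred.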
"See how your treatments connect to your health"
Interactive visual showing the patient's conditions with lines connecting each one to its medications, color-coded by relationship type:
Interaction: Tap a medication → highlights what it treats. Tap a condition → highlights all its treatments.
Split-screen visualization showing comprehensive treatment overview:
Provides explicit, auditable reasoning for each medication-condition link. "Explainable AI" - physicians see WHY connections exist with cited evidence, not black-box predictions. Same graph, two interfaces, different audiences.
"Safe prescribing for complex patients"
65% of Medicare patients have 2+ chronic conditions. Each additional condition multiplies the number of drug-disease pairs that must be checked, raising interaction risk. LLMs struggle with simultaneous multi-disease reasoning.
Before prescribing Drug X for Disease Y, GraphRAG queries the patient's complete condition list against Synapse MKR's CONTRAINDICATED_IN relationships:
Knowledge graphs excel at multi-hop reasoning. Checking contraindications across multiple diseases simultaneously is computationally efficient with graph traversal (milliseconds) versus LLM reasoning (seconds, with hallucination risk).
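A rough sketch of this pre-prescribing check follows. It assumes the safety modifiers are stored as a lookup keyed by (medication, disease); that layout, the function name `screen_prescription`, and the placeholder drug and disease names are all hypothetical and carry no clinical meaning.

```python
# Hypothetical modifier edges: (medication, disease) -> modifier.
# Placeholder names only; these are not real clinical assertions.
MODIFIER_EDGES = {
    ("DrugX", "DiseaseA"): "CONTRAINDICATED_IN",
    ("DrugX", "DiseaseB"): "CAUTION_IN",
    ("DrugZ", "DiseaseA"): "REQUIRES_DOSE_ADJUSTMENT",
}

BLOCKING = {"CONTRAINDICATED_IN"}
WARNING = {"CAUTION_IN", "REQUIRES_DOSE_ADJUSTMENT"}


def screen_prescription(drug, patient_conditions, modifier_edges=MODIFIER_EDGES):
    """Check one candidate drug against every condition the patient has."""
    blocks, warnings = [], []
    for condition in patient_conditions:
        modifier = modifier_edges.get((drug, condition))
        if modifier in BLOCKING:
            blocks.append((condition, modifier))
        elif modifier in WARNING:
            warnings.append((condition, modifier))
    return {"drug": drug, "safe": not blocks, "blocks": blocks, "warnings": warnings}


report = screen_prescription("DrugX", ["DiseaseA", "DiseaseB", "DiseaseC"])
```

The check is one dictionary lookup per condition, so its cost grows linearly with the length of the condition list.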
"Take the right pill at the right time"
Patients on 5+ medications struggle to remember which pills are truly critical versus which have flexible timing. All medications feel equally urgent, leading to either anxiety or poor adherence.
GraphRAG categorizes medications by their relationship type using Synapse MKR:
Relationship types (TREATS_CURATIVE vs MANAGES_CHRONIC) provide inherent prioritization logic. This is native to knowledge graphs but would require complex prompting and fine-tuning with pure LLM approaches.
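As an illustration of how relationship types could drive this grouping, the sketch below buckets a patient's medications by edge type and then flattens them in a caller-supplied order. The function names, the tuple layout, and the placeholder `AntibioticA`/`InfectionA` pair are assumptions; the priority order is passed in rather than hard-coded because the page does not specify one.

```python
from collections import defaultdict

# Hypothetical edges: (disease, medication, relationship_type).
EDGES = [
    ("HTN", "Lisinopril", "MANAGES_CHRONIC"),
    ("T2DM", "Metformin", "MANAGES_CHRONIC"),
    ("InfectionA", "AntibioticA", "TREATS_CURATIVE"),  # placeholder pair
]


def group_by_relationship(patient_meds, edges):
    """Group a patient's medications under the relationship type of their edge."""
    groups = defaultdict(list)
    for disease, med, rel_type in edges:
        if med in patient_meds:
            groups[rel_type].append((med, disease))
    return dict(groups)


def ordered_schedule(groups, priority):
    """Flatten the groups following a caller-supplied list of relationship types."""
    return [
        (rel_type, med, disease)
        for rel_type in priority
        for med, disease in groups.get(rel_type, [])
    ]


groups = group_by_relationship({"Lisinopril", "AntibioticA"}, EDGES)
schedule = ordered_schedule(groups, ["TREATS_CURATIVE", "MANAGES_CHRONIC"])
```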
"Simplify complex medication lists"
Patients on 10+ medications often have therapeutic duplication, unnecessary drugs, or orphan medications (no clear indication). Manual reconciliation during hospital admission or care transitions is time-consuming and error-prone.
GraphRAG maps each medication to the patient's documented conditions using Synapse MKR, then identifies issues:
Knowledge graphs excel at finding "what's connected" and "what's missing." Identifying orphan medications (drugs without mapped conditions) is a straightforward graph query but complex for LLMs without structured reasoning.
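The orphan and duplication checks could look roughly like this. It is a sketch under assumptions: edges are (disease, medication) pairs, duplication is approximated as two drugs of the same class linked to the same condition, and `AceInhibitorB` is a placeholder name rather than a real product.

```python
from collections import defaultdict

# Hypothetical data; "AceInhibitorB" is a placeholder, not a real drug name.
EDGES = [("HTN", "Lisinopril"), ("HTN", "AceInhibitorB"), ("T2DM", "Metformin")]
DRUG_CLASS = {
    "Lisinopril": "ACE Inhibitor",
    "AceInhibitorB": "ACE Inhibitor",
    "Metformin": "Biguanide",
}


def reconcile(med_list, conditions, edges, drug_class):
    """Return (orphan medications, same-class duplicates per condition)."""
    covered = defaultdict(list)  # (condition, drug class) -> medications
    orphans = []
    for med in med_list:
        linked = [d for d, m in edges if m == med and d in conditions]
        if not linked:
            orphans.append(med)  # no documented condition explains this drug
        for disease in linked:
            covered[(disease, drug_class.get(med))].append(med)
    duplicates = {key: meds for key, meds in covered.items() if len(meds) > 1}
    return orphans, duplicates


orphans, duplicates = reconcile(
    ["Lisinopril", "AceInhibitorB", "Metformin"], {"HTN"}, EDGES, DRUG_CLASS
)
```

Here Metformin is flagged as an orphan only because T2DM is absent from the documented conditions, which is exactly the prompt a reviewer needs to either add the diagnosis or question the drug.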
Every treatment recommendation includes a traceable path: Disease → [Relationship Type + Evidence] → Medication. Clinicians can audit AI reasoning.
FDA-approved indications only. Explicit contraindication tracking. Version-controlled knowledge base supports reproducibility for regulatory submissions.
Update 10 edges for new guidelines vs. retraining 70B parameter models. Changes propagate instantly across all queries.
Built on standardized medical terminologies, the same standards used in Epic, Cerner, and other major EHR systems.
Automated consistency checks catch contradictions (e.g., drug both treats and contraindicated in same disease). Human review workflow for safety-critical relationships.
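One way such a contradiction check might be written is shown below. The edge dictionaries and the function name `find_contradictions` are assumptions. Because this page lists CONTRAINDICATED_IN both as a modifier and as a relationship type, the sketch accepts either form; `modifiers` is assumed to be a list.

```python
THERAPEUTIC = {"PREVENTS", "TREATS_CURATIVE", "MANAGES_CHRONIC", "TREATS_SYMPTOMATIC"}


def find_contradictions(edges):
    """Return (disease, medication) pairs marked both therapeutic and contraindicated.

    Each edge is a dict with keys: disease, medication, relationship_type,
    and optionally modifiers (a list of modifier strings).
    """
    seen = {}
    for edge in edges:
        key = (edge["disease"], edge["medication"])
        flags = seen.setdefault(key, {"treats": False, "contra": False})
        if edge["relationship_type"] in THERAPEUTIC:
            flags["treats"] = True
        if (edge["relationship_type"] == "CONTRAINDICATED_IN"
                or "CONTRAINDICATED_IN" in edge.get("modifiers", [])):
            flags["contra"] = True
    return sorted(k for k, f in seen.items() if f["treats"] and f["contra"])


# Placeholder names only; not real clinical assertions.
sample = [
    {"disease": "DiseaseA", "medication": "DrugX",
     "relationship_type": "MANAGES_CHRONIC", "modifiers": ["FIRST_LINE"]},
    {"disease": "DiseaseA", "medication": "DrugX",
     "relationship_type": "CONTRAINDICATED_IN"},
    {"disease": "DiseaseB", "medication": "DrugX",
     "relationship_type": "MANAGES_CHRONIC", "modifiers": []},
]
conflicts = find_contradictions(sample)
```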
Graph traversal handles complex queries: "Find medications for patient with HTN + CKD + Diabetes that don't require dose adjustment."
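The example query can be approximated without a graph database, as in this sketch. It assumes two sets of (medication, disease) pairs, one for therapeutic edges and one for REQUIRES_DOSE_ADJUSTMENT edges; the drug names are placeholders and the function name is hypothetical.

```python
# Placeholder medications; the pairs below are illustrative, not clinical facts.
TREATS = {("DrugA", "HTN"), ("DrugB", "HTN"), ("DrugC", "T2DM")}
DOSE_ADJUST = {("DrugA", "CKD")}


def candidates_without_adjustment(conditions, treats, adjustments):
    """For each condition, list drugs that need no dose adjustment
    for any of the patient's conditions."""
    result = {}
    for condition in conditions:
        result[condition] = sorted(
            med
            for med, target in treats
            if target == condition
            and not any((med, other) in adjustments for other in conditions)
        )
    return result


options = candidates_without_adjustment(["HTN", "CKD", "T2DM"], TREATS, DOSE_ADJUST)
```

DrugA is dropped for HTN because the same patient has CKD, which is the cross-condition filtering the query describes.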
Graph updates cost <$5K vs. $100K-500K for LLM retraining. Queries run in milliseconds with O(log n) indexed lookups, which is especially useful for on-premises deployments and low-power devices.
Enables pharmacovigilance tracking, treatment pattern analysis, gap identification, and hypothesis generation for drug repurposing.
Multi-model AI validation pipeline with targeted human expert review enables efficient scaling while maintaining regulatory compliance and clinical accuracy standards.
Initial validation demonstrates the viability of this multi-model LLM methodology for generating high-quality medical knowledge graphs. Preliminary review by a clinical informatics MD/PhD suggests that accuracy and coverage, both for granular relationship types and for usage estimates, substantially exceed those of existing open data resources.
230 Disease-medication relationships generated
50 Priority disease concepts covered
Phase 1 Schema design — Complete ✓
Phase 2 Multi-LLM generation — Complete ✓
Phase 3 Automated code validation — In Progress
Relationships span high-impact disease categories across organ systems:
HTN, HLD, CAD, HFrEF, HFpEF, AF, Stable angina, DVT/PE, PAD
T2DM, T1DM, Hypothyroidism, Hyperthyroidism, Osteoporosis, Obesity
Asthma, COPD, Pneumonia, Acute bronchitis, Allergic rhinitis
MDD, GAD, Bipolar disorder, ADHD, Insomnia, PTSD
GERD, IBS, Constipation, Crohn's disease, Ulcerative colitis
UTI, Strep pharyngitis, Influenza, Cellulitis, Acute otitis media
Osteoarthritis, Rheumatoid arthritis, Gout, Low back pain
Migraine, Epilepsy, Parkinson's disease, Essential tremor
CKD, Anemia, BPH
Unique identifier for this node instance
SNOMED-CT concept identifier
UMLS Concept Unique Identifier for cross-terminology mapping
Full clinical name (e.g., "Essential hypertension")
Common abbreviation used in clinical documentation (e.g., "HTN", "T2DM")
SNOMED-CT semantic type (e.g., "disorder", "finding")
Unique identifier for this node instance
RxNorm concept identifier at ingredient level (IN or MIN for combinations)
UMLS Concept Unique Identifier
Generic drug name (e.g., "Metformin", "Lisinopril")
True if medication contains multiple active ingredients
WHO Anatomical Therapeutic Chemical classification code
Therapeutic class (e.g., "ACE Inhibitor", "Biguanide")
Brief description of how the drug works
| Type | Definition | Examples |
|---|---|---|
| PREVENTS | Prophylactic use to prevent disease occurrence | Aspirin for MI prevention, Vaccines for infectious disease |
| TREATS_CURATIVE | Intent is disease resolution or elimination of causative agent | Antibiotics for bacterial infections, Antivirals for acute viral illness |
| MANAGES_CHRONIC | Disease modification or maintenance therapy (no cure expected) | Insulin for diabetes, Statins for hyperlipidemia, ACE-I for hypertension |
| TREATS_SYMPTOMATIC | Symptom palliation without disease modification | NSAIDs for arthritis pain, Antiemetics for nausea |
• FIRST_LINE - Guideline-recommended initial therapy
• SECOND_LINE - Recommended after first-line failure
• ADJUNCTIVE - Used in combination with other therapies
• CONTRAINDICATED_IN - Absolute contraindication
• CAUTION_IN - Relative contraindication or monitoring required
• REQUIRES_DOSE_ADJUSTMENT - Dosing modification needed (renal/hepatic/age-based)
Unique identifier for this relationship
One of the relationship types listed above
Whether this indication is FDA-approved (all edges in current dataset are TRUE)
Source of evidence (e.g., "DailyMed", "FDA Label", "AHA/ACC 2017 Guidelines")
Identifier for evidence source (e.g., SPL setid, DOI)
Estimated clinical usage frequency (1 = highest)
One of: pending, approved, rejected, needs_revision
Confidence score for this relationship
Version number (increments on modification)
Whether this is the current active version
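The version and current-version fields suggest an append-only revision history. The sketch below shows one way that could work; whether the dataset actually retains superseded versions, and whether the edge identifier is reused across versions, are assumptions, as is the function name `revise_edge`.

```python
def revise_edge(history, **changes):
    """Append a new version of an edge and retire the previous one.

    history is a list of dicts that each carry "version" and "is_current".
    """
    current = next(edge for edge in history if edge["is_current"])
    updated = {
        **current,
        **changes,
        "version": current["version"] + 1,  # increments on modification
        "is_current": True,
    }
    current["is_current"] = False  # exactly one version stays current
    history.append(updated)
    return updated


history = [
    {"edge_id": "e1", "review_status": "pending", "version": 1, "is_current": True}
]
approved = revise_edge(history, review_status="approved")
```

Keeping the old row instead of overwriting it is what makes a past query result reproducible.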
Flat file format suitable for review in Excel/Google Sheets and import into databases.
edge_id,disease_sctid,disease_cui,disease_name,disease_abbrev,med_rxcui,med_cui,med_name,relationship_type,modifiers,fda_approved,evidence_source,estimated_rank,review_status,version,is_current,atc_code,drug_class,moa
550e8400-e29b-41d4-a716-446655440001,38341003,C0020538,Essential hypertension,HTN,29046,C0020649,Lisinopril,MANAGES_CHRONIC,FIRST_LINE,TRUE,AHA/ACC 2017,88,pending,1,TRUE,C09AA03,ACE Inhibitor,ACE inhibition
550e8400-e29b-41d4-a716-446655440002,38341003,C0020538,Essential hypertension,HTN,2599,C0025894,Losartan,MANAGES_CHRONIC,FIRST_LINE,TRUE,AHA/ACC 2017,82,pending,1,TRUE,C09CA01,ARB,Angiotensin II receptor antagonism
550e8400-e29b-41d4-a716-446655440003,38341003,C0020538,Essential hypertension,HTN,17767,C0004147,Amlodipine,MANAGES_CHRONIC,FIRST_LINE,TRUE,AHA/ACC 2017,84,pending,1,TRUE,C08CA01,Dihydropyridine Calcium Channel Blocker,Blocks L-type calcium channels
550e8400-e29b-41d4-a716-446655440159,44054006,C0011860,Type 2 diabetes mellitus,T2DM,6809,C0025598,Metformin,MANAGES_CHRONIC,FIRST_LINE,TRUE,ADA 2024,96,pending,1,TRUE,A10BA02,Biguanide,Decreases hepatic glucose production
550e8400-e29b-41d4-a716-446655440008,55822004,C0020473,Hyperlipidemia,HLD,83367,C0004147,Atorvastatin,MANAGES_CHRONIC,FIRST_LINE,TRUE,ACC/AHA 2018,94,pending,1,TRUE,C10AA05,Statin,HMG-CoA reductase inhibition
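A minimal sketch of reading this flat file with Python's standard `csv` module follows. The two data rows are copied from the sample above; the function name `load_edges` and the choice to coerce the TRUE/FALSE and numeric columns are assumptions. The `modifiers` column is left as a raw string because the delimiter for multiple modifiers is not shown here.

```python
import csv
import io

SAMPLE_CSV = """
edge_id,disease_sctid,disease_cui,disease_name,disease_abbrev,med_rxcui,med_cui,med_name,relationship_type,modifiers,fda_approved,evidence_source,estimated_rank,review_status,version,is_current,atc_code,drug_class,moa
550e8400-e29b-41d4-a716-446655440001,38341003,C0020538,Essential hypertension,HTN,29046,C0020649,Lisinopril,MANAGES_CHRONIC,FIRST_LINE,TRUE,AHA/ACC 2017,88,pending,1,TRUE,C09AA03,ACE Inhibitor,ACE inhibition
550e8400-e29b-41d4-a716-446655440159,44054006,C0011860,Type 2 diabetes mellitus,T2DM,6809,C0025598,Metformin,MANAGES_CHRONIC,FIRST_LINE,TRUE,ADA 2024,96,pending,1,TRUE,A10BA02,Biguanide,Decreases hepatic glucose production
"""


def load_edges(text):
    """Parse the flat edge file into dicts with typed boolean and integer fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text.strip())):
        row["fda_approved"] = row["fda_approved"].upper() == "TRUE"
        row["is_current"] = row["is_current"].upper() == "TRUE"
        row["estimated_rank"] = int(row["estimated_rank"])
        row["version"] = int(row["version"])
        rows.append(row)
    return rows


edges = load_edges(SAMPLE_CSV)
```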
Structured Linked Data format following schema.org conventions, suitable for Neo4j, RDF triple stores, and semantic web applications.
{
  "@context": "http://schema.org/",
  "@graph": [
    {
      "@type": "MedicalCondition",
      "@id": "snomed:38341003",
      "identifier": "38341003",
      "sameAs": "umls:C0020538",
      "name": "Essential hypertension",
      "alternateName": "HTN",
      "codeValue": "C0020538",
      "codingSystem": "UMLS"
    },
    {
      "@type": "Drug",
      "@id": "rxnorm:29046",
      "identifier": "29046",
      "sameAs": "umls:C0065374",
      "name": "Lisinopril",
      "drugClass": "ACE Inhibitor",
      "mechanismOfAction": "Angiotensin-converting enzyme inhibition",
      "code": {
        "@type": "MedicalCode",
        "codeValue": "C09AA03",
        "codingSystem": "ATC"
      }
    },
    {
      "@type": "TherapeuticRelationship",
      "@id": "uuid:550e8400-e29b-41d4-a716-446655440000",
      "source": "snomed:38341003",
      "target": "rxnorm:29046",
      "relationshipType": "MANAGES_CHRONIC",
      "modifier": ["FIRST_LINE"],
      "evidenceSource": "AHA/ACC 2017",
      "evidenceLevel": "A",
      "usageRank": 1,
      "reviewStatus": "approved",
      "version": 1,
      "isCurrent": true,
      "dateCreated": "2026-02-03T00:00:00Z"
    }
  ]
}
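To show how a consumer might read this structure, the sketch below resolves each relationship node in `@graph` to a readable (disease, relationship, medication) triple. It uses only the standard `json` module and a trimmed copy of the document above; the function name `triples` is an assumption, and no JSON-LD processing library is involved.

```python
import json

# Trimmed copy of the example document; only the fields used below are kept.
DOC = json.loads("""
{
  "@graph": [
    {"@type": "MedicalCondition", "@id": "snomed:38341003",
     "name": "Essential hypertension"},
    {"@type": "Drug", "@id": "rxnorm:29046", "name": "Lisinopril"},
    {"@type": "TherapeuticRelationship",
     "source": "snomed:38341003", "target": "rxnorm:29046",
     "relationshipType": "MANAGES_CHRONIC", "modifier": ["FIRST_LINE"]}
  ]
}
""")


def triples(doc):
    """Resolve relationship nodes to (disease name, type, medication name)."""
    names = {node["@id"]: node["name"] for node in doc["@graph"] if "name" in node}
    return [
        (names[node["source"]], node["relationshipType"], names[node["target"]])
        for node in doc["@graph"]
        if node.get("@type") == "TherapeuticRelationship"
    ]


resolved = triples(DOC)
```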
Direct graph database loading script for Neo4j or compatible graph databases.
// Create Disease Node
CREATE (d:Disease {
  node_id: '550e8400-e29b-41d4-a716-446655440100',
  snomedct_id: '38341003',
  umls_cui: 'C0020538',
  preferred_term: 'Essential hypertension',
  clinical_abbreviation: 'HTN',
  is_active: true,
  snomed_version: '2025-01',
  last_updated: datetime()
})
// Create Medication Node
CREATE (m:Medication {
  node_id: '550e8400-e29b-41d4-a716-446655440200',
  rxcui: '29046',
  umls_cui: 'C0065374',
  ingredient_name: 'Lisinopril',
  is_combination: false,
  atc_code: 'C09AA03',
  drug_class: 'ACE Inhibitor',
  mechanism_of_action: 'Angiotensin-converting enzyme inhibition',
  rxnorm_version: '2025-01',
  is_active: true
})
// Create Relationship
CREATE (d)-[r:MANAGES_CHRONIC {
  edge_id: '550e8400-e29b-41d4-a716-446655440000',
  modifiers: ['FIRST_LINE'],
  fda_approved: true,
  evidence_source: 'AHA/ACC 2017',
  estimated_usage_rank: 1,
  review_status: 'approved',
  mapping_confidence: 0.98,
  version: 1,
  is_current: true,
  created_date: datetime()
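}]->(m);
// Create Indexes (schema statements; run each separately from the data statement above)
// Uses the FOR ... ON syntax; the older CREATE INDEX ON :Label(prop) form was removed in Neo4j 5.
CREATE INDEX disease_snomedct_id IF NOT EXISTS FOR (d:Disease) ON (d.snomedct_id);
CREATE INDEX disease_umls_cui IF NOT EXISTS FOR (d:Disease) ON (d.umls_cui);
CREATE INDEX disease_clinical_abbreviation IF NOT EXISTS FOR (d:Disease) ON (d.clinical_abbreviation);
CREATE INDEX medication_rxcui IF NOT EXISTS FOR (m:Medication) ON (m.rxcui);
CREATE INDEX medication_umls_cui IF NOT EXISTS FOR (m:Medication) ON (m.umls_cui);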
We employ a multi-model LLM approach with human oversight to ensure accuracy and clinical validity. The Synapse MKR methodology is designed to scale efficiently while maintaining high quality through automated validation and targeted expert review.
Development of a comprehensive knowledge graph schema optimized for disease-medication relationships. Defines node types (Disease, Medication), relationship types (TREATS_CURATIVE, PREVENTS, MANAGES_CHRONIC, TREATS_SYMPTOMATIC, CONTRAINDICATED_IN, REQUIRES_DOSE_ADJUSTMENT), and metadata fields aligned with standardized medical terminologies.
Three-model validation pipeline for initial relationship generation:
• Model 1 (Claude Sonnet 4.5 Thinking): Primary relationship generation with focus on clinical accuracy and evidence-based medicine. Generates candidate disease-medication pairs with relationship types, modifiers, and clinical rationale.
• Model 2 (Gemini 3 Pro): Independent cross-validation and correction. Validates relationship types, identifies missing critical medications, flags potential contraindications, and corrects errors from Model 1.
• Model 3 (GPT 5.2 Thinking): Final arbitration and enrichment. Reconciles disagreements between Models 1 and 2, adds missing metadata (ATC codes, drug classes, mechanisms of action), and assigns confidence scores based on consensus.
Automated Consistency Checks: Validation rules ensure no duplicate relationships, verify relationship hierarchy logic, check contraindication structure, and identify orphaned nodes.
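Two of these rules, duplicate detection and orphaned-node detection, are simple enough to sketch directly. The edge dictionary layout and the function names are assumptions, and a duplicate is taken here to mean the same disease, medication, and relationship type appearing more than once.

```python
from collections import Counter


def duplicate_edges(edges):
    """Return (disease, medication, relationship_type) keys that occur more than once."""
    counts = Counter(
        (e["disease"], e["medication"], e["relationship_type"]) for e in edges
    )
    return sorted(key for key, n in counts.items() if n > 1)


def orphaned_nodes(node_ids, edges):
    """Return node identifiers that no edge references."""
    connected = {e["disease"] for e in edges} | {e["medication"] for e in edges}
    return sorted(set(node_ids) - connected)


sample_edges = [
    {"disease": "HTN", "medication": "Lisinopril", "relationship_type": "MANAGES_CHRONIC"},
    {"disease": "HTN", "medication": "Lisinopril", "relationship_type": "MANAGES_CHRONIC"},
    {"disease": "T2DM", "medication": "Metformin", "relationship_type": "MANAGES_CHRONIC"},
]
dupes = duplicate_edges(sample_edges)
orphans = orphaned_nodes(["HTN", "T2DM", "HLD", "Lisinopril", "Metformin"], sample_edges)
```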
Quality Assurance Criteria: All relationships must trace to FDA labeling or major clinical guidelines (AHA/ACC, IDSA, ADA, GINA, GOLD, etc.). Relationships require agreement from ≥2 of 3 LLMs or explicit human override. Safety-critical contraindications flagged for mandatory expert review.
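The agreement rule could be expressed as a small voting function, sketched here under assumptions: each model contributes one relationship type (or `None` if it proposed nothing), the function name `consensus` and the status labels are hypothetical, and a human override always wins.

```python
from collections import Counter


def consensus(votes, human_override=None, threshold=2):
    """Decide a relationship type from per-model votes.

    votes maps a model label to its proposed relationship type, or None.
    Returns (relationship_type or None, status string).
    """
    if human_override is not None:
        return human_override, "human_override"
    counts = Counter(v for v in votes.values() if v is not None)
    if counts:
        rel_type, n = counts.most_common(1)[0]
        if n >= threshold:
            return rel_type, f"{n}_of_{len(votes)}"
    return None, "needs_review"


decision = consensus({
    "model_1": "MANAGES_CHRONIC",
    "model_2": "MANAGES_CHRONIC",
    "model_3": "TREATS_SYMPTOMATIC",
})
```

Anything that falls through to `needs_review` would go to the expert review step rather than being dropped silently.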
LLM-based validation of medical codes (SNOMED-CT, RxNorm, UMLS CUIs, ATC codes) grounded in authoritative vocabularies. Automated verification ensures code-concept alignment and corrects common mapping errors.
Multi-LLM validation and discovery of relationships grounded in vetted medical sources and vocabularies. Evidence sources include:
• MED-RT
• DailyMed (FDA-approved drug labeling)
• Drugs@FDA (FDA approval database)
• PubChem (NIH chemical database)
• PubMed Central open-access subset
• ClinicalTrials.gov
• Major society clinical guidelines (AHA/ACC, IDSA, ADA, etc.)
• UMLS knowledge sources and vocabularies
Clinical specialists from respective domains review and validate relationships within their expertise. Experts adjudicate flagged conflicts, validate safety-critical contraindications, and approve final relationship set for their specialty area.
Expansion of validated methodology to increase coverage of current relationship type (disease-medication) and extension to additional clinical relationship types (drug-drug interactions, disease-disease comorbidities, medication-lab test relationships, etc.).
Real-world validation of knowledge representation in GraphRAG systems. Clinical utility studies measuring impact on diagnostic accuracy, treatment appropriateness, and clinical decision support performance.
Development Status:
Known Gaps:
⚠️ NOT FOR CLINICAL USE
This dataset is provided for research, education, and development purposes only. It is explicitly NOT intended for clinical decision-making, patient care, or any medical treatment decisions.
Requirements for Any Use:
The dataset is provided "AS IS" without warranty of any kind, express or implied, including but not limited to warranties of accuracy, completeness, merchantability, or fitness for a particular purpose.
The creators, contributors, and distributors of this dataset shall not be liable for any direct, indirect, incidental, special, consequential, or exemplary damages arising from the use of this data, including but not limited to medical errors, patient harm, or regulatory non-compliance.
By accessing or using this dataset, you acknowledge that:
Users must ensure compliance with:
For questions about appropriate use, validation requirements, or regulatory compliance, please consult qualified legal counsel and clinical experts in your jurisdiction.
The initial dataset from Phase 2 (multi-model LLM generation) is now available for research and development use.
File Available:
synapse_mkr_phase2_20260203.zip - Complete dataset package containing:
Coming Soon:
By downloading this dataset, you acknowledge and agree to the following:
The ZIP file contains all necessary documentation and license files
License: CC BY 4.0 (Creative Commons Attribution)
What this means:
Last Updated: 2026-02-03