Engineering · 3 min read

Building Patient Support: How We Created 2,792 Patients From a Knowledge Graph

How PatientSupport.AI uses PrimeKG and a cohort factory to power patient support with medically-grounded synthetic personas.

PatientSupport Engineering

Engineering Team


The Pipeline

Generating a synthetic patient that feels real requires more than a language model and a prompt. It requires structured medical data, demographic grounding, and a system that ensures no two patients tell the same story.

Here's how we do it.

Step 1: Disease Normalization

When a user types "diabetes" into the onboarding flow, we don't just search for the string "diabetes." We normalize it against PrimeKG's ontology of 17,080 diseases, map it to the correct MONDO identifier, and pull that disease's full comorbidity subgraph.

This means "Type 2 diabetes" resolves to its known associations: obesity, hypertension, hyperlipidemia, depression, peripheral neuropathy, diabetic retinopathy — each with prevalence weights from the knowledge graph.
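As a rough illustration of the normalization step, the sketch below maps free-text input to a disease identifier and returns its comorbidity neighbors sorted by weight. Everything here is assumed for the example: the `SYNONYMS` and `COMORBIDITY_EDGES` tables, the function names, the placeholder identifier (not a real MONDO ID), and the weights (not values from PrimeKG). A production pipeline would query the knowledge graph itself.

```python
from typing import List, Optional, Tuple

# Hypothetical synonym table. The identifier is a placeholder, not a real MONDO ID.
SYNONYMS = {
    "type 2 diabetes": "MONDO:PLACEHOLDER_T2D",
    "type ii diabetes": "MONDO:PLACEHOLDER_T2D",
    "t2dm": "MONDO:PLACEHOLDER_T2D",
}

# Hypothetical disease-disease edges as (comorbidity, weight) pairs.
# The weights are made up for illustration.
COMORBIDITY_EDGES = {
    "MONDO:PLACEHOLDER_T2D": [
        ("obesity", 0.50),
        ("hypertension", 0.60),
        ("hyperlipidemia", 0.45),
        ("depression", 0.20),
        ("peripheral neuropathy", 0.25),
        ("diabetic retinopathy", 0.15),
    ],
}


def normalize(term: str) -> Optional[str]:
    """Lowercase, collapse whitespace, then look the term up in the synonym table."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key)


def comorbidity_subgraph(disease_id: str, min_weight: float = 0.0) -> List[Tuple[str, float]]:
    """Return comorbidities at or above min_weight, heaviest first."""
    edges = COMORBIDITY_EDGES.get(disease_id, [])
    kept = [edge for edge in edges if edge[1] >= min_weight]
    return sorted(kept, key=lambda edge: edge[1], reverse=True)


disease_id = normalize("  Type 2   Diabetes ")
if disease_id is not None:
    for name, weight in comorbidity_subgraph(disease_id, min_weight=0.3):
        print(name, weight)
```

Returning `None` for unknown terms keeps the "no match" case explicit, so the caller can decide whether to ask the user to rephrase.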

Step 2: Demographic Sampling

Once we have the disease and its comorbidity network, we sample demographics using CDC NHANES 2021-2023 prevalence tables. These tables are weighted along three dimensions:

  • Age band prevalence: how common is this condition in each decade of life?
  • Sex-specific modifiers: does prevalence differ by sex?
  • Race-specific modifiers: how does prevalence vary across ethnic groups?

This ensures our generated patients reflect real-world demographic patterns, not random distributions.
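One way to picture the three-way weighting is as a joint distribution over (age band, sex, race) cells. The sketch below assumes the three factors combine multiplicatively, which is a simplification chosen for the example rather than a description of the production sampler. All numbers are placeholders, not NHANES values, and the names `joint_weights` and `sample_demographics` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Placeholder numbers for illustration only. These are NOT NHANES values.
AGE_BAND_PREVALENCE = {"30-39": 0.04, "40-49": 0.09, "50-59": 0.16, "60-69": 0.22}
SEX_MODIFIER = {"female": 0.9, "male": 1.1}
RACE_MODIFIER = {"group_a": 1.0, "group_b": 1.3, "group_c": 0.8}

Cell = Tuple[str, str, str]


def joint_weights() -> Dict[Cell, float]:
    """Multiply the three factors per cell, then normalize so the weights sum to 1."""
    weights = {}
    for age, prevalence in AGE_BAND_PREVALENCE.items():
        for sex, sex_mod in SEX_MODIFIER.items():
            for race, race_mod in RACE_MODIFIER.items():
                weights[(age, sex, race)] = prevalence * sex_mod * race_mod
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def sample_demographics(n: int, seed: int) -> List[Cell]:
    """Draw n demographic cells; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    probs = joint_weights()
    cells = list(probs)
    return rng.choices(cells, weights=[probs[c] for c in cells], k=n)


for cell in sample_demographics(3, seed=42):
    print(cell)
```

Cells with higher combined weight are drawn more often, so a large sample approaches the target mix instead of a uniform one.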

Step 3: The Five-Role Cohort Factory

Every support group contains exactly five personas, each assigned a structural role:

The Mirror (closest demographic match) shares the user's age range, race, and geography. Their job is validation — "I'm going through the same thing."

The Veteran (longest disease duration) has lived with the condition for years. They offer hard-won practical knowledge.

The Navigator (research-oriented) carries an extra comorbidity from the PrimeKG subgraph. They've done the homework.

The Ally (emotional support) leads with empathy. They ask questions more than they give advice.

The Specialist (condition expert) references a specific medication from the drug-disease edges in PrimeKG. They know the details.
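The role assignment can be sketched as a small selection routine over a pool of candidate personas. This is a minimal sketch under stated assumptions: the `Persona` fields, the `demographic_distance` scoring, and the rule of filling the last three roles in pool order are all invented for the example. Only the five role names and their defining traits come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

ROLES = ("Mirror", "Veteran", "Navigator", "Ally", "Specialist")


@dataclass(frozen=True)
class Persona:
    name: str
    age: int
    race: str
    region: str
    years_since_diagnosis: int
    comorbidities: Tuple[str, ...] = ()
    medication: Optional[str] = None
    role: Optional[str] = None


def demographic_distance(p: Persona, age: int, race: str, region: str) -> float:
    """Hypothetical score: age gap in decades plus 1 per race or region mismatch."""
    return abs(p.age - age) / 10 + (p.race != race) + (p.region != region)


def build_cohort(pool: List[Persona], user_age: int, user_race: str, user_region: str,
                 extra_comorbidity: str, medication: str) -> List[Persona]:
    """Assign the five roles; assumes the personas in the pool are distinct."""
    if len(pool) < len(ROLES):
        raise ValueError("need at least five candidate personas")
    remaining = list(pool)
    # Mirror: closest demographic match to the user.
    mirror = min(remaining, key=lambda p: demographic_distance(p, user_age, user_race, user_region))
    remaining.remove(mirror)
    # Veteran: longest disease duration among the rest.
    veteran = max(remaining, key=lambda p: p.years_since_diagnosis)
    remaining.remove(veteran)
    navigator, ally, specialist = remaining[:3]
    return [
        replace(mirror, role="Mirror"),
        replace(veteran, role="Veteran"),
        # Navigator carries one extra comorbidity from the subgraph.
        replace(navigator, role="Navigator",
                comorbidities=navigator.comorbidities + (extra_comorbidity,)),
        replace(ally, role="Ally"),
        # Specialist references a medication from the drug-disease edges.
        replace(specialist, role="Specialist", medication=medication),
    ]
```

Using frozen dataclasses with `replace` means the pool is never mutated, so the same candidates can be reused to build cohorts for other users.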

Step 4: Backstory Generation

Each persona gets a backstory that integrates their medical data with life context:

  • Occupation matched to age, gender, and geography
  • Family situation appropriate to their demographics
  • Daily routine shaped by their conditions
  • Communication style that reflects their personality role
  • Barriers to care specific to their insurance and location
  • A hidden layer — something they won't share until trust is built

These backstories are deterministic (the same seed always produces the same output) and have been QA-audited against 17 consistency rules, with zero violations across all 2,792 profiles.
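The "same seed = same output" property can be illustrated with a seed derived from stable persona attributes. The sketch below is an assumption about how such determinism could be achieved, not the actual generator: the seed recipe, the `OCCUPATIONS` and `STYLES` lists, and both function names are hypothetical. It uses `hashlib` because Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib
import random
from typing import Dict

# Hypothetical option lists for illustration.
OCCUPATIONS = ["teacher", "electrician", "nurse", "bus driver", "accountant"]
STYLES = ["direct", "warm", "analytical", "humorous"]


def persona_seed(condition_id: str, role: str, index: int) -> int:
    """Derive a stable 64-bit seed from the persona's identifying attributes."""
    key = f"{condition_id}|{role}|{index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big")


def generate_backstory(condition_id: str, role: str, index: int) -> Dict[str, str]:
    """Draw backstory fields from a private RNG so global random state is untouched."""
    rng = random.Random(persona_seed(condition_id, role, index))
    return {
        "occupation": rng.choice(OCCUPATIONS),
        "communication_style": rng.choice(STYLES),
    }


first = generate_backstory("example-condition", "Mirror", 0)
second = generate_backstory("example-condition", "Mirror", 0)
print(first == second)
```

Because each persona owns its own `random.Random` instance, regenerating one profile does not shift the random draws of any other.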

Step 5: Verification

Before any persona is served to a user, it passes through a verification pipeline that checks:

  • Distributional accuracy — does the cohort's demographic mix match CDC prevalence?
  • Clinical concordance — are the comorbidities and medications consistent with the primary condition?
  • Privacy — does the profile inadvertently match a real person?
  • Equity — are underserved populations proportionally represented?
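The distributional accuracy check, for instance, could compare a cohort's observed demographic mix against target proportions. The sketch below uses total variation distance with a tolerance threshold. Both the choice of metric and the default `tolerance` of 0.05 are assumptions made for this example, and the target proportions would come from the prevalence tables rather than the toy values shown.

```python
from collections import Counter
from typing import Dict, Sequence


def total_variation_distance(observed: Sequence[str], target: Dict[str, float]) -> float:
    """Half the sum of absolute differences between observed and target shares."""
    if not observed:
        raise ValueError("observed labels must not be empty")
    counts = Counter(observed)
    n = len(observed)
    categories = set(counts) | set(target)
    return 0.5 * sum(abs(counts.get(c, 0) / n - target.get(c, 0.0)) for c in categories)


def passes_distribution_check(observed: Sequence[str], target: Dict[str, float],
                              tolerance: float = 0.05) -> bool:
    """True when the observed mix is within tolerance of the target mix."""
    return total_variation_distance(observed, target) <= tolerance


# Toy target proportions for illustration only.
target_mix = {"group_a": 0.6, "group_b": 0.4}
balanced = ["group_a"] * 6 + ["group_b"] * 4
skewed = ["group_a"] * 10
print(passes_distribution_check(balanced, target_mix))
print(passes_distribution_check(skewed, target_mix))
```

A distance of 0 means the cohort matches the target exactly, and a distance of 1 means the two distributions share no categories, so the tolerance has a direct interpretation as the share of personas that would need to change group.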

The Numbers

  • 17,080 diseases in PrimeKG
  • 177 conditions with pre-seeded personas
  • 2,792 total pre-seeded patients
  • 6 ethnic groups represented proportionally
  • 5 structural roles per support group
  • 0 real patient records used

Tags: PrimeKG, knowledge graph, patient support, synthetic patients, cohort factory, CDC NHANES
