The Problem With Real Patient Data
Health research has a representation problem. Clinical trials skew white, male, and urban. Electronic health records reflect the biases of the systems that created them. And when we try to build tools for underserved communities, we often lack the data to do it well.
This isn't just an academic concern. The gap translates directly into worse outcomes for Black, Indigenous, Hispanic, and rural populations, groups that are simultaneously most affected by chronic disease and least represented in the data we use to address it.
What Synthetic Patients Actually Are
A synthetic patient is an AI-generated health profile built from real medical data sources — not from any individual person's records. Each profile is constructed by combining:
- Disease relationships from PrimeKG (Harvard Dataverse), which maps 17,080 diseases to their comorbidities, drugs, and genetic factors
- Demographic prevalence data from CDC NHANES 2021-2023, triple-weighted for accuracy
- Clinical knowledge from leading hospital systems covering symptoms, causes, and risk factors
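To make the combination step concrete, here is a minimal sketch of how a profile could carry its three inputs while remembering where each value came from. The class names, field names, and the `build_profile` helper are hypothetical illustrations, not the actual PatientSupport.AI schema, and the sample values are placeholders rather than records pulled from PrimeKG or NHANES.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourcedFact:
    """A single profile attribute together with the source it came from."""
    value: str
    source: str


@dataclass
class SyntheticPatient:
    """Hypothetical profile container; field names are illustrative only."""
    condition: str
    comorbidities: list[SourcedFact] = field(default_factory=list)
    demographics: list[SourcedFact] = field(default_factory=list)
    clinical_notes: list[SourcedFact] = field(default_factory=list)

    def sources(self) -> set[str]:
        """Return every data source this profile draws on."""
        facts = self.comorbidities + self.demographics + self.clinical_notes
        return {fact.source for fact in facts}


def build_profile(condition, comorbidities, demographics, clinical_notes):
    """Combine the three inputs, tagging each value with its source."""
    return SyntheticPatient(
        condition=condition,
        comorbidities=[SourcedFact(c, "PrimeKG") for c in comorbidities],
        demographics=[SourcedFact(d, "CDC NHANES 2021-2023") for d in demographics],
        clinical_notes=[SourcedFact(n, "clinical knowledge base") for n in clinical_notes],
    )


# Placeholder inputs for illustration; not drawn from the real datasets.
patient = build_profile(
    "Type 2 diabetes",
    comorbidities=["hypertension", "depression"],
    demographics=["age 58", "rural Mississippi"],
    clinical_notes=["increased thirst"],
)
print(sorted(patient.sources()))
```

Keeping the source next to each value, rather than in a separate log, means any attribute shown to a user can be traced without a second lookup.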
Why This Matters for Equity
Traditional health data has blind spots. Synthetic data lets us fill them deliberately.
When we generate a support group for someone with Type 2 diabetes, we don't just create five generic patients. We build a diverse cohort that reflects the actual demographic landscape of that condition:
- A 58-year-old Black woman in rural Mississippi managing diabetes alongside hypertension and depression
- A 34-year-old Hispanic man in Los Angeles navigating the condition without insurance
- A 72-year-old Indigenous elder dealing with diabetes complications on a reservation two hours from the nearest specialist
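One way such a cohort could be drawn is weighted sampling without replacement: groups with higher prevalence are more likely to appear, but no group is picked twice, so the cohort stays varied. The sketch below assumes this approach; the `sample_cohort` function, the group labels, and the weights are all hypothetical placeholders, not NHANES figures or the production selection logic.

```python
import random


def sample_cohort(prevalence, size, seed=None):
    """Draw `size` distinct groups, favoring groups with a higher weight."""
    if size > len(prevalence):
        raise ValueError("cohort size exceeds the number of groups")
    rng = random.Random(seed)
    remaining = dict(prevalence)  # copy so the caller's mapping is untouched
    cohort = []
    for _ in range(size):
        groups = list(remaining)
        weights = [remaining[g] for g in groups]
        pick = rng.choices(groups, weights=weights, k=1)[0]
        cohort.append(pick)
        del remaining[pick]  # remove the pick so it cannot be drawn again
    return cohort


# Placeholder weights for illustration only; these are NOT NHANES figures.
example_weights = {
    "group_a": 3.0,
    "group_b": 2.0,
    "group_c": 2.0,
    "group_d": 1.0,
}
cohort = sample_cohort(example_weights, size=3, seed=0)
print(cohort)
```

Passing a `seed` makes a given cohort reproducible, which helps when reviewing why a particular mix of profiles was generated.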
The Five-Role System
Every support group in PatientSupport.AI includes five distinct roles, each serving a different psychological function:
- The Mirror — Someone demographically similar to the user, validating their experience
- The Veteran — Someone who's lived with the condition longer, offering practical wisdom
- The Navigator — Someone who's researched everything, sharing what they've learned
- The Ally — Someone who leads with emotional attunement, making the user feel heard
- The Specialist — Someone with deep condition-specific knowledge
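The five roles map naturally onto an enumeration. The sketch below also shows one simple way the Mirror could be chosen: count how many demographic attributes a candidate shares with the user and take the best match. The `pick_mirror` function and the attribute keys (`age_band`, `setting`, `sex`) are assumptions made for illustration; the actual matching logic is not described here.

```python
from enum import Enum


class Role(Enum):
    """The five support-group roles and the function each one serves."""
    MIRROR = "demographically similar to the user, validating their experience"
    VETERAN = "has lived with the condition longer, offering practical wisdom"
    NAVIGATOR = "has researched everything, sharing what they've learned"
    ALLY = "leads with emotional attunement, making the user feel heard"
    SPECIALIST = "has deep condition-specific knowledge"


def pick_mirror(user, candidates):
    """Return the candidate sharing the most demographic attributes with the user."""
    def overlap(candidate):
        return sum(1 for key, value in user.items() if candidate.get(key) == value)
    return max(candidates, key=overlap)


# Hypothetical attribute keys and values, for illustration only.
user = {"age_band": "50-59", "setting": "rural", "sex": "female"}
candidates = [
    {"name": "A", "age_band": "30-39", "setting": "urban", "sex": "male"},
    {"name": "B", "age_band": "50-59", "setting": "rural", "sex": "female"},
    {"name": "C", "age_band": "70-79", "setting": "rural", "sex": "male"},
]
mirror = pick_mirror(user, candidates)
print(mirror["name"], "->", Role.MIRROR.name)
```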
What We Don't Do
Synthetic patients are not:
- Medical advice. They share experiences, not prescriptions.
- Real people. We're transparent about this — always.
- Replacements for therapy. They're companions, not clinicians.
- Black boxes. Every profile links back to its data source.
Looking Forward
We currently offer 2,792 pre-seeded synthetic patients across 177 conditions. Each one has a rich background (an occupation, a family, daily routines, coping mechanisms, barriers to care) that makes conversations with them feel real without using any real person's data.
The question isn't whether synthetic patients are as good as real ones. It's whether they can reach people that the current system doesn't. We believe they can.