After most medical appointments, patients leave with instructions they partially understood, questions they forgot to ask, and a diagnosis they will Google before they reach the parking lot. This is not a failure of individual doctors — it is a structural problem. The average primary care visit lasts 18 minutes. The average patient has 2.5 conditions. The math does not work.
So patients turn to the internet, where the information landscape ranges from peer-reviewed research to wellness influencers selling supplements. Google's own research has shown that health-related searches are among the most common queries worldwide, and that the quality of results varies dramatically. For a patient trying to understand what "stage 2 chronic kidney disease" means for their daily life, the gap between a Mayo Clinic overview and a fear-mongering blog post is the difference between informed self-management and midnight panic.
AI is entering this space with the promise of better answers. Some of that promise is real. Much of it is not. This post sorts through what AI health tools can actually do, where they fail, and why the architecture behind the AI matters more than the chatbot interface in front of it.
The Health Information Gap Is Real
The problem AI is attempting to solve is well-documented:
- Health literacy is low. According to the National Assessment of Adult Literacy, only 12% of American adults have proficient health literacy. The majority cannot reliably interpret a prescription label, understand a clinical trial consent form, or evaluate the quality of a health information source.
- Appointments are too short. The Journal of General Internal Medicine has published multiple studies showing that patients retain less than half of what their physician tells them during a visit. Complex diagnoses and treatment plans fare even worse.
- Search results and AI answers are not vetted. A 2023 study in JAMA Network Open found that the quality of AI-generated health information varied substantially by topic and model, with some queries producing accurate, helpful responses and others generating plausible-sounding misinformation.
- Misinformation spreads faster than corrections. The WHO has called health misinformation an "infodemic." Social media platforms amplify unverified health claims at a scale and speed that fact-checking cannot match.
What AI Health Tools Can Do Today
AI health tools fall into several categories, each with different capabilities and limitations:
Symptom Checkers
Tools like Ada Health, Buoy Health, and Babylon (now eMed) use structured decision trees or AI models to help patients assess symptoms before seeking care. A 2023 systematic review in BMJ Open found that the best symptom checkers had triage accuracy comparable to phone-based nurse triage lines — useful as a first filter, not as a diagnostic tool.
Medical Chatbots
General-purpose AI chatbots (ChatGPT, Claude, Gemini) are increasingly used for health questions. A 2024 study in Nature Medicine found that GPT-4 passed the United States Medical Licensing Examination (USMLE) and could answer many patient questions with reasonable accuracy. However, the same study noted significant variability — the model could generate confident, detailed responses that were factually wrong, particularly for rare conditions, drug interactions, and recent treatment guidelines.
Knowledge-Grounded Health AI
A newer category of tools grounds language model responses in structured medical data rather than relying solely on the model's training data. This approach — sometimes called retrieval-augmented generation (RAG) or knowledge-grounded AI — uses curated databases as a factual backbone, reducing (but not eliminating) hallucination.
PatientSupport.AI falls into this category, using the PrimeKG knowledge graph (Harvard Dataverse, Nature Scientific Data) as its medical data foundation. The difference between this approach and a generic chatbot is the difference between a system that looks up facts in a structured database and a system that predicts what the facts are likely to be from patterns in text.
The Hallucination Problem
This is the section that matters most, and the one that most AI health companies prefer to skip.
Large language models hallucinate. They generate text that is fluent, grammatical, and wrong. In healthcare, hallucinated information can range from inconsequential (slightly incorrect prevalence statistics) to dangerous (fabricated drug interactions, invented contraindications, nonexistent clinical trials).
The evidence is sobering:
- A 2023 study in JAMA Internal Medicine compared AI chatbot responses to physician responses on a patient question forum. While the AI responses were rated as more empathetic and more complete on average, reviewers also identified factual errors in a meaningful proportion of AI-generated health responses.
- A 2024 preprint analyzing medical AI hallucinations found that models confidently cited nonexistent studies, fabricated dosage recommendations, and generated plausible clinical guidelines that did not exist. The hallucination rate varied by model and domain, but no model achieved zero hallucination in medical contexts.
- The National Academy of Medicine has published a framework for responsible AI in healthcare that explicitly identifies hallucination as a primary safety concern, recommending that all AI health tools include clear disclosure of their limitations.
Why Architecture Matters More Than Interface
Most AI health tools look similar from the outside — a chat interface where you type a question and get an answer. The critical differences are invisible to the user:
Generic Language Model (Ungrounded)
The model generates responses entirely from its training data — a statistical average of billions of web pages, textbooks, and forum posts. It cannot distinguish between a Cochrane review and a Reddit comment. When it does not know something, it generates a plausible-sounding response anyway.
Knowledge-Grounded System
The model's responses are anchored in a structured, curated dataset. When you ask about a disease, the system first retrieves relevant facts from the knowledge graph (disease relationships, gene associations, drug targets) and then generates a response informed by those facts. The facts are citable and traceable.
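To make the contrast concrete, here is a minimal sketch in Python. The stub model call, the tiny dictionary standing in for a knowledge graph, and the prompt wording are illustrative placeholders, not PatientSupport.AI's actual code; the point is only where the retrieval step sits relative to generation.

```python
# Illustrative stand-ins: a stub model call and a tiny dict in place of a
# real knowledge graph. The only structural difference between the two
# answer functions is whether retrieved facts are injected before generation.

KNOWLEDGE_GRAPH = {
    "chronic kidney disease": {
        "associated_genes": ["UMOD", "APOL1"],
        "common_comorbidities": ["hypertension", "type 2 diabetes"],
    },
}

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    return f"[model response to: {prompt[:60]}...]"

def ungrounded_answer(question: str) -> str:
    # Generic chatbot: the model answers from its training data alone.
    return call_llm(question)

def grounded_answer(question: str, condition: str) -> str:
    # Knowledge-grounded system: retrieve structured facts first,
    # then constrain the model to answer from those facts.
    facts = KNOWLEDGE_GRAPH.get(condition.lower(), {})
    prompt = (
        f"Answer using only these facts: {facts}\n"
        "If the facts are insufficient, say so.\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

print(grounded_answer("What often occurs alongside CKD?", "chronic kidney disease"))
```

Either way the user just sees a chat box; the grounding step happens entirely behind the interface.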
How PatientSupport.AI Approaches This
PatientSupport.AI uses PrimeKG — a knowledge graph with 17,080 diseases, 29,786 genes, and 4,050 drugs mapped across 4 million relationships (Chandak et al., Nature Scientific Data, 2023). When a user asks about a condition:
1. The system normalizes the condition name to its PrimeKG ontology identifier
2. It retrieves the condition's comorbidity network, associated genes, and drug relationships from the graph
3. The language model (Groq Llama 70B) generates a response grounded in that retrieved data
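A hypothetical sketch of those three steps follows. The ontology identifier, the edge rows, and the helper names are made up for illustration rather than taken from PrimeKG or PatientSupport.AI, and the final model call is stubbed out instead of being wired to a hosted Llama endpoint.

```python
# Hypothetical pipeline sketch. The ontology ID, edge rows, and helper
# names below are illustrative placeholders, not real PrimeKG records.

# Step 1: normalize a free-text condition name to an ontology identifier.
NAME_TO_ID = {"chronic kidney disease": "MONDO:0000001"}  # placeholder ID

# Step 2: toy stand-in for knowledge-graph edges (subject, relation, object).
EDGES = [
    ("MONDO:0000001", "comorbid_with", "hypertension"),
    ("MONDO:0000001", "associated_gene", "UMOD"),
    ("MONDO:0000001", "indicated_drug", "lisinopril"),
]

def retrieve_facts(condition: str) -> list[tuple[str, str, str]]:
    node_id = NAME_TO_ID.get(condition.lower())
    return [edge for edge in EDGES if edge[0] == node_id]

def grounded_prompt(question: str, condition: str) -> str:
    facts = retrieve_facts(condition)
    return (
        "Using only the facts below, answer in plain language and note "
        "anything the facts do not cover.\n"
        f"Facts: {facts}\n"
        f"Question: {question}"
    )

# Step 3: in production this prompt would be sent to the hosted language
# model; here we just print it to show what the model actually sees.
print(grounded_prompt("Which genes are linked to my condition?",
                      "chronic kidney disease"))
```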
This does not eliminate hallucination. The language model can still generate errors in its natural language output. But the factual foundation is a curated, peer-reviewed knowledge graph rather than an unstructured text corpus. The difference is meaningful, even if it is not sufficient to replace clinical judgment.
What AI Cannot Do
Clarity about boundaries is more valuable than optimism about capabilities:
- AI cannot diagnose. Not reliably, not safely, not legally. A knowledge graph can tell you which conditions share symptoms. It cannot tell you which one you have.
- AI cannot prescribe. Drug selection requires consideration of individual patient factors — kidney function, allergies, other medications, preferences, cost — that no general AI tool captures.
- AI cannot provide emotional presence. A chatbot can generate empathetic text. It cannot hold space for grief, validate your anger, or sit with you in silence. Human support groups do this. AI does not.
- AI cannot replace clinical judgment. Medicine is not a lookup problem. It involves weighing incomplete information, managing uncertainty, and making decisions that account for individual values and circumstances. AI can inform these decisions. It cannot make them.
- AI cannot guarantee accuracy. Even knowledge-grounded systems can be wrong — the knowledge graph may be outdated, the retrieval may be imprecise, the language generation may introduce errors. Users must treat AI-generated health information as a starting point for conversation with their care team, not as a final answer.
The Right Framework: AI as Health Literacy Tool
The most productive way to think about AI health tools is as health literacy instruments. They work best when they:
- Help patients understand terminology and concepts before or after medical appointments
- Map relationships between conditions, medications, and symptoms to support informed questions
- Reduce the research burden for patients and caregivers navigating complex health landscapes
- Explicitly state their limitations and recommend professional consultation
The Bigger Picture
AI will not fix the healthcare system. The problems — too few primary care physicians, fragmented care, misaligned incentives, profound health inequity — are structural, not informational. But better health information access is one piece of a larger puzzle, and it is a piece that technology can meaningfully improve if the tools are built with honesty about what they can and cannot do.
The patients who benefit most from AI health tools are not the ones who use AI instead of doctors. They are the ones who use AI to prepare for doctor visits, understand what they were told, and navigate the space between appointments with less confusion and more agency.
That is a modest goal. It is also a worthwhile one.
Disclaimer: This article is for informational purposes only. It is not medical advice. AI health tools — including PatientSupport.AI — are not diagnostic tools, do not provide medical recommendations, and are not a substitute for qualified healthcare professionals. AI-generated information may contain errors despite knowledge graph grounding. Always consult your medical team for health decisions. If you are in a medical emergency, call 911 or your local emergency number.
Citation: Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci Data 10, 67 (2023). https://doi.org/10.1038/s41597-023-01960-3