Healthily White Paper

Generistic AI

The Architecture Healthcare Has Been Waiting For

March 2026

Healthily Limited

www.healthily.ai


How the convergence of Generative and Deterministic AI will define the next era of safe, scalable clinical intelligence



Executive Summary

Healthcare stands at an inflection point. Generative AI has captivated the industry with its conversational fluency, yet it remains unable to make safe clinical decisions. Deterministic AI delivers the auditable, explainable reasoning and clinical adaptability that regulators and clinicians require, yet it has historically struggled to meet the accessibility and engagement expectations of modern users.


Neither paradigm alone is sufficient. What healthcare needs is a governed convergence of the two: an architecture in which generative AI handles the interaction layer, and deterministic AI handles the decision layer. Each does what it does best. Neither is asked to do what it cannot.


Healthily calls this convergence Generistic AI (Generative + Deterministic), and we believe it represents the only defensible architecture for safe, scalable and regulated clinical intelligence.

This white paper proposes the adoption and definition of the term “Generistic AI”, explains why it is structurally necessary, demonstrates its application through the example of the Healthily proprietary Bayesian Model and Dual-Track platform, and sets out its implications for health systems, insurers, AI platforms and regulators worldwide.


The central argument is clear: if a healthcare AI architecture is not Generistic, it is either unsafe or inadequate.


1. The Promise and the Problem: GenAI in Healthcare

Generative AI has arrived in healthcare with extraordinary momentum. Large Language Models (LLMs) can summarise clinical notes, generate patient communications, support literature review and engage users in natural conversation. The potential is real and the interest is understandable.

But potential and safety are not the same thing.


In October 2025, Microsoft Research published a landmark paper, The Illusion of Readiness, which exposed the critical shortcomings of LLMs when deployed in clinical settings. The findings were stark. Models that excelled on medical licensing examinations failed under real-world conditions. They exhibited shortcut learning, giving correct answers for incorrect reasons. They proved fragile under minor variations in phrasing. They fabricated reasoning, producing confident explanations that bore no relationship to sound clinical logic.


These are not edge cases. They are structural properties of how LLMs work.


LLMs are probabilistic text generators. They predict the next most likely word in a sequence. They do not reason from first principles. They do not systematically reduce clinical uncertainty. They do not enforce safety rules or escalation logic. They cannot explain why they reached a particular conclusion in a way that is explainable, traceable and reproducible because they do not “reach conclusions” in any meaningful clinical sense. They generate plausible-sounding text.


The regulatory community has taken note. A 2025 study published in npj Digital Medicine demonstrated that popular LLMs readily produce medical-device-like decision support output despite not being authorised as clinical decision support devices by the FDA or any other regulatory body. The authors concluded that regulatory intervention is urgently needed. The Lancet Digital Health has similarly warned that LLM-based health applications serving a medical purpose qualify as medical devices under both EU and US regulations and should not be on the market without approval, yet many are.


The EU AI Act now classifies healthcare AI systems as high-risk. The MDR (Medical Device Regulation) requires that any software making or influencing clinical decisions must demonstrate safety, performance and traceability through a formal conformity assessment. A similar position applies in the UK MDR, where software that makes or influences clinical decisions is regulated as a medical device and must undergo an appropriate conformity assessment. LLMs, with their inherent output variability, opacity and hallucination risk, cannot meet these requirements when used as the primary clinical decision engine.


Most recently, and most damningly, a February 2026 study published in Nature Medicine by researchers at the Icahn School of Medicine at Mount Sinai subjected OpenAI’s ChatGPT Health, launched in January 2026 as a consumer-facing health tool reaching millions of users, to a structured stress test of triage recommendations. Using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses), the researchers found that ChatGPT Health’s performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at exactly the clinical extremes where accuracy matters most. 


Among gold-standard emergency presentations, the system under-triaged 52% of cases, directing patients with conditions such as diabetic ketoacidosis and impending respiratory failure to routine evaluation rather than the emergency department. 


Non-urgent presentations fared little better, with a 35% failure rate. When family members or friends minimised symptoms for edge cases (a common real-world scenario), triage recommendations shifted significantly toward less urgent care, with an odds ratio of 11.7. Perhaps most troublingly, crisis intervention messages for suicidal ideation activated unpredictably, firing more frequently when patients described no specific method than when they did.


This is not a theoretical concern. This is a consumer product, already deployed at scale, failing on the most consequential clinical decisions it is asked to make. The Mount Sinai researchers concluded that their findings raise safety concerns warranting prospective validation before consumer-scale deployment of AI triage systems. The implication is clear: LLM-based triage, as currently architected, is not fit for purpose.


The conclusion is not that LLMs have no role in healthcare. They do. The conclusion is that their role must be bounded: they should be used where they help and never where they decide.


2. The Deterministic Counterpoint: Strength and Limitation

If generative AI represents the new frontier, deterministic AI represents the old discipline. Rule-based systems, Bayesian Inference engines and expert systems have been used in clinical decision support for decades. They are explainable, auditable and reproducible. The same inputs always produce the same outputs.


For regulated healthcare, these are not minor advantages. They are prerequisites.


The Healthily proprietary Bayesian Model exemplifies the strengths of deterministic clinical AI. Operating over a manually curated medical graph of symptoms, conditions and influencing factors, it employs probabilistic inference to systematically reduce clinical uncertainty. It selects questions based on information gain rather than narrative plausibility. It enforces safety rules, red-flag detection and escalation logic. Every recommendation can be traced from user inputs through triggered safety rules, probability shifts and thresholded dispositions to a mapped service outcome. It has been validated at a 97.8% safety rate against the benchmark by Imperial College London, in a study using clinical vignettes provided by the Royal College of GPs (UK), and has processed over five million assessments with zero clinical incidents. Real World Evidence (RWE) from recent pilots has shown a persistent safety rate of over 98%.
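To make the mechanism concrete, the following is a minimal, self-contained sketch of the two ideas described above: Bayesian updating over a symptom-condition graph, and question selection by expected information gain. The conditions, symptoms and probabilities are purely illustrative toy values, not Healthily's actual model.

```python
import math

# Toy model: condition priors and P(symptom present | condition).
# All names and numbers are illustrative, not clinical data.
PRIORS = {"migraine": 0.5, "tension_headache": 0.4, "meningitis": 0.1}
LIKELIHOOD = {
    "photophobia":    {"migraine": 0.8, "tension_headache": 0.2, "meningitis": 0.9},
    "neck_stiffness": {"migraine": 0.1, "tension_headache": 0.3, "meningitis": 0.95},
}

def posterior(priors, evidence):
    """Bayes update: evidence maps symptom -> True/False."""
    post = dict(priors)
    for symptom, present in evidence.items():
        for cond in post:
            p = LIKELIHOOD[symptom][cond]
            post[cond] *= p if present else (1 - p)
    total = sum(post.values())
    return {c: v / total for c, v in post.items()}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def best_next_question(priors, asked):
    """Choose the unasked symptom with the highest expected reduction in
    uncertainty (entropy) -- information gain, not narrative plausibility."""
    h0 = entropy(priors)
    best, best_gain = None, -1.0
    for symptom in LIKELIHOOD:
        if symptom in asked:
            continue
        # Probability the user answers "yes", marginalised over conditions
        p_yes = sum(priors[c] * LIKELIHOOD[symptom][c] for c in priors)
        h_yes = entropy(posterior(priors, {symptom: True}))
        h_no = entropy(posterior(priors, {symptom: False}))
        gain = h0 - (p_yes * h_yes + (1 - p_yes) * h_no)
        if gain > best_gain:
            best, best_gain = symptom, gain
    return best
```

Because every step is an explicit probability calculation, the same inputs always yield the same posterior and the same next question, which is what makes the reasoning reproducible and auditable.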


This is the foundation of safe clinical AI. But the foundation is not the whole building.


Deterministic systems have historically been constrained by rigid interaction models. Traditional symptom checkers present structured question-and-answer flows that, while clinically precise, often feel mechanical and impersonal. They struggle with the ambiguity of natural language. They cannot easily handle the “I just don’t feel right” queries that represent a significant proportion of how real people describe their health concerns. They offer clinical correctness at the expense of accessibility.


The result is a gap. The technology that is safe enough for clinical deployment is not engaging enough for mass adoption. The technology that is engaging enough for mass adoption is not safe enough for clinical deployment.


This gap is precisely what Generistic AI is designed to close.


3. Defining Generistic AI

Generistic AI is a governed, dual-track architecture in which generative AI and deterministic AI operate in concert within a single system, each performing the function for which it is structurally suited, under end-to-end clinical safety management.


The term is a portmanteau: Generative + Deterministic. It is deliberately chosen to be more precise and more descriptive than alternatives such as “Neuro-Symbolic AI” (which describes a broad academic category encompassing many domains) or “Hybrid AI” (which is vague to the point of meaninglessness). Generistic AI refers specifically to the governed convergence of generative and deterministic paradigms for healthcare applications where clinical decisions carry real consequences for real patients.

3.1 The Core Design Principles

Principle 1: Separation of concerns. The generative layer handles conversation, comprehension, summarisation and engagement. The deterministic layer handles clinical reasoning, triage, disposition and escalation. These responsibilities are architecturally separated and never conflated.


Principle 2: LLMs are used where they help, not where they decide. The generative layer may detect intent, identify symptoms, translate medical terminology, explain recommendations in plain language and create a warm, accessible user experience. It does not make clinical decisions. It does not generate dispositions. It does not determine what care the user needs.


Principle 3: Deterministic authority over clinical outputs. All clinical recommendations, triage dispositions and care pathway routing decisions are generated by the deterministic engine. These outputs are explainable, auditable, reproducible and traceable. They can be governed through standard medical device quality processes including hazard identification, risk controls, verification and post-market surveillance.


Principle 4: Governed escalation. The system must include explicit triggers for escalation from the generative layer to the deterministic layer. When the conversation moves from information-seeking to clinical uncertainty or symptom presentation, the deterministic engine takes control. This is not optional. It is architecturally enforced.


Principle 5: End-to-end clinical safety management. The complete system, including both layers and their interaction, must operate within a clinical safety framework that meets medical device regulatory expectations. This includes a hazard log, risk controls, monitoring, audit trails, change management processes and post-market surveillance.
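One way to make the separation of concerns (Principles 1-3) tangible is at the type level: the generative layer's output type simply has no field for a clinical disposition, so only the deterministic engine can construct a decision. The sketch below is a hypothetical illustration of that design idea; the type names, rule identifiers and symptoms are invented, not Healthily's implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Disposition(Enum):
    SELF_CARE = "self_care"
    PHARMACY = "pharmacy"
    GP = "gp"
    URGENT_CARE = "urgent_care"
    EMERGENCY = "emergency"

@dataclass(frozen=True)
class GenerativeOutput:
    """The generative layer may extract symptoms and explain results.
    It has no disposition field, so it structurally cannot decide."""
    detected_symptoms: list
    user_friendly_text: str

@dataclass(frozen=True)
class ClinicalDecision:
    """Constructed only by the deterministic engine."""
    disposition: Disposition
    triggered_rules: list   # audit trail: which safety rules fired
    trace_id: str           # identifier for the reproducible reasoning trace

def decide(symptoms: list) -> ClinicalDecision:
    """Deterministic engine: red-flag rules are enforced here, never by the LLM.
    The single rule below is an invented example."""
    if "chest_pain" in symptoms and "breathlessness" in symptoms:
        return ClinicalDecision(Disposition.EMERGENCY, ["red_flag:acs"], "trace-001")
    return ClinicalDecision(Disposition.SELF_CARE, [], "trace-002")
```

The point of the sketch is architectural rather than clinical: because the interaction layer and the decision layer exchange typed, structured data, the "never conflated" requirement of Principle 1 is enforced by the code itself rather than by convention.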


3.2 What Generistic AI Is Not

Generistic AI is not merely “putting a chatbot in front of a symptom checker.” It is not a cosmetic layer. The integration is deep: the generative layer must understand when to yield to the deterministic engine, the deterministic engine must be able to communicate its reasoning back through the generative layer in accessible language, and the entire system must be governed as a single clinical entity.


Generistic AI is not Retrieval-Augmented Generation (RAG) alone. RAG is a valuable technique for grounding LLM responses in verified content and reducing hallucinations, but it remains a mitigation strategy rather than a guarantee of correctness. RAG can improve the quality of informational responses; it cannot produce the deterministic, traceable clinical reasoning required for safe triage and disposition.
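The distinction can be illustrated with a minimal retrieval sketch: RAG constrains what the generative layer says and attaches a citation, but it never produces a triage decision. The two-article "library" and the keyword-overlap scoring below are deliberately simplistic placeholders for a real retrieval pipeline.

```python
# Toy verified-content library; texts and ids are placeholders.
LIBRARY = [
    {"id": "art-101", "title": "Hydration basics",
     "text": "adults should drink water regularly throughout the day"},
    {"id": "art-202", "title": "Managing hay fever",
     "text": "antihistamines can relieve hay fever symptoms for many people"},
]

def retrieve(query: str):
    """Score articles by naive keyword overlap; return the best, or None."""
    words = set(query.lower().split())
    scored = [(len(words & set((a["title"] + " " + a["text"]).lower().split())), a)
              for a in LIBRARY]
    score, best = max(scored, key=lambda s: s[0])
    return best if score > 0 else None

def grounded_answer(query: str) -> str:
    doc = retrieve(query)
    if doc is None:
        # Refusing is safer than generating ungrounded text.
        return "I don't have verified content on that."
    # In a real system an LLM would rephrase doc["text"];
    # the citation is what makes the response auditable.
    return f'{doc["text"]} [source: {doc["id"]}]'
```

Even in this idealised form, the function returns cited informational text only; nothing in the retrieval path computes a disposition, which is exactly why RAG alone cannot substitute for a deterministic clinical engine.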


Generistic AI is not an aspiration. It is an architecture that exists today.


4. The Architecture in Practice: the Healthily Dual-Track Platform

Healthily has built and is deploying the reference implementation of Generistic AI through its Dual-Track GenAI Health Navigation platform prototype.


Track 1: Generative AI Health Q&A (RAG Layer)

The first track provides free-text conversational interaction powered by retrieval-augmented generation over the Healthily medically validated content library of 2,000+ articles. This layer handles general health questions, lifestyle guidance, reassurance, medicines information and product queries. It is also used during the deterministic medical assessment to improve the user experience: helping users better understand the questions, summarising and explaining recommendations, and allowing users to ask follow-up questions about why a particular recommendation was given. Responses are cited, auditable and subject to active hallucination monitoring.

This is where generative AI excels: accessible, empathetic, natural-language interaction that meets users where they are.


Track 2: AI Medical Assessment (Deterministic Engine)

The second track is the Healthily proprietary Bayesian Model, an MDR Class IIa certified AI Medical Assessment with 2,000+ symptoms and conditions. This engine is triggered when the system detects symptom intent, clinical uncertainty or risk indicators. It employs probabilistic inference, information-gain questioning, safety rules and red-flag logic to produce a single, explainable “best next step” recommendation. 


This is where deterministic AI is irreplaceable: auditable, reproducible, regulated clinical reasoning.


The Escalation Logic

The design principle governing the interaction between tracks is straightforward. Informational queries remain in Track 1 (GenAI Q&A). Clinical uncertainty or symptom presentation triggers escalation to Track 2 (Bayesian Assessment).
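That routing rule can be sketched as a simple, deterministic gate in front of the two tracks. The cue lists below are invented for illustration; a production system would use richer intent detection, but the key property shown here holds either way: escalation is mandatory whenever a clinical signal is present.

```python
# Illustrative escalation trigger: any clinical signal routes the user to the
# deterministic engine (Track 2); purely informational queries stay in Track 1.
# Cue words are hypothetical examples, not a real clinical vocabulary.
SYMPTOM_CUES = {"pain", "dizzy", "rash", "fever", "bleeding", "headache"}
RISK_CUES = {"chest", "breathless", "suicidal", "collapse"}

def route(user_message: str) -> str:
    words = set(user_message.lower().replace("?", "").split())
    if words & (SYMPTOM_CUES | RISK_CUES):
        return "track_2_deterministic_assessment"  # escalation is enforced, not advisory
    return "track_1_generative_qa"
```

For example, "What is vitamin D?" stays in Track 1, while "I have chest pain" is handed to the deterministic assessment regardless of how the conversation began.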


This is what makes the system an Agentic AI health navigator. The system does not merely respond to queries. It actively determines what to do next. It decides whether the user needs information or assessment, routes them accordingly and navigates them to the Right Care First Time, whether that is self-care, pharmacy, GP, urgent care or emergency services.


The Agentic AI capability is critical. This is not a passive information tool. It is an autonomous navigation system that makes governed, explainable decisions about what the user needs and where they should go.


5. Why Generistic AI Is Required for Regulated Healthcare

The regulatory landscape for healthcare AI is converging on a single, inescapable conclusion: clinical decision-making software must be safe, effective, explainable and traceable. Neither generative AI nor deterministic AI alone can meet all of these requirements simultaneously. Only a Generistic architecture can.


5.1 The Regulatory Framework

Under the EU Medical Device Regulation (MDR), software that makes or influences clinical decisions is classified as a medical device. For AI-powered triage and care navigation, this typically requires Class IIa certification at a minimum. The certification process demands demonstration of safety and performance through clinical evidence, a documented risk management framework, an audited Quality Management System (ISO 13485) and ongoing post-market surveillance.


LLMs cannot satisfy these requirements as clinical decision engines. Their outputs are non-deterministic: the same input may produce different outputs on different occasions. Their reasoning is opaque: there is no traceable path from input to output that can be formally verified. Their behaviour under edge cases is unpredictable: minor prompt variations can produce materially different clinical recommendations. The February 2026 Nature Medicine study of ChatGPT Health provided the starkest evidence yet: a 52% under-triage rate on genuine emergencies and statistically significant susceptibility to anchoring bias from third-party input. These are not implementation challenges that can be resolved with better engineering. They are structural properties of the technology.


The US FDA’s Clinical Decision Support guidance similarly emphasises scope, intended use and transparency expectations for software that influences clinical decisions. The direction of travel is clear on both sides of the Atlantic: clinical AI must be governable, and LLMs alone are not.


5.2 The Clinical Governance Case

Beyond formal regulation, clinical governance frameworks within health systems, insurers and provider organisations require that any tool influencing patient care can be audited, reviewed and held accountable. Clinicians need to understand why a recommendation was made. Clinical safety officers need to investigate adverse events. Medical directors need to sign off on the logic underpinning triage decisions.


A Generistic AI architecture satisfies these requirements because the clinical decision layer is deterministic. Every recommendation produced by the Bayesian Model can be decomposed into its constituent inputs, probability shifts and decision thresholds. If a recommendation is queried, the reasoning can be reconstructed and examined. If a safety concern arises, the hazard can be identified, the control assessed and the correction implemented through a governed change process. Notably, this process in a Bayesian system requires no retraining of the model: changes take effect immediately, whereas implementing equivalent changes in an LLM requires lengthy and costly retraining.
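The kind of reconstruction described above presupposes that each decision leaves a complete audit record. The following is a hypothetical sketch of such a record and a replay check; field names, probabilities and the threshold are invented for illustration.

```python
# Hypothetical audit record: the fields needed to reconstruct a recommendation.
RECORD = {
    "inputs": {"headache": True, "neck_stiffness": True},
    "rule_events": [{"rule": "red_flag:meningism", "fired": True}],
    "probability_trace": [
        {"after": "headache", "meningitis": 0.18},
        {"after": "neck_stiffness", "meningitis": 0.71},
    ],
    "threshold": {"emergency": 0.5},
    "disposition": "emergency",
}

def audit(record) -> list:
    """Replay the decision: check the disposition follows from the logged trace."""
    findings = []
    final_p = record["probability_trace"][-1]["meningitis"]
    crossed = final_p >= record["threshold"]["emergency"]
    findings.append(f"final P(meningitis)={final_p}, emergency threshold crossed={crossed}")
    consistent = crossed == (record["disposition"] == "emergency")
    findings.append(f"disposition '{record['disposition']}' consistent with trace: {consistent}")
    return findings
```

A clinical safety officer investigating an adverse event can run exactly this kind of replay: if the logged probability trace does not support the logged disposition, the inconsistency is detectable mechanically, which is not possible when the decision was free-form LLM text.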


This is not achievable with LLM-generated clinical advice, regardless of how many guardrails or fine-tuning steps are applied.


5.3 The User Experience Case

Equally, pure deterministic systems cannot meet the expectations of modern healthcare consumers. Users expect to interact with health tools in natural language. They expect empathy, personalisation and conversational flow. They expect an experience that feels like talking to a knowledgeable and caring advisor, not filling out a clinical form.


Generistic AI delivers this. The generative layer provides the warmth, accessibility and conversational intelligence that drives engagement. The Healthily mass testing programme of a Generistic AI version of the platform, involving 1,203 users and 6,300 questions, demonstrated an 89.5% satisfaction rate. Live deployments of the Healthily Deterministic AI show 82% assessment completion rates, substantially above industry benchmarks for digital health tools.


The architecture does not compromise between safety and engagement. It delivers both, because each capability is handled by the technology best suited to it.


6. Generistic AI and Agentic Healthcare

The concept of Agentic AI has rapidly emerged as the next frontier in artificial intelligence: systems that do not merely respond to queries but autonomously plan, decide and act. In healthcare, the stakes of agentic behaviour are uniquely high. An AI agent that autonomously makes clinical decisions must be safe, explainable and accountable.


Generistic AI provides the only defensible foundation for Agentic AI in healthcare.


Consider what a true healthcare agent must do: it must understand the user’s intent, which may be ambiguous, emotionally charged or poorly articulated; it must determine whether the user needs information, reassurance, clinical assessment or emergency intervention; it must navigate the user to the right care pathway, accounting for their specific clinical presentation, their available services and their individual circumstances; and it must do all of this autonomously, at scale, without a clinician in the loop for every interaction.


This is precisely what the Healthily Generistic AI platform does. The generative layer understands intent and manages conversation. The deterministic layer makes clinical decisions. The system as a whole acts as an autonomous health navigation agent, routing users to the Right Care First Time.


The alternative, an agentic system built entirely on LLMs making clinical decisions, is not merely suboptimal. It is unsafe. It would be an autonomous agent making consequential clinical decisions using a technology that hallucinates, lacks explainability and cannot be governed through standard medical device processes. The Nature Medicine study of ChatGPT Health demonstrated precisely this risk: an LLM-based system that was already deployed at consumer scale, making autonomous triage decisions and getting the most dangerous ones wrong more than half the time. No responsible health system, insurer or regulator should accept this.


7. Why Terminology Matters

The broader AI industry has begun to recognise that hybrid architectures are necessary. The academic community has coalesced around the term “Neuro-Symbolic AI” to describe the integration of neural networks with symbolic reasoning systems. This is a useful academic framework, but it is insufficiently precise for healthcare applications.


Neuro-Symbolic AI is a broad category that encompasses drug discovery, clinical data extraction, diagnostic imaging analysis and many other domains. It does not specifically address the unique requirements of patient-facing care navigation and triage, where the consequences of error are immediate and personal.


Generistic AI is a more precise concept. It describes a specific architectural pattern, designed for a specific class of healthcare applications (patient-facing triage, care navigation and health assessment), governed by a specific set of regulatory and clinical safety requirements (MDR, ISO 13485, clinical risk management), with a specific design principle at its core: generative AI handles interaction, deterministic AI handles decisions.


This precision matters. In healthcare, vague terminology leads to vague governance, and vague governance leads to patient harm. The industry needs a clear, unambiguous label for the architecture that meets the standard. We propose Generistic AI as that label.


Some companies in this space have begun to adopt hybrid positioning. Players in the certified digital triage space have publicly described their approach as “Neuro-Symbolic AI” combining Bayesian reasoning with LLM capabilities. But no one has articulated the architectural requirements with the specificity, the regulatory grounding or the operational evidence that the Generistic AI framework demands.


8. Implications for the Market

The emergence of Generistic AI as the standard architecture for regulated healthcare AI has significant implications for every stakeholder in the ecosystem.


For Health Insurers

Generistic AI enables a fundamentally new approach to demand management. Rather than paying for every consultation and hoping for efficient routing after the fact, insurers can deploy an Agentic AI front door that navigates members to the Right Care First Time, every time. Healthily's proven 25%+ cost savings on virtual GP services represent a direct, measurable impact on claims expenditure and member experience.


For Health Systems

Overstretched public health systems face a capacity crisis that cannot be resolved by hiring alone. Generistic AI provides a safe, scalable mechanism to triage demand, deflect low-acuity presentations to appropriate self-care or pharmacy pathways and ensure that clinical capacity is reserved for patients who genuinely need it. The Tony Blair Institute has estimated that AI triage could save the NHS £500 million to £1 billion annually.


For AI Platforms

Major AI platforms, from OpenAI and Anthropic to Microsoft and Google, are actively pursuing healthcare as a growth vertical. Yet none can safely make clinical decisions using their LLM technology alone, even with access to real world clinical data. Generistic AI represents the path to healthcare revenue: by integrating a certified deterministic clinical core, these platforms can transform their conversational AI into regulated healthcare agents. The platform that moves first will define the category.


For Regulators

Generistic AI provides regulators with a governable architecture. The clinical decision layer can be assessed, certified and monitored using existing medical device frameworks. The generative layer can be constrained and governed within the safety case of the overall system. This is a pragmatic path forward that accommodates innovation without compromising patient safety.


For Patients

Most importantly, Generistic AI delivers what patients actually need: a system that listens, understands and then navigates them, safely and efficiently, to the right care. Not a chatbot that provides unreliable medical advice. Not a rigid questionnaire that feels impersonal and mechanical. A system that combines the best of both worlds because it was designed from the ground up to do exactly that.


9. Conclusion: The Architecture Healthcare Has Been Waiting For

The healthcare AI market has spent recent years oscillating between two inadequate extremes. On one side, the promise that generative AI will revolutionise clinical care, which it will not, because it cannot safely make clinical decisions. On the other, the insistence that traditional deterministic systems are sufficient, which they are not, because they cannot meet the accessibility and engagement expectations of the modern healthcare consumer.


Generistic AI resolves this tension. Not through compromise, but through architecture. Each technology does what it does best. The result is a system that is simultaneously safe enough to be a medical device and engaging enough to change patient behaviour at scale.


Healthily is proposing this term, has defined this architecture and is building the reference implementation. Its proprietary Bayesian Model provides the deterministic clinical core. Its Dual-Track GenAI platform provides the governed integration with generative AI. Its MDR Class IIa certification provides the regulatory proof point. Its five million assessments, more than 98% safety validation and 25%+ cost savings in live deployments provide the operational evidence.


The question for the industry is no longer whether hybrid architectures are necessary. That debate is settled. The question is whether companies will adopt the architecture that has been defined, built, certified and proven, or whether they will attempt to build their own from scratch, consuming years of clinical validation and regulatory effort in the process.


Generistic AI is the architecture healthcare has been waiting for.


About Healthily

Healthily has pioneered the use of AI in care navigation since 2015, having been the first company to launch a health chatbot (2016), achieve Medical Device Class I certification (2017), publish an AI Explainability Statement (2021) and achieve MDR Class IIa certification for AI care navigation in the UK (2025).


The Healthily proprietary Bayesian Model is one of the most advanced MDR Class IIa certified AI Medical Assessment platforms in Europe. It covers 2,000+ symptoms and conditions and is supported by a medically validated content library of 2,000+ articles. The platform has processed over five million assessments with zero clinical incidents and has been validated by Imperial College London, the Royal College of GPs and Tel Aviv University.


Healthily is currently developing its first version of the Generistic AI platform.


For partnership and integration enquiries: partners@healthily.ai

Right Care First Time


References

1. Ramaswamy, A., Tyagi, A., Hugo, H. et al. "ChatGPT Health performance in a structured test of triage recommendations." Nature Medicine (2026). https://doi.org/10.1038/s41591-026-04297-7

2. Microsoft Research. "The Illusion of Readiness: LLM Readiness for Clinical Decision Support" (October 2025).

3. Imperial College London / Royal College of GPs. "Healthily AI Medical Assessment Safety Validation Study."

4. Tel Aviv University. "Independent Assessment Efficiency Study."

5. Tony Blair Institute. "AI Triage and NHS Savings Potential" (2025).

6. npj Digital Medicine. "Unregulated large language models produce medical device-like output" (2025).

7. The Lancet Digital Health. "A future role for health applications of large language models depends on regulators enforcing safety standards" (2024).

8. EU Medical Device Regulation (MDR) 2017/745.

9. EU Artificial Intelligence Act (2024).

10. FDA Clinical Decision Support Guidance.

11. ISO 13485:2016, Medical Devices Quality Management Systems.

12. Frontiers in AI. "Bridging the gap: a practical step-by-step approach to warrant safe implementation of large language models in healthcare" (2025).

13. Communications Medicine. "Neuro-symbolic AI for auditable cognitive information extraction from medical reports" (2025).


© 2026 Healthily Limited. All rights reserved.

The term "Generistic AI" was coined by Healthily Limited to describe the governed convergence of generative and deterministic AI for regulated healthcare applications.
