The first countrywide pragmatic, cluster-randomised implementation trial of clinical AI failed… because of the system.
TRICORDER failed in exactly the right way to expose the most important problem in today's healthcare system.
Abstract (TL;DR)
• The Trial: TRICORDER is the first pragmatic, cluster-randomised implementation trial of a clinical AI technology at national scale: 205 UK NHS primary care practices, 1.5 million patients, published in The Lancet (Feb 2026).
• The Surprising Result: When clinicians actually used the AI stethoscope, detection rates jumped dramatically: heart failure ×2.3, atrial fibrillation ×3.5, valvular heart disease ×1.9. But the headline result was neutral, because 40% of practices stopped using it.
• The Reason: The algorithm didn't fail; the implementation did. No EHR integration, extra steps in 10-minute consultations, alert fatigue… This is, yet again, the classic Last Mile collapse.
• The Explanation: TRICORDER is the most powerful real-world validation of Clinical Design theory to date. Every single vowel (Adoption, Evidence, Interoperability, Ownership, Unit Economics) explains a piece of the failure, and a piece of the path forward.
• The Learning: We are officially in the Implementation Era. The question is no longer “does AI work?” It’s “can we design systems that let it?”
1. A Landmark Trial Hiding in Plain Sight
Let me be direct: TRICORDER may be the most important trial published in healthcare AI so far this year. Not because it succeeded, but because it failed in exactly the right way.
The study (Kelshiker, Bachtiger, Petri et al., Lancet 2026)1 deployed an AI-enabled stethoscope (Eko Health) across 205 NHS primary care practices in North West London. The device records a 15-second single-lead ECG and phonocardiogram during routine cardiac auscultation, then runs three cloud-based AI algorithms for heart failure (reduced LVEF ≤40%), atrial fibrillation, and valvular heart disease.
The design was pragmatic by intention: no cherry-picked sites, no artificial incentives, no dedicated research staff running the workflow. Just real GP practices, real Tuesday-morning consultations, and real NHS infrastructure.
The intention-to-treat result: no significant difference in heart failure detection between intervention and control groups (IRR 0.94, 95% CI 0.87–1.00). No difference for AF or VHD either.
Headline: “AI stethoscope fails to improve detection.”
But that headline is wrong. And understanding why it’s wrong is the entire point.
2. The Per-Protocol Signal: The Algorithm Works
When clinicians actually used the AI stethoscope (per-protocol analysis with propensity score matching), the results were unambiguous:
• Heart failure: IRR 2.33 (95% CI 1.28–4.26)
• Atrial fibrillation: IRR 3.45 (95% CI 2.24–5.32)
• Valvular heart disease: IRR 1.92 (95% CI 1.09–3.40)
Time-to-diagnosis was also shorter. Detection was genuinely increased, not an artifact of coding noise or surveillance bias.
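For readers who want the mechanics behind those numbers, here is a minimal sketch of how an incidence rate ratio (IRR) and its confidence interval are computed. The event counts and person-time below are hypothetical, chosen only for illustration; the trial's actual analysis used far more sophisticated modelling and matching.

```python
import math

def incidence_rate_ratio(events_tx, time_tx, events_ctrl, time_ctrl, z=1.96):
    """Incidence rate ratio with an approximate Wald 95% CI on the log scale."""
    irr = (events_tx / time_tx) / (events_ctrl / time_ctrl)
    se_log = math.sqrt(1 / events_tx + 1 / events_ctrl)  # approximate SE of log(IRR)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper

# Hypothetical counts: 70 new heart failure diagnoses over 10,000 patient-years with the
# device in use, versus 30 over 10,000 patient-years without it.
print(incidence_rate_ratio(70, 10_000, 30, 10_000))  # ~ (2.33, 1.52, 3.58)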
As Bo Wang put it on X: “The algorithm works. No question. […] The humans were the bottleneck.” 2
This is not a negative trial. This is a positive algorithm trapped inside a negative system.
Eric Topol called it “an important lesson for AI med trials”3: the tool improved diagnoses when used, but wasn’t used enough by the doctors. That’s the real finding.
3. The Anatomy of Collapse: Why 40% Walked Away
Across the 96 intervention practices, the AI stethoscope was used 12,725 times in 12 months. That sounds like a lot. It isn't. It works out to roughly 2–3 uses per practice per week, in sites seeing dozens of cardiac-relevant patients daily.
By month 12, the usage distribution was brutal:
• High users (≥31/month): 6% of practices
• Medium users (10–30/month): 15%
• Low users (1–9/month): 40%
• Non-users (abandoned): 40%
The top 5 practices contributed 34% of all recordings. One outlier practice alone contributed 19%.
When surveyed, clinicians identified the barriers with razor-sharp precision:
1) No EHR integration. The AI stethoscope was not embedded in the EHR, so results required manual entry. In a 10-minute NHS consultation, that's a dealbreaker.
61% of respondents ranked EHR workflow integration as the most influential change to improve use, even ahead of financial incentives (52%).
2) Extra workflow steps. Turn on device. Connect Bluetooth. Open app. Place stethoscope. Record 15 seconds. Wait for result. Manually log finding. Each step is a friction tax on an already-overloaded clinician.
3) Alert fatigue and false positives. The positive predictive value for heart failure was 0.30 (70% false positives); for VHD, 0.10 (90% false positives). In low-prevalence primary care, even high specificity generates noise, and clinicians learn to ignore the signal.
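To see why even a specific test drowns in false positives at low prevalence, here is a short Bayes'-rule illustration. The prevalence, sensitivity, and specificity values are assumptions chosen for the example, not figures reported by the trial.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from prevalence, sensitivity and specificity (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed values for illustration: 2% prevalence of undiagnosed heart failure,
# 85% sensitivity, 96% specificity.
print(round(ppv(0.02, 0.85, 0.96), 2))  # ~0.30 -> roughly 70% of positives are false alarms
```

Even a test that is 96% specific produces mostly false alarms when only 2 in 100 patients actually have the condition. That is not a flaw of this algorithm; it is arithmetic that any low-prevalence deployment has to design around.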
This is not, therefore, a story about a failing technology; it's a story about a lack of clinical design.
4. TRICORDER Through the Vowels of Clinical Design
If you’ve been following this newsletter, the failure pattern should feel familiar. Let’s run TRICORDER through the vowels of Clinical Design: the AEIOU framework.

A — Adoption
The AI stethoscope failed the sacred rule: if it doesn’t exist inside the clinical moment, it doesn’t exist.
The device lived outside the EHR. Outside the natural rhythm of the consultation. Clinicians had to leave their workflow to use it. And in primary care, not being in the workflow means not existing.
Compare this to PRAIM (the Vara breast screening study I discussed in Article 0)45: there, the AI was integrated as the viewer itself (triage and safety net built into the radiologist’s native reading environment). Adoption was structural and organic, not optional.
TRICORDER asked GPs to add a tool. PRAIM gave radiologists a better version of the tool they already had.
The learning: friction beats accuracy. Again.
E — Evidence
Here’s where TRICORDER actually shines—paradoxically.
This is the first pragmatic, cluster-randomised implementation trial of a clinical AI technology in a national primary care system. The study didn’t just measure algorithm performance. It measured system performance. It generated real-world evidence on adoption curves, workflow barriers, usage decay, and the gap between per-protocol efficacy and intention-to-treat effectiveness.
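One way to see how usage decay swallows a real effect is a toy dilution model: if only a fraction p of eligible encounters actually use the device, and those encounters see a rate ratio R, the arm-level (intention-to-treat) rate ratio is roughly 1 + p·(R − 1). The numbers below are illustrative assumptions, not trial estimates, and the model ignores clustering and confounding.

```python
def diluted_irr(p_used, irr_when_used):
    """Approximate intention-to-treat IRR when only a fraction of encounters use the tool.

    Assumes non-using encounters in the intervention arm detect at the control rate (IRR = 1).
    """
    return 1 + p_used * (irr_when_used - 1)

# With a per-protocol effect of x2.33 but the tool used in only ~10% of eligible encounters,
# the arm-level effect shrinks to ~1.13 -- easily lost in a pragmatic trial.
print(round(diluted_irr(0.10, 2.33), 2))  # 1.13
```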
That’s exactly the kind of evidence loop Clinical Design demands. Not “does the model work in the lab?” but “does the system work on a Tuesday?”
The answer (honestly reported) was: the model works, the system doesn’t. Yet.
That honesty is more valuable than a hundred press releases about AUC scores.
I — Interoperability
This is where the disconnect was most obvious.
The AI stethoscope was not integrated with electronic health records, and results were stored on the manufacturer’s cloud platform. Clinical users were encouraged (not required) to label recordings with patient NHS identifiers. Only 49% of recordings (6,224 of 12,725) were actually linked to patient identifiers.
In Clinical Design terms: the data didn’t flow to the right place, at the right time, with the right quality.
Contrast this with EAGLE (the closest US comparator trial)6, where the AI result was embedded in the digital ECG report and linked to an automated EHR alert recommending echocardiography. EAGLE achieved near-complete uptake; TRICORDER didn't.
The lesson writes itself: the ability to connect is not enough. Full embedding must be the standard.
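What full embedding could look like in practice: a minimal sketch of pushing the AI result into the record as a FHIR-style Observation, so it lands inside the consultation rather than on a vendor cloud. The fields, labels, and endpoint below are illustrative assumptions of mine, not the trial's actual integration or the manufacturer's API.

```python
import json

# Illustrative FHIR-style Observation carrying the AI stethoscope result (all fields assumed).
observation = {
    "resourceType": "Observation",
    "status": "preliminary",  # a clinician still has to confirm the finding
    "code": {"text": "AI-assisted auscultation: heart failure risk flag"},
    "subject": {"reference": "Patient/example-nhs-number"},  # placeholder patient identifier
    "device": {"display": "AI-enabled stethoscope"},
    "valueCodeableConcept": {"text": "Positive - echocardiography recommended"},
}

# Posting this to the practice's EHR FHIR endpoint (hypothetical URL) would surface the result
# inside the consultation record rather than in a separate vendor app, e.g.:
#   requests.post("https://ehr.example.nhs.uk/fhir/Observation", json=observation)
print(json.dumps(observation, indent=2))
```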
O — Ownership
TRICORDER had institutional backing: NHS executive clinical leadership, a regional guideline for use, blanket data governance approvals for 209 practices. That’s more ownership infrastructure than most AI deployments ever get.
But ownership at the institutional level didn’t translate to ownership at the practice level. Use was discretionary, no champion was required per practice, there was no accountability for non-use, and no feedback loop between the system and the clinician.
Without operational accountability, a mandate is not a mandate; it's a memo.
U — Unit Economics
No discretionary research payments or financial incentives were provided to practices. The trial deliberately avoided artificial incentive structures to preserve real-world relevance. That’s methodologically admirable, but it’s also why the economics collapsed.
In NHS primary care, GPs are operating under crushing workload pressure. Adding an uncompensated and complex workflow step with no reimbursement pathway, no time allocation, no promise of time savings, and no visible return on effort is asking clinicians to subsidize innovation with their own exhaustion.
Previous studies suggest a £2,500 saving per heart failure diagnosis made in primary care rather than via emergency hospitalisation. This means that the case for unit economics does exist. It just wasn’t surfaced, aligned, or activated at the point of care.
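Here's a back-of-the-envelope sketch of how those economics might look at practice level. Every input below is a hypothetical assumption except the £2,500 saving per community-made heart failure diagnosis cited above.

```python
# Hypothetical practice-level inputs (only the £2,500 figure comes from the article).
saving_per_hf_diagnosis_gbp = 2_500   # HF diagnosed in primary care vs. emergency admission
extra_hf_diagnoses_per_year = 6       # assumed additional community diagnoses with the device
uses_per_year = 500                   # assumed recordings per practice per year
minutes_per_use = 2                   # assumed added consultation time per recording
gp_cost_per_minute_gbp = 2.50         # assumed fully loaded GP cost

system_savings = extra_hf_diagnoses_per_year * saving_per_hf_diagnosis_gbp
practice_time_cost = uses_per_year * minutes_per_use * gp_cost_per_minute_gbp

print(f"System-level savings: ~£{system_savings:,.0f}/year")          # ~£15,000
print(f"Practice-level time cost: ~£{practice_time_cost:,.0f}/year")  # ~£2,500
# The savings accrue to the system; the time cost lands on the practice.
# That asymmetry is exactly the incentive gap the trial left unaddressed.
```

Under these assumed numbers the value clearly exists, but it is captured downstream while the effort is paid upstream. That is the misalignment Clinical Design has to fix.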
5. The X Verdict: Optimistic Realism
The reaction on X has been remarkably nuanced—a sign the field is maturing.
The dominant voices clustered around three positions:
“The AI didn’t fail: the system did.” This was the most common reaction. One viral thread called it the implementation paradox. The technology works; the organizational barriers (time, integration, incentives) killed it.
“This is actually good news for AI.” Bo Wang’s high-engagement post captured this angle: when used, detection rates jumped dramatically. The algorithm is validated; what remains is a deployment problem, and deployment problems are solvable and well-characterized. That optimistic take is fair, but closing the gap will take effort and ecosystem engineering. And that’s where Clinical Design enters the conversation.
“This is the wake-up call for the Implementation Era.” Multiple commentators, including Topol, framed TRICORDER as the definitive proof that we’ve moved beyond the era of model building. The question now is system design. As one commentator put it: “An algorithm with 95% accuracy that sits in a drawer is worse than one with 80% accuracy that’s actually deployed.”
The sentiment snapshot: approximately 70% optimistic realism, 25% cautionary, 5% neutral shares. No significant backlash. No viral misinformation. The conversation is evidence-based and focused on deployment, exactly where it should be, and where Clinical Design shines.
6. TRICORDER vs. PRAIM: Two Trials, One Lesson

These two studies are the bookends of Clinical Design theory.
PRAIM (Nature Medicine, 2025)7: AI integrated as the viewer. Triage + safety net built into the radiologist’s native workflow. Result: +17.6% increase in cancer detection across 463,094 women. Workload reduction potential of 56.7%.
TRICORDER (Lancet, 2026)8: AI as a separate device. Discretionary use. No EHR integration. Result: algorithm works (per-protocol ×2.3 heart failure detection), but system-level impact neutral because adoption collapsed.
Same thesis. Different outcomes. The variable wasn’t the algorithm. The variable was the design.
PRAIM designed the circuit; TRICORDER only deployed the component. And in healthcare, components don't save lives: integrated circuits do.
7. What TRICORDER Tells Us About the Next Decade
TRICORDER is a blueprint that tells us exactly what the next generation of AI implementation trials needs:
First: EHR-native integration is not a “nice to have.” It’s the minimum viable product. If your AI output doesn’t appear where the clinician already works, it will not succeed. Congratulations: you’ll have done a nice science project, but nothing else.
Second: Pragmatic trials of AI must measure system performance, not just algorithm performance. Kudos to TRICORDER on this one.
Third: Incentive alignment must be designed, not assumed. If using the tool costs the clinician time with no visible return, usage will decay. The unit economics must be extremely, extremely obvious to all.
Fourth: Ownership must be operational at the user level, not only institutional. A regional guideline is necessary but not sufficient. Someone in each practice needs to own the workflow, drive implementation, oversee the feedback loop, and congratulate the team on the results.
The authors themselves acknowledge this: the next phase will prioritize “seamless integration into clinician workflows (e.g., through EHR linkage) whilst considering selective population targeting and financial incentivisation.”
That sentence defines Clinical Design in everything but name.
8. Call to Action
We have entered the Implementation Era. TRICORDER and PRAIM together prove what this newsletter has argued since day one: discovery scaled. Clinical delivery didn’t… Yet.
The AI is done. The algorithms work. The performance is sufficient (barring the issue of false positives). The regulatory frameworks exist.
What’s missing is the discipline of designing systems that let innovation reach the patient.
That discipline is Clinical Design. And TRICORDER is its most powerful case study to date.
The AI is not failing, but the system is. Now let’s fix the system.
Why I’m doing this: I believe the next 10 years won’t be defined by who discovers the next molecule, but by who figures out how to deliver it.
Whatever your role (clinician, founder, investor, or policy maker) we are all architects of this new system.
Let’s build.
— Marcos
Note & disclaimers:
• Context: The Clinical Decade (and this article) explore the theoretical foundations of Clinical Design, a teaching framework created by Marcos Gallego. It has been developed through independent research and academic activities, and is shared here as a personal contribution to the field.
• Independence: Views and materials published in The Clinical Decade are personal/independent and do not represent any employer, client, or institution.
• License: Licensed under Creative Commons Attribution–NonCommercial–NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless otherwise stated.