News

AI in Drug Discovery: The Illusion of Speed and the Reality of Clinical Failure

Melissa Bime

Published 09 Dec 2025

AI in Drug Discovery: The Illusion of Speed and the Reality of Clinical Failure - Infiuss Health


    It's 2025, and dozens of AI-derived candidates have entered human trials. Yet a stark dichotomy has emerged: while AI has dramatically accelerated the early "sprint" to the clinic, it has hit a wall in the "marathon" of clinical proof.

    Current data reveals that AI is not yet curing the industry's most expensive problem: the 90% failure rate of drug candidates. Instead, it is generating "faster failures": speeding up the arrival of ineffective drugs rather than creating fundamentally better ones.

    1. The Paradox: Solving Chemistry, Failing Biology

    The disconnect lies in what AI has actually solved. The pharmaceutical pipeline has two main hurdles:

    1. The Chemistry Problem: Can we make a stable, non-toxic molecule that hits a target?
    2. The Biology Problem: Does hitting that target actually cure the disease in a human?

    AI has largely solved the first but struggled with the second.

    The "Chemistry" Win (Phase I)

    In the preclinical and Phase I stages, AI is a resounding success.

    • Speed: AI-driven programs reach the clinic in 18–30 months (vs. ~5 years historically).
    • Safety: AI algorithms excel at optimizing ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties.
    • Result: AI-derived compounds have an 80–90% success rate in Phase I trials (safety focus), far outperforming the historical 40–65%. The molecules are chemically sound, soluble, and safe for human dosing.

    The "Biology" Failure (Phase II/III)

    Once these "perfect" molecules face the chaos of human disease in Phase II (efficacy focus), the advantage evaporates.

    • The Statistic: AI-discovered drugs fail in Phase II at the same stubborn rate (~60%) as traditional drugs.
    • The Case Study: Recursion Pharmaceuticals successfully used AI to identify REC-994 for cerebral cavernous malformation. While the drug passed safety hurdles (the "chemistry" win), it showed mixed results in improving patient functional outcomes (the "biology" hurdle), highlighting that AI can predict a safe molecule but cannot guarantee it will heal a complex organ.
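    The arithmetic behind "faster failures" can be sketched directly. The figures below are illustrative: ~85% for the AI Phase I rate and ~50% as a midpoint of the historical 40–65% range cited above, Phase II held at ~40% success for both arms, and a Phase III rate that is an assumption, not a number from this article. Even with a near-doubled Phase I rate, the overall failure rate stays above 80% because the biology-driven phases are unchanged:

    ```python
    # Back-of-envelope: how much does a better Phase I success rate move
    # the overall probability of clinical success? All rates below are
    # illustrative assumptions (the Phase III figure especially), and the
    # phase transitions are treated as independent for simplicity.

    def overall_success(p1: float, p2: float, p3: float) -> float:
        """Probability that a candidate entering Phase I reaches approval."""
        return p1 * p2 * p3

    traditional = overall_success(p1=0.50, p2=0.40, p3=0.55)
    ai_derived  = overall_success(p1=0.85, p2=0.40, p3=0.55)

    print(f"traditional overall success: {traditional:.1%}")  # ~11%
    print(f"AI-derived overall success : {ai_derived:.1%}")   # ~19%
    ```

    Under these assumptions, fixing Phase I alone still leaves roughly four out of five candidates failing, which is why the Phase II efficacy wall, not preclinical speed, dominates the economics.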

    2. Biological Bottlenecks: The Translational Gap

    Why do AI models fail to predict human efficacy? The answer lies in the Translational Gap: the chasm between a computer model and a human patient.

    Oversimplified Training Data

    AI is only as good as the data it learns from. Currently, models are trained on low-dimensional proxies:

    • Reductionism: Models rely on data from genetically identical cell lines or biochemical assays.
    • Missing Context: These datasets lack the immune systems, microenvironments, and comorbidities found in real patients.
    • The Consequence: An AI might find a molecule that perfectly reverses a disease phenotype in a petri dish, but that signal is noise in a human body. This is why "efficacy" in in silico models rarely translates to clinical benefit.

    The Human Data Deficit

    Despite the hype, fewer than 1 in 4 AI drug companies validate their predictions on human tissue or patient-derived data before clinical trials. By relying on legacy animal models ("mice are not men"), AI pipelines inherit the same biases that have plagued traditional discovery for decades. Without integrating high-dimensional human data—such as genomics, transcriptomics, and organoid responses—AI is merely optimizing our existing ignorance.

    3. Chemical Bottlenecks: The "Make" Constraints

    Even when AI designs a promising molecule, it faces physical limits in the lab. The transition from "bits to atoms" remains slow and manual.

    The DMTA Cycle Drag

    Drug discovery revolves around the Design-Make-Test-Analyze (DMTA) cycle.

    • AI Speed: AI can Design millions of molecules in seconds.
    • Lab Reality: Chemists must still Make and Test them. Synthesizing a single lead candidate involves complex steps that take weeks or months.
    • The Lag: Because lab automation lags behind algorithmic speed, the "Make" step becomes the rate-limiting factor. AI suggests 1,000 ideas, but the lab can only test 10.
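    The rate-limiting logic above is simple to state: a serial pipeline runs no faster than its slowest stage. The capacities below are hypothetical placeholders (only the 1,000-vs-10 ratio echoes the text), but they make the point that accelerating Design further changes nothing:

    ```python
    # Minimal sketch of the DMTA bottleneck. Stage capacities are
    # hypothetical; the point is that end-to-end throughput equals the
    # minimum stage capacity, not the speed of the Design step.

    DESIGN_PER_WEEK = 1_000  # AI can propose candidates almost instantly
    MAKE_PER_WEEK   = 10     # wet-lab synthesis capacity (assumed)
    TEST_PER_WEEK   = 50     # assay capacity (assumed)

    # A serial pipeline's throughput is capped by its slowest stage.
    throughput = min(DESIGN_PER_WEEK, MAKE_PER_WEEK, TEST_PER_WEEK)
    print(f"candidates fully evaluated per week: {throughput}")  # 10
    ```

    Doubling DESIGN_PER_WEEK leaves the result at 10; only raising the "Make" capacity (lab automation) moves the number, which is exactly the drag the section describes.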

    Three case studies show why these biological hypotheses failed in the clinic:

    1. BenevolentAI: BEN-2293 (Atopic Dermatitis)

    Status: Failed Phase IIa (April 2023); Pipeline status: "Completed/Concluded"

    • The AI "Win": BenevolentAI’s platform identified a novel Pan-Trk inhibitor (targeting TrkA, TrkB, and TrkC) intended to treat mild-to-moderate atopic dermatitis topically. The molecule was chemically successful—safe, well-tolerated, and reached the target tissues.
    • The Biological Failure: The hypothesis was that inhibiting these receptors would simultaneously reduce itch and inflammation.
      • What went wrong: While the drug was safe, it failed its secondary efficacy endpoints, showing no statistically significant reduction in itch or inflammation compared to placebo in the broader population.
      • The "Why": This illustrates the Redundancy Problem. In complex human immune diseases, blocking one pathway (Trk receptors) is often insufficient because the body compensates with redundant inflammatory pathways (like JAK/STAT or IL-4/13). The AI identified a valid target mechanistically, but biologically, the signal wasn't "loud" enough to override the disease's complexity in a diverse patient group.

    2. Recursion Pharmaceuticals: REC-994 (Cerebral Cavernous Malformation)

    Status: Discontinued (May 2025)

    • The AI "Win": Recursion used phenotypic screening (computer vision looking at cells) to find that REC-994 reversed the visual signs of disease in cellular models. It passed Phase I safety easily.
    • The Biological Failure: The drug was designed to treat Cerebral Cavernous Malformation (CCM), a genetic neurovascular disease.
      • What went wrong: In the Phase II SYCAMORE trial, the drug met safety goals but failed to show significant improvements in MRI-based lesion volume or patient function compared to natural history data.
      • The "Why": This highlights the Phenotypic Trap. The AI correctly identified a molecule that made cells in a dish look healthy. However, a cell culture lacks the 3D vascular architecture, blood flow dynamics, and decades of accumulated damage present in a human brain. The "visual" cure in the lab did not translate to a "functional" cure in a living organ, proving that cellular phenotype is a limited proxy for human organ physiology.

    3. Exscientia: EXS-21546 (Solid Tumors)

    Status: Discontinued (October 2023)

    • The AI "Win": Exscientia designed an A2A receptor antagonist intended for cancer immunotherapy. The AI optimized the molecule to be highly potent and selective, solving the "chemistry" of binding to the receptor perfectly.
    • The Biological Failure: The drug was meant to prevent adenosine from suppressing the immune system within tumors.
      • What went wrong: The program was halted because it was "challenging to reach a suitable therapeutic index."
      • The "Why": This is a Therapeutic Window Failure. The AI designed a perfect binder, but biology dictated that the dose required to be effective was too close to the dose that would cause toxicity. The complex "Tumor Microenvironment" (TME) in humans is far more hostile and variable than modeled; simply hitting the receptor wasn't enough to break the tumor's defense without potentially harming the patient.

    In all three cases, AI successfully engineered a high-quality molecule (the "key"), but the biological lock (the "disease") was more complex than the training data anticipated.

     
