Why RWD-Informed Protocols Still Miss Enrollment Targets

Sponsors are using more RWD than ever, yet nearly 80% of trials still miss their enrollment timelines. Here’s why.

The paradox sponsors keep running into

Real-world data (RWD) and clinical trial enrollment are increasingly discussed in the same sentence. The use of RWD in clinical development is no longer experimental. A 2025 Tufts Center for the Study of Drug Development (CSDD) study of 25 drug development professionals across 17 pharmaceutical companies found RWD now routinely informing trial design, feasibility, site selection, patient recruitment, and post-approval evidence generation. Sponsors and CROs interviewed described RWD as having moved from supplemental input to core infrastructure.

Enrollment performance has not improved in step. According to industry analysis aggregating recent feasibility benchmarks, nearly 80% of clinical trials fail to meet enrollment timelines and one in ten sites enrolls zero patients. Each day of delay costs sponsors between $600,000 and $8 million. The investment is up. The outcome curve is flat.

This is the question worth answering: if sponsors are using more RWD, and the data is being used earlier and across more decisions, why is enrollment still the single most reliable failure mode in clinical research?

The answer is that not all RWD is fit to do the work it is being asked to do. Three structural problems show up again and again in post-mortems of trials that used RWD upstream and still missed enrollment downstream.

Failure 1: Aggregated data is being asked to answer patient-level questions

Most RWD products are sold and consumed at the population level: total patients in a region, prevalence by indication, network coverage by therapeutic area. These are the right units for epidemiology. They are the wrong units for protocol design.

A feasibility assessment for a second-line oncology trial does not actually need to know how many patients in a geography have the indication. It needs to know how many patients progressed on first-line therapy within the protocol’s eligibility window, had the required biomarker test, did not have the exclusionary comorbidities, and remain actively in care at a site that can enroll them. That question can only be answered by data resolved at the patient level, followed longitudinally across encounters and providers.
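The gap between the two questions can be made concrete with a minimal sketch. Everything here is hypothetical and for illustration only: the PatientRecord fields, the 180-day "active care" cutoff, and the sample patients are assumptions, not a description of any specific dataset or vendor schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientRecord:
    # Hypothetical patient-level, longitudinal record (illustrative fields only).
    patient_id: str
    first_line_progression: Optional[date]  # None if progression is not documented
    biomarker_tested: bool
    has_exclusionary_comorbidity: bool
    last_encounter: date
    enrolling_site_id: Optional[str]  # None if not in care at a site that can enroll


def is_enrollable(p, window_start, window_end, as_of, max_gap_days=180):
    """Apply every condition from the patient-level question, not just the indication."""
    if p.first_line_progression is None:
        return False
    progressed_in_window = window_start <= p.first_line_progression <= window_end
    # Assumed definition of "actively in care": seen within max_gap_days of as_of.
    in_active_care = (as_of - p.last_encounter).days <= max_gap_days
    return (
        progressed_in_window
        and p.biomarker_tested
        and not p.has_exclusionary_comorbidity
        and in_active_care
        and p.enrolling_site_id is not None
    )


patients = [
    PatientRecord("p1", date(2025, 3, 1), True, False, date(2025, 6, 1), "site-A"),
    PatientRecord("p2", date(2025, 3, 15), False, False, date(2025, 6, 10), "site-A"),  # no biomarker test
    PatientRecord("p3", None, True, False, date(2025, 5, 20), "site-B"),  # progression not documented
    PatientRecord("p4", date(2024, 11, 1), True, False, date(2024, 12, 1), "site-B"),  # lost to follow-up
    PatientRecord("p5", date(2025, 4, 1), True, False, date(2025, 6, 15), None),  # no enrolling site
]

population_count = len(patients)  # the population-level answer
enrollable_ids = [
    p.patient_id
    for p in patients
    if is_enrollable(p, date(2024, 10, 1), date(2025, 6, 30), as_of=date(2025, 7, 1))
]  # the patient-level answer

print(population_count, enrollable_ids)
```

In this toy cohort, all five patients have the indication, but only one survives the full set of conditions. An aggregated count cannot be filtered this way because the per-patient fields are not there to filter on.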

When sponsors describe RWD-informed protocols missing enrollment, the post-mortem usually traces back here. The dataset answered a population-level question. The protocol required a patient-level one. The miss was baked in before the first site was activated.

Failure 2: The variables that drive eligibility live in unstructured data

This is the most consequential and the most under-discussed of the three. Across healthcare, more than 80% of clinical information is unstructured, locked in free-text physician notes, pathology reports, radiology reports, and imaging files. In oncology and rare disease trials, between 45% and 70% of trial-relevant variables exist only in unstructured formats.

This is not a minor implementation detail. It is the difference between RWD that can support a protocol and RWD that cannot.

Performance status (typically recorded as an ECOG score), biomarker results from outside labs, prior lines of therapy, reasons for discontinuation, disease progression dates: these are precisely the variables that drive eligibility decisions in modern protocols, and they are precisely the variables that ICD and CPT codes do not capture. A dataset built primarily on claims, or on the structured layer of EHRs, will systematically under-represent the patients sponsors are trying to find. A protocol that requires ECOG 0-1 cannot be feasibility-modeled against a dataset where performance status is absent in two-thirds of records. An external control arm cannot be defended to the FDA when the comparator cohort’s progression dates were inferred from claim sequencing rather than read from the note.
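One way to check this before feasibility modeling starts is to measure how often an eligibility-driving variable is actually populated. The sketch below is illustrative only: the record layout, the "ecog" field name, and the 0.8 threshold are assumptions chosen for the example, not an industry standard.

```python
def completeness(records, field):
    """Fraction of records in which `field` is populated (not None)."""
    if not records:
        return 0.0
    # Compare against None explicitly: an ECOG score of 0 is a valid, present value.
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


# Hypothetical structured-only extract: performance status missing in 4 of 6 records.
records = [
    {"patient_id": "p1", "ecog": 0},
    {"patient_id": "p2", "ecog": None},
    {"patient_id": "p3", "ecog": None},
    {"patient_id": "p4", "ecog": 1},
    {"patient_id": "p5", "ecog": None},
    {"patient_id": "p6", "ecog": None},
]

MIN_COMPLETENESS = 0.8  # assumed threshold for illustration

ecog_completeness = completeness(records, "ecog")
can_model_ecog_criterion = ecog_completeness >= MIN_COMPLETENESS

print(f"ECOG populated in {ecog_completeness:.0%} of records; usable: {can_model_ecog_criterion}")
```

Running the same check per variable, before and after chart abstraction of the unstructured record, gives a simple view of which eligibility criteria a dataset can and cannot support.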

Closing this gap requires AI-driven chart abstraction operating against the full record, structured and unstructured, at sufficient scale and clinical fidelity to be defensible. It is technically demanding. It is also the difference between RWD that supports decisions and RWD that just informs presentations.

Failure 3: Data is decoupled from the sites that generated it

The third failure mode is the one the industry has architected into its own buying behavior. RWD is procured from one set of vendors. Site networks are contracted with a different set of vendors. The two procurements rarely intersect, which means the dataset’s modeled patients and the network’s actual enrollable patients are often not the same people.

This shows up in two predictable ways. First, feasibility models project enrollment based on patient counts that no specific site can actually deliver against; the patients exist in the dataset but are scattered across providers outside the trial network. Second, site selection happens with little visibility into which sites actually have the relevant patients in active care, so sponsors over-rely on historical enrollment performance, equipment lists, and PI reputation. The result is the one-in-ten-sites-enrolls-zero-patients outcome that has been remarkably stable across decades of industry effort.

RWD that functions for enrollment, not just for feasibility, has to be anchored to identifiable, contactable sites at the patient level. The sponsor running a model should see not only that 1,200 patients meet eligibility nationally, but that they are concentrated across a specific set of sites with the staff, equipment, and prior trial experience to enroll them. The protocol can then be refined against that reality before it is finalized, rather than after.
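As a rough sketch of what site anchoring adds, the example below splits an eligible cohort into patients a contracted site could enroll and patients no site in the network can reach. The patient and site identifiers, and the idea of a simple set of in-network sites, are hypothetical simplifications.

```python
from collections import Counter


def patients_per_site(eligible, network_sites):
    """Count eligible patients at each in-network site, plus those outside the network.

    `eligible` is a list of (patient_id, site_id) pairs; site_id is None when the
    patient cannot be linked to any site.
    """
    reachable = Counter(site for _, site in eligible if site in network_sites)
    unreachable = sum(1 for _, site in eligible if site not in network_sites)
    return reachable, unreachable


# Hypothetical eligible cohort from a feasibility model.
eligible = [
    ("p1", "site-A"),
    ("p2", "site-A"),
    ("p3", "site-B"),
    ("p4", None),      # not linked to any site
    ("p5", "site-C"),  # in care at a site outside the trial network
    ("p6", None),
]
network_sites = {"site-A", "site-B"}

reachable, unreachable = patients_per_site(eligible, network_sites)

print(dict(reachable), unreachable)
```

Here the model reports six eligible patients, but only three sit at sites that can enroll them. Without the patient-to-site link, the feasibility projection would be built on all six.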

What sponsors should ask their RWD partners before the next protocol

The Tufts CSDD findings make clear that the question is no longer whether to use RWD. It is which RWD, applied where, and with what assurance that it can carry the decision being asked of it. Five questions, asked early, separate RWD that improves clinical trial enrollment from the rest:

  1. Is the data resolved at the patient level, longitudinally, or is it aggregated? Aggregated data answers different questions and cannot be retrofitted to patient-level use cases.
  2. What percentage of clinically meaningful variables (performance status, biomarker results, lines of therapy, progression dates) come from unstructured sources, and how is that abstraction performed and validated? If the answer is “we use the structured fields,” the dataset is missing the parts that drive eligibility.
  3. Can specific patients be linked to specific sites that could enroll them? If not, the data cannot inform site selection, only protocol scoping.
  4. How current is the data, and how is currency maintained? RWD that is six to twelve months stale cannot support real-time feasibility or eligibility screening.
  5. Was the data assembled for research, or assembled for billing and re-purposed for research? Provenance shapes what the data can defensibly support, particularly under FDA’s fit-for-purpose framework.

Sponsors who ask these questions early will find that the RWD market is smaller than the marketing suggests, but that what remains is the data that actually carries weight.

What better RWD for clinical trial enrollment actually looks like

RWD and clinical trial enrollment outcomes have drifted apart precisely because the category has grown faster than the data fitness within it. Real-world data has become a category. Enrollment performance has not changed. Both things can be true because the category, as it is currently sold, optimizes for the wrong unit of analysis. Patients are individuals followed through time, treated at specific places, with the most consequential information about them written rather than coded. RWD that mirrors that reality (patient-resolved, structured and unstructured, site-anchored) can change enrollment outcomes. RWD that does not, won’t, no matter how much of it sponsors buy.

The sponsors and CROs who will get the most out of RWD in 2026 and beyond are the ones who stop treating data and site access as separate procurements and start evaluating them as one integrated capability. That is the configuration in which RWD becomes infrastructure for enrollment rather than overhead on top of it.

Sources cited in this post:

  1. Tufts CSDD via Applied Clinical Trials: The Use of Real-World Data and Evidence in Clinical Trials (2025 study, published 2026).
  2. Applied Clinical Trials: Unlocking Unstructured Health Data: Scaling eSource-Enabled Clinical Trials (April 2026).
  3. Industry feasibility analysis: Bridging the Feasibility Gap.
