Written by: Gaurav Kaushik, SVP and General Manager of AI, Veradigm and Rachana Kolloru, Product Manager, Veradigm
Worldwide, the total amount of data generated in 2024 was predicted to reach 149 zettabytes, about 30% of which was healthcare-generated. In other words, in 2024, healthcare created about 44.7 zettabytes of data. That’s 44.7 trillion gigabytes, the equivalent of 9.5 trillion DVDs’ worth of data. That number is still growing. In 2025, healthcare data is predicted to reach a compound annual growth rate of 36%. That’s 10% faster than financial services and 11% faster than media and entertainment.
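For readers who want to check the arithmetic behind these figures, here is a minimal sketch. It assumes decimal units (1 zettabyte = 1 trillion gigabytes) and a single-layer DVD capacity of 4.7 GB; the DVD capacity is our assumption, chosen because it reproduces the 9.5 trillion figure above.

```python
# Back-of-the-envelope check of the data volume figures quoted above.
GB_PER_ZB = 10**12        # decimal units: 1 ZB = 10^21 bytes = 10^12 GB
DVD_CAPACITY_GB = 4.7     # assumption: single-layer DVD

total_zb = 149            # predicted worldwide data generated in 2024
healthcare_share = 0.30   # approximate share generated by healthcare

healthcare_zb = total_zb * healthcare_share
healthcare_gb = healthcare_zb * GB_PER_ZB
dvds = healthcare_gb / DVD_CAPACITY_GB

print(f"Healthcare data: {healthcare_zb:.1f} ZB")
print(f"In gigabytes:    {healthcare_gb / 1e12:.1f} trillion GB")
print(f"In DVDs:         {dvds / 1e12:.1f} trillion")
```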
This exponential growth of healthcare data has created significant opportunities for AI-powered solutions to advance healthcare.
Healthcare data, and specifically real-world data (RWD), comes from diverse sources. The FDA defines RWD as healthcare data collected during routine clinical practice. The diverse sources of RWD, including patient records, medical claims, administrative data, and wearable device data, among others, vary in the nature of the data they provide, their level of population coverage, and the quality of the data generated.
In this blog, we explore how using AI to analyze the vast and ever-increasing amount of RWD is advancing healthcare. We will look at how this powerful duo is influencing medical research and clinical care and outline some of the challenges. We will also delve into how you can overcome those challenges to harness AI and RWD to better meet your needs.
RWD have been used for years in medical product development to help inform study design, selection of study endpoints, and identification of potential study participants. Now, AI and RWD are being used to improve clinical trial design and execution in all healthcare areas, including monitoring the performance of tested treatments beyond the traditional clinical trial setting.
AI analysis of existing data can help researchers identify unmet medical needs, inspiring the design of more relevant clinical trials. AI-generated insights into patient populations and practice patterns can help researchers optimize trial design and streamline protocol development.
AI also streamlines the clinical trial process by helping researchers identify patients most likely to benefit from new therapies. Traditionally, identifying the best patients for a clinical trial was a difficult process, usually taking months or years—with up to 86% of trials failing to launch when researchers could not recruit sufficient numbers of eligible participants.
AI’s ability to analyze large volumes of data has enabled researchers to identify patients meeting specific trial criteria more efficiently and quickly. In drug trials, AI can help identify patients most likely to benefit from the new drug. For instance, AI was used to help identify patient participants for pediatric leukemia treatment trials. In another study, researchers utilized AI to analyze RWD to develop a highly specific cohort definition for patients with atrial fibrillation.
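To make the idea of matching patients against trial criteria concrete, here is a minimal, rule-based sketch. It is not the method used in the studies mentioned above, and it is far simpler than an AI model: the `Patient` fields, the sample records, and the eligibility criteria are all hypothetical, chosen only to illustrate how a cohort definition can be applied to patient records.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical, simplified patient record for illustration only.
    patient_id: str
    age: int
    diagnoses: set[str] = field(default_factory=set)
    on_anticoagulant: bool = False


def is_eligible(p: Patient) -> bool:
    """Hypothetical criteria: adults with atrial fibrillation
    who are not already taking an anticoagulant."""
    return (
        p.age >= 18
        and "atrial fibrillation" in p.diagnoses
        and not p.on_anticoagulant
    )


patients = [
    Patient("p001", 67, {"atrial fibrillation", "hypertension"}),
    Patient("p002", 72, {"atrial fibrillation"}, on_anticoagulant=True),
    Patient("p003", 16, {"atrial fibrillation"}),
    Patient("p004", 58, {"type 2 diabetes"}),
]

# Keep only the IDs of patients who meet every criterion.
cohort = [p.patient_id for p in patients if is_eligible(p)]
print(cohort)
```

In practice, the value of AI lies in the step this sketch skips: deriving fields such as the diagnosis list from messy, incomplete, or unstructured records before any criteria can be applied.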
Monitoring patient response to treatment and potential side effects is traditionally time-consuming and expensive. Worse, researchers have learned that results obtained in a clinical trial’s controlled setting do not always reflect a drug’s performance in the real world.
However, AI can provide insight into treatment performance outside the clinical trial setting. AI enables real-time monitoring of trial participants to detect adverse events and other issues that could affect trial outcomes. This approach improves understanding of patient behavior and other real-world issues that can influence performance by providing data outside the clinical trial setting. It allows researchers to address patient health complexities and comorbidities and their influence on patients’ real-world outcomes.
Monitoring drug safety is one of the better-known applications of RWD. AI facilitates pharmacovigilance by making it easier to collect and analyze patient data from numerous sources. Depending on the type of trial, AI can be used to analyze visual data, such as CT scans, or data from wearable devices. Natural language processing (NLP), a subdomain of AI used to understand human language, can be used to analyze self-reported notes to identify potential adverse events.
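As a rough illustration of screening self-reported notes for potential adverse events, the sketch below uses simple keyword matching as a stand-in for NLP. The term list and the sample notes are hypothetical, and a real NLP model would also need to handle negation ("no rash"), misspellings, and context, none of which this example attempts.

```python
import re

# Hypothetical list of terms to screen for; a real system would use a
# curated vocabulary and a trained language model instead of keywords.
ADVERSE_EVENT_TERMS = ["nausea", "dizziness", "rash", "headache"]

PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ADVERSE_EVENT_TERMS)) + r")\b",
    re.IGNORECASE,
)


def flag_adverse_events(note: str) -> list[str]:
    """Return the sorted, de-duplicated screening terms found in a note."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(note)})


# Hypothetical self-reported notes keyed by participant ID.
notes = {
    "p001": "Felt fine after the second dose, slept well.",
    "p002": "Some Nausea in the morning and a mild headache by evening.",
}

flags = {pid: flag_adverse_events(text) for pid, text in notes.items()}
print(flags)  # {'p001': [], 'p002': ['headache', 'nausea']}
```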
Real-world evidence (RWE), derived from RWD, can provide practitioners with insights into a treatment’s safety and efficacy in real-life environments. AI’s real-time analysis of datasets can improve patient safety and lead to significant financial savings.
Traditional randomized clinical trials (RCTs) gather data from controlled patient cohorts. This means findings may be limited by the characteristics of patients selected for the trial. In addition, these trials usually require a great deal of financial and time investment before data is generated.
However, RWD enables researchers to answer questions about a patient population without launching a full-scale RCT. RWD can be de-identified: direct identifiers, such as patient names, addresses, and birth dates, can be removed to reduce the risk of re-identifying patients from the remaining information. Harnessing de-identified RWD for research delivers numerous potential benefits.
AI language models can read through clinical notes at a scale and speed far beyond what researchers are capable of, extracting key details about patient outcomes, side effects, and adherence patterns. As a result, far more patient interactions, treatment decisions, and outcomes can be processed and analyzed to help generate more, and more meaningful, insights into care. In this way, AI can further complement clinical trials with evidence reflecting the complexity and sensitivity of real-world medical practice.
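The de-identification step described above, removing direct identifiers such as names, addresses, and birth dates, can be sketched in a few lines. The field names and the sample record here are hypothetical, and this is only an illustration of the idea: dropping three fields is not, on its own, a compliant de-identification procedure.

```python
# Hypothetical field names for the direct identifiers named in the text.
DIRECT_IDENTIFIERS = {"name", "address", "birth_date"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record without direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Hypothetical patient record for illustration only.
record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "birth_date": "1980-01-01",
    "diagnosis": "type 2 diabetes",
    "hba1c": 7.2,
}

clean = deidentify(record)
print(clean)  # {'diagnosis': 'type 2 diabetes', 'hba1c': 7.2}
```

Note that the function returns a new dictionary and leaves the original record untouched, which keeps the identified and de-identified copies clearly separate.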
The combination of AI and RWD is also used to inform care plans and improve patient outcomes. AI analysis of RWD can provide powerful clinical insights, such as a patient’s likelihood of developing specific diseases. AI can also help diagnose diseases and help clinicians optimize patient treatment plans.
Improving disease detection
AI analysis of EHR data can help clinicians identify patients at risk for various conditions before symptoms appear, enabling providers to offer early intervention and prevention strategies and helping improve patient outcomes. AI models can also aid providers in early disease detection. For instance, researchers at Babylon Health and University College London developed a machine learning-based algorithm capable of expert-level diagnostic accuracy, outperforming more than 75% of general practitioners in their study. The algorithm’s improvements were especially pronounced with rare diseases, for which diagnostic errors are more common and often more serious.
AI analysis of large real-world datasets is also used to power “precision medicine,” where AI predicts the most effective treatment protocols for specific patients based on their individual characteristics. AI can also identify trends and anomalies in RWD, enabling providers to tailor treatments to individual patients’ specific needs. AI-generated insights help providers maximize treatment effectiveness for each patient.
One prominent recent example of how AI can enhance RWD is the development of GLP-1 receptor agonists. Initially approved for diabetes management, these therapies are now widely used for weight loss and are being studied for beneficial effects in the treatment of various forms of cardiovascular disease and renal disease in patients without diabetes.
RWD has revealed crucial insights about how these drugs perform in everyday practice. Analysis of hundreds of thousands of patient records is helping strengthen understanding of which patients are most likely to maintain weight loss long-term, what supportive behaviors lead to better outcomes, and what demographic and clinical patient characteristics are likely to interact with the treatment.
As a result, the healthcare industry is learning how a new class of medications performs across diverse populations and care settings, in far more detail than a traditional clinical trial could capture.
Generating high-quality insights using AI presents numerous challenges, ranging from ensuring the privacy of patient data to preventing bias in AI algorithms. Incomplete data entries and misclassification errors are common in medical records; there may be inconsistencies in documentation practices, definitions, or coding used from practice to practice. In terms of producing trustworthy AI insights from real-world datasets, though, many of these challenges can be summed up as follows:
AI algorithms need to be trained on diverse, high-quality, real-world datasets that are free from bias—but datasets vary greatly in both quality and format. RWD are obtained from various sources, such as EHRs, claims data, and patient-generated data from wearable devices—each of which may supply data in different formats. Different sources may also use different coding systems. Even data from a single source type may vary in completeness and accuracy.
In addition, 80% of medical data is “unstructured”; that is, unlike data captured in structured numeric, categorical, coded, or other defined types of fields, the bulk of medical data is recorded as provider-generated free text. This text may include providers’ subjective, objective, assessment, and plan (SOAP) notes, hospital discharge summaries, imaging and pathology reports, and other types of unstructured data.
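To show what turning free text into structured fields can look like, here is a minimal sketch that splits a SOAP note into its four sections. It assumes each section begins on its own line with a one-letter label (`S:`, `O:`, `A:`, `P:`); that layout and the sample note are our assumptions, and real clinical notes are rarely this tidy, which is why NLP models are needed in practice.

```python
import re

# Assumed layout: each section starts a line with "S:", "O:", "A:", or "P:".
SECTION_RE = re.compile(r"^(S|O|A|P):\s*", re.MULTILINE)
LABELS = {"S": "subjective", "O": "objective", "A": "assessment", "P": "plan"}


def split_soap(note: str) -> dict[str, str]:
    """Split a labeled SOAP note into a dict of section name -> text."""
    # With one capture group, re.split returns
    # [preamble, label1, text1, label2, text2, ...].
    parts = SECTION_RE.split(note)
    return {
        LABELS[label]: text.strip()
        for label, text in zip(parts[1::2], parts[2::2])
    }


# Hypothetical note for illustration only.
note = """S: Patient reports palpitations for two days.
O: Heart rate 112, irregular rhythm on exam.
A: Suspected atrial fibrillation.
P: Order ECG; follow up in one week."""

sections = split_soap(note)
print(sections["assessment"])  # Suspected atrial fibrillation.
```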
Implementing AI and RWD to advance healthcare requires high-quality datasets free from bias. Fortunately, NLP can be used for data enrichment, a process of evaluating datasets to render them more valuable, complete, and accurate.
Veradigm offers NLP-enriched, high-quality healthcare EHR datasets. These datasets are tailored to meet the high-speed demands of today’s healthcare challenges, providing flexible, scalable, and efficient clinical data solutions that drive meaningful results.
Veradigm’s proprietary NLP models extract critical insights from unstructured data, revealing patient insights often missed by traditional methods.
Contact Veradigm to learn how we can help you generate timely insights into real-world patient cohorts to address challenging research questions, optimize clinical outcomes, and advance patient care.