Real-world data (RWD) are data relating to patients’ health or health care that are routinely collected from several existing sources.1 This article explores what those sources are, along with an assessment of each. We also explore why linked healthcare data are the new research standard, harnessing the strengths of RWD while helping to compensate for some of their limitations.
Traditionally, researchers have obtained RWD primarily by mining sources such as:1-4
Insurance claims data collect information from millions of doctors’ appointments, bills, and other patient-provider communications.3
Claims data follow a standard format and are relatively complete, which makes them widely used among researchers. Insurance claims include information about every service from every medical provider covered by the payer.5 This information comes directly from the billing information that providers’ practices submit for the services they provide.3
Claims data generally contain a broader set of information than EHR data, because an EHR may not be connected to every healthcare facility the patient visits. This means claims data may do a better job of capturing records of the tests, procedures, and services the patient received. It also means claims data may include important information about the patient’s medication, such as every filled prescription and the amount dispensed, which can be used to assess whether the patient is taking their medication as directed.6
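To illustrate how filled-prescription records might be turned into an adherence estimate, here is a minimal Python sketch of one widely used measure, the proportion of days covered (PDC). The function name, the (fill date, days supplied) layout, and the sample fills are assumptions made for illustration; they are not drawn from any specific claims format or from the sources cited here.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Estimate adherence from pharmacy fills.

    fills: list of (fill_date, days_supply) tuples (hypothetical layout).
    Returns the share of days in the period on which the patient had
    medication on hand. Overlapping fills are counted only once.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            # Only count days that fall inside the observation window.
            if period_start <= day <= period_end:
                covered.add(day)
    total_days = (period_end - period_start).days + 1
    return len(covered) / total_days


# Made-up example: three 30-day fills in the first quarter of 2019.
fills = [
    (date(2019, 1, 1), 30),
    (date(2019, 2, 5), 30),
    (date(2019, 3, 20), 30),
]
pdc = proportion_of_days_covered(fills, date(2019, 1, 1), date(2019, 3, 31))
print(f"PDC: {pdc:.2f}")
```

For these made-up fills, 72 of the 90 days in the window are covered, giving a PDC of 0.80. Keep in mind that a fill shows the medication was dispensed, not that it was taken.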
EHR data are desirable for research because physicians collect them at the point of care as part of routine practice. Providers usually record data in the EHR during or soon after the patient encounter, making EHR data fairly reliable.3,7 Most importantly, EHRs contain a wealth of information not available elsewhere because they are created directly by providers:3,6,7
Data scientists have even begun extracting data from unstructured or semi-structured EHR fields through the use of natural language processing (NLP) and machine learning.7,8
Patient and disease registries are types of public health surveillance that record health and demographic information about patients who are affected by specific diseases. The Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and other medical institutions provide databases that track information about various disease outbreaks.3,4
Similarly, care improvement registries are used to provide a longitudinal view of patients with a specific disease or condition. They are often collaborative, with multiple physicians or healthcare facilities collecting EHR data toward a common purpose. For example, Veradigm® provides two clinical data registries in association with the American College of Cardiology (ACC). The PINNACLE Registry® captures data on coronary artery disease, hypertension, atrial fibrillation, and heart failure to create cardiology’s largest outpatient quality improvement registry. The Diabetes Collaborative Registry® is the first clinical ambulatory registry designed to track and improve the quality of diabetes and metabolic care in both primary and specialty care. Both registries draw data from multiple specialties, including primary care, family care, internal medicine, endocrinology, and cardiology.9
These types of data are valuable because they are usually provided in partnership with a broad spectrum of healthcare providers, such as labs, hospitals, and private physicians. Since the data are provided directly from patient records, they tend to be more reliable than, for example, survey data. In addition, these data are stored in registries that make the data easier to access and analyze than many other data types.
CMS administrative data are claims data derived from Medicaid and Medicare reimbursement information, bill payment, or enrollment/disenrollment information.10 CMS data files cover a broad population segment: Over 45 million beneficiaries are enrolled in Medicare today, or 98% of U.S. adults ages 65 and over. These data also include demographic information, such as date of birth, race, place of residence, and date of death.10
Administrative claims records are a powerful source of data, but on their own they can handicap researchers with gaps in the information they provide. New linkages between claims data and clinical data give traditional RWD a priceless upgrade.6,11
Linked data can provide researchers with more complete information. Data linkages can be used to:1
Seqirus™ and Veradigm conducted a non-interventional, retrospective cohort study on vaccine effectiveness during the 2018-2019 influenza season using RWD from a large dataset that linked ambulatory patient EHRs (Allscripts Touchworks® EHR, Veradigm EHR™, and Veradigm’s Practice Fusion) with medical and pharmacy claims. This dataset enabled them to assess relative vaccine effectiveness in over ten million individuals who had a record of receiving either the cell culture-derived inactivated quadrivalent influenza vaccine or the egg-derived inactivated quadrivalent influenza vaccine.12,13
Linking claims and EHR data provides the best of both worlds: detailed accounts of all costs and services covered in the claims data linked to deep, rich clinical information from individual patient records.
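As a rough illustration of what linking the two sources can look like in practice, the Python sketch below joins EHR and claims records on a shared deidentified patient token. It assumes both datasets already carry the same token for the same person; the field names, the token values, and the link_records function are hypothetical and do not describe Veradigm’s actual linking technology.

```python
from collections import defaultdict

# Hypothetical deidentified records; "token" stands in for a shared patient key.
ehr_records = [
    {"token": "a1", "systolic_bp": 142, "smoker": True},
    {"token": "b2", "systolic_bp": 118, "smoker": False},
]
claims_records = [
    {"token": "a1", "service": "office visit", "paid": 120.0},
    {"token": "a1", "service": "pharmacy fill", "paid": 35.5},
    {"token": "c3", "service": "lab panel", "paid": 60.0},
]


def link_records(ehr, claims):
    """Attach each patient's claims to their EHR record by exact token match."""
    claims_by_token = defaultdict(list)
    for claim in claims:
        claims_by_token[claim["token"]].append(claim)

    linked = []
    for patient in ehr:
        matched = claims_by_token.get(patient["token"], [])
        if matched:  # keep only patients present in both sources
            linked.append({
                **patient,
                "claims": matched,
                "total_paid": sum(c["paid"] for c in matched),
            })
    return linked


linked = link_records(ehr_records, claims_records)
for row in linked:
    print(row["token"], row["systolic_bp"], len(row["claims"]), row["total_paid"])
```

In this toy example only patient a1 appears in both sources, so the linked view pairs that patient’s clinical details with two claims and their combined cost. Patients found in only one source drop out, which is one reason a linked cohort is typically smaller than either dataset on its own.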
Veradigm is one of the largest providers of deidentified ambulatory EHR data, captured directly from our point-of-care systems. These data are used to provide flexible linked data solutions that support multiple claims data providers and linking technologies.
Our high-quality linked database set includes:
Veradigm offers both off-the-shelf and custom linked data solutions to meet your research needs. To learn more about how Veradigm’s linked data assets and point-of-care platforms can help you with your research goals, click here.
*5 Year Time Period: November 2015 – October 2019
References: