Read excerpts from a symposium presentation, Data Quality 2.0: The Future of Real-World Evidence, by Flatiron Health’s Chief Medical Officer, Javier Jimenez, MD, MPH, and Senior Medical Director, Emily Castellanos, MD, MPH, at the ISPOR 2023 Conference.
During the session, Dr. Jimenez and Dr. Castellanos discussed the need for a data-quality framework as real-world data (RWD) becomes increasingly complex and diverse. They delved into the role of data quality in the use of real-world evidence (RWE) in scientific and clinical contexts, and they explored how data quality can enhance the impact and applicability of RWE, such as in value assessments, and enable researchers to better assess the fitness of RWD for any given use case.
Javier Jimenez, MD, MPH, Chief Medical Officer:
Why does real-world evidence matter? Because outcomes matter.
The primary objective of medical interventions (including pharmaceuticals) is to improve patient outcomes. Outcomes of an intervention occur at the intersection of patient biology, healthcare practice, and patient behavior. Randomized controlled trials (RCTs) evaluate new interventions in an experimental environment that homogenizes patient biology and controls healthcare practice and patient behavior.
But there is a gap between the evidence used by regulatory authorities to make decisions and the evidence required by healthcare decision makers in the day-to-day management of patients. Real-world evidence can help close that gap and provide the evidence required to support decision-making at the patient level.
RWE is essential across the product lifecycle in oncology.
Today, we see RWE leveraged across the product lifecycle in oncology. RWE is being used to efficiently answer research questions throughout research and development and after drugs have entered the market. This includes the use of RWE to prioritize investments, design and augment clinical trials, extend labels, fulfill post-marketing commitments, and support HTA submissions. This increased reliance on RWE, and the diversity of its applications, was made possible by significant innovations in electronic health record (EHR)-derived oncology RWE.
Understanding the data journey is critical in defining the fit-for-purpose data requirements to generate RWE.
The process of generating real-world evidence has had to evolve as well – from how we design real-world evidence studies, through data generation, analysis, and interpretation, to the application of that evidence.
In part, this is because RWE is being applied to more challenging use cases. But it is also because of how much more we understand about real-world data from the EHR. Our understanding of aspects of RWE such as the nature of real-world endpoints, the sensitivity and specificity of composite variables, and the use of machine learning and natural language processing to extract hidden information has evolved.
These advancements in oncology RWD are built on a deep understanding of the underlying data – something often referred to as “data empathy”. Data empathy is about understanding the journey of a dataset: What was the context in which the data was collected? How was it processed? How is it being used? Data empathy enhances researchers’ understanding of the fitness and limitations of RWE to answer specific scientific questions.
Flatiron’s team of research oncologists and methodology experts equips us with that context. They apply their day-to-day experience in clinical patient care and health data collection to provide a clinical interpretation of the data, and they bring the methods needed to transform raw data into research-grade data.
Emily Castellanos, MD, MPH, Senior Medical Director:
I'm excited to talk to you all today about the dimensions of quality in EHR data.
How do you know if RWD quality is fit-for-purpose?
As we’ve seen the use of RWD expand, we’ve also seen the growth of regulatory and policy guidance around this use. In the past several years, quality dimensions have been discussed in frameworks and guidances by the FDA, EMA, NICE, Duke-Margolis Health Policy Center, and the Patient-Centered Outcomes Research Institute. So, it’s clear that quality matters.
It's also clear that quality is not just a single concept. It has multiple dimensions that should be assessed, and these dimensions generally fall into the categories of relevance and reliability. So, let's take a deep dive today into these dimensions for EHR-derived data.
RWD that captures patients’ experience with cancer.
If we're talking about EHR data quality though, let’s take a moment to dive a bit deeper into RWD from the EHR. Flatiron Health RWD is curated from the EHRs of a nationwide network of academic and community cancer clinics. Data lives in the EHR in structured and unstructured formats.
The richest clinical data, like stage at diagnosis and clinical endpoints, exists in unstructured documents. Pulling that information out of unstructured documents is challenging and complicated, and traditionally it has required human interaction. At Flatiron, we have over 2,000 human abstractors pulling that data, and we use machine learning and natural language processing to help curate it as well. In addition, Flatiron has combined EHR data with other sources (e.g., genomics, claims) to more fully capture patients’ experience with cancer.
Assessing RWD: relevance.
So now that we have a general idea of what EHR data looks like, let's turn our attention to the dimensions of quality. Quality, in terms of fitness for use, starts with relevance of the source data. Relevance can be assessed using several subdimensions:
Availability: Are critical data fields representing exposures, covariates, and outcomes available?
Representativeness: Do the patients in the dataset represent the population you want to study? For cancer research, that usually includes patients on a particular therapy.
Sufficiency: Is the size of the population enough? Is there enough follow-up time in the data source to demonstrate the expected outcomes (e.g., survival, adverse events)?
These are traditionally assessed in support of a specific research question or use case. The follow-up time and population needed for a study of overall survival in early breast cancer are quite different from those needed in metastatic pancreatic cancer. For us as a data vendor, however, since we're developing multi-purpose datasets, we must think of data relevance very broadly. This entails considering sets of common and important use cases – including natural history, treatment patterns, safety, and efficacy – and ensuring we include the variables needed for these studies in our core datasets.
We also consider relevance as we expand our network – relying not only on community clinics that use our EHR software, OncoEMR®, but also intentionally partnering with academic centers that use other software. This enables us both to increase the number of patients represented and to ensure alignment with where cancer patients across the US actually receive care.
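The relevance subdimensions above lend themselves to simple, automatable feasibility checks against a candidate dataset. The sketch below is a hypothetical illustration only – the field names, therapy filter, and thresholds are assumptions for the example, not Flatiron's actual implementation:

```python
# Hypothetical sketch of relevance checks (availability, representativeness,
# sufficiency) against a candidate real-world dataset. Field names and
# thresholds are illustrative, not an actual vendor implementation.

REQUIRED_FIELDS = {"therapy", "stage", "overall_survival_months"}

def assess_relevance(cohort, therapy, min_n=100, min_followup_months=12.0):
    """Return pass/fail results for the three relevance subdimensions."""
    # Availability: are critical exposure/covariate/outcome fields present?
    availability = all(REQUIRED_FIELDS <= set(patient) for patient in cohort)

    # Representativeness: does the cohort include patients on the therapy
    # of interest?
    on_therapy = [p for p in cohort if p.get("therapy") == therapy]
    representativeness = len(on_therapy) > 0

    # Sufficiency: enough patients, and enough follow-up to observe the
    # expected outcomes (crude median used for the sketch).
    followups = sorted(p["overall_survival_months"] for p in on_therapy) or [0.0]
    median_fu = followups[len(followups) // 2]
    sufficiency = len(on_therapy) >= min_n and median_fu >= min_followup_months

    return {"availability": availability,
            "representativeness": representativeness,
            "sufficiency": sufficiency}
```

In practice each check would be far richer (covariate-level availability, demographic comparisons for representativeness, power calculations for sufficiency), but the structure – a named verdict per subdimension – is the point of the sketch.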
Assessing Flatiron RWD: reliability.
Once a dataset has been determined to be relevant, it should be assessed for its reliability. As with relevance, reliability can be assessed along several critical sub-dimensions.
Sub-dimensions for reliability:
Accuracy: How well does the data measure what it’s actually supposed to measure?
Completeness: How much of the data is present or absent for the cohort studied? This is agnostic to what the data fields actually are.
Provenance: What is the origin of a piece of data (in a database, document, or repository)? This is assessed together with an explanation of how and why it got to its present place – for example, a record of transformations from the point of collection to the final database.
Timeliness: Does the collected and curated data have acceptable recency, such that the dataset’s period of coverage represents reality? For example, are documents refreshed recently enough?
At Flatiron, we have developed important processes to ensure the reliability of our data. We have clear operational definitions for each of the sub-dimensions in place – whether using an ML algorithm or guidance for human abstraction. We ensure the infrastructure is in place to implement processes addressing each of these dimensions, and our clinical and scientific experts help establish those processes.
How does Flatiron perform validation to ensure accuracy?
Let’s dive a little deeper into the sub-dimension of accuracy. To select a validation approach, we consider both its robustness and feasibility. We also consider the complexity of the clinical concept we are capturing and the complexity of how that concept is documented in the EHR. For instance, our validation approach will differ for birth year, which is documented clearly in a structured field that is verified through the routine course of care. Contrast this with real-world progression, which is found in clinical notes and can be documented using a wide array of medical language.
We perform validation at multiple levels throughout the data lifecycle, e.g., at the field level at the time of data entry and at the cohort level. In addition, we use different quantitative and statistical approaches to validate the data at different levels, using a range of metrics depending on the approach.
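As a hypothetical illustration of the kind of quantitative metric such field-level validation can use, the sketch below computes sensitivity and positive predictive value for a curated binary variable against a reference standard. The variable and labels are invented for the example; the appropriate metrics depend on the variable and the reference available:

```python
# Hypothetical sketch: comparing a curated binary variable (e.g., "progression
# documented: yes/no") against a reference standard, using sensitivity and
# positive predictive value (PPV) as accuracy metrics.

def sensitivity_ppv(curated, reference):
    """Compute (sensitivity, PPV) for paired binary labels."""
    tp = sum(1 for c, r in zip(curated, reference) if c and r)        # true positives
    fn = sum(1 for c, r in zip(curated, reference) if not c and r)    # false negatives
    fp = sum(1 for c, r in zip(curated, reference) if c and not r)    # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

For continuous or time-to-event variables, the analogues would be agreement statistics or comparisons of survival curves, as in the examples that follow.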
Here are several examples of validation approaches we use at Flatiron Health to ensure accuracy:
External reference standard: Our first example is one in which we used an external reference standard, the National Death Index (NDI), to validate a composite mortality variable (date of death, which is algorithmically derived using both EHR and non-EHR sources such as the SSDI and obituary data). We examined survival curves using our mortality variable to define the time-to-event outcome, and found that we got essentially the same curve using death dates from the NDI as from our mortality variable.
Indirect benchmark: External reference standards, while useful, are rare for RWD variables. More frequently, we have information from the literature or clinical practice that can be used as an “indirect” benchmark. For example, here we validated a novel real-world progression variable by correlating it with the literature and related endpoints. Each curve represents a different time-to-event (TTE) analysis, and you see the expected correlations between progression-free survival, overall survival, time to next treatment (TTNT), and time to progression (TTP).
Internal reference standard: An internal reference standard is an approach we typically use when evaluating a novel curation process, like machine learning. In this case, we take the approach of using human-abstracted data as our “reference standard” – something many in this audience may be familiar with from working with claims. In this figure you see two survival curves for patients with ROS1-positive NSCLC. The curves closely overlap, demonstrating very similar results with each curation process.
Accuracy: verification checks.
The validation analyses we reviewed are designed to build confidence in variable accuracy, and we consider them a highly robust approach. But we know that there are practical limitations to validation – a lack of benchmarks or references, and scalability – so it’s not always practical. And although it is necessary to validate the operational approach to curation, we also need to monitor the data and address discrepancies and outliers over time.
Verification checks are proxies for accuracy that employ the data itself as a check, and can be sub-categorized as conformance, plausibility, and consistency.
Conformance is the compliance of data values with internal relational, formatting, or computational definitions or standards.
Plausibility is the believability or truthfulness of data values.
Consistency is the stability of a data value within a dataset, across linked datasets, or over time.
At Flatiron, we use clinical knowledge to develop robust patient-level verification checks. For example, since we expect treatments to follow medical guidelines, we use clinical expertise to evaluate the temporal plausibility of a patient's timeline of diagnosis, treatment sequences, and follow-up, to assess whether the data are logically believable. If they are not, that may be a prompt to perform additional verification checks of the patient’s journey through cancer care.
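A minimal sketch of how the three categories of verification check might look in code, assuming hypothetical field names and rules: a birth-year format check for conformance, a diagnosis-before-treatment rule for plausibility, and a cross-dataset birth-year comparison for consistency. These are illustrative stand-ins, not actual production checks:

```python
from datetime import date

# Hypothetical verification checks that use the data itself as a proxy for
# accuracy. Field names, formats, and rules are illustrative only.

def check_conformance(record):
    """Conformance: values comply with formatting/computational definitions."""
    # e.g., birth year must be a 4-digit integer in a credible range
    birth_year = record.get("birth_year")
    return isinstance(birth_year, int) and 1900 <= birth_year <= date.today().year

def check_plausibility(record):
    """Plausibility: values are clinically believable."""
    # e.g., treatment should not start before the diagnosis date
    return record["diagnosis_date"] <= record["treatment_start"]

def check_consistency(record, linked_record):
    """Consistency: the same value is stable across linked datasets."""
    return record["birth_year"] == linked_record["birth_year"]
```

A record that fails one of these checks is not necessarily wrong – a treatment date preceding diagnosis may reflect a transfer of care – but the failure flags it for the kind of additional review described above.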
Completeness is a critical complement to accuracy in assessing reliability. It’s not enough for data to be accurate – it must be present first! We realize, though, that completeness in EHR-based data is unlikely to be 100%. If it looks too good to be true, it probably is!
That’s why, to understand whether the completeness of a given variable meets an acceptable level of quality, we have to exercise data empathy. Data flows through many channels between the exam room and the final dataset – each step along the way is a point at which some elements may be lost, mislabeled, or inappropriately transformed. We know that certain variables – for instance, those important for oncology care, treatment decisions, communication, and billing – will be more complete than others.
We put controls and processes in place across multiple levels to monitor completeness, with thresholds based on clinical expectations. In addition, integrating sources within or beyond the EHR can improve completeness.
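As a hypothetical sketch of what such threshold-based completeness monitoring could look like, the example below flags variables whose observed completeness falls below a clinically informed expectation. The variables and thresholds are invented for the illustration, chosen only to reflect that fields tied to care and billing tend to be more complete:

```python
# Hypothetical sketch of variable-level completeness monitoring against
# clinically informed thresholds. Variables and thresholds are illustrative.

EXPECTED_COMPLETENESS = {
    "birth_year": 0.99,      # structured, verified in routine care
    "stage": 0.90,           # abstracted from unstructured documents
    "smoking_status": 0.70,  # less consistently documented
}

def completeness_report(cohort):
    """Flag variables whose observed completeness falls below threshold."""
    n = len(cohort)
    report = {}
    for field, threshold in EXPECTED_COMPLETENESS.items():
        observed = sum(1 for p in cohort if p.get(field) is not None) / n
        report[field] = {"observed": observed,
                        "threshold": threshold,
                        "flag": observed < threshold}
    return report
```

The key design point is that the thresholds differ by variable: a flag means "below clinical expectation for this field", not "below some global cutoff".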
In summary, quality is not measured by a single number. Multiple dimensions are needed to determine fitness-for-purpose.
Addressing quality in EHR-based RWD requires cross-disciplinary expertise – clinical medicine, medical informatics, engineering, data management operations, and quantitative science – implemented across the data lifecycle. Understanding the quality of real-world data is critical to developing the right analytic approach. At Flatiron, we are committed to being transparent about our data quality to ensure our data users are equipped with the knowledge they need to generate RWE.
Learn more about how Flatiron's experts can help deliver oncology real-world evidence.