Real-world data and real-world evidence are playing an increasingly important role in drug development. The phrase “real-world” refers to data collected in the course of routine patient care (e.g., data entered into the electronic health record (EHR) every time a patient visits his/her physician), as opposed to data collected during a tightly monitored clinical trial. While these data must be carefully curated to be meaningful for research purposes, real-world data can potentially represent a much larger cohort of patients; fewer than 5% of all cancer patients, for example, enroll in a clinical trial.
Importantly, the FDA has become a proponent of using real-world evidence to inform drug development and to support the agency’s regulatory decisions (check out the FDA’s post on these topics here). In fact, the FDA recently accepted real-world safety evidence from post-marketing reports and electronic health records (EHRs) to expand the indication for palbociclib (IBRANCE), in combination with endocrine therapy, to male patients with hormone receptor (HR)-positive, HER2-negative advanced or metastatic breast cancer. As real-world datasets grow, there are likely to be many opportunities for such data to support regulatory and coverage decisions and to inform evidence-based clinical treatment guidelines.
Historically, one of the challenges in generating real-world cancer data has been that data are often siloed by source. For example, EHR data (residing in provider software systems) have typically lived separately from tumor sequencing data (residing in diagnostic testing laboratory databases). In oncology care, connecting these data is becoming more and more important, as biomarkers in a patient’s tumor are increasingly critical in determining the appropriate treatment. Clinical trials for many kinds of therapies are being conducted in specific biomarker-defined populations, where the rate of response is expected to be highest, but designing such trials requires data. And because many patients’ tumors are sequenced in the course of routine care, we are collecting vast amounts of genomic data about each patient; however, it is not always clear how to interpret some of these findings without being able to see outcomes for other patients with similar genomic alterations.
For all these reasons, there is a need for real-world oncology datasets that integrate both clinical and genomic data. This is why we partnered with Foundation Medicine, a leading molecular information company, to develop the Clinico-Genomic Database (CGDB). The CGDB is a linked, de-identified, longitudinal database that connects comprehensive genomic profiling data from Foundation Medicine to clinical data curated from Flatiron Health’s EHR database. We started this effort about four years ago, and we are delighted to report that we had the opportunity to describe the CGDB (and the process by which it was created!) in a joint publication in the Journal of the American Medical Association (JAMA). At the time of this publication, the CGDB contained linked clinical and genomic data from nearly 29,000 patients treated at 275 community oncology practices across the country. Within it, we specifically examined associations between tumor molecular profiles and treatment outcomes in a sub-cohort of 4,064 patients with non-small cell lung cancer (NSCLC).
We are especially proud of the methodologies that the Flatiron Health and Foundation Medicine teams developed to create this database. These include methods to protect patient privacy at every step, methods to link genomic and clinical data for the same patient reliably and accurately, and methods to scrub and standardize real-world data from the EHR. Our manuscript includes an early description of “real-world endpoints”: clinical outcomes such as overall survival, disease progression, and tumor response, all gleaned from real-world sources (e.g., clinic visit notes and radiology reports). How do you develop such endpoints outside the context of a clinical trial? Hint: you need lots of rules for retrospective data review. Stay tuned for future blog posts describing these efforts further.
So novel methods are what enabled us to bring these datasets together and create a powerful CGDB. But in fact, most of the results described in the paper are not particularly surprising or new. That was precisely the point: we have now demonstrated that a large, real-world dataset, created entirely from data collected during routine patient care, can recapitulate well-known findings in lung cancer epidemiology. This validation was important to conduct before using the database to make novel discoveries, since it shows that the underlying curated data are reliable enough for research (what we call “research-grade”). The publication demonstrates how we pressure-tested the data in numerous ways and were repeatedly able to reproduce what we already know to be true from traditional research methods. We hope this demonstration increases the research community’s confidence in the CGDB and in other real-world oncology datasets.
The CGDB is a “living” database: it will be updated with new patient data on an ongoing basis (roughly quarterly) and expanded to include deep characterization of patients with many kinds of cancer. At the time of this blog post, the CGDB already includes nearly 50,000 patients! Our publication in JAMA offers everyone just a first glimpse of all the future research that will be possible.
The full publication can be found here.
An accompanying editorial on real-world data also published in JAMA can be found here.