The curation of real-world data (RWD) has already transformed cancer care, and machine learning (ML) has the potential to exponentially increase this impact. This ResearchX episode will highlight Flatiron's evolving perspective on machine learning, describe our models for extracting variables, and present our recently published framework for evaluating RWD produced using ML. Finally, we’ll hear from a life sciences leader about their vision for ML and the potential applications ML-extracted data can unlock for the industry.
Hi everyone, and welcome to ResearchX. We are so excited you've joined us today. We're looking forward to an engaging hour where we're going to dive into the topic of machine learning and how it serves as a foundational tool to unlock insights and transform cancer care through speed, scalability, and flexibility. My name is Erin Fidyk and I'm a Senior Clinical Director on our Real World Evidence Team at Flatiron Health, working with cross-functional colleagues on machine learning and ML extracted variables. It's great to see so many of you from across the industry in attendance today. Thanks again for joining.
I'd first like to introduce our speakers to you, who will help us bring this session to life. We will start off with Will Shapiro, our Head of Data Science, who will share the evolving perspectives of machine learning at Flatiron Health. Next, Auriane Blarre, one of our Data Insights Engineers, will do a deeper dive into our data models and discuss ML extracted variables. We'll then have Melissa Estevez, Senior Quantitative Scientist, who will follow by discussing how we assess the performance of ML extracted real world data. And lastly, I'll join Yiduo Zhang, Head of Global RWE Science on the Oncology Global Medical Affairs team at AstraZeneca, for a discussion on ML perspectives, the vision of machine learning, and the applications ML extracted data can unlock. And while we've got a really packed agenda today, we will save some time at the end for audience questions and we really look forward to those.
So before we get started, just a few quick housekeeping items. I'd like to draw your attention to the Q&A option available throughout the webinar, and you'll see this at the bottom of your screen. Please feel free to submit a question anytime or reach out to us afterwards if you'd like to discuss any of today's content in more detail. And if you have any technical questions or issues, please let us know through the Q&A tool and we'll do our best to help in real time. And finally, please excuse any interruptions from our pets or loved ones. Given this hybrid working world, many of us are working from home, as you know.
So before we get started, we'd like to learn a little bit more about you. You should see a poll pop up on your screen momentarily, and note that other attendees will not be able to see the responses that you choose. So our poll question is this, what are you most excited about when it comes to ML? Please select one of the below options: increasing sample sizes with data on more patients, extracting new variables out of unstructured documentation, dynamic and faster updates to existing variables, or all of the above. All right, a couple more seconds. And let's close the poll and share the results. All right, excellent. Thank you for providing your input. It's really insightful to get a pulse check on what you and your organizations are most excited about when it comes to ML, and I'm looking forward to digging into each of these topics with you during our session today. So with that, let's go ahead and get started. I'm going to hand it over to Will to kick things off by discussing our evolving perspectives of ML at Flatiron. Over to you, Will.
Thanks so much, Erin. Over the past several decades, our understanding of cancer and how heterogeneous it is has increased exponentially. In parallel, our understanding of patients has gone from generic to increasingly specific, incorporating a growing number of clinically relevant factors. The incredibly exciting potential of this highly specific understanding of disease and patient profiles is the development of personalized medicine: individualized treatments fine-tuned to each patient, which maximize efficacy and tolerability. In this context, Flatiron's mission to improve and extend lives by learning from the experience of every person with cancer is more important than ever. With increasingly specific combinations of patient, disease, and therapy, we need to learn from as many relevant examples as possible in order to have statistically meaningful results. Ultimately, data at this scale supports personalized medicine for everyone, not just the traditional targets of clinical trials.
In this talk, I'm going to focus on the learning piece. How do we learn from the experience of every person with cancer? Learning from the experience of every person with cancer is challenging for a number of different reasons. Every is a lot. Flatiron currently has data for over three million patients across our 280 community oncology clinics and eight academic sites. Furthermore, there is a lot to learn about each patient, from basic information like a patient's stage or diagnosis to rare adverse events. Much of this valuable information is stored in unstructured documents like doctor's notes, so extracting the relevant data is far from trivial. Finally, learnings can quickly become outdated. As new biomarkers become relevant to the standard of care or new variables like social determinants of health become integral to the analysis our partners want to perform, we need to learn new things about each patient.
Machine learning has the potential to address these challenges. ML has the ability to achieve scale, rapidly iterating through charts in parallel across our total patient population. ML also has the ability to read through hundreds of pages of documents to learn more about the experience of every cancer patient, finding a needle in a haystack, things like comorbidities or whether a person has ever smoked. Finally, ML can dynamically keep up to date with the standard of care and new relevant variables. Once a model is updated or developed, it can immediately be deployed across all of our patients. So why didn't we use ML from the start? Taking a step back to the earliest days of Flatiron, ML was always an exciting opportunity to pursue, but we've been very deliberate about what needed to be in place before we felt ready to extract RWD using machine learning. To begin with, real world evidence for oncology as a field was just beginning when Flatiron was founded. Building trust in our output was critically important to us. Oncology is an incredibly nuanced discipline, and we felt strongly that a human needed to be in the loop to ensure sufficient quality. We were also paying close attention to what competitors were doing with ML and learning about the challenges companies faced in extracting oncology data with machine learning. Finally, we knew that we needed high quality data to train our models with. We all know about the limitations of garbage in, garbage out, and we wanted to ensure that our training data was the highest quality available.
So although we didn't start off using ML to extract RWD, from the beginning, one of Flatiron's core principles has been to combine technology with expert wisdom, using both to the best of their ability. This is fundamental to how we operate at Flatiron and a paradigm we've been building on for a decade. As our experience has grown and our technological ecosystem has evolved, what we've seen is that the relationship between human expertise and technology has become stronger and more closely intertwined. This evolution can be broken down into three distinct phases. In order to extract data from patient charts, we started by building some foundational technology that allowed clinical experts to easily review hundreds of pages of documents in order to determine patient characteristics, ranging from disease classification and diagnosis date to biomarker status and histology. We called this platform Patient Manager and launched it in 2013. We also developed a set of clinically grounded policies and procedures to guide abstractors as they reviewed charts and made determinations. Finally, we assembled and trained a team of over 2000 clinical experts to work in Patient Manager. We've built our reputation on the quality of the data these expert abstractors have produced. And in so doing, we've built a corpus of amazing labels, the highest quality available anywhere in oncology, to train our ML models with.
The first major usage of these labels for machine learning was our MACS tool, or Model-Assisted Cohort Selection, which makes cohort selection more efficient and allows our abstractors to focus their talent on the most demanding abstraction challenges. We've built many different models for MACS over the past five years, which help us build cohorts based on disease classification, biomarker status, or oral medications. At a high level, MACS essentially narrows down the set of patients an abstractor needs to review, for example, filtering out patients that are not metastatic to support building a cohort of patients with metastatic breast cancer. MACS has been in use since 2017, allowing our abstractors to focus on the most demanding tasks so we can learn more about each patient and increase the size of our cohorts.
During the past five years, we've continued to iterate on the model architectures we've developed, incorporating many of the state of the art advances in natural language processing that have emerged, like transformers and large language models. We've now entered a phase where we are increasingly producing data with ML that is so high quality, it can be used directly. We've seen a step change in the performance of our models. We now have models that are able to meet or even exceed human abstraction performance on certain tasks. Not only are these models highly performant as measured by traditional ML metrics such as precision and recall, we've seen that the data they generate can reproduce the performance of abstracted variables in downstream analysis. Based on these findings and other analyses in the evaluation framework that Melissa will speak about shortly, we feel confident that many of the models we've built produce data that can be used directly in the same manner as our abstracted data. But what's really exciting about having developed these high quality models is the scale they unlock, which Erin and Yiduo will be discussing in more depth later today. With highly performant models, we can truly learn from every patient, not just a sample. We've also been developing what we're calling hybrid variables, which use both ML and abstraction, not as alternatives, but together. Hybrid variables allow us to unlock the scale of ML without compromising on quality, even for the hardest variables. The key thing is that the ML models we're building don't just make predictions, like whether or not a patient has advanced NSCLC, they also output a confidence score that you can interpret as the probability that the model is correct.
Across a range of variables, we can very effectively identify and remove model errors by abstracting low confidence patients. As a result, we can construct hybrid variables that maintain the quality of fully abstracted data while only rarely needing a human to open a chart. And by setting the threshold at different points, we have a dial that we can turn to balance scale and quality as appropriate for the particular use case. This means that, by incorporating ML in a smart way, we can scale cohort size and produce data faster while reserving human expertise for where it's really needed: harder cases and more complex variables. We use the right tool for the right task.
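The routing idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Flatiron's implementation: the function name, tuple format, and threshold value are all hypothetical, and a real pipeline would work on model outputs rather than hand-written tuples.

```python
# Hypothetical sketch of hybrid-variable routing: predictions whose confidence
# score clears a threshold are accepted directly; the rest go to a human
# abstractor. The threshold is the "dial" balancing scale and quality.

def route_predictions(predictions, threshold=0.9):
    """Split model outputs into auto-accepted and needs-review groups.

    predictions: list of (patient_id, label, confidence) tuples.
    Returns (accepted, for_abstraction).
    """
    accepted, for_abstraction = [], []
    for patient_id, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((patient_id, label))
        else:
            for_abstraction.append(patient_id)  # a human opens this chart
    return accepted, for_abstraction

preds = [("p1", "advanced", 0.98),
         ("p2", "not_advanced", 0.61),
         ("p3", "advanced", 0.95)]
auto, review = route_predictions(preds, threshold=0.9)
# Raising the threshold shrinks `auto` and grows `review`; lowering it does
# the opposite. That trade-off is set per use case.
```
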
As oncology continues to rapidly evolve, and the treatment landscape becomes more granular, targeting new biomarkers, finely tuned to increasingly particular patient profiles, we see ML as a critical tool in service of our mission of learning from the experience of every person with cancer. Machine learning has the scalability to unlock our addressable patient population and the speed to dynamically keep up with the changes in the standard of care. Ultimately, we believe it is an important part of making sure that the future of medicine is personalized for everyone. And I'll hand it back to you, Erin.
Thanks so much, Will. We've seen how personalized medicine has evolved over time, how data at scale can support this personalized medicine for everyone, and how ML can be a critical tool in service of our mission to learn from the experience of every person with cancer. We're now going to take it a step further and take a deeper look at how our actual models work with a focus on ML extracted data variables. Auriane is going to take us through how we extract these variables, what they actually do from a model perspective and how the models are built. And with that, I'll pass it over to Auriane.
Thank you very much, Erin. As Will mentioned, many of the variables that are most critical for clinical research are not captured consistently and accurately in a structured form. To obtain these critical data elements from unstructured text, we have largely relied on manual chart review performed by an expert clinical abstractor, but extracting information from charts manually is a hard and time-consuming task. In this presentation, I'm going to run through an example of how we use machine learning to extract clinical events and their dates from text documents. For this example, we'll focus on the task of extracting advanced diagnosis status and date.
We begin with a subset of patients obtained from the target population. A trained clinical abstractor reviews each patient's medical chart, and extracts whether they have advanced disease and, if so, the date of the advanced diagnosis. With this labeled data set, we can train a machine learning model to learn and recognize the language patterns indicating whether the patient received an advanced diagnosis, and when. Broadly speaking, the model does this by taking labeled patient data and learning the words, phrases, and documentation styles that signal an advanced diagnosis. Then, the model predicts a probability that the chart contains evidence the patient received an advanced diagnosis on a certain date.
Several machine learning algorithms can be used to learn these associations, such as logistic regression and various deep learning architectures. Once the machine learning model has been trained, we give it some unlabeled patient data. The model then predicts the probability that the patient's clinical documents include evidence of an advanced diagnosis and the associated date. Now, let's take a closer look at the model building process. The first step to creating a machine learning algorithm is to prepare input for the model, and this is what this section is going to discuss. So first, we come up with a list of search terms. They should be terms such that if you looked at sentences of text around these terms, you might be able to figure out whether the patient is advanced. We build those terms with our research oncologists and clinical data analysts. We ask them how they go about reviewing charts, and what information and keywords they are searching for. Eventually, we want the model to mimic their behaviors.
We then find all matches to these terms in the text, and extract sentences around them. Once we've done this extraction, we can form a timeline of sentences that may be related to the patient's advanced diagnosis over time, where each sentence is associated with the timestamp of the document it was extracted from. Those timestamp sentences are the model inputs. But what model can we choose?
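The term-matching and sentence-extraction step just described can be sketched roughly as follows. This is a simplified illustration under assumptions of my own: the search terms, the naive regex-based sentence splitter, and the document format are all made up for the example, and a production system would use a proper clinical NLP tokenizer.

```python
import re

# Hypothetical search terms a clinical expert might suggest for advanced disease.
SEARCH_TERMS = ["advanced", "metastatic", "stage iv"]

def extract_timestamped_sentences(documents, terms=SEARCH_TERMS):
    """documents: list of (timestamp, text) pairs from a patient's chart.
    Returns (timestamp, sentence) pairs for sentences containing a search term.
    """
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    results = []
    for timestamp, text in documents:
        # Naive sentence split on end punctuation; illustrative only.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                results.append((timestamp, sentence.strip()))
    return results

docs = [("2021-03-02", "Imaging today. Findings consistent with metastatic disease."),
        ("2021-03-10", "Patient tolerated treatment well.")]
timeline = extract_timestamped_sentences(docs)
# Only the sentence mentioning "metastatic" is kept, tagged with its document date.
```

The resulting timestamped sentences form the timeline that becomes the model's input.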
A quick warning. This next section will go deep into the model architecture, and this is for those in the audience interested in the technical details. We want a model that predicts a probability the documentation shows the patient has advanced disease. The most basic model that one might want to use is logistic regression. Let's see if a simple approach with logistic regression can solve this problem.
Logistic regression is a statistical technique that can be used to evaluate the relationship between various predictor variables and a binary outcome variable. It returns the log-odds of the outcome variable, which can be interpreted as a probability. Remember, the model inputs are timed sentences extracted from the patient's chart. We can binarize the words in these sentences to create predictor variables. Each dimension in the variable represents a word. One means that the word appears in the extracted sentences, zero means it does not.
The logistic regression uses these predictor variables and the abstraction labels to learn which terms are strong indicators that the patient received an advanced diagnosis and which terms are not. When presented with a new input, the model is going to average out the positive and negative signals from each word, and return an overall probability. Well, is that good enough? Unfortunately, no. This model is too simplistic, and is not able to capture the information needed to make an accurate prediction. Let's use an example to understand why. The sentences, "Advanced not EGFR positive," and, "EGFR positive not advanced," contain the exact same words. So the model would give them the same score, but they mean opposite things. The same word can mean different things depending on where it appears in a sentence. We need a model that captures information beyond whether a word appears or not.
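The word-order problem just described can be made concrete in a few lines. This minimal sketch (vocabulary and sentences taken from the example above) shows that a bag-of-words binarization produces identical feature vectors for the two sentences, so any model built on those features must score them identically.

```python
# Binarize a sentence against a fixed vocabulary: 1 if the word appears, 0 if not.
def binarize(sentence, vocabulary):
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

vocab = ["advanced", "not", "egfr", "positive"]
a = binarize("Advanced not EGFR positive", vocab)
b = binarize("EGFR positive not advanced", vocab)
# a and b are identical vectors, even though the sentences mean opposite things,
# so a logistic regression over these features cannot tell them apart.
```
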
We need to use more advanced techniques called language models to capture nuances in long and complex texts. Language models are powerful tools. They can learn millions of parameters, which enables them to capture subtleties in the text. They read sentences as a whole and use the terms around a word to understand its contextual meaning. When they receive very large texts as inputs, they can figure out where the relevant information is and focus on that section only. Language models are a state of the art technique used to solve some of the most complex language problems. You use them in your everyday life. For instance, Apple uses them for QuickType on the iPhone. A language model interprets the incomplete sentence that the user is typing and uses it to predict the most likely next word. Google also uses them for its translation engine. Another classic use case for language models is video closed captioning, which you can see on YouTube, for instance. A language model stores a memory of the sequence of events in the video and uses it to generate a sentence describing what is happening in the frame.
An example of a language model that we use at Flatiron is the LSTM, or long short-term memory network. In the logistic regression example, we saw how, if we don't keep track of the word order, we can't reach a good conclusion. In LSTMs, words are passed into the model sequentially, but each step remembers the words that came before. The LSTM block combines the new word with the information that came before to derive a variable representing the word in context. For instance, when the LSTM reads the word, "Advanced," it knows it was preceded by the word, "Not," and understands the patient is not advanced. LSTM blocks applied sequentially produce variables that encapsulate the information added to the sentence by each new word.
The last modeling step is to aggregate the different word variables into a single sentence variable. Okay now, let's put everything together and summarize the modeling steps. We start by extracting sentences around relevant terms created with our team of clinical experts. We then assign a date to each extracted sentence. Next, we use LSTMs to create variables that capture the meaning of those sentences. A deep learning model then predicts the probability as to whether or not the sentences show information that a patient has advanced disease. With the LSTM variable, the classifier has all the information necessary to understand what is being said in the patient's chart.
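To make the sequential processing concrete, here is a toy LSTM step written in NumPy. This is a didactic sketch, not Flatiron's model: the weights are random, the tiny word vectors are invented, and the point is only to show that, unlike the bag-of-words features, the final state depends on word order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: fold the new word vector x into the running state (h, c)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four gates in one matrix product
    i = sigmoid(z[:H])                  # input gate
    f = sigmoid(z[H:2 * H])             # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])         # output gate
    g = np.tanh(z[3 * H:])              # candidate cell update
    c = f * c_prev + i * g              # new cell state blends memory and input
    h = o * np.tanh(c)
    return h, c

def encode(sequence, W, U, b, H=4):
    """Feed word vectors in order; the final hidden state summarizes the sentence."""
    h, c = np.zeros(H), np.zeros(H)
    for x in sequence:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
D, H = 3, 4                             # toy word-vector and hidden sizes
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
advanced, not_, positive = (rng.normal(size=D) for _ in range(3))

h1 = encode([advanced, not_, positive], W, U, b)
h2 = encode([positive, not_, advanced], W, U, b)
# Same words, different order: the two final states differ, which is exactly
# the sensitivity to word order the bag-of-words model lacked.
```
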
We have built this model. Now, let's see how well it performs. The model predicts two outputs, the diagnosis status and dates. The top table shows the model's performance at predicting the diagnosis status. There are many different metrics we could use, but we only show two on the slides: sensitivity and PPV. Sensitivity measures what fraction of the advanced disease population the model was able to identify and PPV measures how reliable the prediction is when the model says that a patient is advanced. The bottom table shows the performance at predicting when a patient received an advanced diagnosis.
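The two metrics just defined are simple ratios over the confusion matrix, and the definitions above translate directly into code. The counts in this sketch are invented for illustration, not the model's actual results.

```python
# Sensitivity and PPV as defined above, computed from confusion-matrix counts.

def sensitivity(tp, fn):
    """Of the truly advanced patients, what fraction did the model identify?"""
    return tp / (tp + fn)

def ppv(tp, fp):
    """When the model says a patient is advanced, how often is it right?"""
    return tp / (tp + fp)

# Hypothetical counts: 90 true positives, 10 false negatives, 5 false positives.
sens = sensitivity(tp=90, fn=10)   # 90 of 100 truly advanced patients found
prec = ppv(tp=90, fp=5)            # 90 of 95 positive calls were correct
```
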
Those metrics are informative, but they don't tell the whole story. Measuring performance is a complex problem and Melissa’s talk will cover how we think about performance assessments and make sure that our model predictions are not biased, and that they can be used for relevant clinical use cases. Flatiron revolutionizes research by using advanced deep learning techniques to unlock the value of data in patient charts. Our models can extract variables accurately, quickly, and at scale. I will pass it back to Erin.
Perfect. Thanks so much, Auriane. Now that we've taken a closer look at how models work and are developed, Melissa's going to help us better understand how we measure model performance. But before I pass it over to Melissa, just a quick reminder to please submit questions using the Q&A tool at the bottom of your screen. All right, Melissa, over to you.
Thank you so much, Erin. Today, I will show you that variables extracted by using machine learning are fit for purpose for common use cases by describing how Flatiron assesses model performance and sharing with you the results of one of those analyses. As we've just learned from Will, the move towards personalized medicine means that it has never been more important to learn from every patient living with cancer. However, today we only learn from patients with abstracted data, and due to the resource constraints of abstraction, this represents only a portion of the patients in the Flatiron network. And even for patients with abstracted data, we only learn from a limited set of structured and unstructured data elements.
Machine learning and natural language processing can enable us to learn from all patients, and for rare populations, this is particularly valuable. Consider a study seeking to learn about patients with advanced non-small cell lung cancer with a ROS1 rearrangement. Once we filter down to this target cohort, we may be left with only a small number of patients if we rely on abstracted data alone. However, if we use machine learning to scan an enormous population to identify these rare patient characteristics buried in the unstructured EHR, we can tap the potential of the entire Flatiron network. So now the question is, once we have developed models that can extract key variables from the EHR, such as diagnosis dates and biomarker results, how do we assess the performance and determine whether these variables are ready for use? First, you'll see how Flatiron thoroughly and transparently assesses the performance of individual variables. Second, you will discover that research questions using abstracted data can be replicated using ML extracted data to demonstrate that ML extraction can generate real world data that is fit for purpose.
Flatiron has developed a research-centric evaluation framework for ML extracted data. In addition to using common machine learning metrics like accuracy, sensitivity, and positive predictive value, the framework proposes the combination of stratified performance assessment, quantitative error analysis, and replication analysis to identify potential model bias and better understand how extracted variables perform relative to manual abstraction. While the first three components are most relevant for individual variables, replication analysis can be applied to data sets with multiple ML extracted variables. And today, I will focus on this component of the framework.
There are two places where the evaluation framework can be applied when developing individual variables. The first is during model prototyping. At this stage, standard metrics might be sufficient to iterate on the model, but some of the other elements in the framework can also be useful. For example, we might want to understand whether the model is making systematic errors, or we might want to understand what impact model errors are having on downstream results. The second place where the framework can be applied is once prototyping is complete. At this stage, the framework is applied to the held out test set for the first time, and results are thoroughly evaluated before ML extracted data is incorporated into delivered datasets.
To fully unlock the potential of the entire Flatiron network for research use, it will likely require more than one ML variable to be used together in an analysis. So let's return to that ROS1 example. Not only is more than one ML extracted variable required to select the study cohort, but ML variables are also used in this example as covariates and index dates in survival analysis. The value of replication is that we can study what happens when multiple ML extracted variables are used together to better understand how model error can be compounded and introduce bias into downstream results.
So at this point, are you convinced that replication analysis is valuable? Okay, so now let's get into the how. To perform a replication analysis, a dataset is first needed in which, for all patients, we have the relevant variables both abstracted by a clinical expert and extracted by a model. We then perform the analysis using abstracted data, as shown on the left. We then repeat the same analysis using the same variables extracted by a model, as shown on the right. Finally, we compare the results we obtain when abstracted variables are used relative to extracted variables. I will now walk you through the results of a replication analysis that we conducted. Let's start with the objectives. We first select a population of patients with advanced non-small cell lung cancer that have a positive or negative ROS1 test result. After selecting this population, we compare real world overall survival from the ROS1 result date and the association between ROS1 status and survival. Using a study population of roughly 180,000 patients with a lung cancer ICD code and two or more visits on or after January 1st, 2011, we defined our study cohorts first using ML extracted variables and second using abstracted variables. Due to the low prevalence of ROS1 rearrangements, which is less than 2%, this analysis could not be conducted on the held out test set, but rather leverages all patients in the Flatiron network that have fully abstracted data. The table on the right shows the variables that were used in this study. Specifically, for the ML extracted variables, some were used to select the study cohort, some were used to describe cohort characteristics, and others were used within a Cox regression model.
From the study population just described, we further defined two cohorts of interest: ROS1 positive, who were ever positive for a ROS1 rearrangement, and ROS1 negative, who were negative for a ROS1 rearrangement and were also never positive for ALK, BRAF or EGFR. After selecting the abstracted and ML cohorts, we generated Kaplan-Meier survival curves stratified by ROS1 status. Survival was indexed to the ROS1 result date, which is also an ML extracted variable. When we compare the ROS1 negative abstracted and ML cohorts, we observed similar survival distributions, with only a 0.4 month difference in the median real world overall survival. This difference was not statistically significant as measured using a permutation test.
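To make the replication comparison concrete, here is a compact Kaplan-Meier estimator in plain Python. This is a generic textbook implementation, not the study's code, and the follow-up times below are invented; the point is that the same function can be run twice, once with abstracted index dates and once with ML-extracted ones, and the resulting curves compared.

```python
# Kaplan-Meier product-limit estimator: at each observed event time t,
# survival is multiplied by (1 - deaths_at_t / patients_at_risk_at_t).
# Censored patients leave the risk set without causing a drop in the curve.

def kaplan_meier(times, events):
    """times: follow-up in months; events: 1 = death observed, 0 = censored.
    Returns a list of (t, S(t)) pairs at each time with at least one event."""
    at_risk = len(times)
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= sum(1 for ti in times if ti == t)  # remove deaths and censored
    return curve

# Invented toy cohort: deaths at months 1, 2, 4; one patient censored at month 3.
curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
# The curve steps down at 1, 2, and 4; the censored patient only shrinks the
# risk set, which is why S drops all the way to 0 at the last death.
```

Running this once on abstracted dates and once on ML-extracted dates, then comparing medians, mirrors the replication analysis described above.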
Among ROS1 positive patients, we again observe similar survival distributions among the ML and abstracted cohort. And both cohorts show a longer median real world overall survival for ROS1 positive patients relative to ROS1 negative patients. While we do observe that the ML cohort had a shorter median real world overall survival by 5.3 months, the sample sizes are small, the confidence intervals are wide and overlapping and the difference was not statistically significant. Lastly, we were interested in understanding if using ML variables impacted the association between ROS1 status and survival.
For this analysis, we created two Cox regression models. The first, unmatched and unadjusted for any potential confounding variables. For the second model, patients were matched on age at advanced diagnosis, year of advanced diagnosis and gender. We also statistically controlled for all matching factors as well as treatment status, PDL1 status, stage, ECOG, histology and practice type. What is exciting to see is that when we use the Cox regression model to account for the entire survival curve, the ML and abstracted cohorts are similar.
And the difference in median real world overall survival for the ROS1 positive cohort that we saw in the previous slide did not translate to differences in the measured hazard ratio. Specifically, for the unmatched and unadjusted model, the hazard ratios differed by 0.06 with no difference in the model standard errors. For the matched and adjusted model, both hazard ratios and standard errors differed by 0.03. Overall, we found that the use of ML variables in place of abstracted variables did not impact the measured hazard ratio, nor did it impact the interpretation of these results.
In conclusion, Flatiron's ML extracted real world data unlocks the full potential of the Flatiron network. First, ML extracted data is fast and scalable. Using ML, unstructured data can be processed quickly and at scale when expert labeled data are available for training. This greatly reduces the time from study question to insight and increases sample size of rare populations. Second, ML extracted data is reliable. We can measure ML extracted data quality against abstracted data and model errors can be evaluated for systematic bias. Finally, Flatiron's ML extracted real world data is fit for purpose for common use cases.
ML extraction can be a substitute for expert abstraction when models are trained on expert-labeled data and are high performing. As seen in today's ROS1 example, the evaluation framework proved to be an important tool for yielding results that inform whether ML is fit for purpose for particular use cases. Specifically, the results and decisions regarding the association between ROS1 status and survival were not impacted by the use of ML. With ML data that is fast and scalable, reliable, and fit for purpose, we can learn in ways that are not possible with the traditional approach, and this is something that truly excites us. I will now pass things back to Erin.
Great, thanks so much Melissa. We've come full circle now after discussing how we approach individual variable performance and replication use cases, both of which speak to the quality of our ML extracted data and how we assess it. I'd like to now invite Yiduo Zhang from AstraZeneca to join me for a discussion on ML perspectives. Yiduo is an oncology outcomes researcher from AstraZeneca. In the past decade, he has led evidence generation for global health technology assessment submissions as well as built teams in R&D to support data analytics for medical and payer teams. And so, hi Yiduo, I'm going to pass it over to you to tell us a bit more about you, and thank you so much for joining us today.
Hello Erin, I'm so excited to be here and thank you so much for inviting me to chat about this important topic. Currently I'm leading a team at AstraZeneca, and we are covering the design, execution, and dissemination of observational research from a medical perspective, and also a lot of data analysis. We've been using Flatiron data a lot. We really enjoy the high quality of the data and I would say it's a privilege to be here to share some of our thoughts about the machine learning capabilities. Obviously I'm only here to speak from my own experience and my own opinion, rather than on behalf of the company. Thank you.
Thank you for sharing. You've got such a wealth of experience in both real world data and health economics, and you've certainly seen industry change with time and how ML has come into play. So I'm really excited to have this opportunity to chat with you and learn about your experience and in particular around ML. So to get us started, we've heard from our previous three speakers that speed, flexibility, and scale really are key and it's what makes ML such a foundational tool to unlock insights. So my first question is from your experience, how does ML enable scale and why does this matter?
I am still learning machine learning techniques myself, so I don't want to claim to be an expert, but I do have experience designing and driving a machine learning project using other data sources. To put it simply, I think the notion is great. As human beings, we have to rest, we have to sleep. Cancer does not sleep, right? So machine learning has the capability of really looking into the data we have and keeping up with the exponential increase in data, which is very hard for our researchers and our talented data abstractors to do.
If you want me to mention one more thing, I would say, as Will mentioned early on, every patient's experience is valuable in outcomes research. What makes our research so rewarding is that we want to generate knowledge that can really impact a patient's life. From that perspective, machine learning has really helped us quickly get the information we need from rich and ever-growing medical charts and other sources. It has given us a data set that is more reliable in terms of reproducing results and replicating the data, and it has given researchers additional tools to answer questions we weren't able to answer previously.
No, thank you for those insights and you brought up having access to different components of the chart that maybe you didn't have access to previously. Can you give an example of that and what are some of those things that researchers are really looking to find and access in the charts that are so important for making these insights?
There are so many examples. Let's look at the bigger picture before we dive into specific examples. If you look at the landscape of cancer care, it is becoming more and more personalized, but also very complex, when we think about all the biomarkers and all the oncology driver genes we've identified in pathways we aim to address. Finding those biomarkers, for example, can be extremely challenging, especially when we get into a smaller population, whether we're talking about clinical trial design or execution, or the postmarketing space in terms of surveillance, looking at the population and how the trial data turned out in the real-world setting.
Finding those patients is the first hurdle. With machine learning capabilities, we can give researchers the luxury of a bigger sample size, and with it more reliable and more precise estimates. That will really help us turn the knowledge into something more actionable for patients, so that we can improve patient care and patient outcomes down the road. That's a really exciting capability we're looking at.
No, absolutely. And I love that you mentioned personalized medicine, because that ties very much into what Will was speaking about earlier in terms of how personalized medicine has evolved over the years, even decades. It's very individualized now, and certainly at Flatiron, our mission is very much driven by this. We really want to learn from the experience of every person with cancer. That's one of the things we're so excited about: having access to all of these patients, some 3 million plus. But there are certainly resource constraints with the abstraction approach, so being able to take a machine learning approach, access these charts, and work with so much more data is pretty key. If we think about biomarkers, or potentially use cases we haven't even considered or used traditionally with other real-world data, do you have any thoughts, from your experience, on how ML can unlock new use cases, so to speak?
From my experience, there has been a lot of emphasis on health equity at the moment: thinking not only about the technology, but about how care is delivered to patients, and the barriers around testing, treatment adherence, or other services those patients need. Erin, I'm so glad you mentioned that. Getting useful information from medical charts, especially unstructured data, is not a trivial task. It's so involved and resource consuming. I think most of us, and I can certainly speak for myself, have really experienced that. It has been a rate-limiting factor for us, but now we have tools that can really help us gather the data in the quantity we need, in terms of sample size, and with the precision we need, which is really helpful.
So going back to the example of health equity, in terms of how we understand health disparity. Thinking about the challenge: if we do a study, it's relatively easy to understand the treatment pattern, for example, on a global or national scale, but it becomes increasingly difficult to understand it at the local level. With that being said, there's no "global patient"; all patients are in their own community getting care as needed. That is where a broader sample size really helps us get a more granular estimate and understanding of what is going on for a particular patient or community, and to really understand the other pieces that impact patient care. That is another very exciting opportunity that can have a real impact on patients and how we deliver care.
Absolutely. Thank you so much for sharing. You mentioned some of that missingness. I mean obviously Flatiron is very focused on disparities work as well and so you kind of talked about filling in some of the missing pieces. Do you have any examples that you could share on some of the things you're looking for from a research perspective?
Maybe I can generalize the question a little: when we have the data, how can we generalize it to a broader population? The barriers to that include missingness, which we as researchers have to work very carefully to address.
I think the machine learning element helps in several aspects. It gives us bigger volume; we can actually learn from all the patients we have. Inherently, that puts us in a much better position compared to being limited by the resources we have and how many patients we can look at. At the same time, as Will and the other presenters mentioned, machine learning works more systematically, looking at all the information that is available and gathering the useful information from the sources we have, so that we know we are not missing key information, information whose absence might hamper our ability to make the best decision for the patient.
So this is really about thinking through the missingness, addressing it, and asking how we can reproduce the data. That is a very exciting area, and I would like to see some of the impact in the near future, when the data can be used by the broader community.
Oh, very much so. Thank you so much for sharing this. It hits very close to home, as one of the things we're so focused on as a company. We're getting a little short on time here, but I would love to close with this: if you had to pick where ML or ML-extracted variables offer the most value, and what excites you most, what would that be?
I think the speed really speaks to me. Let me unpack that a little. Speed means we can get information quicker than the traditional way, so our learning cycles, generating knowledge to impact care, can be accelerated. That is super exciting. On the quality side, as I have probably mentioned already, real-world evidence data is not going to be perfect, but we want it to be more predictable and more systematic. I think machine learning addresses some of the inter-reader reliability issues, all those variations. From that perspective, it gives us a data set that, over time, we can trust more and more, and that we know we can replicate, whether from a regulatory perspective or from a medical or payer evidence generation perspective.
That certainty in handling uncertainties really helps us. And of course, we need to scale this. Oncology is moving really quickly; we need to be able to apply this technology to all the data we have. As other speakers have said eloquently, we can scale this. So the technology is ready, and this is really the time, in my own opinion, for it to take center stage and transform how we generate data and how we use it to really improve patient care and patient outcomes.
Oh, that's super insightful. Thank you, Yiduo. Maybe to sum it up a little before we move to questions: we've heard that turning data into useful knowledge quickly is the key to improving care, and that from a value perspective, scalability is the key benefit of using ML. Ultimately, the speed and scale that ML extraction unlocks really lend themselves to improving care, changing patient outcomes, and potentially reducing care disparities. I think we could talk about this for hours and hours, so we'll have to continue at another point. But for now, let's shift to the Q&A part of the session. We've had a lot of great questions come through, so let's answer some of them. To kick things off, we have a question for Will. Is Flatiron using machine learning to make predictions? For example, predicting which patients may be at high risk for mortality in the next year?
Yeah. Thanks, Erin. In the context of what we've been talking about today, we're really only using machine learning to extract data that's been explicitly documented in the chart, much as an abstractor would. At Flatiron, we do have a risk stratification product that we've built in collaboration with our academic partners that notifies doctors when a patient is at risk of going to the ER, but the mechanics and the data that feed that product are substantively different from what we've been talking about today. So the short answer is: in the context of today, no.
Thank you for sharing. Okay, we'll move on to the next question. This one is for Auriane. Does the advanced diagnosis model consider time points? And if so, how does it deal with patients that were lost at some point during their journey?
Yeah, thank you, Erin. Yes, the advanced diagnosis model uses the timestamp and dates found in documents to make predictions. If a patient changed practice during their cancer journey, we have a lot of different practices in our network and a mechanism to try to map back patients who have been treated at different facilities, so that we have only one identifier for the whole journey. That's one way that we can address this problem.
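To make the idea concrete, here is a minimal sketch of the kind of cross-practice identifier mapping Auriane describes, assuming a precomputed record-linkage crosswalk. The record fields, IDs, and crosswalk are hypothetical, invented for illustration, and not Flatiron's actual schema.

```python
from collections import defaultdict

# Each record carries the ID assigned by the practice that created it.
records = [
    {"practice_record_id": "A-001", "event": "diagnosis", "date": "2019-03-02"},
    {"practice_record_id": "B-742", "event": "treatment", "date": "2019-06-15"},
    {"practice_record_id": "C-310", "event": "progression", "date": "2020-01-20"},
]

# Crosswalk produced by an upstream record-linkage step, mapping
# per-practice record IDs to a single patient identifier.
crosswalk = {"A-001": "PT-1", "B-742": "PT-1", "C-310": "PT-2"}

# Group events under one identifier so a patient treated at multiple
# facilities has a single, chronologically ordered journey.
journeys = defaultdict(list)
for rec in records:
    patient_id = crosswalk[rec["practice_record_id"]]
    journeys[patient_id].append((rec["date"], rec["event"]))
for pid in journeys:
    journeys[pid].sort()
```

Here "PT-1" is treated at two practices (A and B), yet the model sees one continuous journey under a single identifier.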
Okay, great. Thanks so much, Auriane. I've got a question for Melissa now. Is using ML extracted data fundamentally different from abstracted data?
Yeah, that's a great question. Thank you for that. There is fundamentally no difference between using ML extracted data versus abstracted data. For example, the data model is the same, as you saw from Auriane's example for advanced diagnosis date. Just as with abstracted data, you have a binary variable indicating whether the patient has the diagnosis, and then a variable with the date; there is no difference in how that data model appears. Second, the way you think about sensitivity analyses for ML extracted data is exactly the same as for abstracted data. Your standard approaches for dealing with measurement error, such as quantitative bias analysis, also apply to ML extracted data.
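As a rough illustration of the point that the data model looks identical either way, here is a hypothetical two-column representation, a binary flag plus a date. The field names are invented for this sketch and do not reflect Flatiron's actual schema.

```python
# Hypothetical rows; the same shape holds whether the values came from
# expert abstraction or from an ML extraction model.
rows = [
    {"patient_id": "PT-1", "has_advanced_dx": True,  "advanced_dx_date": "2019-03-02"},
    {"patient_id": "PT-2", "has_advanced_dx": False, "advanced_dx_date": None},
]

# A downstream analysis reads the same two columns in both cases, so
# cohort selection code does not change with the extraction method.
cohort = [r for r in rows if r["has_advanced_dx"]]
```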
Great. Thanks, Melissa. I think I've got a question now for Melissa and also Will. If ML is not a hundred percent accurate, does that mean that some variables or values in the Flatiron database may not be correct? For the cases where ML is not able to identify the variables accurately, which part of the workflow contributes the most error? And separately, what implications does this have on confidence in data analysis using Flatiron databases?
Thanks. Maybe I can start us off. This is a great question. First, I would say that we have the performance evaluation framework that I described where we thoroughly assess the performance of these ML extracted variables, including looking into whether there's potential model bias or what impact model errors could have on analytical results. We thoroughly review these results before we even decide that we want to add an ML extracted variable into a delivered dataset. Second, we have Knowledge Center materials that are provided with our delivered data sets that will provide thorough information about the model's performance, including performance metrics that can be useful for performing your own sensitivity analyses.
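One simple, standard form of the sensitivity analysis Melissa mentions is a quantitative bias analysis that corrects an observed proportion for classifier error. This sketch uses the classical Rogan-Gladen estimator; the sensitivity and specificity values are made up, standing in for the performance metrics a dataset's documentation might report.

```python
def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Correct an observed prevalence for misclassification, given the
    classifier's sensitivity and specificity (Rogan-Gladen estimator)."""
    corrected = (observed_prevalence + specificity - 1) / (sensitivity + specificity - 1)
    # Clamp to a valid proportion, since sampling noise can push the
    # raw estimate slightly outside [0, 1].
    return min(max(corrected, 0.0), 1.0)

# e.g. 12% of patients flagged positive by a model with hypothetical
# sensitivity 0.95 and specificity 0.98 -> corrected prevalence ~10.8%
corrected = rogan_gladen(0.12, 0.95, 0.98)
```

Analyses like this let a researcher quantify how much reported model error could move a result, rather than treating imperfect accuracy as an unknowable risk.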
Yeah. And just to add a little bit, I spoke earlier about how deliberate we've been as a company in approaching the use of machine learning, and I think that deliberateness is very much rooted in this question. We wanted to feel really confident that the data we're producing with machine learning is fit for use. One of the things we're really excited about with the hybrid approach is that it's a way of essentially mitigating any risk in using machine learning. In fact, there are tasks that machine learning is genuinely better at than humans are, so our philosophy is to use both to do what each is best at. When you're talking about reviewing hundreds of pages of documents, machine learning is sometimes the best tool. That's really what we're excited about doing here.
Awesome. Thanks so much, Will. I think we've got time for one last question. Could you comment on LSTMs versus other language models such as BERT or GPT-3? I'll pass it over to Auriane, and then maybe Will, if we have some time, you could weigh in, too.
Okay, thanks. That's a great question. LSTM is just an example of a language model that we use; we are not locked into LSTMs. We have also tried to make predictions using BERT and GPT-3. What we've found is that, for our specific use cases, the performance from BERT and GPT-3 was pretty similar to what we got with LSTMs. There are some tasks for which BERT did perform better, for instance in low-data regimes, but there are also some limitations to using BERT; it can require more fine-tuning. We basically try to use the model that works best for each specific use case, on a use-case-by-use-case basis.
Yeah. And I'll just add, we keep close tabs on what's happening with BERT and other large language models and are really excited about them. But given that they're often trained on incredibly large corpora (essentially the whole internet), they're often not as good at oncology-specific tasks. It's definitely something we're paying close attention to.
Great. Thank you both so much. Looks like we are very short on time here. So many good questions came through and I'm sorry we didn't get to all of them today, but please feel free to get in touch with us by email at email@example.com and we'll be sure to follow up with you offline. And just a friendly reminder before you go to please take the survey upon closing, it'll help us improve future webinars. And with that, thank you all for joining us and to our speakers and to Yiduo for joining us today. Stay healthy, stay safe, and we'll see you again in the new year.