Using Machine Learning to Reimagine the Infrastructure of Cancer Care

July 21, 2022
Anyone who knows me has probably heard how much time I spend writing clinical notes. What most people don't know is why. When oncologists care for patients with cancer, every visit generates countless pieces of information. These details help clinicians and patients choose the right treatments and management, of course. But long after patients' charts are closed, clinical researchers still depend on these meticulous notes to retrospectively study treatment patterns and outcomes for patients with cancer. And many of the insights they contain may only be visible when aggregated across thousands of electronic health records (EHRs).

Despite the copious notes that clinicians like me take, researchers still struggle to track down even the most basic information about patients with cancer. What was the stage of the cancer? When was it diagnosed? Did it come back, and when? One key reason is that even in a meticulously crafted note, this information is entered as free text and is not readily accessible or usable for retrospective research. Further, attempts to build workflows that capture such information in a structured (and more usable) format are often unsuccessful, because it is usually faster and easier for clinicians to type than to find the right place to point and click multiple times. This is a hard problem, and an important one to solve, in order to help patients and researchers.

Machine learning is empowering us to see and make sense of this information faster and at a greater scale than has ever been possible. By rendering more of the patient's experience visible, machine learning is quickly changing what we are able to learn about cancer.

At Flatiron, we're pioneering the use of deep learning technology and natural language processing (NLP) to extract clinically relevant information from patients' EHRs. Whereas, in the past, researchers may have been limited by either less granular details or smaller sample sizes, we're using machine learning to help us learn from the experience of every person with cancer.

Take the example of ROS1-positive lung cancer, which is defined by a biomarker that may be present in only 1% of patients. Understanding which patients are impacted, how they are treated, and how they do on those treatments is essential for guiding additional research and drug development. But it would be virtually impossible to manually sift through thousands of EHRs to first identify patients with this rare biomarker and then learn from them. The best manual option might be to single out a patient or two and learn from them, but such a minuscule sample size will limit the kinds of insights it can provide. With machine learning, we're able to build algorithms that can identify, at scale, details ranging from the presence of a ROS1 rearrangement, to the stage of the cancer, to the actual date that the cancer metastasized (a really hard problem!). These details matter, large cohort sizes matter, and these tools are helping us to answer important research questions that might not have been answerable otherwise.

Machine learning not only helps us to learn from more patients; it can also help ensure that more diverse patient cohorts are included in research. From helping to surface patients appropriate for a clinical trial to improving the completeness of data about race and ethnicity, we can use these tools to gain insights from patients we haven't done a good enough job of learning from in the past. The more closely research cohorts reflect the actual patient population we hope to benefit, the greater the impact will be.

There's a lot that machine learning can do, but machine learning won't fix everything. It won't replace the need for thoughtful clinicians, high-quality labeled data, and rigorous, methodologically sound research. But it's extremely powerful and full of potential, and we realize that tapping that potential requires the expertise and collaboration of people across Flatiron. Our team of abstractors collects and labels patients' clinical data. Everything starts with this high-quality labeled data, which, when coupled with the large number of records we have, enables our models to work as well as they do. Our data scientists use state-of-the-art modeling techniques to construct and deploy models that capture real-life clinical details and contexts. Our biostatisticians have developed novel frameworks to understand and characterize the performance of these models in a transparent and methodical way, ensuring they work effectively, not only across large groups of patients, but fairly across patients of different races and genders. As a clinician, I work with each of these functions to ensure that we are solving clinically meaningful problems that will benefit patients.
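
To make the idea of checking performance across patient groups a little more concrete, here is a minimal sketch in Python. It is an illustration only and not Flatiron's framework: the function name, the record layout, and the group labels are all hypothetical, and the toy records are invented rather than drawn from any real data. It compares a model's output against abstracted labels and reports sensitivity (the share of true cases the model found) separately for each group.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: sensitivity}, where sensitivity = TP / (TP + FN).

    Each record is a hypothetical tuple (group, abstracted_label, model_label)
    with boolean labels. Groups with no abstracted positives map to None.
    """
    true_pos = defaultdict(int)    # abstracted positive AND model positive
    actual_pos = defaultdict(int)  # abstracted positive
    groups = set()
    for group, abstracted, predicted in records:
        groups.add(group)
        if abstracted:
            actual_pos[group] += 1
            if predicted:
                true_pos[group] += 1
    return {
        g: (true_pos[g] / actual_pos[g] if actual_pos[g] else None)
        for g in sorted(groups)
    }


# Invented toy records for illustration; not real patient data.
records = [
    ("group_a", True, True),
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_b", True, True),
    ("group_b", True, True),
    ("group_b", False, True),
]
result = sensitivity_by_group(records)
```

A large gap between groups in a summary like this would be a prompt to look more closely before relying on the model for research. A real evaluation would also consider other metrics and the uncertainty that comes with small groups.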

Machine learning is making so much possible for research now, but we're even more excited about the breakthroughs ahead. Advances in machine learning will enable us to find connections not previously visible or known and allow us to truly practice personalized medicine. They will help us make better sense, and more use, of everything we know about cancer. They may even shorten the time I need to spend on my notes! But most importantly, they will allow us to fulfill our mission: to improve and extend lives by learning from the experience of every person with cancer.