Like many engineers, I love doing work that is not only interesting but also helps to make the world a better place. But despite the long list of problems that are being solved with software these days, it can be hard to imagine a direct connection between writing code and making an impact on an issue that is truly meaningful to humanity on a global scale, such as addressing poverty, developing sustainable energy sources, or solving world hunger.
I worked on digital advertising technology at Google before joining Flatiron Health, and at the time I knew very little about cancer, or healthcare in general. But as I’ve learned about the disease and all of the challenges that patients, providers, and researchers face in dealing with it, I keep seeing parallels with problems that the ad tech world has already faced and solved. Both industries need to make informed decisions based on a corpus of potentially complex and unstructured data. And as we keep discovering, the same modern software engineering and data techniques that have powered huge leaps of progress in industries like media and advertising are actually extremely valuable when it comes to tackling one of the most difficult yet wide-reaching challenges out there: fighting cancer.
Here’s a small example. Let’s say a cancer researcher would like to identify a cohort of patients who have received paclitaxel, a chemotherapy drug that is commonly administered to patients with advanced disease. Given a database of medical records, the researcher might try to simply search the administrations table for this drug name. But looking at the data across different providers, you’ll find that there are actually dozens of different ways that this drug is represented. It’s hard to draw out any insights when working with such a messy dataset.
This is a data harmonization problem, and many other industries have to deal with it as well. In the digital advertising world, it comes up when advertisers would like to vary their rendering technology based on the user’s browser. Information about users typically comes in the form of a user agent string, which is only loosely standardized. As a result, a property like browser name might be represented differently across hundreds of variations.
The solution at a high level is to come up with an algorithm that maps a specific variation into a standardized hierarchy of concepts. The approach can range from something simple – like triaging unknown source values to humans who map them to a standard format – to sophisticated machine learning or NLP systems. At Flatiron, our engineers collaborate with medical informaticists to build these mappings and keep them updated over time.
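To make the simple end of that range concrete, here is a minimal sketch of a lookup-based mapper that sends anything it does not recognize to a human review queue. It is an illustration under stated assumptions, not Flatiron's actual system: the variant table, the `normalize` rules, and the function names are all hypothetical.

```python
import re

# Hypothetical variant table: normalized source string -> standard concept.
# The entries are illustrative only, not real provider data.
KNOWN_VARIANTS = {
    "paclitaxel": "paclitaxel",
    "taxol": "paclitaxel",
    "paclitaxel inj": "paclitaxel",
}


def normalize(raw: str) -> str:
    """Lowercase, drop dose text such as '100 mg', and collapse punctuation."""
    text = raw.lower()
    # Remove simple dose expressions (assumed units: mg, ml, mcg).
    text = re.sub(r"\d+(\.\d+)?\s*(mg|ml|mcg)(/\w+)?", " ", text)
    # Replace any run of non-alphanumeric characters with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def harmonize(raw_values):
    """Return (mapped, needs_review).

    mapped maps each recognized raw value to its standard concept;
    needs_review lists raw values that a human should map by hand.
    """
    mapped = {}
    needs_review = []
    for raw in raw_values:
        concept = KNOWN_VARIANTS.get(normalize(raw))
        if concept is None:
            needs_review.append(raw)
        else:
            mapped[raw] = concept
    return mapped, needs_review


mapped, needs_review = harmonize(
    ["Paclitaxel 100 mg", "TAXOL", "PACLITAXEL INJ.", "ptx-weekly"]
)
```

In this sketch the first three inputs resolve to the same concept, while `ptx-weekly` lands in the review queue. Once a person maps it, adding the new entry to the table means the same variant is handled automatically the next time it appears.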
Harmonizing a messy dataset enables many kinds of interesting and useful analyses. In the ads world, it lets you aggregate key attributes across all of your users, for example to see which browsers are most popular.
Similarly, harmonizing medical data gives us the ability to observe interesting trends and analyze how treatments vary over time, such as after the approval of a drug by the FDA.
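Once every record carries a standard concept, this kind of trend analysis reduces to counting. The sketch below assumes a hypothetical input of `(year, concept)` pairs that have already been harmonized, and computes each concept's share of administrations per year; the drug names in the usage example are placeholders.

```python
from collections import Counter, defaultdict


def share_by_year(administrations):
    """Compute each concept's fraction of administrations within each year.

    administrations: iterable of (year, concept) pairs, assumed to be
    already harmonized to standard concept names.
    Returns {year: {concept: fraction}}.
    """
    counts = defaultdict(Counter)
    for year, concept in administrations:
        counts[year][concept] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {concept: n / total for concept, n in counter.items()}
    return shares


# Placeholder data: "drug_b" first appears in the second year.
records = [
    (2014, "drug_a"),
    (2014, "drug_a"),
    (2015, "drug_a"),
    (2015, "drug_b"),
]
shares = share_by_year(records)
```

Without harmonization, the same drug spelled several ways would be split across several keys, and its real share would be understated.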
Harmonizing data also opens up the opportunity to link it to other datasets and expand our overall understanding of the situation that the data reflects. There are many bits of information that, as a whole, represent a full patient journey. The next challenge is taking all of these low-level data points and writing algorithms to transform them into higher-level models that better represent how physicians and researchers think about cancer treatment.
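As a rough illustration of that transformation, the sketch below rolls individual administration events up into per-patient "episodes", where an episode is simply a run of consecutive administrations of the same drug. This is a deliberately simplified assumption made for the example; it is not how clinicians define a line of therapy, and the `Episode` type and event format are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Episode:
    concept: str
    start: date
    end: date
    administrations: int


def build_episodes(events):
    """Group low-level events into ordered episodes per patient.

    events: iterable of (patient_id, administration_date, concept) tuples,
    with concepts already harmonized.
    Returns {patient_id: [Episode, ...]} in chronological order.
    """
    journeys = {}
    # Sort by patient, then by date, so groupby sees contiguous runs.
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for patient_id, patient_events in groupby(ordered, key=lambda e: e[0]):
        episodes = []
        # Collapse consecutive administrations of the same concept.
        for concept, run in groupby(patient_events, key=lambda e: e[2]):
            run = list(run)
            episodes.append(Episode(concept, run[0][1], run[-1][1], len(run)))
        journeys[patient_id] = episodes
    return journeys


events = [
    ("p1", date(2020, 1, 1), "paclitaxel"),
    ("p1", date(2020, 1, 8), "paclitaxel"),
    ("p1", date(2020, 3, 1), "carboplatin"),
    ("p2", date(2020, 2, 1), "paclitaxel"),
]
journeys = build_episodes(events)
```

Here patient `p1` ends up with two episodes and `p2` with one. A real model would need clinical input on questions such as how long a gap ends an episode or how combination regimens are grouped.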
Engineers at Flatiron spend a lot of time working with oncologists and nurses who bring a deep understanding of the clinical side of oncology. With their help, we can build systems that continually construct this unified data model for a cancer patient and monitor how it changes over time. With this in place, our high-quality data on cancer treatment can help researchers identify which treatment sequences lead to the best outcomes for patients – much as a webmaster uses analytics to determine which landing page leads to a higher conversion rate.
We’ve barely scratched the surface of how modern data processing can change how we think about cancer treatment. It’s one of the things that excites me most about working at Flatiron, and I’m looking forward to seeing what other interesting applications we’ll find.