Electronic health records (EHRs) are an important source of real-world data (RWD) and a valuable resource for retrospective clinical research. However, critical clinical details in the EHR may be incompletely captured or buried in the free text of clinician notes. Natural language processing (NLP) offers an automated approach to extracting information from unstructured clinical data and improving the completeness of these details at scale. Eastern Cooperative Oncology Group performance status (ECOG PS), which indicates the general health status of a patient with cancer, is a critical variable for conducting outcomes research and determining cohort eligibility. Despite its importance, ECOG PS is frequently missing, particularly around the time of treatment initiation.
In this study, researchers from Huntsman Cancer Institute, NYU Grossman School of Medicine, and Flatiron Health developed a high-performing NLP algorithm to extract ECOG PS from unstructured EHR sources for patients starting new treatments across 21 distinct cancer types.
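To make the idea of extracting ECOG PS from free text concrete, here is a minimal rule-based sketch in Python. It is not the algorithm developed in the study, whose design is not described in this summary; it only illustrates the general task of pulling a 0-5 score out of a clinician note. The pattern, the function name extract_ecog_ps, and the sample note text are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption for illustration, not the study's method):
# the word "ECOG", an optional label such as "PS" or "performance status",
# an optional separator (":", "=", "is", "of"), then a single digit from 0 to 5.
ECOG_PATTERN = re.compile(
    r"\bECOG\b(?:\s+(?:PS|performance\s+status|score))?"
    r"\s*(?:[:=]|is|of)?\s*([0-5])\b",
    re.IGNORECASE,
)


def extract_ecog_ps(note_text: str) -> Optional[int]:
    """Return the first ECOG PS value (0-5) found in the note, or None."""
    match = ECOG_PATTERN.search(note_text)
    return int(match.group(1)) if match else None


# Made-up note text, used only to exercise the function.
sample_note = "Patient seen today. ECOG PS: 1. Tolerating therapy well."
print(extract_ecog_ps(sample_note))
```

A simple pattern like this would miss ranges, negations, and scores recorded on other scales, which is part of why a more capable NLP approach is needed for this task at scale.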
Why this matters
Natural language processing algorithms can help address critical challenges associated with RWD, including data missingness. They can also help realize a fundamental benefit of RWD: the ability to aggregate extensive longitudinal clinical information from large patient cohorts in support of high-quality clinical research. This advancement improves our ability to answer meaningful research questions and benefits healthcare providers, regulatory stakeholders, and, above all, patients.