A natural language processing algorithm to improve completeness of ECOG performance status in real-world data

Summary

Health authorities have highlighted data completeness in real-world data (RWD) from electronic health records (EHRs) as a key component of data integrity, and incomplete data as a common shortcoming of observational research. Natural language processing (NLP) can help address this missingness, particularly in oncology EHRs, where essential information is often recorded only in clinical notes. By automating extraction from those notes, NLP can improve the completeness of such details at scale.

In this study, researchers from Huntsman Cancer Institute, NYU Grossman School of Medicine, and Flatiron Health successfully developed a high-performing NLP algorithm to extract Eastern Cooperative Oncology Group Performance Status (ECOG PS) from unstructured EHR sources for patients starting new treatments across 21 distinct cancer types.

ECOG PS indicates the general health status of a patient with cancer. Access to this variable can enhance oncology research, help determine eligibility criteria in clinical trials, and facilitate decisions by both regulatory and health technology assessment bodies.

The study found that NLP can be an important tool for addressing missingness in real-world data. Adding NLP-derived values increased the availability of ECOG PS in the dataset from 60% to 73%. When compared with ECOG values captured in structured EHR fields, NLP-derived ECOG PS showed high accuracy (93%), sensitivity (88%), and positive predictive value (PPV; 88%).
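To make these metrics concrete, the sketch below shows one way completeness, accuracy, sensitivity, and PPV could be computed when NLP-derived values are compared with structured values. It is an illustration only, not the study's evaluation code: the function names, the one-vs-rest treatment of a single ECOG value, the rule of preferring the structured value when both exist, and the toy data are all assumptions introduced here.

```python
from typing import Dict, List, Optional, Sequence

ECOG = Optional[int]  # None marks a missing value


def completeness(values: Sequence[ECOG]) -> float:
    """Share of records with a non-missing ECOG PS value."""
    return sum(v is not None for v in values) / len(values)


def fill_missing(structured: Sequence[ECOG], nlp: Sequence[ECOG]) -> List[ECOG]:
    """Assumed rule: keep the structured value, fall back to the NLP value."""
    return [s if s is not None else n for s, n in zip(structured, nlp)]


def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def one_vs_rest_metrics(
    reference: Sequence[ECOG], predicted: Sequence[ECOG], positive: int
) -> Dict[str, float]:
    """Accuracy, sensitivity and PPV for one ECOG value versus all others.

    Only records where both sources have a value are compared.
    """
    pairs = [
        (r, p)
        for r, p in zip(reference, predicted)
        if r is not None and p is not None
    ]
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "ppv": _ratio(tp, tp + fp),
    }


# Toy data invented for illustration; these are not study values.
structured = [0, 1, None, 2, None, 1, 0, None, 1, None]
nlp = [0, 1, 1, 1, None, 1, 0, 2, 0, None]

if __name__ == "__main__":
    combined = fill_missing(structured, nlp)
    print("completeness, structured only:", completeness(structured))
    print("completeness, with NLP:", completeness(combined))
    print("metrics for ECOG PS 1:", one_vs_rest_metrics(structured, nlp, positive=1))
```

Sensitivity here is the share of structured "positive" records that the NLP output also labels positive, and PPV is the share of NLP "positive" labels that the structured field confirms. How the study aggregated across ECOG values is not stated in this summary, so the one-vs-rest view is only one plausible reading.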

Why this matters

Natural language processing algorithms can help tackle critical challenges associated with RWD, including data missingness. They also support a fundamental benefit of RWD: the ability to aggregate extensive longitudinal clinical information from large patient cohorts for high-quality clinical research. This improves our ability to answer meaningful research questions and benefits healthcare providers, health technology assessment (HTA) bodies, regulatory stakeholders, and, above all, patients.
