Considerations for the use of machine learning extracted real-world data to support evidence generation: A research-centric evaluation framework

Published

June 2022

Citation

Estevez M, Benedum CM, Jiang C, Cohen AB, Phadke S, Sarkar S, Bozkurt S. Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework. Cancers. 2022; 14(13):3063. https://doi.org/10.3390/cancers14133063

Our summary

When working with real-world data (RWD), key information, such as diagnosis dates, biomarker status, and therapies received, is available only as unstructured text in electronic health records (EHRs). Machine learning (ML) can be used to extract these unstructured data elements, but using ML-extracted data for research raises unique challenges, specifically how best to assess validity and generalizability to different cohorts of interest.

This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
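
As a rough illustration of the validity and generalizability question, one common approach is to compare ML-extracted values against a human-abstracted reference and report performance separately for each subgroup of interest. The sketch below is an assumption for illustration only and is not taken from the paper: the function name stratified_metrics, the record keys "ml" and "reference", the "age_group" field, and the toy data are all hypothetical.

```python
from collections import defaultdict


def stratified_metrics(records, group_key):
    """Sensitivity and PPV of an ML-extracted binary variable, per subgroup.

    Each record is a dict with hypothetical keys:
      "ml"        -> bool, value extracted by the model
      "reference" -> bool, value from human abstraction
      group_key   -> subgroup label (for example, an age band)
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for r in records:
        c = counts[r[group_key]]  # creates the group even if it has only true negatives
        if r["ml"] and r["reference"]:
            c["tp"] += 1
        elif r["ml"] and not r["reference"]:
            c["fp"] += 1
        elif not r["ml"] and r["reference"]:
            c["fn"] += 1

    results = {}
    for group, c in counts.items():
        ref_pos = c["tp"] + c["fn"]  # positives according to the reference
        ml_pos = c["tp"] + c["fp"]   # positives according to the model
        results[group] = {
            # None marks a metric that is undefined for this subgroup
            "sensitivity": c["tp"] / ref_pos if ref_pos else None,
            "ppv": c["tp"] / ml_pos if ml_pos else None,
        }
    return results


# Toy data, invented for illustration only
records = [
    {"ml": True, "reference": True, "age_group": "<65"},
    {"ml": True, "reference": False, "age_group": "<65"},
    {"ml": False, "reference": True, "age_group": "65+"},
    {"ml": True, "reference": True, "age_group": "65+"},
]

for group, metrics in stratified_metrics(records, "age_group").items():
    print(group, metrics)
```

A large gap in these metrics between subgroups would suggest that a variable validated on one cohort may not carry over to another, which is the kind of question overall accuracy alone does not answer.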

Why this matters

Using machine learning to extract unstructured data elements found in EHRs can unlock retrospective research at scale. This framework guides a multi-stakeholder evaluation that is transparent, goes beyond standard machine learning metrics, and focuses on RWD methodologic fundamentals to help determine whether ML-extracted variables are fit for research use.

More publications

Poster presented at: ISPOR US 2024; May 5-8, 2024; Atlanta, GA

April 2024

Using large language models to extract PD-L1 testing details from electronic health records

Cohen A, Waskom M, Adamson B, et al.

Poster presented at: ISPOR US 2024; May 5-8, 2024; Atlanta, GA

April 2024

Performance assessment of a disease-agnostic treatment sequencing heuristic for deriving line of therapy in a real-world, rare, multi-tumor cohort across tumor mutational burden status

Bouzit L, Burugupalli M, Snow T, et al.

BMJ Open

October 2023

Evaluation of US oncology electronic health record real-world data to reduce uncertainty in health technology appraisals: a retrospective cohort study

Mpofu P, Kent S, Jonsson P, et al.