Zeke Bergeron
September 01, 2022

The Gist

New studies indicate that using machine learning to analyze electronic medical records and clinician notes is showing promise for identifying bias in treatment assignments and for addressing incomplete medical records.

At AKASA, we are constantly exploring the intersection of machine learning (ML), artificial intelligence (AI), and healthcare. Given how quickly these fields evolve, we invest heavily in ongoing research to stay at the cutting edge.

Aside from our investment in research and publishing peer-reviewed articles, AKASA frequently hosts experts in the field to present to our technical teams.

As part of this ongoing learning series, we recently hosted Jiaming Zeng, Ph.D., a postdoctoral researcher in the Center of Computational Health at IBM. She presented her research on leveraging electronic medical records (EMRs) to improve decision-making in oncology by adapting causal inference, ML, and natural language processing (NLP).

Dr. Zeng’s research on leveraging electronic medical records to improve decision-making in oncology by adapting causal inference, machine learning, and natural language processing has great potential. At AKASA, we’re always looking to push the limits of what’s possible with patient data and machine learning, and Dr. Zeng shows one of many such possibilities with this incredible research.


~ Byung-Hak Kim, AI Technology Lead at AKASA

Mining Electronic Medical Records for Cancer Treatment Decisions

Cancer continues to be one of the leading causes of death worldwide. Since 1990, we have averted roughly 2.1 million deaths for men and one million deaths for women thanks to advancements in cancer treatments and improvements in early diagnosis.

However, the increased number of treatment options means clinicians face more difficult decisions when determining a course of treatment for a given patient, which increases demand for tools that assist in this decision-making. The gold standard for determining whether one treatment is better than another is the randomized controlled trial (RCT). Unfortunately, RCTs can be expensive and time-consuming.

Given the healthcare industry’s broad adoption of EMRs, the medical community has massive datasets that can present clinicians and researchers with large amounts of observational information. While some tools currently use this data, they fall short in some ways. Improving upon these tools is the focus of Jiaming’s research.

Comparative Effectiveness Challenges

When working to determine which treatment is better using observational data, clinicians are faced with two primary challenges: selection bias in treatment assignments and incomplete treatment records for patients.

Failure to adjust for selection bias can undermine the reliability of observational data in any application. Missing patient records reduce the size of the cohort that you can build to study. Addressing these challenges is critical in developing practical, reliable data sources upon which to base treatment decisions.

Identifying Selection Bias in Treatment Assignments

A critical step in determining an approach to cancer treatment is weighing the benefits of a treatment against its potential costs, such as deciding whether to treat a given type of cancer with surgery or radiation, or to monitor it further rather than taking more invasive measures.

This first study focused on using ML techniques to identify any biases present in current RCT data for multiple cancers, focusing primarily on bladder cancer.

Jiaming and her team built a set of covariates from the EMRs using a bag-of-words representation, then trained a treatment prediction model and a survival outcome model, and used Lasso to identify the covariates selected by both models. These intersections identified potential sources of bias, called confounders. The team then performed survival analysis by training a Cox proportional hazards (Cox PH) model on these intersections. Finally, they compared the survival analysis results against an established RCT.
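The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the notes, the `surgery` and `died` labels, the helper names, and the penalty `lam=0.05` are all invented for the example, and a plain linear Lasso (solved by coordinate descent) stands in for whatever treatment and survival models the study actually used. The final Cox PH step is omitted.

```python
import math

# Toy clinician notes (invented for illustration; not real patient data).
notes = [
    "patient elderly bladder smoker routine",
    "patient elderly bladder",
    "patient elderly smoker",
    "patient elderly routine",
    "patient bladder smoker",
    "patient bladder routine",
    "patient smoker routine",
    "patient",
]
surgery = [1, 1, 1, 1, 1, 1, 0, 0]  # hypothetical treatment assignment
died = [1, 0, 1, 0, 0, 0, 0, 0]     # hypothetical outcome


def bag_of_words(docs):
    """Return (vocabulary, binary word-presence matrix) for whitespace-split docs."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    rows = [[1.0 if word in doc.split() else 0.0 for word in vocab] for doc in docs]
    return vocab, rows


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def center_columns(rows):
    cols = [center(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def lasso(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                w[j] = 0.0  # constant column (e.g. "patient") carries no signal
            else:
                # Soft-thresholding: small correlations are set exactly to zero.
                w[j] = math.copysign(max(abs(rho) - lam, 0.0), rho) / scale
    return w


def selected_words(vocab, X, target, lam=0.05):
    """Words that keep a non-zero Lasso weight when predicting `target`."""
    weights = lasso(X, center(target), lam)
    return {word for word, w in zip(vocab, weights) if w != 0.0}


vocab, presence = bag_of_words(notes)
X = center_columns(presence)
treatment_words = selected_words(vocab, X, surgery)  # predicts who gets surgery
outcome_words = selected_words(vocab, X, died)       # predicts who dies
confounders = sorted(treatment_words & outcome_words)
print(confounders)  # prints ['elderly']
```

In this toy data, "bladder" predicts only the treatment and "smoker" predicts only the outcome, so neither is flagged. Only "elderly" is selected by both models, which is what marks it as a potential confounder.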

Traditional RCTs indicate that monitoring is much better than surgery for bladder cancer, as surgery has a higher mortality rate. However, the confounders identified during this study suggest that patients with bladder cancer or existing bladder issues are more likely to receive surgery, as bladder cancer doesn’t respond as well to radiation. Bladder cancer patients also tend to be older and have additional medical issues, hence a higher death rate.

In other words, because patients in worse overall health are more likely to receive surgery, this bias can confound the data behind the surgery vs. monitoring decision for bladder cancer.
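A small worked example shows how this kind of confounding can distort a comparison. The counts below are invented purely for illustration and do not come from the study: older patients are assumed to receive surgery more often and to die more often, while within each age group the two treatments are given identical death rates.

```python
# Invented counts for illustration only: (patients, deaths) by age group and treatment.
cohort = {
    "older": {"surgery": (80, 40), "monitoring": (20, 10)},
    "younger": {"surgery": (20, 2), "monitoring": (80, 8)},
}


def death_rate(patients, deaths):
    return deaths / patients


def pooled_rate(treatment):
    """Death rate for a treatment with all age groups pooled together."""
    patients = sum(arms[treatment][0] for arms in cohort.values())
    deaths = sum(arms[treatment][1] for arms in cohort.values())
    return death_rate(patients, deaths)


# Naive comparison that ignores age.
pooled_gap = pooled_rate("surgery") - pooled_rate("monitoring")

# Comparison within each age group (stratification on the confounder).
stratified_gaps = {
    age: death_rate(*arms["surgery"]) - death_rate(*arms["monitoring"])
    for age, arms in cohort.items()
}

print(f"pooled gap: {pooled_gap:.2f}")  # prints pooled gap: 0.24
print(stratified_gaps)                  # prints {'older': 0.0, 'younger': 0.0}
```

Pooled together, surgery appears to carry a death rate 24 percentage points higher than monitoring, yet within each age group the difference is zero. The apparent gap comes entirely from who was assigned to surgery.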

We have developed a method that offers a coherent and adaptable process to identify sources of bias from textual data. And although here we have applied it in a medical sense, we really believe that this can be easily applied to any other context where there’s textual data that you wish to use.


~ Jiaming Zeng, Ph.D., Researcher, Center of Computational Health at IBM

Using NLP to Identify Cancer Treatments and Address Incomplete Medical Records

Incomplete treatment records present a problem when building a large enough cohort to study.

Currently, the definitive resource for treatment analytics is the cancer registry. Registries record only the initial treatment decisions, and they can require hours of manual labor to be usable in large-scale studies. Even with this time expense, the records tend to be incomplete, especially when tracking the outcomes of these treatments.

To reduce the amount of manual effort and to close any gaps in the records themselves, Jiaming explored using NLP to analyze EMRs, specifically focusing on clinicians’ notes and using them to fill gaps in treatment outcome data.

They built three different data sources:

  1. One baseline set of data using treatment groups grouped by billing code
  2. A structured data source using a supervised model
  3. An unstructured data source built from clinical notes

The team then compared the results that all three models returned.

Similar results were produced when these models were applied to both prostate and esophageal cancers. The baseline data is serviceable, but not great. A slight improvement is observable when the structured data is included with the baseline.

However, the most significant improvement was observed when the structured, unstructured, and baseline data were all included. This indicates that the structured and the unstructured data are both valuable in filling in the gaps that the conventional systems of record can contain.
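The idea of layering data sources to close gaps can be illustrated with a rough sketch. Everything here is an assumption made for the example: the records and field names are invented, and a simple "first source that has an answer" lookup stands in for the supervised and NLP models the team actually trained. It only shows how adding sources can raise the share of patients whose treatment is identified correctly.

```python
# Invented records for illustration: each source either names a treatment
# or has a gap (None). "truth" is the hypothetical chart-reviewed answer.
records = [
    {"billing": "surgery", "structured": "surgery", "notes": "surgery", "truth": "surgery"},
    {"billing": None, "structured": "radiation", "notes": "radiation", "truth": "radiation"},
    {"billing": None, "structured": None, "notes": "monitoring", "truth": "monitoring"},
    {"billing": "radiation", "structured": None, "notes": None, "truth": "radiation"},
    {"billing": None, "structured": None, "notes": None, "truth": "surgery"},
]


def identify(record, sources):
    """Return the treatment from the first listed source that is not missing."""
    for source in sources:
        if record[source] is not None:
            return record[source]
    return None


def accuracy(sources):
    """Share of records whose identified treatment matches the truth."""
    correct = sum(identify(r, sources) == r["truth"] for r in records)
    return correct / len(records)


baseline = accuracy(["billing"])
with_structured = accuracy(["billing", "structured"])
all_sources = accuracy(["billing", "structured", "notes"])
print(baseline, with_structured, all_sources)  # prints 0.4 0.6 0.8
```

The numbers themselves carry no meaning beyond the toy data; the point is the ordering, which mirrors the pattern described above.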

Why This Research Matters

Providing clinicians with the highest-quality data to inform their treatment decisions for cancer patients is of utmost importance. Jiaming’s research in applying modern ML methods to improve the data upon which those decisions are made is a powerful illustration of how new data analysis techniques can directly result in a better quality of care for patients.

If you’re interested in using AI and ML to improve the healthcare industry, AKASA is always looking for top talent.

Join the AKASA Engineering Team Today

Zeke Bergeron

Zeke Bergeron is a senior technical program manager at AKASA. He has worked at companies ranging from small startups to large banks, in roles including technical writing, product operations, knowledge management, and Agile program management. Bergeron works closely with engineering to accelerate their ability to deliver and helps guide the organization towards a healthy and efficient approach to internal knowledge management.
