Zeke Bergeron
August 25, 2022

The Gist

New studies indicate that using machine learning to analyze electronic medical records and clinician notes is showing promise for identifying bias in treatment assignments and for addressing incomplete medical records.

At AKASA, we are constantly exploring the intersection of machine learning (ML), artificial intelligence (AI), and the healthcare and human health industry. Given how quickly these fields evolve and intersect, we invest heavily in ongoing research to stay at the cutting edge.

Aside from investing in research and publishing peer-reviewed articles, AKASA frequently hosts experts in the field to present to our technical teams.

As part of this ongoing learning series, we recently hosted Jiaming Zeng, Ph.D., a postdoctoral researcher in the Center of Computational Health at IBM. She presented her research on leveraging electronic medical records (EMRs) to improve decision-making in oncology by adapting causal inference, ML, and natural language processing (NLP).

Dr. Zeng’s research on leveraging electronic medical records to improve decision-making in oncology by adapting causal inference, machine learning, and natural language processing has great potential. At AKASA, we’re always looking to push the limits of what’s possible with patient data and machine learning, and Dr. Zeng shows one of many such possibilities with this incredible research.


~ Byung-Hak Kim, AI Technology Lead at AKASA

Mining Electronic Medical Records for Cancer Treatment Decisions

Cancer continues to be one of the leading causes of death worldwide. Since 1990, advances in cancer treatment and improvements in early diagnosis have averted roughly 2.1 million deaths among men and 1 million deaths among women.

However, the growing number of treatment options makes it harder for clinicians to choose a course of treatment for a given patient, which has increased demand for tools that assist with this decision. The gold standard for determining whether one treatment is better than another is the randomized controlled trial (RCT). Unfortunately, RCTs can be expensive and time-consuming.

Because the healthcare industry has broadly adopted EMRs, the medical community now has massive observational datasets available to clinicians and researchers. Some tools already use this data, but they fall short in some ways, and improving on them is the focus of Jiaming’s research.

Comparative Effectiveness Challenges

When working to determine which treatment is better using observational data, clinicians are faced with two primary challenges: selection bias in treatment assignments and incomplete treatment records for patients.

Failure to adjust for selection bias can undermine the reliability of observational data in any application, and missing patient records shrink the cohort available for study. Addressing these challenges is critical to developing practical, reliable data sources on which to base treatment decisions.

Identifying Selection Bias in Treatment Assignments

A critical step in choosing an approach to cancer treatment is weighing the benefits of a treatment against its potential costs: for example, deciding whether to treat a given type of cancer with surgery or radiation, or to keep monitoring it rather than take more invasive measures.

The first study used ML techniques to identify sources of bias in observational EMR data for multiple cancers, focusing primarily on bladder cancer.

Jiaming and her team built a set of covariates from the EMRs using a bag-of-words representation, then trained a treatment prediction model and a survival outcome model and used Lasso to find the covariates that both models selected. These shared covariates, which are associated with both the treatment a patient received and the patient’s outcome, are potential sources of bias called confounders. The team then performed survival analysis by training a Cox proportional hazards (Cox PH) model that accounted for these confounders. Finally, they compared the survival analysis results against an established RCT.
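The covariate-selection idea described above can be sketched in pure Python under several simplifying assumptions. Notes are tokenized on whitespace. Both models are L1-penalized least-squares (Lasso) regressions fitted by coordinate descent. The outcome is a binary death indicator rather than a survival time, and the Cox PH step is omitted. The post does not say which model families the team used, so `bag_of_words`, `lasso`, and `potential_confounders` are hypothetical helpers, and the toy cohort is invented.

```python
from collections import Counter
from itertools import product


def bag_of_words(notes):
    """Turn each note into word counts over a shared, sorted vocabulary."""
    tokenized = [note.lower().split() for note in notes]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([float(counts[word]) for word in vocab])
    return rows, vocab


def lasso(X, y, alpha, n_iter=100):
    """Fit min (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Columns and target are centered first, so no intercept is needed.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual when every coefficient is 0
    b = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:  # constant column carries no information
                continue
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            # Soft-thresholding: weak associations are set exactly to zero.
            if rho > alpha:
                new = (rho - alpha) / col_sq[j]
            elif rho < -alpha:
                new = (rho + alpha) / col_sq[j]
            else:
                new = 0.0
            for i in range(n):
                resid[i] -= Xc[i][j] * (new - b[j])
            b[j] = new
    return b


def selected_words(vocab, coefs):
    """Words whose Lasso coefficient is nonzero."""
    return {word for word, coef in zip(vocab, coefs) if coef != 0.0}


def potential_confounders(notes, treated, died, alpha=0.05):
    """Words selected by BOTH the treatment model and the outcome model."""
    X, vocab = bag_of_words(notes)
    treatment_words = selected_words(vocab, lasso(X, treated, alpha))
    outcome_words = selected_words(vocab, lasso(X, died, alpha))
    return treatment_words & outcome_words


# Invented toy cohort: one note per patient, all 8 combinations of 3 findings.
notes, treated, died = [], [], []
for hem, ref, dys in product([0, 1], repeat=3):
    words = ["note"] + ["hematuria"] * hem + ["referral"] * ref + ["dyspnea"] * dys
    notes.append(" ".join(words))
    treated.append(float(hem or ref))  # toy rule: surgery if hematuria or referral
    died.append(float(hem or dys))     # toy rule: death if hematuria or dyspnea

print(potential_confounders(notes, treated, died))  # {'hematuria'}
```

In the toy cohort, "referral" predicts only treatment and "dyspnea" predicts only death, so only "hematuria" survives the intersection. That is the pattern the intersection step is designed to flag.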

Traditional RCTs indicate that monitoring is much better than surgery for bladder cancer, as surgery carries a higher mortality rate. However, the confounders identified in this study suggest that patients with bladder cancer or existing bladder issues are more likely to receive surgery, because bladder cancer doesn’t respond as well to radiation. These patients also tend to be older and to have additional medical issues, which gives them a higher death rate regardless of treatment.
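A toy example with invented counts shows how this kind of confounding distorts a naive comparison. These numbers are made up and are not the study’s results. Suppose patients with bladder issues are both more likely to receive surgery and more likely to die. Within each stratum, the two treatments then have identical death rates, yet the crude rates differ. The sketch removes the gap with direct standardization, a classic adjustment chosen here for simplicity. The study itself used a Cox PH model.

```python
# Invented counts: (stratum, treatment) -> (patients, deaths).
COUNTS = {
    ("bladder_issues", "surgery"): (80, 24),
    ("bladder_issues", "monitoring"): (20, 6),
    ("no_bladder_issues", "surgery"): (20, 2),
    ("no_bladder_issues", "monitoring"): (80, 8),
}


def crude_death_rate(treatment):
    """Deaths / patients for one treatment, ignoring the confounder."""
    patients = deaths = 0
    for (_, tx), (n, d) in COUNTS.items():
        if tx == treatment:
            patients += n
            deaths += d
    return deaths / patients


def standardized_death_rate(treatment):
    """Weight each stratum's death rate by that stratum's share of all patients."""
    stratum_sizes = {}
    for (stratum, _), (n, _) in COUNTS.items():
        stratum_sizes[stratum] = stratum_sizes.get(stratum, 0) + n
    total = sum(stratum_sizes.values())
    rate = 0.0
    for stratum in sorted(stratum_sizes):
        n, d = COUNTS[(stratum, treatment)]
        rate += stratum_sizes[stratum] / total * (d / n)
    return rate


for tx in ("surgery", "monitoring"):
    print(f"{tx}: crude {crude_death_rate(tx):.2f}, "
          f"standardized {standardized_death_rate(tx):.2f}")
# surgery: crude 0.26, standardized 0.20
# monitoring: crude 0.14, standardized 0.20
```

Because 80% of surgery patients come from the higher-risk stratum, surgery looks worse overall even though it performs the same as monitoring within each stratum.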

Further analysis suggests that this tendency to assign surgery to patients in worse overall health can confound the data behind the surgery-vs.-monitoring decision for bladder cancer.
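For reference, confounders enter a Cox PH model, such as the one in the pipeline described above, as additional covariates. The post does not give the team’s exact specification, so the notation here is ours. $A_i = 1$ if patient $i$ received surgery and $0$ if monitored. $\mathbf{x}_i$ holds the confounders selected from the notes. $\delta_i = 1$ if a death was observed at time $t_i$. $R(t_i)$ is the set of patients still at risk at $t_i$.

```latex
h(t \mid A_i, \mathbf{x}_i) = h_0(t)\,\exp\!\left(\beta A_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i\right)

\mathrm{HR}_{\text{surgery vs. monitoring}}
  = \frac{h(t \mid A = 1, \mathbf{x})}{h(t \mid A = 0, \mathbf{x})} = e^{\beta}

L(\beta, \boldsymbol{\gamma}) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta A_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta A_j + \boldsymbol{\gamma}^{\top}\mathbf{x}_j\right)}
```

The baseline hazard $h_0(t)$ and the confounder term cancel in the ratio. As a result, $e^{\beta}$ compares surgery with monitoring for patients with the same confounder values. This is the comparison that the bias described above would otherwise distort. The coefficients are estimated by maximizing the partial likelihood $L$.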

We have developed a method that offers a coherent and adaptable process to identify sources of bias from textual data. And although here we have applied it in a medical sense, we really believe that this can be easily applied to any other context where there’s textual data that you wish to use.


~ Jiaming Zeng, Ph.D., Researcher, Center of Computational Health at IBM

Using NLP to Identify Cancer Treatments and Address Incomplete Medical Records

Incomplete treatment records present a problem when building a large enough cohort to study.

Currently, cancer registries are the definitive resource for treatment analytics. However, registries record only the initial treatment decision, and making them usable for large-scale studies can require hours of manual labor. Even with this effort, the records tend to be incomplete, especially for tracking treatment outcomes.

To reduce the amount of manual effort and to close any gaps in the records themselves, Jiaming explored using NLP to analyze EMRs, specifically focusing on clinicians’ notes and using them to fill gaps in treatment outcome data.
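The post does not describe the NLP method the team applied to clinicians’ notes. As a hedged stand-in, the sketch below uses keyword matching with a crude negation check, in the spirit of rule-based approaches such as NegEx, to flag which treatment categories a note mentions. The lexicon, the negation cues, and `treatments_in_note` are hypothetical.

```python
import re

# Hypothetical lexicon; a real system would need a far richer vocabulary.
TREATMENT_TERMS = {
    "surgery": ["prostatectomy", "esophagectomy", "resection"],
    "radiation": ["radiation therapy", "radiotherapy", "imrt"],
    "chemotherapy": ["chemotherapy", "cisplatin", "carboplatin"],
}
# Crude negation cues, checked only in the text that precedes a match.
NEGATION = re.compile(r"\b(no|denies|declined|not a candidate for)\b")


def treatments_in_note(note):
    """Return the treatment categories mentioned, un-negated, in one note."""
    found = set()
    for sentence in re.split(r"[.;\n]", note.lower()):
        for treatment, terms in TREATMENT_TERMS.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if match and not NEGATION.search(sentence[: match.start()]):
                    found.add(treatment)
    return found


note = "Underwent robotic prostatectomy in 2019. Denies chemotherapy. Started IMRT; tolerated well."
print(sorted(treatments_in_note(note)))  # ['radiation', 'surgery']
```

A negation cue anywhere earlier in a sentence suppresses every later match. For example, "declined chemotherapy and started radiotherapy" would miss the radiation. Brittleness like this is one reason learned models are generally preferred over keyword rules for clinical text.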

The team built three different data sources:

  1. A baseline, with patients assigned to treatment groups by billing code
  2. A structured data source, using a supervised model
  3. An unstructured data source, using clinical notes

The team then compared the results from each data source, both alone and in combination.

The comparison yielded similar results for both prostate and esophageal cancers. The baseline data alone is serviceable, but not great, and adding the structured data to the baseline yields a slight improvement.

However, the most significant improvement came from combining the baseline, structured, and unstructured data. This indicates that structured and unstructured data are both valuable for filling gaps in conventional systems of record.
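The post does not say how the three sources were merged. One simple, hypothetical rule is to take the union of the treatments each source finds for a patient. The merged result can then be scored by recall against a reference standard such as manual chart review. The helper names and toy data below are invented to show the mechanics only. They do not reproduce the study’s numbers.

```python
def pairs(source):
    """Flatten {patient_id: {treatments}} into (patient_id, treatment) pairs."""
    return {(pid, tx) for pid, txs in source.items() for tx in txs}


def combine(*sources):
    """Union of treatment pairs found by any source (hypothetical merge rule)."""
    combined = set()
    for source in sources:
        combined |= pairs(source)
    return combined


def recall(found, gold):
    """Fraction of reference (patient, treatment) pairs that were found."""
    return len(found & gold) / len(gold)


# Invented toy data for illustration only.
gold = pairs({"p1": {"surgery"}, "p2": {"radiation", "chemotherapy"}, "p3": {"radiation"}})
billing = {"p1": {"surgery"}, "p2": {"radiation"}}
structured = {"p2": {"radiation"}, "p3": {"radiation"}}
from_notes = {"p2": {"chemotherapy"}, "p3": {"radiation"}}

for name, found in [
    ("billing", combine(billing)),
    ("billing + structured", combine(billing, structured)),
    ("all three", combine(billing, structured, from_notes)),
]:
    print(f"{name}: recall = {recall(found, gold):.2f}")
# billing: recall = 0.50
# billing + structured: recall = 0.75
# all three: recall = 1.00
```

A union can only raise recall, but it can also add false positives. In practice, precision should be tracked alongside recall.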

Why This Research Matters

Providing clinicians with the highest-quality data to inform treatment decisions for cancer patients is of the utmost importance. Jiaming’s research applying modern ML methods to improve the data behind these decisions is a powerful illustration of how new data-analysis techniques can directly improve the quality of patient care.

If you’re interested in using AI and ML to improve the healthcare industry, AKASA is always looking for top talent.


Zeke Bergeron

Zeke Bergeron is a senior technical program manager at AKASA. He has worked at companies ranging from small startups to large banks, in roles including technical writing, product operations, knowledge management, and Agile program management. Bergeron works closely with engineering teams to accelerate their ability to deliver and helps guide the organization toward a healthy and efficient approach to internal knowledge management.

