Omar Badawi, PharmD, MPH, FCCM, chief of the Division of Data Sciences at the U.S. Telemedicine and Advanced Technology Research Center, and his team developed and tested the Critical Care Outcomes Prediction Model (CCOPM) and published their findings in the April 2025 issue of Critical Care Medicine.2 This new model leverages a data-driven deep learning technique called DeepHit3 to learn time-to-event distributions for events with competing risks.
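To make the DeepHit idea concrete, here is a minimal illustrative sketch, not the authors' code: a DeepHit-style network emits a single softmax over every (event, time-bin) pair, from which cumulative incidence curves and an expected LOS can be read off. The two competing events mirror the study's outcomes (discharge alive vs. death), but all numbers below are invented.

```python
import numpy as np

# Illustrative only: stand-in "network output" logits for one patient.
# A real DeepHit model would produce these from the patient's predictors.
rng = np.random.default_rng(0)
n_events, n_bins = 2, 14                # discharge alive vs. death; 14 daily bins
logits = rng.normal(size=(n_events, n_bins))

# One softmax over all (event, time-bin) pairs: a joint distribution
# P(event = k, time = t) that sums to 1 across both events and all bins
probs = np.exp(logits) / np.exp(logits).sum()

# Cumulative incidence function per event: P(event k has occurred by day t)
cif = probs.cumsum(axis=1)

# Expected LOS, marginalizing over which event ends the stay (bin t -> day t+1)
days = np.arange(1, n_bins + 1)
expected_los = float((probs.sum(axis=0) * days).sum())

print(f"P(discharged alive by day 3): {cif[0, 2]:.3f}")
print(f"Expected LOS: {expected_los:.2f} days")
```

Because the distribution is learned jointly over both events and all time bins, no proportional hazards assumption is needed, which is the flexibility the authors highlight.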
Most patients stay in an ICU for only a couple of days, Dr. Badawi explained. But those who are very sick and survive tend to stay in the ICU much longer. Historical data show that hospitals that do well handling patients with complex, severe conditions tend to receive more of them, but as survival improves for patients in these hospitals, their longer stays can penalize the hospital’s LOS metrics. An ICU in which the same patient might have been less likely to survive would likely show a shorter LOS.
“That small group of very sick patients may well be outliers, but those will not be captured well in a typical generalized model,” Dr. Badawi said. “This isn’t a completely bimodal distribution. There’s a spectrum there, but what you want is to have models and approaches that will have fair estimates of outcomes across that spectrum of risk.”
LOS is a widely used hospital metric, particularly for ICUs. “What you want to see is that patients don’t stay in the ICU any longer than they should. Same with the hospital. We want them to be treated safely, effectively, have a good outcome, and then not linger in the ICU or hospital,” Dr. Badawi said.
Determining appropriate LOS goals is difficult because each patient is different, and external factors also affect ICU LOS, such as bed availability throughout the hospital. Practices and documentation vary too. For example, a patient may stay in an ICU bed but receive step-down care, complicating the definition of ICU LOS.
The Findings
The authors used the large Philips Electronic ICU (eICU) Research Institute database, which contains data from more than 600,000 patients from 329 ICUs. The machine learning model used 27 predictors for two outcomes: total LOS and patient discharge status (alive or dead).
“By using a deep learning framework, rather than traditional competing risk models, which often require an unrealistic assumption of proportional hazards, LOS could be more accurately estimated. Implementing this, we were able to show that there is a major difference when you account for the trajectory of patients based on their survival status,” said Dr. Badawi.
The authors compared their novel LOS models against three others: an ordinary least-squares model, a light gradient-boosting machine regressor, and the Acute Physiology and Chronic Health Evaluation (APACHE) IVb model. Across all ICU and hospital LOS subgroups stratified by survival status, the CCOPM exhibited lower mean absolute error and a higher concordance index and coefficient of determination; the comparators often showed negligible coefficients of determination in nonsurvivors, indicating very little value in these subgroups.
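The three comparison metrics can be computed as in the generic sketch below, run on invented LOS values rather than the study's data; the simplified concordance index here ignores censoring and ties.

```python
import numpy as np

# Made-up ICU LOS values in days; los_pred stands in for any model's output
los_true = np.array([1.0, 2.0, 2.5, 4.0, 7.0, 12.0])
los_pred = np.array([1.5, 2.6, 1.8, 3.5, 8.0, 10.0])

# Mean absolute error: average size of the LOS prediction miss, in days
mae = np.abs(los_true - los_pred).mean()

# Coefficient of determination (R^2): variance explained vs. a mean-only model
ss_res = ((los_true - los_pred) ** 2).sum()
ss_tot = ((los_true - los_true.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# Concordance index: fraction of patient pairs whose predicted LOS ordering
# matches the observed ordering (simplified: no censoring, ties skipped)
concordant, comparable = 0, 0
for i in range(len(los_true)):
    for j in range(i + 1, len(los_true)):
        if los_true[i] != los_true[j]:
            comparable += 1
            if (los_pred[i] - los_pred[j]) * (los_true[i] - los_true[j]) > 0:
                concordant += 1
c_index = concordant / comparable

print(f"MAE = {mae:.2f} days, R^2 = {r2:.3f}, C-index = {c_index:.3f}")
```

Lower MAE and higher R^2 and C-index, computed within survivor and nonsurvivor subgroups as the authors did, are what distinguished the CCOPM from the comparators.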
The CCOPM’s superior performance is a promising step toward more equitable ICU benchmarking, especially across ICUs with varying risk profiles. Future research on quality outcomes and benchmark reporting that builds on this study can lead to improved care, Dr. Badawi said. “Being able to uncover potential biases in the data is really important.”
A study by Liu et al found that bias in chart documentation could affect outcome metrics, degrading the value of benchmarking.4 For example, a lung infection could be classified as pneumonia or pulmonary sepsis, depending on the physician, hospital culture, spectrum of disease, and speed of practice changes with new literature. How the condition is coded will affect the benchmarking.
“There’s a gray area where some patients may be allocated into either one of those, and this can lead to significant bias when different ICUs preferentially document or code one way,” he said. “It’s important, when benchmarking for quality, that we minimize the potential impact of this bias by relying more on physiology, even at the cost of optimal accuracy.”
What’s Next
Machine learning offers opportunities to help clinicians analyze and better understand data. “There’s a lot of power in machine learning and AI in health care,” Dr. Badawi said. “As AI and machine learning evolve, that gives us new opportunities to learn from the data and have more accurate, meaningful models.”
And it’s not one and done, he said. These should be “living, breathing models.” Researchers and clinicians need to continually observe, improve the models, and adapt to changing conditions. “Some of the important steps that need to become more standard are routine recalibration and monitoring of these models for data drift. We need to look at how outcomes and data inputs are evolving over time, keep models up to date, and have local validation and potentially local recalibration, if needed, at different sites.”
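The drift monitoring Dr. Badawi describes can be sketched with a population stability index (PSI) check, a common technique though not one the study specifies; the acuity score, the two distributions, and the 0.2 threshold below are all illustrative assumptions.

```python
import numpy as np

# Invented data: a model input (an acuity score) at training time vs. recently.
# In practice these would come from the site's historical and live feeds.
rng = np.random.default_rng(42)
baseline = rng.normal(50, 10, size=5000)   # training-era values
recent = rng.normal(56, 13, size=5000)     # shifted recent values

# Bin recent data against deciles of the baseline distribution
edges = np.quantile(baseline, np.linspace(0, 1, 11))
edges[0], edges[-1] = -np.inf, np.inf      # catch out-of-range values
base_frac = np.histogram(baseline, bins=edges)[0] / baseline.size
recent_frac = np.histogram(recent, bins=edges)[0] / recent.size

# PSI: sum over bins of (p_recent - p_base) * ln(p_recent / p_base)
psi = ((recent_frac - base_frac) * np.log(recent_frac / base_frac)).sum()

# A common rule of thumb: PSI above roughly 0.2 suggests enough drift to
# warrant investigation and possible recalibration (thresholds vary by site)
print(f"PSI = {psi:.3f} -> {'investigate drift' if psi > 0.2 else 'stable'}")
```

Running such checks on each model input, per site, is one concrete way to operationalize the "living, breathing models" idea.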
Diversifying teams will also improve the models. Collaboration among bedside clinicians, data scientists, researchers, and hospital administrators brings different perspectives and lets team members challenge one another’s assumptions. As populations, diseases, and treatments evolve, so too should the models. Data will be influenced by flu season, a local outbreak, a pandemic, changes in the literature, and external forces such as the economy and insurance practices. “If you’re not continuously evaluating and looking at those things and recalibrating, then you risk these models becoming very stale and having less utility over time.”
References
1. Moran JL, Duke GJ, Santamaria JD, et al; Australian & New Zealand Intensive Care Society (ANZICS) Centre for Outcomes & Resource Evaluation (CORE). Modelling of intensive care unit (ICU) length of stay as a quality measure: a problematic exercise. BMC Med Res Methodol. 2023 Sep 14;23(1):207.
2. Brochini L, Liu X, Atallah L, et al. Prediction of intensive care length of stay for surviving and nonsurviving patients using deep learning. Crit Care Med. 2025 Apr;53(4):e794-e804.
3. Lee C, Zame W, Yoon J, van der Schaar M. DeepHit: a deep learning approach to survival analysis with competing risks. Proc AAAI Conf Artif Intell. 2018;32(1).
4. Liu X, Armaignac DL, Becker C, et al. Improving ICU risk predictive models through automation designed for resilience against documentation bias. Crit Care Med. 2023 Mar 1;51(3):376-387.