Discovery Datathon

Create pragmatic, data-driven models applicable to the care of critically ill patients.

2023 SCCM Datathon

2023 Theme: Data Science in Critical Care
2023 Subtheme: COVID-19 and Health Equity

What is a datathon? 

A datathon is a collaborative event in which science and data science techniques are combined to address real-world problems using existing datasets. In this datathon, de-identified critical care electronic health record (EHR) datasets will be used to create pragmatic data-driven models applicable to the care of critically ill patients. The goal of this datathon is to identify projects that have the potential to lead to publication and eventually to improve the care of critically ill patients. 

Which databases are being used in this datathon? 

  • SCCM Discovery VIRUS registry
    • The SCCM Discovery Viral Infection and Respiratory Illness Universal Study (VIRUS) is a prospective, cross-sectional, observational study and registry of all eligible adult and pediatric patients who are admitted to a hospital. The dataset developed from VIRUS comprises de-identified data from 306 participant sites across 28 countries. This is the first time the VIRUS dataset will be made available for analysis during a datathon. 
  • eICU-CRD database 
    • The original eICU Collaborative Research Database (eICU-CRD) is a large multicenter critical care database made available by Philips Healthcare in partnership with the MIT Laboratory for Computational Physiology. The eICU-CRD holds data associated with over 200,000 patient stays, providing a large sample size for research studies. The updated eICU-CRD contains recent COVID-19 data.
  • MIMIC-III dataset 
    • The Medical Information Mart for Intensive Care III (MIMIC-III) is a large, freely available database comprising de-identified health-related data associated with over 40,000 patients who stayed in critical care units at Beth Israel Deaconess Medical Center between 2001 and 2012. 
  • MIMIC-IV dataset 
    • MIMIC-IV is an updated, improved version of MIMIC-III that incorporates contemporary data. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. 

What is the workflow like in a datathon? 

The datathon occurs over a period of two days, with the following approximate agenda. 

  1. Team introductions
  2. Determination of the scope of the research question
  3. Database selection
  4. Identification of population variables
  5. Identification of inclusion and exclusion criteria
  6. Identification of outcome variables
  7. Clinical and statistical analyses
  8. Presentation of results 

Subject matter experts will help guide the teams in accomplishing each step. The complexity of the process will increase if data from two or more databases are combined. 
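
As a rough illustration of steps 4 through 6, the sketch below applies inclusion and exclusion criteria to a handful of records and then computes one outcome variable. Everything in it is an assumption made for the example: the records are invented, and the field names (age, icu_hours, died_in_hospital) and thresholds are hypothetical rather than taken from the VIRUS, eICU-CRD, or MIMIC schemas.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    # Hypothetical fields for illustration only; real databases use their own schemas.
    patient_id: int
    age: int
    icu_hours: float
    died_in_hospital: bool


def select_cohort(stays, min_age=18, min_icu_hours=24.0):
    """Keep adults (inclusion) and drop very short ICU stays (exclusion)."""
    return [s for s in stays if s.age >= min_age and s.icu_hours >= min_icu_hours]


def mortality_rate(cohort):
    """Outcome variable: fraction of the cohort that died in hospital."""
    if not cohort:
        return None  # undefined for an empty cohort
    return sum(s.died_in_hospital for s in cohort) / len(cohort)


# Synthetic records, invented for this example.
stays = [
    Stay(1, 67, 80.0, True),
    Stay(2, 15, 50.0, False),   # excluded: under 18
    Stay(3, 54, 6.0, False),    # excluded: ICU stay under 24 hours
    Stay(4, 71, 120.0, False),
]

cohort = select_cohort(stays)
print(len(cohort), mortality_rate(cohort))
```

In practice, teams would define these criteria with the subject matter experts and run them against the selected database rather than an in-memory list.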

Who can participate in the datathon? 

Critical care clinicians, researchers, and data scientists are encouraged to participate. Clinicians need not be primarily researchers. Data scientists need not work in the field of critical care. Participants will be assigned to teams with a mix of clinicians, researchers, and data scientists to maximize collaboration and success.

Why is SCCM supporting an annual datathon? 

Machine learning algorithms using EHR data play an increasingly important role in healthcare by providing predictive models for prognosis, quality, and patient safety. Interest in data science-related content has recently been growing in the critical care community. Many clinicians have access to large datasets in their organizations, but few have been exposed to data analytics techniques. Data scientists have the analytic techniques to process large amounts of data, but few have the medical knowledge and experience to identify promising avenues of research. The datathon demonstrates the multiprofessional approach required to successfully analyze large datasets.

Can teams publish their findings? 

The enthusiastic atmosphere of the datathon and the unique access to data and data science expertise often lead to publications and ongoing collaborations. Although it is not possible to write a manuscript during the datathon, many teams continue their collaboration afterward and successfully publish their work in Critical Care Medicine and other peer-reviewed journals.

Are awards available for datathon participation? 

The top three teams will have the opportunity to send a representative to present their results at the 2024 Critical Care Congress, to be held January 21-24, 2024, at the Phoenix Convention Center in Phoenix, Arizona, USA.

An-Kwok Ian Wong, MD, PhD
Critical Care Specialist and Pulmonologist
Duke University Medical Center
Durham, North Carolina, USA
Nurse Scientist and Clinical Nurse Specialist 
Stanford Health Care 
Stanford, California, USA