Algorithms.
Last updated November, 2023
Many algorithms are in development within MIDRC. Please see below for a list of algorithms/models/code that are publicly available at the moment. Many more to come!
(CRP=Collaborative Research Project, TDP=Technology Development Project)
-
This algorithm uses multi-dimensional stratified sampling, in which several variables of interest (for example, demographic variables such as race and gender, or the imaging acquisition system) are used sequentially to divide the data into numerous strata, each representing a unique combination of variable values. Within each resulting stratum, patients are assigned to a specific dataset. MIDRC developed this algorithm and uses it to separate data into either the open data commons or the sequestered data commons. As shared here, however, it can be generalized to other stratified sampling needs, e.g., dividing your own dataset into two separate sets: one for training and one for testing.
Code
COVID-specific: https://github.com/MIDRC/Stratified_Sampling
General: https://github.com/MIDRC/Generalized_Stratified_Sampling
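The stratified assignment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in the MIDRC repositories: the function name `stratified_split`, the record fields (`race`, `sex`, `scanner`), the toy data, and the 80/20 fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(patients, keys, fraction=0.8, seed=0):
    """Split patient records into two sets, stratum by stratum.

    patients: list of dicts, one per patient.
    keys: variables whose value combinations define the strata.
    fraction: share of each stratum assigned to the first set.
    """
    rng = random.Random(seed)

    # Group patients by their unique combination of variable values.
    strata = defaultdict(list)
    for patient in patients:
        strata[tuple(patient[k] for k in keys)].append(patient)

    first, second = [], []
    for members in strata.values():
        rng.shuffle(members)  # random assignment within the stratum
        cut = round(len(members) * fraction)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


# Hypothetical toy data: 2 x 2 x 2 = 8 strata with 10 patients each.
combos = [
    (race, sex, scanner)
    for race in ("A", "B")
    for sex in ("F", "M")
    for scanner in ("X", "Y")
    for _ in range(10)
]
patients = [
    {"id": i, "race": race, "sex": sex, "scanner": scanner}
    for i, (race, sex, scanner) in enumerate(combos)
]

train, test = stratified_split(
    patients, keys=("race", "sex", "scanner"), fraction=0.8, seed=42
)
```

Because the split is made inside each stratum, both sets keep roughly the same mix of variable combinations as the full dataset. Very small strata cannot be split exactly at the requested fraction, so real data may need a rule for handling them.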
-
Task-based sampling begins with the identification of cases relevant to a specific task and of target population characteristics (such as age range, COVID status, and imaging modality). Then, optimized quota sampling is conducted by randomly sampling cases until the maximum category margin (Baughan et al. 2022) is less than a pre-specified value. Reference: N. Baughan et al., “Task-Based Sampling of the MIDRC Sequestered Data Commons for Algorithm Performance Evaluation,” presented at the Annual Meeting of the American Association of Physicists in Medicine, 2022, E257–E258.
Code:
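The "sample until the margin is small enough" loop can be sketched as follows. This is an illustration only, not the MIDRC implementation. In particular, the definition of the maximum category margin used here (the largest absolute gap between a category's proportion in the sample and its target proportion) is an assumption and may differ from the definition in Baughan et al. The names `max_category_margin` and `quota_sample`, the fields `covid` and `modality`, and the toy pool are hypothetical, and the pool is assumed to have already been filtered to task-relevant cases.

```python
import random
from collections import Counter


def max_category_margin(sample, targets):
    """Largest absolute gap between sample and target proportions.

    ASSUMPTION: illustrative definition, not necessarily the one
    used by Baughan et al.
    targets: {variable: {category: target proportion}}
    """
    worst = 0.0
    for variable, distribution in targets.items():
        counts = Counter(case[variable] for case in sample)
        for category, target in distribution.items():
            gap = abs(counts[category] / len(sample) - target)
            worst = max(worst, gap)
    return worst


def quota_sample(pool, targets, n, tol, max_tries=10_000, seed=0):
    """Redraw random samples of size n until the margin is below tol."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        sample = rng.sample(pool, n)
        if max_category_margin(sample, targets) < tol:
            return sample
    raise RuntimeError("no sample met the tolerance within max_tries")


# Hypothetical pool of 400 task-relevant cases, balanced across both variables.
pool = [
    {
        "id": i,
        "covid": ("positive", "negative")[i % 2],
        "modality": ("CR", "DX")[(i // 2) % 2],
    }
    for i in range(400)
]
targets = {
    "covid": {"positive": 0.5, "negative": 0.5},
    "modality": {"CR": 0.5, "DX": 0.5},
}

sample = quota_sample(pool, targets, n=100, tol=0.05, seed=1)
```

Redrawing whole samples is the simplest way to express the stopping rule. A tighter tolerance or a pool that is far from the target distribution would need many more draws, which is why the sketch caps the number of tries.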
-
A document-level classifier for COVID-19 on radiology reports to help find COVID cases, as well as create large numbers of labels for computer vision models.
Code: https://huggingface.co/StanfordAIMI/covid-radbert
Publication: https://pubmed.ncbi.nlm.nih.gov/36323915/
-
An automated de-identification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates, an approach known as "hiding in plain sight." When compared on all test sets of the i2b2 2014 data, the model outperformed the other de-identifiers as well as human labelers. It enables accurate and automatic de-identification of radiology reports.
Code: https://huggingface.co/StanfordAIMI/stanford-deidentifier-base
Publication: https://www.ncbi.nlm.nih.gov/pubmed/36416419
-
RadBERT is a transformer that was continuously pre-trained on radiology reports from a BioBERT initialization.
-
A knowledge graph for radiology reports, used to improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage the RadGraph dataset, which contains chest X-ray reports annotated with entities and the relations between them. On two open radiology report datasets, our system improves scores by up to 14.2% and 25.3% on metrics evaluating the factual correctness and completeness of reports.
Code: https://physionet.org/content/radgraph/1.0.0/
Publication: https://aclanthology.org/2022.findings-emnlp.319/
-
An end-to-end pipeline for classifying chest X-rays that may belong to COVID-19-positive patients. It is intended to enable real-time diagnosis of the virus in the field, without having to wait 24-48 hours for the results of an RT-PCR test or rely on the less accurate results of a rapid antigen test.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_Emory-ResNet50
-
COVID-19 disease trajectory prediction using X-rays and EHR. This model predicts a label for each chest X-ray.
Code:
-
The model is trained with JSRT data and the corresponding lung masks. The training images are enhanced and resized to 256 x 256 before being fed to the network. The model was trained at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.
Code: https://github.com/MIDRC/COVID19_Lung_Segmentation_CXR_OSU-UNet
-
The model is trained with CT sequences and the corresponding lung masks. The training images are enhanced and resized to 256 x 256 before being fed to the network. The model was trained at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.
Code: https://github.com/MIDRC/COVID19_Lung_Segmentation_CT_OSU-UNet
-
ViLMedic is a modular framework for vision and language multimodal research in the medical field.
This library contains reference implementations of state-of-the-art vision and language architectures, referred to as “blocks,” and full solutions for multimodal medical tasks using one or several blocks.
Code: https://vilmedic.app/, https://github.com/jbdel/vilmedic
-
RoentGen is a generative vision-language model to create chest x-rays based on radiological text inputs.
-
A classification model for COVID-19 detection on Chest X-Rays.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_DenseNet
-
The American College of Radiology developed a chest x-ray COVID-19 classification algorithm by training on the labeled CXR MIDRC data.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_ACR
-
Notebooks and materials for cohort building for MIDRC Grand Challenges. The COVIDx challenge concerned the classification of portable chest radiographs for COVID-19. The mRALE Mastermind Challenge involved AI to predict COVID severity on portable chest radiographs.
Materials for COVIDx: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2022_COVIDx
Materials for mRALE Mastermind: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2023_mRALE%20Mastermind
-
MIDRC AI Interface for Covid (MAIIC) provides an interface for easy prototyping and testing of AI algorithms for AI researchers and physicians.
-
MIDRC collaborators at Argonne National Laboratory developed the Advanced Privacy Preserving Federated Learning (APPFL) framework for federated learning scenarios in which data privacy can be maintained across communication through differential privacy.
Code: Coming soon!
Documentation:
-
The MIDRC-LOINC mapping table serves as a tool for standardizing DICOM metadata, particularly for secondary research endeavors such as AI studies. By translating DICOM image terms into LOINC codes and Long Common Names, this resource streamlines cohort selection based on essential attributes like body region and contrast presence. Regular updates, managed by the MIDRC Data Quality and Harmonization subcommittee, ensure ongoing relevance and utility for the broader research community.
Code:
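How a mapping table of this kind can support cohort selection is sketched below. Everything in the example is hypothetical: the table layout, the lookup key (DICOM `Modality` plus a normalized `StudyDescription`), the column names, and the codes and names, which are placeholders rather than real LOINC entries. The actual MIDRC-LOINC table may be organized differently.

```python
# Hypothetical mapping rows. Codes and names are placeholders,
# not real LOINC entries.
MAPPING = {
    ("CT", "CT CHEST W CONTRAST"): {
        "loinc_code": "PLACEHOLDER-1",
        "long_common_name": "CT Chest with contrast (placeholder)",
        "body_region": "Chest",
        "contrast": "W",
    },
    ("CT", "CT CHEST WO CONTRAST"): {
        "loinc_code": "PLACEHOLDER-2",
        "long_common_name": "CT Chest without contrast (placeholder)",
        "body_region": "Chest",
        "contrast": "WO",
    },
    ("DX", "XR CHEST PORTABLE"): {
        "loinc_code": "PLACEHOLDER-3",
        "long_common_name": "Portable XR Chest (placeholder)",
        "body_region": "Chest",
        "contrast": "WO",
    },
    ("CT", "CT ABDOMEN W CONTRAST"): {
        "loinc_code": "PLACEHOLDER-4",
        "long_common_name": "CT Abdomen with contrast (placeholder)",
        "body_region": "Abdomen",
        "contrast": "W",
    },
}


def normalize(text):
    """Uppercase and collapse whitespace so free-text variants match."""
    return " ".join(text.upper().split())


def map_study(study):
    """Return the mapping row for a study, or None if it is unmapped."""
    key = (study["Modality"], normalize(study["StudyDescription"]))
    return MAPPING.get(key)


def select_cohort(studies, body_region, contrast):
    """Return ids of studies whose mapped row matches both attributes."""
    cohort = []
    for study in studies:
        row = map_study(study)
        if row and row["body_region"] == body_region and row["contrast"] == contrast:
            cohort.append(study["id"])
    return cohort


# Hypothetical study metadata with inconsistent free-text descriptions.
studies = [
    {"id": "s1", "Modality": "CT", "StudyDescription": "ct chest  w contrast"},
    {"id": "s2", "Modality": "CT", "StudyDescription": "CT Chest WO Contrast"},
    {"id": "s3", "Modality": "DX", "StudyDescription": "XR chest portable"},
    {"id": "s4", "Modality": "CT", "StudyDescription": "CT HEAD"},  # unmapped
]

chest_no_contrast = select_cohort(studies, body_region="Chest", contrast="WO")
```

Selecting on the standardized attributes rather than on the raw descriptions means a cohort query does not have to anticipate every spelling a contributing site might use.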
-
Jupyter or R notebooks that demonstrate how to build cohorts via queries and access associated metadata and files in MIDRC using Python or R code.
Code: https://github.com/MIDRC/tutorial_notebooks
Where to find in the data portal: https://data.midrc.org/resource-browser
-
The COVIDx challenge task was the classification of portable chest radiographs for COVID-19.
First place: Ran Zhang, Dalton Griner, Guang-Hong Chen
Second place: Mathieu Goulet
Third place: Finn Behrendt
-
1st place: Ian Pan (Brigham and Women’s Hospital)
2nd place: Ran Zhang (University of Wisconsin-Madison)
Code: currently not available, pending potential regulatory approval
3rd place: Finn Behrendt (University of Technology Hamburg)
4th place: Team: Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Otavio Parraga, Luca Silveira Kupssinskü, Adilson Medronha, and Rodrigo Coelho Barros (Pontificia Universidade Católica do Rio Grande do Sul)
5th place: Yijie Yuan (Johns Hopkins Medical)
6th place: Team: Cohen Archbold, Imran Abdullah-Al-Zubaer, Atik Ahamed (University of Kentucky)
7th place: Mathieu Goulet (Centre régional intégré de cancérologie)
8th place: Team: Yifan Wu, Hayden Gunraj, Chengzong Zhao, Yuhao Chen, Alexander Wong, Pengcheng Xi (University of Waterloo)
9th place: Team: Stanley Liang, Sameer Antani, Zhiyun Xue, Sivaramakrishnan Rajaraman, Feng Yang (NIH National Library of Medicine, Computational Health Research Branch)