Algorithms — MIDRC

This algorithm uses multi-dimensional stratified sampling, in which several variables of interest (such as race, gender, and imaging acquisition system) are used sequentially to divide the data into strata, each representing a unique combination of variable values. Within each resulting stratum, patients are assigned to a specific dataset. MIDRC developed this algorithm and uses it to separate data into either the open data commons or the sequestered data commons. As shared here by MIDRC, it can also be generalized to other stratified sampling needs, e.g., dividing your own dataset into two separate sets: one for training and one for testing.

Code

COVID-specific: https://github.com/MIDRC/Stratified_Sampling

General: https://github.com/MIDRC/Generalized_Stratified_Sampling
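The stratify-then-assign idea described above can be illustrated with a minimal sketch. This is not the MIDRC implementation (see the repositories above for that); the function name, the `train_fraction` parameter, the field names, and the toy cohort are all assumptions made for illustration. Grouping by the tuple of variable values is equivalent to dividing the data by one variable after another.

```python
import random
from collections import defaultdict


def stratified_split(patients, variables, train_fraction, seed=0):
    """Assign each patient to "train" or "test" within every stratum.

    patients: list of dicts with an "id" key plus one key per variable.
    variables: names of the stratification variables (hypothetical here).
    """
    rng = random.Random(seed)

    # One stratum per unique combination of variable values.
    strata = defaultdict(list)
    for patient in patients:
        key = tuple(patient[name] for name in variables)
        strata[key].append(patient["id"])

    assignment = {}
    for key in sorted(strata):  # fixed order keeps the split reproducible
        ids = strata[key]
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        for position, patient_id in enumerate(ids):
            assignment[patient_id] = "train" if position < n_train else "test"
    return assignment


# Toy cohort: 4 strata (2 genders x 2 scanners) with 10 patients each.
combinations = [(g, s) for g in ("F", "M") for s in ("A", "B") for _ in range(10)]
patients = [
    {"id": f"P{i:03d}", "gender": gender, "scanner": scanner}
    for i, (gender, scanner) in enumerate(combinations)
]
assignment = stratified_split(patients, ["gender", "scanner"], train_fraction=0.8)
```

Because the split is made inside each stratum, both resulting sets keep roughly the same mix of gender and scanner as the full cohort, which a single random split of all patients does not guarantee for small strata.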

Task-based sampling begins by identifying the cases relevant to a specific task and the characteristics of the target population (such as age range, COVID status, and imaging modality). Then, optimized quota sampling is conducted by randomly sampling cases until the maximum category margin (Baughan et al. 2022) is less than a pre-specified value.

Reference: N. Baughan et al., “Task-Based Sampling of the MIDRC Sequestered Data Commons for Algorithm Performance Evaluation,” presented at the Annual Meeting of the American Association of Physicists in Medicine, 2022, E257–E258.

Code:

https://github.com/MIDRC/task-based-sampling
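The two steps above (select task-relevant cases, then sample until the category margin is small enough) can be sketched as follows. The exact definition of the maximum category margin is given in Baughan et al. 2022 and in the repository above; this sketch assumes it is the largest absolute difference between a category's share in the sample and its target share, and it simply redraws random samples until that value falls below the tolerance. All function names, field names, and numbers are hypothetical.

```python
import random
from collections import Counter


def max_category_margin(sample, target, key):
    """Largest |sample share - target share| over the target categories.

    This definition is an assumption made for illustration.
    """
    counts = Counter(case[key] for case in sample)
    return max(abs(counts[cat] / len(sample) - share) for cat, share in target.items())


def task_based_sample(cases, is_relevant, target, key, n, tolerance,
                      seed=0, max_draws=10_000):
    """Redraw random samples of size n until the margin is below tolerance."""
    rng = random.Random(seed)
    # Step 1: keep only the cases relevant to the task.
    pool = [case for case in cases if is_relevant(case)]
    # Step 2: random draws until the margin criterion is met.
    for _ in range(max_draws):
        sample = rng.sample(pool, n)
        if max_category_margin(sample, target, key) < tolerance:
            return sample
    raise RuntimeError("no draw met the tolerance; relax it or enlarge the pool")


# Toy pool: every fifth case is CT, the rest are radiographs (CR),
# and the radiographs are split evenly between two age groups.
cases = [
    {"modality": "CR" if i % 5 else "CT", "age_group": "18-64" if i % 2 else "65+"}
    for i in range(1500)
]
target = {"18-64": 0.5, "65+": 0.5}
sample = task_based_sample(
    cases,
    is_relevant=lambda case: case["modality"] == "CR",
    target=target,
    key="age_group",
    n=100,
    tolerance=0.1,
)
```

Redrawing whole samples is the simplest way to meet the criterion; an optimized implementation would more likely adjust the sample case by case rather than start over on each attempt.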

A document-level classifier that detects COVID-19 in radiology reports, helping to find COVID cases and to create large numbers of labels for computer vision models.

Code: https://huggingface.co/StanfordAIMI/covid-radbert

Publication: https://pubmed.ncbi.nlm.nih.gov/36323915/

An automated de-identification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates ("hiding in plain sight"). The model outperformed the other de-identifiers, as well as human labelers, when compared on all test sets of the i2b2 2014 data. It enables accurate and automatic de-identification of radiology reports.

Code: https://huggingface.co/StanfordAIMI/stanford-deidentifier-base

Publication: https://www.ncbi.nlm.nih.gov/pubmed/36416419

RadBERT is a transformer that was continuously pre-trained on radiology reports from a BioBERT initialization.

Code: https://huggingface.co/StanfordAIMI/RadBERT

A knowledge graph for radiology reports, used to further improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage the RadGraph dataset, which contains annotated chest X-ray reports with entities and the relations between them. On two open radiology report datasets, our system improves scores by up to 14.2% and 25.3% on metrics evaluating the factual correctness and completeness of reports.

Code: https://physionet.org/content/radgraph/1.0.0/

Publication: https://aclanthology.org/2022.findings-emnlp.319/

An end-to-end pipeline for classifying chest X-rays that may belong to COVID-19-positive patients. It is intended to enable real-time diagnosis of the virus in the field, without having to wait 24-48 hours for the results of an RT-PCR test or to rely on the less accurate results of a rapid antigen test.

Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_Emory-ResNet50

COVID19 disease trajectory prediction using X-rays and EHR data. This model predicts a label for each chest X-ray.

Code:

https://github.com/amaratariq/COVID19_GNN_public

The model was trained with JSRT data and the corresponding lung masks. The training images were enhanced and resized to 256 x 256 before being fed to the network. Training was carried out at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.

Code:

https://github.com/MIDRC/COVID19_Lung_Segmentation_CXR_OSU-UNet

The model was trained with CT sequences and the corresponding lung masks. The training images were enhanced and resized to 256 x 256 before being fed to the network. Training was carried out at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.

Code: https://github.com/MIDRC/COVID19_Lung_Segmentation_CT_OSU-UNet

ViLMedic is a modular framework for vision and language multimodal research in the medical field.

This library contains reference implementations of state-of-the-art vision and language architectures, referred to as “blocks”, and full solutions for multimodal medical tasks that use one or several blocks.

Code: https://vilmedic.app/, https://github.com/jbdel/vilmedic

RoentGen is a generative vision-language model to create chest x-rays based on radiological text inputs.

Code: https://stanfordmimi.github.io/RoentGen/

A classification model for COVID-19 detection on Chest X-Rays.

Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_DenseNet

The American College of Radiology developed a chest X-ray COVID-19 classification algorithm by training on labeled MIDRC CXR data.

Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_ACR

Notebooks and materials for cohort building for the MIDRC Grand Challenges. The COVIDx challenge concerned the classification of portable chest radiographs for COVID-19. The mRALE Mastermind Challenge involved using AI to predict COVID severity on portable chest radiographs.

Materials for COVIDx: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2022_COVIDx

Materials for mRALE Mastermind: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2023_mRALE%20Mastermind

MIDRC AI Interface for Covid (MAIIC) provides an interface for easy prototyping and testing of AI algorithms for AI researchers and physicians.

Code: https://github.com/MIDRC/COVID19_CRP10-AIInterface

MIDRC collaborators at Argonne National Laboratory developed the Advanced Privacy Preserving Federated Learning (APPFL) framework for federated learning scenarios in which data privacy is maintained during communication by means of differential privacy.

Code:

Coming soon!

Documentation:

https://appfl.readthedocs.io/en/latest/index.html

Publication: https://ieeexplore.ieee.org/abstract/document/9835407

The MIDRC-LOINC mapping table is a tool for standardizing DICOM metadata, particularly for secondary research uses such as AI studies. By translating DICOM image terms into LOINC codes and Long Common Names, it streamlines cohort selection based on essential attributes such as body region and contrast presence. Regular updates, managed by the MIDRC Data Quality and Harmonization subcommittee, ensure its ongoing relevance and utility for the broader research community.

Code:

https://github.com/MIDRC/midrc_dicom_harmonization
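As a rough illustration of how such a mapping table can be applied, the sketch below looks up a study's DICOM terms in a small in-memory table. It assumes the table is keyed on the DICOM Modality and StudyDescription attributes; the column names, the normalization rule, and the two rows are hypothetical, and the LOINC codes are deliberate placeholders rather than real codes. The actual table and its layout are in the repository above.

```python
def normalize(text):
    """Upper-case and collapse whitespace so trivially different spellings match."""
    return " ".join(text.upper().split())


def build_lookup(rows):
    """Index mapping rows by (modality, study description)."""
    return {
        (normalize(row["modality"]), normalize(row["study_description"])): row
        for row in rows
    }


def harmonize(study, lookup):
    """Return the LOINC code and Long Common Name for a study, or None if unmapped."""
    key = (normalize(study["Modality"]), normalize(study["StudyDescription"]))
    row = lookup.get(key)
    if row is None:
        return None
    return {
        "loinc_code": row["loinc_code"],
        "loinc_long_common_name": row["loinc_long_common_name"],
    }


# Hypothetical rows; the codes and names are placeholders, not real LOINC entries.
mapping_rows = [
    {"modality": "CT", "study_description": "CT CHEST W CONTRAST",
     "loinc_code": "PLACEHOLDER-1",
     "loinc_long_common_name": "placeholder: chest CT with contrast"},
    {"modality": "DX", "study_description": "XR CHEST 1 VIEW PORTABLE",
     "loinc_code": "PLACEHOLDER-2",
     "loinc_long_common_name": "placeholder: portable chest radiograph"},
]
lookup = build_lookup(mapping_rows)

mapped = harmonize({"Modality": "ct", "StudyDescription": "ct  chest w contrast"}, lookup)
unmapped = harmonize({"Modality": "MR", "StudyDescription": "MR BRAIN"}, lookup)
```

Returning `None` for unmapped studies, rather than guessing, keeps it visible which site-specific descriptions still need a row in the table.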

Jupyter or R notebooks that demonstrate how to build cohorts via queries and access associated metadata and files in MIDRC using Python or R code.

Code: https://github.com/MIDRC/tutorial_notebooks

Where to find in the data portal: https://data.midrc.org/resource-browser

The COVIDx challenge task was the classification of portable chest radiographs for COVID-19.

First place: Ran Zhang, Dalton Griner, Guang-Hong Chen

Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/winner_algorithm_description.md

Second place: Mathieu Goulet

Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/runner_up_algorithm_description.md

Third place: Finn Behrendt

Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/third_place_algorithm_description.md

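The mRALE Mastermind Challenge task was the prediction of COVID severity on portable chest radiographs.

1st place: Ian Pan (Brigham and Women’s Hospital)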

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/winner_algorithm_description.md

2nd place: Ran Zhang (University of Wisconsin-Madison)

code: currently not available, pending potential regulatory approval

3rd place: Finn Behrendt (University of Technology Hamburg)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/third_place_algorithm_description.md

4th place: Team: Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Otavio Parraga, Luca Silveira Kupssinskü, Adilson Medronha, and Rodrigo Coelho Barros (Pontificia Universidade Católica do Rio Grande do Sul)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/fourth_place_algorithm_description.md

5th place: Yijie Yuan (Johns Hopkins Medical)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/fifth_place_algorithm_description.md

6th place: Team: Cohen Archbold, Imran Abdullah-Al-Zubaer, Atik Ahamed (University of Kentucky)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/sixth_place_algorithm_description.md

7th place: Mathieu Goulet (Centre régional intégré de cancérologie)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/seventh_place_algorithm_description.md

8th place: Team: Yifan Wu, Hayden Gunraj, Chengzong Zhao, Yuhao Chen, Alexander Wong, Pengcheng Xi (University of Waterloo)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/eighth_place_algorithm_description.md

9th place: Team: Stanley Liang, Sameer Antani, Zhiyun Xue, Sivaramakrishnan Rajaraman, Feng Yang (NIH National Library of Medicine, Computational Health Research Branch)

code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/ninth_place_algorithm_description.md

Algorithms.

TDP3d: Stratified sampling for dataset splitting

TDP3d: Task-based sampling

CRP1: COVID detection in radiology reports

CRP1: De-identification of radiology reports

CRP1: RadBERT

CRP1: RadGraph

CRP2: End-to-end pipeline for the classification of chest X-rays for COVID-19

CRP2: GNN model for COVID19 disease trajectory prediction

CRP2: Chest X-ray lung segmentation model based on U-Net

CRP2: Lung Segmentation Model for CT

CRP4: ViLMedic: A framework for research at the intersection of vision and language in medical AI

CRP4: RoentGen: Vision-language foundation model for chest X-ray generation

CRP4: COVID-19 Diagnosis based on a DenseNet architecture

CRP5: Classifier for COVID-19 on lung chest radiographs

CRP9: COVIDx and mRALE Mastermind cohort building

CRP10: Interface for easy prototyping and testing of AI algorithms

CRP10: Advanced privacy preserving federated learning framework

CRP12: LOINC mapping

Gen3: Example cohort building notebooks

AI models from MIDRC COVIDx challenge participants

AI models from the MIDRC mRALE Mastermind Challenge
