Technology Development Project 1

Principal Investigators: Curtis Langlotz and Adam Flanders.

Create an open discovery platform for COVID-19 imaging and associated data.

Updated January 20, 2023

The primary purpose of this project is to obtain and openly distribute curated COVID-19 imaging and associated clinical and technical data as rapidly and widely as possible. The open platform enables a broad community of data scientists to answer critical questions about patient care quickly and rigorously.


Develop a large-scale open repository of multi-institutional imaging data and associated clinical data.

John Mongan (University of California-San Francisco)


This project has been building the information technology infrastructure for COVID-19 data management as a foundation for the open discovery platform. The discovery platform has driven progress through harmonized data curation and labeling methods, hosting of data science challenges, and benchmarking of algorithm performance.

Going forward, this project will develop a more flexible and redundant curation infrastructure to handle the increasing volume of images being processed. The hardware infrastructure will be parallelized so that multiple curators can process a bolus of images simultaneously. Emphasis will be placed on designing and building a cloud-based intake and storage platform that lets contributors easily submit images to RSNA for curation and inclusion in MIDRC. Both third-party tools and custom solutions will be considered.

Example chest CT image


Example annotations of a MIDRC dataset


Engage the data science community through crowd-sourced labeling and Data Science Challenges.

Adam Flanders (Jefferson University) and Carol Wu (MD Anderson Cancer Center)


Drawing on recent experience running annual imaging data science challenges focused on clinically relevant use cases, this project has organized and evaluated a series of challenges on COVID-19 use cases.

After two successful challenges in the initial funding period, additional challenges will be conducted, including: predicting COVID severity on CXR and CT at a single time point using expert annotation schemes devised for COVID; detecting change on paired image sets obtained within 24 hours; and predicting clinical outcome parameters such as hospital admission, length of stay, and the need for respiratory support using a combination of imaging and clinical parameters. For annotation, each lung field will be mapped into three distinct zones, and annotators will provide a severity score for each zone. We will also design and develop an open application programming interface (API) for labeling of sequestered and non-sequestered images in Gen3.
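As an illustration only, the zone-based severity annotation could be recorded in a structure like the following Python sketch. The zone names, the 0 to 3 score range, the summed total, and the class name `SeverityAnnotation` are all assumptions made for this example; the project text does not specify them.

```python
from dataclasses import dataclass, field

# Assumed names: the source only says each lung field has three zones.
SIDES = ("left", "right")
ZONES = ("upper", "middle", "lower")
MAX_ZONE_SCORE = 3  # assumed scale; the actual scoring range is not stated


@dataclass
class SeverityAnnotation:
    """One annotator's per-zone severity scores for a single image."""
    image_id: str
    scores: dict = field(default_factory=dict)  # (side, zone) -> score

    def set_score(self, side: str, zone: str, score: int) -> None:
        if side not in SIDES or zone not in ZONES:
            raise ValueError(f"unknown zone: {side} {zone}")
        if not 0 <= score <= MAX_ZONE_SCORE:
            raise ValueError(f"score out of range: {score}")
        self.scores[(side, zone)] = score

    def is_complete(self) -> bool:
        # Complete when every zone of both lung fields has a score.
        return len(self.scores) == len(SIDES) * len(ZONES)

    def total(self) -> int:
        # Summing zone scores is one possible aggregate (an assumption).
        return sum(self.scores.values())
```

Keeping the per-zone scores rather than only an aggregate leaves room for zone-level comparison between annotators or between paired images.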

Provide algorithm benchmarking services for COVID-19 algorithms.

Adam Flanders (Jefferson University) and Carol Wu (MD Anderson Cancer Center)


Following each Data Science Challenge or other cohort development activity, this project has labeled the retrospective data cohorts so that these remain readily available through MIDRC for testing and comparison of algorithms addressing the same use case.

This project will continue research into the use of ‘Helper AI’ models to catalog image data accurately and efficiently, providing richer and more accurate metadata than current DICOM tags alone, including more detailed indexed anatomy classes. This additional information will significantly enhance the existing MIDRC archive and allow researchers and others to make better use of the collected data for AI development.

An open-source DICOM anonymizer software application on an ML-compatible technology stack.

Adam Flanders (Jefferson University) and George Shih (Cornell University) 


This project started in funding year 3 and is working on a new MIDRC Anonymizer tool that will:

  • De-identify DICOM header information, removing, pseudonymizing or preserving specific header elements based on an easily configurable script (with a default script that conforms to standard DICOM protocols). 

  • Assign a pseudonymous value to each new patient it encounters and use that value for all studies of the patient. 

  • Shift all study dates for a patient by a unique number of days determined by a hashing mechanism.

  • Log all transactions and provide a table for contributing sites that maps pseudonymous values back to the original identifiers. 
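The four behaviors listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MIDRC Anonymizer itself: headers are modeled as plain dictionaries rather than real DICOM objects, the `RULES` table stands in for the configurable script, and the class name, `ANON-` pseudonym format, keyed SHA-256 hash, and 365-day shift range are all hypothetical choices.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical rule table standing in for the configurable script.
RULES = {
    "PatientName": "remove",
    "PatientID": "pseudonymize",
    "StudyDate": "shift_date",
    "Modality": "keep",
}


class Anonymizer:
    def __init__(self, secret: bytes, max_shift_days: int = 365):
        self.secret = secret                  # site-held key (assumption)
        self.max_shift_days = max_shift_days  # assumed shift range
        self.id_map = {}                      # original ID -> pseudonym
        self.log = []                         # one entry per processed header

    def pseudonym(self, patient_id: str) -> str:
        # A new patient gets the next value; a known patient reuses theirs.
        if patient_id not in self.id_map:
            self.id_map[patient_id] = f"ANON-{len(self.id_map) + 1:06d}"
        return self.id_map[patient_id]

    def shift_days(self, patient_id: str) -> int:
        # Deterministic per-patient offset derived from a keyed hash.
        # Offsets can collide across patients in this simple sketch.
        digest = hmac.new(self.secret, patient_id.encode("utf-8"),
                          hashlib.sha256).digest()
        return 1 + int.from_bytes(digest[:4], "big") % self.max_shift_days

    def deidentify(self, header: dict) -> dict:
        patient_id = header["PatientID"]
        out = {}
        for tag, value in header.items():
            # Elements without a rule are dropped (an assumption).
            action = RULES.get(tag, "remove")
            if action == "keep":
                out[tag] = value
            elif action == "pseudonymize":
                out[tag] = self.pseudonym(patient_id)
            elif action == "shift_date":
                # DICOM dates use the YYYYMMDD format.
                original = datetime.strptime(value, "%Y%m%d")
                shifted = original - timedelta(days=self.shift_days(patient_id))
                out[tag] = shifted.strftime("%Y%m%d")
        self.log.append((patient_id, self.pseudonym(patient_id)))
        return out

    def lookup_table(self) -> list:
        # (pseudonym, original ID) rows for the contributing site.
        return [(pseudo, orig) for orig, pseudo in self.id_map.items()]
```

Because the offset depends only on the patient, all of a patient's studies shift by the same number of days, so the intervals between studies are preserved.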

