This repository provides tools for validating and harmonizing datasets submitted to the RADx-rad Data Coordinating Center (DCC) for integration into the NIH RADx Data Hub. It includes utilities to process and convert raw submission files into standardized formats for downstream use.
The following RADx-rad datasets have been harmonized using this toolkit and are available in the NIH RADx Data Hub.
RADx-rad study datasets must follow this structure before harmonization:

    data_harmonized/
    └── rad_xxx_yyy-zz/            # Unique study directory
        └── preorigcopy/           # Raw submitted files
            ├── rad_xxx_yyy-zz_label_DATA_preorigcopy.csv
            ├── rad_xxx_yyy-zz_label_DICT_preorigcopy.csv
            ├── rad_xxx_yyy-zz_label_META_preorigcopy.csv
            └── ...

Each label is a unique, user-defined string that identifies one triplet of files (data, dictionary, metadata).
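
Before harmonization it can help to confirm that every label has all three files. The following is a minimal sketch, not part of the toolkit: the function name `find_incomplete_triplets` is hypothetical, and it assumes the file names follow the `<study>_<label>_<DATA|DICT|META>_preorigcopy.csv` pattern shown above.

```python
import re
from collections import defaultdict

# The three file kinds that make up one triplet, per the layout above.
KINDS = ("DATA", "DICT", "META")


def find_incomplete_triplets(study_id, filenames):
    """Map each label that lacks a full triplet to its missing kinds.

    Hypothetical helper: file names that do not match the assumed
    naming pattern are ignored rather than reported.
    """
    pattern = re.compile(
        rf"^{re.escape(study_id)}_(?P<label>.+)"
        r"_(?P<kind>DATA|DICT|META)_preorigcopy\.csv$"
    )
    found = defaultdict(set)
    for name in filenames:
        match = pattern.match(name)
        if match:
            found[match.group("label")].add(match.group("kind"))
    return {
        label: [kind for kind in KINDS if kind not in kinds]
        for label, kinds in found.items()
        if len(kinds) < len(KINDS)
    }


if __name__ == "__main__":
    import os

    study = "rad_xxx_yyy-zz"  # placeholder study ID from the layout above
    folder = os.path.join("data_harmonized", study, "preorigcopy")
    if os.path.isdir(folder):
        print(find_incomplete_triplets(study, os.listdir(folder)))
```

An empty result means every label found in the directory has its data, dictionary, and metadata file.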
Run the following steps for each study (rad_xxx_yyy-zz), fixing any reported errors along the way.
Phase 1:

    cd src
    python phase1.py -include rad_xxx_yyy-zz

- Output: `work/phase1_errors.csv`
- Fix files in `preorigcopy/` and rerun if needed.

Phase 2:

    python phase2.py -include rad_xxx_yyy-zz

- Output: `work/phase2_errors.csv`
- Fix files in `work/` and rerun if needed.

Phase 3:

    python phase3.py -include rad_xxx_yyy-zz

- Output directories:
  - `origcopy/`: Harmonized raw submission files
  - `transformcopy/`: Globally harmonized Tier 1 files (optional)
- Errors: `work/phase3_errors.csv`

Submit the `origcopy/` and, if available, `transformcopy/` directories to the NIH RADx Data Hub.
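
The three phases can also be chained from a small driver that stops at the first phase reporting errors. This is a sketch under stated assumptions, not part of the toolkit: the helper names are hypothetical, the location of the `work/` directory is passed in because it is not specified here, and each error file is assumed to be a CSV with one header row followed by one row per error.

```python
import csv
import subprocess
import sys
from pathlib import Path


def phase_command(phase, study_id):
    """Build the command line for one phase, using the -include flag shown above."""
    return [sys.executable, f"phase{phase}.py", "-include", study_id]


def count_error_rows(error_csv):
    """Count the rows after the header; a missing file counts as zero.

    Assumption: the first row of the error file is a header.
    """
    path = Path(error_csv)
    if not path.exists():
        return 0
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle))
    return max(len(rows) - 1, 0)


def run_phases(study_id, src_dir="src", work_dir="work"):
    """Run phases 1 to 3 in order; return (phase, n_errors) at the first failure.

    `work_dir` is an assumed location for the phaseN_errors.csv files.
    Returns None when no phase reports errors.
    """
    for phase in (1, 2, 3):
        subprocess.run(phase_command(phase, study_id), cwd=src_dir, check=True)
        n_errors = count_error_rows(Path(work_dir) / f"phase{phase}_errors.csv")
        if n_errors:
            return phase, n_errors
    return None
```

The sketch uses `sys.executable` rather than a bare `python` so that the scripts run in the same environment as the driver. After a reported failure, fix the files as described for that phase and call `run_phases` again.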
- Miniconda3
- Git
- Java 17
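
A quick way to see which of these prerequisites are missing from the current shell is to look them up on `PATH`. This is a minimal sketch: the function name `missing_tools` is hypothetical, it assumes Miniconda3 exposes a `conda` executable, and it checks only that `java` is present, not that it is version 17.

```python
import shutil


def missing_tools(tools=("conda", "git", "java")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```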

Install the prerequisites, clone the repositories, and download the validators and reference files:

    # Update Conda and install prerequisites
    conda update conda

    # Install git if not present
    conda install git -n base -c anaconda

    # Install Java 17 if not present

    git clone https://github.com/radxrad/metadata.git
    git clone https://github.com/radxrad/radx-harmonizer.git
    cd radx-harmonizer

    mkdir source
    # Data Dictionary Validator
    wget -P source/ https://github.com/bmir-radx/radx-data-dictionary-validator/releases/download/v1.3.4/radx-data-dictionary-validator-app-1.3.4.jar
    # Metadata Validator
    wget -P source/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/radx-metadata-validator-app-1.0.6.jar
    # Metadata Compiler
    wget -P source/ https://github.com/bmir-radx/radx-rad-metadata-compiler/releases/download/v1.0.3/radx-rad-metadata-compiler-1.0.3.jar

    mkdir reference
    # Metadata Specification
    wget -P reference/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/RADxMetadataSpecification.json
    # Global Tier1 Dictionary
    wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-global_tier1_dict_2025-06-24.csv
    # RADx-rad Tier1 and Tier2 Dictionaries
    wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier1_dict_2025-06-24.csv
    wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier2_dict_2025-06-24.csv
    # Legacy Dictionary
    wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_legacy_dict_2025-06-24.csv

    mkdir meta
    cp ../metadata/metadata_templates/*.csv meta

Create and activate the project environment using the provided `environment.yml`.

    conda env create -f environment.yml
    conda activate radx-harmonizer

To deactivate:

    conda deactivate

| Resource | Description |
|---|---|
| RADx Data Dictionary Specification | Specification of the RADx Data Dictionary format |
| RADx-rad Data Dictionaries | Tier 1 (RADx global) and Tier 2 (RADx-rad-specific) data elements |
| RADx-rad Metadata | Study-specific metadata files |
| RADx-rad Publications | List of publications related to RADx-rad objectives |
| RADx-rad Tech Data Organization | Description of how data for diagnostic methods development are organized |
Peter W. Rose, RADx-rad Harmonizer: Data Validation and Harmonization Toolkit for Data Submissions, Available online: https://github.com/radxrad/radx-harmonizer (2025)
Supported by the Office of the Director, National Institutes of Health under:
RADx-Rad Discoveries & Data: Consortium Coordination Center Program Organization
Grant: 7U24LM013755