
radxrad/radx-harmonizer


RADx Harmonizer: Data Validation and Harmonization Toolkit for RADx-rad Submissions

This repository provides tools for validating and harmonizing datasets submitted to the RADx-rad Data Coordinating Center (DCC) for integration into the NIH RADx Data Hub. It includes utilities to process and convert raw submission files into standardized formats for downstream use.

The following RADx-rad datasets have been harmonized using this toolkit and are available in the NIH RADx Data Hub.


📂 Directory Structure for RADx-DCC Data Harmonization

RADx-rad study datasets must follow this structure before harmonization:

data_harmonized/
└── rad_xxx_yyy-zz/              # Unique study directory
    └── preorigcopy/             # Raw submitted files
        ├── rad_xxx_yyy-zz_label_DATA_preorigcopy.csv
        ├── rad_xxx_yyy-zz_label_DICT_preorigcopy.csv
        ├── rad_xxx_yyy-zz_label_META_preorigcopy.csv
        └── ...

Each label is a unique, user-defined string that identifies one triplet of files (data, dictionary, and metadata).
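Before running Phase 1, it can help to confirm that every label in `preorigcopy/` has all three files. The sketch below is a hypothetical pre-flight check, not part of the toolkit: the regular expression and the names `find_incomplete_triplets` and `check_study_dir` are assumptions based only on the naming pattern shown above, and it assumes the study ID has the form `rad_xxx_yyy-zz` with no extra underscores. Phase 1 remains the authoritative validator.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <study>_<label>_<DATA|DICT|META>_preorigcopy.csv,
# where <study> looks like rad_xxx_yyy-zz (labels may contain underscores).
FILE_PATTERN = re.compile(
    r"^(?P<study>rad_[^_]+_[^_]+)_(?P<label>.+)_(?P<kind>DATA|DICT|META)_preorigcopy\.csv$"
)
REQUIRED_KINDS = {"DATA", "DICT", "META"}


def find_incomplete_triplets(filenames):
    """Return {label: sorted missing kinds} for labels lacking a DATA, DICT, or META file."""
    kinds_by_label = defaultdict(set)
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match:  # files that do not follow the convention are ignored here
            kinds_by_label[match["label"]].add(match["kind"])
    return {
        label: sorted(REQUIRED_KINDS - kinds)
        for label, kinds in kinds_by_label.items()
        if kinds != REQUIRED_KINDS
    }


def check_study_dir(study_dir):
    """Apply the triplet check to the CSV files in <study_dir>/preorigcopy/."""
    preorig = Path(study_dir, "preorigcopy")
    if not preorig.is_dir():
        raise FileNotFoundError(preorig)
    return find_incomplete_triplets(p.name for p in preorig.glob("*.csv"))
```

For example, `check_study_dir("data_harmonized/rad_xxx_yyy-zz")` returns an empty dict when every label is complete.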


🛠 Harmonization Workflow

Run the following steps for each study (rad_xxx_yyy-zz), fixing any reported errors along the way.

1. Phase 1 – Validate Submission Files

cd src
python phase1.py -include rad_xxx_yyy-zz
  • Output: work/phase1_errors.csv
  • Fix files in preorigcopy/ and rerun if needed.

2. Phase 2 – Standardize and Validate Copies in work Directory

python phase2.py -include rad_xxx_yyy-zz
  • Output: work/phase2_errors.csv
  • Fix files in work/ and rerun if needed.

3. Phase 3 – Harmonize Data

python phase3.py -include rad_xxx_yyy-zz
  • Output directories:
    • origcopy/: Harmonized raw submission files
    • transformcopy/: Globally harmonized Tier 1 files (optional)
  • Errors: work/phase3_errors.csv

4. Upload to NIH RADx Data Hub

Submit the origcopy/ and, if available, transformcopy/ directories to the NIH RADx Data Hub.
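The phases are meant to be run in order, fixing reported errors before moving on. As a hypothetical convenience wrapper (not shipped with the repository), the loop below runs `phase1.py` through `phase3.py` for one study and stops at the first phase whose error report has rows. It assumes that each report is a CSV with a single header row, that an absent or header-only report means no errors, and that `work_dir` points at the `work/` directory the scripts actually write to; check these assumptions against the phase scripts before relying on it.

```python
import csv
import subprocess
import sys
from pathlib import Path

# Phase scripts from the workflow above, run in order from the src/ directory
PHASES = ("phase1.py", "phase2.py", "phase3.py")


def report_path(work_dir, script):
    """Map phaseN.py to <work_dir>/phaseN_errors.csv."""
    return Path(work_dir, Path(script).stem + "_errors.csv")


def error_count(report):
    """Count non-empty data rows, assuming the first row of the report is a header.

    A missing report is treated as zero errors (an assumption, not documented behavior).
    """
    path = Path(report)
    if not path.is_file():
        return 0
    with path.open(newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    return max(len(rows) - 1, 0)


def run_study(study_id, src_dir="src", work_dir="work"):
    """Run phases 1-3 for one study; return False at the first phase reporting errors.

    work_dir is resolved relative to the current directory, not src_dir.
    """
    for script in PHASES:
        # check=True raises CalledProcessError if a phase script itself fails
        subprocess.run([sys.executable, script, "-include", study_id], cwd=src_dir, check=True)
        report = report_path(work_dir, script)
        n_errors = error_count(report)
        if n_errors:
            print(f"{script}: {n_errors} error(s) in {report}; fix them and rerun.")
            return False
    return True
```

Stopping at the first non-empty report mirrors the instruction to fix reported errors before proceeding; the upload in step 4 stays a manual step.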


⚙️ Setup Instructions

Prerequisites

# Update Conda and install prerequisites
conda update conda
# Install git if not present
conda install git -n base -c anaconda
# Install Java 17 if not present
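The prerequisites can be checked from Python before the validator jars are needed. The following sketch is not part of the repository; `parse_java_major` and `check_prerequisites` are illustrative names, and treating any Java release newer than 17 as acceptable is an assumption. Note that `java -version` prints to stderr, not stdout.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def check_prerequisites():
    """Return a list of problems with the git/Java setup (empty if all looks fine)."""
    problems = []
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    if shutil.which("java") is None:
        problems.append("java not found on PATH")
        return problems
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr)
    # Assumption: 17 or newer is acceptable for the validator jars
    if major is None or major < 17:
        problems.append(f"Java 17 or newer expected, found: {major}")
    return problems
```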

📥 Download Required Files

1. Clone Repositories

git clone https://github.com/radxrad/metadata.git
git clone https://github.com/radxrad/radx-harmonizer.git
cd radx-harmonizer

2. Download Validation Tools

mkdir source
# Data Dictionary Validator
wget -P source/ https://github.com/bmir-radx/radx-data-dictionary-validator/releases/download/v1.3.4/radx-data-dictionary-validator-app-1.3.4.jar
# Metadata Validator
wget -P source/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/radx-metadata-validator-app-1.0.6.jar
# Metadata Compiler
wget -P source/ https://github.com/bmir-radx/radx-rad-metadata-compiler/releases/download/v1.0.3/radx-rad-metadata-compiler-1.0.3.jar
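The three `wget` commands follow one URL pattern: `https://github.com/bmir-radx/<repo>/releases/download/v<version>/<asset>-<version>.jar`. A hypothetical Python equivalent keeps the pinned versions in one table, so upgrading a tool becomes a one-line change. `VALIDATION_TOOLS`, `release_asset_url`, and `download_tools` are illustrative names; the repositories and versions are the ones listed above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# (repository, version, asset-name prefix) for each jar in the commands above
VALIDATION_TOOLS = [
    ("radx-data-dictionary-validator", "1.3.4", "radx-data-dictionary-validator-app"),
    ("radx-metadata-validator", "1.0.6", "radx-metadata-validator-app"),
    ("radx-rad-metadata-compiler", "1.0.3", "radx-rad-metadata-compiler"),
]


def release_asset_url(repo, version, asset_prefix, owner="bmir-radx"):
    """Build a GitHub release-download URL following the pattern used above."""
    return (
        f"https://github.com/{owner}/{repo}/releases/download/"
        f"v{version}/{asset_prefix}-{version}.jar"
    )


def download_tools(dest="source"):
    """Download every jar into dest, mirroring the `wget -P source/` commands."""
    Path(dest).mkdir(exist_ok=True)
    for repo, version, prefix in VALIDATION_TOOLS:
        url = release_asset_url(repo, version, prefix)
        # Keep the release asset's own file name, as wget does
        urlretrieve(url, Path(dest, url.rsplit("/", 1)[1]))
```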

3. Download Specifications and Dictionaries

mkdir reference
# Metadata Specification
wget -P reference/ https://github.com/bmir-radx/radx-metadata-validator/releases/download/v1.0.6/RADxMetadataSpecification.json
# Global Tier1 Dictionary
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-global_tier1_dict_2025-06-24.csv
# RADx-rad Tier1 and Tier2 Dictionaries
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier1_dict_2025-06-24.csv
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_tier2_dict_2025-06-24.csv
# Legacy Dictionary
wget -P reference/ https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes/RADx-rad_legacy_dict_2025-06-24.csv
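The four dictionary downloads share a base URL and a release-date suffix. The sketch below builds those URLs from the date; it assumes a future release would reuse the same file-name pattern with a different date, although only the `2025-06-24` files are confirmed here. `RADxMetadataSpecification.json` comes from a validator release rather than the `common-data-elements` repository, so it is not covered.

```python
from pathlib import Path
from urllib.request import urlretrieve

CDE_BASE = "https://raw.githubusercontent.com/radxrad/common-data-elements/refs/heads/main/cdes"
# File-name prefixes of the dictionaries listed above
DICTIONARY_PREFIXES = [
    "RADx-global_tier1_dict",
    "RADx-rad_tier1_dict",
    "RADx-rad_tier2_dict",
    "RADx-rad_legacy_dict",
]


def dictionary_urls(release_date="2025-06-24"):
    """Return raw-file URLs for the four dictionaries of one dated release."""
    return [f"{CDE_BASE}/{prefix}_{release_date}.csv" for prefix in DICTIONARY_PREFIXES]


def download_dictionaries(dest="reference", release_date="2025-06-24"):
    """Download the dictionaries into dest, mirroring the `wget -P reference/` commands."""
    Path(dest).mkdir(exist_ok=True)
    for url in dictionary_urls(release_date):
        urlretrieve(url, Path(dest, url.rsplit("/", 1)[1]))
```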

4. Copy the Metadata Template Files

mkdir meta
cp ../metadata/metadata_templates/*.csv meta
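After these steps, the `radx-harmonizer` root should contain `source/`, `reference/`, and `meta/`. A hypothetical checklist such as the following reports anything missing. The expected file names are copied from the commands above and would need updating whenever a tool version or dictionary date changes; because the template file names are not listed here, it only requires at least one CSV in `meta/`.

```python
from pathlib import Path

# Files the setup commands above should have produced (versions and dates copied from them)
EXPECTED_FILES = {
    "source": [
        "radx-data-dictionary-validator-app-1.3.4.jar",
        "radx-metadata-validator-app-1.0.6.jar",
        "radx-rad-metadata-compiler-1.0.3.jar",
    ],
    "reference": [
        "RADxMetadataSpecification.json",
        "RADx-global_tier1_dict_2025-06-24.csv",
        "RADx-rad_tier1_dict_2025-06-24.csv",
        "RADx-rad_tier2_dict_2025-06-24.csv",
        "RADx-rad_legacy_dict_2025-06-24.csv",
    ],
}


def missing_setup_files(repo_root="."):
    """List expected setup files that are absent under repo_root."""
    root = Path(repo_root)
    missing = [
        f"{folder}/{name}"
        for folder, names in EXPECTED_FILES.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
    meta = root / "meta"
    if not meta.is_dir() or not any(meta.glob("*.csv")):
        missing.append("meta/*.csv")
    return missing
```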

🧪 Environment Setup

Create and activate the project environment using the provided environment.yml.

conda env create -f environment.yml
conda activate radx-harmonizer
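Helper scripts can fail in confusing ways when run outside this environment. One way to guard against that, sketched here as an assumption rather than something the repository does, is to read `CONDA_DEFAULT_ENV`, which `conda activate` sets to the name of the active environment. The names `active_conda_env` and `require_env` are illustrative.

```python
import os
import sys

EXPECTED_ENV = "radx-harmonizer"  # the name passed to `conda activate` above


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def require_env(environ=None):
    """Exit with a hint unless the radx-harmonizer environment is active."""
    name = active_conda_env(environ)
    if name != EXPECTED_ENV:
        sys.exit(f"Run `conda activate {EXPECTED_ENV}` first (active environment: {name}).")
```

Calling `require_env()` at the top of a helper script stops it early with a clear message instead of failing later on a missing package.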

To deactivate:

conda deactivate

📚 Related Resources

Resource | Description
--- | ---
RADx Data Dictionary Specification | Specification of the RADx Data Dictionary format
RADx-rad Data Dictionaries | Tier 1 (RADx global) and Tier 2 (RADx-rad-specific) data elements
RADx-rad Metadata | Study-specific metadata files
RADx-rad Publications | List of publications related to RADx-rad objectives
RADx-rad Tech Data Organization | Description of how data for diagnostic-method development are organized

📝 Citation

Peter W. Rose, RADx-rad Harmonizer: Data Validation and Harmonization Toolkit for Data Submissions, Available online: https://github.com/radxrad/radx-harmonizer (2025)


💰 Funding

Supported by the Office of the Director, National Institutes of Health under:

RADx-Rad Discoveries & Data: Consortium Coordination Center Program Organization
Grant: 7U24LM013755

