A modular, GPU-accelerated image processing pipeline for multiplexed immunofluorescence imaging data from CODEX (CO-Detection by indEXing) systems. It transforms raw multi-cycle, multi-channel microscopy images into analysis-ready SpatialData datasets through a configurable sequence of processing steps.
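Conceptually, the pipeline is a chain of optional image transforms applied in order, where any step can be switched off. The sketch below illustrates that idea only; the names and step signature are hypothetical, not the project's actual API:

```python
from typing import Callable, Sequence

# Hypothetical step type: takes image data, returns processed data.
Step = Callable[[list], list]

def run_pipeline(images: list, steps: Sequence[tuple[str, Step, bool]]) -> list:
    """Apply each enabled step in sequence; disabled steps are skipped,
    mirroring how any step can be removed from the config."""
    for name, fn, enabled in steps:
        if enabled:
            images = fn(images)
    return images

# Toy example: numbers stand in for images.
result = run_pipeline([1, 2, 3], [
    ("deconvolution", lambda xs: [x * 2 for x in xs], True),
    ("stitching",     lambda xs: [sum(xs)],           False),  # disabled
])
```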
- Richardson-Lucy Deconvolution — restores image sharpness using Gibson-Lanni PSF models with GPU support via flowdec
- Extended Depth of Field (EDoF) — collapses z-stacks into single focused images using Sobel or Dual-Tree Complex Wavelet methods
- Illumination Correction — removes spatial shading artifacts with the BaSiC algorithm
- Tile Stitching — assembles overlapping tiles into seamless mosaics via Ashlar or a built-in M2Stitch module
- Background Correction — subtracts autofluorescence using linear interpolation or adaptive probe-based modeling
- TMA Dearraying — automatically detects and extracts tissue cores from Tissue Microarrays using a U-Net segmentation model (Coreograph)
- SpatialData Export — writes processed images to the SpatialData Zarr format with multi-resolution pyramids
All steps are optional and independently configurable through Hydra.
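For reference, the core Richardson-Lucy update used by the deconvolution step can be sketched in plain NumPy/SciPy. The production path runs flowdec with a Gibson-Lanni PSF on GPU; this simplified 2-D CPU version is only illustrative:

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(image: np.ndarray, psf: np.ndarray, n_iter: int = 30) -> np.ndarray:
    """Iteratively refine an estimate so that (estimate convolved with
    the PSF) matches the observed image."""
    est = np.full(image.shape, image.mean())
    psf_mirror = psf[::-1, ::-1]  # adjoint of convolution with the PSF
    for _ in range(n_iter):
        blurred = fftconvolve(est, psf, mode="same")
        ratio = image / np.clip(blurred, 1e-12, None)
        est *= fftconvolve(ratio, psf_mirror, mode="same")
    return est
```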
- Python ≥ 3.11
- CUDA 12-capable GPU (recommended for deconvolution and EDoF)
- ~100 GB RAM for large TMA datasets
```bash
conda env create -f env_cuda12.yml
conda activate codex_prep
pip install -e .
```

PyTorch with CUDA 12.4 support:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

pytorch_wavelets (required for wavelet-based EDoF):
```bash
pip install git+https://github.com/fbcotter/pytorch_wavelets
```

Run the full pipeline on a CODEX dataset:
```bash
python main.py data.root_dir=/path/to/raw/data
```

Predefined experiment configurations live in `config/experiment/`. Run one with the `+experiment` flag:
```bash
python main.py +experiment=preprocess_ccc
```

Disable individual steps using Hydra's `~` (delete) syntax:
```bash
python main.py +experiment=preprocess_ccc \
  ~pipeline.deconvolution \
  ~pipeline.stitching
```

Override any config value from the command line:
```bash
python main.py +experiment=preprocess_ccc \
  pipeline.deconvolution.algorithm.n_iter=30 \
  pipeline.deconvolution.algorithm.use_gpu=true \
  pipeline.remove_intermediate=true
```

The pipeline is configured via Hydra YAML files under `config/`.
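Under the hood, Hydra merges dotted command-line overrides like the ones above into the composed config tree. A simplified pure-Python illustration of the mechanics (this is not Hydra's actual implementation, and the value parsing here is deliberately minimal):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a Hydra-style dotted override such as
    'pipeline.deconvolution.algorithm.n_iter=30' to a nested dict."""
    path, _, raw = override.partition("=")
    # Minimal value parsing: booleans and ints; everything else stays a string.
    value = {"true": True, "false": False}.get(raw.lower(), raw)
    if isinstance(value, str) and value.lstrip("-").isdigit():
        value = int(value)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

cfg = {"pipeline": {"deconvolution": {"algorithm": {"n_iter": 10, "use_gpu": False}}}}
apply_override(cfg, "pipeline.deconvolution.algorithm.n_iter=30")
apply_override(cfg, "pipeline.deconvolution.algorithm.use_gpu=true")
```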
```
config/
├── preprocess.yaml                  # Main config (entry point)
├── data/
│   └── raw_codex.yaml               # Dataset configuration
├── experiment/
│   └── preprocess_ccc.yaml          # Experiment presets
├── pipeline/
│   ├── default.yaml                 # Pipeline defaults
│   ├── deconvolution/               # Deconvolution settings
│   ├── edof/                        # EDoF algorithm selection
│   ├── illumination_correction/     # BaSiC parameters
│   ├── stitching/                   # Ashlar / M2Stitch settings
│   ├── background_correction/      # Background subtraction
│   ├── tma_dearray/                 # Core detection parameters
│   └── data_export/                 # SpatialData export options
└── hydra/
    └── default.yaml                 # Hydra runtime settings
```

Point the pipeline to your raw CODEX data directory. The expected layout follows the standard CODEX file convention:
```
raw_data/
├── experimentV4.json                # Experiment metadata
├── cyc001_reg001/
│   ├── 1_00001_Z001_CH1.tif
│   ├── 1_00001_Z001_CH2.tif
│   └── ...
├── cyc002_reg001/
│   └── ...
└── ...
```

Set the root directory in your config or on the command line:
```yaml
# config/data/raw_codex.yaml
_target_: codex_preprocessing.data.CodexDataset
root_dir: ???        # Required: path to raw data
mode: raw            # "raw" or "proc" (CODEX Processor format)
lazy_loading: false  # Enable dask-based lazy loading for large datasets
read_markers: false  # Load marker names from metadata
```

Toggle steps and select algorithms in `config/pipeline/default.yaml`:
```yaml
defaults:
  - deconvolution: default
  - edof: focus_whiten                # or: focus_wavelet
  - illumination_correction: default
  - stitching: default
  - background_correction: default
  - tma_dearray: null                 # Enable with: tma_dearray: default
  - data_export: default

remove_intermediate: false            # Delete intermediate outputs to save disk space
```

```
├── main.py                          # Entry point
├── config/                          # Hydra configuration files
├── src/codex_preprocessing/
│   ├── pipeline.py                  # Pipeline orchestration
│   ├── nodes.py                     # Abstract node / parallel execution logic
│   ├── data/
│   │   ├── dataset.py               # CodexDataset class
│   │   └── metadata.py              # Experiment metadata parser
│   ├── io/
│   │   ├── reader.py                # Raw & processed data readers
│   │   └── writer.py                # TIFF writers
│   ├── modules/                     # Processing algorithms
│   │   ├── deconvolution.py         # Richardson-Lucy deconvolution
│   │   ├── edof.py                  # Extended depth of field
│   │   ├── illumination.py          # BaSiC illumination correction
│   │   ├── stitching.py             # Ashlar / M2Stitch stitching
│   │   ├── background_correction.py
│   │   ├── tma_dearray.py           # Coreograph TMA dearraying
│   │   └── spatialdata_exporter.py  # SpatialData Zarr export
│   ├── models/coreograph/           # U-Net model for TMA segmentation
│   └── utils/                       # Image & general utilities
├── weights/coreograph/              # Pre-trained U-Net weights
├── notebooks/                       # Jupyter notebooks for testing individual steps
├── env_cuda12.yml                   # Conda environment (CUDA 12)
└── pyproject.toml                   # Package metadata
```

Interactive Jupyter notebooks are provided in `notebooks/` for testing and debugging individual pipeline steps:
| Notebook | Purpose |
|---|---|
| `test_deconvolution.ipynb` | Richardson-Lucy deconvolution |
| `test_edof.ipynb` | Extended depth of field |
| `test_illumination.ipynb` | BaSiC illumination correction |
| `test_stitching.ipynb` | Tile stitching |
| `test_bg_sub.ipynb` | Background subtraction |
| `test_dearray.ipynb` | TMA dearraying |
| `test_ometif.ipynb` | OME-TIFF export |
Contributions are welcome. To get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Make your changes and ensure existing functionality is preserved
- Submit a pull request