
ajortizg/codex_preprocessing


CODEX Preprocessing

A modular, GPU-accelerated image processing pipeline for multiplexed immunofluorescence imaging data from CODEX (CO-Detection by indEXing) systems. It transforms raw multi-cycle, multi-channel microscopy images into analysis-ready SpatialData datasets through a configurable sequence of processing steps.

Key Features

  • Richardson-Lucy Deconvolution — restores image sharpness using Gibson-Lanni PSF models with GPU support via flowdec
  • Extended Depth of Field (EDoF) — collapses z-stacks into single focused images using Sobel or Dual-Tree Complex Wavelet methods
  • Illumination Correction — removes spatial shading artifacts with the BaSiC algorithm
  • Tile Stitching — assembles overlapping tiles into seamless mosaics via Ashlar or a built-in M2Stitch module
  • Background Correction — subtracts autofluorescence using linear interpolation or adaptive probe-based modeling
  • TMA Dearraying — automatically detects and extracts tissue cores from Tissue Microarrays using a U-Net segmentation model (Coreograph)
  • SpatialData Export — writes processed images to the SpatialData Zarr format with multi-resolution pyramids

All steps are optional and independently configurable through Hydra.
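To make the first step above concrete, here is a minimal NumPy sketch of the Richardson-Lucy update rule, shown on a 1-D signal for clarity. This is illustrative only: the pipeline's actual deconvolution runs in 3-D on the GPU via flowdec with Gibson-Lanni PSF models, and the function below is not part of this package.

```python
import numpy as np

def richardson_lucy_1d(observed, psf, n_iter=20, eps=1e-12):
    """Illustrative Richardson-Lucy restoration of a 1-D signal.

    Each iteration blurs the current estimate with the PSF, compares it
    to the observation, and redistributes intensity accordingly.
    """
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_mirror = psf[::-1]  # correlation = convolution with flipped kernel
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)  # eps avoids division by zero
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate
```

With a normalized PSF the update approximately conserves total intensity while sharpening peaks, which is why iteration counts (e.g. `n_iter`) trade restoration strength against noise amplification.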

Requirements

  • Python ≥ 3.11
  • CUDA 12-capable GPU (recommended for deconvolution and EDoF)
  • ~100 GB RAM for large TMA datasets

Installation

1. Create the conda environment

conda env create -f env_cuda12.yml
conda activate codex_prep

2. Install the package

pip install -e .

3. Install additional GPU dependencies

PyTorch with CUDA 12.4 support:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

pytorch_wavelets (required for wavelet-based EDoF):

pip install git+https://github.com/fbcotter/pytorch_wavelets

Usage

Quick Start

Run the full pipeline on a CODEX dataset:

python main.py data.root_dir=/path/to/raw/data

Using Experiment Configs

Predefined experiment configurations live in config/experiment/. Run one with the +experiment override:

python main.py +experiment=preprocess_ccc

Skipping Pipeline Steps

Disable individual steps using Hydra's ~ (delete) syntax:

python main.py +experiment=preprocess_ccc \
  ~pipeline.deconvolution \
  ~pipeline.stitching

Overriding Parameters

Override any config value from the command line:

python main.py +experiment=preprocess_ccc \
  pipeline.deconvolution.algorithm.n_iter=30 \
  pipeline.deconvolution.algorithm.use_gpu=true \
  pipeline.remove_intermediate=true

Configuration

The pipeline is configured via Hydra YAML files under config/.

config/
├── preprocess.yaml                  # Main config (entry point)
├── data/
│   └── raw_codex.yaml               # Dataset configuration
├── experiment/
│   └── preprocess_ccc.yaml          # Experiment presets
├── pipeline/
│   ├── default.yaml                 # Pipeline defaults
│   ├── deconvolution/               # Deconvolution settings
│   ├── edof/                        # EDoF algorithm selection
│   ├── illumination_correction/     # BaSiC parameters
│   ├── stitching/                   # Ashlar / M2Stitch settings
│   ├── background_correction/       # Background subtraction
│   ├── tma_dearray/                 # Core detection parameters
│   └── data_export/                 # SpatialData export options
└── hydra/
    └── default.yaml                 # Hydra runtime settings

Data Configuration

Point the pipeline to your raw CODEX data directory. The expected layout follows the standard CODEX file convention:

raw_data/
├── experimentV4.json                # Experiment metadata
├── cyc001_reg001/
│   ├── 1_00001_Z001_CH1.tif
│   ├── 1_00001_Z001_CH2.tif
│   └── ...
├── cyc002_reg001/
│   └── ...
└── ...
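The tile filenames above can be decomposed with a small regular expression. The field meanings (leading region index, zero-padded tile index, z-plane, channel) are an assumption based on the common CODEX naming convention, and `parse_codex_filename` is a hypothetical helper for illustration, not part of this package:

```python
import re

# Assumed convention: {region}_{tile:05d}_Z{z:03d}_CH{channel}.tif
TILE_PATTERN = re.compile(r"^(\d+)_(\d{5})_Z(\d{3})_CH(\d+)\.tif$")

def parse_codex_filename(name):
    """Hypothetical helper: split a CODEX tile filename into its fields."""
    m = TILE_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised CODEX tile name: {name}")
    region, tile, z, channel = (int(g) for g in m.groups())
    return {"region": region, "tile": tile, "z": z, "channel": channel}
```

For example, `1_00001_Z001_CH2.tif` parses to region 1, tile 1, z-plane 1, channel 2; the cycle comes from the enclosing cycNNN_regNNN directory rather than the filename.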

Set the root directory in your config or on the command line:

# config/data/raw_codex.yaml
_target_: codex_preprocessing.data.CodexDataset
root_dir: ???        # Required: path to raw data
mode: raw            # "raw" or "proc" (CODEX Processor format)
lazy_loading: false  # Enable dask-based lazy loading for large datasets
read_markers: false  # Load marker names from metadata

Pipeline Configuration

Toggle steps and select algorithms in config/pipeline/default.yaml:

defaults:
  - deconvolution: default
  - edof: focus_whiten  # or: focus_wavelet
  - illumination_correction: default
  - stitching: default
  - background_correction: default
  - tma_dearray: null   # Enable with: tma_dearray: default
  - data_export: default

remove_intermediate: false  # Delete intermediate outputs to save disk space
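The optional-step behavior can be pictured with a short sketch. This is not the project's actual orchestration code (that lives in pipeline.py); it only illustrates the idea that steps set to null in the config, or removed with the ~ override, are skipped while the rest run in order:

```python
def run_pipeline(data, steps):
    """Run enabled steps in order; entries set to None are skipped.

    `steps` maps step names to callables (or None for disabled steps),
    mirroring how a Hydra config with null entries would be consumed.
    """
    for name, step in steps.items():
        if step is None:  # step disabled in the config
            continue
        data = step(data)
    return data
```

In this picture, `python main.py ~pipeline.stitching` is equivalent to leaving the stitching entry out of `steps` entirely.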

Project Structure

├── main.py                          # Entry point
├── config/                          # Hydra configuration files
├── src/codex_preprocessing/
│   ├── pipeline.py                  # Pipeline orchestration
│   ├── nodes.py                     # Abstract node / parallel execution logic
│   ├── data/
│   │   ├── dataset.py               # CodexDataset class
│   │   └── metadata.py              # Experiment metadata parser
│   ├── io/
│   │   ├── reader.py                # Raw & processed data readers
│   │   └── writer.py                # TIFF writers
│   ├── modules/                     # Processing algorithms
│   │   ├── deconvolution.py         # Richardson-Lucy deconvolution
│   │   ├── edof.py                  # Extended depth of field
│   │   ├── illumination.py          # BaSiC illumination correction
│   │   ├── stitching.py             # Ashlar / M2Stitch stitching
│   │   ├── background_correction.py
│   │   ├── tma_dearray.py           # Coreograph TMA dearraying
│   │   └── spatialdata_exporter.py  # SpatialData Zarr export
│   ├── models/coreograph/           # U-Net model for TMA segmentation
│   └── utils/                       # Image & general utilities
├── weights/coreograph/              # Pre-trained U-Net weights
├── notebooks/                       # Jupyter notebooks for testing individual steps
├── env_cuda12.yml                   # Conda environment (CUDA 12)
└── pyproject.toml                   # Package metadata

Notebooks

Interactive Jupyter notebooks are provided in notebooks/ for testing and debugging individual pipeline steps:

Notebook                  Purpose
test_deconvolution.ipynb  Richardson-Lucy deconvolution
test_edof.ipynb           Extended depth of field
test_illumination.ipynb   BaSiC illumination correction
test_stitching.ipynb      Tile stitching
test_bg_sub.ipynb         Background subtraction
test_dearray.ipynb        TMA dearraying
test_ometif.ipynb         OME-TIFF export

Contributing

Contributions are welcome. To get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Make your changes and ensure existing functionality is preserved
  4. Submit a pull request
