A modular, GPU-accelerated image processing pipeline for multiplexed immunofluorescence imaging data from CODEX (CO-Detection by indEXing) systems. It transforms raw multi-cycle, multi-channel microscopy images into analysis-ready SpatialData datasets through a configurable sequence of processing steps.
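Conceptually, the pipeline is a chain of optional image transforms applied in order, where any step can be switched off. The sketch below illustrates that idea only; the names and step signature are hypothetical, not the project's actual API:

```python
from typing import Callable, Sequence

# Hypothetical step type: takes image data, returns processed data.
Step = Callable[[list], list]

def run_pipeline(images: list, steps: Sequence[tuple[str, Step, bool]]) -> list:
    """Apply each enabled step in sequence; disabled steps are skipped,
    mirroring how any step can be removed from the config."""
    for name, fn, enabled in steps:
        if enabled:
            images = fn(images)
    return images

# Toy example: numbers stand in for images.
result = run_pipeline([1, 2, 3], [
    ("deconvolution", lambda xs: [x * 2 for x in xs], True),
    ("stitching",     lambda xs: [sum(xs)],           False),  # disabled
])
```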
- Richardson-Lucy Deconvolution — restores image sharpness using Gibson-Lanni PSF models with GPU support via flowdec
- Extended Depth of Field (EDoF) — collapses z-stacks into single focused images using Sobel or Dual-Tree Complex Wavelet methods
- Illumination Correction — removes spatial shading artifacts with the BaSiC algorithm
- Tile Stitching — assembles overlapping tiles into seamless mosaics via Ashlar or a built-in M2Stitch module
- Background Correction — subtracts autofluorescence using linear interpolation or adaptive probe-based modeling
- TMA Dearraying — automatically detects and extracts tissue cores from Tissue Microarrays using a U-Net segmentation model (Coreograph)
- SpatialData Export — writes processed images to the SpatialData Zarr format with multi-resolution pyramids
All steps are optional and independently configurable through Hydra.
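For reference, the core Richardson-Lucy update used by the deconvolution step can be sketched in plain NumPy/SciPy. The production path runs flowdec with a Gibson-Lanni PSF on GPU; this simplified 2-D CPU version is only illustrative:

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(image: np.ndarray, psf: np.ndarray, n_iter: int = 30) -> np.ndarray:
    """Iteratively refine an estimate so that (estimate convolved with
    the PSF) matches the observed image."""
    est = np.full(image.shape, image.mean())
    psf_mirror = psf[::-1, ::-1]  # adjoint of convolution with the PSF
    for _ in range(n_iter):
        blurred = fftconvolve(est, psf, mode="same")
        ratio = image / np.clip(blurred, 1e-12, None)
        est *= fftconvolve(ratio, psf_mirror, mode="same")
    return est
```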
- Python ≥ 3.11
- CUDA 12-capable GPU (recommended for deconvolution and EDoF)
- ~100 GB RAM for large TMA datasets
```bash
conda env create -f env_cuda12.yml
conda activate codex_prep
pip install -e .
```

PyTorch with CUDA 12.4 support:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

pytorch_wavelets (required for wavelet-based EDoF):
```bash
pip install git+https://github.com/fbcotter/pytorch_wavelets
```

Run the full pipeline on a CODEX dataset:
```bash
python main.py data.root_dir=/path/to/raw/data
```

Predefined experiment configurations live in `config/experiment/`. Run one with the `+experiment` flag:
```bash
python main.py +experiment=preprocess_ccc
```

Disable individual steps using Hydra's `~` (delete) syntax:
```bash
python main.py +experiment=preprocess_ccc \
  ~pipeline.deconvolution \
  ~pipeline.stitching
```

Override any config value from the command line:
```bash
python main.py +experiment=preprocess_ccc \
  pipeline.deconvolution.algorithm.n_iter=30 \
  pipeline.deconvolution.algorithm.use_gpu=true \
  pipeline.remove_intermediate=true
```

The pipeline is configured via Hydra YAML files under `config/`.
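Under the hood, Hydra merges dotted command-line overrides like the ones above into the composed config tree. A simplified pure-Python illustration of the mechanics (this is not Hydra's actual implementation, and the value parsing here is deliberately minimal):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a Hydra-style dotted override such as
    'pipeline.deconvolution.algorithm.n_iter=30' to a nested dict."""
    path, _, raw = override.partition("=")
    # Minimal value parsing: booleans and ints; everything else stays a string.
    value = {"true": True, "false": False}.get(raw.lower(), raw)
    if isinstance(value, str) and value.lstrip("-").isdigit():
        value = int(value)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

cfg = {"pipeline": {"deconvolution": {"algorithm": {"n_iter": 10, "use_gpu": False}}}}
apply_override(cfg, "pipeline.deconvolution.algorithm.n_iter=30")
apply_override(cfg, "pipeline.deconvolution.algorithm.use_gpu=true")
```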
```
config/
├── preprocess.yaml                  # Main config (entry point)
├── data/
│   └── raw_codex.yaml               # Dataset configuration
├── experiment/
│   └── preprocess_ccc.yaml          # Experiment presets
├── pipeline/
│   ├── default.yaml                 # Pipeline defaults
│   ├── deconvolution/               # Deconvolution settings
│   ├── edof/                        # EDoF algorithm selection
│   ├── illumination_correction/     # BaSiC parameters
│   ├── stitching/                   # Ashlar / M2Stitch settings
│   ├── background_correction/      # Background subtraction
│   ├── tma_dearray/                 # Core detection parameters
│   └── data_export/                 # SpatialData export options
└── hydra/
    └── default.yaml                 # Hydra runtime settings
```

Point the pipeline to your raw CODEX data directory. The expected layout follows the standard CODEX file convention:
```
raw_data/
├── experimentV4.json                # Experiment metadata
├── cyc001_reg001/
│   ├── 1_00001_Z001_CH1.tif
│   ├── 1_00001_Z001_CH2.tif
│   └── ...
├── cyc002_reg001/
│   └── ...
└── ...
```

Set the root directory in your config or on the command line:
```yaml
# config/data/raw_codex.yaml
_target_: codex_preprocessing.data.CodexDataset
root_dir: ???        # Required: path to raw data
mode: raw            # "raw" or "proc" (CODEX Processor format)
lazy_loading: false  # Enable dask-based lazy loading for large datasets
read_markers: false  # Load marker names from metadata
```

Toggle steps and select algorithms in `config/pipeline/default.yaml`:
```yaml
defaults:
  - deconvolution: default
  - edof: focus_whiten                # or: focus_wavelet
  - illumination_correction: default
  - stitching: default
  - background_correction: default
  - tma_dearray: null                 # Enable with: tma_dearray: default
  - data_export: default

remove_intermediate: false            # Delete intermediate outputs to save disk space
```

```
├── main.py                          # Entry point
├── config/                          # Hydra configuration files
├── src/codex_preprocessing/
│   ├── pipeline.py                  # Pipeline orchestration
│   ├── nodes.py                     # Abstract node / parallel execution logic
│   ├── data/
│   │   ├── dataset.py               # CodexDataset class
│   │   └── metadata.py              # Experiment metadata parser
│   ├── io/
│   │   ├── reader.py                # Raw & processed data readers
│   │   └── writer.py                # TIFF writers
│   ├── modules/                     # Processing algorithms
│   │   ├── deconvolution.py         # Richardson-Lucy deconvolution
│   │   ├── edof.py                  # Extended depth of field
│   │   ├── illumination.py          # BaSiC illumination correction
│   │   ├── stitching.py             # Ashlar / M2Stitch stitching
│   │   ├── background_correction.py
│   │   ├── tma_dearray.py           # Coreograph TMA dearraying
│   │   └── spatialdata_exporter.py  # SpatialData Zarr export
│   ├── models/coreograph/           # U-Net model for TMA segmentation
│   └── utils/                       # Image & general utilities
├── weights/coreograph/              # Pre-trained U-Net weights
├── notebooks/                       # Jupyter notebooks for testing individual steps
├── env_cuda12.yml                   # Conda environment (CUDA 12)
└── pyproject.toml                   # Package metadata
```

Interactive Jupyter notebooks are provided in `notebooks/` for testing and debugging individual pipeline steps:
| Notebook | Purpose |
|---|---|
| `test_deconvolution.ipynb` | Richardson-Lucy deconvolution |
| `test_edof.ipynb` | Extended depth of field |
| `test_illumination.ipynb` | BaSiC illumination correction |
| `test_stitching.ipynb` | Tile stitching |
| `test_bg_sub.ipynb` | Background subtraction |
| `test_dearray.ipynb` | TMA dearraying |
| `test_ometif.ipynb` | OME-TIFF export |
Contributions are welcome. To get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Make your changes and ensure existing functionality is preserved
- Submit a pull request