DeepCORO_CLIP

DeepCORO_CLIP is a deep learning model for echocardiography video interpretation using contrastive learning. It leverages a Multiscale Vision Transformer (mVIT) for video encoding and BioMedBERT for text encoding, trained on millions of video-report pairs.

🚀 Features

  • Contrastive Learning: Train on video-report pairs using CLIP-style contrastive learning
    • Single video mode: Process one video per study
    • Multi-video mode: Process multiple videos per study with aggregation
  • Linear Probing: Fine-tune the model for specific tasks using linear probing
  • Multi-GPU Training: Support for distributed training across multiple GPUs
  • Hyperparameter Optimization: Built-in support for Weights & Biases sweeps
  • Automatic Mixed Precision: Optimized training with AMP
  • Distributed Data Parallel: Efficient multi-GPU training
  • Patch- vs. Video-level Reasoning: Expose all patch tokens, a single token per video, or a single token per study with two simple flags (aggregate and per_video_pool) in the VideoEncoder.

🛠️ Environment Setup

Prerequisites

  • CUDA-capable GPU
  • Python 3.11+

Steps

  1. 📥 Clone the Repository:

    git clone https://github.com/HeartWise-AI/DeepCORO_CLIP.git
    cd DeepCORO_CLIP
  2. Set up Virtual Environment:

    pip install uv
    uv sync
  3. Activate Virtual Environment:

    source .venv/bin/activate
  4. Install yq (required to run scripts/run_sweep.sh):

    wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && \
      chmod +x /usr/bin/yq
  5. Log into Weights & Biases (required for sweeps):

    wandb login
  6. Make sure FFmpeg 4.4.x is installed (required for sweeps):

    which ffmpeg
    conda remove ffmpeg   # only if /opt/conda/bin/ffmpeg exists
    sudo apt update
    sudo apt install ffmpeg
    sudo apt install libavcodec-extra
    ffmpeg -version
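
As a quick sanity check of the external tools above, a minimal Python sketch (it only verifies the binaries are on PATH and prints the FFmpeg version; adjust as needed):

# Sanity check of the prerequisites above (a sketch)
import shutil
import subprocess

for tool in ("ffmpeg", "yq", "wandb"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")

# Print the FFmpeg version line to confirm it is 4.4.x
result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
if result.returncode == 0:
    print(result.stdout.splitlines()[0])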

📄 Configuration Files

The project uses configuration files located in the config/ directory:

Base Configurations

  1. CLIP Training (config/clip/base_config.yaml):

    • Training parameters (epochs, batch size, learning rate)
    • Model architecture settings
    • Data loading parameters
    • Optimization settings
    • Video mode settings (single/multi)
    • Video aggregation parameters
  2. Linear Probing (config/linear_probing/base_config.yaml):

    • Task-specific parameters
    • Head structure configuration
    • Loss function settings
    • Backbone freezing options

Sweep Configurations

  1. CLIP Training (config/clip/sweep_config_*.yaml, config/clip/sweep_siglip_output_dataset_*.yaml):

    • Hyperparameter search space for CLIP training
    • Supports both single and multi-video training
    • sweep_siglip_output_dataset_config.yaml surfaces conservative learning rate, grad-clipping, and SigLIP weighting knobs for unstable SigLIP runs
  2. Linear Probing (config/linear_probing/sweep_config.yaml):

    • Hyperparameter optimization for linear probing tasks
    • Task-specific parameter ranges
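
For a quick look at what a run will use, a config can be loaded and tweaked before launching. A minimal sketch (the batch_size key is illustrative; check the YAML files for the actual parameter names):

# Inspect a base config and write a modified copy (key names are illustrative)
import yaml

with open("config/clip/base_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(sorted(cfg))                 # list top-level parameters
cfg["batch_size"] = 8              # hypothetical key: shrink the batch for a quick test
with open("config/clip/local_config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)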

💻 Run Modes

1. Contrastive Learning (CLIP)

Train the model on video-report pairs using contrastive learning

Process multiple videos per study with aggregation:

# Single GPU training without logging results to wandb (see scripts/runner.sh)
bash scripts/runner.sh --base_config config/clip/base_config.yaml --selected_gpus 0 --use_wandb false --run_mode train

# Multi-GPU training with results logged to wandb (see scripts/runner.sh)
bash scripts/runner.sh --base_config config/clip/base_config.yaml --selected_gpus 0,1 --use_wandb true --run_mode train

# Multi-GPU hyperparameter fine-tuning - run_mode and use_wandb are forced to train and true respectively (see scripts/run_sweep.sh)
bash scripts/run_sweep.sh --base_config config/clip/base_config.yaml --sweep_config config/clip/sweep_config_single_video.yaml --selected_gpus 3 --count 5

# SigLIP output dataset stability sweep (tunes grad clipping, AMP, and SigLIP weighting)
bash scripts/run_sweep.sh --base_config config/clip/siglip_output_dataset_config.yaml --sweep_config config/clip/sweep_siglip_output_dataset_config.yaml --selected_gpus 0,1 --count 10

Run validation

Not supported

Run inference

Process data from the input CSV (rows where Split == 'inference'); currently supported on a single GPU only

bash scripts/runner.sh --selected_gpus 0 --base_config config/clip/base_config.yaml --run_mode inference --use_wandb false

Run test

Not supported

2. Linear Probing

Fine-tune the model for specific tasks using linear probing. A few example combinations:

# Single GPU training without logging results to wandb (see scripts/runner.sh)
bash scripts/runner.sh --base_config config/linear_probing/base_config.yaml --selected_gpus 0 --use_wandb false --run_mode train

# Multi-GPU training with results logged to wandb (see scripts/runner.sh)
bash scripts/runner.sh --base_config config/linear_probing/base_config.yaml --selected_gpus 0,1 --use_wandb true --run_mode train

# Multi-GPU hyperparameter fine-tuning - run_mode and use_wandb are forced to train and true respectively (see scripts/run_sweep.sh)
bash scripts/run_sweep.sh --base_config config/linear_probing/base_config.yaml --sweep_config config/linear_probing/sweep_config.yaml --selected_gpus 0,1 --count 5

Run validation

Process validation data from the input CSV (rows where Split == 'val') and compute confidence intervals (CI) for each head

bash scripts/runner.sh --use_wandb false --base_config config/linear_probing/stenosis/base_config_stenosis_2vue.yaml --run_mode val --selected_gpus 1,2,3

Run test

Process test data from the input CSV (rows where Split == 'test') and compute confidence intervals (CI) for each head

bash scripts/runner.sh --use_wandb false --base_config config/linear_probing/stenosis/base_config_stenosis_2vue.yaml --run_mode test --selected_gpus 1,2,3

Run inference

Process inference data from the input CSV (rows where Split == 'inference')

bash scripts/runner.sh --use_wandb false --base_config config/linear_probing/stenosis/base_config_stenosis_2vue.yaml --run_mode inference --selected_gpus 1,2,3

🐳 Docker Setup

Optionally, you can build a Docker container to run the training, validation, and inference pipelines. For the validation pipeline, set up your Hugging Face API key in api_key.json, as the weights will be publicly available only upon publication.

Build Docker Image

docker build -t deepcoro_clip-docker .

Run Docker

Requirements:

  • Make sure your CSV file is in the data folder; $(pwd)/data can be replaced by the absolute path to that folder
  • Create a results folder; $(pwd)/results can be replaced by the absolute path to that folder
  • Make sure the CSV has a FileName column whose root path is defined as /app/videos; $(pwd)/videos can be replaced by the absolute path to your base video folder
docker run -it --gpus all --shm-size=32g --memory=64g --ipc=host --network=host \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/results:/app/results \
  -v $(pwd)/videos:/app/videos \
  deepcoro_clip-docker
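
Before mounting, the FileName requirement can be verified with a small pre-flight check; a sketch assuming the CSV lives in data/ (the filename below is a placeholder):

# Hypothetical pre-flight check: every FileName should be rooted at /app/videos
import csv

with open("data/your_file.csv") as f:   # placeholder filename
    rows = list(csv.DictReader(f))

bad = [r["FileName"] for r in rows if not r["FileName"].startswith("/app/videos")]
print(f"{len(bad)} of {len(rows)} rows do not start with /app/videos")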

Inside the container: once connected, proceed as follows:

  1. For validation and inference: follow only step 3 from the Environment Setup section above
  2. For training: follow steps 3, 5, and 6 from the Environment Setup section above
  3. Download pretrained weights:

    python utils/download_pretrained_weights.py

The pretrained weights will be in the folder /app/pretrained_models.

  4. Run your pipeline: select the appropriate command from the Run Modes section above

Model Architecture

Video Encoder

  • Multiscale Vision Transformer (mVIT) backbone
  • Configurable number of heads and layers
  • Support for pretrained weights
  • Optional backbone freezing
  • New flags for fine-grained control over the output:
    • aggregate=True (default) – returns one study-level vector [B, D].
    • aggregate=False, per_video_pool=True – returns one token per video [B, N, D], ready for MIL / linear probing heads.
    • aggregate=False, per_video_pool=False – returns all patch tokens [B, N·L, D] (the only setting that preserves every token) for the most detailed downstream reasoning.

Example (video-level MIL):

from models.video_encoder import VideoEncoder
from models.multi_instance_linear_probing import MultiInstanceLinearProbing

encoder = VideoEncoder(
    backbone="mvit",
    aggregate=False,               # skip internal aggregator
    aggregate_videos_tokens=True,  # one token per video
)
probe = MultiInstanceLinearProbing(
    embedding_dim=encoder.embedding_dim,
    head_structure={"severity": 4},
    pooling_mode="attention",
)

video_batch = ...             # [B, N, T, H, W, C]
feats = encoder(video_batch)  # [B, N, D]
logits = probe(feats)         # dict with head output
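
For comparison, a minimal sketch of the default study-level mode described above (aggregate=True); shapes follow the flag list:

from models.video_encoder import VideoEncoder

encoder = VideoEncoder(backbone="mvit")  # aggregate defaults to True
video_batch = ...                        # [B, N, T, H, W, C]
study_feats = encoder(video_batch)       # [B, D], one embedding per study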

Text Encoder

  • BioMedBERT for medical text encoding
  • Configurable freezing ratio
  • Contrastive learning with video features

Linear Probing Heads

  • Task-specific classification heads
  • Configurable dropout and architecture
  • Support for multiple output classes per head

Development Setup

We use pre-commit hooks to ensure code quality and consistency:

# Install pre-commit
uv pip install pre-commit
pre-commit install

# Run hooks manually
pre-commit run --all-files

Performance Guidelines

Recommended Batch Sizes by GPU Memory

GPU Memory    Recommended Batch Size    Command
8GB           4-8                       --batch-size 8
12GB          8-16                      --batch-size 16
16GB          16-24                     --batch-size 24
24GB+         24-32                     --batch-size 32
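
The mapping above can be rough-checked at runtime; a sketch of a helper that reads total GPU memory via PyTorch and returns the matching batch size from the table:

# Rough helper mapping total GPU memory to the batch sizes in the table above
import torch

def suggested_batch_size(device: int = 0) -> int:
    gib = torch.cuda.get_device_properties(device).total_memory / 2**30
    if gib >= 24:
        return 32
    if gib >= 16:
        return 24
    if gib >= 12:
        return 16
    return 8

print(suggested_batch_size())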

Training Tips

  1. Batch Size Selection:

    • Start with smaller batch sizes and increase if memory allows
    • Larger batch sizes generally allow faster training
    • Reduce if you get OOM errors
  2. Number of Workers:

    • Rule of thumb: num_workers = 4 * num_gpus
    • Reduce if you get memory or file handle errors
    • Example: --num-workers 2 for slower storage systems
  3. Learning Rate:

    • Default (1e-4) works well for most cases
    • For larger batch sizes: lr = 1e-4 * (batch_size/32)
    • Example: --lr 2e-4 for batch size 64 (see the snippet after this list)
  4. Number of Epochs:

    • Default (50) is good for most cases
    • Increase for better performance: --epochs 100
    • Decrease for quick experiments: --epochs 10
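
The learning-rate scaling rule from tip 3, as a small Python helper:

# Linear learning-rate scaling: lr = 1e-4 * (batch_size / 32)
def scaled_lr(batch_size: int, base_lr: float = 1e-4, base_batch: int = 32) -> float:
    return base_lr * (batch_size / base_batch)

print(scaled_lr(64))  # 0.0002, i.e. --lr 2e-4 for batch size 64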

Common Issues

  1. Out of Memory (OOM):

    • Reduce batch size
    • Use gradient accumulation
    • Force single GPU mode
  2. GPU Selection:

    • Use CUDA_VISIBLE_DEVICES to select specific GPUs
    • Monitor GPU usage with nvidia-smi
  3. Training Speed:

    • Multi-GPU isn't always faster due to overhead
    • Start with single GPU and scale up if needed

Monitoring Training

  1. GPU Memory Usage:

    nvidia-smi -l 1  # Monitor GPU usage every second
  2. Training Progress:

    • Progress bar shows current epoch and batch
    • Loss values are printed every 10 batches
    • Checkpoints are saved every 5 epochs
  3. WandB Logging:

    • Training metrics are logged to Weights & Biases
    • Includes loss, learning rate, batch size
    • Access via WandB dashboard

Project Structure

heartwise-ai-deepcoro_clip/
├── config/              # Configuration files
│   ├── clip/            # CLIP training configs
│   └── linear_probing/  # Linear probing configs
├── dataloaders/         # Data loading modules
├── dataset_creation/    # How MHI dataset was built
├── docs/                # Documentation on CLS-Token implementation
├── models/              # Neural network models
├── projects/            # Project implementations
├── runners/             # Training runners
├── scripts/             # Training scripts
├── utils/               # Utility functions
└── tests/               # Unit test pipeline

🤝 Contributing

Contributions to the DeepCORO_CLIP repository are welcome! Please follow these steps to contribute:

  1. Fork the repository
  2. Create a new branch for your feature or bug fix
  3. Make your changes and commit them with clear, descriptive messages
  4. Push your changes to your fork
  5. Submit a pull request to the main repository

📚 Citation

If you find this repository useful, please cite our work:

@article{,
  title={},
  author={},
  journal={},
  year={},
  publisher={}
}
