EarlyPDAC-MML: Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Record
EarlyPDAC-MML predicts pancreatic cancer risk from electronic health record (EHR) data using multimodal learning. It combines neural controlled differential equations (NCDEs) for lab panels, BioGPT-encoded diagnosis trajectories, and cross-attention to predict risk up to 12 months before clinical diagnosis.
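To make the cross-attention step more concrete, here is a minimal NumPy sketch of generic single-head scaled dot-product cross-attention. It is not the repository's implementation: the function name `cross_attention`, the embedding size, the absence of learned projections, and the choice of lab embeddings as queries and diagnosis embeddings as keys and values are all assumptions made for illustration.

```python
import numpy as np


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (single head, no learned weights).

    queries: [batch, n_q, d], keys and values: [batch, n_k, d].
    Returns the attended output [batch, n_q, d] and the weights [batch, n_q, n_k].
    """
    d_k = queries.shape[-1]
    # Similarity of every query position to every key position.
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
# Hypothetical embeddings: 2 patients, 5 lab time steps, 7 diagnosis tokens, width 16.
lab_emb = rng.normal(size=(2, 5, 16))
dx_emb = rng.normal(size=(2, 7, 16))

# Assumption: lab embeddings act as queries over the diagnosis trajectory.
fused, attn = cross_attention(lab_emb, dx_emb, dx_emb)
print(fused.shape, attn.shape)
```

Each row of `attn` sums to 1, so every lab time step receives a weighted average of the diagnosis embeddings.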
Clone the repository:
    git clone https://github.com/MosbahAouad/EarlyPDAC-MML.git
    cd EarlyPDAC-MML

Install dependencies (recommended: use conda):
    conda env create -f environment.yml
    conda activate pancreatic
    # or
    pip install -r requirements.txt

Note: You must provide your own data. Example data format:
- Lab panels: numpy arrays or tensors, shape [num_samples, num_timesteps, num_features]
- Diagnosis codes: padded integer sequences, shape [num_samples, seq_length]
- Labels: binary or multiclass, shape [num_samples]
See utils/data_utils.py and comments in scripts for details.
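The shapes listed above can be illustrated with synthetic arrays. The sketch below is only an illustration of the expected layout: the sizes, the padding value `PAD_ID = 0`, right-padding, the code vocabulary of 100, and the helper `check_inputs` are assumptions, not part of the repository.

```python
import numpy as np

# Illustrative sizes only; real data will differ.
num_samples, num_timesteps, num_features, seq_length = 8, 12, 5, 20
PAD_ID = 0  # assumption: 0 is reserved for padding

rng = np.random.default_rng(0)

# Lab panels: [num_samples, num_timesteps, num_features]
lab_panels = rng.normal(size=(num_samples, num_timesteps, num_features)).astype(np.float32)

# Diagnosis codes: right-padded integer sequences, [num_samples, seq_length]
diagnosis_codes = np.full((num_samples, seq_length), PAD_ID, dtype=np.int64)
for i in range(num_samples):
    length = int(rng.integers(1, seq_length + 1))
    diagnosis_codes[i, :length] = rng.integers(1, 100, size=length)

# Labels: binary here, [num_samples]
labels = rng.integers(0, 2, size=num_samples)


def check_inputs(lab_panels, diagnosis_codes, labels):
    """Raise ValueError if the arrays do not match the documented shapes."""
    if lab_panels.ndim != 3:
        raise ValueError("lab_panels must be [num_samples, num_timesteps, num_features]")
    if diagnosis_codes.ndim != 2:
        raise ValueError("diagnosis_codes must be [num_samples, seq_length]")
    if labels.ndim != 1:
        raise ValueError("labels must be [num_samples]")
    n = labels.shape[0]
    if lab_panels.shape[0] != n or diagnosis_codes.shape[0] != n:
        raise ValueError("all inputs must share the same num_samples")
    if not np.issubdtype(diagnosis_codes.dtype, np.integer):
        raise ValueError("diagnosis_codes must be an integer array")
    return True


check_inputs(lab_panels, diagnosis_codes, labels)
```

Running a check like this before training surfaces shape mismatches early instead of deep inside the model.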
Train and evaluate the model:
    python scripts/main_cross_validation.py --weights_dir ./weights --results_dir ./results --results_file results.csv --model_type combined

See args/arg_parser.py for all command-line options.
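For orientation, the four flags in the command above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the contents of args/arg_parser.py: the function name `build_parser`, the defaults, and the help strings are assumptions, and the real parser may define more options.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags shown in the example command."""
    parser = argparse.ArgumentParser(
        description="Cross-validation training and evaluation (illustrative sketch)."
    )
    # Defaults below mirror the example command; they are assumptions.
    parser.add_argument("--weights_dir", type=str, default="./weights",
                        help="Directory for model weights.")
    parser.add_argument("--results_dir", type=str, default="./results",
                        help="Directory for evaluation results.")
    parser.add_argument("--results_file", type=str, default="results.csv",
                        help="File name for the results table.")
    parser.add_argument("--model_type", type=str, default="combined",
                        help="Model variant to train, e.g. 'combined'.")
    return parser


# Parse the same arguments as the example command, without touching sys.argv.
args = build_parser().parse_args([
    "--weights_dir", "./weights",
    "--results_dir", "./results",
    "--results_file", "results.csv",
    "--model_type", "combined",
])
print(args.model_type)
```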
If you use this code, please cite:
    @inproceedings{aouad2025early,
      title={Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Record},
      author={Aouad, Mosbah and Choudhary, Anirudh and Farooq, Awais and Nevers, Steven and Demirkhanyan, Lusine and Harris, Bhrandon and Pappu, Suguna and Gondi, Christopher and Iyer, Ravishankar},
      booktitle={Proceedings of Machine Learning for Healthcare},
      volume={298},
      pages={1--22},
      year={2025}
    }

This project is licensed under the MIT License.