👩‍🔬 Computational Biologist | Cancer Genomics & Interpretable AI

Building open, reproducible frameworks for multi-omic data integration, spatial transcriptomics, and computational pathology.


🔬 Research Vision

My work focuses on developing interpretable and reproducible AI frameworks for cancer genomics that unite biological prior knowledge with multi-omic and spatial data. The central goal is to replace black-box prediction with mechanistic understanding: models that not only perform well but also explain how genomic alterations, perturbations, and drug responses reshape cellular states. Each framework emphasizes pathway- and network-level interpretability, cross-dataset generalization, and transparent benchmarking, with the aim of establishing reproducible standards for computational oncology. Through this approach, I aim to bridge machine learning, systems biology, and translational research, advancing models that predict, explain, and validate biological mechanisms.


📂 Key Projects

The following key projects are part of the MM-KPNN framework family, a unified effort to develop concept-bottleneck AI models that embed biological knowledge directly into the network architecture. The shared aims are interpretability, reproducibility, and mechanistic insight across multi-omic and spatial data.


A modular and interpretable graph framework for spatial transcriptomics in the tumor microenvironment.

  • Combines Graph Attention Networks (GAT) with knowledge-primed decoding
  • Explains immune exclusion, stromal remodeling, and therapy-induced rewiring
  • Outputs attention maps, pathway overlays, and ligand–receptor driver rankings

Interpretable multimodal neural network integrating scRNA-seq + scATAC-seq using biological priors.

  • Decoder constrained by pathway and TF nodes
  • Provides mechanistic attributions at the pathway and regulator levels
  • Offers a reproducible framework for multimodal interpretability and benchmarking
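
As a rough illustration of what a pathway-constrained decoder can look like, the sketch below masks a dense weight matrix so that each pathway node only connects to genes in its prior gene set. It is a minimal, dependency-free example and not the repository's implementation: the gene sets, gene names, weights, and the helper names `build_mask` and `masked_decode` are all hypothetical.

```python
# Hypothetical gene sets; real priors would come from a curated pathway or TF-target resource.
PATHWAYS = {
    "pathway_A": {"GENE1", "GENE2"},
    "pathway_B": {"GENE2", "GENE3"},
}
GENES = ["GENE1", "GENE2", "GENE3", "GENE4"]


def build_mask(pathways, genes):
    """Return a pathways x genes binary mask: 1.0 where the gene belongs to the pathway."""
    return [[1.0 if g in members else 0.0 for g in genes] for members in pathways.values()]


def masked_decode(pathway_activity, weights, mask):
    """Decode gene-level output from pathway activations using only prior-supported edges."""
    n_genes = len(mask[0])
    out = [0.0] * n_genes
    for p, activity in enumerate(pathway_activity):
        for g in range(n_genes):
            # Weights outside the prior gene set are zeroed by the mask.
            out[g] += activity * weights[p][g] * mask[p][g]
    return out


mask = build_mask(PATHWAYS, GENES)
weights = [[0.5, 0.5, 0.5, 0.5], [2.0, 2.0, 2.0, 2.0]]  # dense weights before masking
decoded = masked_decode([1.0, 1.0], weights, mask)
```

Because every nonzero weight corresponds to a known pathway-gene membership, each decoded value can be traced back to named pathway nodes, which is the property that makes pathway-level attribution possible.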

Pathway-bottleneck graph neural network for drug-sensitivity prediction across pharmacogenomic panels.

  • Integrates multi-omic features, drug descriptors, and prior knowledge graphs
  • Focuses on cross-panel generalization (e.g., CCLE → GDSC)
  • Provides pathway-level interpretability and reproducible benchmarking

Extends MM-KPNN to model drug and CRISPR perturbation responses at single-cell resolution.

  • Implements pathway and TF bottlenecks for interpretability
  • Measures attribution stability and supports counterfactual pathway editing
  • Designed for robust, cross-dataset perturbation benchmarking

Additional Repositories

A modular framework for computational analysis of organoid systems.

  • Addresses reproducibility, heterogeneity, fidelity, integration, and prediction
  • Integrates RNA and protein modalities with interpretable ML
  • Demonstrates end-to-end reproducibility through documented, result-embedded notebooks

Spatial mapping of tumor and metastatic architecture using 10x Visium transcriptomics.

  • Integrates curated gene programs to define epithelial, immune, stromal, and proliferative regions
  • Reveals spatial organization and regional heterogeneity across breast tumors and lymph node metastases
  • Fully documented, end-to-end notebook with embedded results and biological interpretation

End-to-end pipeline for structural variant discovery and annotation using PacBio long-read sequencing.

  • Implements clinical annotation (ACMG/AMP) and variant filtering
  • Includes functional scoring and visualization modules
  • Designed for scalable deployment in HPC environments

Modular framework for rare-variant burden analysis in genomic cohorts.

  • Supports SKAT, SKAT-O, and extended statistical methods
  • Implements functional weighting and population correction
  • Provides reproducible variant filtering and QC workflows
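
To make the idea of variant weighting concrete, here is a minimal sketch of a weighted burden score for one gene. It is not SKAT or SKAT-O, and it is not taken from the repository: it simply sums minor-allele counts per sample, up-weighting rarer variants with a Beta(1, 25) density of the minor allele frequency, a weighting often used in rare-variant tests. The toy genotypes, frequencies, and function names are assumptions; functional annotation scores could be substituted for the frequency-based weights.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency: 25 * (1 - maf) ** 24."""
    return 25.0 * (1.0 - maf) ** 24


def burden_scores(genotypes, mafs):
    """Weighted sum of minor-allele counts per sample across the variants of one gene."""
    weights = [beta_1_25_weight(m) for m in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical toy data: 3 samples x 2 variants, genotypes coded as 0/1/2 minor-allele counts.
genotypes = [[0, 0], [1, 0], [0, 2]]
mafs = [0.001, 0.01]
scores = burden_scores(genotypes, mafs)
```

The per-sample scores would then be tested for association with the phenotype, with covariates such as ancestry components handled in the regression model.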

Systems biology workflow for reconstructing gene-regulatory networks.

  • Integrates TF–target priors and expression-based inference
  • Performs network topology and modularity analysis
  • Identifies functionally enriched regulatory modules

Gene co-expression analysis pipeline using WGCNA.

  • Identifies expression modules and hub genes
  • Evaluates biological function and module preservation
  • Applies to bulk and single-cell RNA-seq datasets

Workflow for secure, efficient genomic data transfer using Globus.

  • Integrates HPC environments and folder structuring
  • Enables checksum validation and metadata tracking
  • Ensures reproducible data sharing for collaborative projects
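
Checksum validation of a transferred file can be sketched with the Python standard library alone. The example below is an assumption about how such a check might look, independent of Globus and of the repository's actual code: it streams each file through SHA-256 and compares digests. The file names and the helpers `sha256_of` and `verify_transfer` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large genomic files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source, destination):
    """Return True when both copies have the same SHA-256 digest."""
    return sha256_of(source) == sha256_of(destination)


# Toy demonstration: two temporary files stand in for the source and destination copies.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.fastq"
    dst = Path(tmp) / "sample_copy.fastq"
    src.write_bytes(b"@read1\nACGT\n+\nFFFF\n")
    dst.write_bytes(src.read_bytes())
    transfer_ok = verify_transfer(src, dst)
```

Recording the digest alongside the file's metadata at transfer time lets collaborators re-verify the data later without access to the original source.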

Contact

Sally Yepes
📧 sallyepes233@gmail.com
🔗 GitHub: Sally332
🔗 Portfolio: sally332.github.io
