An Integrated Deep Learning and Multi-Track Genome Visualization Framework for Pangenomic Data Analysis
PanGen-AI Suite is an interactive computational genomics framework that integrates classical bioinformatics algorithms with modern deep learning approaches to explore genomic variation, pangenome structures, and CRISPR genome engineering strategies.
The platform provides a unified interface for:
- Graph-based pangenome exploration
- Deep learning-based variant impact prediction
- Genome compression and FM-index search algorithms
- CRISPR guide RNA design
- Interactive genome track visualization
PanGen-AI aims to serve as both a research prototyping platform and an educational toolkit for computational genomics.
This repository accompanies the preprint:
Nama, Y. (2026).
An Integrated Deep Learning and Multi-Track Genome Visualization Framework for Pangenomic Data Analysis
Understanding genomic variation requires integration of multiple computational approaches:
- Comparative genomics to analyze variation across genomes
- Machine learning to predict functional variant impact
- Genome indexing algorithms for efficient sequence search
- Genome editing design tools for experimental validation
PanGen-AI provides a modular environment where these analyses can be performed within a single computational framework.
The platform addresses questions such as:
- How can graph-based models represent variation across genomes?
- Which genomic positions are predicted to have high functional impact?
- How can computational predictions guide CRISPR editing strategies?
- How can compressed genome indexing enable rapid sequence search?
PanGen-AI follows a modular architecture integrating multiple computational genomics components.
DNA Input → Pangenome Graph Construction → Deep Learning Variant Prediction → Genome Compression & FM-Index Search → CRISPR Guide RNA Design → Multi-Track Genome Visualization

Each module can operate independently or as part of an integrated analysis workflow.
Graph-based representation of genomic variation.
Features:
- k-mer based pangenome graph construction
- visualization of sequence relationships
- conservation analysis across sequences
- FASTA dataset input support
Applications:
- comparative genomics
- microbial genome analysis
- structural variation visualization
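
The k-mer graph idea above can be illustrated with a minimal sketch. It assumes a de Bruijn-style layout (nodes are (k-1)-mers, edges are k-mers) and a simple per-edge conservation fraction; the function names `build_kmer_graph` and `edge_conservation` and the toy sequences are hypothetical and are not taken from `app.py`.

```python
from collections import defaultdict

def build_kmer_graph(sequences, k=4):
    """Build a de Bruijn-style graph: nodes are (k-1)-mers, edges are k-mers.

    Returns a dict mapping (prefix, suffix) edges to the set of sequence
    names that contain the corresponding k-mer.
    """
    edges = defaultdict(set)
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(name)
    return dict(edges)

def edge_conservation(edges, n_sequences):
    """Fraction of input sequences that traverse each edge."""
    return {edge: len(names) / n_sequences for edge, names in edges.items()}

# Toy input: two sequences differing by one substitution (hypothetical data).
seqs = {"strain_A": "ACGTACGA", "strain_B": "ACGTTCGA"}
graph = build_kmer_graph(seqs, k=4)
conservation = edge_conservation(graph, len(seqs))

# Edges shared by every sequence form the conserved "core" of the graph.
core = sorted(e for e, c in conservation.items() if c == 1.0)
print(core)
```

Edges present in all inputs approximate the conserved backbone, while edges seen in only some inputs mark candidate variable regions. A production graph would also need to handle reverse complements and ambiguous bases, which this sketch ignores.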
Deep learning model for functional variant prediction.
Capabilities:
- CNN-based DNA sequence analysis
- mutation impact heatmap generation
- gradient-based saliency visualization
- batch variant prediction
Applications:
- regulatory variant discovery
- functional genomics analysis
- mutation hotspot detection
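
A mutation impact heatmap can be sketched without a trained network by using in silico mutagenesis: substitute every base at every position and record how the model score changes. This is a model-agnostic stand-in and not the gradient-based saliency listed above. The scoring function `toy_score` (GC fraction) is a placeholder for a CNN's output, and all names here are hypothetical.

```python
BASES = "ACGT"

def toy_score(seq):
    """Placeholder model: GC fraction. Stands in for a trained CNN's output."""
    return sum(base in "GC" for base in seq) / len(seq)

def mutation_impact(seq, score_fn=toy_score):
    """Score every single-base substitution relative to the reference.

    Returns one row per position, each a dict {alt_base: score_delta}.
    The reference base at each position gets a delta of 0.0.
    """
    seq = seq.upper()
    ref_score = score_fn(seq)
    rows = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref_base:
                row[alt] = 0.0
            else:
                mutated = seq[:i] + alt + seq[i + 1:]
                row[alt] = score_fn(mutated) - ref_score
        rows.append(row)
    return rows

heatmap = mutation_impact("ACGT")

# Rank positions by the largest absolute effect of any substitution.
hotspots = sorted(
    range(len(heatmap)),
    key=lambda i: max(abs(delta) for delta in heatmap[i].values()),
    reverse=True,
)
```

Any callable that maps a sequence to a number can be passed as `score_fn`, so a real model's prediction function could replace the placeholder without changing the loop.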
Genome compression and sequence search module.
Implements:
- Burrows-Wheeler Transform (BWT)
- FM-index construction
- backward search algorithm
- compressed genome pattern matching
Applications:
- genome indexing
- sequence alignment preprocessing
- bioinformatics algorithm education
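
The BWT, FM-index, and backward search steps listed above can be sketched as follows. This is a teaching-scale version that assumes a sorted-rotations BWT and an uncompressed occurrence table, so it is not memory-efficient; the function names and the toy sequence are hypothetical and may differ from the suite's implementation.

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations (quadratic; demo only)."""
    text += "$"  # unique sentinel that sorts before A, C, G, and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def build_fm_index(bwt_str):
    """Build the C table and a full occurrence table for backward search."""
    alphabet = sorted(set(bwt_str))
    # c_table[c]: number of characters in the text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt_str.count(ch)
    # occ[c][i]: number of occurrences of c in bwt_str[:i].
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in alphabet}
    for i, ch in enumerate(bwt_str):
        for a in alphabet:
            occ[a][i + 1] = occ[a][i] + (a == ch)
    return c_table, occ

def count_matches(pattern, c_table, occ, n):
    """Count occurrences of pattern with backward search over the FM-index."""
    lo, hi = 0, n  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

genome = "ACGTACGTTACG"  # toy sequence, not one of the bundled datasets
transformed = bwt(genome)
c_table, occ = build_fm_index(transformed)
hits = count_matches("ACG", c_table, occ, len(transformed))
```

Backward search processes the pattern from its last character to its first, narrowing a range of suffix-array rows at each step, so the number of steps depends on the pattern length rather than the genome length.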
Identification of candidate CRISPR-Cas9 guide RNAs.
Features:
- PAM-aware NGG scanning
- GC content filtering
- off-target similarity estimation
- candidate guide ranking
Applications:
- genome editing experiments
- functional genomics perturbation studies
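
A minimal sketch of PAM-aware scanning with GC filtering is shown here. It assumes 20-nt guides, forward-strand scanning only, a GC window of 40-70%, and ranking by distance from 50% GC; these thresholds and the function names are illustrative assumptions rather than the suite's actual parameters, and off-target estimation is not covered.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)

def find_guides(seq, guide_len=20, gc_min=0.40, gc_max=0.70):
    """Scan the forward strand for guide_len-mers followed by an NGG PAM.

    Candidates are sorted by how close their GC fraction is to 0.5,
    a hypothetical ranking heuristic used only for illustration.
    """
    seq = seq.upper()
    guides = []
    # The last valid start leaves room for the guide plus a 3-nt PAM.
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len:i + guide_len + 3]
        if pam[1:] != "GG":
            continue
        guide = seq[i:i + guide_len]
        gc = gc_fraction(guide)
        if gc_min <= gc <= gc_max:
            guides.append({"start": i, "guide": guide, "pam": pam, "gc": gc})
    return sorted(guides, key=lambda g: abs(g["gc"] - 0.5))

# Toy target: one 20-nt protospacer followed by a TGG PAM (hypothetical data).
target = "TTTT" + "ACGT" * 5 + "TGG" + "TTTT"
candidates = find_guides(target)
```

Scanning the reverse strand would require repeating the search on the reverse complement, and a practical ranking would combine GC content with an off-target score.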
The Streamlit interface provides real-time interaction with genomic datasets.
Capabilities include:
- dynamic parameter tuning
- real-time visualization of genomic analysis
- mutation heatmaps and genome tracks
- exportable CSV and figure outputs
The interface enables rapid exploration of genomic hypotheses without requiring extensive programming.
Live Web App:
https://pangen-ai-yash.streamlit.app/
PanGen-AI includes curated example datasets for demonstration:
| Dataset | Purpose |
|---|---|
| Bacterial genome fragments | Pangenome graph construction |
| BRCA1 regulatory sequence | Variant impact prediction |
| SARS-CoV-2 genome fragment | FM-index search demonstration |
| Human gene exon region | CRISPR guide design |
    PanGen-AI-Suite/
    │
    ├── app.py
    ├── requirements.txt
    ├── README.md
    ├── LICENSE
    ├── CITATION.cff
    ├── dashboard.png
    └── assets/

1. Clone the repository

    git clone https://github.com/YASH4-HD/PanGen-AI-Suite.git
    cd PanGen-AI-Suite

2. Install dependencies

    pip install -r requirements.txt

3. Launch the dashboard

    streamlit run app.py

All analyses are reproducible using:
- deterministic model initialization
- defined dataset inputs
- explicit algorithm implementations
- open-source Python libraries
The platform does not require proprietary datasets.
If you use this suite in your research, please cite it as:
Nama, Y. (2026). PanGen-AI Suite: An Integrated Platform for Pangenome Analysis, AI Variant Prediction, and Genome Engineering. (Version 1.0.0) Zenodo. https://doi.org/10.5281/zenodo.18988911. GitHub. https://github.com/YASH4-HD/PanGen-AI
Yashwant Nama
Independent Researcher | Systems Immunology & Computational Modeling
Focus: Systems Immunology, Mechanobiology, Computational Modeling and Reproducible Bioinformatics.
Connect & Verify:
- ORCID: 0009-0003-3443-4413
- LinkedIn: Yashwant Nama
- Project Website: Streamlit Dashboard
PanGen-AI combines classical bioinformatics algorithms with modern AI approaches to create an integrated genomic analysis environment.
