AlleleFlux

AlleleFlux is a bioinformatics toolkit for analyzing allele frequency changes in metagenomic time-series data. It identifies genomic targets of natural selection in microbial communities by calculating:

Parallelism scores — detect parallel allele frequency changes across replicates within groups
Divergence scores — quantify allele frequency divergence between experimental groups
dN/dS ratios — measure selection pressure on genes via non-synonymous to synonymous substitution rates

These scores enable direct comparisons of evolutionary dynamics across taxa, genomes, and genes, helping identify loci under strong selection.
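To make the dN/dS idea concrete, here is a minimal sketch of the ratio computed from raw counts. It uses uncorrected proportions (changes per site) with no multiple-hit correction, and it is only an illustration of the concept, not necessarily how AlleleFlux computes the value. The function name and the example counts are hypothetical.

```python
from typing import Optional


def dn_ds_ratio(nonsyn_subs: float, nonsyn_sites: float,
                syn_subs: float, syn_sites: float) -> Optional[float]:
    """Return (nonsyn_subs / nonsyn_sites) / (syn_subs / syn_sites).

    Returns None when the ratio is undefined: no sites of either class,
    or no synonymous substitutions to divide by.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        return None
    dn = nonsyn_subs / nonsyn_sites  # non-synonymous changes per non-synonymous site
    ds = syn_subs / syn_sites        # synonymous changes per synonymous site
    if ds == 0:
        return None
    return dn / ds


# Hypothetical gene: 6 non-synonymous changes over 300 non-synonymous sites,
# 2 synonymous changes over 100 synonymous sites.
print(dn_ds_ratio(6, 300, 2, 100))  # 1.0
```

A ratio near 1 is conventionally read as neutral evolution, above 1 as positive selection, and below 1 as purifying selection.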

Installation

From Bioconda (Recommended)

    # Install with conda (or mamba)
    conda install -c conda-forge -c bioconda alleleflux

    # Activate the environment
    conda activate alleleflux

From Source

    # Clone the repository
    git clone https://github.com/MoellerLabPU/AlleleFlux.git
    cd AlleleFlux

    # Create environment with dependencies
    conda env create -f environment.yml
    conda activate alleleflux

    # Or install directly with pip
    pip install -e .

Input Files

File	Description
Reference FASTA	Combined MAG contigs (header format: `<MAG_ID>.fa_<contig_ID>`)
Prodigal genes	Nucleotide ORF predictions (`.fna`) matching reference contig IDs
MAG mapping	TSV with columns: `mag_id`, `contig_id`
Metadata TSV	Sample info with columns: `sample_id`, `bam_path`, `group`, `time`
GTDB taxonomy	`gtdbtk.bac120.summary.tsv` from GTDB-Tk
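Before running the pipeline it can help to confirm that every reference FASTA header follows the `<MAG_ID>.fa_<contig_ID>` pattern. The sketch below is one way to do that with the standard library; the helper names and the example headers are hypothetical, and it assumes the MAG ID is everything before the first `.fa_` in the header.

```python
from collections import Counter
from typing import Iterable

SEPARATOR = ".fa_"  # from the documented header format <MAG_ID>.fa_<contig_ID>


def mag_id_from_header(header: str) -> str:
    """Extract the MAG ID from a header line such as '>MAG001.fa_contig_12'."""
    parts = header.lstrip(">").split()  # drop '>' and any trailing description
    if not parts:
        raise ValueError(f"empty FASTA header: {header!r}")
    mag_id, sep, contig = parts[0].partition(SEPARATOR)
    if not sep or not mag_id or not contig:
        raise ValueError(f"header does not match <MAG_ID>.fa_<contig_ID>: {header!r}")
    return mag_id


def contigs_per_mag(lines: Iterable[str]) -> Counter:
    """Count contigs per MAG across the header lines of a FASTA file."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[mag_id_from_header(line)] += 1
    return counts


# Hypothetical FASTA content; with a real file, pass the open file handle instead.
example = [
    ">MAG001.fa_contig_1", "ACGT",
    ">MAG001.fa_contig_2", "GGCC",
    ">MAG002.fa_contig_1", "TTAA",
]
print(dict(contigs_per_mag(example)))  # {'MAG001': 2, 'MAG002': 1}
```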

See Input Preparation Guide for detailed format specifications.
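As a quick sanity check on the metadata TSV, the following sketch verifies that the four columns listed above are present and non-empty. It is a rough pre-flight check under the assumption that the file is tab-separated with a header row; the function name and the sample rows are hypothetical, and AlleleFlux may apply further validation of its own.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ("sample_id", "bam_path", "group", "time")


def check_metadata(handle: TextIO) -> List[str]:
    """Return a list of human-readable problems found in a metadata TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


# Hypothetical metadata with one incomplete row.
example = io.StringIO(
    "sample_id\tbam_path\tgroup\ttime\n"
    "S1\t/data/S1.bam\ttreatment\tpre\n"
    "S2\t\tcontrol\tpost\n"
)
print(check_metadata(example))  # ['line 3: empty bam_path']
```

With a real file, pass `open("metadata.tsv", newline="")` instead of the `StringIO` object.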

Quick Start

1. Initialize Configuration

    # Interactive configuration wizard
    alleleflux init

    # Or copy the template manually
    cp $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/config.template.yml config.yml

2. Edit Configuration

Edit config.yml with your paths and parameters. Here is the complete configuration with all options:

    run_name: "alleleflux_analysis"

    # Input Files
    input:
      fasta_path: ""           # Reference FASTA file (required)
      prodigal_path: ""        # Prodigal nucleic acid output (.fna)
      metadata_path: ""        # Sample metadata file
      gtdb_path: ""            # GTDB taxonomy file
      mag_mapping_path: ""     # MAG-to-contig mapping file

    # Output Directory
    output:
      root_dir: "./alleleflux_output"

    log_level: "INFO"          # DEBUG, INFO, WARNING, ERROR

    # Analysis Configuration
    analysis:
      data_type: "longitudinal"      # "single" or "longitudinal"
      allele_analysis_only: False    # Skip scoring/outlier detection
      use_lmm: True                  # Linear Mixed Models
      use_significance_tests: True   # Two-sample and single-sample tests
      use_cmh: True                  # CMH test
      timepoints_combinations:
        - timepoint: ["pre", "post"]
          focus: "post"
      groups_combinations:
        - ["treatment", "control"]

    # Quality Control
    quality_control:
      min_sample_num: 4
      breadth_threshold: 0.1
      coverage_threshold: 1
      disable_zero_diff_filtering: False

    # Profiling
    profiling:
      ignore_orphans: False
      min_base_quality: 30
      min_mapping_quality: 2
      ignore_overlaps: True

    # Statistics
    statistics:
      filter_type: "t-test"    # "t-test", "wilcoxon", or "both"
      preprocess_between_groups: True
      preprocess_within_groups: True
      max_zero_count: 4
      p_value_threshold: 0.05
      fdr_group_by_mag_id: False
      min_positions_after_preprocess: 1

    # dN/dS Analysis
    dnds:
      p_value_column: "q_value"
      dn_ds_test_type: "two_sample_paired_tTest"

    # Compute Resources
    resources:
      threads_per_job: 16
      mem_per_job: "8G"
      time: "24:00:00"
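A few of these settings only accept specific values, so a small pre-flight check can catch typos before a long run. The sketch below works on the configuration as a plain nested dictionary (for example, the result of a YAML parser) and checks only constraints stated in this README. The function name is hypothetical and this is not AlleleFlux's own validation logic.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed AlleleFlux-style config."""
    problems = []

    # fasta_path is the one input marked "(required)" in the template comments.
    if not (config.get("input") or {}).get("fasta_path"):
        problems.append("input.fasta_path is empty")

    data_type = (config.get("analysis") or {}).get("data_type")
    if data_type not in ("single", "longitudinal"):
        problems.append('analysis.data_type must be "single" or "longitudinal"')

    breadth = (config.get("quality_control") or {}).get("breadth_threshold")
    if breadth is not None and not 0 <= breadth <= 1:
        problems.append("quality_control.breadth_threshold must be between 0 and 1")

    filter_type = (config.get("statistics") or {}).get("filter_type")
    if filter_type is not None and filter_type not in ("t-test", "wilcoxon", "both"):
        problems.append('statistics.filter_type must be "t-test", "wilcoxon", or "both"')

    return problems


# Hypothetical, partially filled configuration.
example = {
    "input": {"fasta_path": ""},
    "analysis": {"data_type": "longitudinal"},
    "quality_control": {"breadth_threshold": 0.1},
    "statistics": {"filter_type": "t-test"},
}
print(check_config(example))  # ['input.fasta_path is empty']
```

If PyYAML is available, `yaml.safe_load(open("config.yml"))` should produce a dictionary of this shape.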

Configuration Parameters

Section	Parameter	Description
input	`fasta_path`	Reference FASTA with combined MAG contigs
	`prodigal_path`	Prodigal nucleotide predictions (`.fna`)
	`metadata_path`	Sample metadata TSV
	`gtdb_path`	GTDB-Tk taxonomy file
	`mag_mapping_path`	MAG-to-contig mapping TSV
analysis	`data_type`	`"longitudinal"` (multiple timepoints) or `"single"` (one timepoint)
	`allele_analysis_only`	Skip significance tests, scoring, and outlier detection if `True`
	`use_lmm`	Enable Linear Mixed Models
	`use_significance_tests`	Enable two-sample/single-sample tests
	`use_cmh`	Enable Cochran-Mantel-Haenszel tests
	`timepoints_combinations`	Timepoint pairs with focus timepoint
	`groups_combinations`	Groups to compare
quality_control	`min_sample_num`	Minimum samples required per MAG
	`breadth_threshold`	Minimum coverage breadth (0-1)
	`coverage_threshold`	Minimum average coverage depth
profiling	`min_base_quality`	Minimum base quality score
	`min_mapping_quality`	Minimum mapping quality score
statistics	`filter_type`	Preprocessing filter type
	`p_value_threshold`	Significance threshold
	`fdr_group_by_mag_id`	Apply FDR correction per MAG
dnds	`p_value_column`	`"min_p_value"` or `"q_value"`
	`dn_ds_test_type`	Test type for filtering dN/dS results
resources	`threads_per_job`	Threads allocated per job
	`mem_per_job`	Memory per job (e.g., `"8G"`, `"100G"`)
	`time`	Wall time limit (HH:MM:SS)

See Configuration Reference for complete documentation.
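The `fdr_group_by_mag_id` option decides whether multiple-testing correction pools all p-values or treats each MAG separately. The sketch below illustrates the difference using the Benjamini-Hochberg procedure. The README does not name the FDR method AlleleFlux uses, so Benjamini-Hochberg is an assumption here, and the function names and p-values are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def fdr_correct(records: List[Tuple[str, float]], group_by_mag_id: bool) -> List[float]:
    """records are (mag_id, p_value) pairs; returns q-values in input order."""
    if not group_by_mag_id:
        return benjamini_hochberg([p for _, p in records])
    by_mag = defaultdict(list)
    for index, (mag_id, _) in enumerate(records):
        by_mag[mag_id].append(index)
    q_values = [0.0] * len(records)
    for indices in by_mag.values():
        adjusted = benjamini_hochberg([records[i][1] for i in indices])
        for i, q in zip(indices, adjusted):
            q_values[i] = q
    return q_values


# Hypothetical p-values for two MAGs.
records = [("MAG001", 0.01), ("MAG001", 0.04), ("MAG002", 0.03), ("MAG002", 0.005)]
print(fdr_correct(records, group_by_mag_id=False))
print(fdr_correct(records, group_by_mag_id=True))
```

Per-MAG correction adjusts each p-value against fewer tests, so q-values for a given site can differ noticeably between the two modes.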

3. Run the Pipeline

    # Run locally
    alleleflux run --config config.yml --threads 16

    # Dry run to preview jobs
    alleleflux run --config config.yml --dry-run

Running on SLURM

For HPC clusters, copy the SLURM profile from the source repository:

    # Copy SLURM profile (if installed from source)
    cp -r $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/slurm_profile ./

    # Run with SLURM
    alleleflux run --config config.yml --profile ./slurm_profile

The SLURM profile automatically submits jobs via sbatch with resources from your config.
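When launching runs from a script, it can be convenient to preview the jobs with a dry run before starting the real one. The sketch below assembles the `alleleflux run` command from the flags shown in this README (`--config`, `--threads`, `--dry-run`, `--profile`) and runs a dry run first. The helper names are hypothetical, and whether every flag combination is accepted by the CLI is an assumption; `alleleflux run --help` is the authority.

```python
import subprocess
from typing import List, Optional


def build_run_command(config: str, threads: Optional[int] = None,
                      dry_run: bool = False,
                      profile: Optional[str] = None) -> List[str]:
    """Assemble an `alleleflux run` command line as an argument list."""
    cmd = ["alleleflux", "run", "--config", config]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if profile is not None:
        cmd += ["--profile", profile]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def dry_run_then_run(config: str, threads: int) -> None:
    """Preview the jobs, then start the real run only if the preview succeeds."""
    subprocess.run(build_run_command(config, dry_run=True), check=True)
    subprocess.run(build_run_command(config, threads=threads), check=True)


print(" ".join(build_run_command("config.yml", dry_run=True)))
# alleleflux run --config config.yml --dry-run
```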

How It Works

AlleleFlux is powered by a Snakemake workflow that orchestrates the complete analysis:

    Input Files            Profile & QC            Statistical Analysis
    ━━━━━━━━━━━            ━━━━━━━━━━━━            ━━━━━━━━━━━━━━━━━━━━
    • Reference FASTA      • Extract alleles       • Two-sample tests
    • Prodigal genes       • Quality control       • LMM / CMH tests
    • Metadata TSV         • Eligibility checks    • dN/dS calculation
    • MAG mapping
                                    ↓
                           ┌─────────────────┐
                           │  Scoring & Viz  │
                           ├─────────────────┤
                           │ • Parallelism   │
                           │ • Divergence    │
                           │ • Outliers      │
                           │ • Trajectories  │
                           └─────────────────┘

Pipeline Steps:

Profiling — Extract allele frequencies from BAM files for each MAG
Quality Control — Filter samples by coverage breadth; determine MAG eligibility
Statistical Testing — Apply appropriate tests based on experimental design
Scoring — Calculate parallelism/divergence scores and identify outlier genes
dN/dS Analysis — Calculate evolutionary rates for genes under selection
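The quality-control step above can be pictured with a small sketch: compute coverage breadth and mean depth for each sample, then count how many samples pass. It rests on several assumptions that the README does not confirm. Breadth is taken as the fraction of positions with depth of at least 1, thresholds are inclusive, and eligibility is counted across all samples rather than per group. All names and depth values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def breadth_and_mean_depth(depths: Sequence[int]) -> Tuple[float, float]:
    """depths holds per-position read depth for one MAG in one sample."""
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)


def sample_passes(depths: Sequence[int], breadth_threshold: float = 0.1,
                  coverage_threshold: float = 1) -> bool:
    """Apply the breadth and average-depth thresholds to one sample."""
    breadth, mean_depth = breadth_and_mean_depth(depths)
    return breadth >= breadth_threshold and mean_depth >= coverage_threshold


def mag_is_eligible(samples: Dict[str, Sequence[int]], min_sample_num: int = 4) -> bool:
    """A MAG is kept when enough of its samples pass QC (assumed rule)."""
    passing = [name for name, depths in samples.items() if sample_passes(depths)]
    return len(passing) >= min_sample_num


# Hypothetical per-position depths for one MAG in five samples.
samples = {
    "S1": [0, 0, 3, 5],
    "S2": [2, 2, 2, 2],
    "S3": [0, 0, 0, 0],   # no coverage, fails QC
    "S4": [1, 4, 0, 6],
    "S5": [9, 0, 0, 1],
}
print(mag_is_eligible(samples))  # True (four of five samples pass)
```

The default thresholds mirror the values in the example configuration (`breadth_threshold: 0.1`, `coverage_threshold: 1`, `min_sample_num: 4`).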

The workflow:

Automatically parallelizes across samples and MAGs
Handles checkpointing and restarts gracefully
Supports local execution and HPC clusters (SLURM)
Tracks provenance and ensures reproducibility

Output

Results are organized in the output directory:

    alleleflux_output/
    ├── profiles/              # Per-sample allele frequency profiles
    ├── metadata/              # Per-MAG metadata tables
    ├── eligibility/           # MAG eligibility tables
    ├── allele_analysis/       # Allele frequency analysis results
    ├── significance_tests/    # Statistical test results (LMM, CMH, t-tests)
    ├── scores/                # Parallelism and divergence scores
    ├── outliers/              # Genes with high scores (selection targets)
    └── dnds/                  # dN/dS analysis results

See Output Reference for file format details.
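After a run finishes, a short script can report which of the subdirectories listed above are present. This is a minimal sketch using only the directory names from this README; the function name is hypothetical, and some directories may legitimately be absent depending on settings such as `allele_analysis_only`.

```python
import tempfile
from pathlib import Path
from typing import List

EXPECTED_DIRS = (
    "profiles", "metadata", "eligibility", "allele_analysis",
    "significance_tests", "scores", "outliers", "dnds",
)


def missing_outputs(root: str) -> List[str]:
    """Return the expected output subdirectories that do not exist under root."""
    base = Path(root)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


# Demonstration on a throwaway directory with only two outputs present.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "profiles").mkdir()
    (Path(tmp) / "scores").mkdir()
    print(missing_outputs(tmp))
```

For a real run, call `missing_outputs("./alleleflux_output")` or pass whatever `output.root_dir` is set to in the configuration.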

CLI Tools

AlleleFlux provides 30+ standalone command-line tools:

    # List all available tools
    alleleflux tools

    # Main commands
    alleleflux run --help     # Run the full pipeline
    alleleflux init --help    # Interactive configuration
    alleleflux info           # Show installation info

    # Individual analysis tools
    alleleflux-profile --help                 # Profile MAGs from BAM files
    alleleflux-qc --help                      # Quality control
    alleleflux-scores --help                  # Calculate parallelism/divergence scores
    alleleflux-dnds-from-timepoints --help    # Calculate dN/dS ratios

See CLI Reference for the complete list.
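To confirm that the entry points named above are reachable in the active environment, a short check with `shutil.which` is usually enough. The sketch below covers only the commands mentioned in this README, not the full tool list, and the function name is hypothetical.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional

TOOLS = (
    "alleleflux",
    "alleleflux-profile",
    "alleleflux-qc",
    "alleleflux-scores",
    "alleleflux-dnds-from-timepoints",
)


def find_tools(names: Iterable[str] = TOOLS,
               which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, Optional[str]]:
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default is fine.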

Documentation

Full documentation: alleleflux.readthedocs.io

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

AlleleFlux is licensed under the GNU General Public License v3.0.
