A wrapper around codeml (PAML package) to make life easier: input an unaligned multi-species FASTA file (a single gene) and get the codeml results as output.
Dependencies (must be installed beforehand):
- Codeml (PAML version 4.10.6)
- MACSE (.jar form)
- MUSCLE
- RAxML
- biopython (v1.81, python package)
- newick (v1.9.0, python package)
- scipy (python package, used for the statistics step)
Simply add ./script to your PATH environment variable.
Input:
- A single-gene FASTA file (multi-species, unaligned).
- A text file listing the foreground species, one species per line.
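A minimal sketch of how the two input files might be read, assuming plain FASTA headers whose first word is the species name. The parser and function names (read_fasta, read_foreground) are illustrative only and are not part of Fasta2Codeml; the tool itself depends on biopython, where Bio.SeqIO.parse would do the same job.

```python
from io import StringIO


def read_fasta(handle):
    """Return {species: sequence}; the species name is the first word of each '>' header."""
    records = {}
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_foreground(handle):
    """Return the set of foreground species (one per line, blank lines ignored)."""
    return {line.strip() for line in handle if line.strip()}


# Hypothetical example inputs (not the files shipped in example/).
fasta = StringIO(">Homo_sapiens\nATGGCC\nTAA\n>Mus_musculus\nATGGCA\nTAA\n")
foreground = StringIO("Homo_sapiens\n\n")

seqs = read_fasta(fasta)
fg = read_foreground(foreground)
missing = fg - set(seqs)  # foreground species absent from the FASTA file
print(seqs)     # {'Homo_sapiens': 'ATGGCCTAA', 'Mus_musculus': 'ATGGCATAA'}
print(missing)  # set()
```

Checking that every foreground name actually occurs in the FASTA headers before running the pipeline catches spelling mismatches early, since a foreground species that is missing would simply never be labelled.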
First, cd to example/test_space/.
Change the absolute paths in the commands below to your own paths, then type:

For a single gene:

    Fasta2Codeml.py \
        --out_dir /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/test_space_single_gene \
        --project_name Simple_test \
        --foreground_file /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/foreground.txt \
        --fasta /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/single_gene/CLOCK.fasta \
        --muscle /beegfs/store4/chenyangkang/software/ParaAT2.0/muscle \
        --macse /beegfs/store4/chenyangkang/software/macse_v2.07.jar \
        --raxml /beegfs/store4/chenyangkang/software/standard-RAxML/raxml \
        --codeml /beegfs/store4/chenyangkang/miniconda3/bin/codeml \
        --boostrap 10 \
        --codon_frac 0.5 \
        --sp_frac 0.5

For multiple CDS files (multi-file mode):

    Fasta2Codeml.py \
        --out_dir /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/test_space_multi_cds \
        --project_name Simple_multi_test \
        --foreground_file /beegfs/store4/chenyangkang/DEV/Fasta2Codeml/example/foreground.txt \
        --multi_file \
        --multi_file_list cds_list.txt \
        --muscle /beegfs/store4/chenyangkang/software/ParaAT2.0/muscle \
        --macse /beegfs/store4/chenyangkang/software/macse_v2.07.jar \
        --raxml /beegfs/store4/chenyangkang/software/standard-RAxML/raxml \
        --codeml /beegfs/store4/chenyangkang/miniconda3/bin/codeml \
        --boostrap 10 \
        --codon_frac 0.5 \
        --sp_frac 0.5

The pipeline runs the following steps:

- Remove species whose sequences contain only "N"s.
- Align with MUSCLE (5 iterations).
- Refine the alignment with MACSE.
- Replace frameshifts (!) and stop codons with NNN using MACSE.
- Concatenate files (if in multi-file mode).
- Remove codon columns in which more than 50% of species are missing, then remove species with more than 50% of their codons as "NNN" or "---".
- Build the tree with RAxML using -f a -x 42 -p 42 -m GTRGAMMA.
- Co-filter the FASTA and tree files. Trim the tree and annotate it with the provided foreground information. Output the alignment in PHYLIP format.
- Generate codeml configuration files for both the branch-site null model (omega fixed at 1) and the alternative model.
- Run codeml under both models.
- Compute p-values and other statistics using scipy.
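The N-only removal and the 50% column/species filter in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: a codon counts as missing only if it is exactly "---" or "NNN", columns are filtered before species, and the parameter names max_missing_species and max_missing_codons are hypothetical (the --codon_frac and --sp_frac flags presumably play this role, but that mapping is not confirmed here).

```python
# A codon is treated as "missing" only if it is exactly "---" or "NNN" (assumption).
MISSING = {"---", "NNN"}


def drop_all_n(records):
    """Drop species whose (unaligned) sequence contains nothing but N."""
    return {name: seq for name, seq in records.items() if set(seq.upper()) - {"N"}}


def split_codons(seq):
    """Split an in-frame aligned sequence into codons, ignoring a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def filter_alignment(records, max_missing_species=0.5, max_missing_codons=0.5):
    """Drop codon columns, then species, whose missing fraction exceeds the threshold."""
    names = list(records)
    table = [split_codons(records[n].upper()) for n in names]
    n_cols = min((len(row) for row in table), default=0)

    # Pass 1: keep a column if at most max_missing_species of the species are missing there.
    keep_cols = [
        j for j in range(n_cols)
        if sum(row[j] in MISSING for row in table) / len(names) <= max_missing_species
    ]

    # Pass 2: keep a species if at most max_missing_codons of its remaining codons are missing.
    filtered = {}
    for name, row in zip(names, table):
        kept = [row[j] for j in keep_cols]
        if kept and sum(c in MISSING for c in kept) / len(kept) <= max_missing_codons:
            filtered[name] = "".join(kept)
    return filtered


# Hypothetical toy alignment: the middle column is 75% missing, species D is all missing.
aln = {"A": "ATGNNNCCC", "B": "ATG---CCC", "C": "ATGAAACCC", "D": "NNN---NNN"}
print(filter_alignment(aln))  # {'A': 'ATGCCC', 'B': 'ATGCCC', 'C': 'ATGCCC'}
```

Working on whole codons rather than single nucleotides keeps the alignment in frame, which codeml requires for codon models.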
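For the tree-annotation step, codeml's branch-site test marks foreground branches with a "#1" label in the tree file. The sketch below labels foreground tips in a Newick string with a regular expression; it assumes a topology-only tree (no branch lengths, no quoted names), and the helper names tip_names and label_foreground are hypothetical. It does not perform the co-filtering/pruning; the tool depends on the newick package, which is better suited to that.

```python
import re

# A tip name follows "(" or "," and runs until the next structural character.
# Assumes a topology-only Newick string (no branch lengths, no quoted names).
TIP = re.compile(r"([(,]\s*)([^\s(),:;#]+)")


def tip_names(newick):
    """Return tip names in the order they appear."""
    return [m.group(2) for m in TIP.finditer(newick)]


def label_foreground(newick, foreground, label="#1"):
    """Append a codeml branch label to every foreground tip."""
    def repl(m):
        prefix, name = m.group(1), m.group(2)
        return prefix + name + (" " + label if name in foreground else "")
    return TIP.sub(repl, newick)


tree = "((Homo_sapiens,Pan_troglodytes),(Mus_musculus,Rattus_norvegicus));"
print(tip_names(tree))
print(label_foreground(tree, {"Homo_sapiens"}))
# ((Homo_sapiens #1,Pan_troglodytes),(Mus_musculus,Rattus_norvegicus));
```

Labelling a tip marks the terminal branch leading to that species; marking a whole clade would instead need a label on an internal branch.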
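The two configuration files for the branch-site test differ in a single setting: under the null model the foreground omega is fixed at 1 (fix_omega = 1, omega = 1), while the alternative estimates it (fix_omega = 0). A sketch of generating both, using standard codeml.ctl keys; the helper name branch_site_ctl and the file names are hypothetical, and CodonFreq = 2 and cleandata = 0 are assumed choices that may differ from what Fasta2Codeml writes.

```python
def branch_site_ctl(seqfile, treefile, outfile, null):
    """Build codeml.ctl text for branch-site model A (null=True fixes the foreground omega at 1)."""
    settings = {
        "seqfile": seqfile,      # PHYLIP alignment
        "treefile": treefile,    # tree with foreground branches labelled #1
        "outfile": outfile,
        "noisy": 0,
        "verbose": 0,
        "runmode": 0,            # 0 = use the user tree
        "seqtype": 1,            # 1 = codon sequences
        "CodonFreq": 2,          # 2 = F3x4 (an assumed, common choice)
        "clock": 0,
        "model": 2,              # model = 2 with NSsites = 2 -> branch-site model A
        "NSsites": 2,
        "icode": 0,              # 0 = universal genetic code
        "fix_kappa": 0,
        "kappa": 2,
        "fix_omega": 1 if null else 0,
        "omega": 1,              # fixed value under the null, starting value otherwise
        "cleandata": 0,
    }
    width = max(len(key) for key in settings)
    return "\n".join(f"{key.rjust(width)} = {value}" for key, value in settings.items()) + "\n"


# Hypothetical file names; codeml would then be run as `codeml null.ctl` and `codeml alt.ctl`.
null_ctl = branch_site_ctl("gene.phy", "gene.tree", "null.mlc", null=True)
alt_ctl = branch_site_ctl("gene.phy", "gene.tree", "alt.mlc", null=False)
print(null_ctl)
```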
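The final step is a likelihood-ratio test: the statistic is 2(lnL_alt - lnL_null), compared against a chi-square distribution with 1 degree of freedom, the conventional (conservative) reference for the branch-site test. The sketch reads lnL from the "lnL(ntime: ... np: ...):" line of codeml's output and uses the identity that the chi-square (1 df) survival function equals erfc(sqrt(x/2)), so it needs only the standard library; scipy.stats.chi2.sf(stat, 1) gives the same value. The function names are hypothetical.

```python
import math
import re

# Matches codeml output lines such as "lnL(ntime: 11  np: 16):  -5530.472148  +0.000000".
LNL = re.compile(r"lnL\(ntime:\s*\d+\s+np:\s*\d+\):\s*(-?\d+(?:\.\d+)?)")


def read_lnl(text):
    """Return the first log-likelihood reported in a codeml output file."""
    m = LNL.search(text)
    if m is None:
        raise ValueError("no lnL line found")
    return float(m.group(1))


def branch_site_lrt(lnl_null, lnl_alt):
    """Return (2*delta lnL, p-value) using a chi-square reference with 1 df."""
    # Clip at 0: a slightly negative value only reflects optimisation noise.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Hypothetical lnL values.
stat, p = branch_site_lrt(-100.0, -98.0)
print(stat, round(p, 4))  # 4.0 0.0455
```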