Gotranseq

Translate nucleic acid sequences to their corresponding peptide sequences. Like EMBOSS transeq, but written in Go.

Purpose

EMBOSS transeq is a great tool, but it can be quite painful for some use cases, because it silently truncates the sequence ID if it contains characters like ':', and renames the sequence ID if it contains characters like '|'.
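To make the ID problem concrete, here is a minimal sketch of header handling that keeps the ID untouched. It is not taken from the gotranseq source: the function name `outputHeader` and the `_<frame>` suffix are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// outputHeader builds the header of a translated record from a raw FASTA
// header line. The ID is copied verbatim, so ':' and '|' survive.
// The "_<frame>" suffix is an assumption for this sketch.
func outputHeader(headerLine string, frame int) string {
	line := strings.TrimPrefix(headerLine, ">")
	// Split the ID from the optional description at the first space only.
	id, desc, hasDesc := strings.Cut(line, " ")
	out := fmt.Sprintf(">%s_%d", id, frame)
	if hasDesc {
		out += " " + desc
	}
	return out
}

func main() {
	fmt.Println(outputHeader(">gi|12345|ref:NC_000913.3 Escherichia coli", 1))
	// prints ">gi|12345|ref:NC_000913.3_1 Escherichia coli"
}
```

Because the ID is never parsed beyond the first space, no character in it gets special treatment.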

This tool is an attempt to solve this problem. It is also much faster than EMBOSS transeq because it can be parallelized:

Benchmark on Ubuntu 16.04, on a machine with 2 CPUs (Intel(R) Core(TM)2 Duo CPU 3.00GHz), with a 189MB FASTA file:

    # EMBOSS transeq
    time transeq -sequence file.fna -outseq out.faa -frame 6
    41,82s user 0,76s system 85% cpu 49,696 total

    # gotranseq
    time ./gotranseq --sequence file.fna --outseq out.faa --frame 6 -n 2
    7,75s user 0,98s system 159% cpu 5,472 total
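The kind of parallelism that makes this possible can be sketched with a small worker pool. This is only an illustration of the general pattern, not the actual gotranseq implementation: `process` and `runWorkers` are hypothetical names, and `process` merely upper-cases its input as a stand-in for translating one sequence.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// process stands in for the per-sequence work (hypothetical placeholder).
func process(seq string) string { return strings.ToUpper(seq) }

// runWorkers applies process to every input using n goroutines and
// returns the results in input order.
func runWorkers(inputs []string, n int) []string {
	if n < 1 {
		n = 1 // at least one worker, otherwise the sends below would block forever
	}
	results := make([]string, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so no lock is needed.
				results[i] = process(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runWorkers([]string{"atg", "gcc", "taa"}, 2))
	// prints "[ATG GCC TAA]"
}
```

Sequences are independent of each other, so the work splits cleanly across workers; the `-n` option in the benchmark above controls how many are used.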

Works on Linux, macOS and Windows.

Installation

Download the binary from the release page

or

Build from source:

First, make sure that Go is installed on your machine (see install go for details). Then clone the repo and build it:

    git clone https://github.com/feliixx/gotranseq.git
    cd gotranseq
    go build

Usage

Use gotranseq --help to print the help:

    gotranseq version 0.3.0

    Usage: gotranseq --sequence file.fna --outseq out.faa

    required:
      -s, --sequence=<filename>   Nucleotide sequence(s) filename
      -o, --outseq=<filename>     Protein sequence filename

    optional:
      -f, --frame=<code>          Frame to translate. Possible values:
                                    [1, 2, 3, F, -1, -2, -3, R, 6]
                                  F: forward three frames
                                  R: reverse three frames
                                  6: all 6 frames
                                  (default: 1)
      -t, --table=<code>          NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for details. Available codes:
                                   0: Standard code
                                   2: The Vertebrate Mitochondrial Code
                                   3: The Yeast Mitochondrial Code
                                   4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
                                   5: The Invertebrate Mitochondrial Code
                                   6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
                                   9: The Echinoderm and Flatworm Mitochondrial Code
                                  10: The Euplotid Nuclear Code
                                  11: The Bacterial, Archaeal and Plant Plastid Code
                                  12: The Alternative Yeast Nuclear Code
                                  13: The Ascidian Mitochondrial Code
                                  14: The Alternative Flatworm Mitochondrial Code
                                  16: Chlorophycean Mitochondrial Code
                                  21: Trematode Mitochondrial Code
                                  22: Scenedesmus obliquus Mitochondrial Code
                                  23: Thraustochytrium Mitochondrial Code
                                  24: Pterobranchia Mitochondrial Code
                                  25: Candidate Division SR1 and Gracilibacteria Code
                                  26: Pachysolen tannophilus Nuclear Code
                                  29: Mesodinium Nuclear
                                  30: Peritrich Nuclear
                                  (default: 0)
      -c, --clean                 Replace stop codon '*' by 'X'
      -a, --alternative           Define frame '-1' as using the set of codons starting with the last codon of the sequence
      -T, --trim                  Removes all 'X' and '*' characters from the right end of the translation. The trimming process starts at the end and continues until the next character is not a 'X' or a '*'
      -n, --numcpu=<n>            Number of worker to use (default: number of CPU)

    general:
      -h, --help                  Show this help message
      -v, --version               Print the tool version and exit
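To illustrate what the frame, clean and trim options mean, here is a minimal sketch using the standard genetic code. It is not the gotranseq implementation, and all function names are hypothetical. It makes several simplifying assumptions: the input is upper-case DNA, a codon with a non-ACGT base becomes 'X', and a trailing partial codon is dropped. The reverse frame is computed by translating the reverse complement from its first base, which resembles what the help describes for --alternative; the tool's default definition of frame -1 may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Standard genetic code, with codons ordered by base in T, C, A, G order.
const bases = "TCAG"
const standardAA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// translate translates seq starting at offset 0, 1 or 2 (frames 1, 2, 3).
func translate(seq string, offset int) string {
	var sb strings.Builder
	for i := offset; i+3 <= len(seq); i += 3 {
		idx, ok := 0, true
		for _, c := range seq[i : i+3] {
			p := strings.IndexRune(bases, c)
			if p < 0 {
				ok = false
				break
			}
			idx = idx*4 + p
		}
		if ok {
			sb.WriteByte(standardAA[idx])
		} else {
			sb.WriteByte('X') // unknown base in this codon
		}
	}
	return sb.String()
}

// reverseComplement returns the reverse complement; unknown bases become 'N'.
func reverseComplement(seq string) string {
	comp := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		c, ok := comp[seq[i]]
		if !ok {
			c = 'N'
		}
		out[len(seq)-1-i] = c
	}
	return string(out)
}

// clean replaces stop codons '*' by 'X', as described for --clean.
func clean(pep string) string { return strings.ReplaceAll(pep, "*", "X") }

// trim removes trailing 'X' and '*' characters, as described for --trim.
func trim(pep string) string { return strings.TrimRight(pep, "X*") }

func main() {
	seq := "ATGGCCTAA"
	fmt.Println(translate(seq, 0))                    // frame 1: MA*
	fmt.Println(translate(seq, 1))                    // frame 2: WP
	fmt.Println(translate(reverseComplement(seq), 0)) // reverse strand: LGH
	fmt.Println(clean(translate(seq, 0)))             // MAX
	fmt.Println(trim(translate(seq, 0)))              // MA
}
```

Other NCBI tables differ from the standard one only in the amino acid string, so supporting them in a sketch like this amounts to swapping `standardAA`.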
