tools

miscellaneous tools for bioinformatics

count_bam.py

given a list of genomic regions (-i) and a list of BAM files (-b), output the count of reads in each BAM file within these genomic regions.

usage: count_bam.py [-h] [-i INTERVAL] [-b BAMS [BAMS ...]] [-o OUTPUT] [-l LEN] [-n NAME [NAME ...]] count of reads in a list of invervals optional arguments: -h, --help show this help message and exit -i INTERVAL, --interval INTERVAL the bed file contains location information of intervals -b BAMS [BAMS ...], --bams BAMS [BAMS ...] the list of bam files containing mapped reads for MNase-seq -o OUTPUT, --output OUTPUT name prefix of output files: *_count.txt -l LEN, --length LEN choose the center region of this length for each interval to count -n NAME [NAME ...], --name NAME [NAME ...] name of each bam sample (to be wrote on the header) Library dependency: pysam, bedtools,numpy,math

conunt_hits.py

given a list of genomic regions (-i) and a list of BED files with aligned reads (-f & -s), output the count of reads in each BED file within these genomic regions.

Usage: count_hits.py [-h] [-i interval_file] [-f data_folder] [-p ovlp_pct] [-s suffix] mark1 mark2 ... Example:counts.py -i Pn_E14_mm9_nucleosome_peak.bed -f ~/ChIPseq_map_data/ -s _d0_extend_sort.bed mouse_H3K4me3 mouse_H3K4me2 mouse_H3K4me1 > count_epi_nucleosome.txt Arguments: -h, --help Show this help. -i, --interval file contains intervals to be counted -f, --folder folder for bed data -p, --ovlp_pct minimum overlap_percentage to be included as a count -s, --suffix uniform suffix for bed data files within the folder Library dependency: bedtools, getopt

sortbedTOwig.py

convert sorted BED file into WIG file ,which can be potentially single base resolution depending on read depth.
BED file can be sorted using linux commend: sort -k1,1 -k2,2n <unsorted.bed> > <sorted.bed>

Version:1.0 Library dependency: csv Usage: sortbed2wig.py <options> -i input.bed -n name_of_output -e -l extended_read_length -s column_num_for_strand sortbed2wig.py <options> -i input.bed -n name_of_output -e sortbed2wig.py <options> -i input.bed -n name_of_output Example: sortbed2wig.py <options> -i mm9_H3K9me3.bed -n mm9_H3K9me3 -e -l 150 -s 4 Options: -h,--help show help information -i,--inputfile input bed file (with strand information for extend option) -o,--outputFolder folder for output wid file (default: /home/GenomeBrowser/lab_tracks/ -n,--wigname name of the output wig file -e,--extend extend read in bed file or not (default: false) -l,--readlength the extended length of each read (default: 150, effective only when extend=True) -s,--strandLoc the column # for strand information in bed (default: 4, effective only when extend=True)

pairend_fragmentLen.py

Draw distribution of fragment length from a pairend dataset (BAM file, -i)

pairend_fragmentLen.py: draw distribution of fragment lenghs from pair end NGS data and fit with Gaussian Kernel Density Estimation (KDE) Version:1.0 Library dependency: matplotlib, numpy, scipy, pysam Usage: python pairend_fragmentLen.py -i [NGS_pairend_mapped_bam] -x min_x,max_x -n 100000 -o [output_figure] python pairend_fragmentLen.py -i [NGS_pairend_mapped_bam] -o [output_figure] Example: python pairend_fragmentLen.py -i H209_pairend_5mark.sort.bam -x 0,500 -n 100000 -o H209_pairend_5mark_fragmentLen.png Options: -h,--help show help information -i,--inputbam input bam file(with correspnding bai file in same folder -x,--xlim range for x axis: min_x, the left bound (default 0); max_x, the right bound (default 350) -l,--lambda covariance_factor lambda for KDE (default 0.25) -n,--num number of fragments to be processed for plotting -o,--output the output figure file, can be format of emf, eps, pdf, png, ps, raw, rgba, svg, svgz

random_seq_generator.py

generate random sequences from genome specified (not exceeding the chromosome size boundary). One can adjust the mean and SD for size of random sequences. probability to choose each chrom based on the size distribution.

usage: random_seq_generator.py [-h] [-g GENOME] [-m MEAN] [-s SD] [-n NUM] generate random sequences with customized length and number (for random peaks et...) probability to choose each chrom based on the size distribution optional arguments: -h, --help show this help message and exit -g GENOME, --genome GENOME specify genome name to get chromosome info from UCSCGB, default: mm9 -m MEAN, --mean MEAN mean length of each random sequence,default:200 -s SD, --sd SD sd of random sequence lengths,default:20 -n NUM, --num NUM number of sequences to be randomly sampled,default:10000 library dependency: cruzdb (https://github.com/brentp/cruzdb),sqlalchemy

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
BLAST-related		BLAST-related
BaseSpace-tools		BaseSpace-tools
MNase-seq_Analysis		MNase-seq_Analysis
VennDiagram_BED		VennDiagram_BED
plot_flanking_region		plot_flanking_region
.gitignore		.gitignore
README.md		README.md
count_bam.py		count_bam.py
count_hits.py		count_hits.py
generateMAPfile.py		generateMAPfile.py
intensity_flanking_regions.py		intensity_flanking_regions.py
pairend_fragmentLen.py		pairend_fragmentLen.py
random_seq_generator.py		random_seq_generator.py
sortbedTOwig.py		sortbedTOwig.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tools

miscellaneous tools for bioinformatics

count_bam.py

conunt_hits.py

sortbedTOwig.py

pairend_fragmentLen.py

random_seq_generator.py

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

tools

miscellaneous tools for bioinformatics

count_bam.py

conunt_hits.py

sortbedTOwig.py

pairend_fragmentLen.py

random_seq_generator.py

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages