yu68/tools

tools

miscellaneous tools for bioinformatics


count_bam.py

given a list of genomic regions (-i) and a list of BAM files (-b), output the count of reads in each BAM file within these genomic regions.

    usage: count_bam.py [-h] [-i INTERVAL] [-b BAMS [BAMS ...]] [-o OUTPUT]
                        [-l LEN] [-n NAME [NAME ...]]

    count of reads in a list of intervals

    optional arguments:
      -h, --help            show this help message and exit
      -i INTERVAL, --interval INTERVAL
                            the bed file containing location information of intervals
      -b BAMS [BAMS ...], --bams BAMS [BAMS ...]
                            the list of bam files containing mapped reads for MNase-seq
      -o OUTPUT, --output OUTPUT
                            name prefix of output files: *_count.txt
      -l LEN, --length LEN  choose the center region of this length for each interval to count
      -n NAME [NAME ...], --name NAME [NAME ...]
                            name of each bam sample (to be written in the header)

    Library dependency: pysam, bedtools, numpy, math
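The -l option restricts counting to the center region of each interval. A minimal sketch of that idea follows; it is not taken from count_bam.py itself. The helper names `center_window` and `count_reads` are hypothetical, and the sketch assumes 0-based half-open coordinates and a coordinate-sorted, indexed BAM file. The counting step uses pysam's `AlignmentFile.count`.

```python
def center_window(start, end, length):
    """Return a window of `length` bp centered on the interval [start, end).

    Hypothetical helper: the window start is clamped at 0 so it never
    becomes negative for intervals near the beginning of a chromosome.
    """
    mid = (start + end) // 2
    new_start = max(0, mid - length // 2)
    return new_start, new_start + length


def count_reads(bam_path, chrom, start, end):
    """Count reads overlapping chrom:start-end in an indexed BAM file."""
    import pysam  # imported lazily so center_window works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return bam.count(contig=chrom, start=start, stop=end)


# A 200 bp interval reduced to its central 50 bp.
print(center_window(100, 300, 50))  # (175, 225)
```

With these helpers, one count per interval per BAM file would be `count_reads(path, chrom, *center_window(start, end, length))`.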

count_hits.py

given a list of genomic regions (-i) and a list of BED files with aligned reads (-f & -s), output the count of reads in each BED file within these genomic regions.

    Usage:
      count_hits.py [-h] [-i interval_file] [-f data_folder] [-p ovlp_pct] [-s suffix] mark1 mark2 ...

    Example:
      counts.py -i Pn_E14_mm9_nucleosome_peak.bed -f ~/ChIPseq_map_data/ -s _d0_extend_sort.bed mouse_H3K4me3 mouse_H3K4me2 mouse_H3K4me1 > count_epi_nucleosome.txt

    Arguments:
      -h, --help       Show this help.
      -i, --interval   file contains intervals to be counted
      -f, --folder     folder for bed data
      -p, --ovlp_pct   minimum overlap_percentage to be included as a count
      -s, --suffix     uniform suffix for bed data files within the folder

    Library dependency: bedtools, getopt
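The -p option sets a minimum overlap for a read to be counted. The sketch below illustrates one way such a threshold can work. It is an assumption, not the script's actual logic: the README does not say whether the overlap is measured relative to the read or to the interval, nor whether the value is a fraction or a percentage. Here the overlap is expressed as a fraction of the read length, and the function names are hypothetical.

```python
def overlap_fraction(read_start, read_end, iv_start, iv_end):
    """Fraction of the read [read_start, read_end) covered by the interval."""
    overlap = min(read_end, iv_end) - max(read_start, iv_start)
    read_len = read_end - read_start
    if overlap <= 0 or read_len <= 0:
        return 0.0
    return overlap / read_len


def count_hits(reads, interval, min_fraction=0.5):
    """Count reads on the interval's chromosome that overlap it enough.

    `reads` is an iterable of (chrom, start, end) tuples and `interval`
    is a single (chrom, start, end) tuple, all 0-based half-open.
    """
    chrom, iv_start, iv_end = interval
    hits = 0
    for r_chrom, r_start, r_end in reads:
        if r_chrom != chrom:
            continue
        frac = overlap_fraction(r_start, r_end, iv_start, iv_end)
        if frac > 0 and frac >= min_fraction:
            hits += 1
    return hits


reads = [("chr1", 100, 200), ("chr1", 180, 280), ("chr2", 100, 200)]
print(count_hits(reads, ("chr1", 150, 250), min_fraction=0.5))  # 2
```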

sortbedTOwig.py

Convert a sorted BED file into a WIG file, which can potentially be at single-base resolution depending on read depth.
The BED file can be sorted using the Linux command: sort -k1,1 -k2,2n <unsorted.bed> > <sorted.bed>

    Version: 1.0
    Library dependency: csv

    Usage:
      sortbed2wig.py <options> -i input.bed -n name_of_output -e -l extended_read_length -s column_num_for_strand
      sortbed2wig.py <options> -i input.bed -n name_of_output -e
      sortbed2wig.py <options> -i input.bed -n name_of_output

    Example:
      sortbed2wig.py <options> -i mm9_H3K9me3.bed -n mm9_H3K9me3 -e -l 150 -s 4

    Options:
      -h,--help          show help information
      -i,--inputfile     input bed file (with strand information for extend option)
      -o,--outputFolder  folder for output wig file (default: /home/GenomeBrowser/lab_tracks/)
      -n,--wigname       name of the output wig file
      -e,--extend        extend read in bed file or not (default: false)
      -l,--readlength    the extended length of each read (default: 150, effective only when extend=True)
      -s,--strandLoc     the column # for strand information in bed (default: 4, effective only when extend=True)
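The -e and -l options extend each read to a fixed length before coverage is written out. A minimal sketch of strand-aware extension and per-base coverage follows. It is an illustration under stated assumptions, not the code in sortbedTOwig.py: reads are extended from their 5' end (the start on "+", the end on "-"), BED coordinates are 0-based half-open, and output uses the WIG variableStep format with 1-based positions. All function names are hypothetical.

```python
def extend_read(start, end, strand, length=150):
    """Extend a read to `length` bp from its 5' end, clamping at position 0."""
    if strand == "-":
        return max(0, end - length), end
    return start, start + length


def coverage(reads):
    """Per-base read depth for (start, end) reads on one chromosome."""
    depth = {}
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] = depth.get(pos, 0) + 1
    return depth


def wig_lines(chrom, depth):
    """Format depth as WIG variableStep lines (positions become 1-based)."""
    lines = ["variableStep chrom=%s" % chrom]
    for pos in sorted(depth):
        lines.append("%d %d" % (pos + 1, depth[pos]))
    return lines


print(extend_read(100, 136, "+"))  # (100, 250)
print(extend_read(100, 136, "-"))  # (0, 136)
```

A dictionary keeps the sketch short. For whole-genome data, a streaming pass over the sorted BED file would use far less memory.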

pairend_fragmentLen.py

Draw the distribution of fragment lengths from a paired-end dataset (BAM file, -i)

    pairend_fragmentLen.py: draw distribution of fragment lengths from paired-end NGS data
    and fit with Gaussian Kernel Density Estimation (KDE)

    Version: 1.0
    Library dependency: matplotlib, numpy, scipy, pysam

    Usage:
      python pairend_fragmentLen.py -i [NGS_pairend_mapped_bam] -x min_x,max_x -n 100000 -o [output_figure]
      python pairend_fragmentLen.py -i [NGS_pairend_mapped_bam] -o [output_figure]

    Example:
      python pairend_fragmentLen.py -i H209_pairend_5mark.sort.bam -x 0,500 -n 100000 -o H209_pairend_5mark_fragmentLen.png

    Options:
      -h,--help      show help information
      -i,--inputbam  input bam file (with corresponding bai file in the same folder)
      -x,--xlim      range for x axis: min_x, the left bound (default 0); max_x, the right bound (default 350)
      -l,--lambda    covariance_factor lambda for KDE (default 0.25)
      -n,--num       number of fragments to be processed for plotting
      -o,--output    the output figure file, can be format of emf, eps, pdf, png, ps, raw, rgba, svg, svgz
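The -l option sets the covariance factor used for the Gaussian KDE. To show what that factor does, here is a dependency-free sketch of a one-dimensional Gaussian KDE in which the kernel bandwidth is the factor times the sample standard deviation. This is a stand-in written for illustration, not the script's code. The script lists scipy as a dependency, where `scipy.stats.gaussian_kde(data, bw_method=0.25)` would play the same role. The function name and toy fragment lengths are hypothetical.

```python
import math
import statistics


def gaussian_kde_1d(samples, factor=0.25):
    """Return a density function estimated from `samples`.

    The kernel bandwidth is factor * sample standard deviation, so a
    smaller factor gives a spikier curve and a larger one a smoother curve.
    """
    bandwidth = factor * statistics.stdev(samples)
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy fragment lengths (made up for illustration only).
fragment_lengths = [150, 160, 170, 180, 190]
density = gaussian_kde_1d(fragment_lengths, factor=0.25)
curve = [(x, density(x)) for x in range(0, 351)]  # x range as in -x 0,350
```

In the real tool the fragment lengths would come from the BAM file, for example from the template length of properly paired reads, and `curve` would be drawn with matplotlib.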

random_seq_generator.py

Generate random sequences from the specified genome (not exceeding chromosome size boundaries). The mean and SD of the random sequence lengths can be adjusted. The probability of choosing each chromosome is based on the chromosome size distribution.

    usage: random_seq_generator.py [-h] [-g GENOME] [-m MEAN] [-s SD] [-n NUM]

    generate random sequences with customized length and number (for random peaks etc.)
    probability to choose each chrom based on the size distribution

    optional arguments:
      -h, --help            show this help message and exit
      -g GENOME, --genome GENOME
                            specify genome name to get chromosome info from UCSCGB, default: mm9
      -m MEAN, --mean MEAN  mean length of each random sequence, default: 200
      -s SD, --sd SD        sd of random sequence lengths, default: 20
      -n NUM, --num NUM     number of sequences to be randomly sampled, default: 10000

    library dependency: cruzdb (https://github.com/brentp/cruzdb), sqlalchemy
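The sampling scheme described above can be sketched with the standard library alone. This is an illustration under assumptions, not the script's implementation: chromosomes are drawn with probability proportional to their size, lengths are drawn from a normal distribution, and each start is drawn uniformly so the interval stays inside the chromosome. The function name and the toy chromosome sizes are hypothetical. The real tool reads chromosome information through cruzdb.

```python
import random


def random_intervals(chrom_sizes, num, mean=200, sd=20, seed=None):
    """Sample `num` random (chrom, start, end) intervals.

    `chrom_sizes` maps chromosome name to length in bp. Intervals are
    0-based half-open and never extend past the chromosome end.
    """
    rng = random.Random(seed)
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    intervals = []
    while len(intervals) < num:
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        length = int(round(rng.gauss(mean, sd)))
        if length < 1 or length > chrom_sizes[chrom]:
            continue  # redraw lengths that cannot fit on this chromosome
        start = rng.randint(0, chrom_sizes[chrom] - length)
        intervals.append((chrom, start, start + length))
    return intervals


# Toy sizes, not real genome values.
toy_sizes = {"chrA": 1000, "chrB": 3000}
for chrom, start, end in random_intervals(toy_sizes, num=5, seed=1):
    print(chrom, start, end, sep="\t")
```

Passing a seed makes a run reproducible, which is useful when the random intervals serve as a background set for comparison.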
