
atsumiando/RNAseq_figure_plotter_python


RNAseq_figure_plotter_python

Generate nine different plots (bar, box, density, dot, heatmap, histogram, line, scatter, or violin) from an RNAseq result table using the seaborn library.

This software runs in a Python 2.7 environment. Before using rnaseq_figure_plotter, update seaborn by typing "conda install -c anaconda seaborn=0.9.0".

rnaseq_figure_plotter is a Python script. Run it with a command such as "python rnaseq_figure_plotter.py -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6".

parameters of rnaseq_figure_plotter

HELP	-h, --help	show this help message and exit 

required parameters

INPUT	-i, --input	input file name
TYPE	-t, --type	choose plot types (bar, box, density, dot, heatmap, histogram, line, scatter, or violin)

general optional parameters

OUTPUT	-o, --output	default output; output file name
GENE	-g, --gene	file name of specific gene ID list; generate "output"_gene_selection.txt file
LOG2	-l, --log	default None; calculate log value (log2; 2, log10; 10, loge; e)
LOG2_NUMBER	-lgn, --log_number	default 0.000000001; add number to avoid -inf for log value
XAXIS	-x, --xaxis	default samples; choose x-axis (gene, sample, or value)
YAXIS	-y, --yaxis	default data; choose y-axis (gene, sample, or value)
ZAXIS	-z, --zaxis	default gene; choose z-axis (gene, sample, or value)
COLOR	-c, --color	default 1; choose color type (1-10)
FIGURE_SAVE_FORMAT	-f, --figure_save_format	default pdf; choose format of figures (eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff)

optional parameters for individual plot types

STYLE	-s, --style	default 1; choose style of figures (1-8)
ZSCORE	-zs, --zscore	default None; apply z-score transformation in heatmap (column, --xaxis; 1, or row, --zaxis; 2)
CLUSTER_COLUMN	-cc, --cluster_column	default None; apply column cluster function for heatmap (on; 1)
CLUSTER_ROW	-cr, --cluster_row	default None; apply row cluster function for heatmap (on; 1)
SCATTER_COLUMN	-sc, --scatter_column	default None; type column of two samples for comparison in dot plot. Split samples by comma(,). (example "sample1,sample2")
SCATTER_ROW	-sr, --scatter_row	default None; type row of two genes for comparison in dot plot. Split genes by comma(,). (example "geneA,geneB")
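The parameter tables above can be summarized as an argparse parser. This is a minimal, hypothetical sketch reconstructed from the tables, not the tool's actual source: the axis options (-x, -y, -z) are left out, and the types and choices are assumptions based on the descriptions.

```python
import argparse

PLOT_TYPES = ["bar", "box", "density", "dot", "heatmap",
              "histogram", "line", "scatter", "violin"]
FIGURE_FORMATS = ["eps", "jpeg", "jpg", "pdf", "pgf", "png", "ps", "raw",
                  "rgba", "svg", "svgz", "tif", "tiff"]


def build_parser():
    """Hypothetical parser mirroring the documented options (not the tool's code)."""
    parser = argparse.ArgumentParser(prog="rnaseq_figure_plotter.py")
    # Required parameters
    parser.add_argument("-i", "--input", required=True,
                        help="tab-delimited RNAseq result table")
    parser.add_argument("-t", "--type", required=True, choices=PLOT_TYPES,
                        help="plot type")
    # General optional parameters (axis options omitted in this sketch)
    parser.add_argument("-o", "--output", default="output",
                        help="output file name")
    parser.add_argument("-g", "--gene", default=None,
                        help="file with one gene ID per line")
    parser.add_argument("-l", "--log", default=None, choices=["2", "10", "e"],
                        help="log base: 2, 10, or e")
    parser.add_argument("-lgn", "--log_number", type=float, default=0.000000001,
                        help="value added before the log transformation")
    parser.add_argument("-c", "--color", type=int, default=1,
                        choices=range(1, 11), help="color setting (1-10)")
    parser.add_argument("-f", "--figure_save_format", default="pdf",
                        choices=FIGURE_FORMATS, help="figure file format")
    # Optional parameters for individual plot types
    parser.add_argument("-s", "--style", type=int, default=1,
                        choices=range(1, 9), help="style setting (1-8)")
    parser.add_argument("-zs", "--zscore", type=int, default=None,
                        choices=[1, 2], help="heatmap z-score: 1 column, 2 row")
    parser.add_argument("-cc", "--cluster_column", type=int, default=None,
                        choices=[1], help="cluster heatmap columns (on; 1)")
    parser.add_argument("-cr", "--cluster_row", type=int, default=None,
                        choices=[1], help="cluster heatmap rows (on; 1)")
    parser.add_argument("-sc", "--scatter_column", default=None,
                        help='two samples, e.g. "sample1,sample2"')
    parser.add_argument("-sr", "--scatter_row", default=None,
                        help='two genes, e.g. "geneA,geneB"')
    return parser
```

For example, build_parser().parse_args(["-i", "input_file", "-t", "bar", "-c", "5", "-s", "6"]) yields color 5 and style 6 while the other options keep their documented defaults (output, pdf, None, 0.000000001).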

input file format (-i, --input)

The input file must be tab-delimited. The first column holds gene IDs and the first row holds sample names. Gene expression values start from the second column and the second row.

An example input file looks like this:

	sample1	sample2	sample3	sample4	sample5
geneA	1	3	5.5	7	2
geneB	100	267	55	79	62
geneC	0.3	0.65	9.5	0.87	2.1
geneD	205	356	78	67	2900
geneE	1001	3001	5500	7001	2001
geneF	2	2	2	2	2
geneG	0.01	0.03	0.5	0.07	0.02
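A table in the input format above can be parsed with the standard csv module. This is a hypothetical sketch, not the tool's reader; it assumes the first row has one leading cell (often empty) above the gene ID column, followed by the sample names.

```python
import csv
from collections import OrderedDict


def read_expression_table(lines):
    """Parse a tab-delimited RNAseq table from an iterable of text lines.

    Returns (samples, table), where table maps gene ID -> list of floats,
    one value per sample, in the order the genes appear in the file.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # skip the cell above the gene ID column
    table = OrderedDict()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError("%s has %d values but there are %d samples"
                             % (gene, len(values), len(samples)))
        table[gene] = [float(v) for v in values]
    return samples, table
```

An open file object also works as input, e.g. with open("input_file") as handle: samples, table = read_expression_table(handle).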

type of plots (-t, --type)

You can choose from nine plot types: bar, box, density, dot, heatmap, histogram, line, scatter, or violin.

All plots are generated by using Seaborn (https://seaborn.pydata.org).

output file name (-o, --output)

Provide the output file name (default: output).

specific gene id list file format (-g, --gene)

List the gene IDs in the first column, one per line (separated by \n).

An example gene ID list file looks like this:

geneA
geneD
geneG

The (-g, --gene) option automatically selects the expression values of the listed gene IDs and writes them to an "output"_gene_selection.txt file, where "output" is the output file name.

An example "output"_gene_selection.txt file looks like this:

geneA	1	3	5.5	7	2
geneD	205	356	78	67	2900
geneG	0.01	0.03	0.5	0.07	0.02
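The gene selection described above can be sketched as a filter on the first cell of each row. This is a hypothetical illustration, not the tool's implementation; it assumes the selected rows keep the order of the input table and that the header row is not copied, as in the example output.

```python
def read_gene_list(lines):
    """Return the gene IDs from a list file, one ID per line; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]


def select_gene_rows(table_lines, gene_ids):
    """Return the data rows whose first tab-separated cell is one of gene_ids.

    The header row (sample names) is skipped, rows keep the input-table order,
    and each row is returned as its original text so values are not reformatted.
    """
    wanted = set(gene_ids)
    selected = []
    for line in list(table_lines)[1:]:  # skip the sample-name header row
        line = line.rstrip("\r\n")
        if line and line.split("\t", 1)[0] in wanted:
            selected.append(line)
    return selected
```

Joining the returned rows with "\n" and saving them under the output name plus "_gene_selection.txt" gives a file in the format shown above.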

log transformation (-l, --log) and (-lgn, --log_number)

Type 2, 10, or e in the (-l, --log) option to apply a log2, log10, or loge (natural log) transformation to the gene expression values. By default, (-l, --log) is off (None).

Because the log of 0 is -inf, which cannot be plotted, the (-lgn, --log_number) option adds a tiny value (default 0.000000001) to the expression values before the log transformation. You can customize this value by typing a number (for example 0, 0.000001, 0.000000000000000001, etc.).
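The log options above amount to computing log_base(value + log_number). This is a minimal sketch with the standard math module, assuming that adding log_number happens only when a base is chosen; it is not the tool's code, which is based on the seaborn stack.

```python
import math

# -l/--log settings mapped to log functions (assumed from the description above).
LOG_FUNCTIONS = {
    "2": lambda x: math.log(x, 2),
    "10": math.log10,
    "e": math.log,
}


def log_transform(values, base=None, log_number=0.000000001):
    """Return log_base(value + log_number) for each value.

    base=None (the default) leaves the values unchanged. Unlike NumPy, which
    returns -inf for log(0), math.log raises ValueError, so log_number=0
    fails on zero values in this sketch.
    """
    if base is None:
        return list(values)
    log = LOG_FUNCTIONS[str(base)]
    return [log(v + log_number) for v in values]
```

With the default log_number, a zero count becomes log2(0.000000001), roughly -29.9, instead of -inf.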

axis (-x, --xaxis), (-y, --yaxis), and (-z, --zaxis)

The default x-axis, y-axis, and z-axis are sample, data, and gene, respectively, where sample, data, and gene refer to the sample name, the gene expression value, and the gene ID.

The following table shows which axes you can modify for each plot type.

plots	x-axis	y-axis	legend
bar	x	y	z*
box	x	y
density	x*
dot	x	y	z*
heatmap	x*	z*
histogram	x*
line	x*	y(data)	z*
scatter
violin	x	y

* sample or gene only
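Choosing sample, gene, or the expression value for each axis is easiest when the table is in "long" form, with one record per gene and sample pair. This is a hypothetical sketch of that reshaping; the key names "gene", "sample", and "value" are illustrative choices, not names taken from the tool.

```python
def to_long_format(samples, table):
    """Flatten {gene: [one value per sample]} into one record per (gene, sample) pair."""
    records = []
    for gene, values in table.items():
        for sample, value in zip(samples, values):
            records.append({"gene": gene, "sample": sample, "value": value})
    return records
```

With pandas and seaborn installed, a call such as seaborn.barplot(x="sample", y="value", hue="gene", data=pandas.DataFrame(records)) would draw samples on the x-axis, values on the y-axis, and genes in the legend; this illustrates the default axis layout and is not the tool's own code.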

color settings (-c, --color)

Color settings use Seaborn color palettes (https://seaborn.pydata.org/tutorial/color_palettes.html). The available settings are:

settings	palette	color description
1	RdBu_r (default)	red to blue
2	Reds	red to white
3	Blues	blue to white
4	RdYlBu_r	red to yellow to blue
5	RdGy_r	red to gray
6	Paired	see the Seaborn website
7	cubehelix	see the Seaborn website
8	muted	see the Seaborn website
9	hls	see the Seaborn website
10	Set2	see the Seaborn website
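The color table above is a simple lookup from setting number to palette name. This sketch assumes nothing beyond the table itself; the function name palette_name is hypothetical.

```python
# -c/--color settings from the table above.
COLOR_PALETTES = {
    1: "RdBu_r", 2: "Reds", 3: "Blues", 4: "RdYlBu_r", 5: "RdGy_r",
    6: "Paired", 7: "cubehelix", 8: "muted", 9: "hls", 10: "Set2",
}


def palette_name(setting=1):
    """Map a color setting (1-10, default 1) to a Seaborn palette name."""
    try:
        return COLOR_PALETTES[int(setting)]
    except (KeyError, ValueError):
        raise ValueError("color setting must be 1-10, got %r" % (setting,))
```

The returned name could then be passed to seaborn.color_palette, for example seaborn.color_palette(palette_name(5)).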

save figure format (-f, --figure_save_format)

Choose the file format for saved figures. The default is pdf; you can also choose eps, jpeg, jpg, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff.

style settings (-s, --style)

Style settings use Seaborn set_style and set_context (https://seaborn.pydata.org/tutorial/aesthetics.html). The available settings are:

set_style controls the background, and set_context controls the size of the figure elements (paper: small; talk: large).

settings	set_style	set_context
1	whitegrid	paper
2	whitegrid	talk
3	white	paper
4	white	talk
5	darkgrid	paper
6	darkgrid	talk
7	dark	paper
8	dark	talk
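Each style setting in the table above pairs one set_style value with one set_context value. This is a hypothetical sketch of that mapping; seaborn is imported only inside apply_style, so the lookup itself works without seaborn installed.

```python
# -s/--style settings from the table above: (set_style, set_context).
STYLE_SETTINGS = {
    1: ("whitegrid", "paper"), 2: ("whitegrid", "talk"),
    3: ("white", "paper"), 4: ("white", "talk"),
    5: ("darkgrid", "paper"), 6: ("darkgrid", "talk"),
    7: ("dark", "paper"), 8: ("dark", "talk"),
}


def style_for(setting=1):
    """Return the (style, context) pair for a style setting (1-8, default 1)."""
    return STYLE_SETTINGS[int(setting)]


def apply_style(setting=1):
    """Apply a style setting with seaborn (requires seaborn to be installed)."""
    import seaborn as sns  # imported lazily; style_for does not need it
    style, context = style_for(setting)
    sns.set_style(style)
    sns.set_context(context)
```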

z-score transformation (-zs, --zscore)

The (-zs, --zscore) option is used for heatmaps. Type 1 to apply the z-score transformation to each column (-x, --xaxis) or 2 to apply it to each row (-z, --zaxis).
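A z-score rescales each column or row to mean 0 and standard deviation 1: z = (x - mean) / sd. This is a plain-Python sketch using the sample standard deviation (n - 1); whether the tool uses the same formula is not confirmed here. Seaborn's clustermap also has a z_score argument where 0 means rows and 1 means columns, so a tool built on it would need to translate the 1/2 setting described above.

```python
import math


def zscore(values):
    """Return (x - mean) / sd for each x, using the sample standard deviation (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to compute a z-score")
    mean = sum(values) / float(n)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        # A constant row or column (such as geneF in the example) has no spread;
        # return NaN rather than raising ZeroDivisionError.
        return [float("nan")] * n
    return [(v - mean) / sd for v in values]


def zscore_table(rows, mode):
    """Z-score a genes x samples table given as a list of rows.

    mode 1 z-scores each column (sample); mode 2 z-scores each row (gene),
    following the 1/2 convention of -zs/--zscore.
    """
    if mode == 2:
        return [zscore(row) for row in rows]
    if mode == 1:
        columns = [zscore(list(col)) for col in zip(*rows)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("mode must be 1 (column) or 2 (row)")
```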

cluster function for heatmap (-cc, --cluster_column) and (-cr, --cluster_row)

Type 1 in (-cc, --cluster_column) and/or (-cr, --cluster_row) to cluster the columns and/or rows of the heatmap.

two-dataset settings for scatter plot (-sc, --scatter_column) and (-sr, --scatter_row)

Use (-sc, --scatter_column) to compare two samples (columns) or (-sr, --scatter_row) to compare two genes (rows). A two-dataset setting is required for the scatter plot.

Both options take the two datasets as "x-axis,y-axis", with the samples or genes separated by a comma (,). For example, use "sample1,sample3" with (-sc, --scatter_column) or "geneA,geneG" with (-sr, --scatter_row). The color cannot be changed in the scatter plot.
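The "x-axis,y-axis" argument above can be turned into two lists of points. This is a hypothetical sketch assuming the table is held as a list of sample names plus a mapping from gene ID to one value per sample; the function names are illustrative, not the tool's.

```python
def parse_pair(text):
    """Split an "x-axis,y-axis" argument such as "sample1,sample3" into two names."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected two names separated by a comma, got %r" % (text,))
    return parts[0], parts[1]


def scatter_points_for_columns(samples, table, pair):
    """Compare two samples: one (x, y) point per gene."""
    x_name, y_name = parse_pair(pair)
    ix, iy = samples.index(x_name), samples.index(y_name)
    xs = [values[ix] for values in table.values()]
    ys = [values[iy] for values in table.values()]
    return xs, ys


def scatter_points_for_rows(table, pair):
    """Compare two genes: one (x, y) point per sample."""
    x_name, y_name = parse_pair(pair)
    return list(table[x_name]), list(table[y_name])
```

For example, "sample1,sample3" on the example input gives one point per gene, with sample1 values on the x-axis and sample3 values on the y-axis.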

About

Requires only one command line! Generate nine common RNAseq figures from an RNAseq result table with Python.
