1

I am building a workflow in snakemake and would like to recycle one of the rules to two different input sources. The input sources could be either source1 or source1+source2 and depending on the input the output directory would also vary. Since this was quite complicated to do in the same rule and I didn't want to create the copy of the full rule I would like to create two rules with different input/output, but running same command.

Is it possible to make this work? I get the DAG resolved correctly but the job don't go through on the cluster (ERROR : bamcov_cmd not defined).. An example below (both rules use the same command at the end):

this is command

def bamcov_cmd(): return( (deepTools_path+"bamCoverage " + "-b {input.bam} " + "-o {output} " + "--binSize {params.bw_binsize} " + "-p {threads} " + "--normalizeTo1x {params.genome_size} " + "{params.read_extension} " + "&> {log}") ) 

this is the rule

rule bamCoverage: input: bam = file1+"/{sample}.bam", bai = file1+"/{sample}.bam.bai" output: "bamCoverage/{sample}.filter.bw" params: bw_binsize = bw_binsize, genome_size = int(genome_size), read_extension = "--extendReads" log: "bamCoverage/logs/bamCoverage.{sample}.log" benchmark: "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark" threads: 16 run: bamcov_cmd() 

this is the optional rule2

rule bamCoverage2: input: bam = file2+"/{sample}.filter.bam", bai = file2+"/{sample}.filter.bam.bai" output: "bamCoverage/{sample}.filter.bw" params: bw_binsize = bw_binsize, genome_size = int(genome_size), read_extension = "--extendReads" log: "bamCoverage/logs/bamCoverage.{sample}.log" benchmark: "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark" threads: 16 run: bamcov_cmd() 

1 Answer 1

1

What you asked is possible in python. It depends if you have JUST python code in the file, or python and Snakemake. I will answer that first, and then I have a follow up response because I want you to set it up differently so you don't have to do it this way.

Just Python:

 from fileContainingMyBamCovCmdFunction import bamcov_cmd rule bamCoverage: ... run: bamcov_cmd() 

Visually, see how I do it in this file, to reference access to buildHeader and buildSample. These files are being called by a Snakefile. It should work the same for you. https://github.com/LCR-BCCRC/workflow_exploration/blob/master/Snakemake/modules/py_buildFile/buildFile.py

EDIT 2017-07-23 - Updating code segment below to reflect user comment

Snakemake and Python:

 include: "fileContainingMyBamCovCmdFunction.suffix" rule bamCoverage: ... run: shell(bamcov_cmd()) 

EDIT END

If the function is truly specific to the bamCoverage call, if you prefer you can put it back in the rule. This implies it's not being called elsewhere, which may be true. Be careful when annotating files using '.' notation, I use '_' as I find it's easier to prevent creating cyclical dependencies this way. Also, if you do end up leaving the two rules separately, you will likely end up with ambiguity errors. http://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=ruleorder#handling-ambiguous-rules When possible, it's best practice to have rules generating unique outputs.

As for alternatives, consider setting up the code like this?

from subprocess import call rule all: input: "path/to/file/mySample.bw" #OR #"path/to/file/mySample_filtered.bw" bamCoverage: input: bam = file1+"/{sample}.bam", bai = file1+"/{sample}.bam.bai" output: "bamCoverage/{sample}.bw" params: bw_binsize = bw_binsize, genome_size = int(genome_size), read_extension = "--extendReads" log: "bamCoverage/logs/bamCoverage.{sample}.log" benchmark: "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark" threads: 16 run: callString= deepTools_path + "bamCoverage " \ + "-b " + wilcards.input.bam \ + "-o " + wilcards.output \ + "--binSize " str(params.bw_binsize) \ + "-p " + str({threads}) \ + "--normalizeTo1x " + str(params.genome_size) \ + " " + str(params.read_extension) \ + "&> " + str(log) call(callString, shell=True) rule filterBam: input: "{pathFB}/{sample}.bam" output: "{pathFB}/{sample}_filtered.bam" run: callString="samtools view -bh -F 512 " + wildcards.input \ + ' > ' + wildcards.output call(callString, shell=True) 

Thoughts?

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @TBoyarski .. First method doesn't work probably because my function contains snakemake wildcards to pass on to the rules. Third one is not suitable for me since I want to run the same command for multiple rules (changing the input and output). Second one works! I just had to add the shell() function inside the run directive of my rules for that. --> shell(bamcov_cmd())

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.