
I've got a fully functional Snakemake workflow, but I'd like to add a rule where the input variables are written out as new lines in a newly generated output text file. To briefly summarize, I've included relevant code below:

    OUTPUTDIR = config["outputDIR"]
    SAMPLEID = list(SAMPLE_TABLE.Sample_Name)
    # Above 2 lines are functional in other parts of script.

    rule all:
        input:
            manifest = OUTPUTDIR + "/manifest.txt"

    rule write_manifest:
        input:
            sampleid = SAMPLEID,
            loc_r1 = expand("{base}/trimmed/{sample}_1.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST),
            loc_r2 = expand("{base}/trimmed/{sample}_2.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST)
        output:
            OUTPUTDIR + "/manifest.txt"
        shell:
            """
            echo "{input.sampleid},{input.loc_r1},forward" >> {output}
            echo "{input.sampleid},{input.loc_r2},reverse" >> {output}
            """

My issue is that Snakemake treats these values as input files, and I need it to print the file path or sample ID that it is detecting instead. Help with syntax?

Desired output file needs to look like this:

    depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
    depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
    depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
    depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse

Trying to write this using echo.

Error message:

    Building DAG of jobs...
    MissingInputException in [write_manifest]:
    Missing input files for rule write_manifest:
    sample1
    sample2
    sample3

UPDATE: by adding sampleid to params:

    rule write_manifest:
        input:
            loc_r1 = expand("{base}/trimmed/{sample}_{suf}_1.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
            loc_r2 = expand("{base}/trimmed/{sample}_{suf}_2.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
        output:
            OUTPUTDIR + "/manifest.txt"
        params:
            sampleid = SAMPLEID
        shell:
            """
            echo "{params.sampleid},{input.loc_r1},forward" >> {output}
            echo "{params.sampleid},{input.loc_r2},reverse" >> {output}
            """

My output looked like this (which is incorrect):

    sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,forward
    sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,reverse
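The collapsed rows are consistent with how a list is substituted into a shell: template: a list-valued {params.sampleid} or {input.loc_r1} is joined with spaces into a single string, so each echo runs once for the whole list rather than once per sample. Here is a minimal plain-Python sketch of that joining. The helper name render is hypothetical and only mimics the behavior; it is not part of Snakemake's API.

```python
def render(values):
    """Mimic a list being flattened into a shell command: space-joined."""
    return " ".join(str(v) for v in values)


# Illustrative values taken from the incorrect output shown in the question.
sample_ids = ["sample1", "sample2", "sample3"]
r1_paths = [f"$PWD/tmp/dir/{s}.fastq" for s in sample_ids]

# One echo template yields one line, no matter how many samples there are.
line = f"{render(sample_ids)},{render(r1_paths)},forward"
print(line)
```

Because the template is filled once per job, getting one row per sample requires an explicit loop over the samples somewhere.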

This is still not what I want; I need it to look like the desired output below. Can I write it so Snakemake loops through each sample/input/params? The desired output file needs to look like this:

    depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
    depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
    depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
    depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse

  • "My issue is that Snakemake is reading in files, and I need it to print the file path or sample id that it is detecting instead." Could you clarify this statement? Commented Mar 6, 2019 at 19:07
  • I updated the question to show the desired output, which should clarify. I want to use echo, or another method, to print into a new text file (called manifest.txt) a line that has 3 strings separated by commas (shown in quotes next to the echo statement). Commented Mar 6, 2019 at 19:12
  • What is the problem/roadblock? Is it that snakemake doesn't run this rule when you have new samples? Commented Mar 6, 2019 at 19:18
  • Snakemake gives me a "MissingInputException" error and says that I'm missing input files for the "SAMPLEID", but SAMPLEID is just a list of strings (e.g. "sample1", etc.). So I don't want Snakemake to read in a file; I need it to read in the SAMPLEID as is. I've updated the question again to show the error message. Commented Mar 6, 2019 at 22:09
  • I've actually just figured it out! I need to add SAMPLEID to params instead of input. However, it adds everything at once and then comma-separates it; I still need to figure out how to list each sample and its associated files on its own row. Maybe echo needs to loop through the samples? Commented Mar 6, 2019 at 22:20

1 Answer

You need to use the sample wildcard in params instead of the SAMPLEID variable. That way, the rule uses the sample ID specific to each job when it is executed.

    params:
        sample = '{sample}'
    shell:
        """
        echo "{params.sample},{input.loc_r1},forward" >> {output}
        echo "{params.sample},{input.loc_r2},reverse" >> {output}
        """

1 Comment

Yes, this is closer to a solution, but note two things. (1) {sample} != {sampleid} in my example. (2) The output is still not correct. As shown in my question above, all the sample IDs are listed, then all the inputs, then forward, and then this is repeated. My desired outcome is a single line printed for EACH sample (desired output shown above).
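One possible way to get the one-row-per-sample layout described in this comment is to build the lines in Python rather than with echo, pairing each sample ID with its two read files via zip. This is only a sketch under assumptions: the function names manifest_lines and write_manifest are hypothetical, and it assumes the sample-ID list and both path lists are in the same order and of equal length.

```python
def manifest_lines(sample_ids, r1_paths, r2_paths):
    """Return one forward and one reverse CSV row per sample, interleaved."""
    if not (len(sample_ids) == len(r1_paths) == len(r2_paths)):
        raise ValueError("sample IDs and read paths must have equal lengths")
    lines = []
    # zip pairs the i-th sample ID with the i-th R1 and R2 path.
    for sid, r1, r2 in zip(sample_ids, r1_paths, r2_paths):
        lines.append(f"{sid},{r1},forward")
        lines.append(f"{sid},{r2},reverse")
    return lines


def write_manifest(path, sample_ids, r1_paths, r2_paths):
    """Write the manifest rows to `path`, one row per line."""
    with open(path, "w") as handle:
        for line in manifest_lines(sample_ids, r1_paths, r2_paths):
            handle.write(line + "\n")


# Assumed wiring inside the rule, replacing `shell:` with a `run:` block:
#     run:
#         write_manifest(output[0], params.sampleid, input.loc_r1, input.loc_r2)
```

Opening the file in write mode also avoids the rows accumulating across reruns, which can happen with >> when manifest.txt already exists.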
