
Hi there, I'm new to WDL and I need to complete a two-step workflow.

Basically, what I need to do is extract and merge two read datasets (forward and reverse) for a sample and pass the merged, uncompressed file to a different tool for genome inference. I'm able to complete the first step with no major issues; however, when it comes to getting my output into the next task I'm running into problems. Below is a copy of the workflow I wrote:

version 1.0

workflow step2 {
    input {
        String PANGENIE_CONTAINER = "overcraft90/eblerjana_pangenie:2.0.1"
        File FORWARD_FASTQ              # compressed R1
        File REVERSE_FASTQ              # compressed R2
        String NAME = "sample"          # how to loop over samples' names in numerical order (maybe grab names' prefix)!?
        File PANGENOME_VCF              # input VCF with variants to be genotyped
        File REF_GENOME                 # reference for variant calling
        File FASTQ_FILE                 # sample's FASTQ
        String VCF_PREFIX = "genotype"  # string to attach to a sample's genotype
        String EXE_PATH = "/app/pangenie/build/src/PanGenie"  # path to PanGenie executable in Docker
        Int CORES = 24                  # number of cores to allocate for PanGenie execution
        Int DISK = 300                  # disk space for output files
        Int MEM = 100                   # RAM allocated
    }

    call reads_extraction_and_merging {
        input:
            in_container_pangenie=PANGENIE_CONTAINER,
            in_forward_fastq=FORWARD_FASTQ,
            in_reverse_fastq=REVERSE_FASTQ,
            in_label=NAME,              # later can be plural
            in_cores=CORES,
            in_disk=DISK,
            in_mem=MEM
    }

    call genome_inference {
        input:
            in_container_pangenie=PANGENIE_CONTAINER,  # not sure whether Docker needs to be re-run
            in_pangenome_vcf=PANGENOME_VCF,
            in_reference_genome=REF_GENOME,
            in_executable=EXE_PATH,
            fastq_file=FASTQ_FILE,
            prefix_vcf=VCF_PREFIX,
            in_cores=CORES,
            in_disk=DISK,
            in_mem=MEM
    }

    output {
        File sample = reads_extraction_and_merging.fastq_file
        File genotype = genome_inference.vcf_file
    }
}

task reads_extraction_and_merging {
    input {
        String in_container_pangenie
        File in_forward_fastq
        File in_reverse_fastq
        String in_label
        Int in_cores
        Int in_disk
        Int in_mem
    }

    command <<<
        cat ~{in_forward_fastq} ~{in_reverse_fastq} | gzip -dc > ~{in_label}.fastq
        cp ~{in_label}.fastq /home/mat/cromwell-executions/step2/
    >>>

    output {
        File fastq_file = "~{in_label}.fastq"
    }

    runtime {
        docker: in_container_pangenie
        memory: in_mem + " GB"
        cpu: in_cores
        disks: "local-disk " + in_disk + " SSD"
    }
}

task genome_inference {
    input {
        String in_container_pangenie
        File in_reference_genome
        File in_pangenome_vcf
        String in_executable
        String prefix_vcf
        File fastq_file
        Int in_cores
        Int in_disk
        Int in_mem
    }

    command <<<
        echo "vcf: ~{in_pangenome_vcf}" > /app/pangenie/pipelines/run-from-callset/config.yaml
        echo "reference: ~{in_reference_genome}" >> /app/pangenie/pipelines/run-from-callset/config.yaml
        echo $'reads:\n sample: ~{fastq_file}' >> /app/pangenie/pipelines/run-from-callset/config.yaml
        echo "pangenie: ~{in_executable}" >> /app/pangenie/pipelines/run-from-callset/config.yaml
        echo "outdir: /app/pangenie" >> /app/pangenie/pipelines/run-from-callset/config.yaml
        cd /app/pangenie/pipelines/run-from-callset
        snakemake --cores ~{in_cores}
    >>>

    output {
        File vcf_file = "~{prefix_vcf}.vcf"
    }

    runtime {
        docker: in_container_pangenie
        memory: in_mem + " GB"
        cpu: in_cores
        disks: "local-disk " + in_disk + " SSD"
        preemptible: 1  # can be useful for tools which execute sequential steps in a pipeline generating intermediate outputs
    }
}

Let me summarize what's going on. After getting my two read datasets, merging them, and unzipping the output to ~{in_label}.fastq, I need to feed this new file to the next task. My idea was to copy it to a directory and, from there, put its path into the .json I use to run the WDL with Cromwell. First of all, I suspect this might not be the right way to do it; if so, what should I do instead? Secondly, is there anything else in the code that should be changed? For instance, I realised I cannot use bgzip to speed up the unzipping step by taking advantage of multiple threads; this would be useful when scaling to multiple samples. Thanks in advance!
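For context, here is roughly what I imagine the direct wiring could look like: a minimal, untested sketch that reuses the task and variable names from the workflow above and passes the output of the first task straight into the second call instead of going through a copied file and the .json:

    # Hypothetical rewrite of the two calls only; everything else stays as above.
    call reads_extraction_and_merging {
        input:
            in_container_pangenie=PANGENIE_CONTAINER,
            in_forward_fastq=FORWARD_FASTQ,
            in_reverse_fastq=REVERSE_FASTQ,
            in_label=NAME,
            in_cores=CORES,
            in_disk=DISK,
            in_mem=MEM
    }

    call genome_inference {
        input:
            in_container_pangenie=PANGENIE_CONTAINER,
            in_pangenome_vcf=PANGENOME_VCF,
            in_reference_genome=REF_GENOME,
            in_executable=EXE_PATH,
            # output of the first task flows straight into the second call,
            # so FASTQ_FILE would no longer need to appear in the input .json
            fastq_file=reads_extraction_and_merging.fastq_file,
            prefix_vcf=VCF_PREFIX,
            in_cores=CORES,
            in_disk=DISK,
            in_mem=MEM
    }

If this is the intended pattern, I suppose the workflow-level FASTQ_FILE input and the cp line in the first task could be dropped entirely, but I'd like confirmation that this is the right approach.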
