I am trying to unzip fasta.gz files in order to work with them. I have created a script using cmd based on something I have done before, but now I cannot get the newly created function to work. See below:

import glob
import sys
import os
import argparse
import subprocess
import gzip
#import gunzip

def decompressed_files():
    print ('starting decompressed_files')
    #files where the data is stored
    input_folder=('/home/me/me_files/PB_assemblies_for_me')
    #where I want my data to be
    output_folder=input_folder + '/fasta_files'
    if os.path.exists(output_folder):
        print ('folder already exists')
    else:
        os.makedirs(output_folder)
        print ('folder has been created')

    for f in input_folder:
        fasta=glob.glob(input_folder + '/*.fasta.gz')
        #print (fasta[0])
        #sys.exit()
        cmd =['gunzip', '-k', fasta, output_folder]
        my_file=subprocess.Popen(cmd)
        my_file.wait

decompressed_files()
print ('The programme has finished doing its job')

But this gives the following error:

TypeError: execv() arg 2 must contain only strings

If I write 'fasta' as a string instead, the programme looks for a file with that name and the error becomes:

fasta.gz: No such file or directory

If I go to the directory where I have the files and type gunzip name_file_fasta_gz, it does the job beautifully, but I have a few files in the folder and I would like to create a loop. I have used 'cmd' before, as you can see in the code below, and I didn't have any problem with it. Here is code from the past where I was able to mix strings and non-strings:

cmd=['velveth', output, '59', '-fastq.gz', '-shortPaired', fastqs[0], fastqs[1]]
#print cmd
my_file=subprocess.Popen(cmd)#I got this from the documentation.
my_file.wait()

I will be happy to learn other ways to run Linux commands within a Python function. The code is for Python 2.7; I know it is old, but it is the one installed on the server at work.

2 Answers

fasta is a list returned by glob.glob(). Hence cmd = ['gunzip', '-k', fasta, output_folder] generates a nested list:

['gunzip', '-k', ['foo.fasta.gz', 'bar.fasta.gz'], output_folder] 

but execv() expects a flat list:

['gunzip', '-k', 'foo.fasta.gz', 'bar.fasta.gz', output_folder] 

You can use the list concatenation operator + to create a flat list:

cmd = ['gunzip', '-k'] + fasta + [output_folder] 
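To see the difference concretely, here is a minimal sketch. The two file names are hypothetical placeholders for what glob.glob() might return; the check lists the elements that are not plain strings, which is exactly what execv() rejects.

```python
# Hypothetical file names standing in for the result of glob.glob()
fasta = ['foo.fasta.gz', 'bar.fasta.gz']

nested = ['gunzip', '-k', fasta]   # a list inside a list: rejected by execv()
flat = ['gunzip', '-k'] + fasta    # concatenation: every element is a string

# Collect any elements that are not plain strings
bad = [item for item in nested if not isinstance(item, str)]
print(bad)   # [['foo.fasta.gz', 'bar.fasta.gz']]
print(flat)  # ['gunzip', '-k', 'foo.fasta.gz', 'bar.fasta.gz']
```

One caveat: gunzip has no positional output-directory argument, so an output_folder appended to the list is treated as one more file operand (gzip reports that it is a directory and ignores it), not as a destination.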

I haven't tested this, but it might solve your unzip problem using a command. The gunzip -k option keeps both the compressed and the decompressed file, so what is the purpose of the output directory?

import os
import subprocess
import gzip

def decompressed_files():
    print('starting decompressed_files')
    # files where the data is stored
    input_folder = 'input'
    # where I want my data to be
    output_folder = input_folder + '/output'
    if os.path.exists(output_folder):
        print('folder already exists')
    else:
        os.makedirs(output_folder)
        print('folder has been created')

    for f in os.listdir(input_folder):
        if f and f.endswith('.gz'):
            # os.listdir() returns bare file names, so prepend the folder
            cmd = ['gunzip', '-k', os.path.join(input_folder, f), output_folder]
            my_file = subprocess.Popen(cmd)
            my_file.wait()

print(cmd) will show something like this:

['gunzip', '-k', 'input/sample.gz', 'input/output'] 
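If the goal is for the decompressed files to end up in a separate output folder, one option (a sketch, not the approach above) is to run gunzip -c on each file and redirect its standard output into a file in that folder. The function name gunzip_to_folder and its pattern parameter are hypothetical, and the sketch assumes the gunzip binary is on the PATH.

```python
import glob
import os
import subprocess


def gunzip_to_folder(input_folder, output_folder, pattern='*.gz'):
    """Decompress every file matching pattern in input_folder into output_folder.

    Returns the list of decompressed file paths that were written.
    """
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    written = []
    for gz_path in sorted(glob.glob(os.path.join(input_folder, pattern))):
        # 'sample.fasta.gz' -> 'sample.fasta'
        name = os.path.basename(gz_path)
        if name.endswith('.gz'):
            name = name[:-len('.gz')]
        out_path = os.path.join(output_folder, name)
        with open(out_path, 'wb') as f_out:
            # -c sends the decompressed data to stdout (here: f_out) and
            # leaves the original .gz file in place
            subprocess.check_call(['gunzip', '-c', gz_path], stdout=f_out)
        written.append(out_path)
    return written
```

For the folder layout in the question, the call would look like gunzip_to_folder(input_folder, input_folder + '/fasta_files', '*.fasta.gz'). check_call raises CalledProcessError if gunzip fails, so a corrupt archive stops the loop instead of being silently skipped.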

I have a few files in the folder and I would like to create the loop

From the above quote, your actual problem seems to be unzipping multiple *.gz files from a path. In that case, the code below should solve your problem.

import gzip
import os
import shutil
import fnmatch

def gunzip(file_path, output_path):
    with gzip.open(file_path, "rb") as f_in, open(output_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError:
        if not os.path.isdir(path):
            raise

def recurse_and_gunzip(input_path):
    walker = os.walk(input_path)
    output_path = 'files/output'
    make_sure_path_exists(output_path)
    for root, dirs, files in walker:
        for f in files:
            if fnmatch.fnmatch(f, "*.gz"):
                gunzip(root + '/' + f, output_path + '/' + f.replace(".gz", ""))

recurse_and_gunzip('files')

source

EDIT:

Using command-line arguments with subprocess.Popen(base_cmd + args): "Execute a child program in a new process. On Unix, the class uses os.execvp()-like behavior to execute the child program."

fasta.gz: No such file or directory

So every extra element in the cmd list is passed to gunzip as an argument, and gunzip treats each one as a file name. Given the literal string fasta, it looks for fasta.gz, hence the error that fasta.gz was not found.

ref and some useful examples
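One way to see exactly which arguments a child process receives is to launch a tiny Python child that prints its own sys.argv. In this sketch sys.executable stands in for gunzip and the file names are hypothetical; it shows that the string 'fasta' is passed through literally, while the list from glob has to be concatenated into the command.

```python
import subprocess
import sys

fasta = ['foo.fasta.gz', 'bar.fasta.gz']  # hypothetical glob.glob() result

# A child process that prints the arguments it was given (after '-c')
probe = [sys.executable, '-c', 'import sys; print(sys.argv[1:])']

out_literal = subprocess.check_output(probe + ['-k', 'fasta'])
out_variable = subprocess.check_output(probe + ['-k'] + fasta)
print(out_literal.decode().strip())   # ['-k', 'fasta']
print(out_variable.decode().strip())  # ['-k', 'foo.fasta.gz', 'bar.fasta.gz']
```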

Now, if you want to pass the gz files as command-line arguments, you can still do that with the code below (you might need to polish it a little for your needs):

import argparse
import subprocess
import os

def write_to_desired_location(stdout_data, output_path):
    print("Going to write to path", output_path)
    with open(output_path, "wb") as f_out:
        f_out.write(stdout_data)

def decompress_files(gz_files):
    base_path = 'files'  # my base path
    output_path = base_path + '/output'  # output path
    if os.path.exists(output_path):
        print('folder already exists')
    else:
        os.makedirs(output_path)
        print('folder has been created')
    for f in gz_files:
        if f and f.endswith('.gz'):
            print('starting decompressed_files', f)
            # -d: decompress, -c: write to stdout
            proc = subprocess.Popen(['gunzip', '-dc', f], stdout=subprocess.PIPE)
            write_to_desired_location(proc.stdout.read(), output_path + '/' + f.replace(".gz", ""))
            proc.wait()  # reap the finished child process

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-gzfilelist",
        required=True,
        nargs="+",  # 1 or more arguments
        type=str,
        help='Provide gz files as arguments separated by space Ex: -gzfilelist test1.txt.tar.gz test2.txt.tar.gz'
    )
    args = parser.parse_args()
    my_list = [str(item) for item in args.gzfilelist]  # args.gzfilelist is already a list; this copies it
    decompress_files(gz_files=my_list)

execution:

python unzip_file.py -gzfilelist test.txt.tar.gz 

output

folder already exists
('starting decompressed_files', 'test.txt.tar.gz')
('Going to write to path', 'files/output/test.txt.tar')

You can pass multiple gz files as well, for example:

python unzip_file.py -gzfilelist test1.txt.tar.gz test2.txt.tar.gz test3.txt.tar.gz 

3 Comments

The purpose of the output folder is to store the unzipped files in it. Your script is quite nice, but it does not answer the question of how to run a command line within a Python function.
Yes, that's what I would like to do. I have used it before, as you can see from the code I posted where I used it for an assembly with velvet. But this particular one is not working and keeps giving me the error messages posted above.
@Ana I have made an edit in above solution can you see if it helps ?
