- Most of your comments don't add much. Comments about PEP8 compliance shouldn't be needed, and a comment saying you're instantiating an object right before you do it doubles the amount we have to read for no gain.
os.path.join is much better at joining file locations with the OS's separator than '{}/{}'.format. Please use it instead.
An alternative in Python 3.4+ is pathlib, which lets you extend a path with the / operator. I have not, however, tested that this works with the functions you're using.
Here's an example of using it (untested):

    import os
    import pathlib

    DATA_DIR = pathlib.PurePath('raw_data')
    ...
    os.remove(DATA_DIR / 'All.wav')
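To make the comparison with os.path.join concrete, here is a minimal sketch that builds the same path both ways. It assumes Python 3.6+, where os.fspath can turn a path object back into a plain string for functions that expect one. The variable names are only for illustration.

```python
import os
import pathlib

DATA_DIR = pathlib.PurePath('raw_data')

# The / operator appends a path component using the OS's separator.
all_wav = DATA_DIR / 'All.wav'

# os.fspath gives back a plain string, which is what os.path.join produces.
same = os.fspath(all_wav) == os.path.join('raw_data', 'All.wav')
print(same)
```

If a function turns out not to accept path objects, passing os.fspath(all_wav) or str(all_wav) is a simple fallback.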
- You should move chunk out of the for loop; making it a function argument may be a good idea too. A function that reads your wf indefinitely may also ease reading slightly, and a good name such as cycle_wave tells people what it's doing, as it works roughly the same way as itertools.cycle. This could be implemented as:

    def cycle_wave(wf, chunk=1024):
        while True:
            data = wf.readframes(chunk)
            if data == b'':
                # end of file: start again after a short pause
                wf.rewind()
                time.sleep(1)
                data = wf.readframes(chunk)
            yield data
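To show the looping behaviour without any audio hardware, here is a small sketch that runs the generator over a tiny WAV file built in memory. The pause parameter and the make_test_wave helper are my own additions for the demonstration, not part of your code.

```python
import io
import itertools
import time
import wave


def cycle_wave(wf, chunk=1024, pause=1.0):
    """Yield chunks of frames from wf forever, rewinding at the end."""
    while True:
        data = wf.readframes(chunk)
        if data == b'':
            wf.rewind()
            time.sleep(pause)  # 'pause' is an assumed, configurable delay
            data = wf.readframes(chunk)
        yield data


def make_test_wave(n_frames=4):
    """Hypothetical helper: build a tiny mono 16-bit WAV in memory."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(bytes(range(2 * n_frames)))
    buf.seek(0)
    return buf


with wave.open(make_test_wave(), 'rb') as wf:
    # Four frames read two at a time: the third chunk repeats the first.
    chunks = list(itertools.islice(cycle_wave(wf, chunk=2, pause=0), 5))
```

As with itertools.cycle, the generator never ends on its own, so the caller has to break out of the loop or slice it as above.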
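For your spec_filename you can use re.search to get a single match, rather than collecting every number in the file name. (re.match only matches at the very start of the string, and the paths glob.glob returns here start with the directory name.) You also don't need to call str on the result, as format does that by default.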
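As a rough sketch of the single-match approach, the function below pulls the first number out of a file name. I'm assuming names like raw_data/12.wav and searching only the base name with re.search, since re.match would need the number to be at the very start of the string. The function name spec_name_for is made up for this example.

```python
import os
import re


def spec_name_for(wav_path):
    """Return the spectrogram file name for a numbered wav file."""
    # Search only the file name, so digits in directory names are ignored.
    match = re.search(r'\d+', os.path.basename(wav_path))
    if match is None:
        raise ValueError('no number in {!r}'.format(wav_path))
    # format converts the matched text for us; no str() call is needed.
    return 'spec{}.jpeg'.format(match.group(0))


name = spec_name_for(os.path.join('raw_data', '12.wav'))
```

Raising on a missing number is one possible choice; skipping the file would work too, depending on what you want to happen with unexpected names.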
Rather than removing a file from your directory and then searching the directory, you can remove the entry from the list that glob.glob returns. Since it's a normal list, you can treat it like any other.
One way you can do this is as follows:

    files = glob.glob('D:/*')
    try:
        files.remove('D:/$RECYCLE.BIN')
    except ValueError:
        pass

If you have multiple files you want to exclude, you could use sets instead:

    files = set(glob.glob('D:/*')) - {'D:/$RECYCLE.BIN'}
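Here is a small sketch of the set approach wrapped in a function, run against a temporary directory. The function name list_wavs and its excluded parameter are assumptions of mine. Note that a set has no order, so I sort the result to get a predictable sequence.

```python
import glob
import os
import tempfile


def list_wavs(data_dir, excluded=('ALL.wav',)):
    """Return the sorted wav files in data_dir, minus the excluded names."""
    join = os.path.join
    skip = {join(data_dir, name) for name in excluded}
    return sorted(set(glob.glob(join(data_dir, '*.wav'))) - skip)


with tempfile.TemporaryDirectory() as tmp:
    for name in ('1.wav', '2.wav', 'ALL.wav'):
        open(os.path.join(tmp, name), 'wb').close()
    found = [os.path.basename(path) for path in list_wavs(tmp)]
```

The exclusion only works if the excluded paths are built the same way as the glob pattern, which is why both go through os.path.join here.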
All of this together can get you:

    import glob
    import sys
    import os
    import shutil
    import wave
    import time
    import re
    from threading import Thread

    import scipy.io.wavfile
    import pyaudio

    DATA_DIR = 'raw_data'
    LABELED_DIR = 'labeled_data'
    answer = None


    def cycle_wave(wf, chunk=1024):
        while True:
            data = wf.readframes(chunk)
            if data == b'':
                wf.rewind()
                time.sleep(1)
                data = wf.readframes(chunk)
            yield data


    def classify_files(chunk=1024):
        global answer
        join = os.path.join
        p = pyaudio.PyAudio()
        files = set(glob.glob(join(DATA_DIR, '*.wav'))) - {join(DATA_DIR, 'ALL.wav')}
        for filename in files:
            wf = wave.open(filename, 'rb')
            stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                            channels=wf.getnchannels(),
                            rate=wf.getframerate(),
                            output=True)
            # play the sample on repeat until an answer has been entered
            for data in cycle_wave(wf, chunk):
                if answer is not None:
                    break
                stream.write(data)

            # don't know how to classify, skip sample
            if answer == '.':
                answer = None
                continue

            # sort spectrogram based on input
            number = re.search(r'\d+', os.path.basename(filename))[0]
            spec_filename = 'spec{}.jpeg'.format(number)
            os.makedirs(join(LABELED_DIR, answer), exist_ok=True)
            shutil.copyfile(
                join(DATA_DIR, spec_filename),
                join(LABELED_DIR, answer, spec_filename)
            )

            # reset answer field
            answer = None

            # stop stream
            stream.stop_stream()
            stream.close()

        # close PyAudio
        p.terminate()


    if __name__ == '__main__':
        join = os.path.join
        try:
            # exclude file from glob
            files = set(glob.glob(join(DATA_DIR, '*.wav'))) - {join(DATA_DIR, 'ALL.wav')}
            num_files = len(files)
            Thread(target=classify_files).start()
            for _ in range(num_files):
                answer = input("Enter letter of sound heard: ")
        except KeyboardInterrupt:
            sys.exit()
However, I've left out proper handling of streams. In most languages I've used streams in, it's recommended to always close the stream, and Python is no different. You can normally do this in two ways:
Use with. This hides a lot of the code, so it makes using streams seamless. It also makes the lifetime of the stream obvious, so people won't try to use it after it's been closed.
Here's an example of using it:

    with wave.open('<file location>') as wf:
        print(wf.readframes(1024))
Use a try-finally. You don't need to add an except clause, as you may not want to handle the error here; the finally is only there to ensure that the stream is closed.
Here's an example of using it:

    p = pyaudio.PyAudio()
    try:
        stream = p.open(...)
        try:
            # do some stuff
            pass
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        p.terminate()
I'd personally recommend using one of the above in your code. I'd prefer with over a try-finally, but pyaudio doesn't support that interface, so you'd have to add it on top of their code if you wanted to go that way.
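If you do want the with interface, one possible way to add it is a small wrapper built on contextlib.contextmanager. This is only a sketch: open_stream is a name I made up, and it assumes that p is a pyaudio.PyAudio instance whose open method returns a stream with stop_stream and close methods, as in your code.

```python
import contextlib


@contextlib.contextmanager
def open_stream(p, **kwargs):
    """Open a stream on p and make sure it is stopped and closed."""
    # p is assumed to be a pyaudio.PyAudio instance (or anything with
    # a compatible open method); kwargs are passed straight through.
    stream = p.open(**kwargs)
    try:
        yield stream
    finally:
        stream.stop_stream()
        stream.close()
```

You would then write something like `with open_stream(p, format=..., channels=..., rate=..., output=True) as stream:` and the stream is cleaned up even if the body raises or you continue to the next file.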