I'm trying to run some code to simply go through a bunch of files and write those that happen to be .txt files into the same file, removing all the spaces. Here's some simple code that should do the trick:
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if '.txt' in file:
                f = open(subdir+'/'+file, 'r')
                line = f.readline()
                while line:
                    line2 = line.split()
                    if line2:
                        output_file.write(" ".join(line2)+'\n')
                    line = f.readline()
                f.close()

But instead, I get the following error:
      File "/usr/lib/python3.1/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: unexpected code byte
It turns out these .txt files are all in UTF-16 (according to Firefox, at any rate). I thought Python 3.x was supposed to be able to handle any sort of character encoding?
Best, Georgina
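Open both files in binary mode, decode the bytes as UTF-16, and re-encode the result in whatever encoding you want:

    output_file.write(input_file.read().decode('utf-16').replace(' ', '').encode('desired encoding'))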
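In Python 3 it is usually simpler to tell open() which encoding to use, so that reading yields str objects directly. The byte 0xfe at position 0 in the traceback is consistent with a UTF-16 byte order mark, which the 'utf-16' codec consumes when decoding. Below is a minimal sketch of the original loop rewritten that way. The function name merge_txt_files and its parameters are made up for illustration, UTF-8 output is an assumption, and it keeps the question's behavior of collapsing runs of whitespace to single spaces rather than removing every space.

```python
import os


def merge_txt_files(rootdir, output_path,
                    input_encoding="utf-16", output_encoding="utf-8"):
    """Write the non-blank lines of every .txt file under rootdir to output_path.

    Assumes every .txt file really is encoded as input_encoding, and that
    output_path is not itself a .txt file inside rootdir (it would otherwise
    be picked up by the walk).
    """
    with open(output_path, "w", encoding=output_encoding) as output_file:
        for subdir, dirs, files in os.walk(rootdir):
            for name in sorted(files):
                # endswith() avoids matching names such as "notes.txt.bak"
                if not name.endswith(".txt"):
                    continue
                path = os.path.join(subdir, name)
                # Passing the encoding makes Python decode UTF-16 (and drop
                # the byte order mark) instead of using the default codec.
                with open(path, "r", encoding=input_encoding) as f:
                    for line in f:
                        words = line.split()
                        if words:  # skip lines that are empty or all whitespace
                            output_file.write(" ".join(words) + "\n")
```

A call would look like merge_txt_files(rootdir, "merged.out"). If some of the files turn out not to be UTF-16, the open() call for that file will raise UnicodeDecodeError or UnicodeError, so it may be worth wrapping it in a try/except and logging the path.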