I tried to write a list to a txt file encoded as UTF-8 without a BOM, but I ran into a problem. If I create the file with UTF-8 (no BOM) encoding:
    ポ
    1
    田
    11
    直
    11
    子
    11

and use my function to read it into a list:
    import codecs

    def file_to_list(file_name):
        results = []
        f = codecs.open(file_name, encoding='utf-8')
        for line in f:
            results.append(line.replace('\r\n', ''))
        return results

    list_1 = file_to_list('test_read.txt')
    print(list_1)

I got the expected result:

    ['ポ', '1', '田', '11', '直', '11', '子', '11']

But after this I used another function to write the list back to the file and then read the file again, and a problem appeared:
    def list_to_file(file_name, thelist):
        f = codecs.open(file_name, "w", encoding='utf-8')
        for item in thelist:
            f.write("%s\n" % item)

    list_to_file('test_read.txt', list_1)
    list_2 = file_to_list('test_read.txt')
    print(list_2)

This time the output of print is:

    ['ポ\n', '1\n', '田\n', '11\n', '直\n', '11\n', '子\n', '11\n']

So where do the '\n' characters come from?
line.replace('\r\n', '') doesn't touch the plain '\n' bytes written by list_to_file: the original file used '\r\n' line endings, but the rewritten one uses bare '\n', so nothing is replaced. BTW, you should always mention the Python version in Unicode questions, since Py2 and Py3 have major differences in Unicode handling. Also, a UTF-8 encoded file should never start with a BOM unless it's required by some broken software that you're forced to use.
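One possible fix, as a minimal sketch rather than the only way to do it: strip trailing line-ending characters with rstrip('\r\n') instead of replacing the exact sequence '\r\n', so both '\r\n' and bare '\n' are handled. This sketch swaps in io.open (the same function as the built-in open on Python 3) for codecs.open, which is my assumption and not something the question requires. The round_trip helper and the temporary file are hypothetical and exist only to make the example self-contained.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def file_to_list(file_name):
    # rstrip('\r\n') removes any trailing '\r' and '\n' characters,
    # so it works whether the file uses '\r\n' or plain '\n' line endings.
    with io.open(file_name, encoding='utf-8') as f:
        return [line.rstrip('\r\n') for line in f]


def list_to_file(file_name, thelist):
    # 'utf-8' writes no BOM; the with-block closes the file for us.
    with io.open(file_name, 'w', encoding='utf-8') as f:
        for item in thelist:
            f.write(u"%s\n" % item)


def round_trip(items):
    # Hypothetical helper: write the list to a temporary file and read it back.
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        list_to_file(path, items)
        return file_to_list(path)
    finally:
        os.remove(path)
```

With these versions, round_trip(list_1) should give back a list equal to list_1, with no trailing '\n' on the items.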