I am writing some scripts in Python. I build a string and save it to a file. This string contains a lot of data taken from the directory tree and filenames of a directory. According to convmv, my whole directory tree is in UTF-8.
I want to keep everything in UTF-8 because I will save it to MySQL afterwards. For now, in MySQL, which is in UTF-8, I have problems with some characters (like é or è; I am French).
I want Python to always treat strings as UTF-8. I read some information on the internet and did the following.
My script begins with this:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    def createIndex():
        import codecs
        toUtf8=codecs.getencoder('UTF8')
        #lot of operations & building indexSTR, the string that matters
        findex=open('config/index/music_vibration_'+date+'.index','a')
        findex.write(codecs.BOM_UTF8)
        findex.write(toUtf8(indexSTR)) #this bugs!

When I run it, I get this error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2171: ordinal not in range(128)
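The same error message can be reproduced in isolation. The following is only a minimal sketch, written in Python 3 syntax, and it assumes that indexSTR already holds UTF-8 encoded bytes (which the 0xc3 byte in the message suggests, but which is not confirmed here). The names `raw` and `try_ascii_decode` are hypothetical and exist only for this illustration.

```python
import codecs

# Hypothetical stand-in for indexSTR: the UTF-8 bytes of "é",
# as a filename read from disk might contain them.
raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9'


def try_ascii_decode(data):
    """Return the error message raised when decoding bytes as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Decoding non-ASCII bytes with the 'ascii' codec fails on the 0xc3 byte.
message = try_ascii_decode(raw)

# Note that the function returned by codecs.getencoder yields a
# (bytes, length) tuple rather than the encoded bytes alone.
encoded, consumed = codecs.getencoder("utf-8")("\u00e9")
```

If that assumption holds, the string may already be UTF-8 and encoding it a second time would be unnecessary, which would also be consistent with the accents looking correct in the file.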
Edit: In my file, the accents are written correctly. After creating this file, I read it and write its contents into MySQL. I don't understand why, but I have an encoding problem there. My MySQL database is in utf8, or seems to be: the SQL query SHOW VARIABLES LIKE 'char%' returns only utf8 or binary.
My function looks like this:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    def saveIndex(index,date):
        import MySQLdb as mdb
        import codecs

        sql = mdb.connect('localhost','admin','*******','music_vibration')
        sql.charset="utf8"
        findex=open('config/index/'+index,'r')
        lines=findex.readlines()
        for line in lines:
            if line.find('#artiste') != -1:
                artiste=line.split('[:::]')
                artiste=artiste[1].replace('\n','')
                c=sql.cursor()
                c.execute('SELECT COUNT(id) AS nbr FROM artistes WHERE nom="'+artiste+'"')
                nbr=c.fetchone()
                if nbr[0]==0:
                    c=sql.cursor()
                    iArt+=1
                    c.execute('INSERT INTO artistes(nom,status,path) VALUES("'+artiste+'",99,"'+artiste+'/")'.encode('utf8'))

Artists that are displayed correctly in the file are written incorrectly to the database. What is the problem?
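One direction that might be worth trying is to decode the file explicitly, pass the charset to `connect()` rather than setting it afterwards, and let the driver bind the values. The following is a sketch under assumptions, not a confirmed fix: `parse_artist` and `save_artists` are hypothetical names, the `#artiste[:::]name` line layout is inferred from the function above, and the database part is untested here because it needs a running MySQL server.

```python
import io


def parse_artist(line):
    """Return the artist name from a '#artiste[:::]name' line, or None."""
    if '#artiste' not in line:
        return None
    parts = line.split('[:::]')
    if len(parts) < 2:
        return None
    return parts[1].rstrip('\n')


def save_artists(index_path, password):
    # Imported lazily so parse_artist can be used without the driver installed.
    import MySQLdb as mdb

    # charset and use_unicode are given to connect() instead of being
    # assigned to the connection object afterwards.
    conn = mdb.connect(host='localhost', user='admin', passwd=password,
                       db='music_vibration', charset='utf8', use_unicode=True)
    cur = conn.cursor()
    # io.open decodes the file, so each line is text rather than raw bytes;
    # 'utf-8-sig' also drops a BOM at the very start of the file.
    with io.open(index_path, 'r', encoding='utf-8-sig') as findex:
        for line in findex:
            artiste = parse_artist(line)
            if artiste is None:
                continue
            # %s placeholders let the driver quote and encode the values.
            cur.execute('SELECT COUNT(id) FROM artistes WHERE nom = %s',
                        (artiste,))
            if cur.fetchone()[0] == 0:
                cur.execute(
                    'INSERT INTO artistes (nom, status, path) '
                    'VALUES (%s, %s, %s)',
                    (artiste, 99, artiste + '/'))
    conn.commit()
    conn.close()
```

Binding the values through placeholders also avoids building the SQL text by string concatenation, so quotes inside an artist name would no longer break the query.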