In a nutshell: my script cannot write complete unicode strings (retrieved from a database) to a CSV file. Only the first character of each string is written, e.g.:

    U,1423.0,831,1,139

whereas the output should be:

    University of Washington Students,1423.0,831,1,139

Some background: I'm connecting to an MSSQL database using pyodbc. I have my ODBC config file set up for unicode, and I connect to the database as follows:

    p.connect("DSN=myserver;UID=username;PWD=password;DATABASE=mydb;CHARSET=utf-8")

I can retrieve data with no problem; the issue arises when I try to save query results to the CSV file. I've tried csv.writer, the UnicodeWriter recipe from the official docs, and most recently the unicodecsv module I found on GitHub. Each method yields the same result.
The weird thing is that I can print the strings in the Python console with no problem, yet when I take that same string and write it to CSV, the problem appears. See my test code and results below:
Code to highlight the issue:

    from StringIO import StringIO
    import unicodecsv

    # report.data holds the rows returned by the query
    print "'Raw' string from database:"
    print "\tencoding:\t" + whatisthis(report.data[1][0])
    print "\tprint string:\t" + report.data[1][0]
    print "\tstring len:\t" + str(len(report.data[1][0]))

    # Write all rows to an in-memory CSV, then read them back
    f = StringIO()
    w = unicodecsv.writer(f, encoding='utf-8')
    w.writerows(report.data)
    f.seek(0)
    r = unicodecsv.reader(f)
    row = r.next()
    row = r.next()  # second row, matching report.data[1]

    print "Write/Read from csv file:"
    print "\tencoding:\t" + whatisthis(row[0])
    print "\tprint string:\t" + row[0]
    print "\tstring len:\t" + str(len(row[0]))

Output from test:

    'Raw' string from database:
        encoding:       unicode string
        print string:   University of Washington Students
        string len:     66
    Write/Read from csv file:
        encoding:       unicode string
        print string:   U
        string len:     1

What could be the reason for this issue, and how might I resolve it? Thanks!
EDIT: the whatisthis function just checks the string type; it is taken from this post:

    def whatisthis(s):
        # Return the label so the caller can concatenate it into a message
        if isinstance(s, str):
            return "ordinary string"
        elif isinstance(s, unicode):
            return "unicode string"
        else:
            return "not a string"
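
One more observation that might help narrow this down: the printed string has 33 visible characters, but len() reports 66, so the value may contain characters that don't show up when printed. Below is a minimal sketch of how that could be checked with repr(). It is only a guess, not something the output above confirms: the `suspect` value is a hypothetical stand-in for `report.data[1][0]`, built on the assumption that each character is followed by a NUL, and the "stop at the first NUL" step merely simulates what a writer might do.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # lets the sketch run on Python 2 and 3

# The text as it appears when printed (33 visible characters).
visible = u"University of Washington Students"

# Hypothetical stand-in for report.data[1][0]. ASSUMPTION: every character is
# followed by a NUL, which doubles the length while the printed text can
# still look normal in a console.
suspect = u"".join(ch + u"\x00" for ch in visible)

print(len(visible))        # 33
print(len(suspect))        # 66
print(repr(suspect[:6]))   # repr() exposes the \x00 escapes that print hides

# Simulate a consumer that stops at the first NUL. This is an assumption about
# what happens during the write, not verified against csv or unicodecsv.
first_field = suspect.split(u"\x00", 1)[0]
print(first_field)         # U

# Possible workaround under the same assumption: strip NULs before writing.
cleaned = suspect.replace(u"\x00", u"")
print(cleaned == visible)  # True
```

If `repr(report.data[1][0])` on the real data shows something other than interleaved `\x00` escapes, this guess is wrong and the extra 33 characters come from somewhere else.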