
I'm querying a table in a SQL Server database and exporting it to a CSV using pandas:

import pandas as pd

df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)

Is there a way to remove non-ascii characters when exporting the CSV?

  • df.to_csv(csvFile, index=False, encoding='ascii') ? Commented Jan 27, 2022 at 23:13

2 Answers


You can read in the file and then use a regular expression to strip out non-ASCII characters:

import re

df.to_csv(csvFile, index=False)

with open(csvFile) as f:
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())

with open(csvFile, 'w') as f:
    f.write(new_text)
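Alternatively, the same regex can be applied to the DataFrame's string columns before calling to_csv, which avoids rewriting the file in a second pass. A minimal sketch (the column name `name` and the sample data are made up for illustration):

```python
import io
import pandas as pd

# sample frame standing in for the SQL query result
df = pd.DataFrame({"name": ["café", "naïve", "plain"]})

# drop non-ASCII characters from the string column before export
df["name"] = df["name"].str.replace(r"[^\x00-\x7F]+", "", regex=True)

buf = io.StringIO()  # stands in for the CSV file path
df.to_csv(buf, index=False)
print(buf.getvalue())
```

This only covers string columns, so for a frame with many text columns you would repeat the replacement per column (or loop over `df.select_dtypes(object)`).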

2 Comments

Thank you for the fast response. If you have time: is it possible to change the encoding of the CSV from ANSI to UTF-8? I tried adding encoding='utf-8' to the second open() call, but the CSV remains ANSI.
Hmm...well, I'm not really sure how to help with that. Perhaps the writing to the file (after the to_csv call) is the culprit?
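One likely explanation for the "still ANSI" symptom: after stripping non-ASCII characters, the file contains only ASCII bytes, which are valid in both ANSI and UTF-8, so editors simply guess ANSI. If UTF-8 detection matters, writing a BOM via the utf-8-sig codec usually forces it. A sketch under that assumption (not from the thread; the file name is made up):

```python
import os
import tempfile

# A pure-ASCII file is byte-identical in cp1252 ("ANSI") and UTF-8, so
# editors often report it as ANSI. Writing with utf-8-sig prepends a
# UTF-8 BOM, which most Windows tools use to detect the encoding.
path = os.path.join(tempfile.gettempdir(), "bom_demo.csv")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("col1,col2\n1,2\n")

with open(path, "rb") as f:
    head = f.read(3)  # the three BOM bytes
```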

This was the case I ran into. Here's what worked for me:

import re

# regex that matches non-ASCII characters
regex = re.compile(r'[^\x00-\x7F]+')

with open(csvFile, 'r') as infile, open('myfile.csv', 'w') as outfile:
    # keep looping until we hit EOF (no more lines to read)
    for line in infile:
        # strip anything the regex matches from the current line,
        # then write the cleaned line to the output file
        outfile.write(regex.sub('', line))
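As an aside, the same stripping can be done without a regex by round-tripping through ASCII with errors='ignore' (a stdlib alternative, not the answer's approach):

```python
# encode drops the bytes it can't represent in ASCII; decode restores a str
line = "café, naïve"
cleaned = line.encode("ascii", errors="ignore").decode("ascii")
print(cleaned)
```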

