Linked Questions
34 questions linked to/from Stripping non printable characters from a string in python
0 votes
5 answers
2k views
Python: remove ^A in a string [duplicate]
I get a string from a database that contains strange characters, and the characters break the JSON string. Here is the JSON string: {"id":13,"code":"cflw`2B2[h1s`lNzF@sPC1FtaCiK0VF@","label":"...
865 votes
32 answers
1.2m views
Best way to strip punctuation from a string
It seems like there should be a simpler way than: import string s = "string. With. Punctuation?" # Sample string out = s.translate(string.maketrans("",""), string.punctuation) Is there?
386 votes
16 answers
548k views
How to remove \xa0 from string in Python?
I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...
334 votes
12 answers
314k views
Replace non-ASCII characters with a single space
I need to replace all non-ASCII characters (anything outside \x00-\x7F) with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-...
19 votes
3 answers
11k views
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
I am trying to convert an HTML entity to a Unicode character. The HTML entity is &#976918;. When I try the following: unichr(int(976918)) I get the error: ValueError: unichr() arg not in range(...
15 votes
4 answers
25k views
Python - how to delete hidden signs from string?
Sometimes I have strings with strange characters. They are not visible in the browser, but they are part of the string and are counted by len(). How can I get rid of them? strip() removes normal spaces but not ...
10 votes
4 answers
16k views
How to remove escape sequence like '\xe2' or '\x0c' in python
I am working on a project (content-based search). For it, I am using the 'pdftotext' command-line utility on Ubuntu, which writes all the text from a PDF to a text file. But it also writes bullets, now ...
3 votes
2 answers
14k views
Python Pandas: Error tokenizing data. C error: EOF inside string starting when reading 1GB CSV file
I'm reading a 1 GB CSV file in chunks of 10,000 rows. The file has 1106012 rows and 171 columns. Other, smaller files finish successfully without any error, but when I read this 1 GB ...
3 votes
3 answers
14k views
How to split line at non-printing ascii character in Python
How can I split a line in Python at a non-printing ascii character (such as the long minus sign, hex 0x97, octal 227)? I won't need the character itself. The information after it will be saved as a ...
5 votes
4 answers
9k views
How to find/replace non printable / non-ascii characters using Python 3?
I have a .csv file in which some lines are jamming up a database import because of funky characters in some field in the line. I have searched, found articles on how to replace non-ascii ...
3 votes
2 answers
6k views
How do I remove a \ from a string in python
I'm having trouble getting replace() to work. I've tried my_string.replace('\\', '') and re.sub('\\', '', my_string), but neither one works. I thought \ was the escape code for backslash, am I ...
3 votes
2 answers
12k views
Remove ASCII control characters from text file Python
I have a text file from which I have to read a lot of numbers (double). It has ASCII control characters like DLE, NUL, etc., which are visible in the text file. So when I read them to get only the ...
0 votes
1 answer
4k views
Python: Remove non ascii characters from csv
I have a CSV file in which only 4 out of 4000 records have some non-ASCII chars. For example ['com.manager', '2016012300', '16.1.23', 'en', 'kinzie', '2015-04-11T17:36:23Z', '1428773783781', '2016-03-11T09:53:...
3 votes
2 answers
4k views
How to delete non-printable character in rdd using pyspark
I need to remove non-printable characters from an RDD. Sample data: "@TSX•","None" "@MJU•","None". Expected output: @TSX,None @MJU,None. I tried the code below, but it's not working: sqlContext.read....
0 votes
2 answers
7k views
Python: Extract text from multiple pdf and paste on excel
I'm totally new to Python. Could you help me correct this code? I would like to add two things: run the operation on multiple PDFs instead of just one, and paste the content into A2, A3, A4 and so on if ...