Linked Questions

0 votes
5 answers
2k views

I get a string from database that contains strange characters, and the characters break the json string. Here is the json string: {"id":13,"code":"cflw`2B2[h1s`lNzF@sPC1FtaCiK0VF@","label":"...
user3444101
865 votes
32 answers
1.2m views

It seems like there should be a simpler way than: import string s = "string. With. Punctuation?" # Sample string out = s.translate(string.maketrans("",""), string.punctuation) Is there?
Redwood
  • 69.8k
386 votes
16 answers
548k views

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...
zhuyxn
  • 7,121
334 votes
12 answers
314k views

I need to replace all non-ASCII (\x00-\x7F) characters with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-...
dotancohen
  • 31.8k
19 votes
3 answers
11k views

I am trying to convert an HTML entity to a Unicode character. The HTML entity is &#976918;. When I try unichr(int(976918)), I get the error: ValueError: unichr() arg not in range(...
Aamir Rind
  • 39.8k
15 votes
4 answers
25k views

Sometimes I have strings with strange characters. They are not visible in the browser, but they are part of the string and are counted in len(). How can I get rid of them? strip() deletes normal spaces but not ...
robos85
  • 2,554
10 votes
4 answers
16k views

I am working on a project (content-based search), for which I am using the 'pdftotext' command-line utility in Ubuntu, which writes all the text from a PDF to a text file. But it also writes bullets, now ...
vaibhav1312
3 votes
2 answers
14k views

I'm reading a 1 GB CSV file in chunks of 10,000 rows. The file has 1106012 rows and 171 columns. Other, smaller files do not show any error and finish successfully, but when I read this 1 GB ...
Wcan
  • 888
3 votes
3 answers
14k views

How can I split a line in Python at a non-printing ASCII character (such as the long minus sign, hex 0x97, octal 227)? I won't need the character itself. The information after it will be saved as a ...
d-cubed
  • 1,112
5 votes
4 answers
9k views

Some lines in a .csv file are jamming up a database import because of funky characters in some field in the line. I have searched and found articles on how to replace non-ascii ...
user10664542
  • 1,346
3 votes
2 answers
6k views

I'm having trouble getting replace() to work. I've tried my_string.replace('\\', '') and re.sub('\\', '', my_string), but neither one works. I thought \ was the escape code for backslash, am I ...
Joshua Olson
  • 3,823
3 votes
2 answers
12k views

I have a text file from which I have to read a lot of numbers (doubles). It has ASCII control characters like DLE, NUL, etc., which are visible in the text file. So when I read them to get only the ...
atmaere
  • 345
0 votes
1 answer
4k views

I have a CSV file in which only 4 out of 4000 records have some non-ASCII chars. For example ['com.manager', '2016012300', '16.1.23', 'en', 'kinzie', '2015-04-11T17:36:23Z', '1428773783781', '2016-03-11T09:53:...
Dutta
  • 683
3 votes
2 answers
4k views

I need to remove non-printable characters from an RDD. Sample data is below: "@TSX•","None" "@MJU•","None". Expected output: @TSX,None @MJU,None. I tried the code below, but it's not working: sqlContext.read....
LUZO
  • 1,039
0 votes
2 answers
7k views

I'm totally new to Python. Could you help me correct this code? I would like to add two things: run the operation on multiple PDFs instead of just one, and paste the content into A2, A3, A4 and so on if ...
Gabry
  • 33
