
I have a text file with Hindi text lines (about 5,400,000 lines) in it. I want to save these lines in a string array in Python. I tried this code:

    f = open("cleanHindi_Translated.txt", "r")
    array = []
    for line in f:
        array.append(line)
    print(array)

But I am getting an error:

    Traceback (most recent call last):
      File "hindi.py", line 11, in <module>
        for line in f:
      File "C:\Users\Preeti\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 124: character maps to <undefined>

I don't understand what I did wrong here.

  • I haven't tried something like this before, but I guess it should be .append(line). Commented Jun 30, 2019 at 11:51
  • I tried to include encoding="utf8", but I am not able to include the read mode "r" in that case. So I do not think it is a duplicate of that question, as the solutions given there have not worked for me. Commented Jun 30, 2019 at 11:56
  • Have you tried the approach in stackoverflow.com/questions/3277503/…? Commented Jun 30, 2019 at 12:00
  • Yes, I did try that, but I ended up getting a similar error. Commented Jun 30, 2019 at 12:10
  • Edit the question to show your additional attempts. open definitely takes an encoding argument, and your error message shows that the encoding is wrong. Commented Jun 30, 2019 at 12:14
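A small sketch of what the comments describe, assuming Python 3.7 as in the traceback. When no encoding is passed, open() uses the locale's preferred encoding, which on the asker's Windows setup is cp1252. Hindi (Devanagari) text saved as UTF-8 uses three-byte sequences; the virama U+094D, for example, is E0 A5 8D, and cp1252 has no mapping for byte 0x8d, which matches the error message. The mode "r" and encoding="utf-8" can be passed together. The sample word and the file name sample_hindi.txt are assumptions for illustration only.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (cp1252 in the asker's traceback).
print(locale.getpreferredencoding(False))

# Hypothetical sample: "हिन्दी" contains the virama U+094D, encoded in UTF-8 as E0 A5 8D.
sample = "हिन्दी\n"
data = sample.encode("utf-8")
print(b"\x8d" in data)  # True

# cp1252 leaves byte 0x8d undefined, so decoding UTF-8 Hindi bytes with it fails.
cp1252_error = None
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    cp1252_error = exc.reason
print("cp1252 failed:", cp1252_error)

# Write the sample to a temporary UTF-8 file (hypothetical name sample_hindi.txt).
path = os.path.join(tempfile.mkdtemp(), "sample_hindi.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write(sample)

# Mode "r" and encoding="utf-8" can be combined in the same open() call.
with open(path, "r", encoding="utf-8") as f:
    array = [line for line in f]

print(array == [sample])  # True
```

Only the encoding argument differs from the question's code; the loop over f works unchanged once the file is decoded as UTF-8.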

1 Answer


'lines' is the array (list) you are looking for:

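    import io

    # Decode the file as UTF-8 instead of the Windows default (cp1252)
    with io.open('my_file.txt', 'r', encoding='utf-8') as f:
        # readlines() returns a list with one str per line, newlines included
        lines = f.readlines()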
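In Python 3, io.open and the built-in open are the same function, so the import is optional. readlines() keeps the trailing newline on every element; if plain strings are wanted, a sketch like the one below strips them. The helper name read_hindi_lines and the choice of "utf-8-sig" (which decodes ordinary UTF-8 and also drops a byte order mark if the file was saved with one) are assumptions, not part of the answer above.

```python
import io
import os
import tempfile

# In Python 3, io.open and the built-in open are the same function.
print(io.open is open)  # True


def read_hindi_lines(path):
    """Hypothetical helper: return each line of a UTF-8 text file as a str, newline removed.

    "utf-8-sig" decodes ordinary UTF-8 and also drops a leading byte order mark
    (BOM) if the file was saved with one.
    """
    with open(path, "r", encoding="utf-8-sig") as f:
        return [line.rstrip("\n") for line in f]


# Demo on a small temporary file written with a BOM (hypothetical sample data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_hindi.txt")
with open(demo_path, "w", encoding="utf-8-sig") as out:
    out.write("नमस्ते\nदुनिया\n")

lines = read_hindi_lines(demo_path)
print(len(lines))  # 2
```

Called as read_hindi_lines("cleanHindi_Translated.txt"), it would return one string per line. With about 5,400,000 lines the whole list is held in memory, so iterating over f directly may be preferable if the lines do not all need to be kept at once.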


I am still getting an error:

    Traceback (most recent call last):
      File "hindi.py", line 9, in <module>
        lines = f.readlines()
      File "C:\Users\Preeti\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 124: character maps to <undefined>
@PraveenIyer I have updated the code.
This code seems to work, but when I try print(lines), I get output with question marks instead of Hindi text.
Thank you so much, this seems to be working. I'll try to put this all together and get the results I wanted.
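If print(lines) shows question marks after the file was read with encoding="utf-8", one likely explanation (assumed here, not confirmed in the thread) is that the strings are correct and only the console or its font cannot display Devanagari. A way to check independently of the console is to inspect the code points, for example with ascii() or unicodedata.name(), or to write the result to a UTF-8 file and open it in an editor. The sample word and the file name check_output.txt are hypothetical.

```python
import os
import tempfile
import unicodedata

# Hypothetical sample standing in for one element of `lines`.
line = "हिन्दी"

# ascii() prints \u escapes, so the output is readable on any console.
print(ascii(line))  # '\u0939\u093f\u0928\u094d\u0926\u0940'

# Naming the first character confirms it is Devanagari and not a literal '?'.
first_name = unicodedata.name(line[0])
print(first_name)  # DEVANAGARI LETTER HA

# Writing to a UTF-8 file bypasses the console encoding (hypothetical path).
out_path = os.path.join(tempfile.mkdtemp(), "check_output.txt")
with open(out_path, "w", encoding="utf-8") as out:
    out.write(line + "\n")

# Read it back to confirm the text survived the round trip unchanged.
with open(out_path, "r", encoding="utf-8") as f:
    round_trip = f.read().rstrip("\n")
print(round_trip == line)  # True
```

If the escapes show code points in the U+0900 to U+097F range, the data was decoded correctly and only the display is at fault.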