15

I want to read a file that contains German characters, not only ASCII ones. I found that I can do it like this:

>>> import codecs
>>> file = codecs.open('file.txt', 'r', encoding='UTF-8')
>>> lines = file.readlines()

This works when I run my script in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?

16
  • What version of python are you using? Commented Jun 18, 2012 at 16:10
  • 1
    It depends on what encoding the file was saved with. iso8859-1 is probably a good guess if it's not UTF-8. Commented Jun 18, 2012 at 16:10
  • Python 3.1. How can I check which version I am actually using? Commented Jun 18, 2012 at 16:11
  • 1
    @indiag, Try reading the file in binary mode using open('file.txt', 'rb').readlines(), and then use print(repr(line)) for a line that you know contains the German characters, as well as what you expect it to be. This should help us determine what the encoding is. Commented Jun 18, 2012 at 16:19
  • 1
    @indiag, it just occurred to me that readlines() probably doesn't work in binary mode, try print(repr(open('file.txt', 'rb').read())), and then post all or a portion of the output. Commented Jun 18, 2012 at 16:27
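
The binary-mode check suggested in these comments can be sketched roughly as follows. This is only an illustration under assumptions: the sample text and the helper name `raw_bytes_for` are made up here and are not the asker's actual file. It writes the same German text in two encodings and reads it back with `'rb'`, so the raw bytes can be compared.

```python
import os
import tempfile

sample = "Grüße aus Köln\n"  # assumed sample text, not the asker's data


def raw_bytes_for(text, encoding):
    """Write `text` to a temp file in `encoding`, then read it back in binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode(encoding))
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


utf8_bytes = raw_bytes_for(sample, "utf-8")
latin1_bytes = raw_bytes_for(sample, "iso8859-1")

print(repr(utf8_bytes))    # ü shows up as the two bytes \xc3\xbc
print(repr(latin1_bytes))  # ü shows up as the single byte \xfc
```

If the output for the real file shows two-byte sequences such as `\xc3\xbc` for each umlaut, UTF-8 is a likely candidate; single bytes such as `\xfc` point towards iso8859-1 or a similar single-byte encoding.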

2 Answers

24

You need to know which character encoding the text is encoded in. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:

$ pip install chardet 

Then, for example reading the file in binary mode:

>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}

So then:

>>> import codecs
>>> import unicodedata
>>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
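
If installing chardet is not an option, a rougher alternative is to try a short list of candidate encodings in order. The sketch below is an assumption-laden illustration, not part of chardet: the function name `read_lines`, the candidate list, and the demo file contents are all made up here. Note that iso8859-1 can decode any byte sequence, so it only makes sense as the last fallback.

```python
import os
import tempfile


def read_lines(path, encodings=("utf-8", "iso8859-1")):
    """Try each candidate encoding in turn; return (lines, encoding_used)."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            # keepends=True keeps the newlines, similar to readlines()
            return raw.decode(enc).splitlines(True), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode %r" % path)


# Demo with an assumed file saved as iso8859-1 (not valid UTF-8).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("Straße\nMünchen\n".encode("iso8859-1"))
try:
    lines, used = read_lines(demo_path)
finally:
    os.remove(demo_path)

print(used, lines)
```

This only tells you which candidate decoded without errors; it cannot prove the guess is right, which is why a detector such as chardet or knowing the encoding beforehand is preferable.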

1 Comment

You have to import codecs at the top of your file: import codecs
0

I believe the file is being read correctly but is using the wrong encoding when output. This is based on the fact that you get the proper results in IDLE.

I would suggest trying to use print(line.encode('utf-8')) but I'm afraid I don't know if Python 3 will print a bytes object properly.
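
For what it's worth, in Python 3 `print()` on a bytes object shows its repr (`b'...'`) rather than the text. A possible way around that, sketched here under assumptions, is to write the encoded bytes to the binary layer of standard output. The helper name `write_utf8` and the sample text are hypothetical, and whether the terminal then displays the bytes correctly still depends on the terminal's own encoding.

```python
import io
import sys


def write_utf8(line, stream=None):
    """Write `line` as UTF-8 bytes to a binary stream (default: sys.stdout.buffer)."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(line.encode("utf-8"))
    stream.flush()


# The encoding Python uses when print() writes text to stdout.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# print() on a bytes object would show this repr, not the decoded text.
as_printed = repr("Grüße".encode("utf-8"))
print(as_printed)

# Demo against an in-memory binary stream so the result can be inspected.
buf = io.BytesIO()
write_utf8("Grüße\n", buf)
```

Comparing the reported stdout encoding in IDLE and in the other environment is a quick way to check whether the output side is where the two differ.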
