9

I'm having a problem emailing unicode characters using smtplib in Python 3. This fails in 3.1.1, but works in 2.5.4:

    import smtplib
    from email.mime.text import MIMEText

    sender = to = '[email protected]'
    server = 'smtp.DEF.com'
    msg = MIMEText('€10')
    msg['Subject'] = 'Hello'
    msg['From'] = sender
    msg['To'] = to
    s = smtplib.SMTP(server)
    s.sendmail(sender, [to], msg.as_string())
    s.quit()

I also tried an example from the docs, "Send the contents of a directory as a MIME message" (http://docs.python.org/3.1/library/email-examples.html), and it failed too.

Any suggestions?

1
  • To clarify, in 2.5.4, it sends without an error message, but replaces '€' with '?'. Commented Sep 15, 2009 at 21:17

2 Answers

15

The key is in the docs:

class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii') 

A subclass of MIMENonMultipart, the MIMEText class is used to create MIME objects of major type text. _text is the string for the payload. _subtype is the minor type and defaults to plain. _charset is the character set of the text and is passed as a parameter to the MIMENonMultipart constructor; it defaults to us-ascii. No guessing or encoding is performed on the text data.

So what you need is clearly not msg = MIMEText('€10'), but rather:

msg = MIMEText('€10'.encode('utf-8'), _charset='utf-8') 

Although it is not clearly documented, sendmail needs a byte string, not a Unicode one (that is what the SMTP protocol specifies). Compare what msg.as_string() returns for each of the two ways of building the message: given the "no guessing or encoding" rule, your version still has the euro character in it (and sendmail has no way to turn it into a byte string), while mine does not (and utf-8 is specified explicitly throughout).
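As a minimal sketch of the same idea on more recent Python 3 releases, where (as far as I know) MIMEText accepts a str together with an explicit charset: the serialized message should then contain only ASCII, with the body base64-encoded. The build_message helper and the example.com addresses are placeholders invented for illustration, not part of the original code.

```python
from email.mime.text import MIMEText


def build_message(body, sender, recipient):
    """Build a text/plain message whose body is declared as UTF-8.

    Hypothetical helper; the Subject mirrors the one used in the question.
    """
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = "Hello"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


# Placeholder addresses, assumed for the example only.
msg = build_message("€10", "sender@example.com", "recipient@example.com")

# The serialized form is what would be handed to sendmail.
wire_text = msg.as_string()

# This raises UnicodeEncodeError if any non-ASCII character is left in it.
wire_text.encode("ascii")

# Round-trip the body to confirm that the euro sign survived the encoding.
decoded_body = msg.get_payload(decode=True).decode("utf-8")
```

Inspecting wire_text this way before calling sendmail makes it easy to see whether the euro sign is still present in raw form.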

3 Comments

That sends without generating an error message. I sent to Thunderbird and gmail. Thunderbird only showed 10 as the text of the message. Gmail showed the full €10. Python sends as 'content-transfer-encoding: base64' while Thunderbird sends €10 as 'content-transfer-encoding: 8-bit' and gmail sends as 'multipart/alternative; boundary=...' Any suggestions for generating a message that Thunderbird can interpret?
I'm no Thunderbird expert, but try other encodings such as iso-8859-15. Though any program these days that can't do utf-8 properly IS well worth throwing into the dustbin of history, mind!-)
The problem does not seem to be iso-8859-15 or utf-8, it seems to be content-transfer-encoding. Everything else I checked uses 8-bit, while python uses base64. Coercing the header to 8-bit doesn't help. Using quopri.encodestring() might work to get 8-bit encoding, but I haven't been able to figure out how to make it work.
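For anyone wanting to experiment with the content-transfer-encoding discussed in these comments, here is a hedged sketch that asks the email package for quoted-printable instead of base64 by customizing an email.charset.Charset object. It assumes a Python version in which MIMEText accepts a Charset instance as its charset argument (3.5 or later, as far as I know), and it has not been checked against Thunderbird.

```python
from email.charset import QP, Charset
from email.mime.text import MIMEText

# A UTF-8 charset whose body encoding is quoted-printable rather than
# the default base64.
qp_utf8 = Charset("utf-8")
qp_utf8.body_encoding = QP

msg = MIMEText("€10", "plain", qp_utf8)

transfer_encoding = msg["Content-Transfer-Encoding"]
encoded_body = msg.get_payload()  # the quoted-printable text
decoded_body = msg.get_payload(decode=True).decode("utf-8")
```

The message is still pure ASCII on the wire; only the way the UTF-8 bytes are escaped changes.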
2

The _charset parameter of MIMEText defaults to us-ascii according to the docs. Since '€' is not in the us-ascii set, it isn't working.

The example in the docs that you tried clearly states:

For this example, assume that the text file contains only ASCII characters.

You could use the .get_charset method on your message to investigate the charset; incidentally, there is a .set_charset method as well.
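A small sketch of that kind of inspection, assuming a recent Python 3 and passing the charset explicitly in both cases so the result does not depend on the version's default:

```python
from email.mime.text import MIMEText

ascii_msg = MIMEText("hello", "plain", "us-ascii")
utf8_msg = MIMEText("€10", "plain", "utf-8")

# get_charset() returns an email.charset.Charset object; str() gives its name.
ascii_charset = str(ascii_msg.get_charset())
utf8_charset = str(utf8_msg.get_charset())

# The charset parameter of the Content-Type header can be read as well.
utf8_param = utf8_msg.get_content_charset()
```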

2 Comments

As you say, the charset is us-ascii, which does not include €. Using set_charset on the msg does not fix the problem. The problem (I should have been more precise) is on the sendmail line: UnicodeEncodeError: 'ascii' codec can't encode character '\x80' in position 161: ordinal not in range(128). I read this to mean that I have to encode the text so that everything is in range(128), but I haven't been able to figure out how.
I was looking at the 3rd example on the examples page, sending an entire directory. I tried sending a directory consisting of a single zip file using the example. This failed.
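The UnicodeEncodeError quoted in the comments above can be reproduced without a mail server. As far as I can tell, sendmail encodes a str message as ASCII before sending it, so any character outside range(128) triggers the error. The first_non_ascii helper below is a hypothetical name introduced only for illustration.

```python
def first_non_ascii(text):
    """Return (index, character) for the first non-ASCII character, or None."""
    for index, char in enumerate(text):
        if ord(char) > 127:
            return index, char
    return None


# Encoding the raw text as ASCII fails in the same way the traceback describes.
try:
    "€10".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

offender = first_non_ascii("€10")
```

Running first_non_ascii on the output of msg.as_string() is one way to find out which character sendmail is objecting to.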
