I have a ruby file with only these two lines:

    # encoding: utf-8
    puts "—"

When I run it with ruby test_enc.rb it fails with:

    test_enc.rb:2: invalid multibyte char (UTF-8)
    test_enc.rb:2: unterminated string meets end of file

I don't know how to properly specify the character code of (emdash), but vim tells me it is 151, Hex 97, Octal 227. It fails the same way with other characters like ã as well, so I doubt it is related specifically to that character. I am running on Windows XP and the version of ruby I'm using is:

ruby 1.9.1p430 (2010-08-16 revision 28998) [i386-mingw32] 

I feel like there is something very obvious I am missing here. Any ideas?

EDIT: Learned a valuable lesson about assumptions today - specifically assuming your editor IS using UTF-8 without actually checking it. Oops!

Thanks for the quick and accurate replies all!

EDIT AGAIN: The 'setting up vim properly for utf-8' grew too big and wasn't really relevant to this question, so it is now a separate question.

  • Are you sure it's not coding: utf-8? (rather than encoding). Commented Mar 29, 2011 at 16:52
  • Both do the same thing. You can actually put asdfgibberishcoding: utf-8 and it works just the same. Commented Mar 29, 2011 at 16:54
  • What does 'puts __ENCODING__' say? (That is ENCODING with two underscores on each side.) Commented Mar 29, 2011 at 16:57

2 Answers


Given that Ruby is explicitly calling your attention to UTF-8, I strongly suspect that you haven't actually written out a UTF-8 file to start with. Make sure that Vim (or whatever text editor you're using to create the file) is really set to write out UTF-8.

Note that in UTF-8, any non-ASCII character will be represented by multiple bytes, not a single byte as you've described from the Vim diagnostics. I'd recommend using a binary file editor (or dump, or whatever) to really show what's in the text file though. Something that doesn't already have some preconceived notion of the encoding - something that isn't even trying to think of it as a text file.
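If no hex editor is at hand, Ruby itself can act as the "dump". The following is only a minimal sketch: `utf8_report` is a hypothetical helper name, not part of Ruby, and the two sample byte strings are built by hand to stand in for what the editor may have written. It prints the raw bytes in hex and checks whether they form valid UTF-8.

```ruby
# Hypothetical helper (not a Ruby built-in): describe a raw byte string
# without assuming anything about its text encoding.
def utf8_report(bytes)
  # Re-tag a copy of the bytes as UTF-8 so valid_encoding? can judge them.
  as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  {
    hex: bytes.unpack("C*").map { |b| format("%02X", b) }.join(" "),
    valid_utf8: as_utf8.valid_encoding?
  }
end

# Assumption: the single byte Vim reported for the em dash (decimal 151, hex 97).
single_byte_dash = [0x97].pack("C*")

# The em dash as UTF-8 actually encodes it.
utf8_dash = [0xE2, 0x80, 0x94].pack("C*")

p utf8_report(single_byte_dash)
p utf8_report(utf8_dash)
```

To inspect the real file rather than hand-built samples, the same helper could be fed the result of `File.binread("test_enc.rb")` on a Ruby version that provides `File.binread`.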

Notepad lets you write out a file in UTF-8, so you might want to try that just to see what happens. (I don't have Ruby installed myself, otherwise I'd try it for you.)


6 Comments

I just had the same thought - what is vim actually saving the file as? When I checked I saw its encoding was set to latin1. I was wondering why those numbers didn't match up to what I saw in here.
Setting the encoding to ISO-8859-1 (to match what my editor is actually using) appears to fix it. I still see ù when I print it out, but I'm pretty sure that's just a windows terminal issue.
@Nick: Rather than change the encoding in the file, why not change what your editor uses? Then you won't be limited to just Latin-1, which is a pretty small range of characters. I'm sure Vim must support other encodings...
I think you're right, and that will help me long term as well. I'm remembering now a previous time I had problems with encoding and I'm pretty sure the cause was this same darn thing. For any other vim users that see this, put set encoding=utf-8 in your .vimrc and you'll be set.
Also, thanks very much for your help, and wow you answer questions quickly.
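The ù mentioned in the comments above is consistent with the same byte simply being read under a different code page. As a rough sketch, the snippet below decodes byte 0x97 under three encodings. `decode_as` is a hypothetical helper, and CP850 is only an assumption about what the Windows console was using; the actual console code page is not stated anywhere in this thread.

```ruby
# Hypothetical helper: interpret raw bytes under the named encoding
# and return the result transcoded to UTF-8.
def decode_as(raw, encoding_name)
  raw.dup.force_encoding(encoding_name).encode(Encoding::UTF_8)
end

# The single byte Vim reported (decimal 151, hex 97).
raw = [0x97].pack("C*")

# CP850 is an assumed console code page, chosen for illustration only.
%w[Windows-1252 ISO-8859-1 CP850].each do |name|
  decoded = decode_as(raw, name)
  puts format("%-12s U+%04X", name, decoded.ord)
end
```

Under Windows-1252 the byte maps to U+2014 (em dash), under ISO-8859-1 to the control character U+0097, and under CP850 to U+00F9 (ù). If the console really was on CP850, that would explain the output seen after switching the magic comment.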

Your file is in latin1. Ruby is right.

emdash would be encoded on two bytes not one in UTF-8.

3 Comments

Thanks, your comment is spot on. :)
Three, actually: 0xE2 0x80 0x94.
@Jörg: that's what I get for not checking ;)
