We had a bug in a library that was caused by one of its inputs containing Unicode characters.

It was fixed by adding use utf8; to the script using that library.

However, adding use utf8; to the library itself (so ALL scripts using that library would be fixed) had no effect.

Why? Can this be addressed?

  • It's your job to tell Perl whether it's Unicode or not. Commented Apr 4, 2019 at 20:34
  • The short answer is that use utf-8 doesn't do what you think it does. There are a variety of answers on the site for how to handle utf8 properly in perl. Commented Apr 4, 2019 at 20:35
  • stackoverflow.com/questions/6162484/… Commented Apr 4, 2019 at 20:35

2 Answers

From the documentation:

The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope.

In other words, this pragma applies only to the lexical scope in which it appears (at most the rest of the enclosing file), and it affects only how the source code is parsed. You need to put it in every file whose source code contains UTF-8-encoded characters. If your input comes from somewhere else, then you need to ensure that it is properly decoded: the pragma will have no effect on that.

PS: I understand that you meant use utf8, not use utf-8 (the latter is not a valid pragma).
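
To illustrate the point about input, here is a minimal sketch of decoding data that arrives as raw bytes. The byte string below is a hypothetical stand-in for what a file, STDIN, or a database might hand back; it is not taken from the question. Note that use utf8 appears nowhere, because the source contains no non-ASCII characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the two raw UTF-8 bytes that encode "é",
# written with escapes so the source file itself stays pure ASCII.
my $bytes = "\xC3\xA9";

# Undecoded, Perl treats the input as two separate one-byte characters.
my $raw_length = length($bytes);

# Decoding turns the bytes into a single character, U+00E9.
my $chars = decode('UTF-8', $bytes);
my $decoded_length = length($chars);

print "raw: $raw_length, decoded: $decoded_length\n";   # raw: 2, decoded: 1
```

For filehandles, adding an :encoding(UTF-8) layer when opening the file performs the same decoding as the data is read.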

3 Comments

More specifically, it only applies to source code, so it needs to go in the file with the UTF-8 source code. If your input comes from somewhere else, like a file, STDIN, database, then you will need to ensure it is properly decoded in the manner appropriate to that medium; use utf8 will have no effect on that.
@Grinnz: yes, this is certainly worth mentioning; I updated my answer accordingly. Thanks!
Thank you. Ironically, my main problem was that our minimal test was actually testing the wrong thing (by hard-coding the UTF-8 test string as part of the script). This answer explained what the issue was AND that our original solution was indeed solving the wrong problem (we needed use open and an upgrade to Perl 5.12 instead).

use utf8; tells Perl that the current file is encoded using UTF-8.

You have a script that's encoded using UTF-8, so you had to add use utf8; to the script. (Without it, you might think you have my $x = "é";, but you're telling Perl my $x = "Ã©";.)

Adding it to a module makes no sense if it's the script that's encoded using UTF-8. The directive must be added to each file (script or module) that's encoded using UTF-8. (If you pass the bad $x to a module, and the module produces junk because of that, it's still the script that needs to be fixed.)
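
The difference can be seen in a small sketch, assuming the file is saved as UTF-8. The variable names are made up for illustration, and the two blocks only exist to show that the pragma is lexically scoped.

```perl
use strict;
use warnings;

my ($with, $without);

{
    use utf8;        # source text in this block is parsed as UTF-8
    $with = "é";     # one character, U+00E9
}

{
    no utf8;         # the default: each source byte becomes one character
    $without = "é";  # two characters, the bytes 0xC3 and 0xA9
}

# Print lengths only, so no output encoding layer is needed.
print length($with), " ", length($without), "\n";   # 1 2
```

A module that receives $without has no way of knowing that the two bytes were meant to be a single character, which is why the fix belongs in the file that contains the literal.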
