56

So matz made the decision to keep upcase and downcase limited to /[A-Z]/i in ruby 1.9.1.

ActiveSupport::Multibyte has long had great i18n case jiggering in ruby 1.8.x via String#mb_chars.

However, when tried under ruby 1.9.1, it doesn't seem to work. Here's a simple test script I wrote, along with the output I'm getting:

$ cat test.rb # encoding: UTF-8 puts("@ #{RUBY_VERSION} " + (__ENCODING__ rescue $KCODE).to_s) sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN" def ps(u, d, k); puts "%-30s: %24s / %-24s" % [k, u, d] end ps sd.upcase, su.downcase, "Plain ruby" require 'rubygems'; require 'active_support' ps sd.upcase, su.downcase, "With active_support" ps sd.mb_chars.upcase.to_s, su.mb_chars.downcase.to_s, "With active_support mb_chars" $ ruby -KU test.rb @ 1.8.7 UTF8 Plain ruby : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn With active_support : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn With active_support mb_chars : IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn $ ruby1.9 test.rb @ 1.9.1 UTF-8 Plain ruby : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn With active_support : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn With active_support mb_chars : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn 

So, how do I get internationalized upcase and downcase with ruby 1.9.1?

update

I should add that I also tested with ActiveSupport from the current master, 2-3-* and 3-0-unstable rails branches at GitHub. Same results.

4 Answers 4

58

for anybody coming from Google by ruby upcase utf8:

> "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s => "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN" 

solution is to use mb_chars.

Documentation:

Sign up to request clarification or add additional context in comments.

2 Comments

No, that is not the solution since case conversion is locale dependent. For example, 'ß' would upcase to 'SS' in a German locale but not a US locale.
Just the link to the ActiveSupport::Multibyte::Chars class: api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html
39

Case conversion is locale dependent and doesn't always round-trip, which is why Ruby 1.9 doesn't cover it (see here and here)

The unicode-util gem should address your needs.

1 Comment

Cool, they even use the infamous capital turkish I with a dot in the README, which exemplifies the locale dependency you mention.
14

Case conversion is complicated and locale-dependent. Fortunately, Martin Dürst added full Unicode case mapping in Ruby 2.4:

puts RUBY_DESCRIPTION sd, su = "Iñtërnâtiônàlizætiøn", "IÑTËRNÂTIÔNÀLIZÆTIØN" def ps(u, d, k); puts "%-30s: %24s / %-24s" % [k, u, d] end ps sd.upcase, su.downcase, "Ruby 2.4 (default)" ps sd.upcase(:ascii), su.downcase(:ascii), "Ruby 2.4 (ascii)" ps sd.upcase(:turkic), su.downcase(:turkic), "Ruby 2.4 (turkic)" ps sd.upcase(:lithuanian), su.downcase(:lithuanian), "Ruby 2.4 (lithuanian)" ps "-", su.downcase(:fold), "Ruby 2.4 (fold)" 

Output:

ruby 2.4.0dev (2016-06-24 trunk 55499) [x86_64-linux] Ruby 2.4 (default) : IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn Ruby 2.4 (ascii) : IñTëRNâTIôNàLIZæTIøN / iÑtËrnÂtiÔnÀlizÆtiØn Ruby 2.4 (turkic) : IÑTËRNÂTİÔNÀLİZÆTİØN / ıñtërnâtıônàlızætıøn Ruby 2.4 (lithuanian) : IÑTËRNÂTIÔNÀLIZÆTIØN / iñtërnâtiônàlizætiøn Ruby 2.4 (fold) : - / iñtërnâtiônàlizætiøn 

Comments

0

If you just wanna display the strings in uppercase in HTML, CSS text-transform could be a better solution:

text-transform: uppercase 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.