F'x

I had started working on a homegrown solution to this issue, downloading the Unicode data directly from the source. I’ll post it here, as it may be extended to other functions where Java might not come and save the day!

    (* one list of ";"-separated fields per line of UnicodeData.txt *)
    unicodeData = StringSplit[#, ";"] & /@
       StringSplit[Import["ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt"], "\n"];

    (* {code point, uppercase code point} pairs; part 13 is the simple uppercase mapping *)
    upperCaseData = (FromDigits[#, 16] & /@ # &) /@
       Select[unicodeData, (Length[#] > 12) && (StringLength[#[[13]]] > 0) &][[All, {1, 13}]];

    unichar[s_] := FromCharacterCode[FromDigits[s, 16], "Unicode"];

    (* characters without an uppercase mapping are returned unchanged *)
    upperCaseChar[i_] := Module[{r},
       r = Select[upperCaseData, #[[1]] == i &];
       If[Length[r] > 0, FromCharacterCode[r[[1, 2]]], FromCharacterCode[i]]
       ]

    upperCase[s_] := StringJoin[upperCaseChar /@ ToCharacterCode[s, "Unicode"]];

Which works:

    In[127]:= upperCase["foéàçÿœÆijķnjđӽծÿ"]
    Out[127]= "FOÉÀÇŸŒÆIJĶNJĐӼԾŸ"
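
Since upperCaseChar scans the whole table with Select for every character, long strings may get slow. A possible variant, sketched here under the assumption that Association and Lookup are available (Mathematica 10 or later), builds a lookup table once. The names sampleData, upperCaseMap and upperCaseFast are hypothetical, and sampleData is only a three-entry stand-in for the real upperCaseData:

```mathematica
(* hypothetical stand-in for upperCaseData: {code point, uppercase code point} pairs *)
sampleData = {{97, 65}, {233, 201}, {255, 376}};

(* build the lookup table once: code point -> uppercase code point *)
upperCaseMap = Association[Rule @@@ sampleData];

(* look up each code point, falling back to the code point itself when unmapped *)
upperCaseFast[s_String] :=
  FromCharacterCode[Lookup[upperCaseMap, #, #] & /@ ToCharacterCode[s, "Unicode"]];

upperCaseFast["aéÿ1"]  (* gives "AÉŸ1" with the sample table *)
```

With the real data, replacing sampleData by upperCaseData should give the same results as upperCase, but I have not benchmarked it.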
