
Problem

I want a Shortcut to convert text with diacritical characters like café to ASCII text like cafe, using sed. Here’s a simplified example:

sed -e 'y/é/e/' 
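For context, sed's `y/source/dest/` command transliterates character by character, much like `tr`: the n-th character of `source` is replaced by the n-th character of `dest`. Both strings must therefore contain the same number of characters, and that length check is what produces the error message in this question. The ASCII-only sketch below shows both cases. The function name `upcase_vowels` is only illustrative.

```shell
#!/bin/sh
# y/source/dest/ maps each character of "source" to the character at the
# same position in "dest"; the two lists must have equal length.
upcase_vowels() {
  sed -e 'y/aeiou/AEIOU/'
}

printf 'transliterate\n' | upcase_vowels    # prints trAnslItErAtE

# Unequal lengths (2 characters vs 1) are rejected before any input is read.
printf 'ab\n' | sed -e 'y/ab/x/' || echo "rejected: y strings differ in length"
```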

This runs fine in Terminal. But when I paste it into the Run Shell Script field in Shortcuts and run the shortcut, it fails with the error “sed: 1: "y/é/e/ ": transform strings are not the same length.”

[Screenshot: error dialog]

I think the script text itself is fine, because it still works in Terminal when I copy and paste it back from Shortcuts.

I am guessing that Shortcuts decomposes the “é” into two UTF-8 characters “e´” when it sends it to the shell, so the single character counts as two.
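One way to test this guess is to dump the raw bytes with the standard `od` utility. A precomposed “é” (U+00E9) shows up as `c3 a9`, while a decomposed “e” followed by U+0301 COMBINING ACUTE ACCENT shows up as `65 cc 81`. Note that even the precomposed form is two bytes in UTF-8, so any tool that counts bytes rather than characters sees two either way. The helper name `show_bytes` is hypothetical, and the octal escapes only make the test strings' encoding unambiguous.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of $1 in hexadecimal.
show_bytes() {
  printf '%s' "$1" | od -An -tx1
}

# Precomposed é (U+00E9), written as octal escapes: expect c3 a9
show_bytes "$(printf '\303\251')"

# Decomposed é: "e" followed by U+0301, written as octal escapes: expect 65 cc 81
show_bytes "$(printf 'e\314\201')"
```

To inspect the real shortcut input instead, assuming the Run Shell Script action is set to pass its input to stdin, a script consisting of just `od -An -tx1` shows the bytes Shortcuts actually sends.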

Question

How can I enter UTF-8 text with diacritical characters so sed accepts them as single characters?

Answer


See section 5.9, “Multibyte characters and Locale Considerations”, of the GNU sed documentation:

GNU sed processes valid multibyte characters in multibyte locales (e.g. UTF-8).

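Apparently, the LANG environment variable is set to a UTF-8 locale in Terminal, but not when Shortcuts runs a shell script. Without a UTF-8 locale, sed treats every byte as one character, and “é” is two bytes in UTF-8 (0xC3 0xA9). The source string of y/é/e/ is therefore two characters long while the destination is one. (macOS ships BSD sed rather than GNU sed, but it likewise relies on the locale to decide how to split multibyte characters.) The solution is to set LANG explicitly in the script: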

LANG=en_US.UTF-8 sed -e 'y/é/e/' 

or

export LANG=en_US.UTF-8
sed -e 'y/é/e/'
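To confirm that the locale is the only difference, the sketch below runs the same command twice: once in the byte-oriented C locale and once in a UTF-8 locale picked from `locale -a`. Picking it this way means the sketch works whether the system calls it `en_US.UTF-8`, as on macOS, or `C.utf8`, as on many Linux systems. It sets LC_ALL rather than LANG because LC_ALL overrides both LANG and LC_CTYPE, which keeps the demonstration independent of the current environment. The same precedence rule means that if LC_ALL were set to a non-UTF-8 value when Shortcuts runs the script, setting LANG alone would not help. The helper name `strip_accent` is only illustrative.

```shell
#!/bin/sh
# Pick the first installed UTF-8 locale (e.g. en_US.UTF-8 on macOS,
# C.utf8 or en_US.utf8 on many Linux systems).
utf8_locale=$(locale -a | grep -i -m1 -E '\.utf-?8$')

# Illustrative helper: apply y/é/e/ to stdin under the locale named in $1.
strip_accent() {
  LC_ALL=$1 sed -e 'y/é/e/'
}

# C locale: "é" is 2 bytes, i.e. 2 characters, vs 1 for "e", so y is rejected.
printf 'café\n' | strip_accent C || echo "y/é/e/ rejected under the C locale"

# UTF-8 locale: "é" is a single character, so the transform succeeds.
printf 'café\n' | strip_accent "$utf8_locale"    # prints cafe
```

In the Run Shell Script action itself, the `LANG=en_US.UTF-8 sed -e 'y/é/e/'` line from the answer is enough.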
  • Thank you! That is working for me now. Commented Nov 20, 2024 at 22:06
