
I'm looking for the easiest way to determine whether a character in Rust falls between two Unicode values.

For example, I want to know if a character s is in the range [#x1-#x8] or [#x10FFFE-#x10FFFF]. Is there a function that does this already?

2 Answers


The simplest way I found to match a character was this:

    fn match_char(data: char) -> bool {
        // `..=` is an inclusive range pattern; the old `...` form is an error in Rust 2021
        match data {
            '\x01'..='\x08' | '\u{10FFFE}'..='\u{10FFFF}' => true,
            _ => false,
        }
    }

I found pattern matching on the character easier to read than a chain of if statements. It may not be the fastest possible solution, but it has served me very well.
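A self-contained sketch of the same check, assuming Rust 2021. The standard matches! macro expands to a match with the given pattern returning true and a catch-all arm returning false, so the whole check becomes one expression. The name in_target_ranges_match is hypothetical, chosen only for this example.

```rust
/// Returns true if `c` is in U+0001..=U+0008 or U+10FFFE..=U+10FFFF.
/// `in_target_ranges_match` is a hypothetical name used only in this sketch.
fn in_target_ranges_match(c: char) -> bool {
    // `matches!(x, PATTERN)` is shorthand for
    // `match x { PATTERN => true, _ => false }`.
    matches!(c, '\u{1}'..='\u{8}' | '\u{10FFFE}'..='\u{10FFFF}')
}
```

Taking char by value rather than &char is the usual choice here, since char is a small Copy type.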


3 Comments

Performance should be the same with this as with my answer. Personally, I prefer my technique in this case (if there were a non-boolean output or more possibilities, I would use match), but either works.
For me it's about readability. I have a lot of these conditions (opinions differ, shocking), so I'd either go with match patterns for readability or with a single unsigned comparison such as (c as u32).wrapping_sub(1) <= 7 for speed; see "Parsing XML at the Speed of Light" on aosabook.org.
@DanielFath Thanks for the awesome snippet! As of the 2021 edition, ... range patterns are no longer accepted (E0783); use ..= for an inclusive range.
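The unsigned comparison trick works because, for unsigned integers, lo <= x && x <= hi is equivalent to x.wrapping_sub(lo) <= hi - lo: any x below lo wraps around to a very large value and fails the test. A minimal sketch, assuming the two ranges from the question; in_range_u32 and in_target_ranges_fast are hypothetical names. An optimizing compiler may already lower a range match to a similar comparison, so it is worth measuring before preferring this over match.

```rust
/// Branch-light inclusive range check on unsigned integers.
/// If x < lo, `x.wrapping_sub(lo)` wraps to a value larger than `hi - lo`,
/// so a single comparison covers both bounds.
/// `in_range_u32` is a hypothetical helper, not a standard-library function.
fn in_range_u32(x: u32, lo: u32, hi: u32) -> bool {
    debug_assert!(lo <= hi, "range bounds must be ordered");
    x.wrapping_sub(lo) <= hi - lo
}

/// The question's two ranges, checked on the char's code point.
/// `in_target_ranges_fast` is a hypothetical name used only in this sketch.
fn in_target_ranges_fast(c: char) -> bool {
    let cp = c as u32; // char -> u32 is always lossless
    in_range_u32(cp, 0x01, 0x08) || in_range_u32(cp, 0x10FFFE, 0x10FFFF)
}
```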

The simplest way, assuming these are not Unicode categories (in which case you should be using std::unicode), is to use the regular comparison operators:

    (s >= '\x01' && s <= '\x08') || s == '\u{10FFFE}' || s == '\u{10FFFF}'

(In case you weren't aware of the literal forms: \xHH gives an ASCII character up to \x7F, and \u{...} takes one to six hexadecimal digits and covers any Unicode scalar value, e.g. '\u{10FFFE}'. You can also build a char from an integer with char::from_u32(0x10FFFE), which returns an Option<char>; that works too, it is just less readable. A plain as char cast is only accepted from u8.)
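As for whether a ready-made function exists: the standard library's RangeInclusive::contains performs the same start <= c && c <= end comparison. A sketch that checks a char against a list of inclusive ranges; in_any_range and in_target_ranges_contains are hypothetical helpers, not standard-library functions.

```rust
use std::ops::RangeInclusive;

/// Returns true if `c` lies inside any of the given inclusive ranges.
/// `in_any_range` is a hypothetical helper used only in this sketch.
fn in_any_range(c: char, ranges: &[RangeInclusive<char>]) -> bool {
    // For a freshly built range, `contains` checks `start <= c && c <= end`.
    ranges.iter().any(|r| r.contains(&c))
}

/// The two ranges from the question.
/// `in_target_ranges_contains` is a hypothetical name.
fn in_target_ranges_contains(c: char) -> bool {
    in_any_range(c, &['\u{1}'..='\u{8}', '\u{10FFFE}'..='\u{10FFFF}'])
}
```

Passing the ranges as a slice keeps the helper reusable when there are many such conditions.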

1 Comment

Note that integer-to-char casts were unsafe and have since been restricted: e.g. 0xFFFF_FFFF as char used to be accepted even though it is not a valid code point. Today only u8 can be cast with as char; use char::from_u32 for other integers. (Also, std::unicode is private; most of its functionality is accessed via std::char and std::str.)
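When code points arrive as integers (for example from a parser), char::from_u32 is the checked alternative to a cast: it returns None for surrogates (U+D800 to U+DFFF) and for anything above U+10FFFF, such as the 0xFFFF_FFFF mentioned above. A sketch assuming the question's ranges; code_point_in_target_ranges is a hypothetical name.

```rust
/// Validates a raw code point, then applies the range check.
/// Returns None if `cp` is not a Unicode scalar value.
/// `code_point_in_target_ranges` is a hypothetical name used only in this sketch.
fn code_point_in_target_ranges(cp: u32) -> Option<bool> {
    // `?` propagates None for surrogates and values above 0x10FFFF.
    let c = char::from_u32(cp)?;
    Some(matches!(c, '\u{1}'..='\u{8}' | '\u{10FFFE}'..='\u{10FFFF}'))
}
```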
