Go has the unicode package, which contains useful functions such as IsGraphic and IsPrint. One function that is missing, though, is IsAssigned. Of course I could write my own by combining the other functions, but I would expect the standard library to provide it. In Java, writing this function is easy:

    boolean isAssigned(int codePoint) {
        return Character.getType(codePoint) != Character.UNASSIGNED;
    }

In Go there is no function unicode.Type(rune) or unicode.IsAssigned(rune). The closest I could find is this:

    func IsAssigned(r rune) bool {
        return unicode.IsControl(r) || unicode.IsGraphic(r) || unicode.IsSymbol(r)
    }

But that code reports U+00AD (soft hyphen) as unassigned, which is wrong.

How can I get correct information about unassigned code points?

1 Answer

I think you can check whether a code point is assigned by using unicode.Is and unicode.Categories (though it is not efficient), i.e.

    func IsAssigned(r rune) bool {
        for _, v := range unicode.Categories {
            if unicode.Is(v, r) {
                return true
            }
        }
        return false
    }

A working example is on The Go Playground.
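One caveat with looping over every entry of unicode.Categories: as far as I can tell from the release notes, newer Go releases (1.25 onward) add a "Cn" table for unassigned code points and fold it into "C", so on those versions the loop would report everything as assigned. Below is a minimal sketch of a variant that sidesteps this by looking only at the two-letter categories and skipping "Cn". It counts private-use (Co) and surrogate (Cs) code points as assigned, which matches what Java's Character.getType does; whether that is what you want is an assumption on my part.

```go
package main

import (
	"fmt"
	"unicode"
)

// IsAssigned reports whether r belongs to any two-letter general category
// other than Cn (unassigned). Private-use and surrogate code points count
// as assigned here, mirroring Java's Character.getType.
func IsAssigned(r rune) bool {
	for name, table := range unicode.Categories {
		// Skip the one-letter unions ("C", "L", ...) and the explicit
		// "Cn" table, which only exists in newer Go releases.
		if len(name) != 2 || name == "Cn" {
			continue
		}
		if unicode.Is(table, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(IsAssigned('A'))    // a letter
	fmt.Println(IsAssigned(0x00AD)) // soft hyphen, category Cf
	fmt.Println(IsAssigned(0x0378)) // a reserved code point in the Greek block
}
```

Like the original loop, this walks a map on every call, so it is not fast; for hot paths it may be worth building one merged table up front.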

2 Comments

Are you sure that all assigned characters belong to some category? (It seems kind of obvious, but maybe you have a quick proof.)
Actually, the code below shows that 260K+ runes belong to categories, while only ~130K code points are assigned. Am I doing something wrong? play.golang.org/p/gIcNVHa5PG6