Questions tagged [utf-8]
UTF-8 (Unicode Transformation Format, 8 bits) is a character encoding that represents each Unicode code point as a sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.
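As a rough illustration of the description above, the following Python sketch prints the UTF-8 byte sequence for a few code points and checks the ASCII compatibility claim. The sample characters are chosen only for illustration and do not come from the tag description.

```python
# Sample characters picked for illustration (an assumption, not from the tag wiki):
# one each from the 1-, 2-, 3- and 4-byte ranges.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # bytes.hex() with a separator argument requires Python 3.8 or later.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# ASCII compatibility: pure ASCII text yields the same bytes in both encodings.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print("ASCII bytes identical:", same_bytes)
```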
41 questions
6 votes
1 answer
799 views
Transcoding UTF-8 to UTF-16-LE in VBA
VBA is a language that's lacking a lot of basic functionality. (Pun intended) Most libraries, if they exist in the first place, are OS-specific, and even some of the inbuilt functions don't work on ...
3 votes
1 answer
368 views
Determining if a file is UTF-8 text by looking at its first n bytes
I'm trying to find out whether a particular file is UTF-8 encoded readable text, by which I mean printable symbols, whitespaces, \n, ...
2 votes
2 answers
401 views
Wielding .NET masterfully to encode non-alphanumeric characters into utf-8 hex representation
I have these two methods that work, but I also hate them because they can almost certainly be improved. I'm hoping to gain some guidance from others who are more knowledgeable about .NET's offering for ...
0 votes
1 answer
769 views
I wrote a header file to write German umlauts in a textfile properly
This function is about the fact that a std::wstring was used in another cpp file in order to be able to read strings with German umlauts from the console. Since it is difficult to get wstrings into a ...
7 votes
3 answers
3k views
C++ UTF-8 decoder
While writing a simple text renderer I found a lack of UTF-8 decoders. Most decoders I found required allocating enough space for the decoded string. In the worst case that would mean that the decoded string ...
2 votes
1 answer
126 views
Validator and Sanitizer for HTML 5 attribute regex according to current HTML living standard
According to https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 an HTML 5 attribute name is defined like this: Attribute names must consist of one or more characters other than controls, ...
3 votes
2 answers
672 views
Find the UTF-8 Length of a given codepoint
A codepoint in UTF-8 can be represented by one to four bytes. Given a codepoint, I need to determine the length (in bytes) of the codepoint if it were represented in UTF-8. For this, I've written the ...
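One way to approach the task described in this excerpt is a plain range check on the code point. The sketch below is not the asker's code (which is elided here); the function name `utf8_length` is hypothetical, and rejecting surrogates and out-of-range values is an assumption about the desired behavior.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs to encode one code point.

    Hypothetical helper for illustration; raises ValueError for values
    that are not encodable Unicode scalar values.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if code_point < 0x80:       # U+0000..U+007F: ASCII range
        return 1
    if code_point < 0x800:      # U+0080..U+07FF
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF (surrogates excluded above)
        return 3
    return 4                    # U+10000..U+10FFFF
```

The result can be cross-checked against Python's own encoder, for example by comparing `utf8_length(0x20AC)` with `len(chr(0x20AC).encode("utf-8"))`.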
2 votes
1 answer
724 views
A C++ function to read Code Points from a UTF-8 Stream
I've written a function that reads and returns one UTF-8 code point from an istream. I am wondering if the code is efficient or if there are some obvious problems with the implementation. ...
2 votes
1 answer
2k views
The conversion from UTF-16 to UTF-8
I have created a function that converts from UTF-16 to UTF-8. This function first converts from UTF-16 to a code point, then from the code point to UTF-8. ...
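The two-step approach mentioned in this excerpt (UTF-16 code units to a code point, then the code point to UTF-8 bytes) can be sketched as follows. This is not the asker's implementation, which is elided here; the function name `utf16_units_to_utf8` and the choice to raise on unpaired surrogates are assumptions made for illustration.

```python
def utf16_units_to_utf8(units):
    """Convert a sequence of 16-bit UTF-16 code units to UTF-8 bytes.

    Hypothetical helper: decodes each code point first, then encodes it.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        # Step 1: UTF-16 code unit(s) -> code point.
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate, needs a low surrogate
            if i >= len(units) or not 0xDC00 <= units[i] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            cp = unit
        # Step 2: code point -> UTF-8 bytes.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


# Code units for "A", U+20AC and U+1F600 (the last one as a surrogate pair).
example = utf16_units_to_utf8([0x0041, 0x20AC, 0xD83D, 0xDE00])
```

Decoding fully to a code point before encoding keeps the surrogate handling in one place, at the cost of a small intermediate step per character.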
22 votes
6 answers
4k views
Transcode UCS-4BE to UTF-8
Below is my entire program. You can read what it does thanks to the comments and specifications in particular. My question is: can it be improved? Would it be possible, for example, to avoid writing a ...
3 votes
1 answer
229 views
Save on typing while using UTF8 encoding
Typing in something like Encoding.UTF8.GetString(...) and Encoding.UTF8.GetBytes(...) everywhere in your code could be ...
8 votes
1 answer
185 views
Checking whether a string fragment could be part of a longer UTF-8 string
Although UTF-8 validation is a common task, I'm trying to solve a slightly different task; given a string of bytes, work out whether it could potentially be a fragment of a valid UTF-8 string. That's ...
8 votes
1 answer
207 views
myUTF-8 small lib (validate UTF-8, guess language, count chars)
I'm new to the C language and never got into the details of UTF-8. After reading some articles about it, I wanted to try and play with UTF-8 in C, both for fun and for practice ...
5 votes
1 answer
13k views
Convert to UTF-8 all files in a directory
1. Summary I can't figure out how to refactor multiple with open statements for one file. 2. Expected behavior of program The program detects the encoding for each file in the ...
9 votes
5 answers
8k views
Convert UTF8 string to UTF32 string in C
I'm doing some recreational programming in C (after spending some time in C++, but professionally using only PHP/JavaScript). I wrote a UTF8 to UTF32 converter and just wanted to know if I made some ...