Better formatting.
syb0rg

Function to convert ISO-8859-1 to UTF-8. What to change?

I wrote this function last year to convert between the two encodings and just found it.

It takes a text buffer and its size, then converts to UTF-8 if there's enough space.

What should be changed to improve quality?

    int iso88951_to_utf8(unsigned char *content, size_t max_size)
    {
        unsigned char *copy;
        size_t conversion_count; //number of chars to convert / bytes to add

        copy = content;
        conversion_count = 0;

        //first run to see if there's enough space for the new bytes
        while(*content)
        {
            if(*content >= 0x80)
            {
                ++conversion_count;
            }
            ++content;
        }

        if(content - copy + conversion_count >= max_size)
        {
            return ERROR;
        }

        while(content >= copy && conversion_count)
        {
            //repositioning current characters to make room for new bytes
            if(*content < 0x80)
            {
                *(content + conversion_count) = *content;
            }
            else
            {
                *(content + conversion_count) = 0x80 | (*content & 0x3f); //last byte
                *(content + --conversion_count) = 0xc0 | *content >> 6; //first byte
            }
            --content;
        }

        return SUCCESS;
    }
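To make the byte arithmetic in the second loop easier to check, here is a minimal sketch of the per-character step on its own. The helper name `latin1_byte_to_utf8` and its two-byte output buffer are hypothetical and not part of the function above. It only illustrates how one ISO-8859-1 byte maps to one or two UTF-8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: encodes a single
   ISO-8859-1 byte as UTF-8 into out[] and returns the number of
   bytes written (1 for ASCII, 2 for 0x80..0xff). */
static size_t latin1_byte_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);

    if (c < 0x80) {
        out[0] = c;                    /* ASCII is unchanged */
        return 1;
    }
    out[0] = 0xc0 | (c >> 6);          /* first byte: 110000xx, top 2 bits */
    out[1] = 0x80 | (c & 0x3f);        /* last byte:  10xxxxxx, low 6 bits */
    return 2;
}
```

For example, the ISO-8859-1 byte `0xe9` (é) yields the two bytes `0xc3 0xa9`, which is why every byte at or above `0x80` adds exactly one byte to the required buffer size.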

Function to convert ISO-8859-1 to UTF-8

I wrote this function last year to convert between the two encodings and just found it. It takes a text buffer and its size, then converts to UTF-8 if there's enough space.

What should be changed to improve quality?

    int iso88951_to_utf8(unsigned char *content, size_t max_size)
    {
        unsigned char *copy;
        size_t conversion_count; //number of chars to convert / bytes to add

        copy = content;
        conversion_count = 0;

        //first run to see if there's enough space for the new bytes
        while(*content)
        {
            if(*content >= 0x80)
            {
                ++conversion_count;
            }
            ++content;
        }

        if(content - copy + conversion_count >= max_size)
        {
            return ERROR;
        }

        while(content >= copy && conversion_count)
        {
            //repositioning current characters to make room for new bytes
            if(*content < 0x80)
            {
                *(content + conversion_count) = *content;
            }
            else
            {
                *(content + conversion_count) = 0x80 | (*content & 0x3f); //last byte
                *(content + --conversion_count) = 0xc0 | *content >> 6; //first byte
            }
            --content;
        }

        return SUCCESS;
    }
fixing title so that searches in the future will find this... not fixing method name
rolfl

Function to convert ISO-8859-1 to UTF-8. What to change?

edited tags
200_success
2013Asker