How can I convert a string encoded with Windows codepage 1251 to a Unicode string?

Question

The Cyrillic string my app receives uses (I believe) the table below: [image: code page table]

I say "I believe" because all the characters I tested fit this table.

Question: How do I convert such a string to a regular string, which is Unicode by default in my version of Delphi? Or better yet: is there a ready-to-use converter in Delphi, or should I write one?

You have to tell us what version of Delphi you have and what data structure holds the input string.
– David Heffernan, commented Aug 28, 2011 at 17:54

Andreas Rejbrand · Accepted Answer · answered 2011-08-28 18:02 (edited by Rudy Velthuis, 2016-12-04 19:57:29Z)

7

If you are using Delphi 2009 or later, this is done automatically:

    type
      CyrillicString = type AnsiString(1251);

    procedure TForm1.FormCreate(Sender: TObject);
    var
      UnicodeStr: string;
      CyrillicStr: CyrillicString;
    begin
      UnicodeStr := 'This is a test.';  // Unicode string
      CyrillicStr := UnicodeStr;        // ...converted to 1251

      CyrillicStr := 'This is a test.'; // Cyrillic string
      UnicodeStr := CyrillicStr;        // ...converted to Unicode
    end;

edited Dec 4, 2016 at 19:57

Rudy Velthuis


answered Aug 28, 2011 at 18:02

Andreas Rejbrand



5 Comments

David Heffernan Over a year ago

I got the impression that the Cyrillic strings were not available as literals.

Andreas Rejbrand Over a year ago

@David: No, I think they are not, but then again, I have never said they are. But you can always 'encode' your 1251 string manually:

    var
      UnicodeStr: string;
      CyrillicStr: CyrillicString;
    begin
      SetLength(CyrillicStr, 1);
      CyrillicStr[1] := AnsiChar(255); // byte $FF is 'я' in code page 1251
      UnicodeStr := CyrillicStr;
      ShowMessage(UnicodeStr);
    end;

Or, even stranger-looking: CyrillicStr := 'ÿ'; Both approaches will display a я character.

Andreas Rejbrand Over a year ago

@David: Anyhow, I got the impression that the OP somehow already got a 1251-encoded string...

David Heffernan Over a year ago

Exactly, but probably not in an AnsiString(1251) variable.

Torbins Over a year ago

"I got the impression that the Cyrillic strings were not available as literals." That depends on the codepage of your source files.

David Heffernan · Accepted Answer · 2011-08-28 18:55:00Z

First of all, I recommend you read Marco Cantù's whitepaper on Unicode in Delphi. I am also assuming from your question (and previous questions) that you are using a Unicode version of Delphi, i.e. D2009 or later.

You can first of all define an AnsiString with codepage 1251 to match your input data.

    type
      CyrillicString = type AnsiString(1251);

This is an important step. It says that any data contained inside a variable of this type is to be interpreted as having been encoded using the 1251 codepage. This allows Delphi to perform correct conversions to other string types, as we will see later.

Next, copy your input data into a variable of this type.

    function GetCyrillicString(const Input: array of Byte): CyrillicString;
    begin
      SetLength(Result, Length(Input));
      if Length(Result) > 0 then
        Move(Input[0], Result[1], Length(Input)); // raw byte copy, no code page conversion
    end;

Of course, there may be other, more convenient ways to get the data in. Perhaps it comes from a stream. Whatever the case, make sure you do it with something equivalent to a memory copy so that you don't invoke code page conversions and thus lose the 1251 encoding.
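As an illustration of the stream case mentioned above, here is a minimal sketch, assuming Delphi 2009 or later. The helper name ReadCyrillic and the sample bytes are hypothetical and not part of the original answer; in XE2 and later the units may need their scoped names (System.SysUtils, System.Classes) depending on project settings.

```delphi
program Cp1251FromStream;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  CyrillicString = type AnsiString(1251);

// Hypothetical helper: reads Count raw bytes from Stream into a
// 1251-tagged string. ReadBuffer is a plain memory copy, so no
// code page conversion takes place. It raises an exception if
// fewer than Count bytes are available.
function ReadCyrillic(Stream: TStream; Count: Integer): CyrillicString;
begin
  SetLength(Result, Count);
  if Count > 0 then
    Stream.ReadBuffer(Result[1], Count);
end;

const
  // "Да" encoded in Windows-1251 (sample data for this sketch)
  Sample: array[0..1] of Byte = ($C4, $E0);

var
  Stream: TMemoryStream;
  S: string;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
    // The explicit cast makes the 1251 -> UTF-16 conversion visible.
    S := string(ReadCyrillic(Stream, Integer(Stream.Size)));
    Writeln(Length(S)); // two UTF-16 characters
  finally
    Stream.Free;
  end;
end.
```

The key point is the same as with Move: the bytes land in the CyrillicString untouched, and conversion happens only on assignment to string.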

Finally you can simply assign a CyrillicString to a plain Unicode string variable and the Delphi runtime performs the necessary conversion automatically.

    function ConvertCyrillicToUnicode(const Input: array of Byte): string;
    begin
      Result := GetCyrillicString(Input);
    end;

The runtime is able to perform this conversion because you specified the codepage when defining CyrillicString and because string maps to UnicodeString which is encoded with UTF-16.
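Putting the pieces of this answer together, the following is a small console sketch, assuming Delphi 2009 or later. The program name and the sample bytes are my own additions for illustration; the type and function names are taken from the answer.

```delphi
program Cp1251Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  CyrillicString = type AnsiString(1251);

// Copies the raw bytes into a 1251-tagged string without conversion.
function GetCyrillicString(const Input: array of Byte): CyrillicString;
begin
  SetLength(Result, Length(Input));
  if Length(Result) > 0 then
    Move(Input[0], Result[1], Length(Input));
end;

// The cast to string triggers the runtime's 1251 -> UTF-16 conversion.
function ConvertCyrillicToUnicode(const Input: array of Byte): string;
begin
  Result := string(GetCyrillicString(Input));
end;

var
  S: string;
begin
  // "Привет" encoded in Windows-1251 (sample data for this sketch)
  S := ConvertCyrillicToUnicode([$CF, $F0, $E8, $E2, $E5, $F2]);
  Writeln(Length(S));              // 6
  Writeln(IntToHex(Ord(S[1]), 4)); // 041F, the code point of "П"
end.
```

The sketch prints code points rather than the text itself because console output of Cyrillic depends on the console's own code page.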

user160694 · Accepted Answer · 2011-08-29 07:42:00Z

The Windows API functions MultiByteToWideChar() and WideCharToMultiByte() can be used to convert to and from any code page supported by Windows. Of course, if you use Delphi 2009 or later, it is easier to use the native Unicode support.
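For reference, a call to MultiByteToWideChar() could look roughly like the sketch below. The function name Cp1251ToWide is hypothetical; the result is a WideString so the same code should also work in pre-2009 versions, and in XE2 and later the units may need their scoped names (Winapi.Windows, System.SysUtils) depending on project settings.

```delphi
program Cp1251WinApi;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: converts raw Windows-1251 bytes to UTF-16
// using the usual two-call pattern (query the length, then convert).
function Cp1251ToWide(const Input: array of Byte): WideString;
var
  Len: Integer;
begin
  Result := '';
  if Length(Input) = 0 then
    Exit;

  // First call: passing a nil output buffer returns the required length.
  Len := MultiByteToWideChar(1251, 0, PAnsiChar(@Input[0]),
    Length(Input), nil, 0);
  if Len = 0 then
    RaiseLastOSError;

  // Second call: perform the actual conversion into the result buffer.
  SetLength(Result, Len);
  if MultiByteToWideChar(1251, 0, PAnsiChar(@Input[0]),
    Length(Input), PWideChar(Result), Len) = 0 then
    RaiseLastOSError;
end;

var
  W: WideString;
begin
  W := Cp1251ToWide([$FF]);        // byte $FF is "я" in Windows-1251
  Writeln(IntToHex(Ord(W[1]), 4)); // 044F
end.
```

WideCharToMultiByte() follows the same two-call pattern in the opposite direction.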
