Serialize a string in binary with C# and deserialize it with C++

Question

I'm struggling to find an efficient way to serialize a string that may contain both Unicode and non-Unicode characters into a binary array. I then write that array to a file, which I have to deserialize using C++.

I have already implemented a serializer/deserializer in C++, which I use for most of my serialization and which handles both Unicode and non-Unicode characters. Basically, I convert non-Unicode characters into their Unicode equivalent and serialize everything as a Unicode string. It's not the most efficient approach, since every string now takes 2 bytes per character, but it works.
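If the narrow strings are ISO-8859-1 (Latin-1), the "convert to the Unicode equivalent" step is just a zero-extension, because every Latin-1 byte value equals its Unicode code point. The question doesn't say which narrow encoding is used, so treat the following C++ as a sketch under that assumption; `widen_latin1` is a hypothetical name, and a code page such as Windows-1252 would need a lookup table for bytes 0x80 to 0x9F instead:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widens an ISO-8859-1 (Latin-1) string to UTF-16.
// ASSUMPTION: the input really is Latin-1. For Latin-1, byte value == code
// point, so every byte becomes exactly one 16-bit code unit.
std::u16string widen_latin1(const std::string& latin1) {
    std::u16string out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Each character then costs exactly 2 bytes once written out, which is the overhead the question mentions.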

What I'm trying to achieve is to transform an arbitrary string into a 2-byte-per-character string that I can then deserialize from C++.
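On the C++ side, the 2-byte-per-character data that C#'s `Encoding.Unicode.GetBytes` produces is UTF-16 little-endian, so it can be decoded into a `std::u16string` by combining byte pairs. A minimal sketch, assuming a framing the question doesn't specify: the C# side writes a 4-byte little-endian byte count followed by the raw bytes (for example with `BinaryWriter`). `read_utf16le_string` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader. ASSUMED layout (not from the question): a 4-byte
// little-endian byte count, then that many bytes of UTF-16LE text, e.g. from C#:
//   var bytes = Encoding.Unicode.GetBytes(s);
//   writer.Write(bytes.Length); writer.Write(bytes);   // BinaryWriter
// `pos` is advanced past the string on success.
std::u16string read_utf16le_string(const std::vector<std::uint8_t>& buf,
                                   std::size_t& pos) {
    if (pos > buf.size() || buf.size() - pos < 4)
        throw std::runtime_error("truncated length");
    // Assemble the length byte by byte so host endianness does not matter.
    std::uint32_t len = std::uint32_t(buf[pos]) |
                        (std::uint32_t(buf[pos + 1]) << 8) |
                        (std::uint32_t(buf[pos + 2]) << 16) |
                        (std::uint32_t(buf[pos + 3]) << 24);
    pos += 4;
    if (len % 2 != 0) throw std::runtime_error("odd UTF-16 byte count");
    if (buf.size() - pos < len) throw std::runtime_error("truncated string");
    std::u16string out;
    out.reserve(len / 2);
    for (std::size_t i = 0; i < len; i += 2) {
        // Little-endian: low byte first, then high byte.
        out.push_back(static_cast<char16_t>(buf[pos + i] | (buf[pos + i + 1] << 8)));
    }
    pos += len;
    return out;
}
```

Assembling each value from individual bytes, rather than copying raw memory into a `char16_t` buffer, keeps the reader correct on big-endian hosts too.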

What would be the most efficient way to achieve this?

Also, any suggestions about the way I'm serializing strings are welcome, of course.

Sorry @Evk, your comment is spot on (I can't believe I tried every encoding and missed Unicode...), but I can't accept a comment as an answer :(. If you add it as an answer I'll gladly accept it, but for now I'll accept Kinimod's answer.
– zeb, Commented Apr 27, 2018 at 6:51
It's not a problem that you can't accept it; the main thing is that you got the answer. Note that you might consider using UTF-8 instead (Encoding.Unicode in .NET is UTF-16), because UTF-8 encodes the ASCII range as one byte per character, and that range is quite common. For that you'd need to adjust the C++ side, of course.
– Evk, Commented Apr 27, 2018 at 6:52
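To see how much UTF-8 would save, the C++ sketch below counts the UTF-8 size of a UTF-16 string (the function name is illustrative). ASCII needs 1 byte in UTF-8 versus 2 in UTF-16, U+0080 to U+07FF need 2 in both, the rest of the Basic Multilingual Plane needs 3 versus 2, and surrogate pairs need 4 in both:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many bytes the UTF-8 encoding of a UTF-16 string would take,
// for comparison with the 2 bytes per code unit that UTF-16 always uses.
std::size_t utf8_size(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            n += 1;  // ASCII
        } else if (c < 0x800) {
            n += 2;  // e.g. Latin-1 supplement, Greek, Cyrillic
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            n += 4;  // valid surrogate pair: one code point, 4 bytes
            ++i;
        } else {
            // Rest of the BMP. A lone surrogate is also counted as 3 bytes,
            // the size of a U+FFFD replacement (an assumption about how
            // invalid input would be handled).
            n += 3;
        }
    }
    return n;
}
```

For "my string" this gives 9 bytes against the 18 that `Encoding.Unicode` produces. The saving reverses for text dominated by characters at U+0800 and above (CJK, for instance), where UTF-8 needs 3 bytes per character against UTF-16's 2.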
Yes, in fact I was using UTF-8 and I was getting only 1 byte per character (testing with ASCII characters). I wonder how I should adapt my code on the C++ side, as it would save half of the space on most of my strings.
– zeb, Commented Apr 27, 2018 at 6:55
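One way to adapt the C++ side, assuming the C# code writes strings with `BinaryWriter.Write(string)` (whose default encoding is UTF-8): that method prefixes the UTF-8 bytes with their count, stored as a "7-bit encoded" integer (7 bits per byte, least significant group first, high bit set when more bytes follow). A sketch of a matching reader; `read_dotnet_string` is a hypothetical name, and the result stays UTF-8 in a `std::string`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader for the layout of .NET's BinaryWriter.Write(string)
// with its default UTF-8 encoding (an ASSUMPTION about the C# side):
// the UTF-8 byte count as a 7-bit encoded integer, then the UTF-8 bytes.
// The bytes are returned unchanged; no transcoding is done.
std::string read_dotnet_string(const std::vector<std::uint8_t>& buf,
                               std::size_t& pos) {
    std::uint32_t len = 0;
    int shift = 0;
    while (true) {
        if (pos >= buf.size()) throw std::runtime_error("truncated length");
        if (shift > 28) throw std::runtime_error("bad 7-bit length");  // > 5 bytes
        std::uint8_t b = buf[pos++];
        len |= std::uint32_t(b & 0x7F) << shift;  // low 7 bits carry data
        if ((b & 0x80) == 0) break;               // high bit clear: last byte
        shift += 7;
    }
    if (buf.size() - pos < len) throw std::runtime_error("truncated string");
    std::string out(reinterpret_cast<const char*>(buf.data() + pos), len);
    pos += len;
    return out;
}
```

Keeping the text as UTF-8 avoids any transcoding; converting to UTF-16 is only needed where existing code expects 2-byte units.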
I bet C++ has its own standard ways to work with encodings, so "I have already implemented a serializer/deserializer in C++" should not be necessary.
– Evk, Commented Apr 27, 2018 at 6:58
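The C++ standard library does offer `std::wstring_convert` with `std::codecvt_utf8_utf16`, but both are deprecated since C++17, so many codebases either use a library such as ICU or hand-roll the conversion. For existing code that expects 2-byte units, a hand-rolled UTF-8 to UTF-16 decoder is short; this is an illustrative sketch (`utf8_to_utf16` is a hypothetical name) that throws on malformed input rather than substituting U+FFFD:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical UTF-8 -> UTF-16 decoder. Throws std::runtime_error on
// malformed input: bad lead or continuation bytes, truncated sequences,
// overlong forms, encoded surrogates, and values above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (in.size() - i <= extra) throw std::runtime_error("truncated UTF-8");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a high and a low surrogate
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```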

Kinimod · Accepted Answer · 2018-04-27 06:39:09Z

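Encoding.Unicode.GetBytes("my string") encodes the string as UTF-16 (little-endian), which uses 2 bytes per UTF-16 code unit: that is 2 bytes per character, except for characters outside the Basic Multilingual Plane, which take two code units (4 bytes). So if you are still looking for an alternative, consider this encoding.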
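To check what the C++ side should expect, the following sketch reproduces the byte layout of `Encoding.Unicode.GetBytes`: UTF-16 little-endian with no byte order mark, low byte first. `to_utf16le_bytes` is a hypothetical helper, not part of any library:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper mirroring the layout of C#'s Encoding.Unicode.GetBytes:
// UTF-16 little-endian, no byte order mark, 2 bytes per code unit.
std::vector<std::uint8_t> to_utf16le_bytes(const std::u16string& s) {
    std::vector<std::uint8_t> out;
    out.reserve(s.size() * 2);
    for (char16_t unit : s) {
        out.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte first
        out.push_back(static_cast<std::uint8_t>(unit >> 8));    // then high byte
    }
    return out;
}
```

So "my string" becomes 18 bytes (`6D 00 79 00 ...`), while a character outside the Basic Multilingual Plane such as U+1F600 becomes the surrogate pair `3D D8 00 DE`, i.e. 4 bytes.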
