
I'm struggling to find an effective way to serialize a string that may contain both Unicode and non-Unicode characters into a byte array, which I then write to a file that I have to deserialize in C++.

I have already implemented a serializer/deserializer in C++, which I use for most of my serialization and which handles both Unicode and non-Unicode characters. Basically, I convert non-Unicode characters into their Unicode equivalent and serialize everything as a Unicode string. This is not the most efficient approach, since every string now takes 2 bytes per character, but it works.

What I'm trying to achieve is to transform an arbitrary string into a 2-byte-per-character string that I can then deserialize in C++.

What would be the most effective way to achieve this?

Also, any suggestions about the way I'm serializing strings are welcome, of course.
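
A minimal C++ sketch of the reading side, assuming the .NET writer stores each string as a 32-bit little-endian count of UTF-16 code units followed by the bytes returned by `Encoding.Unicode.GetBytes` (UTF-16LE). That length-prefix layout and the function name `read_utf16le_string` are hypothetical, not the asker's existing format. Bytes are combined explicitly, low byte first, so the result does not depend on the host's endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: [uint32 LE code-unit count][count * 2 bytes of UTF-16LE].
// Reads one string starting at `pos` and advances `pos` past it.
std::u16string read_utf16le_string(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    if (pos > buf.size() || buf.size() - pos < 4)
        throw std::runtime_error("truncated length prefix");

    // Assemble the 32-bit count little-endian, independent of host byte order.
    std::uint32_t count = 0;
    for (int i = 0; i < 4; ++i)
        count |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    pos += 4;

    if ((buf.size() - pos) / 2 < count)
        throw std::runtime_error("truncated string data");

    std::u16string out;
    out.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        // Low byte first: UTF-16LE, as produced by Encoding.Unicode in .NET.
        out.push_back(static_cast<char16_t>(buf[pos] | (buf[pos + 1] << 8)));
        pos += 2;
    }
    return out;
}
```

Counting code units rather than characters keeps strings that contain surrogate pairs round-tripping unchanged.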

  • Encoding.Unicode.GetBytes("my string") Commented Apr 27, 2018 at 6:21
  • Sorry @Evk, your comment is spot on (I can't believe I tried every encoding and missed Unicode...), but I can't accept a comment as an answer :(. If you add it as an answer I'll gladly accept it, but for now I'll accept kinimod's answer Commented Apr 27, 2018 at 6:51
  • It's not a problem that you can't accept it; the main thing is that you got the answer. Note that you might consider using UTF-8 instead (Encoding.Unicode in .NET is UTF-16), because UTF-8 encodes the ASCII range as one byte per character, and that range is quite common. For that you need to adjust the C++ part, of course. Commented Apr 27, 2018 at 6:52
  • Yes, in fact I was using UTF-8 and was getting only 1 byte per character (testing with ASCII characters). I wonder how I should adapt my code on the C++ side, as it would save half of the space for most of my strings Commented Apr 27, 2018 at 6:55
  • I bet C++ has its own standard ways to work with encodings, so "I have already implemented a serializer/deserializer in C++" should not be necessary Commented Apr 27, 2018 at 6:58
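
One possible way to adapt the C++ side if the .NET side switches to `Encoding.UTF8.GetBytes`: the UTF-8 bytes can be kept as-is in a `std::string`, and only code that really needs 16-bit units has to convert them. The sketch below is a hand-written UTF-8 to UTF-16 decoder (the name `utf8_to_utf16` is hypothetical). The standard library's `std::wstring_convert` with `std::codecvt_utf8_utf16` can do a similar conversion, but both have been deprecated since C++17.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Minimal UTF-8 -> UTF-16 conversion sketch. Throws on malformed input:
// bad lead/continuation bytes, truncation, overlong forms, surrogates, > U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the initial payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i < len) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = static_cast<char32_t>((cp << 6) | (b & 0x3Fu));
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Characters outside the BMP become a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this, ASCII-heavy strings cost 1 byte per character on disk, and code that expects 2-byte units can keep working on the converted `std::u16string`.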

1 Answer


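Encoding.Unicode.GetBytes("my string") encodes the string as UTF-16 (little-endian, without a byte order mark), which uses 2 bytes per UTF-16 code unit: 2 bytes for each character in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for characters outside it. So if you are still looking for an alternative, consider which encoding you use.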
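
To make the size trade-off between UTF-16 and UTF-8 concrete, here is a small C++ sketch that counts encoded bytes per code point. The helper names are hypothetical, and the input is assumed to contain only valid Unicode scalar values.

```cpp
#include <cstddef>
#include <string>

// Bytes needed for one code point in UTF-16: one or two 16-bit code units.
std::size_t utf16_size(char32_t cp) { return cp < 0x10000 ? 2 : 4; }

// Bytes needed for one code point in UTF-8: 1 to 4 depending on its range.
std::size_t utf8_size(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Total encoded size of a string of code points in each encoding.
std::size_t utf16_total(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf16_size(cp);
    return n;
}

std::size_t utf8_total(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf8_size(cp);
    return n;
}
```

For `U"my string"` this gives 18 bytes in UTF-16 versus 9 in UTF-8, the halving mentioned in the comments; for CJK text such as `U"\u65E5\u672C"`, UTF-8 is actually larger (6 bytes versus 4).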
