1

Is there a way I can convert a .txt file into Unicode using C#?

5 Answers

6

Only if you know the original encoding used to produce the .txt file. That's not a restriction of C# or .NET either; it's a general problem, because the same bytes decode to different characters under different encodings.
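
Once you do know the source encoding, the conversion itself is just decode-then-encode. A minimal sketch, assuming the file fits in memory; TextFileConverter and ConvertFile are hypothetical names, not part of .NET:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not a .NET API): re-encodes a text file whose
// source encoding is already known.
public static class TextFileConverter
{
    public static void ConvertFile(string inputPath, Encoding sourceEncoding,
                                   string outputPath, Encoding targetEncoding)
    {
        // Decode using the encoding the file was actually written in.
        // File.ReadAllText still honours a UTF-8/UTF-16/UTF-32 byte order
        // mark if one is present; otherwise it uses sourceEncoding.
        // A wrong sourceEncoding gives a garbled string, not an error.
        string text = File.ReadAllText(inputPath, sourceEncoding);

        // Encode the same characters again in the target encoding.
        File.WriteAllText(outputPath, text, targetEncoding);
    }
}
```

For example, a Latin-1 file could be converted with TextFileConverter.ConvertFile("in.txt", Encoding.GetEncoding("iso-8859-1"), "out.txt", Encoding.UTF8), where the file names are placeholders. File.WriteAllText writes a byte order mark when given Encoding.UTF8; pass new UTF8Encoding(false) instead if you don't want one.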

Read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" to learn why "plain text" is meaningless if you don't know the encoding.

1 Comment

Joachim: Fantastic article from Joel, thanks for the link. I now have it in my armory of links to redistribute liberally to those who seem to need 'em... Cheers. - T.J.
5

Provided your text file contains only ASCII characters, it's already Unicode, encoded as UTF-8: UTF-8 encodes U+0000 to U+007F as the same single bytes that ASCII uses.
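
A quick way to check this claim: for a string made only of ASCII characters, Encoding.ASCII and Encoding.UTF8 produce identical bytes. AsciiUtf8Check and SameBytes are hypothetical names for this sketch:

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper (not a .NET API): compares the ASCII and UTF-8
// encodings of the same string byte for byte.
public static class AsciiUtf8Check
{
    public static bool SameBytes(string text)
    {
        byte[] asAscii = Encoding.ASCII.GetBytes(text);
        byte[] asUtf8 = Encoding.UTF8.GetBytes(text); // GetBytes adds no BOM
        return asAscii.SequenceEqual(asUtf8);
    }
}
```

SameBytes("Hello, world") is true, while SameBytes("Area = \u03A0r^2") is false, because Π (U+03A0) becomes two bytes (CE A0) in UTF-8 but a single '?' in ASCII.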

If you want a different encoding of the characters (UTF-16/UCS-2, etc.), any language that supports Unicode should be able to read in one encoding and write out another.

The System.Text.Encoding classes will do it, as in the following example: it converts a UTF-16 string to both UTF-8 and ASCII bytes and then back again (code gratuitously stolen from here).

```csharp
using System;
using System.IO;
using System.Text;

class Test
{
    public static void Main()
    {
        using (StreamWriter output = new StreamWriter("practice.txt"))
        {
            string srcString = "Area = \u03A0r^2"; // PI.R.R

            // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
            // Π (U+03A0) has no ASCII equivalent, so Encoding.ASCII replaces it with '?'.
            byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
            byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

            // Write the UTF-8 and ASCII encoded byte arrays.
            output.WriteLine("UTF-8 Bytes: {0}", BitConverter.ToString(utf8String));
            output.WriteLine("ASCII Bytes: {0}", BitConverter.ToString(asciiString));

            // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
            // string and write.
            output.WriteLine("UTF-8 Text : {0}", Encoding.UTF8.GetString(utf8String));
            output.WriteLine("ASCII Text : {0}", Encoding.ASCII.GetString(asciiString));

            Console.WriteLine(Encoding.UTF8.GetString(utf8String));
            Console.WriteLine(Encoding.ASCII.GetString(asciiString));
        }
    }
}
```
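
One thing the example hides: Encoding.ASCII silently replaces Π with '?', so the ASCII round trip loses data without any error. If you'd rather detect that, you can request an ASCII encoding with an exception fallback. A sketch, with StrictAscii and TryGetAsciiBytes as hypothetical names:

```csharp
using System.Text;

// Hypothetical helper (not a .NET API): reports failure instead of
// letting Encoding.ASCII silently substitute '?' for non-ASCII characters.
public static class StrictAscii
{
    // "us-ascii" with exception fallbacks: unencodable characters throw
    // EncoderFallbackException rather than becoming '?'.
    private static readonly Encoding Ascii = Encoding.GetEncoding(
        "us-ascii",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    public static bool TryGetAsciiBytes(string text, out byte[] bytes)
    {
        try
        {
            bytes = Ascii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character (e.g. Π, U+03A0) has no ASCII form.
            bytes = new byte[0];
            return false;
        }
    }
}
```

TryGetAsciiBytes("Area = r^2", out bytes) succeeds with 10 bytes, while the string from the example above, "Area = \u03A0r^2", makes it return false.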

3 Comments

Thank you so much for your help. I figured out I'm quite illiterate in Unicode and encoding stuff. Cheers :)
@intrinsic, the vast majority of people are illiterate with regards to Unicode, especially those that think they're not :-) I only discovered how really complex it is in the last couple of years (we now ship software which is localized to twenty-plus different major locales and even more minor ones).
@Pax, amazing :) I wanted the .txt-to-Unicode thing for my semester project (making a lexical analyzer for C#). Actually, we are trying many things to get the job done. Thanks again.
2

Here is an example that reads a UTF-8 file line by line and writes it out as UTF-16 (Encoding.Unicode):

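```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace utf16
{
    class Program
    {
        // Usage: <program> <input file, UTF-8> <output file, UTF-16>
        static void Main(string[] args)
        {
            // Read the input as UTF-8...
            using (StreamReader sr = new StreamReader(args[0], Encoding.UTF8))
            // ...and write it back out as UTF-16 little-endian (Encoding.Unicode).
            using (StreamWriter sw = new StreamWriter(args[1], false, Encoding.Unicode))
            {
                // Note: ReadLine/WriteLine normalise line endings to
                // Environment.NewLine and always end the output with a newline.
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    sw.WriteLine(line);
                }
            }
        }
    }
}
```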
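
Note that Encoding.Unicode is UTF-16 little-endian, and a StreamWriter created with it writes the byte order mark FF FE before the first character. If whatever consumes the file (a hand-written lexer, say) reads raw bytes, it may need to skip or avoid that mark. This sketch (BomDemo and WriteWith are hypothetical names) writes through a StreamWriter into memory so you can inspect the bytes:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not a .NET API): shows exactly which bytes a
// StreamWriter produces for a given encoding, including any BOM.
public static class BomDemo
{
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, encoding))
            {
                writer.Write(text);
            } // disposing the writer flushes it (and closes the MemoryStream)

            // MemoryStream.ToArray still works after the stream is closed.
            return buffer.ToArray();
        }
    }
}
```

With Encoding.Unicode, writing "A" gives FF FE 41 00; with new UnicodeEncoding(false, false) it gives just 41 00, and with new UTF8Encoding(false) just 41.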

1

There is a nice page on MSDN about this, including a complete example:

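```csharp
// Specify the code page to correctly interpret byte values.
// (On .NET Core / .NET 5+, first call
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// code page 737 is not available there by default.)
Encoding encoding = Encoding.GetEncoding(737); //(DOS) Greek code page
byte[] codePageValues = System.IO.File.ReadAllBytes(@"greek.txt");

// Same content is now encoded as UTF-16
string unicodeValues = encoding.GetString(codePageValues);
```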

1

If you really do need to change the encoding (see Pax's answer about UTF-8 being valid Unicode), then yes, you can do that quite easily: check out the System.Text.Encoding class.
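
If the data is already in a byte array, Encoding.Convert re-encodes it in one call. A minimal sketch converting UTF-8 bytes to UTF-16 little-endian; ByteConverter and Utf8ToUtf16 are hypothetical names:

```csharp
using System.Text;

// Hypothetical helper (not a .NET API) wrapping Encoding.Convert.
public static class ByteConverter
{
    public static byte[] Utf8ToUtf16(byte[] utf8Bytes)
    {
        // Decodes with the first encoding, re-encodes with the second.
        // No byte order mark is added by Encoding.Convert itself.
        return Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
    }
}
```

For instance, the UTF-8 bytes CE A0 (Π) come back as A0 03. A leading UTF-8 BOM in the input is decoded as U+FEFF and carried over as FF FE.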
