encoding | Exercises in .NET with Andras Nemes

Getting the byte array of a string depending on Encoding in C# .NET

December 27, 2016

You can take any string in C# and view its byte array representation under a given encoding. You can get hold of an encoding with the Encoding.GetEncoding method. Some frequently used encodings have shortcut properties:

Encoding.ASCII
Encoding.BigEndianUnicode
Encoding.Unicode – this is UTF-16, little-endian
Encoding.UTF7
Encoding.UTF32
Encoding.UTF8
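As a minimal sketch of the two lookup styles mentioned above, the following compares Encoding.GetEncoding (by IANA name and by code page number) with the shortcut properties. The class name EncodingLookupDemo and the helper Describe are made up for illustration; only the Encoding members come from the framework.

```csharp
using System;
using System.Text;

public static class EncodingLookupDemo // hypothetical name, for illustration only
{
    // Formats the code page number and IANA name of an encoding.
    public static string Describe(Encoding encoding)
    {
        return string.Format("{0} ({1})", encoding.CodePage, encoding.WebName);
    }

    public static void Main()
    {
        // Lookup by IANA name and by code page number.
        Console.WriteLine(Describe(Encoding.GetEncoding("utf-16")));
        Console.WriteLine(Describe(Encoding.GetEncoding(65001)));

        // The shortcut properties refer to the same code pages.
        Console.WriteLine(Describe(Encoding.Unicode));
        Console.WriteLine(Describe(Encoding.UTF8));
    }
}
```

The first and third lines of output should match, as should the second and fourth, since the shortcuts are just convenient names for the same code pages.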

Reading a text file using a specific encoding in C# .NET

December 30, 2014

In this post we saw how to save a text file specifying an encoding in the StreamWriter constructor. You can also indicate the encoding when you read a file with StreamWriter’s sibling, StreamReader. Often you don’t need to specify it: by default StreamReader assumes UTF-8 and switches to UTF-16 or UTF-32 if the file starts with the matching byte order mark. Other encodings, such as UTF-7 or legacy code pages, cannot be detected this way and must be specified explicitly.

Here’s an example of how to read a file with a specific encoding type:

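string filename = @"c:\file-utf-seven.txt";

// Write the file as UTF-7 first so that there is something to read back.
// The using block flushes and closes the writer even if WriteLine throws.
using (StreamWriter streamWriter = new StreamWriter(filename, false, Encoding.UTF7))
{
	streamWriter.WriteLine("I am feeling great.");
}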

using (StreamReader reader = new StreamReader(filename, Encoding.UTF7))
{
	Console.WriteLine(reader.ReadToEnd());
}
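When the file starts with a byte order mark, StreamReader can work out the encoding by itself. The sketch below is an illustration only: the class name BomDetectionDemo, the helper DetectCodePage and the temp file name are assumptions. It writes a file with a given encoding, reads it back without naming one, and reports the code page the reader settled on.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo // hypothetical name, for illustration only
{
    // Writes text with the given encoding, reads it back without naming an
    // encoding, and returns the code page StreamReader settled on.
    public static int DetectCodePage(string path, Encoding writeEncoding)
    {
        File.WriteAllText(path, "I am feeling great.", writeEncoding);

        using (StreamReader reader = new StreamReader(path))
        {
            // CurrentEncoding is only meaningful after the first read.
            reader.ReadToEnd();
            return reader.CurrentEncoding.CodePage;
        }
    }

    public static void Main()
    {
        // Assumed location: a scratch file in the temp folder.
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");

        Console.WriteLine(DetectCodePage(path, Encoding.Unicode)); // UTF-16 LE, written with a BOM
        Console.WriteLine(DetectCodePage(path, Encoding.UTF32));   // UTF-32 LE, written with a BOM

        File.Delete(path);
    }
}
```

Both Encoding.Unicode and Encoding.UTF32 emit a byte order mark when used for writing, which is what makes the detection possible here. UTF-7 has no such marker, hence the explicit Encoding.UTF7 argument in the example above.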

Read all posts dedicated to file I/O here.

Filed under .NET, Files and directories Tagged with c#, encoding, file, I/O

Saving a text file using a specific encoding in C# .NET

December 26, 2014

The StreamWriter object constructor lets you indicate the encoding type when writing to a text file. The following method shows how simple it is:

private static void SaveFile(Encoding encoding)
{
	Console.WriteLine("Encoding: {0}", encoding.EncodingName);
	string filename = string.Concat(@"c:\file-", encoding.EncodingName, ".txt");
	// The using block flushes and closes the writer even if WriteLine throws.
	using (StreamWriter streamWriter = new StreamWriter(filename, false, encoding))
	{
		streamWriter.WriteLine("I am feeling great.");
	}
}

We saw in this post how to get hold of a specific code page. We also saw that if you only use characters in the ASCII range, i.e. positions 0-127, then UTF-8 and most single-byte encodings produce the same bytes. UTF-16 and UTF-32 keep the same character values but use 2 and 4 bytes per character.

Call the above method like this:

SaveFile(Encoding.UTF7);
SaveFile(Encoding.UTF8);
SaveFile(Encoding.Unicode);
SaveFile(Encoding.UTF32);

So we’ll have 4 files at the end, each named after the encoding type. Depending on the supported code pages on your PC, Notepad may or may not be able to handle the encoding types. Notepad should not have any problem with UTF-8 and UTF-16. The UTF-7 file will probably look OK, whereas UTF-32 will most likely look strange. In my case the UTF-32 file content looked like this:

I a m f e e l i n g g r e a t .

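…i.e. with some extra white-space in between the characters. Notepad was not able to read UTF-32 correctly: UTF-32 stores every character in 4 bytes, so each ASCII character is followed by zero bytes, which Notepad most likely rendered as blanks.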
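To see where the extra blanks come from, it helps to look at the raw bytes. The following sketch (ByteLayoutDemo and Hex are hypothetical names) prints the same short string in UTF-16 and UTF-32 as hex:

```csharp
using System;
using System.Text;

public static class ByteLayoutDemo // hypothetical name, for illustration only
{
    // Returns the encoded bytes of the text as a dash-separated hex string.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(Hex("I a", Encoding.Unicode)); // 49-00-20-00-61-00
        Console.WriteLine(Hex("I a", Encoding.UTF32));   // 49-00-00-00-20-00-00-00-61-00-00-00
    }
}
```

A program that reads the UTF-32 bytes two at a time, as if they were UTF-16, finds a zero-valued character after every real one.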

If you don’t pass an encoding, StreamWriter uses UTF-8 without a byte order mark, which will suffice in most situations. If you’re unsure then stick with UTF-8.

Providing an encoding type which cannot handle certain characters will result in replacement characters being written instead. If we change the string to be saved to “öåä I am feeling great.” and call the SaveFile method like

SaveFile(Encoding.ASCII);

…then you’ll see the following content in Notepad:

??? I am feeling great.

ASCII could not handle the Swedish characters öåä and replaced them with question marks.
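If silently writing question marks is not acceptable, one option is to ask for an encoding that throws instead. This is a sketch under that assumption, not the approach used in the post: StrictAsciiDemo and FitsInAscii are made-up names, while the GetEncoding overload with fallback arguments is part of System.Text.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo // hypothetical name, for illustration only
{
    // Returns true if every character of the text can be represented in ASCII.
    public static bool FitsInAscii(string text)
    {
        // An ASCII encoding that throws instead of substituting '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FitsInAscii("I am feeling great."));
        // \u00F6\u00E5\u00E4 are the Swedish characters from the example above.
        Console.WriteLine(FitsInAscii("\u00F6\u00E5\u00E4 I am feeling great."));
    }
}
```

A check like this could run before SaveFile is called, so that data loss is noticed before the file is written.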

Read all posts dedicated to file I/O here.

Filed under .NET, Files and directories Tagged with c#, encoding, file, I/O

Getting the byte array of a string depending on Encoding in C# .NET

December 24, 2014

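You can take any string in C# and view its byte array representation under a given encoding. You can get hold of an encoding with the Encoding.GetEncoding method. Some frequently used encodings have shortcut properties:

Encoding.ASCII
Encoding.BigEndianUnicode
Encoding.Unicode – this is UTF-16, little-endian
Encoding.UTF7
Encoding.UTF32
Encoding.UTF8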

Once you’ve got hold of an encoding you can call its GetBytes method to return the byte array representation of a string. You can use this method whenever another method requires a byte array input instead of a string.
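GetBytes has a counterpart, GetString, that turns the bytes back into a string. A minimal round-trip sketch follows; RoundTripDemo and RoundTrip are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class RoundTripDemo // hypothetical name, for illustration only
{
    // Encodes the text to bytes and decodes those bytes with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string input = "\u00F6\u00E5\u00E4"; // the Swedish characters öåä

        // UTF-8 can represent every character, so the round trip is lossless.
        Console.WriteLine(RoundTrip(input, Encoding.UTF8) == input);

        // ASCII cannot, so the characters come back as question marks.
        Console.WriteLine(RoundTrip(input, Encoding.ASCII) == "???");
    }
}
```

The round trip only gives back the original string when the encoding can represent every character in it.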

For backward compatibility the positions 0-127 are the same in most encoding types. These cover the standard English alphabet (both lower and upper case), the digits, punctuation and some other characters. So if you only take characters from this range then the byte values in the array will be the same in ASCII-compatible encodings such as UTF-8 and most legacy code pages. UTF-16 and UTF-32 use the same values but pad each character to 2 or 4 bytes. You can view the ASCII characters here: ASCII character set.

The following code will print the same values for both the ASCII and Chinese encoding types:

string input = "I am feeling great";
byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
Console.WriteLine("Ascii");
foreach (byte b in asciiEncoded)
{
	Console.WriteLine(b);
}

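// "Chinese" is resolved as an alias of GB2312 (code page 936) on .NET Framework.
// On .NET Core and .NET 5+ legacy code pages like this one are only available
// after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
Encoding chinese = Encoding.GetEncoding("Chinese");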
byte[] chineseEncoded = chinese.GetBytes(input);
Console.WriteLine("Chinese");
foreach (byte b in chineseEncoded)
{
	Console.WriteLine(b);
}
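Rather than comparing two printed lists by eye, the byte arrays can be compared directly. The sketch below uses made-up names (AsciiRangeDemo, SameBytes) and swaps in ISO-8859-1 as the second encoding, as that one is available without extra setup on every .NET runtime:

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiRangeDemo // hypothetical name, for illustration only
{
    // True if both encodings produce exactly the same bytes for the text.
    public static bool SameBytes(string text, Encoding first, Encoding second)
    {
        return first.GetBytes(text).SequenceEqual(second.GetBytes(text));
    }

    public static void Main()
    {
        string input = "I am feeling great";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(SameBytes(input, Encoding.ASCII, Encoding.UTF8));    // True
        Console.WriteLine(SameBytes(input, Encoding.ASCII, latin1));           // True
        Console.WriteLine(SameBytes(input, Encoding.ASCII, Encoding.Unicode)); // False: 2 bytes per character
    }
}
```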

If you try to ASCII-encode a Unicode string which contains non-ASCII characters then you’ll see the ASCII byte value 63, i.e. ‘?’, in their place:

string input = "öåä I am feeling great";
byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
Console.WriteLine("Ascii");
foreach (byte b in asciiEncoded)
{
	Console.WriteLine(b);
}

The first 3 positions will print 63 as the Swedish ‘öåä’ characters cannot be handled by ASCII. Similarly, whenever you visit a website and see question marks and other funny characters instead of proper text, you know that there’s an encoding problem: the page was most likely decoded with a different encoding than the one it was saved in, or it contains characters that the chosen encoding cannot represent.
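The “funny characters” case is easy to reproduce: encode with one encoding and decode with another. A small sketch, where MojibakeDemo and Misread are hypothetical names:

```csharp
using System;
using System.Text;

public static class MojibakeDemo // hypothetical name, for illustration only
{
    // Encodes the text as UTF-8 but decodes the bytes as ISO-8859-1 (Latin-1).
    public static string Misread(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "\u00F6" is 'ö'. Its two UTF-8 bytes, C3 and B6, are read as two Latin-1 characters.
        string garbled = Misread("\u00F6");
        Console.WriteLine(garbled.Length); // 2
    }
}
```

No question marks are involved here: every byte is a valid Latin-1 character, so the text is silently turned into something else.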

View all posts related to Globalization here.

Filed under .NET, Globalization Tagged with c#, encoding, globalization

Getting the list of supported Encoding types in .NET

December 23, 2014

Every text file and string is encoded using one of many encoding standards. Normally .NET will handle encoding automatically but there are times when you need to dig into the internals for encoding and decoding. It’s very simple to retrieve the list of supported encoding types, a.k.a. code pages, in .NET:

EncodingInfo[] codePages = Encoding.GetEncodings();
foreach (EncodingInfo codePage in codePages)
{
	Console.WriteLine("Code page ID: {0}, IANA name: {1}, human-friendly display name: {2}", codePage.CodePage, codePage.Name, codePage.DisplayName);
}

Example output:

Code page ID: 37, IANA name: IBM037, human-friendly display name: IBM EBCDIC (US-Canada)
Code page ID: 852, IANA name: ibm852, human-friendly display name: Central European (DOS)
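The output above comes from the .NET Framework. On .NET Core and .NET 5+ only a small built-in set of encodings is available until the code pages provider is registered. The sketch below assumes such a runtime, where CodePagesEncodingProvider is available (on the .NET Framework it comes from the System.Text.Encoding.CodePages package). CodePageProviderDemo and GetLegacyEncoding are made-up names.

```csharp
using System;
using System.Text;

public static class CodePageProviderDemo // hypothetical name, for illustration only
{
    // Registers the code pages provider and looks up a legacy code page by number.
    public static Encoding GetLegacyEncoding(int codePage)
    {
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }

    public static void Main()
    {
        Encoding centralEuropeanDos = GetLegacyEncoding(852);
        Console.WriteLine("Code page ID: {0}, IANA name: {1}",
            centralEuropeanDos.CodePage, centralEuropeanDos.WebName);
    }
}
```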

View all posts related to Globalization here.

Filed under .NET, Globalization Tagged with c#, encoding, globalization
