Getting the byte array of a string depending on Encoding in C# .NET
December 27, 2016
You can take any string in C# and view its byte array representation in a given encoding. You can look up an encoding by name or code page number using the Encoding.GetEncoding method. Some frequently used encodings are also exposed as static properties of the Encoding class:
- Encoding.ASCII
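- Encoding.BigEndianUnicode – this is UTF-16 with big-endian byte order
- Encoding.Unicode – this is UTF-16 with little-endian byte order
- Encoding.UTF7 – marked obsolete in .NET 5 and later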
- Encoding.UTF32
- Encoding.UTF8
Once you’ve got hold of an encoding you can call its GetBytes method to return the byte array representation of a string. This is useful whenever another method requires a byte array input instead of a string.
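As a rough illustration of the above, the sketch below encodes the single character "A" with several of the built-in encodings and prints the bytes as hex. The class name EncodingByteDemo and the helper ToHex are made up for this example and are not part of the framework.

```csharp
using System;
using System.Text;

public static class EncodingByteDemo
{
    // Hypothetical helper: returns the bytes of 'text' in the given
    // encoding as a hex string such as "41-00".
    public static string ToHex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // GetEncoding accepts a name; "utf-8" resolves to a UTF-8 encoding.
        Encoding utf8ByName = Encoding.GetEncoding("utf-8");

        Console.WriteLine("ASCII:    " + ToHex(Encoding.ASCII, "A"));            // 41
        Console.WriteLine("UTF-8:    " + ToHex(utf8ByName, "A"));                // 41
        Console.WriteLine("UTF-16LE: " + ToHex(Encoding.Unicode, "A"));          // 41-00
        Console.WriteLine("UTF-16BE: " + ToHex(Encoding.BigEndianUnicode, "A")); // 00-41
        Console.WriteLine("UTF-32:   " + ToHex(Encoding.UTF32, "A"));            // 41-00-00-00
    }
}
```

The value 0x41 (65) is the same everywhere; what changes is how many bytes each encoding uses and in which order it writes them.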
For backward compatibility the positions 0-127 are the same in most encoding types. These cover the standard English alphabet (both lower and upper case), the digits, punctuation and some control characters. So if you only use characters from this range, ASCII-compatible encodings such as UTF-8 and most single-byte code pages produce identical byte arrays. UTF-16 and UTF-32 keep the same character values but store each character in 2 or 4 bytes, so their arrays are longer. See the ASCII character set for the full list of characters.
The following code prints the same values for both the ASCII and Chinese encoding types:
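    using System;
    using System.Text;

    string input = "I am feeling great";

    // Encode with ASCII and print each byte value
    byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
    Console.WriteLine("Ascii");
    foreach (byte b in asciiEncoded)
    {
        Console.WriteLine(b);
    }

    // Look up a Chinese code page by name. On .NET Core and later, code-page
    // encodings may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
    Encoding chinese = Encoding.GetEncoding("Chinese");
    byte[] chineseEncoded = chinese.GetBytes(input);
    Console.WriteLine("Chinese");
    foreach (byte b in chineseEncoded)
    {
        Console.WriteLine(b);
    }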
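Instead of comparing the printed values by eye, the byte arrays can also be compared in code. This is only a sketch: the class AsciiRangeDemo and the helper SameBytes are invented names, and ISO-8859-1 (Latin-1) is used as the single-byte code page because it is available by name without extra setup.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiRangeDemo
{
    // Hypothetical helper: true when both encodings produce
    // identical byte arrays for 'text'.
    public static bool SameBytes(Encoding first, Encoding second, string text)
    {
        return first.GetBytes(text).SequenceEqual(second.GetBytes(text));
    }

    public static void Main()
    {
        string input = "I am feeling great";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // ASCII-compatible encodings agree on characters in the 0-127 range.
        Console.WriteLine(SameBytes(Encoding.ASCII, Encoding.UTF8, input));    // True
        Console.WriteLine(SameBytes(Encoding.ASCII, latin1, input));           // True

        // UTF-16 uses two bytes per character here, so the arrays differ.
        Console.WriteLine(SameBytes(Encoding.ASCII, Encoding.Unicode, input)); // False
        Console.WriteLine(Encoding.Unicode.GetByteCount(input));               // 36
    }
}
```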
If you try to ASCII-encode a string which contains non-ASCII characters then you’ll see the ASCII byte value of 63, i.e. ‘?’, in place of each of them:
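    using System;
    using System.Text;

    string input = "öåä I am feeling great";

    // ASCII cannot represent ö, å or ä, so each is replaced with '?' (63)
    byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
    Console.WriteLine("Ascii");
    foreach (byte b in asciiEncoded)
    {
        Console.WriteLine(b);
    }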
The first 3 positions will print 63 because ASCII cannot represent the Swedish ‘ö’, ‘å’ and ‘ä’ characters, so the encoder substitutes ‘?’ for each of them. The same kind of problem shows up when you visit a website and see question marks or other odd characters instead of proper text: the page is being decoded with a different encoding than the one it was written in, or with one that cannot represent its characters.
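The garbled-website effect can be reproduced in a few lines by encoding a string with one encoding and decoding the bytes with another. The sketch below is illustrative only: MismatchDemo and Transcode are made-up names, and the Swedish characters are written as Unicode escapes so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class MismatchDemo
{
    // Hypothetical helper: encodes 'text' with one encoding and
    // decodes the resulting bytes with another.
    public static string Transcode(string text, Encoding writer, Encoding reader)
    {
        return reader.GetString(writer.GetBytes(text));
    }

    public static void Main()
    {
        string input = "\u00F6\u00E5\u00E4"; // the characters ö, å and ä
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Each of these characters takes two bytes in UTF-8.
        Console.WriteLine(Encoding.UTF8.GetByteCount(input)); // 6

        // Same encoding on both sides: the text survives the round trip.
        Console.WriteLine(Transcode(input, Encoding.UTF8, Encoding.UTF8) == input); // True

        // UTF-8 bytes read as Latin-1: six unrelated characters instead of three.
        Console.WriteLine(Transcode(input, Encoding.UTF8, latin1).Length); // 6

        // ASCII on the way out: the characters are lost and become "???".
        Console.WriteLine(Transcode(input, Encoding.ASCII, Encoding.ASCII)); // ???
    }
}
```

Note that the ASCII case is lossy: once the bytes hold 63, no decoder can recover the original characters, whereas the mismatched UTF-8 bytes are still intact and decode correctly with the right encoding.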