C# - Why are ASCII values of a byte different when cast as Int32?
I'm in the process of creating a program to scrub extended ASCII characters from text documents. I'm trying to understand how C# interprets the different character sets and codes, and I'm noticing some oddities.
Consider:
    using System;
    using System.Text;

    namespace AsciiTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                string value = "Slide™1½”C4®";
                byte[] asciiValue = Encoding.ASCII.GetBytes(value); // byte array
                char[] array = value.ToCharArray();                 // char array
                Console.WriteLine("Char\tByte\tInt32");
                for (int i = 0; i < array.Length; i++)
                {
                    char letter = array[i];
                    byte byteValue = asciiValue[i];
                    Int32 int32Value = array[i];
                    Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
                }
                Console.ReadLine();
            }
        }
    }

The output of this program:
    Char    Byte    Int32
    S       83      83
    l       108     108
    i       105     105
    d       100     100
    e       101     101
    ™       63      8482   <- trademark symbol
    1       49      49
    ½       63      189    <- fraction
    ”       63      8221   <- smart quote
    C       67      67
    4       52      52
    ®       63      174    <- registered trademark symbol

In particular, I'm trying to understand why the extended ASCII characters (the ones with notes added to the right of the third column) show the correct value when cast as Int32, but show 63 when cast as a byte value. What's going on here?
The Encoding.ASCII.GetBytes conversion replaces every character outside of the ASCII range (0-127) with a question mark (code 63).
So since your string contains characters outside of that range, asciiValue holds ? in place of the interesting symbols: the char (Unicode) representation of ™, 8482, is indeed outside the 0-127 range.
Converting the string to a char array does not modify the values of the characters, so they still carry their original Unicode codes (a char is a 16-bit code unit). Casting to a longer integer type such as Int32 does not change the value.
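To make that distinction concrete, here is a minimal self-contained sketch (names are illustrative) showing that widening casts keep the original code point, while a narrowing cast to byte keeps only the low 8 bits:

```csharp
using System;

class WideningDemo
{
    static void Main()
    {
        char tm = "™"[0];       // U+2122, numeric value 8482

        // Widening conversions preserve the char's numeric value.
        Int32 asInt32 = tm;     // implicit char -> int, still 8482
        Int64 asInt64 = tm;     // still 8482

        // A narrowing cast to byte keeps only the low 8 bits (value % 256).
        byte asByte = (byte)tm; // 8482 % 256 = 34

        Console.WriteLine($"{asInt32} {asInt64} {asByte}");
        // prints: 8482 8482 34
    }
}
```

Note that 63 never appears here: the ? substitution is something Encoding.ASCII does, not something a cast does. A plain cast truncates arithmetically instead.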
Below are the possible conversions of a character to bytes/integers:
    var value = "™";
    var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 ('?') - outside the 0-127 range
    var castToByte = (byte)value[0];               // 34 = 8482 % 256
    var int16 = (Int16)value[0];                   // 8482
    var int32 = (Int32)value[0];                   // 8482

More details are available in the ASCIIEncoding Class documentation:
ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
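Since the question mentions scrubbing extended characters, it is worth noting that the substitution character is configurable through an encoder fallback. A minimal sketch, assuming the goal is to drop (rather than replace) the non-ASCII characters:

```csharp
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        string value = "Slide™1½”C4®";

        // Ask for the ASCII encoding, but with an empty-string replacement
        // fallback instead of the default '?', so non-ASCII characters are
        // simply dropped during encoding.
        Encoding scrubber = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        byte[] bytes = scrubber.GetBytes(value);
        Console.WriteLine(Encoding.ASCII.GetString(bytes)); // prints: Slide1C4
    }
}
```

An EncoderExceptionFallback could be used instead if you would rather have the encoder throw on the first non-ASCII character than silently alter the text.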