Now, you're an experienced C programmer, so you know you'll have to deal with endianness: the JVM is big-endian, and x86 is little-endian. Every time you find 0xff00 in your file, you need to flip those bytes around to read 0x00ff. So you write a simple loop -- in pseudocode,
int accumulator = 0; while( b = file.readByte() ) { accumulator = accumulator | b; accumulator = accumulator << 8; }
And low and behold, 0xff00 becomes ... -1.
You might have guessed it already, but the problem is that Java doesn't support unsigned integers and that first byte, 0xff, is read as -1 per 2's complement.
Now, if you can visualize all that bit-shifting and or'ing, you might not see this as a problem -- the 0xff will just be OR'ed into your short, then shifted. This is not what happens.
Instead, our byte b, which has been interpreted as -1, is silently coerced into a short -- 0xff becomes 0xffff, which is then OR'ed with 0x0000 to yield 0xffff, aka -1. Our next byte, 0x00, is also coerced and OR'ed to no effect.
[Why anyone would want a signed byte is a whole 'nother question -- next time I want to store a value between -128 and 127, I guess I know what to use.]
The fix is to undo the 2's complement -- to turn -1 into 255, like so:
int accumulator = 0; while( b = file.readByte() ) { short s = (short) b; if( s < 0 ) { s = s + 0x100; } accumulator = accumulator | s; accumulator = accumulator << 8; }
That's how you read a single unsigned short in Java. Just so intuitive, don't you think? Compare with Python:
s = struct.unpack("B", file.read(1))
No comments:
Post a Comment