I'll also bring you back to my earlier argument: if you tried to port this code to the most modern Cray T3C supercomputer or to any SPARC machine, it would fail to run, because these machines *require* aligned addresses.
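To make the porting point concrete, here is a minimal sketch in C of how code can stay safe on alignment-strict machines. The helper names `is_aligned` and `load_u32` are my own, made up for illustration; they are not from any particular library or from the code under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of align.
   align must be a power of two. */
static int is_aligned(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: read a 32-bit value from any address.
   Copying through memcpy avoids dereferencing a possibly misaligned
   uint32_t pointer, which is what traps on alignment-strict CPUs. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling `load_u32(buf + 1)` on a byte buffer works whether or not `buf + 1` happens to be aligned, whereas `*(uint32_t *)(buf + 1)` is the kind of access that would fault on the machines mentioned above.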
In any event, when the CPU accesses memory in qword blocks, it has to write an entire qword. (You can't write half a qword! The Intel docs specifically say entire qwords!) So to write back to memory properly, it first has to read the memory area it is modifying:
byte: 76543210
^^^^ these bytes are being modified.
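The read-then-write step described above can be sketched in C as a simple software model. This is only an illustration of the idea, not how any real memory controller is implemented; the function name `merge_into_qword` and the convention that byte 0 is the least significant byte (matching the 76543210 numbering) are my assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a partial write: replace `len` bytes starting at
   byte `offset` inside an aligned 8-byte word, leaving the rest untouched.
   Byte 0 is assumed to be the least significant byte. */
static uint64_t merge_into_qword(uint64_t old_qword, uint64_t new_bytes,
                                 unsigned offset, unsigned len) {
    assert(len >= 1 && offset + len <= 8);
    /* `len` bytes of ones; len == 8 is special-cased to avoid a 64-bit shift. */
    uint64_t lane_mask = (len == 8) ? ~UINT64_C(0)
                                    : ((UINT64_C(1) << (8 * len)) - 1);
    /* Move the ones onto the byte lanes being modified. */
    uint64_t mask = lane_mask << (8 * offset);
    /* Read (old_qword), modify (merge the new bytes), write (return value). */
    return (old_qword & ~mask) | ((new_bytes << (8 * offset)) & mask);
}
```

For example, merging the four bytes `0xDDCCBBAA` into bytes 2 to 5 of `0x7766554433221100` gives `0x7766DDCCBBAA1100`: bytes 0, 1, 6 and 7 survive only because the old qword was read first.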
The Pentium IV memory circuitry is not really that complicated. From what I've seen in the Intel documentation, there is still a performance hit if any value bigger than a byte is misaligned. Since the majority of software keeps its memory values aligned, it is not in Intel's interest to support a programming habit that hasn't been around since the 80386.
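One way to picture where a misalignment penalty can come from is to count how many aligned qwords a single access touches. The sketch below is a simplified model under my own assumptions (8-byte blocks only, no caches or cache lines), and `qwords_touched` is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many aligned 8-byte blocks does an access of
   `size` bytes at address `addr` touch? */
static unsigned qwords_touched(uint64_t addr, unsigned size) {
    assert(size >= 1);
    uint64_t first = addr / 8;              /* block holding the first byte */
    uint64_t last = (addr + size - 1) / 8;  /* block holding the last byte */
    return (unsigned)(last - first + 1);
}
```

In this model a 4-byte access at `0x1000` touches one qword, while the same access at `0x1006` straddles a boundary and touches two, so it needs two transfers instead of one.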
Such a system of masking the bits latched from the memory bus just isn't worth the design time, since it only makes a difference for the oldest software and requires the cooperation of the entire memory system.