The inability to use 64-bit registers in 32-bit code is mostly a design flaw by AMD. When Intel moved from 16 bits to 32 bits, they also created overrides that made it possible to use 32-bit registers and addressing in 16-bit code.
thewrongchristian wrote:
A big address space also makes things simpler. Think of something like a memory-mapped file representing your database. Sure, you can have windows into that file in a 32-bit address space, but with 64 bits to play with, you can reasonably map the entire database file into memory and simply use pointers to navigate. Simpler code often means fewer bugs and lower maintenance costs.
rdos wrote:
I think hardware task-switching was usable (and efficient) back when CPUs only had a single core. I don't think Intel originally thought about the problems their hardware task-switching would cause in multicore systems. I only dropped hardware task-switching when I moved to multicore.
I somewhat question the need for long mode today. I had an application that used up to 100GB of physical memory, but I found that with a smart algorithm that analyzed only part of the data at a time, mapping this large physical area through 2M windows in 3G of linear memory really wasn't a problem.
I think long mode is mostly needed by applications that are poorly designed. My 32-bit OS certainly didn't stop me from analyzing 100GB of data I streamed over PCI.
After all, PAE paging can map just as much physical memory as long mode can.
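The windowing idea described above can be approximated in user space. The following is only a minimal sketch, assuming the data lives in an ordinary file rather than a physical memory region, and using Python's `mmap` as a stand-in for the kernel-level remapping; `sum_bytes_windowed` and `demo_windowed` are hypothetical names, and summing bytes is a placeholder for the real analysis.

```python
import mmap
import os
import tempfile

WINDOW = 2 * 1024 * 1024  # 2 MiB, mirroring the 2M windows mentioned above


def sum_bytes_windowed(path, window=WINDOW):
    """Sum every byte of a file while mapping only one window at a time."""
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        for offset in range(0, size, window):
            length = min(window, size - offset)
            # Only `length` bytes of address space are mapped at any moment.
            # The offset must be a multiple of mmap.ALLOCATIONGRANULARITY,
            # which holds for multiples of 2 MiB.
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                total += sum(view[:])
    return total


def demo_windowed():
    """Build a 5 MiB sample file and compare windowed and in-memory sums."""
    data = bytes(range(256)) * (5 * 4096)  # exactly 5 MiB
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        return sum_bytes_windowed(tmp.name), sum(data)
    finally:
        os.unlink(tmp.name)
```

The address-space footprint stays at one window regardless of file size, which is the property being argued for; the price is that the algorithm has to be able to work on one window at a time.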
Plus, it sounds like your algorithm could analyze your stream in self-contained chunks. What if you can't do that, and you need random access to your data, as in the database example above?
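For contrast, here is a rough sketch of the "map everything and follow pointers" style from the database example. The record layout (an 8-byte offset to the next record followed by an 8-byte payload), the `END` sentinel, and the names `walk_chain` and `demo_chain` are all assumptions made up for illustration, not any real database format.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<QQ")  # (offset of next record, payload); hypothetical layout
END = 0xFFFFFFFFFFFFFFFF       # hypothetical sentinel marking the end of the chain


def walk_chain(path, start=0):
    """Map the entire file and follow stored offsets as if they were pointers."""
    values = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as db:
        offset = start
        while offset != END:
            # Each hop may land anywhere in the file: true random access.
            offset, value = RECORD.unpack_from(db, offset)
            values.append(value)
    return values


def demo_chain():
    """Write three records linked out of file order, then walk them."""
    slots = [
        (2 * RECORD.size, 10),  # slot 0 -> slot 2
        (END, 30),              # slot 1 -> end of chain
        (1 * RECORD.size, 20),  # slot 2 -> slot 1
    ]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for next_offset, value in slots:
            tmp.write(RECORD.pack(next_offset, value))
    try:
        return walk_chain(tmp.name)
    finally:
        os.unlink(tmp.name)
```

With the whole file mapped, the traversal needs no window bookkeeping at all. In a 32-bit address space the same loop would have to check on every hop whether the target offset falls inside the currently mapped window and remap if not.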
It's not just about address space, though. Long mode opened up other opportunities, such as adding extra registers, and the jump from 8 to 16 GPRs probably had a big positive effect on performance.
Overly simplistic code often turns out to be slow code. When code uses random access, the cache is poorly utilized and there are many cache misses. The cost of cache misses is also doubled in long mode, since long mode uses four paging levels to handle 48-bit addresses. So I wouldn't say that using random access in long mode is always a good route to good performance. It might be if random access is inevitable, but if accesses can be more localized, there will be fewer cache misses and the code will run much faster.
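To make the level count concrete, the sketch below splits an address the way the page walker does with 4 KiB pages: four 9-bit indices plus a 12-bit offset in long mode, versus 2 + 9 + 9 + 12 bits under PAE. The function names are made up for illustration. Note that the walk only happens on a TLB miss, and paging-structure caches can shorten it, so the level count is an upper bound on the extra memory accesses rather than a measured cost.

```python
def split_long_mode(vaddr):
    """Split a 48-bit virtual address into 4-level indices and a page offset."""
    offset = vaddr & 0xFFF        # bits 0-11: byte within the 4 KiB page
    pt = (vaddr >> 12) & 0x1FF    # bits 12-20: page table index
    pd = (vaddr >> 21) & 0x1FF    # bits 21-29: page directory index
    pdpt = (vaddr >> 30) & 0x1FF  # bits 30-38: page directory pointer index
    pml4 = (vaddr >> 39) & 0x1FF  # bits 39-47: PML4 index
    return pml4, pdpt, pd, pt, offset


def split_pae(vaddr):
    """Split a 32-bit virtual address for PAE paging with 4 KiB pages."""
    offset = vaddr & 0xFFF        # bits 0-11: byte within the 4 KiB page
    pt = (vaddr >> 12) & 0x1FF    # bits 12-20: page table index
    pd = (vaddr >> 21) & 0x1FF    # bits 21-29: page directory index
    pdpt = (vaddr >> 30) & 0x3    # bits 30-31: one of only four PDPT entries
    return pdpt, pd, pt, offset
```

So a full walk touches four table entries in long mode and three under PAE. With 2M pages the last level drops out in both modes, leaving three and two respectively.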