Page tables, long mode and other things (looking for ideas)
Posted: Mon Feb 27, 2012 5:00 pm
I just kicked Bochs into long mode (hooray.) I'm currently grappling with the sheer enormity of the address space! Thinking about this has led me to several conclusions:
There are lots of ways to abuse the scale of the address space. One thing I have wanted to do for a long time is write a very lightweight HPC kernel that effectively maps entire disks into the address space. This is not possible (without using lots of segments) on a 32-bit machine, where the linear address space is only 4 GiB. On x64 there's room enough for even the largest hard drive to be directly addressable.
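To put rough numbers on "room enough", here is a small sketch in C. It assumes the 48-bit virtual addresses (four-level paging) that current x86-64 processors implement, and a purely hypothetical 4 TB drive; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses with four-level paging. */
#define VADDR_BITS 48u

/* Bytes in one canonical half (lower or upper) of the virtual address space. */
uint64_t canonical_half_bytes(void) {
    return 1ull << (VADDR_BITS - 1);   /* 2^47 bytes */
}

/* How many whole drives of the given size fit in one canonical half. */
uint64_t drives_per_half(uint64_t drive_bytes) {
    assert(drive_bytes != 0);
    return canonical_half_bytes() / drive_bytes;
}
```

Under those assumptions each half of the address space is 2^47 bytes (128 TiB), so `drives_per_half(4000000000000ull)` comes to 35 hypothetical 4 TB drives mapped side by side.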
The idea is very simple: just treat the RAM as a cache for the disk. Notionally, the disk is the storage space from which code executes (you can literally branch to any address on the drive). One obvious advantage is that after a reboot the machine picks up exactly where it left off (provided one has code to reliably checkpoint disk and task state). One obvious downside is that the disk must always be mapped at the same virtual address, because any code it contains will be linked assuming a fixed linear address (unless one wishes to make everything position-independent).
One can also execute:
Code:
mov rdi, START_OF_MAPPED_DRIVE     ; destination: first byte of the mapped drive
mov rcx, SIZE_OF_DRIVE_IN_QWORDS   ; count: drive size in 8-byte units
mov rax, 0xfeeddeadfeeddead        ; fill pattern
cld                                ; clear the direction flag so stores go forwards
rep stosq                          ; overwrite the whole mapped drive
... and goodbye disk (without task isolation). Which brings me to my second point: I'd like to do away with task isolation.
For a single-purpose box (a database, for instance) this situation is fine: one level 0 kernel (that performs the disk to address space mapping) and one level 1 task (the database) that runs in ring 0 but relies on the mapping provided by level 0. If you have a bug in your database engine, you (and your database) are probably borked anyway (even on a task-isolated machine).
I'm not sure what the best way to accomplish the disk to address space mapping is. Simpler is better: the aim is to maximise throughput and minimise latency. The simplest solution is a 1:1 mapping, from the master boot record right through to the last sector of the disk. Pressing the reset button with such a scheme would obviously leave the disk in an undefined state, since whatever was modified in RAM but not yet written back is lost; what I'm wondering at the moment is whether it is possible to deal with this at level 1.
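A rough way to see both the 1:1 mapping and the reset problem is a user-space simulation in C. Everything here is an assumption made for illustration: the base address `DISK_BASE`, the toy sizes, the direct-mapped replacement policy, and the helper names. In a real kernel the loads and stores would be ordinary memory accesses and `fault_in` would be driven by the page-fault handler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096u
#define SECTOR_SIZE  512u
#define DISK_BASE    0xFFFF900000000000ull /* hypothetical fixed base of the mapping */

enum { DISK_PAGES = 8, CACHE_FRAMES = 2 }; /* toy sizes for the simulation */

struct frame {
    int      valid;
    int      dirty;
    uint64_t page;              /* which disk page this frame currently holds */
    uint8_t  data[PAGE_SIZE];
};

uint8_t      disk[DISK_PAGES][PAGE_SIZE];  /* stand-in for the drive */
struct frame cache[CACHE_FRAMES];          /* stand-in for RAM */

/* 1:1 mapping: byte N of the disk lives at DISK_BASE + N. */
uint64_t vaddr_to_lba(uint64_t vaddr) {
    return (vaddr - DISK_BASE) / SECTOR_SIZE;
}

/* Roughly what a level 0 page-fault handler would have to do. */
struct frame *fault_in(uint64_t page) {
    struct frame *f = &cache[page % CACHE_FRAMES];  /* direct-mapped for brevity */
    if (f->valid && f->page == page)
        return f;                                   /* already cached */
    if (f->valid && f->dirty)
        memcpy(disk[f->page], f->data, PAGE_SIZE);  /* write back the evicted page */
    memcpy(f->data, disk[page], PAGE_SIZE);         /* read the page from "disk" */
    f->valid = 1;
    f->dirty = 0;
    f->page  = page;
    return f;
}

void store_byte(uint64_t vaddr, uint8_t value) {
    uint64_t off = vaddr - DISK_BASE;
    struct frame *f = fault_in(off / PAGE_SIZE);
    f->data[off % PAGE_SIZE] = value;
    f->dirty = 1;               /* RAM and disk now disagree */
}

uint8_t load_byte(uint64_t vaddr) {
    uint64_t off = vaddr - DISK_BASE;
    return fault_in(off / PAGE_SIZE)->data[off % PAGE_SIZE];
}

/* Write every dirty frame back; until this runs, a reset loses the stores. */
void checkpoint(void) {
    for (int i = 0; i < CACHE_FRAMES; i++) {
        if (cache[i].valid && cache[i].dirty) {
            memcpy(disk[cache[i].page], cache[i].data, PAGE_SIZE);
            cache[i].dirty = 0;
        }
    }
}
```

After `store_byte(DISK_BASE + 5, 0xAB)` the byte is visible through `load_byte` but the simulated disk still holds the old value until `checkpoint()` or an eviction writes the frame back. That window is exactly what a reset would expose, and the order in which dirty pages reach the disk is what any level 1 recovery scheme would have to reason about.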
I'm also thinking about how to arrange the network interface. I really don't want to have any system calls to level 0, just traps and exceptions. One idea is to arrange a ring buffer in a predefined location and use page faults to trigger transmission of packets. The interface would be very simple: write a packet to the next page-aligned address in the ring buffer and then cause a page fault (a read would suffice) at the page after that. Once a packet has entered the send queue, its pages would be marked in the page table; if the level 1 code wraps around the transmit buffer and causes a page fault on a queued packet, the process is blocked until the packet has been transmitted. A similar scheme would work for packet reception.
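The bookkeeping behind the transmit side might look something like the following C sketch. It is only a model: the page-table marking is reduced to a per-slot state, the slot count `TX_SLOTS` and all names are hypothetical, and "blocking" is represented by a `false` return rather than by actually suspending anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_SLOTS 4u   /* hypothetical: one page-sized slot per packet */

enum slot_state { SLOT_FREE = 0, SLOT_QUEUED };

struct tx_ring {
    enum slot_state state[TX_SLOTS];
    size_t head;   /* slot level 1 writes its next packet into */
    size_t tail;   /* oldest slot still waiting to be transmitted */
};

/*
 * Models the page fault level 1 triggers after writing a packet.
 * Level 0 queues the packet in the head slot and marks it as owned by
 * the send queue. Returns false if the head slot is still queued, i.e.
 * level 1 has wrapped around and would be blocked at this point.
 */
bool tx_kick(struct tx_ring *r) {
    if (r->state[r->head] == SLOT_QUEUED)
        return false;                       /* wrapped onto a queued packet */
    r->state[r->head] = SLOT_QUEUED;
    r->head = (r->head + 1) % TX_SLOTS;
    return true;
}

/* Models the "transmit done" interrupt: release the oldest queued slot. */
void tx_complete(struct tx_ring *r) {
    if (r->state[r->tail] != SLOT_QUEUED)
        return;                             /* nothing outstanding */
    r->state[r->tail] = SLOT_FREE;
    r->tail = (r->tail + 1) % TX_SLOTS;
}
```

With four slots, four calls to `tx_kick` succeed and the fifth reports that level 1 would block; one `tx_complete` frees the oldest slot and the next `tx_kick` goes through. A receive ring could presumably use the same states with the roles of producer and consumer swapped.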
Thoughts?