staringlizard wrote: I have attached a new version here:

Let's create a list of problems:
- There's no space for a BPB (needed for floppy) and no space for a partition table (needed for hard disk); and if it's not for booting from floppy and not for booting from hard disk (and not network or CD-ROM) then it's not useful for booting from anything.
- It's using 32-bit instructions (for no reason) and C calling conventions (for no reason), and the instruction selection is mostly bad (I'm guessing it was generated by GCC without optimisations). All of this means that the code is probably taking up twice as much space as necessary (which is a severe problem when space is limited to less than 512 bytes).
- There's little error checking and no sane error messages:
  - When enabling A20 you need to check if A20 was actually enabled (or if the BIOS doesn't support the function and/or is buggy) and display an "ERROR: Failed to enable A20" error message if it wasn't.
  - Before enabling long mode you need to check if the CPU supports long mode and display an "ERROR: CPU is too old" error message if long mode isn't supported.
  - When loading data from disk you should have about 10 different error messages so the user can know what went wrong (and not just that something went wrong - e.g. was it a read error, was the disk removed, was the function not supported, were the parameters wrong, etc).
  - Before you display anything you should make sure something can be displayed (e.g. make sure the video mode is 80*25 text mode and hasn't been set to something else by anything else). Note: if/when something is wrong and you can't boot, it's also a good idea to make the PC speaker beep (in addition to a good descriptive error message) in case the system has no video card and/or no monitor.
- When your code is first started, DL contains the correct device number that you need to use. You ignore this and assume the boot device is device 0x80. Given that almost all boot managers support booting from other disk drives this is a mistake (e.g. your OS may be installed on a partition on device 0x81).
- The code to get a memory map looks very fragile to me. For example:
  - It's "hard-coded" for 20-byte entries (even though ACPI 3.0 and later specs have extended it to 24 bytes)
  - Some BIOSs set EBX to zero to indicate that the entry is the last entry; and you don't check for that (which may result in an endless loop that fills up all RAM until SI (not ESI) overflows and wraps around, resulting in your code being overwritten and crashing)
  - You don't check any of the returned entries (e.g. one of the many common problems is that some BIOSs return entries with "area size = 0" that can/should be ignored).
- The "extended disk read" function is limited to 127 sectors on some systems. This means that if/when the kernel you're loading grows larger than 63.5 KiB you will be unable to load all of it safely. Also note that your kernel will probably grow to be several MiB, and therefore will not fit in the (less than 640 KiB) RAM that can be accessed while you're in real mode. Your boot code needs to load part of the kernel (e.g. 63.5 KiB) and switch to protected/long mode and copy that part elsewhere and switch back to real mode; then load/copy the next part; and so on until the entire kernel is loaded. It can't load the entire thing in one go.
- Your "mov $0x7bff,%esp" will cause every stack push/pop to be misaligned (it should be "mov $0x7C00,%esp")
- Your code to enable long mode assumes there's a valid PML4 at 0x2000. I can't see any code to create a valid PML4 at 0x2000 (or any of the tables it will require - PDPT, PD, PT, etc). Note that the kernel seems to be loaded at 0x8000 (and not 0x2000) so the necessary paging structures can't be hard-coded into the kernel either.
- It's a good idea to check if the kernel is sane (and not corrupt/trashed) before starting it. For example, I typically include a CRC32 check (and an "ERROR: Kernel is corrupt" error message). I also tend to use compression (where boot code loads a compressed file and then decompresses it) as this improves boot times (e.g. half as much slow disk IO involved).
- Text modes suck (should've been banned/deprecated in 1990). You may want to consider setting a graphical video mode (e.g. using VBE) before passing control to the kernel (e.g. while you're in real mode and still able to use BIOS/VBE functions) so that your OS is able to use a graphical video mode during boot (and possibly after boot).
- UEFI is something that should be on your mind. Ideally, the boot code for BIOS and the boot code for UEFI would both start the same kernel and hide the differences between BIOS and UEFI from that kernel. This may include providing all information needed by the kernel (e.g. memory map, information describing video framebuffer, etc) in the same way. For example; it'd be nice to parse/sanitise the memory map given to you by the BIOS and transform it into a "standardised for your OS" format (where that "standardised for your OS" format is a super-set of the information that could be provided by both BIOS and UEFI, and not a sub-set).
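
To make the memory map point a bit more concrete, here's a rough sketch of the per-entry checks. It's hosted C for illustrating the logic only (not real-mode boot code), and the names "e820_entry", "e820_accept" and "e820_sanitise" are made up for this example. It assumes the BIOS call loop has already stored each raw entry along with the size the BIOS reported in ECX, and it treats bit 0 of the extended attributes the way ACPI 3.0 describes it (clear means "ignore this entry"). It doesn't cover the loop termination (EBX = 0 or carry set) or sorting/merging overlapping areas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry as returned by INT 0x15, EAX=0xE820 (24-byte ACPI 3.0 layout). */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;   /* 1 = usable RAM, 2 = reserved, ... */
    uint32_t attrs;  /* extended attributes (only valid for 24-byte entries) */
};

/* Hypothetical helper: decide whether one raw entry should be kept.
 * returned_size is the byte count the BIOS reported in ECX (20 or 24).
 * Returns 1 to keep the entry, 0 to drop it. */
static int e820_accept(struct e820_entry *e, uint32_t returned_size)
{
    assert(e != NULL);
    if (returned_size < 20)
        return 0;                  /* too short to be a valid entry */
    if (returned_size < 24)
        e->attrs = 1;              /* old 20-byte entry: no attributes, treat as valid */
    if ((e->attrs & 1u) == 0)
        return 0;                  /* ACPI 3.0: bit 0 clear means "ignore" */
    if (e->length == 0)
        return 0;                  /* "area size = 0" */
    if (e->base + e->length < e->base)
        return 0;                  /* end address overflows 64 bits */
    return 1;
}

/* Compact the array in place; returns the number of entries kept.
 * sizes[i] is the ECX value the BIOS returned for map[i]. */
static size_t e820_sanitise(struct e820_entry *map, const uint32_t *sizes,
                            size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (e820_accept(&map[i], sizes[i]))
            map[kept++] = map[i];
    }
    return kept;
}
```

The same checks translate fairly directly to assembly inside the BIOS call loop, where a rejected entry simply doesn't advance the destination pointer.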
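
For the "127 sectors per read" point, the arithmetic can be sketched like this. Again it's hosted C just to show the chunking logic; "next_chunk_sectors" and "reads_needed" are hypothetical names, and the actual BIOS call and the mode switching/copying between reads are only indicated by a comment.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE        512u
#define MAX_SECTORS_PER_IO 127u  /* limit of the extended read on some systems */

/* Number of sectors to request on the next read, given how many remain. */
static uint32_t next_chunk_sectors(uint32_t sectors_left)
{
    return sectors_left < MAX_SECTORS_PER_IO ? sectors_left
                                             : MAX_SECTORS_PER_IO;
}

/* Number of separate BIOS reads needed to load kernel_bytes. */
static uint32_t reads_needed(uint32_t kernel_bytes)
{
    /* Guard against overflow when rounding up to whole sectors. */
    assert(kernel_bytes <= 0xFFFFFFFFu - (SECTOR_SIZE - 1u));
    uint32_t sectors = (kernel_bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
    uint32_t reads = 0;
    while (sectors > 0) {
        /* Real boot code would: read this chunk into a low buffer, switch
         * to protected/long mode, copy it up, then switch back. */
        sectors -= next_chunk_sectors(sectors);
        reads++;
    }
    return reads;
}
```

With these numbers a 65024-byte (63.5 KiB) kernel fits in one read, one byte more needs two, and a 2 MiB kernel (4096 sectors) needs 33.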
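
For the missing paging structures, one simple option (an assumption on my part, not something your code does) is to identity map the first 1 GiB with 2 MiB pages, which only needs a PML4, one PDPT and one PD. A minimal sketch in hosted C, where the tables are plain arrays and the physical addresses are passed in as parameters; "build_identity_map" is a hypothetical name, and the addresses 0x3000 and 0x4000 in the comment are just examples that would sit after a PML4 at 0x2000:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_HUGE     (1ull << 7)  /* PS bit: entry in a PD maps a 2 MiB page */
#define ENTRIES      512          /* entries per 4 KiB table */

/* Fill three 4 KiB tables so the first 1 GiB is identity mapped.
 * pdpt_phys and pd_phys are the physical addresses where the PDPT and PD
 * will live (for example 0x3000 and 0x4000); both must be 4 KiB aligned. */
static void build_identity_map(uint64_t *pml4, uint64_t *pdpt, uint64_t *pd,
                               uint64_t pdpt_phys, uint64_t pd_phys)
{
    assert((pdpt_phys & 0xFFFu) == 0 && (pd_phys & 0xFFFu) == 0);

    /* Unused entries must be zero ("not present"), not leftover garbage. */
    memset(pml4, 0, ENTRIES * sizeof(uint64_t));
    memset(pdpt, 0, ENTRIES * sizeof(uint64_t));

    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;
    pdpt[0] = pd_phys   | PTE_PRESENT | PTE_WRITABLE;

    /* 512 entries * 2 MiB each = 1 GiB, virtual address == physical address. */
    for (uint64_t i = 0; i < ENTRIES; i++)
        pd[i] = (i << 21) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

In the boot code the three tables would need to be in 4 KiB aligned memory that nothing else (including the loaded kernel) overlaps, and CR3 would be loaded with the PML4's physical address.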
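
For the "check if the kernel is sane" point, a CRC32 check could look roughly like the sketch below. It uses the common reflected polynomial 0xEDB88320 (the same CRC-32 used by zlib and PNG) computed bit by bit, which is slow but tiny. Where the expected CRC is stored (e.g. in a header in front of the kernel image) is up to you; "crc32_ieee" and "kernel_is_sane" are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, initial value and final
 * XOR of 0xFFFFFFFF. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial;
             * otherwise just shift. The mask is all ones or all zeros. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Returns 1 if the loaded image matches the CRC stored alongside it. */
static int kernel_is_sane(const uint8_t *image, size_t len,
                          uint32_t expected_crc)
{
    return crc32_ieee(image, len) == expected_crc;
}
```

If "kernel_is_sane" returns 0 the boot code would display "ERROR: Kernel is corrupt" and stop instead of jumping to the kernel. As a quick self-test, the standard check value for the nine ASCII bytes "123456789" is 0xCBF43926.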
Now; you'll notice I haven't found the cause of your symptoms (and all of the above may seem "off topic"). What I'm saying is that all of your code needs to be redesigned and then rewritten. Mostly, you have a choice: you can fix the current bug, and then redesign and rewrite everything; or you can forget about the current bug and then redesign and rewrite everything. Obviously, the latter is more efficient. Basically, there's no point finding or fixing the bug.
Cheers,
Brendan