Debugging new hardware configurations
Posted: Thu Nov 17, 2016 5:29 pm
Has anybody else left the "emulator" stage and started booting their OS on various hardware platforms, only to find that the USB chips don't work, or that there is a new network chip you don't support (or that doesn't work), leaving you with no easy way to debug the problem? Faulty USB drivers typically mean you have no input device, so you cannot do interactive debugging. If the network doesn't work either, remote debugging over Ethernet is out as well.
Even worse, with new boot variants like EFI, the processor might run in some strange mode and then fault early in the boot process, leaving you wondering what is actually wrong. Or EFI / BIOS might hand you strange memory maps that won't work, and how do you debug that?
How do Linux or Windows developers handle this? They can't be using printf-debugging, can they??
I've just redesigned my "panic handler" code so it runs in its own address space, and I also hooked my x86 emulator to it so I can single-step faulty code by emulating the execution. I also redesigned the whole system so the handler can be invoked even before the kernel is loaded, which lets me emulate the boot process. Since it is an emulator, I can also emulate turning off paging and switching between real mode, protected mode and long mode. These are things I could only do in a freestanding emulator before, but now it actually runs on the target system and can interact with the real hardware. Just like before, it's also a nice tool for debugging SMP issues, and I can actually single-step multiple cores too, checking various synchronization scenarios. The "panic handler" is invoked either by a fault in the boot process or by planting an interrupt instruction in the code while the kernel is running. The latter can be done in IRQs as well as in the scheduler, and is typically done when a faulty condition is detected. The handler is also invoked if an IRQ faults or if the kernel stack overflows because of too much interrupt nesting.
The problem, however, is with input devices. The emulator is no good if there is no input device, and many modern PCs have USB keyboards. A PS/2 keyboard is easy to support in an interrupt-free environment, but supporting USB keyboards is more or less a nightmare. Most keyboards should have a "boot" mode, which means you don't need a HID parser, but you still have to support four different types of USB controllers (UHCI, OHCI, EHCI and xHCI) and their hubs. Additionally, if the key problem with the hardware is that the USB stack doesn't work, then reprogramming the USB controller from the handler makes it impossible to debug the problems with the USB hardware as well.
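To illustrate why "boot" mode avoids a HID parser: a boot-protocol keyboard report is a fixed 8 bytes (modifier bits, one reserved byte, then up to six key usage IDs), so decoding it is a small table lookup. Below is a minimal sketch, assuming the controller driver has already delivered the report into memory. The names boot_usage_to_ascii and boot_report_new_keys are made up for this example, and only letters, digits, Enter and Space are mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_REPORT_LEN 8     /* byte 0: modifiers, byte 1: reserved, bytes 2..7: keys */
#define MOD_LSHIFT      0x02  /* modifier bit 1: left shift */
#define MOD_RSHIFT      0x20  /* modifier bit 5: right shift */

/* Translate one HID keyboard usage ID to ASCII (hypothetical helper).
 * Returns 0 for keys this sketch does not map. */
char boot_usage_to_ascii(uint8_t usage, bool shift)
{
    static const char digits[] = "1234567890";

    if (usage >= 0x04 && usage <= 0x1D) {          /* 'a'..'z' */
        char c = (char)('a' + (usage - 0x04));
        return shift ? (char)(c - 'a' + 'A') : c;
    }
    if (usage >= 0x1E && usage <= 0x27)            /* '1'..'9', then '0' */
        return shift ? 0 : digits[usage - 0x1E];   /* shifted symbols depend on layout */
    if (usage == 0x28)
        return '\n';                               /* Enter */
    if (usage == 0x2C)
        return ' ';                                /* Space */
    return 0;
}

/* Collect keys that are down in cur but were not down in prev.
 * Writes at most cap characters to out and returns how many were written. */
size_t boot_report_new_keys(const uint8_t prev[BOOT_REPORT_LEN],
                            const uint8_t cur[BOOT_REPORT_LEN],
                            char *out, size_t cap)
{
    bool shift = (cur[0] & (MOD_LSHIFT | MOD_RSHIFT)) != 0;
    size_t n = 0;

    for (int i = 2; i < BOOT_REPORT_LEN; i++) {
        uint8_t usage = cur[i];
        bool held = false;

        if (usage < 0x04)      /* 0 = empty slot, 1..3 = error codes such as rollover */
            continue;
        for (int j = 2; j < BOOT_REPORT_LEN; j++) {
            if (prev[j] == usage)
                held = true;   /* still held from the previous report */
        }
        if (held)
            continue;

        char c = boot_usage_to_ascii(usage, shift);
        if (c != 0 && n < cap)
            out[n++] = c;
    }
    return n;
}
```

Comparing against the previous report is what turns "keys currently held" into key-press events, since the keyboard keeps reporting a key for as long as it is down. This only covers the decoding; it does nothing about the harder part, which is the four controller types and the hubs.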
Serial ports are also attractive because of their simplicity, but they are as rare as PS/2 keyboards, so they will not work for a majority of systems either.
Maybe the way to go is to implement simplified versions of the network chip drivers, support only UDP, listen for commands on a fixed port, and send answers as single UDP frames?
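As a rough idea of how little protocol code the UDP approach needs on the sending side, here is a sketch that builds one complete Ethernet + IPv4 + UDP frame in a buffer, ready to hand to a simplified NIC driver. The struct dbg_endpoint and the function dbg_build_udp_frame are hypothetical names for this example. It assumes the peer's MAC address is already known (for instance taken from the command frame that was just received), so no ARP is needed, and it leaves the UDP checksum at zero, which IPv4 permits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14
#define IP_HDR_LEN  20
#define UDP_HDR_LEN 8
#define MAX_PAYLOAD 1472   /* 1500-byte MTU minus IPv4 and UDP headers */

/* Hypothetical description of one side of the debug link. */
struct dbg_endpoint {
    uint8_t  mac[6];
    uint8_t  ip[4];
    uint16_t port;
};

/* Store a 16-bit value in network (big-endian) byte order. */
void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Internet checksum (RFC 1071) over an even number of bytes. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build a single unfragmented UDP frame.
 * Returns the frame length in bytes, or 0 if it does not fit. */
size_t dbg_build_udp_frame(uint8_t *frame, size_t cap,
                           const struct dbg_endpoint *src,
                           const struct dbg_endpoint *dst,
                           const void *payload, size_t payload_len)
{
    size_t total = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + payload_len;

    if (payload_len > MAX_PAYLOAD || total > cap)
        return 0;

    uint8_t *eth = frame;
    uint8_t *ip  = frame + ETH_HDR_LEN;
    uint8_t *udp = ip + IP_HDR_LEN;

    memcpy(eth, dst->mac, 6);
    memcpy(eth + 6, src->mac, 6);
    put_be16(eth + 12, 0x0800);              /* EtherType: IPv4 */

    ip[0] = 0x45;                            /* version 4, header length 5 words */
    ip[1] = 0;                               /* DSCP/ECN */
    put_be16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + payload_len));
    put_be16(ip + 4, 0);                     /* identification */
    put_be16(ip + 6, 0x4000);                /* flags: don't fragment */
    ip[8] = 64;                              /* TTL */
    ip[9] = 17;                              /* protocol: UDP */
    put_be16(ip + 10, 0);                    /* checksum field is zero while summing */
    memcpy(ip + 12, src->ip, 4);
    memcpy(ip + 16, dst->ip, 4);
    put_be16(ip + 10, ip_checksum(ip, IP_HDR_LEN));

    put_be16(udp + 0, src->port);
    put_be16(udp + 2, dst->port);
    put_be16(udp + 4, (uint16_t)(UDP_HDR_LEN + payload_len));
    put_be16(udp + 6, 0);                    /* 0 = no UDP checksum (valid for IPv4) */
    memcpy(udp + UDP_HDR_LEN, payload, payload_len);

    return total;
}
```

A short answer like this ends up below the Ethernet minimum frame size, so depending on the chip the driver may have to pad the buffer itself before transmitting. The receive side would be the mirror image: check EtherType, protocol 17 and the fixed destination port, and ignore everything else.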
I wish there were some common hardware in a majority of PCs that could work as an input device, like there used to be, but I cannot see any.
Has anybody done this, or does anybody have other ideas for how to make it work better?