
Debugging new hardware configurations

Posted: Thu Nov 17, 2016 5:29 pm
by rdos
Has anybody else left the "emulator" stage and started booting their OS on various hardware platforms, only to find that the USB chips don't work, or that there is a new network chip you don't support (or that doesn't work), and that you then have no easy way to debug the problems? Faulty USB drivers typically mean you have no input device, so you cannot do interactive debugging. If the network doesn't work either, remote debugging over Ethernet is out as well.

Even worse, with new boot variants, like EFI, the processor might run in some strange mode, and then fault early in the boot process, leaving you wondering what is actually wrong. Or EFI / BIOS might hand you strange memory maps that won't work, and how will that be debugged?

How do Linux or Windows developers handle this? They can't be using printf-debugging, can they??

I've just redesigned my "panic handler" code so it runs in its own address space, and I also hooked my x86 emulator to it so I can single-step faulty code by emulating the execution. I also redesigned the whole system so the handler can be invoked even before the kernel is loaded, so I can emulate the boot process. Since it is an emulator, I can also emulate turning off paging, and switching between real mode, protected mode and long mode. These are things I could only do in a freestanding emulator before, but now it actually runs in the target system and can interact with the real hardware. Just like before, it's also a nice tool for debugging SMP issues, and I can actually single-step multiple cores too, checking various synchronization scenarios.

The "panic handler" is invoked either by a fault in the boot process, or by planting an interrupt instruction in the code while the kernel is running. The latter can be done in IRQs as well as in the scheduler, and is typically done when a faulty condition is detected. The handler is also invoked if an IRQ faults or if the kernel stack overflows because of too much interrupt nesting.

The problem, however, is with input devices. The emulator is no good if there is no input device, and many modern PCs have USB keyboards. The PS/2 keyboard is easy to support in an interrupt-free environment, but supporting USB keyboards is more or less a nightmare. Most keyboards should have a "boot" mode, which means you don't need a HID parser, but you still have to support four different types of USB controllers (UHCI, OHCI, EHCI and XHCI) and their hubs. Additionally, if the key problem on the hardware is that the USB stack doesn't work, then having the debugger reprogram the USB controller makes it impossible to debug the USB problem itself.

Serial ports are also attractive because of their simplicity, but they are as rare as PS/2 keyboards, so that will not work for a majority of systems either.

Maybe the way to go is to implement simplified versions of the network chip drivers, and then only supporting UDP, listening for commands on a fixed port, and then sending answers with single UDP frames?
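
To make the idea concrete, here is a minimal sketch of what a single-frame command/reply format could look like. It is written in Python only for brevity (the target side would of course live in the panic handler), and the magic value, header layout and command codes are all assumptions made up for illustration, not part of any existing protocol.

```python
import struct

# Hypothetical wire format, invented for this sketch:
# magic (4 bytes), sequence number (u16), command (u8), payload length (u16), payload.
HEADER = struct.Struct("!4sHBH")
MAGIC = b"PDBG"

# Largest UDP payload that fits in one 1500-byte Ethernet frame over IPv4
# (1500 - 20 bytes IP header - 8 bytes UDP header), so nothing gets fragmented.
MAX_DATAGRAM = 1472

CMD_READ_MEM = 1   # illustrative command codes
CMD_STEP = 2


def encode(seq, cmd, payload=b""):
    """Build one datagram; refuse anything that would not fit in a single frame."""
    if HEADER.size + len(payload) > MAX_DATAGRAM:
        raise ValueError("payload too large for a single frame")
    return HEADER.pack(MAGIC, seq, cmd, len(payload)) + payload


def decode(datagram):
    """Return (seq, cmd, payload), or None if the datagram is foreign or malformed."""
    if len(datagram) < HEADER.size:
        return None
    magic, seq, cmd, length = HEADER.unpack_from(datagram)
    if magic != MAGIC or len(datagram) != HEADER.size + length:
        return None
    return seq, cmd, datagram[HEADER.size:]
```

Keeping every message inside one frame means the stripped-down driver never has to reassemble IP fragments, and the sequence number lets the other side match a reply to its request.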

I wish there were some common hardware in a majority of PCs that could work as an input device, like there used to be, but I cannot see any.

Has anybody done this, or does anybody have other ideas on how to make it work better?

Re: Debugging new hardware configurations

Posted: Thu Nov 17, 2016 10:39 pm
by Brendan
Hi,
rdos wrote:I wish there were some common hardware in a majority of PCs that could work as an input device, like there used to be, but I cannot see any.
Just buy a PCI card (a USB controller or NIC or serial or...), make sure your driver for that card works, then use it as "common hardware" by plugging that card into any computer you like.


Cheers,

Brendan

Re: Debugging new hardware configurations

Posted: Fri Nov 18, 2016 3:02 am
by rdos
Brendan wrote:Hi,
rdos wrote:I wish there were some common hardware in a majority of PCs that could work as an input device, like there used to be, but I cannot see any.
Just buy a PCI card (a USB controller or NIC or serial or...), make sure your driver for that card works, then use it as "common hardware" by plugging that card into any computer you like.
That typically doesn't work on many portables, or on pads and similar hardware. Stationary computers are pretty easy to get up and working anyway (they usually have PS/2 connectors, either as real connectors or as headers on the motherboard). If you design your own PC, you can also make sure you get hardware that you support. The problem is that stationary computers are getting less common, and they are being replaced with portables and pads. Those typically have no way to extend them with PCI cards, lack real serial ports and have USB-based input devices. Many also come with WiFi network adapters, which typically need to be reverse engineered because there are no official specifications.

Re: Debugging new hardware configurations

Posted: Fri Nov 18, 2016 3:21 am
by Kevin
rdos wrote:How do Linux or Windows developers handle this? They can't be using printf-debugging, can they??
Yes, they can. And analysing crash dumps, I guess.

As long as only one component is new, you're good anyway. As long as you can work locally on the system, writing and debugging the network driver shouldn't be too bad. And if you have network (or serial) access, you can work remotely on things like input or storage drivers. Only if all of them are missing at the same time, it gets kind of nasty. If you want to avoid lots of reboots and printf debugging then, you'd have to find another machine where you can deal with only one of the problematic components at a time.
Maybe the way to go is to implement simplified versions of the network chip drivers, and then only supporting UDP, listening for commands on a fixed port, and then sending answers with single UDP frames?
I doubt that this will be helpful unless you're actually debugging your generic TCP layer or something. If you can support UDP, the actual NIC driver should be good enough for handling everything else as well.

Well, unless you have a NIC like the on-board sis900 on one of my PCs, with which I never managed to survive sending packets of more than 128 bytes (yes, 128 itself is fine, oddly). For some reason, it just didn't do anything any more after that. Maybe I should get back to that one sometime; I'm still kind of curious what the problem is there... Back then I had to reboot for every attempt, but I guess I could play with it more easily now.

Re: Debugging new hardware configurations

Posted: Fri Nov 18, 2016 3:47 am
by rdos
Kevin wrote:
rdos wrote:How do Linux or Windows developers handle this? They can't be using printf-debugging, can they??
Yes, they can. And analysing crash dumps, I guess.
That's horrible.
Kevin wrote: As long as only one component is new, you're good anyway. As long as you can work locally on the system, writing and debugging the network driver shouldn't be too bad. And if you have network (or serial) access, you can work remotely on things like input or storage drivers. Only if all of them are missing at the same time, it gets kind of nasty. If you want to avoid lots of reboots and printf debugging then, you'd have to find another machine where you can deal with only one of the problematic components at a time.
That's true. Still, the more possibilities you have to debug things, the faster you can solve things and the easier it gets.
Kevin wrote:
Maybe the way to go is to implement simplified versions of the network chip drivers, and then only supporting UDP, listening for commands on a fixed port, and then sending answers with single UDP frames?
I doubt that this will be helpful unless you're actually debugging your generic TCP layer or something. If you can support UDP, the actual NIC driver should be good enough for handling everything else as well.
I wouldn't need it for debugging the TCP layer. I can do that with the kernel debugger running under the OS. Actually, most issues can be debugged with it, except for boot failures, scheduler issues, SMP issues, mode switches and crashes in IRQs. At least as long as the standard input and output works. I can run the kernel debugger over a network, but that uses the OS network card driver since it is all happening in the context of a functioning OS.

So what I had in mind was to create an application that runs on another machine on the network, sends UDP packets containing key press information, and gets back the display. That would make the "panic" debugger work even in the absence of a usable input device.
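
A rough sketch of the host side of such an application, in Python, might look like the following. The message layout (a sequence number, a scan code and a pressed flag going out, a sequence number plus raw display bytes coming back), the function name `send_key` and the address in the usage comment are all hypothetical and chosen only for this example.

```python
import socket
import struct

# Hypothetical message layout, chosen for this sketch only:
#   request: sequence number (u16), key scan code (u8), pressed flag (u8)
#   reply:   sequence number (u16) followed by the display contents as raw bytes
REQUEST = struct.Struct("!HBB")
REPLY_HEADER = struct.Struct("!H")


def send_key(sock, target, seq, scancode, pressed=True, retries=3):
    """Send one key event and return the display bytes from the matching reply.

    UDP gives no delivery guarantee, so the request is resent when the socket
    times out, and replies carrying a different sequence number are ignored.
    The caller is expected to have set a timeout on the socket.
    """
    request = REQUEST.pack(seq, scancode, 1 if pressed else 0)
    for _ in range(retries):
        sock.sendto(request, target)
        try:
            while True:
                data, _addr = sock.recvfrom(2048)
                if len(data) >= REPLY_HEADER.size:
                    (reply_seq,) = REPLY_HEADER.unpack_from(data)
                    if reply_seq == seq:
                        return data[REPLY_HEADER.size:]
        except socket.timeout:
            continue  # nothing usable arrived in time; resend the request
    raise TimeoutError("no reply from target")


# Example use (address and port are placeholders):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.settimeout(0.5)
#   screen = send_key(sock, ("192.168.1.50", 5555), seq=1, scancode=0x1C)
```

Because a lost reply causes the same key event to be resent, the target side would need to treat a repeated sequence number as a duplicate and just resend the display instead of acting on the key twice.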

Re: Debugging new hardware configurations

Posted: Fri Nov 18, 2016 9:39 am
by StephanvanSchaik
For a while I actively spent time documenting and figuring out how to get (mainline) Linux to boot on various ARM Chromebooks that use u-boot as the boot loader (nowadays they use Coreboot with Depthcharge as payload). When you can't get far enough past u-boot to display a console or load the Linux kernel, you are pretty much stuck with a blank screen and unable to debug the boot process. I quickly learnt that the developers who work on u-boot (or Coreboot) have cables attached to the serial pins of the ARM SoC to get meaningful output. I suspect that many manufacturers of ARM SoCs, and of the mobile phones and embedded hardware those SoCs are used in, do the same. Similarly, many routers have a serial port on the inside for debugging output, for use when developing their firmware.


Yours sincerely,
Stephan.