Hi,
Matthew wrote: I implemented serial port logging for output from the emulators but I've come to realize that none of my machines have serial ports.
Unless you're using nothing but laptops, you might want to check the motherboard: desktop/server motherboards that don't have serial ports are *extremely* rare. It's possible that the motherboard has serial ports, but whoever built the computer didn't install the cable/adapter that connects the motherboard to a serial port socket on the back of the case. If so, you could find a suitable cable/adapter and simply plug it into the motherboard. For example:
You might need to do a little research for this though, as different motherboards use different connections, and an adapter (like the picture above) for one motherboard might not work for another motherboard.
Another alternative would be to buy an I/O card. You can get them for almost all types of bus (ISA, PCI, PCI Express, etc.), and something with a parallel port and 2 serial ports usually only costs about $25 (Aust).
Of course another idea would be to simply buy another computer - you can never have too many test computers...
Matthew wrote: At this point I've got printf() and a whole lot of guessing to go on. Does anyone have any tips for dealing with this kind of situation? Is there some way of making any of the emulators "more realistic" so that I can get a feel for the problem in a debuggable environment?
Most emulators execute a few instructions on one emulated CPU, then switch to another emulated CPU and execute some more, then move on to the next CPU, and so on. For Bochs you can control how many emulated instructions are executed on each CPU before the emulator switches to another CPU. For example (in "bochsrc.txt"):
cpu: count=2, quantum=1, ips=400000000, reset_on_triple_fault=0
This tells Bochs to execute one instruction on each CPU before switching, which is bad for performance but makes it much more likely that race conditions and re-entrancy bugs will be detected. For Qemu I think you're screwed: I think Qemu spends about 1 ms emulating one CPU, then another 1 ms emulating the next CPU, and so on, so the chance of detecting race conditions and re-entrancy bugs is much lower. On real computers the CPUs execute instructions at the same time, so these types of bugs are much more likely to show up; and some bugs (e.g. forgetting to use a "lock" prefix where it's necessary) will only affect a real computer.
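To make the effect of the quantum a bit more concrete, here's a toy model in C. This is my own illustration, not how Bochs or Qemu are actually implemented. It assumes a non-atomic increment of a shared counter takes three steps (load, add, store), and that the emulator runs "quantum" steps on CPU 0, then "quantum" steps on CPU 1, and so on. The "locked" flag models a lock-prefixed increment as one indivisible step. The function name "emulate_two_cpus" is hypothetical.

```c
#include <stdbool.h>

/* A non-atomic "x++" modelled as three steps, roughly
   "mov eax,[x]; inc eax; mov [x],eax". */
enum { STEP_LOAD, STEP_ADD, STEP_STORE, STEPS_PER_INC };

typedef struct {
    int reg; /* this CPU's private register */
    int pc;  /* number of steps this CPU has executed so far */
} vcpu;

/* Toy emulator (hypothetical): two CPUs each increment a shared counter
   "incs" times. The emulator runs "quantum" steps (must be >= 1) on CPU 0,
   then "quantum" steps on CPU 1, and repeats until both are done.
   If "locked" is true, each increment is a single indivisible step
   (like "lock inc [x]"). Returns the final value of the counter. */
static int emulate_two_cpus(int incs, int quantum, bool locked)
{
    int counter = 0;
    vcpu cpu[2] = {{0, 0}, {0, 0}};
    int total = incs * (locked ? 1 : STEPS_PER_INC);

    while (cpu[0].pc < total || cpu[1].pc < total) {
        for (int c = 0; c < 2; c++) {
            for (int q = 0; q < quantum && cpu[c].pc < total; q++) {
                if (locked) {
                    counter++; /* the whole read-modify-write in one step */
                } else {
                    switch (cpu[c].pc % STEPS_PER_INC) {
                    case STEP_LOAD:  cpu[c].reg = counter; break;
                    case STEP_ADD:   cpu[c].reg += 1;      break;
                    case STEP_STORE: counter = cpu[c].reg; break;
                    }
                }
                cpu[c].pc++;
            }
        }
    }
    return counter;
}
```

Under these assumptions, emulate_two_cpus(1000, 1, false) returns 1000 (the CPUs run in lockstep, so every pair of increments collides and half are lost), emulate_two_cpus(1000, 5000, false) returns 2000 (each CPU finishes inside one time slice, so the bug never shows up), and emulate_two_cpus(1000, 1, true) returns 2000. In real code the fix is an atomic read-modify-write; for example, GCC's "__atomic_fetch_add" compiles to a lock-prefixed instruction on x86.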
For intermittent page faults, make sure that you're invalidating TLB entries correctly. I've seen OSs that never invalidate TLBs at all; they run fine on Pentium and older CPUs (and on emulators), but fail miserably, in random ways, on any modern real computer.
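As a rough sketch of what "invalidating correctly" means on x86, assuming a kernel running in ring 0 with 64-bit page table entries (the helper names "pte_replace" and "set_pte" are hypothetical): the CPU only caches translations from entries that are present, so after changing an entry that was present, the old translation may still be in the TLB and the kernel should execute INVLPG for that virtual address. An entry that was not present can't have been cached, so changing it needs no flush.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL /* bit 0 ("P") of an x86 page table entry */

/* Flush the TLB entry for one virtual address on *this* CPU (ring 0 only). */
static inline void invlpg(const void *vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Translations are only cached from present entries, so a stale TLB
   entry is possible only if the old entry had P set. */
static inline bool pte_change_needs_flush(uint64_t old_entry)
{
    return (old_entry & PTE_PRESENT) != 0;
}

/* Hypothetical helper: replace an entry and report whether the old
   translation may still be cached. */
static inline bool pte_replace(volatile uint64_t *pte, uint64_t new_entry)
{
    uint64_t old_entry = *pte;
    *pte = new_entry;
    return pte_change_needs_flush(old_entry);
}

/* Hypothetical kernel-side wrapper: update the mapping for vaddr, then
   invalidate the old translation if the CPU may have cached it. */
static inline void set_pte(volatile uint64_t *pte, const void *vaddr,
                           uint64_t new_entry)
{
    if (pte_replace(pte, new_entry))
        invlpg(vaddr);
}
```

Note that this only flushes the current CPU's TLB. On a multi-CPU system the other CPUs may hold the same stale translation and need to be told to invalidate it too (usually via an IPI, often called a "TLB shootdown").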
Finally, sometimes the best debugging tool is a pen and paper (and that squishy thing between your ears). Some bugs are almost impossible to debug using any other technique, because adding something like a "printf()" changes the timing and makes the bug disappear.
Cheers,
Brendan