Confirmation of Bug diagnosis sought: Solved

Posted: Sun May 06, 2018 9:41 am
by MichaelFarthing
I have been plugging away at my OS in the background and had it installed on its own partition on a hard disk that also contained a Windows and two Linux installations, with booting controlled by Grub2. My OS was simply chain loaded by Grub. I recently copied the system to the first (and only) partition of a USB drive and after some tweaking got my machine to boot up from the USB. The OS booted fine, but the volume boot record behaved strangely: its first job is to clear the screen, which it did only partially and not with the requested background colour. Inspection of the vbr code showed a wrongly loaded byte at 0x7c24, which should have been 0xcd, the start of a bios call to hide the video cursor. Instead, 0x80 was present, which caused an unwanted register load, clearly prevented the interrupt and also spilled over into the following instructions that set up the screen clear.

Problem apparently solved - disk corruption. However, inspecting the disk content itself, both in my own OS and using a utility in Windows, consistently showed the relevant byte to be correctly recorded as 0xcd. So the next step was to see if early code in the vbr was corrupting memory - hardly anything was done earlier on: the segment registers and stack were set up, a far jump was done to ensure cs was correctly zero, dl was saved to a memory location and the screen mode was set. Code was the same as for the well-established hard disk version but nevertheless I removed elements individually but without solving the problem.

At this point, inspecting the disk again in a utility Disk Editor on Windows, I noticed that 0x7c24 corresponded to a location reserved by FAT boot records for a physical disk number - and that my corruption of 0x80 in place of 0xcd fitted well with some machine boot code "helpfully" correcting my vbr to a "valid" value.
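
The mapping from the memory address to the boot-record field can be checked mechanically. A minimal Python sketch, assuming the sector is loaded at the usual 0x7c00 and that dumps of both the on-disk sector and the in-memory copy are available. The function and table names are illustrative, not taken from any existing tool, and the sample data is synthetic:

```python
LOAD_ADDRESS = 0x7C00  # conventional real-mode load address for a boot sector

# Offsets at which FAT boot records keep the BIOS drive number.
DRIVE_NUMBER_FIELDS = {
    0x24: "FAT12/FAT16 drive number",
    0x40: "FAT32 drive number",
}


def diff_sector(on_disk: bytes, in_memory: bytes):
    """Return (address, offset, disk_byte, memory_byte, field) per differing byte."""
    differences = []
    for offset, (disk_byte, mem_byte) in enumerate(zip(on_disk, in_memory)):
        if disk_byte != mem_byte:
            field = DRIVE_NUMBER_FIELDS.get(offset, "not a drive-number field")
            differences.append(
                (LOAD_ADDRESS + offset, offset, disk_byte, mem_byte, field)
            )
    return differences


# Synthetic 512-byte sector reproducing the symptom: 0xcd on disk, 0x80 in memory.
disk = bytearray(512)
disk[0x24] = 0xCD
memory = bytearray(disk)
memory[0x24] = 0x80

for address, offset, disk_byte, mem_byte, field in diff_sector(bytes(disk), bytes(memory)):
    print(f"{address:#06x} (offset {offset:#04x}): "
          f"disk {disk_byte:#04x}, memory {mem_byte:#04x} -> {field}")
```

A difference that lands exactly on one of the two drive-number offsets, with a value of 0x00 or 0x80, points at something patching the sector after it was read rather than at disk corruption.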

This made further sense, as on the hard disk this probably would not occur: by the time my vbr was loaded, grub had already taken control of the boot process, whereas on my usb I was depending on the native boot sequence. Anyway, I buried the offending byte location in a pointless register load instruction and sacrificed the cursor hiding interrupt: all problems disappeared.

However, I am conscious that there's quite a bit of supposition in this solution. The machine involved is a Toshiba Satellite Pro and I wondered if anyone had come across anything similar to this, or, indeed, has knowledge that this sort of thing does happen?

Re: Confirmation of Bug diagnosis sought

Posted: Sun May 06, 2018 10:59 am
by Octocontrabass
This is not the first report here of the firmware "helpfully" updating the BPB and corrupting the boot code as a result, but this is the first time anyone has claimed to see it on a partitioned disk. I have seen at least one PC that skipped the MBR and directly loaded the VBR, so it's not too surprising to see one that "fixes" the BPB in the VBR.

What filesystem type is indicated in the partition table? It's possible the firmware thinks it's trying to boot DOS, and DOS would be very unhappy if the BPB doesn't match INT 0x13.

Re: Confirmation of Bug diagnosis sought

Posted: Mon May 07, 2018 1:54 am
by MichaelFarthing
Octocontrabass wrote:This is not the first report here of the firmware "helpfully" updating the BPB and corrupting the boot code as a result, but this is the first time anyone has claimed to see it on a partitioned disk. I have seen at least one PC that skipped the MBR and directly loaded the VBR, so it's not too surprising to see one that "fixes" the BPB in the VBR.


What filesystem type is indicated in the partition table? It's possible the firmware thinks it's trying to boot DOS, and DOS would be very unhappy if the BPB doesn't match INT 0x13.

Thanks very much for this.
I originally got my stuff on to the usb from linux using dd to move everything to the first partition and didn't think to mess with the partition table afterwards.

I hadn't paid much attention to the partition table, but having now looked, it doesn't give a plausible excuse for the firmware's behaviour (assuming that diagnosis is correct). The partition has a type identifier of 0x0b, which is FAT32, and Disk Editor correctly presented a FAT32 template when inspecting the relevant VBR. FAT32 uses byte 0x40 of the VBR for the physical drive, not byte 0x24, the offending location in my case. Byte 0x24 is used by 16-bit FAT VBRs (type identifier 0x04). Still, I thought I didn't want my filesystem to continue masquerading as a FAT of any sort and so tried changing the partition type to 0x20, which is described in the Andries Brouwer list as "Unused. Rumoured to be used by Willowsoft Overture File System (OFS1), if there is such a thing". It would have been nice if some agreement could have been reached to dedicate a number to "Experimental and wacky systems only used by ludicrously hopeful one-man bands in their garden sheds".
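
For anyone wanting to repeat the partition table check without a disk editor, here is a minimal Python sketch. It assumes a classic MBR with four 16-byte entries starting at 0x1be; the function name and the type-to-offset table are my own illustration, and the sample sector is synthetic rather than read from the real drive:

```python
import struct

PARTITION_TABLE_OFFSET = 0x1BE
ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, LBA start, sector count

# Where a boot record of each partition type conventionally keeps the drive number.
DRIVE_NUMBER_OFFSET_BY_TYPE = {
    0x01: 0x24,  # FAT12
    0x04: 0x24,  # FAT16 (small)
    0x06: 0x24,  # FAT16
    0x0E: 0x24,  # FAT16, LBA
    0x0B: 0x40,  # FAT32
    0x0C: 0x40,  # FAT32, LBA
}


def partition_types(mbr: bytes):
    """Return (partition_number, type_byte, drive_number_offset_or_None) per used entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    found = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        _status, _chs_a, ptype, _chs_b, _lba, _count = struct.unpack_from(
            ENTRY_FORMAT, mbr, start
        )
        if ptype != 0x00:  # type 0 marks an unused entry
            found.append((index + 1, ptype, DRIVE_NUMBER_OFFSET_BY_TYPE.get(ptype)))
    return found


# Synthetic MBR with one active FAT32 (0x0b) partition, as described above.
mbr = bytearray(512)
mbr[510:512] = b"\x55\xaa"
struct.pack_into(ENTRY_FORMAT, mbr, PARTITION_TABLE_OFFSET,
                 0x80, b"\x00\x00\x00", 0x0B, b"\x00\x00\x00", 2048, 1_000_000)

for number, ptype, offset in partition_types(bytes(mbr)):
    where = f"{offset:#04x}" if offset is not None else "none (not a FAT type)"
    print(f"partition {number}: type {ptype:#04x}, drive number offset {where}")
```

For a type of 0x0b the table gives offset 0x40, so a patch at 0x24 does not match what the partition type would suggest. With the type changed to 0x20 the lookup simply returns no offset.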

Rebooting with this revision made not a bit of difference and byte 0x7c24 continues to appear altered to 0x80.
However, it's doing its job, if messily. I think the OS is a more interesting project than the bootloader!

Re: Confirmation of Bug diagnosis sought: Solved

Posted: Mon May 07, 2018 10:52 am
by MichaelFarthing
OK, I misrepresented Toshiba. 'Twas not the fault of the computer. No! It was the USB flash disk itself (supplied by ... Toshiba).

The MBR of the flash disk does all the usual things and then decides it needs to discover what the drive number is. It is clearly afraid that the firmware will not be able to decide whether the drive should be treated as a floppy or a hard disk, so asks the bios which it is and then, instead of using dl, it picks either 0 or 0x80 depending on the result. Clearly when it boots the partition it doesn't want the partition to have the same non-existent problem and so inserts its decision into the loaded vbr before calling it.

And this time, having inspected the actual code, I am confident that the mystery is adequately solved.
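
The effect described in this post can be modelled in a few lines of Python. This is a model of the observed behaviour only, not the flash disk's MBR code: the function name is hypothetical, the patch offset of 0x24 is inferred from the altered byte at 0x7c24, and the `cd 10` encoding assumes the cursor-hiding call was `int 0x10`.

```python
DRIVE_NUMBER_OFFSET = 0x24  # assumed from the altered byte seen at 0x7c24


def patch_drive_number(vbr: bytes, is_hard_disk: bool) -> bytes:
    """Model an MBR that overwrites the VBR's drive-number byte before jumping to it."""
    patched = bytearray(vbr)
    # The MBR ignores dl and substitutes its own idea of the drive number.
    patched[DRIVE_NUMBER_OFFSET] = 0x80 if is_hard_disk else 0x00
    return bytes(patched)


# A VBR whose code occupies the field: 'int 0x10' (assumed) is encoded as cd 10.
vbr = bytearray(512)
vbr[0x24:0x26] = b"\xcd\x10"

loaded = patch_drive_number(bytes(vbr), is_hard_disk=True)
print(vbr[0x24:0x26].hex(), "->", loaded[0x24:0x26].hex())
```

Only the one byte changes, which is why the sector looks correct on disk while the copy at 0x7c00 no longer starts the interrupt instruction at that offset.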

I don't know why I didn't think to look at the mbr of the usb drive to begin with - sometimes the obvious just hides during debugging. It finally came to light when I decided to turn the drive into a hybrid MBR/GPT. I turned the first partition into the GPT protection partition and moved the boot partition to 2. It didn't then boot at all and that led me to look at the mbr code - which in addition to its playing with drive numbers also only looks at the first partition as a potential boot.