BIOS drive number

Posted: Thu Jul 19, 2012 1:23 am
by suslik
I know that BIOS drive numbers don't correspond to ATA master/slave on the first/second channel. I've read the post http://forum.osdev.org/viewtopic.php?f= ... 23&start=0, where Brendan suggested using the INT 13h Extensions GET DRIVE PARAMETERS function to find the correspondence between BIOS and ATA.
But I want to fit into 512 bytes, so I can't use it.

Maybe it would be useful to write the HDD serial number into a special field of the kernel header when installing the kernel? Is that convenient? Unfortunately, this method won't work when the kernel is installed on a CD.

Re: BIOS drive number

Posted: Thu Jul 19, 2012 2:49 am
by Brendan
Hi,
suslik wrote:But I want to fit into 512 bytes, so I can't use it.
Why?

The only thing that will fit in 512 bytes (minus the BPB and/or partition table) is a worthless piece of junk; with missing features, missing capabilities and severely deficient error handling.

The only things the first 512 bytes should do are:
  • (Optionally) relocate itself to a lower address
  • Initialise the CPU's segment registers and stack
  • Set up the video to ensure that the system is in a known mode (e.g. 80 * 25 text mode) and not something else
  • Provide an "abort boot" routine that displays an error message and causes the system speaker to beep (in case the computer has no video card or no monitor)
  • Load the second 512 bytes, with the "least bad" error handling you can cram into it. Due to space limitations the error handling will probably be a simple message saying "failed to load second sector".
The second 512 bytes should contain better code to load a variable number of sectors from disk (e.g. LBA to CHS conversion, built in retries, intelligently splitting large reads into multiple smaller reads, etc), with much better error handling. In this case you're looking at around 20 different error strings that say exactly what failed and why (e.g. see the list of errors here). Of course the second 512 bytes should also have code to load the remaining sectors of the boot loader. If you're lucky, you might have enough space to do an "is the CPU an 80386 or later" check before loading the remaining sectors of the boot loader (so that your "load sectors" routine can safely use 80386 or later instructions and 32-bit registers).
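To make the "LBA to CHS conversion" and "splitting large reads" ideas concrete, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration (a real boot sector would do this in 16-bit assembly). The function names are hypothetical, and splitting at track boundaries is only one common reason to split a read, assumed here as an example.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear block address to (cylinder, head, sector).

    Sectors are 1-based in CHS addressing; cylinders and heads are 0-based.
    """
    cylinder, rem = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rem, sectors_per_track)
    return cylinder, head, sector0 + 1


def split_read(lba, count, sectors_per_track):
    """Split one large read into (start_lba, n) chunks that never cross
    a track boundary (an assumed constraint, for illustration)."""
    chunks = []
    while count > 0:
        left_on_track = sectors_per_track - (lba % sectors_per_track)
        n = min(count, left_on_track)
        chunks.append((lba, n))
        lba += n
        count -= n
    return chunks
```

With the geometry of a 1.44 MB floppy (2 heads, 18 sectors per track), `lba_to_chs(18, 2, 18)` gives head 1, sector 1 of cylinder 0, and a 5-sector read starting at LBA 16 is split into one 2-sector and one 3-sector read.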

The actual work involved in loading a kernel (or second stage); and detecting memory, setting up a graphics video mode, etc doesn't/shouldn't start until the third sector of the boot loader (not the second sector, and definitely not the first).


Cheers,

Brendan

Re: BIOS drive number

Posted: Thu Jul 19, 2012 4:31 am
by tom9876543
I think Brendan summed up the situation nicely.

Why do you need to determine the specific hardware mapping of the boot drive in the first 512 bytes?

You only need to worry about that when you change to protected mode (and the BIOS isn't available any more).

Re: BIOS drive number

Posted: Thu Jul 19, 2012 7:18 am
by egos
Yes. My kernel gets the BIOS drive number from the boot loader in RM and converts it into a special structure that is used in PM to find the boot device (unless the boot driver registers the boot device explicitly through RegBootDevice).
suslik wrote:Maybe it would be useful to write the HDD serial number into a special field of the kernel header when installing the kernel? Is that convenient? Unfortunately, this method won't work when the kernel is installed on a CD.
You are wrong. A CD is detected perfectly well through function 48h (look at my 3rd post here). If an error occurs, try to use IDE mode.
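As a rough illustration of what function 48h can tell the kernel, the sketch below decodes the device path part of the result buffer. The offsets (key 0xBEDD at 0x1E, host bus at 0x24, interface type at 0x28, interface path at 0x30, device path at 0x38) are my recollection of the EDD 3.0 layout and should be checked against the specification before relying on them; the function name and the returned dictionary are hypothetical. Python is used only to show the decoding logic.

```python
import struct


def parse_edd_device_path(buf):
    """Decode the EDD 3.0 device path fields from an INT 13h/48h result
    buffer. All offsets are assumptions taken from the EDD 3.0 layout as
    I remember it. Returns None if the 0xBEDD key is absent."""
    if len(buf) < 0x42:
        return None
    (key,) = struct.unpack_from("<H", buf, 0x1E)
    if key != 0xBEDD:
        return None  # BIOS did not fill in device path information

    host_bus = buf[0x24:0x28].rstrip(b" \x00").decode("ascii", "replace")
    interface = buf[0x28:0x30].rstrip(b" \x00").decode("ascii", "replace")
    info = {"host_bus": host_bus, "interface": interface}

    if host_bus == "ISA":
        # Interface path: 16-bit I/O base address of the controller.
        (info["io_base"],) = struct.unpack_from("<H", buf, 0x30)
    elif host_bus == "PCI":
        # Interface path: PCI bus, device, function.
        info["pci"] = (buf[0x30], buf[0x31], buf[0x32])

    if interface in ("ATA", "ATAPI"):
        # Device path byte 0: 0 = master, 1 = slave.
        info["slave"] = bool(buf[0x38] & 1)
    return info
```

A kernel that receives this information from the boot loader could match the PCI location or I/O base against the controllers it finds itself, instead of guessing from the BIOS drive number.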

Re: BIOS drive number

Posted: Thu Jul 19, 2012 11:40 pm
by suslik
To Brendan: thank you very much for your answer, but I don't think a boot loader must check for a monitor and the VGA mode, make more than one attempt to read from the HDD (retrying makes sense when reading a floppy), access the HDD in CHS (LBA is enough), print a detailed message about a disk error (printing "Read Error" is enough), or check the CPU model.

I think that the limitations mentioned above are pretty reasonable and my boot-loader (you can see it in http://forum.osdev.org/viewtopic.php?f=1&t=25568) is good enough for me.

I fit my boot-loader in 512 bytes, but then I suddenly realized that the BIOS drive number has no correspondence with ATA :( I can tolerate having to try three methods to turn on A20, but I can't understand why I must do so much work (using the INT 13h Extensions GET DRIVE PARAMETERS function) for a simple task: telling the kernel which drive it was loaded from. I think writing the HDD serial number while installing the kernel on the HDD is enough for this task. Of course this method won't work for a CD, but I don't plan to install the kernel on a CD now; if that happens, I will write zeroes in the HDD serial field to tell the kernel that it was loaded from a CD.

Am I making a big mistake?

Re: BIOS drive number

Posted: Fri Jul 20, 2012 1:45 am
by egos
Hah!
>>> do more than one attempt to read from HDD (this makes sense when reading floppy)
I do this.
>>> access HDD in CHS (LBA is enough)
I do this if no EDD support is available for the drive.
>>> print detail message about disk error (it is enough only to print Read Error)
In stage 0/1 I use 2-3 error messages.
>>> and check CPU model
In stage 0/1 I use 8086 instruction set only.
suslik wrote:I fit my boot-loader in 512 bytes, but then I suddenly realized that the BIOS drive number has no correspondence with ATA :( I can tolerate having to try three methods to turn on A20, but I can't understand why I must do so much work (using the INT 13h Extensions GET DRIVE PARAMETERS function) for a simple task: telling the kernel which drive it was loaded from. I think writing the HDD serial number while installing the kernel on the HDD is enough for this task. Of course this method won't work for a CD, but I don't plan to install the kernel on a CD now; if that happens, I will write zeroes in the HDD serial field to tell the kernel that it was loaded from a CD.

Am I making a big mistake?
Switching to PM and enabling A20 in stage 1 is a big mistake! Detecting the boot device from the BIOS drive number is even more complex than you think. Label marking (writing the HDD serial number and so on) has its own disadvantages.

Re: BIOS drive number

Posted: Fri Jul 20, 2012 2:07 am
by suslik
egos, you just wrote what you do in your boot-loader. What would be really useful for me is an explanation of WHY I should do the same and WHY my boot-loader's restrictions are so bad.

To recap, my boot-loader's restrictions are:
1) Don't check for a monitor or the current VGA mode: I can hardly imagine a situation where there is no monitor or it works in monochrome mode
2) Don't retry reading the HDD after a read error: I think that rereading is only useful for floppies
3) In case of an HDD read error, only print "Read Error" and hang: I don't think that the precise HDD error is useful for the user
4) Don't check the CPU model (I need an 80386; the only older-generation part I have seen in my life is an 80486 OverDrive)
5) Don't use CHS, only LBA: this may be the most significant restriction. But my 1.08 GB Samsung HDD dated 1995 supports LBA (I know that there can be problems with an old BIOS)
6) Don't do a lot of work to get the correspondence between the BIOS drive number and ATA: just use the drive serial number hard-coded in the kernel (it is written by the kernel installer)

Re: BIOS drive number

Posted: Fri Jul 20, 2012 7:05 am
by egos
Well, I will try:
1) as long as you are using function 0Eh you don't need to worry about this, but including a BELL character in the error message would be useful;
2) maybe you are right; sometimes I think about this, but the traditional technique still prevails;
3) I think about the user too, so I use two kinds of errors: "logical errors", which are not really errors but make the boot process unsuccessful (system file not found, not enough memory for loading system files and so on), and "physical errors" such as disk read errors, file system errors and so on;
4) almost everybody does as you do;
5) the problem depends only on the BIOS;
6) label marking is not so good because it requires reading each storage device during boot device detection.
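Point 6 can be sketched as follows, assuming the kernel has already issued IDENTIFY DEVICE to every ATA drive and holds each 256-word response. In that response the serial number occupies words 10 to 19 as 20 ASCII characters, with the first character of each pair in the high byte of the word. The drive names and the `find_boot_drive` helper are hypothetical; Python is used only to show the logic.

```python
def identify_serial(identify_words):
    """Extract the serial number from a 256-word IDENTIFY DEVICE response.

    Words 10..19 hold 20 ASCII characters; within each 16-bit word the
    first character is in the high byte.
    """
    chars = bytearray()
    for word in identify_words[10:20]:
        chars.append((word >> 8) & 0xFF)
        chars.append(word & 0xFF)
    return chars.decode("ascii", errors="replace").strip()


def find_boot_drive(drives, wanted_serial):
    """Scan every drive for the serial the installer stored in the kernel.

    `drives` maps a (hypothetical) drive name to its IDENTIFY words.
    Returns the matching name, or None if no drive matches.
    """
    for name, words in drives.items():
        if identify_serial(words) == wanted_serial:
            return name
    return None
```

The loop makes egos's objection visible: every candidate drive has to be queried before the boot device is known, and the scheme says nothing useful for media without a matching serial.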

Re: BIOS drive number

Posted: Fri Jul 20, 2012 8:25 am
by Brendan
Hi,
suslik wrote:To Brendan: thank you very much for your answer, but I don't think a boot loader must check for a monitor and the VGA mode, make more than one attempt to read from the HDD (retrying makes sense when reading a floppy), access the HDD in CHS (LBA is enough), print a detailed message about a disk error (printing "Read Error" is enough), or check the CPU model.
If something goes wrong reading from disk, do you want the end user to be able to figure out if the problem was a bug in your software, a failing disk drive or bad disk, or some other problem? Do you want end users to send you bug reports that say nothing useful, so that there's no way for you to start trying to find problems?

There is nothing that says the video exists, and (for server rooms, etc) often there is no keyboard and no monitor. This makes it harder to report useful error messages (most OSs support things like sending errors to serial and/or network but this can't be done very early during boot as it takes extra code). There is also nothing to say that the video (if it exists) will be in 80*25 text mode, and if it's not then there's no guarantee that the BIOS function you're using to display characters will work (it almost never works in VBE modes).
suslik wrote:I think that the limitations mentioned above are pretty reasonable and my boot-loader (you can see it in http://forum.osdev.org/viewtopic.php?f=1&t=25568) is good enough for me.
Let's look at this boot loader then. It doesn't say what it's for (e.g. floppy, hard disk, PXE/network, whatever), so to start with I'll need to guess:
  • There is no BPB, it doesn't use the old "int 0x13, ah = 0x02" BIOS function (most systems don't support the extended functions for floppy) and doesn't do retries; therefore the boot sector won't work on floppy disks
  • There is no partition table and no attempt to find the start of the partition that the OS has been installed into; therefore the boot sector won't work for legacy hard drive
  • it's limited to 512 bytes; therefore it's obviously not intended for booting from "no emulation El Torito" CD or booting from network (as the 512 byte limit doesn't apply in these cases)
  • it's real mode code; therefore it's obviously not intended for UEFI
There are no other cases that I'm aware of - it's virtually useless for booting from anything. I'm going to assume that it's intended for booting from (unpartitioned!) hard disks.
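For reference, "finding the start of the partition" on a legacy partitioned hard disk comes down to reading the classic MBR partition table: four 16-byte entries at offset 446, followed by the 0x55 0xAA signature at offset 510. The sketch below decodes that table in Python for illustration only; the function name and the dictionary keys are hypothetical.

```python
import struct


def parse_mbr(sector):
    """Decode the four primary partition entries of a 512-byte MBR.

    Returns a list of non-empty entries, or None if the boot signature
    is missing.
    """
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        return None
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, ptype = entry[0], entry[4]
        # Bytes 8..11: starting LBA, bytes 12..15: sector count.
        start_lba, count = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": i,
                "active": status == 0x80,
                "type": ptype,
                "start_lba": start_lba,
                "sectors": count,
            })
    return partitions
```

A partition boot sector that knows its own `start_lba` can add it to every sector number it loads, which is the step the boot loader under discussion has no way to do.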

Now for the list of problems:
  • Doesn't support partitions
  • Doesn't support redundancy (e.g. software raid mirrors, etc)
  • Doesn't check if the CPU is 80386 or later before using 32-bit registers (may crash with undefined behaviour and no sane error message if a user tries to boot it on an ancient computer)
  • The memory detection is entirely inadequate (e.g. doesn't even try to use "int 0x15, eax=0xE820")
  • The "enable A20" code is also inadequate (e.g. doesn't test if A20 is already enabled, doesn't try to use "int 0x15, ax=0x2401", assumes that the system supports the relatively unsafe "fast A20" method for no reason)
  • Error messages are completely useless (do you honestly think the end user is going to be able to understand a single letter?)
  • Assumes the code before it left the video in text mode for no reason
  • Doesn't set a decent video mode (e.g. maybe something like 1024 * 768 with 16 million colours) for the OS to use (which means that the OS will probably be stuck with obsolete/crappy text mode until you write native video drivers)
  • Loads the kernel one sector at a time (the slowest way) rather than reading multiple sectors
  • Doesn't check if the kernel will actually fit in memory (another cause of crash on ancient computers, no graceful "kernel too big" error message).
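The last point can be sketched with the memory map returned by "int 0x15, eax=0xE820". Each basic entry is 20 bytes: a 64-bit base, a 64-bit length and a 32-bit type, where type 1 means usable RAM. The check below is deliberately simplified (it does not merge adjacent entries or handle overlapping ones) and the function names are hypothetical; Python is used only to show the logic.

```python
import struct

E820_USABLE = 1  # entry type for normal, usable RAM


def parse_e820(raw):
    """Split a buffer of packed 20-byte E820 entries into
    (base, length, type) tuples."""
    entries = []
    for off in range(0, len(raw) - 19, 20):
        base, length, kind = struct.unpack_from("<QQI", raw, off)
        entries.append((base, length, kind))
    return entries


def kernel_fits(entries, load_addr, size):
    """True if [load_addr, load_addr + size) lies entirely inside a
    single usable entry (simplified: adjacent entries are not merged)."""
    end = load_addr + size
    return any(
        kind == E820_USABLE and base <= load_addr and end <= base + length
        for base, length, kind in entries
    )
```

A loader could call `kernel_fits` before reading the kernel and print a "kernel too big" message instead of overwriting reserved memory.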
suslik wrote:Am I making a big mistake?
I don't know. Mostly it's all about the quality of the end product, and there are valid reasons for creating very low quality code (e.g. you might just want some experience before attempting a real OS, and might only want to see a test kernel boot on one computer, and/or might be planning to rewrite it all properly later on). ;)

Also note that it's almost never a good idea to use the same boot code for different cases - the different cases are just too different. Typically you'd want one boot sector for floppy (if you support floppy), one (or more) for hard disk and USB flash, one for "no emulation" CD, one for PXE/network (plus maybe 2 more for 32-bit and 64-bit UEFI one day).


Cheers,

Brendan

Re: BIOS drive number

Posted: Fri Jul 20, 2012 8:44 am
by sandras
Brendan wrote:Error messages are completely useless (do you honestly think the end user is going to be able to understand a single letter?)
http://en.wikipedia.org/wiki/Lilo_bootloader
Output

When LILO loads itself it displays the word “LILO”. Each letter is printed before or after some specific action. If LILO fails at some point, the letters printed so far can be used to identify the problem.

(nothing)
No part of LILO has been loaded. LILO either isn't installed or the partition on which its boot sector is located isn't active. The boot media is incorrect or faulty.
L
The first stage boot loader has been loaded and started, but it can't load the second stage boot loader. The two-digit error codes indicate the type of problem. This condition usually indicates a media failure or bad disk parameters in the BIOS.
LI
The first stage boot loader was able to load the second stage boot loader, but has failed to execute it. This can be caused by bad disk parameters in the BIOS.
LIL
The second stage boot loader has been started, but it can't load the descriptor table from the map file. This is typically caused by a media failure or by bad disk parameters in the BIOS.
LIL?
The second stage boot loader has been loaded at an incorrect address. This is typically caused by bad disk parameters in the BIOS.
LIL-
The descriptor table is corrupt. This can be caused by bad disk parameters in the BIOS.
LILO
All parts of LILO have been successfully loaded.
I think there are two kinds of simplicity in developing any software that has some sort of user interface: simplicity for the developers (the software IS simple), and simplicity for the users (the software APPEARS simple).
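The LILO progress codes quoted above amount to a lookup table that the user (or the developer reading a bug report) has to apply by hand. A small sketch of that table, with the descriptions shortened from the quoted text and a hypothetical `explain` helper:

```python
# Progress prefix printed by LILO -> what it means (shortened from the
# Wikipedia text quoted above).
LILO_STAGES = {
    "": "No part of LILO has been loaded.",
    "L": "First stage started, but it can't load the second stage.",
    "LI": "Second stage was loaded, but executing it failed.",
    "LIL": "Second stage started, but can't load the descriptor table.",
    "LIL?": "Second stage was loaded at an incorrect address.",
    "LIL-": "The descriptor table is corrupt.",
    "LILO": "All parts of LILO have been successfully loaded.",
}


def explain(progress):
    """Translate the letters seen on screen into a description."""
    return LILO_STAGES.get(progress, "Unknown progress code.")
```

The loader itself only needs to print one character per step, which is what keeps it simple for the developer; the table has to live somewhere else.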

Re: BIOS drive number

Posted: Fri Jul 20, 2012 2:13 pm
by egos
Brendan wrote:
  • There is no BPB, it doesn't use the old "int 0x13, ah = 0x02" BIOS function (most systems don't support the extended functions for floppy) and doesn't do retries; therefore the boot sector won't work on floppy disks
  • There is no partition table and no attempt to find the start of the partition that the OS has been installed into; therefore the boot sector won't work for legacy hard drive
  • it's limited to 512 bytes; therefore it's obviously not intended for booting from "no emulation El Torito" CD or booting from network (as the 512 byte limit doesn't apply in these cases)
This sounds like a description of boot image for "No emulation" mode.
doesn't try to use "int 0x15, ax=0x2401"
Do you use it? I don't.

Re: BIOS drive number

Posted: Fri Jul 20, 2012 7:01 pm
by Brendan
Hi,
Sandras wrote:I think there are two kinds of simplicity in developing any software that has some sort of user interface: simplicity for the developers (the software IS simple), and simplicity for the users (the software APPEARS simple).
Sure. I'd also say that often these are opposites - it can take a lot of work (and complexity) to make things simple for the users.

LILO is an example of the former (simple for developers, until they get a bug report that lacks any useful information).
egos wrote:
doesn't try to use "int 0x15, ax=0x2401"
Do you use it? I don't.
I do use it.

For A20, I test if it's already enabled (and do nothing if I can); then I try the BIOS function and test if it's enabled again; then (only if I must) I try the keyboard controller method and test again; and finally (if there's no other choice) I try the "fast A20" method and test again. If none of that works I boot with A20 disabled (which is easy for me, as I enable paging before kernel is loaded and don't care what physical addresses are used ;) ).
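The order described above (test first, then BIOS, keyboard controller and "fast A20", testing after each attempt) can be sketched as a small fallback loop. `FakeMachine` is a hypothetical stand-in for the hardware so that the logic can be run; on a real machine each attempt would be a few lines of assembly.

```python
class FakeMachine:
    """Hypothetical stand-in for the hardware: `working` lists the
    methods that actually enable A20 on this machine."""

    def __init__(self, working=(), enabled=False):
        self.working = set(working)
        self.enabled = enabled
        self.tried = []

    def is_enabled(self):
        return self.enabled

    def attempt(self, method):
        self.tried.append(method)
        if method in self.working:
            self.enabled = True


# Least risky method first, "fast A20" only as the last resort.
ORDER = ("bios", "keyboard controller", "fast A20")


def enable_a20(machine):
    """Return the method that worked, "already enabled", or None."""
    if machine.is_enabled():
        return "already enabled"  # do nothing if nothing is needed
    for method in ORDER:
        machine.attempt(method)
        if machine.is_enabled():  # re-test after every attempt
            return method
    return None  # caller decides whether to boot with A20 disabled
```

For example, on a machine where only the keyboard controller method works, `enable_a20` tries the BIOS first, then the keyboard controller, and never touches the "fast A20" port.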


Cheers,

Brendan

Re: BIOS drive number

Posted: Fri Jul 20, 2012 11:18 pm
by Love4Boobies
Brendan wrote:
egos wrote:
doesn't try to use "int 0x15, ax=0x2401"
Do you use it? I don't.
I do use it.

For A20, I test if it's already enabled (and do nothing if I can); then I try the BIOS function and test if it's enabled again; then (only if I must) I try the keyboard controller method and test again; and finally (if there's no other choice) I try the "fast A20" method and test again. If none of that works I boot with A20 disabled (which is easy for me, as I enable paging before kernel is loaded and don't care what physical addresses are used ;) ).
Something perhaps worth mentioning (because I've had to explain it twice on IRC): the interrupt function that checks the A20 state is useful. You can't get away with only calling the function that enables A20 and assuming A20 is disabled when that call returns failure, because it may fail precisely because A20 was already enabled, perhaps by a boot loader that chain-loaded yours.
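Independently of what any BIOS call returns, the A20 state can also be tested directly with the usual wraparound check: with A20 disabled, address bit 20 is forced to zero, so 0xFFFF:0x0510 aliases 0x0000:0x0500. The sketch below simulates real-mode memory in Python to show the idea; the `RealModeMemory` class and the two probe addresses are illustrative assumptions, not taken from anyone's boot loader in this thread.

```python
class RealModeMemory:
    """Tiny simulation of real-mode addressing with an A20 gate."""

    def __init__(self, a20_enabled):
        self.a20_enabled = a20_enabled
        self.ram = bytearray(0x110000)  # covers 0xFFFF:0xFFFF

    def _phys(self, seg, off):
        addr = (seg << 4) + off
        if not self.a20_enabled:
            addr &= ~(1 << 20)  # A20 gate forces address bit 20 to zero
        return addr

    def read(self, seg, off):
        return self.ram[self._phys(seg, off)]

    def write(self, seg, off, value):
        self.ram[self._phys(seg, off)] = value & 0xFF


def a20_is_enabled(mem):
    """Wraparound test: write different bytes to two addresses that are
    1 MiB apart and see whether they turn out to be the same location."""
    low = (0x0000, 0x0500)
    high = (0xFFFF, 0x0510)  # 0xFFFF0 + 0x510 = 0x100500
    saved_low, saved_high = mem.read(*low), mem.read(*high)
    mem.write(*low, 0x00)
    mem.write(*high, 0xFF)
    aliased = mem.read(*low) == 0xFF
    # Restore the original contents before returning.
    mem.write(*high, saved_high)
    mem.write(*low, saved_low)
    return not aliased
```

Running this kind of test before and after each enable attempt avoids having to interpret a failure code from the enable function.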

Re: BIOS drive number

Posted: Sat Jul 21, 2012 3:15 am
by egos
Brendan wrote:If none of that works I boot with A20 disabled (which is easy for me, as I enable paging before kernel is loaded and don't care what physical addresses are used ;) ).
It's wonderful for me if this works :) If so, why do you try to enable A20?

Re: BIOS drive number

Posted: Sat Jul 21, 2012 3:38 am
by Antti
My "Legacy BIOS" boot loader currently makes a couple of assumptions about the availability of certain features. First, I verify the availability of the INT 13h Extensions. If they are available, I assume that the CPU supports 32-bit registers. Then I have an "int 0x15, EAX=0xE820" memory map gathering routine. If it works, I assume that the cpuid opcode should be available and I check for x86-64 compatibility (with the needed CPU features). Then I switch to a VBE video mode after finding a suitable one.

If I have got that far with my booting, I check A20. If it is disabled, I try to enable it with "int 0x15, AX=0x2401". If that failed or was not available, I would be very disappointed. I think that it should be available if all the previous steps have passed successfully (meaning that the machine is quite modern). As a last resort, I would do "Fast A20" and jump to an "is-A20-enabled" loop. If it hung, I would give up.

Of course, I have a lot more things in my boot loader code (two stages) but these steps are relevant for a feature assumption discussion. The same code base is used to make builds for a USB/hard drive boot image and an El Torito CD boot image (no emulation). Both images support UEFI booting too and that's what I am focused on.
Brendan wrote:If none of that works I boot with A20 disabled (which is easy for me, as I enable paging before kernel is loaded and don't care what physical addresses are used).
I clearly haven't taken the error tolerance so in-depth. You clearly have prepared for everything. Good job!