Skip the MBR

Antti
Member
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: Skip the MBR

Post by Antti »

How many error messages should there be? Not too many, because there is no space. Of course, the strings should be "compressed": there is no need to store the word "error" three times. Beeping the PC speaker would be nice.

Code:

E_INVALID_GPT:  db 'GPT error.',0
E_INVALID_MBR:  db 'MBR error.',0
E_DISK_ERROR:   db 'Disk error.',0
E_REBOOT:       db ' Please reboot your computer.',0
An open issue: should CHS be used at all? This MBR would probably be used on USB sticks, and if these are emulated as floppies... Maybe CHS should still be available?
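
One way to "compress" the strings, sketched in NASM: store only the part of each message that differs and print a shared tail afterwards. This is an illustration only. The labels Fail, PrintStr and E_TAIL are made up for the sketch, ds = 0 is assumed, and it relies on the common (but not guaranteed) BIOS behaviour that writing character 7 (BEL) through the teletype function sounds the speaker.

```nasm
; Sketch only, not part of the draft. Assumes real mode and ds = 0.
                bits 16
                org 0x0600

Fail:           ; in: si = address of the specific part of the message
                call PrintStr           ; e.g. "Disk"
                mov si,E_TAIL           ; shared tail, stored only once
                call PrintStr
.hang:          hlt                     ; nothing left to do, wait for reset
                jmp .hang

PrintStr:       ; in: ds:si = zero-terminated string
                cld                     ; make lodsb count upwards
.next:          lodsb                   ; al = next character
                test al,al              ; terminator reached?
                jz .done
                mov ah,0x0E             ; INT 10h teletype output
                mov bx,0x0007           ; bh = page 0, bl = colour (gfx modes)
                int 0x10
                jmp .next
.done:          ret

E_INVALID_GPT:  db 'GPT',0
E_INVALID_MBR:  db 'MBR',0
E_DISK_ERROR:   db 'Disk',0
E_TAIL:         db ' error. Please reboot your computer.',7,0
```

A caller would use it as "mov si,E_DISK_ERROR" followed by "jmp Fail". The strings get shorter, but the extra call costs a few bytes of code, so the saving is modest.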

Post by Antti »

This is the latest draft version:

Code:

; Public Domain Master Boot Record

                ORG 0x0600

        ; Code starts at physical address 0x00007C00. The exact CS and IP
        ; values are unknown. Register dl contains the boot drive number.
        ; Code runs in real mode and BIOS owns the hardware.

Addr_7C00:      xor cx,cx               ; cx = 0
                mov si,0x7C00           ; si = source
                mov di,0x0600           ; di = destination
                cli                     ; disable interrupts
                mov ss,cx               ; ss = 0
                mov sp,si               ; sp = 0x7C00
                sti                     ; enable interrupts
                mov es,cx               ; es = 0
                mov ds,cx               ; ds = 0
                mov ch,0x01             ; cx = 0x0100
                cld                     ; clear direction flag
                rep movsw               ; relocate 0x07C00 -> 0x00600 (512)
                mov bx,0x55AA           ; bx = 0x55AA (for INT 13h check)
                push dx                 ; save register dx (dl=boot drive)
                jmp 0x0000:Addr_0620    ; far jump to the relocated code

        ; Far jump sets cs = 0x0000 and ip = 0x0620
; ____________________________________________________________________________
Addr_0620:      mov ah,0x41             ; INT 13h installation check
                stc                     ; preset carry flag
                int 0x13                ; bios disk service call
                pop dx                  ; restore register dx (dl=boot drive)
                jc ReadSecCHSInit       ; if cf = 1, no INT 13h extensions
                cmp bx,0xAA55           ; compare return value
                jne ReadSecCHSInit      ; if not equal, no INT 13h extensions
                shr cx,1                ; check bit 0, function support

        ; Note: cx was set to zero before INT 13h call. If bit 0 was set,
        ; this gives very good confidence that INT 13h extensions are
        ; available. This also gives good confidence that the CPU is >= 80386.
                jnc ReadSecCHSInit      ; if bit 0 = 0, no INT 13h extensions
                ; jmp ReadSecLBAInit    ; fall through to ReadSecLBAInit

; ____________________________________________________________________________
ReadSecLBAInit: ; NOT IMPLEMENTED
                ; ...
                ; ...
                ; ...
                ; ...

        ; Update "ReadSector" function pointer.
                mov word [ReadSector],ReadSecLBA
                jmp CheckPartTable      ; jump to CheckPartTable
; ____________________________________________________________________________
ReadSecCHSInit: ; NOT IMPLEMENTED
                ; ...
                ; ...
                ; ...
                ; ...

                ; jmp CheckPartTable    ; fall through to CheckPartTable
; ____________________________________________________________________________
CheckPartTable: ; NOT IMPLEMENTED




ReadSecCHS:     ; NOT IMPLEMENTED
ReadSecLBA:     ; NOT IMPLEMENTED

        ; By default, assume that INT 13h extensions are not available. This
        ; pointer will be changed if extensions are available.
align 2
ReadSector:     dw (ReadSecCHS)
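
For the LBA path that is still marked NOT IMPLEMENTED above, here is a minimal sketch of what ReadSecLBA could look like with the INT 13h extended read function (ah = 0x42). It is an illustration, not part of the draft: the register interface (dl = drive, es:bx = buffer, di:ax = LBA) and the DAP labels are assumptions, and ds = 0 is assumed as set up by the relocation code. The routine clobbers si.

```nasm
; Sketch only. Assumes real mode, ds = 0, INT 13h extensions present.
                bits 16
                org 0x0600

ReadSecLBA:     ; in:  dl = drive, es:bx = buffer, di:ax = LBA (di = high word)
                ; out: cf = 0 on success, cf = 1 and ah = status on error
                mov [DAP.bufoff],bx     ; destination offset
                mov [DAP.bufseg],es     ; destination segment
                mov [DAP.lba],ax        ; LBA bits 0-15
                mov [DAP.lba+2],di      ; LBA bits 16-31 (upper bits stay 0)
                mov si,DAP              ; ds:si -> disk address packet
                mov ah,0x42             ; INT 13h extended read
                int 0x13                ; bios disk service call
                ret

align 4
DAP:                                    ; 16-byte disk address packet
.len:           db 0x10                 ; packet size in bytes
                db 0                    ; reserved, must be zero
.count:         dw 1                    ; number of sectors to read
.bufoff:        dw 0                    ; buffer offset
.bufseg:        dw 0                    ; buffer segment
.lba:           dq 0                    ; 64-bit starting LBA
```

Keeping the packet as initialised data costs 16 bytes but avoids building it on the stack at run time; either choice would work with the ReadSector function pointer.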

Post by Antti »

Here is a simplified disk layout drawing. I am interested in writing bytes from 0 to 439 (LBA 0).
[Attachment: Layout.png]
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,
Antti wrote: That missing label will be fixed, but the point was to underline this nice 32-byte aligned code section before jumping to the relocated code. It is pointless, I know, but if that code section were used at the beginning of millions of storage drives, why not make it like a "signature boilerplate"?

Brendan, you are right but I doubt I/we are able to finish this if we extend the scope too much. I think it could be realistic to make this 512-byte fully polished and bug-free.
Yes - I'm partly (but not necessarily entirely) illustrating my previous "While "minimum required functionality" is trivial and relatively well defined; "maximum desired functionality" is a hornet's nest" comment.
Antti wrote:Using the first track is possible but I would like to reserve it for GUID partition tables. The primary GPT header is at LBA 1, so it is the next sector.
For GPT, I'd want the boot manager to have its own partition (that isn't a UEFI system partition and isn't FAT).

Also note that (other than "some OSs are poo") there's no reason you can't have a tiny partition in the first cylinder. For example, you could use LBA 1 to LBA 31 for GPT (enough for 120 partitions), and then have a "boot manager partition" from LBA 32 to the end of the first cylinder. Then (for redundancy) do similar for the last cylinder of the disk. Don't forget that (for legacy OSs that only understand "MBR partitions") the boot manager can modify the MBR's partition table before starting the OS's boot loader, so that the legacy OS sees 2 partitions protecting the start and end of the disk (your boot manager and the GPT areas) plus up to 2 other partitions (that can be any 2 partitions from the GPT partition table).
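
The "120 partitions" figure follows from the usual GPT defaults, assuming 512-byte sectors and 128-byte partition entries, with the GPT header in LBA 1 and the entry array in LBA 2 to LBA 31:

```latex
(31 - 2 + 1)\ \text{sectors} \times \frac{512\ \text{bytes/sector}}{128\ \text{bytes/entry}} = 30 \times 4 = 120\ \text{entries}
```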
Antti wrote: How many error messages should there be? Not too many, because there is no space. Of course, the strings should be "compressed": there is no need to store the word "error" three times. Beeping the PC speaker would be nice.

Code:

E_INVALID_GPT:  db 'GPT error.',0
E_INVALID_MBR:  db 'MBR error.',0
E_DISK_ERROR:   db 'Disk error.',0
E_REBOOT:       db ' Please reboot your computer.',0
Imagine a user sends you a bug report that says "Disk error" and nothing else. After wondering how the user managed to send the bug report when their only computer doesn't boot, do you:
    a) Tell the user their hard drive is faulty and needs to be replaced
    b) Tell the user their BIOS is incompatible and/or buggy and they should try a BIOS update
    c) Tell the user they've probably got a virus or something and malicious code has tampered with something somewhere maybe
    d) Assume your code is buggy, apologise to the user and tell them you'll send them a fixed version as soon as you find the bug
    e) Assume the user is booting from removable media and removed it at the wrong time
    f) Ask the user if the BIOS itself said "Disk error" or your MBR said "Disk error" or if the OS's boot loader said "Disk error", and then get annoyed when the user has no way to tell
    g) Be honest; and admit to the user that your error handling is so bad that it's impossible for you to help them in any way at all :roll:
If you try to squeeze everything into 512 bytes you'll need to severely reduce functionality and have completely useless error messages that fail to help the user (or developers) figure out what to do about any problem. The only way around that is to shift functionality elsewhere, so that the first sector contains very little functionality and requires much less error handling.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by Antti »

Brendan wrote:The only way around that is to shift functionality elsewhere, so that the first sector contains very little functionality and requires much less error handling.
Absolutely true. I have to think about this. In hindsight, those error messages I suggested are so bad that it feels bad. It is impossible to have that much functionality and have acceptable error handling procedures.

It is better to simplify this and discard INT 13h extensions and unconditionally use traditional disk access methods. The tiny partition in the first cylinder is something I have to look at.

Post by Antti »

There is so much controversy that this project probably does not make sense. Actually, the project goal makes sense in itself but there are too many things to take into consideration. Maybe it is better to leave the MBR as a part of the real boot manager. The project is halted for now.

The conclusion:
  • Worldwide acceptance? A nice dream.
  • Boot Manager Partition? Probably a good idea.
  • Standard interface between MBR and Boot Manager? A good idea.
  • Same MBR code for "Legacy partitions" and "GUID partitions"? Maybe a bad idea.
  • Chain of trust? An unknown area.
A standard MBR code section is not hopeless but too hard.

Post by embryo »

Brendan wrote:GRUB's MBR ignores the partition table and loads GRUB's stage 2; it doesn't check the partition table and load the first sector of an active partition (which I assume is the goal here).
But how can it load something from an active partition without checking the partition table?
Antti wrote:There is so much controversy that this project probably does not make sense. Actually, the project goal makes sense in itself but there are too many things to take into consideration. Maybe it is better to leave the MBR as a part of the real boot manager. The project is halted for now.
Maybe it is better to say that the project is paused for a short time?

I mean, it is a nice idea to show all osdevers an open process of boot manager development (or at least its first stage).

And to encourage you to resume your efforts, I want to share my understanding of the booting process. There is a common starter for every OS and it sits right in the disk's first sector. This is also true for USB sticks and floppies. Maybe UEFI does something different in some incompatible mode, I just don't know. However, to be compatible with the many existing bootloaders, UEFI just has to support the same common starter and preserve the MBR as it is. So, the first step towards reducing the level of "so much controversy" is a clear understanding of the roles of the common starter and the MBR.

The common starter's role is just to start some more specific starter (or loader). The goal is relatively simple, which allows us to make the common starter nice enough to have some descriptive error messages in it. But before the goal hunting starts, there should be some understanding of the whole picture, with the common starter and the more specific ones. First of all, we need to remember that the variety of things that can be loaded by a bootloader is too wide to deal with in just 512 bytes. So, in our picture we can draw a small and very restricted area named "common starter" and leave the rest of the picture for other things. The other things are mostly the more specific loaders mentioned above. They are created with knowledge of the actual system to be loaded into the computer's memory. When the actual system is, for example, Linux, the more specific starter knows the sequence of actions it should execute and is able to locate the additional code that implements those actions. The same is true for any other system, like GRUB, Windows, your OS or whatever: the more specific loader always knows about the actual loading process and all the helper code required during loading.

And here we can see an important difference between the common starter and a more specific one. The difference is in the amount of information they need to accomplish their tasks. The first has enough information to select and load an appropriate more specific loader. The second has enough information to select and load all the code that is required to properly set up and support the target system's data in the computer's memory.

Finally, once there is a clear picture of the common loader, the more specific ones and the actual system bootstrapping code, we can look at some technical details. The first thing to consider is the location of the more specific loader. To find it, one has to understand the partition table. It shows us at least one very useful thing: the location of the active partition. When we know where the active partition is, we can relatively safely assume that the bootloader in its VBR (volume boot record) is the more specific loader we are looking for. But if there is no active partition (or no partitions at all, as is the case with floppies), it is much better to place the more specific loader right in the first sector of the disk, because the common starter just does not have enough information to find the more specific one. So, we have to restrict the use of the common loader to disks that have partition tables. And if somebody wants to load his system from a floppy, he should know that it is better to use a more specific loader instead of the common one in that case.

Next it would be possible to discuss the actual techniques for parsing the partition table, loading sectors and dealing with errors, but I leave those areas to more experienced developers (like Brendan), because my acquaintance with such details is narrower than other people's.
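
As an illustration of the "find the active partition" step, here is a minimal NASM sketch. It relies on the classic MBR layout (four 16-byte entries at offset 0x1BE, the 0x55, 0xAA signature at offset 0x1FE, status byte 0x80 for an active entry). The label FindActive and its register interface are made up for the sketch, and it assumes ds = 0 and an MBR that has been relocated to 0x0600 as in Antti's draft.

```nasm
; Sketch only. Assumes real mode, ds = 0, MBR relocated to 0x0600.
                bits 16
                org 0x0600

PART_TABLE      equ 0x0600 + 0x1BE      ; first of four 16-byte entries
BOOT_SIG        equ 0x0600 + 0x1FE      ; bytes 0x55, 0xAA

FindActive:     ; out: cf = 0 and si = active entry, or cf = 1 if none
                cmp word [BOOT_SIG],0xAA55
                jne .fail               ; no valid signature
                mov si,PART_TABLE
                mov cx,4                ; four entries to check
.next:          cmp byte [si],0x80      ; status byte: 0x80 = active
                je .found
                add si,16               ; advance to the next entry
                loop .next
.fail:          stc                     ; nothing usable found
                ret
.found:         clc                     ; si points at the active entry
                ret
```

On success the starting LBA of the partition is the dword at [si+8]. A stricter version would also reject tables with more than one active entry or with status bytes other than 0x00 and 0x80.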
Antti wrote: The conclusion:
  • Worldwide acceptance? A nice dream.

    Why not? An open development process for a common loader is a rare bird. Maybe this one is even the first on earth.
  • Boot Manager Partition? Probably a good idea.

    It's good for a general boot manager, but not for a specific system loader, simply because of the different nature of specialized and generalized code.
  • Standard interface between MBR and Boot Manager? A good idea.

    The standard interface here is the environment left after the bootloader's code has been loaded by the common loader. And this environment was already defined long ago.
  • Same MBR code for "Legacy partitions" and "GUID partitions"? Maybe a bad idea.

    Why not? The difference is only in the detection and parsing parts of the code. Both are easy to implement and do not consume much space.
  • Chain of trust? An unknown area.

    For me it's also something still undefined. But maybe I even know it, just under a different name.

Post by Brendan »

Hi,
embryo wrote:
Brendan wrote:GRUB's MBR ignores the partition table and loads GRUB's stage 2; it doesn't check the partition table and load the first sector of an active partition (which I assume is the goal here).
But how can it load something from an active partition without checking the partition table?
Nothing in GRUB's MBR (at least, the version linked to earlier) checks or cares about the partition table - it just loads "stage 2" starting from the sector indicated in a fixed location in the MBR (that I'd assume must be set by whatever utility installed GRUB). I'd also assume something in that "stage 2" does check/use the partition table.


Cheers,

Brendan

Post by Brendan »

Hi,
embryo wrote:
Antti wrote:
  • Chain of trust? An unknown area.
For me it's also something still undefined. But maybe I even know it, just under a different name.
The basic idea is reasonably simple - create a hash of the kernel so that other systems can determine if the kernel/OS has been compromised by malicious software (more specifically, to allow "remote attestation").

The problem is that malicious code in a boot loader can do "something" (e.g. install a temporary IRQ handler) that corrupts the kernel after the hash has been created. To fix that you need to create a hash of the boot loader and then combine it with the kernel's hash in some way; so that other systems can determine if the boot loader or kernel/OS has been compromised by malicious software. However, the MBR/boot manager could corrupt the boot loader after the hash has been created, so...

This is why it's a "chain of trust". Each piece (from early firmware all the way to kernel) has to be hashed, and each piece is responsible for creating the hash for the next piece. If one piece (e.g. MBR) doesn't do this, then the chain is broken and you can't determine if the kernel has been compromised by malicious software anymore.

Of course it's more complicated than that. For example, the hash needs to be stored somewhere secure (and not in normal RAM where anything can modify it) and you also have to protect the code used to create the hash. This is mostly what the TPM chip provides.
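
For reference, the way a TPM "combines" hashes is the extend operation: software cannot write a Platform Configuration Register directly, it can only ask the TPM to replace the old value with a hash of the old value concatenated with the new measurement. In generic notation (H is the TPM's hash function, SHA-1 for TPM 1.2; m_i is the i-th piece of code that gets measured; the register is assumed to start at zero after reset):

```latex
\mathrm{PCR}_0 = 0, \qquad
\mathrm{PCR}_i = H\bigl(\mathrm{PCR}_{i-1} \,\|\, H(m_i)\bigr)
```

Because each value depends on every earlier measurement in order, a later stage cannot undo or hide what an earlier stage recorded.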

In practice, for an MBR, it means calling one BIOS function (to detect whether the "TCG BIOS" functions are supported); and if they are supported, calling another BIOS function (one that creates a hash of the boot loader that the MBR loaded and extends a "Platform Configuration Register" with that hash) after loading the boot loader but before passing control to it.
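
A rough NASM sketch of those two calls follows. The function numbers and register usage (INT 1Ah with ah = 0xBB, TCG_StatusCheck and TCG_CompactHashLogExtendEvent) are given as they are commonly documented for the TCG PC Client specification for conventional BIOS; treat them as assumptions and check them against the specification. MeasureLoader, PCR_INDEX, LOAD_ADDR and LOAD_SIZE are placeholders invented for the sketch.

```nasm
; Sketch only. Needs a 386 or later (32-bit registers in real mode).
                bits 16
                org 0x0600

TCPA_MAGIC      equ 0x41504354          ; 'TCPA' signature used by TCG BIOS
PCR_INDEX       equ 8                   ; placeholder, PCR choice is policy
LOAD_ADDR       equ 0x7C00              ; placeholder, where the loader is
LOAD_SIZE       equ 512                 ; placeholder, bytes to measure

MeasureLoader:  pushad                  ; keep dl (boot drive) and the rest
                push es
                mov eax,0xBB00          ; TCG_StatusCheck
                int 0x1A
                test eax,eax            ; eax = 0 if TCG BIOS is present
                jnz .skip
                cmp ebx,TCPA_MAGIC      ; ebx = 'TCPA' if TCG BIOS is present
                jne .skip
                xor ax,ax
                mov es,ax               ; es = 0
                mov edi,LOAD_ADDR       ; es:di -> data to hash
                mov ecx,LOAD_SIZE       ; ecx = length in bytes
                mov edx,PCR_INDEX       ; edx = PCR to extend
                xor esi,esi             ; esi = event value (informative)
                mov ebx,TCPA_MAGIC
                mov eax,0xBB07          ; TCG_CompactHashLogExtendEvent
                int 0x1A                ; eax = 0 on success (ignored here)
.skip:          pop es
                popad
                ret
```

The sketch silently continues when no TCG BIOS is found or the call fails; whether that is acceptable is a policy question, not something the code can decide.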


Cheers,

Brendan

Post by Antti »

embryo wrote: Maybe it is better to say that the project is paused for a short time?
It is in a halt state but external interrupts will resume execution. I would compose the instructions if I knew how it should work. Besides, maybe this project is ten years too late. What if the MBR just printed the message: "Please use UEFI-compatible hardware."

However, thank you both for the extensive replies.

Post by embryo »

Brendan wrote:Nothing in GRUB's MBR (at least, the version linked to earlier) checks or cares about the partition table - it just loads "stage 2" starting from the sector indicated in a fixed location in the MBR (that I'd assume must be set by whatever utility installed GRUB).
Maybe it just relies on the fact that 99% of MBRs have the first partition marked as active and that, because of the partition table structure, the disk offset is always stored at the same address within the MBR? And yes, it's error prone, but in 99% of cases it works and the GRUB developers just don't pay attention to the missing 1%.

Post by embryo »

Brendan wrote:The basic idea is reasonably simple - create a hash of the kernel so that other systems can determine if the kernel/OS has been compromised by malicious software (more specifically, to allow "remote attestation").
I see, it looks like a certificate chain, where every certificate is signed by the previous one in the chain, up to the root certificate.
Brendan wrote:The problem is that malicious code in a boot loader can do "something" (e.g. install a temporary IRQ handler) that corrupts the kernel after the hash has been created. To fix that you need to create a hash of the boot loader and then combine it with the kernel's hash in some way; so that other systems can determine if the boot loader or kernel/OS has been compromised by malicious software. However, the MBR/boot manager could corrupt the boot loader after the hash has been created, so...
If a user casually (just to try it) inserts some untrusted USB key and boots from it, then everything in the trust chain is useless. The same applies when some stranger is able to insert such a USB stick and boot from it.

So, the trust chain works only against software that runs under some installed OS. But the OS simply must have some means to prevent any software from writing to its kernel area or to the MBR. However, if two or more OSes are installed and one of them is compromised, then the chain can help to detect the problem when booting another OS.
Brendan wrote:Of course it's more complicated than that. For example, the hash needs to be stored somewhere secure (and not in normal RAM where anything can modify it) and you also have to protect the code used to create the hash. This is mostly what the TPM chip provides.
Yes, I suppose it's better to include some hardware in the security package than to rely on software only. But unauthorized physical access to the PC can still break the wall. So, as a solution, I can suggest something like a removable USB BIOS, which keeps the hashes of the MBR, boot loader, kernel or whatever, performs hash checks on every boot, and can be kept secure simply because it can be easily detached and carried by its owner. Also, if some OS vulnerability was exploited to install malicious software that changed the kernel, even then the USB BIOS can detect the kernel's change and warn its owner.

But because the USB stick is always required in that case, it is possible to create a USB boot manager that checks everything, including the BIOS's hash (because the BIOS can abuse SMM). And creating such a secure solution is relatively easy for an osdever. No hardware is required except the USB stick. But it still won't work with Windows or Linux when automatic remote updates are turned on, simply because eventually some update can change something in the kernel. So, I see such updates as a really serious OS vulnerability, at least because Microsoft or any big Linux distributor can do anything they want with your PC in an opaque manner.

Post by embryo »

Antti wrote:I would compose the instructions if I knew how it should work.
It means you still have no clear picture of the booting process. But this is the place where everybody can ask and clarify his vision of anything OS-related.
Antti wrote:Besides, maybe this project is ten years too late. What if the MBR just printed the message: "Please use UEFI-compatible hardware."
It's too easy to keep the old (BIOS-related) parts of legacy OSes instead of just refusing to boot from the MBR on a PC that was built, for example, 5 years ago. So, I suppose such a message will become relevant somewhere in the 2030s or even later.

Post by Antti »

I have carefully read the latest UEFI specification (Section 5, GUID Partition Table). I am not sure whether it is allowed to have the GPT layout without a protective MBR.
UEFI wrote:5.2.3 Protective MBR

...
A Protective MBR may be located at LBA 0...
...

5.3.1 GPT overview

...
LBA 0 (i.e., the first logical block) contains a protective MBR...
...
If there is a valid protective MBR, there is only one legacy partition covering the whole disk starting from LBA 1. It is undefined, but allowed, to set the "bootable" flag. Of course, the partition starts with bytes "EFI PART", so it is not really very bootable. If it is not marked as bootable, then the BIOS may check that there are no bootable partitions and it may not load the MBR at all. Marking it as bootable and trusting that there are no BIOS versions skipping the MBR and loading LBA 1 even if there is a "non-FAT filesystem" may guarantee that on every computer the MBR code section gets jumped into.
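
The two checks discussed here can be sketched in NASM as follows. The sketch uses the partition type 0xEE for a protective MBR entry and the "EFI PART" signature at the start of the GPT header; the labels IsProtective and IsGptHeader are invented for the illustration, and it assumes ds = 0, an MBR relocated to 0x0600 and that the caller has already read LBA 1 into a buffer.

```nasm
; Sketch only. Assumes real mode, ds = 0, MBR relocated to 0x0600.
                bits 16
                org 0x0600

PART_TABLE      equ 0x0600 + 0x1BE      ; first partition entry of the MBR
GPT_TYPE        equ 0xEE                ; partition type of a protective MBR

IsProtective:   ; out: zf = 1 if entry 0 has type 0xEE and starts at LBA 1
                cmp byte [PART_TABLE+4],GPT_TYPE
                jne .done
                cmp word [PART_TABLE+8],1   ; start LBA, low word
                jne .done
                cmp word [PART_TABLE+10],0  ; start LBA, high word
.done:          ret

IsGptHeader:    ; in:  es:di = buffer holding LBA 1
                ; out: zf = 1 if it starts with "EFI PART"
                mov si,GptSig           ; ds:si = expected signature
                mov cx,8                ; signature length
                cld
                repe cmpsb              ; zf = 1 only if all 8 bytes match
                ret

GptSig:         db 'EFI PART'
```

Checking the header signature as well as the partition type helps with the "hybrid" case, where a GPT exists but the MBR entry is not a clean protective one.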

In reality, there seem to be "hybrid" configurations in which there are GPT partitions without a valid protective MBR.

Post by embryo »

Antti wrote:I have carefully read the latest UEFI specification (Section 5, GUID Partition Table). I am not sure whether it is allowed to have the GPT layout without a protective MBR.
UEFI wrote:5.2.3 Protective MBR

...
A Protective MBR may be located at LBA 0...
...

5.3.1 GPT overview

...
LBA 0 (i.e., the first logical block) contains a protective MBR...
...
There is also this wording:
UEFI wrote:5.2.1 Legacy Master Boot Record (MBR)

A legacy MBR may be located at LBA 0 (i.e., the first logical block) of the disk if it is not using the
GPT disk layout (i.e., if it is using the MBR disk layout). The boot code on the MBR is not executed
by UEFI firmware
So, in the case of legacy booting there should be a legacy MBR, and in the case of UEFI booting there should be a protective MBR. The word "may" is used correctly to show us that several variants are possible. And, obviously, the writer assumed that we are aware of those variants.
Antti wrote:If it is not marked as bootable, then the BIOS may check that there are no bootable partitions and it may not load the MBR at all.
If it is a UEFI system, then the BIOS should be aware of the situation and handle it correctly. And if it is a legacy system, then the BIOS just loads the first sector and starts its code. In fact this is complicated by checks that some BIOSes perform, but the main principle is always the same: the BIOS should work correctly with the MBR's data if the data follows the standards the BIOS expects.
Antti wrote:Marking it as bootable and trusting that there are no BIOS versions skipping the MBR and loading LBA 1 even if there is a "non-FAT filesystem" may guarantee that on every computer the MBR code section gets jumped into.
If you want to write your own bootloader instead of using the whole UEFI thing, the situation can be even more complicated. There should be information about how exactly UEFI handles the situation when the MBR is replaced with a legacy one. I suppose it can be found by carefully reading chapter 3 of the UEFI specification, where the UEFI firmware boot manager is described, but I haven't done this yet. At least the specification states that:
UEFI wrote:1.7 Migration Requirements

Migration requirements cover the transition period from initial implementation of this specification
to a future time when all platforms and operating systems implement to this specification. During
this period, two major compatibility considerations are important:
• The ability to continue booting legacy operating systems
But where in the specification the information about legacy booting is located, I just don't know. Maybe it is even specific to a particular BIOS implementation.