SSD's, RAID, and other storage stuff

Posted: Thu Aug 12, 2010 3:48 pm
by MDM
Hey, so I've got some questions for you all (I searched through the wiki/forums and didn't find any topics like this, so sorry if I missed something and am duplicating a topic). I know a little bit about normal HD driver algorithms, but I don't know much about reading/writing to SSDs or RAID controllers, or which algorithms are the most efficient for them. It seems that since an SSD doesn't have any moving parts, FCFS would be best, but I would like to make sure... Also, what about RAID? What type of algorithms are best for RAID controllers? Does the best RAID algorithm change depending on which drives are in the RAID array? Specifically, what about an SSD and a hard disk used in the same RAID array? What about RAID arrays with disks of different sizes? (Sorry for all the questions relating to this...)

Also, just a misc question... When I turn my computer on, it searches for the bootsector (0x7c00) on my hard disk (or does it search for the bootsector on the first partition...?). If it finds a bootable bootsector, it loads it, which then loads the bootloader, which (depending on which bootloader you have...) reads the partition table and asks the end user which partition to load. When you pick a partition, it loads another bootloader for that specific partition/the OS on that partition. How does it know where that bootloader is and what size it is? Is there a local bootsector for each partition as well as a global bootsector? How would it find that bootsector?

Re: SSD's, RAID, and other storage stuff

Posted: Thu Aug 12, 2010 5:11 pm
by bewing
Answer to #2 first: There are two competing boot methods these days, UEFI and legacy BIOS booting via the MBR. MBR booting is the one that has been around forever -- UEFI may or may not replace it in the future, and is much more complicated.
How MBR works: on a drive that has a partition table, the partition table and Master Boot Program are stored on absolute LBA 0 of the disk. The BIOS loads LBA 0 to 0x7c00 (one sector only). Then the BIOS transfers control (JMPs) to address 0x7c00 (the beginning of the Master Boot Program).
This program scans the partition table (in the last 66 bytes below 0x7e00: four 16-byte entries followed by the 2-byte 0x55 0xAA boot signature) to find the active partition on this disk. A normal MBR program expects to find only one active partition. The partition table entries are not in any particular order. A normal MBR program just boots the one active partition without asking any questions. (Booting a partition is explained just below.) Each partition is permanently assigned a range of LBAs on the disk.
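
A rough sketch of that scan, written in Python for readability (a real Master Boot Program would be 16-bit assembly). The function and helper names are made up for illustration; the layout assumed is the classic one, with the table at byte offset 446 of the sector and the start LBA and sector count stored as little-endian 32-bit values in each entry.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446             # 0x1BE: the 64-byte table starts here
ENTRY_SIZE = 16                # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of the sector
ENTRY_FORMAT = "<B3xB3xII"     # status, (CHS), type, (CHS), start LBA, sector count


def make_entry(status, ptype, start_lba, count):
    """Build one 16-byte table entry (hypothetical helper, used for the demo only)."""
    return struct.pack(ENTRY_FORMAT, status, ptype, start_lba, count)


def find_active_partitions(sector):
    """Return (index, type, start_lba, sector_count) for every active entry."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55 0xAA boot signature")
    active = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, ptype, start_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        # 0x80 marks an entry as active; type 0 means the slot is unused.
        if status == 0x80 and ptype != 0:
            active.append((i, ptype, start_lba, count))
    return active


# Demo: a fake sector whose second entry is the active one.
demo = bytearray(SECTOR_SIZE)
demo[446:462] = make_entry(0x00, 0x83, 2048, 1000)
demo[462:478] = make_entry(0x80, 0x0C, 4096, 2000)
demo[510:512] = BOOT_SIGNATURE
print(find_active_partitions(bytes(demo)))
```

The function returns a list rather than a single entry so the caller can decide what to do if it finds zero or several active partitions.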

It is possible to create an MBR program that allows the user to select which partition to boot. This is called "Dual Booting". There are other ways to do Dual Booting also.

A partition contains a single filesystem, plus a "bootsector" (if the partition is bootable). The bootsector is always at the very beginning of the partition. The contents of the filesystem and the bootsector correspond to the OS that is stored in that filesystem -- that is, the bootsector for each partition is always OS-specific.

A typical MBR only loads the very first sector of the selected boot partition (this is stupid and unfortunate -- a smart MBR loads more) -- and the data is loaded to 0x7c00 (this is also stupid and unfortunate). The MBR is supposed to store a copy of the partition table somewhere, and pass a pointer to the correct partition table entry to the OS-specific bootsector before the partition table gets overwritten.

If an OS-specific bootsector program is more than one sector long, it may have to load the rest of itself from disk into memory before it can be fully executed. The OS-specific bootsector may load and run a bootloader (called a "second stage bootloader") which may run a kernel-specific init program (which can be considered a third stage bootloader).

On media that do not have partition tables, the entire MBR step is skipped. The entire medium is one partition (with one filesystem) with an OS-specific bootsector at the beginning.


Answer to #1: RAID controllers need drivers, but the drivers do not have anything to do with the RAID algorithm. The algorithm is completely invisible to the OS. The OS reads/writes data from/to the device just like it was any other block storage device, and the controller works its own internal magic to do any needed mirroring/striping/rebuilding on the storage devices that it controls. So the overall answer is that you do not need to worry about any of this -- the hardware handles all of it for you, invisibly.

Re: SSD's, RAID, and other storage stuff

Posted: Thu Aug 12, 2010 5:45 pm
by MDM
Awesome, thanks for the help!

Re: SSD's, RAID, and other storage stuff

Posted: Thu Aug 12, 2010 6:09 pm
by Brendan
Hi,

Doh - bewing beat me to it! Hmm - post it anyway...
MDM wrote: Hey, so I've got some questions for you all (I searched through the wiki/forums and didn't find any topics like this, so sorry if I missed something and am duplicating a topic). I know a little bit about normal HD driver algorithms, but I don't know much about reading/writing to SSDs or RAID controllers, or which algorithms are the most efficient for them. It seems that since an SSD doesn't have any moving parts, FCFS would be best, but I would like to make sure...
Internally, an SSD works on large blocks (e.g. maybe 1 MiB blocks). For example, if you write 4096 bytes then the SSD might (internally) read 1 MiB, modify the 4096 bytes within it, then write the entire 1 MiB back. For maximum performance you might want to cache blocks in RAM (e.g. only ever read large blocks), and combine writes that affect the same "large block" into a single write (e.g. if someone wants to write 4096 bytes at offset 0 and someone else wants to write 4096 bytes at offset 0x8000, then you update the data in your cache and write the entire large block, rather than doing 2 separate/smaller writes).

Also, most modern OSes have I/O priorities. Taking this into account you'd want a list of pending read/write requests. You'd find the highest priority request in the list, work out which large block/s are needed for the highest priority request, then find any other pending requests in the list that affect the same large block, then combine all of those requests so that you only do 1 read (if necessary) and 1 write to satisfy all of the requests.
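
Something like the following is one way to picture that combining step. It is only a sketch: the 1 MiB block size is the guess from above rather than a property of any particular SSD, the `Request` type and both function names are invented for the example, only writes are handled, and each request is assumed to fit inside one large block.

```python
from collections import namedtuple

BLOCK_SIZE = 1024 * 1024   # assumed "large block" size; real devices vary

# Hypothetical request record: a larger priority value means more urgent.
Request = namedtuple("Request", "priority offset data")


def next_batch(pending):
    """Remove and return the most urgent write plus all writes to the same block."""
    if not pending:
        return None, []
    top = max(pending, key=lambda r: r.priority)
    block = top.offset // BLOCK_SIZE
    batch = [r for r in pending if r.offset // BLOCK_SIZE == block]
    # Keep only the requests that target other blocks (list is updated in place).
    pending[:] = [r for r in pending if r.offset // BLOCK_SIZE != block]
    return block, batch


def apply_batch(cached_block, block, batch):
    """Merge every write in the batch into the cached copy of the large block."""
    base = block * BLOCK_SIZE
    for r in batch:                      # arrival order, so later writes win
        start = r.offset - base
        if start + len(r.data) > BLOCK_SIZE:
            raise ValueError("request crosses a block boundary")
        cached_block[start:start + len(r.data)] = r.data
    return cached_block                  # caller writes this out once


pending = [
    Request(1, 0x0000, b"A" * 4096),
    Request(5, 0x8000, b"B" * 4096),
    Request(3, BLOCK_SIZE + 512, b"C" * 512),
]
block, batch = next_batch(pending)
cache = apply_batch(bytearray(BLOCK_SIZE), block, batch)
print(block, len(batch), len(pending))
```

In the demo the two writes at offsets 0 and 0x8000 end up in one batch, so the device would see a single large write instead of two small ones, and the request for the next block stays queued.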

Of course there are still cases where you'd want to do "partial block" reads/writes. For example, if there's very little chance that reading a full block will help you avoid future reads from that block (maybe your cache is tiny due to lack of free RAM) and the bottleneck is USB bandwidth; then reading/writing a partial block might be better.
MDM wrote: Also, what about RAID? What type of algorithms are best for RAID controllers? Does the best RAID algorithm change depending on which drives are in the RAID array? Specifically, what about an SSD and a hard disk used in the same RAID array? What about RAID arrays with disks of different sizes? (Sorry for all the questions relating to this...)
For software RAID you end up with a layer of software that sends read/write requests to the disk drivers themselves; and the disk drivers work the same as they would without RAID (and different drivers may use different algorithms to suit different device types). For hardware RAID the RAID controller sorts it out - you just pretend it's a single device (even though it's not). In this case you might want to attempt to do "large blocks" if any of the devices are SSDs, and you might want to attempt to limit head movement if any of the devices are hard disks (which means that for a mixture of hard disks and SSDs you might want to attempt to do both). It'd also depend on the RAID controller - maybe the RAID controller has its own caching, and maybe the RAID controller supports NCQ ("Native Command Queuing", where the controller can decide to do things in a different order to improve performance).
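
To give an idea of what that software RAID layer does before it hands a request to a disk driver, here is a minimal sketch of the address translation for plain striping (RAID 0). Striping is just the example picked here; the function name and the stripe size are assumptions, and mirroring or parity levels would need more than this.

```python
def raid0_map(logical_lba, num_disks, stripe_sectors):
    """Map a logical sector number to (disk index, sector number on that disk)."""
    if logical_lba < 0 or num_disks < 1 or stripe_sectors < 1:
        raise ValueError("invalid arguments")
    # Which stripe the sector falls in, and where inside that stripe.
    stripe_index, offset_in_stripe = divmod(logical_lba, stripe_sectors)
    # Stripes are dealt out to the disks round-robin, one row at a time.
    row, disk = divmod(stripe_index, num_disks)
    return disk, row * stripe_sectors + offset_in_stripe


# Two disks, 8 sectors per stripe (hypothetical geometry).
for lba in (0, 8, 16, 27):
    print(lba, raid0_map(lba, num_disks=2, stripe_sectors=8))
```

The layer would then pass the translated sector number to the driver for the chosen disk, and that driver is free to schedule it however suits the device.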
MDM wrote: Also, just a misc question... When I turn my computer on, it searches for the bootsector (0x7c00) on my hard disk (or does it search for the bootsector on the first partition...?). If it finds a bootable bootsector, it loads it, which then loads the bootloader, which (depending on which bootloader you have...) reads the partition table and asks the end user which partition to load. When you pick a partition, it loads another bootloader for that specific partition/the OS on that partition. How does it know where that bootloader is and what size it is? Is there a local bootsector for each partition as well as a global bootsector? How would it find that bootsector?
For "PC BIOS" systems; when you turn the computer on the BIOS searches for a bootable device. Usually there's a search order that can be controlled by BIOS settings (e.g. "first floppy, then first CD, then first hard drive"). For different types of devices the BIOS does different things.

For floppy disks and hard disks (and USB devices that look like a disk to the BIOS) the BIOS loads the first sector (usually 512 bytes) from the device to 0x00007C00 and jumps to it. That's all the BIOS does. The first sector might contain a partition table and a piece of code to find a bootable partition (a partition marked as "active" in the partition table), load the first sector from the bootable partition and jump to it. It's possible to partition a floppy disk or have a hard drive that isn't partitioned - the BIOS doesn't know/care (but most OSs expect floppies to have no partitions and expect hard drives to be partitioned).

For CD-ROMs the BIOS searches for a special table (the "boot catalogue") on the CD, which tells the BIOS if the CD is bootable and how to boot it. There are 3 options here. The entry in the boot catalogue might tell the BIOS to emulate a floppy disk (and tell the BIOS where a floppy image is on the CD); or the boot catalogue might tell the BIOS to emulate a hard disk (and tell the BIOS where a hard disk image is on the CD). In both of these cases it ends up being like a real floppy/hard disk (the BIOS modifies its "disk services" functions to make the image on the CD look like a real floppy/hard drive; then loads the first 512 bytes from the emulated floppy/hard disk and jumps to it). The other option is "no emulation". In this case the BIOS loads "n" sectors from somewhere on the CD and then jumps to them; where the number of sectors, the starting sector and the address to load them at are all determined from the entry in the boot catalogue. In theory this allows you to have a huge boot loader (e.g. 400 KiB) and you can ask the BIOS to load it at 0x00001000 or anywhere else that is sane for real mode code (instead of 0x00007C00). In practice some BIOSs are buggy and it's better to use 0x00007C00 as the load address (and only load 1 sector if you can). For "no emulation" CD boot, you can use the BIOS disk services to read (2048 byte) sectors from the CD; and if the boot loader needs to load more files from CD it can (but the boot loader should probably have code to support the CD's file system, e.g. ISO9660).
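
For anyone who wants to see what such a boot catalogue entry holds, here is a small sketch that decodes the 32-byte initial/default entry. The field layout is the one I understand the El Torito specification to use (boot indicator, media type, load segment, system type, sector count, start sector), so check it against the spec before relying on it; the function name and the returned dictionary keys are made up for the example.

```python
import struct

MEDIA_TYPES = {
    0: "no emulation",
    1: "1.2 MB floppy",
    2: "1.44 MB floppy",
    3: "2.88 MB floppy",
    4: "hard disk",
}


def parse_initial_entry(entry):
    """Decode the 32-byte initial/default entry of a boot catalogue."""
    if len(entry) != 32:
        raise ValueError("an initial/default entry is 32 bytes long")
    # indicator, media type, load segment, system type, (pad), sectors, start
    indicator, media, segment, _system, count, start = struct.unpack_from(
        "<BBHBxHI", entry)
    return {
        "bootable": indicator == 0x88,
        "emulation": MEDIA_TYPES.get(media & 0x0F, "unknown"),
        # A load segment of 0 means "use the traditional 0x07C0".
        "load_address": (segment or 0x07C0) << 4,
        "sector_count": count,     # counted in 512-byte virtual sectors
        "start_sector": start,     # counted in 2048-byte CD sectors
    }


# Made-up entry: bootable, no emulation, default segment, 4 sectors at sector 27.
sample = struct.pack("<BBHBxHI", 0x88, 0, 0, 0, 4, 27) + bytes(20)
print(parse_initial_entry(sample))
```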

For network boot, the BIOS starts code in the network card's ROM. The network card's ROM broadcasts a (slightly special) request for DHCP info, and hopefully a server returns a (slightly special) DHCP info packet containing the normal information (IP address, subnet mask, etc) with some additional/extra information (the IP address for a TFTP server and a file name to download from that TFTP server). Then the network card's ROM contacts the TFTP server and tries to download the file into memory at 0x00007C00, and then jumps to this file. In this case the boot loader (that was downloaded and jumped to) can use a special API provided by the network card's ROM to access network services (including downloading more files if necessary). There are some variations on this that are only supported in some cases (e.g. some network cards might allow files to be downloaded by normal FTP or HTTP); and it's possible for the "network card's ROM" to actually be normal code (e.g. booted from a disk) that is pretending to be a network card's ROM (but actually isn't). For this to work you need a DHCP server that supports it (and you need to configure it so the correct computers start the correct file, which can be a little tricky if different computers boot different files as you need to rely on ethernet MAC addresses) and you need a TFTP server somewhere too (not necessarily on the same computer as the DHCP server).
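
As a rough illustration of the "additional/extra information" in that reply, the sketch below pulls the offered address, the boot server address and the boot file name out of the fixed part of a DHCP/BOOTP packet. It assumes the server fills in the fixed `siaddr` and `file` fields; servers can also carry this information in DHCP options, which this sketch ignores. The function name and the sample values are invented.

```python
import socket

FIXED_HEADER_SIZE = 236   # BOOTP fixed fields, before the options area


def parse_boot_reply(packet):
    """Return (offered IP, boot server IP, boot file name) from a reply packet."""
    if len(packet) < FIXED_HEADER_SIZE:
        raise ValueError("packet is shorter than the fixed BOOTP header")
    if packet[0] != 2:                    # op field: 2 means a reply
        raise ValueError("not a reply packet")
    your_ip = socket.inet_ntoa(bytes(packet[16:20]))     # yiaddr
    server_ip = socket.inet_ntoa(bytes(packet[20:24]))   # siaddr
    # The file field is 128 bytes, NUL-terminated.
    raw_name = bytes(packet[108:236]).split(b"\x00", 1)[0]
    return your_ip, server_ip, raw_name.decode("ascii")


# Build a made-up reply by hand to show which fields are read.
reply = bytearray(FIXED_HEADER_SIZE)
reply[0] = 2
reply[16:20] = bytes([192, 168, 0, 50])
reply[20:24] = bytes([192, 168, 0, 1])
name = b"boot/loader.bin"
reply[108:108 + len(name)] = name
print(parse_boot_reply(reply))
```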


Cheers,

Brendan

Re: SSD's, RAID, and other storage stuff

Posted: Fri Aug 13, 2010 12:41 pm
by MDM
Okay, thanks!

Re: SSD's, RAID, and other storage stuff

Posted: Tue Aug 17, 2010 3:10 pm
by MDM
When you get assigned your LBAs, are you forced to adjust hard drive calls by the partition offset, or does it normalize your hard disk calls for you?

Re: SSD's, RAID, and other storage stuff

Posted: Tue Aug 17, 2010 7:39 pm
by bewing
LBAs are always absolute numbers -- the sector offset from the very first sector of the device. Neither the HD nor the controller cares about "partitions" and they certainly do not adjust the LBA values with any regard to partitions. You have to adjust your own hard drive calls by the LBA offset of the partition (which you need to save in a variable).
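
In code, that adjustment is just an addition plus a bounds check. A minimal sketch, assuming you saved the partition's start LBA and length from its partition table entry (the class and method names are hypothetical):

```python
class Partition:
    """Remembers where a partition lives so callers can use relative sectors."""

    def __init__(self, start_lba, sector_count):
        self.start_lba = start_lba          # saved from the partition table entry
        self.sector_count = sector_count

    def to_absolute(self, relative_lba, count=1):
        """Translate a partition-relative sector number to an absolute LBA."""
        if relative_lba < 0 or count < 1:
            raise ValueError("invalid sector range")
        if relative_lba + count > self.sector_count:
            raise ValueError("access would run past the end of the partition")
        return self.start_lba + relative_lba


part = Partition(start_lba=2048, sector_count=1000)
print(part.to_absolute(10))   # the absolute LBA to hand to the disk driver
```

The bounds check is optional, but it stops a filesystem bug from scribbling on a neighbouring partition.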