[answered] A question on GPT and Filesystem Detection

Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Post by Ethin »

Hello all,
I have been working on reading the GPT partition table and working my way up to implementing a filesystem, currently Ext2. I have a couple of questions about this process:
1) Wikipedia indicates that the GUIDs (partition type GUID, partition GUID, ...) are in "mixed endian". What is the appropriate procedure for converting these "mixed endian" GUIDs into actual numbers? Rust allows me to manipulate and store 128-bit integers. Currently I just take the whole GUID, convert the words from the ATA device to little-endian bytes, and then convert those into 128-bit integers. Will this suffice? As an example, my kernel currently shows the following data about the single partition I have (created via NBD):
  • Partition type GUID: e47d47d8693d798e477284830fc63daf
  • Partition GUID: 53b18f9bf4bf21bf402586be28bc51bf
  • Starting LBA: 2048
  • Ending LBA: 8388574
  • Attribute flags: 0
  • Partition name: Linux filesystem
2) What is the proper procedure for verifying CRC32s for GPT? I tried a simple CRC32 algorithm and it failed verification. I'm also a bit confused about precisely what to input into the CRC32 algorithm and what to leave out, for both the partition entries and the GPT header. (My kernel indicates that the GPT header CRC is c53dfce8 and that the partition entries CRC is c5701f45.)
3) What is the method for detecting what filesystem is on the partition?
Last edited by Ethin on Mon Jan 13, 2020 10:21 am, edited 1 time in total.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: A question on GPT and Filesystem Detection

Post by bzt »

Ethin wrote:What is the appropriate procedure for converting these "mixed endian" GUIDs into actual numbers?
Here's how I print them on a little-endian machine (in C):

typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} __attribute__((packed)) guid_t;
printf("%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",guid->Data1,guid->Data2,guid->Data3,
        guid->Data4[0],guid->Data4[1],guid->Data4[2],guid->Data4[3],guid->Data4[4],guid->Data4[5],guid->Data4[6],guid->Data4[7]);
So the first three fields are little-endian, and the last is big-endian.
But if you don't have to convert between ASCII and binary representations, then comparing two 16-byte arrays should suffice. For checking IDs in GPT, 128-bit integers are perfect.
Ethin wrote:Rust allows me to manipulate and store 128-bit integers. Currently I just take the whole GUID, convert the words from the ATA device to little endian bytes, and then convert those into 128-bit integers
No need to do endianness conversions as long as you specify those integers the same way as they are stored on disk in the GPT (and printed by your kernel). If they don't match, just flip the constant in your source and it should work.
Ethin wrote:2) What is the proper procedure for verifying CRC32s for GPT? I tried a simple CRC32 algorithm and it failed verification.
Haha, there are many CRC32 algorithms. The Castagnoli polynomial won't work, and you have to start with all bits set to 1 and XOR the result with all 1s. It's called ANSI X3.66 CRC-32. Here's an implementation (in C, sorry) that I use to generate GPT in an image file, and it calculates correct values. I think it should be trivial to convert this function into Rust.
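For reference, a minimal bitwise sketch of that CRC-32 variant (reflected polynomial 0xEDB88320, initial value and final XOR of all ones). This is not the implementation linked above, and `crc32_ieee`, `rd32le` and `gpt_header_crc_ok` are names invented here. The header check assumes the UEFI layout: HeaderSize at byte offset 12, HeaderCRC32 at byte offset 16, a minimum header size of 92 bytes, and a CRC computed over HeaderSize bytes with the CRC field itself treated as zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC-32 as used by GPT: reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, result XORed with 0xFFFFFFFF. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Verify the header CRC of a GPT header held in `hdr` (`avail` bytes,
 * read from LBA 1). Returns 1 on match, 0 otherwise. */
static int gpt_header_crc_ok(const uint8_t *hdr, size_t avail)
{
    uint8_t copy[512];
    if (avail < 92)
        return 0;
    uint32_t size = rd32le(hdr + 12);          /* HeaderSize */
    if (size < 92 || size > avail || size > sizeof copy)
        return 0;
    uint32_t stored = rd32le(hdr + 16);        /* HeaderCRC32 */
    memcpy(copy, hdr, size);
    memset(copy + 16, 0, 4);                   /* CRC field counts as zero */
    return crc32_ieee(copy, size) == stored;
}
```

The partition entry array CRC uses the same function, applied to NumberOfPartitionEntries times SizeOfPartitionEntry bytes starting at PartitionEntryLBA, with nothing zeroed out.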
Ethin wrote:3) What is the method for detecting what filesystem is on the partition?
You could use the "Partition type GUID" field, but I would rather recommend checking for magic bytes in the superblock on the partition; that's more bullet-proof. So read a few sectors from "Starting LBA" into memory, and in the case of ext2 check whether the 16-bit word at offset 56 of the third 512-byte sector (byte offset 1080 from the start of the partition) is 0xEF53 (if I recall correctly, please consult the spec for the correct offset and magic value).
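A minimal sketch of that check, assuming the caller has already read the beginning of the partition into a buffer. The function and macro names are made up for this example. Note that ext3 and ext4 use the same magic value, so a match only says "ext family".

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024u  /* bytes from the start of the partition */
#define EXT2_MAGIC_OFFSET      56u    /* s_magic within the superblock */
#define EXT2_MAGIC             0xEF53u

/* `part` holds the first `len` bytes of the partition, read starting at
 * its first LBA. Returns 1 if the ext2/3/4 magic is present, 0 otherwise. */
static int looks_like_ext2(const uint8_t *part, size_t len)
{
    size_t off = EXT2_SUPERBLOCK_OFFSET + EXT2_MAGIC_OFFSET;
    if (len < off + 2)
        return 0;
    /* s_magic is stored little-endian: 0x53 first, then 0xEF. */
    uint16_t magic = (uint16_t)(part[off] | (part[off + 1] << 8));
    return magic == EXT2_MAGIC;
}
```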

Cheers,
bzt
BenLunt
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA

Re: A question on GPT and Filesystem Detection

Post by BenLunt »

Ethin wrote:1) Wikipedia indicates that the GUIDs (partition type GUID, partition GUID, ...) are in "mixed endian". What is the appropriate procedure for converting these "mixed endian" GUIDs into actual numbers?
I really don't have any comment on this one. Why do you care to manipulate the already given GUID? Are you asking about creating a GUID?
Ethin wrote:2) What is the proper procedure for verifying CRC32s for GPT? I tried a simple CRC32 algorithm and it failed verification. I'm a bit confused on precisely what to input into the CRC32 algorithm and what to leave out, as well, for both partition entries and for the GPT header. (My kernel indicates that the GPT header CRC is c53dfce8 and that the partition entries CRC is c5701f45.)
The EFI specification states:
"Unless otherwise specified, UEFI uses a standard CCITT32 CRC algorithm with a seed polynomial value of 0x04c11db7 for its CRC calculations."
There is a nice implementation (in C) at https://github.com/nyx0/Dexter/blob/mas ... ce/CRC32.c. As bzt says, you should be able to translate.
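The quoted polynomial 0x04c11db7 and the constant 0xEDB88320 that shows up in many CRC-32 implementations are the same polynomial with the bit order reversed; code that processes each byte least-significant bit first uses the reversed form. A small sketch to check that relationship (the function name `reverse_bits32` is made up here):

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value. */
static uint32_t reverse_bits32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);  /* move the lowest bit of v into r */
        v >>= 1;
    }
    return r;
}
```

reverse_bits32(0x04C11DB7) evaluates to 0xEDB88320.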
Ethin wrote:3) What is the method for detecting what filesystem is on the partition?
There is no standard. From the EFI specs, the System Partition is to be a FAT partition. Nothing else is specified. It is up to your code to detect. However, you should not have to detect the file system. Your Partition Boot should be written to already know what the partition's file system is. You wouldn't put a FAT partition boot code on an ext2 partition, would you? :-)

Ben
- http://www.fysnet.net/osdesign_book_series.htm

Re: A question on GPT and Filesystem Detection

Post by Ethin »

BenLunt wrote:I really don't have any comment on this one. Why do you care to manipulate the already given GUID? Are you asking about creating a GUID?
No, I am asking about interpreting a GUID for displaying to the user.
BenLunt wrote:The EFI specification states:
"Unless otherwise specified, UEFI uses a standard CCITT32 CRC algorithm with a seed polynomial value of 0x04c11db7 for its CRC calculations."
There is a nice implementation (in C) at https://github.com/nyx0/Dexter/blob/mas ... ce/CRC32.c. As bzt says, you should be able to translate.
Thanks, I'll translate that.
BenLunt wrote:There is no standard. From the EFI specs, the System Partition is to be a FAT partition. Nothing else is specified. It is up to your code to detect. However, you should not have to detect the file system. Your Partition Boot should be written to already know what the partition's file system is. You wouldn't put a FAT partition boot code on an ext2 partition, would you? :-)
That... wasn't really what I was asking. I was asking if there was some way I could determine what FS a disk has so that I can read it. My kernel does not take kernel command line arguments and I'd rather not assume that every disk that I start reading is EXT2 formatted.
zaval
Member
Posts: 658
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: A question on GPT and Filesystem Detection

Post by zaval »

As Ben said, you don't have to mess around with the endianness of the GUID unless you are generating it. If you have to compare, then compare the two GUIDs field by field, so to say: Guid0->Data1 with Guid1->Data1, etc. bzt showed the correct way of organizing it, except that packing is not needed there. It's exactly how UEFI represents it. Treating them as some fancy 128-bit integers should also work, but I'd omit such pseudo-features in low-level code and use a structure to represent it. Who knows how the Rust compiler would decide to organize those 128-bit integers, which don't exist at the machine level. Comparing them (which is your ultimate need) will still be done on a machine-word basis anyway.
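A minimal sketch of that field-by-field comparison, reusing the struct layout from bzt's post without the packed attribute (the members are naturally aligned, so on common ABIs the struct is 16 bytes with no padding, which the static assertion checks). `guid_equal` and `GUID_LINUX_FS` are names invented for this example; the constant is the Linux filesystem data type GUID, written as the field values a little-endian machine sees after loading the on-disk bytes.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} guid_t;

/* No padding expected: 4 + 2 + 2 + 8 bytes, all naturally aligned. */
_Static_assert(sizeof(guid_t) == 16, "guid_t must be exactly 16 bytes");

/* 0FC63DAF-8483-4772-8E79-3D693D477DE4 (Linux filesystem data) */
static const guid_t GUID_LINUX_FS = {
    0x0FC63DAFu, 0x8483, 0x4772,
    { 0x8E, 0x79, 0x3D, 0x69, 0xD8, 0x47, 0x7D, 0xE4 }
};

/* Compare two GUIDs field by field. Returns 1 if they are equal. */
static int guid_equal(const guid_t *a, const guid_t *b)
{
    return a->Data1 == b->Data1 &&
           a->Data2 == b->Data2 &&
           a->Data3 == b->Data3 &&
           memcmp(a->Data4, b->Data4, sizeof a->Data4) == 0;
}
```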

As for identifying the FS, I really like the GPT feature of uniquely typing an FS through the mentioned field. Unfortunately, many OS vendors decided to give that field a slightly different meaning than FS type or class (say, whether it is NTFS or JFS). When dealing with others' FSs, you need to take this into account and make additional checks inside the FS boot sector or superpuper block, or whatever they call it, of the FS volume. But for your own FS, you definitely could use that field. Very convenient.

As for the CRC, Ben is right: you need to look at what the UEFI spec says, the polynomial mentioned for example (funny, there is also an endianness mess about it, but anyway, that's not a story for typing on a tablet). For concrete certainty, I suggest you take a look at the edk2 implementation; it's really quite simple, even though it's about those polynomials. They moved the link and changed the code, so it's even easier now, since it uses a precalculated table (instead of generating it, as it did last time I checked in 2016 :D). This is surely what all the BIOSes do, since that's the code they are based on.
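To illustrate the table-based approach, here is a minimal sketch that generates the 256-entry table at startup and then consumes one byte per lookup. It is not the edk2 code, and the names `crc_table`, `crc32_init_table` and `crc32_table_driven` are made up; a precalculated table would simply hold the same 256 values as constants.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Fill the lookup table for the reflected polynomial 0xEDB88320.
 * Must be called once before crc32_table_driven(). */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[n] = c;
    }
}

/* Same CRC-32 variant as GPT uses: initial value and final XOR of all ones. */
static uint32_t crc32_table_driven(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```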

For displaying to users, the format is also specified in that RFC, and is described in the UEFI spec as well: the so-called "registry" format. It shows the first fields in LE and the last in BE. The appendix in the spec describes it in detail; have a look at it. Here, treating GUIDs as 128-bit integers won't work, of course.
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to lost of all the target park boards (russians destroyed our town).

Re: A question on GPT and Filesystem Detection

Post by Ethin »

The file system detection I'm talking about is something like what Linux does. When I run the command:

sudo mount /dev/sdb1 /mnt
How does Linux determine the filesystem on /dev/sdb1? Does it go through every single FS module that is available, see which one succeeds and which one fails, and go with the one that succeeds? I know Linux is a far more complicated beast than my kernel is, and I know that a lot goes into that, but let's assume that /dev/sdb1 is on just a standard ATA disk in PIO mode. What would Linux do to determine what FS driver to use?
The reason I'd like to know this is that I'd rather not assume that the disk contains any particular FS. I'd like either the kernel to determine it and pass control to the appropriate FS driver, or have the FS driver take care of it, but I don't want to go calling every possible FS that my kernel may implement. That seems like an unnecessary waste of both CPU cycles and port accesses.

Re: A question on GPT and Filesystem Detection

Post by bzt »

Ethin wrote:No, I am asking about interpreting a GUID for displaying to the user.
See my printf example. 4 bytes LE, 2 bytes LE, 2 bytes LE then 8 bytes BE.
Ethin wrote:That... wasn't really what I was asking. I was asking if there was some way I could determine what FS a disk has so that I can read it.
Check the magic bytes in the superblock. That's the best and most bullet-proof solution.
Ethin wrote:How does Linux determine the filesystem on /dev/sdb1? Does it go through every single FS module that is available and sees which one succeeds and which one fails and goes with the one that succeeds?
That depends on what you have in /etc/fstab. If there's an fs specified, then it will just try to mount with that one. If you have "auto" there, or when the device is not listed in fstab, and there's no "-t" option to mount, then it iterates on fs modules to see which one can mount the device. Fs modules then check for magic to recognize the superblock. For example, the code for ext2 that checks the magic is here.
Ethin wrote:I don't want to go calling every possible FS that my kernel may implement. That seems like an unnecessary waste of both CPU cycles and port accesses.
Not really. You only do this once, when you issue the "mount" command. You don't load the sectors for each module; you just load them once, and then the modules work exclusively on the memory image (so it's fast). How many fs modules do you have? 2 or 3? Even under Linux there are no more than 20 or 30, so it's a small loop even if you install all fs modules.
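A minimal sketch of such a loop, assuming the start of the partition has been read into memory once and every driver only inspects that buffer. All names here (`fs_driver`, `probe_ext2`, `probe_fat`, `detect_fs`) are hypothetical, and the FAT probe is deliberately rough: it only looks at the boot sector signature, where a real one would also validate the BPB fields.

```c
#include <stddef.h>
#include <stdint.h>

/* A probe inspects an in-memory copy of the start of the partition and
 * returns 1 if it recognizes its filesystem, 0 otherwise. */
typedef int (*fs_probe_fn)(const uint8_t *buf, size_t len);

struct fs_driver {
    const char *name;
    fs_probe_fn probe;
};

static int probe_ext2(const uint8_t *buf, size_t len)
{
    /* s_magic: little-endian 0xEF53 at byte offset 1024 + 56 */
    return len >= 1082 && buf[1080] == 0x53 && buf[1081] == 0xEF;
}

static int probe_fat(const uint8_t *buf, size_t len)
{
    /* Rough check only: boot sector signature 0x55 0xAA at offset 510. */
    return len >= 512 && buf[510] == 0x55 && buf[511] == 0xAA;
}

/* Probes run in order, so the more specific checks go first. */
static const struct fs_driver drivers[] = {
    { "ext2", probe_ext2 },
    { "fat",  probe_fat  },
};

/* Return the name of the first driver whose probe matches, or NULL. */
static const char *detect_fs(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++)
        if (drivers[i].probe(buf, len))
            return drivers[i].name;
    return NULL;
}
```

The disk is touched once to fill the buffer; after that the whole detection is a handful of memory comparisons per driver.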

Cheers,
bzt

Re: A question on GPT and Filesystem Detection

Post by Ethin »

bzt wrote:See my printf example. 4 bytes LE, 2 bytes LE, 2 bytes LE then 8 bytes BE.
Check the magic bytes in the superblock. That's the best and most bullet-proof solution.
Thanks, I'll do that. Implementing EXT2 will be difficult.
bzt wrote:That depends on what you have in /etc/fstab. If there's an fs specified, then it will just try to mount with that one. If you have "auto" there, or when the device is not listed in fstab, and there's no "-t" option to mount, then it iterates on fs modules to see which one can mount the device. Fs modules then check for magic to recognize the superblock. For example, the code for ext2 that checks the magic is here.
Thanks again. Good to know.
bzt wrote:Not really. You only do this once, when you issue the "mount" command. You don't load the sectors for each module; you just load them once, and then the modules work exclusively on the memory image (so it's fast). How many fs modules do you have? 2 or 3? Even under Linux there are no more than 20 or 30, so it's a small loop even if you install all fs modules.
True that. Thanks, everyone. :)