OSDev.org

Posted: **Wed May 03, 2017 3:07 pm**

Hello all!

With AQUA 1.3, I'd like to have a proper file system. I am currently working on my own file system, and I'd like to know if this would be the correct way of doing thing. I'll exclude stuff that aren't important to the file system, like the sectors for language and swap partition. So here is an example, as I'm not good at explaining things. it has 9 sectors:

(start of partition)
1. This is a long text file that should continue for 512 bytes... (File1)
2. ...And it continues on to sector 2... (File1)
3. Here is another file.\0 (File2)
4. ... but wait! there's more!\0 (File1)
5. (any number of unused sectors between file allocation table and content)
6. 3\0 (sector 3)
7. File2\0directory1\0 (dir comes after filename)
8. 1-2,4\0 (from sector 1 to 2 and sector 4)
9. File1\0directory2\0 (max filename length is 512 - directory length chars)
(end of partition)

Note that when I write 3 or 1-2,4 the numbers don't stay in base 10.

Should this be a good way of "implementing" a file system into my OS?
Thanks in advance!

Posted: **Wed May 03, 2017 5:15 pm**

I have not implemented a file system yet, but this does not seem realistic to me. Some things that jump out at me (and I'm sure there are more to consider):

* how do you know the length of a file?
* how would you list a directory's contents?
* how would you find a free block?
* perhaps you want to write 16 blocks and want the to be contiguous for performance -- could you support best fit? or even first fit?
* what if you have a file at sectors 1, 4, 6, and 8 and sector 5 frees up... your file expands to another sector, where does it go?
* another way to look at the previous, how do you determine what the next sector for the file is?

I recommend you review the existing file systems like FAT and Ext2 to see what make up a usable filesystem before you try to invent a new one.

Posted: **Wed May 03, 2017 6:27 pm**

Creating a File System of your own can be one of the highlights of OS development. I have created one myself, mainly to check to make sure my virtual file system driver was file system independent, but maybe more so, because I wanted to.

Anyway, if I were you, I would first study how other file systems are built.

Of course the (in)famous FAT file system: http://www.fysnet.net/docs/fatgen103.pdf
My (simple) file system: http://www.fysnet.net/fysfs.htm
The LeanFS: https://www.fysnet.net/leanfs/index.php (which I have implemented a modified/added to version of my own)
Then of course there is the Ext2/3/4, ISO9660, Joliet ISO, HPFS, Minix FS, Btrfs file systems. You name it, there are many file systems available.

Just search for file systems on a wiki of your choosing. https://en.wikipedia.org/wiki/List_of_file_systems shows quite a few.

Anyway, first study how other file systems store file names (meta data), allocation units, free space, etc., and then decide if and how you want to create your own. Each have limitations. However, the lessor the limitation, the more complicated the file system.

I am a little fond of File Systems and have implemented many within my code. If you have any questions, post them here and we will see what we can do to help.

Ben
http://www.fysnet.net/fysos.htm

Posted: **Thu May 04, 2017 6:06 am**

obiwac wrote: (start of partition)
1. This is a long text file that should continue for 512 bytes... (File1)
2. ...And it continues on to sector 2... (File1)
3. Here is another file.\0 (File2)
4. ... but wait! there's more!\0 (File1)
5. (any number of unused sectors between file allocation table and content)
6. 3\0 (sector 3)
7. File2\0directory1\0 (dir comes after filename)
8. 1-2,4\0 (from sector 1 to 2 and sector 4)
9. File1\0directory2\0 (max filename length is 512 - directory length chars)
(end of partition)

A couple of questions that may help you:
1) In your example the file metadata is at the end, are you intending that these are always kept at the end of the partition? If so, that is a known location so you can use it, but why the end instead of the beginning? By known location I mean any location you can address, because you have to have some starting point, though usually that is at the beginning, not the end.
2) Related to the above point, are you intending this FS to be read from end to beginning? Because first you know the filename you want to access and based on that you need to know the sectors, however your listing has sectors first and then filenames. Again if this is intended to be read bottom-top order then it makes sense, but why that order? If you don't have a reason then I'd do it the "normal" way which would be top-bottom.
3) Why reverse the file and directory in your file entries? I'm guessing your hoping to either make the code simpler or perform better, but think it thru, might turn out to be just the opposite.
4) Is "8" supposed to be a single sector? Aren't you wasting like 500 bytes on having the sector info take up a whole sector? Note, it's not just the bytes on the disk, you need to read that whole sector to memory to be able to know where the actual file is, which means you're now wasting 500B of RAM as well, and ~60B cache when ever you access that sector to know where the file is. This can easily start to add up, though for a simple FS performance might not be that important.
5) Related to the above one, what if you have a large file, can you store all of its sector data in a single sector? How are you planing on storing that and what max size limit will it set to a single file?
6) You use "\0" between "File1" and "directory2", was it supposed to mean a "terminating zero"? If so, how do you know how many parts there are? For example if you dir1\dir2\dir3\file66, if each "\" is a null, how do you know how many parts there are? Or are you only going to support single directory depth? That is, no files in root, and all files always either 1 or more directories deep, and no files in any of the other directory depths either.. which doesn't seem like a good idea.
7) You've also placed a 512B limit on "full" file names (including path), while legacy FS's have those as well you probably shouldn't try to repeat their mistakes intentionally =)

Using just as simple but slightly re-ordered version of yours:
(start of partition)
1. directory2\File1\0 (max filename length is 512 - directory length chars)
2. 5-6,8\0 (from sector 5 to 6 and sector

3. directory1\File2\0
4. 7\0 (sector 7)

5. This is a long text file that should continue for 512 bytes... (File1)
6. ...And it continues on to sector 6... (File1)
7. Here is another file.\0 (File2)
8. ... but wait! there's more!\0 (File1)
9. (any number of unused sectors between last file data and end of partition)
(end of partition)
NOTE: I created that with a quick copy/paste and edit, might have some simple mistakes in it..

Comparing mine and yours, which is better and why? What benefit does yours have over this one? I think this is conceptually easier (for humans) to read and understand.

There are several improvements that can be made though they add minor complexity (though so minor that I can't see a reason not to do them, then there's even better improvements but those add more complexity which probably isn't minor). I would also think about storing other meta data besides name and path:
- Size. If you have a 100MiB file you have to read every sector of that file to reach the final terminating zero to know size of file. Note, this is why C strings are stupid and C++ std::string stores its size.
- I would store paths separately so that finding files is easier (you don't have to scan every filename entry, just the root level directory and then follow from there, this reduces the search space hugely in normal cases.

I had other things as well, but don't have the time to go into those now.. note these basic things are what pretty much every FS does, though they might do the slightly differently due to their "lower" level "nature" but the principals are largely the same.

Posted: **Thu May 04, 2017 6:30 am**

Look at this link for a book well worth reading by anyone designing a file system.

http://www.nobius.org/~dbg/

Posted: **Sun May 07, 2017 6:11 pm**

Ok thanks!
I have indeed probably overlooked a lot of things. I think I'll put it off for now.

Posted: **Mon May 08, 2017 5:56 pm**

i have 256 byte seector size
the first byte indicates if the block is free, it is a file, or a directory name
if file, last two 8 byte is the previous-next block pointer.
if directory or file name, the last two byte is the internal directory structure pointer - next file system entry pointer

area loss is 7% + 256 byte for every file

Posted: **Mon May 08, 2017 6:40 pm**

Geri wrote:i have 256 byte seector size
the first byte indicates if the block is free, it is a file, or a directory name
if file, last two 8 byte is the previous-next block pointer.
if directory or file name, the last two byte is the internal directory structure pointer - next file system entry pointer

area loss is 7% + 256 byte for every file

Given that it is a virtual system definition (even for a hardware implementation, there would needed to be a translation between the high-level specification and the actual drives), why would the Dawn spec define something like that at all?

Every hard drive in production since the 1970s has used hard sectors in order to simplify tracking by the read head; the sector sizes in hardware cannot be changed. In most newer drives, the peripheral interconnect standard defines the sector size, meaning that all drives that use (e.g.) SATA have the same sector size, or at least, they expose the same size read blocks (no current-use drive controller allows the CPU to read from the disk directly, so no matter the actual hardware the system sees the same read block sizes).

SATA drives using NAND Flash storage follow this, but only to allow the same peripheral connections to work on the drives; much of the SATA standard imposes restrictions which are meant to match the limitations of spinning platters, but aren't relevant to a solid-state drive. the newer NVMe standards suites is basically the same design minus the parts which are specific to disks, which IIUC is the main reason they are faster - they simply have less operational overhead per packet read or written.

The newer forms of NVRAM (which isn't just 3D XPoint, as MRAM and some other technologies are also starting emerge), which can operate at speeds comparable to DDR or DDR2 DRAM, are meant to bypass even that (though Intel introduced 3D XPoint as an NVMe model first because it is easier to change the firmware going into the m.2 sockets than to redesign the memory buses - Optane is less a long-term solution than a tech demo and a teaser for the NVDIMM memory subsystems, buying them time to convince the motherboard designers and OS vendors to support it - both Microsoft and a consortium of Linux distros have already started on that, but the mobos need to be redesigned before it gets into use).

Most of the designs involving accessing NVRAM as RAM rather than block devices that are coming into position assume that the high-capacity storage (e.g., HDDs and SSDs) would be treated as spill memory - that is, the memory controller would silently evict NVRAM to disk, and refill it when needed, in a model similar to current-day virtual memory - and the OS itself wouldn't even see it. Everything would be mapped to random-access memory in a storage hierarchy controlled in hardware, using a RAM disk design for file systems initially but with files themselves slowly being replaced by persistent memory objects.

This means that even if you have a contract in your hands with (say) Global Foundry right now for a SubLeq CPU (even just as an ASIC), by the time it is taped out, the sizes of drive sectors will be as relevant as the sizes of buggy whip handles.

In other words, committing to a sector size at all would be a mistake. It's the sort of thing you should be abstracting away, not enshrining in the design.

And if you think that your maker friends will be up to designing home-made hard drives, well, I wish them luck - seriously, that's no joke there, if someone can actually come up with a method of producing even medium-to-high capacity (which in this context means 1MB or higher) persistent storage that could be built on maker-scale with scratch equipment, it would be far more revolutionary than any new CPU design could ever be (well, without quantum logic, at any rate).

Posted: **Mon May 08, 2017 9:28 pm**

Hi,

Schol-R-LEA wrote:In other words, committing to a sector size at all would be a mistake. It's the sort of thing you should be abstracting away, not enshrining in the design.

I'd be tempted to go a step further than this.

Currently different devices have different block sizes - e.g. 512 byte sectors for floppy and most/older hard drives, 2 KiB sector for CD/DVD, 4 KiB sectors for newer hard drives, and whatever the MTU happens to be networked file systems (e.g. maybe about 1500 bytes). When you move a file system from one device to another (e.g. from an old hard drive to a new hard drive, from a "*.iso" to a DVD, etc) the underlying hardware changes and so does the underlying hardware's block size. This means that you mostly have to be able to deal with "file system block size doesn't match underlying hardware's block size"; which in turn means that the file system's block size should really be considered more of a performance hint and can't be considered as a "thing that must be obeyed".

If the file system's block size is just a performance hint (and may not match the underlying hardware's block size); then it doesn't really make that much sense to build a fixed block size into a file system design or into the storage device driver interface in the first place.

For example, you could implement the storage device driver interface to provide byte granularity; (e.g. so that a file system could ask to read or write 3 bytes at offset 1234 if it wanted to, and storage device driver would read/write the smallest number whole block/s it can to satisfy the file system's request, including doing "read, modify, then write" to modify arbitrary bytes within a block if/when necessary), and then design the storage device driver interface to provide optimisation hints on request - not just underlying block size, but also information like if seek times increase with distance, if there's "groups of blocks" (cylinders), etc. The file system could also provide (and be designed for) byte granularity; but could/would ask the storage device driver for the actual device's optimisation information and use that for optimisation purposes (for allocating space, and for improving the efficiency of any caches managed by the file system code).

This would also mean that you can have more flexibility in any "middle layers" between the file system and storage device driver. For example, you could have something that steals 8 bytes from each block to use as a checksum (so that if the storage device driver says to optimise for 4096-byte blocks, the file system would be told to optimise for 4088-byte blocks); or you could have a layer that does compression/decompression (that tells the file system there's no block size to optimise for); etc. These "middle layers" could be chained in whatever permutation the end user felt like (e.g. maybe file system talks to a layer that adds checksums, which talks to a layer that does compression/decompression, which talks a layer that does encryption, which talks to a layer that does RAID, which talks to the underlying storage device drivers); and regardless of which middle layers are used the file system would still be told to optimise for whatever block size makes sense (if any) and wouldn't break if/when you move a file system between different real and/or virtual devices (e.g. in a "cp /dev/sda1 /dev/vsd123" way, where "/dev/vsd123" is the virtual storage device created by "middle layers").

Cheers,

Brendan

Posted: **Tue May 09, 2017 8:33 am**

Schol-R-LEA wrote:why would the Dawn spec define something like that at all?

dividing and %-ing with 256 is fast, becouse it requires no division, just a simple shift. (as subleq cant divide from hardware. )
hardware sector size and hardware built-up is not relevant since its the job of the hardware (and/or the firmware), but the file system tries to write into 4096 contignous bytes to avoid memory cell/ssd cell exhaustion.

Posted: **Tue May 09, 2017 8:43 am**

Geri wrote:but the file system tries to write into 4096 contignous bytes to avoid memory cell/ssd cell exhaustion.

SSD erase size is usually 128Kb+, just sayin'

Posted: **Tue May 09, 2017 9:41 am**

dozniak wrote:
Geri wrote:but the file system tries to write into 4096 contignous bytes to avoid memory cell/ssd cell exhaustion.
SSD erase size is usually 128Kb+, just sayin'

i am aware. also some pendrives having large blocks. but i need ,,holes'' to find free spaces quickly on the surface if needed, as i dont have a map that indicates them.

(i checked it, not 4k, 13824 byte is the typical contignous byte size in the os)

Posted: **Tue May 09, 2017 10:31 am**

dozniak wrote:
Geri wrote:but the file system tries to write into 4096 contignous bytes to avoid memory cell/ssd cell exhaustion.
SSD erase size is usually 128Kb+, just sayin'

Yes and no, SSDs and eMMC have a flash transition layer (FTL) in HW or FW. In this case the SSD just exposes itself like a block device like it was a normal hard drive. The 128KB erase block size is hidden under the hood by the FTL and the file system SW doesn't necessarily need to pay attention to it.

My recommendation is not to pay too much attention to limitations of flash memory and instead design a file system that works with an "ideal block device". Flash memory technology tend change quickly, at least in the past and hunting those change will just slow down the file system development. Let the FTL do that work for us instead. There are flash dedicated file systems like YAFFS2 or F2FS intended to work on RAW NAND but I would even bother with that if I designed a file system.

Posted: **Tue May 09, 2017 12:21 pm**

bare NAND is not that rare if you are gonna conquer an SBC landscape (like me

). I have 5 boards and 2 of them have that as their main secondary storage. And by the way, erase unit is way bigger than 128 KB on them, rather 1MB.

OSDev.org

Is this the correct implementation of a file system?

Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?

Re: Is this the correct implementation of a file system?