Re: Why we address RAM directly but use file system for HDD?
Posted: Sun Jan 13, 2013 9:32 pm
by linguofreak
We access disks through a filesystem because the primary thing we use disks for is storing files (because we want them to stick around after we have powered off the computer).
When we store things that aren't files on disks (such as when we page memory out to a swap partition) we don't use a filesystem. When we have files that don't need to stick around after we power off the computer, we often store them in a filesystem in RAM to make them faster to access. We also often cache files from our disk filesystem in RAM.
So "direct" access is not unique to RAM, and access through a filesystem is not unique to disks.
Re: Why we address RAM directly but use file system for HDD?
Posted: Sun Jan 13, 2013 10:22 pm
by trinopoty
amn wrote:trinopoty wrote:Back to topic.
What if a program stores data at LBA 0x2000 and another program later modifies the data thinking it is its own?
I have never seen or heard of anyone being advised to read or write unallocated data. The way it has traditionally been done is that one asks the allocator for memory and gets a pointer back, knowing the extent of the area. The process memory allocator keeps track of what is what in the heap, and another process cannot safely read or write that data - a segmentation fault will be raised in hardware, courtesy of x86 protected mode. If two processes want to share memory they use some form of shared memory API. Now all this of course pertains to the more traditional POSIX systems.
The scenario you described is no more or less applicable to a system I had described earlier (where disk is accessed using same abstraction that memory uses)
Am I missing something?
Precisely.
The program will ask for free space on disk. And how does the OS keep track of free space on disk? Using some data structure we call "File System".
You can use numbers instead of names to identify the data on disk (it will be a bit difficult for the user to use).
Allowing a program to access a disk like a big fixed-size file is another thing entirely.
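To make the "numbers instead of names" idea concrete, here is a minimal sketch of free-space tracking with a block bitmap. The class name `BlockAllocator` and its methods are hypothetical and purely illustrative; a real on-disk structure would also have to persist the map and survive crashes.

```python
class BlockAllocator:
    """Tracks which fixed-size blocks of a disk-like region are in use."""

    def __init__(self, total_blocks):
        # One flag per block; False means the block is free.
        self.used = [False] * total_blocks

    def allocate(self, count):
        """Reserve the first free run of `count` blocks and return its start."""
        if count < 1:
            raise ValueError("count must be at least 1")
        run_start, run_len = 0, 0
        for block, in_use in enumerate(self.used):
            if in_use:
                # The run is broken; restart just past this block.
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self.used[b] = True
                return run_start
        raise MemoryError("no free run of %d blocks" % count)

    def free(self, start, count):
        """Mark `count` blocks starting at `start` as free again."""
        for b in range(start, start + count):
            self.used[b] = False
```

A program would then refer to its data by the returned block number rather than by a file name, which is workable for software but, as noted above, awkward for a human user.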
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 5:45 am
by amn
Indeed, the OS keeps track of the free space on the disk using the "data structure" known as the filesystem. The problem here lies not in this simple assertion, but in the fact that the file system provides the user with much more than what is strictly required, and does not allow one to opt out of the abstraction.
A heap allocator also keeps track of free space in memory, but it does not lock one into files or inodes or anything like that.
The difference between the two may be blurred but is also of certain importance - we end up accessing memory directly while we access the disk through filesystem. Both give us safety and security.
And to conclude:
Indeed, we can say that we don't access memory DIRECTLY. But the abstraction we go through is much THINNER than the abstraction known as the filesystem. We can even say that the memory access abstraction is as thin as possible - whatever is done is done on the motherboard or by configuring the MMU in order to SAFELY access the resource. It does not impose files or other higher-level abstractions on us. You get a buffer and you are free to read and write to it.
And you forget one thing when you mention that the user may use numbers instead of files but it wouldn't be convenient. I am not advocating the abolition of file systems as such; I simply want them to step aside and be optional, letting developers write their own disk access abstractions based on the requirements of their applications. One particular thing to keep in mind here is that not all applications need users to specify their files - most such applications simply read a file specified by a filename known at compile time, which is around for the lifetime of the application installation. Moreover, such files are often not designed to be serviceable by users, and so such applications do not benefit from files at all - they might just read their designated sector(s) from disk directly when starting up.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 5:49 am
by sandras
Just putting it out there, but maybe you'd like to look into exokernels, and, maybe, Forth systems?
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 5:55 am
by amn
Sandras wrote:Just putting it out there, but maybe you'd like to look into exokernels, and, maybe, Forth systems?
Hi Sandras, yes, that is what I have been reading up on for the past three years. Indeed, this topic is very closely related to exokernels. This thread was a way to take the temperature, so to speak, and also to really try to see how experienced developers see the issue, without me mentioning exokernels explicitly.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 5:58 am
by amn
linguofreak wrote:We access disks through a filesystem because the primary thing we use disks for is storing files (because we want them to stick around after we have powered off the computer).
No, we use disks for storing files because we access disks through a file system. You got it the other way around, I think. The whole point of my thread was to pose and try to answer the following question - are file systems a necessity or a convenience? It appears that many developers are simply hardwired into thinking the former, but it is simply not true - no matter how prevalent the concept of files is, it is not the be-all and end-all of disk storage.
So "direct" access is not unique to RAM, and access through a filesystem is not unique to disks.
Of course. Which is what I was also aiming at - abstractions should co-exist, not monopolize our thoughts. File systems are a VERY CONVENIENT AND EFFICIENT abstraction for most of the applications we write today, but they should not be the mandatory disk access layer, merely one of the alternatives. That way, those few applications that do not need them need not suffer the extra cost, which may be substantial depending on their particular design patterns. And "few applications" can be used by many users, so they (applications) do matter.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 6:26 am
by trinopoty
The word "File" is an abstraction at the same level as "File System". You see, when we store data to disk, we do so in entities termed "files". A directory is a collection of other files and directories. I agree that not all of the features provided by file systems are necessary for all applications.
Applications requiring a leaner way to store data without a lot of extra (and useless) metadata just use their own *file system* on top of the existing one.
A lot of the features we have today were developed early and no one considered removing them. Of course, some of the features are rarely used.
As for the overhead, it does not matter very much as we now get a lot of storage at lower prices.
If I understand your question correctly, I would say we use a filesystem because it's more convenient.
Of course, your question can be answered on more than one level depending on the one answering it.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 6:30 am
by trinopoty
amn wrote:
Of course. Which is what I was also aiming at - abstractions should co-exist, not monopolize our thoughts. File systems are a VERY CONVENIENT AND EFFICIENT abstraction for most of the applications we write today, but they should not be the mandatory disk access layer, merely one of the alternatives. That way, those few applications that do not need them need not suffer the extra cost, which may be substantial depending on their particular design patterns. And "few applications" can be used by many users, so they (applications) do matter.
A common end-user is not willing to sacrifice simplicity and convenience for those *few* applications to work properly, because those few applications belong to a very specialized category and the common end-user has no need for them. For others that do use them, I think they are smart enough to know exactly what to do and how to do it.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 6:55 am
by iansjack
If a disk is dedicated to a single process (I was going to say "partition" but there we are already introducing some extra structure) then direct disk access is certainly possible. Early database systems worked like that (and I'm sure some modern ones do too). But as soon as more than one process is to access the resource then you need to start imposing some structure on it; that's where we start using a filesystem; how complicated that is depends upon the entire system.
But, if you want, you can memory map a file and access it as if it were RAM. This is really not a great deal more complicated than the way that handles to other resources are requested and used, or even simple memory management via malloc(). In a system that runs more than one process it is just too complicated to let each process contain the logic necessary to manage either RAM or disk (or any other hardware for that matter). Something has to arbitrate; the memory allocation routines in the case of RAM, the filesystem in the case of disk storage. You might as well ask why we don't just access the video screen directly. Without some disk management by the kernel (and that's then a filesystem) what is to stop processes overwriting important information?
How complicated that filesystem is is another question. What you call the arbitration process doesn't matter. But you can't just let separate programs allocate and manage resources on their own. That would be a recipe for disaster.
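As a rough illustration of the memory-mapping point above, here is a small sketch using Python's standard `mmap` module. The helper name `poke_file_like_ram` and the throwaway demo file are assumptions made for the example; the kernel and the filesystem still arbitrate underneath, the program just sees a byte buffer.

```python
import mmap
import os
import tempfile

def poke_file_like_ram(path, offset, data):
    """Map the whole file into memory and overwrite bytes at `offset`."""
    with open(path, "r+b") as f:
        # Length 0 maps the entire file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as view:
            # Plain slice assignment, exactly as with an in-memory buffer.
            view[offset:offset + len(data)] = data
            view.flush()

# Demo on a throwaway 4 KiB file (hypothetical stand-in for real data).
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)
os.close(fd)

poke_file_like_ram(demo_path, 100, b"hello")

with open(demo_path, "rb") as f:
    contents = f.read()
os.remove(demo_path)
```

Note that the mapping cannot grow the file: the slice being assigned must stay within the mapped length, which mirrors the "you get a buffer of known extent" model discussed earlier in the thread.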
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 7:49 am
by amn
trinopoty wrote:
A common end-user is not willing to sacrifice simplicity and convenience for those *few* applications to work properly, because those few applications belong to a very specialized category and the common end-user has no need for them. For others that do use them, I think they are smart enough to know exactly what to do and how to do it.
Smart enough does not equal a solution. There are smart developers everywhere, but if, say, your technical manager told you to implement raw disk access for the sake of maximum speed in Linux, how will you circumvent the file system layer? You can't unless you are willing to ask your users to either have an unused partition available or install a secondary physical storage entity, whether it be a hard drive or a so-called pen drive. Linux, for one, does not allow end-users to re-partition their hard drive to find free sectors to use. Most of the time, the entirety of available internal physical storage is strictly governed by the file system. See, smart people sit in jails too
Another important point to remember here is that sometimes these *few* applications hold the weight of an entire system, as far as the users are concerned. Examples of such applications are the tools of the trade of various groups of users - photo-editing tools that index your photo libraries, webservers for hosting companies, etc. In fact, an operating system that loads faster is often a deciding factor for many people out there - and booting off using a myriad of small fragmented configuration files, as is the case of many modern setups, is a good target for optimization, which can be done using another form of disk access, if just for booting process. All in all, if runtime speed is a limiting factor, users want more. And it is exactly the smart developers you speak of who could afford to circumvent the file system and develop their own for that speed edge over their competitors, if they were allowed to, that is. If the kernel provides safe raw disk access, which the MIT exokernel more or less proved can be done, these developers need not concern themselves with cross-process security, only with the actual file system as such.
But I am not trying to merely advocate for exokernels, I am trying to isolate the difference between different modes of access as pertaining to very similar (but not equal) storage mediums.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 8:00 am
by Combuster
amn wrote:Linux, for one, does not allow end-users to re-partition their hard drive to find free sectors to use. Most of the time, the entirety of available internal physical storage is strictly governed by the file system. See, smart people sit in jails too
This smart person has been repartitioning a live disk, the one from which the running linux was booted, without incurring data loss, thus proving the possibility
Looks like chroot'ing yourself is not helpful at all - or actually, it was
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 8:06 am
by bluemoon
amn wrote:but if, say, your technical manager told you to implement raw disk access for the sake of maximum speed in Linux, how will you circumvent the file system layer?
Tell him not to do premature optimizations, nor to couple a default solution (raw disk access) with the goal (increase speed - of what?).
To increase disk access speed you will probably get better performance from a caching algorithm, optimized access patterns, or hardware upgrades.
Furthermore, direct raw sector access does not imply a shorter time to complete a particular piece of work - it may get worse in some situations due to seek time.
amn wrote:You can't unless you are willing to ask your users to either have an unused partition available or install a secondary physical storage entity, whether it be a hard drive or a so-called pen drive. Linux, for one, does not allow end-users to re-partition their hard drive to find free sectors to use. Most of the time, the entirety of available internal physical storage is strictly governed by the file system. See, smart people sit in jails too
The second best would be to allocate a contiguous file and manage it yourself, and hopefully the system overhead is negligible compared to disk access time.
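A minimal sketch of that "one big file, managed by the application" approach might look like the following. The slot size, the function names and the demo path are all hypothetical. `os.posix_fallocate` is Unix-only and reserves space up front, but neither it nor the `truncate` fallback guarantees that the blocks are physically contiguous; that remains up to the underlying filesystem.

```python
import os
import tempfile

SLOT_SIZE = 512  # hypothetical record size, chosen to resemble a sector

def create_store(path, slots):
    """Create a file of slots * SLOT_SIZE bytes, reserving space if possible."""
    size = slots * SLOT_SIZE
    with open(path, "wb") as f:
        try:
            os.posix_fallocate(f.fileno(), 0, size)  # Unix only
        except (AttributeError, OSError):
            f.truncate(size)  # fallback: sets the length, may be sparse

def write_slot(path, slot, payload):
    """Store `payload` in the given slot, zero-padded to SLOT_SIZE."""
    if len(payload) > SLOT_SIZE:
        raise ValueError("payload larger than one slot")
    with open(path, "r+b") as f:
        f.seek(slot * SLOT_SIZE)
        f.write(payload.ljust(SLOT_SIZE, b"\0"))

def read_slot(path, slot):
    """Return the raw SLOT_SIZE bytes stored in the given slot."""
    with open(path, "rb") as f:
        f.seek(slot * SLOT_SIZE)
        return f.read(SLOT_SIZE)

# Demo in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "store.bin")
create_store(demo_path, slots=4)
write_slot(demo_path, 2, b"config-v1")
record = read_slot(demo_path, 2).rstrip(b"\0")
store_size = os.path.getsize(demo_path)
os.remove(demo_path)
os.rmdir(demo_dir)
```

The application addresses its data by slot number only; the host filesystem is involved just once, to name and size the container file.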
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 8:51 am
by Kevin
amn wrote:Smart enough does not equal a solution. There are smart developers everywhere, but if, say, your technical manager told you to implement raw disk access for the sake of maximum speed in Linux, how will you circumvent the file system layer? You can't unless you are willing to ask your users to either have an unused partition available or install a secondary physical storage entity, whether it be a hard drive or a so-called pen drive. Linux, for one, does not allow end-users to re-partition their hard drive to find free sectors to use. Most of the time, the entirety of available internal physical storage is strictly governed by the file system. See, smart people sit in jails too
First of all, cases in which having raw disk access improves performance are very rare. Probably it's not a good idea to circumvent the file system layer (including the kernel's page cache and scheduling mechanisms) in the first place, because performance will suffer instead of being improved.
Second, what you describe is true by definition. In this thread, any data structure that could possibly partition a disk into smaller chunks has been defined to be a file system. If you don't want a file system, your application can't share the disk with other applications and will need the whole disk by definition.
Now if we ignore that requirement, "repartition to find free sectors" sounds a lot like LVM to me, which will dynamically partition a disk into smaller blobs. It doesn't have subdirectories and stuff, but logical volumes still have a name, and in order to organise them you still have data structures on the disk. This is maybe the level of abstraction that is closest to what an MMU is to RAM, but as you can see there's still no fundamental difference from a file system.
and booting off using a myriad of small fragmented configuration files, as is the case of many modern setups, is a good target for optimization, which can be done using another form of disk access, if just for booting process.
Do you seriously think that the average performance gets better if every application has to do the whole work itself instead of leaving it to the file system? Do you even have a rough idea of how much optimisation work is going into file systems? Duplicating that work in each application sounds like it's not going to happen, therefore performance will suffer in the end.
Re: Why we address RAM directly but use file system for HDD?
Posted: Mon Jan 14, 2013 11:47 am
by linguofreak
amn wrote:Smart enough does not equal a solution. There are smart developers everywhere, but if, say, your technical manager told you to implement raw disk access for the sake of maximum speed in Linux, how will you circumvent the file system layer?
Make the program doing the direct disk access setgid disk (or setuid root) and access the disk directly through /dev/sda (assuming Ubuntu 10.04 with SATA disks). Technically it still goes through "the filesystem", but it doesn't have anything to do with the on-disk filesystem.
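For illustration, reading one sector by LBA could be sketched as below. The function name `read_sector` and the 512-byte sector size are assumptions; opening a real device node such as /dev/sda needs the privileges described above, so the demo runs against an ordinary temporary file standing in for the device. `os.pread` is available on Unix-like systems.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption; some disks expose 4096-byte sectors

def read_sector(device_path, lba):
    """Return the bytes of sector number `lba` from a device or image file."""
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # pread reads at an absolute byte offset without moving a file cursor.
        return os.pread(fd, SECTOR_SIZE, lba * SECTOR_SIZE)
    finally:
        os.close(fd)

# Demo: a two-sector image file used as a stand-in for a block device.
fd, image_path = tempfile.mkstemp()
os.write(fd, b"A" * SECTOR_SIZE + b"B" * SECTOR_SIZE)
os.close(fd)

sector = read_sector(image_path, 1)
os.remove(image_path)
```

On a real device the same call would go through the kernel's block layer and page cache but not through any on-disk filesystem, which is the distinction being drawn here.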
Re: Why we address RAM directly but use file system for HDD?
Posted: Tue Jan 15, 2013 3:54 pm
by Cadav3r
Did anyone realize that you can have a file act as a mountable filesystem?
For example, at the OS level, you can have exampleOS.vhd, storage.vhd, whatever, and even access them from 3rd party apps (see the leanfs example on the freedos32 site, or virtualbox).
At the kernel level, you can build a module or library to give the kernel access through an API and have non-filesystem space in part of the kernel space or as a partition.