Why we address RAM directly but use file system for HDD?

tom9876543 · Post by **tom9876543** » Fri Jan 11, 2013 8:42 pm

Blacklight wrote:
tom9876543 wrote:Paging to disk - can you explain how the APPLICATION knows which sectors to use on the disk? The disk would have to be ORGANISED into a file system?
We're treating an HDD as we do RAM here. The kernel has a table of some sort internally to keep track of what's used. This would be stored somewhere on the disk for later use and reuse. It's a necessity for any sane organization of storage, RAM or HDD or CD or whatever have you.

tom9876543 wrote:Person runs the CREATE application and closes it.
Person then runs the EMAIL application.
How do they share data?
Same way as you do with a filesystem, just instead of a file name, use the linear address of a sector. It's not a pretty system, but it works. Alternatively, if you don't close CREATE first, it can pass the address directly to EMAIL.

tom9876543 wrote:OK so the CREATE application has 5000 different movies.... good luck going back to it a week later and finding the one you want to work on.
The CREATE application would have to have its own little database similar to the Mac Finder.... and then ALL applications would need their own "finder" which is a significant amount of coding effort and they would all be inconsistent.
So... a filename database?

1) Treating the HDD the same as RAM? Are you sure about that? To me that says the process (application) uses ONLY memory pointers to access "memory" and has no concept of disk. Now what happens when the pointer is 32 bits and the file is larger than 4 GB?

2) That would NOT work. How does the EMAIL application even know what movies the CREATE application has created? Apparently the CREATE application will have its own internal list of movies. The CREATE application has closed so the EMAIL application has no idea about the internals of the CREATE application.

3) You seem to agree it is stupid for each APPLICATION to have its own Mac Finder / filename database. I think iTunes is a good example of an application having its own database and not really requiring a file system. But is it practical for EVERY application to have its own different interface to manage documents?
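
On point 1, one common way to keep pointer-style access when the data is larger than the address space is to map only a window of it at a time. The following is a minimal sketch, assuming Python's standard `mmap` module and a small temporary file standing in for a large one; `read_window` is a hypothetical helper, not something proposed in the thread.

```python
import mmap
import os
import tempfile

def read_window(path, offset, length):
    """Return `length` bytes starting at byte `offset`, mapping only that window."""
    granularity = mmap.ALLOCATIONGRANULARITY
    base = (offset // granularity) * granularity   # a mapping must start on a boundary
    delta = offset - base                          # position of `offset` inside the window
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as window:
            return window[delta:delta + length]

# Demo file: three allocation units filled with a repeating byte pattern.
size = 3 * mmap.ALLOCATIONGRANULARITY
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(i % 251 for i in range(size)))

# Read 4 bytes from the second allocation unit without mapping the rest.
chunk = read_window(demo_path, mmap.ALLOCATIONGRANULARITY + 10, 4)
os.remove(demo_path)
```

The application still needs a 64-bit file offset to pick the window, so a 32-bit pointer alone cannot name every byte of a file larger than 4 GB.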

Brendan · Post by **Brendan** » Fri Jan 11, 2013 10:41 pm

Hi,

In both cases you've typically got something to identify the data, and the data itself. Humans naturally prefer to use names as identifiers (e.g. "hello.txt" or "char *myTextBuffer"). Computers naturally prefer to use numbers as identifiers. To make both people and computers happy you have "name to number" conversion.

For file systems the "name to number" conversion means using a file name to find a location on disk. For programming languages the "name to number" conversion means using a symbol (e.g. variable name) to find a virtual address.

The main difference is when this conversion occurs. For file systems the conversion typically occurs when the file is opened; and for programming languages the conversion typically happens when you compile the source code.
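
As a rough illustration of the file system side of this, the sketch below resolves a name to a number at open time. The directory contents, the sector size and the helper `open_by_name` are all assumptions made up for the example.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes

# Hypothetical directory: file name -> (first sector, length in sectors).
directory = {
    "hello.txt": (2048, 1),
    "movie.avi": (4096, 300),
}

def open_by_name(name):
    """Resolve a name to (byte offset, byte length); this is the 'name to number' step."""
    first_sector, sector_count = directory[name]
    return first_sector * SECTOR_SIZE, sector_count * SECTOR_SIZE

# The lookup happens once, when the file is opened; after that only numbers are used.
offset, length = open_by_name("hello.txt")
```

A compiler does the equivalent lookup for `myTextBuffer` once, at build time, which is why no table of names needs to exist while the program runs.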

Of course programming tends to use a lot of indirection - e.g. "the data that is pointed to by the data at myName", where only the pointer has a name and the data the pointer points to doesn't. In theory file systems could do this too (e.g. a symbolic link with a name that points to a second file that has no name) but I doubt anybody has ever actually wanted to bother with unnamed files.

The other difference is security. Some data is "owned" by only one process and no security is needed, and some data you want potentially many processes to be able to access and some sort of security is needed. Where some sort of security is needed you need some sort of permission check where the data can't be accessed if permission is denied (and if you want you could replace "open(fileName)" with "check_if_I_have_permission(fileName)").
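
A minimal sketch of such a check at open time might look like the following. The tables, user names and `open_checked` are hypothetical; a real file system would keep this information in its on-disk metadata.

```python
# Hypothetical tables standing in for file system metadata.
locations = {"hello.txt": 2048, "secret.txt": 2049}    # name -> first sector
readers = {
    "hello.txt": {"alice", "bob"},                     # name -> users allowed to open
    "secret.txt": {"alice"},
}

def open_checked(user, name):
    """Return the file's location only if `user` is allowed to open it."""
    if user not in readers.get(name, set()):
        raise PermissionError(f"{user} may not open {name}")
    return locations[name]

alice_handle = open_checked("alice", "secret.txt")

try:
    open_checked("bob", "secret.txt")
    bob_denied = False
except PermissionError:
    bob_denied = True
```

Data owned by a single process needs no such table, because no other process can name it in the first place.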

Basically you end up with 2 completely different things for completely different purposes - something where names/identifiers can be converted into numbers in advance, having data without a name can make sense and no security is needed; and something where the names/identifiers can't be converted into numbers in advance, having data without a name doesn't make much sense and security is needed.

Note that this has nothing to do with disk vs. RAM - you can have swap space, and you can have file systems in RAM. However, different types of hardware have different characteristics - RAM is typically faster than disk, RAM is typically much smaller than disk and RAM is typically volatile. Because of these different characteristics, it makes sense to use faster/smaller/volatile RAM for software and slower/larger/non-volatile storage for file systems.

Some time in the next 5 years I'm expecting non-volatile RAM to become an economically viable option. If/when this happens it may make sense to shift (at least some) file systems into RAM (e.g. files that are accessed often). Even if all data could be in non-volatile RAM (which is unlikely due to size differences) you'd still want a file system due to the different usage (name lookup and permission checks on "open()"). Offtopic note: economically viable non-volatile RAM would be an interesting thing for OS research that could radically change the way (some) OSs are designed; but in the end I suspect that it'll only be used for "hibernate" and persistent VFS caches and there won't be any significant change to OS design.

Cheers,

Brendan

linguofreak · Post by **linguofreak** » Fri Jan 11, 2013 11:05 pm

Brendan wrote:Offtopic note: economically viable non-volatile RAM would be an interesting thing for OS research that could radically change the way (some) OSs are designed; but in the end I suspect that it'll only be used for "hibernate" and persistent VFS caches and there won't be any significant change to OS design.

Other than economics, aren't there also problems with write speeds for most modern non-volatile RAM being considerably slower than read speeds?

Brendan · Post by **Brendan** » Sat Jan 12, 2013 1:17 am

Hi,

linguofreak wrote:
Brendan wrote:Offtopic note: economically viable non-volatile RAM would be an interesting thing for OS research that could radically change the way (some) OSs are designed; but in the end I suspect that it'll only be used for "hibernate" and persistent VFS caches and there won't be any significant change to OS design.
Other than economics, aren't there also problems with write speeds for most modern non-volatile RAM being considerably slower than read speeds?

As far as I can tell, Everspin's ST-MRAM is as fast as normal DRAM. Their problem seems to be a lack of motherboard and OS support, leading to a lack of demand, leading to a lack of volume, leading to higher prices and lower densities (90nm process).

Basically it's already a viable alternative to volatile RAM in the technical sense, it's just not a viable alternative in the economic sense.

Cheers,

Brendan

rdos · Post by **rdos** » Sat Jan 12, 2013 4:16 am

I have an interface (for user mode) to do raw disc read/write operations. For one, it is used to change/add/delete partition tables when installing RDOS on a machine. I also use it for data storage. I have a storage class that is backed by disc and consists of a list of items of the same size, which are typically C structures. I use fixed sectors (from the end of the disc) for this. For this scheme to work, I typically leave a number of sectors at the end of the disc unpartitioned. I feel this is a lot safer, as the data is not lost if the file system becomes corrupt. I plan to implement long-term storage (MID regulations) using this method.
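
A scheme along these lines could be sketched as follows. This is not the RDOS implementation: the record layout, the number of reserved sectors and the class name `RawRecordStore` are assumptions, and an in-memory buffer stands in for the raw disc.

```python
import io
import struct

SECTOR_SIZE = 512
RESERVED_SECTORS = 4                    # assumed: sectors left unpartitioned at the end
RECORD = struct.Struct("<IHd")          # assumed record: id, flags, value (14 bytes)
RECORDS_PER_SECTOR = SECTOR_SIZE // RECORD.size

class RawRecordStore:
    """Fixed-size records stored in the last sectors of a disc, with no file system."""

    def __init__(self, disk, total_sectors):
        self.disk = disk
        self.first_sector = total_sectors - RESERVED_SECTORS

    def _offset(self, index):
        sector, slot = divmod(index, RECORDS_PER_SECTOR)
        if not 0 <= sector < RESERVED_SECTORS:
            raise IndexError("record index outside the reserved area")
        return (self.first_sector + sector) * SECTOR_SIZE + slot * RECORD.size

    def write(self, index, record_id, flags, value):
        self.disk.seek(self._offset(index))
        self.disk.write(RECORD.pack(record_id, flags, value))

    def read(self, index):
        self.disk.seek(self._offset(index))
        return RECORD.unpack(self.disk.read(RECORD.size))

TOTAL_SECTORS = 64
disk = io.BytesIO(bytes(TOTAL_SECTORS * SECTOR_SIZE))   # stand-in for the raw device
store = RawRecordStore(disk, TOTAL_SECTORS)
store.write(40, 7, 1, 2.5)
record = store.read(40)
```

Because every record has the same size, its location follows from its index alone, so no allocation table is needed.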

amn · Post by **amn** » Sat Jan 12, 2013 5:48 am

Kevin wrote: That's the job of the OS. It usually associates a memory context with a running process, that has a name, creation date, etc. All of this metadata is stored somewhere in RAM, like filesystem metadata is stored on disk.

As far as I know, there is no "name", "creation date" or anything like that associated with a process heap. Perhaps on the more exotic systems, but not on vanilla Linux systems. Such metadata, even though necessary and present to some degree, is an order of magnitude more rudimentary than what a file system has. There is certainly no transaction support or versioning facility for memory heaps, as is often the case with modern filesystems.

So what is a file system?

That was huge part of my whole point - unlike simpler memory allocation and management, a file system, in addition to providing resource security and protection, unfortunately(?) imposes a layer of abstraction for the sake of simplicity. Resource multiplexing and protection are absolutely essential in any multi-user multi-tasking OS, but what we get on top with file systems is not strictly necessary. It's more of a tradition, really. We don't need journaling on disk any more than we need it in memory (which is not to say we don't need it, in fact I think we do).

Can you do without malloc()? I guess so. Would it be a good idea? Probably not. It's the same for using raw disks without a file system.

I didn't say we don't need `malloc()`. My argument was about not needing `open` requiring a filename with a path for on-disk data. Allocation is necessary in a multi-user multi-tasking OS; it lets the resource be shared among concurrent clients. The added layer of abstraction dealing with files, however, is not a necessity; it's a luxury, a commodity, a privilege, all depending on your application and yourself. In any case, when I spoke of memory vs. disk access, I did not want to focus on allocation, but on access. I mean, why don't we write directly to bytes and bits we allocate on the disk? And why don't we [typically] use filenames, transactions, and journaling when dealing with RAM?

Kevin · Post by **Kevin** » Sat Jan 12, 2013 4:04 pm

amn wrote:As far as I know, there is no "name", "creation date" or anything like that associated with process heap.

Not with the memory context (it's more than just the heap) itself, but with the process to which the memory context belongs.

Such metadata, even though necessary and present to some degree, is an order of magnitude more rudimentary than what a file system has. There is certainly no transaction support, or versioning facilities for memory heaps, as is often the case with modern filesystems.

Metadata exists in both cases, so it's not a fundamental difference. You could add the missing metadata from the filesystem to a memory context with no major problems, and vice versa.

And of course some kind of transactions is required on RAM. Basically anything you do for thread synchronisation belongs to that category, starting with simple locks. RCU comes very close to the traditional understanding of transactions. And databases implement the real thing anyway, which obviously includes data in RAM.

One difference in that respect, however, is that RAM is volatile by nature. You don't have to worry about making data in write-back caches persistent (and doing that in the right order), because it simply won't become persistent in RAM. This is certainly one of the points that make file systems complex.

We don't need journaling on disk any more than we need it in memory (which is not to say we don't need it, in fact I think we do).

Journaling is mostly related to the problems with making things persistent. You don't have that problem in memory. What's your use case there? (It's not really required on disks either, but it's a common way to implement things)
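
To make the persistence problem concrete, here is a heavily simplified write-ahead sketch: the intent is forced to stable storage before the data is updated in place, so an interrupted update can be replayed. The file names and helpers are hypothetical, the "crash" is simulated by returning early, and a real journal would also need checksums and careful ordering of metadata updates.

```python
import json
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
DATA = os.path.join(workdir, "data.bin")          # hypothetical data file
JOURNAL = os.path.join(workdir, "journal.json")   # hypothetical one-entry journal

def write_durably(path, payload):
    """Write a whole file and force it out of the OS write-back cache."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

def apply_entry(entry):
    """Perform the in-place update described by a journal entry."""
    with open(DATA, "r+b") as f:
        f.seek(entry["offset"])
        f.write(bytes.fromhex(entry["data"]))
        f.flush()
        os.fsync(f.fileno())

def journaled_write(offset, data, crash_before_apply=False):
    entry = {"offset": offset, "data": data.hex()}
    write_durably(JOURNAL, json.dumps(entry).encode())   # 1. log the intent first
    if crash_before_apply:
        return                                           # simulated power loss
    apply_entry(entry)                                   # 2. update in place
    os.remove(JOURNAL)                                   # 3. retire the entry

def recover():
    """After a restart, replay a journal entry that was never retired."""
    if os.path.exists(JOURNAL):
        with open(JOURNAL, "rb") as f:
            entry = json.loads(f.read())
        apply_entry(entry)
        os.remove(JOURNAL)

write_durably(DATA, bytes(16))
journaled_write(4, b"abcd", crash_before_apply=True)
recover()
journal_left = os.path.exists(JOURNAL)
with open(DATA, "rb") as f:
    contents = f.read()
shutil.rmtree(workdir)
```

None of this ordering matters for ordinary volatile RAM, since nothing there is expected to survive the power loss anyway.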

My argument was about not needing `open` requiring a filename with a path for on-disk data.

Okay. This is not what defines a file system in my book. Having a directory tree, where each directory has entries that have a human readable name, is convenient, but not the defining property of a file system. If you took ext2 and removed the directory entries from it, so that you would have to identify files by their inode number, that would be quite inconvenient, but it would still be a file system.

The purpose of a file system is managing which blocks are allocated to which file (where files don't have a fixed size). This is very much like what malloc() is doing for memory.
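
Stripped of names and directories, that job could be sketched like this. `TinyBlockFS` is a hypothetical toy, not ext2: files are identified only by a number and can grow one block at a time.

```python
class TinyBlockFS:
    """Toy block manager: tracks which blocks belong to which numbered file."""

    def __init__(self, block_count):
        self.free = list(range(block_count))   # free block numbers, lowest first
        self.files = {}                        # file number -> list of block numbers
        self.next_number = 1

    def create(self):
        number = self.next_number
        self.next_number += 1
        self.files[number] = []
        return number

    def grow(self, number, blocks):
        """Allocate `blocks` more blocks to a file; files have no fixed size."""
        if blocks > len(self.free):
            raise OSError("no space left on device")
        for _ in range(blocks):
            self.files[number].append(self.free.pop(0))

    def delete(self, number):
        """Return all of a file's blocks to the free list."""
        self.free.extend(self.files.pop(number))

fs = TinyBlockFS(8)
a = fs.create()
b = fs.create()
fs.grow(a, 2)
fs.grow(b, 1)
fs.grow(a, 1)   # file `a` is now non-contiguous: blocks 0, 1 and 3
```

The parallel with `malloc()` is the free list; the difference is that a file can keep growing after creation, so its blocks need not be contiguous.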

And why don't we [typically] use filenames, transactions, and journaling when dealing with RAM?

We do use names for anything that the user is expected to deal with. Never seen a C struct that had a field `char* name`? Names are usually not used for purely internal objects, just as file systems typically don't allow access to their metadata by file name.

Transactions, like I said above, are definitely used on RAM; journaling probably as well as one option of implementing transactions.

dozniak · Post by **dozniak** » Sat Jan 12, 2013 6:42 pm

http://www.eros-os.org/papers/storedesign2002.pdf

Owen · Post by **Owen** » Sat Jan 12, 2013 11:30 pm

Brendan wrote:As far as I can tell, Everspin's ST-MRAM is as fast as normal DRAM. Their problem seems to be a lack of motherboard and OS support, leading to a lack of demand, leading to a lack of volume, leading to higher prices and lower densities (90nm process).

DRAM is never produced on leading process nodes (insufficient margin in a highly commoditised market); some manufacturers are only just rolling out their 30nm DRAM process (which places them 2 years behind high end bulk semiconductor, e.g. CPUs)

In other words: the gains may not be as great as predicted (Memory on a high end bulk semiconductor process will be expensive no matter which way you slice it)

Brendan · Post by **Brendan** » Sun Jan 13, 2013 12:33 am

Hi,

Owen wrote:
Brendan wrote:As far as I can tell, Everspin's ST-MRAM is as fast as normal DRAM. Their problem seems to be a lack of motherboard and OS support, leading to a lack of demand, leading to a lack of volume, leading to higher prices and lower densities (90nm process).
DRAM is never produced on leading process nodes (insufficient margin in a highly commoditised market); some manufacturers are only just rolling out their 30nm DRAM process (which places them 2 years behind high end bulk semiconductor, e.g. CPUs)

In other words: the gains may not be as great as predicted (Memory on a high end bulk semiconductor process will be expensive no matter which way you slice it)

If some DRAM manufacturers are just rolling out 30 nm; then would it be safe to assume that Everspin could switch from a 90 nm process to a 65 nm or 45 nm process (if they had enough volume)?

I'm only going by the article/s, which seem to say "die sizes are roughly the same as DRAM for the same capacity chip" and "timings are also comparable to DRAM" and "takes less power to switch and to run" (no refresh).

Cheers,

Brendan

amn · Post by **amn** » Sun Jan 13, 2013 5:14 am

Guys, please don't hijack the thread with discussions on DRAM. I wish you great success discussing it somewhere else though.

Humble regards!

trinopoty · Post by **trinopoty** » Sun Jan 13, 2013 5:39 am

Back to topic.
Even if you store data as if it were RAM, you need some way to know where things are. What if a program stores data at LBA 0x2000 and another program later modifies the data, thinking it is its own? Now, if the program that originally wrote the data tries to read it, it gets the wrong data.
You need to keep track of data on disks as well as in RAM. We keep track of data on disk using File Systems. We keep track of data in RAM using various data structures. "File System" is just a term for the mechanism used to keep track of data on disk. Even if you abandon all file systems, you will develop/need a way to keep track of data on disk and everyone else will call it File System. Databases are similar to file systems but they remove some of the unnecessary stuff from common file systems; like creation date, etc, etc. If a database was to be written to a disk or partition, and a driver is installed in the OS to read the database, it will function just like any other file system.

Regards,
Trinopoty

amn · Post by **amn** » Sun Jan 13, 2013 5:45 am

trinopoty wrote:Back to topic.
What if a program stores data at LBA 0x2000 and another program later modifies the data, thinking it is its own?

I have never seen or heard anyone being advised to read or write unallocated data. The way it has traditionally been done is that one asks the allocator for memory and gets a pointer back, knowing the size of the area. The process memory allocator keeps track of what is what in the heap, and another process cannot safely read or write data - a segmentation fault will be raised in hardware, courtesy of x86 protected mode. If two processes want to share memory they use some form of shared memory API. Now all this of course pertains to the more traditional POSIX systems.
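
As one example of such an API, the sketch below uses Python's standard `multiprocessing.shared_memory` module (Python 3.8 or later). For brevity both ends live in one process here; normally the consumer would be a separate process that is handed the block's name.

```python
from multiprocessing import shared_memory

# One side asks the OS for a named block of shared memory and writes into it.
producer = shared_memory.SharedMemory(create=True, size=16)
producer.buf[:5] = b"hello"

# The other side attaches to the same block using only its name.
consumer = shared_memory.SharedMemory(name=producer.name)
received = bytes(consumer.buf[:5])

# Detach both ends, then ask the OS to release the block.
consumer.close()
producer.close()
producer.unlink()
```

Note that the block is found by name, so even this memory-only mechanism ends up with a small name-to-region lookup.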

The scenario you described is no more or less applicable to the system I described earlier (where disk is accessed using the same abstraction that memory uses).

Am I missing something?

thepowersgang · Post by **thepowersgang** » Sun Jan 13, 2013 9:21 am

Probably the best answer to this is "We don't access RAM directly"
There's always some form of high level addressing scheme to memory (of any type). On disk, it's usually a "file system" (a collection of folders containing other folders or files). In memory, it's layers of allocators and the final addresses are stored at certain symbols (e.g. a linked list with a global head pointer).

You can choose to give over an entire disk to an application (use it as a large, fixed-size file), just the same as you can have a filesystem in RAM. It's a different tool for a different job.

A side note is that each "sorting" method is designed for the medium. File systems are (usually) designed around disks, which have non-zero seek times; RAM layouts don't have to care about that. RAM locations need to be understood and quickly navigated by machine, while disk locations are usually meant to be human readable.

rdos · Post by **rdos** » Sun Jan 13, 2013 9:30 am

thepowersgang wrote:Probably the best answer to this is "We don't access RAM directly"
There's always some form of high level addressing scheme to memory (of any type). On disk, it's usually a "file system" (a collection of folders containing other folders or files). In memory, it's layers of allocators and the final addresses are stored at certain symbols (e.g. a linked list with a global head pointer).

There is always (?) a physical disc sector layer in the OS, which is used by file systems. If the OS allows applications access to this layer, it is quite possible to treat disc in the same way as RAM. The only difference is that the application needs to call some syscall (or rely on modified bits of pages) in order to write back contents to disc.
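
The write-back step can be illustrated with a memory mapping. This is only a sketch: a temporary file stands in for the raw disc device, and `flush()` on the mapping plays the role of the explicit syscall (comparable to `msync()` on POSIX systems).

```python
import mmap
import os
import tempfile

SECTOR_SIZE = 512

# A zero-filled temporary file stands in for the raw disc device.
fd, image_path = tempfile.mkstemp()
os.write(fd, bytes(4 * SECTOR_SIZE))

with mmap.mmap(fd, 4 * SECTOR_SIZE) as disc:
    disc[SECTOR_SIZE:SECTOR_SIZE + 4] = b"DATA"   # memory-style store into "sector 1"
    disc.flush()                                  # explicit write-back to the backing store
os.close(fd)

# Read the sector back through ordinary file I/O to confirm it reached the file.
with open(image_path, "rb") as f:
    f.seek(SECTOR_SIZE)
    on_disc = f.read(4)
os.remove(image_path)
```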

OSDev.org
