implementing a page cache

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
poby
Posts: 24
Joined: Wed Sep 21, 2016 9:39 am

implementing a page cache

Post by poby »

I want to create cluster page caches for drives but I'm struggling. Say my cluster size is 64 KB and I'm caching clusters in an expanding table that might eventually have tens of thousands of entries. When I want to access a particular cluster, how do I efficiently find it in the table? Linear search?? A 2 TB hard drive might have 32 million clusters, but only a few tens of thousands are in the page cache at any one time. How can I efficiently locate the cached cluster in RAM?

I've been reading a lot about page caching and memory mapping, both here on osdev and elsewhere, and it's gradually starting to sink in, but I'm kind of stuck on how much a page cache can help unless there is some trick to locating the cached cluster in RAM. Any other suggestions are much appreciated. I'm kind of lost here.
Velko
Member
Posts: 153
Joined: Fri Oct 03, 2008 4:13 am
Location: Ogre, Latvia, EU

Re: implementing a page cache

Post by Velko »

Ever heard of search trees or hash tables?
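
For the problem in the first post, a hash table keyed by the cluster number is probably the simpler of the two. A rough sketch in C with made-up names (cache_entry, CACHE_BUCKETS), assuming 64 KB clusters:

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define CACHE_BUCKETS 4096            /* power of two so we can mask       */

struct cache_entry {
    uint64_t cluster;                 /* cluster number on the drive       */
    void *data;                       /* the cached 64 KB of cluster data  */
    struct cache_entry *next;         /* chaining for collisions           */
};

static struct cache_entry *buckets[CACHE_BUCKETS];

static size_t bucket_of(uint64_t cluster)
{
    /* any reasonable mixing function will do; this is a multiplicative hash */
    return (size_t)((cluster * 0x9E3779B97F4A7C15ull) >> 52) & (CACHE_BUCKETS - 1);
}

/* Returns the cached entry, or NULL on a cache miss. */
struct cache_entry *cache_lookup(uint64_t cluster)
{
    struct cache_entry *e = buckets[bucket_of(cluster)];
    while (e && e->cluster != cluster)
        e = e->next;
    return e;
}

void cache_insert(struct cache_entry *e)
{
    size_t b = bucket_of(e->cluster);
    e->next = buckets[b];
    buckets[b] = e;
}

Lookups and inserts are O(1) on average regardless of how many of the drive's 32 million clusters are actually cached.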
If something looks overcomplicated, most likely it is.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: implementing a page cache

Post by Schol-R-LEA »

EDIT: never mind, I lost track of the specific subject (page caching) and went on a more general spiel. What I wrote below are all good topics to learn about, but they won't, as a rule, apply here. For something that does apply, look at the Wicked-Pedo page on Caching algorithms, especially Least Recently Used algorithms, generational caching, and priority weighting. It would be a good starting point, but you will need to do more once you have that overview.

I second Velko's point, and specifically recommend using a B-Tree (or a variant such as a B+-Tree or B*-Tree), though you will want to see whether something different (such as an extensible hash table or a tree of hash tables) is better suited to the specific purpose. Alternatively, look at a Trie, though that probably wouldn't fit here (they generally aren't good for disk-bound data) but might be useful in a compound structure.

Start by comparing the algorithms and structures for their likely performance, then experiment a little with an implementation or two. If it varies a lot depending on the specific data or layout, you might want to consider an adaptable approach (i.e., one where the structures on disk could be used with different algorithms, and you can shift algorithms interchangeably as the situation changes), but that would be an end-stage optimization that would take an inordinate amount of work.

Oh, and if it is a fixed (or very slowly changing) data set, consider applying a perfect hash instead. They are very fast for lookups, but if the hashed component of the data changes or new data is inserted, the whole thing needs to be recalculated, a rather slow and memory-intensive process, even with the newer dynamic perfect hashing algorithms. If data is insert-only, a tree of smaller perfect-hash tables might work well, as it would localize the hash recalculations.
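
To make the "Least Recently Used" idea from the edit above concrete: one common arrangement is to thread every cached cluster onto an intrusive doubly linked list ordered by recency, alongside whatever lookup structure you pick. A rough sketch in C (the lru_* names are invented for illustration):

Code: Select all

#include <stddef.h>

/* Every cached cluster carries one of these in addition to its lookup key. */
struct lru_node {
    struct lru_node *prev, *next;
};

static struct lru_node lru_head = { &lru_head, &lru_head };  /* circular sentinel */

static void lru_unlink(struct lru_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Call on every cache hit: move the entry to the front (most recent).
 * A brand-new entry should be self-linked (n->prev = n->next = n)
 * before its first lru_touch().                                       */
void lru_touch(struct lru_node *n)
{
    lru_unlink(n);
    n->next = lru_head.next;
    n->prev = &lru_head;
    lru_head.next->prev = n;
    lru_head.next = n;
}

/* Under memory pressure, the victim is the node at the tail (least recent). */
struct lru_node *lru_evict(void)
{
    struct lru_node *victim = lru_head.prev;
    if (victim == &lru_head)
        return NULL;                 /* cache is empty */
    lru_unlink(victim);
    return victim;
}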
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
MichaelFarthing
Member
Posts: 167
Joined: Thu Mar 10, 2016 7:35 am
Location: Lancaster, England, Disunited Kingdom

Re: implementing a page cache

Post by MichaelFarthing »

Schol-R-LEA speaks much truth in his post but is rather, well, scholarly.

By the sound of things, at this stage you are looking for something that works reasonably well, but it is probably more important that it works at all.

I would suggest that you DO NOT research all this stuff but stick with a straightforward binary search tree, many explanations of which can be found in numerous places on the web and elsewhere. It is probably as efficient as you will need for some considerable time to come (by which time you will be quite capable of making improvements with complete confidence), and it is relatively easy to implement in stages (a sketch of the first two stages follows at the end of this post): (a) add a page, (b) find a page, (c) delete a page, (d) balance the tree to make it more efficient.

(a) and (b) are essential (and in practice are probably written together)
(c) will probably be wanted (in your case definitely eventually, but maybe not until you're much further advanced)
(d) is optional. It is a bit like defragmenting a disk in these regards:
i. it is quite hard to do
ii. it is often unnecessary**
iii. it frequently takes longer to execute than the time saved
iv. if you want perfection it will soon need doing again

However, like defragmenting a disk, it is a good challenging little task if by any strange possibility you run out of more important things to do.

**There are some important exceptions to this. If, for example, you want an English dictionary in your program and you implement this as a binary tree and populate your tree from Chambers English Dictionary (or whatever) beginning at A and working to Z then your binary tree will be horrendously inefficient and will certainly need balancing. However, if you populate at random whenever you notice a word that you have used is not yet in the dictionary then balancing will probably not really be needed.
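
Here is the promised sketch of stages (a) and (b) in C, keyed on the cluster number. The struct and function names are only for illustration, not taken from any particular OS:

Code: Select all

#include <stdint.h>
#include <stddef.h>

struct tree_node {
    uint64_t cluster;                 /* key: the cluster number          */
    void *data;                       /* the cached 64 KB cluster         */
    struct tree_node *left, *right;
};

/* (b) find a page: walk left or right by comparing cluster numbers. */
struct tree_node *tree_find(struct tree_node *root, uint64_t cluster)
{
    while (root && root->cluster != cluster)
        root = (cluster < root->cluster) ? root->left : root->right;
    return root;                      /* NULL means "not in the cache"    */
}

/* (a) add a page: descend to the empty slot where the key belongs. */
void tree_insert(struct tree_node **rootp, struct tree_node *n)
{
    n->left = n->right = NULL;
    while (*rootp)
        rootp = (n->cluster < (*rootp)->cluster) ? &(*rootp)->left
                                                 : &(*rootp)->right;
    *rootp = n;
}

Stage (c), deletion, is the usual binary-search-tree removal (replace a two-child node with its in-order successor), and stage (d) can be bolted on later if it turns out to be needed.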
MichaelFarthing
Member
Posts: 167
Joined: Thu Mar 10, 2016 7:35 am
Location: Lancaster, England, Disunited Kingdom

Re: implementing a page cache

Post by MichaelFarthing »

Thinking about it overnight I realised, of course, that disk caching is one of those instances where a binary tree can get very unbalanced if a user is reading a file sequentially and the file is not fragmented. So maybe you will need to balance sooner than I suggested. (Maybe.) An alternative would be to hash* the cluster number before adding to or searching the tree - for instance, a bit-reverse of the cluster number would fairly certainly keep your tree a lot more balanced.

So in a trivial case where a disk has only 32 clusters:
Cluster 22 = binary 10110; reverse the bits to get 01101 = doctored version 13.
Adding clusters 1 to 7 in order results in the doctored cluster numbers being added in the order 16, 8, 24, 4, 20, 12, 28.
Curiously, this particular mechanism turns the worst-case scenario into the best, as a perfectly balanced tree is the result:

Code: Select all

              16
      8               24
  4       12      20      28
instead of

Code: Select all

1
   2
     3
       4
         5
            6
               7
But this sort of idea can very easily be added later if you find you need it. Just get a simple tree working first.

*It's not exactly a hash, because hashing is usually intended to produce a smaller value than the original, to reduce the amount of storage space that has to be reserved.
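
For what it's worth, the bit-reverse "doctoring" is only a few lines of code. This sketch assumes 32-bit cluster numbers and reverses all 32 bits; it is just the 5-bit example above scaled up:

Code: Select all

#include <stdint.h>

/* Reverse the bits of a 32-bit cluster number by swapping ever larger groups. */
uint32_t reverse_bits32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);   /* swap adjacent bits  */
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);   /* swap pairs          */
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);   /* swap nibbles        */
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);   /* swap bytes          */
    return (x << 16) | (x >> 16);                              /* swap 16-bit halves  */
}

/* Key the tree on reverse_bits32(cluster) instead of cluster itself. */

With the bits reversed, consecutive cluster numbers map to keys that are far apart, which is exactly what stops the tree collapsing into the list shown above.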
poby
Posts: 24
Joined: Wed Sep 21, 2016 9:39 am

Re: implementing a page cache

Post by poby »

Thanks for the very useful ideas. I had discarded the idea of a binary tree due to the likelihood of it getting severely unbalanced, but doing a simple hash of the cluster number, like reversing the bits, is brilliant.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: implementing a page cache

Post by Brendan »

Hi,
poby wrote: Any other suggestions are much appreciated.
My suggestion is, don't bother caching "lower level disk data". Instead, cache "higher level directory and file data" in the VFS.

There's no point caching the same data in different places (e.g. in the VFS, in the file system driver and in the storage device driver) - all that does is waste RAM duplicating the same data. The VFS will want/need to cache things like directory info anyway (as the VFS will need to deal with things like permissions and mount points); and it will want/need an efficient method of finding a file or directory's metadata anyway (and by caching file data in the VFS you can use that same "efficient method of finding metadata" to find cached file data).

Don't forget that disk caches have interactions with physical memory management. Basically, when the OS has lots of unused physical RAM you want to cache as much file data as possible to avoid wasting it, and when the OS is running low on physical RAM it needs to ask the "cache holders" to purge some cached data to free up more physical RAM. Disk caches also have interactions with virtual memory management (e.g. if a file's data is in some physical pages in a cache somewhere, then those physical pages can be mapped as "copy on write" into multiple processes too).
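
One way to wire up the "cache holders" idea (purely a sketch; pmm_register_cache_holder and the other names are invented here) is for the physical memory manager to keep a list of callbacks it invokes when free RAM runs low:

Code: Select all

#include <stddef.h>

/* A "cache holder" registers a function the physical memory manager may
 * call under memory pressure; it should try to free roughly `bytes_wanted`
 * and report how much it actually released.                               */
typedef size_t (*purge_fn)(size_t bytes_wanted);

#define MAX_CACHE_HOLDERS 8
static purge_fn cache_holders[MAX_CACHE_HOLDERS];
static int num_cache_holders;

void pmm_register_cache_holder(purge_fn fn)
{
    if (num_cache_holders < MAX_CACHE_HOLDERS)
        cache_holders[num_cache_holders++] = fn;
}

/* Called by the physical memory manager when an allocation is about to fail. */
size_t pmm_reclaim(size_t bytes_wanted)
{
    size_t freed = 0;
    for (int i = 0; i < num_cache_holders && freed < bytes_wanted; i++)
        freed += cache_holders[i](bytes_wanted - freed);
    return freed;
}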

By doing caching at the VFS (and not in file system code and not in storage device drivers) you avoid the need for file system code and storage device drivers to handle the extra complexity caused by interactions between caches and physical and virtual memory management.

Also; by doing caching at the VFS (instead of elsewhere) you improve efficiency (especially for micro-kernels) because if there's a "cache hit" the VFS can handle things like "open()" and "read()" by itself without any extra overhead (without file system code and/or storage device driver being involved).
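
In code, the "cache hit handled entirely by the VFS" path might look roughly like the sketch below; vfs_node, find_cached_page and fs_request_page are invented names standing in for whatever your VFS and file system interface actually provide:

Code: Select all

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

struct cached_page { void *data; };
struct vfs_node;                         /* opaque in this sketch */

/* Provided by the VFS cache and the file system driver respectively. */
struct cached_page *find_cached_page(struct vfs_node *node, uint64_t page);
struct cached_page *fs_request_page(struct vfs_node *node, uint64_t page);
void insert_cached_page(struct vfs_node *node, uint64_t page, struct cached_page *cp);

/* Read at most one page worth of data; only a cache miss involves the
 * file system driver (and, below it, the storage device driver).       */
long vfs_read(struct vfs_node *node, void *buf, size_t len, uint64_t offset)
{
    uint64_t page = offset / PAGE_SIZE;
    struct cached_page *cp = find_cached_page(node, page);

    if (!cp) {
        cp = fs_request_page(node, page);        /* cache miss */
        if (!cp)
            return -1;
        insert_cached_page(node, page, cp);
    }

    size_t in_page = offset % PAGE_SIZE;
    size_t n = len < PAGE_SIZE - in_page ? len : PAGE_SIZE - in_page;
    memcpy(buf, (char *)cp->data + in_page, n);
    return (long)n;
}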

Lastly; there can be more complex situations to consider. For example, maybe "/foo" is a FAT32 file system and "/foo/myDiskImage.iso" is a file on the FAT32 file system; and you mount "/foo/myDiskImage.iso" as an ISO9660 file system at "/bar". In this case you want to avoid caching the same data in multiple places (to avoid wasting RAM on duplication) and therefore want to avoid caching "/foo/myDiskImage.iso" while still caching other files on "/foo" and all files on "/bar". Doing caching at the VFS makes it easier to handle these complex situations (e.g. it can simply have a "don't cache this file because it's being used by another file system as a storage device" flag).

Note that (as part of a file or directory's metadata) the VFS can have some kind of "file system code's reference", where the VFS tells the file system code what this reference is (e.g. during requests from the VFS to the file system caused by a "VFS cache miss") and the file system code uses it for whatever it likes. In this way you can avoid the need for file system code to keep track of where files/directories are on their storage device (e.g. use it for "first cluster/block/sector of the file's data") by getting the VFS to keep track of it for them.
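
Combined with the "don't cache this file" flag from the earlier paragraph, the per-file metadata the VFS keeps could be as simple as this (all field names are invented for the sketch):

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define VFS_NOCACHE 0x01     /* file is backing another mounted file system */

struct cached_page;          /* whatever the VFS page cache uses internally */

struct vfs_node {
    char     *name;
    uint64_t  size;
    uint32_t  flags;         /* e.g. VFS_NOCACHE                            */
    uint32_t  permissions;

    /* Opaque value owned by the file system driver; the VFS just hands it
     * back on every request (FAT might store the file's first cluster here). */
    uint64_t  fs_reference;

    struct cached_page **pages;   /* VFS-level cache of this file's data */
    size_t    num_pages;
};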

Of course even with all of the above, file system code may still need to cache some data that VFS doesn't, like (e.g.) the "cluster allocation table" in FAT. However, this is typically a tiny amount of data (compared to file and directory info).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
rdos
Member
Posts: 3281
Joined: Wed Oct 01, 2008 1:55 pm

Re: implementing a page cache

Post by rdos »

I cache both raw disc data and file data in much the same way the processor arranges page tables for linear memory, using a two-level approach. Metadata caching is up to the specific filesystem driver. I don't cache the FAT allocation table, but rather rely on it being cached as raw disc data instead.
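
In other words, much like the x86 page directory / page table split: the high bits of the cluster number index a top-level table of pointers, and the low bits index a lazily allocated second-level table of cache slots. A rough sketch (the sizes and names are arbitrary; it assumes 64 KiB clusters on a 2 TiB disc):

Code: Select all

#include <stdint.h>
#include <stdlib.h>

#define L2_BITS  10                       /* 1024 cache slots per second-level table */
#define L2_SIZE  (1u << L2_BITS)
#define L1_SIZE  (1u << 15)               /* 2^15 * 2^10 clusters = 2 TiB at 64 KiB  */

/* Second-level table: one slot per cluster, NULL when that cluster isn't cached. */
struct l2_table {
    void *cluster_data[L2_SIZE];
};

/* Top level: second-level tables are allocated lazily, so memory use
 * roughly tracks how much of the disc is actually cached.              */
static struct l2_table *l1[L1_SIZE];

/* Returns a pointer to the cache slot for `cluster`, or NULL if the slot
 * doesn't exist yet and `allocate` is 0 (or the allocation fails).       */
void **cache_slot(uint64_t cluster, int allocate)
{
    uint64_t hi = cluster >> L2_BITS;
    uint64_t lo = cluster & (L2_SIZE - 1);

    if (hi >= L1_SIZE)
        return NULL;
    if (!l1[hi]) {
        if (!allocate)
            return NULL;
        l1[hi] = calloc(1, sizeof(struct l2_table));
        if (!l1[hi])
            return NULL;
    }
    return &l1[hi]->cluster_data[lo];
}

A lookup is then void **slot = cache_slot(n, 0); with a hit being slot && *slot, and inserting cached data is *cache_slot(n, 1) = data;. The top level is 32768 pointers (256 KiB on a 64-bit machine) and each second-level table is 8 KiB, allocated only for regions of the disc that actually have cached clusters.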