Page cache

Programming, for all ages and all languages.
Post Reply
sarah99
Posts: 2
Joined: Fri Mar 29, 2019 5:19 am

Page cache

Post by sarah99 »

What is the best way to implement page cache?

Binary search OR Hashtables?

Which is better? I'm thinking of going with binary search tree as described on interviewbit
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Page cache

Post by Korona »

First of all, implementing a page cache consists of more than just having a file offset → physical page map. However, for the rest of this post, I'll restrict myself to such these maps. I will also assume that the map should be sparse, if it is expected to be dense, a simple linear array will be the best implementation.

Plain binary search trees do not really offer good time complexities (they only guarantee a trivial O(n) complexity for insertion, deletion and search). Balanced binary search trees (with red-black trees being by far the most common) improve this situation and offer O(log n) insertion, deletion and search complexities. However, due to their overhead (at least two pointers per node) they are not really suited for integer → integer maps unless you need their ordering properties (i.e., the ability to find predecessors and successors of given nodes). B-trees improve this situation considerably.

Hash-tables are more space efficient than binary search trees.

However, the superior data structure for page caches are neither binary search trees nor hash tables. Radix trees have very desirable properties for a page cache: they offer O(log d) insertion, deletion and search, where d is the word size. Furthermore, they store contiguous keys very efficiently: if a subregion of a file is densely populated (in the page cache), the radix subtree of that region degrades into a linear array -- the best possible storage strategy. Finally, the most important feature of radix trees is that (with a careful implementation) search can be done concurrently with modification: the radix tree does not need to be locked exclusively while a thread performs a search. This is not true for binary search trees (or B-trees) and somewhat complicated (at least if you want reasonable performance) in the case of hash tables.

Indeed, the x86 page tables are implemented as (simple) radix trees, with a fixed number of bits per level. More efficient designs can use a variable number of bits per level and thus avoid to store (almost) empty levels.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
sarah99
Posts: 2
Joined: Fri Mar 29, 2019 5:19 am

Re: Page cache

Post by sarah99 »

@Korona thanks for replying. I will consider using Radix trees. Let me learn more about radix trees first.
Post Reply