Page 1 of 1

Virtual memory area datastructure

Posted: Thu Jan 09, 2020 12:43 pm
by nielsd
Hi

I'm trying to figure out what datastructure to use for the virtual memory area mappings (VMAs).
In most kernels, the VMAs store which virtual memory address ranges are being used and which aren't for a given address space.
They also store additional information, such as a pointer to a FILE structure in case of memory-mapped files.
They can be queried to find a free range of virtual addresses, for example for mmap in case there's no preferred address given.

Both Linux and BSD use a red black tree which store intervals, and Windows uses an AVL tree which also stores intervals.
An AVL tree seems to be better because the VMAs are likely more read than changed.
I'm concerned about three issues.
One issue is that you need to take a lock on the whole tree when changing it.
Second issue is suboptimal use of caching in case of a binary tree. This issue could be solved by using a B-tree of a small order (such as a 2-3-4 tree).
Last issue is, on systems with a lot of cores, there might be a lot of cache misses due to the cache coherency protocol.
Issue 1 & 3 usually aren't that big of a deal, because these are per-process anyway and you don't change the VMAs that often.

In my case, I'm using software isolated processes with a single address space, which means I'd have one shared structure for all processes. Taking a lock becomes an issue, and cache coherency invalidations also becomes more of an issue.
I was wondering how to solve (or partially solve) these issues.
I believe most important operations on the datastructure are querying a range and finding a free range.

I've also read about radix trees, but I'm not sure if that's the way to go.
Maybe a solution might involve a different datastructure than trees?

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 4:25 am
by Korona
VMAs change rarely, so locking is not that problematic.

For the choice between AVL and red-black (note that there are also other variants of balanced BSTs that allow you to achieve a configurable balance): it doesn't matter too much, especially not for a hobby OS. Any performance difference between those two will be within the noise level for almost all benchmarks.

I would advise against B-trees: in general, B-trees are less complicated than red-black trees but they are harder to augment (to interval trees).

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 12:14 pm
by Schol-R-LEA
Korona wrote:I would advise against B-trees: in general, B-trees are less complicated than red-black trees but they are harder to augment (to interval trees).
I'm not sure why B-trees would be an option in the first place; their whole purpose is for indexing data held in slower secondary storage, and while virtual memory usually involves swapping into and out of disk storage, it isn't generally content-addressable - there's no need to index the contents, just the locations in virtual memory which the pages map to.

I may be missing something, though. Would there be a use case for B-trees in tracking pages with a conventional virtual memory system?

(I can see where they might be of use in my planned design, since the way I mean for virtual memory to interact with automatic memory management might involve content-addressability, but it is far removed from how virtual memory is handled in most systems, and I'm still working out the details in any case.)

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 12:51 pm
by nielsd
The reason I said B-trees was not because of content addressability. I just thought about it to improve cache line usage.
It was more of a thought, I didn't work out the details.

I suppose AVL could be the way to go.
However, since in my case there's one global shared VMA tracker instead of a tracker per process, the depth could be big.
If a process behaves badly and creates many (small) intervals, it could make lookups for the whole system slower.

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 1:55 pm
by reapersms
That would be one of the various reasons per-process address spaces are the rule rather than the exception. Just because 64-bit address spaces allow single address space thunderdomes to be a thing again doesn't mean it's a great idea. What is your motivation for going down this path?

Software isolated single address space like that is more or less impossible to enforce, so you're essentially reinventing real mode dos, old system 6-ish era macos, or the various retro 8 bit computers of yore. How does your isolation survive userland bugs?

Alternatively, if all of the software to run on it will be written by you, one solution to the problem is just don't do that many tiny vma allocs, and make sure your malloc implementation knows that. While that works, it doesn't scale particularly well with the need for sleep, food, day job for bills, etc...

If you're using a single map, but still have a page table per process to get permissions handled, and are using the single space to make it easier to pass data and share memory between processes, things may be a bit different. I suspect in that case the overhead of needing to wrangle those issues on the user side will be less than the overhead added by the single VMA structure.

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 2:26 pm
by nielsd
reapersms wrote:What is your motivation for going down this path?
Research and having fun.
reapersms wrote: Software isolated single address space like that is more or less impossible to enforce, so you're essentially reinventing real mode dos, old system 6-ish era macos, or the various retro 8 bit computers of yore. How does your isolation survive userland bugs?

Alternatively, if all of the software to run on it will be written by you, one solution to the problem is just don't do that many tiny vma allocs, and make sure your malloc implementation knows that. While that works, it doesn't scale particularly well with the need for sleep, food, day job for bills, etc...
I think you're misunderstanding.
I'm talking about software isolated processes. (https://www.microsoft.com/en-us/researc ... isolation/ explains it shortly)
In short: The boundaries / protections are enforced by the language safety rules and static checking.

It has nothing to do with real mode, 8bit, or reinventing ancient OS.
What I'm trying to create is inspired by projects such as MS Research Singularity.
Or in my case: WebAssembly in userspace.

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 2:35 pm
by Korona
Yes, B-trees are generally faster than BSTs (unless the keys are really huge) in benchmarks due to their better caching behavior.

The depth of AVL trees (or red-black trees) grows logarithmically - doubling the number of intervals only increases the depth by 1. The locking might be more problematic. You could partition the virtual address space (say, in 4 GiB blocks) and use a radix tree for the first few levels (but radix trees do not efficiently support interval operations).

Regarding software isolation: Note that most kinds of software isolation are broken by Spectre/Meltdown (and this will be even easier in the kernel where you cannot really defend against Meltdown). Browsers generally run wasm code of different origins in different virtual address spaces.

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 2:42 pm
by Schol-R-LEA
I just realized that I had a brain fart earlier - I didn't really mean 'content-addressability' in the way that term is usually meant, but for some reason my brain glommed on to that term.

Re: Virtual memory area datastructure

Posted: Sat Jan 11, 2020 2:48 pm
by nielsd
Korona wrote:Yes, B-trees are generally faster than BSTs (unless the keys are really huge) in benchmarks due to their better caching behavior.

The depth of AVL trees (or red-black trees) grows logarithmically - doubling the number of intervals only increases the depth by 1. The locking might be more problematic. You could partition the virtual address space (say, in 4 GiB blocks) and use a radix tree for the first few levels (but radix trees do not efficiently support interval operations).
I've been thinking about partitioning. I don't know how to determine the "optimal" partition size, I guess it differs on what you run so it's not really possible to determine.
Korona wrote: Regarding software isolation: Note that most kinds of software isolation are broken by Spectre/Meltdown (and this will be even easier in the kernel where you cannot really defend against Meltdown). Browsers generally run wasm code of different origins in different virtual address spaces.
Yeah, I'm aware of the issue. I believe Spectre v2 could be mitigated against using retpoline, but I need to read up on that again.
As for spectre v1 & meltdown, this is afaik not fixable in this case.
There are CPUs now with some of these side channels fixed, but ofc most CPUs are older than that.
Since this is not a serious project and since they would be fixed in hardware in the future eventually, I'm not really going to bother with it.

Re: Virtual memory area datastructure

Posted: Mon Jan 13, 2020 8:24 am
by nyc
I like B+ trees for regions. I've at least heard of RCU implementations of B+ trees using tombstones and they have cache and sometimes (esp. in userspace) TLB advantages. Radix trees probably aren't a good fit for the problem. Balanced binary trees probably have simpler interactions with locking and lock free synchronisation because B/B+ trees won't know whether to allocate new nodes until under a lock or in some lock free synchronisation context etc.

Re: Virtual memory area datastructure

Posted: Mon Jan 13, 2020 12:41 pm
by Korona
Radix trees can trivally be made RCU compatible (see e.g., managarm's implementation). It is generally much harder to make balanced trees (e.g., B-trees or binary search trees) RCU compatible since they will be in an inconsistent state during rotations.

To handle allocation, you can preallocate a node (probably using a small cache) before taking the lock.

Re: Virtual memory area datastructure

Posted: Mon Jan 13, 2020 4:10 pm
by nyc
True enough, radix trees are easier to make lock free. They have a disadvantage in that updates to memory ranges are O(length(range)) as opposed to O(lg(nr_ranges)), but that may not be considered significant or contrary to design goals.

The allocation under lock or inside the lock free synchronisation context via preallocation outside the lock or context may not be that difficult for more experienced programmers, but you may be right that it's not a significant problem even for inexperienced programmers.

There is a method related to RCU called tombstones that enable things like balanced trees to be used with lock free read sides and single spin lock write sides, but they do get complex. I probably couldn't personally code them very easily, though I might eventually get there.