Binary Buddy Implementation

Post by Alboin »

Okay. I'm resorting to the forum. I've searched this place, read every topic containing the word 'buddy', and done ample research: I've read every scrap of information I could find on buddy allocation systems (the wiki, Google, Wikipedia, and various other sites).

However, I'm getting conflicting information. Some resources say to use a binary tree, and yet, at the same time, others say to use some sort of binary-bit structure. (Do they mean to implement a binary tree in a bit structure?)

Could someone please explain how binary buddy memory management is implemented? (I understand how it works in theory, but the implementation slightly eludes me. I've got it pretty much working in a demo userspace binary tree system atm; however, I think I'm missing some big part of the picture...)

Any enlightenment would be awesome.
Thanks Greatly,
Alboin
C8H10N4O2 | #446691 | Trust the nodes.

Post by jnc100 »

Firstly, I recommend you only use a buddy system for regions of memory where you require 2^n-sized, 2^n-aligned areas of contiguous memory to be allocated (a good example is the first 16 MiB for DMA). If you just need to hand out arbitrary 4 KiB pages, then a free page stack is quicker. There were several old threads on the topic, e.g. here. If you're looking for an implementation, a rather unoptimized version is available here, although it is from about two kernels back now.
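For reference, a free page stack really can be this simple (a minimal sketch; the names and the fixed capacity are illustrative assumptions, not code from any particular kernel — real implementations often store the stack inside the free pages themselves):

Code:

#include <stdint.h>
#include <stddef.h>

#define MAX_PAGES (1024 * 1024)   /* enough entries for 4 GiB of 4 KiB pages */

/* Stack of free physical page addresses. */
static uintptr_t free_pages[MAX_PAGES];
static size_t    free_top = 0;

/* Push a physical page back onto the free stack. */
void page_free(uintptr_t phys)
{
    free_pages[free_top++] = phys;
}

/* Pop a free physical page; returns 0 if none are left. */
uintptr_t page_alloc(void)
{
    if (free_top == 0)
        return 0;
    return free_pages[--free_top];
}

Both operations are O(1), which is why it beats a buddy when all you ever need is single pages.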

The basic idea is that a buddy system keeps a bitmap for each block size. See the thread I linked (2nd page) for some illustrations of how the system works in practice.
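As a rough sketch of what "a bitmap per block size" can look like in C (the 16 MiB region, the order range and all the names are my own illustrative assumptions):

Code:

#include <stdint.h>
#include <stddef.h>

#define MEM_SIZE   (16 * 1024 * 1024)    /* region managed by the buddy */
#define MIN_ORDER  12                    /* 2^12 = 4 KiB, smallest block */
#define MAX_ORDER  22                    /* 2^22 = 4 MiB, largest block */
#define ORDERS     (MAX_ORDER - MIN_ORDER + 1)

/* One bitmap per block size; a set bit means "this block is free".
 * Level 0 needs 4096 bits (4 KiB blocks over 16 MiB); each higher
 * level needs half as many bits, so it only uses a prefix of its row. */
static uint32_t bitmap[ORDERS][MEM_SIZE / (1 << MIN_ORDER) / 32];

static inline void bm_set(int o, size_t i)   { bitmap[o][i / 32] |=  1u << (i % 32); }
static inline void bm_clear(int o, size_t i) { bitmap[o][i / 32] &= ~(1u << (i % 32)); }
static inline int  bm_test(int o, size_t i)  { return (bitmap[o][i / 32] >> (i % 32)) & 1; }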

Regards,
John.

Post by Alboin »

So a set of bitmaps is made, one for each size up to the wanted limit? (e.g. 4 KiB-4 MiB) That makes more sense. From what I was reading, I was under the impression that a series of bitmaps would have to be made for every size up to the limit of the hardware, as in a true binary tree. (e.g. 4 KiB-4 GiB)

Thus, the best method, as you said, would be to use a buddy system for the lower 16 MiB, and a stack for the rest, correct?

Thanks for your help,
Alboin
C8H10N4O2 | #446691 | Trust the nodes.

Post by jnc100 »

Alboin wrote:So a set of bitmaps is made, one for each size up to the wanted limit? (e.g. 4 KiB-4 MiB)
Correct. You can choose any sizes you like; I picked 4 KiB to 4 MiB as they are common page sizes for x86. Basically, if you want to easily allocate a contiguous section of memory of a certain size, then you need a buddy level of that size. In theory you could have a level for the 4 GiB size, but it would only have one entry, which would be forever marked as not free (as some of that memory is taken up by the buddy structures themselves). The total memory required for the buddy structures is just below twice the size of the bitmap for the smallest allocation size (give or take the header structures etc.), making it take up twice as much room as a simple one-level bitmap would (but still a small percentage of the memory it is managing).
Alboin wrote:use a buddy system for the lower 16mb
Yes, that's what I did. It is, of course, very unlikely that you will ever need the whole 16 MiB for DMA devices, but assuming the kernel is loaded in the bottom 16 MiB of physical space, it also provides an easy way of marking the kernel's pages as used.

Many of the optimizations you can do to speed up the system are the same as those which speed up searches in bitmaps (because that is how the buddy is implemented). One simple optimization, sketched below, is to keep a count of the free entries at each level and pass straight up the tree if you try to allocate from a level whose free count is zero.
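A minimal sketch of that optimization, building on the bitmap layout above (the helper names and the split logic are my own illustration, not John's actual code):

Code:

static size_t free_count[ORDERS];    /* free blocks remaining at each level */

/* Scan level o's bitmap for the first set (free) bit. */
static long bm_find_first_set(int o)
{
    size_t nbits = MEM_SIZE >> (MIN_ORDER + o);
    for (size_t i = 0; i < nbits; i++)
        if (bm_test(o, i))
            return (long)i;
    return -1;
}

/* Allocate a block of the given order, splitting a larger block if
 * necessary.  Returns the block index at that order, or -1 on failure. */
long buddy_alloc(int order)
{
    int o = order;

    /* Pass straight up the tree past any level whose free count is
     * zero, without touching that level's bitmap at all. */
    while (o < ORDERS && free_count[o] == 0)
        o++;
    if (o >= ORDERS)
        return -1;                     /* nothing large enough anywhere */

    long idx = bm_find_first_set(o);   /* level is known to be non-empty */
    bm_clear(o, idx);
    free_count[o]--;

    /* Split back down to the requested order; at each step keep the
     * lower half and mark its buddy (the upper half) free. */
    while (o > order) {
        o--;
        idx *= 2;
        bm_set(o, idx + 1);
        free_count[o]++;
    }
    return idx;
}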

Regards,
John.

Post by Ready4Dis »

Yes, there are many ways to do this. The simplest is to make a bitmap for each level of block sizes you want (nothing says you MUST use 2^N; you can use other size progressions depending on your OS's specific requirements). The other way, as mentioned, is to use a tree. With a bitmap per level, if you know an entire 4 MiB is used, you still have 1024 entries for its 4 KiB pages, 512 entries for its 8 KiB pages, etc., which wastes a ton of memory; with a tree you can do away with some of that redundancy at the added expense of more code.

Now, the benefit of the first method is simplicity, but it is wasteful and still not the fastest to search for free entries (just because a 4 MiB area is marked used doesn't mean it has no free 4 KiB areas). Depending on your OS's requirements for speed/size you can do a few tricks. For example, use plain bitmaps as in the first method, but with 2 bits per entry for everything except your lowest level: the first bit means partially used, the second bit means fully used, neither bit set means completely free, and both bits set can mean anything else you find useful. The benefit is that if you're searching for, say, 4 KiB, and 4 KiB is your smallest size, you can scan the 4 MiB entries to see whether anything smaller is free underneath them, and only then scan the smaller sizes until you either find something or find it isn't available. This is MUCH more efficient speed-wise than scanning ALL the 4 KiB blocks, though it doubles the size of your bitmapped area. For example:

Code:

4k   x  x  x  x  x  x  x  x  x  x  o  o  o  o  o  o
8k    x     x     x     x     x     o     o     o
16k      x           x           p           o

(x = fully used, o = free, p = partially used)
Typically, if you are allocating 4k, a flat search would scan all the 4k blocks in order and make 11 compares before finding the first free one. With the 2-bit encoding, it checks the first two 16k blocks (fully used), finds the third marked partially used, checks its two 8k children (the first fully used, the second free), and then takes the first 4k block under that free 8k block: 6 compares rather than 11. Now imagine this after the OS has been running for a long time and memory is fragmented, and you can see the speed benefits can be large. And since the bottom layer (the 4k chunks) has nothing under it, it can stay 1-bit encoded. A sketch of this search follows.
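A hypothetical implementation of that descent (my own encoding of the FREE/PARTIAL/FULL states, using a byte per entry here for clarity rather than 2 packed bits):

Code:

#include <stddef.h>

enum state { FREE = 0, PARTIAL = 1, FULL = 2 };

/* map[0] is the top level (16k blocks), map[levels-1] the 4k leaves;
 * block i at one level covers blocks 2i and 2i+1 at the level below.
 * Leaves are only ever FREE or FULL. */
long find_free_leaf(unsigned char *map[], int levels, size_t top_blocks)
{
    size_t idx = 0;

    /* Scan the top level for a block that isn't fully used. */
    while (idx < top_blocks && map[0][idx] == FULL)
        idx++;
    if (idx == top_blocks)
        return -1;                  /* everything is fully used */

    /* Descend: a FREE or PARTIAL parent always has at least one
     * child that isn't FULL, so one compare per level suffices. */
    for (int lvl = 0; lvl < levels - 1; lvl++) {
        idx *= 2;
        if (map[lvl + 1][idx] == FULL)
            idx += 1;
    }
    return (long)idx;               /* index of a free 4k leaf */
}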

Now, take the first 16k chunk in that example: if you encoded this as a binary tree (or any tree, for that matter), the first element of the tree would have no children, which means you save the memory required for everything under it. The memory to encode the tree, and how you allocate that memory, is the challenge, but if you can figure it out you will save space and gain speed, especially when the system is taxed the most (almost all memory in use). It takes up more memory when the system has lots of free memory and the least space when the system is using almost all of it, so it's much better for the worst case, although allocation and deallocation speeds suffer due to the extra complexity of having to build/destroy tree nodes each time. A node sketch follows.
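For illustration, the nodes of such a tree might look like this (a sketch using the same FREE/PARTIAL/FULL states as above; how to allocate the nodes themselves, e.g. from a slab, is the open part Ready4Dis mentions):

Code:

enum state { FREE = 0, PARTIAL = 1, FULL = 2 };

/* A buddy tree node.  A block that is entirely used needs no
 * children, which is where the saving over flat bitmaps comes from:
 * the subtree only exists while it is actually needed. */
struct buddy_node {
    enum state          state;
    struct buddy_node  *left, *right;   /* NULL unless state == PARTIAL */
};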