Hi! I searched the forum and could not find this particular question. If there is already an answer, I would love to get a link to it.
I am building an OS to act as a platform layer for my game. It is supposed to provide memory, input and sound to the game, and run it. Right now I am looking to implement a buddy allocator and later combine it with a slab allocator.
I am wondering if activating the MMU like in this tutorial: https://github.com/bztsrc/raspi3-tutori ... tualmemory , would be a suitable base to build the buddy allocator upon? If not, could somebody please point me in another direction? The purpose is to implement an allocation function similar to "malloc". Or perhaps in my case I don't need the MMU? (I suppose I'd like to have the MMU activated to support different processes running on the Pi in the future.)
I am playing around with an RPi 4, and I am aware there might have been changes in this area since the RPi 3, for which this particular tutorial was written.
Very grateful for any help. I have been programming for some time, and I have recently found out that bare metal programming is the most rewarding form of programming.
Rpi 4, buddy allocator, MMU
Re: Rpi 4, buddy allocator, MMU
Enabling the MMU and having a buddy allocator are orthogonal things: you can have either without the other, or both.
However, from my experience with the RPi3, you kinda have to enable the MMU. The reason in particular is that ARMv8 allows unaligned accesses, and your compiler may emit code assuming this. However, at least on the Raspberry Pi 3, you have to enable the MMU to get unaligned accesses working. I discovered this the very hard way. So just enable the MMU and set up identity paging if you don't want to spend too much effort on virtual memory stuff.
(Also, if you use atomic instructions, you'll need to enable caches to get them working.)
Re: Rpi 4, buddy allocator, MMU
Hi,
Orrexon wrote: I am wondering if activating the mmu like in this tutorial: https://github.com/bztsrc/raspi3-tutori ... tualmemory , would be suitable to build the buddy allocator upon?
They are independent things.
Orrexon wrote: If not could somebody please point me in another direction? The purpose is to implement some allocation function similar to the "malloc" function.
It's a bit more complicated than that. You'll need several layers of allocation, see here. I've also written about it here.
Even if you're not aiming for a general purpose system, just a single application, you'll still need:
- a page allocator that keeps track of RAM (often called PMM, physical memory manager)
- a virtual memory allocator that keeps track of which pages are mapped where (VMM, virtual memory manager)
- and a user space library (which could be a kernel library if you're not planning on user space) that allows allocating arbitrary amounts of memory; this is what we actually call malloc.
Orrexon wrote: Or perhaps in my case I don't need the MMU? (I suppose I'd like to have the MMU activated to support different processes running on the Pi in the future)
You definitely need the MMU, even for a monotasking system, just like @fbkr said. Without the MMU, you won't have caching, memory is going to be slow, and you'll have to stick with strictly aligned accesses. With the MMU, you can set up different caching mechanisms, and you can access any byte in memory without getting an alignment fault. Furthermore, if you're planning to have multiple processes, then you'll need separate address spaces for each process (not strictly required, but strongly encouraged), and again, for that you'll need the MMU.
Orrexon wrote: I am playing around with Rpi 4 and I am aware there might have been changes regarding this from the Rpi 3 to which this particular tutorial is written..
Nope, the basic concept of virtual memory is the same (and it is the same for all architectures). See here. Of course the bits are not like that on ARM, but everything else is the same. Take a look at the AArch64 paging figure in this post; if you scroll up a bit, you can compare it with the x86 long mode paging, and you can see the basics are the same.
Orrexon wrote: Very grateful for any help, I have been programming for some time and I have recently found out that bare metal programming is the most rewarding form of programming.
And the most challenging one too.
Cheers,
bzt
Re: Rpi 4, buddy allocator, MMU
Oh boy, this is so cool! OK, so I need 3 layers then.
I think I'll stick with the Buddy as a PMM then, as a first go at this. Only because I like the "simplicity" of it.
I guess I need to read up on the VMM. Now I've got loads of material to read. Let me just see if I understand this at a high level:
First, the PMM, which could be implemented using the Buddy.
Then I need a VMM, which could be the MMU (?) with some help from software to translate the pages to and from the physical address (which I would get from the PMM?). This is the most difficult part for me to understand.
Then, to use it in some process, you would add an extra layer, "xmalloc", which is basically an API request (to the VMM or the PMM?), especially from user space.
bzt wrote: There are many free and open source solutions for this last layer: dlmalloc, ptmalloc, etc. I have even written one, called bztalloc, but for a game I'd recommend jemalloc.
I need to read more about these as well.
bzt wrote: I have written one, called bztalloc
Unfortunately, I could not see it; I got a 404 "page not found" error.
bzt wrote: Without MMU, you won't have caching
Well then that's settled, I absolutely need caching.
bzt wrote: And the most challenging one too
LOL, yes indeed.
Re: Rpi 4, buddy allocator, MMU
Orrexon wrote: I think I'll stick with the Buddy as a PMM then, as a first go at this.
That's OK; slab is also popular, just like bitmaps.
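A minimal sketch of the arithmetic at the core of a buddy allocator, for anyone going down that route. It assumes 4 KiB pages and offsets measured from the start of the managed region; the helper names (`block_size`, `buddy_of`, `parent_of`, `order_for`) are hypothetical and not taken from any code in this thread. The free lists and the split/merge bookkeeping are left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/* Size in bytes of a block of the given order (order 0 = one page). */
static inline uint64_t block_size(unsigned order)
{
    return (uint64_t)1 << (PAGE_SHIFT + order);
}

/* Offset of the buddy of the block that starts at `offset`.
 * Offsets are relative to the start of the managed region. */
static inline uint64_t buddy_of(uint64_t offset, unsigned order)
{
    /* a block of a given order is always aligned to its own size */
    assert((offset & (block_size(order) - 1)) == 0);
    return offset ^ block_size(order);
}

/* Offset of the block of order + 1 that a block and its buddy merge into. */
static inline uint64_t parent_of(uint64_t offset, unsigned order)
{
    return offset & ~block_size(order);
}

/* Smallest order whose block can hold `bytes`. */
static inline unsigned order_for(uint64_t bytes)
{
    unsigned order = 0;
    while (block_size(order) < bytes)
        order++;
    return order;
}
```

Freeing a block then comes down to checking whether `buddy_of(offset, order)` is also free and, if so, merging both into `parent_of(offset, order)` and repeating one order up.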
Orrexon wrote: I guess I need to read up on the VMM. Now I've got loads of material to read. Let me just see if I understand this at a high level:
First the PMM which could be implemented using the Buddy.
Correct.
Orrexon wrote: Then I need a VMM, which could be the MMU (?)
Well, the MMU is the actual circuit that implements virtual addressing inside the CPU. There's only one active address space at a time (the currently running process'), and that's all the MMU knows about. The purpose of the VMM is to keep track of which physical pages are mapped in which address spaces. They can be shared, swapped out to disk, etc. If you go with identity mapping, you won't need this layer at all (because there'll be only one address space, so everything you have is in the active page table).
Orrexon wrote: with some help from software to translate the pages to and from the physical address
It is the MMU's job to translate virtual addresses to physical ones. The other way around, physical to virtual, is not possible, because one physical page might not be mapped in any address space at all (no virtual address associated), or it could be mapped in several at the same time (multiple virtual addresses). That's why you need a VMM that does the housekeeping of the pages' mappings.
Orrexon wrote: (which I would get from PMM?) this is the most difficult part for me to understand.
then, to use it in some process you would add an extra layer, "xmalloc" which is basically an api request (to VMM or PMM?), especially from user space.
Yes, I know. I'll try to explain.
- your application / kernel calls malloc
- the malloc implementation is a library that keeps track of free memory in arbitrary sizes. It tries to satisfy the request on its own if possible.
- when it runs out of free space, it calls the VMM and asks for a new free page. This is typically done via a syscall, like brk() or mmap() (but could be a direct function call in the case of a kernel malloc)
- then the VMM asks the PMM for a free page, and maps it for the app or the kernel (by writing the newly allocated page's physical address into the process' page tables and flushing the MMU cache)
- if the PMM can't find any free pages, it either a) crashes the system, b) prints "Out of RAM" and then crashes, or c) uses some very complicated way to figure out which pages are less needed and writes those to disk, making space in memory.
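The call chain in the list above can be sketched in plain C. This is only an illustration under loud assumptions: RAM is faked with a static array, the mapping is identity (so the VMM step does nothing but pass the page through), and the "malloc" is a bump allocator that never frees. All names (`pmm_alloc_page`, `vmm_get_page`, `kmalloc`, `pmm_free_count`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 16u                     /* hypothetical, tiny "RAM" */

/* PMM layer: one used/free flag per page of a fake RAM array. */
static unsigned char fake_ram[NUM_PAGES * PAGE_SIZE];
static unsigned char page_used[NUM_PAGES];

void *pmm_alloc_page(void)
{
    for (unsigned i = 0; i < NUM_PAGES; i++) {
        if (!page_used[i]) {
            page_used[i] = 1;
            return fake_ram + (size_t)i * PAGE_SIZE;
        }
    }
    return NULL;                          /* out of RAM */
}

unsigned pmm_free_count(void)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NUM_PAGES; i++)
        n += !page_used[i];
    return n;
}

/* VMM layer: with identity mapping, virtual == physical. A real VMM
 * would write a page table entry and flush the TLB at this point. */
void *vmm_get_page(void)
{
    return pmm_alloc_page();
}

/* malloc layer: bump allocator that refills from the VMM page by page. */
static unsigned char *heap_cur, *heap_end;

void *kmalloc(size_t size)
{
    if (size == 0 || size > PAGE_SIZE)
        return NULL;                      /* sketch: at most one page per request */
    size = (size + 15u) & ~(size_t)15u;   /* keep allocations 16-byte granular */

    if (heap_cur == NULL || (size_t)(heap_end - heap_cur) < size) {
        unsigned char *page = vmm_get_page();
        if (page == NULL)
            return NULL;                  /* the PMM had no free page left */
        heap_cur = page;
        heap_end = page + PAGE_SIZE;
    }
    assert((size_t)(heap_end - heap_cur) >= size);

    void *p = heap_cur;
    heap_cur += size;
    return p;
}
```

Small requests are served from the current page without touching the lower layers; only when the page is exhausted does the request travel down to the VMM and then the PMM.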
Orrexon wrote: I need to read more about these as well.
Believe me, for a game, choose jemalloc.
Orrexon wrote: Unfortunately, I could not see it; I got a 404 "page not found" error
Oh, I moved from github to gitlab a few years ago. Here it is: https://gitlab.com/bztsrc/bztalloc (my allocator is a compromise between complexity and efficiency: a lot better than dlmalloc, but worse than jemalloc. In return, it is small and easily portable.)
Cheers,
bzt
Re: Rpi 4, buddy allocator, MMU
OK, I am still trying to get this thing to work. I am not a quitter.
I need to know if I have understood this correctly.
This code comes from setting up the MMU (using bzt's code, but with the physical addresses replaced by the ones I use for the mini UART on my RPi 4, which work without problems when I access them directly; I have double- and triple-checked and compared those addresses):
MMIO_BASE on the RPi 3: 0x3F000000
with the offset: 0x00201000
The one I use as base: 0xFE000000
I have also tried the legacy one: 0x7E000000
both with the offset: 0x00215000
Am I correct to assume that this code is what maps the address to the virtual address accessed in the main function later? If I am correct, then the actual value being set in that element of the "paging" array is actually more like MMIO_BASE+00201287 because of the OR-ed values. How does that actually work?
Code: Select all
// kernel L3
paging[5*512]=(unsigned long)(MMIO_BASE+0x00201000) |   // physical address
    PT_PAGE |     // map 4k
    PT_AF |       // accessed flag
    PT_NX |       // no execute
    PT_KERNEL |   // privileged
    PT_OSH |      // outer shareable
    PT_DEV;       // device memory
(here I also tried to adjust the offsets to the same as I would have when I read and write to the physical address, offsets 0x40 and 0x54 IO register and LSR register respectively)
Code: Select all
#define KERNEL_UART0_DR ((volatile unsigned int*)0xFFFFFFFFFFE00000)
#define KERNEL_UART0_FR ((volatile unsigned int*)0xFFFFFFFFFFE00018)

void main()
{
    ...
    ...
    while(*s) {
        /* wait until we can send */
        do{asm volatile("nop");}while(*KERNEL_UART0_FR&0x20);
        /* write the character to the buffer */
        *KERNEL_UART0_DR=*s++;
    }
    ...
    ...
}
bzt or anybody?
Or maybe I have misunderstood the whole thing; in that case, please tell me where I am going wrong.
Re: Rpi 4, buddy allocator, MMU
Orrexon wrote: This code comes from setting up the MMU: (using BZT's code, but I have replaced the physical addresses that I use for the mini-uart in my rpi4 which work when I access them directly without problems. I have double and triple checked and compared those addresses)
Yeah, they look OK. I'd suggest using the PL011 chip, which gives you much more control over the serial port.
Orrexon wrote: MMIO_BASE in the rpi3: 0x3F000000
with the offset: 0x00201000
Correct. The offset is for the PL011 chip on RPi3.
Orrexon wrote: The one I use as base: 0xFE000000
have also tried legacy: 0x7E000000
both with offset: 0x00215000
About the first one, that's right. About the second one, that's the GPU's address for the peripherals on RPi3. Not sure about the RPi4. You should double check the offset too. I think the MMIO relative offsets are the same on RPi3 and RPi4, but I haven't worked with the RPi lately, so I could be remembering wrong.
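One thing worth checking when swapping the PL011 for the mini UART is that both the register offsets and the meaning of the status bit differ. The sketch below works through the addresses and the two "can I send?" tests. It assumes the mini UART page (base + 0x00215000) is mapped at 0xFFFFFFFFFFE00000 as in the code from this thread. The bit meanings are taken from the Broadcom peripheral documentation as far as I know it: bit 5 of the mini UART LSR is set when the transmitter can accept a byte, while bit 5 of the PL011 FR is set when the transmit FIFO is full. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MMIO_BASE_RPI4   0xFE000000ull          /* base used in the post */
#define AUX_OFFSET       0x00215000ull          /* mini UART (AUX) block, from the post */
#define AUX_MU_IO        0x40u                  /* IO register offset, from the post */
#define AUX_MU_LSR       0x54u                  /* LSR register offset, from the post */

#define KERNEL_MMIO_VA   0xFFFFFFFFFFE00000ull  /* where the 4 KiB page is mapped */

#define MU_LSR_TX_EMPTY  0x20u  /* assumption: set = transmitter can accept a byte */
#define PL011_FR_TXFF    0x20u  /* assumption: set = transmit FIFO is full */

/* Virtual address of a register inside a mapped 4 KiB page. */
static inline uint64_t mapped_reg(uint64_t page_va, unsigned offset)
{
    assert(offset < 4096u);     /* must stay inside the single mapped page */
    return page_va + offset;
}

/* Mini UART: ready when the bit is SET. */
static inline int mini_uart_can_send(uint32_t lsr)
{
    return (lsr & MU_LSR_TX_EMPTY) != 0;
}

/* PL011: ready when the bit is CLEAR. */
static inline int pl011_can_send(uint32_t fr)
{
    return (fr & PL011_FR_TXFF) == 0;
}
```

Under these assumptions the mini UART registers would be reached at 0xFFFFFFFFFFE00040 and 0xFFFFFFFFFFE00054, and a PL011-style wait loop of the form `while(*FR & 0x20)` would have its condition inverted for the mini UART.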
Orrexon wrote: Am I correct to assume that this code is what maps the address to the virtual address accessed in the main function later?:
Code: Select all
// kernel L3
paging[5*512]=(unsigned long)(MMIO_BASE+0x00201000) |   // physical address
    PT_PAGE |     // map 4k
    PT_AF |       // accessed flag
    PT_NX |       // no execute
    PT_KERNEL |   // privileged
    PT_OSH |      // outer shareable
    PT_DEV;       // device memory
Yes. The index in the "paging[5*512]" table specifies at which virtual address it's going to be accessible, and the array element's value specifies the physical address and access bits. You must always map device MMIO as outer shareable and nGnRE. (Here PT_DEV is 1, which selects the 2nd attribute, and in mair_el1 that's set as nGnRE.) Here the paging table is set up in such a way that index 5*512 maps to 0xFFFFFFFFFFE00000.
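To see where 0xFFFFFFFFFFE00000 comes from, here is a small sketch that rebuilds a virtual address from the table indices. It assumes a 4 KiB granule and 39-bit virtual addresses (three levels, 9 index bits each), and that the kernel mapping is reached through the last entry (511) of both the L1 and L2 tables on the TTBR1 side, with the entry in question being index 0 of the L3 table. I believe this matches the tutorial's setup, but treat it as an assumption. The function name `ttbr1_va` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VA_BITS 39u   /* assumption: T1SZ = 25, i.e. 3 levels with a 4 KiB granule */

/* Virtual address in the upper (TTBR1) half for the given table indices. */
static inline uint64_t ttbr1_va(unsigned l1, unsigned l2, unsigned l3,
                                unsigned offset)
{
    assert(l1 < 512u && l2 < 512u && l3 < 512u && offset < 4096u);
    uint64_t high = ~(uint64_t)0 << VA_BITS;   /* upper bits all 1: TTBR1 range */
    return high
         | ((uint64_t)l1 << 30)                /* each L1 entry covers 1 GiB */
         | ((uint64_t)l2 << 21)                /* each L2 entry covers 2 MiB */
         | ((uint64_t)l3 << 12)                /* each L3 entry covers 4 KiB */
         | offset;
}
```

With these assumptions, `ttbr1_va(511, 511, 0, 0)` gives 0xFFFFFFFFFFE00000, and an offset of 0x18 inside that page gives 0xFFFFFFFFFFE00018, the second address used in the `main` function quoted above.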
Orrexon wrote: If I am correct, then the actual value being set in that element of the "paging"-array is actually more like MMIO_BASE+00201287 because of the or:ed values. How does that actually work?
Correct. I'd recommend reading the DDI0487 ARM spec to see how the paging table's bits are laid out, but basically you always use a physical address whose least significant bits are zero, and whose most significant bits don't count, so that's where the ARM engineers put the access control bits. On the figure above, these are the bits "Sign extend" and "Physical-Page Offset". Those bits are not needed for a page aligned physical address ("PA" inside the tables in the figure). The point is, if you mask the entry to clear the access control bits, you get a pure physical address.
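A short sketch of that masking idea. The flag names follow the ones used in the code quoted in this thread, but the numeric values here are my assumption, chosen to match the ARMv8-A stage 1 page descriptor layout (AttrIndx in bits 4:2, AP in bits 7:6, SH in bits 9:8, AF in bit 10, UXN in bit 54); check them against the tutorial's own definitions. `PT_ADDR_MASK`, `make_device_page` and `entry_phys` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, see the note above. */
#define PT_PAGE    0x3ull            /* bits 1:0 = 0b11: valid 4 KiB page */
#define PT_DEV     (1ull << 2)       /* AttrIndx = 1: device memory */
#define PT_KERNEL  (0ull << 6)       /* AP: privileged access only */
#define PT_OSH     (2ull << 8)       /* SH: outer shareable */
#define PT_AF      (1ull << 10)      /* access flag */
#define PT_NX      (1ull << 54)      /* no execute */

/* Output address field of a 4 KiB page descriptor: bits 47:12. */
#define PT_ADDR_MASK 0x0000FFFFFFFFF000ull

/* Build an entry for a page of device MMIO. */
static inline uint64_t make_device_page(uint64_t phys)
{
    /* the address must be page aligned and fit in the address field */
    assert((phys & ~PT_ADDR_MASK) == 0);
    return phys | PT_PAGE | PT_AF | PT_NX | PT_KERNEL | PT_OSH | PT_DEV;
}

/* Recover the pure physical address by masking off the control bits. */
static inline uint64_t entry_phys(uint64_t entry)
{
    return entry & PT_ADDR_MASK;
}
```

With these assumed values the low 12 bits of the entry come out as 0x607 and bit 54 is set on top, so the entry is not a plain sum of address and flags: the control bits simply occupy positions that a page aligned address never uses.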
This pdf is also useful; it lists the page table bits (section 4.5 Memory attributes). It is much easier to read than the ARM spec, although not as detailed.
Cheers,
bzt