ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 7:27 am
by vhaudiquet
Hi everyone ! I'm currently trying to implement "virtual memory" in my kernel targeting the Raspberry Pi Zero.
I made a kernel for x86 before, and i'm trying to understand how paging works on ARM. (ARMv6 !!)
Basically, what i understood is :
- There is one "page directory" equivalent, the level 0 base translation table, which has a size of 16KiB, and contains either pointers to "page tables" i.e. level 1 tables containing either 4KB (small) or 64KB (large) pages, or direct "sections" mapping of size 1MiB (or supersections, 16MiB).
- Given the size of the base translation table, i can address up to 4GiB of virtual memory with 1MiB section mappings, which is more than enough for me as the Pi has only 512 MiB RAM, and not that much memory mapped stuff.
- The translation of a virtual address in a section mapping is done using the top 12 bits of the address to look up the section in the base translation table
- Section descriptors contain BADDR, the 1MiB-aligned physical address of the section
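
For reference, the section lookup described in the list above can be sketched in C roughly as follows. This is only a sketch under assumptions: the helper names (`section_index`, `section_desc`, `map_section`) are made up for illustration, the descriptor uses domain 0, privileged read/write access and no cache bits, and the bit positions follow the ARMv6 section descriptor layout as commonly documented, so they should be checked against the ARM ARM before relying on them.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT   20u                    /* 1 MiB sections */
#define SECTION_SIZE    (1u << SECTION_SHIFT)
#define DESC_SECTION    0x2u                   /* bits[1:0] = 0b10 marks a section */
#define DESC_AP_RW_PRIV (0x1u << 10)           /* AP = 0b01: privileged read/write only */

/* Index of the first-level entry covering a virtual address: VA[31:20]. */
static inline uint32_t section_index(uint32_t va)
{
    return va >> SECTION_SHIFT;
}

/* Build a section descriptor for a 1 MiB-aligned physical base address. */
static inline uint32_t section_desc(uint32_t pa)
{
    assert((pa & (SECTION_SIZE - 1u)) == 0);   /* base must be 1 MiB aligned */
    return pa | DESC_AP_RW_PRIV | DESC_SECTION;
}

/* Map one 1 MiB section in a 4096-entry (16 KiB) first-level table. */
static inline void map_section(uint32_t *table, uint32_t va, uint32_t pa)
{
    table[section_index(va)] = section_desc(pa);
}
```

With 4096 entries of 1 MiB each, the table covers the full 4 GiB, which matches the 16KiB table size mentioned above.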

Correct me if i'm wrong, but assuming i'm right :
- performance-wise, does it make sense to map memory using only 1MiB sections ?
I believe they are much faster to resolve because of only 1 level of indirection, and with 512 MiB of RAM, i think i can afford to map memory for the kernel/processes 1 MiB at a time...
Am i gonna need 4KB pages ?
- i believe i should use TTBR1 to map kernel memory, for context switch optimisation, and that i need to specify how high i want the kernel to be, and high addresses will use TTBR1 to translate, while low will use TTBR0. How do i identity map my code before jumping higher half then ?

Re: ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 8:19 am
by bzt
valou3433 wrote:Hi everyone ! I'm currently trying to implement "virtual memory" in my kernel targeting the Raspberry Pi Zero.
I made a kernel for x86 before, and i'm trying to understand how paging works on ARM. (ARMv6 !!)
Here's a working ARMv6 example with plenty of useful explanations and description.
I also recommend ARM Trusted Firmware. It is more complex code, with ARMv7 and ARMv8 support too, but it is very well written and full of comments. It is also the most authentic ARM source we have: everything on ARM is supposed to work the way it works in that source.
valou3433 wrote:- There is one "page directory" equivalent, the level 0 base translation table, which has a size of 16KiB, and contains either pointers to "page tables" i.e. level 1 tables containing either 4KB (small) or 64KB (large) pages, or direct "sections" mapping of size 1MiB (or supersections, 16MiB).
Configurable through system registers. ARM's paging is much more flexible than x86's.
valou3433 wrote:- performance-wise, does it make sense to map memory using only 1MiB sections ?
Yes, but if two processes each allocate 1k, then you must use two separate 1M sections, meaning you'll waste about 99.9% of that RAM.
valou3433 wrote:Am i gonna need 4KB pages ?
If you don't want to run out of RAM pretty quickly, then probably yes.
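
To put a rough number on the waste: a 1 KiB allocation inside a 1 MiB section leaves about 99.9% of the section unused. A small sketch of that arithmetic, using integer math only and a hypothetical helper name (`section_waste_centipercent`):

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE (1024u * 1024u)   /* 1 MiB */

/* Share of a 1 MiB section left unused, in hundredths of a percent. */
static inline uint32_t section_waste_centipercent(uint32_t used_bytes)
{
    assert(used_bytes <= SECTION_SIZE);
    return (uint32_t)(((uint64_t)(SECTION_SIZE - used_bytes) * 10000u)
                      / SECTION_SIZE);
}
```

For 1024 bytes used this gives 9990, i.e. 99.90% unused. Even a full 4 KiB used still leaves 99.60% unused.
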
valou3433 wrote:- i believe i should use TTBR1 to map kernel memory, for context switch optimisation, and that i need to specify how high i want the kernel to be, and high addresses will use TTBR1 to translate, while low will use TTBR0. How do i identity map my code before jumping higher half then ?
Can you elaborate? I don't understand your question. TTBR0 is used to translate pages in the lower half (upper bits of the address are 0), while TTBR1 is used for the higher half (upper bits of the address are 1). It is common practice, and recommended, to set up the lower half for userspace and the higher half for the kernel, but it is not mandatory. For identity mapping you simply map virtual addresses to the same physical addresses in TTBR0. Obviously you cannot identity map with TTBR1.

Your loader will be loaded at 0x8000. You map that with TTBR0 using identity mapping, and map your kernel code in higher-half with TTBR1. When the mapping is okay, you jump to an address in higher-half.
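
A rough C sketch of that setup, filling the two tables with 1 MiB sections before the MMU is enabled. Everything named here is an assumption for illustration: `build_boot_tables`, the `KERNEL_VIRT_BASE` value, and the choice of domain 0 with privileged read/write access. Loading TTBR0/TTBR1, setting the split in TTBCR and enabling the MMU need CP15 accesses and are left out.

```c
#include <assert.h>
#include <stdint.h>

#define MiB              (1u << 20)
#define DESC_SECTION     0x2u          /* bits[1:0] = 0b10 */
#define DESC_AP_RW_PRIV  (0x1u << 10)  /* privileged read/write only */
#define KERNEL_VIRT_BASE 0xC0000000u   /* hypothetical higher-half base */

/* Both tables must be zeroed by the caller so unmapped entries fault. */
void build_boot_tables(uint32_t *ttbr0_table, uint32_t *ttbr1_table,
                       uint32_t kernel_phys, uint32_t kernel_size)
{
    assert((kernel_phys & (MiB - 1u)) == 0);   /* sections are 1 MiB aligned */

    /* Identity map the first MiB, which contains the loader at 0x8000. */
    ttbr0_table[0] = 0x00000000u | DESC_AP_RW_PRIV | DESC_SECTION;

    /* Map the kernel image into the higher half, one section at a time. */
    uint32_t sections = (kernel_size + MiB - 1u) / MiB;
    for (uint32_t i = 0; i < sections; i++) {
        uint32_t va = KERNEL_VIRT_BASE + i * MiB;
        ttbr1_table[va >> 20] =
            (kernel_phys + i * MiB) | DESC_AP_RW_PRIV | DESC_SECTION;
    }
}
```

Once both tables are live, a jump to the kernel entry point's higher-half address keeps working because the loader code performing the jump is still reachable through the identity mapping.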

Cheers,
bzt

Re: ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 10:33 am
by vhaudiquet
Thank you for all these resources !

I'm not planning to have more than 3 processes running, i'm not even sure i will implement multitasking, so the memory issue should be ok...
Even if i wanted multitasking, i could use 1 MiB section map and 4KB page map at the same time, i believe ?

As for identity mapping the code : i will need 2 "base tables", 1 for TTBR1, the kernel table, which will be statically allocated in the code, and 1 for TTBR0, temporary... Where do i put that ? I can't allocate it statically as it would waste space, i don't have a heap yet as the kernel is not loaded, and i don't have a stack yet for the same reason...
I'm a little bit lost...
Should i do it statically anyway ?
Should i identity map using only TTBR0, and then once in higher half change paging again by clearing lower mapping from table, setting table address in TTBR1, and enabling TTBR1 mapping ?

Re: ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 1:08 pm
by bzt
valou3433 wrote:Thank you for all these resources !
You're welcome!
valou3433 wrote:I'm not planning to have more than 3 processes running, i'm not even sure i will implement multitasking, so the memory issue should be ok...
Even if i wanted multitasking, i could use 1 MiB section map and 4KB page map at the same time, i believe ?
Yes, you can mix those any way you want.
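
A hedged sketch of what mixing looks like at the descriptor level: a first-level entry either holds a section descriptor directly or points at a second-level table of 4 KiB pages. The helper names are hypothetical, and the bit layout assumes the ARMv6 descriptor format with subpages disabled (the XP bit set in the control register); the legacy subpage format encodes small pages differently.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_COARSE      0x1u          /* first level: bits[1:0] = 0b01 */
#define DESC_SMALL_PAGE  0x2u          /* second level: bit[1] = 1, XN (bit 0) clear */
#define PAGE_AP_RW_PRIV  (0x1u << 4)   /* second-level AP field sits in bits[5:4] */

/* First-level entry pointing at a 1 KiB-aligned second-level table (domain 0). */
static inline uint32_t coarse_desc(uint32_t l2_table_pa)
{
    assert((l2_table_pa & 0x3FFu) == 0);
    return l2_table_pa | DESC_COARSE;
}

/* Second-level entry for a 4 KiB-aligned physical page. */
static inline uint32_t small_page_desc(uint32_t pa)
{
    assert((pa & 0xFFFu) == 0);
    return pa | PAGE_AP_RW_PRIV | DESC_SMALL_PAGE;
}

/* Index into the 256-entry second-level table: VA[19:12]. */
static inline uint32_t l2_index(uint32_t va)
{
    return (va >> 12) & 0xFFu;
}
```

Each first-level slot is independent, so one slot can hold a section for the kernel image while its neighbour holds a coarse descriptor for a region managed in 4 KiB pages.
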
valou3433 wrote:As for identity mapping the code : i will need 2 "base tables", 1 for TTBR1, the kernel table, which will be statically allocated in the code, and 1 for TTBR0, temporary... Where do i put that ?
You don't need to statically allocate either of those.
valou3433 wrote:I can't allocate it statically as it would waste space, i don't have a heap yet as the kernel is not loaded, and i don't have a stack yet for the same reason...
I'm a little bit lost...
Okay, you do have a stack, you just haven't initialized it yet. I suggest putting it below the 0x8000 loader's area, that way it won't interfere with the rest of your code (as it grows downwards).
As for the heap, you don't need that either. It's enough to put the table after the bss section, which is guaranteed to be free. This is what I do in my ARM tutorial (it's for ARMv8 though, you'll need to fill up the paging table with different values).
With my boot loader, I simply "allocate" it in the bss segment using the linker script, that works too (note that I've put it in the "NOLOAD" section meaning there's no static allocation in the file, the compiled executable is small). I did this because it's loading the kernel after the bss' end, so that's the physical memory I'm going to map into higher-half.
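
As an illustration of the "put it after the bss" idea, here is a minimal sketch that carves a 16 KiB-aligned first-level table out of the free memory following the image. The function names are made up, and the starting address would come from a linker-script symbol marking the end of bss (not shown, and named differently in every project). The caller still has to zero the table before use.

```c
#include <assert.h>
#include <stdint.h>

#define L1_TABLE_SIZE 0x4000u   /* 4096 entries * 4 bytes = 16 KiB */

/* Round addr up to the next multiple of align (a power of two). */
static inline uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (addr + align - 1u) & ~(align - 1u);
}

/* Reserve one table starting at or after *next_free and advance the cursor. */
static inline uint32_t reserve_l1_table(uint32_t *next_free)
{
    uint32_t table = align_up(*next_free, L1_TABLE_SIZE);
    *next_free = table + L1_TABLE_SIZE;
    return table;
}
```

Calling it twice yields two back-to-back tables, one for TTBR0 and one for TTBR1, without any heap.
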
valou3433 wrote:Should i identity map using only TTBR0, and then once in higher half change paging again by clearing lower mapping from table, setting table address in TTBR1, and enabling TTBR1 mapping ?
Yes, you should set identity map with TTBR0. You don't need to map the paging tables though, it's enough to map your loader's text and data segments. You should also map your kernel using TTBR1 in higher-half. Then when your kernel is running (in higher-half) it's free to change TTBR0 to map a user-space task (not using identity mapping). Then whenever a context switch happens, you load a new table into TTBR0, but leave TTBR1 as-is.

This is exactly what my boot loader does. I've linked the part where I set up the paging tables. It's not static, because it also depends on configuration, so I have a small piece of code that fills up the table (of course this is for ARMv8, but you get the idea). I use the same approach: TTBR0 is loaded with an identity mapping, while I use TTBR1 to map the loaded kernel into higher-half. Because I load the kernel from a file, I don't know in advance how much memory I need to map, that's determined at run time (see "core.size" in line 1754).

Oh, one more very important thing: before you enable the MMU, and you directly use physical memory, all memory accesses must be naturally aligned. This means 4 byte ints must be on 4 byte boundaries, otherwise you'll get faults (CPU exceptions). Once you've enabled the MMU, this doesn't matter any more and you'll be able to read/write unaligned addresses. This matters if you want to interpret the BPB structure in the boot sector because BPB has unaligned fields.
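
One common way to sidestep the alignment problem before the MMU is on is to assemble multi-byte fields from single-byte loads, which are always aligned. A small sketch, assuming the standard FAT boot sector layout where the 16-bit bytes-per-sector field sits at offset 11 and the root entry count at offset 17; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPB_BYTES_PER_SECTOR_OFF 11u   /* 16-bit field at an odd offset */
#define BPB_ROOT_ENTRIES_OFF     17u   /* 16-bit field at an odd offset */

/* Read a little-endian 16-bit value using byte accesses only. */
static inline uint16_t read_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint16_t bpb_bytes_per_sector(const uint8_t *boot_sector)
{
    return read_le16(boot_sector + BPB_BYTES_PER_SECTOR_OFF);
}

static inline uint16_t bpb_root_entries(const uint8_t *boot_sector)
{
    return read_le16(boot_sector + BPB_ROOT_ENTRIES_OFF);
}
```

Casting the buffer to a packed struct and reading the field directly may compile to an unaligned halfword load, which is the access that faults.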

Cheers,
bzt

Re: ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 4:12 pm
by vhaudiquet
Thank you for everything !
I successfully implemented MMU activation in assembly, and it works well if i keep everything identity mapped.
Now, i have one last question : how are addresses translated when using both TTBR0 and TTBR1 ?
As the index in the table is normally used to know which part of virtual memory we're in, does index 0 of the table in TTBR1 correspond to, say, 0x40000000 with n = 2, or to 0x0, in which case the entries are invalid and must stay empty until 0x400 ? Do i need to set parameters for size of TTBR1 ? I'm a little bit confused between new versions of ARM which support long paging modes and have these "T0SZ" "T1SZ" parameters, and the ARM ARM i have that only explains address resolution for 1 table, but omits the case when both TTBR0 and TTBR1 are in use...

(in my code, i can't jump to higher half, i think it generates an exception... i don't really know why, but i assume it is because i mismapped memory, and i put the section descriptor at index 0x400 in the table, as i wanted it to be mapped to 0x40000000, but when i jump to my function address + 0x40000000 it crashes, and when i put the descriptor at index 0x0 in the table it does the same...)

Re: ARM MMU Implementation (Raspberry Pi)

Posted: Tue Nov 17, 2020 6:19 pm
by bzt
valou3433 wrote:Thank you for everything !
I successfully implemented MMU activation in assembly, and it works well if i keep everything identity mapped.
Well done!
valou3433 wrote:Now, i have one last question : how are addresses translated when using both TTBR0 and TTBR1 ?
As I've said: for virtual addresses that have the upper bits cleared, TTBR0 is used. For those that have the upper bits set, the MMU uses TTBR1. For example: 0x00008000 is looked up using TTBR0, and 0xFFFF8000 using TTBR1. Otherwise the tables they are pointing to are identical.
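
For the ARMv6 short-descriptor format specifically, the selection rule can be sketched as below. This is my reading of the ARM ARM rather than something stated in this thread, so treat it as an assumption to verify: the split is set by the N field of the translation table base control register, an address goes to TTBR1 as soon as any of its top N bits is set, and the TTBR1 table is always a full 16 KiB table indexed by VA[31:20]. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the table walk for va would start from TTBR1, given N (0..7). */
static inline bool uses_ttbr1(uint32_t va, unsigned n)
{
    assert(n <= 7);
    if (n == 0)
        return false;                  /* N = 0: TTBR0 covers all 4 GiB */
    return (va >> (32u - n)) != 0;     /* any of the top N bits set */
}

/* First-level index within whichever table gets selected. */
static inline uint32_t first_level_index(uint32_t va, unsigned n)
{
    if (uses_ttbr1(va, n))
        return va >> 20;               /* TTBR1 table: always VA[31:20] */
    return (va >> 20) & (0xFFFu >> n); /* TTBR0 table: VA[31-N:20] */
}
```

Under that reading, with N = 2 the address 0x40000000 is translated through TTBR1 and its descriptor sits at index 0x400 of the TTBR1 table, while the entries below 0x400 in that table are never consulted.
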
valou3433 wrote:As the index in the table is normally used to know which part of virtual memory we're in, does index 0 of the table in TTBR1 correspond to, say, 0x40000000 with n = 2, or to 0x0, in which case the entries are invalid and must stay empty until 0x400 ? Do i need to set parameters for size of TTBR1 ? I'm a little bit confused between new versions of ARM which support long paging modes and have these "T0SZ" "T1SZ" parameters, and the ARM ARM i have that only explains address resolution for 1 table, but omits the case when both TTBR0 and TTBR1 are in use...
Take a look at the first link. It describes this in great detail, and read the official ARMv5 specification (which contains ARMv6 too).
valou3433 wrote:(in my code, i can't jump to higher half, i think it generates an exception... i don't really know why, but i assume it is because i mismapped memory, and i put the section descriptor at index 0x400 in the table, as i wanted it to be mapped to 0x40000000, but when i jump to my function address + 0x40000000 it crashes, and when i put the descriptor at index 0x0 in the table it does the same...)
Check the FSR (Fault Status Register) and FAR (Fault Address Register), and decode their bits. That will tell you exactly what and where the problem is. You can find the detailed per-bit description in the spec. A simple exception handler that dumps the registers (like this) is pretty useful too for locating the problem.
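
A hedged sketch of the decoding step in plain C. The field positions and status codes are the ARMv6 data fault status layout as I know it (status in bits[3:0] plus bit 10, domain in bits[7:4], write flag in bit 11), and only the most common codes are listed, so check the table in the spec for the rest. The function names are made up. Reading the registers themselves takes CP15 accesses (`mrc p15, 0, <Rd>, c5, c0, 0` for the data FSR and `mrc p15, 0, <Rd>, c6, c0, 0` for the FAR on ARMv6), which are left out so the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* 5-bit fault status: FS[4] is FSR bit 10, FS[3:0] are FSR bits[3:0]. */
static inline uint32_t fsr_status(uint32_t fsr)
{
    return ((fsr >> 6) & 0x10u) | (fsr & 0xFu);
}

/* Domain of the faulting access: FSR bits[7:4]. */
static inline uint32_t fsr_domain(uint32_t fsr)
{
    return (fsr >> 4) & 0xFu;
}

/* 1 if the faulting access was a write: FSR bit 11. */
static inline int fsr_is_write(uint32_t fsr)
{
    return (int)((fsr >> 11) & 1u);
}

const char *fault_status_name(uint32_t status)
{
    assert(status < 32u);   /* FS is a 5-bit field */
    switch (status) {
    case 0x01: return "alignment fault";
    case 0x05: return "translation fault (section)";
    case 0x07: return "translation fault (page)";
    case 0x09: return "domain fault (section)";
    case 0x0B: return "domain fault (page)";
    case 0x0D: return "permission fault (section)";
    case 0x0F: return "permission fault (page)";
    default:   return "other (see the fault status table in the spec)";
    }
}
```

A section translation fault with the FAR pointing at the higher-half jump target would suggest the first-level entry for that address is empty in the table the MMU actually walks.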

Cheers,
bzt