Page 1 of 1

[Solved] Optimization causing non functioning function.

Posted: Mon Oct 22, 2018 4:52 pm
by Oxmose
Hi everyone,

before we start, yes I know, that's not the GCC optimizations that makes my kernel crash but badly written code. So I figured out which portion of code does not behave correctly in my code and I found out that setting optimizations for the following function makes it work uncorrectly:

Code: Select all

#pragma GCC push_options
#pragma GCC optimize ("O1")

OS_RETURN_E kernel_direct_mmap(const void* virt_addr, const void* phys_addr,
                               const uint32_t mapping_size,
                               const uint16_t flags,
                               const uint16_t allow_remap)
{
    uint32_t    pgdir_entry;
    uint32_t    pgtable_entry;
    uint32_t*   page_table;
    uint32_t*   page_entry;
    uint32_t    end_map;
    uint32_t    i;
    uint32_t*   current_pgdir;
    uint32_t*   new_frame;
    OS_RETURN_E err;

    #if PAGING_KERNEL_DEBUG == 1
    uint32_t virt_save;
    #endif

    if(init == 0)
    {
        return OS_ERR_PAGING_NOT_INIT;
    }

    current_pgdir = (uint32_t*)&kernel_current_pgdir;

    /* Get end mapping addr */
    end_map = (uint32_t)virt_addr + mapping_size;

    #if PAGING_KERNEL_DEBUG == 1
    kernel_serial_debug("Mapping (before align) 0x%08x, to 0x%08x (%d bytes)\n",
                        virt_addr, phys_addr, mapping_size);
    #endif

    /* Align addr */
    virt_addr = (uint8_t*)((uint32_t)virt_addr & 0xFFFFF000);
    phys_addr = (uint8_t*)((uint32_t)phys_addr & 0xFFFFF000);

    #if PAGING_KERNEL_DEBUG == 1
    kernel_serial_debug("Mapping (after align) 0x%08x, to 0x%08x (%d bytes)\n",
                        virt_addr, phys_addr, mapping_size);
    virt_save = (uint32_t)virt_addr;
    #endif

    /* Map all pages needed */
    while((uint32_t)virt_addr < end_map)
    {
        /* Get PGDIR entry */
        pgdir_entry = (((uint32_t)virt_addr) >> 22);
        /* Get PGTABLE entry */
        pgtable_entry = (((uint32_t)virt_addr) >> 12) & 0x03FF;

        /* If page table not present create it */
        if((current_pgdir[pgdir_entry] & PG_DIR_FLAG_PAGE_PRESENT) !=
           PG_DIR_FLAG_PAGE_PRESENT)
        {
            new_frame = kernel_paging_alloc_frames(1, &err);

            if(new_frame == NULL)
            {
                break;
            }

            page_table = map_pgtable(new_frame);

            for(i = 0; i < 1024; ++i)
            {
                page_table[i] = PAGE_FLAG_SUPER_ACCESS |
                                PAGE_FLAG_READ_ONLY |
                                PAGE_FLAG_NOT_PRESENT;
            }

            current_pgdir[pgdir_entry] = (uint32_t)new_frame |
                                         PG_DIR_FLAG_PAGE_SIZE_4KB |
                                         PG_DIR_FLAG_PAGE_SUPER_ACCESS |
                                         PG_DIR_FLAG_PAGE_READ_WRITE |
                                         PG_DIR_FLAG_PAGE_PRESENT;
        }

        /* Map the address */
        page_table = (uint32_t*)
                     ((uint32_t)current_pgdir[pgdir_entry] & 0xFFFFF000);
        page_table = map_pgtable(page_table);
        page_entry = &page_table[pgtable_entry];

        /* Check if already mapped */
        if((*page_entry & PAGE_FLAG_PRESENT) == PAGE_FLAG_PRESENT &&
           allow_remap == 0)
        {
            #if PAGING_KERNEL_DEBUG == 1
            kernel_serial_debug("Mapping (after align) 0x%08x, to 0x%08x "
                                "(%d bytes) Already mapped)\n",
                                virt_addr, phys_addr, mapping_size);
            virt_save = (uint32_t)virt_addr;
            #endif

            return OS_ERR_MAPPING_ALREADY_EXISTS;
        }

        *page_entry = (uint32_t)phys_addr |
                      flags |
                      PAGE_FLAG_PRESENT;
        #if PAGING_KERNEL_DEBUG == 1
        kernel_serial_debug("Mapped 0x%08x -> 0x%08x\n", virt_addr, phys_addr);
        #endif
        virt_addr = (uint8_t*)virt_addr + KERNEL_PAGE_SIZE;
        phys_addr = (uint8_t*)phys_addr + KERNEL_PAGE_SIZE;
    }

    invalidate_tlb();

    return OS_NO_ERR;
}
#pragma GCC pop_options
When -O2 is enabled, the function will not map the pages anymore page faults will occur. Without optimization or with -O1, everything works fine.

So my question is: can you help me find what causes the function to stop working when optimizations are turned on? I there a way to turn on optimizations one by one (and not by level) to find which one en-light the bad behavior of the core?

Re: Optimization causing non functioning function.

Posted: Mon Oct 22, 2018 6:42 pm
by alexfru
It’s all about instruction reordering. There are two: in the compiler and in the CPU.
The compiler has no idea that changing PDEs has effect on access to PTEs and may reorder the two. The CPU may also reorder some reads/writes ahead of others (I forget which, though).
So, I’d sprinkle around volatiles and memory barriers.

Re: Optimization causing non functioning function.

Posted: Mon Oct 22, 2018 10:13 pm
by nullplan
@alexfru: Ah yes, the good old "volatile is like violence" approach (if it doesn't work, you're not using enough of it). But in this case that can't be it: The assignment to the page table depends on the assignment to the page directory above it. Therefore the compiler cannot reorder the two.

Where do these page faults happen, in this function or at the place where the page mappings are used? The OP's comments seem to indicate the latter, in which case we are probably looking at an optimized out write... which the compiler only does if it can prove the write to be dead (overwritten before the next read). We may need to see map_pgtable() before we can make a judgement.

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 4:16 pm
by Oxmose
Thanks for your replies!

The page fault occurs later in the code when I'm accessing an address that was supposed to be mapped by this function.

Here is the map_pgtable function:

Code: Select all

static void* map_pgtable(void* pgtable_addr)
{
    uint32_t  pgdir_entry;
    uint32_t  pgtable_entry;
    uint32_t* current_pgdir;
    uint32_t* pgtable;

    /* Get PGDIR entry */
    pgdir_entry = (((uint32_t)&kernel_current_pgtable) >> 22);
    /* Get PGTABLE entry */
    pgtable_entry = (((uint32_t)&kernel_current_pgtable) >> 12) & 0x03FF;

    /* This should always be mapped */
    current_pgdir = (uint32_t*)&kernel_current_pgdir;
    pgtable       = (uint32_t*)current_pgdir[pgdir_entry];


    /* Update the pgtable mapping to virtual */
    pgtable = (uint32_t*)(((uint32_t)pgtable + KERNEL_MEM_OFFSET) & 0xFFFFF000);
    pgtable[pgtable_entry] = (uint32_t)pgtable_addr |
                            PAGE_FLAG_SUPER_ACCESS |
                            PAGE_FLAG_READ_WRITE |
                            PAGE_FLAG_PRESENT;

    invalidate_tlb();

    return ((void*)&kernel_current_pgtable);
}
And the whole code is accessible here:
https://github.com/Oxmose/RTLKIM/blob/f ... y/paging.c

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 5:15 pm
by Oxmose
here is my panic screen:

Code: Select all

#=============================    KERNEL PANIC    ============================#
|                                                                             |
| Reason: Page fault                              INT ID: 0x0e                |
| Instruction [EIP]: 0xe010872f                   Error code: 0x00000000      |
|                                                                             |
|================================= CPU STATE =================================|
|                                                                             |
| EAX: 0x00000000  |  EBX: 0x03fe1412  |  ECX: 0xe0502000  |  EDX: 0x00000000 |
| ESI: 0xe00f58c4  |  EDI: 0xe0127000  |  EBP: 0x00000000  |  ESP: 0xe01330bc |
| CR0: 0x80010011  |  CR2: 0x03fe1416  |  CR3: 0x00128000  |  CR4: 0x00000010 |
| EFLAGS: 0x00000046  |                                                       |
|                                                                             |
|============================= SEGMENT REGISTERS =============================|
|                                                                             |
| CS: 0x0008  |  DS: 0x0010  |  SS: 0x0010                                    |
| ES: 0x0010  |  FS: 0x0010  |  GS: 0x0010                                    |
|                                                                             |
|============================== ADDITIONAL INFO ==============================|
|                                                                             |
| Core ID: 0x00000000                                                         |
| Thread:  000000000                                                          |
| Inst:    8b 43 04 39 (Address: 0xe010872f)                                  |
|                                                                             |
|                         LET'S HOPE IT WON'T EXPLODE                         |
#=============================================================================#
I just checked the error code, it is set to 0...
The address in CR2 is a pointer to the ACPI RSDT that should have been mapped by a previous kernel_mmap call.

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 7:12 pm
by MichaelPetch
If higher optimizations make your code fail then the first place I'd look is at inline assembly. I'm going to assume that you are new to inline assembly. First off multiple successive volatile asm blocks are not guaranteed to be emitted in the order they appear. Although likely they will be you should never assume they are. You can place more than one instruction in an asm block. Don't set up stack frames to make asm blocks work. IMPORTANT: If you modify a general purpose register you need to tell the compiler it was modified.

First function in paging.c with inline assembly gives an example of clobbering a register without telling the compiler:

Code: Select all

__inline__ static void invalidate_tlb(void)
{
    /* Invalidate the TLB */
    __asm__ __volatile__("movl	%cr3,%eax");
	__asm__ __volatile__("movl	%eax,%cr3");
}
could be written as:

Code: Select all

__inline__ static void invalidate_tlb(void)
{
    int temp;

    /* Invalidate the TLB */
    __asm__ __volatile__("movl  %%cr3, %0\n\t"
                         "movl  %0, %%cr3" : "=r"(temp));
}
This version has the compiler choose the register and use an =r constraint to tell the compiler that the value in the chosen register will be overwritten.

I haven't looked beyond this function, but I suspect you are making similar mistakes elsewhere in your code base. If you are new to inline assembly and don't understand it I recommend moving such code into separate assembly module where you don't need to be concerned about the nuances and complexities of GCC's inline assembly.

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 7:24 pm
by Oxmose
You are definitely right, I avoided learning inline assembly for years but I think it is time to go for it now... I'm used to write plain assembly but I never got the courage to learn how to write proper inline assembly code.
I'm going to check all this and fix the inline assembly before investigating further on the issue.

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 7:36 pm
by Oxmose
Well, well, well what to say except thank you :) It works now. That will teach me to avoid to learn something obviously mandatory to learn...

Re: Optimization causing non functioning function.

Posted: Tue Oct 23, 2018 8:47 pm
by nullplan
Oxmose wrote:You are definitely right, I avoided learning inline assembly for years but I think it is time to go for it now... I'm used to write plain assembly but I never got the courage to learn how to write proper inline assembly code.
I'm going to check all this and fix the inline assembly before investigating further on the issue.
You know, you can also just rid the code of all inline assembly in favor of external functions. There is almost never a reason to use it, except maybe speed. Don't know if that is important in invalidate_tlb(), which will kill speed all on its own due to its function.