Page 1 of 2

disabling compiler optimisation

Posted: Fri Aug 22, 2014 2:02 am
by choco
Hi,

As many of you, I am starting to figure out how OS works and trying to build my own one.
For now, I am working on IDT. I spend a lot of time trying to understand why it doesn't work and finally, I notice that compiling with -O1 instead of -O2 make it works. (I am using cross compiled gcc)
In fact, I really want compiler optimization (I would like a fast OS :-D) so I tried to understand why it doesn't work and I figure out that the tail call optimization in my isr_handler is the only difference between a working OS and a not working one. This happen as the last statement of my isr_handler is a "print" statement.

As I am writing C++, I wondered if my printing class (encapsulated buffer and cursor position in a singleton) was the cause (It use a common buffer pointer in a "not standard" branch (the interrupt branch)). After some tests, I would say no but may be I don't correctly understand this part.

Also, I tried to change my printing function and notice that if I add dummy instruction to disable tail call optimization in inner function it does work too. Until I reach the cursor update function.

Code: Select all

uint8_t inb(uint16_t port)
{
    uint8_t rega;
    asm volatile("inb %1,%0" : "=a" (rega) : "Nd" (port));
    return rega;
}

void outb(uint8_t value, uint16_t port)
{
    asm volatile("outb %0, %1" : : "a" (value), "Nd" (port));
}

Code: Select all

void Terminal::update_cursor()                                                                                     
{                                                                               
    uint16_t position=(row * VGA_WIDTH) + column;                               
    // cursor HIGH port to vga INDEX register                                   
    outb(BYTE_POS::HIGH, VGA_PORT::CURSOR_BYTE_POS);                            
    outb((uint8_t)((position>>8) & 0xFF), VGA_PORT::CURSOR_POS);                
                                                                              
    // cursor LOW port to vga INDEX register                                  
    outb(BYTE_POS::LOW, VGA_PORT::CURSOR_BYTE_POS);                           
    outb((uint8_t)(position & 0xFF), VGA_PORT::CURSOR_POS);                   
}  
Thanks you for your help.
If you need more information, please ask.

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 2:31 am
by alexfru
If your isr_handler is a C(++) function whose address you put into the IDT and it has a bunch of inline assembly code in it, you've likely outsmarted yourself. If it's not that, where's the code? How can we tell what's wrong without seeing the code? Do you believe in mind reading? :) But before going there, have you researched the problem, have you read posts on similar topics? Presumably the FAQ and the tutorials would have a few words to say about it too?

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 2:54 am
by choco
Sorry for not being clear enough.

I read many things about it but it is always about non working IDT. Not about compiler optimization issue. If I block TCO optimization at the end of isr_handler, everything works as I expected.

The only assembly code called by the isr_handler (the C(++) function) is the cursor update position. Could it be enough to break everything?

Just in case (to avoid mind reading ;-) ):

Code: Select all

                                       
#define ADD_ENTRY(i)\                                                              
    add_entry(i, (uint32_t)isr##i , 0x08, F_BASE | RING_0);

void IDT::add_entry(size_t id, uint32_t isr, uint16_t selector, uint8_t flags)  
{                                                                                  
    m_entries[id].base_lo = isr & 0xFFFF;                                          
    m_entries[id].base_hi = (isr >> 16) & 0xFFFF;                                  
                                                                                   
    m_entries[id].sel     = selector;                                              
    m_entries[id].always0 = 0;                                                     
    m_entries[id].flags   = flags;                                    
} 

IDT::init()                                                                         
{                                                                                  
    m_ptr.limit = sizeof(entry_t) * NB_ENTRIES -1;                                 
    m_ptr.base  = (uint32_t)&m_entries;                                            
                                                                                   
    memset(&m_entries, 0, sizeof(entry_t) * NB_ENTRIES);                           
                                                                                   
    ADD_ENTRY(0);                                                                  
    ADD_ENTRY(1);                                                                  
    ADD_ENTRY(2);                                                                  
    ADD_ENTRY(3);                                                                  
    ADD_ENTRY(4);                                                                  
    ADD_ENTRY(5);                                                                  
    ADD_ENTRY(6);                                                                  
    ADD_ENTRY(7);                                                                  
    ADD_ENTRY(8);                                                                  
    ADD_ENTRY(9);                                                                  
    ADD_ENTRY(10);                                                                 
    ADD_ENTRY(11);                                                                 
    ADD_ENTRY(12);                                                                 
    ADD_ENTRY(13);                                                                 
    ADD_ENTRY(14);                                                                 
    ADD_ENTRY(15);                                                                 
    ADD_ENTRY(16);                                                                 
    ADD_ENTRY(17);                                                                 
    ADD_ENTRY(18);                                                                 
    ADD_ENTRY(19);                                                                 
    ADD_ENTRY(20);                                                                 
    ADD_ENTRY(21);                                                                 
    ADD_ENTRY(22);                                                                 
    ADD_ENTRY(23);                                                                 
    ADD_ENTRY(24);                                                                 
    ADD_ENTRY(25);                                                                 
    ADD_ENTRY(26);                                                                 
    ADD_ENTRY(27);                                                                 
    ADD_ENTRY(28);                                                                 
    ADD_ENTRY(29);                                                                 
    ADD_ENTRY(30);                                                                 
    ADD_ENTRY(31);                                                                 
                                                                                   
    idt_flush((uint32_t)&m_ptr);                                                   
}

Code: Select all

.section .text                                                                     
.global idt_flush                                                                  
.type idt_flush, @function                                                         
                                                                                   
idt_flush:                                                                         
    movl 4(%esp), %eax  # Get the pointer to the GDT, passed as a parameter.       
    lidt (%eax)        # Load the new GDT pointer                                  
    ret
And now the handlers:

Code: Select all

.macro isr_noerrorcode num                                                         
.global isr\num                                                                    
.type isr\num, @function                                                           
isr\num:                                                                           
    cli                                                                            
    push $0                                                                        
    push $\num                                                                     
    jmp isr_common_stub                                                            
.endm                                                                              
                                                                                   
.macro isr_errorcode num                                                           
.global isr\num                                                                    
.type isr\num, @function                                                           
isr\num:                                                                           
    cli                                                                            
    push $\num                                                                     
    jmp isr_common_stub                                                            
.endm                                                                              
                                                                                   
                                                                                   
.section .text                                                                     
.extern isr_handler                                                                
isr_noerrorcode 0                                                                  
isr_noerrorcode 1                                                                  
isr_noerrorcode 2                                                                  
isr_noerrorcode 3                                                                  
isr_noerrorcode 4                                                                  
isr_noerrorcode 5                                                                  
isr_noerrorcode 6                                                                  
isr_noerrorcode 7                                                                  
isr_errorcode 8                                                                    
isr_noerrorcode 9                                                                  
isr_errorcode 10                                                                   
isr_errorcode 11                                                                   
isr_errorcode 12                                                                   
isr_errorcode 13                                                                   
isr_errorcode 14                                                                   
isr_noerrorcode 15                                                                 
isr_noerrorcode 16                                                                 
isr_noerrorcode 17                                                                 
isr_noerrorcode 18                                                                 
isr_noerrorcode 19                                                                 
isr_noerrorcode 20                                                                 
isr_noerrorcode 21                                                                 
isr_noerrorcode 22                                                                 
isr_noerrorcode 23                                                                 
isr_noerrorcode 24                                                                 
isr_noerrorcode 25                                                                 
isr_noerrorcode 26                                                                 
isr_noerrorcode 27                                                                 
isr_noerrorcode 28                                                                 
isr_noerrorcode 29                                                                 
isr_noerrorcode 30                                                                 
isr_noerrorcode 31                                                                 
                                                                                   
.global isr_common_stub                                                            
.type isr_common_stub, @function                                                   
isr_common_stub:                                                                                        
    pusha
    movw %ds, %ax               # Lower 16-bits of eax = ds.                    
    pushl %eax                 # save the data segment descriptor               
                                                                                
    movw $0x10, %ax  # load the kernel data segment descriptor                  
    movw %ax, %ds                                                               
    movw %ax, %es                                                               
    movw %ax, %fs                                                               
    movw %ax, %gs                                                               
                                                                                
    call isr_handler                                                            
                                                                                
    popl %eax        # reload the original data segment descriptor              
    movw %ax, %ds                                                               
    movw %ax, %es                                                               
    movw %ax, %fs                                                               
    movw %ax, %gs                                                               
                                                                                
    popa                         
    addl $8, %esp     # Cleans up the pushed error code and pushed ISR number   
    sti                                                                         
    iret           # return from interrupt

and the isr_handler function:

Code: Select all

void isr_handler(registers_t regs)                                              
{                     
    Terminal::write((char)('0' + regs.int_no));                      
}
Also, with the Tail call optimization, about 40 GPF interrupt happen.

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 5:21 am
by alexfru
There are several things worth taking a closer look:
- Is the ISR type properly indicated as an 80386 interrupt gate? You want 80386 and you want an interrupt, not trap, gate normally as that would automatically disable interrupts upon entry into the ISR.
- Hence, you don't need CLI for interrupt gates.
- STI before IRET may cause a stack corruption or overflow if things go really bad. Or it may cause unwanted ISR preemption. You better leave STI's job to IRET. Also, I've read that modern gcc may access locals at addresses just below the current ESP value, meaning that ESP does not protect them and that such code, when interrupted, may get its local variables trashed. This is known as the red zone and is probably something to worry about it 64-bit code, but I'd double check and try to make sure that can't happen.
- EOIs should be sent to the PIC in IRQ ISRs. Is this particular ISR invoked by an IRQ or via an int instruction?
- If your code uses user mode (ring 3), are there proper SS0:ESP0 in the TSS?
- Your code doesn't properly save/restore ES, FS and GS, assuming they'd always be the same as DS. I'd not do that.
- But most importantly, you're passing registers by value (as a structure) to the ISR. There might be alignment issues inside the structure, there may be alignment issues with the whole structure. I'd make sure the structure content is properly aligned and I'd pass the structure by reference (er, pointer) to avoid odd things w.r.t. structure passing and accessing. All you need to do is:

Code: Select all

...
pushl %esp
call isr_handler
popl %eax
...
void isr_handler(registers_t* regs)
{
    Terminal::write((char)('0' + regs->int_no));
}
There are other things to look at, but start with these.

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 10:01 am
by Wajideu
Another possible problem is your declaration of the structure for the array "m_entries". The options "-O and "-O1" optimize for size instead of speed. By default, I believe GCC compiles with "-O3" which optimizes for both speed and size. When optimizing for speed, sometimes the elements in a structure are padded. A workaround for this would be to declare the structure with the modifier "__attribute__((packed))"

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 10:31 am
by Owen
DaemonR wrote:Another possible problem is your declaration of the structure for the array "m_entries".
Nope
DaemonR wrote:The options "-O and "-O1" optimize for size instead of speed.
Nope (That's -Os)
DaemonR wrote: By default, I believe GCC compiles with "-O3" which optimizes for both speed and size.
Nope (the default is -O0)
DaemonR wrote: When optimizing for speed, sometimes the elements in a structure are padded.
Padding is specified by the ABI, and unaffected by optimization flags (else you would have trouble whenever optimized and unoptimized objects were linked together)
DaemonR wrote: A workaround for this would be to declare the structure with the modifier "__attribute__((packed))"
We finally encountered the one correct portion of this post (if we had a padding issue, which we don't)

---

To the OP: See the known bugs list with JamesM's tutorial

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 10:36 am
by Wajideu
Well, it seems like I misunderstood the documentation a bit. You learn new things everyday I guess.

Regardless, telling him to look for problems in his ISR's is a complete waste of time as well. He already stated the code works if he compiles with -O1, so it's obviously an optimization issue

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 10:38 am
by Combuster
It's obviously a code issue if optimisations break anything.

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 12:35 pm
by Gigasoft
Seriously? How many times do I have to see this same awful joke of an ISR popping up in every new guy's introduction post? Whoever is hosting the webpage it originally comes from, I'd appreciate it if you would take it down. The cdecl calling convention does NOT work that way. The area where function parameters are passed is for input only. When the function returns, its contents are useless and must be discarded. You can not expect it to retain its original content. I assume that this is why it crashes. This masterpiece also has these funny random instructions that load 0x10 into FS and GS for no reason at all, and then later on it loads them with the original value of DS. And people keep pasting this into every single OS, not having a clue about what it is for.

Also, an exception handler should probably do something to deal with the exception, instead of just returning. Otherwise, the same exception will happen again.

Re: disabling compiler optimisation

Posted: Fri Aug 22, 2014 5:53 pm
by Wajideu
Gigasoft wrote:Seriously? How many times do I have to see this same awful joke of an ISR popping up in every new guy's introduction post? Whoever is hosting the webpage it originally comes from, I'd appreciate it if you would take it down.
Grab your torch and pitchforks! :evil:


Joking aside, you'll find everything you need in the Intel Architectures Software Developer Manual.

In particular, section 6.10 labelled "INTERRUPT DESCRIPTOR TABLE (IDT)" will give you pretty much all the info you need to know.

Here is a simple description of the cdecl calling convention in case you need that too.

Re: disabling compiler optimisation

Posted: Mon Aug 25, 2014 2:02 am
by choco
Thanks everyone for your help and many pointers.

Alexfru :
- the ISR was a correct interrupt gate
- I removed the unneeded cli/sti (behavior changed on crash, but doesn't solve the problem)
- I had this issue before enabling IRQ so I didn't go futher while it was not solved
- The code was in kernel mode (ring 0)
- I didn't really look at register segment saving. What is the best practice? Pushing all of them one after the other? It is not to much costly to do this?
- You are right, the issue was because of this data structure.

Owen: Thanks for the pointer on JamesM's tutorial issue. It is one of my sources and I will care about all of these possible problem.

Re: disabling compiler optimisation

Posted: Mon Aug 25, 2014 2:41 am
by alexfru
Yes, push and pop the segment registers one by one. Interrupts aren't very frequent and you shouldn't be worrying about this segment register saving and restoring in terms of performance now (if ever).

Re: disabling compiler optimisation

Posted: Mon Aug 25, 2014 3:31 am
by bluemoon
If you are using flat memory model, you don't need to save/restore segment registers.
In kernel context with CPL=0 you can still use the user's data segments (of DPL=3).
Paging on the other hand only check CPL.
You may still want to validate the value of DS/ES, but that's much quicker than changing its values.

The other option is to use FS/GS in kernel and not touching DS/ES.

Re: disabling compiler optimisation

Posted: Mon Aug 25, 2014 1:13 pm
by Gigasoft
If there is such a thing as "best practices" when writing an operating system, it would simply be having some kind of clue about what you're doing. I have no idea what you are actually using FS and GS for. If you are not using them, then why are you messing with them for no reason? Just leave them alone. And loading the old value of DS into them afterwards is just random and confusing. The first priority should be to make your code behave correctly. Do not introduce weird, unpredictable behaviour just because you think you can save four nanoseconds. With this interrupt handler, all code that touches FS or GS would need to run with interrupts disabled, or their value would mysteriously change behind its back. Windows, as an optimization, loads the value 0 into GS when returning from an interrupt, rather than restoring the saved value, if CS equals 0x1b (the default for user mode code). This is confusing enough if you don't know about it, but loading an actual valid selector is much worse.

Most often, you will use FS or GS to point to CPU or thread variables, by having a separate GDT for each CPU. User mode code will also probably use one for pointing to thread variables.
The other option is to use FS/GS in kernel and not touching DS/ES.
That sounds pretty ridiculous. Every instruction would be one byte longer, you could not use the MOVS and CMPS instructions, and you would still have to ensure that it points to the right segment.

Re: disabling compiler optimisation

Posted: Mon Aug 25, 2014 7:33 pm
by Brendan
Hi,
bluemoon wrote:You may still want to validate the value of DS/ES, but that's much quicker than changing its values.
There's an even faster way. Assuming the OS is designed for "flat memory model" (no segmentation), there are typically only 3 values that user-level code can load into DS or ES. One is the correct value (e.g. a DPL=3, flat read/write data descriptor), one is the code segment (e.g. a DPL=3, flat read/execute code descriptor), and the other is the CPU's "null descriptor". For all other values, code running at CPL=3 will get a general protection fault and the descriptor won't be loaded.

Now, if you assume that the kernel uses the same segments; you can guarantee that if user-space code has changed DS or ES to something wrong and the kernel uses the segment register to write to anything then the kernel will get a general protection fault. This means that the kernel's general protection fault handler can check if the exception was caused by "bad DS or ES set by user-space" and correct the segment register and retry the instruction. With this in place, there's no need to check DS or ES anywhere else (and no need to check, save or load DS or ES in interrupt handlers).

For FS and GS; these are typically used for special purposes. For example; GS might be used by the kernel to quickly find a "per-CPU" data structure. In this case, user-space code can still only load the same 3 descriptors into GS; but if user-space code loads the "flat data descriptor" into GS the kernel won't get a general protection fault. However; there's another trick. If you make sure that the per-CPU data structure is 4 KiB or less, and also make sure that the first 4 KiB in every virtual address space is "not present"; then if user-space code has loaded the "user data" or "user code" descriptor into GS and the kernel uses it you can guarantee that the kernel will get a page fault (all offsets into the per-CPU data structure will be below 0x00001000 which is a "not present" page) and the page fault handler can correct the segment register and retry the instruction. With this in place, there'd be no need to check GS anywhere else (and no need to check, save or load GS in interrupt handlers).

Basically what I'm saying is that with a few little tricks, it's possible for the kernel to ignore segment registers completely and still be immune to anything user-space code might do to the segment registers; and (assuming no processes are malicious and therefore no processes modify segment registers anyway) this method ends up being "zero cycles of overhead on the critical path" (e.g. only a little extra overhead in the exception handlers, which aren't that critical for performance anyway).

However; if you use virtual80x86 mode for anything this method will not work, because any value that could've been valid for real mode may have been left in DS, ES, FS or GS. Fortunately, there's no need to bother with virtual80x86 mode for anything.
choco wrote:- I didn't really look at register segment saving. What is the best practice? Pushing all of them one after the other? It is not to much costly to do this?
For interrupt handlers the best practices (in my opinion only) are:
  • use tricks (see above) to avoid the need to touch segment registers in the first place
  • avoid the "C function call" overhead for cases where it will be a significant increase in the total cost of an interrupt handler, by writing some interrupt handlers in pure assembly. Note: this mainly applies to very simple interrupt handlers. For example, consider "multi-CPU TLB shootdown" where one CPU sends an IPI and the interrupt handler used by other CPUs to handle the IPI only needs to do about 5 instructions (e.g. load an "address to invalidate", do an INVLPG instruction and return), where the extra mess caused by C function calling conventions can double the cost of the interrupt handler.
  • avoid passing the interrupted code's registers to the C function for cases where there's no sane reason for the C interrupt handler to want them in the first place. Essentially; any/all C functions for IRQ handlers should be like "void myIRQhandler(void)". Note: for these cases C calling conventions guarantee that the called code will preserve various registers, and the assembly stub shouldn't bother saving or restoring any of these "callee preserved" registers.
  • have a general purpose function to handle kernel panics, that any kernel code can call for any reason, that can be used in the same way you'd use "assert()" in well written C code. For example, you might have a kernel function that is only ever called by other kernel functions, which does "if( argument_passed_by_caller == NULL) { kernel_panic("Caller of foo passed NULL!\n");".
  • have a general purpose function to handle crashes; which determines if the crash was caused by user-space code or kernel, and either terminates the process (if the crash was in user-space) or calls the general purpose kernel panic function (if the kernel crashed).
  • realise that if you've got a general purpose function to handle crashes, then none of the code used in any "C exception handler function" will be used by any other "C exception handler function".
  • realise that different exception handlers may need very different "assembly stubs" (and not just because some have error codes and some don't). For a very simple example, the very first thing a page fault handler should do is save CR2 somewhere safe (so that it can't be trashed by a second page fault), but this is pointless for any other exception handler. Double fault should probably use a hardware task switch and be extremely different. NMI and machine check are extremely tricky to get right (due to the fact that they can't be effectively masked/postponed and potential nesting) and also take very special care. The debug exception has special requirements (mostly involving messing with the "Resume Flag" before doing IRET so that single-step debugging works properly). For some of them (e.g. divide error) the assembly stub can immediately call the general purpose function to handle crashes with no intervening code.
  • By combining the last 2 things it's easy to see that for exceptions it's best to have a special purpose assembly stubs for each exception, where each of the special purpose assembly stubs calls the corresponding special purpose C code for that exception (if any). Having "generic assembly stubs" for exception handlers and/or having a "common exception handler" in C (that is nothing more than a big "switch()" that calls the C function that should've been called in the first place) is purely idiotic.
Finally; please note that for a good tutorial you want simplified examples to make it easy for the reader to learn from (and not complicated code that confuses the reader and makes it hard to learn anything); and the code for a real OS is complicated (and has to handle a large number of things that can be ignored in a tutorial). Basically; it's impossible to have a good tutorial that contains example code that is usable in a real OS.


Cheers,

Brendan