Page 1 of 1

Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:03 am
by austanss
I recently made the advancement to program the PIT, and the first thing I wrote was a sleep() function.
It looks like this:

Code: Select all

volatile uint64_t sys::chrono::ms_clock = 0;
void sys::chrono::sleep(volatile uint32_t ms)
{
    ms_clock = 0;
    while (ms_clock < ms);
    ms_clock = 0;        
}
ms_clock is incremented by my PIT IRQ handler:

Code: Select all

		sys::chrono::ms_clock++;
		milliseconds++;
		if (milliseconds >= io::pit::frequency_hz)
		{
			milliseconds = 0;
			seconds++;
		}
I stepped through the sleep() function with a debugger, and verified that the values were in fact correct. ms was 3000, and sys::chrono::ms_clock was 0 (to start).
Yet when stepping through the loop, it just skips by, no delay at all.

Could this be an optimization bug? Or am I at fault?

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:13 am
by gonzo
Try clobbering memory instead by using asm("" ::: "memory") and see if that forces it to re-read the value instead of using volatile.

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:15 am
by austanss
This is the disassembled output of the function:

Code: Select all

00000000001007b4 <sys::chrono::sleep(unsigned int)>:
  1007b4:	f3 0f 1e fa          	endbr64 
  1007b8:	55                   	push   rbp
  1007b9:	48 89 e5             	mov    rbp,rsp
  1007bc:	89 7d fc             	mov    DWORD PTR [rbp-0x4],edi
  1007bf:	48 c7 05 3e 28 01 00 	mov    QWORD PTR [rip+0x1283e],0x0        # 113008 <sys::chrono::ms_clock>
  1007c6:	00 00 00 00 
  1007ca:	8b 45 fc             	mov    eax,DWORD PTR [rbp-0x4]
  1007cd:	89 c2                	mov    edx,eax
  1007cf:	48 8b 05 32 28 01 00 	mov    rax,QWORD PTR [rip+0x12832]        # 113008 <sys::chrono::ms_clock>
  1007d6:	48 39 c2             	cmp    rdx,rax
  1007d9:	0f 97 c0             	seta   al
  1007dc:	84 c0                	test   al,al
  1007de:	74 02                	je     1007e2 <sys::chrono::sleep(unsigned int)+0x2e>
  1007e0:	eb e8                	jmp    1007ca <sys::chrono::sleep(unsigned int)+0x16>
  1007e2:	48 c7 05 1b 28 01 00 	mov    QWORD PTR [rip+0x1281b],0x0        # 113008 <sys::chrono::ms_clock>
  1007e9:	00 00 00 00 
  1007ed:	90                   	nop
  1007ee:	5d                   	pop    rbp
  1007ef:	c3                   	ret 

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:18 am
by austanss
gonzo wrote:Try clobbering memory instead by using asm("" ::: "memory") and see if that forces it to re-read the value instead of using volatile.
I tried that and it seemed to iterate in the loop, but the delay was still nonexistent.

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:37 am
by austanss
OK, I resolved the issue by using a hail mary, writing it in assembly:
(for x86-64, NASM, SysV calling convention)

Code: Select all

sleep:
    push rbp
    mov rbp, rsp
    push rax
    mov eax, edi
    .loop:
        cli
        mov rbx, qword [ms_clock]
        cmp rbx, rax
        jg .done
        sti
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        jmp .loop
    .done:
        sti
        pop rax
        pop rbp
        ret 

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 2:31 pm
by Octocontrabass
rizxt wrote:This is the disassembled output of the function:
You're passing the wrong flags to your cross-compiler. It's using the red zone to store data, but the red zone is clobbered by interrupts.

Re: Possible optimization bug in sleep() function

Posted: Wed Feb 17, 2021 8:24 pm
by Schol-R-LEA
Aside from any other problems with this code, I am not sure if this is really what you want for the sleep() function anyway.

Why do I say this? because you generally aren't going to be sleeping in the kernel at all, and even if you did, you wouldn't want it to busy-wait - you would use HLT to wait for the clock ticks, and count those. Even then, though, there are few if any situations where you would want to actually halt the kernel outright, if only because there are usually going to processes to run - whether kernel or user - while waiting for whatever you might try to sleep on.

Conversely, the sleep() function for userspace processes is a system call out to the scheduler, which should move the sleeping thread to a sleep queue and reschedule whatever other processes might still be active (or exhausting those, to a null process which simply loops on HLT). Since clock ticks should invoke the scheduler first and foremost (assuming preemption is being used), it is the scheduler's job to track the timing and move the awakened threads back into the running queue, not the function's. See this earlier thread for my on one way to do this.

Either way, while this will work to sleep the kernel, it isn't what your going to want, at least not once you have a scheduler running. I suppose I can see this version being useful as a stepping stone towards that, though.