mariuszp wrote:Yes, I had the problem with an event queue before, and fixed it in (almost) exactly the same way as you just described.
Good. How did you fix it?
By having the OS provide a blocking "wait for mouse" call which, on return, reports only the most recent mouse state. Earlier states are not queued up.
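For illustration, here is a minimal user-space sketch of that "latest state only" idea, written with POSIX threads. The names MouseState, MouseSlot and the mouse_slot_* functions are made up for this example and are not taken from the actual OS; a kernel would use its own locking and wait primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mouse state; the field names are illustrative only. */
typedef struct {
    int x, y;
    unsigned buttons;
} MouseState;

/* One-slot mailbox: a new report overwrites the old one instead of queueing. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    MouseState      latest;
    bool            dirty;   /* true while 'latest' has not been consumed */
} MouseSlot;

void mouse_slot_init(MouseSlot *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->changed, NULL);
    s->latest = (MouseState){0, 0, 0};
    s->dirty = false;
}

/* Producer side (driver): overwrite the slot and wake any waiter. */
void mouse_slot_post(MouseSlot *s, MouseState st)
{
    pthread_mutex_lock(&s->lock);
    s->latest = st;
    s->dirty = true;
    pthread_cond_signal(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

/* Consumer side (GUI): block until there is news, then return the newest state. */
MouseState mouse_slot_wait(MouseSlot *s)
{
    pthread_mutex_lock(&s->lock);
    while (!s->dirty)
        pthread_cond_wait(&s->changed, &s->lock);
    MouseState st = s->latest;
    s->dirty = false;
    pthread_mutex_unlock(&s->lock);
    return st;
}
```

If the driver posts ten updates while the GUI is busy drawing, the next mouse_slot_wait() returns once, with the tenth state, so the GUI never works through a backlog.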
Also, it seems most of my performance issue is with text rendering, which I do using FreeType. A 5-paragraph "lorem ipsum" takes 12 seconds to render 10 times (in DejaVu Sans, 20)...
Your ddiWritePen() function is horribly inefficient.
You're calling FT_Load_Glyph in a loop without reusing the result. It's probably worth trying to cache the result of that function.
The same might also apply to FT_Render_Glyph. You might want to look into FreeType's caching API or roll your own cache.
Instead of directly modifying the target surface, you're drawing every single pixel with a ddiFillRect() call. ddiFillRect() thus handles special cases like negative offsets for every single pixel! Just manipulate the surface memory directly. Also swap the x and y loops to get better cache behavior.
ddiWritePen() performs a realloc() for each character in the text!
ddiFillRect() computes the surface pitch on the fly, which involves a multiplication. Change that to a shift or store the pitch as part of the surface.
ddiFillRect() calls into ddiFill(), which calls into ddiCopy() once per pixel! ddiCopy() seems to be a roll-your-own memcpy() (that still calls memcpy?). Apart from being incorrect (accessing the buffer as uint64_t violates aliasing rules), I also expect it to be slower than GCC's internal optimized SSE memcpy().
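To make the surface-related points concrete, here is a rough sketch of a glyph blit that writes to surface memory directly. It clips once per glyph rather than once per pixel, keeps rows in the outer loop, and reads the row stride from the surface instead of recomputing it. The Surface layout, the 0x00RRGGBB pixel format and the names blend() and blit_coverage() are assumptions made for this example, not the actual ddi API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical surface: 32-bit 0x00RRGGBB pixels, stride stored up front. */
typedef struct {
    uint32_t *pixels;
    int       width, height;
    size_t    stride;        /* pixels per row, computed once at creation */
} Surface;

/* Blend 'src' over 'dst' with 8-bit coverage 'a', channel by channel. */
static uint32_t blend(uint32_t dst, uint32_t src, uint8_t a)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t d = (dst >> shift) & 0xFF;
        uint32_t s = (src >> shift) & 0xFF;
        out |= ((s * a + d * (255u - a)) / 255u) << shift;
    }
    return out;
}

/* Draw an 8-bit coverage bitmap (e.g. a rendered glyph) at (dx, dy). */
void blit_coverage(Surface *dst, int dx, int dy,
                   const uint8_t *cov, int cw, int ch, size_t cpitch,
                   uint32_t colour)
{
    assert(dst != NULL && cov != NULL);

    /* Clip the rectangle once, instead of checking every pixel. */
    int x0 = dx < 0 ? -dx : 0;
    int y0 = dy < 0 ? -dy : 0;
    int x1 = dx + cw > dst->width  ? dst->width  - dx : cw;
    int y1 = dy + ch > dst->height ? dst->height - dy : ch;

    for (int y = y0; y < y1; y++) {        /* rows outermost ... */
        const uint8_t *src_row = cov + (size_t)y * cpitch;
        uint32_t *dst_row = dst->pixels + (size_t)(dy + y) * dst->stride;
        for (int x = x0; x < x1; x++)      /* ... so memory is walked linearly */
            dst_row[dx + x] = blend(dst_row[dx + x], colour, src_row[x]);
    }
}
```

Because the pixel buffer is declared as uint32_t and only ever accessed as uint32_t, there is no aliasing problem to work around.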
Korona wrote:Your ddiWritePen() function is horribly inefficient.
The library is compiled with -O3, which appears to optimise ddiCopy(). Either way, I'll look into all the issues one by one and see how much I can boost the performance.
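As an example of one of those issues, the realloc() per character can be avoided by growing the buffer geometrically. The thread does not show what ddiWritePen() stores per character, so PenItem, PenItemList and pen_list_push() below are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-character record; the real fields are not shown here. */
typedef struct {
    int x, y;
    unsigned glyph_index;
} PenItem;

typedef struct {
    PenItem *items;
    size_t   count, capacity;
} PenItemList;

/* Append one item, doubling the capacity whenever the buffer is full.
 * Returns 0 on success, -1 if realloc() fails (old buffer stays valid). */
int pen_list_push(PenItemList *l, PenItem item)
{
    assert(l != NULL);
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity ? l->capacity * 2 : 16;
        PenItem *p = realloc(l->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        l->items = p;
        l->capacity = new_cap;
    }
    l->items[l->count++] = item;
    return 0;
}
```

With doubling, appending 100 characters triggers 4 reallocations instead of 100. If the text length is known before the loop starts, a single allocation of the right size is simpler still.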
The loop changes and other micro-optimisations didn't do much, but the caching certainly did. Rendering 5 paragraphs of "lorem ipsum" ten times previously took approximately 12000 ms; it now takes 470 ms on average. This is more than 25 times faster!
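For anyone reading along, a roll-your-own glyph cache can be quite small. The sketch below is a direct-mapped cache keyed on glyph index and pixel size; it is not the code from my library, and CachedGlyph, GlyphCache, GlyphLoader and demo_loader() are names invented for this example. With FreeType, the loader callback is where FT_Load_Glyph() and FT_Render_Glyph() would run, followed by copying the bitmap out of the glyph slot, since the slot is overwritten by the next load.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cached glyph: an owned copy of the rendered coverage bitmap. */
typedef struct {
    int      valid;
    uint32_t glyph_index, pixel_size;
    int      width, rows, left, top, advance;
    uint8_t *coverage;       /* malloc()ed by the loader, owned by the cache */
} CachedGlyph;

/* Fills 'out' for one glyph; returns 0 on success. A FreeType-backed
 * loader would call FT_Load_Glyph()/FT_Render_Glyph() here. */
typedef int (*GlyphLoader)(void *ctx, uint32_t glyph_index,
                           uint32_t pixel_size, CachedGlyph *out);

#define GLYPH_CACHE_SLOTS 256    /* must be a power of two */

typedef struct {
    CachedGlyph slots[GLYPH_CACHE_SLOTS];
    GlyphLoader load;
    void       *ctx;
} GlyphCache;

void glyph_cache_init(GlyphCache *c, GlyphLoader load, void *ctx)
{
    assert(c != NULL && load != NULL);
    *c = (GlyphCache){ .load = load, .ctx = ctx };   /* slots start empty */
}

/* Return the cached glyph, calling the loader only on a miss. */
const CachedGlyph *glyph_cache_get(GlyphCache *c, uint32_t glyph_index,
                                   uint32_t pixel_size)
{
    assert(c != NULL);
    uint32_t slot = (glyph_index * 31u + pixel_size) & (GLYPH_CACHE_SLOTS - 1);
    CachedGlyph *g = &c->slots[slot];
    if (g->valid && g->glyph_index == glyph_index && g->pixel_size == pixel_size)
        return g;                                    /* hit: no rendering */

    free(g->coverage);                               /* evict the old entry */
    *g = (CachedGlyph){0};
    if (c->load(c->ctx, glyph_index, pixel_size, g) != 0) {
        free(g->coverage);
        *g = (CachedGlyph){0};
        return NULL;
    }
    g->valid = 1;
    g->glyph_index = glyph_index;
    g->pixel_size = pixel_size;
    return g;
}

/* Stand-in loader for demonstration: counts calls, makes a 1x1 bitmap. */
int demo_loader(void *ctx, uint32_t glyph_index, uint32_t pixel_size,
                CachedGlyph *out)
{
    (void)pixel_size;
    ++*(int *)ctx;
    out->coverage = malloc(1);
    if (out->coverage == NULL)
        return -1;
    out->coverage[0] = (uint8_t)glyph_index;
    out->width = out->rows = 1;
    return 0;
}
```

Body text repeats a small set of glyphs at a single size, so nearly every lookup after the first few words is a hit. FreeType's own cache subsystem is the alternative if you would rather not maintain this yourself.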
I'll continue to look for possible optimisations, and will move on to implementing partial updates in the compositor.