Scrolling terminal in software-emulated text mode
Re: Scrolling terminal in software-emulated text mode
Using QEMU monitor, I use the dump-guest-memory command.
Also, I fixed* the issue.
*The rendering is so incredibly slow that I could tally the characters on a piece of paper as they appear on the screen. No, I'm not being hyperbolic. Literally.
Skylight: https://github.com/austanss/skylight
I make stupid mistakes and my vision is terrible. Not a good combination.
NOTE: Never respond to my posts with "it's too hard".
Re: Scrolling terminal in software-emulated text mode
OK, fixed that easily by individually rendering entries when I put an entry into the buffer, rather than re-rendering the whole buffer.
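A minimal sketch of that idea, not the actual Skylight code: the text buffer keeps one character per cell, and writing a cell redraws only that cell's glyph instead of the whole screen. Everything here is an assumption made for illustration: the 80x25 grid, the 8x8 glyph size, the static array standing in for the real framebuffer, and the placeholder `glyph_row` font lookup.

```c
#include <stddef.h>
#include <stdint.h>

enum { COLS = 80, ROWS = 25, GLYPH_W = 8, GLYPH_H = 8 };
enum { FB_W = COLS * GLYPH_W, FB_H = ROWS * GLYPH_H };

static uint32_t fb[FB_W * FB_H];     /* stand-in for the real linear framebuffer */
static char cells[ROWS][COLS];       /* the text buffer, one character per cell */
static unsigned long glyphs_drawn;   /* counts how much rendering work was done */

/* Placeholder font: a real kernel would look up a bitmap font here. */
static uint8_t glyph_row(char ch, int row)
{
    (void)row;
    return (uint8_t)ch;
}

/* Draw the glyph for a single cell into the framebuffer. */
static void draw_cell(int col, int row)
{
    uint32_t *dst = fb + (size_t)row * GLYPH_H * FB_W + (size_t)col * GLYPH_W;
    char ch = cells[row][col];

    for (int y = 0; y < GLYPH_H; y++, dst += FB_W) {
        uint8_t bits = glyph_row(ch, y);
        for (int x = 0; x < GLYPH_W; x++)
            dst[x] = (bits & (0x80u >> x)) ? 0xFFFFFFu : 0x000000u;
    }
    glyphs_drawn++;
}

/* Slow path: redraw every cell on the screen. */
void term_redraw_all(void)
{
    for (int row = 0; row < ROWS; row++)
        for (int col = 0; col < COLS; col++)
            draw_cell(col, row);
}

/* Fast path: store the character and draw only the cell that changed. */
void term_put(int col, int row, char ch)
{
    cells[row][col] = ch;
    draw_cell(col, row);
}
```

With this layout, printing one character costs one glyph draw, while a full redraw costs 2000.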
Re: Scrolling terminal in software-emulated text mode
rizxt wrote: Using QEMU monitor, I use the dump-guest-memory command.
How do you examine the memory dump?
rizxt wrote: The rendering is so incredibly slow
Your code has a lot of room for optimization. I see you already found one, but there are still others.
Re: Scrolling terminal in software-emulated text mode
Now my scrolling function has some screen tearing...
I'm going to attempt using a framebuffer buffer...
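For readers following along, a rough sketch of what a "framebuffer buffer" (a back buffer) might look like: scrolling happens in ordinary RAM, and the finished frame is copied to the screen in one pass. The resolution, the 16-pixel line height, the assumption that the pitch equals the width, and all names below are made up for illustration. A freestanding kernel would also need its own `memmove` and `memcpy`. This hides half-finished drawing, but it is not synchronized to the vertical refresh, so it may not remove every tearing artifact.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { W = 640, H = 480, LINE_H = 16 };   /* hypothetical mode and text line height */

static uint32_t back[W * H];              /* back buffer in ordinary RAM */
static uint32_t front_standin[W * H];     /* stand-in for the real linear framebuffer */

/* Scroll the back buffer up by one text line and clear the freed line. */
void scroll_one_line(uint32_t bg)
{
    memmove(back, back + LINE_H * W,
            (size_t)(H - LINE_H) * W * sizeof back[0]);
    for (size_t i = (size_t)(H - LINE_H) * W; i < (size_t)W * H; i++)
        back[i] = bg;
}

/* Copy the finished frame to the visible framebuffer in one pass. */
void present(uint32_t *front)
{
    memcpy(front, back, sizeof back);
}
```

All drawing and scrolling touches only `back`; `present` is the only place that writes to video memory.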
Re: Scrolling terminal in software-emulated text mode
After seeing the need for a framebuffer buffer, and recognizing that with my current codebase, I would need to inflate my kernel size to >100MB to create a framebuffer buffer, I now recognize that the need for memory management is dire.
Re: Scrolling terminal in software-emulated text mode
Can anyone refer me to some resources on implementing a memory management system?
Re: Scrolling terminal in software-emulated text mode
Looks like you're starting to realize what I was saying here.
Some people can only learn from their own mistakes, there's nothing wrong with that. Good luck with the optimizations! As I have said, try not to recalculate the offset for each and every pixel. That will help a lot.
(In general don't use any function calls within the loop, because that clears the instruction cache and slows down the execution considerably. It's best to use as few variables as possible, so that the compiler can optimize for register-only code. Also avoid multiplication; use addition and shifting instead.)
Cheers,
bzt
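To illustrate the advice about not recalculating the offset for every pixel, here is a small sketch with two versions of a rectangle fill. The function names and the pitch-in-pixels parameter are assumptions for this example, not code from the thread. An optimizing compiler may already turn the first form into the second, so the gain is most visible in unoptimized builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive: recomputes y * pitch + x for each and every pixel. */
void fill_rect_naive(uint32_t *fb, size_t pitch_px,
                     int x0, int y0, int w, int h, uint32_t color)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            fb[(size_t)(y0 + y) * pitch_px + (size_t)(x0 + x)] = color;
}

/* Stepped: compute the upper left corner once, then add one scanline per row. */
void fill_rect_stepped(uint32_t *fb, size_t pitch_px,
                       int x0, int y0, int w, int h, uint32_t color)
{
    uint32_t *row = fb + (size_t)y0 * pitch_px + (size_t)x0;

    for (int y = 0; y < h; y++, row += pitch_px)
        for (int x = 0; x < w; x++)
            row[x] = color;
}
```

Both functions write exactly the same pixels; only the address arithmetic inside the loop differs.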
Re: Scrolling terminal in software-emulated text mode
rizxt wrote: After seeing the need for a framebuffer buffer, and recognizing that with my current codebase, I would need to inflate my kernel size to >100MB to create a framebuffer buffer, I now recognize that the need for memory management is dire.
That makes no sense. What do you have in mind? Your codebase's size has nothing to do with the video card's linear framebuffer.
Cheers,
bzt
Re: Scrolling terminal in software-emulated text mode
bzt wrote: That makes no sense. What do you have in mind? Your codebase's size has nothing to do with the video card's linear framebuffer.
I meant that the only way I can properly reserve memory is by inflating my kernel through the use of resb. Otherwise I have to use arbitrary memory addresses that might overwrite something important or get overwritten by something else.
Re: Scrolling terminal in software-emulated text mode
rizxt wrote: I meant that the only way I can properly reserve memory is by inflating my kernel through the use of resb. Otherwise I have to use arbitrary memory addresses that might overwrite something important or get overwritten by something else.
That still makes no sense. Using resb will not inflate your kernel's size. Plus, since the framebuffer is in MMIO, usually above the last available RAM address, you can't overwrite it by accident. Not to mention that the framebuffer's address changes; it's different on each machine.
You'll need to query the memory map to see which regions of the RAM are free to use. You must not overwrite anything that's not listed as free. The framebuffer might be included in the list as used memory, but not necessarily, and not on all machines.
Cheers,
bzt
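As a rough illustration of querying the memory map, the sketch below walks a list of regions and returns the first one marked free that is large enough. The `mem_region` struct, the type constants, and the function name are hypothetical: the real layout depends on how the kernel is booted (Multiboot, UEFI, or BIOS E820 each use their own format). The kernel's own image usually sits inside a region reported as free, so a real implementation would also have to exclude it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types; real values depend on the boot protocol. */
enum { REGION_FREE = 1, REGION_RESERVED = 2 };

/* Hypothetical, simplified memory map entry. */
struct mem_region {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/*
 * Find the first free region that can hold `size` bytes at the given
 * alignment (a power of two). Returns 1 and stores the address in *out
 * on success, or 0 if nothing fits.
 */
int find_free_region(const struct mem_region *map, size_t count,
                     uint64_t size, uint64_t align, uint64_t *out)
{
    for (size_t i = 0; i < count; i++) {
        if (map[i].type != REGION_FREE)
            continue;                   /* never touch anything not listed as free */

        uint64_t start = (map[i].base + align - 1) & ~(align - 1);
        uint64_t end = map[i].base + map[i].length;

        if (start < end && end - start >= size) {
            *out = start;
            return 1;
        }
    }
    return 0;
}
```

A back buffer for the screen would then be placed at the returned address instead of at an arbitrary one.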
Re: Scrolling terminal in software-emulated text mode
It sounds like it's time to dive into memory management. By allocating memory dynamically from your kernel heap, you can ensure your kernel only allocates memory it's actually going to use.
But if resb is making your kernel binary larger, perhaps you're not putting it in the .bss section like you're supposed to. (It still reserves memory no matter where you put it!)
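A small sketch of both points in C, assuming an ELF kernel image: an uninitialized static array is placed in .bss, so it reserves memory at run time without adding bytes to the file, and a very simple bump allocator can hand out pieces of it. The arena size and the `kmalloc_bump` name are made up for this example; a real kernel heap would also need a way to free memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Uninitialized static storage: lands in .bss, much like resb in a .bss section. */
static uint8_t heap_arena[64 * 1024];
static size_t heap_used;

/*
 * Bump allocator: returns `size` bytes aligned to `align` (a power of two),
 * or NULL when the arena is exhausted. Nothing is ever freed.
 */
void *kmalloc_bump(size_t size, size_t align)
{
    uintptr_t base = (uintptr_t)heap_arena;
    uintptr_t aligned = (base + heap_used + align - 1) & ~((uintptr_t)align - 1);
    size_t start = (size_t)(aligned - base);

    if (start > sizeof heap_arena || size > sizeof heap_arena - start)
        return NULL;

    heap_used = start + size;
    return heap_arena + start;
}
```

The allocator only hands out memory the kernel actually asks for, and the arena itself costs nothing in the binary as long as it stays in .bss.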
Re: Scrolling terminal in software-emulated text mode
rizxt wrote: I meant that the only way I can properly reserve memory is by inflating my kernel through the use of resb. Otherwise I have to use arbitrary memory addresses that might overwrite something important or get overwritten by something else.
Just write a memory allocator! The memory allocator is the first part of the kernel you should write, and it needs to be well tested and robust enough to build everything else on top of!
CuriOS: A single address space GUI based operating system built upon a fairly pure Microkernel/Nanokernel. Download latest bootable x86 Disk Image: https://github.com/h5n1xp/CuriOS/blob/main/disk.img.zip
Discord:https://discord.gg/zn2vV2Su
Re: Scrolling terminal in software-emulated text mode
bzt wrote: In general don't use any function calls within the loop, because that clears the instruction cache and slows down the execution considerably.
This makes no sense. Is the CPU going to throw away 128KiB of cached instructions on every function call? I think you're thinking of the instruction pipeline, but even there, CPUs became very good at branch prediction years ago. Note the "prediction" part - this is for conditional branches; unconditional calls must surely have been optimized long before.
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
Re: Scrolling terminal in software-emulated text mode
eekee wrote: This makes no sense. Is the CPU going to throw away 128KiB of cached instructions on every function call?
There are multiple levels of cache in a CPU, and the cache size differs for every level in every CPU family. You can't state that every CPU has a 128K instruction cache; that's not true. At which level? For which CPU model?
https://en.wikipedia.org/wiki/CPU_cache ... _processor
Plus, using a function call would cripple the compiler's optimizer by not allowing it to use all the registers it otherwise could (not all registers are preserved across a call, so it has to assume they will change, and arguments must be passed in dedicated registers as per the ABI), not to mention the additional stack usage and unnecessary stack frame creation and deletion. If you doubt this, just use "objdump" to disassemble the generated code!
https://en.wikipedia.org/wiki/Register_allocation
eekee wrote: I think you're thinking of the instruction pipeline, but even there, CPUs became very good at branch prediction years ago.
The "CALL" instruction does not use branch prediction the way conditional near jumps like "JE" and "JNE" do. The distance of the call also matters; see Intel Software Developer Manual Vol 2A page 3-126, section "Operation", with the microcode. Also read about cache handling with and without LFENCE for near, normal and far calls (hint: there's a difference). However, not using "CALL" in the first place will reduce the overhead of these to zero for sure.
eekee wrote: Note the "prediction" part - this is for conditional branches; unconditional calls must surely have been optimized long before.
If that were true, then the compiler wouldn't inline certain functions or unroll loops for speed. But it does (long before execution: both are done at compile time).
Good read on the topic: http://www.ece.uah.edu/~milenka/docs/mi ... WDDD02.pdf (explains why jump distance matters, and some other things as well)
And of course read: https://software.intel.com/content/www/ ... anual.html (the official optimization manual)
Finally, if this isn't enough, or you don't want to read through all the docs, then just measure it! Do a simple loop with a function call and one with an inlined function, then measure how long they run! The same goes for the address calculation: write a loop where you calculate the address in each and every iteration, and another one where you calculate the upper left corner once before the loop and only add the scanline length to it inside the loop. Multiplication is a much more expensive instruction than a single add, even on modern computers. For loops that run many, many times and often, every CPU cycle counts.
Cheers,
bzt
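One possible way to set up that measurement in a hosted test program (not inside the kernel) is sketched below. The function names and the toy arithmetic are invented for this example. The `noinline` attribute is GCC/Clang specific and is there so the compiler does not quietly inline the call and make both loops identical. Results will vary with compiler, flags, and CPU, so no numbers are claimed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))   /* keep the call a real call */
#else
#define NOINLINE
#endif

static NOINLINE uint32_t mix_call(uint32_t acc, uint32_t v)
{
    return acc * 31u + v;
}

/* Loop body goes through a real function call on every iteration. */
uint32_t loop_with_call(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = mix_call(acc, data[i]);
    return acc;
}

/* Same computation written directly in the loop body. */
uint32_t loop_inlined(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + data[i];
    return acc;
}

/* Run fn `reps` times; return elapsed CPU seconds and the last result. */
double time_loop(uint32_t (*fn)(const uint32_t *, size_t),
                 const uint32_t *data, size_t n, int reps, uint32_t *result)
{
    volatile uint32_t sink = 0;      /* stops the calls being optimized away */
    clock_t t0 = clock();

    for (int r = 0; r < reps; r++)
        sink = fn(data, n);

    clock_t t1 = clock();
    *result = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Timing both loops over a large array with the same `reps`, at the optimization level the kernel is actually built with, gives the comparison; the returned results should match, which confirms both loops did the same work.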
Re: Scrolling terminal in software-emulated text mode
@bzt: I put myself in a difficult position by challenging you here because I don't exactly want to argue with you right now, but... I don't know, the things you say are just too weird. Some of your points are good -- I'm almost sure you're right about multiplication -- others I could challenge further. If reducing the number of variables results in more complex expressions then there's no improvement, because intermediate results in the expressions need to be kept somewhere. I'm sure this somewhere is not any more or less registerizable than local variables.
As for cache, you have to admit that it's amusing to see you ask me "which level?" when you made the claim, "that clears the instruction cache" in the first place! (Emphasis mine.) I admit I made a wild assumption: that the level 1 instruction cache will typically be 128KiB in size. High-end processors had this amount when I was last paying attention many years ago (if I remember right, as always), so I assume most or all processors now have 128 KiB or more. The Wikipedia page you linked tells me it's divided into blocks, which makes your claim of one function call dumping the entire cache even more... entertaining! (I knew it was divided into cache lines anyway, as I'm sure do you.) Is this a language issue? Did you not mean to imply it flushes the entire thing? (Let's call "the instruction cache" the level 1 instruction cache for the sake of argument.)
Anyway, I should test as you told me to, although I think I'll have to be careful to avoid the results telling me more about the compiler than the CPU -- it may very well inline function calls in the loop. The bit about call distance making a difference is certainly good info. Unrolling loops I'm not so sure about again. It was clearly a bad idea when I was using source-based Linux distributions because unrolled loops make bad use of caching, but I was using cheap CPUs with limited cache. I realized this long before the Gentoo "ricers" of that era.
Here's something I tested back then: I got good results from optimizing all the userspace for size. I think I optimized my kernel for speed (which would include framebuffer scrolling) but I'm almost sure I never used -O3. I used -O2 which didn't unroll loops. (This was all with Gcc 3, maybe 4.0.)
As for scrolling in userspace, I set my xterms to "jumpscroll" which gets a very fast overall rate regardless of the rate of scrolling individual lines. I'm not sure what the algorithm is, but it works. I might look it up if I find myself struggling with scroll speed, but my terminal/console plans are odd anyway.