Optimizing scrolling in VBE
I implemented scrolling down, but it was very slow. I tried double buffering, but nothing significant changed (it even got worse). I have now implemented the copy with SSE, which gives reasonable performance in other emulators but improves performance only a little in Bochs. What should I do now?
1. Is there a better thing than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffer (I think not)?
4. Is there any other method?
5. Or will it remain slow on Bochs?
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.
Re: Optimizing scrolling in VBE
I'd try a circular buffer. To draw a character, write it on the buffer and on the screen. When you want to shift everything up, rotate the buffer (not physically of course) and copy it to the screen. I haven't tested this, but it's all I can think of.
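A minimal sketch of that circular-buffer idea, in C and untested on real hardware like the suggestion itself. The names (`ring_console`, `ring_row`, `ring_scroll`, `ring_repaint`) and the tiny `ROWS`/`COLS` sizes are made up for illustration; `ring_repaint` writes into a plain array that stands in for drawing to the screen.

```c
#include <assert.h>
#include <string.h>

#define ROWS 4   /* hypothetical console height in text rows */
#define COLS 8   /* hypothetical console width in characters */

/* Ring of text rows kept in system memory; `top` is the slot shown first. */
struct ring_console {
    char     cells[ROWS][COLS];
    unsigned top;
};

/* Map a visible row (0 = top of the screen) to its slot in the ring. */
static char *ring_row(struct ring_console *c, unsigned visible_row)
{
    assert(visible_row < ROWS);
    return c->cells[(c->top + visible_row) % ROWS];
}

/* Scroll up by one row: advance `top` and blank the slot that becomes
 * the last visible row. No row data is moved. */
static void ring_scroll(struct ring_console *c)
{
    c->top = (c->top + 1) % ROWS;
    memset(ring_row(c, ROWS - 1), ' ', COLS);
}

/* Copy the rows out in visible order (stand-in for redrawing the screen). */
static void ring_repaint(struct ring_console *c, char out[ROWS][COLS])
{
    for (unsigned r = 0; r < ROWS; r++)
        memcpy(out[r], ring_row(c, r), COLS);
}
```

Scrolling then costs one index update plus clearing one row in system memory; the only large transfer left is the repaint, which writes to the screen and never reads from it.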
If you want to make Bochs faster, you could compile it with the A20 line permanently enabled, so that it doesn't have to check whether the line has been enabled each time you access memory above 0xFFFFF.
-bace
"for example, turning off the system’s power through the movement of a large red switch" - the Advanced Configuration and Power Interface Specification
- BrightLight
- Member
- Posts: 901
- Joined: Sat Dec 27, 2014 9:11 am
- Location: Maadi, Cairo, Egypt
Re: Optimizing scrolling in VBE
Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal.
You know your OS is advanced when you stop using the Intel programming guide as a reference.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Optimizing scrolling in VBE
Two key approaches to the problem:
1) Reduce the amount of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bitdepth. A text console may not need more than 8bpp
2) Improve the speed of individual updates
- Use a LFB mode instead of windowed video mode.
- 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
- Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
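As an illustration of the "delay committing every scroll" point, here is a rough C sketch. It assumes a back buffer in system RAM and a write-only copy to the framebuffer; the `console` struct and the `console_newline`/`console_flush` names are hypothetical, and `front` is an ordinary array standing in for the LFB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct console {
    uint8_t *back;         /* back buffer in system RAM */
    uint8_t *front;        /* stand-in for the linear framebuffer */
    size_t   pitch;        /* bytes per scanline */
    size_t   height;       /* visible scanlines */
    size_t   row_height;   /* scanlines per text row */
    size_t   pending_rows; /* scrolls done but not yet shown */
};

/* Scroll the back buffer by one text row. Nothing is sent to video
 * memory here; the scroll is only counted as pending. */
static void console_newline(struct console *c)
{
    size_t row_bytes = c->pitch * c->row_height;
    size_t total     = c->pitch * c->height;

    assert(row_bytes <= total);
    memmove(c->back, c->back + row_bytes, total - row_bytes);
    memset(c->back + total - row_bytes, 0, row_bytes);
    c->pending_rows++;
}

/* Commit once per burst of output (or from a timer): a single copy to
 * the framebuffer no matter how many newlines were queued.
 * Returns the number of scrolls this commit covered. */
static size_t console_flush(struct console *c)
{
    size_t covered = c->pending_rows;

    if (covered != 0) {
        memcpy(c->front, c->back, c->pitch * c->height);
        c->pending_rows = 0;
    }
    return covered;
}
```

With this shape, printing twenty lines in a row costs twenty cheap system-memory moves and one framebuffer copy instead of twenty.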
Re: Optimizing scrolling in VBE
muazzam wrote: I implemented scrolling down, but it was very slow. I tried double buffering, but nothing significant changed (it even got worse). I have now implemented the copy with SSE, which gives reasonable performance in other emulators but improves performance only a little in Bochs. What should I do now?
1. Is there a better thing than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffer (I think not)?
4. Is there any other method?
5. Or will it remain slow on Bochs?
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.
Is your 'scroll' a simple memmove()-style operation where you read from video RAM and write to another place in video RAM? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM, and to write only if a pixel changes. You might need to triple buffer to improve performance.
If a trainstation is where trains stop, what is a workstation ?
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Optimizing scrolling in VBE
Triple buffering is the act of using a third buffer to continue work on a subsequent frame while you're waiting for vsync, instead of idling until it happens. It has zero advantages when you're not vsyncing, and its exposure through VBE is optional, so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page flipping schemes give you optimising headaches when the differences between frames are small and full-screen copies are not recommended.
- AndrewAPrice
- Member
- Posts: 2303
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
Re: Optimizing scrolling in VBE
My message to the window manager to invalidate the screen comes with min/max x/y coords, and I only copy across that region of the screen. Without this simple optimization, I can only get one frame every few seconds on ultra high resolutions.
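The region copy described above could look roughly like this in C. This is only a sketch, not Perception's actual code: the `rect` type and `copy_dirty` name are hypothetical, and the rectangle is assumed to be half-open (`x1` and `y1` are one past the last dirty pixel).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Dirty region in pixels; x1/y1 are exclusive. */
struct rect {
    size_t x0, y0, x1, y1;
};

/* Copy only the dirty region from the back buffer to the framebuffer.
 * Both buffers are assumed to share the same pitch and pixel size.
 * Returns the number of bytes written. */
static size_t copy_dirty(uint8_t *dst, const uint8_t *src, size_t pitch,
                         size_t bytes_per_pixel, struct rect r)
{
    size_t off = r.x0 * bytes_per_pixel;
    size_t len = (r.x1 - r.x0) * bytes_per_pixel;
    size_t written = 0;

    assert(r.x0 <= r.x1 && r.y0 <= r.y1);
    assert(off + len <= pitch);

    for (size_t y = r.y0; y < r.y1; y++) {
        memcpy(dst + y * pitch + off, src + y * pitch + off, len);
        written += len;
    }
    return written;
}
```

For a one-character change on a 1024x768 screen this touches a few hundred bytes rather than the whole frame.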
My OS is Perception.
Re: Optimizing scrolling in VBE
Combuster wrote: Triple buffering is the act of using a third buffer to continue work on a subsequent frame while you're waiting for vsync, instead of idling until it happens. It has zero advantages when you're not vsyncing, and its exposure through VBE is optional, so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page flipping schemes give you optimising headaches when the differences between frames are small and full-screen copies are not recommended.
Perhaps I've given the incorrect name to what I mean. Basically I'm saying that you need to keep a copy of video RAM so that you don't need to read from video RAM. Unnecessary writes can also be prevented by checking the buffered copy to see whether pixels have actually changed. Sorry for misusing the term.
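Something along these lines, as a hedged C sketch: `shadow` is the system-RAM copy of what is currently on screen, `next` is the frame to show, and video memory is only ever written. It assumes 32-bit pixels for simplicity, and the `flush_changed` name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Push `next` to video memory, writing only the pixels that differ from
 * the shadow copy. Video RAM is never read: all comparisons are done
 * against `shadow` in system memory. Returns the number of pixels written. */
static size_t flush_changed(volatile uint32_t *vram, uint32_t *shadow,
                            const uint32_t *next, size_t count)
{
    size_t written = 0;

    assert(vram != NULL && shadow != NULL && next != NULL);

    for (size_t i = 0; i < count; i++) {
        if (shadow[i] != next[i]) {
            shadow[i] = next[i];   /* keep the copy in sync */
            vram[i]   = next[i];   /* write-only access to video RAM */
            written++;
        }
    }
    return written;
}
```

Whether the per-pixel compare pays off depends on how much of the frame changes; for a full-screen scroll almost every pixel differs, so the main gain there is avoiding the reads.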
If a trainstation is where trains stop, what is a workstation ?
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Optimizing scrolling in VBE
MessiahAndrw wrote: I can only get one frame every few seconds on ultra high resolutions.
Hmm... I have a realtime software raytracer gimmick that can do over 720p fluently, so pumping entire screens of pixels to video memory at that rate should be possible with more than just a few frames per second total.
Maybe I should try and run it inside my OS and see how well it performs there for the fun of it...
Re: Optimizing scrolling in VBE
Thanks for all replies!
Combuster wrote: Two key approaches to the problem:
1) Reduce the amount of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bitdepth. A text console may not need more than 8bpp
Is changing the display start (VBE function 7) hardware scrolling? If yes, then I think it uses CX/DX instead of ECX/EDX to set the display start address, so how is more than 1 MiB of memory accessed?
I cannot reduce the bit depth, as I also have to develop games and a GUI.
Combuster wrote: 2) Improve the speed of individual updates
- Use a LFB mode instead of windowed video mode.
I am already using the LFB.
Combuster wrote: - 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
I am not operating on individual 24-bit pixels but on 128 bits at a time with SSE. Also, I think Bochs does not support 32-bit modes.
Combuster wrote: - Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
Unfortunately I cannot avoid reading from video memory, and I don't have hardware blitters available (in VBE).
Re: Optimizing scrolling in VBE
gerryg400 wrote: Is your 'scroll' a simple memmove()-style operation where you read from video RAM and write to another place in video RAM? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM, and to write only if a pixel changes. You might need to triple buffer to improve performance.
Yes, it is a simple memmove()-style operation. Maybe I should post the source code.
Note: I am not asking anyone to fix this for me. I am just posting it to avoid ambiguity.
Code:
;________________________________________________
;Scroll the screen down
;IN/OUT: nothing
;
scrollDown:
    mov esi, dword[LFBAddress]
    mov eax, dword[bytesPerScanline]
    mov ebx, fonts.height              ;Scroll one line up
    mul ebx
    mov edi, esi
    sub edi, eax
    mov eax, dword[bytesPerScanline]   ;Total bytes on screen
    movzx ebx, word[resolution.y]
    mul ebx
    mov ecx, eax
    shr ecx, 7                         ;Divide by 128
    mov ax, 0x18
    mov es, ax
    mov ds, ax
.copy:
    prefetch [esi+128]
    prefetch [esi+160]
    prefetch [esi+192]
    prefetch [esi+224]
    movdqa xmm0, [esi+0]
    movdqa xmm1, [esi+16]
    movdqa xmm2, [esi+32]
    movdqa xmm3, [esi+48]
    movdqa xmm4, [esi+64]
    movdqa xmm5, [esi+80]
    movdqa xmm6, [esi+96]
    movdqa xmm7, [esi+112]
    movdqa [edi+0], xmm0
    movdqa [edi+16], xmm1
    movdqa [edi+32], xmm2
    movdqa [edi+48], xmm3
    movdqa [edi+64], xmm4
    movdqa [edi+80], xmm5
    movdqa [edi+96], xmm6
    movdqa [edi+112], xmm7
    add edi, 128
    add esi, 128
    loop .copy
    mov ax, 0x10
    mov ds, ax
    movzx eax, word[maxRow]
    call clearLine
.end:
    pop es
    pop ds
Re: Optimizing scrolling in VBE
muazzam wrote: Unfortunately I cannot avoid reading from video memory
It is really worth implementing an off-screen buffer (unless you can work with hardware acceleration). What is your reason not to do so?
Reading from display RAM is tens of times, if not hundreds of times or more, slower than reading from system memory.
Re: Optimizing scrolling in VBE
bluemoon wrote:
muazzam wrote: Unfortunately I cannot avoid reading from video memory
It is really worth implementing an off-screen buffer (unless you can work with hardware acceleration). What is your reason not to do so?
Reading from display RAM is tens of times, if not hundreds of times or more, slower than reading from system memory.
My OS supports double buffering, but not all the time. There are three functions: useVideoBuffer1 (real video memory), useVideoBuffer2 (back buffer) and refreshScreen. Should I enable double buffering all the time and let the timer call refreshScreen (copy the back buffer to real video memory)?
Re: Optimizing scrolling in VBE
omarrx024 wrote: Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal.
But my first priority is that it should work well even on slow computers.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Optimizing scrolling in VBE
muazzam wrote: I think it uses CX/DX instead of ECX/EDX to set the display start address
Don't think, know. In this case, RTFM. The manual is very explicit about it.
muazzam wrote: so how is more than 1 MiB of memory accessed?
Again, RTFM on what CX and DX actually mean.
muazzam wrote: I am not operating on individual 24-bit pixels but on 128 bits at a time with SSE.
You are still plotting individual character pixels in 24-bit mode before you get there.
muazzam wrote: Also, I think Bochs does not support 32-bit modes.
Again, don't think, know. In particular, this is a lie.
muazzam wrote: Unfortunately I cannot avoid reading from video memory
This is also a lie. How to avoid it has already been explained; gerryg400 did so before you made this claim, so you could have known you were lying.
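For reference, the VBE specification defines the inputs of function 07h (with BL = 00h) as the first displayed pixel in the scan line (CX) and the first displayed scan line (DX), not a byte address. The C sketch below only fills in the register values and shows the byte offset they correspond to; the `vbe_regs` struct and both function names are illustrative, and actually issuing the call (real mode, V86 or the protected-mode interface) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Register values for a VBE call (illustrative container, not a real API). */
struct vbe_regs {
    uint16_t ax, bx, cx, dx;
};

/* Build the inputs for VBE function 07h, subfunction 00h
 * (Set Display Start). */
static struct vbe_regs vbe_display_start(uint16_t first_pixel,
                                         uint16_t first_scanline)
{
    struct vbe_regs r;

    r.ax = 0x4F07;          /* VBE function 07h */
    r.bx = 0x0000;          /* BH = 00h reserved, BL = 00h set display start */
    r.cx = first_pixel;     /* first displayed pixel in the scan line */
    r.dx = first_scanline;  /* first displayed scan line */
    return r;
}

/* Byte offset into video memory that the CX/DX pair refers to, given the
 * mode's bytes per scan line and bytes per pixel. */
static uint32_t vbe_start_byte_offset(struct vbe_regs r, uint32_t pitch,
                                      uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel != 0 && pitch >= bytes_per_pixel);
    return (uint32_t)r.dx * pitch + (uint32_t)r.cx * bytes_per_pixel;
}
```

Because DX counts scan lines, a start at scan line 768 in a 1024x768x24 mode (assuming a pitch of 3072 bytes) already corresponds to an offset above 2 MiB, even though both registers are 16 bits wide. Whether the card has enough video memory for such a virtual screen still has to be checked from the mode information.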