Optimizing scrolling in VBE

Muazzam
Member
Posts: 543
Joined: Mon Jun 16, 2014 5:59 am
Location: Shahpur, Layyah, Pakistan

Optimizing scrolling in VBE

Post by Muazzam »

I implemented scrolling down, but it was very slow. I tried double buffering, but nothing significant happened (it even got worse). Now I have implemented it with SSE, and it gives reasonable performance in every emulator except Bochs, where it improves performance only a little. What should I do now?
1. Is there something better than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffering (I think not)?
4. Is there any other method?
5. Or will it simply remain slow on Bochs?
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.
bace
Member
Posts: 34
Joined: Fri Jan 16, 2015 10:41 am
Location: United Kingdom

Re: Optimizing scrolling in VBE

Post by bace »

I'd try a circular buffer. To draw a character, write it on the buffer and on the screen. When you want to shift everything up, rotate the buffer (not physically of course) and copy it to the screen. I haven't tested this, but it's all I can think of.
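A minimal C sketch of the circular-buffer idea, assuming a hypothetical text console kept in system memory (the `ring_console` type, its tiny `ROWS`/`COLS` sizes and the helper names are invented for illustration). Scrolling only advances the index of the top row; the rows are then copied to the screen in logical order, which is the "rotate, then copy" step described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical console geometry, kept tiny for illustration. */
#define ROWS 4
#define COLS 8

/* Text rows stored as a ring; `head` is the physical index of the row
   currently shown at the top of the screen. */
struct ring_console {
    char cells[ROWS][COLS];
    size_t head;
};

/* Scroll by one row: blank the row that leaves the top (it becomes the new
   bottom row) and advance `head`. No row data is moved. */
static void ring_scroll(struct ring_console *c)
{
    memset(c->cells[c->head], ' ', COLS);
    c->head = (c->head + 1) % ROWS;
}

/* Logical row r (0 = top of the screen) lives at physical row (head + r) % ROWS. */
static char *ring_row(struct ring_console *c, size_t r)
{
    return c->cells[(c->head + r) % ROWS];
}

/* Copy the rows to the screen in logical order. `screen` stands in for the
   framebuffer; a real console would render glyphs at this point. */
static void ring_present(struct ring_console *c, char screen[ROWS][COLS])
{
    for (size_t r = 0; r < ROWS; r++)
        memcpy(screen[r], ring_row(c, r), COLS);
}
```

The scroll itself becomes O(1) in system memory; the screen still receives a full copy, so this pairs well with only ever writing (never reading) video memory.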
If you want to make Bochs faster, you could compile it with the A20 line permanently enabled, which means it doesn't have to apply the A20 mask (address bit 20, the 1 MiB boundary) on every memory access.

-bace
"for example, turning off the system’s power through the movement of a large red switch" - the Advanced Configuration and Power Interface Specification
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: Optimizing scrolling in VBE

Post by BrightLight »

Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal. ;)
You know your OS is advanced when you stop using the Intel programming guide as a reference.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Optimizing scrolling in VBE

Post by Combuster »

Two key approaches to the problem:

1) Reduce the number of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bit depth. A text console may not need more than 8bpp

2) Improve the speed of individual updates:
- Use an LFB mode instead of a windowed video mode.
- 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
- Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
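The "delay committing every scroll" point can be sketched in C as follows, assuming a hypothetical console whose text lives in a back buffer in system RAM (the `console` type, the sizes and the idea of calling `console_commit` at the end of a write or from a timer are all assumptions for illustration). Each newline only records that a scroll is owed; the commit applies all owed scrolls with one `memmove` and then writes video memory once, without ever reading it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: TEXT_ROWS text lines of LINE_BYTES bytes each
   (in a real framebuffer LINE_BYTES = bytes per scan line * font height). */
#define TEXT_ROWS 5
#define LINE_BYTES 16
#define SCREEN_BYTES (TEXT_ROWS * LINE_BYTES)

struct console {
    unsigned char back[SCREEN_BYTES]; /* back buffer in system RAM */
    unsigned pending;                 /* scrolls recorded but not yet applied */
    unsigned commits;                 /* framebuffer updates so far */
};

/* A newline only records that a scroll is owed (at most a full screen). */
static void console_newline(struct console *c)
{
    if (c->pending < TEXT_ROWS)
        c->pending++;
}

/* Apply all owed scrolls at once, then write the whole screen to `fb` once.
   `fb` (video memory) is only written, never read. */
static void console_commit(struct console *c, unsigned char *fb)
{
    if (c->pending == 0)
        return;
    size_t shift = (size_t)c->pending * LINE_BYTES;
    memmove(c->back, c->back + shift, SCREEN_BYTES - shift);
    memset(c->back + SCREEN_BYTES - shift, 0, shift); /* blank new lines */
    memcpy(fb, c->back, SCREEN_BYTES);
    c->pending = 0;
    c->commits++;
}
```

Three newlines in a row therefore cost one framebuffer write instead of three.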
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Optimizing scrolling in VBE

Post by gerryg400 »

muazzam wrote:I had implemented scroll down it was very slow I tried double buffering but nothing significant happened (even get worse). Now I have implemented SSE, It gives reasonable performance in other emulators except Bochs. It improves the performance a little bit in Bochs. Now what should I do?
1. Is there a better thing than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffer (I think not)?
4. Any other method.
5. Or it will remain slow on bochs.
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.
Is your 'scroll' a simple memmove() style operation where you read from video RAM and write to another place in video RAM ? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM and only to write if a pixel changes. You might need to triple buffer to improve performance.
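To illustrate the alternative to a memmove() inside video RAM, here is a minimal C sketch (function name and parameters are hypothetical) that does the scroll entirely in a packed back buffer in system RAM and then writes the visible area to the framebuffer one scan line at a time. It assumes the framebuffer's pitch (VBE's BytesPerScanLine) may be larger than the packed row size, so padding bytes are left untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* back:      packed back buffer in system RAM, `row_bytes` per pixel row
   fb:        framebuffer (e.g. the LFB), `pitch` bytes per scan line,
              with pitch >= row_bytes
   height:    visible pixel rows
   line_px:   pixel rows per text line (font height), assumed <= height */
static void scroll_and_present(uint8_t *back, uint8_t *fb,
                               size_t row_bytes, size_t pitch,
                               size_t height, size_t line_px)
{
    size_t moved = (height - line_px) * row_bytes;

    /* All reading happens in ordinary RAM. */
    memmove(back, back + line_px * row_bytes, moved);
    memset(back + moved, 0, line_px * row_bytes); /* blank the new bottom line */

    /* One write-only pass over video memory. */
    for (size_t y = 0; y < height; y++)
        memcpy(fb + y * pitch, back + y * row_bytes, row_bytes);
}
```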
If a trainstation is where trains stop, what is a workstation ?
Combuster
Member

Re: Optimizing scrolling in VBE

Post by Combuster »

Triple buffering is the act of using a third buffer to continue work on a subsequent frame while you're waiting for vsync, instead of idling until it happens. It has zero advantages when you're not vsyncing, and its exposure through VBE is optional, so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page flipping schemes give you optimising headaches when the differences between frames are small and full-screen copies are not recommended.
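The buffer rotation described above can be modelled in C with three buffer indices. This is a hypothetical sketch (the struct and function names are invented, and on real hardware the flip itself would be something like setting the display start during vertical retrace): it only tracks which buffer is displayed, which holds a finished frame waiting for vsync, and which the renderer is drawing into, so the renderer never has to wait.

```c
/* Indices 0, 1, 2 of three full-screen buffers; `ready` is -1 when no
   finished frame is waiting. The three values are always distinct. */
struct triple {
    int front; /* currently scanned out */
    int ready; /* finished frame waiting for vsync, or -1 */
    int draw;  /* being drawn by the renderer */
};

/* Renderer finished a frame: queue it and immediately pick a free buffer
   for the next one. An older queued frame that never got shown is dropped
   and its buffer reused. */
static void frame_done(struct triple *t)
{
    int old_ready = t->ready;
    t->ready = t->draw;
    t->draw = (old_ready >= 0) ? old_ready : 3 - t->front - t->ready;
}

/* At vertical retrace: flip to the queued frame, if there is one. */
static void on_vsync(struct triple *t)
{
    if (t->ready >= 0) {
        t->front = t->ready;
        t->ready = -1;
    }
}
```

Without vsync there is nothing to wait for, which is why the third buffer buys nothing in that case.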
AndrewAPrice
Member
Posts: 2303
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Optimizing scrolling in VBE

Post by AndrewAPrice »

My message to the window manager to invalidate the screen comes with min/max x/y coords, and I only copy across that region of the screen. Without this simple optimization, I can only get one frame every few seconds on ultra high resolutions.
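A minimal C sketch of that optimization, assuming a hypothetical `dirty_rect` with inclusive min/max pixel bounds that already lie inside both buffers: only the invalidated rectangle is copied, one row span at a time, from the back buffer to the framebuffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical invalidation record: inclusive pixel bounds. */
struct dirty_rect {
    size_t min_x, min_y, max_x, max_y;
};

/* Copy only the invalidated rectangle. Both buffers use `bpp` bytes per
   pixel; `back_pitch` and `fb_pitch` are bytes per pixel row of each. */
static void present_rect(const uint8_t *back, size_t back_pitch,
                         uint8_t *fb, size_t fb_pitch,
                         size_t bpp, struct dirty_rect r)
{
    size_t span = (r.max_x - r.min_x + 1) * bpp;
    for (size_t y = r.min_y; y <= r.max_y; y++)
        memcpy(fb + y * fb_pitch + r.min_x * bpp,
               back + y * back_pitch + r.min_x * bpp,
               span);
}
```

For a single typed character this touches a few hundred bytes instead of the whole screen.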
My OS is Perception.
gerryg400
Member

Re: Optimizing scrolling in VBE

Post by gerryg400 »

Combuster wrote:Triple buffering is the act of using a third buffer to continue work on a subsequent frame when you're waiting for vsync instead of idling for that situation to happen. It has zero advantages when you're not vsyncing, and it's exposure through VBE is optional so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page flipping schemes give you optimising headaches when the differences between frames are small and full-screen copies are not recommended.
Perhaps I've given the incorrect name to what I mean. Basically I'm saying that you need to keep a copy of video RAM so that you don't need to read from video RAM. Unnecessary writes can also be prevented by checking the buffered copy to see whether pixels have actually changed. Sorry for misusing the term.
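The shadow-copy idea can be sketched in C as below (function and parameter names are hypothetical). `next` is the frame just drawn in system RAM, `shadow` is a RAM copy of what the framebuffer currently holds, and `fb` is video memory, which is only ever written. Comparing against `shadow` suppresses writes for pixels that did not change.

```c
#include <stddef.h>
#include <stdint.h>

/* Write only the 32-bit words that differ from what the screen already
   shows; returns how many words were written to `fb`. */
static size_t flush_changes(const uint32_t *next, uint32_t *shadow,
                            uint32_t *fb, size_t words)
{
    size_t written = 0;
    for (size_t i = 0; i < words; i++) {
        if (next[i] != shadow[i]) {
            shadow[i] = next[i]; /* keep the RAM copy in sync */
            fb[i] = next[i];     /* video memory: write only */
            written++;
        }
    }
    return written;
}
```

Note that a full-screen scroll changes almost every pixel, so this saves the most on small updates.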
Combuster
Member

Re: Optimizing scrolling in VBE

Post by Combuster »

MessiahAndrw wrote:I can only get one frame every few seconds on ultra high resolutions.
Hmm... I have a realtime software raytracer gimmick that can do over 720p fluently, so pumping entire screens of pixels to video memory at that rate should be possible with more than just a few frames per second total.
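For a sense of scale, here is the write bandwidth involved, assuming 720p at 32 bpp and an illustrative 30 frames per second (the frame rate is an assumption, not a figure from the post), together with the size of one full-screen scroll in the 1024x768x24 mode from the first post (assuming no scan-line padding):

```latex
\begin{aligned}
1280 \times 720 \times 4\,\text{B} &= 3\,686\,400\,\text{B per frame} \\
3\,686\,400\,\text{B} \times 30\,\text{s}^{-1} &= 110\,592\,000\,\text{B/s} \approx 105.5\,\text{MiB/s} \\
1024 \times 768 \times 3\,\text{B} &= 2\,359\,296\,\text{B} = 2.25\,\text{MiB per full-screen scroll}
\end{aligned}
```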

Maybe I should try and run it inside my OS and see how well it performs there for the fun of it...
Muazzam
Member

Re: Optimizing scrolling in VBE

Post by Muazzam »

Thanks for all replies!
Combuster wrote:Two key approaches to the problem:

1) Reduce the amount of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bitdepth. A text console may not need more than 8bpp
Is changing the display start (VBE function 7) hardware scrolling? If so, I think it uses CX/DX instead of ECX/EDX to set the display start address. So how is more than 1 MiB of memory accessed?
I cannot reduce the bit depth, as I also have to develop games and a GUI.
Combuster wrote: 2) Improve the speed of individual updates
- Use a LFB mode instead of windowed video mode.
I am already using LFB.
Combuster wrote: - 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
I am not operating on individual 24-bit pixels but on 128 bits at a time with SSE. Also, I think Bochs does not support 32-bit modes.
Combuster wrote: - Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
Unfortunately, I cannot avoid reading from video memory, and I don't have hardware blitters available (in VBE).
Muazzam
Member

Re: Optimizing scrolling in VBE

Post by Muazzam »

gerryg400 wrote:Is your 'scroll' a simple memmove() style operation where you read from video RAM and write to another place in video RAM ? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM and only to write if a pixel changes. You might need to triple buffer to improve performance.
Yes, it is a simple memmove()-style operation. Maybe I should post the source code.
Note: I am not asking anyone to fix this for me; I am just posting it to avoid ambiguity.


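;________________________________________________
;Scroll the screen down (move its contents up one text line)
;IN/OUT: nothing
;Note: this still reads from video memory, which is slow.
;
scrollDown:
	push ds			;Saved here to match the pops at .end
	push es

	mov eax, dword[bytesPerScanline]
	mov ebx, fonts.height	;Pixel rows per text line
	mul ebx			;EAX = bytes in one text line (clobbers EDX)
	mov ebx, eax		;Keep it in EBX

	mov edi, dword[LFBAddress]	;Destination: top of the screen
	lea esi, [edi+ebx]		;Source: start of the second text line

	mov eax, dword[bytesPerScanline]	;Total bytes on screen
	movzx ecx, word[resolution.y]
	mul ecx
	sub eax, ebx		;The top text line is discarded, not copied

	mov ecx, eax
	shr ecx, 7		;Number of 128-byte blocks (assumes an exact multiple)

	mov ax, 0x18
	mov es, ax
	mov ds, ax

.copy:
	;SSE prefetch (plain PREFETCH is the 3DNow! form).
	;The CPU ignores prefetches from uncacheable or WC memory.
	prefetchnta [esi+128]
	prefetchnta [esi+160]
	prefetchnta [esi+192]
	prefetchnta [esi+224]

	movdqa xmm0, [esi+0]	;MOVDQA needs 16-byte aligned ESI/EDI
	movdqa xmm1, [esi+16]
	movdqa xmm2, [esi+32]
	movdqa xmm3, [esi+48]
	movdqa xmm4, [esi+64]
	movdqa xmm5, [esi+80]
	movdqa xmm6, [esi+96]
	movdqa xmm7, [esi+112]

	movdqa [edi+0], xmm0
	movdqa [edi+16], xmm1
	movdqa [edi+32], xmm2
	movdqa [edi+48], xmm3
	movdqa [edi+64], xmm4
	movdqa [edi+80], xmm5
	movdqa [edi+96], xmm6
	movdqa [edi+112], xmm7

	add edi, 128
	add esi, 128

	loop .copy

	mov ax, 0x10
	mov ds, ax

	movzx eax, word[maxRow]
	call clearLine		;Blank the new bottom line
.end:
	pop es
	pop ds
	ret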
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Optimizing scrolling in VBE

Post by bluemoon »

muazzam wrote:Unfortunately I can not avoid reading from video memory
It really is worth implementing an off-screen buffer (unless you can use hardware acceleration). What is your reason not to do so?
Reading from display RAM is tens of times, if not hundreds of times or more, slower than reading from system memory.
Muazzam
Member

Re: Optimizing scrolling in VBE

Post by Muazzam »

bluemoon wrote:
muazzam wrote:Unfortunately I can not avoid reading from video memory
It really worth implementing an off-screen buffer (unless you can work with accelerations). What is your reason not to do so?
Reading from display RAM is tens times, if not hundreds or more, slower than reading from system memory.
My OS supports double buffering, but not all the time. There are three functions: useVideoBuffer1 (real video memory), useVideoBuffer2 (back buffer) and refreshScreen. Should I enable double buffering all the time and let the timer call refreshScreen (copy the back buffer to real video memory)?
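One way that timer-driven refresh could look, as a minimal C sketch: the `video` struct, `mark_dirty` and `timer_refresh` are hypothetical stand-ins (not the useVideoBuffer1/useVideoBuffer2/refreshScreen routines above). Drawing always goes to the back buffer and sets a dirty flag; the timer tick copies to the framebuffer only when something changed, so an idle screen costs nothing and several scrolls between ticks cost a single copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct video {
    unsigned char *back; /* back buffer in system RAM (all drawing goes here) */
    unsigned char *fb;   /* linear framebuffer */
    size_t bytes;
    volatile bool dirty; /* set by drawing code, cleared by the refresh */
    unsigned refreshes;
};

/* Called by drawing routines after they modify `back`. */
static void mark_dirty(struct video *v) { v->dirty = true; }

/* Called from the periodic timer. The flag is cleared before copying, so a
   draw that happens during the copy marks the screen dirty again and is
   picked up on the next tick. */
static void timer_refresh(struct video *v)
{
    if (!v->dirty)
        return;
    v->dirty = false;
    memcpy(v->fb, v->back, v->bytes);
    v->refreshes++;
}
```

A full-screen copy can be long for an interrupt handler; in a real kernel it may be better for the timer to only schedule the refresh.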
Muazzam
Member

Re: Optimizing scrolling in VBE

Post by Muazzam »

omarrx024 wrote:Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal. ;)
But my first priority is that it should work well even on slow computers.
Combuster
Member

Re: Optimizing scrolling in VBE

Post by Combuster »

muazzam wrote:I think, it uses cx/dx instead of ecx/edx for set display start address
Don't think, know. In this case, RTFM. The manual is very explicit about it.
So, how more than 1 MiB memory is accessed?
Again, RTFM on what CX and DX actually mean.
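For reference, the VBE 2.0/3.0 specification defines function 4F07h (BL = 00h, or 80h to set it during vertical retrace) with CX = first displayed pixel in the scan line and DX = first displayed scan line. Because DX counts scan lines rather than bytes, the resulting byte offset is DX × BytesPerScanLine + CX × bytes-per-pixel, which goes far past 1 MiB with 16-bit registers. The C sketch below only computes the register values and that offset; the struct and helper names are hypothetical, and actually issuing the call (real mode, v86 or the protected-mode interface) is not shown. Scrolling this way also requires a virtual screen taller than the visible one.

```c
#include <stdint.h>

/* Register values for VBE function 4F07h, Set Display Start. */
struct vbe_set_start {
    uint16_t ax, bx, cx, dx;
};

/* Hardware scroll: make scan line `first_line` the top of the display. */
static struct vbe_set_start vbe_display_start(uint16_t first_line)
{
    struct vbe_set_start r = {
        .ax = 0x4F07,
        .bx = 0x0000,    /* BH = 0, BL = 00h: set display start */
        .cx = 0,         /* first displayed pixel in scan line */
        .dx = first_line /* first displayed scan line */
    };
    return r;
}

/* Byte offset into video memory where the display then starts. */
static uint32_t display_start_offset(struct vbe_set_start r,
                                     uint32_t bytes_per_scanline,
                                     uint32_t bytes_per_pixel)
{
    return (uint32_t)r.dx * bytes_per_scanline + (uint32_t)r.cx * bytes_per_pixel;
}
```

With an assumed 3072-byte scan line (1024 pixels at 24 bpp), DX = 768 already starts the display 2,359,296 bytes into video memory.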
I am not operating on individual 24-bit pixel but 128 bits as a whole with SSE.
You are still plotting individual character pixels in 24 bit mode before you get there.
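To make the difference concrete, a minimal C sketch of plotting one pixel in each depth (function names are hypothetical). At 24 bpp each pixel is three separate byte stores at an address that is usually not 4-byte aligned; at 32 bpp it is a single aligned 32-bit store. The blue, green, red byte order assumes the common VBE direct-colour layout (check the mode's field positions), and the 32 bpp version assumes a 4-byte aligned framebuffer and pitch.

```c
#include <stddef.h>
#include <stdint.h>

/* 24 bpp: three byte stores per pixel. */
static void put_pixel24(uint8_t *fb, size_t pitch, size_t x, size_t y, uint32_t rgb)
{
    uint8_t *p = fb + y * pitch + x * 3;
    p[0] = (uint8_t)(rgb);       /* blue  */
    p[1] = (uint8_t)(rgb >> 8);  /* green */
    p[2] = (uint8_t)(rgb >> 16); /* red   */
}

/* 32 bpp: one aligned 32-bit store per pixel (top byte unused). */
static void put_pixel32(uint8_t *fb, size_t pitch, size_t x, size_t y, uint32_t rgb)
{
    uint32_t *row = (uint32_t *)(fb + y * pitch);
    row[x] = rgb;
}
```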
Also, I think, bochs does not supports 32-bit modes.
Again, don't think, know. In particular, this is a lie.
Unfortunately I can not avoid reading from video memory
This is also a lie. It had already been explained, by gerryg400 before you made this claim, so you could actually have known you were lying.