Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 12:22 pm
by Muazzam
I implemented scrolling down, but it was very slow. I tried double buffering, but nothing significant changed (it even got worse). Now I have implemented it with SSE, which gives reasonable performance in other emulators but improves things only a little in Bochs. What should I do now?
1. Is there anything better than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffering (I think not)?
4. Is there any other method?
5. Or will it remain slow on Bochs?
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 1:21 pm
by bace
I'd try a circular buffer. To draw a character, write it on the buffer and on the screen. When you want to shift everything up, rotate the buffer (not physically of course) and copy it to the screen. I haven't tested this, but it's all I can think of.
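A minimal sketch of that circular-buffer idea in C. Everything here is hypothetical: the tiny `ROWS`/`ROW_BYTES` geometry, the `ring_*` names, and `lfb` standing in for the mapped framebuffer. Scrolling only advances an index; the pixel data moves once, when the ring is copied to the screen.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: ROWS text rows, each ROW_BYTES bytes of pixel data. */
enum { ROWS = 4, ROW_BYTES = 8 };

static uint8_t ring[ROWS][ROW_BYTES]; /* back buffer in system RAM */
static unsigned top;                  /* ring slot shown as the top screen row */

/* Map a screen row to its slot in the ring. */
static uint8_t *ring_row(unsigned screen_row)
{
    return ring[(top + screen_row) % ROWS];
}

/* Scroll up one row: advance the start index and blank the new bottom row.
   No pixel data is moved inside the back buffer. */
static void ring_scroll_up(void)
{
    top = (top + 1) % ROWS;
    memset(ring_row(ROWS - 1), 0, ROW_BYTES);
}

/* Copy the ring to the framebuffer in screen order. Video memory is only
   written, never read. */
static void ring_present(uint8_t *lfb)
{
    for (unsigned r = 0; r < ROWS; r++)
        memcpy(lfb + (size_t)r * ROW_BYTES, ring_row(r), ROW_BYTES);
}
```

Drawing a character would write into `ring_row(row)` and, if desired, directly to the matching place on screen as well.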
If you want to make Bochs faster, you could compile it with the A20 line permanently enabled, which means it doesn't have to check whether the line has been enabled every time you access memory above 0xFFFFF.

-bace

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 1:45 pm
by BrightLight
Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal. ;)

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 2:32 pm
by Combuster
Two key approaches to the problem:

1) Reduce the amount of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bitdepth. A text console may not need more than 8bpp
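
The "delay committing every scroll" point could look roughly like the C sketch below. The names (`back`, `pending_scrolls`, `console_scroll`, `console_flush`) and the tiny geometry are made up for illustration, and `lfb` stands in for the mapped framebuffer. Each newline scrolls only the system-RAM copy; one copy to video memory then covers any number of scrolls.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: ROWS text rows of ROW_BYTES bytes each. */
enum { ROWS = 4, ROW_BYTES = 8 };

static uint8_t back[ROWS * ROW_BYTES]; /* back buffer in system RAM */
static unsigned pending_scrolls;       /* scrolls not yet shown on screen */
static unsigned commits;               /* full-screen copies made, for illustration */

/* Called for every newline at the bottom of the console: system RAM only. */
static void console_scroll(void)
{
    memmove(back, back + ROW_BYTES, (ROWS - 1) * ROW_BYTES);
    memset(back + (ROWS - 1) * ROW_BYTES, 0, ROW_BYTES);
    pending_scrolls++;
}

/* Called once per batch of output (or from a timer): a single copy to
   video memory covers every scroll since the last flush. */
static void console_flush(uint8_t *lfb)
{
    if (pending_scrolls == 0)
        return;
    memcpy(lfb, back, sizeof back);
    pending_scrolls = 0;
    commits++;
}
```

Printing three lines and then flushing costs one framebuffer copy instead of three.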

2) Improve the speed of individual updates
- Use a LFB mode instead of windowed video mode.
- 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
- Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
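
To illustrate the 24-bit versus 32-bit point, here is a C sketch of a pixel store in each depth. The function names are hypothetical, `pitch` is the bytes-per-scanline value, and the blue-green-red byte order is an assumption (the real layout comes from the mode information).

```c
#include <stddef.h>
#include <stdint.h>

/* 24bpp: three byte stores per pixel, and pixels straddle 4-byte boundaries.
   Assumes blue is stored in the lowest byte. */
static void put_pixel24(uint8_t *fb, uint32_t pitch,
                        uint32_t x, uint32_t y, uint32_t rgb)
{
    uint8_t *p = fb + (size_t)y * pitch + (size_t)x * 3;
    p[0] = (uint8_t)(rgb);       /* blue */
    p[1] = (uint8_t)(rgb >> 8);  /* green */
    p[2] = (uint8_t)(rgb >> 16); /* red */
}

/* 32bpp: one aligned 32-bit store per pixel. pitch is in bytes. */
static void put_pixel32(uint32_t *fb, uint32_t pitch,
                        uint32_t x, uint32_t y, uint32_t rgb)
{
    fb[(size_t)y * (pitch / 4) + x] = rgb;
}
```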

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 2:35 pm
by gerryg400
muazzam wrote:I implemented scrolling down, but it was very slow. I tried double buffering, but nothing significant changed (it even got worse). Now I have implemented it with SSE, which gives reasonable performance in other emulators but improves things only a little in Bochs. What should I do now?
1. Is there anything better than SSE?
2. Should I use function 7 of VBE?
3. Should I use triple buffering (I think not)?
4. Is there any other method?
5. Or will it remain slow on Bochs?
Any help or suggestions will be greatly appreciated.
EDIT: I am using VESA modes 1024x768x24 and 800x600x24 with single buffering.
Is your 'scroll' a simple memmove()-style operation where you read from video RAM and write to another place in video RAM? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM, and to write only when a pixel changes. You might need to triple buffer to improve performance.

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 2:53 pm
by Combuster
Triple buffering is the act of using a third buffer to continue work on a subsequent frame while you're waiting for vsync, instead of idling until it happens. It has zero advantages when you're not vsyncing, and its exposure through VBE is optional, so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page-flipping scheme gives you optimising headaches when the differences between frames are small and full-screen copies are not recommended.

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 3:09 pm
by AndrewAPrice
My message to the window manager to invalidate the screen comes with min/max x/y coords, and I only copy across that region of the screen. Without this simple optimization, I can only get one frame every few seconds on ultra high resolutions.
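A minimal C sketch of copying only the invalidated region. The `struct rect` layout and the `present_rect` name are made up for illustration, and both buffers are assumed to share the same pitch (bytes per scanline).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical invalidation rectangle in pixels; x1 and y1 are exclusive. */
struct rect { uint32_t x0, y0, x1, y1; };

/* Copy only the invalidated region from the back buffer to the framebuffer.
   Both buffers are assumed to use the same pitch. */
static void present_rect(uint8_t *lfb, const uint8_t *back, uint32_t pitch,
                         uint32_t bytes_per_pixel, struct rect r)
{
    size_t row_bytes = (size_t)(r.x1 - r.x0) * bytes_per_pixel;
    for (uint32_t y = r.y0; y < r.y1; y++) {
        size_t off = (size_t)y * pitch + (size_t)r.x0 * bytes_per_pixel;
        memcpy(lfb + off, back + off, row_bytes);
    }
}
```

For a console, the rectangle for one drawn character is a single cell, so most updates touch only a few hundred bytes of video memory.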

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 3:47 pm
by gerryg400
Combuster wrote:Triple buffering is the act of using a third buffer to continue work on a subsequent frame while you're waiting for vsync, instead of idling until it happens. It has zero advantages when you're not vsyncing, and its exposure through VBE is optional, so it might not be available in the first place without further hacks.
In addition, triple buffering and basically any form of page-flipping scheme gives you optimising headaches when the differences between frames are small and full-screen copies are not recommended.
Perhaps I've given the incorrect name to what I mean. Basically I'm saying that you need to keep a copy of video RAM so that you don't need to read from video RAM. Unnecessary writes can also be prevented by checking the buffered copy to see whether pixels have actually changed. Sorry for misusing the term.
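Something along these lines, as a rough C sketch. The `present_changed` name is hypothetical, `shadow` is the system-RAM copy of what is on screen, and `lfb` stands in for the mapped framebuffer, treated here as 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Write new_frame to the framebuffer, touching only the 32-bit words that
   differ from the shadow copy. All reads hit system RAM; video memory is
   only ever written. Returns the number of words written, for illustration. */
static size_t present_changed(volatile uint32_t *lfb, uint32_t *shadow,
                              const uint32_t *new_frame, size_t words)
{
    size_t written = 0;
    for (size_t i = 0; i < words; i++) {
        if (shadow[i] != new_frame[i]) {
            shadow[i] = new_frame[i]; /* keep the copy in sync with the screen */
            lfb[i] = new_frame[i];
            written++;
        }
    }
    return written;
}
```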

Re: Optimizing scrolling in VBE

Posted: Sun Feb 15, 2015 3:54 pm
by Combuster
MessiahAndrw wrote:I can only get one frame every few seconds on ultra high resolutions.
Hmm... I have a realtime software raytracer gimmick that can do over 720p fluently, so pumping entire screens of pixels to video memory at that rate should be possible with more than just a few frames per second total.

Maybe I should try and run it inside my OS and see how well it performs there for the fun of it...

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 12:05 am
by Muazzam
Thanks for all the replies!
Combuster wrote:Two key approaches to the problem:

1) Reduce the amount of updates sent to the video hardware:
- Use hardware scrolling (or ring buffers in system memory if you do backbuffering)
- Don't scroll so you don't have to update the entire screen.
- Delay committing every scroll to video memory in case more newlines follow
- Reduce the bitdepth. A text console may not need more than 8bpp
Is changing the display start (VBE function 7) hardware scrolling? If so, I think it uses CX/DX instead of ECX/EDX to set the display start address, so how is more than 1 MiB of memory accessed?
I cannot reduce the bit depth, as I also have to develop games and a GUI.
Combuster wrote: 2) Improve the speed of individual updates
- Use a LFB mode instead of windowed video mode.
I am already using LFB.
Combuster wrote: - 24-bit logic is a pain, and the compiler has a hard time optimising it. 32bpp modes may be faster in many circumstances.
I am not operating on individual 24-bit pixels but on 128 bits at a time with SSE. Also, I think Bochs does not support 32-bit modes.
Combuster wrote: - Set video memory to something better than uncacheable on real hardware (i.e. write-combining)
- Avoid reading video memory.
- Use the hardware blitter.
Unfortunately I can not avoid reading from video memory and I don't have hardware blitters available (in VBE).

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 12:13 am
by Muazzam
gerryg400 wrote:Is your 'scroll' a simple memmove()-style operation where you read from video RAM and write to another place in video RAM? I ask because writing to video RAM is very slow and reading is even slower. You should make sure NEVER to read from video RAM, and to write only when a pixel changes. You might need to triple buffer to improve performance.
Yes, it is a simple memmove()-style operation. Maybe I should post the source code.
Note: I am not asking to fix this for me. I am just posting to avoid ambiguities.

Code:

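;________________________________________________
;Scroll the screen down (contents move up by one text row)
;IN/OUT: nothing
;Assumes the copied byte count is a multiple of 128 and that
;all addresses are 16-byte aligned (required by movdqa)
;
scrollDown:
	push ds			;Restored at .end
	push es

	mov eax, dword[bytesPerScanline]
	mov ebx, fonts.height	;Scanlines per text row
	mul ebx			;EAX = bytes per text row

	mov edi, dword[LFBAddress]	;Destination: top of the screen
	mov esi, edi
	add esi, eax		;Source: one text row further down

	mov ecx, eax		;ECX = bytes per text row
	mov eax, dword[bytesPerScanline]	;Total bytes on screen
	movzx ebx, word[resolution.y]
	mul ebx
	sub eax, ecx		;The bottom row has nothing below it to copy

	mov ecx, eax
	shr ecx, 7		;Divide by 128 (bytes copied per iteration)

	mov ax, 0x18		;Selector used during the copy
	mov es, ax
	mov ds, ax

.copy:

	prefetch [esi+128]
	prefetch [esi+160]
	prefetch [esi+192]
	prefetch [esi+224]

	;Read 128 bytes (this still reads from video memory)
	movdqa xmm0, [esi+0]
	movdqa xmm1, [esi+16]
	movdqa xmm2, [esi+32]
	movdqa xmm3, [esi+48]
	movdqa xmm4, [esi+64]
	movdqa xmm5, [esi+80]
	movdqa xmm6, [esi+96]
	movdqa xmm7, [esi+112]

	;Write them one text row higher
	movdqa [edi+0], xmm0
	movdqa [edi+16], xmm1
	movdqa [edi+32], xmm2
	movdqa [edi+48], xmm3
	movdqa [edi+64], xmm4
	movdqa [edi+80], xmm5
	movdqa [edi+96], xmm6
	movdqa [edi+112], xmm7

	add edi, 128
	add esi, 128

	loop .copy

	mov ax, 0x10		;Selector used when reading variables
	mov ds, ax

	movzx eax, word[maxRow]
	call clearLine		;Blank the bottom text row
.end:
	pop es
	pop ds
	ret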

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 12:31 am
by bluemoon
muazzam wrote:Unfortunately I can not avoid reading from video memory
It is really worth implementing an off-screen buffer (unless you can use hardware acceleration). What is your reason not to do so?
Reading from display RAM is tens of times, if not hundreds or more, slower than reading from system memory.

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 12:44 am
by Muazzam
bluemoon wrote:
muazzam wrote:Unfortunately I can not avoid reading from video memory
It is really worth implementing an off-screen buffer (unless you can use hardware acceleration). What is your reason not to do so?
Reading from display RAM is tens of times, if not hundreds or more, slower than reading from system memory.
My OS supports double buffering, but not all the time. There are three functions: useVideoBuffer1 (real video memory), useVideoBuffer2 (back buffer) and refreshScreen. Should I enable double buffering all the time and let the timer call refreshScreen (copy the back buffer to real video memory)?

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 12:46 am
by Muazzam
omarrx024 wrote:Bochs is known to be a slow emulator. If it works fast in other emulators but slowly in Bochs, that's pretty normal. ;)
But my first priority is that it should work well even on slow computers.

Re: Optimizing scrolling in VBE

Posted: Mon Feb 16, 2015 1:34 am
by Combuster
muazzam wrote:I think it uses CX/DX instead of ECX/EDX to set the display start address
Don't think, know. In this case, RTFM. The manual is very explicit about it.
muazzam wrote:So, how is more than 1 MiB of memory accessed?
Again, RTFM on what CX and DX actually mean.
muazzam wrote:I am not operating on individual 24-bit pixels but on 128 bits at a time with SSE.
You are still plotting individual character pixels in 24-bit mode before you get there.
muazzam wrote:Also, I think Bochs does not support 32-bit modes.
Again, don't think, know. In particular, this is a lie.
muazzam wrote:Unfortunately I can not avoid reading from video memory
This is also a lie. How to avoid it had already been explained, by gerryg400, before you made this claim, so you could have actually known you were lying.
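
For reference on the CX/DX question: as the VBE specification describes function 07h with BL = 00h, CX is the first displayed pixel in the scanline and DX is the first displayed scanline, so the values are positions rather than byte addresses. A small C sketch, with a hypothetical helper name, of the byte offset such a pair selects:

```c
#include <stdint.h>

/* Byte offset into video memory selected by a CX/DX display start pair.
   cx = first displayed pixel in the scanline, dx = first displayed scanline.
   The helper name and parameters are illustrative, not part of VBE. */
static uint32_t display_start_offset(uint16_t cx, uint16_t dx,
                                     uint32_t bytes_per_scanline,
                                     uint32_t bytes_per_pixel)
{
    return (uint32_t)dx * bytes_per_scanline + (uint32_t)cx * bytes_per_pixel;
}
```

With the 1024x768x24 mode from the first post (3072 bytes per scanline, assuming no padding), DX = 768 already selects an offset of 2359296 bytes, well past 1 MiB.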