Vertical Retrace Timing

Post by Gondolin »

Context: Freshly booted into 64-bit long mode with a VBE Linear Frame Buffer (LFB) pointer. When I first tested the LFB I was surprised that it worked at all, but disappointed by the amount of visual tearing. I gradually refined my update code to use double buffering and vertical retrace sync, and now it is very smooth even for highly contrasting sequential frames. I would like to share how I accomplished this, since the threads on this subject are rather scattered and I found it very difficult to get solid info. I also have some questions about a few anomalies that I hope can be cleared up.

Approach 1 (Write Blindly to LFB):
1. Map the LFB address space to virtual memory. I give it about 8 MiB to support higher resolutions.
2. Write directly to the LFB space.
3. Goto 2

Results: Bad/random tearing. Obviously this just doesn't work, although with very mild color transitions you won't notice the difference.

Approach 2 (Double Buffer to System Memory):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer? I noticed significant speed increase with caching enabled. However, all I am doing in my infinite loop is render->backbuffer->LFB and you may not want your entire <= 8 MiB cache polluted with temporary video memory? Also I turned on caching for the LFB pages and thought "no way this is going to work" but it seemed to not make any difference?
4. Draw everything to your RAM backbuffer.
5. Copy your RAM backbuffer to the LFB. I used rep movsq for the copy, but might a 16-byte SSE loop be faster? I assume the bus is only 64 bits wide, so it would just increase instruction bandwidth, which is somewhat pointless?
6. Goto 4

Results: Still bad/random tearing, and obviously slower because of the extra layer of data. I know that there is a huge amount of optimization you can use in practice to eliminate most of the moving/copying by keeping track of which parts of the screen change each frame, but I am just testing with a fullscreen rapid fading effect between black and white. I figured if I can get this smooth then everything else should be okay.

Approach 3 (Double Buffer and Sync to Vertical Retrace using VGA 0x3DA port):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer and/or LFB?
4. Draw everything to your RAM backbuffer.
5. Use the 0x3DA VGA port to find out when the vertical retrace starts.

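Code:

  mov dx, 0x3DA        ; VGA Input Status #1 register
VBI_POLL_END:
  in al, dx
  test al, 8           ; bit 3 = vertical retrace in progress
  jnz VBI_POLL_END     ; if a retrace is already running, wait for it to end
VBI_POLL_BEG:
  in al, dx
  test al, 8
  jz VBI_POLL_BEG      ; then wait for the next retrace to begin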
6. Copy your RAM backbuffer to the LFB.
7. Goto 4

Results: Interestingly, I got a very smooth, consistent tearing at the very top of my screen. To fix it I made one more adjustment...

Approach 4 (Double Buffer, Sync to Vertical Retrace using VGA 0x3DA port, and Timing Delay):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer and/or LFB?
4. Draw everything to your RAM backbuffer.
5. Use the 0x3DA VGA port to find out when the vertical retrace starts.
6. Wait for a while (hardware configuration dependent). For me 300,000 to 1,500,000 iterations worked.

Code:

  mov rcx, 500000      ; iteration count, tuned by hand for this machine
.delay:
  loop .delay          ; decrement rcx and repeat until it reaches zero
7. Copy your RAM backbuffer to the LFB.
8. Goto 4

Results: With the (still pretty small) delay right after detecting the vertical retrace the tearing/shearing is completely gone. Why the delay is necessary I'm not sure? It seems the copy happens too fast, but it's hard to tell what affects what in being too slow/fast etc. for the monitor refresh...

Any suggestions, comments, questions?

Post by Brendan »

Hi,
Gondolin wrote: Results: With the (still pretty small) delay right after detecting the vertical retrace the tearing/shearing is completely gone. Why the delay is necessary I'm not sure? It seems the copy happens too fast, but it's hard to tell what affects what in being too slow/fast etc. for the monitor refresh...
How fast is your blit, and how fast is the video card sending the data to the monitor?

Imagine 2 dots racing down the screen. The first dot is the video card sending data to the monitor. The second dot is your blitting. If the second dot waits until the first dot reaches the bottom of the screen and then starts from the top of the screen, then if the second dot moves faster than the first dot all the blitting will be done before the video card sends the data to the monitor. However, if the second dot moves slower than the first dot, then part way through blitting the video card will catch up, and you'll see new data at the top of the screen and old data at the bottom of the screen.

Now, imagine if you wait until the first dot (display update) is a few lines down from the top of the screen before you start blitting. If the second dot is faster than the first dot, then you'll get tearing when the second dot overtakes the first dot (you'll get old data displayed at the top of the screen, then the blitting overtakes the display update and you get new data at the bottom of the screen). However, if the second dot moves slower than the first dot, then because the first dot started earlier it'll need to do almost an entire frame before it can possibly overtake the second dot. Basically the display update does an entire frame of old data before it starts the next frame of new data, and if it takes 17 ms to display a frame then the blitting can take almost 34 ms without tearing.

Basically what I'm suggesting is that the blitting is slower than the display update - for example, if it takes 17 ms for the video card to send each frame to the monitor then your blitting is taking longer than 17 ms (but less than 34 ms).
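
The two-dots argument can be tried out numerically. The following is a minimal C sketch, not something from this thread: it assumes a constant time per scan line for both the display and the blit, a fixed vertical blanking interval before line 0 is scanned, and a blit that starts `delay_us` after the retrace bit is seen. The function name and all of its parameters are made up for illustration.

```c
#include <stdint.h>

/* Counts displayed frames that mix old and new lines (torn frames).
 * Time 0 is the moment the retrace bit is seen. Assumptions of this model:
 *   - the display scans line y of frame k at
 *         k * frame_us + vblank_us + y * line_scan_us
 *   - the blit finishes writing line y at
 *         delay_us + (y + 1) * line_blit_us
 *   - a line shows new data if it was written no later than it is scanned. */
int count_torn_frames(int64_t lines, int64_t line_scan_us, int64_t vblank_us,
                      int64_t line_blit_us, int64_t delay_us)
{
    int64_t frame_us = vblank_us + lines * line_scan_us;
    int torn = 0;

    for (int64_t k = 0; ; k++) {
        int64_t fresh = 0;  /* lines that show new data in frame k */

        for (int64_t y = 0; y < lines; y++) {
            int64_t written = delay_us + (y + 1) * line_blit_us;
            int64_t scanned = k * frame_us + vblank_us + y * line_scan_us;
            if (written <= scanned)
                fresh++;
        }

        if (fresh == lines)   /* fully new frame: every later frame is too */
            return torn;
        if (fresh != 0)       /* some new, some old: this frame tears */
            torn++;
    }
}
```

With 800 lines at 20 us each and a 667 us blanking interval (roughly 60 Hz), this model says a 24 ms blit (30 us per line) started immediately tears once near the top, while the same blit started 1000 us later does not tear at all. A fast 8 ms blit (10 us per line) behaves the opposite way: clean when started immediately, torn when delayed.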

The best way to solve the problem is called "page flipping". You have 2 areas in video display memory, where one area is being sent to the monitor and the other area is being modified by you; and when you've got a frame ready you wait for retrace and then tell the video card to display the other area. This works right regardless of whether blitting is faster or slower than display update. Page flipping is what "VBE Function 0x07 - Set/Get Display Start" is most often used for.

Also, the VGA card's "retrace in progress" flag may not work (and may be something completely different) if the video card isn't VGA compatible and/or isn't operating in "VGA compatible mode". In this case your code can lock up (e.g. waiting for a bit that never changes to change). If you use "VBE Function 0x07 - Set/Get Display Start" this isn't a problem because you can ask this function to wait until retrace before changing the display start (e.g. call the function with "BL = 0x80").
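
One way to limit the lock-up risk described above is to bound the polling loop. The sketch below is in C and is only an illustration: the names `poll_retrace_bit`, `wait_for_retrace_start` and `scripted_read` are hypothetical, the port read is passed in as a callback (on real hardware it would wrap an `in` from port 0x3DA), and the poll limit is an arbitrary number that would need tuning.

```c
#include <stdbool.h>
#include <stdint.h>

#define VGA_STATUS_VRETRACE 0x08u  /* bit 3 of Input Status #1 (port 0x3DA) */

/* Returns one byte of Input Status #1. A callback is used so the logic can
 * be exercised without real hardware. */
typedef uint8_t (*status_reader)(void *ctx);

/* Polls until bit 3 equals want_set; gives up after max_polls reads. */
bool poll_retrace_bit(status_reader read, void *ctx,
                      bool want_set, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        bool set = (read(ctx) & VGA_STATUS_VRETRACE) != 0;
        if (set == want_set)
            return true;
    }
    return false;  /* the bit never changed: probably not a usable flag */
}

/* Waits for the start of the next retrace: first lets a retrace that is
 * already in progress finish, then waits for the bit to become set. */
bool wait_for_retrace_start(status_reader read, void *ctx, uint32_t max_polls)
{
    return poll_retrace_bit(read, ctx, false, max_polls) &&
           poll_retrace_bit(read, ctx, true, max_polls);
}

/* Host-side stand-in for the port: replays a fixed list of status bytes and
 * then repeats the last one forever. Only for experimenting with the logic. */
struct scripted_port {
    const uint8_t *values;
    uint32_t count;
    uint32_t next;
};

uint8_t scripted_read(void *ctx)
{
    struct scripted_port *p = ctx;
    uint8_t v = p->values[p->next];
    if (p->next + 1 < p->count)
        p->next++;
    return v;
}
```

If `wait_for_retrace_start` returns false, the caller could fall back to blitting without sync rather than hanging.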


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by Gondolin »

Thanks for your help, Brendan. Your explanation of the dots makes a lot of sense. I thought, how could the VGA vertical retrace bit be set if it hasn't started already? But if it's actually one whole frame slower then it's playing the reverse catch-up, as you said. I'm not sure how fast the blit is. I know the monitor refresh rate is 60 Hz (about 16.7 ms per frame). The resolution is 1280x800x32 and I have no optimizations while copying the data from the backbuffer to the LFB. That's about 3.9 MiB (4,096,000 bytes) copied 8 bytes at a time with rep movsq. I'm not sure how fast the bus can get that much data to the video card. I suppose the only effective way to find out would be to measure. It should be fairly simple with RDTSC, although I wonder if this would be accurate for the time it takes for the data to arrive completely in VRAM?
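
The numbers in this post can be worked through in a few lines of C. This is only a sketch of the arithmetic: the function names are made up, and the TSC frequency is an assumption that would have to be calibrated on the actual machine. A TSC delta taken around the copy shows how long the CPU spent issuing the stores; it does not by itself settle when the data has landed in VRAM.

```c
#include <stdint.h>

/* Size of one full frame in bytes. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* Bytes per second needed to rewrite the whole frame on every refresh. */
uint64_t full_frame_bandwidth(uint64_t bytes_per_frame, uint32_t refresh_hz)
{
    return bytes_per_frame * refresh_hz;
}

/* Converts a difference between two RDTSC readings to microseconds.
 * tsc_hz is the (assumed, separately calibrated) TSC frequency.
 * Meant for short intervals: tsc_delta * 1000000 must fit in 64 bits. */
uint64_t tsc_delta_to_us(uint64_t tsc_delta, uint64_t tsc_hz)
{
    return tsc_delta * 1000000u / tsc_hz;
}
```

For 1280x800x32 this gives 4,096,000 bytes per frame, so rewriting every frame at 60 Hz needs 245,760,000 bytes per second. On a hypothetical 2 GHz TSC, a measured delta of 50,000,000 ticks would be 25,000 us, which is longer than one 60 Hz frame.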

Then again, it could also be that the 0x3DA bit has nothing to do with vertical retrace and is just being set periodically per refresh cycle? It does seem to be consistent across each frame though. I'm sure the card is VGA compatible, but what does "VGA compatible mode" mean? I'm not specifically doing anything to set up VGA, I just get the LFB with VBE while in real mode.

That leads to the other problem of course. I would use the "Set/Get Display Start" page flipping but this is 64-bit long mode, and I understand this leaves only two options for calling the VBE functions directly, writing a 16-bit mode emulator or switching back to real mode. Both are quite ugly, but if this really is the only way to get the VBE LFB to work smoothly, which would you recommend?

On a slightly unrelated note, I know ATI/AMD have been releasing card documentation gradually. I've skimmed through the R5XX doc which seems to be the most complete and it looked as if there was enough information to write a driver. The doc provides the concept of how the card works, then all the register and command packet information. It does not precisely tell you how to put them all together but I'm assuming there are other places to figure that out. Has anyone actually used this information to write a full 3D hardware accelerated driver? Or at least 2D? I've looked at the Nouveau and Haiku projects but those appear to be mostly reverse engineered and before the docs were released.

Post by Brendan »

Hi,
Gondolin wrote: Thanks for your help Brendan. Your explanation of the dots makes a lot of sense. I thought, how could the VGA vertical retrace bit be set if it hasn't started already? But if it's actually one whole frame slower then it's playing the reverse catch up as you said. I'm not sure how fast the bitblit is. I know the monitor refresh rate is 60Hz (16 ms). The resolution is 1280x800x32 and I have no optimizations while copying the data from the backbuffer to the LFB. That's 4 MiB copied 8 bytes at a time with rep movsq. I'm not sure how fast the bus can get that much data to the video card.
I don't know what sort of bus your video card is using, or what sort of RAM is being used for video display memory. You can find theoretical maximum transfer rates on this Wikipedia page; although don't forget there's usually a significant difference between the theoretical maximum and the transfer rate you actually get in practice (and don't forget that video display memory is being read by the video card and written to at the same time).
Gondolin wrote:I suppose the only effective way to find out would be to measure. It should be fairly simple with the RDTSC, although I wonder if this would be accurate for the time it takes for the data to arrive completely in VRAM?
RDTSC should be accurate enough; but it'll only tell you about your computer, and won't tell you about everyone else's computers...
Gondolin wrote:Then again, it could also be that the 0x3DA bit has nothing to do with vertical retrace and is just being set periodically per refresh cycle? It does seem to be consistent across each frame though. I'm sure the card is VGA compatible, but what does "VGA compatible mode" mean? I'm not specifically doing anything to set up VGA, I just get the LFB with VBE while in real mode.
Mostly "VGA compatible" means that the card supports the standard VGA I/O ports when it's in a standard VGA mode. Once you start using SVGA all bets are off. ;)

I'd assume that the "retrace in progress" flag is doing exactly what it should on your computer. Unfortunately your OS needs to be able to work on more than just your computer...
Gondolin wrote:That leads to the other problem of course. I would use the "Set/Get Display Start" page flipping but this is 64-bit long mode, and I understand this leaves only two options for calling the VBE functions directly, writing a 16-bit mode emulator or switching back to real mode. Both are quite ugly, but if this really is the only way to get the VBE LFB to work smoothly, which would you recommend?
They aren't the only options.

VBE has protected mode interface/s, which includes the "Set Display Start" sub-function (but not the "Get Display Start" sub-function); and it should be entirely possible to use these protected mode interfaces from long mode (using 32-bit or 16-bit code segments).

You might also be able to blit less data to the screen every frame - it's very rare for every single pixel to change every single frame, and therefore it's very likely that you only need to update some of the data instead of all of it. If you only update half the screen then your blit would be faster than the screen update.
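
A simple way to try the "blit less data" idea is to compare each row of the backbuffer against a copy of what was last sent and skip rows that have not changed. The C sketch below is an assumption-laden illustration rather than anything from this thread: `blit_changed_rows` and its parameters are hypothetical names, the shadow copy costs one extra frame of RAM, and `lfb_pitch` stands for the mode's bytes per scan line. A freestanding kernel would supply its own `memcmp` and `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies only the rows of `back` that differ from `shadow`, where `shadow`
 * is a RAM copy of what was last written to the LFB. Rows in `back` and
 * `shadow` are packed (row_bytes apart); rows in the LFB are lfb_pitch
 * apart. Returns the number of rows copied. */
size_t blit_changed_rows(uint8_t *lfb, size_t lfb_pitch,
                         const uint8_t *back, uint8_t *shadow,
                         size_t row_bytes, size_t rows)
{
    size_t copied = 0;

    for (size_t y = 0; y < rows; y++) {
        const uint8_t *src = back + y * row_bytes;
        uint8_t *prev = shadow + y * row_bytes;

        if (memcmp(src, prev, row_bytes) != 0) {
            memcpy(lfb + y * lfb_pitch, src, row_bytes);  /* update screen */
            memcpy(prev, src, row_bytes);                 /* remember it */
            copied++;
        }
    }
    return copied;
}
```

The comparison reads ordinary cached RAM only, so it never has to read back from the LFB. Whether it wins overall depends on how much of the screen changes per frame; in the fullscreen fade test every row changes and this does strictly more work.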

The other thing I'd consider is not bothering to wait for retrace and letting the video tear (unless/until you've got a native video driver for the video card that uses page flipping and a vertical retrace IRQ). I mean let's be honest here - the user can't expect the same quality and performance from a legacy "no video driver" interface as they'd expect from a driver designed specifically for the video card.


Cheers,

Brendan

Post by Dex »

From my work with VESA, all in 32-bit pmode, I have never had any problem with tearing/shearing, as long as double buffering is used.
The retrace is only needed as far as timing goes, as in making the program run at the same speed on PCs of different speeds.
I would be surprised if it makes any difference when you're talking VESA (and not the old mode 13h).
I have done many tests of buffer-to-LFB writes using many methods, and there's not one fix: what speeds up the FPS on one PC will be slower on another, even if both have, say, MMX.
Setting the MTRR to write-combining can give a 30-40% increase in FPS when just dumping the back buffer to the LFB in a loop.
Write some timer code and test the fastest way to dump the back buffer to the screen.

Post by Gondolin »

Wow, I had thought the VBE 2.0 protected mode interface relied on 16-bit real mode code, but it is just 32-bit protected mode code. This can easily be used in 64-bit long mode with a couple of wrapper far jumps to a compatibility mode code selector. I also disabled interrupts during the call but I disabled them during the buffer copy anyway. This seems to work really nicely. I split the VRAM into 2 sections, copy to one while using setDisplayStart to select the other and sync with the vertical retrace. This produces smooth transitions and achieves decent performance. As both of you say, you don't need to copy the whole buffer usually. I just started with the worst case (solid screen color changing every frame) to get that working. Then I can work in optimizations if I need more speed.
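
The bookkeeping for splitting VRAM into two sections can be sketched in C as below. This is a hedged illustration, not the code used here: the struct and function names are hypothetical, the two pages are assumed to be stacked one after the other with the same pitch, and video memory is assumed to be large enough for both. The actual Set Display Start call is left out because its calling convention depends on which VBE interface is used.

```c
#include <stdint.h>

struct flip_state {
    uint32_t pitch;   /* bytes per scan line (assumed equal for both pages) */
    uint32_t height;  /* visible scan lines per page */
    uint32_t front;   /* index (0 or 1) of the page currently displayed */
};

/* Byte offset from the LFB base of the page that is safe to draw into. */
uint32_t back_page_offset(const struct flip_state *s)
{
    return (1u - s->front) * s->pitch * s->height;
}

/* First scan line of the back page, i.e. the display start that would
 * make it visible. */
uint32_t back_page_first_line(const struct flip_state *s)
{
    return (1u - s->front) * s->height;
}

/* Records that the flip has happened: the old back page is now displayed. */
void flip_done(struct flip_state *s)
{
    s->front = 1u - s->front;
}
```

Per frame, the idea is: draw or copy into the LFB at `back_page_offset`, request a display start at `back_page_first_line` synchronized to vertical retrace, then call `flip_done`. With a pitch of 5120 bytes (1280 pixels at 4 bytes each, assuming no row padding) and 800 lines, the second page starts 4,096,000 bytes into the LFB.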