Vertical Retrace Timing
Posted: Sat May 16, 2009 9:14 pm
Context: Freshly booted into 64-bit long mode with a VBE Linear Frame Buffer (LFB) pointer. When I first tested the LFB I was surprised that it worked at all, but disappointed by the amount of visual tearing. I gradually refined my update code to use double buffering and vertical retrace sync, and now it is very smooth even for highly contrasting sequential frames. I would like to share how I accomplished this, since the threads on this subject are rather scattered and I found it very difficult to get solid info. I also have some questions about a few anomalies that I hope someone can clear up.
Approach 1 (Write Blindly to LFB):
1. Map the LFB address space to virtual memory. I give it about 8 MiB to support higher resolutions.
2. Write directly to the LFB space.
3. Goto 2
Results: Bad/random tearing. Obviously this just doesn't work, although with very mild color transitions you won't notice the difference.
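For step 1, the amount of address space to map could be computed from the mode's pitch and height instead of being fixed at 8 MiB. A minimal sketch in C, under stated assumptions: `lfb_bytes`, `lfb_pages`, and `LFB_PAGE_SIZE` are hypothetical names, 4 KiB pages are assumed, and the pitch and height are assumed to come from the VBE mode info block (bytes per scanline and vertical resolution).

```c
#include <stdint.h>

#define LFB_PAGE_SIZE 4096u  /* assumption: 4 KiB pages; change for 2 MiB pages */

/* Bytes occupied by one full frame: pitch (bytes per scanline) times height. */
static uint64_t lfb_bytes(uint32_t pitch, uint32_t height)
{
    return (uint64_t)pitch * height;
}

/* Number of pages needed to cover one frame, rounded up. */
static uint64_t lfb_pages(uint32_t pitch, uint32_t height)
{
    return (lfb_bytes(pitch, height) + LFB_PAGE_SIZE - 1) / LFB_PAGE_SIZE;
}
```

For example, 1024x768 at 32 bpp with a pitch of 4096 bytes needs 3 MiB (768 pages), and 1920x1080 at 32 bpp with a pitch of 7680 bytes needs about 7.9 MiB, which still fits in an 8 MiB mapping. The pitch reported by the card may be larger than width times bytes per pixel, so it is safer to use the reported value.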
Approach 2 (Double Buffer to System Memory):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer? I noticed a significant speed increase with caching enabled. However, all I am doing in my infinite loop is render->backbuffer->LFB, and you may not want your entire (<= 8 MiB) cache polluted with temporary video data. I also turned on caching for the LFB pages, thinking "no way this is going to work", but it did not seem to make any difference?
4. Draw everything to your RAM backbuffer.
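5. Copy your RAM backbuffer to the LFB. I used
Code:
    rep movsq
for the copy, but might a 16-byte SSE loop be faster? I assume the bus is only 64 bits wide, so it would just increase instruction bandwidth... which is somewhat pointless?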
6. Goto 4
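On the SSE question in step 5, here is one way a 16-byte copy loop might look, sketched in C with SSE2 intrinsics rather than assembly. It uses non-temporal stores (`_mm_stream_si128`), which avoid filling the cache with destination data. `copy_frame_sse2` is a hypothetical name; the sketch assumes both pointers are 16-byte aligned and the size is a multiple of 16. Whether it actually beats `rep movsq` on a given machine would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` from src to dst, 16 bytes at a time.
 * Assumes: dst and src are 16-byte aligned, bytes is a multiple of 16. */
static void copy_frame_sse2(void *dst, const void *src, size_t bytes)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_load_si128(s + i);  /* aligned 16-byte load */
        _mm_stream_si128(d + i, v);         /* non-temporal 16-byte store */
    }
    _mm_sfence();  /* order the streamed stores before later stores */
#else
    memcpy(dst, src, bytes);  /* fallback when SSE2 is not available */
#endif
}
```

In a kernel, SSE has to be enabled in CR0/CR4 before any of these instructions can execute.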
Results: Still bad/random tearing, and obviously slower because of the extra copy. I know there is a huge amount of optimization you can do in practice to eliminate most of the copying by keeping track of which parts of the screen change each frame, but I am just testing with a fullscreen rapid fade between black and white. I figured that if I can get this smooth, everything else should be okay.
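The optimization mentioned above, copying only the parts of the screen that changed, might be sketched like this in C. `rect_t` and `copy_dirty_rect` are hypothetical names; the sketch assumes 32 bpp, that the backbuffer and the LFB share the same pitch (in bytes), and that the rectangle has already been clipped to the screen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t x, y, w, h; } rect_t;  /* all in pixels */

/* Copy only the pixels inside `r` from the backbuffer to the LFB.
 * Assumes 32 bpp, identical pitch for both buffers, r clipped to the screen. */
static void copy_dirty_rect(uint8_t *lfb, const uint8_t *back,
                            uint32_t pitch, rect_t r)
{
    for (uint32_t row = 0; row < r.h; row++) {
        /* Byte offset of the first changed pixel on this scanline. */
        size_t off = (size_t)(r.y + row) * pitch + (size_t)r.x * 4;
        memcpy(lfb + off, back + off, (size_t)r.w * 4);
    }
}
```

For the fullscreen fade test this gains nothing, since every pixel changes every frame, but for typical content it cuts the amount of data pushed to the LFB per frame.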
Approach 3 (Double Buffer and Sync to Vertical Retrace using VGA 0x3DA port):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer and/or LFB?
4. Draw everything to your RAM backbuffer.
5. Use the 0x3DA VGA port to find out when the vertical retrace starts.
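Code:
    mov dx, 0x3DA       ; VGA Input Status #1 register
VBI_POLL_END:           ; if a retrace is already in progress, wait for it to end
    in al, dx
    test al, 8          ; bit 3 = vertical retrace active
    jnz VBI_POLL_END
VBI_POLL_BEG:           ; now wait for the next retrace to begin
    in al, dx
    test al, 8
    jz VBI_POLL_BEG
6. Copy your RAM backbuffer to the LFB.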
7. Goto 4
Results: Interestingly, I got a very smooth, consistent tear at the very top of my screen. To fix it I made one more adjustment...
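The polling loop from step 5 could also be written in C with a poll limit, so that it cannot hang if the status bit never toggles (that this can happen on some setups is an assumption, not something tested here). `port_read_fn`, `wait_vretrace_start`, `max_polls`, and the `fake_*` names are hypothetical; the port read is passed in as a function pointer so the logic can be tried without real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define VGA_INPUT_STATUS_1 0x3DA
#define VGA_VRETRACE_BIT   0x08   /* bit 3: set during vertical retrace */

typedef uint8_t (*port_read_fn)(uint16_t port);

/* Wait for the start of the next vertical retrace.
 * Returns true on success, false if the poll budget ran out first. */
static bool wait_vretrace_start(port_read_fn inb, uint32_t max_polls)
{
    /* If a retrace is already in progress, wait for it to end. */
    while (inb(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT)
        if (max_polls-- == 0) return false;
    /* Now wait for the next retrace to begin. */
    while (!(inb(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT))
        if (max_polls-- == 0) return false;
    return true;
}

/* Host-side stand-in for the port: replays a fixed sequence of status
 * bytes (retrace, retrace, display, display, retrace) and then keeps
 * returning the last one. */
static const uint8_t fake_seq[] = { 0x08, 0x08, 0x00, 0x00, 0x08 };
static unsigned fake_pos;
static uint8_t fake_inb(uint16_t port)
{
    (void)port;
    return fake_seq[fake_pos < 4 ? fake_pos++ : 4];
}
```

In a kernel, `inb` would be a thin wrapper around the `in al, dx` instruction; the fake only exists so the loop logic can be exercised on a host.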
Approach 4 (Double Buffer, Sync to Vertical Retrace using VGA 0x3DA port, and Timing Delay):
1. Map the LFB address space to virtual memory.
2. Map a backbuffer to RAM with the same size as the LFB.
3. Turn on caching for the backbuffer and/or LFB?
4. Draw everything to your RAM backbuffer.
5. Use the 0x3DA VGA port to find out when the vertical retrace starts.
6. Wait for a while (hardware configuration dependent). For me 300,000 to 1,500,000 iterations worked.
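Code:
    mov rcx, 500000     ; iteration count; tune per machine
.delay:
    loop .delay         ; decrement rcx and repeat until it reaches zero
7. Copy your RAM backbuffer to the LFB.
8. Goto 4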
Results: With the (still pretty small) delay right after detecting the vertical retrace, the tearing/shearing is completely gone. I'm not sure why the delay is necessary. It seems the copy happens too fast, but it's hard to tell what is too slow or too fast relative to the monitor refresh...
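Since the iteration count in step 6 depends on CPU speed, one option is to express the delay in microseconds and convert it to ticks of a free-running counter. A sketch in C under stated assumptions: `delay_ticks`, `spin_for_ticks`, and the `fake_now*` names are hypothetical, the counter frequency is assumed to be known from an earlier calibration, and the counter read is passed in as a function pointer (on real hardware it could wrap `rdtsc`).

```c
#include <stdint.h>

typedef uint64_t (*counter_read_fn)(void);

/* Convert a delay in microseconds to counter ticks at `hz` ticks per second. */
static uint64_t delay_ticks(uint64_t usec, uint64_t hz)
{
    return usec * hz / 1000000u;  /* can overflow for very large inputs */
}

/* Busy-wait until at least `ticks` counter ticks have elapsed. */
static void spin_for_ticks(counter_read_fn now, uint64_t ticks)
{
    uint64_t start = now();
    while (now() - start < ticks)
        ;  /* spin */
}

/* Host-side stand-in counter that advances 100 ticks per read. */
static uint64_t fake_now_value;
static uint64_t fake_now(void)
{
    return fake_now_value += 100;
}
```

With a 2 GHz counter, a 500 microsecond delay works out to 1,000,000 ticks. The right delay still has to be found by experiment; this only makes the value portable across CPU speeds.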
Any suggestions, comments, questions?