640x480x16 Performance?

rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Post by rkennedy9064 »

I've been playing around with VGA modes and was wondering about the performance of VGA 640x480x16. 320x200x256 was pretty easy to implement since it's 1 byte per pixel, but 640x480x16 seems to have a lot of performance costs, since each byte covers 8 pixels and you have to calculate which of the 8 you want to set. From what I've seen, write mode 2 seems to be one of the faster ways to display things on the screen. I currently have a screen buffer I use to store the color data for each pixel, then a function to copy the data to the screen. I know that if all the pixels are the same color you can use the Set/Reset register to write 8 pixels at a time, but is there any faster way to write data to the screen if each pixel can be any color? This is the code I currently have to copy the data to the screen.

let frame_buffer = self.get_frame_buffer();
for offset in 0..SIZE {
    let color = self.screen_buffer[offset];
    // Set the mask to the pixel being modified
    vga.graphics_controller_registers
        .set_bit_mask(0x80 >> (offset & 0x7));
    // Faster than offset / 8?
    let offset = offset >> 3;
    unsafe {
        // Load the memory latch with 8 pixels
        frame_buffer.add(offset).read_volatile();
        // Write the color to the masked pixel
        frame_buffer.add(offset).write_volatile(color);
    }
}
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: 640x480x16 Performance?

Post by alexfru »

No, you should:
  • not read from VGA RAM
  • not redraw unchanged parts (unless it's just a few pixels/chars here and there or the text cursor or mouse pointer)
  • generally avoid (frequent) changes on the screen (e.g. don't redraw windows while they're being dragged across the screen, don't do any other expensive animation like transparency, blending, gradually collapsing/expanding windows, etc)
  • minimize VGA I/O port traffic (e.g. handle one plane of several adjacent pixels at once, then another plane, and so on; e.g. you can fill a horizontal scanline in 4 takes, a plane at a time)
  • prefer filling various shapes (window borders, buttons, etc) with solid colors, minimize the use of bitmaps/images
  • have optimized text output (you can optimize for solid colors, for empty space above/below chars, etc etc)
Oh, if you have multiple CPUs and can boot them, do that too.
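
The "one plane at a time" advice can be sketched roughly as follows. This is only an illustration, assuming a chunky source buffer with one 4-bit color per byte, like the `screen_buffer` in the original post; the function name `pack_plane` is made up, and the hardware side (selecting the plane, copying the packed bytes to video memory) is left out. Each output byte covers 8 pixels, so one store replaces 8 bit-mask updates.

```rust
/// Extract one bit plane from a chunky buffer (one 4-bit color per byte).
/// Returns one byte per group of 8 pixels; the leftmost pixel of each
/// group lands in bit 7, matching the VGA planar layout.
fn pack_plane(pixels: &[u8], plane: u8) -> Vec<u8> {
    assert!(plane < 4, "16-color modes have exactly 4 planes");
    pixels
        .chunks(8)
        .map(|group| {
            let mut byte = 0u8;
            for (i, &color) in group.iter().enumerate() {
                // Take this plane's bit of the color and place it at the
                // pixel's position inside the byte.
                let bit = (color >> plane) & 1;
                byte |= bit << (7 - i);
            }
            byte
        })
        .collect()
}
```

Calling it once per plane for a scanline gives the four byte runs that can then be copied with plain writes, without touching the bit mask or reading the latches.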
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: 640x480x16 Performance?

Post by bzt »

Oh, you mean 16 colors (4 bits) and not 16 bits (64k colors)? Then alexfru is right.

I'd highly recommend against that mode. But if you insist, the only way to make it fast is to have a copy of the screen in RAM. Then
- when you modify a pixel, you create a list of boxes, areas that have changed. Updating a window can mark the entire window area, no need to check every pixel
- on vertical retrace, you "flush" the screen, you iterate through that list 4 times: first you switch to plane 0, then you update every pixel's 1st bit in the list by copying the area from the RAM screen to the VGA screen. Then you switch to plane 1, and copy the 2nd bits, etc.
- when done, you clear the list

This minimizes the number of plane switches (which are slow), and updates video RAM when it is not being displayed (during a vertical retrace). Also you can calculate the screen offsets of the boxes once and reuse them for all 4 plane writes.
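
The flush described above might look something like this. It is a sketch under assumptions, not a tested driver: the RAM copy is assumed to already be stored as four planes, the trait `PlanarTarget` is a hypothetical stand-in for "select a plane, then copy bytes into video memory", and waiting for the vertical retrace is left out. `Recorder` is only a test double that counts calls.

```rust
/// A changed screen area in pixels; `w` and `h` are assumed to be non-zero.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DirtyRect {
    x: usize,
    y: usize,
    w: usize,
    h: usize,
}

/// 640 pixels per scanline, 8 pixels per byte in each plane.
const BYTES_PER_ROW: usize = 640 / 8;

/// Hypothetical abstraction over the hardware.
trait PlanarTarget {
    fn select_plane(&mut self, plane: u8);
    fn write(&mut self, offset: usize, bytes: &[u8]);
}

/// (start offset, length) in bytes for every scanline of a rectangle.
fn row_spans(r: DirtyRect) -> Vec<(usize, usize)> {
    let first = r.x / 8;
    let last = (r.x + r.w - 1) / 8;
    (r.y..r.y + r.h)
        .map(|row| (row * BYTES_PER_ROW + first, last - first + 1))
        .collect()
}

/// Copy all dirty areas from the planar RAM copy to the target,
/// switching planes only four times, then clear the list.
fn flush<T: PlanarTarget>(dirty: &mut Vec<DirtyRect>, shadow: &[Vec<u8>; 4], target: &mut T) {
    // Offsets are computed once and reused for all four planes.
    let spans: Vec<(usize, usize)> = dirty.iter().flat_map(|&r| row_spans(r)).collect();
    for plane in 0..4u8 {
        target.select_plane(plane);
        for &(start, len) in &spans {
            target.write(start, &shadow[plane as usize][start..start + len]);
        }
    }
    dirty.clear();
}

/// Test double that records how much work a flush caused.
#[derive(Default)]
struct Recorder {
    plane_switches: usize,
    bytes_written: usize,
}

impl PlanarTarget for Recorder {
    fn select_plane(&mut self, _plane: u8) {
        self.plane_switches += 1;
    }
    fn write(&mut self, _offset: usize, bytes: &[u8]) {
        self.bytes_written += bytes.len();
    }
}
```

Because whole bytes are copied from the RAM copy, the partially covered bytes at the left and right edge of a box need no masking here: the RAM copy already holds the correct neighbouring pixels.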

If I were you, I'd rather consider using an SVGA mode at least (or even better, VESA). If you don't use the VESA 2.0 extensions, some modes have fixed mode values, and you can set them with INT 10h/AH=0. But you really should use VESA (INT 10h/AH=4Fh) instead; that's more widely supported and also provides linear frame buffer access (no need to switch banks). Bank switching is necessary because the VGA has a 64k MMIO window at 0xA0000, but a 640x480x16 (16 bits, not colors) mode requires 600k of memory (640 * 480 * 2 = 614,400 bytes), so you have to map a certain 64k of that 600k at 0xA0000. Linear frame buffers use an address other than 0xA0000 (typically high in the 32-bit address space, e.g. 0xE0000000; the actual address is reported in the VBE mode information) and they are contiguous. Since that address is above 1 MB you can't reach it from plain real mode, but in return you can use the entire 600k as-is.
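
To make the bank arithmetic concrete, here is a small sketch. It assumes a 64 KiB window whose granularity equals its size and a pitch of exactly 640 * 2 bytes per scanline; real cards report both values through the VBE mode information, so treat the constants and the function names as illustrative only.

```rust
const WIDTH: usize = 640;
const HEIGHT: usize = 480;
const BYTES_PER_PIXEL: usize = 2; // 16 bits per pixel
const BANK_SIZE: usize = 64 * 1024; // size of the window at 0xA0000

/// Total bytes needed for one 640x480 frame at 16 bits per pixel.
fn framebuffer_bytes() -> usize {
    WIDTH * HEIGHT * BYTES_PER_PIXEL
}

/// Number of 64 KiB banks needed to cover the whole frame (rounded up).
fn banks_needed() -> usize {
    (framebuffer_bytes() + BANK_SIZE - 1) / BANK_SIZE
}

/// Which bank must be mapped, and the offset inside the window,
/// to reach pixel (x, y).
fn bank_and_offset(x: usize, y: usize) -> (usize, usize) {
    let linear = (y * WIDTH + x) * BYTES_PER_PIXEL;
    (linear / BANK_SIZE, linear % BANK_SIZE)
}
```

With a linear frame buffer the same pixel is simply at `base + (y * WIDTH + x) * BYTES_PER_PIXEL`, with no bank register involved.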

Cheers,
bzt
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: 640x480x16 Performance?

Post by Gigasoft »

How often do you need to set a single pixel? Not very often, I assume.

Transferring a rectangle to the screen should be done one plane at a time in write mode 0. The left border, middle and right border are done separately. Solid shapes and transparent text are drawn with write mode 3. For pattern fills whose pattern width is a multiple of 8, you can use write mode 1 to replicate the pattern, but I don't know if it is faster than just writing the entire thing in mode 0. I have never found a use for write mode 2.
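
Splitting a scanline into left border, middle and right border comes down to computing two edge masks. A rough sketch, with made-up names (`SpanPlan`, `plan_span`), for a span covering pixels `x0..=x1`: the masks are the kind of value the original post passes to `set_bit_mask`, and the middle bytes can be written whole with the mask at 0xFF.

```rust
/// How to draw the pixels x0..=x1 of one scanline in a planar mode.
#[derive(Debug, PartialEq)]
struct SpanPlan {
    /// Index of the first byte touched within the scanline.
    first_byte: usize,
    /// Bit mask for the first byte (bit 7 is the leftmost pixel).
    left_mask: u8,
    /// Number of fully covered bytes between the two edges.
    middle_bytes: usize,
    /// Bit mask for the last byte; 0 if the span fits in one byte.
    right_mask: u8,
}

/// Assumes x0 <= x1.
fn plan_span(x0: usize, x1: usize) -> SpanPlan {
    let first_byte = x0 / 8;
    let last_byte = x1 / 8;
    // Pixels from x0 to the end of its byte.
    let left = 0xFFu8 >> (x0 % 8);
    // Pixels from the start of the last byte up to and including x1.
    let right = 0xFFu8 << (7 - x1 % 8);
    if first_byte == last_byte {
        // Both edges fall into the same byte: combine the masks.
        SpanPlan { first_byte, left_mask: left & right, middle_bytes: 0, right_mask: 0 }
    } else {
        SpanPlan {
            first_byte,
            left_mask: left,
            middle_bytes: last_byte - first_byte - 1,
            right_mask: right,
        }
    }
}
```

A further refinement would be to fold an edge whose mask is 0xFF into the middle run, so aligned spans need no masked writes at all.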
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

Thanks for the quick responses. Yeah the mode I'm talking about is 16 color mode, sorry if there was any confusion. I'll play around with these suggestions and see if I can get something working. I looked into SVGA mode a bit, but it seems like there isn't a standard for registers like VGA. If anyone has good links to SVGA documentation I'd love to see that. As for VESA, I'm currently running in long mode, so I'm not sure how viable it would be to try and call a bios interrupt at runtime to switch modes.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: 640x480x16 Performance?

Post by bzt »

rkennedy9064 wrote:Thanks for the quick responses. Yeah the mode I'm talking about is 16 color mode, sorry if there was any confusion. I'll play around with these suggestions and see if I can get something working.
You're welcome!
rkennedy9064 wrote:I looked into SVGA mode a bit, but it seems like there isn't a standard for registers like VGA. If anyone has good links to SVGA documentation I'd love to see that.
All VGA registers are working as usual in SVGA. There're some additional registers for bank switching, but I'm afraid they were never standardized, you'll have to write a driver for each video card.
rkennedy9064 wrote:As for VESA, I'm currently running in long mode, so I'm not sure how viable it would be to try and call a bios interrupt at runtime to switch modes.
Well, the same stands for SVGA too. Basically you have three options (from easier to harder):
1. set up video mode while you're still in real mode, before you enter long mode
2. temporarily switch back to real mode, call the interrupt then switch back to long mode
3. write an emulator, and interpret the code in the video ROM (you read the address for INT 10h from the IVT, and you interpret the real mode instructions there in long mode. This is not as hard as it first seems, but definitely the hardest of the three).
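
For option 3, the first step (finding where the INT 10h handler lives) can be sketched like this. It assumes you have a copy of, or a mapping to, the first 1 KiB of physical memory as a byte slice; the function names are made up for illustration. Each IVT entry is 4 bytes, offset first and then segment, both little-endian, so vector 10h sits at byte 0x40.

```rust
/// Read the real-mode far pointer for `vector` from a copy of the
/// interrupt vector table. Returns (segment, offset).
fn ivt_entry(ivt: &[u8], vector: u8) -> (u16, u16) {
    let base = vector as usize * 4;
    let offset = u16::from_le_bytes([ivt[base], ivt[base + 1]]);
    let segment = u16::from_le_bytes([ivt[base + 2], ivt[base + 3]]);
    (segment, offset)
}

/// Convert a real-mode segment:offset pair to a linear address.
fn linear_address(segment: u16, offset: u16) -> u32 {
    ((segment as u32) << 4) + offset as u32
}
```

The emulator would then start fetching and decoding 16-bit instructions at that linear address, forwarding any port I/O and memory accesses to the real hardware.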

Also note that using SVGA/VESA is BIOS legacy, meaning most modern computers don't support it. There you'll have to use GOP under UEFI, which suggests you should switch the mode during boot (either BIOS+VESA or UEFI+GOP), and then your kernel can use the framebuffer without caring about how it got it. To have a mode switch code which works on all firmware, I'm afraid you'll have to write device drivers, because post-VGA cards are not standardized.

Cheers,
bzt
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

Do you happen to have a reference, or a way to get started on an emulator to interpret the instructions? I also found out that QEMU emulates an ATI r128p, so I was able to run QEMU with it, and my original VGA code still works on that card. I'd be interested in trying out both solutions if I can find some decent documentation on the emulator.
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: 640x480x16 Performance?

Post by klange »

rkennedy9064 wrote:Do you happen to have a reference, or a way to get started on an emulator to interpret the instructions? I also found out that QEMU emulates an ATI r128p, so I was able to run QEMU with it, and my original VGA code still works on that card. I'd be interested in trying out both solutions if I can find some decent documentation on the emulator.
QEMU's default for many years now is to provide an interface adopted from Bochs which has a very easy API for modesetting. I highly recommend implementing drivers for the Bochs "BGA" device. The old default in QEMU was a Cirrus CLGD 5446, a very limited VGA-compatible card but one you can find references for if you want to directly program it. The ATI r128pro option is relatively new (~1 year old now) and is probably not available for a lot of users, but it's slightly more modern (1999 vs. the Cirrus's 1996) and is supposed to have 3D functionality though I'm not sure where QEMU is in implementing that.
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

Thanks, this is what I found https://www.kraxel.org/blog/2019/09/dis ... s-in-qemu/. I'm assuming the Bochs device is specific to Bochs, so eventually I'd have to write my own driver, right?
ati vga
qemu: -device ati-vga.
✓ VGA compatible
✓ vgabios support
✗ no UEFI support

Emulates two ATI SVGA devices, the model property can be used to pick the variant. model=rage128p selects the "Rage 128 Pro" and model=rv100 selects the "Radeon RV100".

The devices are newer (late 90ies / early 2000) and more modern than the cirrus VGA. Nevertheless the use case is very similar: For guests of a similar age which are shipping with drivers for those devices.

This device has been added recently to qemu, development is in progress still. The fundamentals are working (modesetting, hardware cursor). Most important 2D accel ops are implemented too. 3D acceleration is not implemented yet.

Linux has both drm and fbdev drivers for these devices. The drm drivers are not working due to emulation being incomplete still (which hopefully changes in the future). The fbdev drivers are working. Modern linux distros prefer the drm drivers though. So you probably have to build your own kernel if you want use this device.
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: 640x480x16 Performance?

Post by klange »

rkennedy9064 wrote:Thanks, this is what I found https://www.kraxel.org/blog/2019/09/dis ... s-in-qemu/. I'm assuming the Bochs device is specific to Bochs, so eventually I'd have to write my own driver, right?
It seems like the ATI device emulation is still incomplete and it's probably not a good thing to invest time on now (unless you have an actual Rage sitting around and you're hoping to eventually use it).

The "Bochs graphics adapter" is supported by Bochs, VirtualBox, and QEMU; it's been the default in the latter for a long time now, so it's a very good choice if you want to have a graphical output and don't want to spend time fiddling with legacy interfaces (e.g. emulating a BIOS, which you won't even get very far with on a UEFI system) or writing complex drivers for real cards (which is very time-consuming and not entirely necessary if you can let a bootloader like Grub do the hard parts for you).

The biggest benefit of the "BGA" is that it provides access to a linear framebuffer with arbitrary resolutions and high color depth with just a few simple port writes - it was designed to be easy to write drivers for, so that guest OSes could run in emulators with less overhead than if they had to configure a real device. We have an article on the wiki that explains how to set the display mode and find the address of the framebuffer to map. Linear framebuffers are especially useful because once you have one, you no longer need to care all that much about the video device unless you want to modeset again. If/when you look at UEFI, you'll find that it has an API for setting up a linear framebuffer at boot. Grub also can do this on most BIOS machines. Any other graphics card driver you write for a real card (or other emulated cards like VMware's SVGA implementation, which is different from the BGA shared by Bochs/VirtualBox/QEMU) will also provide the same kind of framebuffer interface.
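
As a rough idea of what "a few simple port writes" means, here is a sketch of a BGA mode set. The port numbers and register indexes follow the Bochs VBE "dispi" interface as commonly documented; the `PortIo` trait and `PortLog` type are hypothetical stand-ins so the sequence can be exercised without real port I/O. In a kernel, `outw` would be a 16-bit `out` instruction. Locating the framebuffer (usually via the device's PCI BAR) is not shown.

```rust
const VBE_DISPI_IOPORT_INDEX: u16 = 0x01CE;
const VBE_DISPI_IOPORT_DATA: u16 = 0x01CF;

const VBE_DISPI_INDEX_XRES: u16 = 1;
const VBE_DISPI_INDEX_YRES: u16 = 2;
const VBE_DISPI_INDEX_BPP: u16 = 3;
const VBE_DISPI_INDEX_ENABLE: u16 = 4;

const VBE_DISPI_DISABLED: u16 = 0x00;
const VBE_DISPI_ENABLED: u16 = 0x01;
const VBE_DISPI_LFB_ENABLED: u16 = 0x40;

/// Hypothetical abstraction over 16-bit port output.
trait PortIo {
    fn outw(&mut self, port: u16, value: u16);
}

/// Write one BGA register: select it via the index port, then write data.
fn bga_write<P: PortIo>(io: &mut P, index: u16, value: u16) {
    io.outw(VBE_DISPI_IOPORT_INDEX, index);
    io.outw(VBE_DISPI_IOPORT_DATA, value);
}

/// Disable the extensions, program the mode, then re-enable them with
/// the linear framebuffer turned on.
fn bga_set_mode<P: PortIo>(io: &mut P, width: u16, height: u16, bpp: u16) {
    bga_write(io, VBE_DISPI_INDEX_ENABLE, VBE_DISPI_DISABLED);
    bga_write(io, VBE_DISPI_INDEX_XRES, width);
    bga_write(io, VBE_DISPI_INDEX_YRES, height);
    bga_write(io, VBE_DISPI_INDEX_BPP, bpp);
    bga_write(io, VBE_DISPI_INDEX_ENABLE, VBE_DISPI_ENABLED | VBE_DISPI_LFB_ENABLED);
}

/// Test double that records every port write instead of performing it.
#[derive(Default)]
struct PortLog {
    writes: Vec<(u16, u16)>,
}

impl PortIo for PortLog {
    fn outw(&mut self, port: u16, value: u16) {
        self.writes.push((port, value));
    }
}
```

A call such as `bga_set_mode(&mut io, 1024, 768, 32)` results in ten port writes in total, which is the whole mode switch.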
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

bzt wrote: 3. write an emulator, and interpret the code in the video ROM (you read the address for INT 10h from the IVT, and you interpret the real mode instructions there in long mode. This is not as hard as it first seems, but definitely the hardest of the three).
I was curious about option 3. Do you mean grab the address stored at 0000:0040, then start reading the data, decode it, and then do whatever the encoded instruction would have done if the BIOS had actually been called? I'd be interested in looking into that. Do you have any reference links, or places to start?
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: 640x480x16 Performance?

Post by klange »

rkennedy9064 wrote:
bzt wrote: 3. write an emulator, and interpret the code in the video ROM (you read the address for INT 10h from the IVT, and you interpret the real mode instructions there in long mode. This is not as hard as it first seems, but definitely the hardest of the three).
I was curious about option 3. Do you mean grab the address stored at 0000:0040, then start reading the data, decode it, and then do whatever the encoded instruction would have done if the BIOS had actually been called? I'd be interested in looking into that. Do you have any reference links, or places to start?
Yep. You can take a look at libraries like libx86emu which I believe is what XOrg uses to implement its legacy VESA fallback driver.
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

Awesome, thanks for the info!
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: 640x480x16 Performance?

Post by Korona »

Today, it might be easier to just use VMX or SVM to do that.

I have another comment regarding:
not redraw unchanged parts (unless it's just a few pixels/chars here and there or the text cursor or mouse pointer)
That's correct in general but don't overdo it. Maintaining a bounding rect of the changes is a good idea. Anything more fine-grained needs to be very carefully engineered to not decrease performance due to: (i) introducing difficult to predict branches, (ii) making the memory access pattern non-linear (also to the cached source memory) and (iii) utilizing store buffers less efficiently.

(Of course, there are exceptions, e.g., if you have a specialized method to draw a game sprite of some shape that makes it really easy to predict even non-rectangular changes to VRAM - but that's a very narrow special case.)
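
Maintaining a single bounding rect of the changes can be as small as the following sketch. The names `Rect` and `Damage` are made up for illustration, and coordinates are assumed to be half-open (`x1` and `y1` are one past the last changed pixel). There is one branch per update and the later copy is a single linear pass over the rectangle's rows.

```rust
/// Half-open rectangle: covers x0..x1 horizontally and y0..y1 vertically.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: usize,
    y0: usize,
    x1: usize,
    y1: usize,
}

/// Tracks the smallest rectangle containing everything drawn since the
/// last flush.
#[derive(Default)]
struct Damage {
    bounds: Option<Rect>,
}

impl Damage {
    /// Grow the bounding rectangle so that it also covers `r`.
    fn add(&mut self, r: Rect) {
        self.bounds = Some(match self.bounds {
            None => r,
            Some(b) => Rect {
                x0: b.x0.min(r.x0),
                y0: b.y0.min(r.y0),
                x1: b.x1.max(r.x1),
                y1: b.y1.max(r.y1),
            },
        });
    }

    /// Return the area to copy to video memory and reset the tracker.
    fn take(&mut self) -> Option<Rect> {
        self.bounds.take()
    }
}
```

The trade-off is that two small changes in opposite corners make the whole screen dirty; whether that matters depends on the workload.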

Also, do not forget to map video RAM as write-combining. That easily has the highest performance impact on real hardware and yields a speedup of multiple orders of magnitude.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
rkennedy9064
Member
Posts: 36
Joined: Wed Sep 01, 2010 3:54 pm

Re: 640x480x16 Performance?

Post by rkennedy9064 »

Korona wrote:Today, it might be easier to just use VMX or SVM to do that.
I'm not familiar with VMX and SVM. What do those stand for? Also thanks for the tips on write-combining. I wasn't aware of that. I'll have to look into that more.