bloodline wrote:PCI Config offset 14 (BAR1) can be configured via registers CF8, CF4, and CF3... Which is oddly cryptic as such "registers" don't appear to be located anywhere...
I took another look at the datasheet and it's in there, section 3.2, but it's not something you can manipulate in software: it's controlled by whether resistors are wired up to specific lines of the memory data bus.
Doh! Yup, when I did that run I did have the -vga std option set.
But still, your pointing out my errors has helped me far more than the documentation did.
bloodline wrote:Anyway, I might give-up with the old Cirrus chip and follow @thewrongchristian 's advice and try to find some documentation for QEMU's virtio display adaptor...
I have been looking at this one; it's perhaps even more impenetrable than the Cirrus document. But just setting -vga virtio as an option in QEMU has afforded a slight speed improvement... so that's the route to go down.
Octocontrabass wrote:Probably for firmware. Actual GPU drivers use DMA to move things around, they usually don't access the framebuffer directly. (Does "memory schedules" refer to a type of DMA?)
Yes. The kernel driver will construct a memory schedule of work to be done, and then the PCIe device will read & write the schedule with DMA (bus mastering).
Anyway, this seems to explain why performance with the LFB is slower than using the GPU interface, something that appears a bit illogical at first. However, BARs never have the same performance as bus mastering. It also explains why the LFB should only be written and not read: when reading from a BAR, the CPU has to wait for the PCIe device to fetch the contents from its local RAM and send them back as a PCIe transaction, whereas when writing, with the correct caching settings and a decently implemented PCIe device, the CPU shouldn't need to wait for the device to handle the request.
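To illustrate the write-only discipline described above, here is a minimal sketch, assuming a 32-bpp linear framebuffer already mapped somewhere and a same-sized shadow buffer in ordinary system RAM; the names and dimensions are made up for the example. All reads (blending, scrolling, and so on) hit the shadow copy, and pixels only ever flow towards the BAR.

[code]
#include <stdint.h>
#include <string.h>

/* Illustrative values only. */
#define FB_WIDTH   1024
#define FB_HEIGHT  768

static uint32_t shadow[FB_WIDTH * FB_HEIGHT];   /* ordinary cached RAM       */
static volatile uint32_t *lfb;                  /* set to the mapped LFB/BAR,
                                                   ideally write-combining   */

/* Draw into the shadow buffer; never touch the LFB for reads. */
static void put_pixel(int x, int y, uint32_t argb)
{
    shadow[y * FB_WIDTH + x] = argb;
}

/* Push a range of finished rows to the device: writes only, sequential. */
static void flush_rows(int first_row, int last_row)
{
    for (int y = first_row; y <= last_row; y++)
        memcpy((void *)&lfb[y * FB_WIDTH],
               &shadow[y * FB_WIDTH],
               FB_WIDTH * sizeof(uint32_t));
}
[/code]

Sequential, write-only streams like this are exactly what write combining can burst efficiently; a single stray read from lfb stalls the CPU for a full round trip across the bus.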
I think you're overthinking this.
This is the CL GD5446 we're talking about, a value PCI graphics chipset from the 1990s. Its "GPU" was a simple blitter, all the VRAM sat on the device side of the PCI bus, and it relied on write combining for burst performance when writing to the framebuffer.
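For reference, write combining for a framebuffer aperture is usually arranged on x86 by covering it with a variable-range MTRR (or, in newer drivers, PAT). Below is a rough sketch of the MTRR route; it assumes MTRR pair 0 is free and the aperture is power-of-two sized and naturally aligned, and it leaves out the full cache-disable/flush update sequence the Intel SDM prescribes.

[code]
#include <stdint.h>

#define IA32_MTRR_PHYSBASE0  0x200
#define IA32_MTRR_PHYSMASK0  0x201
#define MTRR_TYPE_WC         0x01ULL      /* write-combining memory type */
#define MTRR_MASK_VALID      (1ULL << 11) /* valid bit in PHYSMASK       */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr" :: "c"(msr),
                     "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Mark the framebuffer aperture as write-combining using variable MTRR 0.
 * phys_addr_mask is derived from MAXPHYADDR (CPUID leaf 80000008h); the
 * proper update sequence (CR0.CD, wbinvd, disable/re-enable MTRRs) is
 * omitted to keep the sketch short. */
static void map_lfb_write_combining(uint64_t fb_base, uint64_t fb_size,
                                    uint64_t phys_addr_mask)
{
    wrmsr(IA32_MTRR_PHYSBASE0, (fb_base & ~0xFFFULL) | MTRR_TYPE_WC);
    wrmsr(IA32_MTRR_PHYSMASK0,
          (~(fb_size - 1) & phys_addr_mask) | MTRR_MASK_VALID);
}
[/code]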
Certainly, but I'm wondering why modern Intel graphics chips have such poor performance, and why AMD chips tend to perform a lot better. A poor implementation of the LFB via BARs certainly could explain it: since Intel assumes everybody will use the GPU interface, they didn't bother making the BAR interface fast.
rdos wrote:
Certainly, but I'm wondering why modern Intel graphics chips have such poor performance, and why AMD chips tend to perform a lot better...
Doesn't the Intel GPU operate exclusively via the regular shared system RAM?
So, writing to the framebuffer is just a case of writing to the physical RAM you've told the GPU to pull the framebuffer contents from. As I understand it, BAR2 indicates the physical memory address of this shared framebuffer RAM, but I've not poked it; I'm still mostly QEMU-based at the moment.
The performance (or lack thereof) in the Intel GPU will be a function of the GPU itself. AMD are probably just better at GPUs than Intel.
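As a purely illustrative sketch of that arrangement: with a display controller that scans out of main memory, the driver hands the hardware the physical address of a buffer it allocated itself and then renders with ordinary cached stores. The register offsets and names below are hypothetical and not taken from any Intel programming manual.

[code]
#include <stdint.h>

/* Hypothetical MMIO registers for a controller that scans out of system RAM. */
#define SCANOUT_BASE_REG   0x7000   /* physical address of the framebuffer */
#define SCANOUT_STRIDE_REG 0x7008   /* bytes per scanline                  */

static volatile uint8_t *mmio;      /* register BAR, mapped elsewhere       */
static uint32_t *fb_virt;           /* our mapping of the system-RAM buffer */
static uint32_t  fb_stride_pixels;  /* pixels per scanline                  */

static inline void mmio_write32(uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(mmio + off) = val;
}

/* Point the (hypothetical) scanout engine at our buffer. */
static void set_scanout(uint64_t fb_phys, uint32_t stride_bytes)
{
    mmio_write32(SCANOUT_BASE_REG,   (uint32_t)fb_phys);
    mmio_write32(SCANOUT_STRIDE_REG, stride_bytes);
}

/* Rendering is then just normal cached writes to RAM, no BAR traffic. */
static void draw_pixel(int x, int y, uint32_t argb)
{
    fb_virt[y * fb_stride_pixels + x] = argb;
}
[/code]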
thewrongchristian wrote:
So, writing to the framebuffer is just a case of writing to the physical RAM you've told the GPU to pull the framebuffer contents from. As I understand it, BAR2 indicates the physical memory address of this shared framebuffer RAM, but I've not poked it; I'm still mostly QEMU-based at the moment.
No, when you read or write a BAR area you generate PCIe requests that the GPU has to serve in real time. It will typically map some of its local RAM to the BAR, and it is then the responsibility of the GPU to route between the two. If that routing is a quick job rather than a properly pipelined implementation, it can indeed end up highly inefficient. I know because I have implemented BARs myself, and I decided I needed to use bus-mastering requests to main memory to achieve the throughput I wanted. Letting the CPU read the BAR is too slow, but when the PCIe card uses bus mastering, I can read the data from main memory very fast.
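A minimal sketch of the bus-mastering arrangement being described here (and of the "memory schedule" mentioned earlier in the thread): the driver builds a small descriptor in ordinary RAM, hands the device its physical address through a couple of register writes, and the device then fetches the descriptor and moves the bulk data itself by DMA, so the CPU never reads device memory through a BAR. Every register offset and field name below is hypothetical.

[code]
#include <stdint.h>

/* A work descriptor the device fetches from main memory by bus mastering.
 * The layout is illustrative; a real device defines its own format. */
struct work_desc {
    uint64_t src_phys;   /* physical address of source data in RAM */
    uint64_t dst_phys;   /* physical address of destination in RAM */
    uint32_t length;     /* bytes to transfer                      */
    uint32_t flags;      /* e.g. "interrupt on completion"         */
} __attribute__((aligned(64)));

/* Hypothetical register offsets in a small MMIO BAR. */
#define DESC_ADDR_REG 0x100   /* where the device should fetch the descriptor */
#define DOORBELL_REG  0x108   /* write anything here to kick off the work     */

static volatile uint8_t *regs;   /* register BAR, mapped elsewhere */

static inline void reg_write64(uint32_t off, uint64_t val)
{
    *(volatile uint64_t *)(regs + off) = val;
}

/* Submit one descriptor that the caller has already filled in at desc_phys.
 * The only BAR traffic is two small posted writes; the bulk data moves by
 * device-initiated DMA to and from cached main memory. */
static void submit(uint64_t desc_phys)
{
    /* Make sure the descriptor stores are globally visible before the
     * device is told to go and fetch it. */
    __asm__ volatile("sfence" ::: "memory");
    reg_write64(DESC_ADDR_REG, desc_phys);
    reg_write64(DOORBELL_REG, 1);
}
[/code]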
thewrongchristian wrote:Doesn't the Intel GPU operate exclusively via the regular shared system RAM?
Yes. Recent ones participate in the cache coherency protocol, too.
thewrongchristian wrote:As I understand it, BAR2 indicates the physical memory address of this shared framebuffer RAM, but I've not poked it.
BAR2 provides a window into the GPU's view of RAM according to the GPU's page tables. On a recent GPU with coherent shared memory, I would expect it to be slower than directly accessing the memory from the CPU's view.