I fixed the PCI stuff, and the code now works under VirtualBox as it should.
That means the SSE move now runs on real hardware (under Windows and Linux), Bochs, QEMU and VirtualBox. It also works under Virtual PC (under Windows), but not under Virtual PC with my OS.
I found out that all aligned MMX and SSE reads and writes to video memory fail under Virtual PC. The same code works when it operates on normal system memory.
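One possible workaround is to keep the SSE path switchable and fall back to plain 32-bit moves for the framebuffer when running under Virtual PC. The following is only a rough sketch and not my actual code: `copy_block` and its `allow_sse` flag are made-up names, how you detect Virtual PC is left out entirely, and for a real framebuffer you would probably want volatile accesses instead of `memcpy` for the 32-bit part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_load_ps / _mm_store_ps (aligned MOVAPS) */
#endif

/* Hypothetical helper for illustration only.
 * Copies len bytes from src to dst. When allow_sse is nonzero and both
 * pointers are 16-byte aligned, whole 16-byte chunks go through aligned
 * SSE moves; everything else is copied with 32-bit and byte moves.
 * Returns the number of bytes that were moved with SSE. */
size_t copy_block(void *dst, const void *src, size_t len, int allow_sse)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t sse_bytes = 0;

    assert(dst != NULL && src != NULL);

#if defined(__SSE__)
    /* Aligned SSE moves fault on misaligned addresses, so check both. */
    if (allow_sse && (uintptr_t)d % 16 == 0 && (uintptr_t)s % 16 == 0) {
        while (len >= 16) {
            __m128 v = _mm_load_ps((const float *)s);
            _mm_store_ps((float *)d, v);
            d += 16;
            s += 16;
            len -= 16;
            sse_bytes += 16;
        }
    }
#else
    (void)allow_sse;     /* no SSE available at compile time */
#endif

    /* Fallback and tail: 32-bit moves, then single bytes. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        len -= 4;
    }
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return sse_bytes;
}
```

The idea would be to call it with `allow_sse` set to 0 whenever the destination is video memory and the OS is running under Virtual PC, and with 1 everywhere else. The return value is only there so you can check which path was actually taken.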