Rectangle fills are very slow (about 10 seconds).
Here is the code:
for j := y to y + h - 1 do       { one scanline at a time; assumes h is the height }
begin
  for i := x to x + w - 1 do     { assumes w is the width }
  begin
    PutPixel(i, j, Color);
  end;
end;
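The loop above makes one PutPixel call per pixel, so every pixel pays for the call plus whatever address calculation and checks PutPixel does internally. A common alternative is to fill one scanline at a time: clip once, compute each row's start offset once, and write the whole span with FillChar. The sketch below is a minimal version of that idea, assuming a linear 8-bit-per-pixel frame buffer such as VGA mode 13h (320x200); the original post does not say which video mode the shell uses. To stay runnable with Free Pascal it fills an in-memory array, and the names Screen, ScreenWidth, ScreenHeight and FillRect are hypothetical. Under Turbo Pascal in mode 13h, the FillChar target would be video memory (for example Mem[$A000:Row * 320 + X]) instead of the array.

```pascal
program ScanlineFill;

{$mode objfpc}{$ASSERTIONS ON}

const
  { Hypothetical screen size: VGA mode 13h is 320x200, 1 byte per pixel }
  ScreenWidth  = 320;
  ScreenHeight = 200;

var
  { Stand-in for the linear frame buffer at $A000:0000 }
  Screen: array[0..ScreenWidth * ScreenHeight - 1] of Byte;

{ Fill the W x H rectangle whose top-left corner is (X, Y), clipped to
  the screen. Each scanline is written with one FillChar call instead
  of W separate PutPixel calls. }
procedure FillRect(X, Y, W, H: Integer; Color: Byte);
var
  X2, Y2: Integer;
  Row: Longint;   { Longint so Row * ScreenWidth cannot overflow 16 bits }
begin
  X2 := X + W;    { exclusive right edge }
  Y2 := Y + H;    { exclusive bottom edge }
  { Clip once, so FillChar never writes outside the buffer }
  if X < 0 then X := 0;
  if Y < 0 then Y := 0;
  if X2 > ScreenWidth then X2 := ScreenWidth;
  if Y2 > ScreenHeight then Y2 := ScreenHeight;
  if (X2 <= X) or (Y2 <= Y) then Exit;

  for Row := Y to Y2 - 1 do
    { The row offset is computed once per scanline, not once per pixel }
    FillChar(Screen[Row * ScreenWidth + X], X2 - X, Color);
end;

begin
  FillChar(Screen, SizeOf(Screen), 0);     { clear to color 0 }
  FillRect(10, 5, 30, 4, 7);
  WriteLn(Screen[5 * ScreenWidth + 10]);   { first pixel inside: 7 }
  WriteLn(Screen[5 * ScreenWidth + 9]);    { just left of the rectangle: 0 }
end.
```

FillChar works here only because each scanline is contiguous in memory; in planar 16-color modes a row is spread across four planes, so the same trick needs per-plane handling.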
dchapiesky wrote:
Hmmmm.... this is a hard one...
1) Don't use Pascal
2) get rid of the p-code interpreter
3) use assembly
4) specifically 512 bit VMX instructions
5) in parallel on multiple cores
6) with prisma-chromatic fiber interconnects between your VRAM and the combustinator
Cheers and good luck!

TP generates small and fast code.
monobogdan wrote:
So, my shell is too slow. Rectangle fills are very slow (about 10 seconds). Here is the code:
for j := y to y + h - 1 do       { one scanline at a time; assumes h is the height }
begin
  for i := x to x + w - 1 do     { assumes w is the width }
  begin
    PutPixel(i, j, Color);
  end;
end;
How can I make this code faster?

PutPixel is not a function from the Graph unit; I implemented it myself.
dchapiesky wrote:
Please explain what TP means...

Borland Turbo Pascal.
Several versions of Turbo Pascal, including the latest, version 7, include a CRT unit used by many full-screen text-mode applications. This unit contains code in its initialization section to determine the CPU speed and calibrate delay loops. This code fails on processors faster than about 200 MHz and aborts immediately with a "Runtime error 200" message.[25] (The error code 200 has nothing to do with the 200 MHz CPU speed.) The failure comes from a calibration loop that counts how many times it can iterate in a fixed time, as measured by the real-time clock. When Turbo Pascal was developed, it ran on machines with CPUs running at 1 to 8 MHz, and little thought was given to the possibility of vastly higher speeds, so from about 200 MHz enough iterations can be run to overflow the 16-bit counter.[26] A patch was produced when machines became too fast for the original method, but it failed as processor speeds increased further and was superseded by others.
monobogdan wrote:
TP generates small and fast code.

Sorry, it does not. It compiles fast (which is why I mainly used Turbo Pascal instead of Turbo C++ in the '90s, when my computers weren't fast enough), but it doesn't produce fast code. I had to write quite a bit of assembly code (with various hacks) to make rendering fast on my machines.
dchapiesky wrote:
Turbo Pascal had slow floating point code...

The 6-byte Real type is not supported directly by the x87 FPU, so Real arithmetic is done in software by quite a bit of 16-bit code from the system library.
Brendan wrote:
* Then it does a write to the frame buffer. This is the only part that actually matters, and is probably faster than every single step of pure pointless bloat that occurred before and after.

You may find that in planar 16-color modes (e.g. the VGA 640x480x16 mode) the entire screen can be updated no more than ~30 times per second, even with the most optimal code on a 1GHz+ CPU. That is roughly 100 CPU clocks per pixel. Ouch. I'm not sure which part of the video hardware is to blame (non-planar VGA and SVGA modes don't have this problem); I never looked into it. Slow port I/O for switching planes and setting pixel masks? Some weird compatibility feature?
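The per-pixel budget behind that estimate can be checked directly: 30 full updates per second of a 640x480 screen on a 1 GHz CPU gives

```latex
\frac{10^{9}\ \text{clocks/s}}{640 \times 480\ \text{pixels} \times 30\ \text{frames/s}}
  = \frac{10^{9}}{9\,216\,000} \approx 108.5\ \text{clocks/pixel}
```

Since each byte of a plane covers 8 pixels, that is roughly 870 clocks per byte offset, i.e. per group of 8 pixels across all four planes.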
For this case I pre-arrange it all as "buffer per plane" in RAM, then do "switch to plane 0; blit everything for plane 0; switch to plane 1; blit everything for plane 1; ..." (and I set the pixel mask and write mode once, when setting the mode). I've never had any kind of performance problem.
I did the same, except I probably switched planes for every scanline, not just four times per frame. It would be interesting to test this on different machines.
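The "buffer per plane" approach discussed above starts by converting the image into four separate bit planes in RAM, so that each plane can later be copied to video memory in one pass after selecting it. The sketch below shows only that RAM-side conversion for a 640x480x16 screen, assuming a hypothetical source buffer with one byte per pixel holding a color 0..15, and the usual VGA planar layout where each plane byte covers 8 pixels with the leftmost pixel in bit 7. The names Pixels, Planes and SplitIntoPlanes are illustrative. The hardware step (selecting a plane through the Sequencer Map Mask register at ports $3C4/$3C5 and copying to segment $A000) is left as a comment because it needs real VGA hardware, and it assumes the bit mask and write mode were set once, as described in the thread. It targets Free Pascal: a 307,200-byte array would not fit in Turbo Pascal's 64 KB data segment.

```pascal
program PlanarSplit;

{$mode objfpc}{$ASSERTIONS ON}

const
  { Hypothetical 640x480 16-color screen }
  ScreenWidth      = 640;
  ScreenHeight     = 480;
  BytesPerPlaneRow = ScreenWidth div 8;               { 8 pixels per plane byte }
  PlaneSize        = BytesPerPlaneRow * ScreenHeight; { 38400 bytes per plane }

type
  TPlane = array[0..PlaneSize - 1] of Byte;

var
  { Source image: one byte per pixel, color 0..15 (assumed layout) }
  Pixels: array[0..ScreenWidth * ScreenHeight - 1] of Byte;
  { One RAM buffer per bit plane }
  Planes: array[0..3] of TPlane;

{ Bit P of each pixel's color goes into plane P. Plane byte Offset covers
  pixels Offset*8 .. Offset*8+7 (640 is a multiple of 8), leftmost in bit 7. }
procedure SplitIntoPlanes;
var
  P, Offset, Bit: Integer;
  Acc: Byte;
begin
  for P := 0 to 3 do
    for Offset := 0 to PlaneSize - 1 do
    begin
      Acc := 0;
      for Bit := 0 to 7 do
        if ((Pixels[Offset * 8 + Bit] shr P) and 1) <> 0 then
          Acc := Acc or ($80 shr Bit);
      Planes[P][Offset] := Acc;
    end;
end;

var
  I, P: Integer;
begin
  { Test pattern: color = pixel index mod 16 }
  for I := 0 to High(Pixels) do
    Pixels[I] := I mod 16;
  SplitIntoPlanes;
  for P := 0 to 3 do
  begin
    { On real VGA hardware (assumption, not executed here) this is where
      plane P would be selected and copied, e.g. in Turbo Pascal:
        Port[$3C4] := 2;  Port[$3C5] := 1 shl P;
        Move(Planes[P], Mem[$A000:0], PlaneSize);  }
    WriteLn('plane ', P, ' first byte: ', Planes[P][0]);
  end;
end.
```

With this test pattern the first byte of planes 0 to 3 is 85 ($55), 51 ($33), 15 ($0F) and 0. Doing the bit shuffling in RAM means the video memory side is reduced to four plane switches and four straight copies per frame.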