X86 asm experts: moving large blocks of data.
Re: X86 asm experts: moving large blocks of data.
bloodline wrote: ok, I replaced the horizontal line copy part of my "blitting" functions with an inline asm "rep movsl" and the speed improvement is mind blowing! Literally several orders of magnitude!
alexfru wrote: Which makes me wonder whether there was unnecessary stuff in your loops (or loops were too short) or you were compiling with compiler optimizations disabled. Properly structuring the code (and, of course, using effective algorithms) and enabling optimizations usually works quite well.
8infy wrote: True, dst = src works fast enough for me, especially at O2.
Ok, so I haven’t been using any optimisation options, for two reasons; Firstly I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand, and secondly my motivation for this project comes from wanting to learn more about the modern PC architecture.
I’m going to experiment with some compiler options now!
-edit-
-O2 seems to produce the fastest code, -O3 just crashes
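For reference, the plain C form of the row copy discussed above might look like the sketch below. The name blit_rect and its stride parameters are hypothetical and not taken from CuriOS. The point is only that a per-row memcpy gives an optimizing compiler the chance to pick an efficient copy sequence on its own, which may or may not match a hand-written rep movsl on a given target. In a freestanding kernel you would have to supply memcpy yourself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a w x h rectangle of 32-bit pixels.
 * Strides are given in pixels, not bytes. */
void blit_rect(uint32_t *dst, size_t dst_stride,
               const uint32_t *src, size_t src_stride,
               size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        /* One contiguous copy per scanline. */
        memcpy(dst + y * dst_stride,
               src + y * src_stride,
               w * sizeof(uint32_t));
    }
}
```

Comparing the output of this at -O0 and -O2 in Compiler Explorer is one way to see how much of the measured difference comes from the optimization level rather than from the instruction choice.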
CuriOS: A single address space GUI based operating system built upon a fairly pure Microkernel/Nanokernel. Download latest bootable x86 Disk Image: https://github.com/h5n1xp/CuriOS/blob/main/disk.img.zip
Discord:https://discord.gg/zn2vV2Su
Re: X86 asm experts: moving large blocks of data.
bloodline wrote: I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand
Compilers have come a long way since then! You might try Compiler Explorer to see some of the ways optimization affects your code. It's a good idea to try your code with different optimization levels every so often to catch bugs.
bloodline wrote: -O3 just crashes
Either you've got some undefined behavior in your code, or your compiler is configured to emit instructions that aren't supported. If it's undefined behavior, the compiler might be able to warn you about it (if you turn on the correct warnings).
Re: X86 asm experts: moving large blocks of data.
bloodline wrote: I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand
Octocontrabass wrote: Compilers have come a long way since then! You might try Compiler Explorer to see some of the ways optimization affects your code. It's a good idea to try your code with different optimization levels every so often to catch bugs.
I practically live on Compiler Explorer, and also love watching Godbolt’s lectures on YouTube!!
It’s fascinating to see how tiny changes in my C/C++ can lead to big differences in the generated code!
bloodline wrote: -O3 just crashes
Octocontrabass wrote: Either you've got some undefined behavior in your code, or your compiler is configured to emit instructions that aren't supported. If it's undefined behavior, the compiler might be able to warn you about it (if you turn on the correct warnings).
Hmmm, I’m careful to deal with warnings, so they don’t hide errors... I’ll probably explore which part of my code is causing -O3 optimisation issues once I’ve completed my code clean up!
Re: X86 asm experts: moving large blocks of data.
bloodline wrote: Ok, so I haven’t been using any optimisation options, for two reasons;
Oops. A big one.
bloodline wrote: Firstly I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand, and secondly my motivation for this project comes from wanting to learn more about the modern PC architecture. I’m going to experiment with some compiler options now! -edit- -O2 seems to produce the fastest code, -O3 just crashes
Being "old school" may explain the -O3 crashes.
Today's computers enable today's compilers to analyze and optimize code in ways impractical some 30 years ago. What old compilers simply couldn't do today's do easily.
This has two important effects.
The most obvious one is that you get fast code without having to resort to assembly or lots of trivial manual optimizations.
The less obvious effect is that today's compilers aren't looking at source code (or its internal representation) through a keyhole; they see lots of it at once and remember a lot of what they have seen for much longer. This means that they can (and do) make and apply decisions across large chunks of code. Specifically, if your code has what's known as "undefined behavior" (which has existed since the first C language standard of 1989, btw), this undefined behavior may drive your compiler into generating "broken" code that doesn't work how you want or expect it to. Because of undefined behavior, compiled code can be "broken" in some very strange ways, such that the effects of the undefined behavior aren't localized to the line or statement where it occurs at the source code level, but show up far from there.
Old compilers couldn't do such analysis across large chunks of code, so the effects of undefined behavior usually stayed where it occurred at the source level and were immediate and easy to deal with. Oftentimes there were no perceived ill effects of UB and things "just worked". This is not so anymore. And it's not because today's compilers are somehow mean and evil. The C language standard allowed old compilers to be mean and evil just as well. It's just that it took some technological progress for that meanness to manifest.
Allow me to show you an example of completely unexpected undefined behavior that bit me years ago:
Code:
#include <stdio.h>

void f(int a, int position) {
    /* Undefined behavior if position is negative or >= the width of int. */
    int bit = (a & (1 << position)) != 0;
    if (bit)
        puts("bit=1");
    else
        puts("bit=0");
}
There are many more examples of undefined behavior; this is just one that resulted in a totally unexpected crash.
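The post does not say exactly how f() misbehaved, but the only operation in it that C leaves undefined is the shift: 1 << position has no defined result when position is negative or at least as large as the width of int. A minimal sketch of a checked variant follows, assuming it is acceptable to report an out-of-range position as an error; the name test_bit and the -1 return convention are hypothetical.

```c
#include <limits.h>

/* Hypothetical checked bit test.
 * Returns 1 or 0 for a valid position, -1 if the position is out of range. */
int test_bit(unsigned int a, int position)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);

    if (position < 0 || position >= width)
        return -1;                      /* shifting here would be undefined */

    /* Shift the unsigned value right so no signed overflow can occur. */
    return (int)((a >> position) & 1u);
}
```

Working on an unsigned value also sidesteps the separate problem of shifting a 1 into the sign bit of a signed int.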
My understanding is that Linus Torvalds writes the Linux kernel as if there was no technological progress or undefined behavior and it still was 1988. He does so by disallowing specific optimizations in the compiler. And he probably keeps adding such disallowing options in the makefile. I think it's a rather unfortunate situation with deliberate misuse of the language and compiler. Perhaps, using a different language, with which he wouldn't need to fight, would be better.
I think you should learn about undefined behavior and eradicate it and enjoy compiling your code with high levels of optimization.
Re: X86 asm experts: moving large blocks of data.
bloodline wrote: Ok, so I haven’t been using any optimisation options, for two reasons;
alexfru wrote: Oops. A big one.
Good advice, I appreciate it! I agree with letting the Compiler do the hard work, but I do want to stay close to the hardware for now, as I’m enjoying learning the PC architecture, it’s very new to me.
As for my -O3 crash, I’ve probably forgotten to mark a variable being accessed in an interrupt as volatile, or a variable has been optimised away, or something. I’m coming to the end of a huge code clean up... which was also a significant rewrite, so mistakes will be there, aplenty!
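The volatile issue guessed at above can be illustrated in hosted C, using a signal handler as a stand-in for an interrupt handler. This is only a sketch: tick_seen, on_tick and run_tick_demo are hypothetical names, and a real kernel would use its own interrupt entry path instead of signal().

```c
#include <signal.h>

/* Written by the handler and read by normal code. Without volatile, an
 * optimizing compiler may keep the value in a register and never notice
 * the handler's write, for example in a loop like: while (!tick_seen) {} */
static volatile sig_atomic_t tick_seen = 0;

/* Stand-in for an interrupt service routine. */
static void on_tick(int signo)
{
    (void)signo;
    tick_seen = 1;
}

/* Hypothetical demo: install the handler, trigger it, report the flag.
 * Returns 1 if the handler ran, -1 if it could not be installed. */
int run_tick_demo(void)
{
    tick_seen = 0;
    if (signal(SIGINT, on_tick) == SIG_ERR)
        return -1;
    raise(SIGINT);              /* handler runs before raise() returns */
    signal(SIGINT, SIG_DFL);    /* restore the default disposition */
    return (int)tick_seen;
}
```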
Re: X86 asm experts: moving large blocks of data.
alexfru wrote: My understanding is that Linus Torvalds writes the Linux kernel as if there was no technological progress or undefined behavior and it still was 1988. He does so by disallowing specific optimizations in the compiler. And he probably keeps adding such disallowing options in the makefile. I think it's a rather unfortunate situation with deliberate misuse of the language and compiler. Perhaps, using a different language, with which he wouldn't need to fight, would be better.
Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write. This flag is not incidental; it was added because its absence caused a bug. This seems completely reasonable. I also compile my kernel with -fno-delete-null-pointer-checks, and recommend that everybody else do so too.
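The optimization that -fno-delete-null-pointer-checks turns off can be illustrated with a sketch like the one below (struct device and both function names are hypothetical). When a pointer is dereferenced before it is tested, GCC is allowed to assume it was non-null and may remove the later test; the flag tells it not to make that assumption.

```c
#include <stddef.h>

struct device {                 /* hypothetical type for illustration */
    int flags;
};

/* Risky pattern: the dereference comes first, so with null pointer check
 * deletion enabled the compiler may drop the test below as unreachable. */
int get_flags_late_check(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Safer ordering: test first, then dereference. */
int get_flags(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

If page 0 is actually mapped, the early dereference in the first function does not fault, which is why a silently removed check matters more in a kernel than in a typical user program.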
Re: X86 asm experts: moving large blocks of data.
moonchild wrote: Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write.
You can, but why would you want to? Address 0 is used as the null pointer in every C implementation I know, and dereferencing the null pointer is always undefined behavior. I can sort of see a point to this if the architecture demands important things be written at that address (IIRC ARM puts the exception vectors there, right?) but otherwise, I would never put anything at address 0, since that way I will find null pointer dereference bugs more easily.
Carpe diem!
Re: X86 asm experts: moving large blocks of data.
nullplan wrote: You can, but why would you want to? Address 0 is used as the null pointer in every C implementation I know
I for one map a kernel page at 0 (other lower-half addresses are mapped as user pages except this one). This first page contains the task control block structure for the current task, unavailable from user space, but pretty useful to access easily from the kernel. This way dereferencing a NULL pointer in user space will always trigger a page fault, no matter the language used to compile the executable. (Meaning I do not rely on any language's runtime, I can always detect a NULL dereference and gracefully end the process with a SIGSEGV.)
Cheers,
bzt
Re: X86 asm experts: moving large blocks of data.
moonchild wrote: Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write.
Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.
Re: X86 asm experts: moving large blocks of data.
alexfru wrote: Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.
Can you elaborate? Pointers (addresses) are integers... How would using integers instead of pointers prevent anything here? What am I missing?
Re: X86 asm experts: moving large blocks of data.
bzt wrote: I for one map a kernel page at 0 (other lower-half addresses are mapped as user pages except this one). This first page contains the task control block structure for the current task, unavailable from user space, but pretty useful to access easily from the kernel. This way dereferencing a NULL pointer in user space will always trigger a page fault,[...]
OK, and a NULL pointer dereference in kernel space will instead clobber the task control block. I think I will keep my model, in which page 0 cannot be mapped, ever. And a null pointer dereference in kernel space will cause a kernel panic, instead of continuing with clobbered state. (BTW, I just have %gs point to a CPU structure, which contains a pointer to the current TCB. That way I don't need to mess with kernel-side page mappings when switching tasks.)
Re: X86 asm experts: moving large blocks of data.
alexfru wrote: Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.
kzinti wrote: Can you elaborate? Pointers (addresses) are integers... How would using integers instead of pointers prevent anything here? What am I missing?
The idea is that if the compiler doesn't know that something is a pointer or is used as one, it's not going to remove late checks for it being 0. It might be easier said than done, though.
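A rough sketch of that integer-address idea, under the assumption that the API passes addresses as uintptr_t and converts to a pointer only at the point of access. The function read_u32_at is hypothetical, and, as the post itself cautions, there is no guarantee that a particular compiler will not still reason through the cast.

```c
#include <stdint.h>

/* Hypothetical accessor: the address travels as an integer, so the zero
 * test is an ordinary integer comparison rather than a null pointer check.
 * Returns 0 on success, -1 if the address is rejected. */
int read_u32_at(uintptr_t addr, uint32_t *out)
{
    if (addr == 0)
        return -1;              /* policy decision, not a language rule */

    /* Convert to a pointer only here, at the point of access. */
    *out = *(const volatile uint32_t *)addr;
    return 0;
}
```

A kernel that really does keep data at address 0 would drop the zero test; the sketch only shows where such a test would live.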
Re: X86 asm experts: moving large blocks of data.
nullplan wrote: OK, and a NULL pointer dereference in kernel space will instead clobber the task control block.
Only if you're a noob programmer. I know for sure that my kernel is working okay, because it is I who wrote it, and I made lots of tests and run-time checks. Are you suggesting that you don't trust your own code?
On the other hand I won't write all of the userspace applications, some of them will be ported, therefore I won't have absolute control over them, so I'm more concerned about those.
nullplan wrote: And a null pointer dereference in kernel space will cause a kernel panic
And how do you plan to recover from a kernel panic? Let me guess, by rebooting the entire computer?
nullplan wrote: instead of continuing with clobbered state.
...which will be detected, as the TCB starts with a magic number. It doesn't happen, but even in the unlikely event that it does, my kernel is capable of handling that gracefully and killing the clobbered process (and only that one).
nullplan wrote: (BTW, I just have %gs point to a CPU structure, which contains a pointer to the current TCB. That way I don't need to mess with kernel-side page mappings when switching tasks.)
That way you must use SWAPGS, which a) is known to be vulnerable to attacks, and b) relies on segmentation, which makes your kernel unportable.
I would recommend writing your kernel properly instead, so that it can't have NULL pointer dereferences in the first place. You can't guarantee that all applications are written correctly, but you can guarantee at least that the kernel that you wrote is free of NULL dereferences. It shouldn't be a problem for an experienced programmer to write correct code. That's why they are called "experienced".
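The magic-number check mentioned in this post could look something like the sketch below. The struct layout, the TCB_MAGIC value and the function names are all hypothetical, since the post does not show the actual TCB.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCB_MAGIC 0x54434231u   /* hypothetical value, ASCII "TCB1" */

struct tcb {
    uint32_t magic;             /* first field, so it sits at offset 0 */
    uint32_t pid;
};

/* Stamp a freshly created TCB with the magic value. */
void tcb_init(struct tcb *t, uint32_t pid)
{
    t->magic = TCB_MAGIC;
    t->pid = pid;
}

/* True if the magic value is still in place. */
bool tcb_is_intact(const struct tcb *t)
{
    return t->magic == TCB_MAGIC;
}
```

Note that a check like this only catches writes that land on the magic field itself; a stray write to a later field of the structure would go unnoticed.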
Cheers,
bzt
Re: X86 asm experts: moving large blocks of data.
What do TCBs have to do with moving blocks of data? Anyway, what I do is simply have the GS register contain a kernel structure with this info. I don't use SWAPGS either. Apps will ignore GS. FS will contain the TLS pointer. I think mapping address 0 for the current TCB is, to put it bluntly, a bad idea. My kernel has a bug where it will occasionally access a null pointer. I think I would be more confused by this bug if it was overwriting the TCB. Don't worry, I am fixing the bug. To be honest, I don't trust my code, as my current kernel has only been in development for a couple months.
bzt wrote: Only if you're a noob programmer.
Bugs affect every programmer just the same.