X86 asm experts: moving large blocks of data.

8infy · Post by **8infy** » Tue Nov 03, 2020 4:45 am

alexfru wrote:
bloodline wrote:ok, I replaced the horizontal line copy part of my "blitting" functions with an inline asm "rep movsl" and the speed improvement is mind blowing! Literally several orders of magnitude!
Which makes me wonder whether there was unnecessary stuff in your loops (or loops were too short) or you were compiling with compiler optimizations disabled.
Properly structuring the code (and, of course, using effective algorithms) and enabling optimizations usually works quite well.

True, dst = src works fast enough for me, especially at O2.
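For reference, here is a minimal sketch of the kind of horizontal line copy being discussed, written in plain C. The function name `blit_rect` and the pixel-based pitch parameters are assumptions for illustration, not bloodline's actual code. With optimizations enabled, GCC and Clang will usually either inline the `memcpy` or call an optimized implementation; in a freestanding kernel you would have to supply `memcpy` yourself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h rectangle of 32-bit pixels, one scanline at a time.
   Pitches are given in pixels, not bytes.
   Hypothetical helper for illustration; not taken from the thread. */
static void blit_rect(uint32_t *dst, size_t dst_pitch,
                      const uint32_t *src, size_t src_pitch,
                      size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        /* One memcpy per row; source and destination must not overlap. */
        memcpy(dst + y * dst_pitch, src + y * src_pitch,
               w * sizeof(uint32_t));
    }
}
```

Comparing the generated code for this at -O0 and -O2 is a quick way to see how much of the gap a hand-written "rep movsl" was actually closing.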

bloodline · Post by **bloodline** » Tue Nov 03, 2020 5:11 am

8infy wrote:
alexfru wrote:
bloodline wrote:ok, I replaced the horizontal line copy part of my "blitting" functions with an inline asm "rep movsl" and the speed improvement is mind blowing! Literally several orders of magnitude!
Which makes me wonder whether there was unnecessary stuff in your loops (or loops were too short) or you were compiling with compiler optimizations disabled.
Properly structuring the code (and, of course, using effective algorithms) and enabling optimizations usually works quite well.
True, dst = src works fast enough for me, especially at O2.

Ok, so I haven’t been using any optimisation options, for two reasons; Firstly I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand, and secondly my motivation for this project comes from wanting to learn more about the modern PC architecture.

I’m going to experiment with some compiler options now!

-edit-
-O2 seems to produce the fastest code, -O3 just crashes

Octocontrabass · Post by **Octocontrabass** » Tue Nov 03, 2020 11:54 am

bloodline wrote:I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand

Compilers have come a long way since then! You might try Compiler Explorer to see some of the ways optimization affects your code. It's a good idea to try your code with different optimization levels every so often to catch bugs.

bloodline wrote:-O3 just crashes

Either you've got some undefined behavior in your code, or your compiler is configured to emit instructions that aren't supported. If it's undefined behavior, the compiler might be able to warn you about it (if you turn on the correct warnings).

bloodline · Post by **bloodline** » Tue Nov 03, 2020 12:00 pm

Octocontrabass wrote:
bloodline wrote:I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand
Compilers have come a long way since then! You might try Compiler Explorer to see some of the ways optimization affects your code. It's a good idea to try your code with different optimization levels every so often to catch bugs.

I practically live on Compiler Explorer, and I also love watching Godbolt’s lectures on YouTube!!

It’s fascinating to see how tiny changes in my C/C++ can lead to big differences in the generated code!

bloodline wrote:-O3 just crashes
Either you've got some undefined behavior in your code, or your compiler is configured to emit instructions that aren't supported. If it's undefined behavior, the compiler might be able to warn you about it (if you turn on the correct warnings).

Hmmm, I’m careful to deal with warnings, so they don’t hide errors... I’ll probably explore which part of my code is causing -O3 optimisation issues once I’ve completed my code clean up!

alexfru · Post by **alexfru** » Tue Nov 03, 2020 12:27 pm

bloodline wrote:Ok, so I haven’t been using any optimisation options, for two reasons;

Oops. A big one.

bloodline wrote:Firstly I’m quite old school, and was taught to only apply compiler optimisation after you were happy with the performance of the code as written by hand, and secondly my motivation for this project comes from wanting to learn more about the modern PC architecture.

I’m going to experiment with some compiler options now!

-edit-
-O2 seems to produce the fastest code, -O3 just crashes

Being "old school" may explain the -O3 crashes.

Today's computers enable today's compilers to analyze and optimize code in ways that were impractical some 30 years ago. What old compilers simply couldn't do, today's do easily.

This has two important effects.

The most obvious one is that you get fast code without having to resort to assembly or lots of trivial manual optimizations.

The least obvious effect is that today's compilers aren't looking at source code (or its internal representation) through a keyhole, they are seeing lots of it at once and remembering a lot of what they have seen for much longer. This means that they can (and do) make and apply decisions across large chunks of code. Specifically, if your code has what's known as "undefined behavior" (which has existed since the first C language standard of 1989, btw), this undefined behavior may drive your compiler into generating "broken" code that doesn't work how you want or expect it to. Because of undefined behavior, compiled code can be "broken" in some very strange ways, such that the effects of the undefined behavior aren't localized to the line or statement where it occurs at the source code level, but are far removed from there. Old compilers couldn't do such analysis across large chunks of code and therefore undefined behavior was usually limited to where it occurred at the source level and so the effects of UB were immediate and easy to deal with. Often times there were no perceived ill effects of UB and things "just worked". This is not so anymore. And it's not because today's compilers are somehow mean and evil. The C language standard allowed old compilers to be mean and evil just as well. It's just that it took some technological progress for that meanness to manifest.

Allow me to show you an example of completely unexpected undefined behavior that bit me years ago:

#include <stdio.h>

void f(int a, int position) {
  /* Undefined behavior if position is negative or >= the width of int. */
  int bit = (a & (1 << position)) != 0;
  if (bit)
    puts("bit=1");
  else
    puts("bit=0");
}

If position is too large, f() may end up crashing instead of printing a bogus value. The reason is that the compiler can generate the 80386 bt (bit test) instruction instead of a shl/and sequence. If I remember correctly, with a memory operand the bt instruction accesses memory at address (&a + position/32). So, when position is in the valid range, everything's OK. When it's not, the benign-looking shift unexpectedly crashes your program.

There are many more examples of undefined behavior; this is just one that resulted in a totally unexpected crash.
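One way to remove the undefined behavior from the example above is to range-check the position and shift an unsigned value. This is only a sketch: the names `test_bit` and `f_checked` are made up here, and returning -1 for an out-of-range position is an arbitrary choice.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 or 0 for a valid position, -1 if position is out of range. */
static int test_bit(int a, int position)
{
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);

    if (position < 0 || position >= width)
        return -1;
    /* Shifting an unsigned value by less than its width is well defined. */
    return (int)(((unsigned)a >> position) & 1u);
}

void f_checked(int a, int position)
{
    int bit = test_bit(a, position);

    if (bit < 0)
        puts("position out of range");
    else
        puts(bit ? "bit=1" : "bit=0");
}
```

The unsigned shift also sidesteps a second problem in the original: on a 32-bit int, 1 << 31 is itself undefined because the result does not fit in a signed int.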

My understanding is that Linus Torvalds writes the Linux kernel as if there was no technological progress or undefined behavior and it still was 1988. He does so by disallowing specific optimizations in the compiler. And he probably keeps adding such disallowing options in the makefile. I think it's a rather unfortunate situation with deliberate misuse of the language and compiler. Perhaps, using a different language, with which he wouldn't need to fight, would be better.

I think you should learn about undefined behavior and eradicate it and enjoy compiling your code with high levels of optimization.

bloodline · Post by **bloodline** » Tue Nov 03, 2020 12:50 pm

alexfru wrote:
bloodline wrote:Ok, so I haven’t been using any optimisation options, for two reasons;
Oops. A big one.

Good advice, I appreciate it! I agree with letting the Compiler do the hard work, but I do want to stay close to the hardware for now, as I’m enjoying learning the PC architecture, it’s very new to me.

As for my -O3 crash, I’ve probably forgotten to mark a variable being accessed in an interrupt as volatile, or a variable has been optimised away, or something. I’m coming to the end of a huge code clean up... which was also a significant rewrite, so mistakes will be there, aplenty!
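The missing-volatile scenario can be sketched in user space, with a signal handler standing in for an interrupt handler. This is an analogy under stated assumptions, not kernel code: `tick_pending`, `on_tick` and `wait_for_tick` are hypothetical names, and SIGINT is used only because it is available in standard C.

```c
#include <signal.h>

/* Written by the handler, polled by the main flow. Without volatile, an
   optimizer may read the flag once and hoist the load out of the loop. */
static volatile sig_atomic_t tick_pending = 0;

static void on_tick(int signo)
{
    (void)signo;
    tick_pending = 1;
}

/* Returns 1 once the handler has run, 0 if the signal could not be set up. */
static int wait_for_tick(void)
{
    if (signal(SIGINT, on_tick) == SIG_ERR)
        return 0;
    if (raise(SIGINT) != 0)
        return 0;
    while (!tick_pending) {
        /* volatile forces a fresh load on every iteration */
    }
    return 1;
}
```

Note that volatile only addresses visibility to the compiler; it does not make multi-word updates atomic or order them against other memory accesses.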

moonchild · Post by **moonchild** » Tue Nov 03, 2020 3:26 pm

alexfru wrote: My understanding is that Linus Torvalds writes the Linux kernel as if there was no technological progress or undefined behavior and it still was 1988. He does so by disallowing specific optimizations in the compiler. And he probably keeps adding such disallowing options in the makefile. I think it's a rather unfortunate situation with deliberate misuse of the language and compiler. Perhaps, using a different language, with which he wouldn't need to fight, would be better.

Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write. This flag is not incidental; it was added because its absence caused a bug. This seems completely reasonable. I also compile my kernel with -fno-delete-null-pointer-checks, and recommend that everybody else do so too.
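The pattern that this flag protects against looks roughly like the following. It is an illustrative sketch with made-up names (`struct device`, `status_risky`, `status_checked`), not code from Linux.

```c
#include <stddef.h>

struct device {
    int status;
};

/* Risky: the load happens before the check. Because dereferencing a null
   pointer is undefined, an optimizer may assume dev != NULL and delete
   the test below. */
int status_risky(const struct device *dev)
{
    int s = dev->status;

    if (dev == NULL)
        return -1;
    return s;
}

/* Safer: test first, dereference afterwards. */
int status_checked(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->status;
}
```

If page 0 is mapped, the early load in `status_risky` does not fault, so the deleted check is the only thing that would have caught the bad pointer. That is the situation where -fno-delete-null-pointer-checks changes behavior.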

nullplan · Post by **nullplan** » Wed Nov 04, 2020 12:54 pm

moonchild wrote:Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write.

You can, but why would you want to? Address 0 is used as the null pointer in every C implementation I know, and dereferencing the null pointer is always undefined behavior. I can sort of see a point to this if the architecture demands important things be written at that address (IIRC ARM puts the exception vectors there, right?) but otherwise, I would never put anything at address 0, since that way I will find null pointer dereference bugs more easily.

bzt · Post by **bzt** » Wed Nov 04, 2020 1:15 pm

nullplan wrote:You can, but why would you want to? Address 0 is used as the null pointer in every C implementation I know

I for one map a kernel page at 0 (other lower-half addresses are mapped as user pages except this one). This first page contains the task control block structure for the current task, unavailable from user space, but pretty useful to access easily from the kernel. This way dereferencing a NULL pointer in user space will always trigger a page fault, no matter the language used to compile the executable. (Meaning I do not rely on any language's runtime, I can always detect a NULL dereference and gracefully end the process with a SIGSEGV.)

Cheers,
bzt

alexfru · Post by **alexfru** » Wed Nov 04, 2020 1:22 pm

moonchild wrote:Linux is compiled with -fno-delete-null-pointer-checks because (surprise, surprise!) when you're the kernel you're allowed to map page 0 as read-write.

Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.

kzinti · Post by **kzinti** » Wed Nov 04, 2020 9:26 pm

alexfru wrote:Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.

Can you elaborate? Pointers (addresses) are integers... How would using integers instead of pointers prevent anything here? What am I missing?

nullplan · Post by **nullplan** » Wed Nov 04, 2020 9:47 pm

bzt wrote:I for one map a kernel page at 0 (other lower-half addresses are mapped as user pages except this one). This first page contains the task control block structure for the current task, unavailable from user space, but pretty useful to access easily from the kernel. This way dereferencing a NULL pointer in user space will always trigger a page fault,[...]

OK, and a NULL pointer dereference in kernel space will instead clobber the task control block. I think I will keep my model, in which page 0 cannot be mapped, ever. And a null pointer dereference in kernel space will cause a kernel panic, instead of continuing with clobbered state. (BTW, I just have %gs point to a CPU structure, which contains a pointer to the current TCB. That way I don't need to mess with kernel-side page mappings when switching tasks.)
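The per-CPU layout described above can be sketched as follows. Everything here is a stand-in: `struct cpu`, `struct tcb`, `this_cpu`, `current_task` and `switch_to` are hypothetical names, and the global pointer takes the place of the %gs base, which a real kernel would reach through a gs-relative load rather than a C variable.

```c
#include <stddef.h>

struct tcb {
    int tid;
};

struct cpu {
    struct tcb *current; /* task currently running on this CPU */
    int id;
};

/* Stand-in for the %gs base (assumption for this sketch only). */
static struct cpu *this_cpu;

static struct tcb *current_task(void)
{
    return this_cpu->current;
}

/* Switching tasks only updates a pointer; no page mappings change. */
static void switch_to(struct tcb *next)
{
    this_cpu->current = next;
}
```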

alexfru · Post by **alexfru** » Thu Nov 05, 2020 12:33 am

kzinti wrote:
alexfru wrote:Perhaps, one could design and use an API that works with addresses that are integers instead of pointers? Assembly could work as well.
Can you elaborate? Pointers (addresses) are integers... How would using integers instead of pointers prevent anything here? What am I missing?

The idea is that if the compiler doesn't know that something is a pointer or is used as one, it's not going to remove later checks for it being 0. It might be easier said than done, though.
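A rough sketch of what such an integer-based API might look like is below. The names `struct region` and `region_read_u8` are invented for illustration, and treating address 0 as invalid is just one possible policy. The conversion between `uintptr_t` and a pointer is implementation-defined, so this relies on the flat-address behavior of mainstream compilers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Addresses are carried as integers until the last possible moment. */
struct region {
    uintptr_t base;
    size_t len;
};

/* Returns 0 on success, -1 if the offset or address is rejected. */
static int region_read_u8(const struct region *r, size_t off, uint8_t *out)
{
    if (off >= r->len)
        return -1;

    uintptr_t addr = r->base + off;

    /* Plain integer comparison: no null-pointer reasoning applies here. */
    if (addr == 0)
        return -1;

    memcpy(out, (const void *)addr, 1);
    return 0;
}
```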

bzt · Post by **bzt** » Thu Nov 05, 2020 8:58 am

nullplan wrote:OK, and a NULL pointer dereference in kernel space will instead clobber the task control block.

Only if you're a noob programmer. I know for sure that my kernel is working okay, because it is I who wrote it, and I made lots of tests and run-time checks. Are you suggesting that you don't trust your own code?

On the other hand I won't write all of the userspace applications, some of them will be ported, therefore I won't have absolute control over them, so I'm more concerned about those.

nullplan wrote:And a null pointer dereference in kernel space will cause a kernel panic

And how do you plan to recover from a kernel panic? Let me guess, by rebooting the entire computer?

nullplan wrote:instead of continuing with clobbered state.

...which will be detected, as the TCB starts with a magic number. It doesn't happen, but even in the unlikely event that it does, my kernel is capable of handling that gracefully and killing the clobbered process (and only that one).
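A magic-number check of this kind might look like the sketch below. The value of `TCB_MAGIC`, the struct layout and the function names are assumptions for illustration, not bzt's actual definitions.

```c
#include <stdint.h>

#define TCB_MAGIC 0x54434221u /* hypothetical value, spells "TCB!" */

struct tcb {
    uint32_t magic; /* first field, so a write through NULL hits it */
    int pid;
};

static void tcb_init(struct tcb *t, int pid)
{
    t->magic = TCB_MAGIC;
    t->pid = pid;
}

/* Returns nonzero while the magic field still holds the expected value. */
static int tcb_is_intact(const struct tcb *t)
{
    return t->magic == TCB_MAGIC;
}
```

Such a check only notices writes that land on the magic field itself; a stray write at a nonzero offset from the null pointer would corrupt later fields without tripping it.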

nullplan wrote:(BTW, I just have %gs point to a CPU structure, which contains a pointer to the current TCB. That way I don't need to mess with kernel-side page mappings when switching tasks.)

That way you must use SWAPGS, which is known to be vulnerable to attacks, and using segmentation makes your kernel unportable.

I would recommend writing your kernel properly instead, so that it can't have NULL pointer dereferences in the first place. You can't guarantee that all applications are written correctly, but you can guarantee at least that the kernel that you wrote is free of NULL dereferences. It shouldn't be a problem for an experienced programmer to write correct code. That's why they are called "experienced".

Cheers,
bzt

nexos · Post by **nexos** » Thu Nov 05, 2020 10:10 am

What do TCBs have to do with moving blocks of data? Anyway, what I do is simply have the GS register contain a kernel structure with this info. I don't use SWAPGS either. Apps will ignore GS. FS will contain the TLS pointer. I think mapping address 0 for the current TCB is, to put it bluntly, a bad idea. My kernel has a bug where it will occasionally access a null pointer. I think I would be more confused by this bug if it was overwriting the TCB. Don't worry, I am fixing the bug. To be honest, I don't trust my code, as my current kernel has only been in development for a couple of months.

bzt wrote:Only if you're a noob programmer.

Bugs affect every programmer just the same.
