
Need help to understand this inline assembler code...

Posted: Wed Mar 23, 2011 11:05 pm
by osdevkid
Dear All,

The macro code below is part of the Linux kernel, from the "IO.H" file:

Code:

#define outb_p(value,port) \
__asm__ ("outb %%al,%%dx\n" \
		"\tjmp 1f\n" \
		"1:\tjmp 1f\n" \
		"1:"::"a" (value),"d" (port))
For the above inline ASM macro, I have written similar ASM code below:

Code:

       lea _value, %ax
       lea _port, %dx
       outb %al, %dx
1:    jmp 1f
1: 
Is it correct?

If yes, then it will go into an infinite loop when executing the instruction "1: jmp 1f".

Also, why is one more label "1:" required on the last line?

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 12:53 am
by Jezze
Someone else might be better placed to answer this than me, but as far as I understand it, jumping to 1f means jumping to the address of the next occurrence of the label 1; the f stands for forward. So it doesn't seem like it would go into an infinite loop, but I'm far from sure.

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 5:27 am
by JamesM
Jezze wrote:Someone else might be better placed to answer this than me, but as far as I understand it, jumping to 1f means jumping to the address of the next occurrence of the label 1; the f stands for forward. So it doesn't seem like it would go into an infinite loop, but I'm far from sure.
That is exactly correct. The '1f' syntax uses the assembler's local labels: a way of creating anonymous, reusable symbols.
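
A minimal sketch of how those local labels behave (assuming GCC and GAS with AT&T syntax; the function name is made up for this example). The digit 1 can be reused as a label any number of times; 1f always refers to the next "1:" after the jump, and 1b would refer to the previous one:

Code:

/* illustrative only - GAS numeric local labels inside inline asm */
static inline void local_label_demo(void)
{
    __asm__ __volatile__ (
        "jmp 1f\n\t"   /* forward jump to the FIRST "1:" below           */
        "1:\n\t"       /* first local label 1                            */
        "jmp 1f\n\t"   /* forward jump again, to the SECOND "1:"         */
        "1:"           /* second local label 1; execution falls through  */
    );
}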

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 6:28 am
by turdus
It may seem odd to you to jump right to the next instruction, but there's a point to it. Any branch (including this jmp) causes the instruction cache to be flushed, which takes time. This means that the CPU won't continue execution after the outb for a while, so there's time for the hardware to settle down after the I/O.

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 7:41 am
by Solar
turdus wrote:Any branch (including this jmp) causes the instruction cache to be flushed...
Is that documented and guaranteed? I could picture a halfway-smart branch prediction unit optimizing that flush away...

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 7:54 am
by turdus
Solar wrote:
turdus wrote:Any branch (including this jmp) causes the instruction cache to be flushed...
Is that documented and guaranteed? I could picture a halfway-smart branch prediction unit optimizing that flush away...
Yes, it was so on the M68k, the x86 family, SPARC and so on (I can't remember if it's in the AMD manual, but I'll check it for you; let's say it's empirically true). Since it's unconditional, I don't think branch prediction comes into play (it lacks a special opcode for selecting the likely branch anyway). I find it much more likely that the compiler's optimizer throws it away, so it's advisable to use the "volatile" keyword.
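
A minimal sketch of that last point (assuming GCC; the macro name is made up for this example): spelling out __volatile__ makes it explicit that the optimizer must not delete or reorder the block. (GCC already treats asm statements with no output operands as implicitly volatile, but being explicit is common practice.)

Code:

/* illustrative only - the two-jump I/O delay as a standalone macro,
   explicitly marked volatile so it is never optimized away */
#define IO_DELAY_EXAMPLE() \
    __asm__ __volatile__ ("jmp 1f\n" \
                          "1:\tjmp 1f\n" \
                          "1:")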

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 9:10 am
by Jvac
These functions are used to do low-level port input and output. They are designed for kernel use, but can be used from user space. You must compile with optimization enabled; otherwise the inline macros are not substituted in, causing unresolved references at link time.

Are you applying it in kernel or user space?

You might have to use ioperm(2) or iopl(2) to tell the kernel to allow the user space application to access the I/O
ports in question. Failure to do this will cause the application to receive a segmentation fault.

Will it go into an infinite loop?

Not sure; it is hard to say.

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 9:45 am
by qw
osdevkid wrote:Is it correct?
No, it isn't. You may compile the C function with the -S switch and study the assembly output.
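
A minimal sketch of how one might do that (assuming GCC on x86; the file name, wrapper function and example values are made up):

Code:

/* check_outb_p.c - compile with:  gcc -O2 -S check_outb_p.c
 * then inspect check_outb_p.s to see exactly what the macro emits. */
#define outb_p(value,port) \
__asm__ ("outb %%al,%%dx\n" \
        "\tjmp 1f\n" \
        "1:\tjmp 1f\n" \
        "1:"::"a" (value),"d" (port))

void check(void)
{
    outb_p(0x12, 0x80);   /* arbitrary value and port, just for the listing */
}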

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 10:22 am
by JamesM
Hobbes wrote:
osdevkid wrote:Is it correct?
No, it isn't. You may compile the C function with the -S switch and study the assembly output.
It's missing a second jmp; apart from that it is a correct transcription. If you're going to pooh-pooh something, please explain why so the OP can learn instead of being a whiny ***** about it.

Kthx.
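
For reference, a sketch of the complete sequence with the missing first jmp restored, and comments on what the constraints take care of (assuming GCC and AT&T syntax; the function name is made up, and this is not the kernel's own code):

Code:

/* illustrative only - the outb_p sequence written out with comments */
static inline void outb_p_commented(unsigned char value, unsigned short port)
{
    __asm__ __volatile__ (
        "outb %%al,%%dx\n\t" /* AL = value, DX = port, set up by the constraints */
        "jmp 1f\n"           /* the jmp the hand-written transcription is missing */
        "1:\tjmp 1f\n"       /* forward jump to the NEXT "1:", not back           */
        "1:"                 /* falls through here - no infinite loop             */
        :
        : "a" (value), "d" (port));
}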

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 1:36 pm
by Brendan
Hi,

Jezze and JamesM are right - the "jmp 1f" jumps to the first "1:" after the instruction (not before, or at the beginning of the instruction).
turdus wrote:It may seem odd to you to jump right to the next instruction, but there's a point to it. Any branch (including this jmp) causes the instruction cache to be flushed, which takes time. This means that the CPU won't continue execution after the outb for a while, so there's time for the hardware to settle down after the I/O.
It's not quite that simple, and relates to some strange historical oddities.

On slow computers (e.g. for 8086 and some 80286) the CPU's bus interface was about as slow as the devices, and no IO delays were necessary. The CPUs got faster but the devices didn't, and this led to design flaws that made the IO delay necessary (mainly in some 80286 and 80386 machines). For later computers (some 80386 and all "80486 and later") the problem is entirely fixed - the CPU waits for the chipset to tell it when to continue after an IO port access, and the chipset uses conservative timing for "I/O wait states" to ensure there's no problem with slow devices.

On old CPUs (80486 and older?), the JMP instruction does cause a delay (due to pipeline stalls, instruction fetch, etc) which is enough to "solve" the problem on dodgy old motherboards. On newer CPUs the JMP instruction may not cause any delay at all, but that's fine because the IO delay isn't needed in the first place.

For an operating system like Linux, people can be too scared to touch anything if they don't have to - it's "safer" to leave the IO delays in the code (even if it's not necessary) just in case it does break something on some ancient old computer.


Cheers,

Brendan

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 2:07 pm
by turdus
I had the knowledge of how to do something, and along came Brendan to tell us why it is so. That's really cool, I like it :-)

Now I recall that you are absolutely right about I/O waits and buses, except for one thing: I doubt that on modern computers jumps cause no delay at all. About ten years ago (at university) my prof told us that every branch starts with dropping the prefetched instructions from the CPU cache and loading the ones at the new IP (regardless of architecture, it's common). With pipelined and superscalar CPUs it's not better but even worse; it takes more and more time as cache capacity and complexity grow. I doubt it's otherwise nowadays, but correct me if I missed some new, radically different technology.

Edit: I've found this in Intel's manual, vol. 1, section 2.2.2.1:
"Two of these problems contribute to major sources of delays:
• time to decode instructions fetched from the target
• wasted decode bandwidth due to branches or branch target in the middle of cache lines"
The CPU is caching not just one branch path but several, called traces. So basically it's a prefetch of the prefetch. This is good for near jumps, but it does not solve the problem of jumping far, to somewhere that's not cached at all. But a jump to the next instruction is a near jump indeed, so it's absolutely pointless and just a bad habit.

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 2:22 pm
by Brendan
Hi,
turdus wrote:Now I recall that you are absolutely right about I/O waits and buses, except for one thing: I doubt that on modern computers jumps are fast. About ten years ago (at university) my prof told us that every branch starts with dropping the prefetched instructions from the CPU cache and loading the ones at the new IP (regardless of architecture, it's common). With pipelined and superscalar CPUs it's not better but even worse; it takes more and more time as cache capacity and complexity grow. I doubt it's otherwise nowadays, but correct me if I missed some new, radically different technology.
Modern 80x86 has very complex branch prediction capabilities (and speculative execution and other things like hyper-threading), specifically to avoid the performance problems associated with both conditional branches and unconditional branches. Intel has been using some form of branch prediction since Pentium CPUs (1993?), but has continually increased complexity (and decreased mispredictions) since.

About ten years ago (probably a little more), a university professor told my entire class that "big endian" was better than "little endian" (he was a Motorola fan) and that an 8-bit variable can hold a value from 0 to 148.... ;)


Cheers,

Brendan

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 2:35 pm
by turdus
Brendan wrote: About ten years ago (probably a little more), a university professor told my entire class that "big endian" was better than "little endian" (he was a Motorola fan) and that an 8-bit variable can hold a value from 0 to 148.... ;)
Not bad :-) I had a teacher (supposed to teach us programming and OO) who failed to power on a computer (he was in his late 60s)...

But my prof was basically right, since prediction only works for near jumps (where by "near" I mean that the target trace is in the cache too). Cache (being fast-access memory) is expensive and therefore limited in size, so sooner or later you will get a cache miss. But this does not apply to our subject, since here the target is in the cache, that's damn sure :-)

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 3:03 pm
by Tosi
I wouldn't recommend using things that depend on the CPU speed, like relative jumps, to wait for I/O completion. Instead, do a dummy write/read to a port. Also, the Linux source code is a very bad thing to learn from; it's ugly.

Re: Need help to understand this inline assembler code...

Posted: Thu Mar 24, 2011 7:19 pm
by Chandra
The ongoing discussion is amazing. But the fact is that the jump was used to introduce a significant delay, whether or not that works on modern computers. That was basically Torvalds's logic.

You won't see that particular code in the newer kernel source. It is already deprecated. Instead, a newer method of delaying was introduced: writing to a port that doesn't exist (0x80). This seems reasonable. :wink:

Here's a quick look at the comment at the top of the IO.h file:
/*
* Thanks to James van Artsdalen for a better timing-fix than
* the two short jumps: using outb's to a nonexistent port seems
* to guarantee better timings even on fast machines.
*
* On the other hand, I'd like to be sure of a non-existent port:
* I feel a bit unsafe about using 0x80 (should be safe, though)
*
* Linus
*/
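
A hedged sketch of that approach (assuming GCC on x86; the helper names are made up and this is not the actual kernel macro): after the real outb, a dummy write to the supposedly unused port 0x80 provides the delay instead of the two jumps.

Code:

/* illustrative only - delay by writing to the (hopefully) unused port 0x80 */
static inline void slow_down_io_example(void)
{
    __asm__ __volatile__ ("outb %%al,$0x80" : : "a" (0));
}

static inline void outb_p_example(unsigned char value, unsigned short port)
{
    __asm__ __volatile__ ("outb %%al,%%dx" : : "a" (value), "d" (port));
    slow_down_io_example();   /* the dummy write replaces the two short jumps */
}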
Cheers