I had the knowledge of how to do something, and then along came Brendan to tell us why it is so. That's really cool, I like it.
Now that I recall it, you are absolutely right about IO waits and buses, except for one thing: I doubt that on modern computers jumps cause no delay at all. About ten years ago (at university) my professor told us that every branch starts by dropping the prefetched instructions from the CPU's instruction cache and loading the ones at the new IP (regardless of architecture, this is common to all of them). With pipelines and superscalar CPUs it's not better but worse; it takes more and more time as cache capacity and complexity grow. I doubt it's any different nowadays, but correct me if I missed some new, radically different technology.
Edit: I've found this in Intel's manual, vol. 1, section 2.2.2.1:
"Two of these problems contribute to major sources of delays:
• time to decode instructions fetched from the target
• wasted decode bandwidth due to branches or branch target in the middle of cache lines"
The CPU follows not just one branch but several, called traces. So basically it's prefetching the prefetch. This is good for near jumps, but it doesn't solve the problem of jumping far, to somewhere that isn't cached at all. A jump to the next instruction is a near jump indeed, so it's absolutely pointless and just a bad habit.