Need help to understand this inline assembler code...

Programming, for all ages and all languages.
osdevkid
Member
Posts: 72
Joined: Sun Nov 21, 2010 11:15 am
Location: India, Chennai

Need help to understand this inline assembler code...

Post by osdevkid »

Dear All,

The macro code below is part of the Linux kernel, from the "IO.H" file:

Code:

#define outb_p(value,port) \
__asm__ ("outb %%al,%%dx\n" \
		"\tjmp 1f\n" \
		"1:\tjmp 1f\n" \
		"1:"::"a" (value),"d" (port))
For the above inline ASM macro, I have written what I believe is the equivalent plain ASM code below:

Code:

       lea _value, %ax
       lea _port, %dx
       outb %al, %dx
1:    jmp 1f
1: 
Is it correct?

If yes, then it will go into an infinite loop when it executes the line "1: jmp 1f", won't it?

Also, why is one more label "1:" required on the last line?
Jezze
Member
Posts: 395
Joined: Thu Jul 26, 2007 1:53 am
Libera.chat IRC: jfu
Contact:

Re: Need help to understand this inline assembler code...

Post by Jezze »

Someone else might be better placed to answer this than me, but as far as I've understood it, jumping to 1f means jumping to the address of the next occurrence of the label 1; the f stands for forward. So it doesn't seem like it would go into an infinite loop, but I'm far from sure.
Fudge - Simplicity, clarity and speed.
http://github.com/Jezze/fudge/
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Need help to understand this inline assembler code...

Post by JamesM »

Jezze wrote:Someone else might be better placed to answer this than me, but as far as I've understood it, jumping to 1f means jumping to the address of the next occurrence of the label 1; the f stands for forward. So it doesn't seem like it would go into an infinite loop, but I'm far from sure.
That is exactly correct. The '1f' syntax is a way of creating anonymous local symbols: the label "1" can be defined many times, and "1f" always refers to the next definition forward.
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Need help to understand this inline assembler code...

Post by turdus »

It may seem odd to jump right to the next instruction, but there's a point: any branch (such as this jmp) causes the prefetched instructions to be flushed, which takes time. This means the CPU won't continue execution after the outb for a short while, so the hardware has time to settle down after the IO.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Need help to understand this inline assembler code...

Post by Solar »

turdus wrote:Any branch (so the jmp) causes the instruction cache to flush...
Is that documented and guaranteed? I could picture a halfway-smart branch prediction unit optimizing that flush away...
Every good solution is obvious once you've found it.
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Need help to understand this inline assembler code...

Post by turdus »

Solar wrote:
turdus wrote:Any branch (so the jmp) causes the instruction cache to flush...
Is that documented and guaranteed? I could picture a halfway-smart branch prediction unit to optimize that flush away...
Yes, it was so on M68k, the x86 family, SPARC and so on (I can't remember if it's in the AMD manual, but I'll check it for you; let's say it's empirically true). Since the jump is unconditional, I don't think branch prediction comes into play (there's no special opcode for marking the likely branch anyway). I find it much more likely that the compiler's optimizer throws the asm away, so it's advisable to use the "volatile" keyword.
Jvac
Member
Posts: 58
Joined: Fri Mar 11, 2011 9:51 pm
Location: Bronx, NY

Re: Need help to understand this inline assembler code...

Post by Jvac »

This function is used to do low-level port input and output. It is designed for kernel use, but can be used from user space. You must have optimization enabled when compiling; otherwise the call may produce unresolved references at link time.

Are you applying it in kernel or user space?

You might have to use ioperm(2) or iopl(2) to tell the kernel to allow your user-space application to access the I/O ports in question. Failing to do this will cause the application to receive a segmentation fault.

Will it go into an infinite loop?

Not sure; it is hard to say.
"The best way to prepare for programming is to write programs, and
to study great programs that other people have written." - Bill Gates


Think beyond Windows ReactOS®
qw
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: Need help to understand this inline assembler code...

Post by qw »

osdevkid wrote:Is it correct?
No, it isn't. You may compile the C function with the -S switch and study the assembly output.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Need help to understand this inline assembler code...

Post by JamesM »

Hobbes wrote:
osdevkid wrote:Is it correct?
No, it isn't. You may compile the C function with the -S switch and study the assembly output.
It's missing a second jmp; apart from that it is a correct transcription. If you're going to pooh-pooh something, please explain why so the OP can learn instead of being a whiny ***** about it.

Kthx.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Need help to understand this inline assembler code...

Post by Brendan »

Hi,

Jezze and JamesM are right - the "jmp 1f" jumps to the first "1:" after the instruction (not before, or at the beginning of the instruction).
turdus wrote:It may seem odd to you to jump right to the next instruction, but there's a point. Any branch (so the jmp) causes the instruction cache to flush, which takes time. This means that the cpu won't continue execution after outb for a while, so there's time for the hardware to settle down after IO.
It's not quite that simple, and relates to some strange historical oddities.

On slow computers (e.g. for 8086 and some 80286) the CPU's bus interface was about as slow as the devices, and no IO delays were necessary. The CPUs got faster but the devices didn't, and this led to design flaws that made the IO delay necessary (mainly in some 80286 and 80386 machines). For later computers (some 80386 and all "80486 and later") the problem is entirely fixed - the CPU waits for the chipset to tell it when to continue after an IO port access, and the chipset uses conservative timing for "I/O wait states" to ensure there's no problem with slow devices.

On old CPUs (80486 and older?), the JMP instruction does cause a delay (due to pipeline stalls, instruction fetch, etc) which is enough to "solve" the problem on dodgy old motherboards. On newer CPUs the JMP instruction may not cause any delay at all, but that's fine because the IO delay isn't needed in the first place.

For an operating system like Linux, people can be too scared to touch anything they don't have to - it's "safer" to leave the IO delays in the code (even if they're not necessary), just in case removing them breaks something on some ancient computer.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Need help to understand this inline assembler code...

Post by turdus »

I had the knowledge of how to do something, and along came Brendan to tell us why it is so. That's really cool, I like it :-)

Now I recall you are absolutely right about IO waits and buses, except for one thing: I doubt that on modern computers jumps cause no delay at all. About ten years ago (at university) my professor told us that every branch starts with dropping the prefetched instructions from the CPU cache and loading the ones at the new IP (regardless of architecture; it's common). With pipelines and superscalar CPUs it's not better but worse, taking more and more time as cache capacity and complexity grow. I doubt it's otherwise nowadays, but correct me - did I miss some new, radically different technology?

Edit: I've found this in Intel's manual, vol. 1, section 2.2.2.1:
"Two of these problems contribute to major sources of delays:
• time to decode instructions fetched from the target
• wasted decode bandwidth due to branches or branch target in the middle of cache lines"
The CPU is not using just one branch but several, called traces. So basically it's a prefetch of the prefetch. This is good for near jumps, but does not solve the problem of jumping far, somewhere that's not cached at all. But a jump to the next instruction is certainly a near jump, so it's absolutely pointless and just a bad habit.
Last edited by turdus on Thu Mar 24, 2011 2:24 pm, edited 1 time in total.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Need help to understand this inline assembler code...

Post by Brendan »

Hi,
turdus wrote:Now I recall you are absolutely right about IO waits and buses, except for one thing: I doubt that on modern computers jumps are fast. About ten years ago (at university) my professor told us that every branch starts with dropping the prefetched instructions from the CPU cache and loading the ones at the new IP (regardless of architecture; it's common). With pipelines and superscalar CPUs it's not better but worse, taking more and more time as cache capacity and complexity grow. I doubt it's otherwise nowadays, but correct me - did I miss some new, radically different technology?
Modern 80x86 has very complex branch prediction capabilities (and speculative execution and other things like hyper-threading), specifically to avoid the performance problems associated with both conditional branches and unconditional branches. Intel has been using some form of branch prediction since Pentium CPUs (1993?), but has continually increased complexity (and decreased mispredictions) since.

About ten years ago (probably a little more), a university professor told my entire class that "big endian" was better than "little endian" (he was a Motorola fan) and that an 8-bit variable can hold a value from 0 to 148.... ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Need help to understand this inline assembler code...

Post by turdus »

Brendan wrote: About ten years ago (probably a little more), a university professor told my entire class that "big endian" was better than "little endian" (he was a Motorola fan) and that an 8-bit variable can hold a value from 0 to 148.... ;)
Not bad :-) I had a teacher (supposed to teach us programming and OO) who failed to power on a computer (he was in his late 60s)...

But my prof was basically right, since prediction only works for near jumps (where by "near" I mean the target trace is in the cache too). Cache (being fast-access memory) is expensive and therefore limited in size, so sooner or later you will get a cache miss. But this does not apply to our subject, since the target is in the cache, that's damn sure :-)
Tosi
Member
Posts: 255
Joined: Tue Jun 15, 2010 9:27 am
Location: Flyover State, United States
Contact:

Re: Need help to understand this inline assembler code...

Post by Tosi »

I wouldn't recommend using things that depend on the CPU speed, like relative jumps, to wait for I/O completion. Instead, do a dummy write/read to a port. Also, the Linux source code is a very bad thing to learn from; it's ugly.
Chandra
Member
Posts: 487
Joined: Sat Jul 17, 2010 12:45 am

Re: Need help to understand this inline assembler code...

Post by Chandra »

The ongoing discussion is amazing. But the fact is that the jump was used to introduce a significant delay, whether or not that works on modern computers. It was basically Torvalds' logic.

You can't see that particular code in the newer kernel source; it is already deprecated. Instead, a newer method of delaying was introduced: writing to a port that doesn't exist (0x80). This seems reasonable. :wink:

Here's the comment at the top of the io.h file:
/*
* Thanks to James van Artsdalen for a better timing-fix than
* the two short jumps: using outb's to a nonexistent port seems
* to guarantee better timings even on fast machines.
*
* On the other hand, I'd like to be sure of a non-existent port:
* I feel a bit unsafe about using 0x80 (should be safe, though)
*
* Linus
*/
Cheers
Programming is not about using a language to solve a problem, it's about using logic to find a solution !