bewing's complete bochs rewrite: test request

This forum is for OS project announcements including project openings, new releases, update notices, test requests, and job openings (both paying and volunteer).
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: bewing's complete bochs rewrite: test request

Post by bewing »

Artlav wrote:
bewing wrote:If you do a REP and ECX is 0, it should repeat 4 billion times.
It should? I couldn't find a mention in the Intel docs.
If you look at the pseudocode for REP in the manual, it clearly shows that ECX is predecremented before testing against 0. I will try my own tests on real hardware. I never use REP with ECX = 0.
And rebochs is not locked up AFAIK -- it is just busy repeating your instruction 4 billion times, which takes a while.
Not quite stupid - it's a quick and easy way to write out debug information, in two commands you can send a mark into the console that something went wrong or right. It's like a serial port.
I support that, but I also use port e9 for many other things. Point ESI at your output string, EDI at the string that should appear inside the [], set AL to [0 - 4] (info to fatal) and do an OUT 0xe9, AL. Your string will appear in the logfile.
By breakpoint i meant this sequence: ...
Which in Bochs breaks execution and gives you debug prompt. Very handy.
I see. I will probably support that with a special port 0xe9 call. Why not use XCHG EBX, EBX instead? That's a much simpler "magic breakpoint".
Last edited by bewing on Wed Aug 11, 2010 1:17 pm, edited 1 time in total.
Artlav
Member
Posts: 178
Joined: Fri Aug 21, 2009 5:54 am
Location: Moscow, Russia

Re: bewing's complete bochs rewrite: test request

Post by Artlav »

bewing wrote:If you look at the pseudocode for REP in the manual, it clearly shows that ECX is predecremented before testing against 0. I will try my own tests on real hardware. I never use REP with ECX = 0.
Strange.
Case in question:
- Real PC, step-by-step debugger
- mov ecx,0; rep stosb
- Nothing is written at the given address; the debugger shows nothing was done

Special case?
Something I missed?
bewing wrote:Point ESI at your output string, EDI at the string that should appear inside the [], set AL to [0 - 4] (info to fatal) and do an OUT 0xe9, AL. Your string will appear in the logfile.
Handy, in that it is somewhat higher-level. But it needs a known memory location and a predefined string, while a simple OUT is position- and situation-independent.
bewing wrote: I see. I will probably support that with a special port 0xe9 call. Why not use XCHG EBX, EBX instead? That's a much simpler "magic breakpoint".
I agree, that might be better.
Regardless of how it is done, having it done would be a good thing.
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: bewing's complete bochs rewrite: test request

Post by bewing »

Handy, in that it is somewhat higher-level. But it needs a known memory location and a predefined string, while a simple OUT is position- and situation-independent.
Not really.

Code: Select all

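push 0x414243   ; message: bytes 43 42 41 00 land on the stack ("CBA" plus a zero byte)
mov esi, esp    ; ESI -> message string
push 0x444546   ; tag: bytes 46 45 44 00 ("FED"), the text shown inside the []
mov edi, esp    ; EDI -> tag string
mov al, 3       ; severity, 0 (info) to 4 (fatal)
out 0xe9, al    ; write the entry to the logfile
pop esi   ; fix the stack
pop esi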
Just allocate enough stack space for a local byte array and fill it at runtime. Nothing needs to be predefined.
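To make the byte layout concrete, here is a rough C sketch of what the emulator side of that call could look like. Everything in it is an assumption for illustration: `store_le32` and `format_e9_log` are made-up names, the flat `guest_mem` array stands in for real guest memory, and the "LEVEL [tag] message" output layout is a guess (the only thing stated above is that the EDI string appears inside the []).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a 32-bit value the way an x86 PUSH lays it out in memory:
   least significant byte at the lowest address. */
static void store_le32(uint8_t *dst, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(value >> (8 * i));
}

/* Hypothetical formatter for one port 0xe9 log request.
   esi/edi are offsets into guest_mem pointing at zero-terminated strings
   (zero termination is an assumption); al is the severity, 0 to 4.
   The output layout is invented for this sketch.
   Returns 0 on success, -1 on a bad level or if the line does not fit. */
static int format_e9_log(char *out, size_t out_size, const uint8_t *guest_mem,
                         uint32_t esi, uint32_t edi, uint8_t al)
{
    if (al > 4)
        return -1;
    int n = snprintf(out, out_size, "%u [%s] %s", (unsigned)al,
                     (const char *)(guest_mem + edi),
                     (const char *)(guest_mem + esi));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Note that because PUSH stores the low byte first, pushing 0x414243 yields the string "CBA" in memory, not "ABC"; reverse the immediate if you want the letters in reading order.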
Regardless of how it is done, having it done would be a good thing.
The XCHG REG, REG magic breakpoints are already supported, on both bochs and rebochs.
stlw
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am

Re: bewing's complete bochs rewrite: test request

Post by stlw »

Owen wrote:AMD says that "Repetition is terminated when rCX reaches zero", which implies to me that it is terminated if rCX is zero *after* the decrement. In fact, if Bochs is not doing the repetition in that case, it is very, very buggy and will choke randomly on a lot of optimized code, because "rep ret" is a very common optimization
OK, that is quite a nice way to misunderstand the documentation :)

If the count is ZERO at the beginning of a string instruction, it won't execute any iterations. The actual Intel manual pseudocode matches this.
The Intel manual says:

Code: Select all

WHILE CountReg ≠ 0
DO
...
OD;
which is meant to behave like the analogous plain-C while loop: the exit condition is checked BEFORE each loop iteration is executed.
Feel free to argue, but Bochs is verified against real hardware in this case.
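Spelled out in C, the loop from the manual looks roughly like the sketch below. This is only a model for illustration, not code from Bochs or rebochs: `cpu_state` and `rep_stosb` are hypothetical names, memory is a flat byte array, the direction flag is assumed clear, and the bounds guard exists only to keep the sketch safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal register state for this sketch. */
typedef struct {
    uint32_t ecx;   /* repeat count */
    uint32_t edi;   /* destination offset into mem */
    uint8_t  al;    /* byte to store */
} cpu_state;

/* Model of REP STOSB with a 32-bit count.
   Returns the number of iterations that actually ran. */
static uint32_t rep_stosb(cpu_state *cpu, uint8_t *mem, size_t mem_size)
{
    uint32_t iterations = 0;
    while (cpu->ecx != 0) {         /* count is tested BEFORE each iteration */
        if (cpu->edi >= mem_size)   /* guard for this sketch only */
            break;
        mem[cpu->edi] = cpu->al;
        cpu->edi++;                 /* assumes DF = 0 */
        cpu->ecx--;
        iterations++;
    }
    return iterations;
}
```

With ECX = 0 on entry the body never runs, so nothing is written and EDI is unchanged, which matches what a step-by-step debugger shows for `mov ecx,0; rep stosb`.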
Owen wrote:(Why is "rep ret" used? If your ret is a branch target, then K8s will utterly fail to predict it; AMD's recommended optimization is to prefix the ret with a rep. This obviously also implies that branches should disable the repetition)
The REP prefix has no effect on the RET instruction, nor on many others. For example, no single-byte opcodes other than the string instructions are affected by the REP prefix.
The REP prefix is used as an opcode extension for many MMX/SSE instructions; for all others it is simply ignored.
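In an emulator's decoder that rule can be as small as a lookup on the opcode byte. A sketch, assuming a hypothetical helper named `rep_repeats_opcode` (not a real Bochs function) that covers only the single-byte string opcodes and deliberately ignores the two-byte MMX/SSE encodings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoder helper: does a REP/REPE/REPNE prefix turn this
   single-byte opcode into a repeated operation? Only the string
   instructions qualify. */
static bool rep_repeats_opcode(uint8_t opcode)
{
    switch (opcode) {
    case 0x6C: case 0x6D:   /* INS  */
    case 0x6E: case 0x6F:   /* OUTS */
    case 0xA4: case 0xA5:   /* MOVS */
    case 0xA6: case 0xA7:   /* CMPS */
    case 0xAA: case 0xAB:   /* STOS */
    case 0xAC: case 0xAD:   /* LODS */
    case 0xAE: case 0xAF:   /* SCAS */
        return true;
    default:
        return false;       /* e.g. 0xC3 (near RET): no repetition, count untouched */
    }
}
```

So for the byte sequence F3 C3 ("rep ret") the helper returns false and the RET executes exactly once.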

Stanislav
Artlav
Member
Posts: 178
Joined: Fri Aug 21, 2009 5:54 am
Location: Moscow, Russia

Re: bewing's complete bochs rewrite: test request

Post by Artlav »

bewing wrote:Just allocate enough stack space for a local byte array and fill it at runtime. Nothing needs to be predefined.
What if there is no stack set up?
What if there's just enough room to fit a few bytes?
What if I want to write out several chars at large intervals, and then read what they spell out (keyboard input vs video driver testing)?
What, after all, if I want a simple command to write out?
bewing wrote:The XCHG REG, REG magic breakpoints are already supported, on both bochs and rebochs.
Nice.
Then it's a case of missing documentation, or of the question asker not having read it.
Either way, not a priority at this stage.
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: bewing's complete bochs rewrite: test request

Post by jal »

Owen wrote:because "rep ret" is a very common optimization
Common perhaps in a very small subset of code, since it is only for a very specific branch of AMD CPUs. See here.
Why is "rep ret" used? If your ret is a branch target, then K8s will utterly fail to predict it; AMD's recommended optimization is to prefix the ret with a rep. This obviously also implies that branches should disable the repetition
REP is only defined for a small set of instructions. It doesn't do anything for a RET (it doesn't even decrement CX, which is a good thing, or else CX would always be destroyed on return from such an "optimized" function).


JAL
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: bewing's complete bochs rewrite: test request

Post by Owen »

jal wrote:
Owen wrote:because "rep ret" is a very common optimization
Common perhaps in a very small subset of code, since it is only for a very specific branch of AMD CPUs. See here.
It's the kind of optimization that compilers should have on when generating generic code (after all, it doesn't harm other CPUs)
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: bewing's complete bochs rewrite: test request

Post by Combuster »

It costs more space, and it only goes wrong when you branch to the ret (in which case you can just invert the branch and add a ret at that specific location).

Pure architectural bloat.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: bewing's complete bochs rewrite: test request

Post by Owen »

Code: Select all

; ... some other code ...

theRet:
    rep ret
; ... some more code ...
    cmp eax, ecx
    jz theRet
    ; ... more code
vs

Code: Select all

; ... some other code ...

theRetThatsThereAnyway:
    ret
; ... some more code ...
    cmp eax, ecx
    jnz dontRet
    ret
dontRet:
    ; ... more code
So, it's no smaller. It also penalizes the not-returning case with a branch.

Oh, and if you end up doing this

Code: Select all

    cmp eax, ecx
    jnz dontRet1
    ret
dontRet1:
    cmp ebx, ecx
    jnz dontRet2
    ret
dontRet2:
    ; ... more code
You probably just blew out the K8 branch predictor, since you squashed 4 branches into a 16-byte block. Congratulations: the processor will now completely fail to predict the final ret branch.

Optimization at this level is non-trivial. rep ret is ugly... but it costs you just one byte, can be reused quite heavily, and is the shortest and most general option.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: bewing's complete bochs rewrite: test request

Post by Brendan »

Hi,
Owen wrote:It's the kind of optimization that compilers should have on when generating generic code (after all, it doesn't harm other CPUs)
Intel are quite specific about this - "The behaviour of the REP prefix is undefined when used with non-string instructions".

The reason is that for x86 the opcode map is almost full. When Intel or AMD want to add new instructions they need to be clever to find unused opcodes; and can easily decide to use the "REP with non-string instructions" encodings for entirely new instructions. This has happened already (some encodings that used to be "REP with non-string instructions and undefined behaviour" were defined as new MMX and SSE instructions). Basically, "REP RET" should never be used (at least not without ensuring it's an AMD CPU with borked return prediction), and there's no guarantee it won't become something entirely different in future CPUs.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: bewing's complete bochs rewrite: test request

Post by Owen »

If it does... then Intel will be breaking a massive quantity of applications. If there is one rep-prefixed instruction they're not going to reuse, it's "rep ret".

Additionally, Intel just opened up a massive chunk of coding space for themselves (Yes, themselves; they refuse to share it) with VEX.

(Crap like that is one of the reasons I think some anti trust commission should require that development of the x86 architecture be handed over to an independent organization, modelled somewhat like Power.org)
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: bewing's complete bochs rewrite: test request

Post by Love4Boobies »

Owen wrote:If it does... then Intel will be breaking a massive quantity of applications.
Such as? Can you at least point to one compiler that generates this "optimization"?
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: bewing's complete bochs rewrite: test request

Post by Owen »

Love4Boobies wrote:
Owen wrote:If it does... then Intel will be breaking a massive quantity of applications.
Such as? Can you at least point to one compiler that generates this "optimization"?
Every single GCC release post-K8.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re: bewing's complete bochs rewrite: test request

Post by Candy »

Owen wrote:Additionally, Intel just opened up a massive chunk of coding space for themselves (Yes, themselves; they refuse to share it) with VEX.
What's VEX?
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: bewing's complete bochs rewrite: test request

Post by Combuster »

A prefix used in Intel's new 256-bit vector instructions (AVX).