Re:A20 Gate

Posted: Sun Jun 30, 2002 8:19 am
by drizzt
/* At last my main.c program... */
/* ================== MAIN.C ==================*/
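extern char read_mem_32(unsigned long i);          /* asm routine: read the byte at linear address i */
extern void write_mem_32(unsigned long i, char c); /* asm routine: write byte c at linear address i */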

int main(void)
{
   static unsigned long i;
   static char x;
   
   for (i=1; i<=65535; i++)
      write_mem_32(i+0x200000, 17); /* write 17 above 2MB in mem */
   for (i=1; i<=65535; i++)
   {
      x=read_mem_32(i+0x200000); /* check that the write is correct */
      if (x==17)
         printf("OK!");   /* ...it should fill the screen with "OK!" messages... */
   }

   hang:
   goto hang;   /* just hang */
}

PS: Sorry for the big space...

Re:A20 Gate

Posted: Mon Jul 01, 2002 1:48 am
by Pype.Clicker
I've always been confused by unreal mode, but it seems dangerous to me to switch to protected mode and back without flushing your instruction pipeline and defining a proper code selector for the protected-mode instructions.
It should be:
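init_unreal:
mov eax, cr0
or al, 1              ; set PE
mov cr0, eax
jmp CODESEL:.pmode    ; far jump: flushes the prefetch queue and loads a pmode CS

.pmode:
<all your instructions in protected mode>
mov eax, cr0
and al, 0FEh          ; clear PE
mov cr0, eax
jmp 1000h:.done       ; far jump back to a real-mode CS (segment 1000h = base 0x10000)

.done:
<now in clean unreal mode>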

The problem with jumpless code is that, once PE is set, the processor will fail to retrieve any instruction using the cs:ip (segmented real-mode address) that it still has in its registers. As long as the instruction has already been prefetched there's no problem (and the early jmp is prefetched), but depending on the pipeline length you could get a fault virtually anywhere (and you have no handler for it yet).

Note that to use this code as is, your code segment will have to be 16-bit, with its base at 0x00010000.

Re:A20 Gate

Posted: Mon Jul 01, 2002 7:07 am
by crazybuddha
Pype.Clicker wrote: and defining a proper code selector for the protected mode instructions.
There are no "protected mode instructions" in unreal mode. CS is unaffected by the switch.

Unreal mode ONLY changes the offset limit, by temporarily loading a descriptor and then putting back the original segment register value. So real-mode segment:offset addressing still works the same, but the offset being added is now a 32-bit value. The segment register value is still 16 bits and treated the same as ever.
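To make the addressing rule concrete, here is a minimal C model of it. It is a sketch under simplifying assumptions, not how the CPU is implemented: `struct seg_cache`, `real_mode_load` and `linear_address` are hypothetical names standing for a segment register and its hidden descriptor cache. A real-mode segment load only updates the selector and the base (value * 16), so a 4 GB limit cached earlier from a protected-mode descriptor survives it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one segment register plus its hidden descriptor cache. */
struct seg_cache {
    uint16_t selector; /* visible value, e.g. DS */
    uint32_t base;     /* in real mode: selector * 16 */
    uint32_t limit;    /* 0xFFFF in plain real mode, 0xFFFFFFFF after the "unreal" trip */
};

/* Real-mode segment load: updates selector and base, leaves the cached limit alone. */
static void real_mode_load(struct seg_cache *s, uint16_t value)
{
    s->selector = value;
    s->base = (uint32_t)value << 4;
}

/* Returns false if the offset exceeds the cached limit (the CPU would fault). */
static bool linear_address(const struct seg_cache *s, uint32_t offset, uint32_t *out)
{
    if (offset > s->limit)
        return false;
    *out = s->base + offset; /* 32-bit offset added to the 16-bit-derived base */
    return true;
}
```

In this model an offset above 0xFFFF is rejected with the normal real-mode limit and goes through once the cached limit is 0xFFFFFFFF, which is what the read_mem_32/write_mem_32 routines in this thread rely on.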

Re:A20 Gate

Posted: Mon Jul 01, 2002 9:42 am
by Pype.Clicker
Did I confuse you, buddha? It seems to me that all the instructions between the two "mov cr0, eax" ARE in protected mode.

mov cr0, eax ; partial switch to 32-bit pmode
mov bx, DATA_SEL ; selector to segment w/ 4G limit
mov ds, bx
mov es, bx ; set seg limits in descriptor caches
dec al
mov cr0, eax ; back to (un)real mode

Therefore they'll need a valid protected-mode CS descriptor if they exceed the pipeline size (that is, if, when the first mov cr0 executes, the last one is not yet in the prefetch stage of the pipeline).
And since there are 5 stages in the Pentium pipeline, here's what could happen:

EXEC: u=mov cr0,eax v=nop (as far as I remember, mov cr0 doesn't pair).
ALU: u=mov bx,08 v=nop (next instruction uses bx -> stall)
EA: u=mov ds,bx v=? (segment loading in pmode surely doesn't pair)
DECODE: u=mov es,bx v=? (see above)
PREFETCH: u=dec al

Oops: the second mov cr0 isn't in the pipeline yet!
Maybe I'm completely wrong, but I'm pretty sure that if someone has such code working on his computer, it's likely to fail on many others unless he's lucky enough...

Don't take it as a personal offense, though ;) It's pretty tricky to initialize pmode correctly, and the fact that you wrote an unreal handler is evidence of some superior technical skills... but it's a _really_ hard thing to do...

Re:A20 Gate

Posted: Mon Jul 01, 2002 10:08 am
by crazybuddha
Of course, no offense taken. In fact, this is an interesting point. And yes, I wasn't following your point. But I think I am now.

If I get some time, I'll bang on it a bit and get back to you.

Re:A20 Gate

Posted: Mon Jul 01, 2002 12:56 pm
by Pype.Clicker
I think it would be interesting if you told us what kind of hardware you used to run your bootsector... I bet a reboot it's an AMD, isn't it?

(they handle jumps and prefetching in a completely different way...)

Re:A20 Gate

Posted: Tue Jul 02, 2002 4:15 am
by drizzt
OK! I've solved the problem... now it works!

The error was in _write_mem_32 and _read_mem_32.
When main.c calls write_mem_32, the program pushes a 32-bit offset and a char onto the stack; but the 32-bit offset doesn't take one stack slot, it takes two!!!

So to get the char to write to memory I must use [bp+8], not [bp+6]!!!
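These offsets can also be derived rather than found by trial. A minimal sketch, assuming a 16-bit cdecl convention with near calls (small memory model): after `push bp` / `mov bp, sp`, [bp+0] holds the saved BP, [bp+2] the return address, and the arguments follow in 2-byte slots. `bp_offset` is a hypothetical helper, not part of any compiler.

```c
#include <stddef.h>

/* Assumed convention: 16-bit cdecl. Arguments are pushed right to left and
   every push is 2 bytes, so a char (promoted to int) uses one slot and an
   unsigned long uses two. */
#define SLOT_SIZE 2 /* bytes per push in 16-bit code */
#define SAVED_BP  2 /* [bp+0]: caller's BP, pushed by "push bp" */

/* Hypothetical helper: [bp+N] offset of argument 'index' (0 = leftmost),
   given each argument's size in bytes and the size of the return address
   (2 for a near call, 4 for a far call). */
static size_t bp_offset(const size_t *arg_sizes, size_t index, size_t ret_size)
{
    size_t off = SAVED_BP + ret_size; /* first argument sits just above the return address */
    for (size_t k = 0; k < index; k++)
        off += (arg_sizes[k] + SLOT_SIZE - 1) / SLOT_SIZE * SLOT_SIZE; /* whole slots */
    return off;
}
```

For write_mem_32(unsigned long i, char c) with near calls this gives [bp+4] for i and [bp+8] for c; if the C code were compiled with far calls (4-byte return address), both would move up by 2.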

Here are my new write_mem_32 and read_mem_32 routines:

_write_mem_32:                  ; void write_mem_32(unsigned long i, char c)

push bp
mov bp, sp

mov eax, dword [bp+4]           ; OFFSET (32 BIT)
mov bl, [bp+8]                  ; bp+8 because the stack holds a dword => 2 slots of 16 bits!!!

mov byte [ds:eax], bl           ; write to mem!

mov sp, bp
pop bp

ret

_read_mem_32:                   ; char read_mem_32(unsigned long i)

push bp
mov bp, sp

mov eax, dword [bp+4]           ; OFFSET (32 BIT)

mov bl, [ds:eax]                ; read from mem!

mov sp, bp
pop bp

xor eax, eax
mov al, bl
mov dx, ax                      ; result is returned in ax (dx gets a copy)
ret


I've tested this on a standard Pentium 100 MHz, a Pentium III 600 MHz and a Mobile Pentium III 1.13 GHz.
It seems to work correctly...

...and now I can access all my memory by direct addressing!

Re:A20 Gate

Posted: Tue Jul 02, 2002 7:01 am
by Pype.Clicker
Well, it seems I've lost my bet and will have to reboot...

Maybe something mystical happens to CS during the "unreal trip"... maybe the CPU is just using a cached version of CS or something like that... It could be interesting to set CS to a 16-bit code selector with its granularity ('page gran') bit set and see whether one can access code anywhere after going back to real mode...
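The experiment above needs a 16-bit code descriptor with the granularity bit set. A sketch of how the 8-byte GDT descriptor fields are packed, assuming the standard IA-32 layout; `make_descriptor`, `effective_limit` and the flag macro names are hypothetical, and the access bytes used in the checks (0x9A for ring-0 code, 0x92 for ring-0 data) are common choices, not values taken from the thread.

```c
#include <stdint.h>

/* Flag bits in the high nibble of descriptor byte 6. */
#define DESC_GRAN_4K 0x80u /* G: limit counted in 4 KiB units */
#define DESC_32BIT   0x40u /* D/B: 32-bit default operand size */

/* Pack an 8-byte GDT descriptor. 'limit' is the raw 20-bit limit field. */
static void make_descriptor(uint8_t d[8], uint32_t base, uint32_t limit,
                            uint8_t access, uint8_t flags)
{
    d[0] = limit & 0xFF;                                        /* limit 7:0   */
    d[1] = (limit >> 8) & 0xFF;                                 /* limit 15:8  */
    d[2] = base & 0xFF;                                         /* base 7:0    */
    d[3] = (base >> 8) & 0xFF;                                  /* base 15:8   */
    d[4] = (base >> 16) & 0xFF;                                 /* base 23:16  */
    d[5] = access;                                              /* P, DPL, type */
    d[6] = (uint8_t)((flags & 0xF0) | ((limit >> 16) & 0x0F));  /* flags | limit 19:16 */
    d[7] = (base >> 24) & 0xFF;                                 /* base 31:24  */
}

/* Byte limit as the CPU derives it from the raw field and the G bit. */
static uint32_t effective_limit(const uint8_t d[8])
{
    uint32_t raw = d[0] | ((uint32_t)d[1] << 8) | ((uint32_t)(d[6] & 0x0F) << 16);
    return (d[6] & DESC_GRAN_4K) ? (raw << 12) | 0xFFF : raw;
}
```

With G set, a raw limit of 0xFFFFF becomes 0xFFFFFFFF bytes. A "4G limit" data descriptor like the DATA_SEL in crazybuddha's code would be packed the same way, with a data access byte such as 0x92.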

Re:A20 Gate

Posted: Tue Jul 02, 2002 2:12 pm
by drizzt
drizzt wrote: mov ax, 2401h
int 15h

Is it possible to use this code to enable the A20 gate without switching to PMode?!
...and about my first question: yes, it's possible to enable the A20 line with interrupt 15h, function 2401h... but only on recent motherboards (PS/2 and later)... it won't work on your 386 ;)
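Whichever method opens the gate, its effect is easy to model: with A20 disabled, physical address bit 20 is forced to 0, reproducing the 8086 wraparound. A minimal C sketch; `a20_mask` and `real_mode_phys` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* With the A20 gate closed, address line 20 is held low. */
static uint32_t a20_mask(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~(uint32_t)0x100000);
}

/* Real-mode seg:off -> physical address (at most 0x10FFEF before masking). */
static uint32_t real_mode_phys(uint16_t seg, uint16_t off, bool a20_enabled)
{
    return a20_mask(((uint32_t)seg << 4) + off, a20_enabled);
}
```

One consequence in this model: only addresses with bit 20 set are affected. 0x100500 (FFFF:0510) aliases to 0x000500, which is why comparing those two locations is a common A20 check, while the range 0x200000 to 0x2FFFFF is untouched, so a test that only writes just above 2 MB, like the first main.c in this thread, could pass even with A20 disabled.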

Re:A20 Gate

Posted: Thu Jul 04, 2002 1:33 am
by drizzt
...um... sorry... another problem with my write_mem_32 and read_mem_32...

When I pass a long argument as the location, it works fine. But if I pass a long+int it doesn't work... it seems to take the sum as an int, not a long!!!

...but I'd expect that in C, when I do

c=a+b;

where a is a long and b is an int, the result must be a long!!!
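That expectation matches C's usual arithmetic conversions: when one operand is a long, the other is converted before the operation. A C11 sketch that checks the resulting type with `_Generic`; `IS_LONG` is a hypothetical helper macro, and the variables mirror those in the program below.

```c
#include <assert.h>

/* 1 if the expression's type is long or unsigned long, 0 otherwise.
   _Generic does not evaluate its controlling expression. */
#define IS_LONG(expr) _Generic((expr), long: 1, unsigned long: 1, default: 0)

static unsigned long i = 0x100000UL; /* an XMS address */
static unsigned int a = 32;

/* The unsigned int operand is converted to unsigned long before the
   addition, so i + a is never truncated to int. */
static_assert(IS_LONG(i + a), "unsigned long + unsigned int is unsigned long");

/* Two (unsigned) ints stay int-sized: nothing here is widened to long. */
static_assert(!IS_LONG(a * a), "unsigned int * unsigned int is unsigned int");
```

So i+a alone should not be truncated; as the thread later shows, the real trouble came from an int*int product, which stays int (16 bits on that compiler) before it ever meets the long.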

For example if I write this program in main.c:

#define XMS_START 0x100000
#define XMS_END 0x900000

static unsigned long i;
static char c;
.
.
.
c=1; /*Smile char*/
for (i=XMS_START; i<=XMS_END; i++)
write_mem_32(i, c); /* write c at location i */

...it works, no problem! It fills memory from 1 MB up to 9 MB with the smiley char.

Now, if I modify the program like this:

#define XMS_START 0x100000
#define XMS_END 0x900000

static unsigned long i;
static char c;
static unsigned int a=32;

.
.
.
c=1;
for (i=XMS_START; i<=XMS_END; i++)
write_mem_32(i+a, c); /* write c at location i+a */

...it doesn't work!!! It fills only 64K of memory!!! It seems to take the sum i+a as an int value!

Is something wrong??? Could anybody help me??
Thanks in advance...

Re:A20 Gate

Posted: Thu Jul 04, 2002 5:40 am
by Pype.Clicker
Did you declare your write_mem_32 and read_mem_32 functions in C (i.e. with prototypes such as void write_mem_32(unsigned long, char))?

Maybe you can try to force the conversion with a cast:
i + (long) a
or
(long)(i + (long) a)

Re:A20 Gate

Posted: Thu Jul 04, 2002 12:48 pm
by drizzt
I've tried declaring read_mem_32 and write_mem_32 with a long argument... and I've also tried forcing the cast, but it doesn't work... it's very strange :o

But if I force the cast like this:

static unsigned long location;
static unsigned int a, b;

location=(long)(a+b);
write_mem_32(location, 1);
.
.
.

it should work... (I suppose!) but when I link the asm program and main.c (with JLOC) it generates an error like:

Undefined symbol in LXMUL@ in main.c

This is my JLOC script:

ALL:
asm_prog.o
c_prog.o

START: 0,000000000
,,,START
CODE: 0
,,code
DATA: 0,#10
,,data
BSS: 0,#10
,,bss

Re:A20 Gate

Posted: Fri Jul 05, 2002 2:55 am
by drizzt
Sorry... I got it wrong...
If I write a+b there's no problem... the problem comes when I do, for example:

write_mem_32(a*b+c, value);

where a and b are ints and c is a long!!!

So even forcing the cast doesn't work! The routine gets an int value, not a long....

...for now I've worked around it by computing a*b with a for loop... it isn't very elegant... but it works!

long_var=0;
for (i=1; i<=b; i++)
long_var+=a;

then I write...

write_mem_32(long_var+c, value); /* ... ok! */

I think the problem is that when I multiply an int by an int the result is an int... and if I force the cast as (long)a*b it returns a long, but then JLOC gives some link errors (but I'm not sure about that...)

What matters is that it works... I could write an asm procedure to multiply a long by an int (or an int by a long) to solve the problem... (um... maybe in the future...)

So I hope this long discussion about extended memory helps other people who have the same problem...

Re:A20 Gate

Posted: Fri Jul 05, 2002 3:07 am
by .bdjames
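A20: call WaitWrite
mov al, 0D0h        ; 8042 command: read output port
out 64h, al
call WaitRead
in al, 60h          ; al = current output port value
mov bx, ax          ; keep it
call WaitWrite
mov al, 0D1h        ; 8042 command: write output port
out 64h, al
call WaitWrite
mov ax, bx
or ax, 2            ; set bit 1 (A20 gate), keep the other bits
out 60h, al
call WaitWrite
mov al, 0D0h        ; read the output port back to verify
out 64h, al
call WaitRead
in al, 60h
test al, 2
jz A20              ; A20 still off: try again
ret                 ; A20 enabled

WaitRead: push ax   ; wait until the 8042 output buffer is full (status bit 0)
WaitReadLoop: in al, 64h
test al, 1
jz WaitReadLoop
pop ax
ret

WaitWrite: push ax  ; wait until the 8042 input buffer is empty (status bit 1)
WaitWriteLoop: in al, 64h
test al, 2
jnz WaitWriteLoop
pop ax
ret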
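The magic numbers in this routine come from the 8042 keyboard controller. A sketch of just the bit logic in C, with hypothetical names and no real port I/O: status bit 0 (output buffer full) is what WaitRead polls, status bit 1 (input buffer full) is what WaitWrite polls, and output-port bit 1 drives the A20 gate.

```c
#include <stdbool.h>
#include <stdint.h>

#define KBC_DATA_PORT     0x60 /* data in/out */
#define KBC_CMD_PORT      0x64 /* write: command, read: status */
#define KBC_READ_OUTPORT  0xD0 /* next byte read from 0x60 is the output port */
#define KBC_WRITE_OUTPORT 0xD1 /* next byte written to 0x60 becomes the output port */

#define KBC_STATUS_OBF 0x01 /* output buffer full: a byte is waiting at 0x60 */
#define KBC_STATUS_IBF 0x02 /* input buffer full: controller busy, don't write yet */
#define OUTPORT_A20    0x02 /* output port bit 1: A20 gate */

/* Condition WaitRead loops on. */
static bool kbc_can_read(uint8_t status)  { return (status & KBC_STATUS_OBF) != 0; }
/* Condition WaitWrite loops on. */
static bool kbc_can_write(uint8_t status) { return (status & KBC_STATUS_IBF) == 0; }

/* Read-modify-write: set the A20 bit, leave every other output-port bit as read. */
static uint8_t outport_with_a20(uint8_t outport) { return (uint8_t)(outport | OUTPORT_A20); }
static bool a20_is_set(uint8_t outport) { return (outport & OUTPORT_A20) != 0; }
```

The read-modify-write through commands 0D0h/0D1h matters because the other output-port bits carry other signals; bit 0, for instance, is the CPU reset line, so writing a value with it cleared would reset the machine.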

Re:A20 Gate

Posted: Fri Jul 05, 2002 3:18 am
by Pype.Clicker
drizzt wrote: sorry... i said wrong...
If I write a+b there's no problem... the problem came when i do for example:

write_mem_32(a*b+c, value);

where a is int, b int and c long!!!
Hmm... Do you expect the multiplication result (a*b) to be a long or an int? If it's a long, it would probably be safer to write
{ long la = (long) a;
long lb = (long) b;
la*=lb;

write_mem_32(la+c,value);
}
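Why widening first matters: on a 16-bit compiler a*b is computed as an int and wraps before it is ever converted, so (long)(a*b) is too late, while (long)a*b or the la/lb version above multiplies in 32 bits. A host-side sketch that simulates the 16-bit int with uint16_t; unsigned types are used so the wraparound is well defined in C (the old compiler's signed overflow would typically wrap the same way, but that is an assumption). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model of a 16-bit compiler: uint16_t stands for unsigned int,
   uint32_t for unsigned long. */

/* a*b + c as written: the product wraps to 16 bits before c is added. */
static uint32_t product_then_widen(uint16_t a, uint16_t b, uint32_t c)
{
    uint16_t prod = (uint16_t)((uint32_t)a * b); /* keep only the low 16 bits */
    return (uint32_t)prod + c;
}

/* Widen first, then multiply in 32 bits (like la*=lb above). */
static uint32_t widen_then_product(uint16_t a, uint16_t b, uint32_t c)
{
    uint32_t la = a, lb = b;
    return la * lb + c;
}
```

For a=1024 and b=100 the true product is 0x19000, but the 16-bit version keeps only 0x9000; small products fit in 16 bits, which is why the bug only shows up for larger values.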

I guess you're still using a 16-bit C compiler, so longs are 32 bits. If so, you could *greatly* simplify your problem by doing the multiplication _inside_ your assembly code (remember, a simple imul eax,edx could save you ;) ), or at least pass a base address (32 bits) and an offset (n bits) to your function and add them in asm with the proper operand size...

Now, I don't know exactly why this multiplication is so important to you, but remember that multiplying addresses isn't clean C programming...

And, well, the LXMUL@ symbol JLOC complains about is the C runtime helper that performs long x long multiplication (your 16-bit compiler doesn't know you're on a 16/32-bit architecture that could do it in hardware ;) )
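Roughly what a long*long helper like LXMUL@ has to do when the compiler only emits 16x16->32-bit multiplies: split each operand into 16-bit halves and add the partial products. This is an illustrative C sketch with a hypothetical name, not Borland's actual library code; it returns the low 32 bits, which is all a 32-bit long result can hold.

```c
#include <stdint.h>

/* Hypothetical illustration: 32x32 -> low 32 bits using only 16-bit halves. */
static uint32_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFF, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFF, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;                /* full 16x16 -> 32-bit partial product */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;  /* only its low 16 bits survive the shift */
    /* a_hi * b_hi would land at bit 32 and above, so it never affects the result */
    return lo_lo + (cross << 16);
}
```

On a 386 a single 32-bit mul or imul yields the same low 32 bits, which is the point of doing the multiplication in asm; the other way out of the link error is presumably to link the compiler's runtime library, which provides that helper.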