multi-core initialization -- 16/32 bit issues
Hi
I have implemented multi-core initialization and managed to get my APs (application processors) into their real mode boot code. I did this by:
a) copying the AP boot code down below 1Mb (to address 0x1000)
b) then going through the whole SIPI initialization steps
My APs all get to their real mode boot code just fine -- i.e. they are running the code I copied down to below 1Mb.
Now comes the part that has me stumped -- how to get my APs out of real mode and into protected mode (I know how to get out of real mode and into protected mode in general but not in this instance -- read on).
I am using Grub 2 as my boot loader, so as far as I understand things I cannot include any 16-bit code in the OS binary (when I do include 16-bit code, Grub 2 refuses to load my OS code).
So my AP real mode boot code looks like this (I have questions about the ??? parts).
.code16
startAP:
lgdt ????
movl %cr0, %eax
orl $0x1, %eax
movl %eax, %cr0
ljmp $0x8, ???
I have included this code in my 32-bit OS binary using .incbin -- so I compiled it as a 16-bit raw binary and then I include it as raw bytes using .incbin.
Usually you would put some labels in for the ??? parts, and the linker would relocate, and it would work just great. But I can't do that since my code is not linked -- it is loaded via .incbin -- effectively as data.
So I did the following:
a) copied this code below 1Mb (to address 0x1000)
b) put a descriptor table at 0x2000,
c) put the address of my protected mode code at 0x3000.
My two questions are then:
a) how do I load the gdt with 0x2000 -- the location of my AP gdt? I want to simply do something like
lgdt $0x2000
But that didn't work (using gcc as my assembler).
b) How do I far jump to the address in location 0x300?.
It feels like I might be going about this the wrong way, but I haven't programmed in assembler since my days hacking around on PDP-11 device drivers!
thanks
graham
- BrightLight
Re: multi-core initialization -- 16/32 bit issues
gmatthews wrote: how do I load the gdt with 0x2000 -- the location of my AP gdt? I want to simply do something like
lgdt (0x2000)
gmatthews wrote: How do I far jump to the address in location 0x300?.
You said 0x3000 above but 0x300 here; anyway, I'll assume it's 0x3000, because 0x300 is in memory used by the BIOS.
To jump to 0x3000, just do a normal jump: jmp 0x08:0x3000
To jump at the value contained at 0x3000, what about:
Code: Select all
ljmp $0x08, $pmode
.code32
pmode:
# set up segments here, especially SS and DS
movl (0x3000), %eax
jmp *%eax
You know your OS is advanced when you stop using the Intel programming guide as a reference.
- Combuster
Re: multi-core initialization -- 16/32 bit issues
The problem with incbin'ing a snippet is that it can't actually reference symbols elsewhere. The trick, however, is that you don't need .incbin at all, as long as you write the code so that it requires no relocations:
Code: Select all
SECTION .rodata
BITS 16
ap_trampoline_code:
MOV AX, 0
MOV DS, AX
LGDT [ap_gdtr - ap_trampoline_code + AP_TRAMPOLINE_OFFSET]
MOV EAX, CR0
OR AL, 1
MOV CR0, EAX
JMP FAR DWORD 0x08:ap_startup_code
ap_gdtr:
DW 0x1F
DD gdt
ap_trampoline_end:
ap_trampoline_size EQU ap_trampoline_end - ap_trampoline_code
Re: multi-core initialization -- 16/32 bit issues
For GDT, you can calculate addresses from known offsets and code location at run time.
Put the code at an "aligned enough" address, so that the offset and linear address parts do not overlap at the 'or' instruction below. Loading at a page-aligned address (0x1000) should be safe. You should enter the code with IP=0 - set the reset vector to something like (100:0).
Code: Select all
boot_start16:
/* Load DS */
mov %cs, %eax
mov %ax, %ds
/* Calculate linear address of "boot_start16" */
shl $4, %eax
/* Load GDP */
mov $(ap_tmp_gdp-boot_start16), %bx
/* Patch GDP's pointer to current linear address of ap_tmp_gdt.
Use 'or' instead of 'add' here, because it will do no harm if
executed multiple times. */
or %eax, 2(%bx)
lgdt (%bx)
/* ---- snip ---- */
ap_tmp_gdp:
.short ap_tmp_gdt_end - ap_tmp_gdt - 1
.long ap_tmp_gdt - boot_start16
ap_tmp_gdt:
/* NULL */
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
/* 32-bit code & data */
.byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00
.byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00
ap_tmp_gdt_end:
As for the ljmp part - I have not had any issues linking 16-bit code into the binary, so I simply use a label. But anyway, you can patch the code with the correct addresses after loading it into place. You can either modify the bytes at the ljmp instruction itself, or reserve some bytes at a known offset and load the jump address from there.
I know that self-modifying code is generally discouraged. But if you already load code as data and move it around, patching it does not seem too bad.
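The raw descriptor bytes in ap_tmp_gdt are easier to check once decoded. The following is a minimal C sketch of that decoding; the struct and function names are made up for illustration and are not part of the code above.

```c
#include <stdint.h>

/* Fields of one 8-byte segment descriptor (names are illustrative). */
struct seg_desc {
    uint32_t base;    /* 32-bit linear base address */
    uint32_t limit;   /* 20-bit limit, before granularity scaling */
    uint8_t  access;  /* byte 5: present bit, DPL, type */
    uint8_t  flags;   /* high nibble of byte 6: G, D/B, L, AVL */
};

/* Decode the 8 raw bytes of a descriptor, byte 0 first. */
static inline struct seg_desc decode_descriptor(const uint8_t d[8])
{
    struct seg_desc s;
    s.limit  = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
             | ((uint32_t)(d[6] & 0x0F) << 16);
    s.base   = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
             | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.access = d[5];
    s.flags  = (uint8_t)(d[6] >> 4);
    return s;
}
```

For the code descriptor above (0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00) this gives base 0, limit 0xFFFFF, access 0x9A and flags 0xC (granularity and 32-bit default size both set), i.e. a flat 4 GiB code segment. Three 8-byte entries also give a GDT limit of 23, which is what the ap_tmp_gdt_end - ap_tmp_gdt - 1 expression evaluates to.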
If something looks overcomplicated, most likely it is.
Re: multi-core initialization -- 16/32 bit issues
Hi,
I normally have the trampoline code at a certain address (let's call that "cs.base"); then put various values it will need at a fixed offset from "cs.base". Then I can just do (e.g.):
Code: Select all
mov eax,PAGING_FLAG | PROTECTED_MODE_FLAG
mov ebx,[cs:0xFFC]
mov cr3,ebx ;CR3 can only be loaded from a register
mov esp,[cs:0xFF8]
lgdt [cs:0xFF0]
mov cr0,eax ;Enable paging and protected mode
jmp far [cs:0xFF8] ;Load 32-bit CS and jump to somewhere in kernel-space
Of course the code to start AP CPUs would allocate a stack for the CPU and set the appropriate values in the trampoline. Using a "CS override prefix" like this means that you can have several copies of the trampoline at different addresses, and start multiple CPUs at the same time (while still giving them different details - e.g. different values for ESP). Note: "cs.base" is set by the "startup IPI" that you send.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
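The "values at a fixed offset from cs.base" idea implies that the CPU starting the APs fills in those values before sending any IPI. Below is a rough C sketch of that step. The offsets, the ap_params struct and trampoline_prepare are all assumptions made up for this example (the offsets are chosen so the fields do not overlap); they are not the layout used in the post above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter area at the end of a 4 KiB trampoline page. */
enum {
    TRAMP_SIZE      = 0x1000,
    TRAMP_GDTR      = 0xFE0,  /* 2-byte limit, then 4-byte base */
    TRAMP_ENTRY     = 0xFE8,  /* 4-byte EIP, then 2-byte CS */
    TRAMP_STACK_TOP = 0xFF0,  /* 4-byte ESP */
    TRAMP_CR3       = 0xFF4   /* 4-byte page directory address */
};

struct ap_params {
    uint16_t gdt_limit;
    uint32_t gdt_base;
    uint16_t code_sel;
    uint32_t entry;
    uint32_t stack_top;
    uint32_t cr3;
};

/* Store values least significant byte first, as the CPU expects. */
static inline void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static inline void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v & 0xFFFF));
    put16(p + 2, (uint16_t)(v >> 16));
}

/* Copy the 16-bit blob into `page` and fill in the per-CPU values.
   Returns -1 if the blob would overwrite the parameter area. */
static inline int trampoline_prepare(uint8_t *page, const uint8_t *blob,
                                     size_t blob_len,
                                     const struct ap_params *ap)
{
    if (blob_len > TRAMP_GDTR)
        return -1;
    memcpy(page, blob, blob_len);
    put16(page + TRAMP_GDTR, ap->gdt_limit);
    put32(page + TRAMP_GDTR + 2, ap->gdt_base);
    put32(page + TRAMP_ENTRY, ap->entry);
    put16(page + TRAMP_ENTRY + 4, ap->code_sel);
    put32(page + TRAMP_STACK_TOP, ap->stack_top);
    put32(page + TRAMP_CR3, ap->cr3);
    return 0;
}
```

In a kernel, `page` would be the identity-mapped physical page below 1 MB that the Startup IPI vector points at, and one such page would be prepared per CPU being started in parallel.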
Re: multi-core initialization -- 16/32 bit issues
Brendan wrote: [...] Note: "cs.base" is set by the "startup IPI" that you send.
First, thanks for everyone's help.
Second, I am not sure what you mean by ""cs.base" is set by the "startup IPI" that you send.". My startup IPI code does this (my AP startup code is at 0x3000, hence the choice of $0x000C4601);
mov $APIC_BASE, %ebx # APIC address in EBX
mov $0x000C4500, %eax # broadcast INIT-IPI
mov %eax, 0x300(%ebx) # to all-except-self
# do ten-millisecond delay, enough time for APs to awaken
mov $10000, %eax # ten-thousand microseconds
call delay_EAX_microseconds # execute programmed delay
mov $0x000C4601, %eax # broadcast Startup-IPI, vector 0x01
mov %eax, 0x300(%ebx) # to all-except-self
# do ten-millisecond delay, enough time for APs to awaken
mov $10000, %eax # ten-thousand microseconds
call delay_EAX_microseconds # execute programmed delay
I realize the code may be a bit primitive (there is a lot of discussion on the net about the delays not being the best way to do this, but it is simple code that I understand, and I want to get something simple working first).
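The two magic numbers written to offset 0x300 (the low dword of the local APIC's Interrupt Command Register) are easier to read when built from their bit fields. This is a small C sketch of that decomposition; the macro and function names are invented for the example.

```c
#include <stdint.h>

/* Bit fields of the low dword of the Interrupt Command Register. */
#define ICR_DELIVERY_INIT      (5u << 8)   /* delivery mode 101b */
#define ICR_DELIVERY_STARTUP   (6u << 8)   /* delivery mode 110b */
#define ICR_LEVEL_ASSERT       (1u << 14)
#define ICR_DEST_ALL_BUT_SELF  (3u << 18)  /* destination shorthand 11b */

/* INIT IPI broadcast to all CPUs except the sender. */
static inline uint32_t icr_init_broadcast(void)
{
    return ICR_DEST_ALL_BUT_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_INIT;
}

/* Startup IPI broadcast; `vector` selects the 4 KiB page to start in. */
static inline uint32_t icr_sipi_broadcast(uint8_t vector)
{
    return ICR_DEST_ALL_BUT_SELF | ICR_LEVEL_ASSERT
         | ICR_DELIVERY_STARTUP | vector;
}
```

With these definitions icr_init_broadcast() is 0x000C4500 and icr_sipi_broadcast(0x01) is 0x000C4601, so the low byte of the second value is the Startup IPI vector.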
You suggest that this code has to set cs.base. But it's unclear to me how this code can communicate cs.base to the startup code. Or is the cs.base assumed to be 0x3000 since that is where the startup code is? Or perhaps I am missing part of the semantics of the SIPI -- the AP code starts running at 0x3000 -- but is it running with cs = 0 and ip = 0x3000, or cs = 0x3000 and ip = 0? Hopefully the latter
thanks
graham
Re: multi-core initialization -- 16/32 bit issues
Hi,
Brendan wrote: Note: "cs.base" is set by the "startup IPI" that you send.
gmatthews wrote: Second, I am not sure what you mean by ""cs.base" is set by the "startup IPI" that you send.". My startup IPI code does this (my AP startup code is at 0x3000, hence the choice of $0x000C4601);
The lowest 8 bits of the Startup IPI (the "vector" field) are loaded into the highest 8 bits of the AP CPU's CS register, so if the vector field is 0x01 the AP CPU's CS register ends up being 0x0100, which means the trampoline must be at "0x0100:0x0000" (in real mode) which is 0x00001000.
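The vector-to-address relationship described above is just two shifts. Here it is as a short C sketch; the function names are made up for illustration.

```c
#include <stdint.h>

/* CS value an AP starts with for a given Startup IPI vector. */
static inline uint16_t sipi_vector_to_cs(uint8_t vector)
{
    return (uint16_t)((uint16_t)vector << 8);
}

/* Real-mode linear address: segment * 16 + offset. */
static inline uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Vector for a trampoline at `linear`, or -1 if that address is not
   4 KiB aligned or not below 1 MiB. */
static inline int sipi_vector_for(uint32_t linear)
{
    if ((linear & 0xFFF) != 0 || linear >= 0x100000)
        return -1;
    return (int)(linear >> 12);
}
```

By this arithmetic vector 0x01 starts the AP at CS=0x0100, IP=0 (linear 0x1000), and a trampoline placed at 0x3000 would need vector 0x03. A trampoline-relative access such as [cs:0xFF8] with CS=0x0100 reads linear address 0x1FF8.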
Note that (for an OS) broadcasting the "INIT SIPI SIPI" sequence (e.g. "to all excluding self") is a huge mistake. The problems are:
- It can start CPUs that were disabled because they're faulty
- It can start CPUs that were disabled because the user disabled hyper-threading in the firmware options
- It makes it virtually impossible to detect when a CPU (that should start) has failed to start
- It makes it hard to give each CPU different data (e.g. a different "top of stack" address)
Another problem is that often the AP CPU will start on the first SIPI, execute some of your code, then get "restarted" by the second Startup IPI, which can cause bugs (e.g. if the AP CPU does "total_CPUs_present++;" then it can increment the counter twice). This means that you want some sort of synchronisation between the CPU being started and the CPU that's monitoring it. For example, as soon as the CPU starts it can set an "I started" flag in the trampoline and then wait for the other CPU to see this and set a "you can continue" flag before it continues. Also, if the other CPU sees the "I started" flag was set before the second Startup IPI is sent then you can skip the second Startup IPI completely.
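The "I started" / "you can continue" handshake could look roughly like this in C11. This is only a sketch under assumptions: the struct and function names are invented, a poll count stands in for a real timer, and using an atomic exchange so that a restarted AP is not counted twice is an addition of this example rather than something spelled out above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One handshake block per trampoline (names are illustrative). */
struct ap_handshake {
    atomic_bool started;       /* set by the AP as soon as it runs */
    atomic_bool may_continue;  /* set by the CPU that is starting it */
};

/* Starting CPU: clear both flags before sending any IPI. */
static inline void handshake_reset(struct ap_handshake *h)
{
    atomic_store(&h->started, false);
    atomic_store(&h->may_continue, false);
}

/* AP: announce arrival. Returns true only for the first arrival, so
   a restart caused by the second Startup IPI is not counted twice. */
static inline bool ap_announce(struct ap_handshake *h)
{
    return !atomic_exchange(&h->started, true);
}

/* AP: spin until the starting CPU releases it. */
static inline void ap_wait_release(struct ap_handshake *h)
{
    while (!atomic_load(&h->may_continue)) {
        /* busy-wait */
    }
}

/* Starting CPU: poll up to max_polls times for the AP; release it
   and return true if it showed up, return false on timeout. */
static inline bool bsp_wait_for_ap(struct ap_handshake *h,
                                   unsigned long max_polls)
{
    while (max_polls--) {
        if (atomic_load(&h->started)) {
            atomic_store(&h->may_continue, true);
            return true;
        }
    }
    return false;
}
```

The starting CPU would call bsp_wait_for_ap after the first Startup IPI and send the second one only if that call returns false.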
This means that the full sequence would be more like:
- For each CPU mentioned by ACPI or MultiProcessor Specification:
  - Allocate stack for that CPU
  - Set info in trampoline (address of stack to use, etc) and clear the "I started" flag and the "you can continue" flag
  - Send INIT IPI to that CPU only
  - Wait for 10 ms
  - Send first Startup IPI to that CPU only
  - Wait for up to 200 us or until "I started" flag set (whichever happens first)
  - If "I started" flag not set:
    - Send second Startup IPI to that CPU only
    - Wait for up to maybe 500 ms or until "I started" flag set (whichever happens first)
    - If "I started" flag not set:
      - CPU failed to start (display error message and assume CPU is faulty and don't use it)
Starting CPUs one at a time like this costs at least 10 ms per CPU. To fix that (and boot faster) there are various ways to start CPUs in parallel (safely). One way is to send the INIT IPI to (up to) 4 CPUs, then wait for 10 ms once, then do the Startup IPIs one CPU at a time. In this case, with 128 CPUs it'd take at least 320 ms to start all of them. Another way is to have one CPU start another CPU, then both of those CPUs start a CPU each, then all 4 CPUs start a CPU each, and so on. In that case it'd take at least 70 ms to start 128 CPUs. These can be combined - e.g. one CPU starts 4 CPUs, then all 5 CPUs start 4 more CPUs each, then all 25 CPUs start 4 CPUs each, etc. This is the fastest (and most complicated) way, and adds up to at least 40 ms to start 128 CPUs.
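The lower bounds quoted in the paragraph above can be reproduced with a few lines of C. The sketch counts only the 10 ms wait after each INIT IPI and ignores everything else, which is why the results are "at least" figures; the function names are invented for this example, and both `batch` and `fanout` must be at least 1.

```c
/* Wait after each INIT IPI, in milliseconds. */
#define INIT_WAIT_MS 10u

/* One CPU sends INIT to `batch` APs at a time, with one 10 ms wait
   per batch. `aps` is the number of CPUs still to be started. */
static inline unsigned batched_start_ms(unsigned aps, unsigned batch)
{
    return ((aps + batch - 1) / batch) * INIT_WAIT_MS;
}

/* Every running CPU starts `fanout` more CPUs in each 10 ms round,
   beginning with a single running CPU. */
static inline unsigned tree_start_ms(unsigned total_cpus, unsigned fanout)
{
    unsigned running = 1;
    unsigned rounds = 0;
    while (running < total_cpus) {
        running += running * fanout;
        rounds++;
    }
    return rounds * INIT_WAIT_MS;
}
```

For 128 CPUs (127 of them APs), batched_start_ms(127, 4) gives 320, tree_start_ms(128, 1) gives 70 and tree_start_ms(128, 4) gives 40, matching the numbers in the text.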
In any case; when you're starting CPUs in parallel (safely) you're going to want a different trampoline for each CPU. For example, if you start (up to) 4 CPUs in parallel, you're going to want 4 copies of the trampoline (with different values for "address of top of stack", separate "I started" flags, etc).
With multiple separate trampolines you need to adjust the "vector" field in the Startup IPI to tell the CPU which trampoline it should use.
Cheers,
Brendan
Re: multi-core initialization -- 16/32 bit issues
Brendan wrote: To fix that (and boot faster) there's various ways to start CPUs in parallel (safely). One way is to send the INIT IPI to (up to) 4 CPUs, then wait for 10 ms once, then do the Startup IPIs one CPU at a time. In this case, with 128 CPUs it'd take at least 320 ms to start all of them. Another way is to have one CPU start another CPU, then both of those CPUs start a CPU each, then all 4 CPUs start a CPU each, and so on. In that case it'd take at least 70 ms to start 128 CPUs. These can be combined - e.g. one CPU starts 4 CPUs, then all 5 CPUs start 4 more CPUs each, then all 25 CPUs start 4 CPUs each, etc. This is the fastest (and most complicated) way, and adds up to at least 40 ms to start 128 CPUs.
We had a similar discussion a few years back, but I'm still wondering if there is something wrong with my proposed routine.
Code: Select all
Set up trampoline (I use only one)
Build an array of structs, containing LAPIC ID and CPU_STATUS = Not_started, containing each CPU mentioned by ACPI or MultiProcessor Specification
Set boot CPU's status as Running (for convenience)
Allocate necessary number of stacks, put an array of pointers to known location
For each item in array, where CPU_STATUS == Not_started:
    Send INIT IPI
Wait 10 ms
For each item in array, where CPU_STATUS == Not_started:
    Send Startup IPI
Wait 200 us, or until all items in array have CPU_STATUS == Running
If there are CPUs not running:
    For each item in array, where CPU_STATUS == Not_started:
        Send Startup IPI
    Wait for up to maybe 500 ms or until all items in array have CPU_STATUS == Running
    If there are CPUs not running:
        Report failed CPUs or ...
Clean up
Then each AP on starting up:
Code: Select all
Loads GDT, switches mode, enables paging, etc.
Obtains next available stack from prepared stack array (using proper locking, of course)
Retrieves its own LAPIC id
Finds corresponding item in CPU state array and Sets CPU_STATUS = Running
Signals Boot CPU that it should re-check CPU array
Wouldn't it take 10 ms + 200 us + whatever time it takes to send IPIs, regardless of number of CPUs? Are there any pitfalls?
Re: multi-core initialization -- 16/32 bit issues
Hi,
Velko wrote: We had a similar discussion a few years back, but I'm still wondering if there is something wrong with my proposed routine. [...] Wouldn't it take 10 ms + 200 us + whatever time it takes to send IPIs, regardless of number of CPUs? Are there any pitfalls?
If it takes 10 us to send an IPI (e.g. before the "delivery status" flag clears and it's safe to send the next IPI) and you have 128 CPUs, how long does it take to send 127 separate Startup IPIs? In this case, the first CPU would have already waited for 1270 us before you even begin the "wait 200 us" delay.
Without knowing how long it takes to send an IPI, the only thing you can know is that all the time delays may be much longer than intended.
I don't know if "time delays may be much longer than intended" can cause issues or not. Maybe it's fine on all CPUs that exist now (and maybe it's not), and maybe next year Intel will decide to do "after 400 us CPU decides it should go back to waiting for INIT IPI" and it breaks.
Cheers,
Brendan
Re: multi-core initialization -- 16/32 bit issues
Brendan wrote:
jmp far [cs:0xFF8] ;Load 32-bit CS and jump to somewhere in kernel-space
How do I code that in gcc/gas? And what exactly has to be in location cs:0xFF8? Do I need to have a 4-byte absolute address or a 6-byte address with the first two bytes being my cs selector (so 0x8 or something like that), and the next 4-bytes being an offset from the base of that selector?
thanks
graham
Re: multi-core initialization -- 16/32 bit issues
Brendan
I have tried your suggestion re a trampoline and can't make it work. Conceptually I get it -- it's quite straightforward -- but the assembler is tripping me up (especially since I learned assembler on a machine with no stupid segments).
For your trampoline you have code like this:
Code: Select all
mov esp,[cs:0xFF8]
lgdt [cs:0xFF0]
...
jmp far [cs:0xFF8] ;Load 32-bit CS and jump to somewhere in kernel-space
So the way I read the first line is that:
a) we calculate a linear address A = cs * 16 + 0xFF8
b) we load the 32-bit value at A into the esp register, so esp = *A (in C-speak)
Is that correct?
If that is correct then I assume that the second line says:
a) we calculate a linear address A = cs * 16 + 0xFF0
b) A should be the linear address of 6 bytes -- the first 2 of which are a size, and the last 4 of which are the linear address of a global descriptor table (so A is the address of a gdtr)
And the final line says:
a) we calculate a linear address A = cs * 16 + 0xFF8
b) A should be the linear address of ???? -- I am not sure how we specify the new value of cs, and the offset .. I am not sure what is at address A.
I am guessing my understanding isn't correct, since I can't figure out why my trampoline doesn't work.
graham
Re: multi-core initialization -- 16/32 bit issues
Hi,
gmatthews wrote: For your trampoline you have code like this:
Code: Select all
mov esp,[cs:0xFF8]
lgdt [cs:0xFF0]
...
jmp far [cs:0xFF8] ;Load 32-bit CS and jump to somewhere in kernel-space
So the way I read the first line is that:
a) we calculate a linear address A = cs * 16 + 0xFF8
b) we load the 32-bit value at A into the esp register, so esp = *A (in C-speak)
Is that correct?
Yes.
gmatthews wrote: If that is correct then I assume that the second line says:
a) we calculate a linear address A = cs * 16 + 0xFF0
b) A should be the linear address of 6 bytes -- the first 2 of which are a size, and the last 4 of which are the linear address of a global descriptor table (so A is the address of a gdtr)
Yes.
gmatthews wrote: And the final line says:
a) we calculate a linear address A = cs * 16 + 0xFF8
b) A should be the linear address of ???? -- I am not sure how we specify the new value of cs, and the offset .. I am not sure what is at address A.
I am guessing my understanding isn't correct, since I can't figure out why my trampoline doesn't work.
For this case, the memory at "[cs:0xFF8]" would contain the values to load into CS and EIP (the CS and EIP to jump to).
It's a little bit like calling a function via a function pointer in C, where the function pointer contains the address of the function; except that it's a jump and not a call (so it'd be more like "goto myFunctionPointer();", which isn't something that a C compiler will appreciate), and except that it loads CS and EIP (and doesn't just load EIP).
Cheers,
Brendan
Re: multi-core initialization -- 16/32 bit issues
Hi Brendan
You wrote:
For this case, the memory at "[cs:0xFF8]" would contain the values to load into CS and EIP (the CS and EIP to jump to).
So if A = cs * 16 + 0xFF8, what does A actually point to? 6 bytes -- the first 2 being a 16 bit CS, the next 4 being a 32 bit EIP?
graham
Re: multi-core initialization -- 16/32 bit issues
Hi,
gmatthews wrote: So if A = cs * 16 + 0xFF8, what does A actually point to? 6 bytes -- the first 2 being a 16 bit CS, the next 4 being a 32 bit EIP?
80x86 is "little-endian"; which means the small end goes first - the first 4 bytes would be EIP and then next 2 bytes would be CS.
Note that in 16-bit code you'd probably end up with a 16-bit far jump (with 16-bit IP) as default, and you'd have to tell the assembler that you want a 32-bit jump instead.
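To make the byte order concrete, here is a small C sketch that writes the 6-byte operand a 32-bit indirect far jump reads. The function name is made up for the example; the layout (4-byte EIP, then 2-byte CS, each least significant byte first) is the one described above.

```c
#include <stdint.h>

/* Encode a 6-byte far pointer for a 32-bit indirect far jump:
   bytes 0..3 hold the offset (EIP), bytes 4..5 hold the selector
   (CS), each stored least significant byte first. */
static inline void encode_far_pointer(uint8_t out[6],
                                      uint16_t selector, uint32_t offset)
{
    out[0] = (uint8_t)(offset);
    out[1] = (uint8_t)(offset >> 8);
    out[2] = (uint8_t)(offset >> 16);
    out[3] = (uint8_t)(offset >> 24);
    out[4] = (uint8_t)(selector);
    out[5] = (uint8_t)(selector >> 8);
}
```

For selector 0x08 and a kernel entry point at 0x00100000, the six bytes stored at the jump operand would be 00 00 10 00 08 00.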
Cheers,
Brendan
Re: multi-core initialization -- 16/32 bit issues
Thanks for the help Brendan. I knew the chip was little endian but would never have thought that would extend to CS:EIP pairs. Again thanks for all the help -- works now!
graham