Jumping to GRUB module address causes RPL != CPL in Bochs

nergzd723
Posts: 10
Joined: Thu Oct 24, 2019 7:01 am

Post by nergzd723 »

Thank you, MichaelPetch!
I've done some research and discovered that even a completely bare-bones kernel (with a GDT, obviously) can't jump to that location. I tried doing the same without a GDT, but then it does something strange:

00078268775e[DEV  ] write to port 0x0000 with len 4 ignored
00078269025e[DEV  ] read from port 0x0000 with len 4 returns 0xffffffff
00078269031e[CPU0 ] read_RMW_virtual_dword_32(): segment limit violation
The segment limit violation is expected, but those port reads and writes are strange...
The sequence of code that is looping repeatedly is the while statement here
It also seems to have somehow broken port I/O, since COM1 is not working. Why would that happen?
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

MichaelPetch wrote:Some of what bzt has told you is inaccurate about what you can do without loading your own GDT. He seems to be under the mistaken impression that you can't put things on the stack without setting your own SS (it is correct you do need to set ESP).
No, I wasn't incorrect. I actually had a bug with Multiboot recently where SS was set correctly but the shadow SS register was not, and it caused trouble. I might add that I also had exactly the same strange I/O read/write errors at address 0. It's better to be safe than sorry, so set the segments to a known value.
Plus, changing ESP __inside__ a function will lose the return address; I'm 100% certain of that.

With all the other things you said I agree.
nergzd723 wrote:Why does it just loop that? call -469 returns execution to beginning of it.
That's because the loop is waiting for a condition that always returns false. My guess would be "is_transmit_empty" has a bug.

Create a symbol table for your kernel:

$ nm kernel.elf | colrm 16 18 | grep -ve '^[^0-9]' >kernel.sym
Then load it into the Bochs debugger with:

bochs:0> ldsym global kernel.sym
This way you will see function (and variable) names instead of just a cryptic "call -469". Furthermore, debugging with qemu -S -s plus gdb, with the symbol table loaded from a kernel compiled with the "-g" flag, will show you the exact C source line next to each executed instruction.

Also with the bochs debugger, you can dump the state of the UART registers to see if "is_transmit_empty" is checking the correct bit in the correct UART register or not.
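
A minimal sketch of what such a check usually looks like, assuming a 16550-compatible UART at the conventional COM1 base 0x3F8. The OP's actual is_transmit_empty is not shown in this thread, so the port-read hook (port_read_fn) and the fake_inb stand-in below are hypothetical, added only so the logic can be exercised outside a kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COM1_BASE 0x3F8u  /* conventional COM1 I/O base (assumption) */
#define UART_LSR  5u      /* Line Status Register offset from the base */
#define LSR_THRE  0x20u   /* bit 5: transmit holding register empty */

/* Hypothetical hook: a kernel would pass a wrapper around the x86 `in` instruction. */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* Returns 1 when the UART can accept another byte, 0 otherwise. */
int is_transmit_empty(port_read_fn inb)
{
    assert(inb != NULL);
    return (inb((uint16_t)(COM1_BASE + UART_LSR)) & LSR_THRE) != 0;
}

/* Stand-in for the hardware so the check can be tried in user space. */
static uint8_t fake_lsr_value;

uint8_t fake_inb(uint16_t port)
{
    /* Only the LSR is modelled; every other port reads back as 0xFF. */
    return port == (uint16_t)(COM1_BASE + UART_LSR) ? fake_lsr_value : 0xFF;
}
```

If the real function tests a different bit or a different register offset, a loop of the form "while the transmitter is not empty" may never terminate, which would match the symptom described above.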

Cheers,
bzt
MichaelPetch
Member
Posts: 797
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Post by MichaelPetch »

bzt wrote:No, I wasn't incorrect. I actually had a bug with Multiboot recently where SS was set correctly but the shadow SS register was not, and it caused trouble. I might add that I also had exactly the same strange I/O read/write errors at address 0. It's better to be safe than sorry, so set the segments to a known value.
How about this, you show me a minimal complete verifiable example (MCVE) where that happened and we will talk. There are very few reasons that situation could ever occur. One is that you are on an old 386 processor, where a LOADALL instruction was used to load a valid value into the segment register (which is only used for display purposes) alongside invalid entries in the descriptor cache. The others would be SMM doing the same thing, a bug in an emulator, or that what you believe you saw isn't accurate and the problem was something else.

I've been using Multiboot loaders (GRUB and QEMU `-kernel` option) for OS dev since late 90s and have never experienced these issues, whether the Multiboot loader was running in BOCHS, QEMU, or on real hardware. Show me a piece of code that behaves the way you claim (and the environment it is running in) and I am more than willing to discuss it.

If you DO NOT reload ANY of the segment registers including SS you do not have to load your own GDT. You could write an entire (useless) OS without interrupts/traps/exceptions/hardware task switching with no FAR calls and Jumps and FAR returns (or IRET), or direct manipulation of a segment register (ie MOV/POP) and be completely safe. If you are observing what you suggest there is something else amiss and likely has nothing to do with Multiboot at all but some more fundamental problem.
bzt wrote:Plus, changing ESP __inside__ a function will lose the return address; I'm 100% certain of that.
Nowhere did I say anything about modifying ESP inside a function. I only said that you need to initialize ESP to something; what I didn't specifically say is that this needs to be done prior to using the stack. I assumed that was a given! If you want to use the stack, you need to set ESP, as Multiboot makes no guarantees about the value in that register when your kernel starts running. You do not have to set SS before using the stack (it is a selector cached with flat memory model attributes, base, and limit). It is that simple.

It should be noted that using interrupts without setting your own GDT is a problem, because that will reload CS and eventually SS (and any other segment registers your interrupt handler changes). I already mentioned that in my original post (about not being able to use interrupts because that requires segment registers to be reloaded directly and/or indirectly).
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

Hi,

I see you had a bad day. Please calm down, no one is attacking you.
MichaelPetch wrote:How about this, you show me a minimal complete verifiable example (MCVE) where that happened and we will talk.
You can find a minimal PoC on this very forum. In order to use the BIOS, my code switched back to real mode, and some shadow registers were left unchanged, causing faults in the BIOS routines and producing exactly the same error messages in bochs. (I never said nor wanted to imply the OP's problem is also BIOS related, or related to my exact PoC, other than that it resulted in the same error message. I just said it could be a stack segment issue that causes their bug, just as it did in my case. Until you rule that out, it remains a possibility.)

Sure, you can write code that won't directly refer to SS and avoids all far addressing, exceptions, interrupts, etc., if you don't want to use your kernel for anything meaningful (just a hello world). All I'm saying is that it is better to simply set up the GDT and the segment registers too. It won't hurt, and then you can be sure you've eliminated all (possible) segment related issues for good, including implicit segment reloads. I've spoken :-)
MichaelPetch wrote:I've been using Multiboot loaders (GRUB and QEMU `-kernel` option) for OS dev since late 90s
Good for you. Grub was always an overcomplicated and overengineered mess for my taste. That's why I've implemented my own loader, and I have absolutely no regrets. I made it Multiboot compliant just because I can, then I happily forgot about the whole thing and used my loader as a native OS loader under multiple platforms ;-) Once or twice a year I check it with Grub too, just to see if it's still working (that they haven't changed Grub or qemu in an incompatible way), but that's all.
MichaelPetch wrote:Nowhere did I say anything about modifying ESP inside a function.
Nope, you didn't, the OP did. Their code did exactly that, which is what I warned them about in my post. Since you replied to that particular post saying I was incorrect, I just made it clear. One should indeed not change ESP inside a function (unless they know exactly what they're doing, like in a task switcher for example).

Cheers,
bzt
MichaelPetch
Member
Posts: 797
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Post by MichaelPetch »

TL;DR After a Multiboot loader transfers control to your code, to use the stack you only need to set ESP to a usable memory address; you do not need to set SS. SS (and all the other segment registers) will still have flat memory model characteristics in their associated descriptor cache entries until the segment registers are reloaded. If you don't reload a segment register there is no problem.

---
bzt wrote:Hi,
I see you had a bad day. Please calm down, no one is attacking you.
Who said anyone was attacking me? I'm dealing with someone who makes incorrect statements and pushes misinformation to people.

You clearly never read what I said in my original post, or you decided to read it and see it say something else. You made the claim that:
Calling a function requires accessing SS as well as ESP, neither is at a known state, you should not rely on them.
This statement is FALSE. ESP doesn't have any guaranteed value (which I said in my original comment and was a point I agreed with you on), but SS is guaranteed to have a descriptor cache with a flat memory model. SS isn't guaranteed to be any particular selector value either. What also isn't guaranteed is that you can reload SS with a selector (even the same one) without your own GDT and corresponding GDT Record. I stated in other words that AS LONG AS YOU DO NOT LOAD A SEGMENT REGISTER EITHER DIRECTLY OR INDIRECTLY, YOU DO NOT NEED ANYTHING OTHER THAN WHAT A MULTIBOOT LOADER PROVIDED.

This means that in order to use an instruction that references the stack in 32-bit protected mode you only need to set ESP. You DO NOT have to set SS. It will just work. Period, end of story. Anything else is nonsense unless you have a CPU with a bug. The access info, base, limit etc. come from the descriptor cache, which was loaded when the Multiboot loader created a GDT and set up all the segment registers initially (flat model) - THAT is guaranteed by the spec. After that the descriptor cache is referenced for all memory operands. The GDT Record (or LDT record) is not referenced (nor is the GDT or LDT itself) when you use a memory operand - period.
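
The distinction between using a segment register and loading one can be illustrated with a deliberately simplified C model. This is a sketch, not real hardware behavior: struct desc, struct seg_reg, seg_translate and seg_load are hypothetical names, and privilege, type, TI and null-selector checks are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative descriptor: real descriptors carry more fields than this. */
struct desc {
    uint32_t base;
    uint32_t limit;
    bool     present;
};

struct seg_reg {
    uint16_t    selector;  /* visible part */
    struct desc cache;     /* hidden part, filled in when the register is loaded */
};

/* Memory access: only the cached descriptor is consulted, never the table. */
bool seg_translate(const struct seg_reg *s, uint32_t offset, uint32_t *linear)
{
    assert(s != NULL && linear != NULL);
    if (!s->cache.present || offset > s->cache.limit)
        return false;                    /* the CPU would fault here */
    *linear = s->cache.base + offset;
    return true;
}

/* Segment load: the only place in this model where the table is read. */
bool seg_load(struct seg_reg *s, uint16_t selector,
              const struct desc *gdt, size_t gdt_entries)
{
    assert(s != NULL);
    size_t index = (size_t)(selector >> 3);
    if (gdt == NULL || index >= gdt_entries || !gdt[index].present)
        return false;                    /* unusable table: the load fails */
    s->selector = selector;
    s->cache = gdt[index];
    return true;
}
```

In this model a register whose cache already holds a flat descriptor keeps translating addresses even if the table pointer is unusable; only seg_load depends on the table.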

Things change when you want to LOAD a segment register with a selector. That is when the GDT (or the LDT) will be accessed (via the GDT record, which Multiboot may leave in an invalid state), and it is at that point that things are undefined. It may or may not work, but you can't make any assumptions about it. Loading a selector into a segment register requires the CPU to look at the GDT or LDT to get the descriptor information. That requires a GDT (or LDT) record loaded into the CPU that points at a GDT/LDT that has valid descriptors in it. The Multiboot spec makes it clear that the GDT record (and what it points at) may be invalid and can't be relied on. What is in the descriptor cache is valid, as Multiboot guarantees that the entries in the descriptor cache are set up as flat data and code segments.

With that being said it is very safe to set ESP to an appropriate memory address. The value in ESP isn't guaranteed to be usable (by the Multiboot spec) so you do need to set it before using the stack (like pushing parameters on the stack and calling into a kernel entry point). The kernel code can then do as it pleases until it needs to load a segment register (directly or indirectly). If you want interrupts you'll need a GDT. The kernel can set that up when it needs it. That would require reloading the segment registers with appropriate selectors.
bzt wrote: You can find a minimal PoC on this very forum. In order to use the BIOS, my code switched back to real mode, and some shadow registers were left unchanged, causing faults in the BIOS routines and producing exactly the same error messages in bochs.
Again, this isn't what I am referring to. To enter real mode your code needs to load new values into the segment registers. That is where you need to have a GDT that is in a known good state and a GDT record (or LGDT) that points at it. If you load the SS register you can of course run into problems, because you are loading a segment register and without a valid GDT record it may not work as expected (if at all).

I'm looking for an example that fails when you NEVER modify any of the segment registers (directly or indirectly). What you are talking about is not what I am talking about. The Multiboot spec doesn't guarantee that a particular selector is the code or data segment selector. Those values can change between implementations (real GRUB and the QEMU `-kernel` option behave differently). Barring a CPU bug or a buggy Multiboot-compliant bootloader, all the segment registers and descriptor cache entries are loaded with the required flat model values per the Multiboot spec.
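
For reference, the 16-bit selector value held in a segment register breaks down into a table index, a table indicator bit and an RPL, which is the field named in the thread title. A minimal C sketch of that layout; struct selector_fields, selector_decode and selector_make are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    uint16_t index;  /* bits 3-15: entry number in the descriptor table */
    uint8_t  ti;     /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* bits 0-1: requested privilege level */
};

/* Split a raw selector into its three fields. */
struct selector_fields selector_decode(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* Build a raw selector from its fields. */
uint16_t selector_make(uint16_t index, uint8_t ti, uint8_t rpl)
{
    assert(index < 8192u);  /* 13-bit index */
    assert(ti <= 1u);
    assert(rpl <= 3u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

Two loaders can therefore hand over different selector values (for example 0x08 versus 0x10 for the code segment) while the cached descriptors behind them describe the same flat segment.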

I'm getting the general impression you don't understand how segment selectors, segment registers, descriptors, and the descriptor cache work at the architectural level. If you understood these things, and understood what the Multiboot spec actually says you would know what is safe and what isn't and why.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

MichaelPetch wrote:You DO NOT have to set SS. It will just work.
The OP has problems with the Grub-provided segment registers; they do not just work for them, and that's what this topic is all about.
MichaelPetch wrote:
Calling a function requires accessing SS as well as ESP, neither is at a known state, you should not rely on them.
This statement is FALSE.
Oh but it is true by all means. Read Intel Software Developer Manual 3A, Chapter 3 "Protected-mode Memory Management": All stack operations use the SS register to find the stack segment. This includes both CALL and RET, they access SS as well as ESP. According to the Multiboot spec, in the shadow stack segment descriptor of SS only the base and size are guaranteed, but not the other bits like privileges or direction (being the subject of this topic); and you also said yourself the ESP doesn't have any guaranteed value.

You may rely on Grub passing valid values in those registers, but you shouldn't. It is better to have all the bits in the shadow descriptor set to a known state not just base and size. I did not say "have to" or "must", I said "you should not rely on them" because it's better to be safe than sorry. Sorry if this wasn't obvious, I mean "have to", "must" and "should" in an RFC-sense.
MichaelPetch wrote:Again, this isn't what I am referring to.
I know. You elegantly forgot to quote the part where I said "I never said nor wanted to imply the OP's problem is also BIOS related, or related to my exact PoC". That was just yet another example which required not relying on the Grub-provided register values, nothing more.
MichaelPetch wrote:I'm getting the general impression you don't understand how segment selectors, segment registers, descriptors, and the descriptor cache work at the architectural level.
No need to lower yourself to personal insults. It is you who has forgotten that descriptors have much more than base and size fields, and that Multiboot does not specify those, just base and size. And just for the record, I learned these things about 25 years ago.

I see no point in continuing this discussion. I think it's safe to say we both agree that it's better for the OP to set up a known SS and ESP as soon as possible, and that it will likely solve their problem.

Cheers,
bzt