OSDev.org

Posted: **Tue Nov 11, 2008 4:17 pm**

A very stupid question on PAE.
This is a 32-bit compiled, single-threaded, NON-PAGING, x86-based RTOS. If a PAE-enabled Linux host gives a 64-bit host address (high_32 & low_32), is there any trick by which this 32-bit RTOS can copy data from that 64-bit address? The RTOS has no PAE and the GDT is set for up to 4GB. AFAIK, such a thing is not possible, but I'm just throwing this question out of optimism.
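
For reference, a minimal sketch of how the two halves combine into one physical address, and how to tell whether a transfer touches memory at or above 4 GiB. The function names are hypothetical, not taken from the RTOS or from Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Join the two 32-bit halves handed over by the host into one
 * 64-bit physical address. (Hypothetical helper name.) */
uint64_t phys_from_halves(uint32_t high_32, uint32_t low_32)
{
    return ((uint64_t)high_32 << 32) | low_32;
}

/* True if any byte of [addr, addr + len) lies at or above 4 GiB,
 * i.e. outside what a flat 32-bit segment without paging can reach. */
bool needs_high_mapping(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    assert(addr <= UINT64_MAX - (len - 1)); /* range must not overflow */
    return addr + (len - 1) > UINT32_MAX;
}
```

A buffer that ends exactly at the 4 GiB boundary is still reachable without any tricks; one byte further and it is not.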

Posted: **Tue Nov 11, 2008 4:53 pm**

What host? If you run your OS in a virtual machine, then the virtual machine monitor takes care of all of that. Otherwise, your question is meaningless.

Posted: **Tue Nov 11, 2008 5:05 pm**

I'm not sure I understand what you're asking. Are you trying to make the host and guest OS communicate? Like when using VMware Tools or VirtualBox Guest Additions?

Posted: **Tue Nov 11, 2008 7:20 pm**

Love4Boobies wrote:I'm not sure I understand what you're asking. Are you trying to make the host and guest OS communicate? Like when using VMware Tools or VirtualBox Guest Additions?

The host (linux) and the guest-OS (my os) are already communicating and transferring data in 32-bit.

Posted: **Tue Nov 11, 2008 11:26 pm**

Hi,

sawdust wrote:This is a 32-bit compiled, single-threaded, NON-PAGING, x86-based RTOS. If a PAE-enabled Linux host gives a 64-bit host address (high_32 & low_32), is there any trick by which this 32-bit RTOS can copy data from that 64-bit address? The RTOS has no PAE and the GDT is set for up to 4GB. AFAIK, such a thing is not possible, but I'm just throwing this question out of optimism.

To access anything in the physical address space that's above 4 GiB you normally need to use PAE, PSE36 or long mode. If paging isn't an option, then...

One thing you might be able to try is bus mastering on a 64-bit PCI device. For example, ask a SATA controller to transfer the data onto disk sectors (using 64-bit bus mastering), and then read those sectors back into memory below 4 GiB (using any method you like). If there is no suitable 64-bit PCI device, then a 32-bit PCI device might still work *if* you've got an I/O MMU you can use.

Failing that, there might be a way to do something with the memory controller/northbridge (e.g. disable one bank of RAM and reconfigure other banks of RAM so they appear at lower addresses). The chance of this working reliably for a specific memory controller is extremely small - there's lots of things to be careful of (CPU caches, SMM, etc). The chance of this working reliably for all memory controllers is virtually zero.

Lastly, there might be special features in the CPU's MSRs. For example, 64-bit Athlons have a "32-bit Address Wrap Disable" flag (bit 17 in the HWCR Register, MSR 0xC0010015), which I assume allows you to have a 32-bit segment where the base is 0xFFFFFFFF and the limit is 4 GiB, that could be used to access physical addresses 0xFFFFFFFF to 0x1FFFFFFFE.
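
The arithmetic behind that address range can be modelled in a few lines of C. This is only a sketch of the assumed behaviour (base plus offset, truncated to 32 bits unless the wrap is disabled); it does not touch any MSR, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the linear address produced by a segmented access.
 * Assumption: with the wrap enabled the sum is truncated to 32 bits,
 * with the wrap disabled the carry into bit 32 is kept. */
uint64_t linear_address(uint32_t base, uint32_t limit,
                        uint32_t offset, bool wrap_disabled)
{
    /* A real CPU would raise a fault on a limit violation;
     * this model just refuses the input. */
    assert(offset <= limit);

    uint64_t sum = (uint64_t)base + offset;
    return wrap_disabled ? sum : (sum & 0xFFFFFFFFULL);
}
```

With base 0xFFFFFFFF and the largest offset 0xFFFFFFFF, the model gives 0x1FFFFFFFE when the wrap is disabled and 0xFFFFFFFE when it is not, which matches the range quoted above.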

IMHO probably the best option for a "no paging OS" is to have a function that enables paging and PAE, copies data to/from an address, and then disables paging - basically a function that does 64-bit block memory copies. It'd still mean that almost everything runs without paging; and it'd also be useful for other things (e.g. on a computer with 12 GiB of RAM, you could use the normally inaccessible RAM as swap space).

Cheers,

Brendan

Posted: **Wed Nov 12, 2008 10:28 am**

Brendan wrote: IMHO probably the best option for a "no paging OS" is to have a function that enables paging and PAE, copies data to/from an address, and then disables paging - basically a function that does 64-bit block memory copies. It'd still mean that almost everything runs without paging; and it'd also be useful for other things (e.g. on a computer with 12 GiB of RAM, you could use the normally inaccessible RAM as swap space).

Hi Brendan,
Thanks a lot for the very thoughtful ideas. Since my OS is a single-task one, do you recommend that I go with 'bare minimum paging'? I have no need for mallocs.
My application does a lot of memcpy, so enabling PAE mode back and forth seems less desirable. Currently the GDT is set for full 4GB segments, using 1:1 addressing, and I'm afraid to change too much.
All of your suggestions and pointers are appreciated.
TIA

Posted: **Wed Nov 12, 2008 11:46 am**

Hi,

sawdust wrote:Thanks a lot for the very thoughtful ideas. Since my OS is a single-task one, do you recommend that I go with 'bare minimum paging'? I have no need for mallocs.

That depends - is there a reason you chose to not use paging to start with?

For example, I can imagine an RTOS for small/embedded systems where the unpredictability involved with TLB misses is undesirable (although to be honest, this doesn't seem like a good reason after you consider the unpredictability that SMM introduces on 80x86 systems).

sawdust wrote:My application does a lot of memcpy, so enabling PAE mode back and forth seems less desirable.

Would enabling PAE and leaving it enabled remove the need for a lot of these memory copies? From a performance perspective, the overhead of paging (TLB misses, etc) may be much less than the overhead of segmentation (dealing with physical RAM fragmentation).

sawdust wrote:Currently the GDT is set for full 4GB segments, using 1:1 addressing, and I'm afraid to change too much.

For a "64-bit memory copy" function, you wouldn't necessarily need to change anything - just allocate some pages (page directory pointer table, page directory, and at least one page table), identity map the page/s used by the "64-bit memory copy" function, map the source pages, map the destination pages, disable IRQs, enable paging/PAE, do the copy, then disable paging/PAE, enable IRQs and free any pages you allocated (unless they're permanently allocated to save allocation/deallocation time).
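
The table-building part of those steps can be sketched in C. This is a minimal illustration under stated assumptions, not a complete implementation: the helper names are hypothetical, the privileged steps (loading CR3, setting CR4.PAE and CR0.PG) appear only as comments, and the address mask assumes the architectural 52-bit maximum, while a given CPU may support fewer physical address bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAE_PRESENT   (1ULL << 0)
#define PAE_WRITABLE  (1ULL << 1)
#define PAE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 51:12 */

/* Indices into the three PAE table levels for a 32-bit virtual address. */
typedef struct {
    unsigned pdpt; /* bits 31:30, 4 entries   */
    unsigned pd;   /* bits 29:21, 512 entries */
    unsigned pt;   /* bits 20:12, 512 entries */
} pae_index_t;

pae_index_t pae_split(uint32_t vaddr)
{
    pae_index_t i;
    i.pdpt = vaddr >> 30;
    i.pd   = (vaddr >> 21) & 0x1FF;
    i.pt   = (vaddr >> 12) & 0x1FF;
    return i;
}

/* 64-bit page table entry (also usable for a page directory entry that
 * points at a page table) for a 4 KiB-aligned physical address. */
uint64_t pae_pte(uint64_t phys, uint64_t flags)
{
    assert((phys & 0xFFFULL) == 0);        /* must be page aligned */
    assert((phys & ~PAE_ADDR_MASK) == 0);  /* must fit in bits 51:12 */
    return phys | flags | PAE_PRESENT;
}

/* PDPT entries in 32-bit PAE mode keep bits 2:1 reserved,
 * so only the present bit is set here. */
uint64_t pae_pdpte(uint64_t pd_phys)
{
    assert((pd_phys & 0xFFFULL) == 0);
    return pd_phys | PAE_PRESENT;
}

/* Outline of the copy itself (privileged, so not shown as code):
 *   1. fill the PDPT, page directory and page table with the entries above
 *      (identity map for the copy routine, plus source and destination)
 *   2. disable IRQs
 *   3. load CR3 with the PDPT address, set CR4.PAE, then set CR0.PG
 *   4. copy through the mapped virtual window
 *   5. clear CR0.PG, clear CR4.PAE, enable IRQs
 */
```

For example, mapping the virtual page at 0xC0201000 to physical 0x123456000 means writing `pae_pte(0x123456000ULL, PAE_WRITABLE)` into slot `pae_split(0xC0201000).pt` of the page table selected by the `.pdpt` and `.pd` indices.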

However, in this case you'd introduce extra IRQ latency, which can be bad for a RTOS. For example, if someone asks to copy a large amount of data then IRQs could be disabled for far too long. One way to avoid that would be to have IRQ handlers that work when paging/PAE is enabled, which could become considerably complicated as you'd need to worry about everything that any IRQ handler could rely on. An alternative way would be to limit the size of each memory copy (e.g. split a large memory copy into many small memory copies); or perhaps have some IRQ handlers that are used when paging is enabled that disable paging and call the real IRQ handlers and then enable paging again after the real IRQ handlers return.
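
The "many small memory copies" idea might look roughly like this. It is a hedged sketch: `irq_disable` and `irq_enable` are hypothetical empty stand-ins for the RTOS's own primitives, `COPY_CHUNK` is an arbitrary illustrative size, and a plain `memcpy` stands in for the paged copy so the loop can be tried in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_CHUNK 4096u /* illustrative; tune to the IRQ latency budget */

/* Hypothetical stand-ins for the RTOS's interrupt primitives. */
void irq_disable(void) { /* e.g. cli */ }
void irq_enable(void)  { /* e.g. sti */ }

/* Copy len bytes in pieces of at most COPY_CHUNK bytes so that IRQs are
 * never held off for longer than one piece takes. Returns the number of
 * pieces used. */
size_t copy_in_chunks(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t chunks = 0;

    assert(dst != NULL && src != NULL);

    while (len > 0) {
        size_t n = len < COPY_CHUNK ? len : COPY_CHUNK;

        irq_disable();
        /* In the real function: enable paging/PAE, copy through the
         * mapped window, then disable paging/PAE again. */
        memcpy(d, s, n);
        irq_enable(); /* pending IRQs get serviced between pieces */

        d += n;
        s += n;
        len -= n;
        chunks++;
    }
    return chunks;
}
```

The trade-off is that a smaller chunk bounds the worst-case IRQ latency more tightly but pays the mode-switch cost more often per byte copied.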

Mostly, I need to know why you've made the design decisions you have. Without knowing why, my default opinions start taking over (e.g. IMHO 80x86 is a bad architecture for "hard real time", and "soft real time" is mostly just a marketing term for any general purpose OS. Single-tasking is mostly pointless now ("OMG! I've got 16 CPUs and I can only run *one* task???") and "no paging" sounds like "masochist" to me)...

Cheers,

Brendan

Posted: **Wed Nov 12, 2008 3:44 pm**

Brendan wrote:Hi,
That depends - is there a reason you chose to not use paging to start with?
Mostly, I need to know why you've made the design decisions you have. Without knowing why, my default opinions start taking over (e.g. IMHO 80x86 is a bad architecture for "hard real time", and "soft real time" is mostly just a marketing term for any general purpose OS. Single-tasking is mostly pointless now ("OMG! I've got 16 CPUs and I can only run *one* task???") and "no paging" sounds like "masochist" to me)...
Brendan

Ease of implementation was the only reason for not choosing paging when I started. Now that I'm faced with this >4GiB access, it looks like I should go with paging. My RTOS app is a kind of dedicated weather-bug. It gets raw data from hard drives and does weather calculations. This involves heavy disk I/O and dedicated processing.

Posted: **Wed Nov 12, 2008 11:02 pm**

Hi,

sawdust wrote:Ease of implementation was the only reason for not choosing paging when I started. Now that I'm faced with this >4GiB access, it looks like I should go with paging. My RTOS app is a kind of dedicated weather-bug. It gets raw data from hard drives and does weather calculations. This involves heavy disk I/O and dedicated processing.

I tend to start a new rewrite every 12 months or so - when I'm writing one version of my OS I learn more, and when I've learnt enough my previous designs don't sound as good or as ambitious as they used to, and I don't like spending time working on a project that isn't as good as possible. That's just me though...

If you were god and had an infinite amount of time and an infinite amount of knowledge and wanted to write the perfect OS for weather calculations, what would that OS look like?

Cheers,

Brendan

Posted: **Thu Nov 13, 2008 12:00 pm**

Brendan wrote: I tend to start a new rewrite every 12 months or so - when I'm writing one version of my OS I learn more, and when I've learnt enough my previous designs don't sound as good or as ambitious as they used to, and I don't like spending time working on a project that isn't as good as possible. That's just me though...
Brendan

You are quite right. I wasn't being adventurous enough to change my design. This is just my evening pet project and I should be ready to change it as it demands.

A dumb qn on 64-bit access
