OSDev.org

Posted: **Sun Oct 30, 2022 9:02 am**

rdos wrote:
devc1 wrote:What if you have a high-end server computer with 20 hard drives and 64 GB of RAM? What would you do in this case?
I think this is one of the limitations that 64-bit mode addresses?
Not at all. My new disc buffering scheme will use physical addresses, not linear ones. File systems will be run the "micro kernel" way in their own processes, each having 2 GB of private linear memory. That's enough to cache metadata for any reasonable file system. Of course, each of the hard drives will have its own server process.

I don't think the issue is so much what cannot be done with protected mode, but rather how long protected mode will keep working and what performance drawbacks may appear as Intel and AMD optimize their processors for long mode.

This is not what I mean. I mean that a high-end server will use MMIO hard drive controllers (AHCI, NVMe, ...) because they are faster. And you can't access the full address space or the full RAM because your OS can only see 64 GB of address space.

Posted: **Sun Oct 30, 2022 10:51 am**

devc1 wrote:
rdos wrote:
devc1 wrote:What if you have a high-end server computer with 20 hard drives and 64 GB of RAM? What would you do in this case?
I think this is one of the limitations that 64-bit mode addresses?
Not at all. My new disc buffering scheme will use physical addresses, not linear ones. File systems will be run the "micro kernel" way in their own processes, each having 2 GB of private linear memory. That's enough to cache metadata for any reasonable file system. Of course, each of the hard drives will have its own server process.

I don't think the issue is so much what cannot be done with protected mode, but rather how long protected mode will keep working and what performance drawbacks may appear as Intel and AMD optimize their processors for long mode.
This is not what I mean. I mean that a high-end server will use MMIO hard drive controllers (AHCI, NVMe, ...) because they are faster. And you can't access the full address space or the full RAM because your OS can only see 64 GB of address space.

I have an AHCI driver, and I could write an NVMe driver (I probably will some day). My OS can see all physical RAM because PAE uses 64-bit page table entries, and so it can address all physical RAM. As I wrote before in the thread, my spectrum analyzer has 128 GB of RAM, and I can use all of it in my analysis program by memory mapping a 2M buffer at a time.
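For readers unfamiliar with PAE, here is a minimal sketch of the mechanism this relies on: linear addresses stay 32 bits wide, but every paging-structure entry is 64 bits, so the physical address stored in an entry can lie above 4 GB. The type and function names (`pae_index`, `pae_split`, `pae_make_2m_pde`) are hypothetical and are not the poster's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* PAE splits a 32-bit linear address into 2 + 9 + 9 + 12 bits. */
typedef struct {
    unsigned pdpt;    /* bits 31:30, selects 1 of 4 page directories      */
    unsigned pd;      /* bits 29:21, selects 1 of 512 directory entries   */
    unsigned pt;      /* bits 20:12, selects 1 of 512 page table entries  */
    unsigned offset;  /* bits 11:0,  byte offset inside a 4K page         */
} pae_index;

pae_index pae_split(uint32_t linear)
{
    pae_index ix;
    ix.pdpt   = linear >> 30;
    ix.pd     = (linear >> 21) & 0x1FF;
    ix.pt     = (linear >> 12) & 0x1FF;
    ix.offset = linear & 0xFFF;
    return ix;
}

#define PDE_PRESENT  (1ULL << 0)
#define PDE_WRITABLE (1ULL << 1)
#define PDE_LARGE    (1ULL << 7)   /* PS bit: entry maps a 2M page directly */

/* Build a page directory entry that maps one 2M page at a 64-bit
   physical address. The entry is 64 bits wide even in 32-bit mode. */
uint64_t pae_make_2m_pde(uint64_t phys)
{
    assert((phys & 0x1FFFFFULL) == 0);   /* must be 2M aligned */
    return phys | PDE_PRESENT | PDE_WRITABLE | PDE_LARGE;
}
```

Mapping a 2M buffer that sits at the 64 GB mark is then a matter of writing `pae_make_2m_pde(0x1000000000ULL)` into the directory slot chosen by `pae_split()` and invalidating the TLB for that range.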

The analyzer hardware is a PCIe FPGA that uses BAR0 as an array of physical addresses that it should write sample data to. The FPGA uses 128-byte PCIe transactions to write directly to physical memory. On the OS side, the OS allocates the desired number of 2M memory blocks, puts them in BAR0 and starts the data collection. Then the analysis program starts 12 processor cores that memory map their current buffer and do the analysis in parallel with sampling. I originally thought a high-end application like this would require long mode, but this is not so. It works perfectly well in protected mode too, even though it is a bit slower since I cannot use 64-bit registers.
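The buffer hand-off described above could look roughly like the sketch below. The register layout is an assumption made for illustration only: the post says BAR0 holds an array of physical addresses, but the 64-bit entry width, the alignment check, and the name `fill_block_table` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (2ULL * 1024 * 1024)   /* 2M sample buffers, as in the post */

/*
 * Write the physical address of each 2M block into a device-visible table.
 * `table` stands in for the mapped BAR0 window (assumed: 64-bit entries).
 * Returns the number of entries written. Stops early at the first block
 * that is not 2M aligned, or when the table is full.
 */
size_t fill_block_table(volatile uint64_t *table, size_t slots,
                        const uint64_t *block_phys, size_t count)
{
    assert(table != NULL && block_phys != NULL);
    size_t n = 0;
    while (n < count && n < slots) {
        if (block_phys[n] % BLOCK_SIZE != 0)
            break;                 /* assumption: device wants aligned buffers */
        table[n] = block_phys[n];  /* full 64-bit physical address */
        n++;
    }
    return n;
}
```

Nothing in this path needs 64-bit linear addresses: the driver only stores 64-bit physical addresses as data, which a 32-bit kernel can do with two 32-bit writes per entry.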

I have a limit for how much physical memory I support, but it is a software limit. I currently have a limit of 256 GB, but it can be increased if needed. This is based on the need to map the physical bitmap in kernel linear memory. Each byte can map eight 4K pages (32 KB), so 1 KB can map 32 MB, 1 MB can map 32 GB, and 1 GB can map 32 TB. So, there is some limit around 10-20 TB or so, but I have no hardware with that much memory, and so I don't reserve space for such a large bitmap. Actually, the largest kernel linear area is for the page-based allocator, which currently is mostly used for disc buffers. With the new filesystem structure, this area will no longer be used for disc buffers, and so it can be reduced considerably, and the physical bitmap can be increased.
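The bitmap arithmetic above can be checked with a small sketch. The helper names `bitmap_bytes` and `bitmap_coverage` are hypothetical; the only assumption is one bit per allocation unit.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of bitmap needed to track `mem_bytes` of RAM at one bit per
   `granule` bytes (4096 for ordinary pages). */
uint64_t bitmap_bytes(uint64_t mem_bytes, uint64_t granule)
{
    assert(granule != 0);
    uint64_t units = (mem_bytes + granule - 1) / granule;  /* round up to whole units */
    return (units + 7) / 8;                                 /* 8 units per bitmap byte */
}

/* Physical memory that a bitmap of `bm_bytes` bytes can describe. */
uint64_t bitmap_coverage(uint64_t bm_bytes, uint64_t granule)
{
    return bm_bytes * 8 * granule;
}
```

For example, `bitmap_coverage(1ULL << 20, 4096)` gives 32 GB, matching the figure above, and `bitmap_bytes(256ULL << 30, 4096)` shows that a 256 GB limit costs 8 MB of kernel linear space for the bitmap.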

I think it is possible to increase the amount of supported physical memory further. One idea would be to use a page-based bitmap only up to 8 GB or so, and for higher addresses use a 2M bitmap instead. In this bitmap, 1 byte can map 8 x 2 MB = 16 MB, 1 KB can map 16 GB, 1 MB can map 16 TB, and 1 GB can map 16384 TB. This configuration can support at least 5000 TB of physical memory.

Long mode OSes that map physical memory in the 48-bit linear address space can at most support 256 TB of memory, probably a lot less.
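A rough sizing of that two-level scheme, as a sketch under stated assumptions: the split point is fixed at 8 GB, memory size is a multiple of 2 MB, and the names `hybrid_size` and `hybrid_bitmap_size` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE 4096ULL
#define LARGE_PAGE (2ULL << 20)   /* 2 MB */
#define SPLIT      (8ULL << 30)   /* 4K granularity below 8 GB, 2M above */

typedef struct {
    uint64_t small_bytes;   /* bitmap for the 4K-page region  */
    uint64_t large_bytes;   /* bitmap for the 2M-block region */
} hybrid_size;

/* Bitmap space needed to track `mem_bytes` of physical memory. */
hybrid_size hybrid_bitmap_size(uint64_t mem_bytes)
{
    assert(mem_bytes % LARGE_PAGE == 0);
    uint64_t low  = mem_bytes < SPLIT ? mem_bytes : SPLIT;
    uint64_t high = mem_bytes - low;

    hybrid_size s;
    s.small_bytes = (low / SMALL_PAGE + 7) / 8;   /* one bit per 4K page  */
    s.large_bytes = (high / LARGE_PAGE + 7) / 8;  /* one bit per 2M block */
    return s;
}
```

Under these assumptions, 5000 TB needs a 256 KB page bitmap plus about 312 MB for the 2M bitmap, which is consistent with the estimate above that the scheme fits in well under 1 GB of kernel linear space.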

So, no, there are no practical limits for how much physical memory a 32-bit OS can support. It's a matter of smart algorithms only.

Posted: **Sun Oct 30, 2022 11:50 am**

That's really interesting, rdos. Right before you posted I was thinking you could implement bank switching for a single process to access the entire physical memory, even if you could only have a 2GB window at one time.

Posted: **Sun Oct 30, 2022 12:10 pm**

AndrewAPrice wrote:That's really interesting, rdos. Right before you posted I was thinking you could implement bank switching for a single process to access the entire physical memory, even if you could only have a 2GB window at one time.

Well, a particular process would use paging to access up to 2 GB of physical memory. By creating many such processes (file system partition servers), a lot more directly accessible memory would be possible. The other way is to allocate a 2M (or larger) linear area, and then memory map it to a large physical memory buffer with a syscall.

Actually, I made the mistake of not believing that files could be larger than 4 GB (actually 2 GB), but I've changed file positioning to use 64 bits in the new API. While a 32-bit app cannot directly memory map a file larger than 4 GB, a long mode app cannot memory map a file that is 256 TB either, and so generally speaking, file data access must be based on using windows into the file regardless of whether the OS is 32-bit or 64-bit. This is how my new file API will work too. It will create one or more windows into the file. As long as the application reads inside these windows, no syscalls are needed. When the application tries to read data outside of them, the filesystem must be called to read more data and optimize the cached buffers. Note that it will be the application that provides the linear space for file data, not the kernel, which means no kernel linear memory is consumed for file data.
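The windowing idea can be sketched as follows. This is not the actual API: `file_window`, `window_read` and `demo_refill` are hypothetical names, and the `refill` callback merely stands in for the call into the filesystem server. The demo backing store pretends to be a 6 GB file so that 64-bit positions are exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct file_window {
    uint64_t start;    /* file offset of buf[0]; positions are 64-bit */
    uint32_t len;      /* number of valid bytes in buf                */
    uint8_t *buf;      /* linear space supplied by the application    */
    uint32_t cap;      /* capacity of buf                             */
    /* Stand-in for the filesystem call: fill buf starting at file
       offset `pos`, return how many bytes are now valid. */
    uint32_t (*refill)(struct file_window *w, uint64_t pos);
    unsigned refills;  /* how often the slow path ran                 */
} file_window;

/* Read up to n bytes at file offset pos; returns bytes copied. */
uint32_t window_read(file_window *w, uint64_t pos, void *dst, uint32_t n)
{
    assert(w != NULL && dst != NULL);
    if (pos < w->start || pos >= w->start + w->len) {
        /* Outside the window: the filesystem must move it (slow path). */
        w->len = w->refill(w, pos);
        w->start = pos;
        w->refills++;
    }
    /* Inside the window: plain memory copy, no call into the OS. */
    uint64_t avail = w->start + w->len - pos;
    if (n > avail)
        n = (uint32_t)avail;
    memcpy(dst, w->buf + (pos - w->start), n);
    return n;
}

/* Demo backing store: a pretend 6 GB file whose byte at offset p
   has the value (uint8_t)p. */
#define DEMO_FILE_SIZE (6ULL << 30)

uint32_t demo_refill(file_window *w, uint64_t pos)
{
    if (pos >= DEMO_FILE_SIZE)
        return 0;
    uint64_t left = DEMO_FILE_SIZE - pos;
    uint32_t n = left < w->cap ? (uint32_t)left : w->cap;
    for (uint32_t i = 0; i < n; i++)
        w->buf[i] = (uint8_t)(pos + i);
    return n;
}
```

A real implementation would also align windows, keep several of them, and let the filesystem decide which cached buffers to map, but the fast-path/slow-path split is the part that makes the approach independent of the width of the linear address space.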

Posted: **Sun Oct 30, 2022 12:14 pm**

devc1 wrote:However, the only thing that I hate about ARM is that it is a RISC architecture, which makes it lower in performance than CISC CPUs like x86. Am I right?

This seems to be a common misconception. RISC is faster than CISC. In fact, modern CISC CPUs convert CISC instructions into a RISC format first.

RISC is faster since execution logic is simpler. This means that things like caches, branch prediction tables, etc can all be larger and hence make the system faster.

Posted: **Sun Oct 30, 2022 12:44 pm**

Indeed. You only have to look at the new RISC Macs compared to the Intel ones.

Posted: **Sun Oct 30, 2022 1:01 pm**

Speed is not everything. RISC architectures are too unstable and rely on portable C code. I don't particularly enjoy portable C code scattered with ifdefs that I don't know the values of.

If I want speed, I use FPGAs instead of RISC processors. Verilog is a lot more scalable than any CPU technology available.

If I want something whose operation I understand and that can be easily adapted, I use x86 and my own OS.

Posted: **Sun Oct 30, 2022 1:10 pm**

rdos wrote:RISC architectures are too unstable

By whom, may I ask?

rdos wrote:and rely on portable C code.

That's kind of the idea.

rdos wrote:I don't particularly enjoy portable C code scattered with ifdefs that I don't know the values of.

No different than assembly where you have no idea what values are in what registers.

Posted: **Sun Oct 30, 2022 1:22 pm**

nexos wrote:
rdos wrote:RISC architectures are too unstable
By whom, may I ask?

The protected mode interface in x86 processors has existed since the 80s, and still is functional. Give me an example of a RISC processor where the binary code written a decade ago still works on a modern variant. That's what I meant by unstable.

nexos wrote:
rdos wrote:I don't particularly enjoy portable C code scattered with ifdefs that I don't know the values of.
No different than assembly where you have no idea what values are in what registers.

The expected register content should be documented for each procedure. The values of ifdefs are never documented per source file, might not even be found in the project's include files, and sometimes are created by autoconf. Basically no project will document what the ifdefs are supposed to be set to or what their functions are.

Personally, I've adopted ifdef-free programming. There are no ifdefs in the code I write, and I typically remove ifdefs in code I port. That increases readability a lot.

Posted: **Sun Oct 30, 2022 1:30 pm**

rdos wrote:Give me an example of a RISC processor where the binary code written a decade ago still works on a modern variant.

ARM. That problem is just a consequence of the research nature of RISC.

rdos wrote:The expected register content should be documented for each procedure.

I'm talking about the internal logic of a procedure. Trust me, it becomes a mess.

rdos wrote:The values of ifdefs are never documented per source file

That's because they apply to multiple source files.

rdos wrote:might not even be found in the project's include files,

At least they have decent names, unlike registers.

rdos wrote:and sometimes are created by autoconf.

Autoconf is a big stain on the face of the earth. Don't equate autoconf and C. If you don't like autoconf, don't use it!

rdos wrote:Basically no project will document what the ifdefs are supposed to be set to or what their functions are.

Give me an example. Any decent project will document this; e.g., in CMake, cache variables that correspond to macros are documented.

rdos wrote:Personally, I've adopted ifdef-free programming. There are no ifdefs in the code I write, and I typically remove ifdefs in code I port. That increases readability a lot.

And in the process you make versioning and supporting optional features a million times harder.

Besides, macros have a small place in C in the end. They indicate optional features, options, include guards, version, and little else. What's your problem with macros?

Posted: **Sun Oct 30, 2022 1:32 pm**

rdos wrote:Give me an example of a RISC processor where the binary code written a decade ago still works on a modern variant. That's what I meant by unstable.

Sure. RISCOS, originally written many decades ago, runs on the Raspberry Pi. And RISCOS userland programs run just fine on the Pi. The modern ARM processors bear a lot more resemblance to their ancestors than an i9 does to an 8086.

Posted: **Sun Oct 30, 2022 1:53 pm**

nexos wrote: ARM. That problem is just a consequence of the research nature of RISC.

I've worked on ARM projects using Keil. It's not a pleasant experience when you want to do something with the hardware that the software interface doesn't support. Generally, when I implement a driver for some device or protocol, I write code to interface with the hardware or protocol directly. With ARM, I'm stuck with the software interface that Keil exports and cannot use hardware features they decided were not important.

nexos wrote:
rdos wrote:and sometimes are created by autoconf.
Autoconf is a big stain on the face of the earth. Don't equate autoconf and C. If you don't like autoconf, don't use it!

Well, if the project generates include files with autoconf, I obviously cannot decide not to use it. Several projects I've looked at use this method. For instance, OpenSSL creates configuration include files that the whole project depends on with autoconf. Since autoconf doesn't run on my OS, I had to run it on a Cygwin32 installation and then copy the resulting files and add them to my SVN.

Posted: **Sun Oct 30, 2022 2:09 pm**

iansjack wrote:
rdos wrote:Give me an example of a RISC processor where the binary code written a decade ago still works on a modern variant. That's what I meant by unstable.
Sure. RISCOS, originally written many decades ago, runs on the Raspberry Pi. And RISCOS userland programs run just fine on the Pi. The modern ARM processors bear a lot more resemblance to their ancestors than an i9 does to an 8086.

According to Wikipedia, RISCOS is not a binary compatible platform, but a source code compatible platform. IOW, you need to compile the source for your target RISC processor. Or use some precompiled image.

Additionally, RISCOS relies on the quite outdated concept of cooperative multitasking.

Posted: **Sun Oct 30, 2022 2:19 pm**

Wow, how do you access 64-bit addresses using this PAE thing? Do you remap them or what? I don't know anything about PAE other than that it just adds a few bits to addressing.

-----

I just wonder why there is only AVX-512. Why can't we have some extension that lets us do math on 1000 integers at once like GPUs? Or perform parallel conditional jumps on those 1000 integers? I tried (started) making a chess engine, and it seemed that there is no big help from SIMD. For example, if I popcnt 8 bitboards, I will always need to iterate through them one by one.
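A side note on the popcnt case, as a hedged sketch: counting bits does not need a data-dependent loop per board. The branch-free version below does identical work for every bitboard, which is the shape of code that maps onto SIMD lanes. On CPUs with the AVX512_VPOPCNTDQ extension, the `_mm512_popcnt_epi64` intrinsic counts eight 64-bit lanes in one instruction. The function names here are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free population count of one 64-bit bitboard (classic SWAR method). */
unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);                 /* add all bytes */
}

/* Count all 8 bitboards. The loop body has no branches, so each
   iteration performs exactly the same operations on different data. */
void popcount8(const uint64_t boards[8], unsigned counts[8])
{
    assert(boards != 0 && counts != 0);
    for (int i = 0; i < 8; i++)
        counts[i] = popcount64(boards[i]);
}
```

Whether this helps a chess engine overall is a separate question, since move generation tends to be dominated by branching rather than by the bit counting itself.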

Posted: **Sun Oct 30, 2022 2:20 pm**

Obviously the operating system needs a degree of adaptation to work on totally different hardware. But user programs written for RISCOS work fine on the Pi.

I’ve no doubt that your other comments about RISCOS have some value - it’s a very old operating system. But that is completely irrelevant to your assertion.

And, in any case, what does it matter if software has to be recompiled to run on newer hardware? If it requires no significant change to the source code it doesn’t matter. I’m sure that most people here are reasonably familiar with compiling software.

What are services
