
Shared memory

Posted: Sat May 03, 2025 6:43 am
by vlad9486
My PMM uses a bitmap to track allocated pages. I am about to implement shared memory, and I need to track what is shared. Can I use a reference counter? Instead of one bit per page I could use two bits; they represent a number between 0 and 3 and serve as a reference counter. Is this used by anyone in practice? Why have I never heard of it?
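Concretely, the idea could be sketched like this (a minimal sketch; `refmap`, `NPAGES`, and the saturating behaviour are illustrative, not taken from any real PMM):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 2-bit-per-page reference map: four counters per byte,
 * each saturating at 3. */
#define NPAGES 1024
static uint8_t refmap[NPAGES / 4];

static unsigned ref_get(size_t page) {
    return (refmap[page / 4] >> ((page % 4) * 2)) & 3u;
}

static void ref_set(size_t page, unsigned v) {
    size_t i = page / 4;
    unsigned shift = (page % 4) * 2;
    refmap[i] = (uint8_t)((refmap[i] & ~(3u << shift)) | ((v & 3u) << shift));
}

/* Returns 0 on success, -1 if the counter would overflow past 3. */
static int ref_inc(size_t page) {
    unsigned v = ref_get(page);
    if (v == 3) return -1;   /* only 3 sharers fit in two bits */
    ref_set(page, v + 1);
    return 0;
}

/* Returns the new count; a page dropping to 0 is free again. */
static unsigned ref_dec(size_t page) {
    unsigned v = ref_get(page);
    if (v > 0) ref_set(page, v - 1);
    return ref_get(page);
}
```

The overflow case is exactly the weakness discussed below: once a page has three sharers, the counter cannot record a fourth.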

Re: Shared memory

Posted: Sat May 03, 2025 8:47 am
by nullplan
Shared memory is a special application, and you don't really want to burden all allocations with a detail about one specific application, do you? Better to have a list of memory shares separately. That way you don't burden the physical allocator with that stuff all the time, and you manage your shared memory through another mechanism.

Using multiple bits is not really a good idea, since it reduces the information density, so you now need more memory for the management structures. And 3 is probably not going to be enough for heavily shared memory. Think of a libc, which is shared among all processes in the system. Or if you use static linking, think of just having a couple instances of the same process. But increasing the size further would further reduce the density and therefore increase the memory requirements. And for what? Very little gain.
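The separate-structure approach could look something like this (a hypothetical sketch; `share_table`, its linear scan, and the fixed size are all illustrative, and frame 0 is assumed never to be shared so it can mark empty slots):

```c
#include <stdint.h>
#include <stddef.h>

/* A separate share table holding entries only for pages that are
 * actually shared; unshared pages cost nothing here. */
#define SHARE_SLOTS 256

struct share {
    uintptr_t frame;   /* physical frame number; 0 = empty slot */
    unsigned  refs;    /* full-width counter, no 3-sharer limit */
};

static struct share share_table[SHARE_SLOTS];

static struct share *share_find(uintptr_t frame) {
    for (size_t i = 0; i < SHARE_SLOTS; i++)
        if (share_table[i].frame == frame)
            return &share_table[i];
    return NULL;
}

/* Register one more sharer of a frame; returns the new count,
 * or 0 if the table is full. */
static unsigned share_inc(uintptr_t frame) {
    struct share *s = share_find(frame);
    if (!s) {
        s = share_find(0);      /* grab an empty slot */
        if (!s) return 0;
        s->frame = frame;
    }
    return ++s->refs;
}

/* Drop one sharer; a count of 0 frees the slot for reuse. */
static unsigned share_dec(uintptr_t frame) {
    struct share *s = share_find(frame);
    if (!s || s->refs == 0) return 0;
    if (--s->refs == 0)
        s->frame = 0;
    return s->refs;
}
```

A real kernel would use a hash table or tree instead of a linear scan, but the point stands: the cost scales with the number of shared pages, not with all of physical memory.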

Re: Shared memory

Posted: Sat May 03, 2025 12:38 pm
by rdos
Free pages have no reason to carry bits for shared memory, since sharing is only relevant for allocated pages.

Information about shared memory (for instance, to support fork) should either be in the page tables or, if that is not enough, in separate structures. The copy-on-write bit belongs in the page tables, as do various flags such as accessed and modified, which are all used when implementing shared memory.

Re: Shared memory

Posted: Sun May 04, 2025 11:18 am
by vlad9486
nullplan wrote: Sat May 03, 2025 8:47 am Shared memory is a special application, and you don't really want to burden all allocations with a detail about one specific application, do you? Better to have a list of memory shares separately. That way you don't burden the physical allocator with that stuff all the time, and you manage your shared memory through another mechanism.

Using multiple bits is not really a good idea, since it reduces the information density, so you now need more memory for the management structures. And 3 is probably not going to be enough for heavily shared memory. Think of a libc, which is shared among all processes in the system. Or if you use static linking, think of just having a couple instances of the same process. But increasing the size further would further reduce the density and therefore increase the memory requirements. And for what? Very little gain.
I thought that every application would have some shared memory with device drivers (either directly, or through libraries) to do I/O. So there would be a lot of shared memory, and the space overhead in the memory manager would be worth it, because a separate list of memory shares also requires space. But maybe you are right: most memory is not shared.

Re: Shared memory

Posted: Sun May 04, 2025 11:24 am
by rdos
Applications typically do not share memory with device drivers. Application buffers might be passed to device drivers, which then might add them (temporarily) to the physical schedule of a device. While this happens, the application should not be able to access the buffer. So, there is no real sharing.

Re: Shared memory

Posted: Sun May 04, 2025 1:01 pm
by vlad9486
rdos wrote: Sun May 04, 2025 11:24 am Applications typically do not share memory with device drivers. Application buffers might be passed to device drivers, which then might add them (temporarily) to the physical schedule of a device. While this happens, the application should not be able to access the buffer. So, there is no real sharing.
If I wanted applications to do what they typically do, I would use Linux. But I'm building my own OS to do something different. As modern hardware tends to have more CPUs (CPU cores), I assume latency can improve if the application and drivers run in parallel and exchange information with minimal syscall involvement. So I need shared memory for the communication.
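The kind of syscall-free exchange described here is usually built on a lock-free queue inside the shared mapping. A minimal single-producer/single-consumer sketch (illustrative only; a real interface would add cache-line padding between `head` and `tail` and a blocking wakeup path for when polling is too expensive):

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 16  /* must be a power of two */

/* One side pushes, the other pops; neither needs a syscall. */
struct ring {
    _Atomic uint32_t head;        /* written only by the producer */
    _Atomic uint32_t tail;        /* written only by the consumer */
    uint64_t slot[RING_SLOTS];
};

static int ring_push(struct ring *r, uint64_t v) {
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SLOTS) return 0;      /* full */
    r->slot[h % RING_SLOTS] = v;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 1;
}

static int ring_pop(struct ring *r, uint64_t *v) {
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t == h) return 0;                   /* empty */
    *v = r->slot[t % RING_SLOTS];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return 1;
}
```

The release/acquire pairing ensures the consumer sees the slot contents before it sees the advanced `head`, which is what makes the syscall-free handoff safe on multiple cores.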

Re: Shared memory

Posted: Mon May 05, 2025 1:05 am
by rdos
vlad9486 wrote: Sun May 04, 2025 1:01 pm
rdos wrote: Sun May 04, 2025 11:24 am Applications typically do not share memory with device drivers. Application buffers might be passed to device drivers, which then might add them (temporarily) to the physical schedule of a device. While this happens, the application should not be able to access the buffer. So, there is no real sharing.
If I wanted applications to do what they typically do, I would use Linux. But I'm building my own OS to do something different. As modern hardware tends to have more CPUs (CPU cores), I assume latency can improve if the application and drivers run in parallel and exchange information with minimal syscall involvement. So I need shared memory for the communication.
Sometimes this is possible, sometimes not. When it is possible, the application typically needs to own the device; otherwise it cannot be allowed to control it directly. So this could work with the display if the application runs full screen, or with things like audio if the application reserves the audio driver. It will not work with discs, since applications typically use filesystems rather than raw sectors, and filesystems are shared between all running applications (and potentially the kernel too). I use a dynamic memory mapping of files that allows more parallelism and avoids syscalls, but it does not operate directly on the disc device; it uses a filesystem server process to request new mappings.

I think a more useful design would not connect applications directly to device drivers, but rather define interfaces for functions that can be handled with schedules rather than syscalls, similar to how I defined the new file interface with a structure containing pointers to mapped file data.

Also, if you desire to use multiple cores, you need to design a kernel that can handle multiple cores, which is a complex task in itself, but very useful.

Still, this doesn't have anything to do with physical memory handling. It's related to linear memory handling and paging: this type of sharing is done with paging in typical systems. The sharing happens at linear addresses, and physical memory only gets involved because the linear addresses must be backed by physical storage.

However, if you want to work directly with the physical memory queues of devices, then you need to give the application access to all physical memory, which means you have no protection between applications and the kernel, and then you might just as well run your applications in the kernel's ring too.

Re: Shared memory

Posted: Mon May 05, 2025 6:00 am
by thewrongchristian
vlad9486 wrote: Sat May 03, 2025 6:43 am My PMM uses a bitmap to track allocated pages. I am about to implement shared memory, and I need to track what is shared. Can I use a reference counter? Instead of one bit per page I could use two bits; they represent a number between 0 and 3 and serve as a reference counter. Is this used by anyone in practice? Why have I never heard of it?
While a PTE might have spare bits for use by the OS, I tend not to use them, for portability reasons: different architectures differ in which bits are available for system use, so those bits can't be relied on.

Case in point, how many bits do you have available in an x86 PTE? And how many bits do you need to count references to that PTE?
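For context on that question: a 32-bit non-PAE x86 PTE leaves only bits 9-11 (the AVL field) to the OS, so a reference counter squeezed in there saturates at 7. A small sketch (the macro and function names are illustrative):

```c
#include <stdint.h>

/* On 32-bit x86 without PAE, only PTE bits 9-11 are ignored by the
 * MMU and free for OS use: three bits, so a counter tops out at 7. */
#define PTE_AVL_SHIFT 9
#define PTE_AVL_MASK  (7u << PTE_AVL_SHIFT)

static unsigned pte_refs(uint32_t pte) {
    return (pte & PTE_AVL_MASK) >> PTE_AVL_SHIFT;
}

static uint32_t pte_set_refs(uint32_t pte, unsigned n) {
    /* Preserve all hardware-defined bits; store n in the AVL field. */
    return (pte & ~PTE_AVL_MASK) | ((uint32_t)(n & 7u) << PTE_AVL_SHIFT);
}
```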

Shared memory is a concept of virtual memory, and as such, should be handled at the virtual memory level, not at the physical memory allocation and mapping level.

So I hide the format of the PTE completely from the rest of the kernel, and implement copy-on-write and shared memory entirely in platform-independent VM code and data structures that have no knowledge of how virtual addresses are mapped to physical addresses.

Each allocated physical page has a corresponding VM page structure, which contains the physical page number of the allocated page, a count of copy-on-write references, and various flags representing the state of the page (referenced, dirty, pinned).
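Such a structure might look roughly like this in C (a sketch; the names `vm_page`, `cow_refs`, and the fault helper are illustrative, not the actual implementation):

```c
#include <stdint.h>

/* Hypothetical per-page state flags. */
enum vm_page_flags {
    VMP_REFERENCED = 1 << 0,  /* page was accessed since last scan */
    VMP_DIRTY      = 1 << 1,  /* page was modified */
    VMP_PINNED     = 1 << 2,  /* page may not be evicted */
};

/* One of these per allocated physical page, living entirely in
 * platform-independent code. */
struct vm_page {
    uintptr_t pfn;        /* physical page number of the allocated page */
    unsigned  cow_refs;   /* copy-on-write reference count */
    unsigned  flags;      /* VMP_* state bits */
};

/* On a write fault: a sole owner simply gets the page back writable;
 * otherwise the faulting mapping must copy the page and drop one
 * reference. Returns 1 if a copy is needed, 0 otherwise. */
static int vm_page_cow_fault(struct vm_page *p) {
    if (p->cow_refs <= 1)
        return 0;
    p->cow_refs--;
    return 1;
}
```

A full-width `cow_refs` avoids the 2-bit saturation problem from the original question, at the cost of one structure per allocated page.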

So all the virtual memory code, written entirely in C, works only with the abstract VM page code to handle VM operations like sharing and copy-on-write, and defers the actual mapping to opaque platform-specific methods that map a VM page to some virtual address.

As an added bonus (in my eyes, anyway), the physical data structures used to implement the mapping become entirely transient, and can be discarded as needed, because they can be entirely recreated from the platform independent data structures.

As an example, I'm aiming for my kernel to scale down to platforms with small amounts of memory, and in the extreme case it can operate with multiple address spaces using a single page table, discarding mappings whenever we switch address spaces. That's not ideal or optimal for runtime, but on small memory-limited systems you might only be discarding two or three page tables' worth of mappings on an address-space switch, and with a single user process the switch would never happen anyway.

On the flip side, I can also scale up the number of concurrent address spaces I retain data for, so a working set of processes can switch address spaces without discarding their existing mappings, up to the limit implemented in the kernel.

Re: Shared memory

Posted: Mon May 05, 2025 12:40 pm
by rdos
Interesting design of sharing. I didn't design my sharing like that, since I usually don't have forked code; rather, applications typically are started with "load process", and I only have a light-weight fork implementation, which needs to handle sharing and copy-on-write because I wanted a bit of Posix compatibility. My fork uses PTEs and cannot handle complex scenarios with multiple forks and new threads. It was mostly written so it could be used to load a new program with exec.