Paging to the web?
Seems pretty quiet in here these days.
I know there's a limitation right now - only about 48 bits of a 64-bit address are actually usable for addressing - but suppose we're not far off from being able to address a full 64 bits or more.
What if we paged to the web? That is, what if we gave every document or piece of data an address, and when you needed something, you just referenced that memory address and it was downloaded, swapped in, and exposed by the OS seamlessly, as if it had always been in memory?
Maybe things are content-addressable, grouped by similarity, or grouped by identity.
Being intentionally vague to see what comes up. Have you guys floated this idea around?
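Though if I had to sketch the rough shape of it, the fault path might look something like the following. A minimal sketch only; handle_web_fault and every helper it calls are invented names, not anyone's real API.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define PROT_READ 0x1

/* Assumed helpers - none of these exist anywhere; they just name the
 * moving parts a real kernel would need. */
extern const char *url_for_address(uint64_t page);   /* address -> document */
extern uint64_t    page_offset_in_doc(uint64_t page);
extern void       *alloc_frame(void);                /* physical backing */
extern int  http_fetch_range(const char *url, uint64_t off, void *buf, size_t len);
extern void map_page(uint64_t vaddr, void *frame, int prot);

/* On a fault in a web-backed range: find the document, pull in one
 * page of it, install the translation, and let the access retry. */
int handle_web_fault(uint64_t fault_addr)
{
    uint64_t page = fault_addr & ~(uint64_t)(PAGE_SIZE - 1);

    const char *url = url_for_address(page);
    if (!url)
        return -1;                  /* not web-backed: genuine fault */

    void *frame = alloc_frame();
    if (http_fetch_range(url, page_offset_in_doc(page), frame, PAGE_SIZE) != 0)
        return -1;                  /* network failure surfaces as a fault */

    map_page(page, frame, PROT_READ);
    return 0;                       /* retry the faulting instruction */
}
```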
Re: Paging to the web?
Isn't that what is called a URL?
(Note that the Internet is estimated to occupy more than 64 zettabytes - and growing. A zettabyte would need a 70-bit address space.)
Re: Paging to the web?
Sure, that's one form of addressing, but it's not a permanent, pointer-based address. You can't reach the resource through a permanent pointer in memory unless you devise a scheme to page it in. You could also write to it and (if you had access) have the changes published by paging it out.
So, for example, file hashes could map to memory regions: trim the hash to a fixed bit length, and any file can be loaded at the corresponding address.
64 bits is definitely not enough, much less 48, so we'd use a 128-bit or arbitrarily large address space. Conventional page tables would be far too large, but a theoretical MMU could consult a hash map instead (or whatever structure may have been proposed), so we'd only use as much translation-table space as we have addressable things currently in use.
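Something like the following, maybe - a minimal sketch of a hash-map "page table", assuming an MMU (or software fault handler) that walks it. Every name here (addr128, wide_pte, xlate_lookup) is invented for illustration; no real MMU works this way.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct { uint64_t hi, lo; } addr128;    /* 128-bit virtual address */

struct wide_pte {
    addr128  vpage;      /* 128-bit virtual page number */
    uint64_t frame;      /* local physical frame backing it */
    bool     occupied;   /* slot in use at all */
};

#define XLATE_BUCKETS 4096
static struct wide_pte xlate[XLATE_BUCKETS];    /* open-addressed hash table */

static size_t bucket_of(addr128 a)
{
    /* cheap mix of both halves; a real design would want a stronger hash */
    return (size_t)((a.hi ^ (a.lo * 0x9E3779B97F4A7C15ull)) % XLATE_BUCKETS);
}

/* Translate: a hit returns the entry; a miss returns NULL, meaning the
 * thing isn't resident and must be paged in from wherever it lives. */
struct wide_pte *xlate_lookup(addr128 vpage)
{
    size_t i = bucket_of(vpage);
    for (size_t probe = 0; probe < XLATE_BUCKETS; probe++) {
        struct wide_pte *e = &xlate[(i + probe) % XLATE_BUCKETS];
        if (!e->occupied)
            return NULL;                        /* empty slot: not mapped */
        if (e->vpage.hi == vpage.hi && e->vpage.lo == vpage.lo)
            return e;                           /* translation hit */
    }
    return NULL;                                /* table full, no match */
}
```

The point being that the table grows with what's actually mapped, not with the size of the address space.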
Re: Paging to the web?
But a memory address is no good on its own; nobody can remember an address. You need some form of name that translates to that address, and a URL is pretty good for that purpose - far better than a simple hierarchical directory would be with such a large amount of data. You're also going to need some form of distributed database to keep track of all those addresses and their updates - exactly what we already have. Just think of the volume of updates if your computer had to know all those addresses up front without doing the equivalent of DNS queries.
It is already possible for data providers to allow their web pages to be written to (this page for example). But I suspect that most data providers would prefer to keep most of their data as read-only.
I don't really see any point in permanently mapping every available piece of data to a fixed numerical address. It's just too complicated and would require too much maintenance. I guess my question is what would be the point of the scheme you posit? It seems to me to be a solution looking for a requirement.
One thing you can be sure of - however large an address space you allocate for this purpose, this time next year you'll need an address space twice as large. All those pictures of cute kittens.
Re: Paging to the web?
Here is an example setup that addresses those things. The hypothetical OS could join the IPFS network, tracking file identifiers with the Kademlia distributed hash table. Your file system is just a big mapping of full path names to file identifiers. Just like IPFS, there's a scheme for local caching and one for indexing the network. On top of that, add a scheme to map files into memory, e.g. by using the first 96 bits of the file hash as the upper address bits and leaving a 4 GiB (32-bit) window for the contents of each file.
If you try to read or write to a file that's not present, the OS would just pull it from IPFS instead of from disk, by decoding the memory address and mapping it back to the network address.
Since 96 bits is probably not enough to reconstruct the full file hash for lookup, the DHT would probably have to be modified to be indexed by something derived from those 96 bits.
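The encode/decode step might look something like this - just a sketch of the 96 + 32 split; addr128 and these helpers aren't part of IPFS or anything else.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } addr128;   /* [hash:96 | offset:32] */

/* Compose an address from a truncated 96-bit content hash and a byte
 * offset into the file. */
static addr128 make_addr(uint64_t hash_hi64, uint32_t hash_lo32, uint32_t offset)
{
    addr128 a;
    a.hi = hash_hi64;                            /* hash bits 95..32 */
    a.lo = ((uint64_t)hash_lo32 << 32) | offset; /* hash bits 31..0, then offset */
    return a;
}

/* On a fault, recover the pieces so the OS can query the DHT. */
static void split_addr(addr128 a, uint64_t *hash_hi64,
                       uint32_t *hash_lo32, uint32_t *offset)
{
    *hash_hi64 = a.hi;
    *hash_lo32 = (uint32_t)(a.lo >> 32);
    *offset    = (uint32_t)a.lo;
}
```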
No practical use in mind, just an interesting thought experiment. Could spark ideas for some other uses for large address spaces.
Re: Paging to the web?
I think I've found a problem to which the simple solution is to just not do this, but you do arrive at mmap(url) by the end of the next paragraph.
It goes like this: if you want an integer to address any byte on the web (over HTTP), you need a database mapping address ranges to URLs. The content at any URL may change size at any time, without notification to the database. When it grows beyond the address range allocated in the database, it has to be moved to a new range. On the client side, this looks like realloc(): all pointers to it have to be updated. So if you can address any byte of any web content at any time and expect it to be paged in automatically, you have to allow for content changing during a program's run; you have to be ready for the base address of the region you're accessing to change at any moment. It would be better if the system required some sort of open or map call before access, which downloads the content to a cache, maps the cache into memory, and returns the base address. And at that point it makes complete sense for the call to take a URL instead of an address, obviating the single web address space.
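In other words, something with this shape - a sketch only; web_map and download_to_cache are hypothetical, though the open/lseek/mmap inside are ordinary POSIX:

```c
#include <stddef.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: fetch url into a local cache file, return its path. */
extern int download_to_cache(const char *url, char *path_out, size_t path_len);

/* Map a snapshot of a URL. The pointer stays valid until unmapped,
 * no matter what the origin server does to the content afterwards. */
void *web_map(const char *url, size_t *len_out)
{
    char path[256];
    if (download_to_cache(url, path, sizeof path) != 0)
        return NULL;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    off_t len = lseek(fd, 0, SEEK_END);   /* size of the cached snapshot */
    if (len <= 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                            /* the mapping holds its own reference */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)len;
    return base;                          /* munmap() when done */
}
```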
This is the first I've heard of IPFS, and a quick look didn't give me enough to comment on it.
I don't know half as much about content stores as I'd like; I know only one concrete example, which was designed for an arbitrary number of users freely uploading content. It gave every uploaded item a UUID; replacement content got a new UUID. Theoretically, such a store could be addressed simply as uuid_bits + bits(max_file_size). Say bits(max_file_size) = 64; that means 192-bit addresses, woof! You'd need layers and layers of caching to get any sort of performance out of it, because even decoding 32 bits down to byte addresses runs into performance limits. Also, now we're hardcoding max_file_size, so this is technically not future-proof.
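Spelled out, the layout would be something like this (types made up for illustration):

```c
#include <stdint.h>

/* uuid_bits + bits(max_file_size): a 128-bit item identity glued to a
 * 64-bit byte offset = the 192-bit address above. */
typedef struct {
    uint64_t uuid_hi, uuid_lo;  /* which item */
    uint64_t offset;            /* which byte of it; the hardcoded ceiling */
} addr192;
```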
Honestly, this all reaffirms my belief that large single address spaces create deceptively complex conditions. Simple memory management for all but the simplest programs requires arbitrary limits or yields fragmentation, leakage and worse, and memory management code that avoids those limits inevitably becomes very complex in the name of performance. Every time this comes up, I get one step closer to coding an OS for the 8086.

Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
Re: Paging to the web?
That's true! I guess the content-addressable space would be better for immutable files, while read-write stuff would be better represented by UUIDs, which can keep their address. For files that outgrow their region, I'm sure there could be a method to e.g. compute a derived UUID for subsequent blocks. Complicated, but not much more so than current file-sharing protocols.
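For instance, something like this could derive per-block identities deterministically - a sketch only, and a real scheme would want a proper cryptographic hash rather than the splitmix64-style mixer here:

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } uuid128;

/* splitmix64-style finalizer, standing in for a real hash */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ull;
    x ^= x >> 27; x *= 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

/* Block 0 keeps the original identity; block n gets a derived one
 * that anyone holding the base UUID can recompute. */
static uuid128 derived_uuid(uuid128 base, uint64_t block_index)
{
    uuid128 d;
    d.hi = mix64(base.hi ^ block_index);
    d.lo = mix64(base.lo + 0x9E3779B97F4A7C15ull * block_index);
    return d;
}
```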
heh... must be a popular thought!
I'm all about it!
Alright, well, good exercise. How about this idea (setting aside program isolation, etc.): memory and storage are unified, and files are objects referencing each other with pointers. You just map in your local storage or home network and wire paging to that. Of course, this doesn't save you from having to write an efficient data store or a sensible layout, but it does potentially make opening and reading files simpler. Could be fun to code for.
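Roughly: a file is just an object in the one big space, and "open" is a pointer walk. Made-up types, of course:

```c
#include <stddef.h>
#include <stdint.h>

/* Everything lives in one mapped space; anything not resident simply
 * faults in from local storage or the home network. */
struct fs_object {
    uint32_t          type;      /* directory, blob, ... */
    size_t            nentries;  /* child count, for directories */
    struct fs_object *parent;    /* plain pointers: storage == memory */
    struct fs_object *entries[]; /* children; a blob would carry bytes instead */
};

/* "Open" degenerates to following a pointer. */
static struct fs_object *lookup(struct fs_object *dir, size_t index)
{
    return (index < dir->nentries) ? dir->entries[index] : NULL;
}
```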

Re: Paging to the web?
Given the number of combinations possible, could a generative AI be trained to map a given cat image into n-dimensional bit space and back?