Writing a good, modular network stack

CaydendW · Post by **CaydendW** » Tue Jun 13, 2023 11:32 am

Hi all,

I recently made my kernel suck less by making it able to load modules at runtime as well as build them into the kernel. Basically, Linux's and FreeBSD's module systems had a Frankenstein child.

I am now on a somewhat modular kernel binge and want to make the kernel a little more modular and make components be loadable only when needed.

Enter my problem: the network stack I'm planning to build. I've made a nice little e1000 driver and have it as a loadable module, but now comes the issue of the rest of the stack. I'm finding it somewhat hard to figure out how to structure the rest of the network stack as modules, as the different layers seem to intermingle a little more than I expected. For example, it seems as though ARP has different headers depending on what kind of device the packet came over (Ethernet and the like), and I'm not exactly sure how to structure the modules such that my network stack is completely modular. Furthermore, I need to be able to get the MAC address, the IP address and whatever other protocol's information for a certain card. I can't hardcode the ioctls because the protocol that the ioctl queries might not be loaded, so I need some dynamic way of testing whether the protocol is loaded (not hard) and then going to it to get the ioctl for the card, but something about that seems somewhat like a kludge.

I've tried checking out a few kernels' source code, but I am not really familiar with any of the ones I checked (Linux, FreeBSD, NetBSD), so I didn't really understand much of what was going on in each of their network stacks. Of the 2 OSDEV projects I checked, neither was very modular by design in its network stack, so I didn't really learn anything from them either.

So I guess I'm asking, how would one go about designing a really good, modular network stack? Is this something that is even desirable or am I wasting my time?

To clarify, I'm not having an issue implementing the protocols in question. I'm just struggling to link them together neatly, modularly and expandably. This is more of a design question than it is a "How do I do IPv4?" question.

If someone could talk me through how a half-decent network stack should look, point me to a decent article or 2 about how Linux (or anyone else) does it, or suggest a few relevant files to check out in any kernel, it'd be greatly appreciated.

Thanks in advance for any advice/code/ideas/articles given
- CaydendW

P.S.: Sorry if this is a stupid question or an incomprehensible mess. I'm a little bit at my wits' end with designing.

Octocontrabass · Post by **Octocontrabass** » Thu Jun 15, 2023 12:36 pm

There's additional discussion on this topic elsewhere. (I don't have anything else to add, sorry!)

CaydendW · Post by **CaydendW** » Thu Jun 15, 2023 2:35 pm

Octocontrabass wrote:There's additional discussion on this topic elsewhere. (I don't have anything else to add, sorry!)

That's actually my post!

I made it because I thought the mods weren't going to approve my post. I guess I was just a little impatient lol

nullplan · Post by **nullplan** » Thu Jun 15, 2023 8:58 pm

So here's how I would do it: The IP code is the core of the entire thing, so I would have it be its own module. Other modules (and indeed the IP code itself) can register network interfaces with the core. Each interface has a type of some description, and a hardware address, and at least a callback for how to send an IP packet on that interface. Also, the IP code has some interface to register an incoming IP packet. IP core initialization must always register the loopback interface and set its address and netmask.
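A minimal sketch of that registration interface might look like the following. All names here (`struct netif`, `ip_register_netif`, `ip_input`, `ip_core_init`) are hypothetical and chosen only for illustration, the interface table is a fixed-size array to keep things short, and `ip_input` merely counts packets instead of parsing them.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IFACES 8

struct netif;
/* Callback the IP core uses to send one IP packet on an interface. */
typedef int (*netif_send_fn)(struct netif *ifp, const void *pkt, size_t len);

enum netif_type { NETIF_LOOPBACK, NETIF_ETHERNET, NETIF_PPP };

struct netif {
    enum netif_type type;
    uint8_t hwaddr[6];      /* unused for loopback */
    uint32_t ip_addr;       /* host byte order */
    uint32_t netmask;       /* host byte order */
    netif_send_fn send;
};

static struct netif *ifaces[MAX_IFACES];
static size_t iface_count;
static size_t ip_rx_packets;

/* Called by drivers (and by the IP core itself) to add an interface. */
int ip_register_netif(struct netif *ifp)
{
    if (ifp == NULL || ifp->send == NULL || iface_count == MAX_IFACES)
        return -1;
    ifaces[iface_count++] = ifp;
    return 0;
}

/* Entry point for incoming IP packets. A real stack would validate and
 * demultiplex here; this sketch only counts them. */
void ip_input(struct netif *ifp, const void *pkt, size_t len)
{
    (void)ifp; (void)pkt; (void)len;
    ip_rx_packets++;
}

/* Loopback "sends" by feeding the packet straight back into ip_input. */
static int loopback_send(struct netif *ifp, const void *pkt, size_t len)
{
    ip_input(ifp, pkt, len);
    return 0;
}

static struct netif loopback = {
    .type = NETIF_LOOPBACK,
    .ip_addr = 0x7F000001u,  /* 127.0.0.1 */
    .netmask = 0xFF000000u,  /* 255.0.0.0 */
    .send = loopback_send,
};

/* IP core initialization always registers the loopback interface. */
int ip_core_init(void)
{
    return ip_register_netif(&loopback);
}
```

An e1000 module would then fill in its own `struct netif` with its MAC address and a `send` callback and pass it to `ip_register_netif` when it loads.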

To my knowledge, ARP is part of IPv4. IPv6 uses ICMPv6 neighbor discovery. Consequently the IPv4 code must also contain the ARP code. Do note that whether ARP is used on an interface or not ought to be settable. For example, you don't do ARP on loopback. You also don't do ARP on PPP, because there you are connected to precisely one peer in the switching domain. So if an interface doesn't do ARP, it can only talk to one preset peer, or else broadcast everything.

I would initialize the Ethernet code once an Ethernet driver is loaded. The Ethernet code ought to make it possible, somehow, to register an upper layer for a given protocol number. But since the IP code is always running at that point, Ethernet must register the connections to ARP, IPv4, and IPv6 itself.
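One way to sketch that "register an upper layer for a given protocol number" idea is a small EtherType registry. The function names (`eth_register_proto`, `eth_input`, `eth_init`) and the counting stub handlers are assumptions made up for this example; only the EtherType values (0x0800 IPv4, 0x0806 ARP, 0x86DD IPv6) and the 14-byte header layout are standard.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14   /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETH_MAX_PROTOS 8

typedef void (*eth_rx_fn)(const uint8_t *payload, size_t len);

struct eth_proto {
    uint16_t ethertype;
    eth_rx_fn rx;
};

static struct eth_proto eth_protos[ETH_MAX_PROTOS];
static size_t eth_proto_count;

/* Register an L3 handler for one EtherType; fails if already claimed. */
int eth_register_proto(uint16_t ethertype, eth_rx_fn rx)
{
    if (rx == NULL || eth_proto_count == ETH_MAX_PROTOS)
        return -1;
    for (size_t i = 0; i < eth_proto_count; i++)
        if (eth_protos[i].ethertype == ethertype)
            return -1;
    eth_protos[eth_proto_count].ethertype = ethertype;
    eth_protos[eth_proto_count].rx = rx;
    eth_proto_count++;
    return 0;
}

/* Receive path: read the big-endian EtherType and hand the payload to
 * whichever handler registered for it. Returns -1 if nobody did. */
int eth_input(const uint8_t *frame, size_t len)
{
    if (len < ETH_HDR_LEN)
        return -1;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    for (size_t i = 0; i < eth_proto_count; i++) {
        if (eth_protos[i].ethertype == ethertype) {
            eth_protos[i].rx(frame + ETH_HDR_LEN, len - ETH_HDR_LEN);
            return 0;
        }
    }
    return -1;
}

/* Stub handlers standing in for the real ARP/IPv4/IPv6 input routines. */
static size_t arp_frames, ipv4_frames, ipv6_frames;
static void arp_rx(const uint8_t *p, size_t n)  { (void)p; (void)n; arp_frames++; }
static void ipv4_rx(const uint8_t *p, size_t n) { (void)p; (void)n; ipv4_frames++; }
static void ipv6_rx(const uint8_t *p, size_t n) { (void)p; (void)n; ipv6_frames++; }

/* Since the IP code is already running, Ethernet wires these up itself. */
int eth_init(void)
{
    if (eth_register_proto(0x0806, arp_rx) != 0)  return -1;
    if (eth_register_proto(0x0800, ipv4_rx) != 0) return -1;
    if (eth_register_proto(0x86DD, ipv6_rx) != 0) return -1;
    return 0;
}
```

A module loaded later could claim another EtherType through the same `eth_register_proto` call without the Ethernet code knowing about it in advance.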

Obviously you also need upper layers on top of IP. But you are always going to need TCP, UDP, and ICMP, so those ought to be initialized always. Make the IP code capable of registering handlers for protocols (by number), as well as capable of sending packets with a given protocol number.
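As a rough illustration of handlers keyed by protocol number, the IP core could keep a 256-entry table, since the IP protocol field is a single byte. The names `ip_register_proto`, `ip_deliver` and `ip_upper_init` are invented for this sketch and the handlers are stubs; the protocol numbers (ICMP 1, TCP 6, UDP 17) are the assigned ones. The send side is left out here.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*ip_proto_rx_fn)(const uint8_t *payload, size_t len);

/* One slot per possible value of the 8-bit IP protocol field. */
static ip_proto_rx_fn ip_protos[256];

/* Register an upper-layer handler; fails if the number is taken. */
int ip_register_proto(uint8_t proto, ip_proto_rx_fn rx)
{
    if (rx == NULL || ip_protos[proto] != NULL)
        return -1;
    ip_protos[proto] = rx;
    return 0;
}

/* Called by the IP core once it has finished its own processing.
 * Returns -1 when no handler exists (a real stack would typically
 * answer with an ICMP "protocol unreachable" in that case). */
int ip_deliver(uint8_t proto, const uint8_t *payload, size_t len)
{
    ip_proto_rx_fn rx = ip_protos[proto];
    if (rx == NULL)
        return -1;
    rx(payload, len);
    return 0;
}

/* Stub handlers standing in for the real ICMP/TCP/UDP input routines. */
static size_t icmp_packets, tcp_packets, udp_packets;
static void icmp_rx(const uint8_t *p, size_t n) { (void)p; (void)n; icmp_packets++; }
static void tcp_rx(const uint8_t *p, size_t n)  { (void)p; (void)n; tcp_packets++; }
static void udp_rx(const uint8_t *p, size_t n)  { (void)p; (void)n; udp_packets++; }

/* The always-present upper layers register themselves at init time. */
int ip_upper_init(void)
{
    if (ip_register_proto(1, icmp_rx) != 0) return -1;
    if (ip_register_proto(6, tcp_rx) != 0)  return -1;
    if (ip_register_proto(17, udp_rx) != 0) return -1;
    return 0;
}
```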

The end effect is that no connections are hardcoded. On input, the Ethernet core looks up the receiving L3 handler by protocol number. If that happens to be IP, then the IP core (once it is finished with whatever processing it needs to do) looks up its upper layer by the IP protocol number, and then TCP or UDP can do their magic. On the send path, TCP and UDP send a packet to the appropriate IP layer (I think all OSes I have seen so far keep TCP over IPv4 and TCP over IPv6 separate). IP layer routes the packet, finds by the routing table what interface the packet is meant to go on and what the next hop's IP address is, checks if it knows the next hop's hardware address, if not it does the ARP thing if requested. And IPv6 would do the neighbor discovery. And then it calls down to the send routine of the interface.
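The IPv4 send-path decision described above could be sketched roughly as follows. Everything here is a simplified assumption for illustration: the struct layouts, the `uses_arp` flag, the `ip_route_output` name and the linear-scan tables are hypothetical, addresses are in host byte order, and netmasks are assumed to be contiguous so that a numerically larger mask means a longer prefix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct netif {
    const char *name;
    bool uses_arp;          /* false for loopback, PPP, ... */
};

struct route {
    uint32_t dest, mask;
    uint32_t gateway;       /* 0 means the destination is on-link */
    struct netif *ifp;
};

struct neighbor {
    uint32_t ip;
    uint8_t mac[6];
};

enum tx_result { TX_READY, TX_NO_ROUTE, TX_ARP_PENDING };

/* Longest-prefix match over a small routing table. */
static const struct route *route_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dst & tbl[i].mask) != tbl[i].dest)
            continue;
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    return best;
}

static const struct neighbor *neigh_lookup(const struct neighbor *c, size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].ip == ip)
            return &c[i];
    return NULL;
}

/* Pick the outgoing interface and next hop, then check whether the next
 * hop's hardware address is known (only if the interface does ARP). */
enum tx_result ip_route_output(const struct route *tbl, size_t nroutes,
                               const struct neighbor *cache, size_t nneigh,
                               uint32_t dst, struct netif **out_if, uint32_t *next_hop)
{
    const struct route *rt = route_lookup(tbl, nroutes, dst);
    if (rt == NULL)
        return TX_NO_ROUTE;
    *out_if = rt->ifp;
    *next_hop = rt->gateway != 0 ? rt->gateway : dst;
    if (rt->ifp->uses_arp && neigh_lookup(cache, nneigh, *next_hop) == NULL)
        return TX_ARP_PENDING;  /* a real stack would send an ARP request here */
    return TX_READY;            /* caller may now invoke the interface's send routine */
}
```

With a default route via 192.168.1.1 whose MAC is cached, a packet to 8.8.8.8 comes back as `TX_READY` with the gateway as next hop, while an on-link host that has not been resolved yet yields `TX_ARP_PENDING`.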

rdos · Post by **rdos** » Fri Jun 16, 2023 5:34 am

nullplan wrote:On the send path, TCP and UDP send a packet to the appropriate IP layer (I think all OSes I have seen so far keep TCP over IPv4 and TCP over IPv6 separate). IP layer routes the packet, finds by the routing table what interface the packet is meant to go on and what the next hop's IP address is, checks if it knows the next hop's hardware address, if not it does the ARP thing if requested. And IPv6 would do the neighbor discovery. And then it calls down to the send routine of the interface.

To avoid scatter-gather (which not all NICs support), I do the send process in two stages. First, the TCP or UDP layer requests a buffer from the layer below (IPv4 or IPv6) with a given data size. Next, it fills out its data part in the buffer and does a send request to the layer below. This only requires one allocation in the send process and doesn't need scatter-gather. The NIC can also decide to keep its own physical buffers, and return one of those in response to the buffer allocation request.

Checking the route and generating ARPs is done on buffer allocation, and if the route is not yet resolved, the allocation fails. So, TCP/UDP never assembles the packet unless the route is clear.
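A toy version of that two-stage scheme, under heavy simplification, might look like this. None of these names come from an actual kernel: `nic_buf` stands in for a NIC-owned buffer, `route_resolved` stands in for the whole route and ARP state, the headers are just zeroed rather than filled in, and the transport header is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14
#define IP_HDR_LEN  20
#define PKT_MAX     1514

struct pktbuf {
    uint8_t data[PKT_MAX];
    size_t payload_off;     /* where the upper layer may start writing */
    size_t payload_len;
    bool in_use;
};

static struct pktbuf nic_buf;   /* stand-in for a NIC-owned physical buffer */
static bool route_resolved;     /* stand-in for "route found and ARP done" */
static size_t frames_sent;

/* Bottom layer: hand out the NIC's own buffer if it is free. */
static struct pktbuf *nic_alloc(size_t len)
{
    if (nic_buf.in_use || len > PKT_MAX)
        return NULL;
    nic_buf.in_use = true;
    return &nic_buf;
}

static void nic_send(struct pktbuf *b)
{
    frames_sent++;
    b->in_use = false;
}

/* Stage 1: the IP layer checks the route, then asks the layer below for a
 * buffer big enough for all headers plus the caller's payload. */
struct pktbuf *ip_alloc(size_t payload_len)
{
    if (!route_resolved)
        return NULL;        /* a real stack would kick off ARP here */
    if (payload_len > PKT_MAX - ETH_HDR_LEN - IP_HDR_LEN)
        return NULL;
    struct pktbuf *b = nic_alloc(ETH_HDR_LEN + IP_HDR_LEN + payload_len);
    if (b == NULL)
        return NULL;
    b->payload_off = ETH_HDR_LEN + IP_HDR_LEN;
    b->payload_len = payload_len;
    return b;
}

/* Stage 2: headers go into the reserved headroom, then the frame is sent. */
void ip_send(struct pktbuf *b)
{
    memset(b->data, 0, b->payload_off);  /* placeholder for real headers */
    nic_send(b);
}

/* Upper layer: allocate, fill in its own part in place, send. */
int transport_send(const void *msg, size_t len)
{
    struct pktbuf *b = ip_alloc(len);
    if (b == NULL)
        return -1;          /* route not clear: nothing was assembled */
    memcpy(b->data + b->payload_off, msg, len);
    ip_send(b);
    return 0;
}
```

Because each layer writes into the same contiguous buffer, the NIC sees one flat frame and never needs a scatter-gather list.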

OSDev.org
