Network handling

Brendan · Post by **Brendan** » Sat Oct 11, 2008 7:54 pm

Hi,

JackScott wrote: Using the newer APIC-based interrupt system, I believe the number of hardware interrupts is much higher. I haven't checked though. OSDev wiki will have more information.

Yes.

There are typically 16 or more IRQs per I/O APIC (for example, there could be one I/O APIC with 24 inputs/IRQs, or a pair of I/O APICs with 16 inputs/IRQs each, and so on). The most I've heard of is a system with four I/O APICs. I'm not sure how many inputs each of these I/O APICs had, but with 16 inputs/IRQs per I/O APIC it works out to a total of 64 possible IRQs.
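You don't have to guess how many inputs a given I/O APIC has, because its version register reports it. A minimal C sketch, assuming you already have the raw 32-bit register value for each I/O APIC (locating the I/O APICs, e.g. via the ACPI MADT, is left out, and the helper names are made up for illustration):

```c
#include <stdint.h>

/* I/O APIC register 0x01 (IOAPICVER) holds the index of the highest
 * redirection entry in bits 16..23, so the input count is that + 1.
 * Reading it (write 0x01 to IOREGSEL at base+0x00, then read IOWIN at
 * base+0x10) is omitted; these hypothetical helpers only decode it. */
static unsigned ioapic_input_count(uint32_t version_reg)
{
    return ((version_reg >> 16) & 0xFFu) + 1u;
}

/* Total IRQ inputs across all I/O APICs in the system. */
static unsigned total_ioapic_inputs(const uint32_t *version_regs, unsigned count)
{
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++)
        total += ioapic_input_count(version_regs[i]);
    return total;
}
```

With four 16-input I/O APICs this gives the 64 IRQs mentioned above.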

On top of that there's MSI (Message Signaled Interrupts), where a PCI device can send its IRQ directly over the bus without using any I/O APIC inputs. This means that (for example) if you've got 24 I/O APIC inputs and 8 devices using MSI, then you've got a total of 32 possible IRQs.

The only limit is that interrupt vectors are 8-bit. Given that the first 32 interrupts are reserved for exceptions, this leaves a maximum of 224 IRQs.

However, in theory that limit is "per CPU". With 4 CPUs you could have up to 896 IRQs, as long as each IRQ is only ever sent to a specific CPU (and as long as each CPU uses a different IDT). With x2APIC there's a limit of about 4 billion CPUs per computer, so the theoretical maximum for the 80x86 architecture would be about 960 billion IRQs (224 × 2^32). Note: an OS would typically reserve some interrupt vectors for IPIs (Inter-Processor Interrupts), so the practical limit is a little lower.
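The arithmetic above in a few lines of C, as a sketch; the number of vectors set aside for IPIs is a parameter because that's an OS design choice rather than a hardware constant, and the function names are hypothetical:

```c
#include <stdint.h>

#define IDT_VECTORS       256u  /* interrupt vectors are 8-bit */
#define EXCEPTION_VECTORS  32u  /* vectors 0..31 are reserved for exceptions */

/* Vectors left for IRQs on one CPU after exceptions and IPI vectors. */
static uint32_t irq_vectors_per_cpu(uint32_t ipi_vectors)
{
    return IDT_VECTORS - EXCEPTION_VECTORS - ipi_vectors;
}

/* Theoretical maximum if every IRQ goes to exactly one CPU and each
 * CPU has its own IDT. */
static uint64_t max_irqs(uint64_t cpus, uint32_t ipi_vectors)
{
    return cpus * irq_vectors_per_cpu(ipi_vectors);
}
```

For example, `max_irqs(4, 0)` is 896, and `max_irqs(1ULL << 32, 0)` is 962,072,674,304, treating the full 32-bit x2APIC ID space as an upper bound on the CPU count.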

Cheers,

Brendan

Jeko · Post by **Jeko** » Sun Oct 12, 2008 2:29 pm

Ok. I've decided how to handle networking in my OS.

I will develop a driver that will handle sockets.
This socket driver will pass sockets to transport protocols, and they will handle these sockets (they will use lower levels to send/receive packets, and lower levels will do the same with their own lower levels). Each level will know which level is below it. For example, TCP will know that it must send packets to IP.
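One way to picture "each level knows its lower level" is a per-layer descriptor holding a pointer to the layer below, wired up at configuration time. This is a hypothetical C sketch (none of these names come from an existing OS); the driver at the bottom just records the length it was given instead of touching hardware:

```c
#include <stddef.h>

/* Hypothetical protocol-layer descriptor. */
struct net_layer {
    const char *name;
    struct net_layer *lower;   /* layer below; NULL for the driver */
    /* A real layer would prepend its header, then hand the packet down. */
    int (*send)(struct net_layer *self, const void *data, size_t len);
};

/* Generic step for a layer with no header of its own: pass it down. */
static int forward_down(struct net_layer *self, const void *data, size_t len)
{
    if (self->lower == NULL)
        return -1;                          /* nothing below: cannot send */
    return self->lower->send(self->lower, data, len);
}

static size_t last_tx_len;                  /* stands in for the NIC */

static int driver_send(struct net_layer *self, const void *data, size_t len)
{
    (void)self;
    (void)data;
    last_tx_len = len;
    return 0;
}
```

With this shape, putting TCP on top of something other than IP is only a different `lower` pointer.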

Now, there are only two problems:
1 - How can a level know which level is below it? For example, TCP's lower level is IP by standard, but other protocols can also sit below TCP, for example Netsukuku ( http://en.wikipedia.org/wiki/Netsukuku - http://netsukuku.freaknet.org/ )

2 - What can I do if there is more than one internet/LAN connection? How can my socket driver choose which connection to use?

Combuster · Post by **Combuster** » Mon Oct 13, 2008 3:39 am

1: For BSD-style sockets, you'd have to provide both (i.e. IP and TCP).

2: The IP layer generally knows about all the connections it can use. It can then select the interface based on which IP address is used.
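A sketch of that interface selection, assuming "which IP address is used" means the destination address and that the routing table is searched by longest-prefix match (the usual technique, but an assumption here). The struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing-table entry; IPv4 addresses in host order. */
struct route {
    uint32_t network;   /* e.g. 192.168.1.0 */
    uint32_t netmask;   /* e.g. 255.255.255.0; 0 for a default route */
    int      ifindex;   /* interface to send through */
};

/* Return the interface of the most specific route matching dst,
 * or -1 if no route matches. */
static int select_interface(const struct route *table, size_t n, uint32_t dst)
{
    int best_if = -1;
    uint32_t best_mask = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if ((dst & table[i].netmask) != table[i].network)
            continue;                        /* dst not in this network */
        if (!found || table[i].netmask > best_mask) {
            best_if = table[i].ifindex;      /* longer prefix wins */
            best_mask = table[i].netmask;
            found = 1;
        }
    }
    return best_if;
}
```

Comparing masks numerically works because a longer contiguous netmask is also a larger unsigned value.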

mystran · Post by **mystran** » Tue Oct 14, 2008 5:32 am

For a basic Ethernet -> IP -> TCP style stack, you basically need:

- network drivers (send packet, receive packet)
- ethernet layer which can queue packets to be sent, handle ARP requests/replies/cache, and send the packets after ARPs are resolved
- IP layer, that can do at least so called "first-hop" routing (start by "local" vs. "send to gateway" and you've covered the single-interface case)
- ICMP layer that can handle the basic packets like ping (for testing) and the error packets and maybe some flow control?
- UDP is simple (dispatch to correct application, keep some queues)
- once that works, TCP is really just a state-machine and some transfer window management

I'd personally build it very simply: have your network drivers call a "receive packet" function in the ethernet layer when they receive a packet.
Have them export "send packet" and "what is my MAC" functions, and you're pretty much set (OK, maybe also a "ready to send more" function).
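The driver interface described above might look like this in C. The structure and names are an illustrative assumption, and a toy loopback driver stands in for real hardware:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver interface: the three functions suggested above. */
struct nic_ops {
    int  (*send_packet)(void *dev, const uint8_t *frame, size_t len);
    void (*get_mac)(void *dev, uint8_t mac[6]);
    int  (*ready_to_send)(void *dev);     /* nonzero if TX has room */
};

struct nic {
    const struct nic_ops *ops;
    void *dev;                            /* driver-private state */
};

/* Ethernet-layer entry point the driver calls for every received frame.
 * This placeholder only counts frames long enough to hold a header. */
static unsigned long rx_frames;

static void eth_receive_packet(struct nic *nic, const uint8_t *frame, size_t len)
{
    (void)nic;
    (void)frame;
    if (len >= 14)                        /* 14-byte ethernet header */
        rx_frames++;
}

/* Toy loopback driver: every frame it "sends" is received straight back. */
static struct nic loop_nic;               /* tentative definition, initialized below */

static int loop_send(void *dev, const uint8_t *frame, size_t len)
{
    (void)dev;
    eth_receive_packet(&loop_nic, frame, len);
    return 0;
}

static void loop_get_mac(void *dev, uint8_t mac[6])
{
    (void)dev;
    for (int i = 0; i < 6; i++)
        mac[i] = 0;                       /* loopback has no real address */
}

static int loop_ready(void *dev)
{
    (void)dev;
    return 1;                             /* never busy */
}

static const struct nic_ops loop_ops = { loop_send, loop_get_mac, loop_ready };
static struct nic loop_nic = { &loop_ops, NULL };
```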

Then the ethernet layer can basically figure out if it's ARP or IP, and dispatch to the ARP cache if it's ARP, otherwise send it to an IP receive function.
For sending, the ethernet layer has to queue, unless it has the necessary MAC in ARP cache.
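A minimal sketch of that send path, assuming a fixed-size ARP cache and pending queue. Only the frame length is stored, and sending the ARP request and handing frames to the driver are left as comments; all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_SLOTS 8
#define TXQ_SLOTS 8

/* Tiny ARP cache: IPv4 address (host order) -> MAC. */
struct arp_entry { uint32_t ip; uint8_t mac[6]; int valid; };
static struct arp_entry arp_cache[ARP_SLOTS];

/* Frames waiting for ARP; only the length is kept to stay short. */
struct pending { uint32_t next_hop; size_t len; int used; };
static struct pending txq[TXQ_SLOTS];

static const uint8_t *arp_lookup(uint32_t ip)
{
    for (size_t i = 0; i < ARP_SLOTS; i++)
        if (arp_cache[i].valid && arp_cache[i].ip == ip)
            return arp_cache[i].mac;
    return NULL;
}

/* 1 = can be sent now, 0 = queued until ARP resolves, -1 = queue full. */
static int eth_send_ip(uint32_t next_hop, size_t len)
{
    if (arp_lookup(next_hop) != NULL)
        return 1;        /* real code: write dst MAC into header, call driver */
    for (size_t i = 0; i < TXQ_SLOTS; i++) {
        if (!txq[i].used) {
            txq[i].next_hop = next_hop;
            txq[i].len = len;
            txq[i].used = 1;
            return 0;    /* real code: also send an ARP request */
        }
    }
    return -1;
}

/* On an ARP reply: cache the mapping and release the frames queued for
 * that address. Returns how many frames were released. */
static int arp_reply_received(uint32_t ip, const uint8_t mac[6])
{
    size_t slot = ARP_SLOTS;
    for (size_t i = 0; i < ARP_SLOTS && slot == ARP_SLOTS; i++)
        if (arp_cache[i].valid && arp_cache[i].ip == ip)
            slot = i;                      /* update existing entry */
    for (size_t i = 0; i < ARP_SLOTS && slot == ARP_SLOTS; i++)
        if (!arp_cache[i].valid)
            slot = i;                      /* otherwise take a free slot */
    if (slot == ARP_SLOTS)
        slot = 0;                          /* cache full: naive eviction */

    arp_cache[slot].ip = ip;
    memcpy(arp_cache[slot].mac, mac, 6);
    arp_cache[slot].valid = 1;

    int released = 0;
    for (size_t i = 0; i < TXQ_SLOTS; i++) {
        if (txq[i].used && txq[i].next_hop == ip) {
            txq[i].used = 0;               /* real code: send it now */
            released++;
        }
    }
    return released;
}
```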

On the IP level, I wouldn't bother queuing anything in either direction, if your ethernet layer has a queue (which it has to have). So basically you either fill or parse headers, and dispatch to ethernet (sending) or ICMP/UDP/TCP (receiving). Same thing pretty much with ICMP: parse messages, possibly pass information to other protocols and/or send a reply.
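For the sending direction, "filling headers" at the IP level mostly means writing 20 bytes and the RFC 1071 Internet checksum. A sketch, assuming no IP options, a TTL of 64 and the Don't Fragment bit set (both arbitrary choices here); the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer. */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];   /* big-endian 16-bit words */
        p += 2;
        len -= 2;
    }
    if (len)                                    /* odd trailing byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                           /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill a 20-byte IPv4 header (no options) in network byte order. */
static void ip_fill_header(uint8_t h[20], uint32_t src, uint32_t dst,
                           uint8_t protocol, uint16_t payload_len, uint16_t id)
{
    uint16_t total = (uint16_t)(20 + payload_len);
    h[0] = 0x45;                       /* version 4, IHL 5 (20 bytes) */
    h[1] = 0;                          /* DSCP/ECN */
    h[2] = (uint8_t)(total >> 8);
    h[3] = (uint8_t)(total & 0xFF);
    h[4] = (uint8_t)(id >> 8);
    h[5] = (uint8_t)(id & 0xFF);
    h[6] = 0x40;                       /* Don't Fragment, offset 0 */
    h[7] = 0;
    h[8] = 64;                         /* TTL */
    h[9] = protocol;                   /* 1 = ICMP, 6 = TCP, 17 = UDP */
    h[10] = h[11] = 0;                 /* checksum field zero while summing */
    for (int i = 0; i < 4; i++) {
        h[12 + i] = (uint8_t)(src >> (24 - 8 * i));
        h[16 + i] = (uint8_t)(dst >> (24 - 8 * i));
    }
    uint16_t cs = inet_checksum(h, 20);
    h[10] = (uint8_t)(cs >> 8);
    h[11] = (uint8_t)(cs & 0xFF);
}
```

A receiver can verify a header by running `inet_checksum` over all 20 bytes: a correct header sums to 0.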

With UDP/TCP most of the work is in dispatching by socket (IP:Port/IP:Port pairs) and in the application-level interface. TCP windowing adds a bit more, but a fairly simplistic implementation should work acceptably.
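Dispatching by IP:Port pairs can be sketched as a lookup that prefers an exact connection match and falls back to a listening socket. The `struct sock` layout is hypothetical, and wildcard local addresses (binding to "any" address) are not handled:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket record. A connected socket matches all four
 * fields; a listening socket has remote_ip/remote_port set to 0. */
struct sock {
    uint32_t local_ip, remote_ip;
    uint16_t local_port, remote_port;
    int id;
};

/* Find the socket for an incoming segment, or NULL if nobody wants it. */
static const struct sock *demux(const struct sock *s, size_t n,
                                uint32_t src_ip, uint16_t src_port,
                                uint32_t dst_ip, uint16_t dst_port)
{
    const struct sock *listener = NULL;
    for (size_t i = 0; i < n; i++) {
        if (s[i].local_ip != dst_ip || s[i].local_port != dst_port)
            continue;                          /* not addressed to this socket */
        if (s[i].remote_ip == src_ip && s[i].remote_port == src_port)
            return &s[i];                      /* exact connection match */
        if (s[i].remote_ip == 0 && s[i].remote_port == 0 && listener == NULL)
            listener = &s[i];                  /* remember first listener */
    }
    return listener;
}
```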

I'd try to make every layer as simple as possible at first, then go over the specifications in detail and replace each layer with a more tolerant and polished version. I'd start with answering ping requests from both link-local and foreign addresses, get the driver and ARP working reliably, and then move on from there.
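Answering a ping mostly means turning the ICMP echo request (type 8) into an echo reply (type 0) and fixing the checksum; the IP layer then swaps the source and destination addresses. A sketch of the ICMP half only, with a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum (also used by ICMP). */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Rewrite an ICMP echo request into an echo reply in place.
 * msg points at the ICMP header; identifier, sequence number and data
 * are echoed back unchanged. Returns -1 if msg is not an echo request. */
static int icmp_make_echo_reply(uint8_t *msg, size_t len)
{
    if (len < 8 || msg[0] != 8)
        return -1;
    msg[0] = 0;                        /* type 0 = echo reply */
    msg[2] = msg[3] = 0;               /* zero checksum before summing */
    uint16_t cs = inet_checksum(msg, len);
    msg[2] = (uint8_t)(cs >> 8);
    msg[3] = (uint8_t)(cs & 0xFF);
    return 0;
}
```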

--

Oh and my extra 2 cents: start by supporting non-fragmented packets only. That'll simplify your life a lot. You'll have to add the support for IP fragments later, but in practice (especially in local networks) they are rare (most TCP/IP stacks never send packets larger than ethernet can carry) and you can get pretty far without bothering, greatly simplifying the initial work-load.
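Detecting a fragment so it can be dropped (for now) only needs the flags/fragment-offset word at offset 6 of the IPv4 header. A sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Nonzero if the IPv4 header h (network byte order) is a fragment:
 * either More Fragments (0x2000) is set or the 13-bit fragment offset
 * (0x1FFF) is nonzero. The Don't Fragment bit (0x4000) is ignored. */
static int ipv4_is_fragment(const uint8_t *h)
{
    uint16_t flags_off = (uint16_t)((h[6] << 8) | h[7]);
    return (flags_off & 0x3FFF) != 0;
}
```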

Jeko · Post by **Jeko** » Tue Oct 14, 2008 6:44 am


To mystran: I'm not developing the network stack itself (TCP, IP, UDP, Ethernet, etc.). I'm developing a way to handle network drivers and the network stack...
It's the same as when you develop the Virtual File System, and not FAT16, FAT32, ext3, or any other fs... It must be an abstraction.

mystran · Post by **mystran** » Tue Oct 14, 2008 6:55 am

Jeko wrote:
To mystran: I'm not developing the network stack with TCP, IP, UDP, Ethernet, etc.. I'm developing a way to handle network drivers and network stack...
It's the same when you develop the Virtual File System, and not FAT16, FAT32, or ext3, or any other fs... It must be an abstraction.

Oh.. I'm not sure if that's a good idea...

Jeko · Post by **Jeko** » Tue Oct 14, 2008 7:01 am

mystran wrote:
Oh.. I'm not sure if that's a good idea...

Why? For example, there are also other network protocols (level 3 of the OSI model), not only IP.

pcmattman · Post by **pcmattman** » Tue Oct 14, 2008 6:56 pm

That's a good point, but I personally believe the network stack is one of the few things you don't want to try to abstract out. Each level of it is so directly tied to the other levels that abstracting it can bring out some extremely confusing bugs and logic errors. You can implement the extra level 3 protocols (or protocols at any other level) without abstraction.

If you do succeed in abstracting it all out though, I'd love to read about it: not necessarily the code (because I get nothing out of reading others' code), but some form of documentation.

mystran · Post by **mystran** » Wed Oct 15, 2008 2:04 am

Jeko wrote: Why? For example, there are also other network protocols (level 3 of the OSI model), not only IP.

There are several differences compared to filesystems, for which abstraction works much better:

- In a filesystem, you essentially have random-access block devices, and you don't care about the details.
- Buffer-caching policies can be essentially independent from both the layers below (hardware) and the layers above (filesystem).
- You want all filesystems to generally look and behave the same; after all, they are just filesystems.
- In a filesystem, you're mostly dealing with blocks of data, so in the worst case you only have to copy around some meta-data.

In networking, you usually have to care about the differences between protocols, if only because that's necessary to write correctly working programs (especially in the case of network failures) in the first place. Each protocol will generally have different requirements for how and where to buffer data, and avoiding unnecessary packet copies is a more important performance concern, because we are usually dealing with arbitrarily sized chunks called packets rather than something you can put onto a hardware memory page and map around. The meta-data (headers) is also stored together with the real data (on the physical layers), and you have to parse it before you finally pass the data to the user.

Now, I'm not saying you can't do a generalized network handling model. But I'm pretty sure you'd better ignore the traditional concept of "network stacks" when it comes to structuring your code. I mean, network stacks are all good for the purpose of designing protocols with separation of concerns, but when it comes to implementation, I would look into other, more important things. A generalized buffering scheme with a nice way to handle nested headers should be generally useful. A generalized protocol for building packets "in place" without copying anything would be nice. But protocol layers? Nah, that's not nearly as useful. Besides, everybody nowadays agrees that the OSI model looks nice on paper, and generally sucks in practice...
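A buffering scheme with "a nice way to handle nested headers" could look roughly like this: reserve headroom, write the payload once, and let each layer prepend its header by moving a pointer backwards, so the payload is never copied. The sketch is loosely modelled on the put/push/pull operations of Linux's sk_buff, but it's a simplified illustration with hypothetical names, not that API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_SIZE 1536

struct pkt {
    uint8_t buf[PKT_SIZE];
    uint8_t *data;     /* start of the current contents */
    size_t len;        /* bytes from data onward */
};

/* Start empty, leaving `headroom` bytes free for headers. */
static void pkt_init(struct pkt *p, size_t headroom)
{
    p->data = p->buf + headroom;
    p->len = 0;
}

/* Reserve n bytes at the tail (payload); NULL if out of space. */
static uint8_t *pkt_put(struct pkt *p, size_t n)
{
    if ((size_t)(p->data - p->buf) + p->len + n > PKT_SIZE)
        return NULL;
    uint8_t *tail = p->data + p->len;
    p->len += n;
    return tail;
}

/* Prepend room for an n-byte header (sending); NULL if out of headroom. */
static uint8_t *pkt_push(struct pkt *p, size_t n)
{
    if ((size_t)(p->data - p->buf) < n)
        return NULL;
    p->data -= n;
    p->len += n;
    return p->data;
}

/* Strip an n-byte header (receiving); returns the new start. */
static uint8_t *pkt_pull(struct pkt *p, size_t n)
{
    if (p->len < n)
        return NULL;
    p->data += n;
    p->len -= n;
    return p->data;
}
```

On the send side, UDP, IP and ethernet each call `pkt_push` for their header in turn; on the receive side each layer calls `pkt_pull` and hands the same buffer up.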

EddyDeegan · Post by **EddyDeegan** » Mon Nov 03, 2008 6:45 pm

Interesting thread. I was having a bit of a heated debate with a colleague of mine earlier today on this very subject. I see packet I/O processing as a number of discrete functions which take the OSI model into account (at least layers 2-4 inclusive) and break down something like the following:

- Start with the link layer. This is a known format which will (should!) be consistent for the media (probably ethernet) to which your interface is physically connected. When a frame is received as a series of raw bytes, you should be able to extract the L2 payload simply by skipping the 14-byte ethernet header at the front of the received data (I'm not counting the 8-byte preamble/start frame delimiter that precedes this data on the wire; the NIC strips it, so software never sees it). Of those 14 bytes, the first 12 are the destination and source MAC addresses respectively, and the next 2 are the ethertype/length field.

- The value in the ethertype field tells you what the L3 format is. A value of 0x0800 indicates IPv4 (the field is big-endian on the wire, so if you load it as a 16-bit integer on a little-endian x86 you'll see 0x0008), so if that value is present you can pass the data to the routine that parses an IP packet. Of course other packet types could be ARP (0x0806), SNMP (0x814C) or any one of the other assigned types (RFCs are your friend). You can decide to parse/handle/ignore as many of these as you wish. Discarding anything you don't need is usually 'safe'.
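The first two steps, as a C sketch. It assumes the NIC has already removed the preamble/SFD and the trailing FCS (most do by default) and ignores VLAN tags; the ethertype is assembled byte by byte so the result is 0x0800 for IPv4 on any CPU. Names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN        14
#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_ARP   0x0806

/* Pointers into a received frame; nothing is copied. */
struct eth_hdr_view {
    const uint8_t *dst;          /* destination MAC, 6 bytes */
    const uint8_t *src;          /* source MAC, 6 bytes */
    uint16_t ethertype;          /* converted to host order */
    const uint8_t *payload;      /* start of the L3 packet */
    size_t payload_len;
};

/* Returns 0 on success, -1 if the frame is shorter than the header. */
static int eth_parse(const uint8_t *frame, size_t len, struct eth_hdr_view *v)
{
    if (len < ETH_HLEN)
        return -1;
    v->dst = frame;
    v->src = frame + 6;
    /* Big-endian on the wire: combine the two bytes explicitly
       (a raw 16-bit load on x86 would give 0x0008 for IPv4). */
    v->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    v->payload = frame + ETH_HLEN;
    v->payload_len = len - ETH_HLEN;
    return 0;
}

enum l3 { L3_IPV4, L3_ARP, L3_OTHER };

static enum l3 eth_classify(uint16_t ethertype)
{
    if (ethertype < 0x0600)
        return L3_OTHER;         /* not an EtherType: <= 1500 is an 802.3 length */
    switch (ethertype) {
    case ETHERTYPE_IPV4: return L3_IPV4;
    case ETHERTYPE_ARP:  return L3_ARP;
    default:             return L3_OTHER;   /* usually safe to discard */
    }
}
```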

- The routine that parses IP would then check the Protocol field (offset 9 bytes from the start of the L3 portion of the packet, i.e. the 10th byte). A value of 0x06 indicates that L4 is TCP; a value of 0x11 indicates UDP. Again, based on this value pick the routine to parse the next layer with, and so on.
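The IP step as a sketch: validate the version, header length (IHL, counted in 32-bit words) and Total Length, then pick the L4 handler from the Protocol byte at offset 9. The enum and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

enum l4 { L4_ICMP, L4_TCP, L4_UDP, L4_OTHER, L4_BAD };

/* Classify a received IPv4 packet; on success *l4_off receives the
 * offset of the L4 header (which follows any IP options). */
static enum l4 ip_classify(const uint8_t *pkt, size_t len, size_t *l4_off)
{
    if (len < 20 || (pkt[0] >> 4) != 4)
        return L4_BAD;                              /* too short or not IPv4 */
    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;       /* header length in bytes */
    size_t total = ((size_t)pkt[2] << 8) | pkt[3];  /* Total Length field */
    if (ihl < 20 || total < ihl || total > len)
        return L4_BAD;          /* len > total is fine: ethernet padding */
    *l4_off = ihl;
    switch (pkt[9]) {                               /* Protocol field */
    case 1:  return L4_ICMP;
    case 6:  return L4_TCP;
    case 17: return L4_UDP;                         /* 0x11 */
    default: return L4_OTHER;
    }
}
```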

There is little point trying to analyse a packet any way other than from the start of Layer 2, peeling each layer as you go and passing the shrinking payload over to the appropriate routine. From that single starting point you will be able to decode each header in turn as each one points to the next (at least up to L4).

Of course, the further up the OSI layers you go, the more variety and complexity is likely to emerge (there are a heck of a lot more types of L7 payload than there are types of L3 payload, for example). Whereas layers 2-4 are designed to be quick to read and relatively easy to analyse (checksums at L3 and L4 and flow control at L4 are the tricky parts), higher layers are more likely to contain payloads you need to do more decoding on (think SIP, XML, HTML etc...)

Going back to my heated debate with my friend as mentioned above, I was and remain of the opinion that the process of receiving and decoding the packet data is no more than parsing. He remained adamant that this was actually part of the stack.

I think that you need a stack if you want to actually do something meaningful with the packet data (even if this is as simple as sending a response to a protocol request) but that the actual decoding of the headers is just parsing. The stack is also where you keep track of sessions/streams/other stateful data etc.

Parse the packet (Decode/Classify/Analyse) to decide what you want to do, and have the stack actually do it. Of course the two could be combined, in that each 'Layer handler' could have the parse function included, but conceptually you could write a simple data recorder that doesn't actually do any proper stack operations, just counts IP packets and maintains a set of statistics. That would be more of a tree parser than a stack.

I waffle. Enough already.

Eddy

rdos · Post by **rdos** » Fri Nov 07, 2008 2:17 pm

This is how I handle networks:

1. The lowest network driver defines an API for network cards and for higher-level protocols (mostly IP). It also implements ARP and caches ARP information.

2. The network card(s) register a table of functions with the network driver (basically virtual methods, but implemented in assembler).

3. The IP protocol driver registers itself as a network protocol with the network driver, and registers a table of pointers for callbacks from the stack. ICMP and DHCP are also implemented in the IP driver.

4. The TCP and UDP protocols register themselves with the IP protocol. These protocols also export functions for user-level code (a socket interface).
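The registration idea can be sketched in C as a small table that maps a protocol identifier (EtherType at the network driver level, IP protocol number at the IP level) to a receive callback. This only illustrates the pattern with hypothetical names; rdos's actual implementation is in assembler and its interfaces aren't shown here:

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*rx_handler)(const uint8_t *data, size_t len);

#define MAX_PROTOCOLS 8

/* One table per layer: "who above me handles protocol id X?" */
struct proto_table {
    struct { uint16_t id; rx_handler fn; } entry[MAX_PROTOCOLS];
    size_t count;
};

/* Returns 0 on success, -1 if the table is full or id is taken. */
static int proto_register(struct proto_table *t, uint16_t id, rx_handler fn)
{
    if (t->count == MAX_PROTOCOLS)
        return -1;
    for (size_t i = 0; i < t->count; i++)
        if (t->entry[i].id == id)
            return -1;
    t->entry[t->count].id = id;
    t->entry[t->count].fn = fn;
    t->count++;
    return 0;
}

/* Deliver to whoever registered id; -1 if nobody did. */
static int proto_dispatch(const struct proto_table *t, uint16_t id,
                          const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entry[i].id == id) {
            t->entry[i].fn(data, len);
            return 0;
        }
    }
    return -1;
}

/* Example upper-layer handler: just counts delivered bytes. */
static size_t udp_bytes;
static void udp_rx(const uint8_t *data, size_t len)
{
    (void)data;
    udp_bytes += len;
}
```

Here UDP would call `proto_register(&ip_table, 17, udp_rx)` at startup, and the IP layer calls `proto_dispatch` with the Protocol byte of each incoming packet.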

I also have a DHCP server that allows one network card to be used as a link to a smaller network. I do not support the usual way of handling several network cards with separate IPs and configuration. Actually, there are no required configuration settings in the network stack; setting a fixed IP, DNS and network mask is supported but not required. This saves a lot of manual configuration. Usually, everything is configured with the DHCP client.

OSDev.org
