
Posted: Mon Apr 16, 2007 5:58 am
by mystran
Ok, so I ended up doing the easiest thing, and essentially copied a working version from RFC 1071 (which is where they define the stupid sum).

Here's a version that works:

Code:

unsigned short inet_checksum(void * data, int len) {

    // Compute Internet Checksum as defined in RFC 1071
    unsigned long sum = 0;
    // Walk the buffer through a typed pointer; incrementing the result
    // of a cast, as in *((unsigned short *) data)++, is not valid C
    unsigned short * p = (unsigned short *) data;

    while( len > 1 )  {
        /*  This is the inner loop */
        sum += *p++;
        len -= 2;
    }

    /*  Add left-over byte, if any */
    if( len > 0 )
        sum += * (unsigned char *) p;

    /*  Fold 32-bit sum to 16 bits */
    while (sum>>16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (unsigned short) ~sum;
}
And as a bonus point, here comes something fun. I think there's some problem with my driver (reception times are awful, so it's probably some timeout somewhere), and the packet loss is related to ARP issues (it doesn't queue the ICMP reply if it can't send it because the Ethernet address isn't in its ARP cache), but 192.168.1.123 is Voix. ;)

Happy hacking:

Posted: Mon Apr 16, 2007 2:22 pm
by Kevin McGuire
If you get it working let us know. I would still like to know what is wrong so when I get to doing something like that I can watch out for it.

Posted: Mon Apr 16, 2007 3:24 pm
by pcmattman
Got it.


The checksum has to be for the entire packet, not just for the header. I can get about 50% of packets in Windows (the rest time out still, but that's probably either firewall or ttl).
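
For anyone hitting the same thing, here is a minimal sketch of what "the entire packet" means, assuming the checksum in question is the ICMP one (the IP header checksum still covers only the IP header). The helper names `icmp_fill_checksum` and `icmp_checksum_ok` and the two `ICMP_*` constants are made up for illustration; `inet_checksum` is restated in an alignment-safe form so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 checksum, restated here so the sketch compiles on its own. */
static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    uint16_t word;

    for (; len > 1; len -= 2, p += 2) {
        memcpy(&word, p, 2);        /* alignment-safe native-order load */
        sum += word;
    }
    if (len > 0) {
        word = 0;
        memcpy(&word, p, 1);        /* odd byte, padded with a zero byte */
        sum += word;
    }
    while (sum >> 16)               /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

#define ICMP_HDR_LEN      8   /* type, code, checksum, id, sequence */
#define ICMP_CSUM_OFFSET  2   /* checksum field offset within the header */

/* Hypothetical helper: checksum the whole message (header + payload). */
static void icmp_fill_checksum(uint8_t *msg, size_t msg_len)
{
    uint16_t csum;

    memset(msg + ICMP_CSUM_OFFSET, 0, 2);       /* field is zero while summing */
    csum = inet_checksum(msg, msg_len);
    memcpy(msg + ICMP_CSUM_OFFSET, &csum, 2);   /* same byte order it was summed in */
}

/* A receiver re-sums the message, checksum included, and expects zero. */
static int icmp_checksum_ok(const uint8_t *msg, size_t msg_len)
{
    return inet_checksum(msg, msg_len) == 0;
}
```

Passing only `ICMP_HDR_LEN` as the length to either helper reproduces the header-only mistake: any reply that carries a payload then fails verification on the receiving side.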

Posted: Mon Apr 16, 2007 4:22 pm
by Kevin McGuire
I am glad you figured it out. As for the fifty percent success problem:

TTL is decremented each time the packet is routed by IP. It has nothing to do with the timeout problems. You had it set at sixty-four hops max, and the Windows or Linux box you were using had its TTL set at one hundred twenty-eight.

You say you can get only about fifty percent, and since I have never bothered to work with the network card, all I can suggest is making sure the card retries on Ethernet collisions. I have no idea exactly how these cards are designed, but I imagine that there is an option somewhere to turn this off or set a maximum number of retries.

The checksum also has to be computed as if a zero byte were appended to the end of the packet when its length is not a multiple of two bytes. In that case you might sometimes be overrunning your packet buffer and picking up zeros, or some values that still produce a correct checksum, and other times values that do not, which might be giving you a strange amount of success.
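
To make the odd-length case concrete, here is a small sketch. The function names are hypothetical, and `checksum_overrun` is deliberately wrong: it rounds the length up and so sums one byte beyond the data, which is the kind of bug described above. Unlike the routine posted earlier, this version builds each word in network byte order, so the returned value is the number to store big-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit big-endian words. A trailing odd byte is paired with a
 * virtual zero byte instead of reading past the end of the buffer. */
static uint16_t checksum_padded(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)(p[len - 1] << 8);     /* low byte is the zero pad */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Deliberately buggy variant for illustration: rounds len up to an even
 * number, so for odd len it includes whatever byte follows the data.
 * The caller must make sure that byte exists for this demo to be valid. */
static uint16_t checksum_overrun(const uint8_t *p, size_t len)
{
    return checksum_padded(p, (len + 1) & ~(size_t)1);
}
```

With a 3-byte packet sitting in a 4-byte buffer, the two functions agree whenever the stray fourth byte happens to be zero and disagree otherwise, which would look like intermittent success on the wire.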

You also might have a large amount of network traffic which tcpdump does not report since you used filtering options, and this could be overrunning the network card's RX ring buffer, or making it extremely hard for you to get the packets out fast enough. I noticed that your kernel is rather slow at getting the keyboard keys when pressed and at times will completely miss some key presses.

Posted: Mon Apr 16, 2007 6:50 pm
by mystran
pcmattman wrote:The checksum has to be for the entire packet, not just for the header. I can get about 50% of packets in Windows (the rest time out still, but that's probably either firewall or ttl).
I win. I'm joking of course; the screenshot below is over a 100Mbit LAN, with Voix running in Bochs, and Bochs is rather slow. The error in the screenie about ICMP packet prepare is the ARP issue (I actually wonder why it only shows one of those; I would expect more when dealing with a flood ping..)

Turns out QEMU doesn't support any advanced (read: programmer-friendly) features of the NE2000, so stuff like the "send packet" command (automatic remote DMA read for the next packet receive) won't work, and it checks current vs. boundary with logic that mandates one extra page between CURR and BNDRY. Just so you know. I'll attach my NE2k code as well, in case it's useful (it's not superclean, and it assumes new threads run with interrupts disabled, and that there are no SMP issues, and so on)..

The interface config is in the NIC code because there's no better place to put it yet. The PCI detection code (at the end of the file) is a kludge. And there's probably some other kludge somewhere as well, as well as possibly not-so-up-to-date comments (hope not many of those).

Posted: Mon Apr 16, 2007 7:02 pm
by mystran
Kevin McGuire wrote: You say you can get only about fifty percent, and since I have never bothered to work with the network card, all I can suggest is making sure the card retries on Ethernet collisions. I have no idea exactly how these cards are designed, but I imagine that there is an option somewhere to turn this off or set a maximum number of retries.
NE2k (or well, at least the National 8390D whose datasheet I've been using) does 15 retries on collisions automatically, and reports an error if that fails. I had a problem with traffic being lost or very slow, and it turns out it was because you have to loop for reception after an interrupt until the card says there are no more packets in its buffers (any interrupt could give you zero or more packets), and there were some logic problems with my looping code... oh well.

So far I've not seen a single transmission error. As can be seen in my driver above, I'd get a warning on screen if that ever happened, but then again I'm running on Bochs and QEMU so far..
Kevin McGuire wrote: You also might have a large amount of network traffic which tcpdump does not report since you used filtering options, and this could be overrunning the network card's RX ring buffer, or making it extremely hard for you to get the packets out fast enough. I noticed that your kernel is rather slow at getting the keyboard keys when pressed and at times will completely miss some key presses.
It could be that most of his kernel is running with interrupts disabled.

OSDev would be a lot easier if one didn't have to try to keep the kernel from blocking interrupts too long. Personally, I've right now got a priority inversion in my VFS which can basically stop all system calls from happening (more or less) if a low-priority thread is doing a system call and a higher-priority thread doesn't want to block anywhere (say, it loops infinitely).

Posted: Mon Apr 16, 2007 7:11 pm
by mystran
Haha, and I actually broke my code (the one above is broken). :D

Doesn't work in QEMU anymore.. oh so fun is this life.. :P

Posted: Mon Apr 16, 2007 7:15 pm
by Kevin McGuire
mystran wrote:I win. I'm joking ofcourse, screenshot..... <snip>
<PSYCHOLOGICAL TEST 03E589.02 OUTPUT>
.3k23.o39.323eo.3o320.9ask.23k32902.wekn2m2.z9k2

Posted: Mon Apr 16, 2007 8:11 pm
by mystran
Blah.. more stuff to fix because Bochs is too nice...

Oh well, have to fetch current page pointer instead of just relying on the status it seems...?

Posted: Mon Apr 16, 2007 8:34 pm
by mystran
Oh ok. It turns out (and this may be of use to anyone writing NE2k drivers) that it's necessary to fetch CURR (from register page 1) and then read packets until the "next packet" pointer in the packet header points to the CURR page, which means there's nothing left to read.

I guess the reason my method worked in Bochs okayish but not on QEMU, is because Bochs checks interrupts all the time, and QEMU just once in a while... or something.

Anyway, the method using CURR works in Bochs and QEMU properly. In fact.... the following is a ping, same setup as in the screenshot posted earlier (Voix running in Bochs, pinger is remote over 100Mbit LAN):

Code:

<05:33:31|root@wall:~># ping 192.168.1.123 -f
PING 192.168.1.123 (192.168.1.123) 56(84) bytes of data.
.. 
--- 192.168.1.123 ping statistics ---
2542 packets transmitted, 2540 received, 0% packet loss, time 25482ms
rtt min/avg/max/mdev = 4.260/10.730/36.619/3.685 ms, pipe 3, ipg/ewma 10.028/10.852 ms
Here we go. :)

Posted: Mon Apr 16, 2007 8:46 pm
by mystran
Removed the old buggy driver, attaching a new version which actually seems to work properly (now I just have to find some hardware to test it with)...

Posted: Mon Apr 16, 2007 11:50 pm
by pcmattman
I found that if I disable my firewall it works 100%... oops.

Edit: enabled firewall, set TTL to 128, no problems, 0 packets lost :D. Now I've implemented an ARP cache system so I'm ready for TCP, UDP etc...

Thanks to everyone for their help. The drivers and protocols are on my CVS in the 'net' folder.

Posted: Tue Apr 17, 2007 4:01 am
by Kevin McGuire
If the TTL is the problem then how the hell is a trace route supposed to work?
http://en.wikipedia.org/wiki/Traceroute

You might really want to do some reading unless you want a fly by night ICMP/IP stack...

Posted: Tue Apr 17, 2007 4:36 am
by pcmattman
My mistake, it's actually not the TTL that made the difference. Turning off the firewall did. Sometimes the firewall just decides to block the ping, other times it decides to let it through.

Disabling the firewall always works.

And I plan to do a lot of reading soon. I have been doing a lot of reading.

Posted: Tue Apr 17, 2007 7:10 am
by mystran
pcmattman wrote:My mistake, it's actually not the TTL that made the difference. Turning off the firewall did. Sometimes the firewall just decides to block the ping, other times it decides to let it through.

Disabling the firewall always works.

And I plan to do a lot of reading soon. I have been doing a lot of reading.

Code:

void neIrqHandler( struct regs* r )
{
    struct ne2000_t* ne = neDevs[0];
    uint8_t status;
    packet_t *p;
    while(1)
    {
        status = inportb(ne->ioAddress + INTERRUPTSTATUS);
        outportb(ne->ioAddress + INTERRUPTSTATUS, status);

        if(status & 0x1)
        {
            //kprintf( "NE2000: Packet Recieved\n" );
            p = readPacket(ne);
            handleEthernet(ne->eth, p);
        }
        else if(status & 0xa)
        {
            //kprintf( "NE2000: Packet Transmitted\n" );
        }
        else
        {
            break;
        }
    }
}
That won't work reliably. My code looked like that initially, but there's a race: if you don't handle a packet before another arrives (say you've got interrupts disabled too long) it'll queue another packet, but when you clear the ISR bit for receive it will NOT automatically set the bit again simply because you have more packets to be read. It'll only set it once the next packet arrives, which means you'll go out of sync.

The only reliable solution I've found is to do the following, in order:

1. Check the ISR; if it says no packets were received, we're done.
2. Reset the "packet received" bit in the ISR.
3. Read the CURRENT pointer from Page 1.
4. Read packets until the "next packet" field of the last packet read equals the CURRENT value that was read at step 3.
5. Go to 1 (in case we got more packets while reading).
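
The five steps above can be sketched roughly as follows. This is not the attached driver: `struct fake_nic` is a made-up stand-in for the card so the loop can run without hardware, and `rx_state`, `rx_drain` and the field names are hypothetical. On a real 8390 the receive bit is acknowledged by writing a 1 to it through the I/O port, CURR lives in register page 1, and each packet read would also copy the data out and update BNDRY.

```c
#include <stdint.h>

#define ISR_PRX 0x01    /* "packet received" bit of the 8390 ISR */

/* Stand-in for the card: just enough state to exercise the loop. */
struct fake_nic {
    uint8_t isr;            /* interrupt status register */
    uint8_t curr;           /* CURR: page the NIC will write next */
    uint8_t next_of[256];   /* "next packet" field of the header at each page */
};

/* Hypothetical driver-side state. */
struct rx_state {
    uint8_t  next_pkt;      /* page of the next packet to read */
    unsigned handled;       /* packets consumed so far */
};

static void rx_drain(struct fake_nic *nic, struct rx_state *rx)
{
    for (;;) {
        uint8_t curr;

        if (!(nic->isr & ISR_PRX))          /* 1. nothing received: done */
            return;
        nic->isr &= (uint8_t)~ISR_PRX;      /* 2. acknowledge the receive bit */
        curr = nic->curr;                   /* 3. snapshot CURR */
        while (rx->next_pkt != curr) {      /* 4. read up to the snapshot */
            rx->next_pkt = nic->next_of[rx->next_pkt];
            rx->handled++;
        }
    }                                       /* 5. loop back and re-check ISR */
}
```

The point of the snapshot is that the number of packets read is driven by the ring pointers and not by how many times the ISR bit was set, so two packets behind a single interrupt are both consumed.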

Also notice that you can't actually have BOUNDARY=CURRENT, since at least QEMU would think the NIC memory is full. So you must leave one unused page in between, complicating the logic a bit. The National datasheet actually recommends doing this, so I guess there's also hardware that requires it, but said datasheet also shows how to do it.
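
One common way to keep that gap, sketched here under the assumption that the receive ring occupies pages `pstart` up to but not including `pstop`, is to keep BOUNDARY one page behind the next packet to be read. The function name `bndry_for` is made up for this example and is not taken from the attached driver.

```c
#include <stdint.h>

/* Given the page of the next unread packet, return the value to write to
 * BOUNDARY so that it trails by exactly one page, wrapping inside the
 * ring [pstart, pstop). */
static uint8_t bndry_for(uint8_t next_pkt, uint8_t pstart, uint8_t pstop)
{
    if (next_pkt == pstart)
        return (uint8_t)(pstop - 1);    /* wrap to the last ring page */
    return (uint8_t)(next_pkt - 1);
}
```

With this convention the ring is empty when the driver's next-packet page equals CURRENT, and BOUNDARY itself never equals CURRENT.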