Packets not arriving

mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Ok, so I ended up doing the easiest thing, and essentially copied a working version from RFC 1071 (which is where they define the stupid sum).

Here's a version that works:

Code: Select all

unsigned short inet_checksum(const void * data, int len) {

    // Compute Internet Checksum as defined in RFC 1071
    // Walk the buffer as 16-bit words (assumes data is 16-bit aligned)
    const unsigned short * p = data;
    unsigned long sum = 0;

    while( len > 1 )  {
        /*  This is the inner loop */
        sum += *p++;
        len -= 2;
    }

    /*  Add left-over byte, if any */
    if( len > 0 )
        sum += * (const unsigned char *) p;

    /*  Fold 32-bit sum to 16 bits */
    while (sum>>16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (unsigned short) ~sum;
}
And as a bonus point, here comes something fun. There's some problem with my driver, I think (reception times are awful, so it's probably some timeout somewhere), and the packet loss is related to ARP issues (it doesn't queue the ICMP reply if it can't send it because it doesn't have the Ethernet address in its ARP cache), but 192.168.1.123 is Voix. ;)

Happy hacking:
Attachments
voix-2007-04-16.png
voix-2007-04-16.png (56.75 KiB) Viewed 2447 times
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Post by Kevin McGuire »

If you get it working let us know. I would still like to know what is wrong so when I get to doing something like that I can watch out for it.
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Post by pcmattman »

Got it.


The checksum has to be for the entire packet, not just for the header. I can get about 50% of packets in Windows (the rest still time out, but that's probably either the firewall or the TTL).
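
For ICMP, "the entire packet" means the ICMP header plus its payload. A minimal sketch of that, assuming the checksum field is zeroed before summing; `icmp_make_echo_reply` is a hypothetical helper, not code from any driver in this thread, and the bytes are combined big-endian so the result does not depend on host byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 checksum over len bytes, combining bytes big-endian. */
static uint16_t inet_checksum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len > 0)
        sum += (uint32_t)p[0] << 8;     /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: turn an echo request into an echo reply in place.
   msg points at the ICMP header; len covers header AND payload. */
static void icmp_make_echo_reply(uint8_t *msg, size_t len) {
    assert(len >= 8);                   /* ICMP echo header is 8 bytes */
    msg[0] = 0;                         /* type 0 = echo reply */
    msg[2] = 0;                         /* zero the checksum field first */
    msg[3] = 0;
    uint16_t csum = inet_checksum(msg, len);   /* sum header + payload */
    msg[2] = (uint8_t)(csum >> 8);      /* store in network byte order */
    msg[3] = (uint8_t)(csum & 0xff);
}
```

A receiver can check the result the same way: running `inet_checksum` over the whole message, checksum field included, should give 0.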
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Post by Kevin McGuire »

I am glad you figured it out. As for the fifty percent success problem:

TTL is decremented each time the packet is routed by IP. It has nothing to do with the timeout problems. You had it set to a maximum of sixty-four hops, and the Windows or Linux box you were using had its TTL set to one hundred twenty-eight.

You say you can get only about fifty percent. I have never bothered to work with the network card, but I would suggest making sure the card retries on Ethernet collisions. I have no idea exactly how these cards are designed, but I imagine that there is an option somewhere to turn this off or set a maximum number of retries.

The checksum also has to have a zero appended to the end of the packet if its length is not a multiple of two bytes. You might be overrunning your packet buffer there, sometimes picking up zeros or other values that still produce a correct packet and other times values that do not, which might be giving you a strange rate of success.
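
One way to get the zero padding without touching memory past the packet is to treat the last byte specially, as in this sketch. The function names are made up for illustration, and the bytes are combined big-endian (network order) rather than read as native words:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of len bytes, folded to 16 bits (not yet inverted). */
static uint16_t ones_sum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    size_t i;
    assert(p != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)                        /* odd length: last byte + implicit 0x00 */
        sum += (uint32_t)p[i] << 8;     /* p[len] itself is never read */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum that behaves as if the buffer were zero-padded to even length. */
static uint16_t inet_checksum_padded(const uint8_t *p, size_t len) {
    return (uint16_t)~ones_sum(p, len);
}
```

With this shape, whatever happens to sit in the byte after an odd-length packet cannot change the result.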

You might also have a large amount of network traffic that tcpdump does not report because you used filtering options. This could be overrunning the network card's RX ring buffer, or making it extremely hard for you to get the packets out fast enough. I noticed that your kernel is rather slow at picking up keyboard keys when they are pressed, and at times completely misses some key presses.
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

pcmattman wrote:The checksum has to be for the entire packet, not just for the header. I can get about 50% of packets in Windows (the rest time out still, but that's probably either firewall or ttl).
I win. I'm joking of course; the screenshot below is over a 100Mbit LAN, with Voix running in Bochs, and Bochs is rather slow. The error in the screenie about ICMP packet prepare is the ARP issue (I actually wonder why it only shows one of those; I would expect more when dealing with a flood ping..)

Turns out QEMU doesn't support any advanced (read: programmer-friendly) features of the NE2000, so stuff like the "send packet" command (automatic remote DMA read for the next packet receive) won't work, and it checks current vs. boundary with logic that mandates one extra page between CURR and BNDRY. Just so you know. I'll attach my NE2k code as well, in case it's useful (it's not superclean, and it assumes new threads run with interrupts disabled, that there are no SMP issues, and so on)..
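
A rough sketch of the "one extra page" bookkeeping, assuming the usual approach of keeping BNDRY one page behind the next packet to read. The ring limits `RX_PSTART` and `RX_PSTOP` and both helper names are placeholders for illustration, not values or functions from the attached driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring layout: receive pages are [RX_PSTART, RX_PSTOP). */
enum { RX_PSTART = 0x46, RX_PSTOP = 0x80 };

/* Page to write to BNDRY after consuming packets up to next_pkt:
   one page behind next_pkt, wrapping at the start of the ring. */
static uint8_t ne_boundary_for(uint8_t next_pkt) {
    assert(next_pkt >= RX_PSTART && next_pkt < RX_PSTOP);
    return (next_pkt == RX_PSTART) ? (uint8_t)(RX_PSTOP - 1)
                                   : (uint8_t)(next_pkt - 1);
}

/* The read position is then recovered as BNDRY + 1, wrapped. */
static uint8_t ne_next_from_boundary(uint8_t bndry) {
    assert(bndry >= RX_PSTART && bndry < RX_PSTOP);
    return (bndry + 1 == RX_PSTOP) ? (uint8_t)RX_PSTART
                                   : (uint8_t)(bndry + 1);
}
```

With this convention BNDRY never equals CURR while the ring is merely empty, so an emulator that treats equality as "full" is not confused.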

The interface config is in the NIC code because there's no better place to put it yet. The PCI detection code (at the end of the file) is a kludge. And there's probably some other kludge somewhere as well, as well as possibly not-so-up-to-date comments (hope not many of those).
Attachments
voix-2007-04-17.png
voix-2007-04-17.png (18.18 KiB) Viewed 2421 times
Last edited by mystran on Mon Apr 16, 2007 8:45 pm, edited 1 time in total.
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Kevin McGuire wrote: You say you can get only about fifty percent and since I have never bothered to work with the network card I could suggest making sure the card retries on ethernet collisions. I have no idea exactly how these cards are designed but I imagine that there is a option somewhere to turn this off or set a maximum number of retries.
NE2k (or well, at least the National 8390D whose datasheet I've been using) does 15 retries on collisions automatically, and reports an error if that fails. I had a problem with traffic being lost, or very slow, and it turns out it was because you have to loop for reception after an interrupt until the card says there are no more packets in its buffers (any interrupt could give you zero or more packets), and there were some logic problems with my looping code... oh well.

So far I've not seen a single transmission error. As can be seen in my driver above, I'd get a warning on screen if that ever happened, but then again I'm running on Bochs and QEMU so far..
Kevin McGuire wrote: You also might have a large amount of network traffic which using tcpdump does not report since you used filtering options and this could be overrunning the network card's RX ring buffer, or making it extremely hard for you to get the packets out fast enough. I noticed that your kernel is rather slow at getting the keyboard keys when pressed and at times will completely miss some key presses.
It could be that most of his kernel is running with interrupts disabled.

OSDev would be a lot easier if one didn't have to try to keep the kernel from blocking interrupts too long. Personally, I've right now got a priority inversion in my VFS which can basically stop all system calls from happening (more or less) if a low-priority thread is doing a system call and a higher-priority thread doesn't want to block anywhere (say, it loops infinitely).
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Haha, and I actually broke my code (the one above is broken). :D

Doesn't work in QEMU anymore.. oh so fun is this life.. :P
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Post by Kevin McGuire »

I win. I'm joking ofcourse, screenshot..... <snip>
<PSYCHOLOGICAL TEST 03E589.02 OUTPUT>
.3k23.o39.323eo.3o320.9ask.23k32902.wekn2m2.z9k2
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Blah.. more stuff to fix because Bochs is too nice...

Oh well, have to fetch current page pointer instead of just relying on the status it seems...?
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Oh ok. Turns out the following (which may be of use to anyone doing NE2k drivers): it's necessary to fetch CURR (from page 1) and then read packets until the "next packet" field (in the packet header) points to the CURR page, which means there's no longer anything to read.

I guess the reason my method worked in Bochs okayish but not on QEMU, is because Bochs checks interrupts all the time, and QEMU just once in a while... or something.

Anyway, the method using CURR works in Bochs and QEMU properly. In fact.... the following is a ping, same setup as in the screenshot posted earlier (Voix running in Bochs, pinger is remote over 100Mbit LAN):

Code: Select all

<05:33:31|root@wall:~># ping 192.168.1.123 -f
PING 192.168.1.123 (192.168.1.123) 56(84) bytes of data.
.. 
--- 192.168.1.123 ping statistics ---
2542 packets transmitted, 2540 received, 0% packet loss, time 25482ms
rtt min/avg/max/mdev = 4.260/10.730/36.619/3.685 ms, pipe 3, ipg/ewma 10.028/10.852 ms
Here we go. :)
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Removed the old buggy driver, attaching a new version which actually seems to work properly (now I just have to find some hardware to test it with)...
Attachments
net_ne2k.c
(10.99 KiB) Downloaded 58 times
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Post by pcmattman »

I found that if I disable my firewall it works 100%... oops.

Edit: enabled firewall, set TTL to 128, no problems, 0 packets lost :D. Now I've implemented an ARP cache system so I'm ready for TCP, UDP etc...

Thanks to everyone for their help. The drivers and protocols are on my CVS in the 'net' folder.
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Post by Kevin McGuire »

If the TTL is the problem then how the hell is a trace route supposed to work?
http://en.wikipedia.org/wiki/Traceroute

You might really want to do some reading unless you want a fly by night ICMP/IP stack...
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)
Contact:

Post by pcmattman »

My mistake, it's actually not the TTL that made the difference. Turning off the firewall did. Sometimes the firewall just decides to block the ping, other times it decides to let it through.

Disabling the firewall always works.

And I plan to do a lot of reading soon. I have been doing a lot of reading.
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

pcmattman wrote:My mistake, it's actually not the TTL that made the difference. Turning off the firewall did. Sometimes the firewall just decides to block the ping, other times it decides to let it through.

Disabling the firewall always works.

And I plan to do a lot of reading soon. I have been doing a lot of reading.

Code: Select all

void neIrqHandler( struct regs* r )
{
    struct ne2000_t* ne = neDevs[0];
    uint8_t status;
    packet_t *p;
    while(1)
    {
        status = inportb(ne->ioAddress + INTERRUPTSTATUS);
        outportb(ne->ioAddress + INTERRUPTSTATUS, status);

        if(status & 0x1)
        {
            //kprintf( "NE2000: Packet Received\n" );
            p = readPacket(ne);
            handleEthernet(ne->eth, p);
        }
        else if(status & 0xa)
        {
            //kprintf( "NE2000: Packet Transmitted\n" );
        }
        else
        {
            break;
        }
    }
}
That won't work reliably. My code looked like that initially, but there's a race: if you don't handle a packet before another arrives (say you've got interrupts disabled too long), the card will queue another packet, but when you clear the ISR bit for receive it will NOT automatically set the bit again simply because you have more packets to be read. It'll only set it once the next packet arrives, which means you'll go out of sync.

The only reliable solution I've found is to do the following, in order:

1. Check ISR; if it says no packets were received, we're done.
2. Reset the ISR bit for packets received.
3. Read the CURRENT pointer from Page 1.
4. Read packets until the "next packet" field of the last packet equals the CURRENT value read at step 3.
5. Go to 1 (in case we got more packets while reading).

Also notice that you can't actually have BOUNDARY=CURRENT, since at least QEMU would think the NIC memory is full. So you must leave one unused page in between, complicating the logic a bit. The National datasheet actually recommends doing this, so I guess there's also hardware that requires it, but said datasheet also shows how to do it.
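
The five steps above can be sketched against a toy model of the card. Everything here is an assumption made for illustration: `struct fake_nic`, `fake_receive` and `ne_drain` are invented names, the ring is only eight pages, and every packet occupies exactly one page. A real driver would read ISR and CURR with port I/O and fetch the packet header over remote DMA.

```c
#include <assert.h>
#include <stdint.h>

enum { PSTART = 0x46, PSTOP = 0x4e, ISR_PRX = 0x01 };

/* Hypothetical stand-in for the card's registers and packet headers. */
struct fake_nic {
    uint8_t isr;             /* interrupt status register */
    uint8_t curr;            /* CURR: next page the card will write */
    uint8_t bndry;           /* BNDRY: kept one page behind the read position */
    uint8_t next_of[PSTOP];  /* "next packet" header field, by start page */
};

static uint8_t wrap(uint8_t page) { return page >= PSTOP ? PSTART : page; }

/* Simulate the card storing one single-page packet. */
static void fake_receive(struct fake_nic *nic) {
    uint8_t next = wrap((uint8_t)(nic->curr + 1));
    assert(next != nic->bndry);        /* simplified "ring full" check */
    nic->next_of[nic->curr] = next;
    nic->curr = next;
    nic->isr |= ISR_PRX;
}

/* Drain all pending packets; returns how many were handled. */
static int ne_drain(struct fake_nic *nic) {
    int handled = 0;
    while (nic->isr & ISR_PRX) {                    /* step 1 */
        nic->isr &= (uint8_t)~ISR_PRX;              /* step 2: acknowledge */
        uint8_t curr = nic->curr;                   /* step 3: snapshot CURR */
        uint8_t page = wrap((uint8_t)(nic->bndry + 1));
        while (page != curr) {                      /* step 4 */
            page = nic->next_of[page];              /* read header, copy data */
            handled++;
            /* keep BNDRY one page behind the next packet, with wraparound */
            nic->bndry = (page == PSTART) ? (uint8_t)(PSTOP - 1)
                                          : (uint8_t)(page - 1);
        }
    }                                               /* step 5: re-check ISR */
    return handled;
}
```

If three packets arrive before the handler runs, a single `ne_drain` call picks up all three even though the receive bit was raised while the earlier ones were still unread, which is the case the one-packet-per-ISR-bit loop misses.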