Gasp! some DiNS x86 source.

Posted: Tue Feb 26, 2008 11:39 pm
by 01000101
If you didn't know from a previous post, I've been working on an x86 implementation of my OS, DiNS. I have now run into a very annoying issue with my RTL8139 NIC driver.

Normally, even on a bad day, I get speeds of around 16Mb/s down and around 2Mb/s upstream. For some reason, with this driver, I can only get about 4-5Mb/s down and the full 2Mb/s up. I've done tons of packet analysis and stream monitoring with Wireshark (formerly Ethereal) and noticed that there is no real pattern to the packet drops.

I get no errors reported back to the screen, and I can do things such as browse the web and download things at a decent speed. If I didn't do any speed tests I wouldn't really notice, but because NIC speed is such a crucial aspect of the overall OS design, it needs to be fixed.

The source is a mix of my own code and the RTL8139too Linux source. I'm trying to completely get rid of all the *nix code, but I want to clear this issue up first before I continue to destroy the Linux source.

The basic structure is pretty simple: when one card gets a packet, it sends the packet to the other NIC's transmit buffer, and in turn the sending NIC sends the packet out on the other line.

BTW, the source is INSANELY commented, as I don't like to mess up more than I have to on these things. I also replaced the Linux-style enums and named variables for NIC ports, as they were previously just the hex association.

Any and all help welcomed,

Thanks.

PS. If you need to see the Typedef Struct ETH_T, I will be happy to share that.

Posted: Wed Feb 27, 2008 6:27 am
by t0xic
When I get home I can *try* porting it to my OS to test it. But I would need the ETH_T struct, if you're willing to surrender even more code.

Posted: Wed Feb 27, 2008 11:03 am
by Brynet-Inc
One must ask though, is using GPL licenced code in your most obviously proprietary OS a good idea? ;)

EDIT: Distributing the code without their licence attached also sounds like a Bad Idea (TM). :roll:

Posted: Wed Feb 27, 2008 11:23 am
by 01000101
? This is my x86 version (the open-source one). And if you are telling me that posting this code here without the licence attached is a 'bad' idea, then just about everyone here is guilty of posting licensed code for help.

Also, as stated above, I plan on redoing this to eliminate the "Linux" factor from the source. I only used it as a reference for now, and later it will be my own. And if not, it doesn't really matter, as this is not the version that I plan to release.

I will PM you my eth_t and incoming_packet_s structures (used in Rx handling).

Thanks.

Posted: Wed Feb 27, 2008 11:47 am
by lukem95
Could you also PM them to me? It looks like you are using very similar function names etc. to mine, so porting it shouldn't be too much trouble.

I'll do a quick speed test once it's complete and see if I get the problem too. It will also give me a good excuse to work on my own net interface stuff.

Posted: Wed Feb 27, 2008 11:59 am
by 01000101
How do you plan to test it? It relies on having two RTL8139Ds, with one NIC connected to a path leading to the internet and one leading to a switch/computer cluster.

I'll PM them in a sec

Posted: Wed Feb 27, 2008 12:14 pm
by Dex
I had the same problem, even though my driver was coded from scratch in asm.
So are you sure it's the driver that's at fault, and not the TCP/IP stack? Also, do you use interrupts or polling?

Posted: Wed Feb 27, 2008 12:18 pm
by 01000101
That's the real issue: there is NO TCP/IP stack or any other top-level packet handling in this version. I just wanted to knock out bugs, so I made a simple transfer from one NIC to the other with no real packet handling (I plan on porting the packet handler from my closed-source one once this is resolved).

I use interrupts, but I have also tried polling, with the same results.
I've also checked to make sure that while handling an interrupt, another interrupt wouldn't happen and steal time, but no such thing happens.

Most of the logic behind the packet handling is straight from the:
RTL8139(A/B) Programming Guide (v0.1) and the RTL8139 Datasheet.

I think there might be some unnecessary RxErr's that for some reason aren't being handled properly or... something..

Posted: Wed Feb 27, 2008 12:54 pm
by lukem95
I have an abundance of computers. I'll do a bit of googling and, fingers crossed, come up with the goods. At the very least I can have a think about the problem and look through the code.

Posted: Wed Feb 27, 2008 7:06 pm
by 01000101
Ok great.
The only possible issue I could think of was the memcpy placed after an Rx interrupt is fired.

My theory (which I haven't been able to prove thus far) is that when a buffer overflow happens (when cur_rx is > RX_BUF_LEN), the packet gets copied into memory fine, but after that cur_rx is still > RX_BUF_LEN, and thus the next packet should be 'out of the buffer', yet no errors arise.
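The bookkeeping in question can be sketched as follows. This is only a minimal sketch, assuming the layout used by the Linux 8139too driver (a 4-byte Rx header in front of each packet, with the next packet starting on a dword boundary); rx_advance is a hypothetical helper name, not something from the posted source.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_LEN (32u * 1024u)   /* assumed: 32K ring, matching RxCfgRcv32K */

/* Hypothetical helper: step the software read offset past one packet.
 * pkt_len is the length field from the Rx header (it includes the CRC). */
static uint32_t rx_advance(uint32_t cur_rx, uint16_t pkt_len)
{
    assert(cur_rx < RX_BUF_LEN);                   /* offset must already be in range */
    cur_rx = (cur_rx + pkt_len + 4u + 3u) & ~3u;   /* skip header + data, round up to dword */
    return cur_rx % RX_BUF_LEN;                    /* wrap back into the ring */
}
```

Wrapping on every step means cur_rx can never stay above RX_BUF_LEN once a packet has been consumed, which is the condition the theory above is worried about.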

Posted: Thu Feb 28, 2008 7:14 pm
by 01000101
Now that I think of it, you won't need two RTL8139s to see the issue. If only one card gets hammered with packets, the error still happens.

Posted: Tue Apr 08, 2008 1:00 pm
by 01000101
I have no hair left... I have pulled it all out. I really need some help, as I am obviously not finding the answer in the RTL8139 1.1 or the datasheet, and even after creating a complete port of the Linux rtl8139 driver, it STILL fails with the same error.

I have narrowed it down significantly though: the error comes from the packet overflow handler. When a packet's next buffer space is out of range (8, 16, 32, or 64k), it must either wrap around or the software must provide a little extra room for one more packet post-buffer.

In my case, I chose the latter of the two options, and have provided an extra 2k at the end of the buffer to accept a trailing packet.
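As a rough illustration of why 2k of extra room is enough, here is a minimal sketch of the sizing arithmetic. The constant names RX_BUF_PAD and RX_BUF_WRAP_PAD follow the Linux 8139too driver and are assumptions here, as is the hypothetical helper rx_packet_fits; only the 32k + 16 + 2k total comes from the post.

```c
#include <assert.h>
#include <stdint.h>

enum {
    RX_BUF_LEN      = 32 * 1024,  /* ring size selected by RxCfgRcv32K */
    RX_BUF_PAD      = 16,         /* assumed name for the "+ 16" */
    RX_BUF_WRAP_PAD = 2048,       /* assumed name for the "+ 2k" spill area */
    RX_BUF_TOT_LEN  = RX_BUF_LEN + RX_BUF_PAD + RX_BUF_WRAP_PAD,
    RX_HDR_LEN      = 4           /* status word + length word */
};

/* Hypothetical check: does a packet that starts at `offset` inside the ring
 * stay within the allocated buffer when it is written without wrapping? */
static int rx_packet_fits(uint32_t offset, uint16_t pkt_len)
{
    assert(offset < RX_BUF_LEN);
    return offset + RX_HDR_LEN + pkt_len <= RX_BUF_TOT_LEN;
}
```

Even a full-size 1518-byte frame that starts at the last dword of the ring ends well inside the 2k spill area, so the no-wrap layout never writes past the allocation for normal Ethernet frames.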

Code:

unsigned char rx_ring[RX_BUF_TOT_LEN]; // (32k + 16 + 2k)
unsigned int cur_rx;	// Index into the Rx buffer of next Rx pkt.

Also, my rx_config enumeration is a direct copy from the Linux driver (for testing purposes only). It is as follows, along with the actual port write of the int:

Code:

static const unsigned int rtl8139_rx_config =
	  RxCfgEarlyRxNone | RxCfgRcv32K | RxNoWrap |
	  (RX_FIFO_THRESH << RxCfgFIFOShift) |
	  (RX_DMA_BURST << RxCfgDMAShift); // note the 'nowrap'
rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys | AcceptAllPhys; // promisc mode

outportl(ioaddr + RxConfig, rtl8139_rx_config | rx_mode);

The main RX_RECIEVE function is where everything happens. It's all standard naming conventions (p_ prefix for pointers).

Code:

void RTL8139_RX_RECIEVE(eth_t *eth)
{

    unsigned long ioaddr = eth->ioaddr;
    
	while ((inportb(ioaddr + ChipCmd) & RxBufEmpty) == 0)							// is packet buffer empty?
    {  
        
        eth->Incoming_Packet.p_Rx_Read = eth->rx_ring + eth->cur_rx;                // pointer to CAPR
        eth->Incoming_Packet.Packet_Status = *(unsigned short*)(eth->Incoming_Packet.p_Rx_Read);       // +00 bits to get pkt_status
        eth->Incoming_Packet.Packet_Length = *(unsigned short*)(eth->Incoming_Packet.p_Rx_Read + 2);   // +16 bits to get pkt_length
        eth->Incoming_Packet.p_Packet = (eth->Incoming_Packet.p_Rx_Read + 4);       // +32 bits to jump over header
        
        if( eth->Incoming_Packet.Packet_Status & 0x0001 &&                          // make sure ROK is 1
           (eth->Incoming_Packet.Packet_Status & 0x001E) == 0 &&                    // perform multiple error checks at once
            eth->Incoming_Packet.Packet_Length <= ETH_FRAME_LEN &&                  // length must be less than 1536
            eth->Incoming_Packet.Packet_Length >= ETH_ZLEN)                         // length must be more than 60
        {
            
            eth->packetcount++;														// update packetcount
            
            if(eth->nic_num == 0){write_packetcount_eth0(eth->packetcount);}		// update packetcount window
            if(eth->nic_num == 1){write_packetcount_eth1(eth->packetcount);}		// update packetcount window
            
            eth->cur_rx = inportw(ioaddr + 0x3A);									// read next packet write buffer address (CBR)
            if(eth->cur_rx >= RX_BUF_LEN){eth->cur_rx -= RX_BUF_LEN;}				// if (bufferoverflow) { loop around to start }
            outportw(ioaddr + RxBufPtr, eth->cur_rx - 0x10);						// update CAPR 
            eth->packetlen = eth->Incoming_Packet.Packet_Length - 4;                // remove CRC
            
            //RTL8139_TX_TRANSMIT(eth);												// send to next NIC's tx buffer for transmission
            RTL8139_TX_TRANSMIT_LOOP(eth);											// send to current NIC's tx buffer for transmission  
            
        }
        else
        {
            printf("RX_ERROR! (%x) \n", eth->Incoming_Packet.Packet_Status);		// notify user of error
            RTL8139_RESET(eth);														// perform a complete reset
        }
        
        //rtl8139_update("rx");														// sync with other NIC
    }

}

I am completely stumped. It catches a good number of packets (a buffer full) and then it drops one, and it sometimes provides an error message, but sometimes not.
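For comparison, here is a minimal sketch of walking the ring one packet at a time, advancing by each packet's own length field instead of jumping to the hardware write pointer. It is an illustration under stated assumptions, not a fix confirmed in this thread: rx_drain and rd16 are hypothetical names, write_off stands in for whatever tells the driver where the NIC will write next (CBR or the RxBufEmpty bit), and the ring is shrunk to 1K to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_LEN  1024u    /* small ring for illustration; the post uses 32K */
#define RX_STAT_ROK 0x0001u  /* "receive OK" bit in the Rx header status word */

/* Read a little-endian 16-bit field from the ring. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical: consume every packet between cur_rx and write_off.
 * Returns the new read offset and stores the number of packets seen. */
static uint32_t rx_drain(const uint8_t *ring, uint32_t cur_rx,
                         uint32_t write_off, unsigned *handled)
{
    assert(cur_rx < RX_BUF_LEN && write_off < RX_BUF_LEN);
    assert((cur_rx & 3u) == 0);                 /* packets start dword-aligned */
    *handled = 0;
    /* The iteration cap only guards this sketch against corrupt lengths. */
    while (cur_rx != write_off && *handled < RX_BUF_LEN / 4u) {
        uint16_t status = rd16(ring + cur_rx);      /* header bytes 0..1 */
        uint16_t length = rd16(ring + cur_rx + 2);  /* header bytes 2..3 */
        if (!(status & RX_STAT_ROK))
            break;                                  /* leave error handling to the caller */
        (*handled)++;                               /* a real driver would forward the packet here */
        cur_rx = ((cur_rx + length + 4u + 3u) & ~3u) % RX_BUF_LEN;
    }
    return cur_rx;
}
```

With two 64-byte packets queued at offsets 0 and 68, this walk handles both and stops at offset 136. A loop that sets cur_rx straight to the write pointer after the first packet would skip the second one without reporting any error. Since the function above reloads cur_rx from CBR (0x3A) after each packet, that difference may be worth checking.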

Any help is GREATLY appreciated :)

Posted: Tue Apr 08, 2008 8:37 pm
by pcmattman
Hi,

01000101 wrote:
I have narrowed it down significantly though: the error comes from the packet overflow handler. When a packet's next buffer space is out of range (8, 16, 32, or 64k), it must either wrap around or the software must provide a little extra room for one more packet post-buffer.

In my case, I chose the latter of the two options, and have provided an extra 2k at the end of the buffer to accept a trailing packet.

Have you tried wrapping the pointer around instead of providing extra space?

Just a thought...

Posted: Tue Apr 08, 2008 11:03 pm
by 01000101
Yeah, I have tried that, and it fails as well. Same issue.

I really want to use the overflow buffer method as it seems more efficient and easier to manage.

I have even tried dumping the CBR packet area in the hope of finding the packet just offset by a few bytes or something, but it doesn't show up at all.

Posted: Tue Apr 08, 2008 11:49 pm
by pcmattman
Are you testing on real hardware, or emulated hardware? Real hardware may be "compatible" as opposed to emulated hardware which is created via the spec.