RTL8139 Sending issues on real hw.

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
GhelloWorld
Member
Member
Posts: 27
Joined: Thu Apr 19, 2018 5:31 am

RTL8139 Sending issues on real hw.

Post by GhelloWorld »

Hello Everyone,

When I finally thought that my rtl8139 driver was working perfectly another and hopefully the last problem occurred. I decided to look at the issue a couple times before posting here because I had already posted a issue with this driver a couple weeks back, unfortunately I can not figure this one out. So the situation is as followed. The driver now receives properly on my real pc but it has a weird behavior when sending data. When I send for example a dhcp packet over the network there are 2 bytes added to the packet I send, let me explain.
This is what I suspect to be send:
FF FF FF FF FF FF
45 00 01 48 00 01
40 00 FF FF FF FF...
But instead I capture this with wireshark:
00 00 FF FF FF FF FF FF
45 00 01 48 00 01 40 00
FF FF FF FF...
I have no idea why this is happening, I have already tried to zero out the memory and tried a lot of different configurations for the device but that did not fix it. In the attachment you can see the received packets, the bottom 5 should have been dhcp, you can ignore the first 2 packets. I really hope that anyone can help me with this problem since I have had way to much issues with this so called "simple device".
My code for sending:

Code: Select all

void RTL8139::SendData(uint8_t* data, uint32_t len)
{
    if(len > 1536)
    {
        printf("Packet to big to send!\n");
        while(1);
    }

    void* transfer_data = MemoryManager::activeMemoryManager->malloc(len);
    MemoryOperations::memset(transfer_data, 0, len);
    MemoryOperations::memcpy(transfer_data, data, len);

    // Second, fill in physical address of data, and length
    outportl(device->portBase + TSAD_array[tx_cur], (uint32_t)transfer_data);
    outportl(device->portBase + TSD_array[tx_cur++], len);
    if(tx_cur > 3)
        tx_cur = 0;
}
The driver code: https://github.com/Remco123/CactusOS/bl ... tl8139.cpp
And the repo: https://github.com/Remco123/CactusOS
Thank you in advance
Attachments
zerobug.pcap.txt
For some reason pcap files are not allowed so remove the .txt part from the filename
(2.12 KiB) Downloaded 43 times
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: RTL8139 Sending issues on real hw.

Post by BenLunt »

Just out of curiosity, it has been a while since I have looked at that particular card, but does the buffer require a certain alignment?

For example, if your buffer is at a word boundary but the controller requires a dword boundary, it will probably ignore the bottom two bits for the address and this is where you will get the two bytes added, so to speak.

i.e.: let's say your buffer is at 0x12345676, if the controller requires a dword alignment, the actual address is now 0x12345674, two bytes before your buffer. The controller will start at these two bytes before your buffer.

Just a thought...

Ben
- http://www.fysnet.net/osdesign_book_series.htm
GhelloWorld
Member
Member
Posts: 27
Joined: Thu Apr 19, 2018 5:31 am

Re: RTL8139 Sending issues on real hw.

Post by GhelloWorld »

Thank you Ben for your answer,

To be honest with you I was not familiar with the term memory alignment so I had to look it up. I think I now get the basic meaning of it. You were completely right the card indeed expects the memory to be double word aligned, I have probably read over it when I was reading the documentation. Using this guide (https://embeddedartistry.com/blog/2017/ ... ned-malloc I have created the following function for allocating aligned memory:

Code: Select all

void* MemoryManager::aligned_malloc(common::size_t align, common::size_t size)
{
    void * ptr = NULL;

    //We want it to be a power of two since align_up operates on powers of two
    if(!(align & (align - 1)) == 0)
        return 0;

    if(align && size)
    {
        /*
         * We know we have to fit an offset value
         * We also allocate extra bytes to ensure we can meet the alignment
         */
        common::uint32_t hdr_size = sizeof(offset_t) + (align - 1);
        void * p = malloc(size + hdr_size);

        if(p)
        {
            /*
             * Add the offset size to malloc's pointer (we will always store that)
             * Then align the resulting value to the arget alignment
             */
            ptr = (void *) align_up(((common::uintptr_t)p + sizeof(offset_t)), align);

            //Calculate the offset and store it behind our aligned pointer
            *((offset_t *)ptr - 1) = (offset_t)((common::uintptr_t)ptr - (common::uintptr_t)p);

        } // else NULL, could not malloc
    } //else NULL, invalid arguments

    return ptr;
And when I tested it it actually worked!
For the first 2 packets :(
It managed to send a dhcp packet and successfully receive the answer from my router (thank god that works), but when I capture the packets directly I can see that the 3th and 4th packet is still broken. These are also dhcp since my laptop does not respond to the pc with my os, and I try to send a dhcp maximum 5 times before giving up. In the attachment there is again the capture file, the code for sending is the same only the malloc is replaced with the aligned_malloc function above. For clarity I call this function with a alignment of 2 (because of the double word right?). I hope that anyone can help me figure this out completely.

Thanks in advance
Remco
Attachments
almostworking.pcapng.txt
Remove the .txt part and ignore the first to packets
(2.7 KiB) Downloaded 36 times
User avatar
SpyderTL
Member
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: RTL8139 Sending issues on real hw.

Post by SpyderTL »

Apparently, no one can agree on the terminology, but in general, a byte is 1 byte, a word is 2 bytes, and a double-word is 4 bytes.

So, try 4. :)
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: RTL8139 Sending issues on real hw.

Post by BenLunt »

Just a few thoughts for you.

1) as SpyderTL stated, a "double word" = "double 16-bit word" = "2 times 16-bit" = 32-bit...
2) no matter your memory allocation technique, it is wise to align all allocation on at least a 16-byte boundary. At least. No matter the usage of the memory.
3) when allocating memory for hardware, whether the specs specify it or not, you should always align all physical memory accesses to a minimum of 16-byte, 32-byte, or 64-byte boundary. However, most hardware specifications will indicate this, and will usually specify a page (4096-byte) alignment.

Hope this helps,
Ben
GhelloWorld
Member
Member
Posts: 27
Joined: Thu Apr 19, 2018 5:31 am

Re: RTL8139 Sending issues on real hw.

Post by GhelloWorld »

SpyderTL, you are completely right and with an alignment of 4 the driver works as it should, finally! And Ben, are you suggesting I should completely remove the malloc function and just only use the aligned version? And if so for all the memory or only the memory that has to do with hardware?
Thanks to both of you guys for the help
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: RTL8139 Sending issues on real hw.

Post by BenLunt »

GhelloWorld wrote:SpyderTL, you are completely right and with an alignment of 4 the driver works as it should, finally! And Ben, are you suggesting I should completely remove the malloc function and just only use the aligned version? And if so for all the memory or only the memory that has to do with hardware?
Thanks to both of you guys for the help
No, I have both, a regular malloc(sz) and an amalloc(sz, a). The first one allocates memory anyway it can while the second allocates memory with an alignment of at least 'a'. What I am saying is that your regular malloc(sz) should align all memory on a minimum of 16 bytes, period. This will cause speed increases on many levels.

For example, your malloc(sz) must have some sort of table/linked list/whatever to indicate if a block of memory is used or not, correct? Just make sure that each available block starts on a 16-byte boundary.

My heap allocator starts with a count of X sized blocks aligned on an X sized boundary with a minimum count of "(X*2) / X" blocks. Then the very next block is sized greater than X but an even divisible of 2, with the same count agreement, etc. This guaranties that all memory is aligned on its size. For example:

High Address:
[128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k]
...
[64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k]
...
[32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k]
...
[4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096]
...
[512][512][512][512][512][512][512] [512][512][512][512][512][512][512]
[256][256][256][256][256][256][256][256][256][256][256][256][256][256]
[128][128][128][128][128][128][128][128][128][128][128][128][128][128]
[64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64]
[32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32]
Low Address:

Each block is either used or free and a bitmap is used for this purpose. These blocks are aligned so that if I need an alignment of Y, any block that has a size of Y or greater will be guaranteed to be aligned on a Y boundary. It is up to your kernel to determine how many [32]'s, [64]'s, [128]'s, etc. to build for each task, obviously a lot more smaller blocks than larger blocks are needed. The task's info file (registry, .INF file, whatever) can store information about this so that next time it is loaded, it can allocate the count used. (i.e.: If more [64]'s were needed than allocated at load time, next time allocate more [64]'s and less of something else. Keep a running total. Most of the time a task will use the same amount of memory each time it is loaded. Mostly)

My allocator also keeps track of which block in each sized row was last unallocated (freed) to that I can allocate that one next. This way, that memory just might still be in the cache line, speeding up the next process to use that memory block.

This is all explained in one of my books (http://www.fysnet.net/the_system_core.htm, Chapter 13).

When you are writing your heap allocator, remember that it takes a considerable amount of time to allocate memory if you don't do it correctly, and alignment has a lot to do with it.

Thanks,
Ben
Post Reply