Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 5:08 am
by mariuszp
Some time ago (by which I mean months ago), I asked for help with my Intel 8254x driver:
viewtopic.php?f=1&t=32195

Since I had already spent a lot of time on all these bugs by then, I got bored of it and worked on other things in my OS. Yesterday I decided to work on it again, yay!

First I started testing, trying to reproduce the bug and such. What was extremely strange is that when I unloaded the driver and loaded it again after it had stopped receiving, this froze the VM. Not the OS, but the VM itself.

Now, I decided to follow advice from the last post in that topic: "I've just noticed that in your initialisation you don't set CTRL.ASDE and you don't clear CTRL.LRST after resetting the card, both of which are necessary for the card to automatically establish a link, and to my understanding you can't send or receive without them unless the card operates in internal PHY mode and the speed of the link happens to match the 1000Mb/s default of the card."

So now my initialization looks like this:

		// set the link UP
		uint32_t *volatile regCtrl = (uint32_t*volatile) nif->mmioAddr;
		do
		{
			(*regCtrl) |= (1 << 26);					// reset
			__sync_synchronize();
			sleep(1);
		} while ((*regCtrl) & (1 << 26));
		(*regCtrl) = ((*regCtrl)
			| (1 << 6)							// set SLU
			| (1 << 5))							// set ASDE
			& ~(1 << 3)							// clear LRST
		;
		__sync_synchronize();
Now, as soon as the driver is loaded at all, the VM freezes at the very moment it receives a packet (as far as I can tell). Ubuntu says that the window has stopped responding and gives me the option to force quit.

I don't know what's wrong with my driver, but it certainly shouldn't be able to freeze the VM. I tried searching for how to force the VM to dump core, so that I could perhaps look at the VirtualBox code that freezes, but found no information on this. Does anyone know what could possibly be going on with the driver, or at least how to force the VM to dump core?

Re: Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 2:06 pm
by mariuszp
I managed to obtain a core dump from VirtualBox. Of course, I had to abort it, which is asynchronous, so the information on what "caused" the crash is not really valid; but the list of threads might still be useful:

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7ffb1d472740 (LWP 12472) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x5565ef0f0420, nfds=6, timeout=99)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  2    Thread 0x7ffafefbf700 (LWP 12475) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x5565eefa5500, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7ffae870f700 (LWP 12480) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x7ffae870ebd0, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7ffb131cf700 (LWP 12474) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x7ffb131ced68, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  5    Thread 0x7ffae80a7700 (LWP 12482) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffae80a6da0, expected=0, 
    futex_word=0x5565ef140698)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  6    Thread 0x7ffafe7be700 (LWP 12476) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x5565eefb6500, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  7    Thread 0x7ffada46e700 (LWP 12484) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffada46db60, expected=0, 
    futex_word=0x5565ef19254c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  8    Thread 0x7ffad8175700 (LWP 12506) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaac000d0c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  9    Thread 0x7ffad8726700 (LWP 12488) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffad8725e10, expected=0, 
    futex_word=0x7ffae000ec18)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  10   Thread 0x7ffad80f4700 (LWP 12507) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaac001d4c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  11   Thread 0x7ffad84f7700 (LWP 12499) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffab0000d9c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  12   Thread 0x7ffad0b7e700 (LWP 12510) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaac003c08)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  13   Thread 0x7ffafccff700 (LWP 12477) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x7ffaf40101b0, nfds=4, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  14   Thread 0x7ffad0e8f700 (LWP 12508) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaac002ca8)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  15   Thread 0x7ffad00fd700 (LWP 12512) 0x00007ffb1d041c60 in __GI___nanosleep
    (requested_time=0x7ffad00fcdd0, remaining=0x7ffad00fcde0)
    at ../sysdeps/unix/sysv/linux/nanosleep.c:28
  16   Thread 0x7ffa945de700 (LWP 12515) 0x00007ffb1cb465d7 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:78
  17   Thread 0x7ffae91b7700 (LWP 12479) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x5565eefd0538)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  18   Thread 0x7ffa8ffff700 (LWP 12516) 0x00007ffb1cb465d7 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:78
  19   Thread 0x7ffa8e767700 (LWP 12521) 0x00007ffb1cb44bf9 in __GI___poll (
    fds=0x7ffa78007030, nfds=3, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  20   Thread 0x7ffae868e700 (LWP 12481) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffab00075fc)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  21   Thread 0x7ffa8fa7e700 (LWP 12517) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaa911094c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  22   Thread 0x7ffad8276700 (LWP 12501) 0x00007ffa95e1a27e in ?? ()
   from /usr/lib/virtualbox/VBoxDD.so
  23   Thread 0x7ffa8f87b700 (LWP 12520) 0x00007ffb1cb465d7 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:78
  24   Thread 0x7ffa8dee5700 (LWP 12523) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaa9130078)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  25   Thread 0x7ffa8de64700 (LWP 12524) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaa9131a08)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  26   Thread 0x7ffad1a77700 (LWP 12511) 0x00007ffb1ca7026c in __GI___sigtimedwait (set=<optimised out>, info=0x7ffad1a76d40, timeout=0x0)
    at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
  27   Thread 0x7ffa8f8fc700 (LWP 12519) 0x00007ffb1cb465d7 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:78
  28   Thread 0x7ffadac6f700 (LWP 12483) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffadac6e840, expected=0, 
    futex_word=0x5565ef1629b0)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  29   Thread 0x7ffad2ffd700 (LWP 12491) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffad2ffcb60, expected=0, 
    futex_word=0x5565ef472afc)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  30   Thread 0x7ffad0bff700 (LWP 12509) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffa98001638)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  31   Thread 0x7ffa954ef700 (LWP 12513) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaac004b68)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  32   Thread 0x7ffa9465f700 (LWP 12514) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7ffaa90f02b8)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  33   Thread 0x7ffa8f9fd700 (LWP 12518) 0x00007ffb1cb465d7 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:78
  34   Thread 0x7ffa8df66700 (LWP 12522) 0x00007ffb1d03df85 in futex_abstimed_wait_cancelable (private=<optimised out>, abstime=0x7ffa8df657f0, expected=0, 
    futex_word=0x7ffaa912f578)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  35   Thread 0x7ffad8578700 (LWP 12526) 0x00007ffb1d03d9f3 in futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x5565eefc5100)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
None of these appear to be directly related to the gigabit Ethernet device... But this does not happen without the 8254x driver, so there must be some sort of link...

Re: Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 2:32 pm
by davidv1992
I have to admit that I have little to no experience debugging things like this, but given the enormous number of threads that are in some form of futex_wait function, my best guess is that your driver somehow triggers a deadlock within VirtualBox. As for debugging that, you could try running a VirtualBox build with debug symbols, take a deeper look at what the various threads are doing, and check whether it is indeed a deadlock. That might give you some idea of what is upsetting it.

Re: Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 2:50 pm
by Brendan
Hi,
mariuszp wrote:

		// set the link UP
		uint32_t *volatile regCtrl = (uint32_t*volatile) nif->mmioAddr;
		do
		{
			(*regCtrl) |= (1 << 26);					// reset
			__sync_synchronize();
			sleep(1);
		} while ((*regCtrl) & (1 << 26));
		(*regCtrl) = ((*regCtrl)
			| (1 << 6)							// set SLU
			| (1 << 5))							// set ASDE
			& ~(1 << 3)							// clear LRST
		;
		__sync_synchronize();
Now, as soon as the driver is loaded at all, the VM freezes at the very moment it receives a packet (as far as I can tell). Ubuntu says that the window has stopped responding and gives me the option to force quit.
Can you be a little more specific? For example, there are 3 places to put a "printf("VM hasn't frozen yet\n");" in your driver's initialisation (at the very start, immediately after the reset loop, and after the SLU/ASDE/LRST flags are configured). If the VM only freezes after you change the SLU/ASDE/LRST flags, you could try changing them one at a time to figure out which flag causes the VM to freeze.
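One way to act on that suggestion, as a rough sketch: apply the three CTRL changes separately, each followed by a checkpoint message, so the last line printed before a freeze points at the step responsible. The bit positions are the ones used in the driver code above (SLU bit 6, ASDE bit 5, LRST bit 3). checkpoint() and link_up_stepwise() are hypothetical names, and printf() stands in for the kernel's kprintf(); in a real kernel the checkpoint output should go somewhere unbuffered (a serial port, for instance), otherwise the last message may be lost when the VM stops.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CTRL register bits, as used in the driver code in this thread. */
#define CTRL_LRST (1u << 3)   /* link reset */
#define CTRL_ASDE (1u << 5)   /* auto-speed detection enable */
#define CTRL_SLU  (1u << 6)   /* set link up */

/* Hypothetical stand-in for kprintf(): print and flush immediately so the
 * message is not stuck in a buffer if the machine stops right afterwards. */
static void checkpoint(const char *what)
{
    printf("VM hasn't frozen yet: %s\n", what);
    fflush(stdout);
}

/* Change one CTRL flag at a time, with a checkpoint after each change.
 * `ctrl` points at the CTRL register (pointer to volatile memory). */
static void link_up_stepwise(volatile uint32_t *ctrl)
{
    assert(ctrl != NULL);
    checkpoint("before touching CTRL");

    *ctrl |= CTRL_SLU;
    checkpoint("after setting SLU");

    *ctrl |= CTRL_ASDE;
    checkpoint("after setting ASDE");

    *ctrl &= ~CTRL_LRST;
    checkpoint("after clearing LRST");
}
```

If the freeze turns out to be later than all of these checkpoints, the same trick can be moved into the receive path instead.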
mariuszp wrote: I don't know what's wrong with my driver, but it certainly shouldn't be able to freeze the VM.
You're right - nothing should cause the VM itself to crash/freeze, and (especially if you can get more specific information) it might be worth sending a bug report to the VirtualBox developers, so that they can try to find and fix their problem.

In the meantime, you could try other emulators to see what happens (I think VMware and Qemu both emulate 8254x NICs) - one of them might give you a clue (in its logs or something) if your code is doing something it shouldn't.

The other thing that might be worth trying is enabling/disabling hardware acceleration in VirtualBox (and possibly even changing other settings - more/less RAM, more/less CPUs, etc). Sometimes there's a bug in one place that causes strange behaviour somewhere completely different.


Cheers,

Brendan

Re: Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 3:05 pm
by mariuszp
@Brendan: it definitely does not freeze right after setting these bits. I can continue using the OS for maybe 10 seconds after boot. It freezes at some point, and the send/receive indicator in the VBox status bar never flashes. It is very likely that this happens right at the moment something is being received... but either way, as you can see, there is no place within Glidix where I could just put a kprintf() to debug this...

Re: Intel 8254x driver crashes VirtualBox

Posted: Tue Sep 11, 2018 6:40 pm
by Brendan
Hi,
mariuszp wrote: @Brendan: it definitely does not freeze right after setting these bits. I can continue using the OS for maybe 10 seconds after boot. It freezes at some point, and the send/receive indicator in the VBox status bar never flashes. It is very likely that this happens right at the moment something is being received... but either way, as you can see, there is no place within Glidix where I could just put a kprintf() to debug this...
I was thinking that maybe there's an "internal loopback" mode that could be used to prevent it from receiving packets from the network (to see if the "~10 seconds until freeze" disappears when it's not listening to the network), and that could also be used to force it to receive packets (to see if receiving a packet reliably triggers the freeze).

Unfortunately the wiki page says nothing about loopback (and doesn't mention any kind of test to determine whether the hardware is faulty at all, which is relatively horrifying considering that "do any self tests" should be a mandatory 2nd step of all device drivers); so I dug into the manual and found that it does support loopback (plus a few other things for diagnostics).

Mostly, now I'm wondering how you'd feel about doing things very differently to what you've been told. Specifically:
  • Do a "global reset" of the device, and disable all of the "auto-link negotiation" (instead of trying to enable it)
  • Force the device to use its highest speed with full duplex and put it into "loopback mode"
  • Test the device by sending a few packets to itself and receiving them
  • Do a "link reset" of the device
  • Clear all the statistics back to zero (by reading them all)
  • Disable "loopback" and enable all of the "auto-link negotiation"

Cheers,

Brendan
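A minimal sketch of the CTRL-register side of Brendan's six steps, assuming the bit layout from the 8254x software developer's manual (FD bit 0, LRST bit 3, ASDE bit 5, SLU bit 6, SPEED bits 9:8 with 10b meaning 1000 Mb/s, FRCSPD bit 11, FRCDPLX bit 12, RST bit 26). The helper names are hypothetical. It deliberately leaves out the model-dependent parts: choosing a loopback mode in RCTL, turning off the PHY's own auto-negotiation through MDIC, and the actual send-to-self test. The statistics range used by clear_statistics() is an assumption to check against the manual for the specific controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTRL (offset 0x0000) bits, per the 8254x software developer's manual. */
#define CTRL_FD          (1u << 0)   /* full duplex */
#define CTRL_LRST        (1u << 3)   /* link reset */
#define CTRL_ASDE        (1u << 5)   /* auto-speed detection enable */
#define CTRL_SLU         (1u << 6)   /* set link up */
#define CTRL_SPEED_MASK  (3u << 8)   /* SPEED field, bits 9:8 */
#define CTRL_SPEED_1000  (2u << 8)   /* 10b = 1000 Mb/s */
#define CTRL_FRCSPD      (1u << 11)  /* force speed */
#define CTRL_FRCDPLX     (1u << 12)  /* force duplex */
#define CTRL_RST         (1u << 26)  /* device (global) reset, self-clearing */

/* ASSUMPTION: clear-on-read statistics registers span 0x4000..0x40FC. */
#define STATS_FIRST 0x4000u
#define STATS_LAST  0x40FCu

/* Step 2: no auto-speed detection; force 1000 Mb/s full duplex, link up. */
static uint32_t ctrl_forced_gigabit(uint32_t ctrl)
{
    ctrl &= ~(CTRL_ASDE | CTRL_SPEED_MASK | CTRL_RST | CTRL_LRST);
    ctrl |= CTRL_SLU | CTRL_FD | CTRL_SPEED_1000 | CTRL_FRCSPD | CTRL_FRCDPLX;
    return ctrl;
}

/* Step 6: hand speed/duplex back to auto-detection, keep the link up. */
static uint32_t ctrl_autonegotiate(uint32_t ctrl)
{
    ctrl &= ~(CTRL_FRCSPD | CTRL_FRCDPLX | CTRL_LRST | CTRL_RST);
    ctrl |= CTRL_SLU | CTRL_ASDE;
    return ctrl;
}

/* Step 5: read every statistics register once; reading clears them.
 * `mmio` is BAR0 viewed as an array of 32-bit registers. */
static void clear_statistics(volatile uint32_t *mmio)
{
    assert(mmio != NULL);
    for (size_t off = STATS_FIRST; off <= STATS_LAST; off += 4)
        (void)mmio[off / 4];   /* volatile read, value discarded */
}
```

In a driver, ctrl_forced_gigabit() would be written to CTRL after the global reset has completed, and ctrl_autonegotiate() only after the loopback test, the link reset and clear_statistics().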

Re: Intel 8254x driver crashes VirtualBox

Posted: Wed Sep 12, 2018 1:02 am
by linuxyne
For typical accesses to device/dma memory, what one needs is a pointer to volatile memory (volatile uint32_t *ptr) and not a volatile pointer to memory (uint32_t *volatile ptr).
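Applied to the code from the first post, that means declaring the register pointer as volatile uint32_t * rather than uint32_t *volatile. The sketch below, assuming BAR0 is already mapped and given as a byte address, wraps MMIO accesses in small helpers. It also writes RST only once and then polls with a bounded number of attempts: the original loop ORs bit 26 back in on every iteration in which it hasn't cleared yet, whereas, as I understand the manual, RST is self-clearing, so a single write followed by polling should be enough. mmio_read32(), mmio_write32(), e1000_reset(), the delay hook and max_polls are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_CTRL 0x0000u
#define CTRL_RST (1u << 26)   /* device reset, self-clearing */

/* Pointer to volatile memory: every dereference is a real MMIO access.
 * i.e. `volatile uint32_t *reg`, not `uint32_t *volatile reg`. */
static inline uint32_t mmio_read32(volatile uint8_t *base, uint32_t off)
{
    return *(volatile uint32_t *)(base + off);
}

static inline void mmio_write32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

/* Assert RST once, then poll until the device clears it.
 * `delay` is a hypothetical hook for the kernel's sleep (may be NULL).
 * Returns false if the bit is still set after `max_polls` checks. */
static bool e1000_reset(volatile uint8_t *base, void (*delay)(void), int max_polls)
{
    assert(max_polls > 0);
    mmio_write32(base, REG_CTRL, mmio_read32(base, REG_CTRL) | CTRL_RST);

    for (int i = 0; i < max_polls; i++) {
        if (delay != NULL)
            delay();
        if ((mmio_read32(base, REG_CTRL) & CTRL_RST) == 0)
            return true;
    }
    return false;
}
```

Returning a timeout instead of looping forever also gives the driver a chance to report a device that never comes out of reset.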

Re: Intel 8254x driver crashes VirtualBox

Posted: Wed Sep 12, 2018 8:03 am
by mariuszp
Well, it doesn't work in QEMU either (but at least it doesn't crash it; it just fails to send or receive anything). I guess I'll continue debugging.

Re: Intel 8254x driver crashes VirtualBox

Posted: Thu Sep 13, 2018 4:10 am
by linuxyne
QEMU's monitor can be used to enable tracing for the e1000x implementation. The monitor command is:

trace-event e1000* on

E: When processing the very first packet, is it correct to say that the card advances RDH from 0 to 1, and the driver changes RDT from NUM_RX_DESC to 1? If so, QEMU treats RDH == RDT as a condition where no buffers are available in which to place received packets.

. . .
nif->nextRX = (++index) & (NUM_RX_DESC-1);
volatile uint32_t * regRXTail = (volatile uint32_t *) (nif->mmioAddr + 0x2818);
*regRXTail = nif->nextRX;
. . .
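If that reading is right, a common convention is to write RDT as the index of the descriptor the driver has just finished with, not the next one. A small model of the head/tail arithmetic, assuming the ring is initialised with RDH = 0 and RDT = NUM_RX_DESC - 1, that the NIC may fill descriptors from RDH up to but not including RDT, and that NUM_RX_DESC is a power of two (the value 32 and both function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RX_DESC 32u   /* hypothetical ring size */
_Static_assert((NUM_RX_DESC & (NUM_RX_DESC - 1u)) == 0,
               "NUM_RX_DESC must be a power of two for the masks below");

/* Descriptors the NIC may still fill: it owns [RDH, RDT), so RDH == RDT
 * means no buffers are available. */
static uint32_t rx_hw_available(uint32_t rdh, uint32_t rdt)
{
    return (rdt - rdh) & (NUM_RX_DESC - 1u);
}

/* Call after descriptor *next_rx has been consumed and given a fresh buffer.
 * Advances *next_rx and returns the value to write to RDT: the index just
 * processed. That descriptor becomes the one-slot gap between tail and head,
 * and the previous gap is handed to the NIC. Returning the advanced index
 * instead can leave RDT equal to RDH, i.e. an apparently full ring. */
static uint32_t rx_finish_descriptor(uint32_t *next_rx)
{
    assert(*next_rx < NUM_RX_DESC);
    uint32_t index = *next_rx;
    *next_rx = (index + 1u) & (NUM_RX_DESC - 1u);
    return index;
}
```

With RDH = 1 and RDT = 1 the model reports zero free descriptors, matching the situation described above; writing RDT = 0 after processing descriptor 0 leaves NUM_RX_DESC - 1 available. In the post's code, that would correspond to writing the pre-increment index to 0x2818 rather than nif->nextRX.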
E2: Could enabling interrupts and PCI bus mastering before the threads have been created, and before some of the registers have been set, create any races?
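One conservative ordering, sketched under assumptions: program the ring registers while the receiver is still disabled, enable PCI bus mastering only once the ring memory and RDBAL/RDBAH/RDLEN/RDH/RDT are valid, and unmask interrupts (IMS) last, when the handler and any threads it wakes already exist. Register offsets are from the 8254x manual (RDT at 0x2818 matches the driver snippet above). pci_cmd is a stand-in for the PCI command register, which a real kernel reaches through configuration space, and e1000_rx_bringup() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets into BAR0, per the 8254x manual. */
#define REG_RCTL  0x0100u
#define REG_IMS   0x00D0u
#define REG_RDBAL 0x2800u
#define REG_RDBAH 0x2804u
#define REG_RDLEN 0x2808u
#define REG_RDH   0x2810u
#define REG_RDT   0x2818u

#define RCTL_EN            (1u << 1)   /* receiver enable */
#define PCI_CMD_BUS_MASTER 0x0004u     /* PCI command register, bit 2 */
#define RX_DESC_SIZE       16u         /* legacy receive descriptor */

/* `regs` is BAR0 viewed as 32-bit registers; `pci_cmd` is a stand-in for
 * the PCI command register. */
static void e1000_rx_bringup(volatile uint32_t *regs, uint16_t *pci_cmd,
                             uint64_t ring_phys, uint32_t num_desc,
                             uint32_t irq_mask)
{
    /* RDLEN must be a multiple of 128 bytes. */
    assert((num_desc * RX_DESC_SIZE) % 128u == 0);

    /* 1. Describe the ring while reception is disabled. */
    regs[REG_RCTL / 4] &= ~RCTL_EN;
    regs[REG_RDBAL / 4] = (uint32_t)ring_phys;
    regs[REG_RDBAH / 4] = (uint32_t)(ring_phys >> 32);
    regs[REG_RDLEN / 4] = num_desc * RX_DESC_SIZE;
    regs[REG_RDH / 4] = 0;
    regs[REG_RDT / 4] = num_desc - 1u;

    /* 2. Only now allow the device to DMA. */
    *pci_cmd = (uint16_t)(*pci_cmd | PCI_CMD_BUS_MASTER);

    /* 3. Enable the receiver, then unmask interrupts last. */
    regs[REG_RCTL / 4] |= RCTL_EN;
    regs[REG_IMS / 4] = irq_mask;
}
```

The point of the ordering is that the device never has bus mastering enabled while RDBAL/RDBAH still hold stale or zero values.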

(E3: Perhaps that's what caused VBox to crash: dereferencing an uninitialized RDB?)