Page 1 of 1

UHCI setup packet stalls for all devices most of the time

Posted: Sun Nov 07, 2021 3:18 am
by mid
I've been at this for months now, and while there's been progress, I couldn't fix this issue in particular, so I came here.

I'm trying to perform a GetDescriptor request via UHCI, and the very first setup TD stalls.
Here's the strategy:

1. Write 0x8F00 to LEGSUP
2. Enable controller reset bit
3. Wait until the bit resets
4. Set conventional configuration
5. Set port reset bit
6. Wait 100ms
7. Reset port reset bit
8. Set port enabled bit
9. Wait 10ms
10. Check connect status at bit 0
11. Run the following TD in this manner:

Code: Select all

QH *qh = (QH*) &buf[0];
TD *td = (TD*) &buf[16];

td->link = 1;
td->control = (1 << 23) | (3 << 27);
td->token = (7 << 21) | (0 << 19) | 0x2D;
td->buffer = (uint32_t) Virt2Phys(&buf[32]);

buf[32] = 128; /* Device to host */
buf[33] = 6; /* Get descriptor */
buf[34] = 0; /* Descriptor index */
buf[35] = 1; /* Device descriptor type */
buf[36] = 0; /* Language ID */
buf[37] = 0;
buf[38] = 18; /* Length */
buf[39] = 0;

qh->link = 1;
qh->element = Virt2Phys(td);
frameList[0] = (uint32_t) Virt2Phys(qh) | 2;

/* Wait until inactive */
do {
	asm volatile(: : : "memory");
} while((td->control & (1 << 23)) != 0);
The above sequence actually worked for my two test devices! Twice.. and then never again. And when it doesn't, td->control is 0x00450007, which means Stalled, CRC, actual length 8 (?!?!), PORTSC is 0x0093, USBSTS is 0x0002, which indicates a USB error interrupt and USBCMD is 0x0001.

That was yesterday. I tried it again today and it worked once. At the beginning, too. And that's what gets me, because this suggests to me that it's not just a timing error. I've tried other wait times, like 50ms&10ms, 150ms&20ms, but only the one above gave me at least some results. My brain is scrambled and I may have forgotten to give other information, so do ask.

Works always in an emulator.

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Sun Nov 07, 2021 10:37 am
by BenLunt
Hi,

I see a few issues.

First, you set the 'T' bit in the LINK pointer in your SETUP packet. In this case, the device is expecting at least one IN packet (3 to be exact) and then a STATUS packet.
When you say that it works in an EMULATOR, what actually works? Do you get the 18-byte descriptor returned? I doubt it.

When retrieving the Device Descriptor, you need to place at least four Transfer Descriptors in the Queue. SETUP(8), IN(8), IN(8), IN(2). Once this is successful, you then need to place one more Transfer Descriptor: STATUS(0)

Most devices (including all emulators) will allow you to place all five TDs in the queue. However, in theory, you must make sure that the first four are successful before you place the STATUS TD in the Queue.

Second, when USB was just starting, a well know Operating System would retrieve the first eight (8) bytes of the Device Descriptor and reset the device, then retrieve all 18 bytes. Some devices expect this and won't return a valid descriptor on the first request. This is a rare instance, but it still happens.

Third, when you do actually place the first four TDs in the Queue, be sure the toggle bit is set correctly for all four TDs starting with zero, then toggling each TD after (hence the name), with the STATUS TD's toggle bit always set no matter the previous sequence.

Hope this helps,
Ben
- http://www.fysnet.net/the_universal_serial_bus.htm

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Sun Nov 07, 2021 1:27 pm
by mid
Thanks for the response.

I haven't checked the full descriptor, but the class codes returned were definitely correct (the Bluetooth controller gave E00101, the pen drive - 000000) on both real hardware and on an emulator.

Pardon my ignorance but I'm not sure how the other 4 TDs are relevant. According to the UHCI documentation (3.4.1.3), Bulk, Interrupt and Control transactions are the same to the hardware, and since bulk transactions have a "shared toggle bit sequence", I assumed the TDs we're talking about now don't actually need to be in one queue or be dispatched in the same frame. After all, it did somehow receive the class codes, even if only twice. The code you see is just the setup packet being dispatched; the rest does happen afterward, just not at once. And since the stall happens on the setup packet, I didn't bother showing the rest.

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Sun Nov 07, 2021 2:21 pm
by BenLunt
I thought that you might have had the others, but wasn't sure.

Control, Bulk, Interrupt, and ISO transfers all happen on different Pipes. i.e.: each should have its own endpoint, while each (usually and not counting the control pipe) having a separate endpoint for direction. Therefore, the toggle bit for Control transfers has no influence on the toggle bit for bulk transfers. Each endpoint must maintain its own toggle bit.

As far as this toggle bit is concerned, you *must* have it correct when sent down the wire, so to speak. For example with control transfers, you can and may place the SETUP packet, zero or more DATA packets, and the STATUS packet in different queues, in different frames, etc, however the sequence sent down the wire must have the toggle bits correct.

The SETUP packet *must* have the toggle bit clear.
The first DATA packet *must* have the toggle bit set.
The second DATA packet *must* have the toggle bit clear.
. . .
The STATUS packet *must* have the toggle bit set.

No matter where you place these packets within your schedule (frame number, queue number, etc), the sequence *must* be SETUP/zero or more DATA/STATUS and the toggle bits must be correct.

If you do place these packets in different queues and frames, (which in my opinion is not something you want to do), you must take great care in knowing where the controller is within this schedule so that it does not execute one of the DATA packets before the SETUP packet.

In my opinion, you should place the SETUP packet and all DATA packets (if any) (and the STATUS packet if you wish) in the same queue. This way, if the controller is processing that particular queue and runs out of time before it can execute all of the DATA packets, you are guaranteed that the controller will start with the next unexecuted packet, following the correct sequence, the next time it comes around to this queue. i.e.: The controller is guaranteed not to execute any packets further down the queue (in the vertical direction) until all of the packets before it within this queue are executed. (granted you have not pointed any other queue or TD to any object within this queue).

An advantage to this is the Depth/Breadth bit and short packet detection. If you create a queue and point the Vertical pointer to a SETUP packet and enough DATA packets to transfer the data expected, then point the Queue's Horizontal pointer to a single TD holding the STATUS packet, if one of the DATA packets is a "short packet" (spd bit set), it will stop executing in the depth direction (depth bit set) and execute in the Breadth position.

For example, if you expect up to 255 bytes from the Configuration Descriptor, place a SETUP packet, and 32 DATA packets in the Vertical direction part of the queue (setting the Depth First bit) and a single STATUS packet in the Horizontal direction of the queue. The controller will try to execute all 32 DATA packets until it finds a Short Packet Detect. It will stop in the Depth direction and move to the Horizontal pointer, which is now your STATUS packet. This allows the controller to handle short packets, not your driver. Let the controller do the work.

This is just my opinion, but all single transfers (called TDs in the USB specs and later controller specs) should be in one queue, as far as the UHCI is concerned.

I am guessing that you are getting the STALL because the controller executed the SETUP packet, then either found a DATA packet out of sequence, the STATUS packet, or the same or different SETUP packet again. Remember, the control pipe expects SETUP/zero or more DATA/STATUS in sequence. It cannot handle multiple transfers at the same time. It must be SETUP/DATAx/STATUS.

Hope this helps.
Ben
- http://www.fysnet.net/the_universal_serial_bus.htm

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Sun Nov 07, 2021 4:13 pm
by mid
I see, thanks. I'll be sure to perform them in one go once I polish the code up.
BenLunt wrote:I am guessing that you are getting the STALL because the controller executed the SETUP packet, then either found a DATA packet out of sequence, the STATUS packet, or the same or different SETUP packet again. Remember, the control pipe expects SETUP/zero or more DATA/STATUS in sequence. It cannot handle multiple transfers at the same time. It must be SETUP/DATAx/STATUS.
Sorry, I must have been really unclear, but no TDs are placed by my code until all previous ones run successfully (hence the "wait until inactive" part in my above snippet), so at the time the SETUP packet ran, there was nothing else in the schedule.

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Sun Nov 07, 2021 6:28 pm
by BenLunt
Another thought about the stall, yet don't quote me on this, this is just a thought. I have not verified this at all.

The controller's firmware may be checking the T bit in the SETUP transfer descriptor. All SETUP packets will have, at the very least, a STATUS packet to follow. The firmware may be checking the T bit to see if the Transfer Descriptor is valid before it tries to execute it.

This is just a thought. I have in no way verified this, nor ever seen this. It just is (highly) unusual to see the T bit set in a SETUP transfer descriptor.

Anyway, without (much) more information and maybe even "hands on", I can only speculate what else may be causing the STALL.

Good luck and feel free to inquire again,
Ben

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Mon Nov 08, 2021 4:04 am
by mid
Well, I changed it to the following:

Code: Select all

QH *qh = (QH*) &buf[0];

TD *td1 = (TD*) &buf[16];
td1->link = 1;
td1->control = (1 << 23) | (1 << 27);
td1->token = (7 << 21) | (1 << 19) | 0x69;
td1->buffer = (uint32_t) Virt2Phys(&buf[56]);

TD *td0 = (TD*) &buf[32];
td0->link = (uint32_t) Virt2Phys(td1);
td0->control = (1 << 23) | (1 << 27);
td0->token = (7 << 21) | (0 << 19) | 0x2D;
td0->buffer = (uint32_t) Virt2Phys(&buf[48]);

buf[48] = 128;
buf[49] = 6;
buf[50] = 0;
buf[51] = 1;
buf[52] = 0;
buf[53] = 0;
buf[54] = 18;
buf[55] = 0;

qh->link = 1;
qh->element = (uint32_t) Virt2Phys(td0);

frameList[0] = (uint32_t) Virt2Phys(qh) | 2;
And it happened again - it successfully received the class codes once and started stalling on later runs! It's almost as if "it" wants me to do something differently each time... td0->control is still 0x00450007, while td1 was untouched.

Since you had to resort to speculation, I'm guessing you confirm that this snippet, at least in isolation, should be correct, so the issues could be elsewhere like in my EHCI handoff (which I need so my pen drive would be visible). This is a lot more stupid than I thought it would be, even after reading the many horror stories of USB.

I also tried requesting 8 bytes to try things the Doors way, but no dice.

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Wed Nov 10, 2021 8:34 am
by mid
I've done it. I'd like to thank a few heroes of the decade prior for assisting me spiritually in this journey, such as XanClic, iocoder and prajwal. If you look at their threads, however, you'll see that they seem to had either given up on their tasks and just wrote an EHCI driver, or had erratically been evolving their code until something worked. The latter was my method of choice, too.

And now, I no longer shall be mad when I come across an old post, where the solution was found yet not stated, because I understand why that is now - none of us knew what the hell we were doing. And, similarly, I have no idea what I had changed to have made it work. What I do know is that the line status bits were all zero before. When the D+ bit came on for me, I had wasted time thinking it was a new error, instead of a sign of progress. All of it works now. But, I am willing to share the entire relevant code with thorough comments, for whomever shall be reading this in the 2030s. It's not ideal, and it even sucks, but you know damn well I won't be touching this for as long as I can.

As for how the line status bit suddenly started being set, I think it's to do with me switching to a much more robust port enabling method. IOW, a simple "reset, wait, enable wait" sequence just isn't enough. So, ultimately, my conclusion is that I wasn't enabling the ports properly.

Code: Select all

/* Effectively disables all EHCIs, and initializes UHCIs */
int inithc() {
	while(1) {
		hcId = ClaimPCIDevice(0x0C032000); /* Class codes of EHCI */
		
		if(hcId == -1) { /* No longer finding new EHCIs */
			break;
		}
		
		/* The following variables are volatile, to make sure accesses are strictly 32-bit and in-order. */
		
		/* Get MMIO address from BAR0 */
		volatile uint32_t *usbbase = MemoryMap((PciRead32(hcId, 16) & ~15), 4096);
		
		/* First byte of the MMIO is the offset to the operational registers */
		volatile uint32_t *opregs = (uint32_t*) ((uintptr_t) usbbase + (usbbase[0] & 0xFF));
		
		/* Extended capabilities are stored in PCI config space, but the offset is read from MMIO */
		uint8_t caps = (usbbase[2] & 0xFF00) >> 8;
		if(caps < 0x40) {
			/* Capabilities cannot exist below 0x40, so */
			/* give up, but it might be doable more gracefully */
			return 0;
		} else {
			/* Although capabilities were designed to be extensible, and thus dynamically ordered in memory, EHCI had only defined one at the beginning, and it's the one we need */
			/* You might want to have a timeout system here */
			/* Request BIOS abdication by writing OS ownership bit */
			PciWrite8(hcId, caps + 3, 1);
			/* Wait until the BIOS ownership bit resets */
			while((PciRead32(hcId, caps) & (1 << 16)) != 0);
			/* Wait until the OS ownership bit actually sets */
			while((PciRead32(hcId, caps) & (1 << 24)) == 0);
			/* Disable legacy support (USBLEGCTLSTS in spec) */
			PciWrite32(hcId, caps + 4, 0);
		}
		
		/* Stop EHCI. Necessary to reset it */
		opregs[0] &= ~1;
		
		/* Poll the halted bit to wait until it truly stops */
		do {
			/* This is a memory barrier, to slap the wrists of overly smart compilers */
			asm volatile(: : : "memory");
		} while((opregs[1] & 4096) == 0);
		
		/* Finally reset it */
		opregs[0] |= 2;
		
		do {
			asm volatile(: : : "memory");
		} while(opregs[0] & 2);
		
		/* A lot of these are default values, but they're set again as a sanity check */
		opregs[1] = opregs[1]; /* Clears WC bits */
		opregs[2] = 0;
		opregs[4] = 0;
		opregs[0] = 0x80000;
		opregs[16] = 0;
		
		MemoryUnmap(usbbase, 4096);
	}
	
	while(1) {
		hcId = ClaimPCIDevice(0x0C030000); /* Class codes of UHCI */
		
		if(hcId == -1) {
			break;
		}
		
		/* Get IOIO address */
		hcIo = PciRead32(hcId, 0x20) & ~3;
		
		/* Disable legacy support */
		PciWrite16(hcId, 0xC0, 0x8F00);
		
		/* Reset it */
		outw(hcIo + 0, 2);
		while(inw(hcIo + 2) & 2);
		
		/* Disable interrupts */
		outw(hcIo + 4, 0);
		/* Self-explanatory */
		outd(hcIo + 8, (uint32_t) Virt2Phys(frameList));
		outb(hcIo + 12, 64);
		outw(hcIo + 2, 0xFFFF);
		outw(hcIo + 0, 1);
		
		/* UHCIs can technically have more than two ports, but that's not my problem */
		for(int p = 0; p < 2; p++) {
			/* Reset it */
			outw(hcIo + 16 + p+p, 512);
			SleepMS(50);
			/* You must stop resetting manually */
			outw(hcIo + 16 + p+p, inw(hcIo + 16 + p+p) & ~512);
			
			for(int try = 0; try < 10; try++) {
				SleepMS(10);
				uint16_t sc = inw(hcIo + 16 + p+p);
				/* If no device connected, no point in continuing */
				if((sc & 1) == 0) break;
				/* If WC bits are set, clear them */
				if(sc & (2 | 8)) outw(hcIo + 16 + p+p, sc);
				/* */
				if(sc & 4) break;
				outw(hcIo + 16 + p+p, sc | 4);
			}
			
			if(inw(hcIo + 16 + p+p) & 1) {
				/* It's here that you should be able to transfer packets finally */
			}
			
			/* Only for testing, disable the port */
			outw(hcIo + 16 + p+p, 0);
		}
		
		/* Only for testing, disable the HC */
		outw(hcIo + 0, 0);
	}
	
	return 1;
}
The above is public domain, or CC0. If you'll be using the above, make sure to do it exactly as is, and only later try making it cleaner. Whatever you may think is redundant, put it in anyway. You must not underestimate the prissiness of what's involved. Neither you and I are meant to comprehend it.

Thank you, Ben, for the help, too. This 6-month saga can finally be put to rest.

Re: UHCI setup packet stalls for all devices most of the tim

Posted: Wed Nov 10, 2021 6:33 pm
by BenLunt
mid wrote:I've done it.
Isn't a great feeling when that happens?
mid wrote:What I do know is that the line status bits were all zero before. When the D+ bit came on for me, I had wasted time thinking it was a new error, instead of a sign of progress. All of it works now. But, I am willing to share the entire relevant code with thorough comments, for whomever shall be reading this in the 2030s. It's not ideal, and it even sucks, but you know damn well I won't be touching this for as long as I can.

As for how the line status bit suddenly started being set, I think it's to do with me switching to a much more robust port enabling method. IOW, a simple "reset, wait, enable wait" sequence just isn't enough. So, ultimately, my conclusion is that I wasn't enabling the ports properly.
Oh yes. Sorry I didn't ask you about your reset/enable code. I have actually seen this before, and when I read you saying that the status bits (d+/d-) were zero, I instantly remembered that this was the exact cause. The port did not reset long enough before the enable. Sorry, I forgot all about this, though the issue you explained is exactly what that was.

Anyway, I am glad for you that you got it working. I have spent nearly 25 years working with USB and still have issues with my code, though learn something new every time I find a fix.

It is a good hobby we have. Good luck and again, feel free to ask again the next time. Maybe I will be more helpful.

Ben