Author Topic: Transition from HS to FS USB  (Read 21279 times)

Offline AlexS

  • Newbie
  • *
  • Posts: 46
    • View Profile
Transition from HS to FS USB
« on: February 23, 2022, 10:33:50 PM »
Hi everyone,

We've been forced by the chip shortage to move from our MK66FN2M0 chip with HS USB to an MK24FN1M0 that only has FS USB. While initial, very basic testing looked good (after some advice from Mark), we've since hit some serious stability issues. Essentially, our app sends ~850 bytes at 10 Hz to the PC, which should be well within the 12 Mbps of FS USB, but the stability problems appear anyway. I've captured a log of one such instance; it opens with USBlyzer, a USB sniffing tool (https://www.usblyzer.com/download.htm).
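
A rough check of that bandwidth claim, as a sketch only. It compares the payload rate with the raw 12 Mbps signalling rate and ignores protocol overhead (tokens, handshakes, bit stuffing), so the usable throughput is lower than the raw figure, but the margin stays large. The function names are illustrative and not part of any USB stack.

```c
#include <assert.h>
#include <stdint.h>

#define FS_USB_BIT_RATE 12000000u   /* raw full-speed signalling rate in bits/s */

/* Payload bit rate of a periodic report: bytes x reports per second x 8 */
uint32_t payload_bits_per_second(uint32_t bytes_per_report, uint32_t reports_per_second)
{
    return bytes_per_report * reports_per_second * 8u;
}

/* Share of the raw FS bit rate in hundredths of a percent (integer maths only) */
uint32_t fs_bus_share_x100(uint32_t bits_per_second)
{
    assert(bits_per_second <= FS_USB_BIT_RATE);   /* cannot exceed the wire rate */
    return (uint32_t)(((uint64_t)bits_per_second * 10000u) / FS_USB_BIT_RATE);
}
```

For 850 bytes at 10 Hz this gives 68000 bits/s, a little over half a percent of the raw rate, so bandwidth alone should not be the limit here.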

I also took a screenshot of the point in the log where the issue occurs. The odd thing is that this seems to be PC-related: a faster PC is much less likely to trigger the problem, while slower PCs trigger it almost immediately. The faster PCs are able to keep a stable connection for minutes on end. I'll be doing further troubleshooting tomorrow, when I'll connect some debug pins and try to see whereabouts in the USB flow the problem appears, but any tips would be greatly appreciated. As with most development projects, this one is also running late :)

Thank you!

Offline mark

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3243
    • View Profile
    • uTasker
Re: Transition from HS to FS USB
« Reply #1 on: February 24, 2022, 12:02:37 AM »
Hi Alex
Could you post a reference where it is working, since at the moment I don't understand what the log is showing and what the error actually is?
There is something about an unsuccessful stall PID in an IN bulk or interrupt transfer, but I don't see any details about what endpoints are being used.
Also, is it possible to see the enumeration too so that the device details are known?
Regards
Mark

Offline AlexS

Re: Transition from HS to FS USB
« Reply #2 on: February 24, 2022, 12:02:34 PM »
Hi,

Thanks for the reply. Things got a little heated close to our northern border, so hoping to get this sorted relatively quickly. I've attached full device information as seen by the USB sniffer. After investigating further, the device simply enters the defined hard fault handler after a random period of USB exchanges. The hard fault is always forced and the underlying fault is either "Undefined Instruction", "Access Violation" or similar. The stack seems healthy, but I'm working on narrowing down potential causes.
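
For anyone following along, a minimal sketch of how a forced hard fault can be classified from the Cortex-M fault status registers. The bit positions are the architectural ARMv7-M ones (HFSR at 0xE000ED2C, CFSR at 0xE000ED28, BFAR at 0xE000ED38); the function and enum names are hypothetical, and the register values are passed in as plain numbers so that the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define HFSR_FORCED       ((uint32_t)1 << 30)  /* escalated configurable fault */
#define CFSR_IACCVIOL     ((uint32_t)1 << 0)   /* MemManage: instruction access */
#define CFSR_DACCVIOL     ((uint32_t)1 << 1)   /* MemManage: data access */
#define CFSR_IBUSERR      ((uint32_t)1 << 8)   /* BusFault: instruction fetch */
#define CFSR_PRECISERR    ((uint32_t)1 << 9)   /* BusFault: precise data access */
#define CFSR_IMPRECISERR  ((uint32_t)1 << 10)  /* BusFault: imprecise data access */
#define CFSR_BFARVALID    ((uint32_t)1 << 15)  /* BFAR holds the faulting address */
#define CFSR_UNDEFINSTR   ((uint32_t)1 << 16)  /* UsageFault: undefined instruction */
#define CFSR_INVSTATE     ((uint32_t)1 << 17)  /* UsageFault: invalid EPSR state */

typedef enum {
    FAULT_NONE, FAULT_UNDEF_INSTR, FAULT_INVALID_STATE, FAULT_PRECISE_BUS,
    FAULT_IMPRECISE_BUS, FAULT_INSTR_BUS, FAULT_DATA_ACCESS, FAULT_INSTR_ACCESS,
    FAULT_OTHER
} fault_cause_t;

int is_forced_hard_fault(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}

/* Return the first recognised cause flagged in a CFSR value */
fault_cause_t classify_cfsr(uint32_t cfsr)
{
    if (cfsr == 0)                 return FAULT_NONE;
    if (cfsr & CFSR_UNDEFINSTR)    return FAULT_UNDEF_INSTR;
    if (cfsr & CFSR_INVSTATE)      return FAULT_INVALID_STATE;
    if (cfsr & CFSR_PRECISERR)     return FAULT_PRECISE_BUS;
    if (cfsr & CFSR_IMPRECISERR)   return FAULT_IMPRECISE_BUS;
    if (cfsr & CFSR_IBUSERR)       return FAULT_INSTR_BUS;
    if (cfsr & CFSR_DACCVIOL)      return FAULT_DATA_ACCESS;
    if (cfsr & CFSR_IACCVIOL)      return FAULT_INSTR_ACCESS;
    return FAULT_OTHER;
}

/* BFAR is only meaningful while BFARVALID is set, so insist on it */
uint32_t reported_bus_fault_address(uint32_t cfsr, uint32_t bfar)
{
    assert(cfsr & CFSR_BFARVALID);
    return bfar;
}
```

On the target the three values would be read from the registers inside the hard fault handler before anything clears them.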





Offline AlexS

Re: Transition from HS to FS USB
« Reply #3 on: February 24, 2022, 06:12:31 PM »
Managed to narrow it down to a bus fault that always happens on these lines. The address loaded into R2 is totally bogus and is what triggers the bus fault.

Offline AlexS

Re: Transition from HS to FS USB
« Reply #4 on: February 27, 2022, 12:44:39 AM »
Did a bit more investigating tonight, but there's a behavior that is really impossible for me to explain. Please take a minute to look at the pictures I've attached and follow my rationale below. It might be that, after about 40hrs of debugging this, I'm missing something really obvious.

1. First of all, please look at 'fault.png'. The relevant registers say that the error address is valid and that the BusFault was caused by a load from address 0x0040 0008.
2. Now in 'main.png'. The callstack suggests that the instruction that caused the fault is at 0x9340 (below the red line in the disassembly window). This looks plausible, as the current value in R2 (left side pane) is indeed 0x0040 0000 and, with an offset of 8, gives us exactly the address in BFAR.
3. But how can R2 have 0x0040 0000 in it when, just in the previous instruction, at PC 0x933E, its value is loaded from the address stored in R7, with a 0 byte offset?
4. Looking at the value in R7, R2 should actually have 0x2000 FF48, which is what the memory view shows us.

So how is it possible that, between two instructions in the USB memory handler, only the value of a single register (R2) gets changed?

PS: When I manually stepped through the function in a working instance, that 0x0040 0000 was actually the result of the operation above (ANDS R3, R2). It's almost like the underlined instruction was never executed.

Offline mark

Re: Transition from HS to FS USB
« Reply #5 on: February 28, 2022, 04:24:24 PM »
Alex

I understand you can have a hard fault in the following sequence:

1. (Called from USB OTG interrupt and interrupts globally disabled for fnUSB_handle_frame() handling)
                uDisable_Interrupt();                                    // ensure interrupts remain blocked when handling the next possible transmission
                    fnUSB_handle_frame(USB_TX_ACKED, 0, iEndpoint_ref, &usb_hardware); // handle tx ack event
                uEnable_Interrupt();                                     // allow higher priority interrupts again


2.  (when extra data to send is in a linear block)
                FNSEND_USB_DATA((tx_queue->ptrStart + tx_queue->usSent), usDataLength, iEndpoint, ptrUSB_HW); // transmit next buffer
                fnPushLastLength(ptrUsbQueue, usDataLength);             // save last length for use later
                tx_queue->usSent += usDataLength;                        // total frame length in progress


3. Where
        #define FNSEND_USB_DATA(pData, Len, iEndpoint, ptrUSB_HW) *ptrUSB_HW->ptrTxDatBuffer = (unsigned char *)pData; \
                *ptrUSB_HW->ptr_ulUSB_BDControl = (unsigned long)(SET_FRAME_LENGTH(Len) | (OWN | ptrUSB_HW->ptrEndpoint->ulNextTxData0)); \
                _SIM_USB(0, USB_SIM_TX, iEndpoint, ptrUSB_HW); \
                ptrUSB_HW->ptrEndpoint->ulNextTxData0 ^= DATA_1; \
                ptrUSB_HW->ptrEndpoint->ulEndpointSize ^= ALTERNATE_TX_BUFFER


and
the struct in question starts with
typedef struct stUSB_HW
{
    unsigned long  ulRxControl;
    volatile unsigned long *ptr_ulUSB_BDControl;                         // pointer to the presently valid tx buffer descriptor control entry
    USB_END_POINT *ptrEndpoint;
....


and
typedef struct stUSB_END_POINT
{
    unsigned long ulNextRxData0;
    unsigned long ulNextTxData0;
    unsigned long ulEndpointSize;                                        // contains size of endpoint plus some control flags
} USB_END_POINT;



4. From the screen shot it looks like this line can do it:

*ptrUSB_HW->ptr_ulUSB_BDControl = (unsigned long)(SET_FRAME_LENGTH(Len) | (OWN | ptrUSB_HW->ptrEndpoint->ulNextTxData0));

and specifically the
ptrUSB_HW->ptrEndpoint->ulNextTxData0
can fail.

5. The assembler for this is
ldr r2, [r7, #0]
ldr r2, [r2, #8]
ldr r2, [r2, #4]


ptrUSB_HW is in the register R7 (0x2002ff48), which points to a location at 0x2000ff48 according to the memory window.

r2 is however showing 0x400000, which is the previous content, typically SET_FRAME_LENGTH(Len) [0x40 << 16], meaning that this instruction has either not yet been executed or has been overwritten.

If the second assembly instruction is failing (with R2 at 0x400000) it would indeed be a read from 0x400008, which is an invalid address. The aim of that instruction is to load the pointer ptrEndpoint so that the following instruction can load the value of ulNextTxData0.

The struct looks to be intact and the location of ptrEndpoint would be OK.
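
The offsets in the three loads can be cross-checked against the struct layout with a small host-side model. This is only a sketch: the *_MODEL types are hypothetical stand-ins that use uint32_t for every member, on the assumption that both unsigned long and pointers are 4 bytes on the Cortex-M4 target, and only the first members quoted above are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of USB_END_POINT (all members are 4 bytes on the target) */
typedef struct {
    uint32_t ulNextRxData0;        /* offset 0 */
    uint32_t ulNextTxData0;        /* offset 4 */
    uint32_t ulEndpointSize;       /* offset 8 */
} USB_END_POINT_MODEL;

/* 32-bit model of the start of USB_HW; pointers are stored as uint32_t */
typedef struct {
    uint32_t ulRxControl;          /* offset 0 */
    uint32_t ptr_ulUSB_BDControl;  /* offset 4 (a pointer on the target) */
    uint32_t ptrEndpoint;          /* offset 8 (a pointer on the target) */
} USB_HW_MODEL;

/* The layout must agree with the immediates in the disassembly */
static_assert(offsetof(USB_HW_MODEL, ptrEndpoint) == 8, "ldr r2, [r2, #8]");
static_assert(offsetof(USB_END_POINT_MODEL, ulNextTxData0) == 4, "ldr r2, [r2, #4]");

/* Address read by the second load for a given value of r2 */
uint32_t second_load_address(uint32_t r2_value)
{
    return r2_value + (uint32_t)offsetof(USB_HW_MODEL, ptrEndpoint);
}
```

With the stale value 0x00400000 in r2 the second load reads 0x00400008, which is the address reported in BFAR; with the expected 0x2000FF48 it would read 0x2000FF50 instead.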

Can you check how uDisable_Interrupt() is set up in your system? I know that you are using FreeRTOS with the USB stack and USB driver and, since it doesn't look possible for r2 to not have the correct value after the first assembler instruction has been executed, I would carefully ensure that no task switching could be taking place (that is, that it is really correctly protected) and the register instance somehow being changed in the process.

Regards

Mark


Offline AlexS

Re: Transition from HS to FS USB
« Reply #6 on: March 01, 2022, 12:23:44 PM »
Hi Mark,

Thanks for your reply! Have managed to make some progress, but said progress just muddied the waters in terms of what's causing this.

I'll outline below what I have done to debug this and the results:
1. Initially I disabled all instruction and data caching and prefetching. This completely eliminated the hard fault occurring in the fnUSB_handle_frame() function and left me only with hard faults that are shown to originate in the prvPortStartFirstTask() function (I've added a picture).
2. Then, when I reduced the clock multiplier from 30 to 24 (lowering the core frequency from 120 MHz to 96 MHz), all problems essentially disappeared. This is very odd, considering that the USB clock is derived from the internal 48 MHz reference, not from external sources.
3. I then re-enabled instruction and data caching and prefetching and got back some of the performance I had lost, and the system stayed stable.

As you can see, it's a very odd sequence of events. I can't see how reducing the core speed would get rid of all the issues, unless the system was overclocked to start with.

Later edit: uDisable_Interrupt() (I have removed what the preprocessor leaves out)

extern void uDisable_Interrupt(void)
{
    __disable_interrupt();                                               // disable interrupts to core
    iInterruptLevel++;                                                   // monitor the level of disable nesting
}
« Last Edit: March 01, 2022, 12:27:59 PM by AlexS »
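
The routine above only shows the disable side. As a sketch of how such a nesting counter is usually paired, here is a host-side simulation. The sim_* names and the masked flag are hypothetical stand-ins for the compiler intrinsics and the core's interrupt mask, and the enable routine is an assumption about the matching counterpart, not a copy of the uTasker source.

```c
#include <assert.h>

static int sim_interrupts_masked = 0;  /* stand-in for the core's interrupt mask */
static int iInterruptLevel = 0;        /* nesting depth, as in the routine above */

void sim_disable_interrupt(void)
{
    sim_interrupts_masked = 1;         /* mask first, then count the nesting */
    iInterruptLevel++;
}

void sim_enable_interrupt(void)
{
    assert(iInterruptLevel > 0);       /* an unmatched enable is a bug */
    if (--iInterruptLevel == 0) {
        sim_interrupts_masked = 0;     /* only the outermost enable unmasks */
    }
}

int sim_masked(void)
{
    return sim_interrupts_masked;
}
```

The point of the counter is that an inner enable does not unmask interrupts while an outer caller still expects them to be blocked. It only protects a region if every path through it uses the same disable/enable pair.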

Offline AlexS

Re: Transition from HS to FS USB
« Reply #7 on: March 01, 2022, 01:33:04 PM »
Quick update: it seems that it wasn't the core clock influencing the hard fault occurrence, but actually the flash clock.

Anecdotally, after a few tests:
1. Core clock 96MHz, flash clock 24MHz -> getting faults as often as before
2. Core clock 120MHz, flash clock at 20MHz -> not getting the faults
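
The two cases can be reproduced with simple divider arithmetic. This is a sketch that assumes the flash clock is an integer division of the same source clock that feeds the core, and that the core runs from that source undivided; the function names and the limit passed in are illustrative, not values taken from a data sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest integer divider that keeps the flash clock at or below max_flash_hz */
uint32_t flash_divider_for(uint32_t source_hz, uint32_t max_flash_hz)
{
    assert(source_hz > 0 && max_flash_hz > 0);
    return (source_hz + max_flash_hz - 1u) / max_flash_hz;  /* ceiling division */
}

/* Resulting flash clock for a given source clock and divider */
uint32_t flash_clock_hz(uint32_t source_hz, uint32_t divider)
{
    assert(divider > 0);
    return source_hz / divider;
}
```

Under these assumptions 96 MHz with a divider of 4 gives the 24 MHz of case 1, and 120 MHz with a divider of 6 gives the 20 MHz of case 2.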

Offline mark

Re: Transition from HS to FS USB
« Reply #8 on: March 01, 2022, 03:24:02 PM »
Hi Alex

I did once run the flash clock of a Kinetis part slightly out of specification (by mistake) and found that it would crash very quickly.

24 MHz is within spec., so reducing to 20 MHz and finding a difference could mean that the chips are not fully qualified. There is another product that I am involved with that needed to move from a K66 to a K22 due to chip supply problems. In that case the move was abandoned, since a standard Cortex-M4 instruction caused an unknown instruction hard fault when used by compiled code. Instead, a redesign with an i.MX RT part has been used.

Therefore, although I didn't physically test the K22s myself, it is worrying that some Kinetis parts now in circulation may not be up to scratch.

In other products recycled chips are being used (taken from scrapped equipment). There are some good stocks from companies specialised in recycling and, although the parts have already been in service for a number of years, careful testing has shown them to be reliable.

My general advice for all in a difficult situation with Kinetis part supply (where the end is still not in sight) is to seriously consider i.MX RT alternatives since these, although requiring a HW change, are an intelligent long-term investment with multiple advantages. uTasker users can simply move between Kinetis and i.MX RT since existing products can run on these with very little configuration (in the best case by changing the _KINETIS define to _iMX and adjusting the port muxes to suit).

Regards

Mark




Offline AlexS

Re: Transition from HS to FS USB
« Reply #9 on: March 11, 2022, 07:46:15 PM »
To bring this to some sort of conclusion: after reducing the Kinetis flash clock to 20 MHz (from 24 MHz, confirmed using an oscilloscope), the hard fault disappeared. After further investigation into another issue (noisy ADC readings), we believe the problem is actually the design of the chip's power supply lines. Instead of a 3V3 plane with decoupling capacitors close to the pins themselves, our design has a single trace running from all 5 supply pins to the outside of the chip, where the decoupling capacitors are all placed in parallel.

We're currently doing another design iteration to confirm that was indeed the problem. Will update this post when that happens. Thanks for your help!