Topic: Large HTTP transfers failing due to retransmission counter

Offline pht

  • Newbie
  • *
  • Posts: 7
    • View Profile
Hello,

For some time now we've been having problems transferring large files (>1MB) over HTTP: uTasker sends a RST after some time. After investigating, I was able to identify the culprit. fnPollTCP decrements the ptr_TCP->ucRetransmissions counter each time a TCP_EVENT_REGENERATE is generated, and when the counter reaches zero the connection is closed (fnCloseTCPSession is called). The only place I see the counter being reset is in fnNewTCPState, which sets it to TCP_DEF_RETRIES.

Shouldn't the ucRetransmissions counter be reset after each successfully transferred packet? Where would be the most appropriate place to do it?

Thanks for your support
Phil

Offline mark

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3236
    • View Profile
    • uTasker
Re: Large HTTP transfers failing due to retransmission counter
« Reply #1 on: May 17, 2010, 07:52:20 PM »
Hi Phil

I don't understand exactly why the connection is being reset. In which direction is the data being sent, and is data being sent in both directions at the same time? Is it on a local network or over the Internet?

It is correct that the ptr_TCP->ucRetransmissions counter is decremented each time a tx frame is regenerated. This limits the number of regenerated frames to TCP_DEF_RETRIES, after which the connection is reset (i.e. the maximum number of retries has been attempted and the socket gives up).

If there is an ACK received from any of the retransmissions it will cause the counter to be reset to its maximum value again.

If the counter were reset to its maximum value each time a frame was transmitted (on the transmission and not on the ACK), a repetition (e.g. due to the link being down) would repeat forever and the socket would get stuck in this state.

Therefore I don't think that the problem is with the counter. If there are repetitions, and it really resets after the maximum number of repeats, it sounds more like something else is failing. In a local network a repetition would almost never occur, and maximum repetitions would only take place when something was really not working any more.
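
Note: the argument above about where the counter is reset can be sketched as a small simulation. This is only a model of the reasoning, not uTasker code: the value chosen for TCP_DEF_RETRIES and the function regenerations_before_reset are assumptions made for illustration. It models a dead link where every timeout regenerates the frame and no ACK ever arrives.

```c
#include <stdbool.h>

#define TCP_DEF_RETRIES 7   /* assumed value, for illustration only */

/* Model of a dead link: each timeout regenerates the frame and no ACK arrives.
 * Returns the number of regenerations after which the socket gives up,
 * or -1 if it is still retrying after max_timeouts timeouts. */
static int regenerations_before_reset(bool reset_on_transmit, int max_timeouts)
{
    int retries = TCP_DEF_RETRIES;
    int regenerated = 0;

    while (retries > 0 && regenerated < max_timeouts) {
        retries--;                        /* one regeneration uses up one retry */
        regenerated++;
        if (reset_on_transmit) {
            retries = TCP_DEF_RETRIES;    /* hypothetical policy: refresh on every transmission */
        }
    }
    return (retries == 0) ? regenerated : -1;
}
```

With reset_on_transmit set to false the model gives up after TCP_DEF_RETRIES regenerations. With it set to true the counter never reaches zero, so the model is still retrying when the timeout budget runs out, which is the stuck socket described above.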

Could you send a Wireshark recording of the actual case where the reset takes place?

Regards

Mark

Offline pht

« Reply #2 on: May 17, 2010, 08:38:12 PM »
Hello Mark,

The transfer happens on a local network, and the uTasker device is sending the file to a PC.

Quote
I don't understand exactly why the connection is being reset.
The connection is reset by the fnCloseTCPSession function which does the following:
Code:
    ptr_TCP->ucSendFlags = TCP_FLAG_RESET;
    fnSendTCPControl(ptr_TCP);

Quote
If there is an ACK received from any of the retransmissions it will cause the counter to be reset to its maximum value again.

Are you sure about that? I've searched for the variable ucRetransmissions in the source code. The only time it is set to TCP_DEF_RETRIES is when fnNewTCPState is called. Obviously, this function is not called after each frame. To be sure, I printed the value of ucRetransmissions each time fnPollTCP is hit, and it does not seem to be reset.

If my diagnosis happens to be correct, then a fix could be to reset the counter upon receiving an ACK (in fnHandleTCP under the TCP_STATE_ESTABLISHED case).

Code:
                if (rx_tcp_packet.ulAckNo == ptr_TCP->ulNextTransmissionNumber ) {
                    ptr_TCP->ulSendUnackedNumber = ptr_TCP->ulNextTransmissionNumber; // No more unacked data
                    ptr_TCP->ucRetransmissions = TCP_DEF_RETRIES; // <-------------- FIX
                    // inform application that data has been successfully acked
                    if (APP_REQUEST_CLOSE & (iAppTCP |= ptr_TCP->event_listener(ptr_TCP->MySocketNumber, TCP_EVENT_ACK, ptr_TCP->ucRemoteIP, ptr_TCP->usRemport))) {
                        if (rx_tcp_packet.usHeaderLengthAndFlags & TCP_FLAG_FIN) {    // The application has commanded the close of the connection
                            // We have just received FIN + ACK, meaning that the other side has also initiated a close
                            // This is a simultaneous close, so we go to state closing and wait for the last ack
                            fnNewTCPState(ptr_TCP, TCP_STATE_CLOSING);
                        }
                        return;
                    }
                }



If my diagnosis is wrong then I can send some Wireshark captures.

Phil

Offline mark

« Reply #3 on: May 17, 2010, 09:33:30 PM »
Hi Phil

I don't see the need to add a retrigger here since there is already a retrigger at the end of the TCP_STATE_ESTABLISHED case.

See:
            fnNewTCPState(ptr_TCP, TCP_STATE_ESTABLISHED);               // retrigger keep alive timer since we have detected activity

which unconditionally does
    ptr_TCP->ucRetransmissions = TCP_DEF_RETRIES;                        // standard retries if the message times out
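
Note: a minimal model of the retrigger described here, assuming only what is stated in this thread: regenerations decrement the counter, and entering a state (even the same one) restores it. The struct TCP_SESSION_MODEL, its field ucStateModel, the helper names and the numeric values are hypothetical and are not taken from the uTasker sources.

```c
#define TCP_DEF_RETRIES        7   /* assumed value, for illustration only */
#define TCP_STATE_ESTABLISHED  1   /* placeholder value */

/* Hypothetical, reduced stand-in for the real TCP control structure */
typedef struct {
    unsigned char ucStateModel;        /* hypothetical field name */
    unsigned char ucRetransmissions;   /* remaining retries */
} TCP_SESSION_MODEL;

/* Models fnNewTCPState: entering a state always restores the retry counter */
static void new_tcp_state(TCP_SESSION_MODEL *s, unsigned char state)
{
    s->ucStateModel = state;
    s->ucRetransmissions = TCP_DEF_RETRIES;
}

/* Models a TCP_EVENT_REGENERATE: one retry is used up */
static void on_regenerate(TCP_SESSION_MODEL *s)
{
    if (s->ucRetransmissions > 0) {
        s->ucRetransmissions--;
    }
}

/* Models reaching the end of the established case after a received frame:
 * the state is re-entered, which refreshes the counter as a side effect */
static void on_frame_handled(TCP_SESSION_MODEL *s)
{
    new_tcp_state(s, TCP_STATE_ESTABLISHED);
}
```

In this model a few regenerations followed by one handled frame leave the counter back at TCP_DEF_RETRIES, but only if the handler actually reaches the retrigger at the end of the case.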


Although it shouldn't make any difference in this case, I note that your version seems to be quite old. Check the latest version just to be sure that there is nothing which could explain such an effect.

As I understand it, the maximum number of repetitions is still needed before the connection is closed, so I am still wondering why there are so many repetitions. I would post a recording since it may make things clearer.

regards

Mark

Offline pht

« Reply #4 on: May 18, 2010, 03:48:09 PM »
Hello Mark,

After digging I've found my problem. Totally my fault. I made some modifications a long time ago in the HTTP module. Instead of returning something like (fnSendTCP(...) > 0) I forgot the "> 0" part and returned fnSendTCP(). Since fnSendTCP() will almost always return 1400 (0b10101111000), the fourth bit is almost always set to 1, causing fnHandleTCP to think that the APP_REQUEST_CLOSE flag was set. fnHandleTCP therefore returned without calling fnNewTCPState, so the retransmission counter was never refreshed:

Code:
if (APP_REQUEST_CLOSE & (iAppTCP |= ptr_TCP->event_listener(ptr_TCP->MySocketNumber, TCP_EVENT_ACK, ptr_TCP->ucRemoteIP, ptr_TCP->usRemport)))
{
    /* ... */
    return;
}
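
Note: the bit arithmetic behind this bug can be checked with a few lines of C. This is a sketch under one assumption: APP_REQUEST_CLOSE is taken to be 0x08, which is inferred from the "fourth bit" remark in this post and not verified against the uTasker headers. The functions listener_buggy, listener_fixed and close_requested are hypothetical stand-ins for the HTTP listener's return value and the flag test in fnHandleTCP.

```c
#define APP_REQUEST_CLOSE 0x08   /* assumed value: the "fourth bit" mentioned in the post */

/* Buggy variant: the raw byte count from the send call is returned as the flag word */
static int listener_buggy(int bytes_sent)
{
    return bytes_sent;
}

/* Intended variant: only the truth value (0 or 1) of "something was sent" is returned */
static int listener_fixed(int bytes_sent)
{
    return (bytes_sent > 0);
}

/* Models the flag test applied to the listener result */
static int close_requested(int listener_result)
{
    return (listener_result & APP_REQUEST_CLOSE) != 0;
}
```

Under that assumption, close_requested(listener_buggy(1400)) is true because 1400 & 0x08 is non-zero, while close_requested(listener_fixed(1400)) is false because the listener result is then just 1. A byte count without that bit set, such as 1024, would not trigger the false close request.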


Anyway, sorry for wasting your time and thanks for pointing me in the right direction.

Phil



Offline mark

« Reply #5 on: May 18, 2010, 04:08:37 PM »
Hi Phil

I am glad that you found the cause.

Regards

Mark