Topic: Ethernet speed

Michael Bach
Ethernet speed
« on: July 19, 2007, 12:24:15 PM »
Hi there!

I wrote a simple loop-back socket to see how much data I can exchange over Ethernet. The whole thing looks like this:

```c
/*************************************************************************
TEST TCP Socket
*************************************************************************/
static int testDataListener(USOCKET Socket, unsigned char ucEvent, unsigned char *ucIp_Data, unsigned short usPortLen);

static USOCKET TCP_test_Socket;
#define test_port_number 6574

static int status = 0;
unsigned char ucTestData[] = {0,1,2,3,4,5,6,7,8,9};

/*************************************************************************
end
*************************************************************************/


static int testDataListener(USOCKET Socket, unsigned char ucEvent, unsigned char *ucIp_Data, unsigned short usPortLen)
{
    switch (ucEvent) {
    case TCP_EVENT_ABORT:                  // Connection not possible
        break;
    case TCP_EVENT_CONNECTED:              // Connection has been established
        break;
    case TCP_EVENT_ACK:                    // An ACK has been received
        break;
    case TCP_EVENT_DATA:                   // Data has been received
        // Echo the received payload straight back to the sender
        if (fnSendTCP(TCP_test_Socket, ucIp_Data, usPortLen, TCP_FLAG_PUSH) > 0)
        {
            return APP_SENT_DATA;
        }
        break;
    case TCP_EVENT_CLOSE:                  // Close request has been received
        break;
    case TCP_EVENT_CLOSED:                 // Connection has been closed
        status = 1;
        fnTCP_Listen(TCP_test_Socket, test_port_number, 0); // listen again for the next connection
        break;
    case TCP_EVENT_REGENERATE:             // Transmitted data has been lost
        break;
    case TCP_EVENT_CONREQ:                 // A remote client has requested a connection
        break;
    case TCP_EVENT_ARP_RESOLUTION_FAILED:  // IP address of remote server not resolved
        break;
    case TCP_EVENT_PARTIAL_ACK:            // Ack to some data (when windowing enabled)
        break;
    }
    return APP_ACCEPT;                     // Standard reply which accepts
}
```


The received data is simply sent back. I'm not quite sure whether I used the functions/parameters correctly, but in any case the loop-back works: a test application we used in an earlier project confirms it.
I sent some files and measured the speed, which was about 1 Mbps. What can I do to increase the speed? I need the board to receive and send huge amounts of data...

best regards,
Michael Bach

mark
Re: Ethernet speed
« Reply #1 on: July 19, 2007, 03:31:55 PM »
Hi Michael

At the moment it is difficult to draw any conclusions, since you seem to be measuring the throughput from your test PC via an echo back to the test PC. The throughput therefore depends on a number of parameters that are not yet known.

1. The best throughput is achieved with the maximum TCP frame size, so it is necessary to know the frame size used for testing. This is not visible in the code, since it echoes back whatever it receives and the length is therefore determined by the received data. If the test frames are small, the per-frame overhead can greatly reduce efficiency.

2. The throughput is determined by the PC transmission time, the reception plus transmission delay at the Ethernet board, and then the delay to get back to the test PC. The test PC then has to handle the echo and probably start the next test. Each of these phases has to be known to identify where any bottlenecks are.
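To put point 1 into numbers, here is a small C sketch that computes how many bytes one TCP segment occupies on the wire and what fraction of that is user payload. It assumes standard Ethernet II framing (preamble, header, FCS and inter-frame gap counted), IPv4 and TCP headers without options (20 bytes each) and no VLAN tag; these are textbook Ethernet figures, not values taken from the uTasker code.

```c
/* Per-frame overheads on 10/100 Mbit Ethernet, in bytes.
 * Assumes no IP or TCP options and no VLAN tag. */
#define ETH_PREAMBLE_SFD   8u   /* preamble + start-of-frame delimiter */
#define ETH_HEADER        14u   /* destination MAC, source MAC, EtherType */
#define ETH_FCS            4u   /* frame check sequence (CRC-32) */
#define ETH_IFG           12u   /* minimum inter-frame gap */
#define ETH_MIN_PAYLOAD   46u   /* shorter Ethernet payloads are padded to this */
#define IPV4_HEADER       20u
#define TCP_HEADER        20u

/* Bytes of wire time used by one TCP segment carrying 'payload' user bytes. */
static unsigned wire_bytes(unsigned payload)
{
    unsigned eth_payload = IPV4_HEADER + TCP_HEADER + payload;

    if (eth_payload < ETH_MIN_PAYLOAD) {
        eth_payload = ETH_MIN_PAYLOAD;   /* the MAC pads short frames */
    }
    return ETH_PREAMBLE_SFD + ETH_HEADER + eth_payload + ETH_FCS + ETH_IFG;
}

/* Fraction of the wire time that actually carries user data. */
static double tcp_efficiency(unsigned payload)
{
    return (double)payload / (double)wire_bytes(payload);
}
```

Under these assumptions a 1460-byte payload occupies 1538 bytes of wire time (about 95 % efficiency), while a 10-byte payload, the size of the `ucTestData` array in the first post, occupies 88 bytes (about 11 %).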

Take a look at the following document:
http://www.utasker.com/docs/uTasker/uTaskerBenchmarks.PDF

There is a UDP echo test which was measured on the Freescale M5223X (not the SAM7X which you are using), but the results should be quite similar (but see below). TCP is a bit more complicated than UDP, so the software handling takes a little more time. In any case, the checksums in the frames are probably the limiting factor in terms of absolute performance.
A disadvantage of the SAM7X is that it has 128-byte Rx LAN buffers, so each received frame has to be gathered into a linear Rx buffer of maximum LAN frame size (involving memcpy()).

Can you send me a short sequence of the test from an Ethereal recording, so that I can see where the delays are and perhaps make a suggestion on how to optimise the test?

Thanks

Mark
« Last Edit: September 07, 2007, 10:41:03 PM by mark »

Michael Bach
Re: Ethernet speed
« Reply #2 on: July 20, 2007, 09:31:53 AM »
Well, my concern is that an ARM might be far too slow to handle network traffic. It might be OK for simple network communication, but we weren't sure whether we can reach a throughput of 10MB/s, which is the minimum we need.

Is there any way to decrease the execution time of the rx handling routines?

mark
Re: Ethernet speed
« Reply #3 on: July 20, 2007, 11:20:39 AM »
Hi Michael

I checked the recording that you sent me (by private mail).

The PC has a short delay of around 200 µs and the SAM7X has a delay of around 2.5 ms. This is about what I would expect.
You are sending and receiving 30 echoes (1460-byte payload) in 80 ms, which results in an echo throughput of 547 kByte/s or 4.38 Mbit/s.
The round-trip time is dictated mainly by the time that it takes the SAM7X to handle the frame reception and send back the echo frame.
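The arithmetic above can be reproduced with two small C helpers: one computes throughput from the measurement (echoes, payload, elapsed time), the other predicts it from the two turnaround delays. The model is a sketch that assumes a strict stop-and-wait echo (only one frame in flight) and that the measured 200 µs and 2.5 ms delays already include the time on the wire; the function names are illustrative only.

```c
/* Throughput observed in a test: 'echoes' frames of 'payload_bytes' each
 * completed a full round trip within 'elapsed_s' seconds. */
static double measured_rate_bytes_per_s(unsigned echoes, unsigned payload_bytes,
                                        double elapsed_s)
{
    return (double)echoes * payload_bytes / elapsed_s;
}

/* Stop-and-wait model: each echo costs the PC turnaround plus the board
 * turnaround, so the rate is one payload per combined delay. */
static double echo_rate_bytes_per_s(unsigned payload_bytes,
                                    double pc_delay_s, double board_delay_s)
{
    return (double)payload_bytes / (pc_delay_s + board_delay_s);
}
```

With the values from the recording, the measured formula gives 547,500 bytes/s (4.38 Mbit/s) and the delay model gives about 541,000 bytes/s, so the two agree closely. The same model shows that halving the board's 2.5 ms turnaround would give 1460 bytes every 1.45 ms, roughly 1.0 MByte/s, because the PC delay does not shrink.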

In normal circumstances the reaction time will not be influenced by other active protocols, since a protocol does no work unless the frame type is destined for it. Any other tasks in the system are usually sleeping (not polling) until they are woken to do work destined for them. Your test shows a very stable echo response time, although there can be slightly longer reaction times depending on what other tasks the processor is performing in a project with multiple functions.

The interrupt itself doesn't process the frame; it simply wakes the reception task, which then does the work outside of the interrupt routine. The task scheduling takes place in the microsecond range, so it doesn't contribute to the main delay being seen.
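The split between a short interrupt and a task that does the real work can be sketched as follows. All names are hypothetical and a plain flag stands in for the operating system's event mechanism; this only illustrates the pattern described above, not the uTasker implementation.

```c
#include <stdbool.h>

/* Hypothetical names: a flag set by the ISR and consumed by the task. */
static volatile bool rx_event_pending = false;
static unsigned task_wakeups_handled = 0;

/* Interrupt handler: do the minimum and wake the task. In a real system
 * this would also acknowledge the EMAC interrupt source. */
static void ethernet_rx_isr(void)
{
    rx_event_pending = true;
}

/* Ethernet task: called by the scheduler, outside interrupt context. */
static void ethernet_task(void)
{
    if (!rx_event_pending) {
        return;                 /* nothing pending: the task stays asleep */
    }
    rx_event_pending = false;   /* clear first so a new interrupt is not lost */
    /* The heavy work (IP/TCP checks, checksums, copying, echoing) would go
     * here, draining every frame waiting in the receive buffers. */
    task_wakeups_handled++;
}
```

Keeping the interrupt this short means its own cost is negligible; the time per frame is spent in the task, which is where the checksum and copy work discussed below happens.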

When a frame is received, it has to be checked for correct IP contents and then for correct TCP contents. A fairly time-intensive part is the calculation of the checksum over the frame, since every byte has to be read and added into the sum.
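For reference, this is the standard Internet checksum (RFC 1071) used by IPv4, TCP and UDP, written as a plain C sketch rather than the stack's own routine. It shows why the cost grows with frame size: the loop visits every 16-bit word of the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): one's-complement of the one's-complement
 * sum of the data taken as big-endian 16-bit words. A 32-bit accumulator is
 * ample for Ethernet-sized frames before carries are folded back. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {                        /* add 16-bit words */
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1) {                          /* odd length: pad with a zero byte */
        sum += (uint32_t)data[0] << 8;
    }
    while (sum >> 16) {                      /* fold carries into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

A receiver can verify a header simply by summing it with the checksum field included: a correct header yields 0. The eight bytes `00 01 f2 03 f4 f5 f6 f7` from the RFC 1071 example give a checksum of `0x220d`.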
The SAM7X 'unfortunately' also has to memcpy each frame to a linear buffer, because its Ethernet receive buffer is made up of a chain of smaller buffers which has to wrap circularly at the end (this is a characteristic of the chip). It would be possible to remove the copy for frames which do not actually wrap in the buffer, and to define as many buffers as possible so that the wrap occurs, say, only once every 10 frames. This would reduce the time by maybe 0.4 ms for 90% of frames.
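The copy-only-on-wrap idea can be sketched like this. It assumes a hypothetical layout in which the small receive buffers sit back to back in one array, so a frame that does not run past the last buffer is already contiguous in memory. The buffer count, names and helper are illustrative only; the real SAM7X EMAC descriptor handling is abstracted away.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: RX_BUF_COUNT buffers of RX_BUF_SIZE bytes, back to
 * back in one array. The hardware fills them in order and wraps from the
 * last buffer to the first. */
#define RX_BUF_SIZE   128u
#define RX_BUF_COUNT  16u

static uint8_t rx_pool[RX_BUF_COUNT * RX_BUF_SIZE];

/* Return a linear view of a frame that starts in buffer 'first' and is
 * 'len' bytes long. A frame that does not wrap is used in place (no copy);
 * only a wrapped frame is copied into 'scratch', in two pieces. */
static const uint8_t *frame_linear(unsigned first, size_t len,
                                   uint8_t *scratch, size_t scratch_len)
{
    if (first >= RX_BUF_COUNT || len > sizeof rx_pool) {
        return NULL;                                  /* invalid request */
    }
    size_t start = (size_t)first * RX_BUF_SIZE;
    size_t to_end = sizeof rx_pool - start;           /* bytes before the wrap */

    if (len <= to_end) {
        return &rx_pool[start];                       /* no wrap: zero copy */
    }
    if (len > scratch_len) {
        return NULL;                                  /* scratch too small */
    }
    memcpy(scratch, &rx_pool[start], to_end);         /* tail of the pool */
    memcpy(scratch + to_end, rx_pool, len - to_end);  /* wrapped remainder */
    return scratch;
}
```

The more buffers the pool holds, the smaller the share of frames that straddle the wrap point and therefore still pay for the copy.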

When echoing the frame, the TCP header has to be modified, and thus the checksum of the new frame has to be calculated on transmission (as I noted, this tends to be the highest processor overhead). Finally there is one memcpy to the transmit buffer (this last copy can be optimised away, but doing so makes the solution very specific, so for general use the copy is performed).
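The reason the transmit checksum is expensive is that the TCP checksum covers an IPv4 pseudo-header (source and destination address, protocol number and TCP length) plus the complete segment, header and payload alike. The sketch below recomputes it from scratch, as a general-purpose stack would; the function names are illustrative and not the uTasker API.

```c
#include <stddef.h>
#include <stdint.h>

/* Add 'len' bytes to a running one's-complement sum as big-endian 16-bit
 * words, padding an odd final byte with zero. */
static uint32_t sum_words(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)p[0] << 8;
    }
    return sum;
}

/* TCP checksum over the IPv4 pseudo-header plus the whole segment.
 * The checksum field (bytes 16..17 of the TCP header) must be zero
 * when computing a value to transmit. */
static uint16_t tcp_checksum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                             const uint8_t *segment, uint16_t seg_len)
{
    uint32_t sum = 0;

    sum = sum_words(sum, src_ip, 4);
    sum = sum_words(sum, dst_ip, 4);
    sum += 6u;                               /* zero byte + protocol 6 (TCP) */
    sum += seg_len;                          /* TCP length: header + payload */
    sum = sum_words(sum, segment, seg_len);  /* every payload byte is touched */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Once the result is written into bytes 16..17, running the same function over the segment yields 0, which is exactly the check the receiver performs.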

If TCP transmissions stay within a LAN, it would be possible to remove checksum verification of received frames (relying on the CRC over the Ethernet frame, which is stronger anyway). This would accelerate reception by maybe 50%. It is, however, not advisable for frames originating from outside of a local network/intranet (which is often the case...).
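One way to apply this selectively is to skip the software check only when the sender is on the local subnet, since a frame that crossed a router carries a fresh Ethernet CRC that protects only the last hop. This is a hypothetical policy sketch with illustrative names; it assumes IPv4 addresses in host byte order and does not reflect any existing uTasker option.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'src_ip' is on the same IPv4 subnet as this device. */
static bool same_subnet(uint32_t src_ip, uint32_t our_ip, uint32_t netmask)
{
    return (src_ip & netmask) == (our_ip & netmask);
}

/* Hypothetical policy: trust the Ethernet CRC alone for local senders and
 * verify the TCP checksum for everything routed in from elsewhere. */
static bool must_verify_checksum(uint32_t src_ip, uint32_t our_ip, uint32_t netmask)
{
    return !same_subnet(src_ip, our_ip, netmask);
}
```

For a device at 192.168.0.1/24, a frame from 192.168.0.199 would skip the check while one from 8.8.8.8 would still be verified.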

My conclusion is that your results are as expected for this chip. There are a few points where acceleration is possible, but these may make for a less general-purpose solution. In the end, the processing power of the chip defines the throughput which is possible. Some project-specific optimisation could perhaps improve throughput by a factor of 2 (in the best case), but this would be a very specific solution operating at the limits of the device. If more reserve is needed and specific optimisation is to be avoided, then more powerful chips (or hardware implementations) may be the best solution.

Regards

Mark
« Last Edit: July 20, 2007, 01:19:50 PM by mark »