Topic: Ethernet DMA for LPC23XX project

martinh
Ethernet DMA for LPC23XX project
« on: November 08, 2012, 09:17:05 AM »
Hello,

We are using the LPC23XX project to transfer data via Ethernet (UDP). Now that the amount of data has increased, we are looking for a way to speed up the transfer.
As a first step, the checksum calculation in fnSendUDP() has been disabled.
That resulted in a transfer time of 2.93 ms per 11060 bytes, i.e. about 265 ns/byte on average (roughly 30 Mbit/s).

The next idea is to use DMA for transferring the data.
Has such an Ethernet DMA transfer ever been implemented in the LPC23XX project?
Does anybody know of a suitable project, or can anyone give some hints on where to start?
Any comments would be welcome.

Martin H.

mark
Re: Ethernet DMA for LPC23XX project
« Reply #1 on: November 08, 2012, 01:24:49 PM »
Hi Martin

The Ethernet controller already uses DMA to transfer the data from RAM to the Ethernet port.
It does not use DMA to copy the data into the SRAM: the DMA controller in the LPC23xx doesn't support RAM-to-RAM DMA transfers.

Compare your results with those for the M522xx and Kinetis here: http://www.utasker.com/docs/uTasker/uTaskerBenchmarks.PDF
For optimal results (with checksums calculated), a processor with IP checksum offloading has quite large advantages, as shown by the Kinetis results.

Regards

Mark

martinh
Re: Ethernet DMA for LPC23XX project
« Reply #2 on: November 15, 2012, 04:12:20 PM »
Hi Mark,

So, in other words, the suggested way to speed up the whole transfer is to use uMemcpy() with DMA?
Has DMA transfer with uMemcpy() already been implemented in the project?
At least, I could not find the DMA initialisation. Could you give me a pointer?

Regards,
Martin

mark
Re: Ethernet DMA for LPC23XX project
« Reply #3 on: November 15, 2012, 06:17:05 PM »
Hi Martin

Unfortunately, it is not possible to use DMA in uMemcpy() since the LPCs don't support memory-to-memory DMA transfers.
This is usually one of the first DMA routines that I add, because it makes a measurable difference compared to a copy loop, but the LPCs only support peripheral DMA transfers to certain memory areas, which is of course a shame.

Regards

Mark

martinh
Re: Ethernet DMA for LPC23XX project
« Reply #4 on: November 16, 2012, 09:13:47 AM »
Hi Mark,

That surprises me.
I am using an LPC2468, and the user manual says:

“30.3 Features of the GPDMA

Memory-to-memory, memory-to-peripheral, peripheral-to-memory, and peripheral-to-peripheral transfers”

I am aware that the project was made for the LPC2368, but the manual for the LPC2368 contains the same information.

Regards,
Martin

mark
Re: Ethernet DMA for LPC23XX project
« Reply #5 on: November 16, 2012, 03:16:28 PM »
Hi Martin

I think that the restriction is of a more general nature: section 30.4.1 lists the memory regions supported by DMA, which are off-chip memory (on the LPC2377/78/88) and the USB RAM (8k or 16k on LPC2387).

This means that memory-to-memory transfers on a device without external RAM are restricted to the USB RAM. It is therefore still not generally possible to use DMA in uMemcpy() unless a check is made that both source and destination lie in suitable areas, and even then it wouldn't help when copying to the Ethernet RAM area.
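As a sketch of the check described above (not code from the uTasker project), a uMemcpy()-style wrapper could take the DMA path only when both the source and destination ranges lie inside DMA-reachable memory, and otherwise fall back to the CPU loop. The region table, its addresses and sizes, and the names dma_region_ok(), dma_copy_possible(), dma_copy_blocking() and sketch_memcpy() are all hypothetical; the real ranges must come from section 30.4.1 for the specific part, and the DMA transfer itself is only stubbed with a loop here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of DMA-reachable areas. The bases and sizes are
   assumptions for illustration; take the real ones from section 30.4.1. */
typedef struct {
    uint32_t base;
    uint32_t size;
} DMA_REGION;

static const DMA_REGION dma_regions[] = {
    { 0x7FD00000u, 0x00002000u },   /* USB RAM, assumed 8k */
    { 0xA0000000u, 0x01000000u },   /* external SDRAM, assumed 16 Mbyte */
};

/* Returns 1 if [addr, addr + len) lies completely inside one listed region. */
static int dma_region_ok(uint32_t addr, uint32_t len)
{
    size_t i;
    for (i = 0; i < sizeof(dma_regions) / sizeof(dma_regions[0]); i++) {
        const DMA_REGION *r = &dma_regions[i];
        if (addr >= r->base && len <= r->size && (addr - r->base) <= (r->size - len)) {
            return 1;
        }
    }
    return 0;
}

/* A memory-to-memory DMA copy needs both source and destination in reach. */
static int dma_copy_possible(uint32_t dst, uint32_t src, uint32_t len)
{
    return dma_region_ok(dst, len) && dma_region_ok(src, len);
}

/* Stand-in for programming a GPDMA channel and waiting for completion
   (hypothetical; here it simply copies with a loop). */
static void dma_copy_blocking(unsigned char *d, const unsigned char *s, size_t len)
{
    while (len--) {
        *d++ = *s++;
    }
}

/* Hypothetical uMemcpy()-style wrapper: DMA only when the check passes,
   otherwise the usual CPU copy loop. Assumes a 32-bit target. */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (dma_copy_possible((uint32_t)(uintptr_t)d, (uint32_t)(uintptr_t)s, (uint32_t)len)) {
        dma_copy_blocking(d, s, len);
        return dst;
    }
    while (len--) {
        *d++ = *s++;
    }
    return dst;
}
```

Since the Ethernet RAM is deliberately not in the table, a copy into it (for example to 0x7FE02A70) always falls back to the loop, which is exactly the limitation mentioned above.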

Regards

Mark

martinh
Re: Ethernet DMA for LPC23XX project
« Reply #6 on: January 09, 2013, 02:00:38 PM »
Hi Mark,

I must admit that I had repeatedly overlooked that passage in the manual because everything seemed so obvious.

In my case, the amount of data that has to be transmitted is a multiple of 1472 bytes, so I thought about transmitting the data blocks directly from SDRAM (0xA000 0000…).
The idea is to send the first block as usual, i.e. the data is first copied to the Ethernet RAM (for instance ptTTYQue->ETH_queue.put=0x7fe02a70). For the following blocks, there is enough memory in SDRAM in front of each block that is no longer needed, and this memory can be overwritten by the ucData[MAX_IP] block that is created in fnSendIP().
So if a pointer to ucData[] (located in SDRAM, just before the data block) could be passed to an appropriate routine, the whole amount of data could be transmitted block by block directly from the external SDRAM, avoiding memcpy() for this large amount of data. And since the transfer protocol is UDP, retransmission of erroneous data is not necessary.
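The address arithmetic behind this idea can be sketched as follows, assuming each frame consists of a 14-byte Ethernet II header, a 20-byte IPv4 header without options and an 8-byte UDP header (42 bytes in total), directly followed by a 1472-byte payload block (the largest UDP payload that fits a 1500-byte MTU without fragmentation). The macro and function names are hypothetical, and the SDRAM addresses are handled as plain numbers for illustration.

```c
#include <stdint.h>

/* Assumed frame layout: Ethernet II (no VLAN tag), IPv4 without options, UDP. */
#define ETH_HDR_LEN    14u
#define IPV4_HDR_LEN   20u
#define UDP_HDR_LEN     8u
#define FRAME_HDR_LEN  (ETH_HDR_LEN + IPV4_HDR_LEN + UDP_HDR_LEN)   /* 42 bytes */
#define BLOCK_LEN      1472u   /* 1500-byte MTU - 20 (IPv4) - 8 (UDP) */

/* Hypothetical helper: address at which the headers for block n (n >= 1)
   would be written so that they sit directly in front of that block's
   payload. These FRAME_HDR_LEN bytes overwrite the tail of block n - 1,
   which therefore must already have been sent. */
static uint32_t header_addr(uint32_t payload_base, uint32_t n)
{
    return payload_base + n * BLOCK_LEN - FRAME_HDR_LEN;
}

/* The frame start is also the start of the Tx buffer handed to the EMAC. */
static int is_long_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0u;
}
```

With payload blocks starting on a long-word boundary (for example 0xA0000000), the frame for block 1 would start at 0xA0000596, which is 2 bytes off a long-word boundary. Starting the blocks at an address with offset 2 (for example 0xA0000002) puts every frame start back on a boundary, which matters if the Tx buffers must be long-word aligned, as noted later in the thread.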
So far, my investigations have ended at fnStartEthTx(), where the data is sent.
Although I am not yet familiar with EMAC_TxDescriptor and EMAC_TxProduceIndex: is this a possible way to save time?
Or is it a silly idea? If the data to be sent has to be in the Ethernet RAM, none of this would make sense.
What is your opinion?

Regards,

Martin
« Last Edit: January 09, 2013, 03:57:15 PM by martinh »

mark
Re: Ethernet DMA for LPC23XX project
« Reply #7 on: January 09, 2013, 06:58:07 PM »
Hi Martin

The NXP examples always use the Ethernet SRAM area for buffer descriptors and Ethernet data. I think the first test to make is whether the Ethernet module can work with other memory areas (the Ethernet area may be tightly coupled and so more efficient, but perhaps the module can still work from other areas, including external RAM). To do this, the Ethernet buffer descriptors can be left as they are and only the actual data buffers located somewhere else:
In fnConfigEthernet():
    EMAC_TxDescriptor = (unsigned long)ptrBd;                            // inform the MAC where the first transmit buffer descriptor is located
    while (i--) {                                                        // initialise the buffer descriptors
        ptrBd->ptrBlock = ptrBuffer;                                     // point this descriptor to its data buffer
        ptrBuffer += pars->usSizeTx;                                     // advance to the next buffer (usSizeTx bytes each)
        ptrBd++;                                                         // move on to the next buffer descriptor
    }

The line ptrBd->ptrBlock = ptrBuffer; points each buffer descriptor to its buffer location, so overwriting these with a different pointer would allow this to be tested. Beware, however, of the memory alignment requirements: the buffers need to be long-word aligned.
fnGetTxBufferAdd() is used to locate where the data is to be written, so the rest should be automatic.
If this doesn't work, however, it means that there is a physical restriction (I didn't see this specifically noted in the data sheet) and so the Ethernet SRAM limits the total size.
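A minimal sketch of that test, written as a separate helper that repoints already initialised descriptors at consecutive buffers in another memory area and refuses to do so if long-word alignment would be violated. TX_BD_SKETCH and relocate_tx_buffers() are hypothetical; only the ptrBlock field name is taken from the excerpt above, and the real uTasker descriptor structure may contain other fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor type: only the data pointer matters for this test. */
typedef struct {
    unsigned char *ptrBlock;            /* where the EMAC reads this frame's data */
} TX_BD_SKETCH;

/* Repoints 'count' descriptors at consecutive 'buf_size'-byte buffers inside
   'area' (for example a block of external SDRAM).
   Returns 0 on success, -1 if alignment or size requirements are not met. */
static int relocate_tx_buffers(TX_BD_SKETCH *bd, size_t count,
                               unsigned char *area, size_t area_len,
                               size_t buf_size)
{
    size_t i;

    if (((uintptr_t)area & 3u) != 0 || (buf_size & 3u) != 0) {
        return -1;                      /* every buffer must stay long-word aligned */
    }
    if (count != 0 && buf_size > area_len / count) {
        return -1;                      /* area too small for all buffers */
    }
    for (i = 0; i < count; i++) {
        bd[i].ptrBlock = area + i * buf_size;
    }
    return 0;
}
```

On the target, area would be a long-word aligned block in external SDRAM and buf_size would match pars->usSizeTx. The sketch also requires buf_size to be a multiple of 4 so that every buffer, not just the first, stays aligned.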

When sending UDP at maximum speed, it is also worth looking at the scatter-gather capabilities of the Ethernet DMA.
The uTasker output uses a linear buffer, which is the simplest method and is also compatible with all Ethernet controller types (some use buffer descriptors, some use FIFOs). It is possible to avoid some data copies if the header and payload are not both moved to the buffer, but the payload is instead left where it originated (if that location is suitable). This involves building the IP/UDP header in one buffer and then configuring the Tx to send just the header directly from where it is located in memory, with any restrictions on area and alignment already accounted for. The payload is then chained (again assuming its location is suitable) so that it is sent after the first transfer has completed. The result is sent as a single frame, but with its content coming from two separate buffers.
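The chaining described above could look roughly like this: two consecutive Tx descriptors, the first pointing at the header buffer and the second at the payload, with only the second marked as the last fragment of the frame. The control-word layout used here (size minus one in bits 10..0, "last" in bit 30, "interrupt" in bit 31) is my reading of the LPC23xx EMAC Tx descriptor description and should be checked against the user manual; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed Tx descriptor control bits (verify against the user manual). */
#define TXCTRL_SIZE_MASK  0x000007FFu   /* bytes to send minus one */
#define TXCTRL_LAST       0x40000000u   /* last fragment of the frame */
#define TXCTRL_INTERRUPT  0x80000000u   /* interrupt when this fragment is sent */

typedef struct {
    const void *packet;   /* start of this fragment */
    uint32_t    control;  /* size - 1 plus flags */
} TX_FRAG_SKETCH;

/* Fills two consecutive descriptors so that the header and the payload,
   held in separate buffers, go out as one Ethernet frame.
   Returns the number of descriptors used, or 0 on a size error. */
static unsigned queue_header_and_payload(TX_FRAG_SKETCH *bd,
                                         const void *header, size_t header_len,
                                         const void *payload, size_t payload_len)
{
    if (header_len == 0 || payload_len == 0 ||
        header_len > TXCTRL_SIZE_MASK + 1u || payload_len > TXCTRL_SIZE_MASK + 1u) {
        return 0;
    }
    bd[0].packet  = header;
    bd[0].control = (uint32_t)(header_len - 1);            /* not last: frame continues */
    bd[1].packet  = payload;
    bd[1].control = (uint32_t)(payload_len - 1) | TXCTRL_LAST | TXCTRL_INTERRUPT;
    return 2;
}
```

The sketch assumes the two descriptors are adjacent in the ring (no wrap-around between them). After filling them, the driver would advance TxProduceIndex by the returned count so that both fragments are sent as a single frame.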

This does have downsides: the data has to be held stable until it has been sent (avoiding that while keeping the highest speed adds complications and possibly extra memory overhead), and the driver becomes more complicated in general. In a special case where the absolute fastest throughput is required, it should however be considered.

Generally I prefer to keep the Tx details as simple as possible (also for compatibility reasons). The highest processor overhead tends to be the checksum calculation, and using a processor with checksum offloading (like the STM32 or Kinetis; see the comparison here: http://www.utasker.com/docs/uTasker/uTaskerBenchmarks.PDF ) can give a large performance increase without needing to deviate from a general-purpose solution.
If, however, you can use UDP without checksums, this reasoning doesn't quite apply...
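For reference, the work being skipped is the Internet checksum (RFC 1071): a one's-complement sum of 16-bit words, which for UDP covers a pseudo-header, the UDP header and the payload. The sketch below is a plain byte-wise version, not the uTasker implementation, and the function names are hypothetical. In IPv4 a transmitted UDP checksum of zero means that no checksum was generated (RFC 768), which is what makes it possible to skip the calculation at all; a computed value of zero is therefore sent as 0xFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words, folded and inverted.
   The 32-bit accumulator is ample for frame-sized buffers. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];   /* next 16-bit word */
        data += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t)data[0] << 8;               /* odd byte, padded with zero */
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);         /* fold the carries back in */
    }
    return (uint16_t)~sum;
}

/* A computed UDP checksum of 0 is sent as 0xFFFF, since 0 means "none". */
static uint16_t udp_checksum_field(uint16_t computed)
{
    return computed == 0 ? 0xFFFFu : computed;
}
```

Every payload byte passes through the loop once, which is why skipping it saves a noticeable amount of CPU time per frame on a processor without checksum offloading.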

Regards

Mark




martinh
Re: Ethernet DMA for LPC23XX project
« Reply #8 on: January 11, 2013, 03:21:51 PM »
Hi Mark,

The first test was successful: with the buffers in external SDRAM, transmission worked.

I would prefer to avoid using the scatter-gather capabilities, for simplicity.
But putting the IP/UDP header directly in front of the payload means overwriting previous data (at least for the second and all following blocks). And, as you wrote, the data has to be held stable until transmission has finished.
In principle, I could fill in the next IP/UDP header immediately after fnStartEthTx() has been called, but I fear that doing so might overwrite the previous block before it has been sent.
Do you know of a criterion that tells us whether the buffer contents have been completely sent and the buffer is free for writing?

Regards,
Martin

mark
Re: Ethernet DMA for LPC23XX project
« Reply #9 on: January 12, 2013, 05:37:45 PM »
Hi Martin

To monitor Tx progress, the TxConsumeIndex register can be used.
It is equal to TxProduceIndex when the software hasn't prepared anything for transmission (Tx is then idle). Once data has been prepared for transmission, TxProduceIndex is incremented by the number of buffer descriptors ready to be sent, and this causes transmission to begin.
If you use frames with header and payload in a single buffer descriptor, you won't be able to detect when the header has been sent, only when the complete frame has been sent: TxConsumeIndex then increments (modulo the number of Tx buffers used) and, if one frame is sent at a time, becomes equal to TxProduceIndex again.
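The rule above can be expressed with a little ring arithmetic. In this sketch the produce and consume values are plain integers (reading the actual EMAC registers is left out), count is the number of Tx descriptors, and the function names are hypothetical. Descriptors from the consume index up to, but not including, the produce index are still waiting to be sent, so their buffers (and, in the in-place header scheme, the tail of the previous block) must not be overwritten yet.

```c
#include <stdint.h>

/* 1 when nothing is queued: software and hardware indices agree. */
static int tx_idle(uint32_t produce, uint32_t consume)
{
    return produce == consume;
}

/* 1 if descriptor 'slot' is still queued for transmission, i.e. it lies in
   [consume, produce) modulo the ring size 'count'. Its buffer must not be
   rewritten until this returns 0. Indices are assumed to be < count. */
static int tx_slot_pending(uint32_t produce, uint32_t consume,
                           uint32_t slot, uint32_t count)
{
    uint32_t queued = (produce + count - consume) % count;   /* descriptors still queued */
    uint32_t offset = (slot + count - consume) % count;      /* distance of slot from consume */
    return offset < queued;
}
```

For example, with four descriptors, produce index 1 and consume index 3, descriptors 3 and 0 are still queued while 1 and 2 are free to be reused.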

Regards

Mark

martinh
Re: Ethernet DMA for LPC23XX project
« Reply #10 on: January 21, 2013, 10:20:00 AM »
Hi Mark,

The transfer rate has been increased to about 71 Mbit/s (tested with 8 data blocks: 11036 bytes of pure payload in 1.23 ms).
The checksum of the payload, however, has not been calculated because of the large amount of time it would require.
I am quite happy with the result; perhaps it can be improved in the future by arranging the source buffers differently.
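To put the two measurements in this thread on the same scale, a pair of small helpers (hypothetical names) converts a byte count transferred in a given number of milliseconds into Mbit/s and ns/byte.

```c
/* Throughput in Mbit/s: bits divided by microseconds. */
static double mbit_per_s(double bytes, double ms)
{
    return bytes * 8.0 / (ms * 1000.0);
}

/* Average time per byte in nanoseconds. */
static double ns_per_byte(double bytes, double ms)
{
    return ms * 1.0e6 / bytes;
}
```

mbit_per_s(11060, 2.93) gives about 30.2 Mbit/s (about 265 ns/byte) for the original measurement, and mbit_per_s(11036, 1.23) about 71.8 Mbit/s (about 111 ns/byte) for the new one, roughly 2.4 times faster.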
But for now it is ok.
Your hints have been very helpful!
Thank you very much!

Regards,
Martin
« Last Edit: January 21, 2013, 04:07:33 PM by martinh »