Hi Martin
The NXP examples always use the Ethernet SRAM area for buffer descriptors and Ethernet data. I think the first test to make is whether the Ethernet module can work with other memory areas (the Ethernet area may be tightly coupled and so more efficient, but it may still work from other areas, including external RAM). To do this, the Ethernet buffer descriptors can be left as they are and only the actual data buffers located somewhere else:
fnConfigEthernet() -
EMAC_TxDescriptor = (unsigned long)ptrBd; // inform the MAC where the first transmit buffer is located
while (i--) { // initialise the buffer descriptors
ptrBd->ptrBlock = ptrBuffer; // initialise the buffer descriptors
ptrBuffer += pars->usSizeTx;
ptrBd++;
}
The line ptrBd->ptrBlock = ptrBuffer; points each buffer descriptor to its buffer location, and overwriting these with a different pointer would allow this to be tested. Beware however of the alignment requirements for memory: the buffers need to be long word aligned.
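As a rough sketch of that test (not the actual uTasker code), the descriptors could be pointed at a different memory area along these lines. The TX_BD_SKETCH layout and the names fnAlignLongWord() and fnRedirectTxBuffers() are made up for illustration; only the ptrBlock member is taken from the snippet above, and the real descriptor layout is defined by the driver and the chip's user manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only */
typedef struct {
    unsigned char *ptrBlock;                 /* address of the data buffer */
    unsigned long  ulControl;                /* control/size word (unused in this sketch) */
} TX_BD_SKETCH;

/* Round an address up to the next long word (4 byte) boundary */
static unsigned char *fnAlignLongWord(unsigned char *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    addr = (addr + 3u) & ~(uintptr_t)3u;
    return (unsigned char *)addr;
}

/* Point iCount descriptors at consecutive, long word aligned buffers in an
   alternative memory area (for example external RAM).
   Returns the number of descriptors set up, or 0 if the area is too small. */
static int fnRedirectTxBuffers(TX_BD_SKETCH *ptrBd, int iCount,
                               unsigned char *ptrArea, size_t areaSize,
                               size_t bufferSize)
{
    unsigned char *ptrBuffer;
    size_t stride;
    size_t lost;
    int i;

    assert(ptrBd != NULL && ptrArea != NULL);
    ptrBuffer = fnAlignLongWord(ptrArea);
    lost = (size_t)(ptrBuffer - ptrArea);    /* bytes skipped to reach alignment */
    stride = (bufferSize + 3u) & ~(size_t)3u; /* keeps every following buffer aligned too */
    if (iCount <= 0 || stride == 0 || lost > areaSize) {
        return 0;
    }
    if ((areaSize - lost) / stride < (size_t)iCount) {
        return 0;                            /* not enough room for all buffers */
    }
    for (i = 0; i < iCount; i++) {
        ptrBd->ptrBlock = ptrBuffer;         /* overwrite the original buffer pointer */
        ptrBuffer += stride;
        ptrBd++;
    }
    return iCount;
}
```

The buffer size is rounded up to a multiple of four so that an odd usSizeTx value cannot push the second and later buffers off a long word boundary.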
fnGetTxBufferAdd() is used to locate where the data is to be written to so the rest should be automatic.
If this doesn't work, however, it means that there is a physical restriction (I didn't see this specifically noted in the data sheet) and so the size of the Ethernet SRAM limits the total buffer size.
When sending UDP at maximum speed it is also worth looking at the scatter-gather capabilities of the Ethernet DMA operation.
The uTasker output uses a linear buffer, which is the simplest method and is also compatible with all Ethernet controller types (some use buffer descriptors, some use FIFOs). It is possible to avoid some data copies if the header and payload are not moved to the buffer but the payload is instead left where it originates from (if this is possible). This involves building the IP/UDP header in one buffer and then configuring the Tx to send just the header directly from where it is located in memory, with any restrictions as to area and alignment already accounted for. The payload is then chained (again assuming that its location is suitable) so that it is transferred after the first transfer has terminated. The result is sent as a single frame, but with content from two individual buffers.
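The chaining described above might be sketched as follows. The SG_TX_BD layout, the SG_LAST_FRAGMENT and SG_SIZE_MASK values and the function name are assumptions made for the example; the real flag positions and size encoding have to be taken from the user manual of the Ethernet controller in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_LAST_FRAGMENT 0x40000000u   /* hypothetical "last fragment of frame" flag */
#define SG_SIZE_MASK     0x000007ffu   /* hypothetical fragment length field */

/* Hypothetical transmit descriptor, for illustration only */
typedef struct {
    const unsigned char *ptrBlock;     /* where the DMA fetches this fragment from */
    uint32_t             ulControl;    /* fragment length and flags */
} SG_TX_BD;

/* Prepare two consecutive descriptors so that the header and the payload are
   sent as one frame without first copying them into a linear buffer.
   Returns 0 on success, -1 if a fragment is unsuitable. */
static int fnChainHeaderAndPayload(SG_TX_BD bd[2],
                                   const unsigned char *ptrHeader, size_t headerLen,
                                   const unsigned char *ptrPayload, size_t payloadLen)
{
    assert(bd != NULL);
    if (ptrHeader == NULL || ptrPayload == NULL) {
        return -1;
    }
    if (headerLen == 0 || payloadLen == 0 ||
        headerLen > SG_SIZE_MASK || payloadLen > SG_SIZE_MASK) {
        return -1;
    }
    /* assumed restriction: both fragments must start on a long word boundary */
    if ((((uintptr_t)ptrHeader | (uintptr_t)ptrPayload) & 3u) != 0) {
        return -1;
    }
    bd[0].ptrBlock  = ptrHeader;
    bd[0].ulControl = (uint32_t)headerLen;                      /* more fragments follow */
    bd[1].ptrBlock  = ptrPayload;                               /* payload stays where it is */
    bd[1].ulControl = (uint32_t)payloadLen | SG_LAST_FRAGMENT;  /* frame ends here */
    return 0;
}
```

Only the second descriptor carries the last-fragment flag, which is what tells the controller that both buffers belong to the same frame. The payload must not be modified until the controller has finished with that descriptor.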
This does have downsides: the data has to be held stable until it has been sent (avoiding this while still keeping the highest speed adds complication and possibly extra memory overhead) and the driver in general becomes more complicated. In a special case it should however be considered when the absolute fastest throughput is the goal.
Generally I prefer to keep the Tx details as simple as possible (also for compatibility reasons). The highest overhead for the processor tends to be checksum calculations, and using a processor with checksum offloading (like STM32 or Kinetis - see comparison here:
http://www.utasker.com/docs/uTasker/uTaskerBenchmarks.PDF ) can achieve a high performance increase without needing to deviate from a general-purpose solution.
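To give an idea of the work that checksum offloading removes, here is a minimal sketch of the standard Internet checksum (RFC 1071) as used by IP and UDP. The function name is made up for the example and this is not the uTasker routine; without offloading, something like this has to touch every byte of every frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement Internet checksum over a block of bytes (RFC 1071).
   The 32 bit accumulator is ample for Ethernet frame sizes. */
static uint16_t fnInternetChecksum(const unsigned char *ptrData, size_t length)
{
    uint32_t sum = 0;

    assert(ptrData != NULL || length == 0);
    while (length > 1) {                             /* add the data as big-endian 16 bit words */
        sum += ((uint32_t)ptrData[0] << 8) | ptrData[1];
        ptrData += 2;
        length -= 2;
    }
    if (length != 0) {                               /* odd trailing byte is padded with zero */
        sum += (uint32_t)ptrData[0] << 8;
    }
    while ((sum >> 16) != 0) {                       /* fold the carries back in */
        sum = (sum & 0xffffu) + (sum >> 16);
    }
    return (uint16_t)~sum;                           /* one's complement of the sum */
}
```

For UDP the same sum also has to cover the pseudo header and the complete payload, which is why the cost grows with the frame size.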
If you can however use just UDP without checksum operation this logic is not quite so correct...
Regards
Mark