Hi Tim
My feeling is that the PHY is not always starting in the correct mode. There were similar difficulties with the ATMEL and the Micrel (see http://www.utasker.com/forum/index.php?topic=161.0), but those were due to the fact that the ATMEL has active pull-ups on its GPIOs out of reset.
On that board it is necessary to actively drive the affected lines and command a second PHY reset.
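For the ATMEL case the order of operations is what matters: the strap pins have to be held at the wanted levels while the PHY leaves reset, since that is typically when a PHY samples them (the exact timing should be taken from the datasheet). A minimal sketch, assuming hypothetical board hooks drive_strap_pins(), phy_reset_line(), release_strap_pins() and delay_us(); here they only record events so that the sequence can be checked on a PC, and the delay values are placeholders.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical board hooks: on real hardware these would write GPIO
   registers; here they only append to an event log so the order can be checked. */
enum phy_event { EV_STRAPS_DRIVEN, EV_RESET_ASSERT, EV_RESET_RELEASE, EV_STRAPS_RELEASED };

#define MAX_EVENTS 8
static enum phy_event event_log[MAX_EVENTS];
static int event_count;

static void log_event(enum phy_event ev)
{
    assert(event_count < MAX_EVENTS);
    event_log[event_count++] = ev;
}

static void drive_strap_pins(uint8_t pattern) { (void)pattern; log_event(EV_STRAPS_DRIVEN); }
static void phy_reset_line(int asserted) { log_event(asserted ? EV_RESET_ASSERT : EV_RESET_RELEASE); }
static void release_strap_pins(void) { log_event(EV_STRAPS_RELEASED); }
static void delay_us(unsigned us) { (void)us; /* busy-wait on the target */ }

/* Force the PHY to latch the intended configuration: the straps must be
   stable while reset is released, because that is typically when they are sampled. */
static void phy_restrap_and_reset(uint8_t strap_pattern)
{
    drive_strap_pins(strap_pattern); /* override the MCU's reset-time pull-ups */
    phy_reset_line(1);               /* hold the PHY in reset */
    delay_us(100);                   /* placeholder: use the datasheet minimum */
    phy_reset_line(0);               /* straps are latched around this edge */
    delay_us(100);                   /* placeholder hold time */
    release_strap_pins();            /* hand the pins back to the MAC */
}
```

The point of the design is simply that the straps are driven before reset is asserted and only released after reset has been removed.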
The LPC23XX doesn't have pull-ups (as far as I am aware), so the PHY address and mode are defined purely by the hardware on the board. The PHY address is 0x01, as set by the pull-ups/pull-downs in the Micrel itself.
There are some other pins which define the operating mode, and these are very important. One of them (I don't remember what it is called, but I did once have a problem with it) selects the clock source for the chip, i.e. whether its local oscillator is stopped or used for the bus clock. This is again determined by a pull-up/pull-down. None of this can be influenced by the LPC23XX on the Olimex board, because the Micrel reset input is connected to the LPC23XX reset input and so cannot be driven by software.
I am wondering whether the PHY clock is not always starting, or whether a mode is not always being latched reliably. I have to admit that I can't imagine how using the JTAG would influence this.
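One way to check whether a mode is being latched reliably is to read the PHY's Basic Mode Control Register (register 0 of the standard IEEE 802.3 clause 22 set) after start-up and compare a good start with a failing one. This is only a decoding sketch; the routine that reads the raw value over MDIO is assumed to exist and is not shown. Vendor-specific straps such as the clock source or RMII selection are not visible in this standard register, so the Micrel datasheet would be needed for those.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* IEEE 802.3 clause 22 Basic Mode Control Register (register 0) bits */
#define BMCR_SPEED_100   (1u << 13) /* speed select (LSB): 1 = 100 Mbit/s */
#define BMCR_ANEG_ENABLE (1u << 12) /* auto-negotiation enable */
#define BMCR_POWER_DOWN  (1u << 11)
#define BMCR_ISOLATE     (1u << 10) /* PHY electrically isolated from the MII */
#define BMCR_FULL_DUPLEX (1u << 8)

struct phy_mode {
    int speed_100;
    int autoneg;
    int power_down;
    int isolate;
    int full_duplex;
};

/* Split a raw BMCR value into individual flags */
static struct phy_mode decode_bmcr(uint16_t bmcr)
{
    struct phy_mode m;
    m.speed_100   = (bmcr & BMCR_SPEED_100) != 0;
    m.autoneg     = (bmcr & BMCR_ANEG_ENABLE) != 0;
    m.power_down  = (bmcr & BMCR_POWER_DOWN) != 0;
    m.isolate     = (bmcr & BMCR_ISOLATE) != 0;
    m.full_duplex = (bmcr & BMCR_FULL_DUPLEX) != 0;
    return m;
}

/* Print the decoded mode, e.g. once per start-up, for comparison */
static void print_phy_mode(uint16_t bmcr)
{
    struct phy_mode m = decode_bmcr(bmcr);
    printf("BMCR=0x%04x speed=%s aneg=%d pdown=%d isolate=%d duplex=%s\n",
           (unsigned)bmcr, m.speed_100 ? "100" : "10", m.autoneg,
           m.power_down, m.isolate, m.full_duplex ? "full" : "half");
}
```

A set isolate or power-down bit on a failing start, but not on a good one, would point towards a strap being latched differently.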
It may be worth removing all PHY-related code to see what happens then. The PHY initialisation is not absolutely critical, because the PHY will normally power up in a state where it at least does something (RMII mode, for example, is selected by R16 on the Olimex board), and the code would then at least no longer be able to hang waiting for the PHY.
To test the PHY communication, I would try to read the PHY identifier from every possible PHY address (0x01..0x1f) in a loop. This requires modifying the read routine so that the PHY address is passed as a variable rather than being fixed. It may, for example, show whether the PHY address being assigned is 'jumping' around.
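The address scan could look roughly like this. It is a sketch, assuming a read routine that takes the PHY address as a parameter; mdio_read_stub() is hypothetical and simulates a single PHY at 0x01 so that the loop can be run on a PC, and its identifier values are placeholders, not the Micrel's real ID. Registers 2 and 3 are the standard PHY identifier registers; an address where nothing answers normally reads back as 0xFFFF because the MDIO line is pulled high.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PHY_REG_ID1 0x02 /* PHY identifier 1 */
#define PHY_REG_ID2 0x03 /* PHY identifier 2 */

/* Hypothetical read routine with the PHY address as a parameter.
   On the board it would use the LPC23XX MII management interface;
   this stub pretends a single PHY answers at address 0x01. */
static uint16_t mdio_read_stub(uint8_t phy_addr, uint8_t reg)
{
    if (phy_addr != 0x01) {
        return 0xFFFF; /* nobody answers: MDIO reads back all ones */
    }
    switch (reg) {
    case PHY_REG_ID1: return 0x0022; /* placeholder value */
    case PHY_REG_ID2: return 0x1610; /* placeholder value */
    default:          return 0x0000;
    }
}

/* Read the identifier at every address 0x01..0x1f, print each responder
   and store up to max_found addresses in found[]. Returns the count. */
static int scan_phys(uint16_t (*read)(uint8_t, uint8_t),
                     uint8_t found[], int max_found)
{
    int count = 0;
    assert(found != NULL || max_found == 0);
    for (unsigned addr = 0x01; addr <= 0x1f; addr++) {
        uint16_t id1 = read((uint8_t)addr, PHY_REG_ID1);
        uint16_t id2 = read((uint8_t)addr, PHY_REG_ID2);
        /* all ones or all zeros usually means no PHY at this address */
        int empty = (id1 == 0xFFFF && id2 == 0xFFFF) ||
                    (id1 == 0x0000 && id2 == 0x0000);
        if (!empty) {
            printf("PHY at 0x%02x: ID1=0x%04x ID2=0x%04x\n",
                   addr, (unsigned)id1, (unsigned)id2);
            if (count < max_found) {
                found[count] = (uint8_t)addr;
            }
            count++;
        }
    }
    return count;
}
```

On the target the stub would be replaced by the real management read; running the scan after several resets would show whether the responding address ever changes.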
As mentioned above, the PHY address cannot be influenced by the SW; it is purely HW related. At present I don't see any error in the code which would make the interface unreliable. I can't detect any problems on my own Olimex board, since it starts reliably with every build.
My hope is that you can completely remove the PHY initialisation and live with that. I also wonder whether this problem will be found on other boards or whether it is restricted to yours.
Regards
Mark