Topic: Debugging exceptions on the SAM7X

mark
Debugging exceptions on the SAM7X
« on: February 26, 2009, 06:50:27 PM »
Hi All

Sometimes when developing software (and sometimes even after development, during intensive testing) the processor crashes and one is faced with the challenge of finding out where the elusive bug is hiding in the code (even when it is hard to accept that it is a bug in the code at all, and it feels more like a gremlin which just somehow has to be accepted).

In such cases nothing really beats connecting a debugger and letting things run until something goes wrong. It is usually best to disable the watchdog in such cases so that the board hangs, the debugger can be paused and we can take a look at what the code is playing at.

In some cases one finds that it is stuck in a forever loop waiting for some hardware flag to be cleared (which is stuck, usually due to a different error which at least becomes obvious now), or due to one of those stupid little mistakes where in this once in a million case it really can never get out. But in many other cases you find the code stuck in an assembler routine at the address 0x00000010 just branching back to itself forever. This is in fact the data abort exception which has occurred because the program has tried to access or manipulate data which it may not (a bad pointer, trying to write to FLASH, or read misaligned memory). All such 'minor' programming mistakes are caught by the ARM's data abort processing and immediately sent to this loop of death.

This is the depths of hell. The debugger is showing us that there is no hope left; the call stack is empty and we just see a mess of registers.

But the processor is in fact giving us enough information to drag ourselves back out of this swamp of despair, back to the root cause and in many cases showing us quickly just what our program has done incorrectly so that we can, with a little embarrassment, make the necessary fix to this silly little mistake.

So the next time you are at this location in the code just try the following simple trick. You may find that it is just what was missing in your debugging tool-box.

1) The code is looping at 0x00000010 so open the register view in your debugger and check the register CPSR - usually you can expand it to show the processor's present mode of operation. The MODE field will be 0x17 (or decimal 23), corresponding to the processor's data abort context.

2) Now look through the various processor register sets until you find the data abort set - usually it is shown as SPSR_abt with R13_abt and R14_abt. You will see that the contents of R13_abt and R14_abt are in fact identical to the R13 and R14 presently displayed, because these are overlaid due to the fact that the processor is presently in the data abort context. SPSR_abt is different: it holds a copy of the CPSR as it was just before the exception was taken.

3) Note the value of R14_abt and subtract 8 from it. The result is the address of the instruction which just caused the problem.

4) Edit the program counter (PC or sometimes displayed as R15) with the address that you have just calculated.

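5) In CPSR change the MODE value to 0x13 (decimal 19) to return to supervisor mode. (0x13 is the supervisor mode encoding; system mode would be 0x1F. If in doubt, the mode bits of SPSR_abt show which mode was active when the abort occurred.) You will see that this mode's registers R13 (stack pointer) and R14 are now switched back in.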

6) You may need to tell the debugger to display the present program location (if it doesn't do it automatically anyway) and, as if by magic, you can see the instruction which was responsible - both in the disassembly view and in source level C-code mode. Since the context has been restored with the correct stack pointer you can also see the call stack (where the program came from).

Additional note for Rowley Crossworks users: To ensure that the registers and call-stack are synchronized the following tip may prove useful:
"You either need to click "Debug > Show Next Location" after changing the register in the register window or use "Debug > Locate" to modify the registers."

7) The last manipulation which may be required is to correct the Thumb bit in the CPSR. If the code that you are in is compiled in Thumb mode, set the T bit (the processor was switched to ARM mode on entering the data abort and this sets it back accordingly). The T bit in SPSR_abt shows which state was active when the abort occurred.
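For anyone who prefers to see the arithmetic of steps 1 to 7 written down, here is a minimal sketch in C of the calculations that are otherwise done by hand in the register window. It is meant to be compiled on a PC, not on the target. The function names are made up for illustration, and the register values are whatever you read from your own debugger.

```c
#include <stdint.h>

#define PSR_MODE_MASK 0x1Fu        /* CPSR/SPSR bits 4..0: processor mode */
#define PSR_T_BIT     (1u << 5)    /* CPSR/SPSR bit 5: Thumb state */

/* Return a readable name for the mode field of a CPSR/SPSR value. */
static inline const char *psr_mode_name(uint32_t psr)
{
    switch (psr & PSR_MODE_MASK) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";       /* the value seen in step 1 */
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "invalid";
    }
}

/* Step 3: address of the instruction that caused the data abort. */
static inline uint32_t data_abort_fault_address(uint32_t r14_abt)
{
    return r14_abt - 8u;
}

/* Steps 5 and 7: take the mode and the T bit from SPSR_abt and put them
 * into the present CPSR value, leaving all other CPSR bits untouched. */
static inline uint32_t restored_cpsr(uint32_t cpsr, uint32_t spsr_abt)
{
    const uint32_t mask = PSR_MODE_MASK | PSR_T_BIT;
    return (cpsr & ~mask) | (spsr_abt & mask);
}
```

As a usage example with invented values: if the debugger shows R14_abt = 0x00102348, then data_abort_fault_address() gives 0x00102340 as the value to write to the PC in step 4. Copying only the mode and T bit mirrors the manual steps above; it deliberately leaves the flags and the interrupt mask bits of the CPSR alone.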

All register contents will now be as they were when the error occurred and, if you do a single step it will in fact repeat the error and you are back in the loop of death. Alternatively you can try to manipulate the register containing the bad pointer or move the PC to the next line (avoiding it) and let the program continue.
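If you want to step over the offending instruction rather than repeat it, the new PC value can be worked out as in the following small sketch (C, to be compiled on a PC). It assumes an ARM7TDMI, where the Thumb load and store instructions are 16 bits wide; the function name is invented for illustration. Note that this skips one machine instruction only, which is not necessarily a complete line of C code.

```c
#include <stdint.h>

#define PSR_THUMB_BIT (1u << 5)    /* T bit in CPSR/SPSR */

/* Address of the instruction following the one at fault_address.
 * psr is the status register of the interrupted code (e.g. SPSR_abt):
 * ARM instructions are 4 bytes, Thumb instructions (assumed 16 bit) are 2. */
static inline uint32_t next_instruction_address(uint32_t fault_address, uint32_t psr)
{
    return fault_address + ((psr & PSR_THUMB_BIT) ? 2u : 4u);
}
```

Whether the program then continues sensibly depends on what the skipped instruction was supposed to do, so treat this as a way to carry on investigating rather than as a fix.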

In many cases the reason for the problem quickly becomes clear. In others it may be the result of a previous error which is not visible in this snap-shot in CPU time, but at least these are important clues on the detective mission of finding the root cause.

Good luck!

Regards

Mark


« Last Edit: March 25, 2009, 04:01:06 PM by mark »