Topic: iInterruptLevel problem

dkg
iInterruptLevel problem
« on: January 05, 2011, 09:05:52 PM »
Hi Mark,

In our code, I added a suicide loop that triggers if we ever enter uEnable_Interrupt with iInterruptLevel equal to 0 (to catch problems, as you do when compiling for the Windows simulator). This code is only compiled and used during our development cycle. We recently discovered a condition where the ColdFire crashes, and I can now reproduce it at will. When I run the test with development code, it ends up in the suicide loop described above. I am not yet sure where the release code crashes, since I can't easily run the debugger on it and we have no post-mortem dump capability (yet). For now I am assuming the crash in our release code is related to the iInterruptLevel problem, since the same sequence of events causes both.

So I went back through our history to see when this problem first appeared. After trying many release versions of our code, I found that it first appears at the point where I merged in the update to uTasker v1.4.2. Our previous code was based on v1.4, but I needed newer code at the time to support USB features. I have gone through a lot of diffs comparing our changes against your release but have not been able to locate the problem so far. I turned off all USB code and other sections not relevant to the error condition, and I can still make code based on v1.4.2 fail.

There were a lot of changes to uTasker between v1.4 and v1.4.2. I was wondering if you had heard of or seen this kind of problem since v1.4. Any ideas on how to track down where the mismatched interrupt enable/disable might be happening? Unfortunately, it would not be practical to try to reproduce this issue with your demo application code running. The only thought I have at this point is to squirrel away the last n return addresses on entry into both routines and look at the "trace" at the point of failure.
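The "trace" idea can be sketched as a small ring buffer written on every enable/disable call. This is a minimal, debug-only sketch under stated assumptions: `TRACED_DISABLE()`, `TRACED_ENABLE()`, `fnTrace()`, `fnDumpTrace()` and `TRACE_DEPTH` are hypothetical names, `iLevel` stands in for `iInterruptLevel`, and the call site is recorded with `__FILE__`/`__LINE__` rather than a raw return address (with GCC, `__builtin_return_address(0)` could be stored instead).

```c
#include <stdio.h>

/* Hypothetical debug-only trace of interrupt enable/disable calls. */
#define TRACE_DEPTH 8                            /* number of entries kept in the ring */

typedef struct {
    const char *file;                            /* source file of the call */
    int line;                                    /* source line of the call */
    char op;                                     /* 'D' disable, 'E' enable, '!' unmatched enable */
    int level;                                   /* counter value after the call */
} TRACE_ENTRY;

static TRACE_ENTRY trace[TRACE_DEPTH];
static unsigned int trace_count = 0;             /* total entries ever written */
static int iLevel = 0;                           /* stand-in for iInterruptLevel */

static void fnTrace(char op, const char *file, int line)
{
    TRACE_ENTRY *entry = &trace[trace_count % TRACE_DEPTH];
    entry->file = file;
    entry->line = line;
    entry->op = op;
    entry->level = iLevel;
    trace_count++;
}

static void fnDisableTraced(const char *file, int line)
{
    iLevel++;                                    /* real code would mask interrupts first */
    fnTrace('D', file, line);
}

static int fnEnableTraced(const char *file, int line)
{
    if (iLevel == 0) {                           /* unmatched enable: record it and report */
        fnTrace('!', file, line);
        return -1;                               /* a debug build would stop here */
    }
    iLevel--;
    fnTrace('E', file, line);                    /* real code would unmask after this */
    return 0;
}

#define TRACED_DISABLE() fnDisableTraced(__FILE__, __LINE__)
#define TRACED_ENABLE()  fnEnableTraced(__FILE__, __LINE__)

/* Print the surviving entries, oldest first. */
static void fnDumpTrace(void)
{
    unsigned int first = (trace_count > TRACE_DEPTH) ? trace_count - TRACE_DEPTH : 0;
    unsigned int i;
    for (i = first; i < trace_count; i++) {
        const TRACE_ENTRY *entry = &trace[i % TRACE_DEPTH];
        printf("%c %s:%d level=%d\n", entry->op, entry->file, entry->line, entry->level);
    }
}
```

In a debug build, the suicide loop could call `fnDumpTrace()`, or the `trace` array could be inspected in the debugger, to see the last few calls leading up to the unmatched enable.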

Any other ideas would be appreciated.

Dave G.

mark
Re: iInterruptLevel problem
« Reply #1 on: January 05, 2011, 11:49:45 PM »
Hi Dave

I haven't heard of any problems with the symmetry of enabling and disabling.

There are two potential reasons for this to happen:
1) The code really is enabling too often. In this case, simply not decrementing iInterruptLevel when it is already 0 may avoid further problems, although it may be that the enable happens too early at a certain location(?). Detecting where it happens is probably not easy, but I would expect the simulator to catch it if the same code sequence is exercised.
2) The iInterruptLevel count is getting corrupted, either by a variable overwrite or by an NMI.
The variable is normally never changed while interrupts are enabled, so only an NMI-level interrupt could manipulate it, for example by calling uEnable_Interrupt() or uDisable_Interrupt() as a subroutine of a driver function. Could this be the case? I remember you using an NMI in the past.
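Option 1 above can be sketched as an enable routine that refuses to decrement past 0 and counts how often that happens, so the symptom is suppressed without losing the evidence. This is only an illustration: `fnDisableSketch()`, `fnEnablePermissive()`, `bMasked` and `ulUnmatchedEnables` are hypothetical names, and the CPU interrupt mask is simulated by a flag rather than the real ColdFire status register.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real counter and the CPU interrupt mask. */
static int iInterruptLevel = 0;                  /* nesting depth of disables */
static bool bMasked = false;                     /* simulated CPU mask state */
static unsigned int ulUnmatchedEnables = 0;      /* diagnostic: enables seen at level 0 */

static void fnDisableSketch(void)
{
    bMasked = true;                              /* mask first, then count */
    iInterruptLevel++;
}

static void fnEnablePermissive(void)
{
    if (iInterruptLevel == 0) {                  /* unmatched enable: never go negative */
        ulUnmatchedEnables++;                    /* keep evidence that it happened */
        return;
    }
    if (--iInterruptLevel == 0) {
        bMasked = false;                         /* unmask only when the outermost region exits */
    }
}
```

Checking `ulUnmatchedEnables` (for example from a debug task) shows whether the underlying fault is still occurring while the product keeps running.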

Regards

Mark


dkg
Re: iInterruptLevel problem
« Reply #2 on: January 06, 2011, 02:47:40 PM »
Hi Mark,

Yes, I was forced to use IRQ7 for an interrupt but, based on the discussion we had back then, the only thing that interrupt service routine does now is increment a global variable and call uTaskerStateChange to activate the handler task.
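An ISR of this kind can keep its shared state to a counter that only the ISR writes, while the task tracks how many events it has already processed, so neither side needs a protected region for the count. A sketch under assumptions: `ulIrq7Events`, `ulIrq7Handled`, `irq7_isr_body()` and `fnHandlerTask()` are hypothetical names, `fnWakeHandlerTask()` is a stub standing in for the uTaskerStateChange call (whether the real call is safe at NMI level is not addressed here), and 32-bit aligned reads and writes are assumed to be atomic on the target.

```c
#include <stdint.h>

static volatile uint32_t ulIrq7Events = 0;       /* written only by the IRQ7 handler */
static uint32_t ulIrq7Handled = 0;               /* written only by the handler task */
static int iTaskWakeups = 0;                     /* bookkeeping for the stub below */

static void fnWakeHandlerTask(void)              /* hypothetical stand-in for uTaskerStateChange() */
{
    iTaskWakeups++;
}

static void irq7_isr_body(void)                  /* touches nothing shared except its own counter */
{
    ulIrq7Events++;
    fnWakeHandlerTask();
}

static uint32_t fnHandlerTask(void)              /* returns the number of events to process */
{
    uint32_t ulSnapshot = ulIrq7Events;          /* single read of the ISR-owned counter */
    uint32_t ulPending = ulSnapshot - ulIrq7Handled; /* unsigned subtraction survives wrap-around */
    ulIrq7Handled = ulSnapshot;                  /* task never writes the ISR's counter */
    return ulPending;
}
```

Because each variable has exactly one writer, the task never has to clear the ISR's counter, which would otherwise race with an interrupt that cannot be masked.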

I have a couple more things to try and I'll let you know what I find out. As a stop-gap measure, I'll try your idea of changing the enable routine to be more "permissive" but that is only a band-aid until I find the real problem.

Thanks,
Dave G.

dkg
Re: iInterruptLevel problem
« Reply #3 on: January 10, 2011, 10:11:48 PM »
Hi Mark,

I found the problem. It turns out it is related to the use of IRQ7. The problem is the code in M5223X.c for the IRQ7 interrupt routine calls fnIRQ like all the other interrupt handlers. fnIRQ does a set and clear of iInterruptLevel around the call to the registered interrupt handler. This set/clear is a problem since IRQ7 may have interrupted while we already have interrupts disabled. When it clears iInterruptLevel and then the interrupted routine finally attempts to enable interrupts again, the "count" is off. This gets back to a question I had a long time ago of whether we should be incrementing/decrementing iInterruptLevel. I understand your logic for it being equivalent since interrupts don't normally happen when interrupts are disabled. Unfortunately, that isn't the case for IRQ7.

So, what is the proper way to fix this issue? For now, I changed fnIRQ to increment/decrement the counter but maybe we should only do that for the _irq7_handler?
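The difference between the two approaches can be reproduced on a host with a purely sequential simulation, in which the IRQ7 "interrupt" is simply a function call made inside a protected region. `iLevel` stands in for `iInterruptLevel`, and `fnSetClearWrapper()`, `fnIncDecWrapper()`, `fnNmiHandler()` and `fnLevelSeenByEnable()` are hypothetical names; the set/clear wrapper mirrors the fnIRQ behaviour described above.

```c
static int iLevel = 0;                           /* stand-in for iInterruptLevel */

static void fnNmiHandler(void)                   /* hypothetical registered IRQ7 handler */
{
}

static void fnSetClearWrapper(void (*handler)(void)) /* fnIRQ-style: force 1, then 0 */
{
    iLevel = 1;
    handler();
    iLevel = 0;
}

static void fnIncDecWrapper(void (*handler)(void))   /* workaround: count up, then back down */
{
    iLevel++;
    handler();
    iLevel--;
}

/* A task enters a protected region, IRQ7 "fires" through the given wrapper,
 * and the value the matching enable finds is returned (1 is correct). */
static int fnLevelSeenByEnable(void (*wrapper)(void (*)(void)))
{
    int iSeen;
    iLevel = 0;
    iLevel++;                                    /* the task's disable */
    wrapper(fnNmiHandler);                       /* IRQ7 arrives inside the region */
    iSeen = iLevel;                              /* what the task's enable will see */
    if (iLevel > 0) {
        iLevel--;                                /* the task's enable */
    }
    return iSeen;
}
```

With set/clear the matching enable finds 0, which is exactly the condition that triggers the suicide loop; with increment/decrement it finds the expected 1. A sequential simulation like this cannot show timing effects inside the counter update itself.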

Dave G.

mark
Re: iInterruptLevel problem
« Reply #4 on: January 10, 2011, 10:49:07 PM »
Hi Dave

I understand the problem and suggest the following change:

   10.01.2011 Handle IRQ7 specially to avoid modification of iInterruptLevel counter {133}

// IRQ7 has dedicated NMI level and so should not call any driver routines which require protected regions.
// The flag iInterruptLevel is not modified since this routine can also not be interrupted and doesn't need to control the value
// due to the fact that it doesn't call any code with protected region entry/exit requirements (this avoids the risk of
// disturbing lower priority interrupts that may be in a protected region and avoids related counter corruption)
//
static __interrupt__ void _irq7_handler(void)                            // {133}
{

    EPFR0 = 0x80;                                                        // clear interrupt flag
    if (eport_handler[6]) {
        (eport_handler[6])();                                            // call the handler, if available
    }
}
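To check the idea off-target, the IRQ7 path can be modelled as a direct dispatch through the handler table that leaves the counter alone. This is a host-side sketch: `ucEPFR0` is a plain variable standing in for the EPFR0 register, `fnIrq7Dispatch()` and `fnUserIrq7()` are hypothetical, and the mapping of IRQ1..IRQ7 to indices 0..6 of `eport_handler` is assumed from the index used in the handler above.

```c
#include <stddef.h>

typedef void (*EPORT_HANDLER)(void);

static EPORT_HANDLER eport_handler[7];           /* IRQ1..IRQ7 mapped to index 0..6 (assumed) */
static volatile unsigned char ucEPFR0;           /* stand-in for the EPFR0 register */
static int iInterruptLevel = 0;                  /* stand-in for the uTasker counter */
static int iIrq7Calls = 0;

static void fnUserIrq7(void)                     /* hypothetical application handler */
{
    iIrq7Calls++;
}

static void fnIrq7Dispatch(void)
{
    ucEPFR0 = 0x80;                              /* on hardware this clears the IRQ7 flag */
    if (eport_handler[6] != NULL) {
        eport_handler[6]();                      /* call the handler, if registered */
    }
    /* iInterruptLevel is deliberately left unchanged */
}
```

Whatever value the interrupted code had in the counter is still there when the dispatch returns, whether or not a handler is registered.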


The logic of setting only 1 and 0 is based on the fact that interrupt handlers are (normally) only entered when the value is 0, so it is always 1 just before they exit. This is therefore equivalent to incrementing and decrementing, but a little more efficient (it saves maybe 2 instructions).

Since this makes no sense for the NMI (which must not use routines that share resources needing protection), there is also no need to do anything with the counter. I therefore changed the IRQ7 handler so that it no longer calls the generic handling code but calls its handler directly instead.

Furthermore, incrementing/decrementing does handle the simple case where the NMI interrupts an IRQ (at the end of the IRQ the value is correct again), but it doesn't fully solve the problem. There is still a risk that the NMI occurs while the IRQ is in the process of incrementing or decrementing the counter. This happens much less frequently, but it will happen at some point and then cause a much rarer error, with potentially equally serious consequences.

Removing all counter manipulation from the NMI path fully protects this variable and is probably the only completely safe solution.

Regards

Mark


P.S. I edited the code slightly, since using iIRQ as the index into the function array was obviously wrong (it is now 6).
« Last Edit: January 11, 2011, 12:40:07 AM by mark »

dkg
Re: iInterruptLevel problem
« Reply #5 on: January 10, 2011, 11:10:59 PM »
Hi Mark,

That was going to be my alternate suggestion ;) I'll change my code to match this modification.

Thanks,
Dave G.