Author Topic: SNMP Traps not sending timely and/or locking up, appears to be ARP problem  (Read 2630 times)

Offline Ray

  • Newbie
  • *
  • Posts: 17
    • View Profile
Hi Mark,
I really thought I had the SNMP traps working, but they are failing again. To troubleshoot, I temporarily added 7 MIB entries that expose the ARP table as a debug aid, because the CLI isn't available. In the included filtered Wireshark capture the first entry is the ARP to the SNMP manager, after which my 2 test traps flow freely. About 75 seconds in there appears to be an ARP refresh, and then the traps are locked out.
5 minutes later there is what looks like a gateway ARP refresh, 5 minutes after that the traps resume for a short time, and then ARP blocks them again. Eventually trap sending locks up completely, yet my Modbus polling is good and SNMP GET communication is unaffected. Only the traps fail.

Since this is very odd behavior, I'm guessing that my method for sending traps is the problem. 
However, my ARP table is full of unwanted entries despite having ARP_IGNORE_FOREIGN_ENTRIES defined.



so I think I've got 2 questions:
1)  Is ARP_IGNORE_FOREIGN_ENTRIES supposed to ignore all random ARP requests on the network? It doesn't appear to do this, judging by the IP addresses in the ARP table.

2)  Do SNMP traps need to be implemented in a specific way so that they get an ARP notification? Do they need to be sent from within the SNMP task, void fnSNMP(TTASKTABLE *ptrTaskTable)?


Thanks
Ray




extern unsigned char fnInitialiseSNMP(void) is in my MIB handler; it is called from Application.c, in my fnPIT_timerTask, every 5 seconds:

        if (ulCount_5000ms) --ulCount_5000ms;                            // count down to the next 5 s event
        else
        {
          ulCount_5000ms = 500;                                          // reload the 5 s counter
          if (SnmpStarted == 0)
          {
           if( fnInitialiseSNMP() == 1) SnmpStarted = 1;                 // retry until SNMP has initialised
          }
          else
          {
            if (SendColdStartTrap == 0)
            {
                fnSendSNMPTrap(SNMP_COLDSTART, 0, 1);//ALL_SNMP_MANAGERS);
                SendColdStartTrap = 1; 
            }
            else// cold start has been sent
            {
              fnSendSNMPTrap(SNMP_COLDSTART, 0, 1);//  test with just the easy to send trap
            }
          }
        }


Offline Ray

I should add I appear to be getting the same result in the simulator.  This is where my debug effort currently is.   

Offline mark

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3236
    • View Profile
    • uTasker
Hi Ray

In fnHandleARP_response()

you could try removing the registration of received ARP requests that were not destined for your IP address, as follows. This may help reduce the ARP entries that you are not interested in.

    else {                                                               // it was not an ARP to our IP address but we can still add it to our table or refresh the entry
    #if !defined ARP_IGNORE_FOREIGN_ENTRIES
        if (uMemcmp(ucRequestingIP, cucBroadcast, IPV4_LENGTH) != 0) {   // ignore broadcasts
            fnAddARP(ucRequestingIP, ucRequestingMAC, &arp_details);     // {14}
        }
    #endif
    #if defined USE_IP_STATS
        fnIncrementEthernetStats(SEEN_FOREIGN_ARP_FRAMES, _NETWORK_ID);  // update statistics for foreign addresses
    #endif
    }


ARP entries will probably not be an issue though.


Looking at the wireshark recording I see that the traps are initially sent to 172.22.1.94

15 minutes later I see ARPs being sent out to resolve the address 172.22.0.1, and these are not answered. They may be due to the traps that you are trying to send, but they may also be due to other causes.

In any case you need to find out why the ARP (assuming it is associated with the trap after a certain time) is presumably being sent to a different address, potentially on a different sub-net, since this is possibly the issue.
Since you are working with the simulator this should be quite easy to do:
- set a break point in the ARP transmission when it starts, to see how the destination address is being defined.
- let it run until the problem starts.
- enable the break point again to see what is causing the ARP to be sent (possibly the trap wanting to send data) and compare how the destination address is being defined. I expect you'll find a difference that will explain why it stops working after a certain time.
- Traps can be sent from anywhere and don't need to be called in the SNMP task.
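
Whether 172.22.0.1 really sits on a different sub-net from the trap manager depends on the netmask, which is not stated in this thread. The following is only a minimal sketch of the usual same-subnet test; the helper names ipv4() and same_subnet() are made up for illustration and are not uTasker functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of uTasker): pack four octets into a
   host-order 32-bit IPv4 value. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Returns 1 when both addresses share the network part selected by mask,
   0 otherwise. */
static int same_subnet(uint32_t ip_a, uint32_t ip_b, uint32_t mask)
{
    assert(mask != 0);   /* a zero mask would make every address look local */
    return ((ip_a & mask) == (ip_b & mask));
}
```

With an assumed mask of 255.255.0.0 the addresses 172.22.1.94 and 172.22.0.1 are on the same network; with an assumed mask of 255.255.255.0 they are not, and a stack would then normally resolve the gateway rather than the destination itself.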

Regards

Mark

Offline Ray

Hi Mark,
Yes, I believe ARP is not the root cause; rather, the ARP process may just be a little overwhelmed in my noisy network environment.
The ARP request for 172.22.0.1 is for the gateway, probably for NTP - I've disabled this for now.


I mistakenly changed one of the ARP_IGNORE_FOREIGN_ENTRIES blocks in static void fnSendARP_response(ARP_INPUT *ptrArpInput).
On line 682, if we received our own ARP request, it wasn't being added; I have fixed this.
What I meant to disable was the code in your suggestion in fnHandleARP_response().
That is now disabled with the preprocessor; however, it didn't fix the problem of extra ARP entries.

Additionally, there is a call to fnAddARP() in ip.c, line 688.
This was adding all the miscellaneous ARPs it received. I have disabled it with the preprocessor, and now my ARP table has 3 and only 3 entries (+ broadcast).

This didn't resolve my trap problem. The symptom was that trap manager 1 worked but 2 and 3 didn't.
As always, your amazing debugger came to the rescue: I discovered that in static int fnSendTrap(), line 1099, the fnSendUDP() call has extra information OR'd into the SocketHandle, which makes it fail the first check.

Commenting this out allows traps beyond manager 1 to be sent: /* | ptrSNMP_manager_details[iManagerRef].snmp_manager_details | ((iManagerRef & USER_INFO_MASK) << USER_INFO_SHIFT)*/
I have no idea what those values are for, but commenting them out allows my 3 managers to receive traps.

if (fnSendUDP((USOCKET)(SNMPSocketNr /* | ptrSNMP_manager_details[iManagerRef].snmp_manager_details | ((iManagerRef & USER_INFO_MASK) << USER_INFO_SHIFT)*/)  ,(unsigned char*)ptrSNMP_manager_details[iManagerRef].snmp_manager_ip_address, SNMP_MANAGER_PORT, (unsigned char*)&UDP_Message.tUDP_Header, (unsigned short)iNewLength, OWN_TASK) == NO_ARP_ENTRY)

which fails the first check in fnSendUDP(USOCKET SocketHandle, unsigned char *dest_IP, unsigned short usRemotePort, unsigned char *ptrBuf, unsigned short usDataLen, UTASK_TASK OwnerTask):

    if (_UDP_SOCKET_MASK(SocketHandle) > UDP_SOCKETS) {                  // {7}
        return INVALID_SOCKET_HANDLE;
    }
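
A small stand-alone sketch of why this range check can reject the trap handle. The three mask/shift values are the ones quoted later in this thread; UDP_SOCKETS = 8, the single-byte USOCKET and the helper names are assumptions made for illustration only, and the additional snmp_manager_details bits that the real call also ORs in are left out.

```c
#include <assert.h>

/* Values quoted later in this thread */
#define USER_INFO_SHIFT      5
#define USER_INFO_MASK       0x03
#define SOCKET_NUMBER_MASK   0x1f

#define UDP_SOCKETS          8      /* assumed pool size, for illustration only */

typedef unsigned char USOCKET;      /* assumption: single-byte socket handle */

/* Build a handle from the socket number plus the manager reference
   (hypothetical helper mirroring the OR in the fnSendUDP() call) */
static USOCKET make_handle(USOCKET socket_nr, int manager_ref)
{
    assert(socket_nr <= SOCKET_NUMBER_MASK);
    return (USOCKET)(socket_nr | ((manager_ref & USER_INFO_MASK) << USER_INFO_SHIFT));
}

/* Range check as it behaves when the socket mask macro is a dummy */
static int check_unmasked(USOCKET handle)
{
    return (handle > UDP_SOCKETS) ? -1 : 0;   /* -1 stands in for INVALID_SOCKET_HANDLE */
}

/* Range check with the user information masked away first */
static int check_masked(USOCKET handle)
{
    return ((handle & SOCKET_NUMBER_MASK) > UDP_SOCKETS) ? -1 : 0;
}
```

Under these assumptions a manager reference of 0 leaves the handle unchanged and passes the unmasked check, while references 1 and 2 set bits above the socket number and fail it. That would be consistent with the first manager working and the others not, if the first manager has reference 0.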
   
Cautiously, all is well... I'm running a 5-day blast of coldstart traps to make sure we don't bog down.

Thank You
Ray

Offline mark

Hi Ray

In ip.c I see that only ARPs addressed directly to your IP address are entered, so I am not sure that it is appropriate to remove that when ARP_IGNORE_FOREIGN_ENTRIES is enabled.
However, the ARP table should not be critical, since the worst thing that can happen is that a resolution first needs to be performed when the destination is not yet known.
You can also increase the size of the ARP table so that entries don't need to be deleted when new ones are entered and there is not space for all of them.
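
To illustrate why table size matters on a noisy network, here is a minimal sketch of a fixed-size cache that overwrites its oldest entry when full. It is not the uTasker ARP implementation; the structure, the size of 4 and the function names are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define ARP_TABLE_SIZE 4            /* assumed size for the example */

typedef struct {
    unsigned char ip[4];
    unsigned long age;              /* larger = more recently refreshed, 0 = free */
} ARP_ENTRY;

static ARP_ENTRY table[ARP_TABLE_SIZE];
static unsigned long tick;

/* Return the index of ip in the table, or -1 when it is not present */
static int arp_find(const unsigned char ip[4])
{
    int i;
    for (i = 0; i < ARP_TABLE_SIZE; i++) {
        if ((table[i].age != 0) && (memcmp(table[i].ip, ip, 4) == 0)) {
            return i;
        }
    }
    return -1;
}

/* Add or refresh an entry; when the table is full the oldest entry is
   overwritten. Returns the index used. */
static int arp_add(const unsigned char ip[4])
{
    int i, victim = 0;
    assert(ip != NULL);
    ++tick;
    i = arp_find(ip);
    if (i >= 0) {
        table[i].age = tick;        /* refresh the existing entry */
        return i;
    }
    for (i = 1; i < ARP_TABLE_SIZE; i++) {
        if (table[i].age < table[victim].age) {
            victim = i;             /* a free slot (age 0) or the oldest entry */
        }
    }
    memcpy(table[victim].ip, ip, 4);
    table[victim].age = tick;
    return victim;
}
```

With four slots, a fifth distinct address pushes out the first one added, so the next frame to that first address needs a fresh resolution. A larger table simply postpones that point.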

Regards

Mark


Offline Ray

Hi Mark,
Well, I'm drilling down and feel I'm close to the root problem, but I must be missing something important.

The ARP table appeared to be contributing to the problem because it changed so frequently in my noisy environment.
Locking the ARP table down to only the required entries from requests seemed to solve the problem; however, my system failed in the middle of the night. Modbus worked, SNMP GET worked, UDP broadcast worked. Everything but traps.

After another solid day of debugging with the simulator, I think I've reached a conclusion:
TASK_SNMP is not in the task table, so extern void fnSNMP(TTASKTABLE *ptrTaskTable) is unable to process the trap queue.
That task is how the trap queue drains, and how traps are resent when the message from ARP arrives. It is never called.

I added //uTaskerStateChange(TASK_SNMP, UTASKER_ACTIVATE); to my period timer and it failed to find a TASK_SNMP entry.
I tried to follow extern TASK_LIMIT uTaskerStart(); these are the tasks it loaded:

wdog
ARP
Eth
TCP
usb
O-MOD
app
maintenance
MassSt
DHCP
dNS
period
IGMP
NetInd
keeper
1
2
3
4
5 mine
6 mine
7 mine
8 mine
9 mine
lowPower
NULL

I'm not sure why SNMP doesn't get processed into the task array?

Offline mark

Ray

Check that TASK_SNMP is in ctNodes[] in TaskConfig.h

In addition, make sure that you have no new task with the same name.

    {"MsnMp",     fnSNMP,       SMALL_QUEUE, (DELAY_LIMIT)(NO_DELAY_RESERVE_MONO), 0, UTASKER_STOP},


#define TASK_SNMP               'n'                                      // SnMP protocol task

This looks wrong - TASK_SNMP should match the first character of the task's string name.

Try with

    {"nsnMp",     fnSNMP,       SMALL_QUEUE, (DELAY_LIMIT)(NO_DELAY_RESERVE_MONO), 0, UTASKER_STOP},

instead!
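
The effect of the mismatch can be sketched with a simplified lookup. This assumes, as described above, that a task is referenced by the first character of its name; the TASK_ENTRY structure and find_task() are hypothetical stand-ins and not the uTasker scheduler code.

```c
#include <assert.h>
#include <stddef.h>

#define TASK_SNMP 'n'               /* as defined in the post above */

/* Hypothetical, simplified task entry: only the name matters here */
typedef struct {
    const char *name;
} TASK_ENTRY;

/* Find a task by its single-character reference, i.e. the first character
   of its name. Returns the table index, or -1 when no task matches. */
static int find_task(const TASK_ENTRY *entries, size_t count, char ref)
{
    size_t i;
    assert(entries != NULL);
    for (i = 0; i < count; i++) {
        if (entries[i].name[0] == ref) {
            return (int)i;
        }
    }
    return -1;
}
```

In this sketch an entry named "MsnMp" is never found when searching for 'n', whereas "nsnMp" is. Because the first match wins, two tasks sharing a first character would also shadow one another, which is why unique names matter.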

Regards

Mark


Offline Ray

Hi Mark,
I stared at that 'M' and, while it didn't set off the right alarms for me, it did seem odd. Yes, that fixed my TASK_SNMP problem and now I am able to use fnRetry(socket,n) to pull from the queue.

Ah, now that worked, but... yes, there is a little more.
This makes sense now:
unsigned char ucSNMP_manager = ((socket >> USER_INFO_SHIFT) & USER_INFO_MASK); // extract the snmp manager information

The socket number is a 5-bit field (max 31 sockets, no socket 0), so the manager information has to be extracted using:
        #define USER_INFO_SHIFT      5
        #define USER_INFO_MASK       0x03                                // 4 users supported
        #define SOCKET_NUMBER_MASK   0x1f                                // single network and interface with up to 4 user functions (USOCKET can be single byte width)
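
As a stand-alone sketch of the layout these three defines imply, the two fields can be separated as below. The single-byte USOCKET and the helper names are assumptions for illustration; only the define values come from the post.

```c
#include <assert.h>

#define USER_INFO_SHIFT      5
#define USER_INFO_MASK       0x03   /* 4 users supported */
#define SOCKET_NUMBER_MASK   0x1f

typedef unsigned char USOCKET;      /* assumption: single-byte socket handle */

/* Bit layout of the combined handle under these values:
   bits 0..4  socket number
   bits 5..6  user information (here: the SNMP manager reference) */

/* Sanity check: the user-info field must not overlap the socket number */
static void check_layout(void)
{
    assert(((USER_INFO_MASK << USER_INFO_SHIFT) & SOCKET_NUMBER_MASK) == 0);
}

/* Extract the manager reference, as in the ucSNMP_manager line above */
static unsigned char manager_of(USOCKET socket)
{
    return (unsigned char)((socket >> USER_INFO_SHIFT) & USER_INFO_MASK);
}

/* Extract the plain socket number */
static USOCKET socket_number_of(USOCKET socket)
{
    return (USOCKET)(socket & SOCKET_NUMBER_MASK);
}
```

For example, a combined handle of 0x42 splits into manager reference 2 and socket number 2.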
      
      
This is the root cause of my fnSendUDP() failure too. I believe you meant to do something with the macro _UDP_SOCKET_MASK(SocketHandle);
in tcpip.h (line 487)    #define _UDP_SOCKET_MASK(uSocket)         (uSocket)
however, it doesn't do anything and lets the top 3 bits through.

I actually solved this by doing the following instead:
SocketHandle = (SOCKET_NUMBER_MASK & SocketHandle);

extern signed short fnSendUDP(USOCKET SocketHandle......)
UDP_TAB *ptrUDP = tUDP;
  SocketHandle = (SOCKET_NUMBER_MASK & SocketHandle);
  if ((SOCKET_NUMBER_MASK & SocketHandle)> UDP_SOCKETS){ //(_UDP_SOCKET_MASK(SocketHandle) > UDP_SOCKETS) {                  // {7}
      return INVALID_SOCKET_HANDLE;
  }
  if ((uMemcmp(dest_IP, cucNullMACIP, IPV4_LENGTH)) == 0) {
      return(INVALID_DEST_IP);
  }
      
By the time we are in fnSendUDP() no other info seems necessary.

If I haven't thought this completely through, you might prefer that I update the macro instead: #define _UDP_SOCKET_MASK(uSocket)  ((uSocket) & SOCKET_NUMBER_MASK)

At this point traps are being received by all 3 managers without my resorting to forcing any ARP requests outside of the normal ARP algorithm.

Offline mark

Hi Ray

    #if IP_INTERFACE_COUNT > 1 || IP_NETWORK_COUNT > 1                   // {74}
        #define _TCP_SOCKET_MASK_ASSIGN(uSocket)  (uSocket &= (SOCKET_NUMBER_MASK))
        #define _TCP_SOCKET_MASK(uSocket)         (USOCKET)((uSocket) & (SOCKET_NUMBER_MASK))
        #define _UDP_SOCKET_MASK_ASSIGN(uSocket)  (uSocket &= (SOCKET_NUMBER_MASK))
        #define _UDP_SOCKET_MASK(uSocket)         (USOCKET)((uSocket) & (SOCKET_NUMBER_MASK))
    #else
        #define _TCP_SOCKET_MASK_ASSIGN(uSocket)
        #define _TCP_SOCKET_MASK(uSocket)         (uSocket)
        #define _UDP_SOCKET_MASK_ASSIGN(uSocket)
        #define _UDP_SOCKET_MASK(uSocket)         (uSocket)

    #endif


When used in a single network/single interface environment the masks become dummies.
However I see that when SNMP is used a further USER_INFO_MASK field is used (in order to control multiple SNMP managers).
My suggestion is that the use of the mask for UDP sockets (not TCP sockets) be forced with:

    #if ((IP_INTERFACE_COUNT > 1) || (IP_NETWORK_COUNT > 1))             // {74}
        #define _TCP_SOCKET_MASK_ASSIGN(uSocket)  (uSocket &= (SOCKET_NUMBER_MASK))
        #define _TCP_SOCKET_MASK(uSocket)         (USOCKET)((uSocket) & (SOCKET_NUMBER_MASK))
        #define _UDP_SOCKET_MASK_ASSIGN(uSocket)  (uSocket &= (SOCKET_NUMBER_MASK))
        #define _UDP_SOCKET_MASK(uSocket)         (USOCKET)((uSocket) & (SOCKET_NUMBER_MASK))
    #else
        #define _TCP_SOCKET_MASK_ASSIGN(uSocket)
        #define _TCP_SOCKET_MASK(uSocket)         (uSocket)
        #if defined USER_INFO_MASK
            #define _UDP_SOCKET_MASK_ASSIGN(uSocket)  (uSocket &= (SOCKET_NUMBER_MASK))
            #define _UDP_SOCKET_MASK(uSocket)         (USOCKET)((uSocket) & (SOCKET_NUMBER_MASK))
        #else
            #define _UDP_SOCKET_MASK_ASSIGN(uSocket)
            #define _UDP_SOCKET_MASK(uSocket)         (uSocket)
        #endif
    #endif
#endif


Thanks for identifying the issue - presumably I always had a multi-network environment when working with SNMP, which is why I missed this.

Regards

Mark
« Last Edit: August 03, 2023, 09:18:18 PM by mark »

Offline Ray

#if defined defined USER_INFO_MASK   
Actually should be
#if defined USER_INFO_MASK

The copy/paste demon wins again  :)

I'll update and test

Thanks
Ray

Offline mark

I corrected the double "defined" in the previous post (which was of course incorrect).
Regards
Mark