Fine tuning Network Automation

NA 2022.11 on Linux server

The Linux server has around 24 GB of RAM. We have on-boarded around 600 devices so far, and our target is 1500 devices.

We restarted the NA server last Friday, and now we see only 5 GB left when we run free -g.
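When reading free output, keep in mind that Linux uses otherwise idle RAM for page cache, so the "free" column can look low while the "available" column is still healthy. The following is a minimal Python sketch that compares the two by parsing text in the /proc/meminfo format. The sample values are made up for illustration and are not taken from this server.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields, with values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Convert the main memory fields to GiB, rounded to one decimal place."""
    kib_per_gib = 1024 * 1024
    return {
        key: round(fields[key] / kib_per_gib, 1)
        for key in ("MemTotal", "MemFree", "MemAvailable")
        if key in fields
    }


# Illustrative sample only (hypothetical numbers).
# On a real host, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       25165824 kB
MemFree:         5242880 kB
MemAvailable:   10485760 kB
Cached:          6291456 kB
"""

if __name__ == "__main__":
    print(summarize(parse_meminfo(SAMPLE)))
```

If MemAvailable is much larger than MemFree, most of the "missing" memory is cache that the kernel can reclaim, and the operating system itself may not be short of memory.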

Moreover, the NA application itself also shows low memory, as seen in the output below:

Used Memory (Total-Free): 11640 MB
Free Memory: 327 MB
Total Memory: 11968 MB
Maximum Memory: 11968 MB

Could you tell us whether there are any fine-tuning parameters we can apply, similar to NNM, such as JVM heap memory allocation and garbage collection settings?
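To put the figures above in proportion, here is a small Python sketch that parses the four lines and works out how full the heap is and how much of the host RAM the maximum heap represents. It assumes "around 24 GB" means 24 GiB (24576 MB); the `host_ram_mb` parameter is a name introduced here for illustration.

```python
import re

# The four lines reported by the NA application, copied from the post.
REPORT = """Used Memory (Total-Free): 11640 MB
Free Memory: 327 MB
Total Memory: 11968 MB
Maximum Memory: 11968 MB
"""


def parse_report(text):
    """Map each label to its value in MB."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*MB\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def heap_summary(values, host_ram_mb=24 * 1024):
    """Return heap usage percentages (assumes host RAM is 24 GiB)."""
    used = values["Used Memory (Total-Free)"]
    maximum = values["Maximum Memory"]
    return {
        "used_pct_of_max": round(100 * used / maximum, 1),
        "max_pct_of_host_ram": round(100 * maximum / host_ram_mb, 1),
    }


if __name__ == "__main__":
    print(heap_summary(parse_report(REPORT)))
```

Under those assumptions the heap is about 97% used, and the maximum heap is a little under half of the host RAM. A high "used" figure on its own can include objects that are waiting to be garbage collected, so it is worth checking whether usage drops after a collection before concluding the heap is too small.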

  • 0  

    Hi Ramesh,

    So, you mention your NA Core has 24 GB of RAM. Did you or someone else do the install?  I ask because there are some performance tuning steps available, and I'm curious what might have been done already...

    Performance tuning - Network Automation (microfocus.com)

    Is your NA instance single core or multiple cores?  

    When this happens / happened, what tasks were running?  Anything stuck (running long)?  Do you have any custom tasks (change plans or diagnostics)?  Has this happened more than just this one time?  Like every Monday night, you see this happen on Core 3?  

    Has anyone changed the default task values?  Max Tasks / Max Concurrent Tasks?  Do you have an external DB or is it the embedded one?  

    Also, are the ~600 devices "typical" devices (switches, routers, load balancers, firewalls) or do you have anything that might be more complex (ACI / APIC devices)?  

    Have you looked at the appserver_wrapper.log file?  There may be some useful information that'll point you to a problem.  Perhaps old driver(s) or something else but quite possible you can find the beginning of this bad behavior.  

    Lastly, and this is just from my history:

    1) It's tempting to think that if some memory is good, tons of memory is better, and to throw almost all of your memory at the JVM - not really a good idea.  Same with increasing tasks.  

    2) Like life, there is a balance here.  You can increase your task numbers but if you do that, then you need to make sure you have JVM set to handle it as well as have the number of DB connections too.  

    3) Small steps and use caution.  Make changes slowly, document what you had and are changing and then test carefully.  You always want to be able to get to prior steps.

    Good luck!

    -Chris

  • 0 in reply to   

    Chris

    Yes, the NA server has 24 GB. Only NA and the Operations agent are running on the server.

    NA instance is running on single core.

    No tasks were running when the issue happened, and there were no stuck tasks. We do not have any custom tasks. This has happened twice.

    Max tasks has been changed from 20 to 30. Other than that, no changes have been made. We have an external DB.

    We have on-boarded only Cisco switches so far.

    I also checked the file /opt/NA/server/ext/wrapper/conf/appserver_wrapper.conf where we can configure initial and max JVM memory, but I am not seeing any option for garbage collection like in NNM.
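As a rough aid for reviewing that file, here is a Python sketch that pulls the JVM memory settings and any extra JVM arguments out of wrapper-style configuration text. It assumes the file follows the usual Java Service Wrapper property names (`wrapper.java.initmemory`, `wrapper.java.maxmemory`, `wrapper.java.additional.N`); check the actual file to confirm. The excerpt and its values are hypothetical, and the GC flag is shown only to illustrate where such an option would appear, not as a recommended or supported NA setting.

```python
def read_wrapper_properties(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def jvm_settings(props):
    """Collect heap sizes (MB) and numbered additional JVM arguments."""
    prefix = "wrapper.java.additional."
    numbered = [
        (int(key[len(prefix):]), value)
        for key, value in props.items()
        if key.startswith(prefix) and key[len(prefix):].isdigit()
    ]
    additional = [value for _, value in sorted(numbered)]
    return {
        "init_mb": int(props["wrapper.java.initmemory"]),
        "max_mb": int(props["wrapper.java.maxmemory"]),
        "additional": additional,
        # Assumption: treat any argument mentioning "GC" as GC-related.
        "gc_flags": [arg for arg in additional if "GC" in arg],
    }


# Hypothetical excerpt for illustration; not the contents of a real NA file.
SAMPLE_CONF = """# JVM memory (MB)
wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192
wrapper.java.additional.1=-Dexample.flag=true
wrapper.java.additional.2=-XX:+UseG1GC
"""

if __name__ == "__main__":
    print(jvm_settings(read_wrapper_properties(SAMPLE_CONF)))
```

If the real file has no additional argument mentioning GC, the JVM is running with its default collector for that Java version. Whether changing it is supported for NA is a question for the vendor's performance tuning documentation.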

  • 0 in reply to 

    Chris

    Currently we have only 3 tasks running.

    resolve FQDN/Diagnostics/Snapshot

  • 0   in reply to 

    Are your drivers current?  

    OK, how about this - the last time this happened, can you go back and see what task(s) was / were running?  Anything out of the ordinary?

    If you go to /opt/NA/server/log and do this:

     grep -i gc appserver_wrapper.log*

    For example, should look similar to this:

    appserver_wrapper.log:timestamp goes here INFO [stdout] {system/scheduler} [SubTaskExecutorThread] 75 PausableThreadPoolExecutor: Memory low, explict request for GC. -

    What do you get back?  Also look at ls -ltr /opt/NA/server/log/appserver_wrapper.log* - what's the oldest date?  
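If the grep output is long, the same check can be scripted. Below is a minimal Python sketch that filters lines the way grep -i gc does, counts the "Memory low" messages, and lists the wrapper logs oldest first (similar to ls -ltr). The sample lines reuse the example format above, including the placeholder timestamp, so no timestamp parsing is attempted. The function names are introduced here for illustration.

```python
import os
import re

GC_PATTERN = re.compile(r"gc", re.IGNORECASE)


def gc_lines(lines):
    """Return lines containing 'gc' in any case, like grep -i gc."""
    return [line.rstrip("\n") for line in lines if GC_PATTERN.search(line)]


def count_low_memory(lines):
    """Count lines that report the 'Memory low' condition."""
    return sum(1 for line in lines if "memory low" in line.lower())


def logs_oldest_first(directory, prefix="appserver_wrapper.log"):
    """List matching log files sorted by modification time, oldest first."""
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(prefix)
    ]
    return sorted(paths, key=os.path.getmtime)


# Sample lines modelled on the example above; the timestamp is a placeholder.
SAMPLE_LINES = [
    "timestamp goes here INFO [stdout] {system/scheduler} [SubTaskExecutorThread] "
    "75 PausableThreadPoolExecutor: Memory low, explict request for GC. -\n",
    "timestamp goes here INFO [stdout] task finished\n",
]

if __name__ == "__main__":
    for line in gc_lines(SAMPLE_LINES):
        print(line)
    print("low-memory messages:", count_low_memory(SAMPLE_LINES))
```

On the server, the lines would come from opening each file returned by `logs_oldest_first("/opt/NA/server/log")`. The first file with low-memory messages gives a starting point for checking which tasks were running around that time.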

    You say you have three tasks running - do you mean three tasks scheduled or when the problem happened, there were three tasks running or just when you replied, three were running?  

    With a low memory issue, there's normally some reason that memory gets run down.  So, you may want to look at when this condition happened, see what was running (what task(s) on what device(s)), and go from there.  The appserver_wrapper log should have enough detail to get you started.