NA 2019.05 Task load

Hello,

We currently have this situation in NA.

Looking in the task logs, we see that tasks are waiting because the maximum number of concurrent tasks has been reached. How is that possible if no task is running at the moment?

Thanks and regards,

Giacomo

  • 0  

    Hi Giacomo,

    I know there have been a few issues with stuck tasks with the older versions of NA.  

    On core 1, do you see an event for memory?  If so, you can try to connect to the proxy and do a "run gc" and this might help.  

    Another option: look at the queued tasks and try cancelling a few, or cancel all of them, then re-run the cancelled tasks and see if they run cleanly.  

    Can you say if your NA configuration has devices bound to a core or not?  

    thanks,

    Chris

  • 0 in reply to   

    Hello Chris,

    Running top on the server gives this result:

    KiB Mem : 32761312 total, 226768 free, 31490604 used, 1043940 buff/cache
    KiB Swap: 25165820 total, 18850608 free, 6315212 used. 836564 avail Mem

    I have deleted all waiting tasks and will wait until tomorrow to see whether the scheduled tasks run as expected.

    It seems all devices are bound to core1, which explains why core2 was doing nothing.
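
To put the top figures quoted above in proportion, here is a minimal Python sketch that parses the two summary lines and works out how much memory and swap are in use. The function name `parse_top_memory` and the parsing approach are illustrative assumptions, not part of NA; only the numbers come from the post.

```python
import re

# The two summary lines exactly as quoted in the post above.
TOP_OUTPUT = """\
KiB Mem : 32761312 total, 226768 free, 31490604 used, 1043940 buff/cache
KiB Swap: 25165820 total, 18850608 free, 6315212 used. 836564 avail Mem
"""


def parse_top_memory(text):
    """Return {'Mem': {...}, 'Swap': {...}} with KiB values from top's summary lines.

    Hypothetical helper for illustration; it only handles the "KiB" layout shown above.
    """
    result = {}
    for line in text.splitlines():
        match = re.match(r"\s*KiB (Mem|Swap)\s*:\s*(.*)", line)
        if not match:
            continue
        kind, rest = match.groups()
        # Each figure is "<number> <label>", e.g. "226768 free".
        # Note: top prints "avail Mem" on the Swap line, so it ends up under 'Swap'.
        result[kind] = {
            label: int(value) for value, label in re.findall(r"(\d+) ([a-z/]+)", rest)
        }
    return result


stats = parse_top_memory(TOP_OUTPUT)
mem_used_pct = 100 * stats["Mem"]["used"] / stats["Mem"]["total"]
swap_used_pct = 100 * stats["Swap"]["used"] / stats["Swap"]["total"]
avail_pct = 100 * stats["Swap"]["avail"] / stats["Mem"]["total"]

print(f"RAM used:      {mem_used_pct:.1f}%")
print(f"Swap used:     {swap_used_pct:.1f}%")
print(f"RAM available: {avail_pct:.1f}%")
```

With these numbers, roughly 96% of RAM is in use, about a quarter of swap is occupied, and under 3% of RAM is reported as available, which fits the question about memory events on core 1.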

  • Suggested Answer

    Hey, can you check a few things when you look tomorrow:

    1) Do an event search and add these options:

    Select fields:

    Event Date, Summary, Added By, Description

    Search Criteria

    • Date since 1 month ago
    • Summary equals 'Monitor Error' OR 'System Health'

    Looking to see if NA is running out of memory on a core.  

    2) Check admin / Distributed / Core List: do you see all your cores, and do they show as Running: Fully Functional?

    3) Check /opt/NA/jre/distributed.rcx (on each core; it should be the same everywhere, but it is worth checking). This will show the task / core / device behavior. (FYI: in 24.4 this gets much better if you can switch away from core binding.)

    That said, if you need to continue to do core binding and you want to see what core each device is tied to, there is a way to do this.  

    The result would be a field similar to the one below, so when you look at the device home page you'll see the core # that the device is tied to (since this shouldn't change, it's set and forget). This wouldn't really make sense if devices float between cores. It just requires a bit of code and a custom data field.  

    4) Run a task search with these options:

    Selected fields:

    just add "Core" to defaults

    Search Criteria

    • Schedule Date since 48 hours ago (you can change this depending on the size of environment) - looking to see if you have tasks on other cores or not.  

    It's definitely possible that core 1 got bogged down while the other core(s) were done with their tasks. This is where 24.4 can help, as other cores can take over when they are free. I'm unsure how well that will work if there's a bug, but in general it is nice.  

    5) Have you changed any of the Task options (controlling max / concurrent tasks) or similar?  Have you edited the conf file to alter memory settings?  
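
If the task search from step 4 is copied out of the UI, a short Python sketch like the one below could tally tasks per core and status to show whether anything is landing on the other cores. Everything here is an assumption for illustration: the row layout, the "Status" field, and the sample values are made up, and `tasks_per_core` is a hypothetical helper, not an NA API. Only the "Core" field name comes from the step above.

```python
from collections import Counter


def tasks_per_core(rows):
    """Count tasks per (core, status) pair.

    `rows` is assumed to be a list of dicts with "Core" and "Status" keys,
    e.g. task search results pasted from the UI into a script by hand.
    """
    counts = Counter()
    for row in rows:
        counts[(row["Core"], row["Status"])] += 1
    return counts


# Made-up sample rows, NOT real output from NA.
sample_rows = [
    {"Core": "1", "Status": "Waiting"},
    {"Core": "1", "Status": "Waiting"},
    {"Core": "1", "Status": "Succeeded"},
    {"Core": "2", "Status": "Succeeded"},
]

counts = tasks_per_core(sample_rows)
for (core, status), n in sorted(counts.items()):
    print(f"core {core}: {n} task(s) {status}")
```

If every waiting task sits on one core while the other cores show nothing recent, that would be consistent with all devices being bound to that core.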

    Hope this helps,
    Chris