NA 2023.05 Linux: loss of SSH connection when trying to pull device configuration

I'm having an issue with a new deployment of NA 2023.05 on RHEL 8 VMs. Error message:

NA 2023.05 receives the error: "Attempt to retrieve data from device failed: Task thread was interrupted." When the snapshot starts, it connects with the svc_na account over SSHv2 on port 22. The account logs in with the password. The device can be accessed and it builds the configuration, but the task fails when retrieving the configuration. We are also having trouble finding where the session logs are located on the server so we can analyze this further. The driver packs have been updated to the latest version.

 Thanks,  Jim

Parents
  • 0  

    Hi Jim,  

    So, I have a few questions for you that hopefully will get you what you're looking for.  

    Let's start with the logs first. When you say session logs, are you talking about the session log box that you can check for a task?  That's in the DB.  If you do a task report and look at that task's result, you'll see it.  

    If you want it in a file (log), you could turn up logging either globally or for that single device / task (device/session -> trace).  It will then be in appserver_wrapper.log or in the generated task log for the single device.  

    The task specific log would be in ../NA/server/ext/appserver/standalone/log/Task Type Name task id #### on device ID ###.task.log

    If you have more than one NA Core, just make sure you go to the core that ran the task. The same applies if you download the troubleshoot.zip.  
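To make the file lookup above concrete, here is a minimal Python sketch that lists the per-task logs in a log directory, newest first. It is not part of NA; the function name `find_task_logs` is hypothetical, and the filter assumes the file names really contain the literal text "device ID <number>" as in the pattern quoted above. Run it on the core that executed the task.

```python
from pathlib import Path


def find_task_logs(log_dir, device_id=None):
    """Return the *.task.log files in log_dir, newest first.

    Assumption: task logs are named like
    "<Task Type Name> task id #### on device ID ###.task.log".
    """
    logs = [p for p in Path(log_dir).glob("*.task.log") if p.is_file()]
    if device_id is not None:
        # The trailing dot keeps device 12 from also matching device 123.
        marker = f"device ID {device_id}."
        logs = [p for p in logs if marker in p.name]
    # Sort by modification time so the most recent task run comes first.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `find_task_logs("/opt/NA/server/ext/appserver/standalone/log", device_id=12)` would return the task logs for device 12, if the install path matches and tracing was enabled for that task.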

    Now, the take snapshot issue.  First question: how long does NA say the task is taking? More to the point, is it taking more than 60 minutes (assuming nobody has changed the default max task timer)?  

    Do you mind if I ask what device type this device is?  Do you have others and, if so, are they all failing, or is just this one device (of many) failing?  

    Do you see anything when you look at the task (view session log)? Specifically, do you see NA logging in and then executing commands?  Can you see what the last thing was?  Was it trying to get the config via a transport method (scp, ftp, sftp)?  If so, do other devices use these methods? If not, have you configured this in NA?  

    As a workaround, you could try editing the device and selecting only CLI / ssh to get the config (uncheck scp and other methods), then see if the task is successful. Granted, this may be a device where you need a transfer method, but for, say, IOS this should work.  If that works for this device task, then look at how NA is configured (scp / sftp / ftp) and make sure there are no issues with firewalls or such.  

    Hope this helps,

    Chris

Children
  • 0 in reply to   

    Thanks Chris, I checked /opt/NA/server/ext/appserver/standalone/log/server.log and saw this error: "Error executing SCP get - Failure executing SCP command". The task is not taking more than 3 - 40 seconds to run. I can see it logging in, collecting data and building the configuration file, but not pulling it back.  The devices that mainly are having issues are Cisco Nexus and ASR devices.  The scp/sftp/ftp settings are configured to match our older NA 2020.08.   Thanks
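For anyone repeating this check, a small Python sketch like the one below can pull the SCP failures out of server.log along with the lines that follow them, which is often where the underlying cause shows up. The helper name `find_scp_errors` is hypothetical, and the default search string is simply the start of the message quoted above; other NA versions may word it differently.

```python
def find_scp_errors(lines, needle="Error executing SCP", after=0):
    """Return (line_number, [matching line + `after` following lines]).

    `lines` is any iterable of strings, such as an open log file.
    Line numbers are 1-based.
    """
    stripped = [line.rstrip("\n") for line in lines]
    hits = []
    for index, line in enumerate(stripped):
        if needle in line:
            hits.append((index + 1, stripped[index:index + 1 + after]))
    return hits


# Usage sketch (path taken from the post above; adjust to your install):
# with open("/opt/NA/server/ext/appserver/standalone/log/server.log",
#           errors="replace") as log:
#     for number, block in find_scp_errors(log, after=3):
#         print(number, *block, sep="\n")
```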