ALM 17: 10000 lines in SA log - "queue thread failed Extra information: Param #0: com.hp.alm.platform.db.CTdDbException"

Hi!

ALM 17 is creating a new 3 MB SA log file every 6 minutes. This happens on 5 nodes. It is always the same message line.

Aug 03
09:18:19.540
Aug 03
09:18:19.554
Cluster Aware Queue N/A N/A N/A ERR ClusterAwareQueueThreadImpl.run(35) queue thread failed Extra information: Param #0: com.hp.alm.platform.db.CTdDbException:
Aug 03
09:18:19.567
Aug 03
09:18:19.582
Cluster Aware Queue N/A N/A N/A ERR ClusterAwareQueueThreadImpl.run(35) queue thread failed Extra information: Param #0: com.hp.alm.platform.db.CTdDbException:
Aug 03
09:18:19.595
Aug 03
09:18:19.609
Cluster Aware Queue N/A N/A N/A ERR ClusterAwareQueueThreadImpl.run(35) queue thread failed Extra information: Param #0: com.hp.alm.platform.db.CTdDbException:

Typically 400 - 500 users are connected and 110 projects have a user connection. The users don't seem to be affected, and they aren't opening more tickets than usual.
I really don't know where I should start. Any ideas are welcome!

  • 0

    To be more precise: we are using ALM QC 17.01 (17.01.0.126)

  • 0

    After switching the log level for SA to debug, we found this just before each error:

    -- ClusterAwareQueueImpl.handleZombies(1469) calls executeQuery on 'qcsiteadmin_db'

    SQL execution completed in 0ms [1000 rows affected]: /* ~~QC */ SELECT PQ_ID FROM PRIORITY_QUEUE WITH (NOLOCK) WHERE PQ_TASK_TYPE=/*P*/'copy-repository' AND (PQ_PRIORITY IN (/*P*/-1, /*P*/-2)) AND DATEDIFF( millisecond , PQ_LAST_TOUCH, getdate())>/*P*/9223372036854775807

    -- ClusterAwareQueueImpl.handleZombies(1469)

    com.microsoft.sqlserver.jdbc.SQLServerException

    Messages:
    The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.;

    Stack Trace:
    com.microsoft.sqlserver.jdbc.SQLServerException: The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)

  • 0

    The error occurs every 34 - 37 ms:

    07:03:20.562
    07:03:20.598
    07:03:20.635
    07:03:20.669
    07:03:20.704
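
As a rough cross-check, the gaps between these timestamps can be turned into an expected number of log entries per 6-minute window. This is only a sketch: it assumes the five timestamps above are representative and that every error writes one entry.

```python
from datetime import datetime

# Timestamps copied from the post above.
stamps = ["07:03:20.562", "07:03:20.598", "07:03:20.635",
          "07:03:20.669", "07:03:20.704"]
times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]

# Gap between consecutive errors, in milliseconds.
gaps_ms = [round((b - a).total_seconds() * 1000)
           for a, b in zip(times, times[1:])]
mean_gap_ms = sum(gaps_ms) / len(gaps_ms)

# Estimated number of entries written in one 6-minute window at that rate.
entries_per_6_min = round(6 * 60 * 1000 / mean_gap_ms)

print(gaps_ms)            # [36, 37, 34, 35]
print(mean_gap_ms)        # 35.5
print(entries_per_6_min)  # 10141
```

That estimate is in line with the roughly 10000 lines per log file mentioned in the title.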

  • Suggested Answer

    Hello QC-team,

    The exception comes from the 'datediff' function of MS SQL Server.
    This function returns a 32-bit integer, so there is a limit on its input when the datepart is milliseconds: the two dates can't be more than 24 days, 20 hours, 31 minutes and 23.647 seconds apart.
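
The limit can be reproduced with a little arithmetic. The sketch below assumes the ceiling is the largest signed 32-bit integer and converts that many milliseconds into days, hours, minutes and seconds.

```python
from datetime import timedelta

# Assumption: DATEDIFF returns a signed 32-bit integer.
INT_MAX = 2**31 - 1

limit = timedelta(milliseconds=INT_MAX)

# Split the part of the span below one day into h / m / s.
hours, rem = divmod(limit.seconds, 3600)
minutes, seconds = divmod(rem, 60)
millis = limit.microseconds // 1000

print(limit.days, hours, minutes, seconds, millis)  # 24 20 31 23 647
```

Any row whose PQ_LAST_TOUCH is older than that makes the millisecond comparison fail before the threshold is even evaluated.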

    Normally, all rows in this table are deleted once their tasks have finished.
    If the data is too old, it is useless for ALM.
    Rows older than 24 days are considered zombies by ALM and can be safely deleted.

    So the recommendation for avoiding these errors in the logs is to delete the old rows from the PRIORITY_QUEUE table.

    A query like this will help find such records:

    SELECT * FROM td.PRIORITY_QUEUE WITH (NOLOCK) WHERE DATEDIFF( second ,PQ_LAST_TOUCH,getdate())>/*P*/2073600
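
A quick check of why this query uses seconds rather than milliseconds, again assuming a signed 32-bit result: the threshold is exactly 24 days, and with the second datepart the overflow point moves out to roughly 68 years.

```python
from datetime import timedelta

INT_MAX = 2**31 - 1          # assumption: DATEDIFF returns a 32-bit int
THRESHOLD_SECONDS = 2073600  # value used in the query above

threshold = timedelta(seconds=THRESHOLD_SECONDS)
second_limit = timedelta(seconds=INT_MAX)

threshold_days = threshold.days
second_limit_years = second_limit.days / 365.25

print(threshold_days, threshold.seconds)  # 24 0
print(round(second_limit_years, 1))       # 68.0
```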

    Make sure to take a backup of the DB before the manual deletion, and restart the service on each node of the cluster afterwards for the changes to take effect.


    I recommend opening a support ticket if you need further assistance determining why these records were not cleared as expected.

    Thanks,

    Claudio Ureña

    ALM Software Support

    Although I am an OpenText employee, I am speaking for myself and not for OpenText.