404 page not removed from database

I have a page in our IDOL database that has since been removed from the source system. When the Web Connector crawls that page it sees a 404 but doesn't remove the document.

What am I missing?

The log from the Web Connector and the task config are below.

08/04/2020 09:19:34 [416] 70-Error: NTRSIMMEDIATE: Root url not found: www.xxx.com/.../but-has-now-returns-404
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:End load url: www.xxx.com/.../but-has-now-returns-404
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: __cfduid ; Domain=.app-ab23.marketo.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: bm_sv ; Domain=.northerntrust.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Storing shared cookie jar cookie: __cfduid ; Domain=.onetrust.com ; Path=/
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:HTTP Status: 404
08/04/2020 09:19:34 [273] 10-Full: NTRSIMMEDIATE: WKOOP 2f0294f0c6ec187458d05265a17690f5 task complete
08/04/2020 09:19:34 [414] 10-Full: NTRSIMMEDIATE: WKOOP:Waiting for task
08/04/2020 09:19:34 [414] 70-Error: NTRSIMMEDIATE: Root url not found: www.xxx.com/.../but-has-now-returns-404
08/04/2020 09:19:34 [18] 10-Full: NTRSIMMEDIATE: Finished processing depth: 0, 1 pages
08/04/2020 09:19:34 [18] 10-Full: NTRSIMMEDIATE: 0 unseen urls, 0 documents removed

 

config

[immediate]
Url0=www.xxx.com/.../but-has-now-returns-404
IngestEnableDeletes=true
StayOnSite=true
IndexDatabase=xxx
Depth=0
SpiderUrlMustHaveRegex=.*
UserAgent=IDOL12-immediate

 

  • Verified Answer


    The error occurs because this URL isn't just a page that exists in the database; it is the configured root URL for the task. The root URLs are the task's entry point into the site and must always be reachable. Otherwise, the connector has no way to distinguish a removed page from a misconfigured task or a failure of the site itself. A task without valid root URLs isn't viable, so the connector never gets to the point of fully starting the task, let alone issuing deletes.

    Was this the original connector and task that indexed the document? Even if the task were able to start, deletes are issued only for documents that the task has previously encountered, based on the contents of its datastore database. If that datastore is missing, the connector won't be aware that it's responsible for the IDOL document in question, and you may need to delete it manually from IDOL (or the Content Engine).
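
    For the manual delete, a minimal sketch of building a DREDELETEREF index action in Python follows. The host, the index port 9001, and the example reference are assumptions for illustration, not values from this thread; the database name xxx is the placeholder from the config above. The helper name build_delete_ref_url is hypothetical. The reference has to match the document's stored reference exactly, so check it with a query first.

```python
from urllib.parse import urlencode


def build_delete_ref_url(host, index_port, reference, database):
    """Build a DREDELETEREF index-action URL for a single document reference.

    host and index_port are assumptions: point them at your own
    Content Engine and its configured index port.
    """
    # urlencode percent-encodes the reference so it is safe in a query string
    query = urlencode({"Docs": reference, "DREDbName": database})
    return f"http://{host}:{index_port}/DREDELETEREF?{query}"


if __name__ == "__main__":
    # Hypothetical reference; substitute the real reference of the stale document
    url = build_delete_ref_url(
        "localhost", 9001, "http://www.example.com/removed-page", "xxx"
    )
    print(url)
```

    The script only prints the URL, so you can review it before sending it to the index port with a browser or any HTTP client.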