WebConnector 11.6 StayOnSite

Why does the WebConnector (v11.6) keep trying to crawl URLs that are outside the start point when I have StayOnSite=TRUE? I have even written SpiderUrlCantHaveRegex patterns to exclude these URLs, but the WebConnector still attempts to contact them through the proxy server.

  • 0  

    This might be an issue for our support team, ideally with logs and configuration files attached. Here is some feedback that I received, though:

    The best guess is that the connector is behaving correctly, in that it is sticking to pages from that website, but because you are looking at proxy traffic, you see other URLs being downloaded. We embed Chrome, and it will download resources from other sites in order to render a complete page. It won't save those resources, though.

    OpenText Community Manager

  • 0 in reply to   

    I think it would be a good idea to add a parameter to be able to instruct the WebConnector to "absolutely stay on site" and not attempt to contact any "outside" URLs. Do you agree that parameter would be useful?

  • 0 in reply to 

    This is also occurring on v12 web crawlers.

    I've added "spider must/cant...", "must have" and "cant have" options and those sites still get hit.  It slows everything down.

  • 0 in reply to 

    If you're certain that this is the result of following links, I'd recommend opening a support ticket. While the proxy explanation above is certainly a possibility, it hasn't been confirmed as the established behavior.

    If, however, the specific items being requested are things like images, JavaScript, or other resources that are referenced in the HTML but not in an <a href="..."> hyperlink, then you may want to look at ResourceUrlMustHaveRegex to prevent attempts at loading them. (This parameter was added in 12.4.)
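
    Before putting a pattern into ResourceUrlMustHaveRegex, it can help to sanity-check it against a few of the URLs seen in the proxy log. The following is only a rough sketch in Python, not connector code: the pattern, the example.com URLs, and the would_load helper are all hypothetical, and it assumes the pattern has to match the whole URL. The connector's regex flavor and matching rules may differ from Python's re module, so check the parameter reference for your version.

```python
import re

# Hypothetical pattern: only allow resources served by the crawled site itself.
RESOURCE_URL_MUST_HAVE = r"https?://www\.example\.com/.*"


def would_load(url, pattern=RESOURCE_URL_MUST_HAVE):
    """Return True if the whole URL matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, url) is not None


# Made-up sample URLs standing in for entries from a proxy log.
samples = [
    "https://www.example.com/css/site.css",
    "https://cdn.example.net/lib/jquery.js",
    "https://tracker.example.org/pixel.gif",
]

for url in samples:
    print("load" if would_load(url) else "skip", url)
```

    If the third-party URLs all come out as "skip" here, the same pattern is a reasonable candidate to try in the connector configuration.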