Unable to crawl site for weeks. Need help?

Struggling to Perform a Site Crawl: Seeking Assistance

Hello everyone,

I’m currently facing an issue as an in-house SEO professional. For the past two weeks, I’ve been unable to crawl my company’s website using either Screaming Frog or SEMrush. I’ve attempted several solutions, but the problem persists. I would appreciate any additional suggestions or insights on what else I might investigate. Here’s what I’ve tried so far:

  1. Reduced the crawl speed on both platforms to a single thread and set the maximum crawl rate to 1.0 URLs per second or lower.
  2. Added my IP address to the whitelist in our CDN provider, Cloudflare.
  3. Whitelisted the user agents for Screaming Frog and SEMrush within Cloudflare.

Any advice or tips would be greatly appreciated!


2 responses to “Unable to crawl site for weeks. Need help?”

  1. It’s frustrating when you can’t crawl your site, especially when the issue persists despite trying several solutions. Here are some additional steps and considerations that might help resolve the issue:

    1. Check Robots.txt

    • Content: Double-check the robots.txt file to ensure that Screaming Frog and SEMrush aren’t being blocked. Look for any Disallow directives that may be preventing crawling.
    • Syntax: Ensure that the robots.txt file is correctly formatted and free from syntax errors.
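
    If it helps, here’s a quick way to test this outside the tools using only Python’s standard library. The robots.txt URL and the user-agent tokens below are assumptions to swap for your own — SEMrush’s Site Audit token in particular is worth confirming against their documentation or your server logs:

    ```python
    from urllib.robotparser import RobotFileParser

    ROBOTS_URL = "https://www.example.com/robots.txt"  # placeholder -- swap in your domain
    START_PAGE = "https://www.example.com/"            # the page your crawl starts from

    rp = RobotFileParser(ROBOTS_URL)
    rp.read()

    # User-agent tokens are assumptions: Screaming Frog's default UA contains
    # "Screaming Frog SEO Spider"; SEMrush's site audit crawler is commonly
    # reported as "SiteAuditBot" -- verify both in your logs before relying on this.
    for agent in ("Screaming Frog SEO Spider", "SiteAuditBot", "*"):
        print(f"{agent!r} allowed: {rp.can_fetch(agent, START_PAGE)}")
    ```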

    2. Test with Other User Agents

    • Sometimes, specific user-agent restrictions may cause a problem. Try crawling the site with different user agents (e.g., Googlebot) using a tool that lets you modify the user agent.
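
    As a rough sketch of that idea (the URL and UA strings are illustrative placeholders, not the tools’ exact values), you can compare how the site responds to a few user agents with plain Python requests:

    ```python
    import requests

    URL = "https://www.example.com/"  # placeholder -- a representative page on your site

    # Compare a browser-like UA, Googlebot's published UA, and an SEO-spider-style UA
    user_agents = {
        "Browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "SEO Spider": "Screaming Frog SEO Spider/20.0",  # version number is illustrative
    }

    for name, ua in user_agents.items():
        resp = requests.get(URL, headers={"User-Agent": ua}, timeout=15, allow_redirects=True)
        # cf-ray is a header Cloudflare adds to responses it serves
        print(f"{name:12} -> {resp.status_code}  (cf-ray: {resp.headers.get('cf-ray', 'n/a')})")
    ```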

    3. Server-side and Firewall Settings

    • Web Server Logs: Check your server logs to see if requests are hitting the server and what response codes are being returned. This might provide insight into any blocks or errors.
    • Firewall Rules: Even with Cloudflare, your server may have its own firewall or security module running (such as ModSecurity). Check to ensure these aren’t blocking your crawler’s requests.
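
    On the log side, for example, assuming a standard combined log format and an nginx-style log path (both assumptions — adjust for your stack), a small script can tally the status codes returned to the crawler:

    ```python
    import re
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder -- adjust to your server's log location
    UA_HINT = "Screaming Frog"              # substring of the crawler's user agent

    # Combined log format: ... "METHOD /path HTTP/1.1" STATUS SIZE "referer" "user-agent"
    line_re = re.compile(r'"\w+ (?P<path>\S+) [^"]*" (?P<status>\d{3})')

    statuses = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            if UA_HINT not in line:
                continue
            m = line_re.search(line)
            if m:
                statuses[m.group("status")] += 1

    # e.g. a pile of 403s with almost no 200s would point to a server-side block
    print(statuses.most_common())
    ```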

    4. Cloudflare Settings

    • Rate Limiting: Ensure there are no rate limiting rules configured in Cloudflare that could throttle requests from Screaming Frog or SEMrush.
    • Security Level: Adjust the security level temporarily to see if it resolves the issue. Cloudflare’s security features might be overly aggressive.
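
    One way to see whether requests start getting blocked or challenged is to send a short, slow burst and watch the status codes. The URL and UA string below are placeholders, and the cf-mitigated header is something Cloudflare reportedly sets on challenged responses, so treat it as a hint rather than a guarantee:

    ```python
    import time
    import requests

    URL = "https://www.example.com/"       # placeholder -- any crawlable page
    UA = "Screaming Frog SEO Spider/20.0"  # illustrative UA string

    # Watch for 403/429/503 responses, which would suggest rate limiting
    # or a security rule kicking in after a few requests
    for i in range(20):
        resp = requests.get(URL, headers={"User-Agent": UA}, timeout=15)
        mitigated = resp.headers.get("cf-mitigated", "-")
        print(f"request {i + 1:2}: {resp.status_code}  cf-mitigated={mitigated}")
        if resp.status_code in (403, 429, 503):
            print("Blocked or challenged after", i + 1, "requests")
            break
        time.sleep(1)
    ```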

    5. Authentication Issues

    • Credentials: If your site requires authentication, ensure that Screaming Frog and SEMrush are correctly set up with the necessary credentials to access any protected areas.
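
    If the protected area uses HTTP basic auth, a one-off check outside the crawlers can confirm the credentials themselves work — the URL and credentials below are placeholders:

    ```python
    import requests

    # Placeholder URL and credentials -- substitute the real protected path and account
    resp = requests.get(
        "https://www.example.com/protected/",
        auth=("crawl_user", "crawl_password"),
        timeout=15,
    )
    print(resp.status_code)  # 200 means the login works; 401/403 points to an auth problem
    ```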

    6. DNS and SSL

    • DNS Settings: Verify that DNS settings haven’t changed and that they’re correctly pointing to your server. An incorrect setting might prevent tools from reaching the website.
    • SSL/TLS Issues: Ensure that there are no SSL/TLS issues that might prevent these crawlers from establishing a connection. Check for and fix any SSL certificate problems.
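
    A quick check of both, using only Python’s standard library (the hostname is a placeholder for your own domain):

    ```python
    import socket
    import ssl

    HOST = "www.example.com"  # placeholder -- use your site's hostname

    # DNS: confirm the name resolves and see which IPs it points to
    ips = sorted({info[4][0] for info in socket.getaddrinfo(HOST, 443)})
    print("Resolves to:", ips)

    # TLS: open a connection and inspect the certificate the server presents
    ctx = ssl.create_default_context()
    with socket.create_connection((HOST, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            cert = tls.getpeercert()
            print("TLS version:", tls.version())
            print("Issued to:  ", dict(x[0] for x in cert["subject"]).get("commonName"))
            print("Expires:    ", cert["notAfter"])
    ```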

    7. Update Software

    • Ensure that both Screaming Frog and SEMrush are updated to their latest versions, as older versions may have bugs or compatibility issues.

    8. Local Environment

    • Network and ISP: Check if there’s anything on your local network (like company firewalls or proxies) that could be blocking outbound requests to your site.
    • PC
  2. Hi there! It sounds like you’re dealing with a frustrating situation, and I can empathize with the challenges that come with crawling issues. Given what you’ve already tried, I’d recommend checking a couple of additional areas that might shed light on your problem:

    1. **Robots.txt File**: Ensure that your robots.txt file isn’t inadvertently blocking certain user agents, including Screaming Frog and SEMrush. It’s worth reviewing this to make sure there are no Disallow directives affecting the crawl.

    2. **Server Response Codes**: Use an HTTP status code checker to see if your website is returning any unusual server response codes across different pages. Errors like 500 (Internal Server Error) or excessive 429 (Too Many Requests) could affect crawling.
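
    A quick spot check is easy to script — the URLs below are just placeholders for a few representative page templates on your site:

    ```python
    import requests

    # Placeholder URLs -- pick a handful of representative templates from your site
    urls = [
        "https://www.example.com/",
        "https://www.example.com/category/widgets/",
        "https://www.example.com/blog/some-post/",
    ]

    for url in urls:
        resp = requests.get(url, timeout=15, allow_redirects=False)
        retry_after = resp.headers.get("Retry-After", "")
        note = f"Retry-After: {retry_after}" if retry_after else ""
        print(resp.status_code, url, note)
    ```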

    3. **Firewall Settings**: Sometimes, security plugins or firewall settings can block certain crawlers. If you’re using any security or firewall solutions beyond Cloudflare, double-check those configurations.

    4. **Cross-Origin Resource Sharing (CORS)**: If your site serves content from multiple subdomains or external resources, make sure your CORS settings aren’t blocking those resources; this mainly matters if the crawlers are rendering JavaScript.

    5. **Caching Issues**: If your site uses aggressive caching, it might be interfering with how crawlers view your pages. Clearing the cache can potentially resolve any discrepancies.

    6. **Query Parameters**: If your URLs include complex query parameters, you might want to simplify them to see if that helps in allowing the crawlers through.

    If you haven’t
