Struggling to Perform a Site Crawl: Seeking Assistance
Hello everyone,
I’m currently facing an issue as an in-house SEO professional. For the past two weeks, I’ve been unable to crawl my company’s website using either Screaming Frog or SEMrush. I’ve attempted several solutions, but the problem persists. I would appreciate any additional suggestions or insights on what else I might investigate. Here’s what I’ve tried so far:
- Reduced the crawl speed on both platforms to a maximum of one thread and set the max URL/s to 1.0 or lower.
- Added my IP address to the whitelist in our CDN provider, Cloudflare.
- Whitelisted the user agents for Screaming Frog and SEMrush within Cloudflare.
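For reference, a quick way to spot-check whether Cloudflare is still treating the crawler differently from a normal browser is a short script like the one below (a minimal Python sketch; the domain and the Screaming Frog user-agent string are placeholders, so swap in your own URL and the exact user-agent string your crawler is configured to send):

```python
import requests

# Placeholder URL: swap in your real homepage.
URL = "https://www.example.com/"

# The Screaming Frog string below is only an approximation of the tool's
# default user agent; copy the exact string your crawler actually sends.
USER_AGENTS = {
    "Browser": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Screaming Frog": "Screaming Frog SEO Spider/20.0",
}

for label, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=15)
    # A 403/429 or a Cloudflare challenge page that appears only for the crawler
    # user agent points at a WAF / bot-management rule rather than the crawler itself.
    print(f"{label}: {resp.status_code} (final URL: {resp.url})")
    print("  server:", resp.headers.get("server"), "| cf-ray:", resp.headers.get("cf-ray"))
```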
Any advice or tips would be greatly appreciated!
2 responses to “Unable to crawl site for weeks. Need help?”
It’s frustrating when you can’t crawl your site, especially when the issue persists despite trying several solutions. Here are some additional steps and considerations that might help resolve the issue:
1. Check Robots.txt
Review your robots.txt file to ensure that the paths Screaming Frog and SEMrush need aren't being blocked. Look for any Disallow directives that may be preventing crawling, and confirm the robots.txt file is correctly formatted and free from syntax errors (a quick way to verify this is sketched after this list).
2. Test with Other User Agents
3. Server-side and Firewall Settings
4. Cloudflare Settings
5. Authentication Issues
6. DNS and SSL
7. Update Software
8. Local Environment
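For the robots.txt check in point 1, Python's built-in robotparser gives a quick read on what your file actually allows (a rough sketch; the domain and user-agent tokens below are placeholders, so match them to the strings your crawlers really send and the tokens your robots.txt targets):

```python
from urllib.robotparser import RobotFileParser

# Placeholder site root: swap in your own domain.
ROBOTS_URL = "https://www.example.com/robots.txt"
TEST_PATH = "https://www.example.com/"

# Example tokens only; use the user-agent names your robots.txt actually references.
CRAWLER_TOKENS = ["Screaming Frog SEO Spider", "SemrushBot", "*"]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses robots.txt

for token in CRAWLER_TOKENS:
    allowed = parser.can_fetch(token, TEST_PATH)
    print(f"{token!r} allowed to fetch {TEST_PATH}: {allowed}")
```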
Hi there! It sounds like you're dealing with a frustrating situation, and I can empathize with the challenges that come with crawling issues. Given what you've already tried, I'd recommend checking a couple of additional areas that might shed light on your problem:
1. **Robots.txt File**: Ensure that your robots.txt file isn't inadvertently blocking certain user agents, including Screaming Frog and SEMrush. It's worth reviewing it to make sure there are no Disallow directives affecting the crawl.
2. **Server Response Codes**: Use an HTTP status code checker to see if your website is returning any unusual server response codes across different pages. Errors like 500 (Internal Server Error) or excessive 429 (Too Many Requests) responses could affect crawling (a quick way to run this check is sketched after this list).
3. **Firewall Settings**: Sometimes, security plugins or firewall settings can block certain crawlers. If you're using any security or firewall solutions beyond Cloudflare, double-check those configurations.
4. **Cross-Origin Resource Sharing (CORS)**: If your site serves content from multiple subdomains or external resources, make sure your CORS settings aren't hindering the crawl.
5. **Caching Issues**: If your site uses aggressive caching, it might be interfering with how crawlers view your pages. Clearing the cache can potentially resolve any discrepancies.
6. **Query Parameters**: If your URLs include complex query parameters, you might want to simplify them to see if that helps in allowing the crawlers through.
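For point 2, a lightweight way to spot 429s or 5xx errors without running a full crawl is a short script like this (a rough sketch; the URLs are placeholders, so swap in a handful of your own representative pages):

```python
import time
import requests

# Placeholder sample of URLs: use a few representative pages
# (homepage, a category page, a deep article/product page, etc.).
URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/",
    "https://www.example.com/some-deep-page/",
]

for url in URLS:
    resp = requests.get(url, timeout=15)
    print(url, "->", resp.status_code)
    # 429 suggests rate limiting; 5xx suggests server-side trouble. If present,
    # the Retry-After header says how long the server wants you to back off.
    if resp.status_code == 429:
        print("  Retry-After:", resp.headers.get("Retry-After"))
    time.sleep(1)  # gentle pacing so the check itself doesn't trip rate limits
```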
If you haven't