Troubleshooting Bing Indexing Issues: When Your Site Is Discovered but Not Crawled
Effective website indexing is vital for ensuring your content reaches a broad audience through search engines. While Google's indexing process is generally smooth, many webmasters encounter challenges with Bing, especially when pages are discovered but not crawled promptly. If you're experiencing such issues, you're not alone, and understanding the potential causes and solutions can help you resolve them efficiently.
Understanding the Problem
Recently, a website owner observed that certain pages, such as example.com/sub-page/, are being indexed rapidly by Bing, often within minutes. In contrast, other pages, like example.com/blog/post/, remain in a “Discovered” state for 24 to 48 hours or longer without being crawled.
This inconsistency can be perplexing, particularly because:
- The site's domain is well established, at 12 years old.
- The website uses a current theme and plugins.
- No recent changes were made to Bing Webmaster Tools settings or the website itself.
- The site employs Cloudflare for CDN and security features.
- Bingbot continues to visit and crawl other parts of the site without issues.
Potential Causes
- Selective Crawling Restrictions: While your domain is accessible, certain folders or pages might be inadvertently blocked from crawling by firewall rules or crawler directives.
- Robots.txt or Meta Tag Policies: Verify that your robots.txt file does not disallow the blog subfolder or specific pages. Similarly, ensure no robots meta tags (for example, noindex) are blocking these pages.
- Firewall or Security Settings: Although you checked Cloudflare firewall settings, some configurations might selectively block particular URLs or user agents. It's worth confirming that the Bingbot user agent isn't being restricted.
- Server Response and Headers: Ensure that the pages return appropriate HTTP status codes (200 OK) and that there are no unexpected redirects or errors that might hinder crawling.
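Two of the causes above, a robots.txt rule and an unexpected server response, can be checked with a short script. The following is a minimal sketch in Python using only the standard library, not a definitive diagnostic. The robots.txt content, the log lines, the IP addresses, and the function names bingbot_allowed and bingbot_status_counts are all hypothetical, and the log pattern assumes the common "combined" access log format, which may differ from what your server writes.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser


def bingbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets the 'bingbot' token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("bingbot", url)


# Assumed "combined" log format: request line, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bingbot_status_counts(log_lines, path_prefix):
    """Count status codes for requests under path_prefix whose user agent mentions bingbot."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # Skip lines that do not follow the assumed format.
        if "bingbot" not in match["agent"].lower():
            continue
        if match["path"].startswith(path_prefix):
            counts[int(match["status"])] += 1
    return counts


# Hypothetical robots.txt that blocks only Bingbot from the blog folder.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/

User-agent: bingbot
Disallow: /blog/
"""

# Made-up log lines for illustration; the addresses come from documentation ranges.
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
SAMPLE_LOG = [
    f'203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /sub-page/ HTTP/1.1" 200 5120 "-" "{BINGBOT_UA}"',
    f'203.0.113.7 - - [01/Jan/2025:10:00:05 +0000] "GET /blog/post/ HTTP/1.1" 403 312 "-" "{BINGBOT_UA}"',
    '198.51.100.4 - - [01/Jan/2025:10:00:09 +0000] "GET /blog/post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    print(bingbot_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post/"))
    print(bingbot_allowed(SAMPLE_ROBOTS, "https://example.com/sub-page/"))
    print(dict(bingbot_status_counts(SAMPLE_LOG, "/blog/")))
```

With your own robots.txt text and access log in place of the samples, a disallowed blog URL or a run of 403 or 5xx responses to Bingbot under /blog/ would point to a directive or firewall rule rather than a problem on Bing's side. Because a user agent string can be spoofed, treat the log counts as a starting point rather than proof that the requests came from Bing.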
Troubleshooting Steps
To address the issue, consider the following steps:
- Review Robots.txt and Meta Tags: Confirm that the blog subfolder isn't disallowed. Use tools like Bing's URL Inspection Tool to test individual pages' accessibility.
- Examine Server Logs: Check server logs for Bingbot access attempts to identify any blocks or errors when crawling blog pages.
- Inspect Firewall and CDN Settings: Revisit Cloudflare's firewall rules to ensure no specific restrictions apply to the blog