Can External Sitemap Generators Identify Orphaned Pages?
I’m interested in generating a sitemap for a website I don’t own, which currently lacks one. My best option seems to be using an external sitemap generator, such as XML Sitemaps.
However, these tools typically crawl a site by following internal links. As a result, they might not include orphaned pages, even though one of the key uses of a sitemap is to discover these unlinked pages.
Is this assumption correct?
2 responses to “Can external sitemap generators detect orphaned pages?”
A sitemap helps search engines and users understand the structure of a website. Your question is whether a sitemap generated externally, for a website you don’t own, can identify orphaned pages. Let’s look at this in more detail:
Understanding Orphaned Pages
Orphaned pages are pages on a website that do not have any links pointing to them from other pages within the same domain. As a result, these pages are often difficult for search engines and users to discover unless directly accessed via the URL.
External Sitemap Generators and Orphaned Pages
You mentioned tools like XML Sitemaps to generate sitemaps. These tools primarily crawl websites by following internal links from the provided starting point (usually the homepage). Here’s how this process impacts orphaned pages:
Crawling via Internal Links: Most external sitemap generators start by crawling a website from a given URL and follow internal links from there. This means:
If a page is not linked from any other page on the site (i.e., an orphaned page), the crawler typically won’t find it during the process, as there is no link trail leading to it.
Discovering Orphaned Pages:
Limitations: Because these generators rely on links, they generally can’t discover pages that have no inbound links from other parts of the site.
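To make the limitation concrete, here is a minimal sketch of a link-following crawl. It runs on a small in-memory link graph rather than a live site; the page paths and the `SITE` dictionary are made up for illustration and are not how any particular generator is implemented.

```python
from collections import deque


def crawl(link_graph, start):
    """Return the set of pages reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog"],
    "/old-landing-page": ["/"],  # exists, but no other page links to it
}

found = crawl(SITE, "/")
missed = set(SITE) - found  # pages that exist but were never reached

print(sorted(found))
print(sorted(missed))
```

The orphaned page links out to the homepage, but because nothing links in to it, a crawl that starts at the homepage never reaches it.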
Conclusion
Using an external sitemap generator on a website you don’t own will likely overlook orphaned pages because these tools depend on the linkage structure to discover pages. Since orphaned pages, by definition, lack these incoming links, they remain undiscovered in such a crawl.
Recommended Actions
Contact Website Owner: If orphaned pages are critical for your purpose, consider reaching out to the website’s owner. They might be able to provide you with an existing sitemap or additional resources to help identify these pages.
Webmaster Tools: If you are ever given access, tools like Google Search Console for the website can provide insights into pages that are indexed but not internally linked.
Advanced Tools: Consider advanced web crawling and SEO tools (like
Your assumption is quite accurate! External sitemap generators primarily rely on internal linking to discover and crawl pages, meaning that orphaned pages (those not linked from anywhere else on the site) can go unnoticed in the generated sitemap.
However, there are a few strategies you might consider to identify orphaned pages. One approach is to use website auditing tools like Screaming Frog or Sitebulb, which can provide a more comprehensive analysis by comparing the crawl against other sources of URLs (such as an existing sitemap, analytics, or Search Console data) to flag pages that exist but do not have any inbound links.
Additionally, if you have access to Google Search Console, you can review the “Coverage” report to find pages that are indexed but might not have any internal links pointing to them. This can give you a clearer picture of which pages might be orphaned.
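The underlying idea can be sketched as a comparison of two URL lists: URLs known to exist from some other source, and URLs a link-following crawl actually reached. This is only an illustration; the function names, the normalization rule, and the example.com URLs are assumptions, and it presumes you have already exported both lists yourself.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivially different forms compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return parts.netloc.lower() + path


def orphan_candidates(known_urls, crawled_urls):
    """Return known URLs that the link-following crawl never reached."""
    crawled = {normalize(u) for u in crawled_urls}
    return sorted(u for u in known_urls if normalize(u) not in crawled)


# Hypothetical data: `known` might come from an index or analytics export,
# `crawled` from a sitemap generator's output.
known = [
    "https://example.com/",
    "https://example.com/pricing/",
    "https://example.com/old-promo",
]
crawled = [
    "https://example.com",
    "https://example.com/pricing",
]

print(orphan_candidates(known, crawled))
```

Anything this returns is only a candidate: a URL can be missing from the crawl for other reasons, such as a crawl depth limit or a blocked path, so each result is worth checking by hand.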
In the long term, it’s beneficial to establish a process for regularly auditing content and ensuring all pages, especially high-value or evergreen content, are properly linked within the site structure. This not only helps in maintaining a well-organized site but also enhances SEO by ensuring that search engines can easily find and index all content.