Reviewing and Improving My robots.txt

Quick Review Needed for a robots.txt File

Is Anything Incorrect?

Hello everyone,

I’m looking for a quick review of the robots.txt file content below. Is there anything obviously wrong with it? Any feedback would be appreciated. Thank you! 🙂

```plaintext
Sitemap: https://www.mysite.com.hk/sitemap.xml

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: dotbot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: Barkrowler
Disallow: /

User-agent: serpstatbot
Disallow: /

User-agent: GeedoBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: VelenPublicWebCrawler
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: Riddler
Disallow: /

# Directories

User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
Disallow: /ApiPhp/
Disallow: /SID=
Disallow: /review/
Disallow: /productreviewscollection/
Disallow: */questionanswerscollection/

# Paths (clean URLs)

Disallow: /index.php/
Disallow: /catalog/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /customer/
Disallow: /customize/
Disallow: /media/oms/
Disallow: /catalogutils/
Disallow: /sendfriend/

# Disable all queries

Disallow: /?
Allow: /?p=*

# Files

Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_A
```
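If it helps, rules like the ones above can be sanity-checked locally with Python’s standard-library `urllib.robotparser` before deploying. A quick sketch (the paths and domain are just examples; note this parser only does simple prefix matching and does not implement Google’s `*`/`$` wildcard extensions, so a line like `Disallow: /*.php$` is not evaluated the way Googlebot would evaluate it):

```python
# Sanity-check a robots.txt rule set with Python's stdlib parser.
from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Matches the /checkout/ prefix rule, so this URL is blocked.
print(rp.can_fetch("*", "https://www.mysite.com.hk/checkout/cart/"))
# No rule matches, and unmatched paths are allowed by default.
print(rp.can_fetch("*", "https://www.mysite.com.hk/products/shirt"))
```

This catches obvious mistakes (a path that doesn’t block what you think it does) even though it isn’t a full substitute for a Google-compatible tester.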


2 responses to “Reviewing and Improving My robots.txt”

  1. Certainly! Here’s a detailed review of the provided robots.txt file:

    Overview

    A robots.txt file is used to instruct web crawlers about which parts of your site they are allowed to visit or index. It’s a crucial part of website management for SEO and resource management. Let’s go through the content provided.

    Analysis

    1. Sitemap Specification:

    ```plaintext
    Sitemap: [https://www.mysite.com.hk/sitemap.xml](https://www.networldsports.com.hk/sitemap.xml)
    ```

    • Issue: The link text and the actual URL do not match.
    • Recommendation: Ensure you use the correct URL for the sitemap to help search engines locate it.

    ```plaintext
    Sitemap: https://www.mysite.com.hk/sitemap.xml
    ```

    Remove the Markdown link syntax if it is a plain-text file, as it is unnecessary there.

    2. Bot Directives:

    ```plaintext
    User-agent: AdsBot-Google
    Disallow:
    ```

    • AdsBot-Google, Googlebot-Image: Allowing all access, which is typically fine unless specific pages should be blocked.

    ```plaintext
    User-agent: dotbot
    Disallow: /
    ```

    • Blocking Selected Bots: You have several bots fully disallowed, which seems intentional for dotbot, BLEXBot, Barkrowler, and others. This is fine if they shouldn’t crawl any part of your site.

    • General Directions:

    ```plaintext
    User-agent: *
    ```

    • Blocking Specific Directories and Paths: This is where you specify directories that should not be crawled by any bot. Make sure these paths are entered correctly and reflect the directories and files you actually want to hide from crawlers.

    • Path Blocking and File Restrictions:

    ```plaintext
    Disallow: /404/
    Disallow: /cgi-bin/
    ```

    • Specific Directories: It appears you are blocking access to backend and administrative directories and PHP files, which is a common and generally wise strategy.
    • PHP Files:

      ```plaintext
      Disallow: /*.php$
      ```

      • Check for Exceptions: Ensure any public-facing PHP endpoints required for your site (like AJAX handlers) aren’t accidentally blocked.
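      One way to handle such exceptions is an explicit `Allow` rule alongside the blanket `Disallow`. A sketch (the `/ajax/search.php` path is hypothetical; in Google’s matcher, when `Allow` and `Disallow` both match a URL, the longer, more specific rule wins):

      ```plaintext
      User-agent: *
      Disallow: /*.php$
      # Hypothetical public endpoint kept crawlable: this Allow rule is
      # longer than the Disallow above, so it wins for this URL.
      Allow: /ajax/search.php$
      ```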
  2. It looks like you’ve put a lot of thought into your `robots.txt` file! Here are a few insights that might help refine it further:

    1. **Check Your Sitemap URL**: The sitemap link appears as `https://www.mysite.com.hk/sitemap.xml` in your `robots.txt`, but your post suggests a different domain (`https://www.networldsports.com.hk/`). Make sure to update this accordingly to ensure search engines can locate your sitemap effectively.

    2. **Block Specific Bots with Caution**: While it’s great to restrict unwanted crawlers like `MJ12bot` and `BLEXBot`, consider whether you need to block them entirely. For instance, if some bots bring value by indexing content that can lead to legitimate traffic, you might want to allow them or at least monitor their impact.

    3. **Order of Rules**: The order of user-agent directives can matter. Be mindful of the wildcard user-agent `*` coming after specific ones. Specific rules are usually prioritized, meaning any bot that matches a specific rule won’t even reach the broader instructions beneath it.

    4. **Test for Syntax Errors**: Make sure there are no unintentional syntax errors in your disallowed paths. Some entries, such as `/SID=` and `/productreviewscollection/`, appear to be split across lines by stray markup, which could cause parsing issues. You might want to clean up this formatting to ensure it doesn’t lead to potential mishaps.

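    On the group-matching point in item 3: a crawler selects the single most specific `User-agent` group that names it and ignores all others, so the rules under `*` do not also apply to it. For example, using `MJ12bot` from the file above:

    ```plaintext
    # MJ12bot matches this group and stops here; it never reads the
    # * group, so the /checkout/ rule below does not apply to it.
    User-agent: MJ12bot
    Disallow: /

    User-agent: *
    Disallow: /checkout/
    ```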
