Robots.txt Generator

Generate proper robots.txt files to control search engine crawling and indexing of your website.

Features

  • User Agent Configuration:

    • Support for all major search engine crawlers
    • Specific rules for different bots
    • Wildcard support for all crawlers
  • Crawling Directives:

    • Allow/Disallow rules for specific paths
    • Add comments for documentation
    • Easy-to-manage directive list
    • Quick-add common paths
  • Advanced Settings:

    • Crawl delay configuration
    • Sitemap URL integration
    • Custom directive support
    • Template presets
  • Smart Features:

    • Pre-configured templates (Basic/Strict)
    • Common path quick-add buttons
    • Real-time preview
    • One-click copy to clipboard

How to Use

  1. Configure User Agent:

    • Choose which crawler to configure
    • Use "*" for all crawlers
    • Create separate rules for specific bots
  2. Set Crawling Rules:

    • Add Allow directives for permitted paths
    • Add Disallow directives for restricted areas
    • Use comments to document your rules
  3. Advanced Configuration:

    • Set crawl delay to manage server load
    • Add sitemap URL for better indexing
    • Include custom directives if needed
  4. Generate and Implement:

    • Click "Generate robots.txt" to create the file
    • Copy the generated content
    • Save as robots.txt in your website root directory
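
Step 4 above amounts to assembling directive lines in a fixed order; a minimal Python sketch (the function and parameter names are illustrative, not this tool's actual API):

```python
def build_robots_txt(user_agent="*", allow=(), disallow=(),
                     crawl_delay=None, sitemap=None):
    """Assemble a robots.txt string from generator settings.

    Names here are illustrative, not the tool's actual API.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(disallow=["/admin/", "/private/"],
                       sitemap="https://example.com/sitemap.xml"))
```

Called as shown, this reproduces the "Basic Website" example below, minus its Allow: / line.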

Robots.txt Syntax

User Agent

User-agent: *          # All crawlers
User-agent: Googlebot  # Google only
User-agent: Bingbot    # Bing only

Directives

Allow: /public/        # Allow access to public folder
Disallow: /private/    # Disallow private folder
Disallow: /            # Disallow everything
Allow: /               # Allow everything

Advanced Directives

Crawl-delay: 1         # Wait 1 second between requests
Sitemap: https://example.com/sitemap.xml

Common Use Cases

Basic Website

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml

E-commerce Site

User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /admin/
Crawl-delay: 1
Sitemap: https://example.com/sitemap.xml

Blog with Admin Area

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml

Strict Security

User-agent: *
Disallow: /
Allow: /public/
Allow: /sitemap.xml
Allow: /robots.txt

Best Practices

Path Matching

  • Prefix Match: /page.html matches that file and any URL that begins with it (e.g. /page.html?q=1)
  • Directory: /folder/ disallows everything inside the folder
  • No Trailing Slash: /folder matches /folder, /folder/, and /folder.html alike
  • Wildcards: /*.pdf disallows all PDF files; add $ (as in /*.pdf$) to anchor the match at the end of the URL
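
Under Google's interpretation, these rules boil down to: convert each pattern to a regex (treating * as a wildcard and a trailing $ as an end anchor) and let the longest matching pattern win, with Allow beating Disallow on ties. An illustrative sketch, not a full RFC 9309 implementation:

```python
import re

def _to_regex(pattern):
    # A trailing '$' anchors the match; '*' matches any run of characters.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """rules: list of (directive, pattern) pairs, e.g. ("Disallow", "/private/").
    Longest matching pattern wins; on a tie, Allow wins (Google's tie-break)."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and _to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

print(is_allowed([("Disallow", "/"), ("Allow", "/public/")], "/public/index.html"))  # True
print(is_allowed([("Disallow", "/*.pdf$")], "/docs/file.pdf"))                       # False
```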

Common Paths to Disallow

  • /admin/ - Admin panels
  • /private/ - Private areas
  • /cgi-bin/ - CGI scripts
  • /tmp/ - Temporary files
  • /cache/ - Cache directories
  • /search/ - Internal search results

Crawl Delay Guidelines

  • Shared Hosting: 1-2 seconds
  • Dedicated Server: 0.5-1 second
  • Large Sites: Consider no delay
  • Rate Limiting: Use server-side limits instead
  • Googlebot: ignores Crawl-delay entirely; Google manages its own crawl rate

User Agent Reference

Major Search Engines

  • Googlebot: Google search crawler
  • Bingbot: Microsoft Bing crawler
  • Slurp: Yahoo search crawler
  • DuckDuckBot: DuckDuckGo crawler
  • Baiduspider: Baidu search crawler
  • Yandexbot: Yandex search crawler

Specialized Crawlers

  • Googlebot-Image: Google image search
  • Googlebot-News: Google news crawler
  • Mediapartners-Google: Google AdSense crawler
  • AdsBot-Google: Google Ads crawler

Common Mistakes to Avoid

Syntax Errors

  • Missing User-agent: Always specify user agent first
  • Incorrect Paths: Use forward slashes, not backslashes
  • Case Sensitivity: URLs are case-sensitive on most servers
  • Empty Disallow: Disallow: allows everything

Logic Errors

  • Conflicting Rules: the longest (most specific) matching rule wins, not the first one listed
  • Forgotten Allow: if you disallow /folder/ but still need specific files crawled, add Allow rules for them
  • Blocking CSS/JS: Don't block resources needed for rendering
  • Disallowing Sitemap: Always allow sitemap access

SEO Impact

  • Blocking Too Much: Don't disallow important content
  • No Sitemap: Include sitemap for better crawling
  • Ignoring Crawl Budget: Use crawl delay for large sites
  • Forgetting Mobile: Ensure mobile content is accessible

Testing and Validation

Google Tools

  • Google Search Console: Check robots.txt coverage
  • Robots.txt Tester: Test specific URLs
  • URL Inspection Tool: Verify crawling status

Third-party Tools

  • Screaming Frog: Technical SEO audit and crawl simulation
  • Ahrefs Site Audit: Comprehensive analysis

Manual Testing

# Test with curl
curl -A "Googlebot" https://example.com/robots.txt

# Test specific URL
curl -A "Googlebot" -I https://example.com/private-page
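
Python's standard library can run the same checks locally; note that urllib.robotparser follows the original robots.txt rules and does not understand Google's * and $ wildcard extensions:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))     # True
print(rp.crawl_delay("Googlebot"))                                    # 2
```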

Advanced Patterns

Multiple User Agents

User-agent: Googlebot
Disallow: /no-google/

User-agent: Bingbot
Disallow: /no-bing/

User-agent: *
Allow: /

Wildcard Patterns

Disallow: /*.pdf$      # All PDF files ($ anchors the end of the URL)
Disallow: /search?     # All search URLs with parameters (prefix match; a trailing * is redundant)
Disallow: /temp/*/     # All subdirectories of temp

Conditional Rules

User-agent: *
Crawl-delay: 1

User-agent: Bingbot
Crawl-delay: 5

Security Considerations

Sensitive Information

  • Don't hide secrets: robots.txt is public
  • Use authentication: Real security requires login
  • Noindex sensitive pages: Use meta tags instead
  • Monitor access: Check crawler behavior

Rate Limiting

  • Server-side limits: More reliable than crawl-delay
  • IP blocking: For abusive crawlers
  • CDN protection: Cloudflare, Akamai
  • Monitoring: Track crawler requests

Implementation Tips

File Location

  • Root Directory: Must be at https://domain.com/robots.txt
  • Case Sensitive: Use lowercase "robots.txt"
  • Accessible: Ensure file is reachable
  • Small Size: Keep under 500 KiB; Google ignores content beyond that limit
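
Because the file must sit at the host root, its URL depends only on the scheme and host; a small helper (the name is illustrative) makes that concrete:

```python
from urllib.parse import urlsplit

def robots_url(site_url):
    """robots.txt lives at the root of the host, regardless of any path."""
    parts = urlsplit(site_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://example.com/blog/post"))  # https://example.com/robots.txt
```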

Hosting Considerations

  • Static Hosting: Easy to deploy
  • CMS Integration: Generate dynamically if needed
  • Version Control: Track changes
  • Backup: Keep copies of versions

Maintenance

  • Regular Reviews: Update when site structure changes
  • Log Analysis: Monitor crawler behavior
  • Performance Impact: Check crawl efficiency
  • SEO Coordination: Align with SEO strategy

Privacy

This tool processes all data locally in your browser. No information is sent to any server, ensuring your website structure and crawling rules remain private and secure.

Troubleshooting

Common Issues

  • 404 Error: File not found or wrong location
  • Parsing Errors: Invalid syntax
  • Unexpected Blocking: Rules too broad
  • Crawler Ignoring: File not accessible

Debugging Steps

  1. Check File Location: Ensure robots.txt is in root
  2. Validate Syntax: Use robots.txt validators
  3. Test URLs: Verify specific page blocking
  4. Check Logs: Monitor crawler requests
  5. Review Rules: Ensure logic is correct

Resources

  • Google Documentation: robots.txt specification
  • Robots.txt Tester: Google's testing tool
  • Web Robots Database: Comprehensive crawler list
  • SEO Guidelines: Best practices documentation
  • RFC 9309: Official robots.txt standard