Robots.txt Generator

Generate proper robots.txt files to control search engine crawling and indexing of your website.

Features

  • User Agent Configuration:

    • Support for all major search engine crawlers
    • Specific rules for different bots
    • Wildcard support for all crawlers
  • Crawling Directives:

    • Allow/Disallow rules for specific paths
    • Add comments for documentation
    • Easy-to-manage directive list
    • Quick-add common paths
  • Advanced Settings:

    • Crawl delay configuration
    • Sitemap URL integration
    • Custom directive support
    • Template presets
  • Smart Features:

    • Pre-configured templates (Basic/Strict)
    • Common path quick-add buttons
    • Real-time preview
    • One-click copy to clipboard

How to Use

  1. Configure User Agent:

    • Choose which crawler to configure
    • Use "*" for all crawlers
    • Create separate rules for specific bots
  2. Set Crawling Rules:

    • Add Allow directives for permitted paths
    • Add Disallow directives for restricted areas
    • Use comments to document your rules
  3. Advanced Configuration:

    • Set crawl delay to manage server load
    • Add sitemap URL for better indexing
    • Include custom directives if needed
  4. Generate and Implement:

    • Click "Generate robots.txt" to create the file
    • Copy the generated content
    • Save as robots.txt in your website root directory
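
Step 4 above amounts to assembling directive lines in a fixed order; a minimal Python sketch (the function and parameter names are illustrative, not this tool's actual API):

```python
def build_robots_txt(user_agent="*", allow=(), disallow=(),
                     crawl_delay=None, sitemap=None):
    """Assemble a robots.txt string from generator settings.

    Names here are illustrative, not the tool's actual API.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(disallow=["/admin/", "/private/"],
                       sitemap="https://example.com/sitemap.xml"))
```

Called as shown, this reproduces the "Basic Website" example below, minus its Allow: / line.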

Robots.txt Syntax

User Agent

User-agent: *          # All crawlers
User-agent: Googlebot  # Google only
User-agent: Bingbot    # Bing only

Directives

Allow: /public/        # Allow access to public folder
Disallow: /private/    # Disallow private folder
Disallow: /            # Disallow everything
Allow: /               # Allow everything

Advanced Directives

Crawl-delay: 1         # Wait 1 second between requests
Sitemap: https://example.com/sitemap.xml

Common Use Cases

Basic Website

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml

E-commerce Site

User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /admin/
Crawl-delay: 1
Sitemap: https://example.com/sitemap.xml

Blog with Admin Area

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml

Strict Security

User-agent: *
Disallow: /
Allow: /public/
Allow: /sitemap.xml
Allow: /robots.txt

Best Practices

Path Matching

  • Prefix Match: /page.html matches that file and any URL that begins with it (e.g. /page.html?q=1)
  • Directory: /folder/ disallows everything inside the folder
  • No Trailing Slash: /folder matches /folder, /folder/, and /folder.html alike
  • Wildcards: /*.pdf disallows all PDF files; add $ (as in /*.pdf$) to anchor the match at the end of the URL
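
Under Google's interpretation, these rules boil down to: convert each pattern to a regex (treating * as a wildcard and a trailing $ as an end anchor) and let the longest matching pattern win, with Allow beating Disallow on ties. An illustrative sketch, not a full RFC 9309 implementation:

```python
import re

def _to_regex(pattern):
    # A trailing '$' anchors the match; '*' matches any run of characters.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """rules: list of (directive, pattern) pairs, e.g. ("Disallow", "/private/").
    Longest matching pattern wins; on a tie, Allow wins (Google's tie-break)."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and _to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

print(is_allowed([("Disallow", "/"), ("Allow", "/public/")], "/public/index.html"))  # True
print(is_allowed([("Disallow", "/*.pdf$")], "/docs/file.pdf"))                       # False
```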

Common Paths to Disallow

  • /admin/ - Admin panels
  • /private/ - Private areas
  • /cgi-bin/ - CGI scripts
  • /tmp/ - Temporary files
  • /cache/ - Cache directories
  • /search/ - Internal search results

Crawl Delay Guidelines

  • Shared Hosting: 1-2 seconds
  • Dedicated Server: 0.5-1 second
  • Large Sites: Consider no delay
  • Rate Limiting: Use server-side limits instead
  • Googlebot: ignores Crawl-delay entirely; Google manages its own crawl rate

User Agent Reference

Major Search Engines

  • Googlebot: Google search crawler
  • Bingbot: Microsoft Bing crawler
  • Slurp: Yahoo search crawler
  • DuckDuckBot: DuckDuckGo crawler
  • Baiduspider: Baidu search crawler
  • Yandexbot: Yandex search crawler

Specialized Crawlers

  • Googlebot-Image: Google image search
  • Googlebot-News: Google news crawler
  • Mediapartners-Google: Google AdSense crawler
  • AdsBot-Google: Google Ads crawler

Common Mistakes to Avoid

Syntax Errors

  • Missing User-agent: Always specify user agent first
  • Incorrect Paths: Use forward slashes, not backslashes
  • Case Sensitivity: URLs are case-sensitive on most servers
  • Empty Disallow: Disallow: allows everything

Logic Errors

  • Conflicting Rules: the longest (most specific) matching rule wins, not the first one listed
  • Forgotten Allow: if you disallow /folder/ but still need specific files crawled, add Allow rules for them
  • Blocking CSS/JS: Don't block resources needed for rendering
  • Disallowing Sitemap: Always allow sitemap access

SEO Impact

  • Blocking Too Much: Don't disallow important content
  • No Sitemap: Include sitemap for better crawling
  • Ignoring Crawl Budget: Use crawl delay for large sites
  • Forgetting Mobile: Ensure mobile content is accessible

Testing and Validation

Google Tools

  • Google Search Console: Check robots.txt coverage
  • Robots.txt Tester: Test specific URLs
  • URL Inspection Tool: Verify crawling status

Third-party Tools

  • Screaming Frog: Technical SEO audit and crawl simulation
  • Ahrefs Site Audit: Comprehensive analysis

Manual Testing

# Test with curl
curl -A "Googlebot" https://example.com/robots.txt

# Test specific URL
curl -A "Googlebot" -I https://example.com/private-page
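
Python's standard library can run the same checks locally; note that urllib.robotparser follows the original robots.txt rules and does not understand Google's * and $ wildcard extensions:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))     # True
print(rp.crawl_delay("Googlebot"))                                    # 2
```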

Advanced Patterns

Multiple User Agents

User-agent: Googlebot
Disallow: /no-google/

User-agent: Bingbot
Disallow: /no-bing/

User-agent: *
Allow: /

Wildcard Patterns

Disallow: /*.pdf$      # All PDF files ($ anchors the end of the URL)
Disallow: /search?     # All search URLs with parameters (prefix match; a trailing * is redundant)
Disallow: /temp/*/     # All subdirectories of temp

Conditional Rules

User-agent: *
Crawl-delay: 1

User-agent: Bingbot
Crawl-delay: 5

Security Considerations

Sensitive Information

  • Don't hide secrets: robots.txt is public
  • Use authentication: Real security requires login
  • Noindex sensitive pages: Use meta tags instead
  • Monitor access: Check crawler behavior

Rate Limiting

  • Server-side limits: More reliable than crawl-delay
  • IP blocking: For abusive crawlers
  • CDN protection: Cloudflare, Akamai
  • Monitoring: Track crawler requests

Implementation Tips

File Location

  • Root Directory: Must be at https://domain.com/robots.txt
  • Case Sensitive: Use lowercase "robots.txt"
  • Accessible: Ensure file is reachable
  • Small Size: Keep under 500 KiB; Google ignores content beyond that limit
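
Because the file must sit at the host root, its URL depends only on the scheme and host; a small helper (the name is illustrative) makes that concrete:

```python
from urllib.parse import urlsplit

def robots_url(site_url):
    """robots.txt lives at the root of the host, regardless of any path."""
    parts = urlsplit(site_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://example.com/blog/post"))  # https://example.com/robots.txt
```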

Hosting Considerations

  • Static Hosting: Easy to deploy
  • CMS Integration: Generate dynamically if needed
  • Version Control: Track changes
  • Backup: Keep copies of versions

Maintenance

  • Regular Reviews: Update when site structure changes
  • Log Analysis: Monitor crawler behavior
  • Performance Impact: Check crawl efficiency
  • SEO Coordination: Align with SEO strategy

Privacy

This tool processes all data locally in your browser. No information is sent to any server, ensuring your website structure and crawling rules remain private and secure.

Troubleshooting

Common Issues

  • 404 Error: File not found or wrong location
  • Parsing Errors: Invalid syntax
  • Unexpected Blocking: Rules too broad
  • Crawler Ignoring: File not accessible

Debugging Steps

  1. Check File Location: Ensure robots.txt is in root
  2. Validate Syntax: Use robots.txt validators
  3. Test URLs: Verify specific page blocking
  4. Check Logs: Monitor crawler requests
  5. Review Rules: Ensure logic is correct

Resources

  • Google Documentation: robots.txt specification
  • Robots.txt Tester: Google's testing tool
  • Web Robots Database: Comprehensive crawler list
  • SEO Guidelines: Best practices documentation
  • RFC 9309: Official robots.txt standard