A robots.txt generator that helps you create proper robots.txt files to control search engine crawling and indexing of your website. Manage which parts of your site search engines can access.
How to Use
1. Configure User Agent:
   - Choose which crawler to configure
   - Use "*" for all crawlers
   - Create separate rules for specific bots
2. Set Crawling Rules:
   - Add Allow directives for permitted paths
   - Add Disallow directives for restricted areas
   - Use comments to document your rules
3. Advanced Configuration:
   - Set a crawl delay to manage server load
   - Add a sitemap URL for better indexing
   - Include custom directives if needed
4. Generate and Implement:
   - Click "Generate robots.txt" to create the file
   - Copy the generated content
   - Save it as robots.txt in your website root directory
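The same assembly the tool performs can be sketched in a few lines of Python; the function name and the rules structure below are illustrative, not part of the tool:

```python
def generate_robots_txt(groups, sitemap=None):
    """Build robots.txt content from (user_agent, directives) groups.

    Each group is a tuple: (user_agent, [("Allow" | "Disallow", path), ...]).
    """
    lines = []
    for user_agent, directives in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in directives:
            lines.append(f"{directive}: {path}")
        lines.append("")  # a blank line separates groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines)

content = generate_robots_txt(
    [("*", [("Allow", "/"), ("Disallow", "/admin/")])],
    sitemap="https://example.com/sitemap.xml",
)
print(content)
```

Writing the returned string to a file named robots.txt in the web root is all the "implementation" step requires.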
Robots.txt Syntax
User Agent
User-agent: * # All crawlers
User-agent: Googlebot # Google only
User-agent: Bingbot # Bing only
Directives
Allow: /public/ # Allow access to public folder
Disallow: /private/ # Disallow private folder
Disallow: / # Disallow everything
Allow: / # Allow everything
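How these directives are interpreted can be checked with Python's standard-library parser (which implements the classic prefix-matching rules):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /public/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "/public/page.html"))     # allowed
print(parser.can_fetch("*", "/private/secret.html"))  # blocked
```

Paths that match no rule are allowed by default, which is why an empty or missing robots.txt is equivalent to allowing everything.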
Advanced Directives
Crawl-delay: 1 # Wait 1 second between requests
Sitemap: https://example.com/sitemap.xml
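Both advanced directives are exposed by `urllib.robotparser` as well (`site_maps()` requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 1
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.crawl_delay("*"))  # 1
print(parser.site_maps())       # ['https://example.com/sitemap.xml']
```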
Common Use Cases
Basic Website
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
E-commerce Site
User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /admin/
Crawl-delay: 1
Sitemap: https://example.com/sitemap.xml
Blog with Admin Area
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
Strict Security
User-agent: *
# Allow rules listed first, for parsers that apply the first matching rule
Allow: /public/
Allow: /sitemap.xml
Allow: /robots.txt
Disallow: /
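Major engines resolve overlapping rules by taking the longest matching rule, with Allow winning ties (RFC 9309 semantics), so the specific Allow lines override the blanket Disallow regardless of order. A minimal sketch of that precedence, using simple prefix patterns only (the function and rule list are illustrative):

```python
def is_allowed(path, rules):
    """RFC 9309-style decision: longest matching rule wins,
    Allow wins ties, and no match at all means allowed."""
    best = None  # (pattern_length, allowed); tuple order breaks ties in favor of Allow
    for directive, pattern in rules:
        if path.startswith(pattern):
            candidate = (len(pattern), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

strict = [("Disallow", "/"), ("Allow", "/public/"),
          ("Allow", "/sitemap.xml"), ("Allow", "/robots.txt")]
print(is_allowed("/public/page.html", strict))  # True: Allow /public/ is longer than Disallow /
print(is_allowed("/private/data", strict))      # False: only Disallow / matches
```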
Best Practices
Path Matching
- Prefix Match: Disallow: /page.html blocks /page.html and any longer path that begins with it (e.g. /page.html?ref=1)
- Directory: Disallow: /folder/ blocks everything inside the folder
- No Trailing Slash: Disallow: /folder blocks /folder, /folder/, and /folder-old.html (any path beginning with /folder)
- Wildcards: Disallow: /*.pdf$ blocks all PDF files (supported by major crawlers such as Googlebot and Bingbot)
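The prefix behaviour described above can be confirmed with the standard-library parser:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /folder
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "/folder" without a trailing slash is a prefix match:
# it blocks anything whose path begins with /folder
for path in ["/folder", "/folder/", "/folder-old.html", "/other"]:
    print(path, parser.can_fetch("*", path))
```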
Common Paths to Disallow
/admin/ - Admin panels
/private/ - Private areas
/cgi-bin/ - CGI scripts
/tmp/ - Temporary files
/cache/ - Cache directories
/search/ - Internal search results
Crawl Delay Guidelines
- Googlebot: Ignores Crawl-delay entirely; use server-side limits for Google traffic
- Shared Hosting: 1-2 seconds
- Dedicated Server: 0.5-1 second (some crawlers only honor whole seconds)
- Large Sites: Consider no delay, so crawling is not throttled unnecessarily
- Rate Limiting: Server-side limits are more reliable than Crawl-delay
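The server-side limits recommended above can be as simple as a token bucket per client; a minimal sketch (illustrative, not production middleware):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens in proportion to elapsed time, then spend one if available
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=2)  # ~1 request/second, bursts of 2
print(bucket.allow())  # True
print(bucket.allow())  # True
print(bucket.allow())  # False: burst exhausted, must wait for refill
```

In practice one bucket is kept per client IP (or per crawler user agent), and a denied request is answered with HTTP 429.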
User Agent Reference
Major Search Engines
- Googlebot: Google search crawler
- Bingbot: Microsoft Bing crawler
- Slurp: Yahoo search crawler
- DuckDuckBot: DuckDuckGo crawler
- Baiduspider: Baidu search crawler
- YandexBot: Yandex search crawler
Specialized Crawlers
- Googlebot-Image: Google image search
- Googlebot-News: Google news crawler
- Mediapartners-Google: Google AdSense crawler
- AdsBot-Google: Google Ads crawler
Common Mistakes to Avoid
Syntax Errors
- Missing User-agent: Always specify user agent first
- Incorrect Paths: Use forward slashes, not backslashes
- Case Sensitivity: URLs are case-sensitive on most servers
- Empty Disallow: a bare "Disallow:" line with no path allows everything
Logic Errors
- Conflicting Rules: More specific rules override general ones
- Forgotten Allow: If you disallow /folder/, remember to Allow the specific files you still want crawled
- Blocking CSS/JS: Don't block resources needed for rendering
- Disallowing Sitemap: Always allow sitemap access
SEO Impact
- Blocking Too Much: Don't disallow important content
- No Sitemap: Include sitemap for better crawling
- Ignoring Crawl Budget: Use crawl delay for large sites
- Forgetting Mobile: Ensure mobile content is accessible
Testing and Validation
Google Tools
- Google Search Console: Check robots.txt coverage
- Robots.txt Tester: Test specific URLs
- URL Inspection Tool: Verify crawling status
Third-party Tools
- Screaming Frog: Technical SEO audit and crawl simulation
- Ahrefs Site Audit: Comprehensive analysis
Manual Testing
# Test with curl
curl -A "Googlebot" https://example.com/robots.txt
# Test specific URL
curl -A "Googlebot" -I https://example.com/private-page
Advanced Patterns
Multiple User Agents
User-agent: Googlebot
Disallow: /no-google/
User-agent: Bingbot
Disallow: /no-bing/
User-agent: *
Allow: /
Wildcard Patterns
Disallow: /*.pdf$ # All PDF files
Disallow: /search?* # All search URLs with parameters
Disallow: /temp/*/ # All subdirectories of temp
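Note that `urllib.robotparser` does not implement these wildcards; checking them yourself takes a small translator from robots.txt patterns to regular expressions (a sketch, with an illustrative function name):

```python
import re

def rule_matches(pattern, path):
    """Match a path against a robots.txt pattern with Google-style
    wildcards: '*' matches any run of characters, a trailing '$'
    anchors the end, and matching is otherwise prefix-based."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    # re.match anchors at the start, giving the prefix semantics for free
    return re.match(regex, path) is not None

print(rule_matches("/*.pdf$", "/docs/manual.pdf"))         # True
print(rule_matches("/*.pdf$", "/docs/manual.pdf?page=2"))  # False: $ anchors the end
print(rule_matches("/search?*", "/search?q=robots"))       # True
```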
Per-Agent Rules
User-agent: *
Crawl-delay: 2

User-agent: Bingbot
Crawl-delay: 1
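Crawlers pick the most specific group that names them and fall back to the "*" group otherwise; `urllib.robotparser` resolves this the same way (the group values below are illustrative):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 2

User-agent: Bingbot
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.crawl_delay("Bingbot"))        # 1: the specific group wins
print(parser.crawl_delay("SomeOtherBot"))   # 2: falls back to the * group
```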
Security Considerations
Sensitive Information
- Don't hide secrets: robots.txt is public
- Use authentication: Real security requires login
- Noindex sensitive pages: Use meta tags instead
- Monitor access: Check crawler behavior
Rate Limiting
- Server-side limits: More reliable than crawl-delay
- IP blocking: For abusive crawlers
- CDN protection: Cloudflare, Akamai
- Monitoring: Track crawler requests
Implementation Tips
File Location
- Root Directory: Must be at https://domain.com/robots.txt
- Case Sensitive: Use lowercase "robots.txt"
- Accessible: Ensure file is reachable
- Small Size: Keep under 500KB
Hosting Considerations
- Static Hosting: Easy to deploy
- CMS Integration: Generate dynamically if needed
- Version Control: Track changes
- Backup: Keep copies of versions
Maintenance
- Regular Reviews: Update when site structure changes
- Log Analysis: Monitor crawler behavior
- Performance Impact: Check crawl efficiency
- SEO Coordination: Align with SEO strategy
Privacy
This tool processes all data locally in your browser. No information is sent to any server, ensuring your website structure and crawling rules remain private and secure.
Troubleshooting
Common Issues
- 404 Error: File not found or wrong location
- Parsing Errors: Invalid syntax
- Unexpected Blocking: Rules too broad
- Crawler Ignoring: File not accessible
Debugging Steps
- Check File Location: Ensure robots.txt is in root
- Validate Syntax: Use robots.txt validators
- Test URLs: Verify specific page blocking
- Check Logs: Monitor crawler requests
- Review Rules: Ensure logic is correct
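The syntax-validation step can start with a simple line-by-line check before reaching for an external validator; this is a sketch that flags only the most common mistakes listed above (the function name and messages are illustrative):

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}

def validate_robots_txt(text):
    """Return a list of (line_number, message) warnings for common mistakes."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # comments start with '#'
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_DIRECTIVES:
            problems.append((number, "unknown directive %r" % field))
        elif field.lower() == "user-agent":
            seen_user_agent = True
        elif field.lower() in ("allow", "disallow") and not seen_user_agent:
            problems.append((number, "rule appears before any User-agent line"))
        if "\\" in value:
            problems.append((number, "backslash in path; use forward slashes"))
    return problems

print(validate_robots_txt("Disalow: /tmp/\nUser-agent: *\nDisallow: /admin/"))
# → [(1, "unknown directive 'Disalow'")]
```

A real validator (such as Google's open-source robots.txt parser) is stricter, but this catches typos, backslashes, and orphaned rules before deployment.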
Resources
- Google Documentation: robots.txt specification
- Robots.txt Tester: Google's testing tool
- Web Robots Database: Comprehensive crawler list
- SEO Guidelines: Best practices documentation
- RFC 9309: Official robots.txt standard