Robots.txt

free
GET /v1/domain/robots

Fetches and parses a domain's robots.txt file to extract crawler rules, disallowed paths, and sitemap references. Shows which paths the site asks crawlers not to fetch.

What It Does

Retrieves robots.txt from the domain root and parses it per RFC 9309 (the Robots Exclusion Protocol standard). Extracts User-agent groups, Allow/Disallow rules, Sitemap references, and Crawl-delay directives (a widely used extension that RFC 9309 itself does not define). Highlights commonly interesting disallowed paths (admin panels, APIs, etc.).
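As a rough illustration of the kind of parsing described above, the following Python sketch splits robots.txt text into user-agent groups and sitemap URLs. It is not the endpoint's actual implementation. The function name `parse_robots` and the dictionary keys are assumptions chosen for this example, and it skips edge cases such as byte-order marks and size limits.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups plus a list of sitemap URLs.

    Illustrative sketch only; names and output shape are assumptions.
    """
    groups, sitemaps = [], []
    current = None          # group currently being filled
    last_was_agent = False  # consecutive User-agent lines share one group
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not last_was_agent:
                current = {"userAgents": [], "allow": [], "disallow": [], "crawlDelay": None}
                groups.append(current)
            current["userAgents"].append(value)
            last_was_agent = True
            continue
        if key == "sitemap":
            sitemaps.append(value)  # sitemap lines do not belong to any group
            continue
        last_was_agent = False
        if current is None:
            continue  # rule that appears before any User-agent line
        if key in ("allow", "disallow"):
            if value:  # an empty "Disallow:" blocks nothing, so it is skipped
                current[key].append(value)
        elif key == "crawl-delay":
            try:
                current["crawlDelay"] = float(value)
            except ValueError:
                pass  # ignore non-numeric delays
    return groups, sitemaps
```

Calling `parse_robots(text)` on a fetched file returns a `(groups, sitemaps)` tuple. Each group lists the user agents it applies to along with its allow and disallow paths.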

Why It's Useful

Robots.txt often reveals hidden directories, admin panels, and API endpoints that aren't linked publicly. For SEO, it helps verify that important pages aren't accidentally blocked from search engines.

Use Cases

Penetration Tester

Security Reconnaissance

Discover hidden paths and admin interfaces listed in Disallow rules.

Find additional attack surface not discoverable through crawling.

SEO Specialist

SEO Troubleshooting

Diagnose why certain pages aren't appearing in search results by checking robots.txt blocks.

Fix accidental search engine blocks hurting organic traffic.

DevOps Engineer

Crawler Configuration

Verify robots.txt is properly configured before deploying to production.

Prevent accidental blocking of important pages from search engines.
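For the SEO and deployment checks above, a quick local test can complement the endpoint. The sketch below uses Python's standard `urllib.robotparser` to ask whether a URL is blocked for a given crawler. The helper name `is_blocked` is an assumption for this example. The standard-library parser may also resolve conflicting Allow/Disallow rules differently from RFC 9309's longest-match rule, so treat its answer as a first approximation.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if `user_agent` may not fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network request
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(is_blocked(robots, "Googlebot", "https://example.com/private/page"))
    print(is_blocked(robots, "Googlebot", "https://example.com/blog/"))
```

Running a check like this in a pre-deploy step is one way to catch an accidental `Disallow: /` before it reaches production.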

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| domain | string | Required | Domain or full URL; accepts `example.com` or `https://example.com/path`. Both `http://` and `https://` robots.txt locations are probed regardless of the protocol given. Example: `https://example.com` |
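Because `domain` may be a full URL, it needs percent-encoding when passed as a query parameter. The sketch below builds the request URL with the standard library. `build_request_url` is a hypothetical helper name, and the base URL is taken from the code examples on this page.

```python
from urllib.parse import urlencode

# Base URL as shown in this page's code examples.
BASE_URL = "https://api.edgedns.dev/v1/domain/robots"


def build_request_url(domain):
    """Return the GET URL with the `domain` parameter percent-encoded."""
    return f"{BASE_URL}?{urlencode({'domain': domain})}"


if __name__ == "__main__":
    print(build_request_url("https://example.com"))
```

Bare hostnames such as `example.com` pass through unchanged, while `:` and `/` in full URLs become `%3A` and `%2F`.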

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| domain | string | The queried domain (bare hostname). |
| url | string | Full URL that was fetched, echoing the protocol used in the request. |
| exists | boolean | Whether robots.txt exists. |
| fileSize | number | File size in bytes (null if not found). |
| rules | array | Parsed rules by user-agent (userAgent, allow, disallow, crawlDelay). |
| effectiveRulesByCrawler | object | Per-crawler effective rules (Googlebot, Bingbot, GPTBot, ClaudeBot, CCBot, etc.) with source ("specific" or "wildcard"). Shows which group each crawler actually follows: a crawler that has its own User-agent group uses that group instead of the wildcard group, so the `*` rules do not always apply. |
| sitemaps | array | Sitemap URLs referenced. |
| sitemapReachability | array | HEAD-verified reachability of each declared Sitemap URL (status, contentType, lastModified). A declared sitemap that returns 404 is a common source of Search Console errors. |
| interestingPaths | array | Security-interesting disallowed paths (admin, login, API, etc.). |
| totalRules | number | Total number of user-agent rule groups. |
| totalDisallowedPaths | number | Total number of disallowed paths across all rules. |
| score | number | Robots.txt health score (0-100). |
| grade | string | Letter grade (A-F) based on score. |
| scoreDetails | array | Breakdown of scoring factors. |
| recommendations | array | Actionable suggestions to improve robots.txt. |
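To make the "specific" versus "wildcard" distinction concrete, the sketch below picks the group that applies to one crawler from a `rules`-style array. The item shape (one `userAgent` string with `allow` and `disallow` lists) is an assumption based on the field description above. The function name `effective_rules` is hypothetical, and matching is simplified to a case-insensitive exact comparison.

```python
def effective_rules(rules, crawler):
    """Pick the rules that apply to `crawler`: a specific group wins over `*`.

    Returns None when neither a specific nor a wildcard group exists.
    """
    def merged(groups, source):
        # Multiple matching groups are combined into one rule set.
        return {
            "source": source,
            "allow": [p for g in groups for p in g["allow"]],
            "disallow": [p for g in groups for p in g["disallow"]],
        }

    name = crawler.lower()
    specific = [g for g in rules if g["userAgent"].lower() == name]
    if specific:
        return merged(specific, "specific")  # the wildcard group is ignored entirely
    wildcard = [g for g in rules if g["userAgent"] == "*"]
    if wildcard:
        return merged(wildcard, "wildcard")
    return None
```

With a file that disallows everything for `*` but gives Googlebot its own group, Googlebot follows only its own group, while a crawler without one falls back to the wildcard rules.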

Code Examples

cURL
curl -G "https://api.edgedns.dev/v1/domain/robots" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --data-urlencode "domain=https://example.com"
JavaScript
const response = await fetch(
  'https://api.edgedns.dev/v1/domain/robots?domain=https%3A%2F%2Fexample.com',
  {
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY'
    }
  }
);

const data = await response.json();
console.log(data);
Python
import requests

response = requests.get(
    'https://api.edgedns.dev/v1/domain/robots',
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    params={'domain': 'https://example.com'},
)

data = response.json()
print(data)
