Robots.txt

free

GET /v1/domain/robots

Fetches and parses the robots.txt file to extract crawler rules, disallowed paths, and sitemap references. Reveals which paths the site asks search engines and other crawlers not to visit.

What It Does

Retrieves robots.txt from the domain root and parses it per RFC 9309 (the Robots Exclusion Protocol standard). Extracts User-agent groups, Allow/Disallow rules, sitemap references, and crawl-delay directives; the last two are widely used extensions that RFC 9309 itself does not define. Highlights disallowed paths that are commonly of interest, such as admin panels and API endpoints.
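
To make the extraction step concrete, here is a minimal sketch of grouping robots.txt lines into User-agent blocks. It is not the endpoint's actual parser: the function name `parse_robots`, the dictionary keys, and the sample file are all assumptions chosen for illustration, and it skips parts of RFC 9309 such as wildcard path matching.

```python
def parse_robots(text):
    """Group robots.txt lines into user-agent blocks plus a sitemap list.

    Illustrative only: key names and structure are assumptions,
    not the API's response format.
    """
    groups, sitemaps = [], []
    current = None
    previous_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "sitemap":
            # Sitemap lines are global and do not belong to a group.
            sitemaps.append(value)
            continue
        if key == "user-agent":
            # Consecutive User-agent lines share one group.
            if not previous_was_agent:
                current = {"agents": [], "allow": [], "disallow": [], "crawl_delay": None}
                groups.append(current)
            current["agents"].append(value)
            previous_was_agent = True
            continue
        previous_was_agent = False
        if current is None:
            continue  # rule line before any User-agent line
        if key in ("allow", "disallow") and value:
            current[key].append(value)
        elif key == "crawl-delay":
            try:
                current["crawl_delay"] = float(value)
            except ValueError:
                pass  # ignore non-numeric delays


    return groups, sitemaps


SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Crawl-delay: 5

User-agent: Googlebot
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_robots(SAMPLE)
for group in groups:
    print(group["agents"], group["disallow"])
print(sitemaps)
```

An empty `Disallow:` value is skipped here because it means "nothing is disallowed" rather than a path to record.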

Why It's Useful

Robots.txt often reveals hidden directories, admin panels, and API endpoints that aren't linked publicly. For SEO, it helps verify that important pages aren't accidentally blocked from search engines.

Use Cases

Penetration Tester

Security Reconnaissance

Discover hidden paths and admin interfaces listed in Disallow rules.

Find additional attack surface not discoverable through crawling.

SEO Specialist

SEO Troubleshooting

Diagnose why certain pages aren't appearing in search results by checking robots.txt blocks.

Fix accidental search engine blocks hurting organic traffic.

DevOps Engineer

Crawler Configuration

Verify robots.txt is properly configured before deploying to production.

Prevent accidental blocking of important pages from search engines.
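
For the troubleshooting and pre-deployment checks described above, a local test of whether a URL is blocked for a given crawler can be sketched with Python's standard-library `urllib.robotparser`. This runs independently of the API; the robots.txt content, crawler names, and URLs below are made-up examples, and the helper `is_allowed` is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("SomeBot", "https://example.com/private/x"),
    ("Googlebot", "https://example.com/private/x"),
    ("Googlebot", "https://example.com/drafts/a"),
]:
    print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that Googlebot is allowed to fetch `/private/x` in this sample: because it has its own group, the wildcard group does not apply to it.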

Parameters

Name	Type	Required	Description
`domain`	string	Required	Domain or full URL. Accepts `example.com` or `https://example.com/path`. Both `http://` and `https://` robots.txt locations are probed regardless of the protocol supplied. Example: `https://example.com`
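
As an illustration of how a bare domain or a full URL can both resolve to the same robots.txt locations, here is a small sketch. The helper name `robots_candidates` and the HTTPS-first ordering are assumptions; the endpoint's real normalization logic is not documented here.

```python
from urllib.parse import urlsplit


def robots_candidates(domain_or_url):
    """Return the https and http robots.txt URLs for a domain or full URL.

    Hypothetical helper: the probe order is an assumption.
    """
    value = domain_or_url.strip()
    if "://" not in value:
        value = "//" + value  # let urlsplit treat the input as a network location
    host = urlsplit(value).netloc
    return [f"https://{host}/robots.txt", f"http://{host}/robots.txt"]


print(robots_candidates("example.com"))
print(robots_candidates("https://example.com/path"))
```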

Response Fields

Field	Type	Description
`domain`	string	The queried domain (bare hostname).
`url`	string	Full URL that was fetched, echoing the protocol used in the request.
`exists`	boolean	Whether robots.txt exists
`fileSize`	number	File size in bytes (null if not found)
`rules`	array	Parsed rules by user-agent (userAgent, allow, disallow, crawlDelay)
`effectiveRulesByCrawler`	object	Per-crawler effective rules (Googlebot, Bingbot, GPTBot, ClaudeBot, CCBot, etc.) with source ("specific" or "wildcard"). Shows exactly which rules each crawler follows, correcting the common misconception that the wildcard group always applies: a crawler with its own User-agent group ignores the wildcard group.
`sitemaps`	array	Sitemap URLs referenced
`sitemapReachability`	array	HEAD-verified reachability of each declared Sitemap URL (status, contentType, lastModified). A 404 sitemap declared in robots.txt is a top Search Console error class.
`interestingPaths`	array	Security-interesting disallowed paths (admin, login, API, etc.)
`totalRules`	number	Total number of user-agent rule groups
`totalDisallowedPaths`	number	Total number of disallowed paths across all rules
`score`	number	Robots.txt health score (0-100)
`grade`	string	Letter grade (A-F) based on score
`scoreDetails`	array	Breakdown of scoring factors
`recommendations`	array	Actionable suggestions to improve robots.txt
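
The "specific" versus "wildcard" distinction in `effectiveRulesByCrawler` can be sketched as a lookup over the `rules` array. This is an assumption-laden illustration: the function `effective_rules` is hypothetical, each rule object is assumed to carry `userAgent` as a single string alongside `allow`, `disallow`, and `crawlDelay`, and the sample values are invented rather than taken from a real response.

```python
def effective_rules(rules, crawler):
    """Return (group, source) for a crawler.

    A group naming the crawler wins ("specific"); otherwise the
    wildcard group is used ("wildcard"). Returns (None, None) if
    neither exists. Assumes each group has a string "userAgent".
    """
    token = crawler.lower()
    for group in rules:
        if group["userAgent"].lower() == token:
            return group, "specific"
    for group in rules:
        if group["userAgent"] == "*":
            return group, "wildcard"
    return None, None


# Invented sample data in the assumed shape of the `rules` field.
sample_rules = [
    {"userAgent": "*", "allow": [], "disallow": ["/admin/"], "crawlDelay": None},
    {"userAgent": "GPTBot", "allow": [], "disallow": ["/"], "crawlDelay": None},
]

for name in ("GPTBot", "Googlebot"):
    group, source = effective_rules(sample_rules, name)
    print(name, source, group["disallow"])
```

In this sample GPTBot is blocked from the whole site by its own group, while Googlebot falls back to the wildcard group and is only kept out of `/admin/`.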

Code Examples

cURL

curl -G "https://api.edgedns.dev/v1/domain/robots" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --data-urlencode "domain=https://example.com"

JavaScript

const response = await fetch(
  'https://api.edgedns.dev/v1/domain/robots?domain=https%3A%2F%2Fexample.com',
  {
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY'
    }
  }
);

const data = await response.json();
console.log(data);

Python

import requests

response = requests.get(
    'https://api.edgedns.dev/v1/domain/robots',
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    params={
        'domain': 'https://example.com'
    }
)

data = response.json()
print(data)
