Deep Technical SEO Audits: Beyond the 200 OK

Modern technical SEO audits have outgrown the checklist most teams still run. A 200 OK is not a successful audit. Titles truncate in pixels, not characters. Hreflang quietly breaks the moment one return-link goes missing. Robots.txt now means something different to Googlebot, GPTBot, and ClaudeBot. This guide walks through the deeper checks that distinguish a real technical audit from a surface scan — and shows the one-call API workflow that runs them all.

EdgeDNS Team · 11 min read

Why a 200 OK isn't a successful audit

Most technical SEO audits stop at the front door. The crawler fetches the homepage, sees an HTTP 200 status, confirms a title tag exists, finds a canonical, and ticks the box. The page is technically fine, so the audit moves on. That is exactly where modern technical SEO problems live — in the gap between technically fine and actually working.

A page can return 200 and still be a soft 404 that Google quietly drops from the index. A title tag can be the perfect length in characters and still get truncated by pixel width in the mobile SERP. Hreflang can be perfectly declared on every page and still be broken because one of the alternates forgot to declare a return link. Robots.txt can be valid and still mean three different things to Googlebot, GPTBot, and ClaudeBot. None of these problems show up on the surface scan that most audit tools still run.

This guide is the field manual for the deeper layer. Each section covers one capability that distinguishes a real audit from a surface scan, plus the EdgeDNS endpoint that runs it. At the end, we compose them into a single audit pass — the same one you would run in CI before every production deploy, or against a competitor's site to find the exact gaps you can exploit.

Title pixel-width and SERP truncation preview

Google does not truncate titles by character count. It truncates by pixel width, because narrow characters like `i` and `l` take less screen space than wide ones like `m` and `W`. A 58-character title made mostly of `m`s gets cut off in the mobile SERP long before a 62-character title made mostly of `i`s does. The classic "keep titles under 60 characters" rule is a rule of thumb that fails on real-world copy, and the failure is invisible until you compare your title against the actual rendered SERP.

A deep audit measures the actual pixel-width of your title using the same font and weight Google uses, and renders a SERP truncation preview — the exact string that will appear in mobile search results, including the `…` cutoff if there is one. That preview is the difference between knowing your title "might" truncate and seeing exactly which word the truncation will cut off. The rewrite that lifts CTR is almost always a rearrangement that pushes the highest-intent keyword into the visible portion before the cut.

The `domain-seo-audit` endpoint returns this on every audit run under `title.widthAnalysis`. The field includes the measured pixel width, the truncation threshold for the target device, a boolean for whether the title will truncate, and the visible-portion string itself.

bash
curl "https://api.edgedns.dev/v1/domain/seo-audit?domain=example.com" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  | jq '.data.title.widthAnalysis'

# {
#   "pixelWidth": 612,
#   "truncationThresholdPx": 580,
#   "willTruncate": true,
#   "truncationPreview": "Best CRM software for small business — Acm…"
# }
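
To see why character counts mislead, here is a minimal sketch of a width estimate. The per-character widths and the `estimateTitleWidth` and `previewTitle` helpers are illustrative assumptions, not Google's font metrics or the endpoint's implementation; the 580px default simply mirrors the threshold in the sample response above.

```javascript
// Illustrative character classes; real measurement needs the actual SERP font.
const NARROW = new Set("iljtfI.,:;!|' ".split(''));
const WIDE = new Set('mwMW@'.split(''));

// Sum an assumed width per character: 5px narrow, 15px wide, 9px otherwise.
function estimateTitleWidth(title) {
  let width = 0;
  for (const ch of title) {
    if (NARROW.has(ch)) width += 5;
    else if (WIDE.has(ch)) width += 15;
    else width += 9;
  }
  return width;
}

// Return the estimated width and the string that would stay visible.
function previewTitle(title, thresholdPx = 580) {
  const pixelWidth = estimateTitleWidth(title);
  if (pixelWidth <= thresholdPx) {
    return { pixelWidth, willTruncate: false, preview: title };
  }
  // Trim characters until the text plus a 9px ellipsis fits the threshold.
  let visible = title;
  while (visible.length > 0 && estimateTitleWidth(visible) + 9 > thresholdPx) {
    visible = visible.slice(0, -1);
  }
  return { pixelWidth, willTruncate: true, preview: visible.trimEnd() + '…' };
}
```

Under these assumed widths, `previewTitle('m'.repeat(58))` truncates while `previewTitle('i'.repeat(62))` fits, which is the opposite of what a character count predicts.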

Soft 404s, mixed content, and security headers

Three production traps hide behind a 200 OK response. None of them break uptime monitoring. All of them silently degrade trust and rankings.

Soft 404s. A page that says "page not found" in the body while returning HTTP 200. Google detects this pattern and quietly drops the page from the index — but a basic 2xx check passes. Soft 404s usually appear after a CMS template change, a router refactor, or a marketing tag manager misconfiguration. They are easy to ship, easy to miss, and expensive in lost organic traffic.

Mixed content. An HTTPS page that loads images, scripts, or stylesheets over plain HTTP. Modern browsers block insecure scripts and stylesheets outright and try to upgrade insecure images to HTTPS, dropping them if the upgrade fails. Either way the page renders without the asset, layout breaks subtly, and visitors leave without anyone seeing a warning. Mixed content can also count against the trust signals that third parties such as mailbox providers and AI summarizers may use when they decide whether to surface your content.

Missing security headers. The absence of Content-Security-Policy, Strict-Transport-Security, X-Frame-Options, and X-Content-Type-Options does not break the page, but it appears on every security audit and increasingly correlates with how third-party trust services rate the domain. Most of them are a five-minute Cloudflare or middleware change. The exception is Content-Security-Policy, which can block legitimate scripts if it is too strict, so roll it out in report-only mode first.

The `domain-search-readiness` endpoint returns all three under `softFourOhFour`, `mixedContent`, and `securityHeaders`. Run it in CI against every deploy.
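
For intuition, the three checks can be approximated with a few pure functions over a fetched response. This is a rough sketch under stated assumptions: the function names, the phrase list, and the regex-based HTML scanning are all hypothetical simplifications, not how the endpoint works.

```javascript
// Soft 404 heuristic: status is 200 but the title or first h1 reads like an error page.
function looksLikeSoft404(status, html) {
  if (status !== 200) return false;
  const title = (html.match(/<title[^>]*>([\s\S]*?)<\/title>/i) || [])[1] || '';
  const h1 = (html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i) || [])[1] || '';
  return /not found|page (doesn't|does not) exist|\b404\b/i.test(`${title} ${h1}`);
}

// Mixed content: http:// sub-resources referenced from an https:// page.
function findMixedContent(pageUrl, html) {
  if (!pageUrl.startsWith('https://')) return [];
  const hits = [];
  for (const m of html.matchAll(/<(img|script|iframe|link)\b[^>]*>/gi)) {
    const tag = m[1].toLowerCase();
    // Only stylesheet links load a sub-resource; canonical and alternate links do not.
    if (tag === 'link' && !/rel=["']?stylesheet/i.test(m[0])) continue;
    const attr = m[0].match(/\s(?:src|href)=["'](http:\/\/[^"']+)["']/i);
    if (attr) hits.push({ tag, url: attr[1] });
  }
  return hits;
}

// Security headers: report which of the four named above are absent.
const EXPECTED_HEADERS = [
  'content-security-policy',
  'strict-transport-security',
  'x-frame-options',
  'x-content-type-options',
];

function missingSecurityHeaders(headers) {
  const present = new Set(Object.keys(headers).map((h) => h.toLowerCase()));
  return EXPECTED_HEADERS.filter((h) => !present.has(h));
}
```

A production check would parse the DOM rather than use regular expressions, and would compare the body against a known-good 404 template instead of a phrase list.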

Hreflang reciprocity for international sites

If your site has more than one language or region, you almost certainly use hreflang, the `<link rel="alternate" hreflang="...">` tags that tell Google which version of a page to show to which audience. Hreflang has one rule that breaks more international launches than any other: every alternate must declare a return link. If your English page declares a Spanish alternate, the Spanish page must declare an English alternate pointing back. Google's documentation warns that annotations without a return link may be ignored, and a missing return link is one of the most common reasons hreflang silently stops working after a global migration.

A single-page audit cannot detect this. The bug only shows up when you fetch each alternate and compare what it declares. A deep audit runs that crawl automatically — fetches every hreflang alternate, records what each one points back to, and flags every missing or mismatched return link. The same audit catches duplicate hreflang codes (two `en-US` blocks on the same page) and `x-default` misconfigurations.
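
Once every alternate has been fetched, the comparison itself is small. Here is a minimal sketch, assuming the crawl results are already collected into a hypothetical `pages` map from URL to the alternates that page declares; `findMissingReturnLinks` and `findDuplicateCodes` are illustrative names, not part of the API.

```javascript
// pages: { [url]: [{ lang, href }, ...] }, the alternates declared by each fetched page.
// Returns the alternates of sourceUrl that do not link back to it.
function findMissingReturnLinks(sourceUrl, pages) {
  const missing = [];
  for (const alt of pages[sourceUrl] ?? []) {
    if (alt.href === sourceUrl) continue; // self-reference needs no return link
    const declaredBack = pages[alt.href] ?? []; // an unfetched page counts as missing
    if (!declaredBack.some((b) => b.href === sourceUrl)) missing.push(alt);
  }
  return missing;
}

// Returns hreflang codes that appear more than once on a single page.
function findDuplicateCodes(alternates) {
  const counts = new Map();
  for (const { lang } of alternates) {
    const code = lang.toLowerCase();
    counts.set(code, (counts.get(code) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([code]) => code);
}
```

The sketch compares URLs as exact strings; a real check would normalize trailing slashes, protocol, and host casing before comparing.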

Warning: Hreflang return links are the highest-leverage international SEO check that almost no audit tool runs by default. If your site has any international presence, this is the audit that finds the bug everyone else is shipping.

This check is tier-gated to Developer+ because it actually crawls the alternate URLs (meaningful work, real bandwidth, real API budget). On the `domain-canonical` endpoint, pass `validateReciprocity=true`:

bash
curl "https://api.edgedns.dev/v1/domain/canonical?url=https://example.com/page&validateReciprocity=true" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  | jq '.data.hreflangReciprocity'

# {
#   "checked": 8,
#   "reciprocal": 6,
#   "missingReturnLinks": [
#     { "lang": "es-MX", "href": "https://example.com/mx/page" },
#     { "lang": "de-DE", "href": "https://example.com/de/page" }
#   ],
#   "duplicateCodes": []
# }

Per-crawler robots: Googlebot vs GPTBot vs ClaudeBot

Robots.txt used to mean one thing. In 2026, it means whatever the specific crawler interprets it to mean — and Googlebot, Bingbot, GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Applebot-Extended, and PerplexityBot all match different `User-agent` blocks. A site that adopted `User-agent: GPTBot / Disallow: /` two years ago to opt out of OpenAI training has probably forgotten that ClaudeBot, CCBot, and Applebot-Extended each need their own block.

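The `robots.txt` matching rules are not obvious. A crawler obeys the most specific `User-agent` group that names it and ignores `User-agent: *` entirely when such a group exists, wherever the two sit in the file, so a `User-agent: GPTBot` block replaces the wildcard rules rather than adding to them. Within a group, the longest matching path wins, which means a short `Disallow` can be overridden by a longer `Allow`. The effective rule for a given crawler depends on the full file, and reasoning about it by reading the file is the kind of thing humans get wrong surprisingly often.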

A deep audit resolves the effective rule per crawler — what does Googlebot actually do on this site, what does GPTBot actually do, what does ClaudeBot actually do — and surfaces drift. It also HEAD-verifies every `Sitemap:` URL declared in the file. A `Sitemap:` line pointing at a 404 quietly defeats the whole point, and is easy to ship after a migration.
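
Per-crawler resolution can be sketched in a few dozen lines. This is a simplified reading of the standard matching rules (RFC 9309), assuming exact, case-insensitive user-agent names and plain path prefixes with no `*` or `$` wildcards; `parseRobots`, `effectiveRules`, and `isAllowed` are hypothetical helpers, not the endpoint's code.

```javascript
// Split robots.txt into groups of { agents, rules }.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim();
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === 'allow' || key === 'disallow') && current) {
      current.rules.push({ type: key, path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// A crawler uses the groups that name it; only if none do, it falls back to "*".
function effectiveRules(groups, crawler) {
  const name = crawler.toLowerCase();
  const specific = groups.filter((g) => g.agents.includes(name));
  const chosen = specific.length ? specific : groups.filter((g) => g.agents.includes('*'));
  const matchedBlock = specific.length
    ? `User-agent: ${crawler}`
    : chosen.length ? 'User-agent: *' : null;
  return { matchedBlock, rules: chosen.flatMap((g) => g.rules) };
}

// Longest matching path wins; on a tie, allow beats disallow.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (rule.path === '' || !path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = best && rule.path.length === best.path.length && rule.type === 'allow';
    if (longer || tieAllow) best = rule;
  }
  return !best || best.type === 'allow';
}
```

Given a file with `User-agent: *` / `Disallow: /private` and `User-agent: GPTBot` / `Disallow: /`, this resolves GPTBot to its own block and ClaudeBot to the wildcard block, so `/blog` is blocked for the first and open to the second.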

The `domain-robots` endpoint returns `effectiveRulesByCrawler` (13 crawlers) and `sitemapReachability`. Use it to answer questions like "is GPTBot allowed to crawl my content?" with a definitive yes/no instead of a guess.

bash
curl "https://api.edgedns.dev/v1/domain/robots?domain=example.com" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  | jq '.data.effectiveRulesByCrawler | {Googlebot, GPTBot, ClaudeBot, CCBot}'

# {
#   "Googlebot":  { "effective": "allow", "matchedBlock": "User-agent: *" },
#   "GPTBot":     { "effective": "disallow-all", "matchedBlock": "User-agent: GPTBot" },
#   "ClaudeBot":  { "effective": "allow", "matchedBlock": "User-agent: *" },
#   "CCBot":      { "effective": "allow", "matchedBlock": "User-agent: *" }
# }

Sitemap URL sampling and lastmod accuracy

A sitemap that exists is not the same as a sitemap that works. The two failure modes that hide in plain sight: listed URLs that have since died, and `<lastmod>` dates that lie.

Most sitemaps are generated by a build step or a CMS hook. Once shipped, nobody re-validates them. Pages get deleted, slugs change, CMS migrations rename routes — and the sitemap quietly accumulates 404s. A serious audit samples real URLs from the sitemap (typically a few percent, randomly) and fetches each one to confirm it still returns 200. The sample size scales with sitemap size; the broken-URL rate scales with site age.
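
The sampling step might look like the sketch below. The 3% fraction, the 10 and 200 bounds, and the helper names are assumptions chosen for illustration, and `findBrokenUrls` relies on the built-in `fetch` in Node 18+.

```javascript
// Pull <loc> values out of a sitemap; a real parser would also decode XML entities.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)].map((m) => m[1]);
}

// Scale the sample with sitemap size, clamped to [min, max] and never above total.
function sampleSize(total, fraction = 0.03, min = 10, max = 200) {
  const scaled = Math.ceil(total * fraction);
  return Math.min(total, Math.max(min, Math.min(max, scaled)));
}

// Partial Fisher-Yates shuffle: picks n distinct URLs uniformly at random.
function sampleUrls(urls, n, rng = Math.random) {
  const pool = urls.slice();
  const k = Math.min(n, pool.length);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// HEAD each sampled URL and keep everything that is not a plain 200.
async function findBrokenUrls(urls, fetchImpl = fetch) {
  const results = await Promise.all(urls.map(async (url) => {
    try {
      const res = await fetchImpl(url, { method: 'HEAD', redirect: 'manual' });
      return { url, status: res.status };
    } catch (err) {
      return { url, status: 0 }; // network failure, DNS error, or timeout
    }
  }));
  return results.filter((r) => r.status !== 200);
}
```

Passing `rng` and `fetchImpl` as parameters keeps the sampling reproducible and lets the check run against a stub in tests.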

The `<lastmod>` field has a separate problem: many CMSes set it to "now" on every build, regardless of whether the page actually changed. Google has said since 2023 that it relies on `lastmod` only when the value is consistently and verifiably accurate, so a sitemap that stamps every URL with the build time teaches Google to ignore the field. A deep audit cross-references the claimed `lastmod` against signals from the actual page (the Last-Modified HTTP header, content-freshness markers in the body) and flags sitemaps that systematically lie.
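
A first-pass version of that cross-check could look like this. It is a sketch under assumptions: the `entries` shape, the one-day tolerance, and the `auditLastmod` name are all hypothetical, and it only uses the Last-Modified header, not body markers.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// entries: [{ url, lastmod, lastModifiedHeader }]
//   lastmod: the sitemap's claimed date (ISO 8601 string)
//   lastModifiedHeader: the page's Last-Modified header value, or null if absent
function auditLastmod(entries, toleranceDays = 1) {
  const mismatched = entries.filter((e) => {
    if (!e.lastModifiedHeader) return false; // nothing to compare against
    const claimed = Date.parse(e.lastmod);
    const observed = Date.parse(e.lastModifiedHeader);
    if (Number.isNaN(claimed) || Number.isNaN(observed)) return false;
    return Math.abs(claimed - observed) > toleranceDays * DAY_MS;
  });
  // Every URL sharing one calendar date is the signature of build-time stamping.
  const distinctDays = new Set(entries.map((e) => e.lastmod.slice(0, 10)));
  return {
    mismatched: mismatched.map((e) => e.url),
    allSameDay: entries.length > 1 && distinctDays.size === 1,
  };
}
```

A sitemap where `allSameDay` is true and most sampled URLs land in `mismatched` is a strong hint that the generator writes the build timestamp rather than the real modification date.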

This check is tier-gated to Pro+ because it issues real HTTP requests to every sampled URL. On the `domain-sitemap` endpoint, pass `validateUrls=true`. The endpoint also surfaces sitemap extensions — the `<image:image>`, `<video:video>`, and `<news:news>` blocks that let you flag content for the corresponding Google verticals. Most sites with an image library, a video catalog, or a news section have never added the extensions, and miss the dedicated discovery channels entirely.

Rich-results eligibility and strategic schema gaps

Structured data validation usually stops at "is the JSON-LD well-formed." That is a useful baseline and a terrible audit. The real question is eligibility: given the schema this page declares, would Google actually generate a rich result, and which fields are missing or mismatched in a way that disqualifies it?

Each Google rich-result type (Product cards, Recipe cards, Event boxes, Article enhancements, and FAQ accordions, which Google now shows only for well-known government and health sites) has its own required-fields list documented in Google's rich-results gallery. A `Product` block with none of `offers`, `aggregateRating`, or `review` is a `Product` block that produces no rich result, no matter how clean the JSON. A deep audit checks each declared type against its eligibility list and tells you which rich-result formats this page actually qualifies for.
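
An eligibility check boils down to a lookup table plus a field test. The sketch below is illustrative only: the `REQUIREMENTS` table is a simplified assumption covering three types, and the authoritative lists live in Google's documentation and change over time.

```javascript
// Simplified required-field table (an assumption for illustration).
// all: every field must be present; anyOf: at least one must be present.
const REQUIREMENTS = {
  Product: { all: ['name'], anyOf: ['offers', 'review', 'aggregateRating'] },
  Recipe: { all: ['name', 'image'], anyOf: [] },
  Event: { all: ['name', 'startDate', 'location'], anyOf: [] },
};

// node: one parsed JSON-LD object. eligible is null for types not in the table.
function checkEligibility(node) {
  const type = node['@type'];
  const req = REQUIREMENTS[type];
  if (!req) return { type, eligible: null, missing: [] };
  const has = (key) => node[key] !== undefined && node[key] !== null && node[key] !== '';
  const missing = req.all.filter((key) => !has(key));
  if (req.anyOf.length > 0 && !req.anyOf.some(has)) {
    missing.push(`one of: ${req.anyOf.join(', ')}`);
  }
  return { type, eligible: missing.length === 0, missing };
}
```

For example, a `Product` with only `name` and `description` comes back with `eligible: false` and a single `one of: offers, review, aggregateRating` entry, even though the JSON-LD itself is perfectly valid.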

Then there is the question of strategic gaps: the high-leverage schema types most sites should have but don't. The two most common are `BreadcrumbList` (powers the breadcrumb trail in search results, which Google now shows on desktop only, and can lift CTR site-wide) and `Organization` with `sameAs` links (feeds the knowledge-panel sidebar on branded queries). A homepage `Organization` block and per-template `BreadcrumbList` are two of the highest-ROI structured-data investments any site can make.
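
Detecting those two gaps from raw HTML takes very little code. This is a minimal sketch with hypothetical helper names; it uses a regex to find JSON-LD blocks and only looks at top-level nodes and `@graph` members, so nested types would be missed.

```javascript
// Collect top-level JSON-LD nodes from every ld+json script block in the page.
function extractJsonLd(html) {
  const nodes = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const m of html.matchAll(re)) {
    try {
      const parsed = JSON.parse(m[1]);
      const items = Array.isArray(parsed) ? parsed : parsed['@graph'] ?? [parsed];
      nodes.push(...items);
    } catch (err) {
      // Malformed block: skip it here; a well-formedness check would report it.
    }
  }
  return nodes;
}

// @type may be a string or an array of strings.
const typesOf = (node) => [].concat(node['@type'] ?? []);

// Report wanted types that are absent, plus an Organization lacking sameAs.
function strategicGaps(nodes, wanted = ['BreadcrumbList', 'Organization']) {
  const present = new Set(nodes.flatMap(typesOf));
  const gaps = wanted.filter((type) => !present.has(type));
  const org = nodes.find((n) => typesOf(n).includes('Organization'));
  if (org && [].concat(org.sameAs ?? []).length === 0) gaps.push('Organization.sameAs');
  return gaps;
}
```

Run per template rather than per site: the homepage is where `Organization` belongs, while `BreadcrumbList` matters on category, product, and article pages.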

The `domain-structured-data` endpoint returns both under `richResultsEligibility.eligibleTypes` and `strategicGaps`.

Composing it all into one audit pass

Each of the checks above is a single endpoint call. The full deep audit composes them in one pass — typically against a representative URL from each major template (homepage, product/category, article, landing page). For a CI gate, run it on the staging URL of the page you are about to deploy. For a quarterly portfolio audit, run it against every domain you manage.

javascript
// Deep technical SEO audit: one pass, ~5 seconds end-to-end.
// Run as an ES module on Node 18+ (top-level await, built-in fetch).
const domain = 'example.com';
const url = 'https://example.com/products/blue-widget';
const headers = { Authorization: 'Bearer YOUR_API_KEY' };

// URLSearchParams percent-encodes the page URL, and a non-2xx API response
// fails loudly instead of being parsed as if it were audit data.
const get = async (endpoint, params) => {
  const query = new URLSearchParams(params);
  const res = await fetch(`https://api.edgedns.dev/v1/domain/${endpoint}?${query}`, { headers });
  if (!res.ok) throw new Error(`${endpoint} returned HTTP ${res.status}`);
  return res.json();
};

// `meta` is fetched alongside the rest but not inspected by the checks below.
const [seoAudit, readiness, meta, canonical, robots, sitemap, schema] =
  await Promise.all([
    get('seo-audit', { url }),
    get('search-readiness', { url }),
    get('meta', { url, validateImages: 'true' }),
    get('canonical', { url, validateReciprocity: 'true' }),
    get('robots', { domain }),
    get('sitemap', { domain, validateUrls: 'true' }),
    get('structured-data', { url }),
  ]);

const issues = [];
if (seoAudit.data.title?.widthAnalysis?.willTruncate)
  issues.push(`Title truncates at ${seoAudit.data.title.widthAnalysis.pixelWidth}px`);
if (readiness.data.softFourOhFour?.detected) issues.push('Soft 404 detected');
if (readiness.data.mixedContent?.issues?.length) issues.push('Mixed content present');
for (const m of canonical.data.hreflangReciprocity?.missingReturnLinks ?? [])
  issues.push(`Hreflang ${m.lang} has no return link`);
if (robots.data.effectiveRulesByCrawler?.Googlebot?.effective === 'disallow-all')
  issues.push('Googlebot disallow-all in robots.txt');
for (const b of sitemap.data.sampleValidation?.broken ?? [])
  issues.push(`Sitemap URL ${b.url} returned ${b.status}`);
for (const gap of schema.data.strategicGaps ?? [])
  issues.push(`Missing strategic schema: ${gap}`);

if (issues.length) {
  console.error('Deep SEO audit FAILED:', issues);
  process.exit(1);
}
console.log('Deep SEO audit PASSED');

What unlocks at Developer+ and Pro+

Most of the checks above run on the free tier — title pixel-width, soft-404 and mixed-content detection, per-crawler robots resolution, rich-results eligibility, strategic schema gaps. Two checks are tier-gated because they do meaningful crawling work:

  • Developer+: `validateReciprocity` on `domain-canonical`. The endpoint fetches every hreflang alternate (typically 4–20 URLs per page) and audits their return links.

  • Pro+: `validateUrls` on `domain-sitemap`. The endpoint samples and HEAD-checks listed sitemap URLs. Sample size scales with sitemap size.

If you operate an international site, the Developer+ tier is the one that pays for itself the fastest. If you operate a content site with 10k+ pages and frequent sitemap regeneration, the Pro+ tier is the one. For everything else, the free tier already runs the rest of the audit. The full pricing ladder is on the plans page.

Glossary

Soft 404 — a page returning HTTP 200 whose body indicates the page was not found. Google detects the pattern and drops the page from the index.

Mixed content — an HTTPS page that loads sub-resources (images, scripts, styles) over plain HTTP. Modern browsers block insecure scripts and styles and try to upgrade insecure images to HTTPS, all without a visible warning.

Hreflang reciprocity — the rule that every hreflang alternate must declare a return link. Missing return links break hreflang silently.

Title pixel-width — the rendered width of a title tag in mobile SERP font, measured in pixels. Truncation happens at the pixel threshold, not the character count.

Effective rule (in robots.txt) — the rule a specific crawler actually applies, after the `robots.txt` specificity algorithm resolves which `User-agent` block matches.

Rich-results eligibility — whether a page's structured data meets the required-fields list for a given Google rich-result format. A schema block can be valid JSON-LD and still not qualify.

Strategic schema gaps — high-leverage schema types most sites lack: `BreadcrumbList` on internal templates, `Organization` with `sameAs` on the homepage.

`lastmod` accuracy — whether the `<lastmod>` date claimed in a sitemap entry matches signals from the actual page. Google has said since 2023 that it relies on `lastmod` only when the value is consistently and verifiably accurate.
