GEO Audit Checklist: 25 Checks for Generative Engine Optimization

By the SiteBeat team · Updated 3 July 2026 · 7 min read

Quick answer: a GEO audit checks whether AI answer engines can crawl, read, extract and trust your website. This checklist covers all 25 essential checks across four pillars — crawler access, structure & extractability, authority & freshness, and structured data. Every check is deterministic: it's either true of your site or it isn't, no AI judgement required.

Pillar 1 — Crawler access (can AI fetch you?)

  1. robots.txt allows citation crawlers — OAI-SearchBot, PerplexityBot, Claude-SearchBot and Googlebot are not disallowed.
  2. No blanket disallow — no User-agent: * + Disallow: / left over from staging.
  3. Content is server-rendered — your main content appears in the raw HTML. Most AI crawlers do not execute JavaScript, so content injected client-side is invisible to them.
  4. No noindex — neither a meta robots tag nor an X-Robots-Tag header excludes the page.
  5. No bot-challenge wall — crawlers get your content, not a "checking your browser" interstitial.
  6. HTTPS — with a valid certificate.
  7. XML sitemap — exists and is referenced from robots.txt.
  8. No AI opt-out directives you didn't intend — noai meta tags, tdm-reservation, or Content-Usage headers explicitly tell AI not to use your content.
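
Checks 1 and 2 can be approximated with Python's standard-library robots.txt parser. This is a minimal sketch, not how SiteBeat implements the check: the function names and the example.com URL are illustrative assumptions, and the standard-library parser applies the first matching rule, which can differ from the longest-match behaviour some crawlers use.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from check 1; extend the tuple as needed.
CITATION_CRAWLERS = ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot", "Googlebot")


def crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {crawler: allowed?} for one URL, given the text of robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in CITATION_CRAWLERS}


def has_blanket_disallow(robots_txt: str) -> bool:
    """True if a crawler with no group of its own is blocked from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "SomeUnlistedBot" is a made-up name that falls through to the "*" group.
    return not parser.can_fetch("SomeUnlistedBot", "https://example.com/")


if __name__ == "__main__":
    leftover_from_staging = "User-agent: *\nDisallow: /\n"
    print(crawler_access(leftover_from_staging))
    print(has_blanket_disallow(leftover_from_staging))
```

Passing the file's text rather than a URL keeps the sketch free of network calls; in practice you would fetch /robots.txt first and feed the response body in.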

Pillar 2 — Structure & extractability (can AI quote you?)

  1. Exactly one H1 — the page's topic anchor.
  2. No skipped heading levels — H2 → H4 breaks the outline AI uses to segment content.
  3. Semantic landmarks — <main> or <article> separates content from navigation.
  4. Question-style subheadings — H2s phrased as the questions users ask.
  5. Answer-first opening — a 40–80 word direct answer at the top, liftable verbatim.
  6. Extractable chunk sizes — paragraphs of 40–100 self-contained words, not walls of text.
  7. Lists and tables — structured formats are cited disproportionately often.
  8. Image alt text — AI can't read or cite an image it can't describe.
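
Several of the structural checks above (one H1, no skipped heading levels, a semantic landmark, image alt text) reduce to a single pass over the HTML. The sketch below uses only the standard library; the class and function names are hypothetical, and it treats an empty alt attribute as missing, which will over-report on purely decorative images.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect heading levels, landmark tags, and images without alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []
        self.landmarks: set[str] = set()
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))
        elif tag in ("main", "article"):
            self.landmarks.add(tag)
        elif tag == "img" and not dict(attrs).get("alt"):
            self.images_missing_alt += 1


def audit_structure(html: str) -> dict[str, object]:
    """Run the heading, landmark, and alt-text checks on one HTML document."""
    parser = OutlineParser()
    parser.feed(html)
    levels = parser.levels
    # A skip is any step down the outline of more than one level, e.g. H2 -> H4.
    skipped = [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
    return {
        "one_h1": levels.count(1) == 1,
        "skipped_levels": skipped,
        "has_landmark": bool(parser.landmarks),
        "images_missing_alt": parser.images_missing_alt,
    }
```

Returning the offending (from, to) pairs rather than a bare boolean makes it easier to locate the heading that breaks the outline.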

Pillar 3 — Authority & freshness (can AI trust you?)

  1. Machine-readable dates — datePublished/dateModified in schema, kept current.
  2. Outbound citations — links to authoritative sources (.gov, .edu, primary research) correlate with ~30% more AI citations (Princeton GEO study).
  3. Statistics with attribution — concrete numbers are the most quotable sentences on any page.
  4. Author attribution — a byline backed by a schema Person with a profile link.
  5. Trust pages — About, Contact, Privacy, Terms all exist and are linked.
  6. No keyword stuffing — over-repetition measurably reduces AI visibility.
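
The date and author checks in this pillar can be run against an Article object that has already been parsed from JSON-LD. The following is a sketch under stated assumptions: the function name is hypothetical, and the 365-day freshness window is an arbitrary placeholder, since the checklist says only that dates should be "kept current".

```python
from datetime import date, datetime


def parse_iso_date(value):
    """Return a date for an ISO 8601 string, or None if it cannot be parsed."""
    try:
        # Normalise a trailing "Z" so older Python versions accept it.
        return datetime.fromisoformat(str(value).replace("Z", "+00:00")).date()
    except ValueError:
        return None


def audit_article_metadata(article: dict, today: date, max_age_days: int = 365) -> dict[str, bool]:
    """Check dates and author on an already-parsed Article JSON-LD object."""
    published = parse_iso_date(article.get("datePublished"))
    modified = parse_iso_date(article.get("dateModified"))

    author = article.get("author")
    if isinstance(author, list):  # schema.org allows several authors
        author = author[0] if author else None

    return {
        "has_dates": published is not None and modified is not None,
        # max_age_days is an assumed threshold, not a published standard.
        "is_fresh": modified is not None and (today - modified).days <= max_age_days,
        "author_is_person": (
            isinstance(author, dict)
            and author.get("@type") == "Person"
            and bool(author.get("url") or author.get("sameAs"))
        ),
    }
```

Passing today in explicitly keeps the result reproducible, which fits the checklist's point that every check is deterministic.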

Pillar 4 — Structured data (can AI identify you?)

  1. Valid JSON-LD — present and parseable (malformed schema is ignored entirely).
  2. Organization schema with sameAs — plus logo and contact point, so AI can ground your brand as an entity.
  3. Content-type schema — Article (with author, dates, image, publisher) on posts; FAQPage schema on question pages; complete Open Graph and Twitter Card tags for link unfurling.
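
Checks 1 and 2 of this pillar can be sketched as: pull every application/ld+json script out of the page, try to parse it, and look for an Organization with sameAs. The names below are illustrative assumptions, and the sketch checks only that the JSON parses, not that it validates against the schema.org vocabulary.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_json_ld(html: str) -> dict[str, object]:
    """Count parseable and malformed JSON-LD blocks and find Organization.sameAs."""
    parser = JsonLdParser()
    parser.feed(html)
    items, malformed = [], 0
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            malformed += 1  # a malformed block contributes nothing
            continue
        if isinstance(data, dict) and isinstance(data.get("@graph"), list):
            items.extend(data["@graph"])
        elif isinstance(data, list):
            items.extend(data)
        else:
            items.append(data)
    items = [item for item in items if isinstance(item, dict)]
    return {
        "parsed_items": len(items),
        "malformed_blocks": malformed,
        "organization_with_same_as": any(
            item.get("@type") == "Organization" and bool(item.get("sameAs"))
            for item in items
        ),
    }
```

A trailing comma is enough to make a block malformed, which is why the parse step is counted separately from the content checks.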

How do you run all 25 checks at once?

Manually, this checklist takes a couple of hours per site. A SiteBeat scan runs every check on this list (plus ~25 more) in about 20 seconds and returns an AI readiness grade from A+ to F, a per-crawler access matrix, the exact text AI extracts from your pages, and — in the full audit — copy-paste fixes: corrected robots.txt, generated llms.txt, and prefilled JSON-LD. The full audit is a one-time €29; see a sample report.

Run the full 50-check GEO audit

Free scan · grade in 20 seconds · full audit €29 one-time
