AEO Audit Process — Standard Operating Procedure

TWO TIERS — Quick-Scan vs Full Audit

| | AEO Quick-Scan | Full AEO Audit |
| --- | --- | --- |
| Purpose | Cold email customization, bulk intel | Per-lead sales deliverable |
| Cost | $0 (pure HTTP checks) | ~$0.80-1.50/audit (Google Places + Lighthouse) |
| Speed | ~3 sec/site, 22K sites in ~2 hrs | 8-12 min/audit |
| Signals | 12 programmatic checks (robots, schema, meta, HTTPS, etc.) | Full AI-powered: competitor comparison, rank map, PageSpeed, website health |
| Score | 0-100 from HTTP/HTML parsing | 0-100 from AI + crawling |
| When | ALL leads automatically (every new lead batch) | Interested/warm leads only |
| Script | /root/lead-machine/scripts/aeo-quick-scan.js | /root/lead-machine/scripts/aeo-audit.js |
| Output | CSV + HTML summary reports | Full branded HTML report per lead |

RULE: Run Quick-Scan on EVERY lead/principal. Use results for email personalization.

  • Score < 30 → “your online presence needs serious help” angle
  • Score 30-60 → “you’re missing key signals competitors have” angle
  • Score 60+ → “you’re close but leaving money on the table” angle
  • No schema → “AI search can’t find you” angle
  • No robots.txt → “search engines can’t properly crawl you” angle
  • Blocking AI bots → “you’re literally telling ChatGPT to ignore you”
  • Missing FAQPage schema → “Google can’t show your FAQs in search results”
  • Missing Review schema → “your 5-star reviews aren’t showing in search”
  • Missing Person schema → “AI doesn’t know WHO runs your agency”
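
As a sketch, the angle selection above could be scripted against Quick-Scan output. Field names and the priority order (signal angles checked before score bands) are my assumptions, not something the SOP specifies:

```javascript
// Hypothetical helper: maps a Quick-Scan row to a cold-email angle.
// Field names (score, hasRobotsTxt, hasSchema, blocksAiBots) are
// illustrative; the priority order is an assumed convention.
function pickAngle(scan) {
  if (scan.blocksAiBots) return 'ai-bot-block';   // "telling ChatGPT to ignore you"
  if (!scan.hasRobotsTxt) return 'no-robots';     // "search engines can't properly crawl you"
  if (!scan.hasSchema) return 'no-schema';        // "AI search can't find you"
  if (scan.score < 30) return 'serious-help';     // "needs serious help"
  if (scan.score < 60) return 'missing-signals';  // "missing key signals competitors have"
  return 'leaving-money';                         // "leaving money on the table"
}
```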

Quick-Scan Pass 2 — Content Extraction

  • Script: /root/lead-machine/scripts/aeo-quick-scan-pass2.js
  • What it does: Fetches actual robots.txt + JSON-LD schema content, analyzes AI bot blocking
  • Output: robots-content.jsonl, schema-content.jsonl, principals-aeo-enriched.csv
  • ALWAYS run pass 2 after pass 1 — adds AI bot blocking + specific missing schema types
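
A simplified sketch of what Pass 2 extracts (the real script is aeo-quick-scan-pass2.js; the bot list is the three from the benchmark stats and the parsing here is pared down):

```javascript
// Sketch only: parse JSON-LD @type values out of raw HTML, and scan
// robots.txt text for AI crawlers that are fully disallowed.
const AI_BOTS = ['GPTBot', 'CCBot', 'Google-Extended'];

function extractSchemaTypes(html) {
  const types = new Set();
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const [, json] of html.matchAll(re)) {
    try {
      const data = JSON.parse(json);
      for (const node of Array.isArray(data) ? data : [data]) {
        [].concat(node['@type'] || []).forEach(t => types.add(t));
      }
    } catch { /* ignore malformed JSON-LD */ }
  }
  return [...types];
}

function detectAiBotBlocks(robotsTxt) {
  const blocked = [];
  for (const block of robotsTxt.split(/\n(?=user-agent:)/i)) {
    const ua = (block.match(/user-agent:\s*(\S+)/i) || [])[1];
    if (ua && /disallow:\s*\/\s*$/im.test(block)) {
      for (const bot of AI_BOTS) {
        if (ua.toLowerCase() === bot.toLowerCase()) blocked.push(bot);
      }
    }
  }
  return blocked;
}
```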

Enrichment Pipeline Order (for new lead batches)

  1. Google Maps scrape (queue-gmaps-us-zips.js → google-maps-worker.js)
  2. Website enrichment (principal-discovery.js → website-enrich-worker.js)
  3. Email guessing/verification (auto-chained from website enrichment)
  4. Principal list generation (extract-valid-principals.js)
  5. AEO Quick-Scan Pass 1 (aeo-quick-scan.js) — 12 signal checks, 0-100 score
  6. AEO Quick-Scan Pass 2 (aeo-quick-scan-pass2.js) — robots/schema content + AI bot analysis
  7. Copy generation (copy-generate worker) — uses AEO data for personalization

AEO Quick-Scan Benchmark Stats (Mar 13, 2026 — 20K insurance agencies)

  • Avg AEO Score: 73.2/100 (loaded sites only)
  • HTTPS: 99.8% | robots.txt: 74.5% | sitemap: 66.1% | Schema: 53.6%
  • Meta desc: 53.2% | Title: 81% | H1: 62% | Viewport: 81.6% | OG: 64.2%
  • AI bot blocking: 3.4% (mostly GPTBot 377, CCBot 254, Google-Extended 254)
  • Missing FAQPage: 98.4% | Missing Review schema: 98.1% | Missing Person: 94.9%

Full Audit — Quick Reference

  • Script: /root/lead-machine/scripts/aeo-audit.js (VPS: [VPS-IP])
  • Output: /root/lead-machine/output/aeo-audits/ (JSON + email HTML + report HTML)
  • Deploy audit to: /var/www/reports/{domain-slug}-aeo.html
  • URL format: https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html

MANDATORY Command Template

ssh root@[VPS-IP] "cd /root/lead-machine && node scripts/aeo-audit.js \
  --url https://DOMAIN.COM \
  --name 'LEAD NAME' \
  --business 'BUSINESS NAME' \
  --industry insurance \
  --location 'CITY ST' \
  --lat LATITUDE \
  --lng LONGITUDE \
  --lead-rating RATING \
  --lead-reviews REVIEW_COUNT \
  --rank-category 'SEARCH CATEGORY' \
  --rank-grid 5 \
  --rank-radius 5 \
  --competitors 5 \
  --monthly-searches 120 \
  --client-value 1500"

CRITICAL: Coordinate Resolution (Mar 11 fix)

  • NEVER rely on Nominatim city geocoding alone — it returns the geographic center of the city, which for large cities (Houston, Dallas, etc.) can be 15+ miles from the business’s actual location.
  • aeo-audit.js now auto-resolves coordinates via Google Places API: After Nominatim geocoding, it searches Google Places for the business name + location, gets the Place Details, and uses the real lat/lng. This happens automatically when GOOGLE_PLACES_KEY is set.
  • If auto-resolve fails or for manual override: Pass --lat and --lng with the business’s actual coordinates. Find them by searching Google Places for the business name.
  • To find business coordinates: Search “Business Name City” on Google Maps, right-click → “What’s here?” to get lat/lng. Or use Google Places API: searchGooglePlaces("Business Name City", approxLat, approxLng, key, 50000).
  • Example (Wise Insurance): Nominatim returned 29.79, -95.29 (east Houston). Actual location: 29.705, -95.568 (southwest Houston, 17 miles away). Rank map went from 0% to 48% top-3 coverage after fixing coordinates.
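
The auto-resolve step can be sketched against the Places Text Search endpoint (the real logic lives in aeo-audit.js; function names here are illustrative and error handling is omitted):

```javascript
// Sketch of business-coordinate resolution via Google Places Text Search.
// Requires GOOGLE_PLACES_KEY and Node 18+ (global fetch). Returning null
// lets the caller fall back to the Nominatim coordinates.
function buildPlacesUrl(name, city, approx, key) {
  return 'https://maps.googleapis.com/maps/api/place/textsearch/json'
    + `?query=${encodeURIComponent(`${name} ${city}`)}`
    + `&location=${approx.lat},${approx.lng}&radius=50000&key=${key}`;
}

async function resolveBusinessCoords(name, city, approx, key) {
  const res = await fetch(buildPlacesUrl(name, city, approx, key));
  const data = await res.json();
  if (data.status !== 'OK' || !data.results.length) return null;
  return data.results[0].geometry.location; // { lat, lng } of the real storefront
}
```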

Optional Revenue Data Flags (for Cost of Waiting section)

  • --monthly-searches — Real monthly search volume for the keyword+location (from Google Keyword Planner, Semrush, etc.)
  • --client-value — Lifetime value of one client in dollars (e.g., Medicare client = $1000-1500)
  • --monthly-revenue — Lead’s current monthly revenue (for context)
  • If these are provided: Shows data-driven revenue calculation (searches x 25% capture x 10% close x LTV)
  • If NOT provided: Shows qualitative copy only — NO fabricated numbers. Never guess.
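
The revenue formula in the bullet above works out as follows (the helper name is mine; the real calculation lives in the report generator):

```javascript
// Cost of Waiting math as stated above: searches x 25% capture x 10% close x LTV.
function costOfWaiting(monthlySearches, clientValue) {
  const CAPTURE_RATE = 0.25; // share of searchers a top-ranked listing captures
  const CLOSE_RATE = 0.10;   // share of captured leads that become clients
  const monthly = monthlySearches * clientValue * CAPTURE_RATE * CLOSE_RATE;
  return { monthly, yearly: monthly * 12 };
}
// e.g. 120 searches/mo at $1,500 LTV -> $4,500/mo at the model rates
```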

NEVER FORGET These Flags

  • --location — MANDATORY. Enables competitor comparison AND rank map AND geocoding.
  • --lat / --lng — Business’s ACTUAL coordinates. Auto-resolved via Google Places API if GOOGLE_PLACES_KEY is set. Pass manually as fallback.
  • --lead-rating / --lead-reviews — Pass Google review data for the lead. Look up on Google Maps or their site’s JSON-LD schema first.
  • --rank-category — The Google Maps search category (e.g., “medicare agent”, “health insurance agent”, “insurance agent”). Use the lead’s primary H2/service keyword. This MUST match what customers would search on Google Maps.
  • --rank-grid 5 --rank-radius 5 — Standard 5x5 grid, 5 mile radius. Can increase to 7x7 / 10mi for rural areas.
  • --competitors 5 — Find and audit 5 competitors.

MANDATORY Report Structure (EVERY audit must have ALL of these)

Every single audit report MUST contain this exact structure — no exceptions:

  1. AEO Score — the big number out of 100, with category breakdown
  2. AI Visibility — how the business appears in AI search (ChatGPT, Perplexity, etc.)
  3. Key Findings — top 3-5 findings generated by Haiku
  4. Detailed Scoring Breakdown — bars for Content, Q&A, E-E-A-T, Schema, Technical
  5. Local Rank Map — interactive Leaflet map with color-coded grid showing Google Maps ranking at each point. MUST show average rank and top-3 coverage percentage.
  6. Competitor Comparison — ALWAYS 5 competitors. Audit their AEO scores. Include the lead’s OWN website in the comparison so they can see where they stand.
  7. Google Reviews Comparison — lead’s rating/reviews vs competitors
  8. Website Health Audit — crawl 20 pages, check broken links/images/meta/SSL
  9. PageSpeed — BOTH mobile AND desktop Lighthouse scores
  10. Cost of Inaction — revenue impact calculation (if --monthly-searches and --client-value are provided)
  11. CTA — book strategy call button

Step-by-Step Process

1. Research the Lead (before running audit)

  • Google their business: Find rating, review count, exact address, city
  • Check their website: Does it have JSON-LD schema? What industry keywords?
  • Determine search category: What would someone search on Google Maps to find them?

2. Run the Audit

  • Use the command template above with ALL flags filled in
  • Takes ~8-12 minutes (scraping + scoring + Gemini/SearXNG competitors + rank map + Lighthouse)
  • Run in background: add 2>&1 and use run_in_background

3. Deploy the Report

# Copy report to web root
ssh root@[VPS-IP] "cp /root/lead-machine/output/aeo-audits/{slug}-{date}-report.html /var/www/reports/{domain-slug}-aeo.html"

4. Verify All Sections Present

Check the deployed report for ALL of these sections:

  • Section 01: AEO Score (the big number)
  • Section 02: AI Visibility / AI Search Answers
  • Section 03: Key Findings
  • Section 04: Detailed Breakdown
  • Section 05: Competitor Comparison (requires --location + working search)
  • Section 05b: Google Reviews comparison (requires --lead-rating/--lead-reviews)
  • Rank Map (interactive Leaflet map, requires --location + geocoding)
  • Website Health Audit (auto-runs, 20 pages)
  • PageSpeed (BOTH mobile AND desktop Lighthouse scores)
  • Cost of Inaction
  • Section 06: CTA

Always in this format:

Audit: https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html

Score Stability Rules (CRITICAL)

The AEO score MUST be deterministic. Same site = same score. Changes implemented:

  1. URLs sorted alphabetically before slicing to MAX_PAGES — deterministic page selection
  2. networkidle0 (wait for ALL network to settle, not just 2 connections)
  3. 4-second JS render wait (not 2s) — more content captured consistently
  4. 30-second page timeout (not 15s) — fewer timeout-based page skips
  5. Retry-once on page scrape failure — fewer random failures
  6. 15-second sitemap/robots check (not 8s) — consistent technical score
  7. Scoring input fingerprint logged: the [Inputs] line shows the exact data feeding the score

If the score changes between runs on the same site, check the [Inputs] log line. The word count, page count, heading counts, and link counts should be identical. If not, investigate which page loaded differently.
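
Rule 1 amounts to a one-liner worth keeping in mind when touching the crawler (the MAX_PAGES value here is illustrative):

```javascript
// Deterministic page selection: dedupe, sort alphabetically, then slice.
// Same site, same URL set, same score inputs on every run.
const MAX_PAGES = 20; // illustrative cap, not necessarily the script's value

function selectPages(urls) {
  return [...new Set(urls)].sort().slice(0, MAX_PAGES);
}
```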

Competitor Filtering Rules

  • skipDomains: ONLY blocks true directories (Yelp, YellowPages, BBB), social media, news, gov, medical info sites. NEVER block insurance companies — even national ones like The Zebra, Insurify, Progressive, State Farm. If they rank locally, they ARE competitors.
  • rejectNames: Blocks generic page titles like “find the best”, “homepage”, “insurance agents in”, “directory”. Does NOT block company names that happen to contain common words.
  • RULE: If it’s an actual business that sells insurance, it’s a competitor. Period. Don’t over-filter.
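
The intent can be sketched as a predicate (the domain and phrase lists below are illustrative excerpts, not the script's actual arrays):

```javascript
// Sketch of competitor filtering: block directories and generic page
// titles, keep every real insurance business. Lists are illustrative.
const skipDomains = ['yelp.com', 'yellowpages.com', 'bbb.org'];
const rejectNames = ['find the best', 'homepage', 'insurance agents in', 'directory'];

function isCompetitor(result) {
  const host = new URL(result.url).hostname.replace(/^www\./, '');
  if (skipDomains.some(d => host === d || host.endsWith('.' + d))) return false;
  const title = result.title.toLowerCase();
  if (rejectNames.some(p => title.includes(p))) return false;
  return true; // an actual business that sells insurance stays in
}
```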

Known Issues

  • Gemini API geo-blocked on VPS — the API returns “User location is not supported”. CF Worker proxy at /root/lead-machine/workers/gemini-proxy.js needs to be deployed to Cloudflare to fix.
  • SearXNG unreliable — Brave/DuckDuckGo/Startpage get rate-limited/CAPTCHA’d. Restarting with docker restart searxng may help temporarily. When all engines are down, competitor search and the rank map return empty.
  • FIXED (Mar 11): Rank map 0% due to wrong coordinates — Nominatim geocodes “Houston TX” to city center (29.79, -95.29), but businesses can be 15+ miles away. aeo-audit.js now auto-resolves business coordinates via Google Places API after geocoding. Falls back to Nominatim if Places lookup fails. Pass --lat/--lng as manual override.
  • Google Places API working — GOOGLE_PLACES_KEY is set in /root/lead-machine/.env. Rank map uses the Places Text Search API (~$0.80/audit for the rank map grid). The free $200/mo credit covers ~250 audits.
  • sed compression bug — NEVER use sed to inject multi-line JavaScript on VPS. It compresses to single lines. If a line starts with //, everything after becomes a comment. ALWAYS use Python to modify JS files.

Scoring Categories (100 points total)

| Category | Max | What it measures |
| --- | --- | --- |
| Content | 25 | Word count, page count, long-form pages, headings, internal links |
| Q&A | 20 | FAQ sections, question headings, answer patterns, definitions |
| E-E-A-T | 20 | Author info, credentials, contact, privacy, testimonials, social |
| Schema | 15 | JSON-LD, LocalBusiness, Person, FAQPage, Review, Breadcrumb |
| Technical | 20 | HTTPS, mobile viewport, meta descriptions, OG, canonical, sitemap, robots, alt text |
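
The category caps compose to the 100-point total; a minimal sketch of the summation (names are illustrative, weights are from the table above):

```javascript
// Sum per-category subscores, clamping each to its cap from the table.
const CATEGORY_MAX = { content: 25, qa: 20, eeat: 20, schema: 15, technical: 20 };

function totalScore(sub) {
  return Object.entries(CATEGORY_MAX)
    .reduce((sum, [cat, max]) => sum + Math.min(sub[cat] ?? 0, max), 0);
}
```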

Template Files

  • Report template: /root/lead-machine/templates/aeo-report-v1.html
  • Email template: /root/lead-machine/templates/aeo-audit-email.html
  • CSS variables in template ensure dark text on light backgrounds

Supporting Modules

  • /root/lead-machine/scripts/google-reviews-section.js — Google Reviews comparison
  • /root/lead-machine/scripts/rank-map-embed.js — Rank map visualization (Leaflet + SVG fallback)
  • /root/lead-machine/scripts/rank-map.js — Rank map data generation (search grid)
  • /root/lead-machine/scripts/site-audit.js — Website health audit + PageSpeed
  • /root/lead-machine/scripts/llm-access-check.js — LLM accessibility checker (see below)

LLM Accessibility Check v2 (Mar 14-15, 2026)

Tests whether ChatGPT, Gemini, Perplexity, and Claude can actually access/crawl a website. 100% free — just HTTP requests with different User-Agent strings. Zero API cost.

How It Works (v2 — with JS dependency detection)

  1. Fetches raw HTML with browser User-Agent (this is what bots see — NO JS execution)
  2. PRIMARY CHECK: Extracts text content from raw HTML, strips tags, counts chars
    • If <100 chars → empty/JS shell → ALL bots are blind (regardless of status code)
    • If <300 chars + SPA shell patterns → JS-dependent → ALL bots blind
    • If <500 chars + many scripts + bundle filenames → JS-heavy SPA → ALL bots blind
  3. Detects platform from HTML patterns (GHL, Wix, Squarespace, React, Angular, etc.)
  4. If site HAS content in raw HTML → checks per-bot access (robots.txt, HTTP 403, Cloudflare, differential content)
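
The primary check above can be condensed into a sketch. The thresholds (100/300/500 chars) come from the steps listed; the SPA-shell and bundle-filename patterns here are simplified stand-ins for the real heuristics in llm-access-check.js:

```javascript
// Sketch of the v2 primary check: measure how much text survives in the
// raw (un-executed) HTML, then apply the thresholds described above.
function analyzeJsDependency(rawHtml) {
  const text = rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, '') // drop inline/external scripts
    .replace(/<style[\s\S]*?<\/style>/gi, '')   // drop CSS
    .replace(/<[^>]+>/g, ' ')                   // strip remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  const scriptCount = (rawHtml.match(/<script\b/gi) || []).length;
  const spaShell = /<div id=["'](root|app)["']>\s*<\/div>/i.test(rawHtml);
  const hasBundles = /(main|bundle|chunk)[.-][0-9a-f]{6,}\.js/i.test(rawHtml);

  if (text.length < 100) return { jsDependent: true, reason: 'empty-shell', textLength: text.length };
  if (text.length < 300 && spaShell) return { jsDependent: true, reason: 'spa-shell', textLength: text.length };
  if (text.length < 500 && scriptCount > 10 && hasBundles) return { jsDependent: true, reason: 'js-heavy', textLength: text.length };
  return { jsDependent: false, reason: null, textLength: text.length };
}
```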

Detection Methods

  • JS dependency (v2 — catches ~16% of sites!): Raw HTML has <500 chars of text content. LLM bots can’t execute JS, so they see an empty page. This is the BIGGEST issue — 5x more common than robots.txt blocking.
  • robots.txt: Checks for Disallow: / for each bot’s user-agent
  • HTTP 403/401/503: Bot gets blocked status while browser gets 200
  • Cloudflare challenge: Detects cf-mitigated: challenge header, “Just a moment…” title, window._cf_chl_opt
  • Differential content: Bot gets <15% of browser’s content (served a different page)
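
The last two signals can be sketched directly from the descriptions above (helper names are mine):

```javascript
// Cloudflare challenge: header, title, or challenge script object,
// per the markers listed above.
function isCloudflareChallenge(headers, html) {
  return headers['cf-mitigated'] === 'challenge'
    || /<title>\s*Just a moment/i.test(html)
    || html.includes('window._cf_chl_opt');
}

// Differential content: bot sees under 15% of the browser's text.
function isDifferentialContent(botTextLen, browserTextLen) {
  return browserTextLen > 0 && botTextLen / browserTextLen < 0.15;
}
```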

Scripts & Integration Points

  • Core module: scripts/llm-access-check.js exports checkLlmAccess(url), which returns per-bot accessibility + jsAnalysis + platform + summary
    • New exports: extractTextContent(), analyzeJsDependency(), detectPlatform()
    • Returns jsAnalysis: { jsDependent, reason, textLength, spaShell, scriptCount }
    • Returns platform: { id, name } (ghl, wix, squarespace, react-spa, wordpress, etc.)
  • Bulk scanner (Pass 3): scripts/aeo-llm-check.js — Runs on all principals, outputs principals-llm-enriched.csv
    • Concurrency: 15, ~42 min for 19K domains
    • Output columns: chatgpt_can_see, gemini_can_see, perplexity_can_see, claude_can_see, llm_visible_count, llm_block_method, llm_js_dependent, llm_platform, llm_text_length, llm_summary
    • Summary JSON now includes js_dependency stats and platforms breakdown
  • Self-serve audit tool: Integrated into api-audit-routes.js runFastAudit(). Results shown in FREE tier (before email gate).
    • 4 cards (green check / red X per AI engine)
    • JS-dependent sites show “Blind” instead of “Blocked”, with “Invisible (JS-only site)” reason
    • Special alert box for JS-dependent: explains JavaScript rendering issue, names the platform
    • This is the killer cold email hook
  • Email gate now collects: email + phone + consent checkbox (required)
  • Full audit report: TODO — add LLM access section to aeo-audit.js output

Enrichment Pipeline Order (updated)

  1. Google Maps scrape
  2. Website enrichment
  3. Email guessing/verification
  4. Principal list generation
  5. AEO Quick-Scan Pass 1 (12 signal checks)
  6. AEO Quick-Scan Pass 2 (robots/schema content + AI bot analysis)
  7. AEO Quick-Scan Pass 3 (LLM accessibility — actual HTTP checks with bot UAs)
  8. Copy generation (uses AEO + LLM data for personalization)

Cold Email Angles (LLM-specific)

  • 0/4 visible: “AI search engines literally CANNOT see your website. Even if your AEO score is decent, it doesn’t matter if ChatGPT and Gemini can’t read your content.”
  • Some blocked: “Google’s Gemini AI can’t access your site, which means it can’t recommend you. Your competitors who ARE visible are getting those referrals.”
  • Cloudflare blocking: “Your Cloudflare security settings are blocking AI search engines. Easy fix, huge impact.”
  • JS-only: “Your website loads content with JavaScript, which AI bots can’t execute. They see an empty page.”
  • robots.txt: “Your robots.txt file is telling ChatGPT to go away. Literally.”