AEO Audit Process — Standard Operating Procedure
TWO TIERS — Quick-Scan vs Full Audit
| | AEO Quick-Scan | Full AEO Audit |
|---|---|---|
| Purpose | Cold email customization, bulk intel | Per-lead sales deliverable |
| Cost | $0 (pure HTTP checks) | ~$0.80-1.50/audit (Google Places + Lighthouse) |
| Speed | ~3 sec/site, 22K sites in ~2hrs | 8-12 min/audit |
| Signals | 12 programmatic checks (robots, schema, meta, HTTPS, etc.) | Full AI-powered: competitor comparison, rank map, PageSpeed, website health |
| Score | 0-100 from HTTP/HTML parsing | 0-100 from AI + crawling |
| When | ALL leads automatically (every new lead batch) | Interested/warm leads only |
| Script | /root/lead-machine/scripts/aeo-quick-scan.js | /root/lead-machine/scripts/aeo-audit.js |
| Output | CSV + HTML summary reports | Full branded HTML report per lead |
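The 12 quick-scan signals roll up into the 0-100 score. A minimal sketch of that rollup, assuming equal weighting and these signal names (the real weights and names live in aeo-quick-scan.js and may differ):

```javascript
// Hypothetical rollup of the 12 quick-scan signals into a 0-100 score.
// Equal weighting is an assumption; aeo-quick-scan.js may weight differently.
const QUICK_SCAN_SIGNALS = [
  "https", "robotsTxt", "sitemap", "schema", "metaDescription", "title",
  "h1", "viewport", "openGraph", "canonical", "altText", "noAiBotBlock",
];

function quickScanScore(checks) {
  const passed = QUICK_SCAN_SIGNALS.filter((s) => checks[s]).length;
  return Math.round((passed / QUICK_SCAN_SIGNALS.length) * 100);
}
```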
RULE: Run Quick-Scan on EVERY lead/principal. Use results for email personalization.
- Score < 30 → “your online presence needs serious help” angle
- Score 30-60 → “you’re missing key signals competitors have” angle
- Score 60+ → “you’re close but leaving money on the table” angle
- No schema → “AI search can’t find you” angle
- No robots.txt → “search engines can’t properly crawl you” angle
- Blocking AI bots → “you’re literally telling ChatGPT to ignore you”
- Missing FAQPage schema → “Google can’t show your FAQs in search results”
- Missing Review schema → “your 5-star reviews aren’t showing in search”
- Missing Person schema → “AI doesn’t know WHO runs your agency”
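These angle rules can be encoded directly. A hedged sketch (the function name and input fields are hypothetical; checking concrete missing signals before the score tiers is a design choice here, since a specific gap makes a sharper hook):

```javascript
// Hypothetical: map a quick-scan result to a cold-email angle.
// Thresholds mirror the rules above; signal checks are tried before the
// score tiers (an assumption, not a documented ordering).
function pickEmailAngle(scan) {
  if (scan.blocksAiBots) return "you're literally telling ChatGPT to ignore you";
  if (!scan.hasSchema) return "AI search can't find you";
  if (!scan.hasRobotsTxt) return "search engines can't properly crawl you";
  if (scan.score < 30) return "your online presence needs serious help";
  if (scan.score < 60) return "you're missing key signals competitors have";
  return "you're close but leaving money on the table";
}
```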
Quick-Scan Pass 2 — Content Extraction
- Script: `/root/lead-machine/scripts/aeo-quick-scan-pass2.js`
- What it does: Fetches actual robots.txt + JSON-LD schema content, analyzes AI bot blocking
- Output: `robots-content.jsonl`, `schema-content.jsonl`, `principals-aeo-enriched.csv`
- ALWAYS run Pass 2 after Pass 1 — adds AI bot blocking + specific missing schema types
Enrichment Pipeline Order (for new lead batches)
- Google Maps scrape (queue-gmaps-us-zips.js → google-maps-worker.js)
- Website enrichment (principal-discovery.js → website-enrich-worker.js)
- Email guessing/verification (auto-chained from website enrichment)
- Principal list generation (extract-valid-principals.js)
- AEO Quick-Scan Pass 1 (aeo-quick-scan.js) — 12 signal checks, 0-100 score
- AEO Quick-Scan Pass 2 (aeo-quick-scan-pass2.js) — robots/schema content + AI bot analysis
- Copy generation (copy-generate worker) — uses AEO data for personalization
AEO Quick-Scan Benchmark Stats (Mar 13, 2026 — 20K insurance agencies)
- Avg AEO Score: 73.2/100 (loaded sites only)
- HTTPS: 99.8% | robots.txt: 74.5% | sitemap: 66.1% | Schema: 53.6%
- Meta desc: 53.2% | Title: 81% | H1: 62% | Viewport: 81.6% | OG: 64.2%
- AI bot blocking: 3.4% (mostly GPTBot 377, CCBot 254, Google-Extended 254)
- Missing FAQPage: 98.4% | Missing Review schema: 98.1% | Missing Person: 94.9%
Full Audit — Quick Reference
- Script: `/root/lead-machine/scripts/aeo-audit.js` (VPS: [VPS-IP])
- Output: `/root/lead-machine/output/aeo-audits/` (JSON + email HTML + report HTML)
- Deploy audit to: `/var/www/reports/{domain-slug}-aeo.html`
- URL format: `https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html`
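The {domain-slug} derivation is not spelled out in this SOP. A plausible sketch, assuming the slug is the hostname minus `www.` and the TLD with dots turned into dashes (the actual logic in aeo-audit.js may differ):

```javascript
// Hypothetical {domain-slug} derivation for the report URL.
// Assumption: strip "www." and the final TLD, dash-join remaining dots.
function domainSlug(url) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return host.replace(/\.[a-z]+$/i, "").replace(/\./g, "-");
}
```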
MANDATORY Command Template
```bash
ssh root@[VPS-IP] "cd /root/lead-machine && node scripts/aeo-audit.js \
  --url https://DOMAIN.COM \
  --name 'LEAD NAME' \
  --business 'BUSINESS NAME' \
  --industry insurance \
  --location 'CITY ST' \
  --lat LATITUDE \
  --lng LONGITUDE \
  --lead-rating RATING \
  --lead-reviews REVIEW_COUNT \
  --rank-category 'SEARCH CATEGORY' \
  --rank-grid 5 \
  --rank-radius 5 \
  --competitors 5 \
  --monthly-searches 120 \
  --client-value 1500"
```
CRITICAL: Coordinate Resolution (Mar 11 fix)
- NEVER rely on Nominatim city geocoding alone — it returns the geographic center of the city, which for large cities (Houston, Dallas, etc.) can be 15+ miles from the business’s actual location.
- aeo-audit.js now auto-resolves coordinates via Google Places API: After Nominatim geocoding, it searches Google Places for the business name + location, gets the Place Details, and uses the real lat/lng. This happens automatically when GOOGLE_PLACES_KEY is set.
- If auto-resolve fails or for manual override: Pass `--lat` and `--lng` with the business's actual coordinates. Find them by searching Google Places for the business name.
- To find business coordinates: Search "Business Name City" on Google Maps, right-click → "What's here?" to get lat/lng. Or use the Google Places API: `searchGooglePlaces("Business Name City", approxLat, approxLng, key, 50000)`.
- Example (Wise Insurance): Nominatim returned 29.79, -95.29 (east Houston). Actual location: 29.705, -95.568 (southwest Houston, 17 miles away). Rank map went from 0% to 48% top-3 coverage after fixing the coordinates.
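A cheap way to catch this class of bug before an audit runs is to measure the gap between the Nominatim result and the Places result with the standard haversine formula (the function below is a sketch; any alert threshold, say a few miles, is an assumption):

```javascript
// Distance in miles between two lat/lng points (standard haversine formula).
// A large gap between geocoded and Places-resolved coordinates means the
// city-center guess should not feed the rank map.
function milesBetween(lat1, lng1, lat2, lng2) {
  const toRad = (d) => (d * Math.PI) / 180;
  const R = 3958.8; // mean Earth radius in miles
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

On the Wise Insurance coordinates above, this returns roughly 17 miles, matching the example.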
Optional Revenue Data Flags (for Cost of Waiting section)
- `--monthly-searches` — Real monthly search volume for the keyword+location (from Google Keyword Planner, Semrush, etc.)
- `--client-value` — Lifetime value of one client in dollars (e.g., Medicare client = $1000-1500)
- `--monthly-revenue` — Lead's current monthly revenue (for context)
- If these are provided: Shows a data-driven revenue calculation (searches x 25% capture x 10% close x LTV)
- If NOT provided: Shows qualitative copy only — NO fabricated numbers. Never guess.
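The revenue formula above, as a sketch (function name is hypothetical; the 25% capture and 10% close rates come straight from this section):

```javascript
// Cost-of-waiting estimate per the formula in this section:
// monthly searches x 25% capture x 10% close x client lifetime value.
// Only run with real inputs; never fabricate the numbers.
function monthlyRevenueAtRisk(monthlySearches, clientValue) {
  const CAPTURE_RATE = 0.25; // share of local searchers a top-3 listing could capture
  const CLOSE_RATE = 0.10;   // share of captured searchers who become clients
  return monthlySearches * CAPTURE_RATE * CLOSE_RATE * clientValue;
}
```

With the template defaults (120 searches, $1500 LTV) this yields $4,500/month at risk.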
NEVER FORGET These Flags
- `--location` — MANDATORY. Enables competitor comparison AND rank map AND geocoding.
- `--lat` / `--lng` — Business's ACTUAL coordinates. Auto-resolved via Google Places API if GOOGLE_PLACES_KEY is set. Pass manually as fallback.
- `--lead-rating` / `--lead-reviews` — Pass Google review data for the lead. Look up on Google Maps or their site's JSON-LD schema first.
- `--rank-category` — The Google Maps search category (e.g., "medicare agent", "health insurance agent", "insurance agent"). Use the lead's primary H2/service keyword. This MUST match what customers would search on Google Maps.
- `--rank-grid 5 --rank-radius 5` — Standard 5x5 grid, 5-mile radius. Can increase to 7x7 / 10 mi for rural areas.
- `--competitors 5` — Find and audit 5 competitors.
MANDATORY Report Structure (EVERY audit must have ALL of these)
Every single audit report MUST contain this exact structure — no exceptions:
- AEO Score — the big number out of 100, with category breakdown
- AI Visibility — how the business appears in AI search (ChatGPT, Perplexity, etc.)
- Key Findings — top 3-5 findings generated by Haiku
- Detailed Scoring Breakdown — bars for Content, Q&A, E-E-A-T, Schema, Technical
- Local Rank Map — interactive Leaflet map with color-coded grid showing Google Maps ranking at each point. MUST show average rank and top-3 coverage percentage.
- Competitor Comparison — ALWAYS 5 competitors. Audit their AEO scores. Include the lead’s OWN website in the comparison so they can see where they stand.
- Google Reviews Comparison — lead’s rating/reviews vs competitors
- Website Health Audit — crawl 20 pages, check broken links/images/meta/SSL
- PageSpeed — BOTH mobile AND desktop Lighthouse scores
- Cost of Inaction — revenue impact calculation (if `--monthly-searches` and `--client-value` provided)
- CTA — book strategy call button
Step-by-Step Process
1. Research the Lead (before running audit)
- Google their business: Find rating, review count, exact address, city
- Check their website: Does it have JSON-LD schema? What industry keywords?
- Determine search category: What would someone search on Google Maps to find them?
2. Run the Audit
- Use the command template above with ALL flags filled in
- Takes ~8-12 minutes (scraping + scoring + Gemini/SearXNG competitors + rank map + Lighthouse)
- Run in background: add `2>&1` and use `run_in_background`
3. Deploy the Report
```bash
# Copy report to web root
ssh root@[VPS-IP] "cp /root/lead-machine/output/aeo-audits/{slug}-{date}-report.html /var/www/reports/{domain-slug}-aeo.html"
```
4. Verify All Sections Present
Check the deployed report for ALL of these sections:
- Section 01: AEO Score (the big number)
- Section 02: AI Visibility / AI Search Answers
- Section 03: Key Findings
- Section 04: Detailed Breakdown
- Section 05: Competitor Comparison (requires `--location` + working search)
- Section 05b: Google Reviews comparison (requires `--lead-rating`/`--lead-reviews`)
- Rank Map (interactive Leaflet map, requires `--location` + geocoding)
- Website Health Audit (auto-runs, 20 pages)
- PageSpeed (BOTH mobile AND desktop Lighthouse scores)
- Cost of Inaction
- Section 06: CTA
5. Provide Links to Mike
Always in this format:
Audit: https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html
Score Stability Rules (CRITICAL)
The AEO score MUST be deterministic. Same site = same score. Changes implemented:
- URLs sorted alphabetically before slicing to MAX_PAGES — deterministic page selection
- networkidle0 (wait for ALL network to settle, not just 2 connections)
- 4-second JS render wait (not 2s) — more content captured consistently
- 30-second page timeout (not 15s) — fewer timeout-based page skips
- Retry-once on page scrape failure — fewer random failures
- 15-second sitemap/robots check (not 8s) — consistent technical score
- Scoring input fingerprint logged — the `[Inputs]` line shows the exact data feeding the score
If the score changes between runs on the same site, check the [Inputs] log line. The word count, page count, heading counts, and link counts should be identical. If not, investigate which page loaded differently.
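The first rule above, deterministic page selection, can be sketched as a pure function (the dedupe step and the MAX_PAGES value of 20, taken from the 20-page health-audit crawl, are assumptions about the real implementation):

```javascript
// Deterministic page selection: dedupe, sort alphabetically, then slice.
// Same discovered URL set always yields the same page set and score.
const MAX_PAGES = 20; // assumed to match the 20-page health-audit crawl

function selectPages(urls) {
  return [...new Set(urls)].sort().slice(0, MAX_PAGES);
}
```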
Competitor Filtering Rules
- skipDomains: ONLY blocks true directories (Yelp, YellowPages, BBB), social media, news, gov, medical info sites. NEVER block insurance companies — even national ones like The Zebra, Insurify, Progressive, State Farm. If they rank locally, they ARE competitors.
- rejectNames: Blocks generic page titles like “find the best”, “homepage”, “insurance agents in”, “directory”. Does NOT block company names that happen to contain common words.
- RULE: If it’s an actual business that sells insurance, it’s a competitor. Period. Don’t over-filter.
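A sketch of these filtering rules (the lists here are short illustrative samples, not the full lists in aeo-audit.js, and the exact matching logic is an assumption):

```javascript
// Illustrative subset of the real skipDomains / rejectNames lists.
const skipDomains = ["yelp.com", "yellowpages.com", "bbb.org"]; // true directories only
const rejectNames = ["find the best", "homepage", "insurance agents in", "directory"];

function isCompetitor(result) {
  const host = new URL(result.url).hostname.replace(/^www\./, "");
  // Skip only exact directory domains and their subdomains.
  if (skipDomains.some((d) => host === d || host.endsWith("." + d))) return false;
  // Reject only when the title IS a generic phrase (or begins with one),
  // not when a company name merely contains a common word.
  const title = result.title.trim().toLowerCase();
  if (rejectNames.some((p) => title === p || title.startsWith(p + " "))) return false;
  return true;
}
```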
Known Issues
- Gemini API geo-blocked on VPS — returns `User location is not supported`. CF Worker proxy at `/root/lead-machine/workers/gemini-proxy.js` needs to be deployed to Cloudflare to fix.
- SearXNG unreliable — Brave/DuckDuckGo/Startpage get rate-limited/CAPTCHA'd. `docker restart searxng` may temporarily help. When all engines are down, competitor search and rank map return empty.
- FIXED (Mar 11): Rank map 0% due to wrong coordinates — Nominatim geocodes "Houston TX" to the city center (29.79, -95.29), but businesses can be 15+ miles away. aeo-audit.js now auto-resolves business coordinates via the Google Places API after geocoding. Falls back to Nominatim if the Places lookup fails. Pass `--lat`/`--lng` as a manual override.
- Google Places API working — GOOGLE_PLACES_KEY is set in /root/lead-machine/.env. Rank map uses the Places Text Search API (~$0.80/audit for the rank map grid). The free $200/mo credit covers ~250 audits.
- sed compression bug — NEVER use `sed` to inject multi-line JavaScript on the VPS. It compresses everything to single lines; if a line starts with `//`, everything after it becomes a comment. ALWAYS use Python to modify JS files.
Scoring Categories (100 points total)
| Category | Max | What it measures |
|---|---|---|
| Content | 25 | Word count, page count, long-form pages, headings, internal links |
| Q&A | 20 | FAQ sections, question headings, answer patterns, definitions |
| E-E-A-T | 20 | Author info, credentials, contact, privacy, testimonials, social |
| Schema | 15 | JSON-LD, LocalBusiness, Person, FAQPage, Review, Breadcrumb |
| Technical | 20 | HTTPS, mobile viewport, meta descriptions, OG, canonical, sitemap, robots, alt text |
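The five category maxima above sum to 100. A sketch of the rollup, with per-category clamping so no category can overflow its maximum (the clamping and key names are assumptions about the real scorer):

```javascript
// Category maxima from the scoring table (25+20+20+15+20 = 100).
const CATEGORY_MAX = { content: 25, qa: 20, eeat: 20, schema: 15, technical: 20 };

function totalScore(categoryScores) {
  let total = 0;
  for (const [cat, max] of Object.entries(CATEGORY_MAX)) {
    // Clamp each category so a buggy sub-scorer can't exceed its maximum.
    total += Math.min(categoryScores[cat] ?? 0, max);
  }
  return total;
}
```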
Template Files
- Report template: `/root/lead-machine/templates/aeo-report-v1.html`
- Email template: `/root/lead-machine/templates/aeo-audit-email.html`
- CSS variables in the template ensure dark text on light backgrounds
Supporting Modules
- `/root/lead-machine/scripts/google-reviews-section.js` — Google Reviews comparison
- `/root/lead-machine/scripts/rank-map-embed.js` — Rank map visualization (Leaflet + SVG fallback)
- `/root/lead-machine/scripts/rank-map.js` — Rank map data generation (search grid)
- `/root/lead-machine/scripts/site-audit.js` — Website health audit + PageSpeed
- `/root/lead-machine/scripts/llm-access-check.js` — LLM accessibility checker (see below)
LLM Accessibility Check v2 (Mar 14-15, 2026)
Tests whether ChatGPT, Gemini, Perplexity, and Claude can actually access/crawl a website. 100% free — just HTTP requests with different User-Agent strings. Zero API cost.
How It Works (v2 — with JS dependency detection)
- Fetches raw HTML with browser User-Agent (this is what bots see — NO JS execution)
- PRIMARY CHECK: Extracts text content from raw HTML, strips tags, counts chars
- If <100 chars → empty/JS shell → ALL bots are blind (regardless of status code)
- If <300 chars + SPA shell patterns → JS-dependent → ALL bots blind
- If <500 chars + many scripts + bundle filenames → JS-heavy SPA → ALL bots blind
- Detects platform from HTML patterns (GHL, Wix, Squarespace, React, Angular, etc.)
- If site HAS content in raw HTML → checks per-bot access (robots.txt, HTTP 403, Cloudflare, differential content)
Detection Methods
- JS dependency (v2 — catches ~16% of sites!): Raw HTML has <500 chars of text content. LLM bots can’t execute JS, so they see an empty page. This is the BIGGEST issue — 5x more common than robots.txt blocking.
- robots.txt: Checks for `Disallow: /` under each bot's user-agent
- HTTP 403/401/503: Bot gets blocked status while browser gets 200
- Cloudflare challenge: Detects the `cf-mitigated: challenge` header, "Just a moment…" title, `window._cf_chl_opt`
- Differential content: Bot gets <15% of the browser's content (served a different page)
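The v2 JS-dependency thresholds above can be sketched as follows. This is a simplified stand-in for `analyzeJsDependency()` in scripts/llm-access-check.js; the regexes, SPA shell pattern, and script-count cutoff are illustrative assumptions, and the return shape mirrors the documented `jsAnalysis` object:

```javascript
// Simplified sketch of the v2 JS-dependency heuristic described above.
// Real logic: analyzeJsDependency() in scripts/llm-access-check.js.
function analyzeJsDependencySketch(rawHtml) {
  // What a non-JS bot "sees": raw HTML text with scripts/styles/tags stripped.
  const text = rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  const scriptCount = (rawHtml.match(/<script/gi) || []).length;
  const spaShell = /<div id=["'](root|app)["']>\s*<\/div>/i.test(rawHtml); // assumed pattern
  const bundled = /(bundle|chunk)\.js|\/static\/js\//i.test(rawHtml);      // assumed pattern

  let jsDependent = false;
  let reason = "content present in raw HTML";
  if (text.length < 100) {
    jsDependent = true; reason = "empty/JS shell";
  } else if (text.length < 300 && spaShell) {
    jsDependent = true; reason = "SPA shell, JS-dependent";
  } else if (text.length < 500 && scriptCount > 5 && bundled) {
    jsDependent = true; reason = "JS-heavy SPA";
  }
  return { jsDependent, reason, textLength: text.length, spaShell, scriptCount };
}
```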
Scripts & Integration Points
- Core module: `scripts/llm-access-check.js` — `checkLlmAccess(url)` returns per-bot accessibility + jsAnalysis + platform + summary
  - New exports: `extractTextContent()`, `analyzeJsDependency()`, `detectPlatform()`
  - Returns `jsAnalysis: { jsDependent, reason, textLength, spaShell, scriptCount }`
  - Returns `platform: { id, name }` (ghl, wix, squarespace, react-spa, wordpress, etc.)
- Bulk scanner (Pass 3): `scripts/aeo-llm-check.js` — Runs on all principals, outputs `principals-llm-enriched.csv`
  - Concurrency: 15, ~42 min for 19K domains
  - Output columns: chatgpt_can_see, gemini_can_see, perplexity_can_see, claude_can_see, llm_visible_count, llm_block_method, llm_js_dependent, llm_platform, llm_text_length, llm_summary
  - Summary JSON now includes `js_dependency` stats and a `platforms` breakdown
- Self-serve audit tool: Integrated into `api-audit-routes.js` `runFastAudit()`. Results shown in FREE tier (before email gate).
- 4 cards (green check / red X per AI engine)
- JS-dependent sites show “Blind” instead of “Blocked”, with “Invisible (JS-only site)” reason
- Special alert box for JS-dependent: explains JavaScript rendering issue, names the platform
- This is the killer cold email hook
- Email gate now collects: email + phone + consent checkbox (required)
- Full audit report: TODO — add LLM access section to aeo-audit.js output
Enrichment Pipeline Order (updated)
- Google Maps scrape
- Website enrichment
- Email guessing/verification
- Principal list generation
- AEO Quick-Scan Pass 1 (12 signal checks)
- AEO Quick-Scan Pass 2 (robots/schema content + AI bot analysis)
- AEO Quick-Scan Pass 3 (LLM accessibility — actual HTTP checks with bot UAs)
- Copy generation (uses AEO + LLM data for personalization)
Cold Email Angles (LLM-specific)
- 0/4 visible: “AI search engines literally CANNOT see your website. Even if your AEO score is decent, it doesn’t matter if ChatGPT and Gemini can’t read your content.”
- Some blocked: “Google’s Gemini AI can’t access your site, which means it can’t recommend you. Your competitors who ARE visible are getting those referrals.”
- Cloudflare blocking: “Your Cloudflare security settings are blocking AI search engines. Easy fix, huge impact.”
- JS-only: “Your website loads content with JavaScript, which AI bots can’t execute. They see an empty page.”
- robots.txt: “Your robots.txt file is telling ChatGPT to go away. Literally.”