AEO Audit Process — Standard Operating Procedure

TWO TIERS — Quick-Scan vs Full Audit

| | AEO Quick-Scan | Full AEO Audit |
| --- | --- | --- |
| Purpose | Cold email customization, bulk intel | Per-lead sales deliverable |
| Cost | $0 (pure HTTP checks) | ~$0.80-1.50/audit (Google Places + Lighthouse) |
| Speed | ~3 sec/site, 22K sites in ~2 hrs | 8-12 min/audit |
| Signals | 12 programmatic checks (robots, schema, meta, HTTPS, etc.) | Full AI-powered: competitor comparison, rank map, PageSpeed, website health |
| Score | 0-100 from HTTP/HTML parsing | 0-100 from AI + crawling |
| When | ALL leads automatically (every new lead batch) | Interested/warm leads only |
| Script | /root/lead-machine/scripts/aeo-quick-scan.js | /root/lead-machine/scripts/aeo-audit.js |
| Output | CSV + HTML summary reports | Full branded HTML report per lead |

RULE: Run Quick-Scan on EVERY lead/principal. Use results for email personalization.

  • Score < 30 → “your online presence needs serious help” angle
  • Score 30-60 → “you’re missing key signals competitors have” angle
  • Score 60+ → “you’re close but leaving money on the table” angle
  • No schema → “AI search can’t find you” angle
  • No robots.txt → “search engines can’t properly crawl you” angle
  • Blocking AI bots → “you’re literally telling ChatGPT to ignore you”
  • Missing FAQPage schema → “Google can’t show your FAQs in search results”
  • Missing Review schema → “your 5-star reviews aren’t showing in search”
  • Missing Person schema → “AI doesn’t know WHO runs your agency”
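
As a sketch, the angle selection above could be scripted against Quick-Scan output. Field names and the priority order (signal angles checked before score bands) are my assumptions, not something the SOP specifies:

```javascript
// Hypothetical helper: maps a Quick-Scan row to a cold-email angle.
// Field names (score, hasRobotsTxt, hasSchema, blocksAiBots) are
// illustrative; the priority order is an assumed convention.
function pickAngle(scan) {
  if (scan.blocksAiBots) return 'ai-bot-block';   // "telling ChatGPT to ignore you"
  if (!scan.hasRobotsTxt) return 'no-robots';     // "search engines can't properly crawl you"
  if (!scan.hasSchema) return 'no-schema';        // "AI search can't find you"
  if (scan.score < 30) return 'serious-help';     // "needs serious help"
  if (scan.score < 60) return 'missing-signals';  // "missing key signals competitors have"
  return 'leaving-money';                         // "leaving money on the table"
}
```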

Quick-Scan Pass 2 — Content Extraction

  • Script: /root/lead-machine/scripts/aeo-quick-scan-pass2.js
  • What it does: Fetches actual robots.txt + JSON-LD schema content, analyzes AI bot blocking
  • Output: robots-content.jsonl, schema-content.jsonl, principals-aeo-enriched.csv
  • ALWAYS run pass 2 after pass 1 — adds AI bot blocking + specific missing schema types
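
A simplified sketch of what Pass 2 extracts (the real script is aeo-quick-scan-pass2.js; the bot list is the three from the benchmark stats and the parsing here is pared down):

```javascript
// Sketch only: parse JSON-LD @type values out of raw HTML, and scan
// robots.txt text for AI crawlers that are fully disallowed.
const AI_BOTS = ['GPTBot', 'CCBot', 'Google-Extended'];

function extractSchemaTypes(html) {
  const types = new Set();
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const [, json] of html.matchAll(re)) {
    try {
      const data = JSON.parse(json);
      for (const node of Array.isArray(data) ? data : [data]) {
        [].concat(node['@type'] || []).forEach(t => types.add(t));
      }
    } catch { /* ignore malformed JSON-LD */ }
  }
  return [...types];
}

function detectAiBotBlocks(robotsTxt) {
  const blocked = [];
  for (const block of robotsTxt.split(/\n(?=user-agent:)/i)) {
    const ua = (block.match(/user-agent:\s*(\S+)/i) || [])[1];
    if (ua && /disallow:\s*\/\s*$/im.test(block)) {
      for (const bot of AI_BOTS) {
        if (ua.toLowerCase() === bot.toLowerCase()) blocked.push(bot);
      }
    }
  }
  return blocked;
}
```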

Enrichment Pipeline Order (for new lead batches)

  1. Google Maps scrape (queue-gmaps-us-zips.js → google-maps-worker.js)
  2. Website enrichment (principal-discovery.js → website-enrich-worker.js)
  3. Email guessing/verification (auto-chained from website enrichment)
  4. Principal list generation (extract-valid-principals.js)
  5. AEO Quick-Scan Pass 1 (aeo-quick-scan.js) — 12 signal checks, 0-100 score
  6. AEO Quick-Scan Pass 2 (aeo-quick-scan-pass2.js) — robots/schema content + AI bot analysis
  7. Copy generation (copy-generate worker) — uses AEO data for personalization

AEO Quick-Scan Benchmark Stats (Mar 13, 2026 — 20K insurance agencies)

  • Avg AEO Score: 73.2/100 (loaded sites only)
  • HTTPS: 99.8% | robots.txt: 74.5% | sitemap: 66.1% | Schema: 53.6%
  • Meta desc: 53.2% | Title: 81% | H1: 62% | Viewport: 81.6% | OG: 64.2%
  • AI bot blocking: 3.4% (mostly GPTBot 377, CCBot 254, Google-Extended 254)
  • Missing FAQPage: 98.4% | Missing Review schema: 98.1% | Missing Person: 94.9%

Full Audit — Quick Reference

  • Script: /root/lead-machine/scripts/aeo-audit.js (VPS: [VPS-IP])
  • Output: /root/lead-machine/output/aeo-audits/ (JSON + email HTML + report HTML)
  • Deploy audit to: /var/www/reports/{domain-slug}-aeo.html
  • URL format: https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html

MANDATORY Command Template

ssh root@[VPS-IP] "cd /root/lead-machine && node scripts/aeo-audit.js \
  --url https://DOMAIN.COM \
  --name 'LEAD NAME' \
  --business 'BUSINESS NAME' \
  --industry insurance \
  --location 'CITY ST' \
  --lat LATITUDE \
  --lng LONGITUDE \
  --lead-rating RATING \
  --lead-reviews REVIEW_COUNT \
  --rank-category 'SEARCH CATEGORY' \
  --rank-grid 5 \
  --rank-radius 5 \
  --competitors 5 \
  --monthly-searches 120 \
  --client-value 1500"

CRITICAL: Coordinate Resolution (Mar 11 fix)

  • NEVER rely on Nominatim city geocoding alone — it returns the geographic center of the city, which for large cities (Houston, Dallas, etc.) can be 15+ miles from the business’s actual location.
  • aeo-audit.js now auto-resolves coordinates via Google Places API: After Nominatim geocoding, it searches Google Places for the business name + location, gets the Place Details, and uses the real lat/lng. This happens automatically when GOOGLE_PLACES_KEY is set.
  • If auto-resolve fails or for manual override: Pass --lat and --lng with the business’s actual coordinates. Find them by searching Google Places for the business name.
  • To find business coordinates: Search “Business Name City” on Google Maps, right-click → “What’s here?” to get lat/lng. Or use Google Places API: searchGooglePlaces("Business Name City", approxLat, approxLng, key, 50000).
  • Example (Wise Insurance): Nominatim returned 29.79, -95.29 (east Houston). Actual location: 29.705, -95.568 (southwest Houston, 17 miles away). Rank map went from 0% to 48% top-3 coverage after fixing coordinates.
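
The auto-resolve step can be sketched against the Places Text Search endpoint (the real logic lives in aeo-audit.js; function names here are illustrative and error handling is omitted):

```javascript
// Sketch of business-coordinate resolution via Google Places Text Search.
// Requires GOOGLE_PLACES_KEY and Node 18+ (global fetch). Returning null
// lets the caller fall back to the Nominatim coordinates.
function buildPlacesUrl(name, city, approx, key) {
  return 'https://maps.googleapis.com/maps/api/place/textsearch/json'
    + `?query=${encodeURIComponent(`${name} ${city}`)}`
    + `&location=${approx.lat},${approx.lng}&radius=50000&key=${key}`;
}

async function resolveBusinessCoords(name, city, approx, key) {
  const res = await fetch(buildPlacesUrl(name, city, approx, key));
  const data = await res.json();
  if (data.status !== 'OK' || !data.results.length) return null;
  return data.results[0].geometry.location; // { lat, lng } of the real storefront
}
```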

Optional Revenue Data Flags (for Cost of Waiting section)

  • --monthly-searches — Real monthly search volume for the keyword+location (from Google Keyword Planner, Semrush, etc.)
  • --client-value — Lifetime value of one client in dollars (e.g., Medicare client = $1000-1500)
  • --monthly-revenue — Lead’s current monthly revenue (for context)
  • If these are provided: Shows data-driven revenue calculation (searches x 25% capture x 10% close x LTV)
  • If NOT provided: Shows qualitative copy only — NO fabricated numbers. Never guess.
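
The revenue formula in the bullet above works out as follows (the helper name is mine; the real calculation lives in the report generator):

```javascript
// Cost of Waiting math as stated above: searches x 25% capture x 10% close x LTV.
function costOfWaiting(monthlySearches, clientValue) {
  const CAPTURE_RATE = 0.25; // share of searchers a top-ranked listing captures
  const CLOSE_RATE = 0.10;   // share of captured leads that become clients
  const monthly = monthlySearches * clientValue * CAPTURE_RATE * CLOSE_RATE;
  return { monthly, yearly: monthly * 12 };
}
// e.g. 120 searches/mo at $1,500 LTV -> $4,500/mo at the model rates
```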

NEVER FORGET These Flags

  • --location — MANDATORY. Enables competitor comparison AND rank map AND geocoding.
  • --lat / --lng — Business’s ACTUAL coordinates. Auto-resolved via Google Places API if GOOGLE_PLACES_KEY is set. Pass manually as fallback.
  • --lead-rating / --lead-reviews — Pass Google review data for the lead. Look up on Google Maps or their site’s JSON-LD schema first.
  • --rank-category — The Google Maps search category (e.g., “medicare agent”, “health insurance agent”, “insurance agent”). Use the lead’s primary H2/service keyword. This MUST match what customers would search on Google Maps.
  • --rank-grid 5 --rank-radius 5 — Standard 5x5 grid, 5 mile radius. Can increase to 7x7 / 10mi for rural areas.
  • --competitors 5 — Find and audit 5 competitors.

MANDATORY Report Structure (EVERY audit must have ALL of these)

Every single audit report MUST contain this exact structure — no exceptions:

  1. AEO Score — the big number out of 100, with category breakdown
  2. AI Visibility — how the business appears in AI search (ChatGPT, Perplexity, etc.)
  3. Key Findings — top 3-5 findings generated by Haiku
  4. Detailed Scoring Breakdown — bars for Content, Q&A, E-E-A-T, Schema, Technical
  5. Local Rank Map — interactive Leaflet map with color-coded grid showing Google Maps ranking at each point. MUST show average rank and top-3 coverage percentage.
  6. Competitor Comparison — ALWAYS 5 competitors. Audit their AEO scores. Include the lead’s OWN website in the comparison so they can see where they stand.
  7. Google Reviews Comparison — lead’s rating/reviews vs competitors
  8. Website Health Audit — crawl 20 pages, check broken links/images/meta/SSL
  9. PageSpeed — BOTH mobile AND desktop Lighthouse scores
  10. Cost of Inaction — revenue impact calculation (if --monthly-searches and --client-value are provided)
  11. CTA — book strategy call button

Step-by-Step Process

1. Research the Lead (before running audit)

  • Google their business: Find rating, review count, exact address, city
  • Check their website: Does it have JSON-LD schema? What industry keywords?
  • Determine search category: What would someone search on Google Maps to find them?

2. Run the Audit

  • Use the command template above with ALL flags filled in
  • Takes ~8-12 minutes (scraping + scoring + Gemini/SearXNG competitors + rank map + Lighthouse)
  • Run in background: add 2>&1 and use run_in_background

3. Deploy the Report

# Copy report to web root
ssh root@[VPS-IP] "cp /root/lead-machine/output/aeo-audits/{slug}-{date}-report.html /var/www/reports/{domain-slug}-aeo.html"

4. Verify All Sections Present

Check the deployed report for ALL of these sections:

  • Section 01: AEO Score (the big number)
  • Section 02: AI Visibility / AI Search Answers
  • Section 03: Key Findings
  • Section 04: Detailed Breakdown
  • Section 05: Competitor Comparison (requires --location + working search)
  • Section 05b: Google Reviews comparison (requires --lead-rating/--lead-reviews)
  • Rank Map (interactive Leaflet map, requires --location + geocoding)
  • Website Health Audit (auto-runs, 20 pages)
  • PageSpeed (BOTH mobile AND desktop Lighthouse scores)
  • Cost of Inaction
  • Section 06: CTA

Always in this format:

Audit: https://reports.strategicaiarchitects.com/{domain-slug}-aeo.html

Score Stability Rules (CRITICAL)

The AEO score MUST be deterministic. Same site = same score. Changes implemented:

  1. URLs sorted alphabetically before slicing to MAX_PAGES — deterministic page selection
  2. networkidle0 (wait for ALL network to settle, not just 2 connections)
  3. 4-second JS render wait (not 2s) — more content captured consistently
  4. 30-second page timeout (not 15s) — fewer timeout-based page skips
  5. Retry-once on page scrape failure — fewer random failures
  6. 15-second sitemap/robots check (not 8s) — consistent technical score
  7. Scoring input fingerprint logged: the [Inputs] line shows the exact data feeding the score

If the score changes between runs on the same site, check the [Inputs] log line. The word count, page count, heading counts, and link counts should be identical. If not, investigate which page loaded differently.
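
Rule 1 amounts to a one-liner worth keeping in mind when touching the crawler (the MAX_PAGES value here is illustrative):

```javascript
// Deterministic page selection: dedupe, sort alphabetically, then slice.
// Same site, same URL set, same score inputs on every run.
const MAX_PAGES = 20; // illustrative cap, not necessarily the script's value

function selectPages(urls) {
  return [...new Set(urls)].sort().slice(0, MAX_PAGES);
}
```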

Competitor Filtering Rules

  • skipDomains: ONLY blocks true directories (Yelp, YellowPages, BBB), social media, news, gov, medical info sites. NEVER block insurance companies — even national ones like The Zebra, Insurify, Progressive, State Farm. If they rank locally, they ARE competitors.
  • rejectNames: Blocks generic page titles like “find the best”, “homepage”, “insurance agents in”, “directory”. Does NOT block company names that happen to contain common words.
  • RULE: If it’s an actual business that sells insurance, it’s a competitor. Period. Don’t over-filter.
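
The intent can be sketched as a predicate (the domain and phrase lists below are illustrative excerpts, not the script's actual arrays):

```javascript
// Sketch of competitor filtering: block directories and generic page
// titles, keep every real insurance business. Lists are illustrative.
const skipDomains = ['yelp.com', 'yellowpages.com', 'bbb.org'];
const rejectNames = ['find the best', 'homepage', 'insurance agents in', 'directory'];

function isCompetitor(result) {
  const host = new URL(result.url).hostname.replace(/^www\./, '');
  if (skipDomains.some(d => host === d || host.endsWith('.' + d))) return false;
  const title = result.title.toLowerCase();
  if (rejectNames.some(p => title.includes(p))) return false;
  return true; // an actual business that sells insurance stays in
}
```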

Known Issues

  • Gemini API geo-blocked on VPS — the API returns “User location is not supported”. CF Worker proxy at /root/lead-machine/workers/gemini-proxy.js needs to be deployed to Cloudflare to fix.
  • SearXNG unreliable — Brave/DuckDuckGo/Startpage get rate-limited/CAPTCHA’d. Restarting with docker restart searxng may help temporarily. When all engines are down, competitor search and the rank map return empty.
  • FIXED (Mar 11): Rank map 0% due to wrong coordinates — Nominatim geocodes “Houston TX” to city center (29.79, -95.29), but businesses can be 15+ miles away. aeo-audit.js now auto-resolves business coordinates via Google Places API after geocoding. Falls back to Nominatim if Places lookup fails. Pass --lat/--lng as manual override.
  • Google Places API working — GOOGLE_PLACES_KEY is set in /root/lead-machine/.env. Rank map uses the Places Text Search API (~$0.80/audit for the rank map grid). The free $200/mo credit covers ~250 audits.
  • sed compression bug — NEVER use sed to inject multi-line JavaScript on VPS. It compresses to single lines. If a line starts with //, everything after becomes a comment. ALWAYS use Python to modify JS files.

Scoring Categories (100 points total)

| Category | Max | What it measures |
| --- | --- | --- |
| Content | 25 | Word count, page count, long-form pages, headings, internal links |
| Q&A | 20 | FAQ sections, question headings, answer patterns, definitions |
| E-E-A-T | 20 | Author info, credentials, contact, privacy, testimonials, social |
| Schema | 15 | JSON-LD, LocalBusiness, Person, FAQPage, Review, Breadcrumb |
| Technical | 20 | HTTPS, mobile viewport, meta descriptions, OG, canonical, sitemap, robots, alt text |
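
The category caps compose to the 100-point total; a minimal sketch of the summation (names are illustrative, weights are from the table above):

```javascript
// Sum per-category subscores, clamping each to its cap from the table.
const CATEGORY_MAX = { content: 25, qa: 20, eeat: 20, schema: 15, technical: 20 };

function totalScore(sub) {
  return Object.entries(CATEGORY_MAX)
    .reduce((sum, [cat, max]) => sum + Math.min(sub[cat] ?? 0, max), 0);
}
```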

Template Files

  • Report template: /root/lead-machine/templates/aeo-report-v1.html
  • Email template: /root/lead-machine/templates/aeo-audit-email.html
  • CSS variables in template ensure dark text on light backgrounds

Supporting Modules

  • /root/lead-machine/scripts/google-reviews-section.js — Google Reviews comparison
  • /root/lead-machine/scripts/rank-map-embed.js — Rank map visualization (Leaflet + SVG fallback)
  • /root/lead-machine/scripts/rank-map.js — Rank map data generation (search grid)
  • /root/lead-machine/scripts/site-audit.js — Website health audit + PageSpeed
  • /root/lead-machine/scripts/llm-access-check.js — LLM accessibility checker (see below)

LLM Accessibility Check v2 (Mar 14-15, 2026)

Tests whether ChatGPT, Gemini, Perplexity, and Claude can actually access/crawl a website. 100% free — just HTTP requests with different User-Agent strings. Zero API cost.

How It Works (v2 — with JS dependency detection)

  1. Fetches raw HTML with browser User-Agent (this is what bots see — NO JS execution)
  2. PRIMARY CHECK: Extracts text content from raw HTML, strips tags, counts chars
    • If <100 chars → empty/JS shell → ALL bots are blind (regardless of status code)
    • If <300 chars + SPA shell patterns → JS-dependent → ALL bots blind
    • If <500 chars + many scripts + bundle filenames → JS-heavy SPA → ALL bots blind
  3. Detects platform from HTML patterns (GHL, Wix, Squarespace, React, Angular, etc.)
  4. If site HAS content in raw HTML → checks per-bot access (robots.txt, HTTP 403, Cloudflare, differential content)
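
The primary check above can be condensed into a sketch. The thresholds (100/300/500 chars) come from the steps listed; the SPA-shell and bundle-filename patterns here are simplified stand-ins for the real heuristics in llm-access-check.js:

```javascript
// Sketch of the v2 primary check: measure how much text survives in the
// raw (un-executed) HTML, then apply the thresholds described above.
function analyzeJsDependency(rawHtml) {
  const text = rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, '') // drop inline/external scripts
    .replace(/<style[\s\S]*?<\/style>/gi, '')   // drop CSS
    .replace(/<[^>]+>/g, ' ')                   // strip remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  const scriptCount = (rawHtml.match(/<script\b/gi) || []).length;
  const spaShell = /<div id=["'](root|app)["']>\s*<\/div>/i.test(rawHtml);
  const hasBundles = /(main|bundle|chunk)[.-][0-9a-f]{6,}\.js/i.test(rawHtml);

  if (text.length < 100) return { jsDependent: true, reason: 'empty-shell', textLength: text.length };
  if (text.length < 300 && spaShell) return { jsDependent: true, reason: 'spa-shell', textLength: text.length };
  if (text.length < 500 && scriptCount > 10 && hasBundles) return { jsDependent: true, reason: 'js-heavy', textLength: text.length };
  return { jsDependent: false, reason: null, textLength: text.length };
}
```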

Detection Methods

  • JS dependency (v2 — catches ~16% of sites!): Raw HTML has <500 chars of text content. LLM bots can’t execute JS, so they see an empty page. This is the BIGGEST issue — 5x more common than robots.txt blocking.
  • robots.txt: Checks for Disallow: / for each bot’s user-agent
  • HTTP 403/401/503: Bot gets blocked status while browser gets 200
  • Cloudflare challenge: Detects cf-mitigated: challenge header, “Just a moment…” title, window._cf_chl_opt
  • Differential content: Bot gets <15% of browser’s content (served a different page)
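
The last two signals can be sketched directly from the descriptions above (helper names are mine):

```javascript
// Cloudflare challenge: header, title, or challenge script object,
// per the markers listed above.
function isCloudflareChallenge(headers, html) {
  return headers['cf-mitigated'] === 'challenge'
    || /<title>\s*Just a moment/i.test(html)
    || html.includes('window._cf_chl_opt');
}

// Differential content: bot sees under 15% of the browser's text.
function isDifferentialContent(botTextLen, browserTextLen) {
  return browserTextLen > 0 && botTextLen / browserTextLen < 0.15;
}
```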

Scripts & Integration Points

  • Core module: scripts/llm-access-check.js exports checkLlmAccess(url), which returns per-bot accessibility + jsAnalysis + platform + summary
    • New exports: extractTextContent(), analyzeJsDependency(), detectPlatform()
    • Returns jsAnalysis: { jsDependent, reason, textLength, spaShell, scriptCount }
    • Returns platform: { id, name } (ghl, wix, squarespace, react-spa, wordpress, etc.)
  • Bulk scanner (Pass 3): scripts/aeo-llm-check.js — Runs on all principals, outputs principals-llm-enriched.csv
    • Concurrency: 15, ~42 min for 19K domains
    • Output columns: chatgpt_can_see, gemini_can_see, perplexity_can_see, claude_can_see, llm_visible_count, llm_block_method, llm_js_dependent, llm_platform, llm_text_length, llm_summary
    • Summary JSON now includes js_dependency stats and platforms breakdown
  • Self-serve audit tool: Integrated into api-audit-routes.js runFastAudit(). Results shown in FREE tier (before email gate).
    • 4 cards (green check / red X per AI engine)
    • JS-dependent sites show “Blind” instead of “Blocked”, with “Invisible (JS-only site)” reason
    • Special alert box for JS-dependent: explains JavaScript rendering issue, names the platform
    • This is the killer cold email hook
  • Email gate now collects: email + phone + consent checkbox (required)
  • Full audit report: TODO — add LLM access section to aeo-audit.js output

Enrichment Pipeline Order (updated)

  1. Google Maps scrape
  2. Website enrichment
  3. Email guessing/verification
  4. Principal list generation
  5. AEO Quick-Scan Pass 1 (12 signal checks)
  6. AEO Quick-Scan Pass 2 (robots/schema content + AI bot analysis)
  7. AEO Quick-Scan Pass 3 (LLM accessibility — actual HTTP checks with bot UAs)
  8. Copy generation (uses AEO + LLM data for personalization)

Cold Email Angles (LLM-specific)

  • 0/4 visible: “AI search engines literally CANNOT see your website. Even if your AEO score is decent, it doesn’t matter if ChatGPT and Gemini can’t read your content.”
  • Some blocked: “Google’s Gemini AI can’t access your site, which means it can’t recommend you. Your competitors who ARE visible are getting those referrals.”
  • Cloudflare blocking: “Your Cloudflare security settings are blocking AI search engines. Easy fix, huge impact.”
  • JS-only: “Your website loads content with JavaScript, which AI bots can’t execute. They see an empty page.”
  • robots.txt: “Your robots.txt file is telling ChatGPT to go away. Literally.”