Build SEO-optimized pages at scale using templates and data. Create page generators that target keyword patterns and produce unique, valuable content for each variation.
User wants to create many SEO-driven pages from a template (e.g., "[product] vs [competitor]", "[service] in [city]")
User mentions programmatic SEO, template pages, directory pages, location pages, or comparison pages at scale
User wants to build an "alternatives to X" page set, integrations directory, or glossary
User has a data set they want to turn into individual landing pages
Auditing existing SEO issues (use seo-auditor skill)
Writing a single blog post or landing page (use content-machine skill)
One-off competitive analysis (use competitive-analysis skill)
Information gain is everything — Every page must contain information the user cannot find on existing top-10 results. Proper H1s, keyword density, and clean structure are baseline hygiene, not a ranking advantage. Pages that "look like SEO" but add no new information are what Google's spam systems now target.
Unique value per page — Every page must provide value specific to that page, not just swapped variables in a template. If you removed the entity name, would two pages be indistinguishable? If yes, don't publish.
Proprietary data wins — Hierarchy: proprietary > product-derived > user-generated > licensed > public (weakest). The strongest pSEO pages contain data no one else has — original tests, first-party metrics, real user reviews, proprietary calculations.
Subfolders, not subdomains — yoursite.com/templates/resume/, not templates.yoursite.com/resume/
Match search intent precisely — Pages must match the intent behind the query, not just contain the keywords. This means choosing the right page type (blog post vs. comparison table vs. tool vs. directory listing), answering the actual question, and structuring content the way users expect for that query type.
Quality over quantity — 100 great pages beat 10,000 thin ones. Scale without information gain is just industrialized noise. Google has more content than it needs — your pages must earn their place by being genuinely better than what already ranks.
Topical authority compounds — A site that demonstrates deep expertise in a specific topic earns trust signals that lift all pages in that topic cluster. Scattered, shallow coverage across unrelated topics builds no authority.
Before writing a single line of code, determine whether the existing application is a Single Page Application (SPA).
SPAs (React, Vue, Angular, Svelte) are effectively invisible to Googlebot. Googlebot's first crawl wave fetches raw HTML; JavaScript rendering is deferred to a second wave that is slow, resource-constrained, and unreliable at scale. A React component that renders <h1>Currency Converter</h1> only works in a browser — in the raw HTML, Googlebot sees an empty <div id="root"></div>.
Is the app a SPA (React/Vue/Angular)?
│
├── YES → SEO pages MUST be server-rendered (SSR)
│ Options:
│ A. Express/Fastify route returning a complete HTML string ← simplest for existing SPAs
│ B. Migrate to Next.js/Nuxt/SvelteKit with SSR support
│ C. Add a static site generator (Astro) alongside the SPA for SEO pages only
│
└── NO (already Next.js, Nuxt, etc.) → Use the framework's built-in SSR/SSG/ISR patterns (see Step 6)
The SPA trap: The most common pSEO mistake is adding routes to a React Router / Wouter / Vue Router app and assuming they will be indexed. They will not. Always verify by curl-ing the URL and checking if the content is present in the raw HTML response:
curl -s https://yoursite.com/your-seo-page | grep "target keyword"
# If empty or only returns <div id="root"></div> → not crawlable
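The curl check above can be automated across a sample of deployed URLs. A minimal sketch — the check itself is a pure function so it can be tested offline; the fetch loop in the comment is the assumed usage, with placeholder URLs:

```typescript
// Pure check: does the raw HTML (with no JS execution) contain the target content?
export function isCrawlable(rawHtml: string, targetKeyword: string): boolean {
  // Strip script bodies so a keyword buried inside a JS bundle doesn't count
  const withoutScripts = rawHtml.replace(/<script[\s\S]*?<\/script>/gi, "");
  return withoutScripts.toLowerCase().includes(targetKeyword.toLowerCase());
}

// Assumed usage after deploy (URLs and keywords are placeholders):
// const res = await fetch("https://yoursite.com/glossary/crawl-budget");
// if (!isCrawlable(await res.text(), "crawl budget")) {
//   console.error("NOT crawlable: /glossary/crawl-budget");
// }
```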
When building programmatic SEO for the user's own company, you will not have access to their internal data (customer stories, case studies, testimonials, product metrics, pricing, team bios, etc.). Do not fabricate this information.
Customer names, logos, or testimonials they want featured
Case study data (metrics, outcomes, quotes)
Product-specific details (features, pricing tiers, integrations list)
Any proprietary data that should populate template variables
Industry research and statistics (sourced via webSearch)
General descriptions of the problem/solution category
Feature explanations based on what's publicly visible on their site (use webFetch on their domain)
Placeholder blocks clearly marked [INSERT: customer testimonial] or [INSERT: case study metrics]
Comparison data pulled from public sources (G2, Capterra reviews via webSearch)
Never generate: fake customer quotes, fabricated ROI numbers, invented case studies, made-up testimonials, or fictional company metrics. These damage trust and can create legal liability.
For generic/research topics (e.g., "[city] cost of living", "[tool A] vs [tool B]", glossary terms), use webSearch to gather real data and cite sources.
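The [INSERT: ...] placeholders above should act as a publish gate: no page ships while any marker is unresolved. A minimal sketch of that check (the marker format is assumed from the convention above):

```typescript
// Find unresolved [INSERT: ...] placeholders in generated HTML.
const PLACEHOLDER = /\[INSERT:\s*([^\]]+)\]/g;

export function findPlaceholders(html: string): string[] {
  return [...html.matchAll(PLACEHOLDER)].map((m) => m[1].trim());
}

// A page with any unresolved placeholder is held back, not published.
export function readyToPublish(html: string): boolean {
  return findPlaceholders(html).length === 0;
}
```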
This is the most important section in this skill. AI excels at scaling execution — drafting content, structuring pages, generating variations. But it cannot make the strategic decisions that determine whether a pSEO campaign succeeds or fails. These decisions must come from the user (or a human SEO strategist), and the agent must actively prompt for them rather than making assumptions.
Before building anything, the user must answer (or the agent must ask):
"best CRM for real estate" → listicle/comparison (not a product page)
"HubSpot vs Salesforce" → head-to-head comparison (not a blog post)
"what is a CRM" → educational/glossary (not a product page)
"CRM pricing" → pricing table with real numbers (not a blog post about pricing)
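The intent mappings above can be encoded as data the page generator consults, so the page type is chosen per query pattern rather than hardcoded. A sketch — the patterns and type names are illustrative, not a complete taxonomy:

```typescript
type PageType = "listicle" | "comparison" | "glossary" | "pricing";

// Illustrative rules mapping query patterns to the page type that matches intent.
const INTENT_RULES: Array<{ pattern: RegExp; pageType: PageType }> = [
  { pattern: /^best .+ for /i, pageType: "listicle" },
  { pattern: / vs\.? /i, pageType: "comparison" },
  { pattern: /^what is /i, pageType: "glossary" },
  { pattern: / pricing$/i, pageType: "pricing" },
];

export function pageTypeFor(query: string): PageType | null {
  const rule = INTENT_RULES.find((r) => r.pattern.test(query));
  return rule ? rule.pageType : null; // null → no rule matched; decide manually
}
```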
What information gain does this page offer over the current top 10? If the answer is "nothing — we're just rewriting what's already there with AI," do not build the page. Specifically ask: do we have original data, unique analysis, proprietary metrics, real case studies, or first-hand experience that the current results lack?
Does the user's site have topical authority in this area? A brand-new domain publishing 200 glossary pages on a topic it has no track record in will be ignored. Topical authority is built through depth (many related, high-quality pages), external trust signals (backlinks from relevant sites), and user engagement over time.
What is the commercial intent and conversion path? Pages that attract traffic but have no clear next step for the user waste crawl budget and dilute site quality. Every pSEO page should have a purpose beyond "get impressions."
Always ask the user what unique data, experience, or perspective they can bring to the topic before generating content. If they say "just write something about X," push back and explain that pages without information gain will not rank.
Never assume scale equals strategy. If the user asks for "500 SEO articles," the correct response is to first validate whether 500 pages is the right number, whether the topics have demand, and whether the user has enough unique data to make each page valuable.
Recommend starting small. Build 10-20 high-quality pages first, validate they get indexed and earn impressions, then scale what works. This is the opposite of the "blast 500 pages and hope" approach.
Be explicit about what AI is doing vs. what requires human input. AI writes the page. The human decides whether the page should exist.
A page can have perfect H1 tags, clean H2/H3 hierarchy, optimized meta descriptions, proper schema markup, good keyword density, and fast load times — and still rank nowhere. These are table stakes, not differentiators. Google's ranking systems evaluate whether a page genuinely satisfies the user's query better than alternatives. Structure without substance is what practitioners call "SEO cosplay" — it looks like SEO but doesn't perform.
The agent must not conflate technical SEO hygiene with content quality. Both are necessary. Neither is sufficient alone.
Google's systems increasingly evaluate "information gain" — whether a page adds something new to the corpus of existing results for a query. This is the single most important concept in modern pSEO.
| Source of gain | Example | Strength |
|---|---|---|
| Original data / metrics | Your own A/B test results, proprietary benchmarks, internal analytics | Strongest — impossible to replicate |
| First-hand experience | Actual product reviews you conducted, real case studies with named clients | Very strong — hard to fake |
| Unique analysis | Novel comparisons, calculated scores, derived insights from raw data | Strong — requires expertise |
| Curated judgment | Expert picks, opinionated rankings with reasoning, "here's what we'd actually use" | Moderate — requires credibility |
| Structured aggregation | Data from multiple sources combined into a single useful view (with attribution) | Moderate — useful but reproducible |
| Rewritten common knowledge | The same information available on 50 other sites, just rephrased | Zero — this is what gets penalized |
Before publishing any pSEO page set, run this test on 5 random pages:
Search Google for the target keyword
Read the top 3 results
Read your generated page
Ask: "Does our page contain at least one piece of information, data point, or insight that none of the top 3 results have?"
If no → do not publish. Improve the data source or reduce the page set to only pages where you have genuine information gain.
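Picking the 5 pages for this spot check should be random, not cherry-picked. A small helper for drawing a distinct sample (the injectable `rand` parameter is only there to make the sketch testable):

```typescript
// Draw n distinct pages at random from the generated set for the manual spot check.
export function samplePages<T>(
  pages: T[],
  n: number,
  rand: () => number = Math.random,
): T[] {
  const pool = [...pages]; // copy so the original set is untouched
  const picked: T[] = [];
  while (picked.length < n && pool.length > 0) {
    const i = Math.floor(rand() * pool.length);
    picked.push(pool.splice(i, 1)[0]); // remove so each page is picked once
  }
  return picked;
}
```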
Google now explicitly targets "scaled content abuse" — content produced at scale (whether by AI, humans, or automation) that exists primarily to manipulate rankings rather than help users. The pattern Google detects is:
Many pages with similar structure
Smooth, readable prose with no new information
Comprehensive coverage that is really just reorganized common knowledge
High page count with no corresponding trust signals or user engagement
Each page is readable, but after reading it you've learned nothing new
This pattern triggers penalties regardless of whether AI wrote the content. A human content farm producing the same pattern gets the same treatment. The issue is not AI — it is the absence of value at scale.
| Playbook | URL pattern | Who does it | Scale |
|---|---|---|---|
| Integrations | /apps/[A]/integrations/[B] | Zapier — ~56k pages, 5.8M+ monthly organic visits, ranks for 1.3M keywords. Proprietary data (triggers/templates per app pair) no one else can replicate. | N² combinations |
| Conversions | /currency-converter/[from]-to-[to]-rate | Wise — 8.5M pages across locale subfolders, 60M+ monthly visits. Live exchange-rate data + fee calculators = unique value per page. | N² × locales |
| Locations | /Restaurants-[city], /[cuisine]-Restaurants-[city], /Restaurants-[neighborhood] | Tripadvisor — 700M+ pages, 226M+ monthly visits. UGC reviews keep pages fresh; layered matrix (city × cuisine × neighborhood). | city × category × modifier |
| Data profiles | /[city-slug] | Nomad List — cost-of-living, internet speed, safety scores per city. Pages are pure data tables — minimal prose, high value. | N entities |
| Comparisons | /[A]-vs-[B], /alternatives/[A] | G2, Capterra — "vs" pages + "alternatives" pages, populated by user reviews. | N² / 2 |
| Templates | /templates/[type] | Canva, Notion — each template is a landing page. | N types |
| Glossary | /learn/[term] | Ahrefs, HubSpot — definition pages cluster topical authority. | N terms |
| How-to guides | /guides/[task]-with-[tool] | Documentation sites — step-by-step guides, HowTo schema | N tasks × M tools |
| Personas | /[product]-for-[audience] | "CRM for real estate agents" | N × M |
The test: If your data doesn't meaningfully change between page variations, don't build it. Zapier works because Slack+Asana genuinely differs from Slack+Trello. "Plumber in Austin" vs "Plumber in Dallas" with identical boilerplate = thin content penalty.
Layer playbooks for long-tail: Tripadvisor's "Best Italian Restaurants in Chinatown NYC" = curation × cuisine × neighborhood.
Every successful pSEO example above shares one trait: the data genuinely changes between pages. Zapier's Slack+Asana page has different triggers, different actions, and different templates than Slack+Trello. Wise's USD-to-EUR page has a different exchange rate, different fee structure, and different historical chart than USD-to-GBP. Tripadvisor's pages have different restaurants, different reviews, and different ratings per city.
Copycats fail because they replicate the URL structure without replicating the information gain. Creating 500 "/service-in-city" pages where only the city name changes and the rest is identical prose is not programmatic SEO — it is spam at scale. Google's systems detect this pattern reliably: many pages, similar structure, no meaningful data variation.
Before choosing a playbook, ask: "Do I have genuinely different data for each page variation, or am I just swapping one variable into the same template?" If the latter, reduce the page count to only variations where you have real data, or choose a different playbook entirely.
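The "remove the entity name — are two pages indistinguishable?" test can be approximated automatically: strip each page's entity name, then measure word overlap between pages. A sketch using Jaccard similarity; the 0.9 threshold is an assumption, not an established cutoff:

```typescript
// Tokenize a page after removing its entity name (city, product, etc.).
function tokens(text: string, entity: string): Set<string> {
  const cleaned = text.toLowerCase().split(entity.toLowerCase()).join(" ");
  return new Set(cleaned.split(/\W+/).filter(Boolean));
}

// Jaccard similarity of two pages with their entity names stripped.
// A score near 1.0 means the pages are the same template with a swapped variable.
export function boilerplateScore(
  pageA: string, entityA: string,
  pageB: string, entityB: string,
): number {
  const a = tokens(pageA, entityA);
  const b = tokens(pageB, entityB);
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}
```

Run it over random page pairs before publishing; pairs scoring above ~0.9 are candidates for the thin-content cut.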
Identify the repeating structure and variables
Count how many unique combinations exist
Validate demand: aggregate search volume, distribution (head vs. long tail), trend direction
What data populates each page?
Is it first-party, scraped, licensed, or public?
How is it updated and maintained?
H1 with target keyword
Unique intro (not just variables swapped — conditional content based on data)
Data-driven sections with original insights/analysis per page
Related pages / internal links
CTAs appropriate to intent
Conditional content blocks that vary based on data attributes
Calculated or derived data (not just raw display)
Editorial commentary unique to each entity
User-generated content where possible
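The template components above can be sketched as a generator where sections render conditionally from data attributes, so pages diverge in structure rather than just swapping one variable. The `Integration` shape and thresholds are hypothetical:

```typescript
// Hypothetical entity shape for an integrations-style page.
interface Integration {
  name: string;
  triggers: string[];
  actions: string[];
  freeTier: boolean;
}

// Conditional blocks: sections appear only when the data supports them.
export function renderSections(e: Integration): string[] {
  const sections: string[] = [];
  // Always present: the core data-driven section
  sections.push(
    `<h2>${e.name} triggers</h2><ul>${e.triggers.map((t) => `<li>${t}</li>`).join("")}</ul>`,
  );
  // Derived content: only claim "popular actions" when there is enough data
  if (e.actions.length > 3) {
    sections.push(`<h2>Popular ${e.name} actions</h2>`);
  }
  // Attribute-conditional block
  if (e.freeTier) {
    sections.push(`<p>${e.name} offers a free tier.</p>`);
  }
  return sections;
}
```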
Hub: Main category page (e.g., "/integrations/")
Spokes: Individual programmatic pages (e.g., "/integrations/slack-asana/")
Cross-links between related spokes
Every page must be reachable from the main site. Update the main app's footer and navigation to include links to SEO hub pages — SEO pages that aren't linked from the main site are orphan pages that Google may never discover or trust.
Include XML sitemap and breadcrumbs with structured data.
Prioritize high-volume patterns for initial crawling
Noindex very thin variations rather than publishing them
Manage crawl budget (separate sitemaps by page type)
Monitor indexation rate in Search Console
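Separate sitemaps per page type make indexation rates visible per template in Search Console. A minimal sketch of the generation step (URLs are placeholders; `lastmod`/`changefreq` omitted for brevity):

```typescript
// Build one XML sitemap from a list of absolute URLs.
export function buildSitemap(urls: string[]): string {
  const entries = urls.map((u) => `  <url><loc>${u}</loc></url>`).join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}\n</urlset>`;
}

// One sitemap file per page type, e.g. sitemap-glossary.xml, sitemap-comparisons.xml.
export function buildSitemapsByType(pages: Record<string, string[]>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [type, urls] of Object.entries(pages)) {
    out[`sitemap-${type}.xml`] = buildSitemap(urls);
  }
  return out;
}
```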
| Page count | Data freshness | Strategy |
|---|---|---|
| <1,000 | Rarely changes | SSG — pre-render everything at build |
| 1,000-100,000 | Changes daily/weekly | ISR — pre-render popular subset, generate rest on-demand + cache |
| 100,000+ or live data | Real-time (prices, rates) | ISR with short revalidate or SSR |
SSG is fastest but build time scales linearly — 50k pages can mean 30+ min builds. ISR is the pSEO sweet spot: instant deploys, pages generate on first request then cache.
When adding pSEO to an existing SPA, the simplest approach is adding Express routes that return complete HTML strings. Organize into three layers:
// server/ssrShared.ts — create before writing any individual page
interface ShellOptions {
  title: string;
  description: string;
  canonical: string;
  schemaJson?: string; // JSON-LD structured data, if any
  css?: string;        // page-specific styles appended to shared CSS
  body: string;
}

export function ssrHtmlShell({ title, description, canonical, schemaJson, css, body }: ShellOptions): string {
  return `<!DOCTYPE html>
<html lang="...">
  <head>
    <meta charset="UTF-8" />
    <title>${title}</title>
    <meta name="description" content="${description}" />
    <link rel="canonical" href="${canonical}" />
    <!-- OG tags, fonts -->
    ${schemaJson ? `<script type="application/ld+json">${schemaJson}</script>` : ""}
    <style>${sharedCss()}${css ?? ""}</style>
  </head>
  <body>
    ${header()}
    ${body}
    ${footer()}
  </body>
</html>`;
}
Create this shared shell before writing any individual page. Every new SEO page reuses it. This ensures consistent branding (header, footer, fonts, base CSS) across all SSR pages without duplication.
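A page generator built on the shell only supplies what differs per page. The sketch below inlines a stripped-down stand-in for ssrHtmlShell so it is self-contained; the real shell (with header, footer, shared CSS) lives in server/ssrShared.ts as above. URLs and copy are placeholders:

```typescript
// Stand-in for the shared shell, stripped to the fields this example uses.
function ssrHtmlShell(opts: { title: string; description: string; canonical: string; body: string }): string {
  return `<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8" /><title>${opts.title}</title><meta name="description" content="${opts.description}" /><link rel="canonical" href="${opts.canonical}" /></head><body>${opts.body}</body></html>`;
}

// A page generator only supplies what differs; branding lives in the shell.
export function crawlBudgetPage(): string {
  return ssrHtmlShell({
    title: "What Is Crawl Budget? | Example Glossary",
    description: "Crawl budget is the number of pages a search engine will crawl on your site in a given window.",
    canonical: "https://example.com/learn/crawl-budget",
    body: "<h1>What Is Crawl Budget?</h1><p>...</p>",
  });
}
```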
// server/glossary.ts
export const TERMS: GlossaryTerm[] = [ /* structured data */ ];
export function getTermHtml(slug: string, logoBase64: string): string | null { ... }
export function getTermIndexHtml(logoBase64: string): string { ... }
Keep data as typed arrays/objects in the same file as the generator. This makes content easy to update without touching routing logic.
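A sketch of what that colocated data might look like. The GlossaryTerm shape, the sample entry, and the lookup helper are assumptions — the actual fields depend on what the page template renders:

```typescript
// Assumed shape for the TERMS array referenced above.
export interface GlossaryTerm {
  slug: string;
  term: string;
  definition: string;
  related: string[]; // slugs, used for spoke-to-spoke internal links
}

export const TERMS: GlossaryTerm[] = [
  {
    slug: "crawl-budget",
    term: "Crawl budget",
    definition: "The number of pages a search engine will crawl on a site within a given timeframe.",
    related: ["indexation"],
  },
  // ...one entry per glossary page
];

// Lookup used by the route handler; null → 404.
export function findTerm(slug: string): GlossaryTerm | null {
  return TERMS.find((t) => t.slug === slug) ?? null;
}
```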
// in routes.ts
app.get("/glossary/:slug", (req, res) => {
const html = getTermHtml(req.params.slug, LOGO_BASE64);
if (!html) { res.status(404).end(); return; }
res.setHeader("Cache-Control", "public, max-age=86400");
res.setHeader("Content-Type", "text/html; charset=utf-8");
res.send(html);
});
Set cache headers based on data freshness — not one-size-fits-all:
| Page type | Recommended header | Reason |
|---|---|---|
| Static content (glossary, guides) | public, max-age=86400 | Content rarely changes; CDN can serve |
| Live data (exchange rates, prices) | no-cache or s-maxage=60 | Must be fresh; stale data damages credibility |
| Semi-dynamic (weekly updates) | public, s-maxage=3600, stale-while-revalidate=86400 | Balance freshness vs. load |
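The table above can be centralized as a small helper so routes never hardcode header strings. The freshness category names are illustrative:

```typescript
type Freshness = "static" | "live" | "semi";

// Maps the freshness categories above to concrete Cache-Control values.
const CACHE_HEADERS: Record<Freshness, string> = {
  static: "public, max-age=86400",
  live: "s-maxage=60",
  semi: "public, s-maxage=3600, stale-while-revalidate=86400",
};

export function cacheHeaderFor(freshness: Freshness): string {
  return CACHE_HEADERS[freshness];
}

// Assumed usage in a route handler:
// res.setHeader("Cache-Control", cacheHeaderFor("static"));
```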
export async function generateStaticParams() {
const popular = await db.query('SELECT slug FROM entities ORDER BY search_volume DESC LIMIT 500');
return popular.map(e => ({ slug: e.slug }));
}
export const dynamicParams = true;
export const revalidate = 3600;
export async function generateMetadata({ params }) {
const { slug } = await params;
const entity = await getEntity(slug);
return {