Run the AI Voice Analyzer on blog content to detect AI-sounding patterns and get actionable rewrite suggestions. Use when reviewing or improving blog articles before publishing.
Analyze blog articles for AI-sounding patterns using NLP, then use the diagnostics to improve the writing before publishing. The goal is content that reads as naturally as strong human writing and avoids being flagged by Google's helpful content signals.
Invoked by the user with /blog-voice-analyzer or when asked to "check" or "analyze" a blog post for AI patterns.
```shell
pipenv run python3 scripts/blog/ai_voice_analyzer.py <path-to-markdown-file>
```
The script accepts any text file (markdown, plain text). It strips frontmatter, HTML, and markdown formatting before analysis.
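The cleanup step can be pictured with a minimal stdlib sketch. This is illustrative only, not the script's actual implementation; the function name and regexes here are assumptions about the kind of stripping described above.

```python
import re

def strip_to_prose(text: str) -> str:
    """Illustrative cleanup: YAML frontmatter, HTML tags, common markdown."""
    # Drop YAML frontmatter delimited by --- fences at the top of the file
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    # Drop HTML tags
    text = re.sub(r"<[^>]+>", "", text)
    # Drop fenced code blocks, then inline code
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    text = re.sub(r"`[^`]*`", "", text)
    # Unwrap links: [label](url) -> label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Strip heading markers and emphasis
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    text = re.sub(r"[*_]{1,3}", "", text)
    return text
```

Whatever survives this pass is what the 18 checks below actually score.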
The analyzer runs 18 independent checks organized into four categories. Each scores 0-100 (higher = more human), and the overall score is a weighted average.
| Check | What It Measures | AI Signature | Human Signature |
|---|---|---|---|
| Sentence length variance | Std dev of word counts per sentence | Clusters around 15-20 words (std dev < 4) | Wild variation: 3-word fragments mixed with 30-word sentences (std dev 8+) |
| Sentence opener diversity | POS patterns of first 2 tokens in each sentence | 40%+ start with "The" or "This" | Fragments, questions, conjunctions, inversions, prepositional phrases |
| Clause depth variety | Max dependency tree depth per sentence | Uniform depth across sentences | Mix of flat simple sentences and deeply nested complex ones |
| Paragraph size variety | Coefficient of variation of paragraph word counts | Every paragraph roughly the same length | One-liners mixed with long blocks |
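The sentence-length variance check in the table above can be approximated with the stdlib alone. A hedged sketch: the real analyzer uses spaCy's sentence segmenter, while the regex split here is a naive stand-in.

```python
import re
import statistics

def sentence_length_stdev(text: str) -> float:
    """Std dev of words-per-sentence; a naive regex splitter stands in
    for spaCy's sentence segmentation."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

# Illustrative inputs: uniform sentences score ~0, varied ones score high.
monotone = "This is a sentence of seven words here. " * 5
varied = ("Short. This one is quite a bit longer, stretching well past "
          "twenty words before it finally comes to a stop. Why? "
          "Because variety reads human.")
```

A std dev below 4 is the AI signature; 8 or more reads as human variation.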
| Check | What It Measures | AI Signature | Human Signature |
|---|---|---|---|
| Vocabulary diversity (TTR) | Type-token ratio of content words | Low TTR (< 45%) — recycles same words | Higher TTR, though some writers deliberately use simple vocabulary |
| Hedge/filler phrases | Exact match against ~50 AI-marker phrases + ~35 signal words | "It's important to note", "multifaceted", "leverage", "delve", "cornerstone" | Zero matches |
| Weak adverbs | Density of "really", "very", "literally", "significantly", etc. | > 1% density | Replaced with stronger verbs or cut entirely |
| Nominalization density | Nouns ending in -tion, -ment, -ness, -ity, -ence, -ance | > 5% — "reduction", "transition", "consumption" instead of active verbs | < 3% — prefers "reduce", "shift to", "consume" |
| Vague verb phrases | "contributes to", "remains a", "poses a", "provides a", "aims to", etc. | 4-6+ per article | Zero — uses direct assertions |
| Word repetition | Content words exceeding expected frequency (topic words get higher threshold) | "Substantial" 4x in 300 words | Topic words may repeat naturally; non-topic words stay varied |
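The nominalization check above is easy to approximate by suffix alone. A sketch under stated assumptions: the real analyzer restricts the count to spaCy-tagged nouns, so this suffix-only version over-counts words like "experience" used as a verb.

```python
def nominalization_density(words):
    """Share of words ending in common nominalization suffixes.
    Suffix-only heuristic; the real check also requires a noun POS tag."""
    suffixes = ("tion", "ment", "ness", "ity", "ence", "ance")
    hits = [w for w in words if w.lower().endswith(suffixes)]
    return len(hits) / max(len(words), 1)

# Illustrative inputs mirroring the table's examples.
heavy = "The reduction in consumption requires the implementation of a transition".split()
light = "We cut how much we use and shift to a new plan".split()
```

The heavy example lands well above the 5% AI threshold; the rewritten version lands under 3%.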
| Check | What It Measures | AI Signature | Human Signature |
|---|---|---|---|
| Personal voice | First person ("I", "we"), second person ("you"), contractions ("don't", "it's") | Zero of all three | First person for opinions, second person for engagement, contractions for warmth |
| Questions asked | Sentences ending with ? | Zero questions — pure declaration | 5-10% of sentences are questions (rhetorical or direct) |
| Concrete specifics (NER) | Named entities: people, places, dates, numbers, orgs | Zero — everything abstract and generic | Names, dates, numbers, real examples |
| Readability register | Flesch-Kincaid grade + avg syllables per word | Grade 14+ (academic), avg syllables > 1.8 | Grade 6-10 (conversational), avg syllables < 1.6 |
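The personal-voice check above reduces to counting three marker classes. A minimal sketch, with assumptions flagged: the pronoun sets and the apostrophe-suffix contraction heuristic are illustrative, not the script's exact lists.

```python
import re

def personal_voice_counts(text: str) -> dict:
    """Counts of the three personal-voice markers: first person,
    second person, contractions. Heuristic word lists, not the
    analyzer's exact ones."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "first_person": sum(w in {"i", "we", "me", "us", "my", "our"} for w in words),
        "second_person": sum(w in {"you", "your", "yours"} for w in words),
        "contractions": sum(w.endswith(("'t", "'s", "'re", "'ve", "'ll", "'d")) for w in words),
    }
```

A zero on all three counts is the AI signature this dimension penalizes.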
| Check | What It Measures | AI Signature | Human Signature |
|---|---|---|---|
| Passive voice | Dependency labels nsubjpass / auxpass | > 15% of sentences | < 10% |
| Transition word openers | "However", "Furthermore", "Additionally" at sentence start | > 0.5 per paragraph | Let ideas flow without signposting |
| Triple-item lists | "X, Y, and Z" coordinated patterns | > 2 per 1000 words | Not everything comes in threes |
| Paired adjective cliches | "ADJ and ADJ" via dependency parse ("smooth and swift", "widespread and uniform") | > 3 per 1000 words | Picks the stronger word |
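The triple-item list check above can be roughed out with a regex. Hedged sketch: the real check uses a dependency parse and also catches coordinated forms this Oxford-comma-only pattern misses.

```python
import re

TRIPLE = re.compile(r"\b\w+, \w+, and \w+\b")

def triple_lists_per_1000_words(text: str) -> float:
    """Density of 'X, Y, and Z' patterns per 1000 words.
    Regex approximation of the analyzer's dependency-based check."""
    n_words = len(text.split())
    return len(TRIPLE.findall(text)) / max(n_words, 1) * 1000
```

More than 2 per 1000 words is the AI signature for this dimension.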
| Range | Interpretation |
|---|---|
| 75-100 | Reads naturally. Minor tweaks on flagged items. |
| 55-74 | Some AI patterns visible. Targeted rewrites recommended. |
| 35-54 | Clear AI voice. Significant rewriting needed. |
| 0-34 | Strongly AI-generated. Full rewrite recommended. |
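The bands above map directly to a lookup. A trivial sketch of that mapping (function name illustrative):

```python
def interpret_score(score: float) -> str:
    """Map an overall 0-100 score to the interpretation bands."""
    if score >= 75:
        return "Reads naturally"
    if score >= 55:
        return "Some AI patterns visible"
    if score >= 35:
        return "Clear AI voice"
    return "Strongly AI-generated"
```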
Only sentences with 2+ issues are flagged (reduces noise). Each is labeled HIGH/MED severity with specific diagnostics like:
```text
[HIGH] #13:
"The variability in charging station availability, especially in rural areas, poses a challenge..."
→ Vague verb: "poses a"
→ Nominalization-heavy (4): variability, station, availability, distance
→ Length (20w) ≈ average (21w)
```
These are the sentences to rewrite first.
The summary lists the 5 worst-scoring dimensions with specific actions. Address these in order.
```shell
pipenv run python3 scripts/blog/ai_voice_analyzer.py path/to/article.md
```
Work through the priority fixes list. The most impactful improvements by category:
If Personality scores low (< 70): add first-person opinions, address the reader as "you", and use contractions.
If Questions score low (< 50): convert a few flat declarations into rhetorical or direct questions.
If Readability scores low (< 70): shorten sentences and swap multi-syllable words for plain ones; aim for grade 6-10.
If Nominalization scores low (< 70): replace -tion/-ment/-ness nouns with their verb forms ("reduce", not "reduction").
If Hedge Phrases are detected: cut them and state the point directly.
If Vague Verbs are detected: replace "poses a", "contributes to", and similar with concrete assertions.
If Sentence Variance is low (< 70): break up same-length runs; mix short fragments with longer sentences.
```shell
pipenv run python3 scripts/blog/ai_voice_analyzer.py path/to/article-v2.md
```
Target: overall score > 80, zero HIGH-severity flagged sentences.
To compare how different AI models perform on similar prompts:
```shell
pipenv run python3 scripts/blog/benchmark_models.py
```
This generates 3 articles per model (Claude Sonnet, Haiku; GPT-4o, 4o-mini, o3-mini) on psychology/productivity topics and scores them all. Edit TOPICS and MODELS in the script to customize.
Things the analyzer catches well: surface-level statistical tells such as monotone structure, stock AI phrases, missing personal voice, and abstract, entity-free prose.
Things the analyzer does NOT catch: factual errors, weak arguments, or prose that is statistically varied but still hollow. A high score is necessary, not sufficient.
From testing across known human and AI text:
| Text | Score | Notes |
|---|---|---|
| Paul Graham essays | 85-86 | Gold standard for clear, human prose |
| Clarido blog post (AI-written, edited) | 84 | Well-crafted AI output with personality |
| Claude Haiku (default prompt) | 82 avg | Best out-of-the-box AI model |
| Claude Sonnet (default prompt) | 77 avg | Higher grade level, more nominalizations |
| GPT-4o-mini (default prompt) | 70 avg | Low personality, no contractions |
| o3-mini (default prompt) | 68 avg | Zero contractions, impersonal |
| GPT-4o (default prompt) | 66 avg | Worst personality, fewest questions |
| Raw ChatGPT (no prompting) | 40 | Clear AI voice across all dimensions |
Target for published content: 80+ with zero HIGH-severity flagged sentences.
Dependencies: `spacy` and `textstat` (Python, installed via pipenv). Requires the `en_core_web_sm` spaCy model.
How scoring works: Each of the 18 dimensions scores 0-100. The overall score is a weighted average with personality (0.12), hedge phrases (0.10), entity density (0.08), sentence variance (0.08), and vague verbs (0.07) weighted highest. Full weight table is in scripts/blog/ai_voice_analyzer.py in the analyze() function.
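A sketch of that weighted average, assuming only the five weights stated above; the other thirteen values are in the script, not reproduced here.

```python
def overall_score(scores: dict, weights: dict) -> float:
    """Weighted average over the supplied dimensions; weights are
    renormalized so a partial set need not sum to 1."""
    total_w = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_w

# Only these five weights are documented; the remaining thirteen live
# in analyze() in scripts/blog/ai_voice_analyzer.py.
TOP_WEIGHTS = {
    "personality": 0.12,
    "hedge_phrases": 0.10,
    "entity_density": 0.08,
    "sentence_variance": 0.08,
    "vague_verbs": 0.07,
}
```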
Sentence flagging threshold: Only sentences with 2+ co-occurring issues are flagged. Single-issue sentences are not surfaced to reduce noise. The "length close to average" flag only triggers when the document's overall sentence length std dev is below 6 (indicating genuine monotony).
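The flagging rule above can be sketched as follows. The 3+-issues-means-HIGH cutoff is an assumption for illustration; the script's exact severity rule may differ.

```python
def flag_sentences(diagnostics, doc_stdev):
    """diagnostics: list of (sentence, [issue, ...]) pairs.
    Flags only sentences with 2+ co-occurring issues; the
    'length near average' issue only counts when the document's
    sentence-length std dev is below 6 (genuine monotony).
    Severity cutoff (HIGH at 3+ issues) is illustrative."""
    flagged = []
    for sentence, issues in diagnostics:
        if doc_stdev >= 6:
            issues = [i for i in issues if i != "length_near_average"]
        if len(issues) >= 2:
            severity = "HIGH" if len(issues) >= 3 else "MED"
            flagged.append((severity, sentence, issues))
    return flagged

# Illustrative diagnostics: only B always qualifies; C depends on doc_stdev.
EXAMPLE = [
    ("Sentence A.", ["vague_verb"]),
    ("Sentence B.", ["vague_verb", "nominalization_heavy"]),
    ("Sentence C.", ["vague_verb", "length_near_average"]),
]
```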
| File | Purpose |
|---|---|
| `scripts/blog/ai_voice_analyzer.py` | Main analyzer; run on any text file |
| `scripts/blog/benchmark_models.py` | Generate and score articles across Claude and GPT models |
| `tmp/benchmark/` | Cached generated articles and `results.json` from benchmark runs |