Name: The Research Question
Author: ADu2021

The Research Question

Empirical analysis revealing that LLMs produce post-conventional moral reasoning (Kohlberg Stages 5-6) regardless of size or prompting—inverse of human developmental patterns (Stage 4 dominant). Finds moral ventriloquism: models acquire rhetorical conventions of mature moral reasoning without developmental trajectory. Key evidence: action-justification decoupling (models produce Stage 5+ vocabulary while selecting Stage 2-3 actions), identical responses to semantically distinct dilemmas (ICC > 0.90), and prompting insensitivity (p=0.15). Reveals LLMs sound sophisticated without genuine moral reasoning. Trigger: When evaluating LLM moral capabilities or reasoning sophistication, apply this analytical framework to detect moral ventriloquism and distinguish rhetorical sophistication from actual moral coherence.

ADu20212 estrellas26 mar 2026

Ocupación
Categorías: Filosofía y Ética

The Research Question

What question is this paper answering?

Do large language models genuinely reason about moral dilemmas, or do they merely produce sophisticated-sounding rhetoric without coherent underlying reasoning? Specifically: how does moral reasoning in LLMs compare to human moral development?

Why practitioners care:

Moral reasoning is central to LLM alignment and safety. If models produce moral rhetoric without genuine reasoning:

Alignment claims are suspect (appears safe but lacks coherence)
Prompt-based steering is unreliable (superficial, not principled)
Trusting model justifications for decisions is risky
Fine-tuning for "values" targets surface patterns, not underlying reasoning

What do people commonly believe?

Conventional assumption: RLHF and alignment training teach models genuine moral reasoning similar to humans. Larger models reason more sophisticatedly. Prompting influences moral judgments in principled ways.

The Analytical Approach: Developmental Psychology Framework

The Research Question

ADu20212 estrellas26 mar 2026

Ocupación
Categorías: Filosofía y Ética

The Research Question

What question is this paper answering?

Why practitioners care:

Moral reasoning is central to LLM alignment and safety. If models produce moral rhetoric without genuine reasoning:

Alignment claims are suspect (appears safe but lacks coherence)

Prompt-based steering is unreliable (superficial, not principled)

Trusting model justifications for decisions is risky

Fine-tuning for "values" targets surface patterns, not underlying reasoning

What do people commonly believe?

Scenario	Should I Use This?	Why / Why Not?
Evaluating model safety for deployment	Yes, absolutely	Detects ventriloquism that could mask unsafe behavior
Designing RLHF objectives	Yes, as diagnostic	Identifies whether training is teaching reasoning or rhetoric
Explaining model decisions to stakeholders	Yes, for transparency	Reveals whether explanations are genuine or boilerplate
Studying moral philosophy	Partially	Insights about LLM behavior, limited philosophical implications
Understanding alignment training effectiveness	Yes, useful	Shows RLHF is effective at behavior steering, not reasoning development
Comparing model architectures	Yes, for assessment	Coherence profile reveals reasoning quality independent of size

Coherence Level	Trust for Safety-Critical?	Why
> 0.8 (high)	Yes, with caution	Reasoning and action aligned; likely genuine reasoning
0.6 - 0.8	Limited	Mixed signals; verify critical decisions independently
< 0.6 (low)	No	Ventriloquism detected; reasoning doesn't match action

The Research Question

The Research Question

The Analytical Approach: Developmental Psychology Framework

The Research Question

The Research Question

The Analytical Approach: Developmental Psychology Framework

Controls & Confounds: Isolating Ventriloquism

Core Findings: Moral Ventriloquism Evidence

Practitioner Implications

1. Don't Trust Moral Justifications at Face Value

2. Alignment Training Targets Surface Patterns, Not Underlying Reasoning

3. Develop Coherence-Based Diagnostics

4. Prompt Engineering for Moral Reasoning Doesn't Work

Methodology You Can Reuse

Limitations & Caveats

When to Apply Moral Coherence Analysis

Decision Table: When to Trust Model Moral Reasoning

Reference

Axiom

Matematico Tao

Seo Fundamentals

Yann Lecun Debate

Yann Lecun Filosofia

Explain Like Socrates

The Research Question

The Research Question

The Analytical Approach: Developmental Psychology Framework

The Research Question

The Research Question

The Analytical Approach: Developmental Psychology Framework

Controls & Confounds: Isolating Ventriloquism

Core Findings: Moral Ventriloquism Evidence

Practitioner Implications

1. Don't Trust Moral Justifications at Face Value

2. Alignment Training Targets Surface Patterns, Not Underlying Reasoning

3. Develop Coherence-Based Diagnostics

4. Prompt Engineering for Moral Reasoning Doesn't Work

Methodology You Can Reuse

Limitations & Caveats

Related Mechanistic Questions

When to Apply Moral Coherence Analysis

Decision Table: When to Trust Model Moral Reasoning

Reference

Axiom

Matematico Tao

Seo Fundamentals

Yann Lecun Debate

Yann Lecun Filosofia

Explain Like Socrates