Analyze EDSL Results objects - load by UUID or file path, export survey documentation, and generate analysis reports
Load an EDSL Results object from Expected Parrot (by UUID) or from a local file (results.json.gz), export documentation files, and generate a comprehensive analysis report.
/edsl-analyze-results <uuid-or-path>
Examples:
/edsl-analyze-results 123e4567-e89b-12d3-a456-426614174000
/edsl-analyze-results ./my_experiment/results.json.gz
After loading the results, the skill will ask you to choose an analysis focus (full analysis, summary statistics, cross-tabulation, or a specific custom focus).
IMPORTANT: Always ask this question, even if the user provided a query with the UUID. The question helps clarify what type of analysis they want. If they select "Specific focus", follow up to get their specific question or hypothesis.
Determine if the input is:

- A UUID (e.g., 123e4567-e89b-12d3-a456-426614174000)
- A file path ending in .json.gz or .json

If unclear, use AskUserQuestion to clarify.
```python
from edsl import Results

# Load by UUID (from Expected Parrot cloud)
results = Results.pull("123e4567-e89b-12d3-a456-426614174000")

# OR load from local file
results = Results.load("path/to/results")  # .json.gz extension optional
```
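If the skill needs to branch on the input type programmatically, here is a minimal dispatch sketch, reusing the UUID regex that appears later in this skill (the `load_results` helper name is ours, not an EDSL API):

```python
import re
from edsl import Results

def load_results(source: str):
    """Pull from the cloud if `source` looks like a UUID, else load from disk."""
    uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    if re.match(uuid_pattern, source.lower()):
        return Results.pull(source)
    return Results.load(source)
```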
Create a directory for the analysis outputs using sequential numbering:
```python
import os
import glob

# Find existing analysis directories and get the next number
existing = glob.glob("./analysis_*")
existing_nums = []
for d in existing:
    try:
        num = int(d.split("_")[-1])
        existing_nums.append(num)
    except ValueError:
        pass
next_num = max(existing_nums, default=0) + 1
output_dir = f"./analysis_{next_num}"
os.makedirs(output_dir, exist_ok=True)
```
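An equivalent, more compact variant using pathlib (same naming scheme; assumes the analysis_<n> directories live in the current working directory):

```python
from pathlib import Path

# Collect existing analysis_<n> suffixes and pick the next number
nums = [int(p.name.split("_")[-1]) for p in Path(".").glob("analysis_*")
        if p.name.split("_")[-1].isdigit()]
output_dir = f"./analysis_{max(nums, default=0) + 1}"
Path(output_dir).mkdir(exist_ok=True)
```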
Always use AskUserQuestion to ask about the analysis focus. This ensures the report is tailored to the user's needs. It's a free-text question:

- Question: "What would you like me to focus on in the analysis?"
- Header: "Focus"
Export three core documentation files:
```python
# Get survey from results
survey = results.survey

# 1. Export survey as markdown
survey_md = survey.to_markdown()
with open(f"{output_dir}/survey.md", "w") as f:
    f.write(survey_md)

# 2. Export survey as mermaid diagram
# Note: Sanitize HTML tags for mermaid v11+ compatibility
import re
survey_mermaid = survey.to_mermaid()
# Remove HTML tags that cause syntax errors in newer mermaid versions
survey_mermaid = re.sub(r'<b>|</b>|<br/>', '\n', survey_mermaid)
survey_mermaid = re.sub(r'\n+', '\n', survey_mermaid)  # Clean up multiple newlines
with open(f"{output_dir}/survey.mermaid", "w") as f:
    f.write(survey_mermaid)

# 3. Export results as CSV
results_csv = results.to_csv()
results_csv.write(f"{output_dir}/results.csv")
# OR: with open(f"{output_dir}/results.csv", "w") as f: f.write(results_csv.text)
```
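Since the two CSV-export alternatives above differ by EDSL version, a defensive sketch can handle either return type (the hasattr check is an assumption, not documented EDSL behavior):

```python
csv_out = results.to_csv()
if hasattr(csv_out, "write"):
    # FileStore-like object that writes itself to disk
    csv_out.write(f"{output_dir}/results.csv")
else:
    # Plain text, or an object exposing .text
    with open(f"{output_dir}/results.csv", "w") as f:
        f.write(getattr(csv_out, "text", str(csv_out)))
```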
Before analysis, explore the data structure:
```python
import pandas as pd

# Load and examine the CSV
df = pd.read_csv(f"{output_dir}/results.csv")

# Get column info
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

# Identify question answer columns (answer.*)
answer_cols = [c for c in df.columns if c.startswith('answer.')]
print(f"Answer columns: {answer_cols}")

# Identify agent columns (agent.*)
agent_cols = [c for c in df.columns if c.startswith('agent.')]
print(f"Agent columns: {agent_cols}")

# Identify scenario columns (scenario.*)
scenario_cols = [c for c in df.columns if c.startswith('scenario.')]
print(f"Scenario columns: {scenario_cols}")

# Identify question text columns (question_text.*)
question_text_cols = [c for c in df.columns if c.startswith('question_text.')]
print(f"Question text columns: {question_text_cols}")

# Identify question option columns (question_options.*)
question_options_cols = [c for c in df.columns if c.startswith('question_options.')]
print(f"Question options columns: {question_options_cols}")

# Identify question type columns (question_type.*)
question_type_cols = [c for c in df.columns if c.startswith('question_type.')]
print(f"Question type columns: {question_type_cols}")
```
Create a comprehensive report.md with:
```markdown
# Results Analysis Report
## Study Design
### Questions
[For EACH question in the survey, show:]
#### Q1: [question_name] ([question_type])
**Template:** [The raw question template text from question_text.* column]
**Options:** [The options from question_options.* column, if applicable]
**Realized versions:**
[If the question uses scenario variables (Jinja2 templates like {{ var }}), show a table
of ALL unique realized question texts across scenario conditions. Group by scenario
variables to show what each condition looked like.]
| Scenario Condition | Realized Question Text |
|--------------------|----------------------|
| [condition 1] | [full realized text] |
| [condition 2] | [full realized text] |
...
[Repeat for each question]
### Scenario Variables
[Show ALL scenario variables and their unique values]
| Variable | Unique Values |
|----------|--------------|
| scenario.domain | health_insurance, software_platform, ... |
| scenario.framing | neutral, status_quo |
...
### Scenario Matrix
[Show the full crossing of scenario variables as a table, so the reader can see
every unique experimental condition. Include the count of observations per cell.]
### Agents / Models
[Show the models or agents used, their configuration (temperature, etc.),
and how many responses each produced]
## Data Summary
- Number of responses: N
- Agent traits collected: [list]
- Scenarios tested: [list]
## Detailed Results
### Q1: [Question Name]
[Response distribution table]

[Visualization chart]

[Interpretation of results]
### Q2: [Question Name]
[Same pattern - table, visualization, interpretation together]
## Key Findings
[Main insights from the data]
## Cross-Tabulations (if applicable)
[Relationships between variables - only include agent breakdowns if agents have meaningful names, not UUIDs]
## Files Generated
| File | Description |
|------|-------------|
| [survey.md](survey.md) | Survey documentation |
| [survey.mermaid](survey.mermaid) | Survey flow diagram |
| [results.csv](results.csv) | Raw results data |
| [report.html](report.html) | This report (HTML) |
```
The Study Design section is critical — it should make the report self-contained so a reader understands exactly what was asked without needing to open separate files.
For each question, extract the realized text from the data:
```python
# Get unique realized question texts per scenario condition
for qt_col in question_text_cols:
    q_name = qt_col.replace('question_text.', '').replace('_question_text', '')
    template = df[qt_col].iloc[0]  # Template text (may have {{ var }} syntax)

    # Check if the question uses scenario variables by looking at prompt columns.
    # The actual realized text (with scenarios filled in) is in the prompt.* columns.
    prompt_col = f'prompt.{q_name}_user_prompt'
    if prompt_col in df.columns:
        # Extract unique realized prompts grouped by scenario conditions
        if scenario_cols:
            # Group by scenario variables to show each condition
            groups = df.groupby([c for c in scenario_cols if df[c].nunique() > 1])
            for name, group in groups:
                realized_text = group[prompt_col].iloc[0]
                # Include in report table
```
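The loop above stops at a placeholder comment. A hedged continuation shows how the table rows could be assembled (handling of the groupby key is an assumption: pandas yields a scalar for one grouping column and a tuple for several):

```python
table_rows = []
for name, group in groups:
    key = name if isinstance(name, tuple) else (name,)
    # Flatten newlines and escape pipes so the text fits in a markdown table cell
    realized_text = str(group[prompt_col].iloc[0]).replace('\n', ' ').replace('|', '\\|')
    table_rows.append("| " + " | ".join(map(str, key)) + f" | {realized_text} |")
```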
For scenario variables, enumerate all unique values:
```python
for col in scenario_cols:
    unique_vals = df[col].dropna().unique()
    # Show variable name (without 'scenario.' prefix) and all its values
    print(f"{col.replace('scenario.', '')}: {', '.join(str(v) for v in unique_vals)}")
```
For the scenario matrix, show the full experimental design:
```python
# Cross-tabulate scenario variables to show the design
if len(scenario_cols) >= 2:
    # Pick the most meaningful scenario columns (skip index/id columns)
    meaningful_scenario_cols = [c for c in scenario_cols
                                if not c.endswith('_index') and not c.endswith('_id')]
    # Show unique combinations and observation counts
    design = df.groupby(meaningful_scenario_cols).size().reset_index(name='n_observations')
```
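To drop the matrix straight into report.md, DataFrame.to_markdown works, as the full script below also does (note: it requires the tabulate package):

```python
# Render the design matrix as a markdown table for the report
matrix_md = design.to_markdown(index=False)
```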
IMPORTANT: When listing files in the report, always use relative hyperlinks (e.g., [survey.md](survey.md)) so users can click through to the files.
IMPORTANT: Place each visualization immediately after its corresponding question's data table, not in a separate section at the end. This keeps the analysis coherent and easy to follow.
For each question, generate and save a visualization, then reference it in the report immediately after the question's statistics:
```python
import matplotlib.pyplot as plt

# For each question, generate chart and include in report immediately
for col in answer_cols:
    question_name = col.replace('answer.', '')
    value_counts = df[col].value_counts()

    # Add question section to report
    report += f"### {question_name}\n\n"
    report += "[Response distribution table here]\n\n"

    # Generate and save chart
    if len(value_counts) <= 20:
        fig, ax = plt.subplots(figsize=(10, 6))
        value_counts.plot(kind='bar', ax=ax)
        ax.set_title(f'Response Distribution: {question_name}')
        ax.set_xlabel('Response')
        ax.set_ylabel('Count')
        plt.tight_layout()
        chart_path = f"{question_name}_distribution.png"
        plt.savefig(f'{output_dir}/{chart_path}', dpi=150)
        plt.close()

        # Include chart immediately after question data
        report += f"![{question_name}]({chart_path})\n\n"

    report += "[Interpretation of this question's results]\n\n"
```
Ensure all files are saved to the output directory:
```
output_dir/
├── survey.md        # Survey in markdown format
├── survey.mermaid   # Survey flow diagram
├── results.csv      # Full results data
├── report.md        # Analysis report
├── report.html      # Styled HTML report
├── *.png            # Visualization files
└── analysis.py      # Optional: reproducible analysis script
```
After saving report.md, convert it to a styled HTML report using pandoc:
```bash
# CSS file location
CSS_FILE="$HOME/tools/ep/skills/assets/report.css"

# Generate HTML report (no --metadata title to avoid duplicate with markdown h1)
pandoc "${output_dir}/report.md" \
  -o "${output_dir}/report.html" \
  --css="${CSS_FILE}" \
  --standalone
```
Or in Python:
```python
import subprocess
import os

css_file = os.path.expanduser("~/tools/ep/skills/assets/report.css")

# Note: Don't use --metadata title= since report.md already has an h1 heading
subprocess.run([
    "pandoc",
    f"{output_dir}/report.md",
    "-o", f"{output_dir}/report.html",
    f"--css={css_file}",
    "--standalone",
], check=True)
print(f"Generated: {output_dir}/report.html")
```
Note: The mermaid diagram is exported separately as survey.mermaid but not embedded in the report due to rendering issues in HTML output.
"""
EDSL Results Analysis Script
"""
from edsl import Results
import pandas as pd
import matplotlib.pyplot as plt
import os
import re
from datetime import datetime
# === CONFIGURATION ===
# Modify this to load your results
RESULTS_UUID = "123e4567-e89b-12d3-a456-426614174000" # Or use file path
# RESULTS_PATH = "./results.json.gz"
# === LOAD RESULTS ===
results = Results.pull(RESULTS_UUID)
# results = Results.load(RESULTS_PATH)
# === CREATE OUTPUT DIRECTORY ===
import glob
existing = glob.glob("./analysis_*")
existing_nums = []
for d in existing:
    try:
        num = int(d.split("_")[-1])
        existing_nums.append(num)
    except ValueError:
        pass
next_num = max(existing_nums, default=0) + 1
output_dir = f"./analysis_{next_num}"
os.makedirs(output_dir, exist_ok=True)
# === EXPORT DOCUMENTATION ===
survey = results.survey
# Survey markdown
with open(f"{output_dir}/survey.md", "w") as f:
f.write(survey.to_markdown())
# Survey mermaid (sanitize HTML tags for mermaid v11+ compatibility)
survey_mermaid = survey.to_mermaid()
survey_mermaid = re.sub(r'<b>|</b>|<br/>', '\n', survey_mermaid)
survey_mermaid = re.sub(r'\n+', '\n', survey_mermaid)
with open(f"{output_dir}/survey.mermaid", "w") as f:
f.write(survey_mermaid)
# Results CSV
results_csv = results.to_csv()
results_csv.write(f"{output_dir}/results.csv")
# === LOAD DATA FOR ANALYSIS ===
df = pd.read_csv(f"{output_dir}/results.csv")
# Identify column types
answer_cols = [c for c in df.columns if c.startswith('answer.')]
agent_cols = [c for c in df.columns if c.startswith('agent.')]
scenario_cols = [c for c in df.columns if c.startswith('scenario.')]
# === HELPER: Check if string looks like a UUID ===
# (re is already imported at the top of the script)
def is_uuid(s):
    """Check if a string looks like a UUID (8-4-4-4-12 hex pattern)."""
    if not isinstance(s, str):
        return False
    uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    return bool(re.match(uuid_pattern, s.lower()))

# Check if agents have meaningful names (not UUIDs)
has_meaningful_agents = False
if 'agent.agent_name' in df.columns:
    agent_names = df['agent.agent_name'].dropna().unique()
    has_meaningful_agents = len(agent_names) > 0 and not all(is_uuid(str(name)) for name in agent_names)
# === GENERATE STUDY DESIGN SECTION ===
# Identify column types for study design
question_text_cols = [c for c in df.columns if c.startswith('question_text.')]
question_options_cols = [c for c in df.columns if c.startswith('question_options.')]
question_type_cols = [c for c in df.columns if c.startswith('question_type.')]
prompt_cols = [c for c in df.columns if c.startswith('prompt.') and c.endswith('_user_prompt')]
model_cols = [c for c in df.columns if c.startswith('model.')]
study_design = ""
# --- Questions section ---
study_design += "## Study Design\n\n### Questions\n\n"
for qt_col in question_text_cols:
    q_name = qt_col.replace('question_text.', '').replace('_question_text', '')
    template = str(df[qt_col].iloc[0])

    # Get question type
    qt_type_col = f'question_type.{q_name}_question_type'
    q_type = str(df[qt_type_col].iloc[0]) if qt_type_col in df.columns else 'unknown'

    # Get question options
    qo_col = f'question_options.{q_name}_question_options'
    q_options = str(df[qo_col].iloc[0]) if qo_col in df.columns and df[qo_col].notna().any() else None

    study_design += f"#### {q_name} ({q_type})\n\n"
    study_design += f"**Template:** {template}\n\n"
    if q_options and q_options != 'nan':
        study_design += f"**Options:** {q_options}\n\n"

    # Show realized versions if scenarios exist.
    # Use the prompt column, which has the fully realized text.
    prompt_col = f'prompt.{q_name}_user_prompt'
    if prompt_col in df.columns and scenario_cols:
        meaningful_scenario_cols = [c for c in scenario_cols
                                    if not c.endswith('_index') and df[c].nunique() > 1]
        if meaningful_scenario_cols:
            # Get unique realized prompts per scenario combination
            unique_prompts = df.groupby(meaningful_scenario_cols)[prompt_col].first().reset_index()
            if len(unique_prompts) > 1:  # Only show if there are actual variations
                study_design += "**Realized versions by scenario:**\n\n"
                study_design += "| " + " | ".join(c.replace('scenario.', '') for c in meaningful_scenario_cols) + " | Question Text |\n"
                study_design += "| " + " | ".join("---" for _ in meaningful_scenario_cols) + " | --- |\n"
                for _, row in unique_prompts.iterrows():
                    conditions = " | ".join(str(row[c]) for c in meaningful_scenario_cols)
                    # Truncate very long prompts for the table; show key differences
                    prompt_text = str(row[prompt_col]).replace('\n', ' ').replace('|', '\\|')
                    if len(prompt_text) > 200:
                        prompt_text = prompt_text[:200] + "..."
                    study_design += f"| {conditions} | {prompt_text} |\n"
                study_design += "\n"

    study_design += "\n"
# --- Scenario Variables section ---
if scenario_cols:
    study_design += "### Scenario Variables\n\n"
    study_design += "| Variable | # Unique | Values |\n"
    study_design += "|----------|----------|--------|\n"
    for col in scenario_cols:
        if not col.endswith('_index'):
            unique_vals = df[col].dropna().unique()
            vals_str = ", ".join(str(v) for v in sorted(unique_vals, key=str))
            if len(vals_str) > 150:
                vals_str = vals_str[:150] + "..."
            study_design += f"| {col.replace('scenario.', '')} | {len(unique_vals)} | {vals_str} |\n"
    study_design += "\n"
# --- Scenario Matrix ---
meaningful_scenario_cols = [c for c in scenario_cols
                            if not c.endswith('_index') and not c.endswith('_id') and df[c].nunique() > 1]
if len(meaningful_scenario_cols) >= 2:
    study_design += "### Scenario Matrix\n\n"
    design_matrix = df.groupby(meaningful_scenario_cols).size().reset_index(name='n_observations')
    study_design += design_matrix.to_markdown(index=False) + "\n\n"
# --- Agents / Models section ---
study_design += "### Agents / Models\n\n"
if 'model.model' in df.columns:
model_info = df.groupby('model.model').size().reset_index(name='n_responses')
study_design += "| Model | Responses |\n"
study_design += "|-------|-----------|\n"
for _, row in model_info.iterrows():
study_design += f"| {row['model.model']} | {row['n_responses']} |\n"
study_design += "\n"
# Show model config
config_cols = [c for c in model_cols if c not in ['model.model', 'model.model_index'] and df[c].nunique() == 1]
if config_cols:
study_design += "**Model configuration:** "
configs = [f"{c.replace('model.', '')}={df[c].iloc[0]}" for c in config_cols]
study_design += ", ".join(configs) + "\n\n"
if has_meaningful_agents:
agent_info = df.groupby('agent.agent_name').size().reset_index(name='n_responses')
study_design += "| Agent | Responses |\n"
study_design += "|-------|-----------|\n"
for _, row in agent_info.iterrows():
study_design += f"| {row['agent.agent_name']} | {row['n_responses']} |\n"
study_design += "\n"
# === GENERATE REPORT ===
report = f"""# Results Analysis Report
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
{study_design}
## Data Summary
- **Total responses**: {len(df)}
- **Questions**: {len(answer_cols)}
- **Agent traits**: {len(agent_cols)} ({', '.join(agent_cols) if agent_cols else 'None'})
- **Scenario variables**: {len(scenario_cols)} ({', '.join(scenario_cols) if scenario_cols else 'None'})
## Response Distributions
"""
# Add distribution for each answer column
for col in answer_cols:
    question_name = col.replace('answer.', '')
    value_counts = df[col].value_counts()

    report += f"### {question_name}\n\n"
    report += "| Response | Count | Percentage |\n"
    report += "|----------|-------|------------|\n"
    for val, count in value_counts.items():
        pct = count / len(df) * 100
        report += f"| {val} | {count} | {pct:.1f}% |\n"
    report += "\n"

    # Generate chart
    if len(value_counts) <= 20:  # Only plot if reasonable number of categories
        fig, ax = plt.subplots(figsize=(10, 6))
        value_counts.plot(kind='bar', ax=ax)
        ax.set_title(f'Response Distribution: {question_name}')
        ax.set_xlabel('Response')
        ax.set_ylabel('Count')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        chart_path = f"{question_name}_distribution.png"
        plt.savefig(f'{output_dir}/{chart_path}', dpi=150)
        plt.close()
        report += f"![{question_name}]({chart_path})\n\n"
# Only add agent analysis if agents have meaningful names
if has_meaningful_agents:
    report += """## Analysis by Agent

"""
    # Add per-agent breakdowns
    for col in answer_cols:
        question_name = col.replace('answer.', '')
        crosstab = pd.crosstab(df['agent.agent_name'], df[col], normalize='index') * 100
        report += f"### {question_name} by Agent\n\n"
        report += crosstab.to_markdown() + "\n\n"
report += """## Key Findings
[Add key findings based on the analysis]
## Methodology Notes
This analysis was generated from EDSL Results data. The survey was administered to AI agents
using the Expected Parrot platform.
"""
# Save report
with open(f"{output_dir}/report.md", "w") as f:
    f.write(report)
# Generate HTML report with pandoc (no --metadata title to avoid duplicate)
import subprocess
css_file = os.path.expanduser("~/tools/ep/skills/assets/report.css")
subprocess.run([
    "pandoc",
    f"{output_dir}/report.md",
    "-o", f"{output_dir}/report.html",
    f"--css={css_file}",
    "--standalone",
], check=True)
print(f"Analysis complete! Output saved to: {output_dir}/")
print(f" - survey.md")
print(f" - survey.mermaid")
print(f" - results.csv")
print(f" - report.md")
print(f" - report.html")
Use AskUserQuestion to ask if they'd like a PPTX slideshow as well. If they say yes, create a PPTX file for them based on report.md.
| File | Description |
|---|---|
| survey.md | Human-readable survey documentation with questions, options, and rules |
| survey.mermaid | Mermaid diagram showing survey flow and skip logic |
| results.csv | Full results data in CSV format for analysis |
| report.md | Comprehensive analysis report with findings and visualizations |
| report.html | Styled HTML report (via pandoc with Expected Parrot CSS) |
| *.png | Charts and visualizations referenced in the report |
| analysis.py | (Optional) Reproducible Python script for the analysis |
- Read survey.mermaid separately to understand skip logic before analyzing
- answer.* columns contain question responses
- agent.* columns contain agent trait values
- scenario.* columns contain scenario variable values
- comment.* columns contain free-text explanations (if available)

```python
# Compare responses across scenarios
pd.crosstab(df['scenario.condition'], df['answer.question_name'], normalize='index')

# First check if agents have meaningful names (not UUIDs)
import re

def is_uuid(s):
    uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    return bool(re.match(uuid_pattern, str(s).lower()))

# Only analyze by agent if names are meaningful
if not all(is_uuid(name) for name in df['agent.agent_name'].dropna().unique()):
    df.groupby('agent.agent_name')['answer.question_name'].value_counts(normalize=True)

# For numeric responses
df[[c for c in answer_cols if df[c].dtype in ['int64', 'float64']]].corr()
```