Name: Statistics For Product Analytics
Author: moshesham

スキルを検索.../

Statistics For Product Analytics | Skills Pool

1. State H₀ (null hypothesis) and H₁ (alternative hypothesis)
2. Choose significance level α (typically 0.05)
3. Choose the test statistic based on data type and assumptions
4. Compute the test statistic and p-value
5. Compare p-value to α → reject or fail to reject H₀
6. Interpret in business terms

Scenario	Test	Python Function
Compare two proportions (conversion rates)	Two-proportion z-test	`statsmodels.stats.proportion.proportions_ztest`
Compare two means (revenue per user)	Two-sample t-test	`scipy.stats.ttest_ind`
Compare means, paired data (before/after)	Paired t-test	`scipy.stats.ttest_rel`
Compare proportions across categories	Chi-squared test	`scipy.stats.chi2_contingency`
Non-normal data, compare medians	Mann-Whitney U test	`scipy.stats.mannwhitneyu`
Compare more than 2 groups	ANOVA / Kruskal-Wallis	`scipy.stats.f_oneway` / `kruskal`

import numpy as np
from scipy import stats


def two_proportion_ztest(
    successes_a: int, n_a: int,
    successes_b: int, n_b: int,
    alpha: float = 0.05
) -> dict:
    """Two-proportion z-test for comparing conversion rates.
    
    Args:
        successes_a: Number of conversions in control
        n_a: Total users in control
        successes_b: Number of conversions in treatment
        n_b: Total users in treatment
        alpha: Significance level
    
    Returns:
        Dictionary with test results and interpretation
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    
    se = np.sqrt(p_pool * (1 - p_pool) * (1/n_a + 1/n_b))
    z_stat = (p_b - p_a) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))  # two-tailed
    
    # Confidence interval for the difference
    se_diff = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ci_lower = (p_b - p_a) - z_crit * se_diff
    ci_upper = (p_b - p_a) + z_crit * se_diff
    
    return {
        "control_rate": round(p_a, 4),
        "treatment_rate": round(p_b, 4),
        "absolute_lift": round(p_b - p_a, 4),
        "relative_lift_pct": round(100 * (p_b - p_a) / p_a, 2) if p_a > 0 else None,
        "z_statistic": round(z_stat, 3),
        "p_value": round(p_value, 4),
        "significant": p_value < alpha,
        "ci_95": (round(ci_lower, 4), round(ci_upper, 4)),
    }


def two_sample_ttest(
    values_a: np.ndarray,
    values_b: np.ndarray,
    alpha: float = 0.05,
    equal_var: bool = False
) -> dict:
    """Two-sample t-test for comparing means (e.g., revenue per user).
    
    Uses Welch's t-test (equal_var=False) by default since equal
    variance assumption rarely holds in practice.
    """
    t_stat, p_value = stats.ttest_ind(values_a, values_b, equal_var=equal_var)
    
    mean_a, mean_b = np.mean(values_a), np.mean(values_b)
    se_diff = np.sqrt(np.var(values_a, ddof=1)/len(values_a) + np.var(values_b, ddof=1)/len(values_b))
    t_crit = stats.t.ppf(1 - alpha/2, df=min(len(values_a), len(values_b)) - 1)
    
    return {
        "mean_control": round(float(mean_a), 4),
        "mean_treatment": round(float(mean_b), 4),
        "difference": round(float(mean_b - mean_a), 4),
        "t_statistic": round(float(t_stat), 3),
        "p_value": round(float(p_value), 4),
        "significant": p_value < alpha,
        "ci_95": (
            round(float((mean_b - mean_a) - t_crit * se_diff), 4),
            round(float((mean_b - mean_a) + t_crit * se_diff), 4),
        ),
    }

def required_sample_size(
    baseline_rate: float,
    minimum_detectable_effect: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Calculate required sample size per group for a two-proportion test.
    
    Args:
        baseline_rate: Current conversion rate (e.g., 0.10 for 10%)
        minimum_detectable_effect: Minimum relative change to detect (e.g., 0.05 for 5% relative lift)
        alpha: Significance level
        power: Statistical power (1 - β)
    
    Returns:
        Required sample size per group
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimum_detectable_effect)
    
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    
    p_avg = (p1 + p2) / 2
    
    n = ((z_alpha * np.sqrt(2 * p_avg * (1 - p_avg)) +
          z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) /
         (p2 - p1)) ** 2
    
    return int(np.ceil(n))


# Example: 10% baseline conversion, want to detect 5% relative lift
# required_sample_size(0.10, 0.05) → ~31,234 per group

Baseline Rate	MDE (relative)	α=0.05, Power=0.80	Duration at 10K users/day
5%	10%	~30,400/group	~6 days
5%	5%	~121,600/group	~24 days
10%	10%	~14,300/group	~3 days
10%	5%	~57,200/group	~11 days
20%	10%	~6,400/group	~1.3 days
20%	5%	~25,600/group	~5 days

def bootstrap_ci(
    data: np.ndarray,
    statistic: callable = np.mean,
    n_bootstrap: int = 10000,
    ci_level: float = 0.95,
    seed: int = 42
) -> tuple[float, float, float]:
    """Compute bootstrap confidence interval for any statistic.
    
    Returns:
        Tuple of (point_estimate, ci_lower, ci_upper)
    """
    rng = np.random.default_rng(seed)
    boot_stats = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_bootstrap)
    ])
    
    alpha = 1 - ci_level
    ci_lower = np.percentile(boot_stats, 100 * alpha / 2)
    ci_upper = np.percentile(boot_stats, 100 * (1 - alpha / 2))
    
    return float(statistic(data)), float(ci_lower), float(ci_upper)

Distribution	Common Use	Example
Normal	Means of large samples (CLT)	Average session duration
Binomial	Success/failure outcomes	Conversion rate
Poisson	Count data with known average rate	Support tickets per day
Exponential	Time between events	Time between purchases
Log-normal	Right-skewed positive values	Revenue per user, session length
Power law	Heavy-tailed distributions	Content virality, page views

Statistics For Product Analytics

Overview

When to Use This Skill

Statistics For Product Analytics

Overview

When to Use This Skill

Hypothesis Testing

Framework

Choosing the Right Test

Python Implementation

Power Analysis

Sample Size Calculation

Power Analysis Decision Table

Confidence Intervals

Correct Interpretation

Bootstrap Confidence Intervals

Common Statistical Pitfalls

1. P-hacking / Multiple Testing

2. Peeking at Results

3. Simpson's Paradox

4. Survivorship Bias

5. Confusing Statistical and Practical Significance

Distributions in Product Analytics

Best Practices

Additional Resources

Visualization Expert

Data Analyst

Huggingface Hub

Multi Reviewer Patterns

Dbt Transformation Patterns

Startup Financial Modeling