Skill File

Using ba

Name: Using ba
Author: thorwhalen

Guide for using the ba (Bayesian Association) library to analyze categorical data. Use this skill whenever someone wants to explore associations in a dataset, compute contingency table metrics, run Bayesian inference on binary or categorical data, perform QCA (Qualitative Comparative Analysis), mine association rules, choose priors for small-sample analysis, or interpret Bayes factors and credible intervals. Also triggers on questions about odds ratios, relative risk, phi coefficient, lift, support, confidence, truth tables, Boolean minimization, or prior sensitivity analysis.

thorwhalen · 0 stars · April 1, 2026

Profession
Category: Data Analysis

Skill Content

ba analyzes associations in categorical data and quantifies their uncertainty with Bayesian methods. It is especially useful when samples are small (n = 10–50) and point estimates alone are unreliable.
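The following example shows why point estimates mislead at this scale. It is a minimal stdlib sketch, not ba's implementation, of a posterior for a single proportion under the Jeffreys prior. With a Beta(0.5, 0.5) prior and k successes in n trials, the posterior is Beta(k + 0.5, n − k + 0.5). The 95% interval is approximated from Monte Carlo draws. The helper name `jeffreys_interval` and the 7-of-10 counts are hypothetical.

```python
import random


def jeffreys_interval(k, n, level=0.95, draws=100_000, seed=0):
    """Monte Carlo equal-tailed credible interval for a binomial proportion.

    A Jeffreys Beta(0.5, 0.5) prior combined with k successes in n trials
    gives the conjugate posterior Beta(k + 0.5, n - k + 0.5).
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k + 0.5, n - k + 0.5) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[int(tail * draws)]
    hi = samples[min(draws - 1, int((1 - tail) * draws))]
    return lo, hi


# Hypothetical data: 7 of 10 cases show the outcome.
k, n = 7, 10
lo, hi = jeffreys_interval(k, n)
print(f"point estimate {k / n:.2f}, 95% credible interval ({lo:.2f}, {hi:.2f})")
```

The point estimate is 0.70, but the interval covers a wide band of plausible rates. Exposing that gap is the purpose of credible intervals.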

Quick Start

import ba

# df: a pandas DataFrame of binary/categorical columns, including the outcome column

# One-liner: analyze all pairwise associations
result = ba.analyze(df, outcome='retained_custody')
result.summary()          # metrics + Bayesian CIs for every pair
result.top_pairs(5)       # strongest, most certain associations

The Three Tiers

Tier 1: Façade (start here)

# Analyze everything at once
result = ba.analyze(df, outcome='Y')
result.summary()
result.top_pairs(5, sort_by='bayes_factor')

# With association rules
result = ba.analyze(df, outcome='Y', rules=True)
result.top_rules(10)

ba.analyze() automatically detects variable types, computes every pairwise contingency table, runs Bayesian inference under the Jeffreys prior, computes the appropriate metrics, and flags small-sample warnings.
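The pairwise step can be pictured with a short stdlib sketch. This is an illustration, not ba's code. For each pair of binary columns it counts four cells:

- a = both 1
- b = first 1, second 0
- c = first 0, second 1
- d = both 0

The helpers `two_by_two` and `pairwise_tables` are hypothetical, as are the records and column names.

```python
from itertools import combinations


def two_by_two(rows, x, y):
    """Return (a, b, c, d) for binary columns x and y.

    a: x=1, y=1   b: x=1, y=0
    c: x=0, y=1   d: x=0, y=0
    """
    a = b = c = d = 0
    for r in rows:
        if r[x] and r[y]:
            a += 1
        elif r[x]:
            b += 1
        elif r[y]:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pairwise_tables(rows, columns, outcome=None):
    """2x2 counts for every column pair, or for every column paired with `outcome`."""
    if outcome is not None:
        pairs = [(col, outcome) for col in columns if col != outcome]
    else:
        pairs = list(combinations(columns, 2))
    return {pair: two_by_two(rows, *pair) for pair in pairs}


# Hypothetical binary records.
rows = [
    {"A": 1, "B": 0, "Y": 1},
    {"A": 1, "B": 1, "Y": 1},
    {"A": 0, "B": 1, "Y": 0},
    {"A": 0, "B": 0, "Y": 0},
    {"A": 1, "B": 0, "Y": 0},
]
tables = pairwise_tables(rows, ["A", "B", "Y"], outcome="Y")
print(tables)
```

Passing `outcome` restricts the pairs to those that involve the outcome. Omitting it enumerates every pair, which grows quadratically with the number of columns.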

Tier 2: Paradigm (when you know what you want)

# Single contingency table
ct = ba.contingency_table(a=10, b=5, c=3, d=12)
ct.odds_ratio       # 8.0
ct.phi               # 0.471
ct.metrics(['lift', 'phi', 'fisher_p'])

# From a DataFrame
ct = ba.from_dataframe(df, 'treatment', 'outcome')

# Bayesian posterior
post = ba.bayesian.posterior(ct, prior='jeffreys')
post.credible_interval['risk_difference']  # (0.12, 0.71)
post.prob_gt(0.0, 'risk_difference')       # P(RD > 0 | data)

# Bayes factor
bf = ba.bayesian.bayes_factor(ct)  # BF > 1 favors association

# Prior sensitivity
ba.bayesian.sensitivity(ct, priors=['jeffreys', 'uniform', 'beta(2,2)'])

# QCA
binary_df = ba.qca.calibrate(df, {'age': 30, 'illness': 'any_present'})
tt = ba.qca.truth_table(binary_df, 'Y', ['A', 'B', 'C'])
solution = ba.qca.minimize(tt)
ba.qca.necessity(binary_df, 'Y', ['A', 'B'])

# Association rules
rules = ba.rules.mine(df, min_support=0.1, outcome='Y')
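The single-table numbers can be checked by hand with a minimal sketch of the classic 2×2 formulas. It assumes this cell layout:

- a = exposed with outcome
- b = exposed without outcome
- c = unexposed with outcome
- d = unexposed without outcome

ba's internal layout and any zero-cell corrections are not documented here. Treat `two_by_two_metrics` as a hypothetical helper, not ba's code.

```python
import math


def two_by_two_metrics(a, b, c, d):
    """Classic 2x2 association measures (no zero-cell correction)."""
    n = a + b + c + d
    p_exposed = a / (a + b)      # P(outcome | exposed)
    p_unexposed = c / (c + d)    # P(outcome | unexposed)
    p_outcome = (a + c) / n      # P(outcome)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": p_exposed / p_unexposed,
        "risk_difference": p_exposed - p_unexposed,
        "lift": p_exposed / p_outcome,
        "phi": (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d)),
    }


m = two_by_two_metrics(a=10, b=5, c=3, d=12)
print({name: round(value, 3) for name, value in m.items()})
```

With a=10, b=5, c=3, d=12 it gives an odds ratio of exactly 8.0 and phi ≈ 0.471. These match the values in the comments of the single-table example.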

Tier 3: Primitives (full control)

from ba.core import ContingencyTable, registry
from ba.core.pot import to_contingency, from_contingency
from ba.bayesian.priors import from_mean_kappa, from_quantiles
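Of the prior constructors imported above, from_mean_kappa is the easiest to picture. It is most likely a reparameterization of the Beta distribution, but this is an assumption that has not been checked against ba's source. The usual conversion is α = μκ and β = (1 − μ)κ. Under that conversion κ = α + β, the effective sample size (ESS) that the prior table lists. `BetaPrior` and `beta_from_mean_kappa` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    alpha: float
    beta: float

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    @property
    def ess(self):
        # Effective sample size: the prior weighs as much as alpha + beta observations.
        return self.alpha + self.beta


def beta_from_mean_kappa(mean, kappa):
    """Hypothetical helper: Beta prior with the given mean and concentration kappa."""
    if not 0 < mean < 1 or kappa <= 0:
        raise ValueError("need 0 < mean < 1 and kappa > 0")
    return BetaPrior(mean * kappa, (1 - mean) * kappa)


priors = {
    "jeffreys": BetaPrior(0.5, 0.5),
    "uniform": BetaPrior(1, 1),
    "beta(2,2)": BetaPrior(2, 2),
    "mean 0.3, kappa 10": beta_from_mean_kappa(0.3, 10),
}
for name, p in priors.items():
    print(f"{name:>18}: Beta({p.alpha:g}, {p.beta:g}), mean {p.mean:.2f}, ESS {p.ess:g}")
```

Under this parameterization, "I think it's about 30%" with strength 10 becomes Beta(3, 7).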

Choosing a Prior

Prior	Code	ESS	When to use
Jeffreys	`'jeffreys'`	1	Default. Minimal influence.
Uniform	`'uniform'`	2	Transparent, slightly more regularized.
Beta(2,2)	`'beta(2,2)'`	4	Prevents boundary estimates (0 or 1).
Custom mean+strength	`from_mean_kappa(0.3, 10)`	10	"I think it's about 30%."
Custom interval	`from_quantiles(0.2, 0.05, 0.6, 0.95)`	varies	"I'm 90% sure it's between 0.2 and 0.6."
Imaginary data	`from_counts(2, 8)`	11	"Imagine 2 successes and 8 failures."
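The trade-off in the table (minimal influence versus regularization) can be checked with a prior sensitivity sketch. It assumes each group's outcome rate gets an independent Beta prior, which may not be ba's exact model. Each posterior is then conjugate, and Monte Carlo draws give a risk-difference interval and P(RD > 0 | data). The counts come from the single-table example (a=10, b=5, c=3, d=12). `rd_posterior` is a hypothetical helper.

```python
import random


def rd_posterior(a, b, c, d, alpha, beta, draws=50_000, seed=0):
    """Monte Carlo summary of the risk difference p1 - p2.

    Each group's rate has an independent Beta(alpha, beta) prior; with binomial
    data the posteriors are Beta(alpha + successes, beta + failures).
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(alpha + a, beta + b) - rng.betavariate(alpha + c, beta + d)
        for _ in range(draws)
    )
    lo = diffs[int(0.025 * draws)]
    hi = diffs[min(draws - 1, int(0.975 * draws))]
    p_positive = sum(x > 0 for x in diffs) / draws
    return lo, hi, p_positive


prior_params = {"jeffreys": (0.5, 0.5), "uniform": (1, 1), "beta(2,2)": (2, 2)}
results = {name: rd_posterior(10, 5, 3, 12, *ab) for name, ab in prior_params.items()}
for name, (lo, hi, p) in results.items():
    print(f"{name:>10}: 95% CI ({lo:.2f}, {hi:.2f}), P(RD > 0) = {p:.3f}")
```

The Beta(2,2) row is pulled toward zero, as its larger ESS suggests. If P(RD > 0) barely moves across rows, the conclusion is not driven by the prior.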

Working with the DataStore

store = ba.DataStore(df)
store.vars.treatment          # 'treatment' (attribute access)
store.vars.binary()           # list of binary columns
ct = store.contingency('treatment', 'outcome')  # cached
pairs = store.all_pairs(outcome='Y')            # all pairs with Y

# Pot algebra (requires spyn)
joint = store.pot('treatment', 'outcome')
conditional = joint / 'treatment'   # P(outcome | treatment)
marginal = joint['outcome']         # marginalize to outcome
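The conditional and marginal operations above amount to normalization and summation over a joint table. The plain-dict sketch below reproduces that arithmetic. It is not spyn's API, and it assumes that `/ 'treatment'` divides by the treatment marginal, as the comment says. The counts and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical joint counts keyed by (treatment, outcome).
joint_counts = {(1, 1): 8, (1, 0): 2, (0, 1): 3, (0, 0): 7}
total = sum(joint_counts.values())
joint = {key: v / total for key, v in joint_counts.items()}  # P(treatment, outcome)


def marginalize(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)


def condition_on(joint, axis):
    """Divide each cell by the marginal of the conditioning variable at `axis`."""
    marg = marginalize(joint, axis)
    return {key: p / marg[key[axis]] for key, p in joint.items()}


p_outcome = marginalize(joint, axis=1)          # analogous to joint['outcome']
p_out_given_treat = condition_on(joint, 0)      # analogous to joint / 'treatment'
print(p_outcome)
print(p_out_given_treat)
```

After conditioning, the entries for each treatment value sum to 1. This is what makes the result a conditional distribution.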

QCA Workflow

# 1. Binarize
binary_df = ba.qca.calibrate(df, {
    'age': 30,                    # >= 30 → 1
    'illness': 'any_present',     # truthy → 1
    'score': 'median',            # >= median → 1
    'custom': lambda x: x > 100,  # custom function
})

# 2. Build truth table (choose 3-5 conditions — not all 28!)
tt = ba.qca.truth_table(binary_df, 'Y', ['A', 'B', 'C'], n_cut=2)

# 3. Minimize
solution = ba.qca.minimize(tt)
print(solution.expression)  # e.g., "A*B + ~A*C"

# 4. Necessity/sufficiency
ba.qca.necessity(binary_df, 'Y', ['A', 'B', 'C'])
ba.qca.sufficiency(binary_df, 'Y', ['A', 'B', 'C'])
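Steps 2 and 4 can be illustrated with a stdlib sketch of the standard crisp-set QCA definitions. That these match ba's exact computations is an assumption. The definitions are:

- A truth-table row is one configuration of conditions, and rows with fewer than n_cut cases are dropped.
- A row's consistency is the share of its cases with Y = 1.
- For a condition X, sufficiency consistency is |X ∧ Y| / |X| and necessity consistency is |X ∧ Y| / |Y|.

The cases are hypothetical, and Boolean minimization (step 3) is not shown.

```python
from collections import Counter


def truth_table(cases, outcome, conditions, n_cut=1):
    """Map each observed configuration to (n_cases, consistency).

    Configurations seen fewer than n_cut times are dropped; unobserved
    configurations (logical remainders) never appear.
    """
    n = Counter()
    hits = Counter()
    for case in cases:
        config = tuple(case[c] for c in conditions)
        n[config] += 1
        hits[config] += case[outcome]
    return {cfg: (n[cfg], hits[cfg] / n[cfg]) for cfg in sorted(n) if n[cfg] >= n_cut}


def sufficiency(cases, outcome, condition):
    """Consistency of 'X is sufficient for Y': share of X=1 cases with Y=1."""
    with_x = [c for c in cases if c[condition]]
    return sum(c[outcome] for c in with_x) / len(with_x)


def necessity(cases, outcome, condition):
    """Consistency of 'X is necessary for Y': share of Y=1 cases with X=1."""
    with_y = [c for c in cases if c[outcome]]
    return sum(c[condition] for c in with_y) / len(with_y)


# Hypothetical calibrated (0/1) cases.
cases = [
    {"A": 1, "B": 1, "Y": 1},
    {"A": 1, "B": 1, "Y": 1},
    {"A": 1, "B": 0, "Y": 0},
    {"A": 0, "B": 1, "Y": 1},
    {"A": 0, "B": 0, "Y": 0},
    {"A": 0, "B": 0, "Y": 0},
]
tt = truth_table(cases, "Y", ["A", "B"], n_cut=2)
print(tt)  # only configurations with at least 2 cases are kept
for cond in ("A", "B"):
    print(cond, "sufficiency", round(sufficiency(cases, "Y", cond), 2),
          "necessity", round(necessity(cases, "Y", cond), 2))
```

With only six cases, n_cut=2 already discards half of the observed configurations. This is one reason to keep the condition set small.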

Extending the Measure Registry

# Register a custom measure
def my_measure(ct):
    return ct.counts[0, 0] / ct.n

ba.measures.register('my_metric', my_measure, description='top-left proportion')

# Use it
ct.metrics(['my_metric', 'lift', 'phi'])
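The registry follows a familiar name-to-function pattern. The sketch below shows how such a registry can work. The classes `Table2x2` and `MeasureRegistry` are hypothetical; ba's own `measures.register` is known only from the call above. The sketch uses nested lists instead of the NumPy-style `counts[0, 0]` indexing so that it stays dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class Table2x2:
    """Hypothetical stand-in for a contingency table: counts = [[a, b], [c, d]]."""
    counts: list

    @property
    def n(self):
        return sum(sum(row) for row in self.counts)


@dataclass
class MeasureRegistry:
    """Name -> (function, description) lookup, in the spirit of ba.measures."""
    measures: dict = field(default_factory=dict)

    def register(self, name, func, description=""):
        if name in self.measures:
            raise KeyError(f"measure {name!r} is already registered")
        self.measures[name] = (func, description)

    def compute(self, table, names):
        """Evaluate the requested measures on one table."""
        return {name: self.measures[name][0](table) for name in names}


registry = MeasureRegistry()
registry.register("top_left", lambda t: t.counts[0][0] / t.n,
                  description="top-left proportion")
registry.register("n", lambda t: t.n, description="total count")
print(registry.compute(Table2x2([[10, 5], [3, 12]]), ["top_left", "n"]))
```

Rejecting duplicate names means a custom measure cannot silently shadow a built-in one.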

Sample Data

from ba.sample_data import custody_data, market_basket

df = custody_data()   # 13 cases, 7 binary columns
df = market_basket()  # 20 transactions, 4 items
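Market-basket data is the typical input for association rules, so here is a stdlib sketch of the standard rule metrics that `min_support` and related thresholds act on:

- support(X → Y) = P(X ∧ Y)
- confidence = P(Y | X)
- lift = confidence / P(Y)

The sketch covers single-item rules only. `rule_metrics` is a hypothetical helper, and the baskets are made up, not the bundled sample.

```python
from itertools import permutations


def rule_metrics(transactions, min_support=0.1):
    """Score every single-item rule X -> Y whose support meets min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(*itemset):
        # Fraction of transactions containing every item in the itemset.
        return sum(all(i in t for i in itemset) for t in transactions) / n

    rules = {}
    for x, y in permutations(items, 2):
        s = support(x, y)
        if s >= min_support:
            conf = s / support(x)
            rules[(x, y)] = {"support": s, "confidence": conf, "lift": conf / support(y)}
    return rules


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for (x, y), m in rule_metrics(baskets, min_support=0.5).items():
    print(f"{x} -> {y}: " + ", ".join(f"{k}={v:.2f}" for k, v in m.items()))
```

Lift above 1 means the items co-occur more often than independence would predict. Lift below 1 means they co-occur less often.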

Interpreting Bayes Factors

BF₁₀	Evidence
> 10	Strong for association
3–10	Moderate for association
1–3	Anecdotal for association
1/3–1	Anecdotal for independence
< 1/3	Moderate for independence
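One common way to obtain a BF₁₀ for a 2×2 table compares two beta-binomial models. Whether ba.bayesian.bayes_factor uses this model is not documented here, so treat it as an assumption. H₁ gives each row its own outcome rate, H₀ gives both rows one shared rate, and both use a Beta(α, β) prior (uniform by default here). Both marginal likelihoods have closed forms in the Beta function, so the sketch uses `math.lgamma`, and the binomial coefficients cancel in the ratio. `bayes_factor_10` and `evidence_label` are hypothetical names.

```python
import math


def log_beta(x, y):
    """log B(x, y) computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def bayes_factor_10(a, b, c, d, alpha=1.0, beta=1.0):
    """BF10 for 'two different rates' (H1) vs 'one shared rate' (H0).

    Rows are (a successes, b failures) and (c successes, d failures);
    binomial coefficients are identical under both models and cancel.
    """
    log_m1 = (log_beta(alpha + a, beta + b) + log_beta(alpha + c, beta + d)
              - 2 * log_beta(alpha, beta))
    log_m0 = log_beta(alpha + a + c, beta + b + d) - log_beta(alpha, beta)
    return math.exp(log_m1 - log_m0)


def evidence_label(bf):
    """Verbal labels following the table above."""
    if bf > 10:
        return "strong for association"
    if bf > 3:
        return "moderate for association"
    if bf > 1:
        return "anecdotal for association"
    if bf >= 1 / 3:
        return "anecdotal for independence"
    return "moderate for independence"


for table in [(10, 5, 3, 12), (5, 5, 5, 5)]:
    bf = bayes_factor_10(*table)
    print(table, f"BF10 = {bf:.2f}", evidence_label(bf))
```

For the example table (a=10, b=5, c=3, d=12) this sketch gives about 10.6. A perfectly balanced 5/5/5/5 table gives about 0.5, so with 20 cases even a null-looking table only reaches "anecdotal for independence". ba's own Bayes factor may use a different prior or model and return different values.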
