从零构建机构级三表模型(IS/BS/CF)— 完整公式联动、季度/半年/年频自适应、IFRS/US GAAP/中国准则。 触发词:三表模型、financial model、3-statement、建模、从零建模、收入预测。 ❌ 填写已有模板请用 financial-analysis:3-statements
This skill builds: A complete institutional-grade 3-statement financial model (IS / BS / CF) in Excel, with full formula linkage, zero hardcoded forecast cells, and 9-step QC validation.
Prerequisites — install before starting:
pip install openpyxl yfinance pandas
pip install notebooklm # optional — only needed if you have a NotebookLM notebook
How to trigger: Just say "建个三表模型" / "build a 3-statement model for [Company]" and the skill guides you step by step.
What to prepare:
BABA, 0700.HK, 600519.SS)⚠️ Data Source Guide — Read Before Starting
| Option | Setup | Token Cost | Recommended? |
|---|---|---|---|
| NotebookLM notebook (annual reports / prospectus pre-loaded) | One-time OAuth auth setup | Very low — NLM handles the PDF; Claude only receives answers | ✅ Best path |
| Excel upload (historical IS/BS/CF already structured) + short PDF excerpts | None | Low | ✅ Good |
| Direct PDF upload (full annual report, prospectus) | None | 🔴 Very high — a single A-share annual report can be 200+ pages | ⚠️ Pro users: avoid |
| Web only (Sina / Yahoo Finance fallback) | None | Low | ✅ Fallback |
Strongly recommended for new users: set up NotebookLM first. The one-time auth flow takes ~5 minutes and saves significant token consumption for every future model:
pip install notebooklm
# Run once interactively — browser will open for Google OAuth
python3 -c "
import asyncio
from notebooklm import NotebookLMClient
async def auth():
async with await NotebookLMClient.from_storage() as client:
nbs = await client.notebooks.list()
print(f'Auth OK — {len(nbs)} notebooks accessible')
asyncio.run(auth())
"
# After login, session is saved to ~/.notebooklm/storage_state.json
# You only need to do this once (re-auth if session expires, typically ~7 days)
If you skip NLM: the most token-efficient alternative is to upload a clean Excel file with historical IS/BS/CF (3–5 years) plus any short PDF excerpts (earnings release, key pages only). Uploading a full 300-page annual report in one go is the worst option for token consumption — especially on Claude Pro.
5-session build flow:
SESSION A — Raw data extraction + Assumptions setup (~15–30 min)
SESSION B — Income Statement (~15–20 min)
SESSION C — Balance Sheet (~15–20 min)
SESSION D — Cash Flow + reconciliation checks (~15–25 min)
SESSION E — Returns + Cross-check + Summary (~15–20 min)
State is preserved in the Excel _State tab between sessions. If a session is interrupted, the startup protocol automatically resumes from the last checkpoint.
Supported: CN GAAP / IFRS / US GAAP · Quarterly / Semi-annual / Annual (auto-detected)
To prevent state loss after context compaction, add the following block to your Claude User Preferences (Settings → Profile → Custom Instructions, or your CLAUDE.md file):
## 3-Statements-Ultra — Compaction Recovery Protocol
When I am building a 3-statement financial model using the 3-statements-ultra skill,
after ANY context compaction event, you MUST:
1. Re-read the 3-statements-ultra SKILL.md before writing any code
2. Open the Excel model file and read the _State tab — determine exact resume point
3. Read _model_log.md — recover prior session outputs (key totals, check results)
4. Read _pending_links.json — check if BS→CF back-fill is pending
5. Run RAW_MAP + ASM_MAP spot-check before using any row numbers
6. Never hardcode any IS/BS/CF forecast cell — every cell must be a string formula
7. Resume from the next incomplete step only — never re-run completed sections
Do not rely on conversation memory for row numbers or intermediate calculation results.
Disk state (_State, _model_log.md, _pending_links.json) is always authoritative.
This prevents the most common failure mode: resuming from memory after compaction and writing wrong row numbers or hardcoded values.
THE #1 FAILURE MODE OF THIS SKILL IS HARDCODED FORECAST CELLS.
CORRECT — historical cell: ws["B5"].value = '=Raw_Info!C12'
CORRECT — forecast cell: ws["E5"].value = '=Assumptions!B8'
BROKEN: ws["E5"].value = 0.30 ← hardcode, model is dead
BROKEN: ws["E5"].value = float(revenue * 0.30) ← hardcode, model is dead
BROKEN: ws["E5"].value = prev_value * growth ← hardcode, model is dead
Formula reference rules — two modes, never mix:
=
ws["B5"].value = f"=Raw_Info!C{raw_map['IS: Revenue']}"ws["E5"].value = f"=Assumptions!B{asm_map['Revenue YoY %']}"=
ws["E5"].value = f"=D5*(1+Assumptions!B{asm_map['Revenue YoY %']})"f"=D5*(1+=Assumptions!B2)" → Excel parse error (extra = inside)Before writing Python code for any IS / BS / CF tab, run this self-check:
=, referencing Assumptions=Raw_Info!The ONLY cells permitted to hold hardcoded numeric values:
After context compaction fires mid-session, BEFORE writing any more code:
Step 1: Re-read this SKILL.md ← re-internalize Rule Zero + Code Gen Protocol
Step 2: Open Excel file → read _State tab
Step 3: Read current session's last completed step from _State
Step 4: Read RAW_MAP + ASM_MAP from _State, run spot-check
Step 5: Read _model_log.md ← recover intermediate results (key numbers per tab)
Step 6: Read _pending_links.json ← check if deferred cross-sheet refs need back-fill
Step 7: Continue from the next incomplete step within the current session only
Do NOT re-verify PHASE_DONE of prior sessions — that is the session startup protocol below. Do NOT reload reference files unless actively needed for the current step.
Why Steps 5-6 exist: Context compaction loses intermediate calculation results. Without _model_log.md, you cannot cross-validate the next tab's outputs against prior tabs. Without _pending_links.json, you may forget which BS cells still need CF back-fill references.
THE #2 FAILURE MODE IS GENERATING TOO MUCH CODE AT ONCE.
Quarterly IS tabs produce 300+ lines of Python. A single truncated output = silently broken model.
Rule: Write ONE code block per logical tab section → execute it → verify output → checkpoint to _model_log.md → write the NEXT code block.
Each code block should cover ONE complete tab section:
IS: All Revenue rows | COGS through GP | OpEx block | EBIT+below+NCI+checks
BS: All Current Assets | All Non-Current Assets | All Current Liab | NCL+Equity
CF: One full year at a time (CFO → CFI → CFF → Others → Cash → back-fill)
Target: ~200–350 lines per code block (one tab section).
If a section exceeds 350 lines, split at natural boundaries (e.g., Revenue vs COGS).
NEVER exceed 400 lines in a single code block.
After every code block:
IS_DONE: Revenue)wb.save() completed (file mtime updated)If output is truncated mid-code-block: the partial code was NOT executed. Re-generate that section only (not the full session). Read _State to find where to resume.
Why this limit exists: Quarterly IS = 35 cols × 50 rows = 1750 cell assignments. At ~20 tokens/line, a single script easily hits the 32k-token Bash output cap. The script is silently truncated — no error, no exception — and the Excel file ends up partially written. Tab-section-level blocks prevent this entirely while avoiding the fragmentation of overly small (≤100 line) blocks.
Purpose: After each tab section completes, write key output numbers to _model_log.md.
The next tab section reads from this file for cross-validation — NOT from context memory.
Format:
# 3-Statement Model Log
## SESSION A — Raw_Info + Assumptions
- Completed: 2026-03-28 14:30
- Revenue segments: [Freight matching: ¥12.3B, Commission: ¥2.1B, ...]
- Historical periods: FY2021, FY2022, FY2023, H1 2024
- Unit: 百万元
- Granularity: semi-annual
## SESSION B — IS
### Revenue block
- Completed: 2026-03-28 15:00
- FY2023 Total Rev (model): ¥18,432M → matches Raw_Info ✓
- FY2025E Total Rev: ¥24,100M (YoY +15.2%)
### COGS+GP block
- Completed: 2026-03-28 15:15
- FY2023 GP Margin (model): 52.3% → matches source 52.4% (rounding ±0.1%) ✓
### OpEx block
- ...
### EBIT+below+checks block
- FY2023 NI (model): ¥5,210M → NI CHECK = 0 ✓
- REV CHECK = 0 ✓
## SESSION C — BS
### Current Assets
- Completed: ...
- FY2023 Total CA (model): ¥xx,xxxM
- Cash cell: PLACEHOLDER (pending CF back-fill)
- ⚠ Pending links written to _pending_links.json: [Cash row refs]
### Non-Current Assets
- ...
Rules:
Problem: SESSION C builds BS, but Cash = CF Ending Cash (built in SESSION D). Other cells may also depend on CF outputs (e.g., lease liability changes). Writing placeholder values risks forgetting to back-fill after compaction.
Mechanism:
import json
# --- SESSION C: when writing BS Cash as placeholder ---
pending = []
# For each year's Cash cell:
pending.append({
"target_sheet": "BS",
"target_cell": "E15", # e.g., FY2023 Cash
"formula": "=CF!E42", # will be filled when CF row 42 exists
"depends_on": "CF",
"description": "BS Cash = CF Ending Cash FY2023"
})
# ... repeat for each forecast year
with open("_pending_links.json", "w") as f:
json.dump(pending, f, indent=2)
# Also record in _State:
write_state_key(wb, "PENDING_LINKS", str(len(pending)))
# --- SESSION D: after CF is complete, batch back-fill ---
with open("_pending_links.json") as f:
pending = json.load(f)
for link in pending:
ws = wb[link["target_sheet"]]
ws[link["target_cell"]].value = link["formula"]
print(f"✅ Back-filled: {link['target_sheet']}!{link['target_cell']} = {link['formula']}")
# Clear pending
with open("_pending_links.json", "w") as f:
json.dump([], f)
write_state_key(wb, "PENDING_LINKS", "0")
wb.save(filepath)
Rules:
_pending_links.json with all BS→CF deferred referencesPENDING_LINKS: N to _State (N = count of pending items)_pending_links.json FIRST, then builds CF, then batch back-fillsPENDING_LINKS: 0PENDING_LINKS > 0, STOP — SESSION D incompleteThe model is built across 5 sessions. Each session starts with a clean context, loads
only its own reference file, and communicates state through the _State tab in Excel
plus two sidecar files: _model_log.md and _pending_links.json.
Build principle: Each session writes only what it needs for that session — some sessions build one tab, others build several. Never write all tabs at once within a session. Tabs and sections are added incrementally across sessions, with _State + sidecar files carrying state between them.
PRE-FLIGHT — NLM kickoff 问答 (首次运行必做,结果写入 _State)
└─ included in references/session-a.md
SESSION A — Raw_Info + Assumptions → Read references/session-a.md
SESSION B — IS → Read references/session-b.md
SESSION C — BS (Cash placeholder) → Read references/session-c.md
└─ writes _pending_links.json for Cash + any CF-dependent cells
SESSION D — CF + back-fill _pending_links → Read references/session-d.md
└─ reads _pending_links.json → builds CF → batch back-fills BS → checks
SESSION E — Returns + Cross_Check + Summary → Read references/session-e.md
└─ verifies PENDING_LINKS = 0 before proceeding
Step 7 of session startup = Read ONLY the reference file for THIS session. No others.
Step 1: Re-read this SKILL.md ← re-internalize Rule Zero
Step 2: Open Excel file → read _State tab
⚠ SESSION A first run exception: if the file does not exist yet,
skip Steps 3–11 entirely. Create the file. Go directly to PRE-FLIGHT in session-a.md.
Step 3: Parse _State into dict (format: "KEY: value" per line)
Step 4: Verify FILE_HASH matches current file mtime ← if mismatch, STOP and alert user
Step 5: Verify SKILL_VERSION in _State matches current skill version ← if mismatch, STOP
Step 6: Verify previous session PHASE_DONE exists ← if missing, STOP
(Exception: SESSION A — if no PHASE_DONE at all, run PRE-FLIGHT first
from references/session-a.md before building Raw_Info)
Step 7: Read ONLY references/session-[X].md for this session
Step 8: Read RAW_MAP + ASM_MAP from _State
⚠ SESSION A exception: if "RAW_MAP" key is absent from _State (mid-session crash
before Raw_Info was written), skip Steps 8–11 and resume from session-a.md Step 2.
Step 9: Run spot-check on RAW_MAP + ASM_MAP (see below)
⚠ SESSION A exception: only run spot-check if Raw_Info tab already exists in wb.
Step 10: Read _model_log.md ← recover prior session outputs
Step 11: Read _pending_links.json ← check deferred refs status
⚠ SESSION E: if PENDING_LINKS > 0, STOP — SESSION D did not complete back-fill
Step 12: Continue from next incomplete step only
_State format — one key per row, parsed with:
import openpyxl, os
def read_state(wb):
ws = wb["_State"]
state = {}
for row in ws.iter_rows(values_only=True):
if row[0] and ": " in str(row[0]):
k, v = str(row[0]).split(": ", 1)
state[k.strip()] = v.strip()
return state
def write_state_key(wb, key, value):
ws = wb["_State"]
for row in ws.iter_rows():
if row[0].value and str(row[0].value).startswith(key + ": "):
row[0].value = f"{key}: {value}"
return
ws.append([f"{key}: {value}"])
RAW_MAP / ASM_MAP format in _State:
RAW_MAP: {"IS: Revenue": 15, "IS: COGS": 16, "BS: AR": 28, ...}
ASM_MAP: {"Revenue YoY %": 2, "Gross Margin %": 3, "AR Days": 7, ...}
Parse with json.loads(state["RAW_MAP"]).
RAW_MAP / ASM_MAP spot-check (Step 9) — mandatory before using any map entry:
import json
raw_map = json.loads(state["RAW_MAP"])
asm_map = json.loads(state["ASM_MAP"])
for label, row in list(raw_map.items())[:3]:
actual = ws_raw.cell(row, 1).value
assert actual == label, f"RAW_MAP stale: expected '{label}' at row {row}, got '{actual}' — STOP"
for label, row in list(asm_map.items())[:3]:
actual = ws_asm.cell(row, 1).value
assert actual == label, f"ASM_MAP stale: expected '{label}' at row {row}, got '{actual}' — STOP"
If assertion fails: file was externally edited or _State is stale. Rebuild _State before proceeding.
[1] Summary ← built last; all figures link to model cells
[2] Assumptions ← every forecast driver; single source of truth
[3] IS ← Income Statement
[4] BS ← Balance Sheet
[5] CF ← Cash Flow Statement (indirect)
[6] Returns ← ROE / ROA / ROIC / DuPont
[7] Cross_Check ← assumption validation + revision log
[8] Raw_Info ← source extraction (built first; never re-read source after)
[_State] ← session metadata; last tab; delete after MODEL_COMPLETE
Sidecar files (same directory as Excel):
_model_log.md ← checkpoint results per tab section (append-only)
_pending_links.json ← deferred cross-sheet references (written by C, consumed by D)
R1 — One source: After Raw_Info is complete, never re-read the source. All data flows via =Raw_Info! links.
R2 — No free plugs: Never use = Total_LE − Total_Assets to force-balance BS. One permitted plug only: Non-cash Adj – Others row in CFO (R3). One IS plug for CN GAAP (R8).
R3 — Others plug (non-circular):
CF!Others = BS!Total_LE − BS!Total_NCA − SUM(BS!CA_excl_Cash) − BS!Prior_Cash
− CF!NI − CF!DA − CF!SBC − CF!Impairment − CF!JV
− CF!WC_Total − CF!CFI_Total − CF!CFF_Total
Non-circular by construction. Guarantees BS CHECK = 0 and CF CHECK = 0 simultaneously.
|Others| ≤ 15% of |CFO| → accept ✓|Others| > 15% of |CFO| → material CF line missing; find and model it explicitlyR4 — NCI persists: If historical NCI ≠ 0, it appears in every forecast period.
Attr_to_NCI = NI × NCI_RatioBS NCI_end = Prior + Attr_to_NCI − NCI_DividendsBS Reserves_end = Prior + Attr_to_Owners − Owner_Dividends ← NOT total NIR5 — No double-count in CF: Interest/tax paid are already in NI. No separate CFO rows in forecasts.
R6 — Cash last: All BS items built → CF complete year by year → BS!Cash = CF!Ending_Cash back-filled immediately after each year's CF is complete.
R7 — Every BS Δ has a CF home:
| BS Item | CF Line |
|---|---|
| PP&E net | CFI −Capex + CFO +D&A_PPE only |
| ROU Assets | CFO +ROU amort + CFF −Lease principal; if flat → Others |
| Intangibles | CFI −capex + CFO +amort; if flat → Others |
| LT Investments | CFI other investing |
| DTA / DTL | Others plug |
| Inventory / AR / Contract Assets / Prepayments | CFO WC (asset ↑ = cash out) |
| AP / Contract Liab / Accrued | CFO WC (liab ↑ = cash in) |
| ST + LT Borrowings | CFF draws / repayments |
| Lease Liabilities | CFF lease principal |
| Share Capital / APIC | CFF equity issuance + SBC |
| Reserves | IS Attr. to Owners (rolling) |
| NCI | IS Attr. to NCI (rolling) |
| Cash | CF Ending Cash — back-filled via _pending_links.json in SESSION D |
PP&E note: Use PP&E-specific depreciation rate (e.g. 20%), NOT total D&A. R3 absorbs the residual.
R8 — CN GAAP IS plug:
Historical: Other Op Inc = Source 营业利润 − (Model Rev − COGS − OpEx − Impairment)
Forecast: % of revenue or absolute from Assumptions tab
If |Other Op Inc| > 10% of EBIT → investigate; model the largest component explicitly.
R9 — Four reconciliation checks (all must = 0):
BS CHECK = Total Assets − Total L+ECF CHECK = Ending Cash − BS CashNI CHECK = Model NI − Source NI (tolerance ±10 for quarterly rounding)REV CHECK = Model Total Revenue − Source Total RevenueR10 — Match granularity to finest disclosed data:
| Disclosure pattern | IS / CF granularity | BS granularity |
|---|---|---|
| 10-Q / quarterly 季报 | Quarterly — Q1, Q2, Q3, Q4 | Annual |
| HK 中报 + annual | Semi-annual — H1, H2 | Annual |
| Annual only | Annual | Annual |
BS is always annual. Quarterly CF: Beg_Cash Q1 = prior FY BS Cash; Q2/Q3/Q4 = prior quarter CF Ending Cash (back-filled within SESSION D year-by-year loop).
Revenue: Sub_curr = Sub_prior × (1 + YoY%) | Vol×Price: Rev = Vol × ASP
WC → BS: AR = Rev × AR_Days/365 | Inv = COGS × Inv_Days/365 | AP = COGS × AP_Days/365
CF WC: Δ Asset = −(curr − prior) | Δ Liability = +(curr − prior)
PP&E: Net = Prior × (1 − Depr_Rate) + |Capex| ← PPE-specific rate, NOT total D&A
Equity: Reserves = Prior + Attr_Owners − Div | NCI = Prior + Attr_NCI − Div_NCI
CN GAAP: Other Op Inc = Source 营业利润 − (Rev − COGS − Tax&Surcharge − OpEx − Impairment)
Quarterly: Q_seg = Q_total × (FY_seg / FY_total) | FY = Q1+Q2+Q3+Q4
YTD→Q: Q1=YTD_Q1 · Q2=H1−Q1 · Q3=YTD_Q3−H1 · Q4=FY−YTD_Q3
ROIC: NOPAT / AVERAGE(IC_curr, IC_prior)
Checks: BS: TA−TLE=0 | CF: EndCash−BSCash=0 | NI: Model−Source=0 | Rev: Model−Source=0
=) for embedding inside compound formulas only; standalone (starts with =) for direct cell assignment only; f"=D5*(1+=Assumptions!B2)" is an Excel parse error= TLE − NCA − CA_excl; only R3 Others in CFO (and R8 for CN GAAP IS)=Raw_Info! links{3,6,9,12} = quarterly, {6,12} = semi-annual; do NOT use yfinance as primary data source三表模型完成、QC Suite全部通过后,在同一 Excel 文件内建 _Registry sheet 登记数据来源。
| 数据类型 | 分类 | derived | 处理方式 |
|---|---|---|---|
| Raw_Info 从原始报告提取的历史值(Revenue、COGS、NI 等) | 原始数据 | FALSE | 登记 tier/source/timestamp |
| Assumptions tab 的输入参数(YoY%、Margin%、AR Days 等) | 原始数据 | FALSE | tier 按来源判断(analyst=T4,consensus=T3) |
| IS/BS/CF 中由公式算出的行(GP、EBIT、FCF、Net Debt 等) | 派生数据 | TRUE | 登记 formula + inputs |
| Tier | 含义 |
|---|---|
| T1 | 公司官方原始申报文件(年报、10-K、招股书) |
| T2 | 经审计/交易所数据库(Wind、Bloomberg raw feed) |
| T3 | 市场一致预期(卖方 consensus) |
| T4 | 分析师内部估算(自研假设) |
gross_profit, ebit, ebitda, net_income, adj_eps
cfo_total, cfi_total, cff_total, fcf
net_debt, ev, roic, roe
# 批量登记 _Registry — 纯 openpyxl,无外部脚本依赖
import openpyxl, json
from datetime import date as today_date
# 创建或获取 _Registry sheet
if "_Registry" not in wb.sheetnames:
ws_reg = wb.create_sheet("_Registry")
ws_reg.append(["data_point_id", "value_cell", "tier", "source", "timestamp",
"data_category", "derived", "formula", "inputs"])