Build the Metaforge lexicon database from raw linguistic sources. This is a rare operation — use it when creating the database from scratch, reproducing the build for a licensing audit, or verifying data provenance. Use this skill when the user mentions creating the lexicon database, a missing PRE_ENRICH.sql, searching for PRE_ENRICH.sql, building the base database, or importing from scratch, or when they need to understand where the raw data comes from.
Builds lexicon_v2.db from raw linguistic data sources. This is a one-time
operation — once built, updates and enrichments are managed via the
metaforge-pipeline-management skill.
Optionally outputs a snapshot as PRE_ENRICH.sql. Use this full-text dump of lexicon_v2.db to restore the database to its post-import state, before any Metaforge-specific modifications.
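Restoring from the dump can be sketched as follows (a minimal sketch, assuming PRE_ENRICH.sql is a plain SQL text dump; the helper name is illustrative):

```python
import sqlite3
from pathlib import Path

def restore_from_dump(dump_path: str, db_path: str) -> None:
    """Recreate a database from a full-text SQL dump (e.g. PRE_ENRICH.sql)."""
    Path(db_path).unlink(missing_ok=True)  # start from an empty database file
    con = sqlite3.connect(db_path)
    con.executescript(Path(dump_path).read_text())  # replay the entire dump
    con.commit()
    con.close()
```

The same effect is available from the shell with `sqlite3 new.db < dump.sql`; the Python form is convenient inside pipeline scripts.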
See also: data-pipeline/CLAUDE.md for architecture overview and key concepts.
| Source | Description | Expected Path | Download URL | Licence |
|---|---|---|---|---|
| OEWN (via sqlunet) | Synsets, lemmas, relations | data-pipeline/raw/sqlunet_master.db | TODO | TODO |
| Brysbaert GPT Familiarity | Word familiarity ratings | data-pipeline/input/multilex-en/*.xlsx | TODO | TODO |
| SUBTLEX-UK | Subtitle word frequencies | data-pipeline/input/subtlex-uk/*.xlsx | TODO | TODO |
| SyntagNet | Collocation pairs | (bundled in sqlunet) | TODO | TODO |
| VerbNet | Verb classes, roles, examples | (bundled in sqlunet) | TODO | TODO |
| FastText (wiki-news-300d) | Word embeddings (300d) | ~/.local/share/metaforge/wiki-news-300d-1M.vec | TODO | TODO |
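Before running the build, the presence of these inputs can be checked with a short sketch. The patterns mirror the Expected Path column above; the helper name and the idea of a pre-flight check are assumptions, not part of the pipeline.

```python
from glob import glob
from pathlib import Path

# Patterns taken from the Expected Path column of the source table.
REQUIRED = [
    "data-pipeline/raw/sqlunet_master.db",
    "data-pipeline/input/multilex-en/*.xlsx",
    "data-pipeline/input/subtlex-uk/*.xlsx",
    str(Path.home() / ".local/share/metaforge/wiki-news-300d-1M.vec"),
]

def missing_sources(patterns=REQUIRED) -> list[str]:
    """Return the glob patterns that match no file on disk."""
    return [p for p in patterns if not glob(p)]
```

Run it from the repo root; a non-empty result means the build will fail partway through an import step.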
Large files live in ~/.local/share/metaforge/, NOT in the repo. Worktrees symlink into the shared location:

- data-pipeline/raw/wiki-news-300d-1M.vec → ~/.local/share/metaforge/wiki-news-300d-1M.vec
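Creating that worktree symlink can be sketched with a small helper (the helper name is hypothetical; the filename matches the FastText vector file named in this skill):

```python
from pathlib import Path

def link_shared(repo_dir: Path, shared_dir: Path, name: str) -> Path:
    """Symlink repo_dir/name to the copy kept in the shared data directory."""
    target = shared_dir / name
    link = repo_dir / name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace any stale link or file
    link.symlink_to(target)
    return link

# Example (paths as described in this skill):
# link_shared(Path("data-pipeline/raw"),
#             Path.home() / ".local/share/metaforge",
#             "wiki-news-300d-1M.vec")
```

`ln -s` from the shell is equivalent; the helper just makes the operation repeatable across worktrees.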
Set up the Python environment:

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r data-pipeline/requirements.txt
```

The entire build is handled by data-pipeline/import_raw.sh:
```bash
source .venv/bin/activate

# Build only
./data-pipeline/import_raw.sh

# Build and dump the PRE_ENRICH.sql baseline
./data-pipeline/import_raw.sh --dump
```
The script performs these steps in order:
1. Apply the schema from data-pipeline/SCHEMA.sql
2. Import OEWN (import_oewn.py)
3. Import SyntagNet (import_syntagnet.py)
4. Import VerbNet (import_verbnet.py)
5. Import familiarity ratings (import_familiarity.py)
6. Import SUBTLEX-UK frequencies (import_subtlex.py)
7. Build the curated property vocabulary (build_vocab.py)
8. Build property antonyms (build_antonyms.py)
9. (with --dump) Export as PRE_ENRICH.sql

The script prints row counts automatically. Expected values (approximate):
| Table | Expected Count |
|---|---|
| synsets | ~120,000 |
| lemmas | ~160,000 |
| relations | ~80,000 |
| frequencies | ~60,000 |
| syntagms | ~35,000 |
| vn_classes | ~400 |
| property_vocab_curated | 35,000 |
| property_antonyms | ~576 |
| enrichment | 0 |
| property_vocabulary | 0 |
| synset_properties | 0 |
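A post-build sanity check against the counts above might look like the sketch below. The table names come from this skill; the 10% tolerance and the expectation that the three enrichment tables are empty are assumptions.

```python
import sqlite3

# Approximate expectations from the row-count table (subset shown).
EXPECTED = {
    "synsets": 120_000,
    "lemmas": 160_000,
    "relations": 80_000,
}

def check_counts(db_path: str, tolerance: float = 0.1) -> dict:
    """Return {table: True/False} for whether each count is within tolerance."""
    con = sqlite3.connect(db_path)
    results = {}
    for table, expected in EXPECTED.items():
        (n,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        results[table] = abs(n - expected) <= expected * tolerance
    con.close()
    return results
```

A table far outside tolerance usually means one import step silently read an empty or wrong-version source file.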
| File | Purpose |
|---|---|
| data-pipeline/SCHEMA.sql | Canonical DDL — all CREATE TABLE + CREATE INDEX statements |
| data-pipeline/import_raw.sh | Build orchestrator |
| data-pipeline/output/PRE_ENRICH.sql | Committed baseline dump (base data + empty enrichment schema) |
| data-pipeline/scripts/utils.py | Shared constants, including hardcoded paths for raw data files |
Input paths are hardcoded in data-pipeline/scripts/utils.py — the import scripts do not take CLI arguments for input files. Check each script's source before running in case paths have changed. When the schema changes, update data-pipeline/SCHEMA.sql, data-pipeline/CLAUDE.md, and this skill before committing.
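For orientation, the path constants in utils.py might be shaped roughly like this. These names are illustrative guesses based on the paths listed in this skill, not the actual module contents — always read the real file.

```python
from pathlib import Path

# Hypothetical sketch of utils.py path constants; names are illustrative only.
RAW_DIR = Path("data-pipeline/raw")
INPUT_DIR = Path("data-pipeline/input")

SQLUNET_DB = RAW_DIR / "sqlunet_master.db"
FAMILIARITY_GLOB = INPUT_DIR / "multilex-en"      # *.xlsx files inside
SUBTLEX_GLOB = INPUT_DIR / "subtlex-uk"           # *.xlsx files inside
FASTTEXT_VEC = Path.home() / ".local/share/metaforge/wiki-news-300d-1M.vec"
```

Because the scripts read these constants rather than CLI flags, moving a raw file means editing utils.py, not the invocation.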