Knowledge Discovery & Graphs
Purpose
Discover hidden patterns, build knowledge graphs, and extract novel insights from structured and unstructured data.
Key Datasets
- WALS (wals.info): World Atlas of Language Structures — 192 linguistic features across 2,679 languages in CLDF format (CC-BY 4.0)
- HistWords (nlp.stanford.edu/projects/histwords): Historical word embeddings tracking semantic change by decade across 4 languages (English, French, German, Chinese) over up to two centuries (.npy/.pkl format)
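
As a rough illustration of working with a CLDF dataset such as WALS, the sketch below pivots long-format value rows into a per-language feature table and counts how many languages are coded for each feature. It assumes the standard CLDF column names `Language_ID`, `Parameter_ID`, and `Value`; the inline rows, the IDs `lang_a`/`F1`, and the helper names `load_feature_table` and `coverage` are placeholders for illustration, not real WALS data or part of any library.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a CLDF values.csv file.
# Assumption: the IDs and values below are placeholders, not actual WALS entries.
SAMPLE_VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang_a,F1,1
2,lang_a,F2,2
3,lang_b,F1,1
4,lang_c,F2,3
"""


def load_feature_table(handle):
    """Pivot long-format value rows into {language: {feature: value}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(handle):
        table[row["Language_ID"]][row["Parameter_ID"]] = row["Value"]
    return dict(table)


def coverage(table):
    """Count how many languages have a value for each feature."""
    counts = defaultdict(int)
    for features in table.values():
        for feature_id in features:
            counts[feature_id] += 1
    return dict(counts)


# With a real download, replace io.StringIO(...) with open("values.csv", newline="").
table = load_feature_table(io.StringIO(SAMPLE_VALUES_CSV))
feature_coverage = coverage(table)
print(feature_coverage)
```

Checking coverage first is useful because typological datasets tend to be sparse: many features are coded for only a fraction of the languages, which affects any later clustering or rule mining.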
Protocol
- Data exploration — Profile data, identify patterns, check distributions
- Feature engineering — Create derived features, temporal features, cross-references
- Pattern detection — Apply clustering, association rules, anomaly detection
- Knowledge graph construction — Build entity-relation graphs from discovered patterns
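
The four steps above can be sketched end to end on toy data. This is a minimal sketch, not a prescribed implementation: it uses simple pairwise association rules as the pattern detector (clustering and anomaly detection are not shown), and the records, thresholds, the `implies` relation label, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: each entity maps to the set of attributes observed for it.
RECORDS = {
    "e1": {"a", "b", "c"},
    "e2": {"a", "b"},
    "e3": {"a", "b", "d"},
    "e4": {"c", "d"},
}


def profile(records):
    """Step 1 (data exploration): frequency of each attribute."""
    return Counter(attr for attrs in records.values() for attr in attrs)


def pair_features(records):
    """Step 2 (feature engineering): derive attribute pairs per entity."""
    return {key: set(combinations(sorted(attrs), 2)) for key, attrs in records.items()}


def association_rules(records, min_support=0.5, min_confidence=0.9):
    """Step 3 (pattern detection): rules lhs -> rhs with support and confidence."""
    total = len(records)
    single = profile(records)
    pair_counts = Counter(
        pair for pairs in pair_features(records).values() for pair in pairs
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue
        # Test both directions, since confidence is not symmetric.
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def build_graph(rules):
    """Step 4 (graph construction): adjacency list of (relation, target, properties)."""
    graph = defaultdict(list)
    for lhs, rhs, support, confidence in rules:
        edge_props = {"support": support, "confidence": confidence}
        graph[lhs].append(("implies", rhs, edge_props))
    return dict(graph)


rules = association_rules(RECORDS)
graph = build_graph(rules)
for source, edges in graph.items():
    print(source, edges)
```

Keeping support and confidence as edge properties means the graph can later be filtered by rule strength without rerunning the pattern detection step.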