Trace foundation affiliation for each repo using a script (cache + heuristics), with LLM Web Search for the remaining unknowns
Determine which repos belong to open-source foundations. Uses a 3-layer strategy: cache → heuristics → LLM.
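A minimal sketch of how the layers could chain together for a single repo. It assumes the cache is a dict from lowercase owner/repo slugs to foundation names, and resolve_foundation and its return values are hypothetical. The map name ORG_FOUNDATION_MAP is borrowed from scripts/trace_foundations.py, but its contents here are only the two examples this document gives (apache/* → Apache, kubernetes/* → CNCF).

```python
"""Sketch of the cache -> heuristics -> LLM layering (hypothetical internals)."""

# Org-level heuristics; these two entries mirror the examples in the text.
ORG_FOUNDATION_MAP = {
    "apache": "Apache",
    "kubernetes": "CNCF",
}


def resolve_foundation(repo: str, cache: dict[str, str]) -> tuple[str, str]:
    """Return (foundation_name, layer) for an "owner/repo" slug.

    Assumes cache keys are lowercase "owner/repo" slugs.
    """
    key = repo.lower()
    # Layer 1: exact match against the cached project lists.
    if key in cache:
        return cache[key], "cache"
    # Layer 2: org-level heuristic, e.g. apache/* -> Apache.
    org = key.split("/", 1)[0]
    if org in ORG_FOUNDATION_MAP:
        return ORG_FOUNDATION_MAP[org], "heuristic"
    # Layer 3: unresolved, so hand it to the LLM web-search step.
    return "unknown", "llm"


if __name__ == "__main__":
    # Illustrative cache with a single entry.
    cache = {"envoyproxy/envoy": "CNCF"}
    for repo in ["envoyproxy/envoy", "apache/kafka", "someone/tool"]:
        print(repo, resolve_foundation(repo, cache))
```

Keeping the layers in a fixed order lets the cheap, deterministic lookups settle most repos, so only the residue reaches the slower LLM step.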
Input: output/repo_exp.csv (expanded repo list from step ⑧)
Run the cache builder script to fetch project lists from structured data sources (official APIs and curated lists):
python3 scripts/build_foundation_cache.py -o output/.cache/foundation_projects.json --summary
The script fetches from:
- projects.apache.org/projects.json (official JSON API)
- landscape.cncf.io/api/items (Landscape API)
- projects.eclipse.org/api/projects (Eclipse API)
To merge new projects with an existing cache (preserving LLM-discovered entries):
python3 scripts/build_foundation_cache.py --merge -o output/.cache/foundation_projects.json --summary
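A minimal sketch of one merge policy consistent with "preserving LLM-discovered entries". The cache layout (a JSON object mapping a project key to an entry with a "source" field, where "llm" marks LLM-discovered entries) and the function names are hypothetical, not the actual format written by scripts/build_foundation_cache.py.

```python
import json
from pathlib import Path


def merge_cache(existing: dict[str, dict], fresh: dict[str, dict]) -> dict[str, dict]:
    """Merge freshly fetched entries with an existing cache.

    Both dicts map a project key (e.g. "owner/repo") to an entry such as
    {"foundation": "CNCF", "source": "cncf-landscape"}; "source": "llm"
    marks an LLM-discovered entry (hypothetical layout).
    """
    # Structured sources are refetched in full, so they replace old API entries.
    merged = dict(fresh)
    for key, entry in existing.items():
        # Keep LLM-discovered entries unless a structured source now covers the key.
        if entry.get("source") == "llm" and key not in merged:
            merged[key] = entry
    return merged


def merge_cache_file(path: Path, fresh: dict[str, dict]) -> dict[str, dict]:
    """Load the cache at `path` (if present), merge, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_cache(existing, fresh)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2, sort_keys=True), encoding="utf-8")
    return merged
```

Under this policy a refresh drops API entries that no longer appear upstream, but never drops an LLM entry unless a structured source now lists the same key.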
Then run the trace script on the expanded repo list:
python3 scripts/trace_foundations.py output/repo_exp.csv -o output/foundation.csv --summary
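The script applies 3 layers:
1. Cache: match the repo against the project lists in output/.cache/foundation_projects.json
2. Heuristics: map known orgs to foundations (e.g. apache/* → Apache, kubernetes/* → CNCF)
3. Fallback: set foundation_name=unknown for LLM processing
For repos still marked unknown, run an LLM web search with the query "{project_name}" foundation OR governance and record:
- foundation_name: the foundation name, or none if not affiliated
- evidence: actual URLs and facts from the search
- confidence: S/A/B/C
IMPORTANT: If the LLM discovers a previously unknown foundation with projects, add the foundation and its projects to:
- output/.cache/foundation_projects.json
- scripts/trace_foundations.py's ORG_FOUNDATION_MAP
Show the results grouped by foundation.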
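A hedged sketch of the per-repo record from the LLM step and of the grouped view. The FoundationResult class, the helper names, and the rule that any foundation claim must carry at least one evidence entry are assumptions; the query template, the none value, and the S/A/B/C confidence scale come from the steps above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Confidence grades used by the LLM step.
VALID_CONFIDENCE = {"S", "A", "B", "C"}


def search_query(project_name: str) -> str:
    """Build the web-search query for one unresolved project."""
    return f'"{project_name}" foundation OR governance'


@dataclass
class FoundationResult:
    """One LLM-step result (hypothetical structure)."""
    repo: str
    foundation_name: str  # a foundation name, or "none" if not affiliated
    confidence: str       # one of S/A/B/C
    evidence: list[str] = field(default_factory=list)  # URLs and facts found

    def __post_init__(self) -> None:
        if self.confidence not in VALID_CONFIDENCE:
            raise ValueError(f"{self.repo}: confidence must be S/A/B/C, got {self.confidence!r}")
        # Assumed rule: an affiliation claim must cite at least one source.
        if self.foundation_name != "none" and not self.evidence:
            raise ValueError(f"{self.repo}: {self.foundation_name} claim has no evidence")


def group_by_foundation(results: list[FoundationResult]) -> dict[str, list[str]]:
    """Map each foundation name to the sorted repos assigned to it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for r in results:
        groups[r.foundation_name].append(r.repo)
    return {name: sorted(repos) for name, repos in sorted(groups.items())}


if __name__ == "__main__":
    # Placeholder repos and URLs, for illustration only.
    results = [
        FoundationResult("example-org/alpha", "CNCF", "A", ["https://example.org/alpha/governance"]),
        FoundationResult("example-org/beta", "none", "B"),
    ]
    print(search_query("alpha"))
    for name, repos in group_by_foundation(results).items():
        print(f"{name} ({len(repos)}): {', '.join(repos)}")
```

Validating in __post_init__ rejects a malformed result as soon as it is recorded, before it can reach the grouped view or the CSV.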
The user reviews the assignments and may correct them; apply any corrections to output/foundation.csv.
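A sketch of folding reviewed corrections back into output/foundation.csv with Python's csv module. It assumes the file has a header row with repo and foundation_name columns (only foundation_name is named in this document; repo is a guess) and that corrections arrive as a hypothetical {repo: foundation_name} dict.

```python
import csv
from pathlib import Path


def apply_corrections(path: Path, corrections: dict[str, str],
                      key_col: str = "repo", value_col: str = "foundation_name") -> int:
    """Rewrite the CSV, replacing value_col for rows whose key_col is in corrections.

    Column names are assumptions. Returns the number of rows actually changed.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if fieldnames is None or key_col not in fieldnames or value_col not in fieldnames:
        raise ValueError(f"{path} must have {key_col!r} and {value_col!r} columns")

    changed = 0
    for row in rows:
        new = corrections.get(row[key_col])
        if new is not None and new != row[value_col]:
            row[value_col] = new
            changed += 1

    # Write to a sibling temp file, then swap it in so a crash cannot truncate the CSV.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    tmp.replace(path)
    return changed
```

For example, apply_corrections(Path("output/foundation.csv"), {"some-org/some-repo": "none"}) would reassign that one row and return how many rows changed; other columns are written back untouched.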
Print:
none)
→ Can run in parallel with /trace-companies