Download books from Project Gutenberg into the Alejandría corpus. Split into chapters, reflow text, parse footnotes, generate .meta.json.
Download public-domain books from Project Gutenberg, process them into
corpus-ready chapters (.txt + .meta.json), and place them in corpus/en/manuals/.
Before adding books, use the /book-discovery skill to check the MTP catalog first,
then Gutenberg, then archive.org. This avoids wasted effort on OCR when clean
transcriptions exist.
Run python scripts/download_gutenberg.py --list-books for the full list. Key entries:
| ID | Slug | Author | Notes |
|---|---|---|---|
| 42238 | articles-of-faith | Talmage | 24 lectures |
| 35514 | great-apostasy | Talmage | 10 chapters |
| 45149 | house-of-the-lord | Talmage | ~10 chapters |
| 47182 | vitality-of-mormonism | Talmage | 104 essays |
| 74447 | discourses-brigham-young | BY/Widtsoe | 42 chapters |
| 46202 | new-witness-for-god-vol1 | Roberts | 18 chapters |
| 47316 | new-witnesses-for-god-vol2 | Roberts | 37 chapters |
| 59951 | new-witnesses-for-god-vol3 | Roberts | 38 chapters |
| 52391 | outlines-ecclesiastical-history | Roberts | sequential |
| 49526 | missouri-persecutions | Roberts | 22 chapters |
| 35974 | corianton | Roberts | word-numbered chapters |
| 60235 | seventys-course-theology-1st | Roberts | sequential |
| 60490 | seventys-course-theology-2nd | Roberts | sequential |
| 60575 | seventys-course-theology-3rd | Roberts | sequential |
| 60491 | seventys-course-theology-4th | Roberts | sequential |
| 60492 | seventys-course-theology-5th | Roberts | 5 chapters |
| 50302 | rise-and-fall-of-nauvoo | Roberts | 45 chapters |
| 45464 | mormon-doctrine-of-deity | Roberts | 7 chapters |
| 45303 | life-of-john-taylor | Roberts | 46 chapters |
| 47091 | history-of-the-church-vol1 | Smith/Roberts | 48 chapters |
| 47192 | history-of-the-church-vol2 | Smith/Roberts | sequential |
| 47316 | history-of-the-church-vol3 | Smith/Roberts | 15 chapters |
| 60757 | history-of-the-church-vol4 | Smith/Roberts | 30 chapters |
| 60706 | history-of-the-church-vol5 | Smith/Roberts | 32 chapters |
| 60758 | history-of-the-church-vol6 | Smith/Roberts | 12 chapters |
| 45054 | essentials-in-church-history | J.F. Smith | 54 chapters |
| 45619 | history-of-prophet-joseph-by-his-mother | Lucy M. Smith | 54 chapters |
| 44896 | autobiography-parley-p-pratt | P.P. Pratt | 54 chapters |
| 47109 | gospel-doctrine | Joseph F. Smith | 25 chapters |
| 35333 | life-of-heber-c-kimball | O.F. Whitney | 66 chapters |
| 44941 | government-of-god | John Taylor | 12 chapters |
| 46028 | leaves-from-my-journal | W. Woodruff | 28 chapters |
| 47703 | wilford-woodruff-fourth-president | M.F. Cowley | 56 chapters |
| 47519 | heber-c-kimball-journal | H.C. Kimball | 17 chapters |
| 45051 | william-clayton-journal | W. Clayton | 18 monthly sections |
| 47708 | biography-of-lorenzo-snow | Eliza R. Snow | 87 chapters+letters |
| 2443 | story-of-the-mormons | W.A. Linn | 81 chapters (6 books) |
| 46783 | early-scenes-church-history | Various | 17 chapters |
| 51730 | life-of-david-w-patten | L.A. Wilson | 8 chapters |
# List available pre-configured books
python scripts/download_gutenberg.py --list-books
# Download a single book
python scripts/download_gutenberg.py --book-id 42238
# Download several pre-configured books in one run
python scripts/download_gutenberg.py --book-id 42238 35514 45149 47182 74447
# Dry run (show what would be downloaded)
python scripts/download_gutenberg.py --book-id 42238 --dry-run
# Any Gutenberg book (fetches metadata from Gutendex API)
python scripts/download_gutenberg.py --book-id 12345
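
For books that are not pre-configured, the script is described as fetching metadata from the Gutendex API. The sketch below shows roughly what such a lookup could look like against the public `https://gutendex.com/books/{id}` endpoint. The function names `summarize_book` and `fetch_book` are hypothetical, and the fields kept here are an assumption about what a corpus entry needs; the actual script may request or store different data.

```python
import json
from urllib.request import urlopen

# Public Gutendex endpoint for a single book record.
GUTENDEX_URL = "https://gutendex.com/books/{book_id}"


def summarize_book(record: dict) -> dict:
    """Reduce a Gutendex book record to a few fields (illustrative selection)."""
    authors = [a.get("name", "") for a in record.get("authors", [])]
    return {
        "id": record["id"],
        "title": record.get("title", ""),
        "authors": authors,
        "languages": record.get("languages", []),
    }


def fetch_book(book_id: int, timeout: float = 30.0) -> dict:
    """Fetch one record from Gutendex and summarize it (requires network access)."""
    with urlopen(GUTENDEX_URL.format(book_id=book_id), timeout=timeout) as resp:
        return summarize_book(json.load(resp))
```

Keeping the parsing in `summarize_book` separate from the network call makes it possible to check the field handling against a saved response without hitting the API.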
- Source text: gutenberg.org/files/{id}/{id}-0.txt
- Markup handled (_italic_, =bold=, {page})
- Footnotes parsed ([Footnote N: ...])
- Output: {NN}-chapter-{NN}.txt + .meta.json per chapter
- Output directory: corpus/en/manuals/{slug}/
01-chapter-1.txt
01-chapter-1.meta.json
02-chapter-2.txt
...
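
The markup and footnote conventions above can be illustrated with a minimal sketch. This is not the script's actual implementation: the helper names are hypothetical, page markers are assumed to look like `{123}`, and footnotes are assumed to contain no nested square brackets. The file naming follows the example listing (`01-chapter-1.txt`).

```python
import re

# Assumed patterns for the Gutenberg plain-text conventions named above.
FOOTNOTE_RE = re.compile(r"\[Footnote (\d+):\s*(.*?)\]", re.DOTALL)
ITALIC_RE = re.compile(r"_(\S(?:[^_\n]*\S)?)_")  # _italic_ on a single line
BOLD_RE = re.compile(r"=(\S(?:[^=\n]*\S)?)=")    # =bold= on a single line
PAGE_RE = re.compile(r"\{\d+\}")                 # assumed form of {page} markers


def extract_footnotes(text: str) -> tuple[str, dict[int, str]]:
    """Remove [Footnote N: ...] blocks and return (body, {N: note text})."""
    notes: dict[int, str] = {}

    def _collect(match: re.Match) -> str:
        # Collapse line breaks inside the note so it reads as one line.
        notes[int(match.group(1))] = " ".join(match.group(2).split())
        return ""

    return FOOTNOTE_RE.sub(_collect, text), notes


def strip_markup(text: str) -> str:
    """Drop page markers and unwrap italic and bold markers."""
    text = PAGE_RE.sub("", text)
    text = ITALIC_RE.sub(r"\1", text)
    return BOLD_RE.sub(r"\1", text)


def chapter_filenames(index: int) -> tuple[str, str]:
    """Return the (.txt, .meta.json) names for a 1-based chapter index."""
    stem = f"{index:02d}-chapter-{index}"
    return f"{stem}.txt", f"{stem}.meta.json"
```

Footnotes are extracted before the inline markup is stripped so that note text can be cleaned with the same `strip_markup` helper afterwards if desired.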
The files are in corpus/ but NOT indexed. To index them, run incremental
ingestion when the Alejandría container is running:
curl -X POST http://localhost:4300/index/ingest
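
If the ingestion step needs to be triggered from Python rather than from the shell, the same request can be sketched with the standard library. The function names are hypothetical, and since the response format of the ingest endpoint is not documented here, the sketch only returns the HTTP status code.

```python
from urllib.request import Request, urlopen

# Same endpoint as the curl command above.
INGEST_URL = "http://localhost:4300/index/ingest"


def build_ingest_request(url: str = INGEST_URL) -> Request:
    """Build a body-less POST request, mirroring `curl -X POST <url>`."""
    return Request(url, method="POST")


def trigger_ingest(url: str = INGEST_URL, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status (container must be running)."""
    with urlopen(build_ingest_request(url), timeout=timeout) as resp:
        return resp.status
```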
Edit BOOK_CONFIGS in scripts/download_gutenberg.py and add: