Transcribe page scans from pages_760w/ into pages_txt/. Use when the user wants to transcribe pages from the 1702 La Circe scans.
Transcribe page images from pages_760w/ into plain text files in pages_txt/.
$ARGUMENTS[0]: start page number (e.g., 0030), required
$ARGUMENTS[1]: end page number (e.g., 0039), optional (defaults to start + 9, i.e., 10 pages)
Page numbers are the 4-digit suffixes from the filenames, e.g., b30535827_0030.jpg.
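The argument handling can be sketched in Python as follows. This is only an illustration, not part of the skill itself: the helper names `page_range` and `page_paths` are hypothetical, and the `b30535827` prefix and `.jpg`/`.txt` extensions are taken from the example filenames in this document.

```python
from pathlib import Path

# Scan identifier as it appears in the example filenames (assumed constant).
PREFIX = "b30535827"


def page_range(start, end=None):
    """Return the 4-digit page numbers from start to end, inclusive.

    If end is omitted, it defaults to start + 9, which gives 10 pages.
    """
    first = int(start)
    last = int(end) if end is not None else first + 9
    if last < first:
        raise ValueError("end page must not be before start page")
    return [f"{n:04d}" for n in range(first, last + 1)]


def page_paths(page, root=Path(".")):
    """Return (image path, text path) for one 4-digit page number."""
    image = root / "pages_760w" / f"{PREFIX}_{page}.jpg"
    text = root / "pages_txt" / f"{PREFIX}_{page}.txt"
    return image, text
```

For example, `page_range("0030")` yields the ten page numbers 0030 through 0039, and `page_paths("0030")` pairs `pages_760w/b30535827_0030.jpg` with `pages_txt/b30535827_0030.txt`.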
Each output file is plain text. No markdown formatting except *asterisks* for italics.
[6 — Ulysses, Circe, — Dial. I.] or [— Oister, and Mole. — 7]
Use em-dashes to separate the three zones (left, center, right). Empty zones
still get the dash: [8 — Ulysses, Circe, —]

*Ulysses.* To hear thee talk thus...
This matches the original printing, where the speaker name begins the paragraph in
italics. Do NOT put speaker names on a separate line from the dialogue.
*Oister.*
*asterisks*
— to mark continuation (the next page's file should start with the remainder)
[* footnote text here]
--- on its own line.
pages_txt/ to avoid re-doing work
pages_760w/ (read up to 5 at a time in parallel)
pages_txt/b30535827_NNNN.txt
[?]
....since she sends us

into
Then the next page starts: into the World Cloathed...
Note the blank line before the catch-word, consistent with the spacing between all
other elements in the file.
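
A catch-word can be checked against the following page with a small script. The sketch below is hypothetical and rests on assumptions: that the catch-word is the last line of a page file and is preceded by a blank line, that bracketed lines such as the page header are not body text, and that italic asterisks can be ignored when comparing. The function names are invented for illustration.

```python
def catchword(page_text):
    """Return the catch-word of a page, or None if there is none.

    Assumes the catch-word is the final non-empty line and that the
    line directly before it is blank.
    """
    lines = page_text.rstrip("\n").split("\n")
    if len(lines) >= 2 and lines[-2].strip() == "" and lines[-1].strip():
        return lines[-1].strip()
    return None


def first_body_word(page_text):
    """Return the first word of the body, skipping blank and bracketed lines."""
    for line in page_text.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            continue  # header line or other bracketed note
        return stripped.split()[0]
    return None


def catchword_matches(page_text, next_page_text):
    """True if the catch-word equals the first body word of the next page."""
    word = catchword(page_text)
    nxt = first_body_word(next_page_text)
    if word is None or nxt is None:
        return False
    # Ignore italic markers when comparing.
    return word.strip("*") == nxt.strip("*")
```

A mismatch does not always mean a transcription error, since the original printing may itself be inconsistent, so a failed check is best treated as a prompt to re-read the scan.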