Scaffolds a new public dictionary into the liwca registry. Edits registry.txt, fetchers.py, api.rst, test_fetchers.py, and test_remote.py. Use when a contributor wants to add a new remote dictionary.
You are adding a new remote dictionary to the liwca project. The dictionary
name was optionally passed as: $ARGUMENTS
Ask the user for the following. If $ARGUMENTS provides the name, skip that
question. Do NOT ask about file format, categories, or example terms --- you
will auto-detect those in Step 2.
threat, sleep, honor).
This becomes fetch_<name>().
[2] reference in the docstring).
Download the file and compute its MD5 hash using pooch. Then inspect the file to determine format, categories, and example terms.
import pooch
path = pooch.retrieve("<URL>", known_hash=None)
md5 = "md5:" + pooch.file_hash(path, alg="md5")
print(md5)
After downloading, read and inspect the file to auto-detect:
_EXAMPLES dict
For .dic and .dicx files, read_dx() handles parsing automatically.
For other formats, determine the correct pandas call and any cleanup needed.
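When the file is plain text, one way to choose the pandas call is to sniff the delimiter from the first few lines. The sketch below uses only the standard library; `guess_delimiter` is a hypothetical helper and is not part of liwca or pooch.

```python
import csv
from typing import Optional


def guess_delimiter(sample: str, candidates: str = "\t,;") -> Optional[str]:
    """Guess the column delimiter of a text sample.

    Hypothetical helper for illustration. Returns None when no candidate
    delimiter is found, which usually means one term per line.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # csv.Sniffer raises csv.Error when it cannot determine a delimiter.
        return None


tsv_sample = "term\tcategory\nsleep\trest\ndream\trest\n"
txt_sample = "threat\ndanger\nharm\n"
print(repr(guess_delimiter(tsv_sample)))
print(repr(guess_delimiter(txt_sample)))
```

A detected delimiter can be passed as `sep=` to `pandas.read_csv`. A `None` result suggests reading the file as a single column of terms. Spreadsheet files such as .xlsx are binary and need `pandas.read_excel` instead.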
All lists in the project are in alphabetical order. Insert new entries at the correct position.
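As a quick way to find the correct position, the insertion point in an already-sorted list can be computed with `bisect`. This is a minimal sketch; `insert_sorted` and the name `fetch_example` are hypothetical, and the project's real lists may contain other entries.

```python
import bisect
from typing import List


def insert_sorted(entries: List[str], new_entry: str) -> List[str]:
    """Return a copy of a sorted list with new_entry added in order.

    Hypothetical helper for illustration. Existing entries are left
    untouched and duplicates are not added twice.
    """
    result = list(entries)
    if new_entry not in result:
        bisect.insort(result, new_entry)
    return result


existing = ["fetch_bigtwo", "fetch_sleep", "fetch_threat"]
print(insert_sorted(existing, "fetch_example"))
```

The same ordering rule applies to `__all__`, the function definitions, the autosummary block, and the test lists.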
src/liwca/data/registry.txt
Add a line in alphabetical order by filename:
<filename>.<ext> md5:<hash> <url>
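To make the line format concrete, the sketch below builds a registry line from its three whitespace-separated fields. `registry_line`, the filename `example.dic`, and the example.org URL are placeholders for illustration only; in practice the hash comes from the `pooch.file_hash` call in the download step.

```python
import hashlib


def registry_line(filename: str, md5_hex: str, url: str) -> str:
    """Format one registry entry as '<filename> md5:<hash> <url>'.

    Hypothetical helper for illustration. The fields are separated by
    whitespace, so none of them may contain spaces.
    """
    for field in (filename, md5_hex, url):
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"registry fields must be non-empty and contain no whitespace: {field!r}")
    return f"{filename} md5:{md5_hex} {url}"


# Placeholder content standing in for a downloaded dictionary file.
digest = hashlib.md5(b"abc").hexdigest()
print(registry_line("example.dic", digest, "https://example.org/example.dic"))
```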
src/liwca/fetchers.py
Add to __all__ --- insert "fetch_<name>" alphabetically.
Add the fetch function in alphabetical order among existing functions.
.dic / .dicx file

def fetch_<name>() -> pd.DataFrame:
    """
    Fetch the <human name> dictionary.

    Returns
    -------
    :class:`pandas.DataFrame`
        Dictionary with ``"cat_a"`` and ``"cat_b"`` categories.

    Notes
    -----
    The <human name> dictionary is described in <First Author> et al.\\ [1]_
    and publicly available on <Platform>\\ [2]_.

    References
    ----------
    .. [1] <Authors>, <Year>.
           <Title>.
           *<Journal>*
           doi:`<DOI_ID> <https://doi.org/<DOI_ID>>`__
    .. [2] `<SOURCE_URL> <<SOURCE_URL>>`__

    Examples
    --------
    >>> import liwca
    >>> dx = liwca.fetch_<name>()  # doctest: +SKIP
    """
    return read_dx(_pup.fetch("<filename>.<ext>"))

def fetch_<name>() -> pd.DataFrame:
    """
    <SAME DOCSTRING STRUCTURE AS TEMPLATE A>
    """
    path = _pup.fetch("<filename>.<ext>")
    # ... custom parsing into a DataFrame with DicTerm index ...
    logger.debug("Read <name> dictionary: %d terms from %s", len(df), path)
    return dx_schema.validate(df)

The output DataFrame must have:
"DicTerm" (lowercase string terms)
Look at fetch_mystical (xlsx), fetch_sleep (tsv), and fetch_threat
(txt) in src/liwca/fetchers.py for concrete non-standard parsing examples.
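For orientation, here is a minimal sketch of the custom-parsing step for a source file that lists one term/category pair per row. It is not taken from the existing fetchers. `long_to_wide`, the column names `word` and `category`, and the 0/1 coding are assumptions, so check the result against `dx_schema` before relying on it.

```python
import io

import pandas as pd


def long_to_wide(raw: pd.DataFrame, term_col: str, cat_col: str) -> pd.DataFrame:
    """Pivot (term, category) rows into one row per term, one column per category.

    Hypothetical helper for illustration. Terms are stripped and lowercased,
    and the index is named "DicTerm". The 0/1 membership coding is an assumption.
    """
    terms = raw[term_col].astype(str).str.strip().str.lower()
    # crosstab counts term/category co-occurrences; clip turns counts into flags.
    wide = pd.crosstab(terms, raw[cat_col]).clip(upper=1).astype(int)
    wide.index.name = "DicTerm"
    return wide.sort_index()


# Small in-memory stand-in for a downloaded tab-separated file.
raw = pd.read_csv(
    io.StringIO("word\tcategory\nSleep \trest\ndream\trest\ndream\tfantasy\n"),
    sep="\t",
)
df = long_to_wide(raw, term_col="word", cat_col="category")
print(df)
```

In a real fetch function the resulting frame would still be passed through `dx_schema.validate(df)`, as in the template above.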
docs/api.rst
In the .. autosummary:: block under .. _api-fetchers:, add
fetch_<name> alphabetically. Use 3 spaces of indentation.
tests/test_fetchers.py
Two edits:
- _FETCH_FUNCTIONS list --- add liwca.fetch_<name>, alphabetically.
- TestRegistryIntegrity.test_all_filenames_registered --- add the
  filename string to the expected set. This is easy to miss!
If the function accepts parameters (like fetch_bigtwo's version), add
specific parameter tests following the test_fetch_bigtwo_* pattern.
tests/test_remote.py
- _FETCH_FUNCTIONS list --- add ("<name>", liwca.fetch_<name>),
  alphabetically.
- _EXAMPLES dict --- add "<name>": [<terms>], using the example
  terms detected in Step 2.
Run and fix any issues:
uv run ruff check src/liwca/fetchers.py tests/test_fetchers.py tests/test_remote.py
uv run ruff format src/liwca/fetchers.py tests/test_fetchers.py tests/test_remote.py
uv run pytest tests/test_fetchers.py -x -q
Do NOT run tests/test_remote.py automatically --- it downloads real files.
Ask the user before running it.
Before finishing, confirm every item:
- registry.txt has the new line with correct md5 hash and URL
- fetchers.py --- function added to __all__ alphabetically
- fetchers.py --- function defined with full NumPy docstring
- api.rst --- function listed in autosummary block
- test_fetchers.py --- function in _FETCH_FUNCTIONS
- test_fetchers.py --- filename in test_all_filenames_registered expected set
- test_remote.py --- (name, function) tuple in _FETCH_FUNCTIONS
- test_remote.py --- example terms in _EXAMPLES
- pytest tests/test_fetchers.py passes