Repository-wide QA orchestration for the VJU document archive. Use this for repository maintenance, document QA, transcription QA, or PR re-checks in this repository.
Use this skill for end-to-end document QA in this repository.
Values
Accuracy and readability matter more than speed.
Do not loosen QA criteria to make a batch pass.
If a check fails, fix it when safe or leave it open with a clear report.
Trigger Conditions
Use this skill when any of the following is true:
This is the first task performed in this repository during the current conversation.
The user asks for repository maintenance or document maintenance here.
The user asks for document QA, transcription QA, PDF↔MD checking, translation QA, batch QA, or PR re-checks here.
First-Invocation Briefing
On the first invocation: send a short message (in Japanese) naming what you will inspect first and the expected flow (inventory → checks → fixes → report/status → PR if requested). Skip this on later invocations unless scope changes materially.
Load Order
Read skills/repo-qa/references/vju-project.md.
Read only the relevant sections of docs/QA_CHECKLIST.md.
Load additional reference docs from skills/repo-qa/references/ as needed for the current runtime, issuer, or document family.
Inspect the current scripts, report files, and status file before making decisions.
Runtime Routing
The parent repo-qa skill is the main contract.
Runtime-specific guidance is intentionally kept in reference docs, not in separate workflows.
Use skills/repo-qa/references/runtime-codex.md when the current runtime is Codex.
Use skills/repo-qa/references/runtime-gemini.md when the current runtime is Gemini.
skills/repo-qa/children/codex-qa/SKILL.md and skills/repo-qa/children/gemini-qa/SKILL.md are thin runtime overlays only.
skills/repo-qa/children/copilot-qa/SKILL.md is deprecated compatibility only; do not use it for new work.
Do not create any new child-skill concept for repo-qa unless the workflow truly needs a distinct contract.
Reference Load Set
skills/repo-qa/references/runtime-codex.md: Codex runtime guidance for repo-qa runs.
skills/repo-qa/references/runtime-gemini.md: Gemini runtime guidance, including structure-aware transcription reading.
skills/repo-qa/references/issuer-vju.md: issuer-specific guidance for VJU / Vietnam-Japan University documents.
skills/repo-qa/references/3626-QD-DHQGHN_Regulation on Undergraduate Training_transcription_reference.md: structure-preserving reference transcription sample.
Add more docs to references/ when the workflow needs them; prefer reference docs over new child-skill concepts.
Core Rules
Keep user-facing updates in Japanese.
Keep repository-facing names, branch names, PR titles/bodies, and commit messages in English.
Never add AI co-author trailers such as Co-authored-by.
Prefer deterministic checks and scripts before model judgment.
Run structural checks and table-integrity checks before model judgment whenever PDF-derived tables are involved.
External tools: Python must not be used for document processing, QA checks, or transcription. Use existing Node.js scripts and shell commands. External translation APIs (DeepL, Google Translate, etc.) are prohibited.
Gemini model selection:
For initial PDF transcription / first-pass generation: prefer the latest -preview variants of Flash Lite, then Flash. Fall back to non-preview Flash Lite → Flash → Pro only if the preview models fail or are unavailable.
For QA re-checks, second-pass verification, or quality audits of transcriptions: prefer Gemini Pro (current generation). Use Flash only if Pro is unavailable or rate-limited.
Do not guess model IDs; confirm available models for the session before invoking Gemini.
After deterministic checks, the currently running generative AI model performs the QA review pass. Do not call external AI services, external APIs, or separate AI applications for QA review.
AI translation is performed by the currently running generative AI model (the AI that is executing this skill). No external translation APIs, libraries, or third-party AI services are permitted.
The following conditions are mandatory for all translations:
The VI transcription must be complete (no placeholders, no truncation) before translation begins.
All glossary terms from the VJU Glossary must be applied: ĐHQGHN→VNU/ベトナム国家大学ハノイ校, Giám đốc ĐHQGHN→President/総長 (NOT "Director"/"ディレクター"), 副学長→Vice President/副総長 (NOT "副社長"), KT.→Acting/代理署名, ベトナム日本大学→日越大学, etc. The correct English name for ĐHQGHN is "Vietnam National University, Hanoi" (NOT "Hanoi National University").
The translation must preserve the full structural layout of the VI source: all articles, appendices, tables, and signature blocks.
After translation, run check_structure.js and check_disclaimer_issuer_link.js.
Update tmp/qa_status.json to record translation method as claude-ai (or the appropriate model identifier).
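The glossary conditions above can be partly pre-checked with grep before the model's own review pass. This is a minimal bash sketch: the function name check_glossary_terms is hypothetical, and its banned-string list is taken only from the NOT cases in this section, so the VJU Glossary remains the authority.

```shell
# Hypothetical helper: flag known glossary violations in translated files.
# Banned strings come from the NOT cases in this skill. "Director" is left
# out because it can be a correct title for posts other than Giám đốc ĐHQGHN.
check_glossary_terms() {
  local status=0 file term
  local -a banned
  banned=(
    "Hanoi National University"  # use "Vietnam National University, Hanoi"
    "ディレクター"                # Giám đốc ĐHQGHN is 総長
    "副社長"                      # 副学長 is 副総長
    "ベトナム日本大学"            # normalize to 日越大学
  )
  for file in "$@"; do
    for term in "${banned[@]}"; do
      # -F: fixed-string match; -n: print line numbers for the report.
      if grep -Fn -- "$term" "$file"; then
        echo "GLOSSARY: '$term' found in $file" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

Pass the EN and JA files of a document set; a non-zero return means at least one hit was printed and needs fixing before the structural checks.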
Token budget rule: Monitor remaining context tokens throughout the run. If the remaining context drops below 30%, immediately stop translation work, report progress to the user, and await further instruction. Do not start a new translation that cannot be completed within the remaining budget.
Parallel translation limit: Do not run more than 2 translation or large-content-generation tasks in parallel at the same time. Queue additional work and start it only after one of the active tasks completes.
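When translation jobs are launched from a shell, one way to honor the two-task limit is a slot counter around bash's wait -n (bash 4.3 or later). This is a minimal sketch: run_limited and MAX_PARALLEL are hypothetical names, and each task is passed as a command string.

```shell
# Hypothetical runner: start tasks in order, never more than MAX_PARALLEL at once.
MAX_PARALLEL=2

run_limited() {
  local cmd running=0
  for cmd in "$@"; do
    if [ "$running" -ge "$MAX_PARALLEL" ]; then
      wait -n                      # block until any one running task exits
      running=$((running - 1))
    fi
    bash -c "$cmd" &               # launch the next queued task in the background
    running=$((running + 1))
  done
  wait                             # wait for the tasks still in flight
}
```

Queued tasks simply wait their turn inside the loop, which matches the rule of starting extra work only after an active task completes.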
Transient API error retry policy: When an API call fails with a transient error (network timeout, 503, 429 rate-limit, or similar recoverable errors), retry using this backoff schedule:
Wait 1 minute, then retry (first retry).
Wait 5 minutes, then retry (second retry).
Wait 10 minutes, then retry (third and final retry).
Do not retry immediately. Do not use shorter intervals than those listed above.
If the error persists after 3 retries, record it as a blocker and move on to the next safe task.
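The schedule above can be wrapped around any command whose failures are known to be transient. This bash sketch does not classify errors itself, so the caller decides what to wrap. retry_with_backoff and the SLEEP_UNIT knob (seconds per scheduled minute, added here so the schedule can be exercised quickly) are hypothetical.

```shell
# Hypothetical wrapper: one initial try, then retries after 1, 5 and 10
# minutes. After the third failed retry it reports a blocker and returns 1.
SLEEP_UNIT=${SLEEP_UNIT:-60}   # seconds per scheduled minute

retry_with_backoff() {
  local -a delays=(1 5 10)
  local i
  if "$@"; then return 0; fi
  for i in 0 1 2; do
    sleep $(( delays[i] * SLEEP_UNIT ))
    echo "retry $((i + 1)) of 3 after ${delays[i]} min: $*" >&2
    if "$@"; then return 0; fi
  done
  echo "BLOCKER: still failing after 3 retries: $*" >&2
  return 1
}
```

Prefix the original command with retry_with_backoff; a return of 1 is the point at which the blocker is recorded and the run moves to the next safe task.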
Post-document break: After each document set reaches complete status, wait 5 minutes before beginning work on the next document set. This cooldown applies after every completed document regardless of whether API errors occurred.
If the VI transcription is incomplete or blocked, do not proceed to translation — fix transcription first.
Treat the public glossary spreadsheet as the primary terminology reference: VJU Glossary.
Normalize ベトナム日本大学 to 日越大学, and use the glossary for other organization names, titles, abbreviations, and recurring legal terms.
If a term is ambiguous, check the glossary before making a local editorial choice.
When translation reveals a glossary gap, add the term to the glossary sheet through the Google Sheets API, leaving Category blank, and fill in the remaining fields through the normal spreadsheet update workflow.
During long runs, briefly re-open this skill and the relevant checklist section about every 5 minutes or after each major phase boundary.
Work Window
DEFAULT_WORK_WINDOW = 30 document sets.
If the workflow uses a count-based pause boundary, use 30, not 3 or 12.
For full-repository runs, passing 30 items is not a stop reason by itself. Continue automatically until an explicit stop condition is reached.
Rate limit: MAX_DOCS_PER_HOUR = 9. Do not exceed 9 document sets processed per hour.
Track the number of documents completed since the start of the current hour.
If the current pace would reach 10 or more documents in the hour, pause and inform the user in Japanese that the rate limit may be hit, and ask for explicit confirmation before continuing.
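Example warning: 「このまま続けると1時間あたり10件以上の処理となり、レートリミットに到達する可能性があります。続けますか?」 (English gloss: "Continuing at this pace means 10 or more documents per hour, which may hit the rate limit. Continue?")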
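A minimal sketch of the hourly guard, assuming the caller already tracks how many document sets finished since the start of the current hour. It looks only at the next document, a simplification of "the current pace"; the function name is hypothetical.

```shell
# Hypothetical guard for MAX_DOCS_PER_HOUR = 9.
# Succeeds (exit 0) when starting one more document set would bring the
# hour's count to 10 or more, i.e. pause and ask the user first.
MAX_DOCS_PER_HOUR=9

needs_rate_limit_confirmation() {
  local completed_this_hour=$1
  [ $(( completed_this_hour + 1 )) -gt "$MAX_DOCS_PER_HOUR" ]
}
```

With 8 completed the check fails, so a ninth set may start; with 9 completed it succeeds, and the Japanese warning should be sent before continuing.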
Required Files
Before execution, confirm these paths exist or can be initialized:
docs/qa_report_master.md
Tasks.md
tmp/qa_status.json
docs/QA_CHECKLIST.md
If the repository does not contain the required repo-qa files or contracts, stop and report that blocker clearly. Do not switch to another repository.
Workflow
Run the work in this order:
Precheck (remote sync + naming audit)
Inventory
Required filename normalization
Missing transcription generation or recovery
Doc ID review (after transcription and translation are complete for each document set)
When the precheck naming audit or filename normalization finds non-compliant filenames, immediately report them to the user (in Japanese) with: filename, likely document identity (if determinable), and proposed new name.
If the document identity is clear: rename using git mv and commit before proceeding.
If the document identity is unclear (scanned image, no text layer, ambiguous name): report to user and ask for identification. Do not rename without confirmation.
Track renamed files as new doc sets in tmp/qa_status.json and docs/qa_report_master.md.
After renaming, immediately add the renamed file to the current run's active inspection queue. A successfully renamed file is treated as a newly registered document set and must be processed (transcription → translation → Doc ID review → QA) within the same session. Do not defer renamed files to a future run.
Note: Legacy reference PDFs in data/public/ with filenames like 1.-Bo-tieu-chuan-*.pdf, 10B-TT04_*.pdf, Guide_to_*.pdf are tracked external reference materials — they predate the current naming convention and are not primary archive documents. Confirm with the user before renaming these.
Important: Files with a REF-* prefix (e.g. REF-AUN-QA_..., REF-BGDDT_..., REF-IQR_..., REF-JUAA_..., REF-VJU-IT_...) are primary archive documents that happen to use REF as their issuer/series code. They are not legacy files and must be included in the normal transcription and QA workflow. Do not skip or deprioritize REF-* files.
Inventory And Priority
Detect document sets by *_source.pdf and *_transcription*.md.
All files matching *_source.pdf are work targets, regardless of prefix (including REF-*, WEB-*, VJU-*, and all numeric prefixes). A source PDF with no corresponding *_transcription.md is an unprocessed document set and must be included in the work scope.
Do not rely solely on tmp/qa_status.json to determine the in-scope document list. Always cross-check against the actual filesystem (data/public/ and data/confidential/) to catch files not yet registered in the status file.
Use these commands to discover all unprocessed document sets:
```shell
# All source PDFs missing their base transcription, in both archive roots.
# nullglob (bash) skips a root with no matching PDFs instead of looping
# over the literal pattern; the glob loop also keeps spaces in names intact.
shopt -s nullglob
for pdf in data/public/*_source.pdf data/confidential/*_source.pdf; do
  base="${pdf%_source.pdf}"
  [ -f "${base}_transcription.md" ] || echo "UNPROCESSED: $(basename "$base")"
done
```
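The loop above looks only at the filesystem. To also catch source PDFs not yet registered in tmp/qa_status.json, a crude text match can help, assuming each document set's base filename appears verbatim as a quoted JSON string in the status file. The file's schema is not described here, so this is a heuristic grep rather than a JSON query, and find_unregistered is a hypothetical name.

```shell
# Hypothetical helper: print source PDFs whose base name does not appear as a
# quoted JSON string anywhere in the status file. This is a text match, not a
# JSON query; adjust it if qa_status.json stores doc_ids differently.
find_unregistered() {
  local status_file=$1 pdf id
  shift
  for pdf in "$@"; do
    [ -e "$pdf" ] || continue              # skip a glob that matched nothing
    id=$(basename "$pdf" _source.pdf)
    if ! grep -Fq -- "\"$id\"" "$status_file"; then
      echo "UNREGISTERED: $id"
    fi
  done
}
```

Typical call: find_unregistered tmp/qa_status.json data/public/*_source.pdf data/confidential/*_source.pdf.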
Cover both data/public/ and data/confidential/ roots unless the user explicitly narrows the scope.
The goal is repository-wide coverage, not a small sample.
A set remains in scope until it passes all required QA gates or is explicitly blocked.
Apply this priority order:
Non-compliant filenames (rename or report before anything else)
Unprocessed document sets (source PDF with no transcription at all) — highest processing priority
Documents issued within the last 3 months
Unfinished capability-gated work
Remaining QA work for not-yet-passed items
If all items have already passed, re-check the 30 least recently checked sets.
Doc ID Review (Step 5 — after transcription and translation complete)
After VI transcription and EN/JA translations are complete for a document set, review whether the current doc_id (as reflected in the filename and front matter) is appropriate. This step applies especially to newly transcribed documents, REF-* prefixed files, and any document where the filename was inherited from an informal or ambiguous source.
Review criteria
Official document number: If the document has an official government/institution reference number (e.g., 3626/QĐ-ĐHQGHN, 01/2024/TT-BGDĐT), the doc_id must match it. The sanitized filename form uses - separators (e.g., 3626-QD-DHQGHN).
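For the two examples above, the sanitized form follows from replacing / with - and Đ/đ with D/d. A minimal bash sketch, assuming those are the only substitutions needed; the function name is hypothetical, and other Vietnamese diacritics would need their own rules.

```shell
# Hypothetical helper: turn an official number into the sanitized doc_id form.
# Handles only '/' -> '-' and Đ/đ -> D/d, matching the examples in this rule.
sanitize_doc_number() {
  printf '%s\n' "$1" | sed -e 's#/#-#g' -e 's/Đ/D/g' -e 's/đ/d/g'
}
```

For example, sanitize_doc_number "01/2024/TT-BGDĐT" prints 01-2024-TT-BGDDT.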
External reference documents (REF-* prefix): For external reference documents that do not carry an official Vietnamese/Japanese government document number, the REF-{ISSUER}_{SeriesOrYear} convention is acceptable. Review:
Is the issuer code accurate? (e.g., AUN-QA, BGDDT, IQR, JUAA, VJU-IT)
Is there a version or year that could replace the REF prefix for clarity? (e.g., AUN-QA-2024_Assessment-Guide-v4)
Would a more specific identifier better distinguish this document from related ones?
"Alt Version" and duplicate-name documents: Files with Alt Version in the title share a base document name with another file. Review whether both files should be kept, whether one should be the canonical version, and whether a disambiguating identifier can be added.
No official number available: If no official number exists, keep the current sanitized form and record it in tmp/qa_status.json as the accepted doc_id.
Review process
After translation is complete, read the document's front matter and first page content to confirm or correct the doc_id.
If the doc_id needs changing:
a. Report to the user (in Japanese) with the current name, the proposed new name, and the reason.
b. Wait for user confirmation before renaming.
c. After confirmation: rename all associated files (_source.pdf, _transcription.md, _transcription_en.md, _transcription_ja.md) with git mv, update front matter doc_id fields, and update tmp/qa_status.json.
If the doc_id is acceptable, record it as doc_id_reviewed: true in tmp/qa_status.json and continue.
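Step c can be prepared as a printed plan, so the exact git mv commands can be shown in the Japanese confirmation message of step b. A bash sketch; plan_doc_set_rename is hypothetical, and the front matter doc_id fields and tmp/qa_status.json still have to be updated separately.

```shell
# Hypothetical helper: print (not run) the git mv commands for renaming every
# file of a document set. Suffixes mirror step c; missing variants are skipped.
plan_doc_set_rename() {
  local dir=$1 old=$2 new=$3 suffix
  for suffix in _source.pdf _transcription.md _transcription_en.md _transcription_ja.md; do
    if [ -e "$dir/$old$suffix" ]; then
      # %q shell-quotes each path so names with spaces survive copy/paste.
      printf 'git mv %q %q\n' "$dir/$old$suffix" "$dir/$new$suffix"
    fi
  done
}
```

After the user confirms, the same output can be piped to bash to execute the renames.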
qa_status.json field
Add "doc_id_reviewed": true to the document entry in tmp/qa_status.json once review is complete (no rename needed, or rename confirmed and applied).
Checks And Fixes
The QA checklist defines pass/fail criteria. Do not weaken it during execution.
If a document fails a check and the issue is safely fixable, fix it and rerun the relevant checks.
Do not change checklist criteria or downgrade a failure to make the run pass.
If the issue cannot be fixed safely, keep the item open and report why.
Public reader validation must include the actual browser-rendered output, not only file-level markdown checks, whenever the document uses HTML blocks (<p align=...>, tables, embedded divs) or mixed markdown lists.
A document does not pass if the browser reader still shows literal markdown markers such as - , * , **bold**, or raw heading/list syntax where structured HTML should appear.
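This check cannot replace looking at the page. Once the reader's visible text has been saved to a file, though (how it is captured, for example with a browser automation tool, is outside this sketch), grep can list lines that still show raw syntax. The patterns and function name are assumptions; every hit is a candidate to inspect, since legitimate source text can also start with a dash.

```shell
# Hypothetical helper: scan text captured from the rendered reader page and
# print lines that still show raw markdown: "- " or "* " list markers at the
# start of a line, **bold** pairs, and ATX headings ("# ", "## ", ...).
find_literal_markdown() {
  grep -nE '^[[:space:]]*[-*][[:space:]]|\*\*[^*]+\*\*|^#{1,6}[[:space:]]' "$1"
}
```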
For any confidential document change, confidential metadata change, or reader/deployment change that can affect restricted docs, run node scripts/check_confidential_readiness.js after node scripts/build-search-index.js.
check_confidential_readiness.js must finish with Errors: 0. If it reports warnings (for example a missing Drive map entry), mention them explicitly in the user report and keep the affected document open when the warning is user-visible.
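The required order and the Errors: 0 condition can be scripted as below. The helper names and the default log path tmp/confidential_readiness.log are hypothetical; only the two script paths and the Errors: 0 text come from this section. The log is kept so any warnings can be quoted in the user report.

```shell
# Hypothetical: succeed only if the readiness log reports exactly "Errors: 0".
readiness_log_ok() {
  grep -Eq 'Errors: 0([^0-9]|$)' "$1"
}

# Hypothetical gate: rebuild the search index first, then run the readiness check.
confidential_readiness_gate() {
  local log=${1:-tmp/confidential_readiness.log}
  node scripts/build-search-index.js || return 1
  # tee keeps the full output, including warnings, for the user report.
  node scripts/check_confidential_readiness.js | tee "$log"
  readiness_log_ok "$log"
}
```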
Confidential reader QA must verify that backend content-not-found / document-not-found failures are surfaced as explicit availability problems, not misreported as generic browser-side Firestore permission errors.
Confidential reader QA must verify that one missing language variant does not blank the whole reader when other variants are available; available panes must still render and unavailable languages must be clearly indicated.
Source PDF preview validation must confirm that the right pane renders actual PDF canvases/pages, not only that the _source.pdf URL returns HTTP 200.
When diagnosing source preview issues, check for client-side rerender failures caused by detached ArrayBuffer reuse, ResizeObserver-triggered rerenders, or other PDF.js lifecycle errors in the browser console.
Preserve layout fidelity in official headers, centered blocks, appendices, form structures, and signature sections.
When reviewing or regenerating PDF-derived transcriptions/translations, keep the PDF's structural layout intact in Markdown as far as the format permits: page breaks, heading hierarchy, table columns/cells, merged cells, appendices, footnotes, and signature blocks must remain traceable.
Treat non-PDF helper artifacts as temporary recovery inputs only; remove them after the transcription QA for that document set is complete.
Heading-count mismatches are only signals. Confirm semantic structure before editing.
List-item mismatches are only signals. Sub-bullet formula notation in VI source may inflate list counts relative to EN/JA translations. Always verify semantic completeness before treating a list-item count difference as a failure.
Do not treat placeholders, TODO text, or "translation will be provided later" notes as completed restoration.
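A quick grep can surface such placeholders before a document set is reported as restored. The phrase list and function name are assumptions; extend the list with any placeholder wording actually found in this repository.

```shell
# Hypothetical scan for placeholder text that must not count as restored
# content. -w: whole-word matches only (so "photodocument" is not a TODO hit);
# -i: case-insensitive; -n: line numbers. Exit 0 means placeholders were found.
find_placeholders() {
  grep -nwEi 'TODO|TBD|translation will be provided later' "$@"
}
```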
DISCLAIMER must be present in all language variants independently. VI, EN, and JA transcription files each require their own DISCLAIMER block. Missing DISCLAIMER in VI while EN/JA have it is a defect — fix it.
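The repository's check_disclaimer_issuer_link.js remains the authoritative check. As a quick standalone sketch, the per-variant rule can be expressed in bash as below, assuming each DISCLAIMER block contains the literal word DISCLAIMER and the variants use the _transcription.md, _transcription_en.md and _transcription_ja.md suffixes listed under Doc ID review. The function name is hypothetical.

```shell
# Hypothetical check: each language variant of a document set needs its own
# DISCLAIMER block. $1 is the document-set path prefix without any suffix.
check_disclaimers() {
  local base=$1 status=0 suffix file
  for suffix in _transcription.md _transcription_en.md _transcription_ja.md; do
    file="$base$suffix"
    if [ ! -f "$file" ]; then
      echo "MISSING VARIANT: $file" >&2
      status=1
    elif ! grep -q 'DISCLAIMER' "$file"; then
      echo "NO DISCLAIMER: $file" >&2
      status=1
    fi
  done
  return "$status"
}
```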
SOURCE_NOTE format: Both > **[SOURCE_NOTE]** (blockquote) and <div class="source-note"> (HTML div) are acceptable per the QA checklist. Do not convert one to the other.
Re-check: always update last_processed_at. When re-checking a previously passed document set (priority 6), update last_processed_at in tmp/qa_status.json regardless of whether fixes are applied.
Partial translations: A > **[PARTIAL TRANSCRIPTION]** / > **[部分転記]** notice is acceptable for very large technical documents where complete translation is impractical. The partial scope must be explicitly documented in the file.
Count Reporting
Every run must make it obvious how many document sets were:
in scope
checked
passed
fixed but still open
blocked
skipped
If the target count is 0, do not accept that casually.
Re-check whether inventory logic, target root, filters, and status file interpretation are correct.
If the correct result is still 0, report that explicitly to the user as a real result.
Gemini Availability
Prefer Gemini CLI or SDK only when the configured runtime actually works in this environment.
Authentication: OAuth only. Always invoke Gemini via the gemini -p CLI wrapper. Never use an API key (GEMINI_API_KEY or GOOGLE_API_KEY) for repo-qa work in this repository.
If Gemini API calls fail with invalid-key or model-not-found errors, stop Gemini-dependent work for that gate, record the blocker, and continue safe deterministic checks.
Treat API_KEY_INVALID as an authentication/key problem, not a quota signal. Use quota wording only for RESOURCE_EXHAUSTED, 429, or explicit rate-limit responses.
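A small bash sketch of that distinction, useful when wording a blocker. classify_gemini_error and its patterns are hypothetical, and a bare 429 match can misfire on unrelated numbers in the message, so treat the result as a hint for the report wording.

```shell
# Hypothetical classifier for Gemini error text, per the rule above.
# Prints auth, quota or other; the match patterns are illustrative.
classify_gemini_error() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *api_key_invalid*) echo auth ;;           # key/auth problem, never quota
    *resource_exhausted*|*429*|*"rate limit"*|*rate-limit*) echo quota ;;
    *) echo other ;;
  esac
}
```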
Do not guess model IDs; use only verified working IDs for this session.
write_file is blocked in headless (-p) mode. When Gemini generates transcription but cannot save it, recover content from ~/.gemini/tmp/document-archive/chats/session-*.json → messages[].toolCalls[name=write_file].args.text. See skills/repo-qa/references/runtime-codex.md for the extraction command.
Correct CLI syntax: gemini -p "...prompt including file path...". The file path must be inside the prompt string, never passed as a positional argument.
Confidential PDFs: Gemini cannot read data/confidential/ because the directory is excluded by .gitignore. For confidential docs, generate the VI content by inference from the EN/JA versions only, and mark it with a SOURCE_NOTE.
Read skills/repo-qa/references/troubleshooting.md before starting any Gemini transcription task.
Strict Reviewer Pass
After the ordinary QA pass:
Review the changes as a strict reviewer.
Look for regressions, misleading report claims, formatting drift, layout damage, metadata mistakes, and claim-vs-diff mismatches.
If problems are found, go back and fix the documents, scripts, or reports.
Do not alter the checklist or reinterpret the result to hide a defect.
Self-Check Gate
Before declaring completion or creating a PR:
Confirm docs/qa_report_master.md preserved prior history and appended the new run.
Confirm Tasks.md preserved prior history and appended the new run.
Confirm tmp/qa_status.json is updated for every touched doc_id.
Confirm no touched transcription file gained a last_updated field.
Confirm no placeholder-only content is described as restored content.
Confirm layout-sensitive regions remain layout-faithful.
Confirm report claims match the final diff.
Confirm commit messages are in English and contain no AI co-author trailer.
If the user asked for release/deploy work, confirm the final handoff includes git push, Firebase Hosting deploy, and Cloudflare Pages deploy.
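Two of the items above are mechanical and can be checked from git before the PR: no added last_updated lines in transcription files, and no Co-authored-by trailers in the branch's commits. A bash sketch; self_check_git and its default base origin/main are assumptions, so pass the branch the PR actually targets.

```shell
# Hypothetical pre-PR check, run from the repository root.
# $1 is the base to compare against (default origin/main, an assumption).
self_check_git() {
  local base=${1:-origin/main} status=0
  # Tracked changes (committed or not) to transcription files must not add last_updated.
  if git diff "$base" -- '*_transcription*.md' | grep -E '^\+.*last_updated'; then
    echo "FAIL: last_updated added to a transcription file" >&2
    status=1
  fi
  # Commits on this branch must not carry AI co-author trailers.
  if git log --format=%B "$base..HEAD" | grep -i '^Co-authored-by:'; then
    echo "FAIL: Co-authored-by trailer found" >&2
    status=1
  fi
  return "$status"
}
```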
If any self-check item fails:
do not declare completion
do not relax the checklist
return to the relevant fix step
Stop Conditions
Stop only when one of these is true:
No more eligible non-blocked work remains under repo-qa.
Only blocked work remains.
A real runtime or tool limit prevents safe continuation.
These are not valid stop reasons:
“3 items are done”
“12 items are done”
“30 items are done”
“a first batch is done”
“a summary has been written”
“the workflow was demonstrated”
Expected Deliverables
At the end of a run, leave:
updated document files where safe fixes were applied
an appended docs/qa_report_master.md
an appended Tasks.md
an updated tmp/qa_status.json
a concise Japanese summary to the user
a PR, if the user requested PR creation
for release tasks, a completed git push plus Firebase Hosting and Cloudflare Pages deploys
PR Rule
Create a PR when the user asks for it.
Do not merge at this stage unless the user separately asks for merge.