Author, revise, reorganize, or review step-by-step tutorials for beginner-to-intermediate learners, primarily static text-and-image format. Grounded in Mayer's CTML, Sweller's CLT, and van der Meij & Carroll's minimalism. Use when writing a new guide, improving an existing one, auditing tutorial structure, or reorganizing course content. Triggers on: 'tutorial', 'step-by-step guide', 'hands-on guide', 'walkthrough', 'how-to', 'reorganize tutorial', 'revise guide', 'improve tutorial quality', 'review tutorial structure', 'チュートリアル', '整理し直す', 'マルチメディア学習'.
metyatech · 0 stars · 2026/04/16
Category: Education
Use this skill when writing, revising, or auditing any document
where a reader follows steps to build or achieve something.
Target learner (expertise reversal boundary)
This skill is optimised for beginner-to-intermediate
learners encountering the subject for the first or second time.
Most multimedia learning principles (Signaling, Pre-training,
Personalization, heavy imagery) are strongest in that range and
weaken — or reverse — for experts (Kalyuga's expertise
reversal effect). If the artefact is an expert-facing quick
reference, the author MUST scale back Signaling, Concept
density, and hand-holding narrative, and lean on Reference
tables. When in doubt, state the target learner explicitly in
the document's opening.
Scientific foundations
All authoring rules below derive from the principles in this
table. Principles are stated so their scopes do NOT overlap;
when two seem to conflict, the "Scope & limits" column
resolves the boundary. The agent MUST apply them actively when
writing new tutorials and when reviewing existing ones.
Underlying load model (Sweller's CLT)
Every principle in the table below is a tactic for managing one
of the three load types in Cognitive Load Theory (Sweller, 1988).
When two principles compete, resolve by asking which load type
currently dominates.
Multimedia, Personalization, Generative activity, Worked example, Feedback
Expertise reversal (Kalyuga, 2007) predicts that tactics which
reduce extraneous load for novices can increase extraneous
load for experts (because redundant signals compete with
established schemas). This is why the skill scopes itself to
beginner-to-intermediate learners.
Citation note: The Mayer principles above cite the 2nd
edition (Mayer, 2009), which defined 12 principles. The 3rd
edition (Mayer, 2021) expanded the set to 15, adding Embodiment,
Immersion, and Generative activity. Split-attention and
Transient information come from the Cognitive Load Theory
literature rather than from Mayer's list. This skill incorporates
Split-attention (relevant to static tutorials) and treats
Immersion as out of scope. Transient information applies to
video/animation tutorials and is not covered here; consult
Jiang & Sweller (2021) when authoring such content.
Principles not applicable to static text-and-image tutorials
The following are canonical CTML principles, but they require
audio or speaker presence and therefore are not applicable
to static text-and-image tutorials. If the artefact is a
narrated video or an avatar-driven walkthrough, these
principles MUST be applied from the primary sources — this
skill does not cover their application.
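
| Principle | Applies to | Why out of scope here |
| --- | --- | --- |
| Voice principle (Mayer, 2009) | Media with audio narration | Compares a human voice with a machine voice; a static page has no audio |
| Image principle (Mayer, 2009) | Video material: whether to show the speaker's image on screen | Whether the speaker's face is shown cannot be judged for a static page |
| Embodiment principle (Mayer, 2014) | Video: whether the speaker uses gestures | Gestures are specific to video |
| Immersion principle (Mayer, 2021) | VR / immersive media | This skill is limited to 2D static pages |
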
Authors writing narrated video or VR content MUST NOT assume
this skill covers these principles; apply them separately from
Mayer & Fiorella (2021), Cambridge Handbook of Multimedia
Learning (3rd ed.).
Information hierarchy
Every tutorial MUST be composed of exactly these layers:
Tutorial (page)
├── Prerequisites: what the learner needs before starting (optional)
└── Section (depth 0): milestone ("when done, you can X"); `goal` required
    ├── goal: 1 future-tense sentence declaring what the learner will achieve
    ├── Concept × N: term/background, always collapsible, before first use
    ├── Reference × N: lookup tables, always collapsible, near relevant sub-section
    ├── Section × N (depth 1+, nested): a group of actions toward one sub-goal
    │   ├── goal: optional at depth > 0 (use to express sub-goal in task language)
    │   ├── Action × N: the atomic unit (image + instruction + result)
    │   ├── Recovery: error recovery, inline, after the action that can fail
    │   └── Verify: "→ expected result" (1 text line; optional result screenshot)
    ├── Checkpoint: end-of-section checklist (exactly one per top-level Section)
    └── NextSteps: what to do after the tutorial (final top-level Section only, optional)
A single <Section> component is used recursively — it replaces what earlier
versions of this skill called <Step> (top-level milestone) and <Procedure>
(sub-goal grouping). Nesting depth is computed at compile time and mapped to
h2 (depth 0) → h3 → h4 → … capped at h6.
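
The depth-to-heading mapping described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation; the function name `headingLevelForDepth` is hypothetical.

```typescript
// Hypothetical helper (not taken from the plugin source): maps a <Section>
// nesting depth to an HTML heading level.
// depth 0 -> h2, depth 1 -> h3, ... and anything deeper than h6 is capped at h6.
function headingLevelForDepth(depth: number): number {
  if (depth < 0 || Math.floor(depth) !== depth) {
    throw new RangeError(`depth must be a non-negative integer, got ${depth}`);
  }
  return Math.min(2 + depth, 6);
}

// Depths 0..5 map to heading levels 2, 3, 4, 5, 6, 6.
const levels: number[] = [0, 1, 2, 3, 4, 5].map(headingLevelForDepth);
```

Capping rather than throwing at depth 5 and beyond mirrors the "capped at h6" wording: deep nesting still renders, it just stops getting smaller headings.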

| Component | Content | Display | Placement |
| --- | --- | --- | --- |
| Verify | Expected result | Always visible, → prefix, 1 text line; optional img displayed above the text row (Spatial contiguity) | End of a Section that contains Actions |
| Concept | Term definition, background | Collapsible (`<details>`) | Before the Section that first uses the term |
| Reference | Key tables, panel lists | Collapsible (`<details>`) | Near the Section that needs it |
| Recovery | Error recovery steps | Always visible, short | After the Action that can fail |
| Next steps | What to do after completing this tutorial | Always visible, bullet list | End of the final top-level Section or after the last Checkpoint |

Image / text / video hierarchy

| Subject | Primary | Secondary | Rationale |
| --- | --- | --- | --- |
| UI operation (where to click) | Annotated screenshot | Label text only | Unknown UI requires visual anchor |
| Result (what happened) | Text (→ 1 line in `<Verify>`) | Screenshot via `<Verify img="...">` (optional) | State changes are faster to judge via text; screenshot confirms visual result state when text alone is ambiguous |
| Concept (why) | Text only | (none) | Abstraction does not benefit from images |
| Multi-step continuous flow | Video/GIF | Text (supplement) | Motion cannot be conveyed in stills |

Atomic unit: Action
The smallest learning unit is:
[Image: WHERE to interact] → [Text: WHAT to do] → [Text: WHAT happens (optional)]
Rules:
- The author MUST use one image per Action and MUST NOT batch images. (mechanised: tutorial/action-single-image)
- The image MUST be placed above or before the text (spatial contiguity).
- The author MUST NOT describe in text what the image already shows (redundancy elimination).
- The image and the text MUST cover non-overlapping channels: the image carries WHERE (position, visual anchor, step ordering via numbered callouts); the text carries the imperative WHAT plus any values the image cannot convey.
- The author MUST keep in the text: values the learner types by hand (e.g. UE90min, CLEAR!), user-specific paths the screenshot cannot generalise, and UI element names that are not labelled in the shot.
- The author MUST remove from the text: positional prefixes when the image already shows position (mechanised: tutorial/action-positional-prefix); label/value pairs already paired in the image's numbered callouts; micro-interaction details already conveyed by the image's arrows or step markers.
- Reducing Action text to a bare verb such as "クリックします" ("click") MUST NOT be used as a redundancy fix. The imperative WHAT plus the values the image cannot convey is the minimum; strip only what the image already carries.
- If the action produces a visible result, the author MUST state it inline. The author MUST NOT create a separate Verify for this, because Verify is reserved for Section-level confirmation only.
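Personalization
- Use second-person direct address to the reader. Do not describe the reader in third person ("受講者が〜する", "初学者が〜", "学習者は〜"; roughly "the participant does …", "the beginner …", "the learner …").
- Use active, conversational Japanese: 「〜しましょう」 ("let's …"), 「〜してください」 ("please …"), 「ここで〜を確認します」 ("here we check …").
- Do not open a page by describing what the document is or who it is for (an opener such as "この教材は〜のための資料です", "This material is a resource for …", is not acceptable). Open with the first learner-facing step or an inviting goal statement.
- Personalization is about friendliness, not familiarity. Avoid emoji spam, digressions, and emotionally overloaded decoration, because they violate the Coherence principle. As a guide, the upper limit of friendliness is "a senior colleague teaching you at your side".
- Goal strings already use future-declarative form and implicitly address the reader; do not revert them to third person ("受講者が〜する状態になります", "the participant will be in a state of …", violates both the Goal and the Personalization rules).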
Accessibility
- Every `<Action>` image MUST have an alt prop that describes what the image shows in the context of the step. The alt text MUST convey the WHERE information (which panel, which button, which area) so that a screen-reader user can follow the procedure without seeing the image. If the image is purely decorative (rare in tutorials), use alt="".
- Numbered callouts, arrows, and highlights in images MUST NOT rely on colour alone to convey meaning. Pair colour with shape (numbered circles, arrows with labels) so that readers with colour vision deficiencies can follow the sequence (WCAG SC 1.4.1).
- When an image conveys information not present anywhere in the surrounding text (e.g. a UI layout, a spatial relationship between panels), the author MUST provide a text equivalent nearby, in the Action text, a Concept, or a Reference, so that the meaning is recoverable without the image.
- Text annotations overlaid on screenshots MUST meet a minimum contrast ratio of 3:1 against the background they sit on (WCAG SC 1.4.11 for non-text UI components).
- The tutorial's heading hierarchy is produced by `<Section>` nesting depth (depth 0 → h2, depth 1 → h3, … capped at h6). The remark-section-headings plugin injects these semantic headings automatically so that screen-reader navigation by heading works correctly.
- Interactive examples or embedded widgets (if any) MUST be operable by keyboard alone in a logical tab order.
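
The 3:1 minimum for overlaid annotations can be checked numerically before a screenshot is published. The sketch below uses the relative-luminance and contrast-ratio formulas from the WCAG 2.1 definitions; the type and function names (`Rgb`, `contrastRatio`, `meetsAnnotationContrast`) are hypothetical and are not part of any tool named in this skill.

```typescript
// An sRGB colour as three 0-255 channel values. Hypothetical type for illustration.
type Rgb = readonly [number, number, number];

// Relative luminance as defined in WCAG 2.1: linearise each channel,
// then take the weighted sum.
function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio = (lighter + 0.05) / (darker + 0.05); ranges from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// The 3:1 default mirrors the minimum stated in the rule above.
function meetsAnnotationContrast(fg: Rgb, bg: Rgb, minimum = 3): boolean {
  return contrastRatio(fg, bg) >= minimum;
}

const redOnWhite = meetsAnnotationContrast([255, 0, 0], [255, 255, 255]);
const yellowOnWhite = meetsAnnotationContrast([255, 255, 0], [255, 255, 255]);
```

A pure yellow callout on a white panel falls well short of 3:1, which is a common reason to add a dark outline or a numbered circle behind annotation text.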
Goal text
- The goal string is rendered verbatim as a banner directly below the section heading; no prefix such as "ゴール:" ("Goal:") is added. It MUST read as a complete sentence on its own. Bare noun-phrase endings such as 「〜した状態」 ("a state in which … was done") are forbidden because they render as incomplete prose.
- The author MUST write goals in future-declarative form describing what the learner will achieve by the time the section is complete:
  - Acquired behavior / state → 「〜ようになります」 / 「〜の状態になります」 (e.g. 「触れたら消えるようになります」, "it will disappear when touched")
- The author MUST NOT write goals in past or completed form (「〜した」「〜された」「〜した状態」「〜している」「〜できます」) because those frame the section as a retrospective of what already happened instead of a preview of what the learner is about to build. (mechanised: tutorial/section-goal-required, tutorial/section-goal-tense)
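
The forbidden goal endings listed above lend themselves to a simple suffix check. The following is a hypothetical heuristic written for illustration; it is not the actual tutorial/section-goal-tense implementation, whose pattern list is not shown in this document.

```typescript
// Endings taken from the rule text above. The longer "した状態" is listed first
// only for readability; the "$" anchor decides the match.
const FORBIDDEN_GOAL_ENDING = /(した状態|された|した|している|できます)$/;

// Drops whitespace and trailing sentence punctuation so that "…できます。"
// is judged on "…できます".
function stripTrailingPunctuation(goal: string): string {
  return goal.trim().replace(/[。.!！]+$/, "");
}

// Returns the forbidden ending that was found, or null when the goal passes.
// Assumption: a plain suffix match is enough; real Japanese morphology is not analysed.
function findForbiddenGoalEnding(goal: string): string | null {
  const match = FORBIDDEN_GOAL_ENDING.exec(stripTrailingPunctuation(goal));
  return match ? match[0] : null;
}

const accepted = findForbiddenGoalEnding("触れたら消えるようになります");
const rejected = findForbiddenGoalEnding("キューブを配置した状態");
```

Because polite past forms such as 「〜しました」 also end in 「した」, this sketch flags them as well, which matches the intent of rejecting retrospective phrasing.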
Action text
- The author MUST use the imperative mood: 「〜をクリックします」 ("click …"), 「〜と入力します」 ("type …").
- The author MUST name a UI element in bold only when the image does not clearly label it with a visible caption or a numbered callout, or cannot disambiguate it from similar elements. Repeating an image-labelled element in text violates the Redundancy principle; Identity is image-primary when labelled in the shot (see the WHERE/WHAT channel rules in Atomic unit: Action).
- Panel and location names appear inline on first use only when the shot does not already make the panel unambiguous. A full-screen screenshot that shows the panel in context does not require redundant naming in text.
- Regardless of the above, values the learner must type (e.g. UE90min), user-specific paths, and gestures/motions (drag direction, hover vs click) MUST remain in text because a still image cannot convey them.
Concept text
- Concept serves the Pre-training principle: it teaches the name and the key features of a term the learner is about to encounter, so that main-task cognitive load is reduced.
- A Concept MUST be at most 5 sentences or 1 short table.
- A Concept MUST answer "what is it?" (name + key features) and "why does the learner need to know right now?".
- A Concept MUST be placed immediately before the first Section that uses the term. Placing Concepts far upfront violates Minimalism; omitting them until after first use violates Pre-training.
- Concepts for terms that appear much later MUST NOT be written now. Pre-training applies to the next sub-task, not to the entire page.
- If a Concept needs more than 5 sentences, the author MUST split it into multiple Concepts and place each before its own first-use Section.
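
A rough way to check the sentence limit while drafting is to count terminal punctuation. This is a hypothetical sketch, not the tutorial/concept-length rule itself; the limit is a parameter because the authoring rule above says 5 while the lint rule listed later uses an advisory threshold of 10.

```typescript
// Counts sentences by splitting on Japanese and Latin sentence-ending punctuation.
// Limitation (assumption of this sketch): a period inside a version number such
// as "5.4" is also treated as a sentence break, so counts can be slightly high.
function countSentences(text: string): number {
  let count = 0;
  for (const part of text.split(/[。.!?！？]+/)) {
    if (part.trim().length > 0) count++;
  }
  return count;
}

// True when the Concept body stays within the given sentence limit.
function conceptIsWithinLimit(body: string, maxSentences = 5): boolean {
  return countSentences(body) <= maxSentences;
}

const shortConcept = "Blueprint は視覚的なスクリプトです。このあとの手順で使います。";
const shortConceptOk = conceptIsWithinLimit(shortConcept);
```

When the check fails, the rule above asks for a split into several Concepts rather than a longer one.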
Verify text
- In component-based tutorials the Verify component renders its own leading →; the author MUST NOT include → in the source, or the rendered output will have a doubled arrow. (mechanised: tutorial/verify-no-duplicate-arrow)
- In plain-Markdown tutorials (no component), the Verify line MUST start with →.
- A Verify line MUST describe observable state, not internal mechanics:
  - ✅ 「キューブが消えれば成功です」 ("If the cube disappears, you have succeeded")
  - ❌ 「Destroy Actor が実行されました」 ("Destroy Actor was executed")
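
The two arrow rules above (no arrow in component source, a leading arrow in plain Markdown) can be sketched as a pair of helpers. Both function names are hypothetical and the code is an illustration, not the tutorial/verify-no-duplicate-arrow implementation.

```typescript
const ARROW = "→";

// Component-based tutorials: the Verify component adds the arrow itself,
// so a source body that already starts with one would render it twice.
function hasDuplicateArrow(verifySource: string): boolean {
  return verifySource.trim().indexOf(ARROW) === 0;
}

// Plain-Markdown tutorials: the author writes the arrow. This helper adds it
// only when missing, so applying it twice does not double the arrow.
function toPlainMarkdownVerify(text: string): string {
  const body = text.trim();
  return body.indexOf(ARROW) === 0 ? body : ARROW + " " + body;
}

const plainLine = toPlainMarkdownVerify("キューブが消えれば成功です");
```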
Prerequisites text
- Prerequisites MUST appear at the page top, before the first top-level Section.
- Each prerequisite MUST be actionable or verifiable: state the required software version, completed prior tutorial, or assumed knowledge concretely.
  - ✅ 「Unreal Engine 5.4 以上がインストール済みであること」 ("Unreal Engine 5.4 or later is installed")
  - ✅ 「Step 1〜3(前回のチュートリアル)を完了していること」 ("Steps 1 to 3 of the previous tutorial are complete")
  - ❌ 「基本的な知識があること」 ("You have basic knowledge"; what knowledge?)
- If no prerequisites exist, omit the section entirely (do not write "特になし", "none in particular").
Recovery text
- Recovery serves Minimalism P3 (error recognition and recovery support). It covers the full error lifecycle: prevention, detection, and correction.
- A Recovery block MUST be placed after the Action that can plausibly fail.
- A Recovery block MUST follow the structure symptom → cause → fix (in that order). The symptom comes first because the learner sees the symptom, not the cause.
- For actions with a high failure probability, the author SHOULD place a preventive note (1 sentence) immediately before the Action, warning about the common mistake. This note is distinct from Recovery (which is reactive).
- Recovery MUST NOT be placed at the end of a Section as a catch-all. Each Recovery block addresses a specific failure point.
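
The symptom → cause → fix order can be made hard to get wrong by modelling a Recovery block as data and rendering it in one place. The interface, the function names, and the sample failure below are all hypothetical, written only to illustrate the ordering rule.

```typescript
// Hypothetical shape for one Recovery block; one block per specific failure point.
interface Recovery {
  symptom: string; // what the learner actually sees
  cause: string;   // why it happened
  fix: string;     // what to do about it
}

// Renders the three parts in the required order: symptom, then cause, then fix.
function renderRecovery(recovery: Recovery): string {
  return [
    "Symptom: " + recovery.symptom,
    "Cause: " + recovery.cause,
    "Fix: " + recovery.fix,
  ].join("\n");
}

// A block that only says "try again" has no diagnosis; require all three parts.
function isDiagnostic(recovery: Recovery): boolean {
  return [recovery.symptom, recovery.cause, recovery.fix].every(
    (part) => part.trim().length > 0,
  );
}

// Invented sample content, for illustration only.
const sample: Recovery = {
  symptom: "The cube stays on screen after you touch it.",
  cause: "The event node is not connected to the destroy node.",
  fix: "Drag a wire from the event's output pin to the destroy node.",
};
const rendered = renderRecovery(sample);
```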
Next steps text
- Next steps MUST appear only at the end of the final top-level Section or after the last Checkpoint.
- Each item MUST link to a concrete next action: another tutorial, a documentation page, or an exercise.
- The author MUST NOT use vague pointers ("詳しくは公式ドキュメントを参照してください", "see the official documentation for details", without a link).
Checkpoint
- A Checkpoint MUST be a bullet list of observable behaviors.
- A Checkpoint MUST NOT include internal state or jargon.
- There MUST be exactly one Checkpoint per top-level Section, placed as the last element of that Section. (mechanised: tutorial/checkpoint-placement)
Anti-patterns (do NOT do)
Judgement-based anti-patterns; a tool cannot reliably detect
these, so the author is responsible for catching them.

| Anti-pattern | Violated principle | Fix |
| --- | --- | --- |
| Text restating what image shows | Redundancy | Remove the text or remove the image |
| Button/selection label repeated in bold text while the image already labels it with a numbered callout | Redundancy | Keep text to "① を選びます" ("select ①") etc.; Identity is carried by the image |
| Mechanical splitting of a single-screen unified task into many Actions (one per item in the same dialog) | Segmenting (misapplied) | Keep 1 screen = 1 Action when the sub-goal is unified; split only on screen/state transitions |
| Decorative images, fun sidebars, background music | Coherence | Remove entirely; they impair learning |
| Same content in narration AND on-screen text | Redundancy / Modality | Use narration OR on-screen text, not both |
| `:::note` for concepts | Segmenting | Not collapsible; use Concept component |
| Verify after every action | Segmenting | Verify at Section end only |
| Front-loading reference tables | Minimalism | Use Reference, near first use |
| Term introduced before it's needed | Minimalism | Concept before first-use Section |
| Reducing Action text to a bare "クリックします" ("click") to avoid redundancy | Redundancy (over-correction) | Keep the imperative WHAT plus the values the image cannot convey |
| Settings table duplicating the image's numbered callouts row-for-row | Redundancy | Keep in text only the values the image cannot convey (typed input, user-specific paths, dropdown values absent from the shot) |
| Micro-interaction detail ("空白で離す", "release over empty space"; "カーソルを乗せ", "hover the cursor") redundantly described when the image's arrows already convey it | Redundancy | Remove, but only after confirming the image truly conveys the gesture; motion attributes ("drop in empty space", "hover vs click") often need text because a still image cannot encode them |
| Opening a page by describing what the document is or who it is for ("この教材は〜のための資料です", "受講者が〜する授業") | Personalization | Rewrite in second-person direct address; open with the first learner-facing action or an inviting goal |
| Describing the reader in third person ("学習者は〜", "初学者向け", "受講者が〜") anywhere in the tutorial body | Personalization | Use second-person active voice ("〜しましょう", "確認してください") |
| Front-loading a long concept chapter before the first Action (Pre-training misapplied) | Pre-training × Minimalism | Move each term's Concept to immediately before its first-use Section; keep each Concept to name + key features only |
| Bold/highlight used for emotional emphasis or decoration, not tied to a learning-objective cue | Signaling × Coherence | Reserve bold/highlight for the element the learner must find or type; remove decorative emphasis |
| Multiple bold spans crammed in one sentence | Signaling (dilution) | Bold only the single element that most matters; demote the rest to plain text |
| Verify line that describes internal mechanics instead of observable state ("Destroy Actor が実行されました") | Generative activity | Rewrite as an observable outcome the learner can check ("キューブが消えれば成功") |
| Exercises tacked on for practice volume rather than learning objective | Coherence / Generative activity (misapplied) | Tie every Exercise to the Section's stated goal; drop unrelated drills |
| Applying beginner-weight Signaling/Concept density to an expert-facing reference | Expertise reversal | Scale back: use compact Reference tables, drop hand-holding narrative |
| Using `<Action img>` to show a result-state screenshot while the text contains result-check language ("〜になれば成功", "〜ていることを確認") | Feedback (Verify workaround) | Replace with `<Verify img="...">`: the image carries the observable result state, the text carries the 1-line confirmation (mechanised: tutorial/verify-visual-workaround-as-action) |
| Organising tutorial sections by software feature/menu rather than by learner's task goal | Minimalism P2 (task anchoring) | Reorganise by what the learner wants to achieve, not by where the feature lives in the UI |
| No Recovery block after an action that commonly fails | Minimalism P3 (error support) | Add Recovery with symptom → cause → fix structure |
| Recovery that says "やり直してください" ("please try again") without diagnosing the cause | Minimalism P3 (error support) | Rewrite with concrete symptom, cause, and fix |
| All Sections require reading every prior Section to make sense; no standalone entry point | Minimalism P4 (flexible use) | Make each Section's goal self-explanatory; use collapsible Concepts/References so readers who already know the material can skip |
| Tutorial starts without stating required environment, software version, or prior knowledge | Prerequisites (ISO 26514) | Add a Prerequisites section at the page top listing concrete, verifiable requirements |
| Images with no alt text, or alt text that says "screenshot" / "image" | Accessibility (WCAG SC 1.1.1) | Write alt that describes WHERE information: which panel, button, or area is shown |
| Numbered callouts or highlights that use colour alone (no shape or label) to convey sequence | Accessibility (WCAG SC 1.4.1) | Pair colour with numbered circles, arrows with text labels, or other shape cues |
| Jumping straight to independent exercises without first showing a complete worked example | Scaffolding / Worked example | Start with Phase 1 (full example), then Phase 2 (guided variation), then Phase 3 (independent) |
| Introducing a new concept without connecting it to anything the learner already knows | Activation (Merrill) | Add an analogy or reference to a familiar concept in the Concept block |
| Screenshot + explanation table placed far apart, requiring the reader to scroll between them | Split-attention | Place the explanation immediately adjacent to (or overlaid on) the screenshot |

Mechanised checks (enforced at MDX build/dev time)
The following conventions are enforced by the `remarkTutorialLint` plugin in `@metyatech/course-docs-platform`. Violations surface in `npm run dev` and `npm run build` output, so authors do not need to rely on memory.
Severity policy (evidence-tiered)
Severity is tied to how strongly the rule is anchored in the underlying research, so the tool does not over-reach.

| Severity | Semantics | Build effect |
| --- | --- | --- |
| error | Structural break that makes the MDX incoherent or loses required authoring metadata | Fails the MDX compile |
| warn | Principle violation with solid empirical support, or a render/technical bug | Emitted via `console.warn` + `file.message()`. Fails under `TUTORIAL_LINT_STRICT=1` |
| note | Advisory derived from a principle whose specific numeric threshold or lexical pattern is a professional guess rather than a direct research finding | Emitted via `console.info` only. Never promoted to an error, even under strict. In collect-all mode, notes appear in the summary but do not by themselves fail the build |

`TUTORIAL_LINT_COLLECT=1` aggregates every finding in a file into a single failure message, so a PR author can fix all violations in one pass instead of one at a time. If the aggregated collection contains only notes, the summary is printed via `console.info` and the build still passes.
Rule → severity mapping

| Rule ID | Severity | Intent |
| --- | --- | --- |
| tutorial/page-authoring-mode-invalid | error | Frontmatter `authoringMode` must be `tutorial` or `non-tutorial` |
| tutorial/page-mode-tutorial-requires-section | error | A page declared `authoringMode: tutorial` must contain at least one `<Section>` |
| tutorial/page-mode-non-tutorial-has-section | error | A page without `authoringMode: tutorial` must not use `<Section>` |
| tutorial/section-goal-required | error | Every `<Section>` declares its goal |
| tutorial/action-single-image | error | One image per Action |
| tutorial/checkpoint-placement | warn | Exactly one `<Checkpoint>` per top-level Section, placed last |
| tutorial/section-no-hrule | warn | No `---` inside a Section |
| tutorial/verify-no-duplicate-arrow | warn | `<Verify>` body starts with →, which the component already renders |
| tutorial/action-positional-prefix | warn | img-bearing `<Action>` body starts with a positional prefix |
| tutorial/section-lacks-feedback | warn | A Section with Actions has no `<Verify>` / `<Recovery>` / `<Checkpoint>` |
| (ID missing) | note | 6+ bold spans in one Action (dilution threshold is advisory) |
| tutorial/third-person-reader | note | Third-person reader references match a heuristic pattern list |
| tutorial/page-opens-with-doc-description | note | First paragraph starts with "この教材は〜" etc. (opener-only scope is heuristic) |
| tutorial/verify-internal-mechanics | note | Verify text matches the engine-state pattern list |
| tutorial/concept-length | note | Concept body exceeds 10 sentences or 1 table (numeric threshold is advisory) |
| tutorial/concept-placement | note | Concept has no following Action / Section / Exercise |
| tutorial/decorative-emoji | note | Non-allowlisted emoji outside signalling surfaces (allowlist is a cultural convention) |
| tutorial/verify-visual-workaround-as-action | note | `<Action img>` whose text matches result-check patterns ("〜になれば成功", "〜ていることを確認" etc.); likely a Verify disguised as an Action |
| tutorial/prerequisites-placement | warn | `<Prerequisites>` appears after the first `<Section>` |
| tutorial/nextsteps-placement | note | `<NextSteps>` appears before the last `<Section>` (advisory) |

The note tier exists because these rules are correct in principle but their specific numeric boundary or lexical trigger has no direct empirical backing. They are the authoring equivalent of professional code review hints, not hard gates.
Page authoring mode (frontmatter)
Tutorial pages MUST declare their authoring mode in frontmatter so the lint plugin can apply tutorial-specific rules only to tutorial pages. The `authoringMode` key accepts `tutorial` or `non-tutorial`.
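
How the three severities in this section translate into a pass/fail decision can be sketched as below. This is a hypothetical model, not the `remarkTutorialLint` source: the `Finding` and `LintOptions` types and the `buildFails` function are invented for illustration, and the strict flag is passed as a parameter rather than read from `TUTORIAL_LINT_STRICT`. Collect mode is left out because the text does not pin down whether a warn-only collection fails without strict.

```typescript
type Severity = "error" | "warn" | "note";

// Hypothetical shape of one lint finding.
interface Finding {
  ruleId: string;
  severity: Severity;
}

// Hypothetical options object; `strict` stands in for TUTORIAL_LINT_STRICT=1.
interface LintOptions {
  strict: boolean;
}

// error: always fails the build.
// warn:  fails only under strict.
// note:  never fails, even under strict.
function buildFails(findings: readonly Finding[], options: LintOptions): boolean {
  for (const finding of findings) {
    if (finding.severity === "error") return true;
    if (finding.severity === "warn" && options.strict) return true;
  }
  return false;
}

const notesOnly: Finding[] = [
  { ruleId: "tutorial/concept-length", severity: "note" },
];
const withWarn: Finding[] = [
  { ruleId: "tutorial/checkpoint-placement", severity: "warn" },
  { ruleId: "tutorial/concept-length", severity: "note" },
];
const withError: Finding[] = [
  { ruleId: "tutorial/action-single-image", severity: "error" },
];
```

Keeping the note tier out of the failure path, even under strict, is what lets advisory thresholds stay visible without blocking a build.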