Optimize Extended Min code size and branch speed by maximizing valid local fast branches/jumps while preserving compile correctness in both default and USE_ACCELERATOR builds.
Use this skill when modifying extended-min.min64x4 and you need to recover size/speed after layout drift.
Find the best layout and opcode form mix that:
- compiles correctly in both the default build and with -D USE_ACCELERATOR
- avoids .align unless an explicit alignment-budget phase is requested

This skill depends only on the repo-local compile helper:
skills/compile-min-64x4/scripts/compile_min64x4.sh

That helper fetches the Minimal 64x4 BespokeASM config from GitHub and requires:
- bespokeasm
- curl or wget

Fast -> Long fallback used by optimizer:
- FPA -> JPA
- FEQ -> BEQ
- FNE -> BNE
- FCC -> BCC
- FCS -> BCS
- FGT -> BGT
- FLE -> BLE
- FPL -> BPL
- FMI -> BMI

All scripts live in scripts/:
- optimize_dual.sh
- collect_metrics.sh
  - fast/long/align counts and score tuple
  - g_stop from both pretty listings
- run_candidate.sh
- show_align_padding.sh
  - .align padding cost, measured from the listing

rg --pcre2 -n -U "LDZ ([A-Za-z0-9_]+)\\+0 STZ ([A-Za-z0-9_]+)\\+0\\n\\s*LDZ \\1\\+1 STZ \\2\\+1" \
extended-min.min64x4
rg --pcre2 -n -U "MZZ ([A-Za-z0-9_]+)\\+0,([A-Za-z0-9_]+)\\+0\\n\\s*MZZ \\1\\+1,\\2\\+1" \
extended-min.min64x4
rg --pcre2 -n -U "LDZ ([A-Za-z0-9_]+)\\+0 ADV ([A-Za-z0-9_]+)\\+0\\n\\s*LDZ \\1\\+1 AD\\.Z \\2\\+1" \
extended-min.min64x4
rg --pcre2 -n -U "LDZ ([A-Za-z0-9_]+)\\+0 SUV ([A-Za-z0-9_]+)\\+0\\n\\s*LDZ \\1\\+1 SU\\.Z \\2\\+1" \
extended-min.min64x4
rg --pcre2 -n -U "CLZ ([A-Za-z0-9_]+)\\+0\\n\\s*CLZ \\1\\+1" \
extended-min.min64x4
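
The Fast -> Long fallback pairs listed earlier can be captured as a small lookup, which may help when scripting a manual fallback pass. This is a minimal sketch: the function name long_form is hypothetical, and it is not taken from optimize_dual.sh. Only the nine opcode pairs come from the table above.

```shell
#!/usr/bin/env bash
# long_form FAST_OPCODE
# Prints the long fallback for a fast branch/jump mnemonic.
# Hypothetical helper; the pairs mirror the Fast -> Long table above.
long_form() {
  case "$1" in
    FPA) echo JPA ;;
    FEQ) echo BEQ ;;
    FNE) echo BNE ;;
    FCC) echo BCC ;;
    FCS) echo BCS ;;
    FGT) echo BGT ;;
    FLE) echo BLE ;;
    FPL) echo BPL ;;
    FMI) echo BMI ;;
    *)   return 1 ;;  # not a fast opcode: no fallback defined
  esac
}
```

For example, `long_form FEQ` prints BEQ, and any mnemonic outside the table returns a non-zero status without printing anything.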
skills/optimize-size/scripts/optimize_dual.sh <candidate.min64x4> <tag>
skills/optimize-size/scripts/collect_metrics.sh \
<candidate.min64x4> \
/tmp/optimize-size.<tag>.noacc.pretty \
/tmp/optimize-size.<tag>.acc.pretty
collect_metrics.sh.
extended-min.min64x4.

Lexicographic order, lower is better:
1. max(g_stop_noacc, g_stop_acc)
2. g_stop_noacc + g_stop_acc
3. long_count
4. -fast_count

This prioritizes smallest binary size across both build modes, then fastest branch mix.
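
The ordering above can be sketched as follows. This is an illustration only, not the logic of collect_metrics.sh: the function names score_tuple and better_of are hypothetical, and the sketch assumes the g_stop values and counts are already available as plain integers.

```shell
#!/usr/bin/env bash
# score_tuple G_STOP_NOACC G_STOP_ACC LONG_COUNT FAST_COUNT
# Prints the four score fields, space-separated, in priority order.
# Hypothetical helper; assumes all inputs are plain integers.
score_tuple() {
  local noacc=$1 acc=$2 long=$3 fast=$4
  local max=$(( noacc > acc ? noacc : acc ))
  echo "$max $(( noacc + acc )) $long $(( -fast ))"
}

# better_of "TUPLE_A" "TUPLE_B"
# Prints whichever tuple is lexicographically smaller (lower is better),
# comparing field 1 first, then fields 2, 3 and 4 numerically.
better_of() {
  printf '%s\n%s\n' "$1" "$2" |
    sort -k1,1n -k2,2n -k3,3n -k4,4n |
    head -n 1
}
```

With these definitions, `score_tuple 1000 1010 12 40` prints `1010 2010 12 -40`. Negating fast_count lets a single "lower is better" comparison reward more fast branches once the size fields and long_count tie.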
USE_ACCELERATOR) at the tail.

Only if requested:
- Insert .align only between functions
- Alignment budget: <= 100 bytes
- After each .align insertion, rerun optimize_dual.sh and keep only net wins
- Use show_align_padding.sh to measure actual padding cost from the listing

/tmp/optimize-size.<tag>.noacc.pretty
/tmp/optimize-size.<tag>.acc.pretty

For full method details and decision rules, read references/workflow.md.