Optimize Roblox Luau files for maximum performance, type safety, and native codegen quality using luau-compile, luau-analyze, luau-ast, and the luau CLI. Use when the user says /optimize-luau, asks to optimize a Luau file, improve Luau performance, reduce allocations, improve type coverage, analyze bytecode, or apply native codegen.
All scripts are Roblox Luau. Roblox APIs are always available. The vector type is Vector3 (Roblox userdata). We are NOT using the new type solver. Studio testing uses --!optimize 1 by default; live experiences use --!optimize 2. All luau-compile invocations must include --vector-lib=Vector3 --vector-ctor=new --vector-type=Vector3. Prefer .luau extension over .lua everywhere.
Workspace hygiene: CLI tools generate residual files (stats.json, profile.out, coverage.out, trace.json). At the start of execution, create a temp working directory and use it for all tool output:
# Windows
$LUAU_TMP = New-Item -ItemType Directory -Path "$env:TEMP\luau-optimize-$(Get-Random)"
# Linux/Mac
LUAU_TMP=$(mktemp -d)
Route all output there: --stats-file=$LUAU_TMP\stats.json, write benchmark files to $LUAU_TMP\bench.luau, run profiling/coverage tools with $LUAU_TMP as working directory. The OS handles cleanup -- no manual deletion needed.
CLI setup: After creating the temp directory, check if luau-compile is on PATH. If missing, download the latest release from https://github.com/luau-lang/luau and extract to $LUAU_TMP, then prepend to PATH for the session:
# Windows (PowerShell)
if (-not (Get-Command luau-compile -ErrorAction SilentlyContinue)) {
$tag = (Invoke-RestMethod "https://api.github.com/repos/luau-lang/luau/releases/latest").tag_name
Invoke-WebRequest -Uri "https://github.com/luau-lang/luau/releases/download/$tag/luau-windows.zip" -OutFile "$LUAU_TMP\luau.zip"
Expand-Archive -Path "$LUAU_TMP\luau.zip" -DestinationPath $LUAU_TMP
$env:PATH = "$LUAU_TMP;$env:PATH"
}
# macOS (bash/zsh)
if ! command -v luau-compile &> /dev/null; then
tag=$(curl -s "https://api.github.com/repos/luau-lang/luau/releases/latest" | grep -o '"tag_name": "[^"]*"' | cut -d'"' -f4)
curl -L "https://github.com/luau-lang/luau/releases/download/$tag/luau-macos.zip" -o "$LUAU_TMP/luau.zip"
unzip "$LUAU_TMP/luau.zip" -d "$LUAU_TMP" && chmod +x "$LUAU_TMP"/luau*
export PATH="$LUAU_TMP:$PATH"
fi
# Linux (bash)
if ! command -v luau-compile &> /dev/null; then
tag=$(curl -s "https://api.github.com/repos/luau-lang/luau/releases/latest" | grep -o '"tag_name": "[^"]*"' | cut -d'"' -f4)
curl -L "https://github.com/luau-lang/luau/releases/download/$tag/luau-ubuntu.zip" -o "$LUAU_TMP/luau.zip"
unzip "$LUAU_TMP/luau.zip" -d "$LUAU_TMP" && chmod +x "$LUAU_TMP"/luau*
export PATH="$LUAU_TMP:$PATH"
fi
If the target is a directory containing multiple .luau/.lua files, enter Phase 6 (multi-file mode). For a single file, determine its runtime context. Check in this order:
- *.server.luau / *.server.lua -> server (native codegen budget is server-side; benefit depends on code content)
- *.client.luau / *.client.lua -> client (native codegen less effective due to device architecture diversity)
- *.legacy.luau / *.legacy.lua -> ambiguous (could be cloned/moved at runtime). Ask the user.
- *.luau / *.lua (no suffix) -> ModuleScript, fall through to path check.
- ServerScriptService/ServerStorage in path -> server-side module
- StarterPlayer/StarterGui/StarterCharacterScripts/ReplicatedFirst in path -> client-side module
- ReplicatedStorage/shared in path -> shared module

Use the AskQuestion tool with these three questions:
Question 1: "Compilation optimization intensity"
- minimal -- Headers (--!strict/--!optimize 2), type annotations on all function signatures, deprecated pattern replacements, native codegen analysis
- moderate -- + function restructuring for inlining, import hoisting, compound operators, fastcall enablement, allocation reduction
- insane -- + every micro-optimization from the pattern catalog, full bytecode analysis, register pressure optimization, closure caching analysis

Question 2: "Algorithm optimization"
- none -- Don't touch logic or data flow
- low hanging fruit -- Obvious O(n^2)->dict, redundant iterations, missing caches
- moderate -- + data structure changes, caching strategies, event-driven refactors, loop fusion
- insane -- + full algorithmic redesign where beneficial, dynamic programming, architectural restructuring

Question 3: "Benchmarking"
- yes -- Benchmark all non-trivial changes with luau CLI (when function is standalone)
- no -- Trust bytecode/codegen metrics only
- +EV only -- Benchmark only when algorithmic changes or ambiguous bytecode results warrant it

| Setting | minimal | moderate | insane |
|---|---|---|---|
| Headers | Always | Always | Always |
| Type annotations | Function signatures | + hot path locals | + all locals |
| Pattern replacements | Deprecated only | Full Priority 2 | + Priority 4 |
| Function restructuring | No | Yes (Priority 3) | Aggressive splitting |
| Bytecode verification | After all changes | After each priority | After each change |
| Algorithmic changes | Per algo quiz | Per algo quiz | Per algo quiz |
| Native codegen strategy | Per-function analysis; @native on beneficial functions; --!native only if most are native-worthy | Selective @native on hot computational functions | Full per-function cost/benefit with budget tracking |
Restructuring scope scales with compilation intensity:
This phase comes FIRST. Algorithmic improvements produce order-of-magnitude speedups that dwarf any bytecode optimization. This is also the largest code change, so it must happen before any bytecode-level work.
Read the entire source file thoroughly before touching any CLI tools.
Monolithic functions hide their algorithmic structure behind interleaved concerns. Before analyzing complexity, understand what the code is actually doing:
When the quiz allows restructuring (moderate/insane):
- Split monolithic functions into small, focused helpers; the compiler re-inlines them at -O2 -- clearer code AND better bytecode.

Even at "minimal" intensity, still perform this mental decomposition to inform the analysis below.
With the structure understood (or cleaned up), analyze:
Complexity analysis -- Identify time complexity of every significant function:
Data structure fitness -- Is the right data structure being used?
- Sets as dictionaries (keys mapped to true values)
- String building: table.concat pattern or buffer

Caching and memoization:
Redundant work elimination:
Architectural patterns:
- Replace polling loops built on task.wait() with event-driven patterns

Hot path identification:
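To make the complexity and data-structure findings above concrete, here is a sketch of the classic O(n^2) -> O(n) rewrite: membership testing via table.find inside a loop replaced by a dictionary set. Function and variable names are illustrative, not taken from any particular file.

```luau
-- Before: table.find is O(n) per element, so the whole pass is O(n^2).
local function dedupSlow(items: {string}): {string}
	local out: {string} = {}
	for _, item in items do
		if not table.find(out, item) then
			table.insert(out, item)
		end
	end
	return out
end

-- After: a dictionary of true values gives O(1) membership checks, O(n) overall.
local function dedupFast(items: {string}): {string}
	local seen: {[string]: boolean} = {}
	local out: {string} = {}
	for _, item in items do
		if not seen[item] then
			seen[item] = true
			table.insert(out, item)
		end
	end
	return out
end
```

The same shape applies to any "search inside a loop" finding: move the lookup into a pre-built dictionary keyed by the search value.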
Before applying any changes, present a summary of all findings organized by impact. For each finding:
Then quiz the user using the AskQuestion tool:
- Per finding: apply / skip / modify
- For the whole batch: apply all / cherry-pick / skip all
If the algorithm quiz from Phase 0 was "none", present findings as informational only (no apply options). The user still sees what was found.
If the user selects "modify", wait for their input before proceeding with that specific change.
Apply only user-approved changes, in this order (dependencies flow downward):
Benchmark algorithmic changes if the benchmarking quiz warrants it (see Phase 4.5).
Run all diagnostic tools on the code (post-algorithmic changes if any were applied in Phase 1). These metrics are the baseline for bytecode optimization phases.
Detect OS: use --target=x64_ms on Windows, --target=x64 on Linux/Mac. Route stats output to temp directory.
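A minimal sketch of that OS check for POSIX shells (on Windows, PowerShell can set the flag directly); the TARGET_FLAG variable name is illustrative:

```shell
# Pick the luau-compile --target flag by OS.
# Linux/macOS use the System V x64 ABI target; anything else falls back to the
# Microsoft x64 ABI target (covers MSYS/Cygwin shells on Windows).
case "$(uname -s)" in
  Linux|Darwin) TARGET_FLAG="--target=x64" ;;
  *)            TARGET_FLAG="--target=x64_ms" ;;
esac
echo "$TARGET_FLAG"
```

Append `$TARGET_FLAG` to the `--codegen` invocation below instead of hard-coding a target.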
luau-analyze --mode=strict <file>
luau-analyze --annotate <file>
luau-compile --remarks -O2 --vector-lib=Vector3 --vector-ctor=new --vector-type=Vector3 <file>
luau-compile --text -O2 --vector-lib=Vector3 --vector-ctor=new --vector-type=Vector3 <file>
luau-compile --codegen --target=x64_ms --record-stats=function --stats-file=$LUAU_TMP\stats.json -O2 --vector-lib=Vector3 --vector-ctor=new --vector-type=Vector3 <file>
From the tool output, collect:
- The bytecode listing (--text)
- Compiler remarks (--remarks)
- Inlining outcomes (--remarks: "inlining succeeded"/"inlining failed")
- any types in --annotate output
- Native codegen stats (--record-stats JSON lowerStats)
- Type errors and lints (--mode=strict)

Using both the source (already read in Phase 1) and tool output from Phase 2, identify bytecode-level optimization opportunities:
- any inferences hurting native codegen (especially Vector3, CFrame, buffer params). The JIT uses annotations directly -- no runtime type analysis. Unannotated params are assumed to be tables.
- Builtin imports like math.max resolved at load time via GETIMPORT. Broken by getfenv/setfenv/loadstring (marks the environment "impure").
- obj:Method() uses the fast method call instruction. Avoid obj.Method(obj). __index should point at a table directly (not a function or a deep chain).
- `..` concatenation in loops -> table.concat or string interpolation.
- getfenv/setfenv (even read-only!) and loadstring disable ALL import optimization and fastcalls.
- Writes to builtin globals (math = ...) disable fastcall. The BuiltinGlobalWrite lint catches this.
- __eq is always called on ==/~= even for rawequal values.

Apply changes in priority order. Re-run luau-compile --remarks -O2 --vector-lib=Vector3 --vector-ctor=new --vector-type=Vector3 <file> after structural changes to verify the compiler benefits.
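As a sketch of two of the opportunities above (hypothetical function names), loop concatenation and method-call style look like this before and after:

```luau
-- Before: `..` in a loop allocates a new intermediate string per iteration.
local function joinSlow(parts: {string}): string
	local s = ""
	for _, p in parts do
		s ..= p
	end
	return s
end

-- After: table.concat builds the result in a single pass.
local function joinFast(parts: {string}): string
	return table.concat(parts)
end

-- Method calls: obj:Method() compiles to the fast method call instruction;
-- obj.Method(obj) goes through a plain index + call instead.
```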
Add headers:
- --!strict / --!optimize 2. (--!optimize 2 is default in live but not Studio testing.)
- Not --!native yet. Native codegen is analyzed separately (see "Native codegen strategy" below).

Remove deoptimizers:
- getfenv/setfenv -- disables builtins, imports, and optimizations globally.
- table.getn -> #t, table.foreach/table.foreachi -> for..in.

Type accuracy -- the first big battle:
Getting the file to pass --!strict cleanly is the single most impactful compilation change. Every any type is a function the native codegen cannot specialize. But the goal is accurate types, not silencing errors with casts:
- Run luau-analyze --annotate and identify every any inference. These are the targets.
- Annotate Vector3, CFrame, buffer params explicitly -- native codegen generates specialized vector code. Unannotated params are assumed to be generic tables with extra type checks.
- Avoid :: any casts. A cast to any is worse than no annotation -- it actively tells the compiler to give up. If a type error is hard to fix, use the narrowest cast possible (:: SpecificType), or restructure the code so the type flows naturally.
- Minimize any in dictionaries. {[string]: any} is common but hurts codegen. Ask: what values actually go in there? If it's a known set of types, use a union: {[string]: string | number | boolean}. If the dictionary has known keys, use a typed table: {name: string, score: number}. Only fall back to any when the value type is genuinely unbounded (e.g., serialized data from an external source).
- Use narrowing (type(), typeof(), :IsA(), assert()) to provide type info from runtime checks instead of casts.
- Replace typeof(setmetatable(...)) with explicit self: ClassName on methods (old typechecker compatible).
- Prefer generics (<T>) over any. If the function truly accepts anything, use unknown and narrow explicitly.
- Track remaining any in --annotate output. The goal is zero (or as close as practical).

Native codegen strategy -- --!native vs @native vs neither:
Native codegen has two limits: a 1M instruction cap per module and a per-experience memory ceiling shared across all native scripts. Applying --!native indiscriminately wastes both budgets on functions that don't benefit, crowding out functions that would.
Classify every function in the file by its dominant work:
- Computational (Vector3/CFrame arithmetic, buffer ops, bit32 ops, numerical algorithms, physics calculations). These get genuine speedups from JIT compilation.
- API/allocation-bound (Instance:Clone(), :FindFirstChild(), :GetChildren(), event connections), table construction without computation (allocation-bound), heavy Luau library calls (string.format, pattern matching -- C implementations native codegen can't speed up), one-shot initialization, coroutine/task scheduling. The dominant cost is the API/library call or GC, not Luau instruction dispatch.

Decision matrix:
- Mostly computational functions -> --!native on the whole file.
- Mixed or mostly API-bound -> @native on individual functions that benefit. This is the common case for real game scripts.

@native placement rules:
- Place @native directly above the function declaration.
- Nested functions don't inherit @native -- annotate them individually if needed.
- Module top-level code runs once at require() time; @native has no meaningful benefit there.

Budget awareness: Check native codegen stats from --record-stats. If the module's native instruction count is high, be more selective with @native to leave budget for other scripts in the experience.
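A minimal sketch of selective @native placement on a computational function (hypothetical name; the rest of the module stays interpreted):

```luau
-- Computational hot function: Vector3 arithmetic in a tight loop,
-- a good candidate for per-function native compilation.
@native
local function integrate(positions: {Vector3}, velocities: {Vector3}, dt: number)
	for i, p in positions do
		positions[i] = p + velocities[i] * dt
	end
end
```

An API-bound neighbor (e.g. one that mostly calls :FindFirstChild) would be left unannotated so it doesn't consume native code budget.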
- Import hoisting: local floor = math.floor enables GETIMPORT fastcall. Fastcall builtins: assert, type, typeof, rawget/rawset/rawequal, getmetatable/setmetatable, tonumber/tostring, most math.* (not noise, random/randomseed), bit32.*, some string.*/table.*. Partial specializations: assert (unused return + truthy), bit32.extract (constant field/width), select(n, ...) O(1).
- pairs(t)/ipairs(t) -> for k, v in t do. Generalized iteration skips the pairs() call. for i=1,#t is slightly slower.
- math.floor(a / b) -> a // b (dedicated VM opcode, //= compound form).
- Compound operators (+=, -=, *=, //=, ..=) -- LHS evaluated once.
- String building: table.concat or backtick interpolation (lowers to optimized string.format).
- table.create(n) for known-size arrays. Sequential fill: local t = table.create(N); for i=1,N do t[i] = ... end.
- table.insert(t, v) for unknown-size append -- #t is O(1) cached, worst case O(log N).
- a + (b - a) * t -> math.lerp(a, b, t) (exact at endpoints, monotonic).
- Byte order swap -> bit32.byteswap(n) (CPU bswap).
- Leading-zero count -> bit32.countlz(n) (CPU instruction, ~8x faster).
- Linear array search -> table.find(t, v).
- Manual pairs clone -> table.clone(t).
- Bulk string.byte/string.char -> string.pack/string.unpack.
- rawlen(t) when metamethods are not needed.
- ./, ../, @ prefixes in require() (-O2).
- Method calls (: syntax), not metamethod.
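A few of the catalog entries above in context (a sketch with illustrative values, not an exhaustive demonstration; assumes math.lerp is available in the runtime's Luau version):

```luau
local N = 1000

-- table.create presizes the array; sequential fill avoids incremental growth.
local t = table.create(N)
for i = 1, N do
	t[i] = i * i
end

-- Generalized iteration: no pairs()/ipairs() call in the loop setup.
local sum = 0
for _, v in t do
	sum += v -- compound operator: LHS evaluated once
end

-- Floor division opcode instead of math.floor(sum / N).
local mean = sum // N

-- math.lerp instead of the manual a + (b - a) * t formula.
local mid = math.lerp(0, mean, 0.5)
```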
- All of the above are disabled by getfenv/setfenv.
- Constant folding through inlining: local function double(x) return x*2 end; local y = double(5) folds to y = 10.
- Hoist pcall out of hot loops.
- obj:Method() not obj.Method(obj) -- fast method call instruction.
- __index should point at a table directly for inline caching.
- buffer for binary data (fixed-size, offset-based, efficient native lowering).
- buffer.readu32 + bit32 over buffer.readbits when the schema is known.
- * 2^n -> bit32.lshift, / 2^n -> bit32.rshift.
- Avoid tostring/tonumber in tight loops.
- table.freeze for readonly config (avoids proxy __index overhead).
- Keep __eq cheap -- it fires on every ==/~= and table.find.
- math.isfinite/math.isnan/math.isinf over x ~= x.
- Annotate Vector3 params for native specialization.
- if expr then A else B over cond and A or B (one branch, falsy-safe).

Phase 4.5: Benchmarking with the luau CLI
When a non-trivial optimization is applied to an isolated, pure-logic function (no Roblox API dependencies), write a small benchmark harness:
local function original(...)
-- paste original implementation
end
local function optimized(...)
-- paste optimized implementation
end
local ITERATIONS = 1_000_000
local clock = os.clock
-- Warm-up passes so caches and codegen state are primed for both paths
for _ = 1, 1000 do original(...) end
for _ = 1, 1000 do optimized(...) end
local t0 = clock()
for _ = 1, ITERATIONS do original(...) end
local t1 = clock()
for _ = 1, ITERATIONS do optimized(...) end
local t2 = clock()
-- Luau string interpolation has no printf-style format specifiers; use string.format
print(string.format("Original: %.4fs", t1 - t0))
print(string.format("Optimized: %.4fs", t2 - t1))
print(string.format("Speedup: %.2fx", (t1 - t0) / (t2 - t1)))
Run with both paths:
luau -O2 bench.luau
luau -O2 --codegen bench.luau
When to benchmark: Self-contained function, algorithmic/data structure change, ambiguous bytecode diff, tiebreak between approaches.
When NOT to benchmark: Roblox API dependencies (use Studio Script Profiler instead), purely additive changes, cold code.
Write the benchmark file to $LUAU_TMP\bench.luau, not the project directory. Run luau from $LUAU_TMP as working directory so profile.out and other residuals land there too.
Re-run all Phase 2 tools and present a before/after comparison:
- Remaining any types (--annotate)

If any metric regressed, investigate and explain why (or revert).
When optimizing a directory:
- Triage by impact: files with the most any types, most allocations, most failed inlines.

These principles govern every optimization decision:
- Verify every change with luau-compile --remarks. If the compiler didn't benefit, the change wasn't worth it.
- Every any type is a missed specialization opportunity. The JIT uses annotations directly with no runtime analysis.
- --!native on every file wastes the per-experience native code memory limit on functions that don't benefit. Apply @native selectively to computational hot functions; reserve --!native for genuinely math-heavy modules.

Consult reference.md during execution for detailed lookup tables covering:
- Native codegen strategy (--!native vs @native breakdown)
- CLI tool flags and usage (luau-compile, luau-analyze, luau, luau-ast)