-Coco is a small extension to get True C Coroutine
-semantics for Lua 5.1.
-
-
-Coco is both available as a stand-alone release and integrated
-into LuaJIT 1.x.
-
-
-The stand-alone release is a patchset against the
-» standard Lua 5.1.4
distribution. There are no dependencies on LuaJIT. However, LuaJIT 1.x
-depends on Coco to allow yielding for JIT compiled functions.
-
-True C coroutine semantics mean you can yield from a coroutine
-across a C call boundary and resume back to it.
-
-
-Coco allows you to use a dedicated C stack for each coroutine.
-Resuming a coroutine and yielding from a coroutine automatically switches
-C stacks, too.
-
-
-In particular you can now:
-
-
-
Yield across all metamethods (not advised for __gc).
-
Yield across iterator functions (for x in func do).
-
Yield across callbacks (table.foreach(), dofile(), ...).
-
Yield across protected callbacks (pcall(), xpcall(), ...).
-
Yield from C functions and resume back to them.
-
-
Best of all, you don't need to change your Lua or C sources
to get these benefits. Coco is fully integrated into the
Lua core, but keeps the required changes to a minimum.
-
-
-
More ...
-
-Please visit the » Download page
-to fetch the current version of the stand-alone package.
-
-
-Coco needs some machine-specific features — please have a look
-at the Portability Requirements.
-
-
-Coco also provides some upwards-compatible
-API Extensions for Lua.
-
-The optional argument cstacksize specifies the size of the
-C stack to allocate for the coroutine:
-
-
-
A default stack size is used if cstacksize is not given
-or is nil or zero.
-
No C stack is allocated if cstacksize is -1.
-
Any other value is rounded up to the minimum size
-(i.e. use 1 to get the minimum size).
-
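For example (a minimal sketch; it assumes Coco's extended
coroutine.create(f [, cstacksize]) form, and the 64K size is arbitrary):

  -- Lua+Coco only: request a dedicated 64K C stack for this coroutine
  local co = coroutine.create(function(a, b) return a + b end, 65536)
  print(coroutine.resume(co, 1, 2))   --> true   3

  -- -1 creates a coroutine without a dedicated C stack
  local co2 = coroutine.create(function() return "plain" end, -1)
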
-
-Important notice for LuaJIT: JIT compiled functions cannot
-yield if a coroutine does not have a dedicated C stack.
-
-
-
olddefault = coroutine.cstacksize([newdefault])
-
-Returns the current default C stack size (may be 0 if the
-underlying context switch method has its own default).
-Sets a new default C stack size if newdefault is present.
-Use 0 to reset it to the default C stack size. Any other
-value is rounded up to the minimum size.
-
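For example (sizes are arbitrary):

  print(coroutine.cstacksize())             -- query the current default (may be 0)
  local old = coroutine.cstacksize(262144)  -- set a 256K default, returns the old one
  coroutine.cstacksize(0)                   -- reset to the built-in default
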
-
-
C API extensions
-
-All C API functions are either unchanged or upwards compatible.
-
-
-
int lua_yield(lua_State *L, int nresults)
-
-The semantics for lua_yield() have changed slightly.
-Existing programs should work fine as long as they follow
-the usage conventions from the Lua manual:
-
-
-return lua_yield(L, nresults);
-
-
-Previously lua_yield() returned a 'magic' value (-1) that
-indicated a yield. Your C function had to pass this value
-on to the Lua core and was not called again.
-
-
-Now, if the current coroutine has an associated C stack,
-lua_yield() returns the number of arguments passed back from
-the resume. This just happens to be the right convention for
-returning them as a result from a C function. I.e. if you
-used the above convention, you'll never notice the change.
-
-
-But the results are on the Lua stack when lua_yield()
-returns. So the C function can just continue and process them
-or retry an I/O operation etc. And your whole C stack frame
-(local variables etc.) is still there, too. You can yield from
-anywhere in your C program, even several call levels deeper.
-
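For illustration, here is a minimal sketch (not taken from the distribution)
of a C function that yields in the middle of a loop and picks up where it
left off on each resume; this only works with Lua+Coco:

  #include "lua.h"

  /* Illustrative only: yield from deep inside a C function (needs Coco). */
  static int l_wait3(lua_State *L)
  {
    int i;
    for (i = 0; i < 3; i++) {
      int nres = lua_yield(L, 0);  /* with Coco, execution continues here */
      lua_pop(L, nres);            /* drop the values passed to resume() */
    }
    lua_pushliteral(L, "done");
    return 1;                      /* returned to the final resume() */
  }
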
-
-Of course all of this only works with Lua+Coco and not with standard Lua.
-
-
-
lua_State *lua_newcthread(lua_State *L, int cstacksize)
-
-This is an (optional) new function that allows you to create
-a coroutine with an associated C stack directly from the C API.
-Other than that it works the same as lua_newthread(L).
-
-
-You have to declare this function as extern
-yourself, since it's not part of the official Lua API.
-This means that a C module that uses this call cannot
-be loaded with standard Lua. This may be intentional.
-
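A small usage sketch (the declaration is your own, as described above;
the 64K size is just an example):

  #include "lua.h"

  /* Not part of the official Lua API; declare it yourself (Lua+Coco only). */
  extern lua_State *lua_newcthread(lua_State *L, int cstacksize);

  static lua_State *new_coro_64k(lua_State *L)
  {
    /* behaves like lua_newthread(L), but with a dedicated 64K C stack;
       the new thread object is pushed onto the stack of L */
    return lua_newcthread(L, 65536);
  }
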
-
-If you want your C module to work with both standard Lua
-and Lua+Coco you can check whether Coco is available with:
-
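A minimal sketch of such a check (probing for the Coco-only
coroutine.cstacksize function; not necessarily the exact snippet
shipped with Coco):

  #include "lua.h"

  /* Returns non-zero if the Coco extensions are present in this Lua core. */
  static int has_coco(lua_State *L)
  {
    int found;
    lua_getglobal(L, "coroutine");
    lua_getfield(L, -1, "cstacksize");  /* only exists with Coco */
    found = lua_isfunction(L, -1);
    lua_pop(L, 2);
    return found;
  }
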
Fix compilation of the GCC inline assembler code on x64.
-Now works when compiled as C++ code (reported by Jonathan Sauer)
-or with -fPIC (reported by Jim Pryor).
-
Added GCC inline assembler for faster context switching on Sparc.
-Thanks to Takayuki Usui.
-
-
-
Coco 1.1.5 — 2008-10-25
-
-
Upgraded to patch cleanly into Lua 5.1.4.
-
Added GCC inline assembler for faster context switching on x64.
-Thanks to Robert G. Jakabosky.
-
-
-
Coco 1.1.4 — 2008-02-05
-
-
Upgraded to patch cleanly into Lua 5.1.3.
-
Fixed setjmp method for ARM with recent glibc versions.
-Thanks to the LuaTeX developers.
-
Fixed setjmp method for x86 on Mac OS X (rarely used,
-default is GCC inline assembler). Thanks to Jason Toffaletti.
-
-
-
Coco 1.1.3 — 2007-05-24
-
-
Upgraded to patch cleanly into Lua 5.1.2.
-
Merged patch from Zachary P. Landau for a Linux/ARM setjmp method (uClibc and glibc).
-
-
-
Coco 1.1.1 — 2006-06-20
-
-
Upgraded to patch cleanly into Lua 5.1.1.
-
C stacks are deallocated early: when a coroutine ends, and not when
-the coroutine object is collected. This mainly benefits Windows Fibers.
-
Windows threads get the required Fiber context when resuming
-a coroutine and not just on creation.
-
-
-
Coco 1.1.0 — 2006-02-18
-
-
Upgraded to patch cleanly into Lua 5.1 (final).
-
Added GCC inline assembler for context switching on x86 and MIPS32
-[up to 3x faster].
-
New targets for setjmp method:
-Mac OS X/x86, Solaris/x86 and x64 and Linux/MIPS32.
-
Workaround for WinXP problem with GetCurrentFiber().
-
The minimum C stack size has been increased to 32K+4K.
-
Removed lcocolib.c and integrated the (much smaller) changes
-into lbaselib.c.
-Note for embedders: this means you no longer need to call
-luaopen_coco().
-
Optional Valgrind support requires version 3.x.
-Renamed define to USE_VALGRIND.
-
C stacks are now registered with Valgrind.
-
-
-
Coco pre-release 51w6 — 2005-08-09
-
-This is the first pre-release of Coco. It targets Lua 5.1-work6 only
-and is no longer available for download.
-
-Coco needs some machine-specific features which are
-inherently non-portable. Although the coverage is pretty good,
-this means that Coco will probably never be a standard part
-of the Lua core (which is pure ANSI C).
-
-
-
Context Switching Methods
-
-Coco relies on four different machine-specific methods
-for allocating a C stack and switching context.
-The appropriate method is automatically selected at compile time.
-
-
-
GCC Inline Assembler
-
-This method is only available when GCC 3.x/4.x is used
-to compile the source.
-This is the fastest method for context switching, but only available
-for a few CPUs (see below).
-
-
-
Modified setjmp Buffer
-
-This method changes a few fields in the setjmp buffer to
-redirect the next longjmp to a new function with a new stack
-frame. It needs a bit of guesswork and lots of #ifdef's to
-handle the supported CPU/OS combinations, but this is quite
-manageable.
-
-
-This is the fallback method if inline assembler is not available.
-It's pretty fast because it doesn't have to save or restore signals
-(which is slow and generally undesirable for Lua coroutines).
-
-
-
POSIX ucontext
-
The POSIX calls getcontext, makecontext and swapcontext
are used to set up and switch between different C stacks.
-Although highly portable and even available for some
-esoteric platforms, it's slower than the setjmp method
-because it saves and restores signals, too (using at least one
-syscall for each context switch).
-
-
-You can force the use of ucontext (instead of setjmp) by enabling
--DCOCO_USE_UCONTEXT in src/Makefile.
-
-
-
Windows Fibers
-
-This is the standard method to set up and switch between
-different C stacks on Windows. It's available on Windows 98
-and later.
-
-
-None of the other methods work for Windows because OS specific code
-is required to switch exception handling contexts.
-
-
-
Supported Platforms
-
-Coco has support for the following platforms:
-

  CPU        System      Method
  x86        (any OS)    gccasm
  x86        Linux       setjmp
  x86        FreeBSD     setjmp
  x86        NetBSD      setjmp
  x86        OpenBSD     setjmp
  x86        Solaris     setjmp
  x86        Mac OS X    setjmp
  x64        (any OS)    gccasm
  x64        Solaris     setjmp
  MIPS32     (any OS)    gccasm
  MIPS32     Linux       setjmp
  ARM        Linux       setjmp
  PPC32      Mac OS X    setjmp
  (any CPU)  POSIX       ucontext
  (any CPU)  Windows     fibers

It should work pretty much anywhere a correct
POSIX ucontext implementation is available. It has been tested
on every system I could get hold of (e.g. Sparc, PPC32/PPC64,
IA64, Alpha, HPPA with various operating systems).
-
-
-
Caveats
-
-
-Some older operating systems may have defective ucontext
-implementations because this feature is not widely used. E.g. some
-implementations don't mix well with other C library functions
-like malloc() or with native threads.
-This is really not the fault of Coco — please upgrade your OS.
-
-
-Note for Windows: Please read the explanation for the default
-» Thread Stack Size
-in case you want to create large numbers of Fiber-based coroutines.
-
-
-Note for MinGW/Cygwin: Older releases of GCC (before 4.0) generate
-wrong unwind information when -fomit-frame-pointer is used
-with stdcalls. This may lead to crashes when exceptions are thrown.
-The workaround is to always use two flags:
--fomit-frame-pointer -maccumulate-outgoing-args.
-
-
-Note for MIPS CPUs without FPU: It's recommended to compile
-all sources with -msoft-float, even if you don't use
-any floating point ops anywhere. Otherwise context switching must
-save and restore FPU registers (which needs to go through
-the slow kernel emulation).
-
-
-To run Coco with » Valgrind
-(a memory debugger) you must add -DUSE_VALGRIND
-to MYCFLAGS and recompile. You will get random errors
-if you don't! Valgrind 3.x or later is required. Earlier versions
-do not work well with newly allocated C stacks.
-
-DynASM is a Dynamic Assembler for code generation
-engines.
-
-
-DynASM has been developed primarily as a tool for
-LuaJIT, but might be useful for other
-projects, too.
-
-
-If you are writing a just-in-time compiler or need to generate
-code on the fly (e.g. for high-performance graphics or other
-CPU-intensive computations), DynASM might be just what you
-are looking for.
-
-
-Please have a look at the list of Features
-to find out whether DynASM could be useful for your project.
-
-Sorry, right now there is no proper documentation available other
-than some Examples and of course
-the source code. The source is well documented, though (IMHO).
-
-
-I may add more docs in case someone actually finds DynASM to be
-useful outside of LuaJIT. If you do, I'd like to
-hear from you, please. Thank you!
-
-
-If you want to check it out please visit the
-» Download page and fetch the most recent
-version of LuaJIT. All you need is in the dynasm directory.
-For some complex examples take a peek at the
-*.dasc and *.dash files in LuaJIT, too.
-
-Note: yes, you usually get the assembler code as comments and proper
-CPP directives to match them up with the source. I've omitted
-them here for clarity. Oh and BTW: the pipe symbols probably
-line up much more nicely in your editor than in a browser.
-
-
-Here 123 is an offset into the action list buffer that
-holds the partially specified machine code. Without going
-into too much detail, the embedded C library implements a
-tiny bytecode engine that takes the action list as input and
-outputs machine code. It basically copies machine code snippets
-from the action list and merges them with the arguments
-passed in by dasm_put().
-
-
-The arguments can be any kind of C expressions. In practical
-use most of them evaluate to constants (e.g. structure offsets).
-Your C compiler should generate very compact code out of it.
-
-
-The embedded C library knows only what's absolutely needed to
-generate proper machine code for the target CPU (e.g. variable
-displacement sizes, variable branch offset sizes and so on).
-It doesn't have a clue about other atrocities like x86 opcode
-encodings — and it doesn't need to. This dramatically
-reduces the minimum required code size to around 2K [sic!].
-
-
-The action list buffer itself has a pretty compact encoding, too.
-E.g. the whole action list buffer for an early version of LuaJIT
-needs only around 3K.
-
-
-
Advanced Features
-
-Here's a real-life example taken from LuaJIT that shows some
-advanced features like type maps, macros and how to access
-C structures:
-
The toolchain is split into a portable subset and
-CPU-specific modules.
-
DynASM itself (the pre-processor) is written in Lua.
-
There is no machine-dependency for the pre-processor itself.
-It should work everywhere you can get Lua 5.1 up and running
-(i.e. Linux, *BSD, Solaris, Windows, ... you name it).
-
-
-
DynASM Assembler Features
-
-
C code and assembler code can be freely mixed.
-Readable, too.
-
All the usual syntax for instructions and operand modes
-you come to expect from a standard assembler.
-
Access to C variables and CPP defines in assembler statements.
-
Access to C structures and unions via type mapping.
-
Convenient shortcuts for accessing C structures.
-
Local and global labels.
-
Numbered labels (e.g. for mapping bytecode instruction numbers).
-
Multiple code sections (e.g. for tailcode).
-
Defines/substitutions (inline and from command line).
-
Conditionals (translation time) with proper nesting.
-
Macros with parameters.
-
Macros can mix assembler statements and C code.
-
Captures (output diversion for code reordering).
-
Simple and extensible template system for instruction definitions.
-
-
-
Restrictions
-
-Currently only a subset of x86 (i386+) instructions is supported.
-Unsupported instructions are either not usable in user-mode or
-are slow on modern CPUs (i.e. not suited for a code generator).
-SSE, SSE2, SSE3 and SSSE3 are fully supported. MMX is not supported.
-
-
-The whole toolchain has been designed to support multiple CPU
-architectures. As LuaJIT gets support for more architectures,
-DynASM will be extended with new CPU-specific modules.
-
-
-The assembler itself will be extended with more features on an
-as-needed basis. E.g. I'm thinking about vararg macros.
-
-
-Note that runtime conditionals are not really needed, since you can
-just use plain C code for that (and LuaJIT does this a lot).
-It's not going to be more (time-) efficient if conditionals are done
-by the embedded C library (maybe a bit more space-efficient).
-
-DynASM —
-a Dynamic Assembler for code generation engines.
-
-
-
-
-
-
-
-LuaJIT is a Just-In-Time Compiler for the Lua
-programming language.
-
-
-Lua is a powerful, light-weight programming language designed
-for extending applications. Lua is also frequently used as a
-general-purpose, stand-alone language. More information about
-Lua can be found at: » http://www.lua.org/
-
-
-LuaJIT 1.x is based on the Lua 5.1.x virtual machine and bytecode interpreter
-from lua.org. It compiles bytecode to native x86 (i386+) machine code
-to speed up the execution of Lua programs.
-
-
-LuaJIT depends on Coco to allow yielding
-from coroutines for JIT compiled functions. Coco is part of the
-LuaJIT distribution.
-
-All standard library functions have the same behaviour as
-in the Lua distribution LuaJIT is based on.
-
-
-The Lua loader used by the standard require() library
-function has been modified to turn off compilation of the main
-chunk of a module. The main chunk is only run once when the module
-is loaded for the first time. There is no point in compiling it.
-
-
-You might want to adapt this behaviour if you use your own utility
-functions (and not require()) to load modules.
-
-
-Note that the subfunctions defined in a loaded module are
-of course compiled. See below if you want to override this.
-
-
-
The jit.* Library
-
-This library holds several functions to control the behaviour
-of the JIT engine.
-
-
-
jit.on()
-jit.off()
-
-Turns the JIT engine on (default) or off.
-
-
-These functions are typically used with the command line options
--j on or -j off.
-
-Enable (with jit.on, default) or disable (with jit.off)
-JIT compilation for a Lua function. The current function (the Lua function
-calling this library function) can be specified with true.
-
-
-If the second argument is true, JIT compilation is also
-enabled/disabled recursively for all subfunctions of a function.
-With false only the subfunctions are affected.
-
-
-Both library functions only set a flag which is checked when
-the function is executed for the first/next time. They do not
-trigger immediate compilation.
-
-
-Typical usage is jit.off(true, true) in the main chunk
-of a module to turn off JIT compilation for the whole module.
-Note that require() already turns off compilation for
-the main chunk itself.
-
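A minimal sketch of such a module (the module name and contents are made up;
it assumes the jit.* library is loaded, which the luajit executable does by
default):

  -- mymodule.lua: keep this whole module interpreted
  jit.off(true, true)   -- the main chunk itself plus all of its subfunctions

  local M = {}

  function M.helper(x)
    return x * 2        -- stays interpreted
  end

  return M
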
-
-
status = jit.compile(func [,args...])
-
-Compiles a Lua function and returns the compilation status.
-Successful compilation is indicated with a nil status.
-Failure is indicated with a numeric status (see jit.util.status).
-
-
-The optimizer pass of the compiler tries to derive hints from the
-passed arguments. Not passing any arguments or passing untypical
-arguments (esp. the wrong types) reduces the efficiency of the
-optimizer. The compiled function will still run, but probably not
-with maximum speed.
-
-
-This library function is typically used for Ahead-Of-Time (AOT)
-compilation of time-critical functions or for testing/debugging.
-
-
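For example, ahead-of-time compiling a small function with representative
arguments (the function and arguments are made up):

  local function dot(a, b)
    return a.x * b.x + a.y * b.y
  end

  -- pass typical arguments so the optimizer can derive good hints
  local status = jit.compile(dot, {x=1, y=2}, {x=3, y=4})
  if status then
    print("compilation failed:", jit.util.status[status])
  end
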
-
status = jit.compilesub(func|true [,true])
-
-Recursively compile all subfunctions of a Lua function.
-The current function (the Lua function calling this library function)
-can be specified with true. Note that the function
-itself is not compiled (use jit.compile()).
-
-
-If the second argument is true, compilation will stop
-when the first error is encountered. Otherwise compilation will
-continue with the next subfunction.
-
-
-The returned status is nil, if all subfunctions have been
-compiled successfully. A numeric status (see jit.util.status)
-indicates that at least one compilation failed and gives the status
-of the last failure (this is only helpful when stop on error
-is true).
-
-
-
jit.debug([level])
-
-Set the debug level for JIT compilation. If no level is given,
-the maximum debug level is set.
-
-
-
Level 0 disables debugging: no checks for hooks are compiled
-into the code. This is the default when LuaJIT is started and
-provides the maximum performance.
-
Level 1 enables function call debugging: call hooks and
-return hooks are checked in the function prologue and epilogue.
-This slows down function calls somewhat (by up to 10%).
-
Level 2 enables full debugging: all hooks are checked.
-This slows down execution quite a bit, even when the hooks
-are not active.
-
-
-Note that some compiler optimizations are turned off when
-debugging is enabled.
-
-
-This function is typically used with the command line options
--j debug or -j debug=level.
-
-
-
jit.attach(handler [, priority])
-
-Attach a handler to the compiler pipeline with the given priority.
-The handler is detached if no priority is given.
-
-
-The inner workings of the compiler pipeline and the API for handlers
-are still in flux. Please see the source code for more details.
-
-
-
jit.version
-
-Contains the LuaJIT version string.
-
-
-
jit.version_num
-
-Contains the version number of the LuaJIT core. Version xx.yy.zz
-is represented by the decimal number xxyyzz.
-
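For example, LuaJIT 1.1.2 reports 10102, so a simple version check could
look like this:

  if jit.version_num < 10102 then
    error("LuaJIT 1.1.2 or later required, got " .. jit.version)
  end
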
-
-
jit.arch
-
-Contains the target architecture name (CPU and optional ABI).
-
-
-
-
The jit.util.* Library
-
-This library holds many utility functions used by the provided
-extension modules for LuaJIT (e.g. the optimizer). The API may
-change in future versions.
-
-
-
stats = jit.util.stats(func)
-
-Retrieves information about a function. Returns nil
-for C functions. Returns a table with the following fields for
-Lua functions:
-
-
-
status: numeric compilation status (see jit.util.status).
-
stackslots: number of stack slots.
-
params: number of fixed parameters (arguments).
-
consts: number of constants.
-
upvalues: number of upvalues.
-
subs: number of subfunctions (sub prototypes).
-
bytecodes: number of bytecode instructions.
-
isvararg: fixarg (false) or vararg (true) function.
-
env: function environment table.
-
mcodesize: size of the compiled machine code.
-
mcodeaddr: start address of the compiled machine code.
-
-
-mcodesize and mcodeaddr are not set if the
-function has not been compiled (yet).
-
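A small usage sketch:

  local function f(a, b) return a + b end

  local s = jit.util.stats(f)
  if s then
    print("status:     " .. jit.util.status[s.status])
    print("bytecodes:  " .. s.bytecodes)
    print("stackslots: " .. s.stackslots)
    if s.mcodesize then print("mcodesize:  " .. s.mcodesize) end
  end
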
-
-
op, a, b, c, test = jit.util.bytecode(func, pc)
-
-Returns the fields of the bytecode instruction at the given pc
-for a Lua function. The first instruction is at pc = 1.
-Nothing is returned if pc is out of range.
-
-
-The opcode name is returned as an uppercase string in op.
-The opcode arguments are returned as a, b and
-optionally c. Arguments that indicate an index into the
-array of constants are translated to negative numbers (the first
-constant is referred to with -1). Branch targets are signed numbers
-relative to the next instruction.
-
-
-test is true if the instruction is a test (i.e. followed
-by a JMP).
-
-
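For example, a simple bytecode lister built on top of this function:

  local function listbc(func)
    for pc = 1, math.huge do
      local op, a, b, c, test = jit.util.bytecode(func, pc)
      if not op then break end       -- pc out of range, nothing returned
      print(pc, op, a, b, c, test and "(test)" or "")
    end
  end

  listbc(function(x) return x + 1 end)
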
-
const, ok = jit.util.const(func, idx)
-
-Returns a constant from the array of constants for a Lua function.
-ok is true if idx is in range. Otherwise nothing
-is returned.
-
-
-Constants are numbered starting with 1. A negative idx
-is mapped to a positive index.
-
-
-
upvalue, ok = jit.util.upvalue(func, idx)
-
-Returns an upvalue from the array of upvalues for a Lua function.
-ok is true if idx is in range. Otherwise nothing
-is returned. Upvalues are numbered starting with 0.
-
-
-
nup = jit.util.closurenup(func, idx)
-
-Returns the number of upvalues for the subfunction prototype with
-the given index idx for a Lua function. Nothing is returned
-if idx is out of range. Subfunctions are numbered starting
-with 0.
-
-Returns the numeric start address, the compiled machine code
-(converted to a string) and an iterator for the machine code fragment map
-for the specified machine code block associated with a Lua function.
-
-
-Returns nil and a numeric status code (see jit.util.status)
-if the function has not been compiled yet or compilation has failed
-or compilation is disabled. Returns nothing if the selected
-machine code block does not exist.
-
-
-The machine code fragment map is used for debugging and error handling.
-The format may change between versions and is an internal implementation
-detail of LuaJIT.
-
-
-
addr [, mcode] = jit.util.jsubmcode([idx])
-
-If idx is omitted or nil:
-Returns the numeric start address and the compiled machine code
-(converted to a string) for internal subroutines used by the
-compiled machine code.
-
-
-If idx is given:
-Returns the numeric start address of the machine code for a specific
-internal subroutine (0 based). Nothing is returned if idx is
-out of range.
-
-
-
jit.util.status
-
-This is a table that bidirectionally maps status numbers and
-status names (strings):
-
-
-
-
Status Name       Description
OK                Ok, code has been compiled.
NONE              Nothing analyzed or compiled, yet (default).
OFF               Compilation disabled for this function.
ENGINE_OFF        JIT engine is turned off.
DELAYED           Compilation delayed (recursive invocation).
TOOLARGE          Bytecode or machine code is too large.
COMPILER_ERROR    Error from compiler frontend.
DASM_ERROR        Error from DynASM engine.
-
-
-
-
jit.util.hints
-jit.util.fhints
-
-These two tables map compiler hint names to internal hint numbers.
-
-
-The hint system is an internal implementation detail of LuaJIT.
-Please see the source code for more info.
-
Remove a (sometimes) wrong assertion in luaJIT_findpc().
-
DynASM now allows labels for displacements and .aword.
-
Fix some compiler warnings for DynASM glue (internal API change).
-
Correct naming for SSSE3 (temporarily known as SSE4) in DynASM and x86 disassembler.
-
The loadable debug modules now handle redirection to stdout
-(e.g. -j trace=-).
-
-
-
LuaJIT 1.1.2 — 2006-06-24
-
-
Fix MSVC inline assembly: use only local variables with
-lua_number2int().
-
Fix "attempt to call a thread value" bug on Mac OS X:
-make values of consts used as lightuserdata keys unique
-to avoid joining by the compiler/linker.
The C stack is kept 16 byte aligned (faster).
-Mandatory for Mac OS X on Intel, too.
-
Faster calling conventions for internal C helper functions.
-
Better instruction scheduling for function prologue, OP_CALL and
-OP_RETURN.
-
-
-
Miscellaneous optimizations:
-
-
Faster loads of FP constants. Remove narrow-to-wide store-to-load
-forwarding stalls.
-
Use (scalar) SSE2 ops (if the CPU supports it) to speed up slot moves
-and FP to integer conversions.
-
Optimized the two-argument form of OP_CONCAT (a..b).
-
Inlined OP_MOD (a%b).
-With better accuracy than the C variant, too.
-
Inlined OP_POW (a^b). Unroll x^k or
-use k^x = 2^(log2(k)*x) or call pow().
-
-
-
Changes in the optimizer:
-
-
Improved hinting for table keys derived from table values
-(t1[t2[x]]).
-
Lookup hinting now works with arbitrary object types and
-supports index chains, too.
-
Generate type hints for arithmetic and comparison operators,
-OP_LEN, OP_CONCAT and OP_FORPREP.
-
Remove several hint definitions in favour of a generic COMBINE hint.
-
Complete rewrite of jit.opt_inline module
-(ex jit.opt_lib).
-
-
-
Use adaptive deoptimization:
-
-
If runtime verification of a contract fails, the affected
-instruction is recompiled and patched on-the-fly.
-Regular programs will trigger deoptimization only occasionally.
-
This avoids generating code for uncommon fallback cases
-most of the time. Generated code is up to 30% smaller compared to
-LuaJIT 1.0.3.
-
Deoptimization is used for many opcodes and contracts:
-
-
OP_CALL, OP_TAILCALL: type mismatch for callable.
-
Inlined calls: closure mismatch, parameter number and type mismatches.
-
OP_GETTABLE, OP_SETTABLE: table or key type and range mismatches.
-
All arithmetic and comparison operators, OP_LEN, OP_CONCAT,
-OP_FORPREP: operand type and range mismatches.
-
-
Complete redesign of the debug and traceback info
-(bytecode ↔ mcode) to support deoptimization.
-Much more flexible and needs only 50% of the space.
-
The modules jit.trace, jit.dumphints and
-jit.dump handle deoptimization.
-
-
-
Inlined many popular library functions
-(for commonly used arguments only):
-
-
Most math.* functions (the 18 most used ones)
-[2x-10x faster].
-
string.len, string.sub and string.char
-[2x-10x faster].
-
table.insert, table.remove and table.getn
-[3x-5x faster].
-
coroutine.yield and coroutine.resume
-[3x-5x faster].
-
pairs, ipairs and the corresponding iterators
-[8x-15x faster].
-
-
-
Changes in the core and loadable modules and the stand-alone executable:
-
-
Added jit.version, jit.version_num
-and jit.arch.
-
Reorganized some internal API functions (jit.util.*mcode*).
-
The -j dump output now shows JSUB names, too.
-
New x86 disassembler module written in pure Lua. No dependency
-on ndisasm anymore. Flexible API, very compact (500 lines)
-and complete (x87, MMX, SSE, SSE2, SSE3, SSSE3, privileged instructions).
-
luajit -v prints the LuaJIT version and copyright
-on a separate line.
-
-
-
Added SSE, SSE2, SSE3 and SSSE3 support to DynASM.
-
Miscellaneous doc changes. Added a section about
-embedding LuaJIT.
-LuaJIT is a rather complex application. There will undoubtedly
-be bugs lurking in there. You have been warned. :-)
-
-
-If you came here looking for information on how to debug
-your application (and not LuaJIT itself) then please
-check out jit.debug()
-and the -j debug
-command line option.
-
-
-But if you suspect a problem with LuaJIT itself, then try
-any of the following suggestions (in order).
-
-
-
Is LuaJIT the Problem?
-
-Try to run your application in several different ways:
-
-
-
luajit app.lua
-
luajit -O1 app.lua
-
luajit -O app.lua
-
luajit -j off app.lua
-
lua app.lua (i.e. with standard Lua)
-
-
-If the behaviour is the same as with standard Lua then ...
-well ... that's what LuaJIT is about: doing the same things,
-just faster. Even bugs fly faster. :-)
-
-
-So this is most likely a bug in your application then. It may be easier
-to debug this with plain Lua — the remainder of this page
-is probably not helpful for you.
-
-
-But if the behaviour is different, there is some likelihood
-that you caught a bug in LuaJIT. Oh dear ...
-
-
-Ok, so don't just give up. Please read on and help the community
-by finding the bug. Thank you!
-
-Please check if a newer version is available. Maybe the bug
-you have encountered has been fixed already. Always download the
-latest version and try it with your application before continuing.
-
-
-
Reproduce the Bug
-
-First try to make the bug reproducible. Try to isolate the module
-and the function the bug occurs in:
-
-
-Either selectively turn off compilation for some modules with
- jit.off(true, true)
-until the bug disappears ...
-
-
-And/or turn the whole JIT engine off and selectively compile
-functions with
- jit.compile(func)
-until it reappears.
-
-
-If you have isolated the point where it happens, it's most helpful
-to reduce the affected Lua code to a short code snippet that
-still shows the problem. You may need to print() some
-variables until you can pinpoint the exact spot where it happens.
-
-
-If you've got a reproducible and short test
-you can either send it directly to me or the mailing list
-(see the Contact Information)
-or you can try to debug this a bit further.
-
-
-Well — if you are brave enough. :-)
-
-
-
Look at the Generated Code
-
-You may want to have a look at the output of -j dumphints
-first. Try to change things around until you can see which hint
-or which instruction is the cause of the bug. If you suspect
-an optimizer bug then have a look at the backend (*.das[ch])
-and check how the hint is encoded.
-
-
-Otherwise have a look at -j dump and see whether
-you can spot the problem around the affected instruction.
-It's helpful to have a good knowledge of assembler, though
-(sorry).
-
-
-
Locate a Crash
-
-If you get a crash, you should compile LuaJIT with debugging
-turned on:
-
-
-Add -g to CFLAGS and MYLDFLAGS
-or whatever is needed to turn on debugging. For Windows you
-need both an executable and a DLL built with debugging.
-
-
-Then start LuaJIT with your debugger. Run it with
--j dump=test.dump.
-
-
-Have a look at the backtrace and compare it with the generated
-dump file to find out exactly where it crashes. I'm sorry, but
-symbols or instructions for JIT compiled functions are not
-displayed in your debugger (this is really hard to solve).
-
-
-
Turn on Assertions
-
-Another way to debug LuaJIT is to turn on assertions.
-They can be turned on only for the JIT engine by adding
--DLUAJIT_ASSERT to JITCFLAGS in src/Makefile.
-Then recompile with make clean and make.
-
-
-Add these two lines to src/luaconf.h to turn on all assertions in the Lua core:
- #include <assert.h>
- #define lua_assert(x) assert(x)
-This turns on the JIT engine assertions, too.
-Recompile and see whether any assertions trigger.
-Don't forget to turn off the (slow) assertions when you're done!
-
-
-
Use Valgrind
-
-A tremendously useful (and free) tool for runtime code analysis
-is » Valgrind. Regularly
run your applications with valgrind --tool=memcheck and
-your life will be better.
-
-
-To run LuaJIT under Valgrind you must add
--DUSE_VALGRIND to MYCFLAGS
-and recompile LuaJIT. You will get random errors if you don't!
-Valgrind 3.x or later is required. Earlier versions
-do not work well with newly allocated C stacks.
-
-
-An executable built with this option runs fine without Valgrind
-and without a performance loss. But it needs the Valgrind header
-files for compilation (which is why it's not enabled by default).
-
-
-It's helpful to compile LuaJIT with debugging turned on, too
-(see above).
-
-
-If Valgrind spots many invalid memory accesses that involve
-memory allocation/free functions you've probably found a bug
-related to garbage collection. Some object reference must have
-gone astray.
-
-
-Try to find out which object is disappearing. You can force
-eager garbage collection with repeated calls to
-collectgarbage() or by setting a very low threshold
-with collectgarbage("setpause", 1).
-
-
-
Don't Despair
-
-If all of this doesn't help to find the bug, please send
-a summary of your findings to the mailing list. Describe as much
-of the circumstances you think are relevant.
-
-
-Please don't send your whole application to me
-(without asking first) and especially not to the mailing list.
Code snippets should preferably be less than 50 lines and
to the point.
-
-
-All bug reports are helpful, even if no immediate solution
-is available. Often enough someone else finds the same bug
-in a different setting and together with your bug report
-this may help to track it down.
-
-
-Finally I have to say a BIG THANK YOU
-to everyone who has helped to make LuaJIT better by finding
-and fixing bugs!
-
Adaptive deoptimization is used to recompile individual bytecode
-instructions with broken contracts. This avoids generating code for the
-generic fallback cases most of the time (faster compilation, reduced
-I-cache contention).
-
Special CPU features (such as conditional moves or SSE2)
-are automatically used when detected.
-
-
-The JIT compiler is very fast:
-
-
-
Compilation times vary a great deal (depending on the nature of
-the function to be compiled) but are generally in the
-microsecond range.
-
Even compiling large functions (hundreds of lines) with the
-maximum optimization level takes only a few milliseconds in the
-worst case.
-
-
-LuaJIT is very small:
-
-
-
The whole JIT compiler engine adds only around 32K
-of code to the Lua core (if compiled with -Os).
-
The optimizer is split into several optional modules that
-can be loaded at runtime if requested.
-
LuaJIT adds around 6,000 lines of C and assembler code and
2,000 lines of Lua code to the Lua 5.1 core (17,000 lines of C).

Required build tools (DynASM)
take another 2,500 lines of Lua code.
-
-
-
Compatibility
-
-LuaJIT is designed to be fully compatible with Lua 5.1.
-It accepts the same source code and/or precompiled bytecode.
-It supports all standard language semantics. In particular:
-
-
-
All standard types, operators and metamethods are supported.
-
Implicit type coercions (number/string) work as expected.
-
Full IEEE-754 semantics for floating point arithmetics
-(NaN, +-Inf, +-0, ...).
-
Full support for lexical closures.
-Proper tail calls do not consume a call frame.
No changes to the Lua 5.1 incremental garbage collector.
-
No changes to the standard Lua/C API.
-
Dynamically loaded C modules are link compatible with Lua 5.1
-(same ABI).
-
LuaJIT can be embedded
-into an application just like Lua.
-
-
-Some minor differences are related to debugging:
-
-
-
Debug hooks are only called if debug code generation is enabled.
-
There is no support for tailcall counting in JIT compiled code.
HOOKTAILRET is not called, either. Note: this won't affect you unless
-you are writing a Lua debugger. *
-
-
-* There is not much I can do to improve this situation without undue
-complications. A suggestion to modify the behaviour of standard Lua
-has been made on the mailing list (it would be beneficial there, too).
-
-
-
Restrictions
-
-
Only x86 (i386+) CPUs are supported right now (but see below).
-
Only the default type for lua_Number is supported
-(double).
-
The interrupt signal (Ctrl-C) is ignored unless you enable
-debug hooks (with -j debug). But this will seriously
-slow down your application. I'm looking for better ways to handle
-this. In the meantime you have to press Ctrl-C twice to interrupt
-a currently running JIT compiled function (just like C functions).
-
GDB, Valgrind and other debugging tools can't report symbols
-or stack frames for JIT compiled code. This is rather difficult to solve.
-Have a look at Debugging LuaJIT, too.
-
-
-
Caveats
-
-
LuaJIT allocates executable memory for the generated machine code
-if your OS has support for it: either HeapCreate() for Windows or
-mmap() on POSIX systems.
-The fallback is the standard Lua allocator (i.e. malloc()).
-But this usually means the allocated memory is not marked executable.
-Running compiled code will trap on CPUs/OS with the NX (No eXecute)
-extension if you can only use the fallback.
-
DynASM is needed to regenerate the
-ljit_x86.h file. But only in case you want to modify
-the *.dasc/*.dash files. A pre-processed *.h
-file is supplied with LuaJIT.
-DynASM is written in Lua and needs a plain copy of Lua 5.1
-(installed as lua). Or you can run it with LuaJIT built from
-the *.h file supplied with the distribution (modify
-DASM= in src/Makefile). It's a good idea to install
-a known good copy of LuaJIT under a different name for this.
-
LuaJIT ships with LUA_COMPAT_VARARG turned off.
-I.e. the implicit arg parameter is not created anymore.
-Please have a look at the comments in luaconf.h for
-this configuration option. You can turn it on, if you really need it.
-Or better yet, convert your code to the new Lua 5.1 vararg syntax.
-LuaJIT is not much more difficult to install than Lua itself.
-Just unpack the distribution file, change into the newly created
-directory and follow the instructions below.
-
-
-For the impatient: make linux && sudo make install
-Replace linux with e.g. bsd or macosx depending on your OS.
-
-
-In case you've missed this in Features:
-LuaJIT only works on x86 (i386+) systems right now. Support for
-other architectures may be added in future versions.
-
-
-
Configuring LuaJIT
-
-LuaJIT is (deliberately) not autoconfigured — the
-defaults should work fine on most systems. But please check the
-system-specific instructions below.
-
-
-The following three files hold all configuration information:
-
-
-
Makefile holds settings for installing LuaJIT.
-
src/Makefile holds settings for compiling LuaJIT.
-
src/luaconf.h sets a multitude of configuration
-variables.
-
-
-If this is your first build then it's better not to give into
-the temptation to tweak every little setting. The standard
-configuration provides sensible defaults (IMHO).
-
-
-One particular setting you might want to change is the installation
-path. Note that you need to modify both the top-level Makefile
and src/luaconf.h (right at the start) for the change to take
effect.
-
-
-If you have trouble getting Coco to work, you can disable it by
-uncommenting the COCOFLAGS= -DCOCO_DISABLE line in
-src/Makefile. But note that this effectively disables
-yielding from coroutines for JIT compiled functions.
-
-
-A few more settings need to be changed if you want to
Debug LuaJIT itself.
-Application debugging can be turned on/off at runtime.
-
-
-
Upgrading From Previous Versions
-
-It's important to keep the LuaJIT core and the add-on modules in sync.
-Be sure to delete any old versions of LuaJIT modules from the
-Lua module search path (check the current directory, too!).
-
-
-Lua files compiled to bytecode may be incompatible if the underlying
-Lua core has changed (like from Lua 5.1 alpha to Lua 5.1
-final between LuaJIT 1.0.3 and LuaJIT 1.1.0). The same
-applies to any
-» loadable C modules
-(shared libraries, DLLs) which need to be recompiled with the new
-Lua header files.
-
-
-Compiled bytecode and loadable C modules are fully compatible and
-can be freely exchanged between LuaJIT and the same
-version of Lua it is based on. Please verify that LUA_RELEASE
-in src/lua.h is the same in both distributions.
-
-
-
Building LuaJIT
-
-
Makefile Targets
-
-The Makefiles have a number of targets for various operating systems:
-
-You may want to enable interactive line editing for the stand-alone
-executable. There are extra targets for Linux, BSD and Mac OS X:
-make linux_rl, make bsd_rl
-and make macosx_rl.
-
-
-
MSVC (Win32)
-
-First check out etc\luavs.bat if it suits your needs. Then try
-running it from the MSVC command prompt (start it from the toplevel directory).
-
-
-Another option is to set up your own MSVC project:
-
-
-Change to the src directory
-and create a new DLL project for lua51.dll.
-Add all C files to it except for lua.c, luac.c
-and print.c. Add the ..\dynasm directory
-to the include path and build the DLL.
-
-
-Next create a new EXE project for luajit.exe.
-Add lua.c to it and link with the import library
-lua51.lib created for lua51.dll. Build
-the executable.
-
-
-
Installation
-
-
POSIX systems
-
-Run make install from the top-level directory.
-You probably need to be the root user before doing so, i.e. use
-sudo make install or su - root
-before the make install.
-
-
-By default this installs only:
- /usr/local/bin/luajit — The stand-alone executable.
- /usr/local/lib/lua/5.1 — C module directory.
- /usr/local/share/lua/5.1 — Lua module directory.
- /usr/local/share/lua/5.1/jit/*.lua —
-jit.* modules.
-
-
-The Lua docs and includes are not installed to avoid overwriting
-an existing Lua installation. In any case these are identical
-to the version of Lua that LuaJIT is based on. If you want
-to install them, edit the top-level makefile (look for ###).
-
-
-The stand-alone Lua bytecode compiler luac is neither
-built nor installed, for the same reason. If you really need it,
-you may be better off with luac built from the original Lua
-distribution (use the same version your copy of LuaJIT
-is based on). This avoids dragging in most of LuaJIT which is not
-needed for the pure bytecode compiler. You can also use the bare-bones
-Lua to bytecode translator luac.lua (look in the test
-directory of the original Lua distribution).
-
-
-
Windows
-
-Copy luajit.exe and lua51.dll
-to a newly created directory (any location is ok). Add lua
-and lua\jit directories below it and copy all Lua files
-from the jit directory of the distribution to the latter directory.
-
-
-There are no hardcoded
-absolute path names — all modules are loaded relative to the
-directory where luajit.exe is installed
-(see src/luaconf.h).
-
-
-
Embedding LuaJIT
-
-It's strongly recommended that you build the stand-alone executable
-with your toolchain and verify that it works before starting
-to embed LuaJIT into an application. The stand-alone executable is
-also useful later on, when you want to experiment with code snippets
-or try out some Lua files.
-
-
-Please consult the Lua docs for general information about how to
-embed Lua into your application. The following list only shows
-the additional steps needed for embedding LuaJIT:
-
-
-
You need to add the LuaJIT library functions by running
-luaopen_jit() after all the other standard library functions.
-The modified src/linit.c used by the stand-alone executable
-already does this for you.
-
Caveat: LuaJIT is based on Lua 5.1 which
-means the luaopen_*() functions must not
-be called directly. See src/linit.c for the proper way to
-run them. You'll get an error initializing the io library
-if you don't follow these instructions.
-
To use the optimizer (strongly recommended) you need to:
-
-
Install the optimizer modules jit.opt and
-jit.opt_inline relative to the Lua module path
-(you've probably modified it — see src/luaconf.h):
-jit/opt.lua
-jit/opt_inline.lua
-
If you want to ship a single executable then you may want to
embed the optimizer modules into your application (but don't lose
time with this during the early development phase). This involves:
-
-
Compile the two modules to bytecode
-(using luac -s from a plain Lua installation).
-
Convert them to C include files (search for "Lua bin2c").
-
On Windows you can also put the compiled bytecode into a resource
-(search for "Lua bin2res").
-
Load the bytecode with luaL_loadbuffer (but don't run it).
-
Put the resulting functions into package.preload["jit.opt"]
and package.preload["jit.opt_inline"] (see the sketch after this list).
-
-
Activate the LuaJIT optimizer from Lua code to be run at startup:
- require("jit.opt").start()
-Or use equivalent C code. See dojitopt() in src/lua.c.
-
-
All other LuaJIT specific modules (jit.*) are for debugging only.
-They do not need to be shipped with an application. But they may be quite
-useful, anyway (especially jit.trace).
-
DynASM is only needed while building LuaJIT. It's not
-needed while running LuaJIT and there is no point in shipping or
-installing it together with an application.
-
In case you want to strip some of the standard libraries from
-your application: The optimizer modules need several functions from
-the base library and the string library (and of course the LuaJIT
-core libraries). The io library is only used to print a fatal error
-message (you may want to replace it). The optional modules
-for debugging depend on a few more library functions —
-please check the source.
-
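As a rough illustration of the preload step mentioned in the list above,
here is a sketch that assumes the jit.opt bytecode has already been
converted to a C array (the array names are hypothetical):

  #include "lua.h"
  #include "lauxlib.h"

  /* produced by a bin2c-style tool from the stripped jit.opt bytecode */
  extern const char jit_opt_bc[];
  extern const size_t jit_opt_bc_len;

  static void preload_jit_opt(lua_State *L)
  {
    lua_getglobal(L, "package");
    lua_getfield(L, -1, "preload");
    if (luaL_loadbuffer(L, jit_opt_bc, jit_opt_bc_len, "jit.opt") != 0)
      luaL_error(L, "cannot load jit.opt bytecode: %s", lua_tostring(L, -1));
    lua_setfield(L, -2, "jit.opt");   /* package.preload["jit.opt"] = chunk */
    lua_pop(L, 2);                    /* pop preload and package */
  }
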
-
-Although the very liberal LuaJIT
-» license
-does not require any acknowledgment whatsoever, it would be appreciated
-if you give some credit in the docs (or the "About" box) of your application.
-A simple line like:
- This product includes LuaJIT, http://luajit.org/
-would be nice. Please do not include any E-Mail addresses. Thank you!
-
-
-I'm always interested where LuaJIT can be put to good use in applications.
-Please tell me
-or better yet write a few lines about your project to the
-» Lua mailing list.
-Thank you!
-
-This is a little essay that tries to answer the question:
-'So, how does LuaJIT really work?'.
-
-
-I tried to avoid going into all the gory details, but at the
-same time provide a deep enough explanation, to let you find
-your way around LuaJIT's inner workings.
-
-
-The learning curve is maybe a little bit steep for newbies and
-compiler gurus will certainly fall asleep after two paragraphs.
-It's difficult to strike a balance here.
-
-
-
Acronym Soup
-
-As the name says LuaJIT is a Just-In-Time (JIT) compiler.
-This means that functions are compiled on demand, i.e. when they
-are run first. This ensures both a quick application startup
-and helps to avoid useless work, too. E.g. unused functions
-are not compiled at all.
-
-
-The other alternative is known as Ahead-Of-Time (AOT)
-compilation. Here everything is compiled before running any function.
-This is the classic way for many languages, such as C or C++.
-
-
-In fact plain Lua allows you to pre-compile Lua source code into
-Lua bytecode and store it in a binary file that can be run
-later on. This is used only in specific settings (e.g. memory limited
-embedded systems), because the Lua bytecode compiler is really fast.
-The ability to run source files right away is part of what makes
-a dynamic language (aka scripting language) so powerful.
-
-
-JIT compilation has a few other advantages for dynamic languages
-that AOT compilation can only provide with a massive amount
-of code analysis. More can be found in the literature.
-One particular advantage is explained later.
-
-
-
Quick, JIT — Run!
-
-JIT compilation happens mostly invisible. You'll probably never
-notice that a compilation is going on. Part of the secret is
that everything happens in little pieces, intermixed with running
the application itself. The other part of the secret
-is that JIT compilation can be made pretty fast.
-
-
-Most applications quickly converge to a stable state where
-everything that really needs to be compiled is compiled
-right away. Only occasional isolated compiles happen later on.
-
-
-Even though the name doesn't suggest it, LuaJIT can operate
-in AOT mode, too. But this is completely under user control
-(see jit.compile())
-and doesn't happen automatically.
-
-
-Unless you have good reason to suspect that AOT compilation
-might help for a specific application, I wouldn't bother though.
-Compilation speed is usually a non-argument, because LuaJIT
-is extremely fast. Compilation times are typically in the
-microsecond range for individual Lua functions.
-
-
-
Starting Up
-
-The next few paragraphs may not be exactly breaking news to you,
-if you are familiar with JIT compilers. Still, please read on,
-because some terms are introduced that are used later on.
-
-
-When you start LuaJIT everything proceeds like in standard Lua:
-the Lua core is initialized, the standard libraries are loaded and
-the command line is analyzed. Then usually the first Lua source
-code file is loaded and is translated to Lua bytecode. And finally
-the function for the initial main chunk is run ...
-
-
-
Kicking the Compiler
-
-This is where LuaJIT kicks in:
-
-
-All Lua functions carry an additional status code for LuaJIT.
-Initially this is set to 'NONE', i.e. the function has not been
-looked at (yet). If a function is run with this setting,
-the LuaJIT compiler pipeline is started up.
-
-
-If you haven't loaded any special LuaJIT modules and optimization
-is not turned on, the compiler pipeline only consists of the
-compiler backend.
-
-
-The compiler backend is the low-level encoding engine that translates
-bytecode instructions to machine code instructions. Without any
-further hints from other modules, the backend more or less does a
-1:1 translation. I.e. a single variant of a bytecode instruction
-corresponds to a single piece of machine code.
-
-
-If all goes well, these little code pieces are put together,
-a function prologue is slapped on and voila: your Lua function
-has been translated to machine code. Of course things are not
-that simple when you look closer, but hey — this is
-the theory.
-
-
-Anyway, the status code for the function is set to 'OK' and the
-machine code is run. If this function runs another Lua function
-which has not been compiled, that one is compiled, too. And so on.
-
-
-
Call Gates
-
-Ok, so what happens when a function is called repeatedly? After all
-this is the most common case.
-
-
-Simple: The status code is checked again. This time it's set to 'OK',
-so the machine code can be run directly. Well — that's not the
-whole truth: for calls that originate in a JIT compiled function
-a better mechanism, tentatively named call gates is used.
-
-
-Every function has a call gate field (a function pointer). By default
-it's set to a function that does the above checks and runs the
-compiler. But as soon as a function is compiled, the call gate
-is modified to point to the just compiled machine code.
-
-
-Calling a function is then as easy as calling the code that the
-call gate points to. But due to special (faster) calling conventions
-this function pointer cannot be used directly from C. So calls from
-a non-compiled function or from a C function use an extra entry
-call gate which in turn calls the real call gate. But this is
-really a non-issue since most calls in typical applications
-are intra-JIT calls.
-
-
-
The Compiler Pipeline
-
-The compiler pipeline has already been mentioned. This sounds
-more complicated than it is. Basically this is a coroutine that
-runs a frontend function which in turn calls all functions
-from the pipeline table.
-
-
-The pipeline table is sorted by priorities. The standard
-backend has priority 0. Positive priorities are run before the
-backend and negative priorities are run after the backend. Modules
-can dynamically attach or detach themselves to the pipeline with
-the library function jit.attach().
-
-
-So a typical optimizer pass better have a positive priority,
-because it needs to be run before the backend is run. E.g. the
-LuaJIT optimizer module registers itself with priority 50.
-
-
-On the other hand a typical helper module for debugging —
-a machine code disassembler — needs to be run after the
-backend and is attached with a negative priority.
-
-
-One special case occurs when compilation fails. This can be due to
-an internal error (ouch) or on purpose. E.g. the optimizer module
-checks some characteristics of the function to be compiled and
-may decide that it's just not worth it. In this case a status
-other than OK is passed back to the pipeline frontend.
-
-
-The easiest thing would be to abort pipeline processing and just
-give up. But this would remove the ability to trace the progress
-of the compiler (which better include failed compilations, too).
-So there is a special rule that odd priorities are still run,
-but even priorities are not. That's why e.g. -j trace
-registers itself with priority -99.
-
-
-
The Optimizer
-
-Maybe it hasn't become clear from the above description,
-but a module can attach any Lua or C function to the compiler
-pipeline. In fact all of the loadable modules are Lua modules.
-Only the backend itself is written in C.
-
-
-So, yes — the LuaJIT optimizer is written in pure Lua!
-
-
-And no, don't worry, it's quite fast. One reason for this is
-that a very simple abstract interpretation algorithm
-is used. It mostly ignores control flow and/or basic block
-boundaries.
-
-
-Thus the results of the analysis are really only hints.
-The backend must check the preconditions (the contracts)
-for these hints (e.g. the object type). Still, the generated
-hints are pretty accurate and quite useful to speed up the
-compiled code (see below).
-
-
-Explaining how abstract interpretation works is not within the
-scope for this short essay. You may want to have a look at the
-optimizer source code and/or read some articles or books on
-this topic. The canonical reference is
-» Principles of Program Analysis.
-Ok, so this one is a bit more on the theoretical side (a gross
-understatement). Try a search engine with the keywords "abstract
-interpretation", too.
-
-
-Suffice to say the optimizer generates hints and passes these
-on to the backend. The backend then decides to encode different
-forms for the same bytecode instruction, to combine several
-instructions or to inline code for C functions. If the hints
-from the optimizer are good, the resulting code will perform
-better because shorter code paths are used for the typical cases.
-
-
-
The JIT Advantage
-
-One important feature of the optimizer is that it takes 'live'
-function arguments into account. Since the JIT compiler is
-called just before the function is run, the arguments for this
-first invocation are already present. This can be used to great
-advantage in a dynamically typed language, such as Lua.
-
-
-Here's a trivial example:
-
-
-function foo(t, k)
- return t[k]
-end
-
-
-Without knowing the most likely arguments for the function
-there's not much to optimize.
-
-
-Ok, so 't' is most likely a table. But it could be userdata, too.
-In fact it could be any type since the introduction of generic
-metatables for types.
-
-
-And more importantly 'k' can be a number, a string
-or any other type. Oh and let's not forget about metamethods ...
-
-
-If you know a bit about Lua internals, it should be clear by now
-that the code for this function could potentially branch to half
-of the Lua core. And it's of course impossible to inline all
-these cases.
-
-
-On the other hand if it's known (or there's a good hint)
-that 't' is a table and that 'k' is a positive integer, then there
is a high likelihood that the key 'k' is in the array part
-of the table. This lookup can be done with just a few machine code
-instructions.
-
-
-Of course the preconditions for this fast path have to be checked
-(unless there are definitive hints). But if the hints are right,
-the code runs a lot faster (about a factor of 3 in this case
-for the pure table lookup).
-
-
-
Optimizing the Optimizer
-
-A question that surely popped up in your mind while reading
-the above section: does the optimizer optimize itself? I.e.
-is the optimizer module compiled?
-
-
-The current answer is no. Mainly because the compiler pipeline
-is single-threaded only. It's locked during compilation and
-any parallel attempt to JIT compile a function results in
-a 'DELAYED' status code. In fact all modules that attach to
-the compiler pipeline disable compilation for the entire
-module (because LuaJIT would do that anyway). The main chunk
-of modules loaded with require() is never compiled,
-so there is no chicken-and-egg problem here.
-
-
-Of course you could do an AOT compilation in the main chunk of
-the optimizer module. But then only with the plain backend.
-Recompiling it later on with the optimizer attached doesn't work,
-because a function cannot be compiled twice (I plan to lift
-this restriction).
-
-
-The other question is whether it pays off to compile the optimizer
-at all? Honestly, I haven't tried, because the current optimizer
-is really simple. It runs very quickly, even under the bytecode
-interpreter.
-
-
-
That's All Folks
-
-Ok, that's all for now. I'll extend this text later on with
-new topics that come up in questions. Keep on asking these
-on the mailing list if you are interested.
-
-As is always the case with benchmarks, care must be taken to
-interpret the results:
-
-
-First, the standard Lua interpreter is already very fast.
It's commonly the fastest of its class (interpreters) in the
-» Great Computer Language Shootout.
-Only true machine code compilers get a better overall score.
-
-
-Any performance improvements due to LuaJIT can only be incremental.
-You can't expect a speedup of 50x if the fastest compiled language
-is only 5x faster than interpreted Lua in a particular benchmark.
-LuaJIT can't do miracles.
-
-
-Also please note that most of the benchmarks below are not
-trivial micro-benchmarks, which are often cited with marvelous numbers.
-Micro-benchmarks do not realistically model the performance gains you
-can expect in your own programs.
-
-
-It's easy to make up a few one-liners like:
- local function f(...) end; for i=1,1e7 do f() end
-This is more than 30x faster with LuaJIT. But you won't find
-this in a real-world program.
-
-
-
Measurement Methods
-
-All measurements have been taken on a Pentium III 1.139 GHz
-running Linux 2.6. Both Lua and LuaJIT have been compiled with
-GCC 3.3.6 with -O3 -fomit-frame-pointer.
-You'll definitely get different results on different machines or
-with different C compiler options. *
-
-
-The base for the comparison are the user CPU times as reported by
-/usr/bin/time. The runtime of each benchmark is parametrized
-and has been adjusted to minimize the variation between several runs.
-The ratio between the times for LuaJIT and Lua gives the speedup.
-Only this number is shown because it's less dependent on a specific system.
-
-
-E.g. a speedup of 6.74 means the same benchmark runs almost 7 times
-faster with luajit -O than with standard Lua (or with
--j off). Your mileage may vary.
-
-
-* Yes, LuaJIT relies on quite a bit of the Lua core infrastructure
-like table and string handling. All of this is written in C and
-should be compiled with full optimization turned on, or performance
-will suffer.
-
-Note that many of these benchmarks have changed over time (both spec
-and code). Benchmark results shown in previous versions of LuaJIT
-are not directly comparable. The next section compares different
-versions with the current set of benchmarks.
-
-
-
Comparing LuaJIT Versions
-
-This shows the improvements between the following versions:
-
-
-
LuaJIT 1.0.x
-
LuaJIT 1.1.x
-
-
-
-
-
-
Benchmark       Speedup (LuaJIT 1.0.x → 1.1.x)
fannkuch        3.96 → 5.37
chameneos       2.25 → 5.08
nsievebits      2.90 → 5.05
pidigits        3.58 → 4.94
nbody           4.16 → 4.63
cheapconcr      1.46 → 4.46
partialsums     1.71 → 3.73
fasta           2.37 → 2.68
cheapconcw      1.27 → 2.52
revcomp         1.45 → 1.92
knucleotide     1.32 → 1.59
-
-
-
-
-
-All other benchmarks show only minor performance differences.
-
-
-
Summary
-
-These results should give you an idea about what speedup
-you can expect depending on the nature of your Lua code:
-
-
-
-LuaJIT is really good at (floating-point) math and loops
-(mandelbrot, pidigits, spectralnorm, partialsums).
-
-
-Function calls (recursive), vararg calls, table lookups (nbody),
-table iteration and coroutine switching (chameneos, cheapconc)
-are a lot faster than with plain Lua.
-
-
-It's still pretty good for indexed table access (fannkuch, nsieve)
-and string processing (fasta, revcomp, knucleotide).
-But there is room for improvement in a future version.
-
-
-If your application spends most of the time in C code
-you won't see much of a difference (regexdna, sumfile).
-Ok, so write more code in pure Lua. :-)
-
-
-The real speedup may be shadowed by other dominant factors in a benchmark:
-
-
Common parts of the Lua core: e.g. memory allocation
-and GC (binarytrees).
-
Language characteristics: e.g. lack of bit operations (nsievebits).
-
System characteristics: e.g. CPU cache size and memory speed (nsieve).
-
-
-
-
-The best idea is of course to benchmark your own applications.
-Please report any interesting results you may find. Thank you!
-
-LuaJIT has only a single stand-alone executable, called luajit.
-It can be used to run simple Lua statements or whole Lua applications
-from the command line. It has an interactive mode, too.
-
-
-Note: The optimizer is not activated by default because it resides
-in an external module
-(see Installing LuaJIT).
It's recommended to always use the optimizer, i.e.: luajit -O
-
-
-
Command Line Options
-
-The luajit stand-alone executable is just a slightly modified
-version of the regular lua stand-alone executable.
-It supports the same basic options, too. Please have a look at the
-Manual Page
-for the regular lua stand-alone executable.
-
-
-Two additional options control LuaJIT behaviour:
-
-
-
-j cmd[=value]
-
-This option performs a LuaJIT control command. LuaJIT has a small
-but extensible set of control commands. It's easy to add your own.
-
-
-The command is first searched for in the jit.* library.
-If no matching function is found, a module named jit.<cmd>
-is loaded. The module table must provide a start() function.
-
-
-For the -j cmd form the function is called without an argument.
-Otherwise the value is passed as the first argument (a string).
-
-
-Here are the built-in LuaJIT control commands:
-
-
-
-j on — Turns the JIT engine on (default).
-
-j off — Turns the JIT engine off.
-
-j debug[=level] — Set debug level. See
-jit.debug().
-
-
-The following control commands are loaded from add-on modules:
-
-
-
-j trace[=file] — Trace the progress of the JIT compiler.
-
-j dumphints[=file] — Dump bytecode + hints before compilation.
-
-j dump[=file] — Dump machine code after compilation.
-
-
-
-
-O[level]
-
-This option loads and runs the optimizer module jit.opt.
-The optimizer generates hints for the compiler backend to improve
-the performance of the compiled code. The optimizer slows down
-compilation slightly, but the end result should make up for it
-in almost every case.
-
-
-The -O form sets the default optimizer level, which is
-currently 2 (this may change in future versions
-of LuaJIT).
-
-
-The -Olevel form explicitly sets the optimizer level:
-
-
-
-O0 — disable the optimizer but leave it attached.
-
-O1 — perform standard optimizations (like hints for table lookups).
-
-O2 — like -O1 but also loads jit.opt_inline to enable result hints and inlining for standard library functions.