+Coco is a small extension that brings true C coroutine
+semantics to Lua 5.1.
+
+
+Coco is both available as a stand-alone release and integrated
+into LuaJIT 1.x.
+
+
+The stand-alone release is a patchset against the
+» standard Lua 5.1.4
+distribution. It has no dependencies on LuaJIT. However, LuaJIT 1.x
+depends on Coco to allow yielding from JIT-compiled functions.
+
+True C coroutine semantics mean you can yield from a coroutine
+across a C call boundary and resume back to it.
+
+
+Coco allows you to use a dedicated C stack for each coroutine.
+Resuming a coroutine and yielding from a coroutine automatically switches
+C stacks, too.
+
+
+In particular you can now:
+
+
+
Yield across all metamethods (not advised for __gc).
+
Yield across iterator functions (for x in func do).
+
Yield across callbacks (table.foreach(), dofile(), ...).
+
Yield across protected callbacks (pcall(), xpcall(), ...).
+
Yield from C functions and resume back to them.
+
+
+Best of all, you get these benefits without changing your Lua or
+C sources. Coco is fully integrated into the Lua core, but keeps
+the required changes to a minimum.
+
+
+
More ...
+
+Please visit the » Download page
+to fetch the current version of the stand-alone package.
+
+
+Coco needs some machine-specific features — please have a look
+at the Portability Requirements.
+
+
+Coco also provides some upwards-compatible
+API Extensions for Lua.
+
+The optional argument cstacksize to coroutine.create() and
+coroutine.wrap() specifies the size of the C stack to allocate
+for the coroutine:
+
+
+
A default stack size is used if cstacksize is not given
+or is nil or zero.
+
No C stack is allocated if cstacksize is -1.
+
Any other value is rounded up to the minimum size
+(i.e. use 1 to get the minimum size).
+
+
+Important notice for LuaJIT: JIT compiled functions cannot
+yield if a coroutine does not have a dedicated C stack.
+
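The rules above can be sketched as a small C helper. This is purely illustrative: the default-size constant and the helper name are hypothetical, and the minimum uses the 32K+4K figure mentioned in the changelog below; only the rounding behaviour follows the list.

```c
#include <assert.h>

/* Hypothetical sketch of the cstacksize rules listed above.  The
 * constants and the function name are illustrative, not Coco's actual
 * internals; only the rounding rules follow the documentation. */
#define DEFAULT_CSTACKSIZE  (60 * 1024)             /* assumed default */
#define MIN_CSTACKSIZE      (32 * 1024 + 4 * 1024)  /* 32K+4K minimum */

/* Returns the C stack size to allocate, or -1 for "no C stack". */
static long resolve_cstacksize(long cstacksize)
{
    if (cstacksize == 0)              /* not given, nil or zero */
        return DEFAULT_CSTACKSIZE;
    if (cstacksize == -1)             /* explicitly: no C stack */
        return -1;
    if (cstacksize < MIN_CSTACKSIZE)  /* any other value is rounded up */
        return MIN_CSTACKSIZE;        /* e.g. 1 yields the minimum size */
    return cstacksize;
}
```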
+
+
olddefault = coroutine.cstacksize([newdefault])
+
+Returns the current default C stack size (may be 0 if the
+underlying context switch method has its own default).
+Sets a new default C stack size if newdefault is present.
+Use 0 to reset it to the default C stack size. Any other
+value is rounded up to the minimum size.
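A hypothetical C model of this getter/setter behaviour (the names, the use of a negative value for an absent argument and the minimum constant are illustrative, not Coco's implementation):

```c
#include <assert.h>

/* Illustrative model of coroutine.cstacksize([newdefault]) -- not
 * Coco's implementation.  0 means "use the context switch method's
 * own default"; any other value is rounded up to an assumed minimum.
 * A negative argument stands in for "newdefault not present". */
#define MIN_CSTACKSIZE (32 * 1024 + 4 * 1024)

static long default_cstacksize = 0;  /* 0: method-specific default */

/* Returns the old default; sets a new one if newdefault >= 0. */
static long coroutine_cstacksize(long newdefault)
{
    long old = default_cstacksize;
    if (newdefault >= 0) {
        if (newdefault != 0 && newdefault < MIN_CSTACKSIZE)
            newdefault = MIN_CSTACKSIZE;   /* round up to the minimum */
        default_cstacksize = newdefault;   /* 0 resets to the default */
    }
    return old;
}
```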
+
+
+
C API extensions
+
+All C API functions are either unchanged or upwards compatible.
+
+
+
int lua_yield(lua_State *L, int nresults)
+
+The semantics for lua_yield() have changed slightly.
+Existing programs should work fine as long as they follow
+the usage conventions from the Lua manual:
+
+
+return lua_yield(L, nresults);
+
+
+Previously lua_yield() returned a 'magic' value (-1) that
+indicated a yield. Your C function had to pass this value
+on to the Lua core and was not called again.
+
+
+Now, if the current coroutine has an associated C stack,
+lua_yield() returns the number of arguments passed back from
+the resume. This just happens to be the right convention for
+returning them as a result from a C function. I.e. if you
+used the above convention, you'll never notice the change.
+
+
+But the results are on the Lua stack when lua_yield()
+returns. So the C function can just continue and process them
+or retry an I/O operation etc. And your whole C stack frame
+(local variables etc.) is still there, too. You can yield from
+anywhere in your C program, even several call levels deeper.
+
+
+Of course all of this only works with Lua+Coco and not with standard Lua.
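The new convention can be illustrated with a standalone sketch. Note that lua_State and lua_yield() below are minimal mocks invented for illustration, so the control flow can be followed without a Lua+Coco build; real code would include <lua.h> instead, and the "return" from lua_yield() would happen on the coroutine's own C stack after a resume.

```c
#include <assert.h>

/* Standalone illustration of the new lua_yield() convention.  The
 * lua_State and lua_yield() below are MOCKS, not the real API. */
typedef struct { int resumed; int nstack; } lua_State;

static int lua_yield(lua_State *L, int nresults) {
    (void)nresults;
    /* With Coco, the C stack is switched out here and the call returns
     * only when the coroutine is resumed.  The mock pretends a resume
     * passed two values back on the Lua stack. */
    L->resumed = 1;
    L->nstack = 2;
    return 2;  /* number of arguments passed back from the resume */
}

/* A C function that yields and then simply continues: its locals are
 * still alive and the resume arguments are on the (Lua) stack. */
static int my_cfunction(lua_State *L) {
    int state = 42;                  /* survives across the yield */
    int nargs = lua_yield(L, 1);     /* old code: return lua_yield(L, 1); */
    assert(state == 42 && L->resumed);
    return nargs;                    /* return the resume arguments */
}
```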
+
+
+
lua_State *lua_newcthread(lua_State *L, int cstacksize)
+
+This is an (optional) new function that allows you to create
+a coroutine with an associated C stack directly from the C API.
+Other than that it works the same as lua_newthread(L).
+
+
+You have to declare this function as extern
+yourself, since it's not part of the official Lua API.
+This means that a C module that uses this call cannot
+be loaded with standard Lua. This may be intentional.
+
+
+If you want your C module to work with both standard Lua
+and Lua+Coco, you can check at load time whether Coco is
+available (e.g. by testing for the coroutine.cstacksize extension).
+
Fix compilation of the GCC inline assembler code on x64.
+Now works when compiled as C++ code (reported by Jonathan Sauer)
+or with -fPIC (reported by Jim Pryor).
+
Added GCC inline assembler for faster context switching on Sparc.
+Thanks to Takayuki Usui.
+
+
+
Coco 1.1.5 — 2008-10-25
+
+
Upgraded to patch cleanly into Lua 5.1.4.
+
Added GCC inline assembler for faster context switching on x64.
+Thanks to Robert G. Jakabosky.
+
+
+
Coco 1.1.4 — 2008-02-05
+
+
Upgraded to patch cleanly into Lua 5.1.3.
+
Fixed setjmp method for ARM with recent glibc versions.
+Thanks to the LuaTeX developers.
+
Fixed setjmp method for x86 on Mac OS X (rarely used,
+default is GCC inline assembler). Thanks to Jason Toffaletti.
+
+
+
Coco 1.1.3 — 2007-05-24
+
+
Upgraded to patch cleanly into Lua 5.1.2.
+
Merged patch from Zachary P. Landau for a Linux/ARM setjmp method (uClibc and glibc).
+
+
+
Coco 1.1.1 — 2006-06-20
+
+
Upgraded to patch cleanly into Lua 5.1.1.
+
C stacks are deallocated early: when a coroutine ends, and not when
+the coroutine object is collected. This mainly benefits Windows Fibers.
+
Windows threads get the required Fiber context when resuming
+a coroutine and not just on creation.
+
+
+
Coco 1.1.0 — 2006-02-18
+
+
Upgraded to patch cleanly into Lua 5.1 (final).
+
Added GCC inline assembler for context switching on x86 and MIPS32
+[up to 3x faster].
+
New targets for setjmp method:
+Mac OS X/x86, Solaris/x86 and x64 and Linux/MIPS32.
+
Workaround for WinXP problem with GetCurrentFiber().
+
The minimum C stack size has been increased to 32K+4K.
+
Removed lcocolib.c and integrated the (much smaller) changes
+into lbaselib.c.
+Note for embedders: this means you no longer need to call
+luaopen_coco().
+
Optional Valgrind support requires version 3.x.
+Renamed define to USE_VALGRIND.
+
C stacks are now registered with Valgrind.
+
+
+
Coco pre-release 51w6 — 2005-08-09
+
+This is the first pre-release of Coco. It targets Lua 5.1-work6 only
+and is no longer available for download.
+
+Coco needs some machine-specific features which are
+inherently non-portable. Although the coverage is pretty good,
+this means that Coco will probably never be a standard part
+of the Lua core (which is pure ANSI C).
+
+
+
Context Switching Methods
+
+Coco relies on four different machine-specific methods
+for allocating a C stack and switching context.
+The appropriate method is automatically selected at compile time.
+
+
+
GCC Inline Assembler
+
+This method is only available when GCC 3.x/4.x is used
+to compile the source.
+This is the fastest method for context switching, but only available
+for a few CPUs (see below).
+
+
+
Modified setjmp Buffer
+
+This method changes a few fields in the setjmp buffer to
+redirect the next longjmp to a new function with a new stack
+frame. It needs a bit of guesswork and lots of #ifdef's to
+handle the supported CPU/OS combinations, but this is quite
+manageable.
+
+
+This is the fallback method if inline assembler is not available.
+It's pretty fast because it doesn't have to save or restore signals
+(which is slow and generally undesirable for Lua coroutines).
+
+
+
POSIX ucontext
+
+The POSIX calls getcontext, makecontext and swapcontext
+are used to set up and switch between different C stacks.
+Although highly portable and even available for some
+esoteric platforms, it's slower than the setjmp method
+because it saves and restores signals, too (using at least one
+syscall for each context switch).
+
+
+You can force the use of ucontext (instead of setjmp) by enabling
+-DCOCO_USE_UCONTEXT in src/Makefile.
+
+
+
Windows Fibers
+
+This is the standard method to set up and switch between
+different C stacks on Windows. It's available on Windows 98
+and later.
+
+
+None of the other methods work for Windows because OS specific code
+is required to switch exception handling contexts.
+
+
+
Supported Platforms
+
+Coco has support for the following platforms:
+
+
+
+
CPU        System     Method
x86        (any OS)   gccasm
x86        Linux      setjmp
x86        FreeBSD    setjmp
x86        NetBSD     setjmp
x86        OpenBSD    setjmp
x86        Solaris    setjmp
x86        Mac OS X   setjmp
x64        (any OS)   gccasm
x64        Solaris    setjmp
MIPS32     (any OS)   gccasm
MIPS32     Linux      setjmp
ARM        Linux      setjmp
PPC32      Mac OS X   setjmp
(any CPU)  POSIX      ucontext
(any CPU)  Windows    fibers
+
+
+
+
+It should work pretty much anywhere a correct
+POSIX ucontext implementation is available. It has been tested
+on every system I could get hold of (e.g. Sparc, PPC32/PPC64,
+IA64, Alpha and HPPA with various operating systems).
+
+
+
Caveats
+
+
+Some older operating systems may have defective ucontext
+implementations because this feature is not widely used. E.g. some
+implementations don't mix well with other C library functions
+like malloc() or with native threads.
+This is really not the fault of Coco — please upgrade your OS.
+
+
+Note for Windows: Please read the explanation for the default
+» Thread Stack Size
+in case you want to create large numbers of Fiber-based coroutines.
+
+
+Note for MinGW/Cygwin: Older releases of GCC (before 4.0) generate
+wrong unwind information when -fomit-frame-pointer is used
+with stdcalls. This may lead to crashes when exceptions are thrown.
+The workaround is to always use two flags:
+-fomit-frame-pointer -maccumulate-outgoing-args.
+
+
+Note for MIPS CPUs without FPU: It's recommended to compile
+all sources with -msoft-float, even if you don't use
+any floating point ops anywhere. Otherwise context switching must
+save and restore FPU registers (which needs to go through
+the slow kernel emulation).
+
+
+To run Coco with » Valgrind
+(a memory debugger) you must add -DUSE_VALGRIND
+to MYCFLAGS and recompile. You will get random errors
+if you don't! Valgrind 3.x or later is required. Earlier versions
+do not work well with newly allocated C stacks.
+
+DynASM is a Dynamic Assembler for code generation
+engines.
+
+
+DynASM has been developed primarily as a tool for
+LuaJIT, but might be useful for other
+projects, too.
+
+
+If you are writing a just-in-time compiler or need to generate
+code on the fly (e.g. for high-performance graphics or other
+CPU-intensive computations), DynASM might be just what you
+are looking for.
+
+
+Please have a look at the list of Features
+to find out whether DynASM could be useful for your project.
+
+Sorry, right now there is no proper documentation available other
+than some Examples and of course
+the source code. The source is well documented, though (IMHO).
+
+
+I may add more docs in case someone actually finds DynASM to be
+useful outside of LuaJIT. If you do, I'd like to
+hear from you, please. Thank you!
+
+
+If you want to check it out please visit the
+» Download page and fetch the most recent
+version of LuaJIT. All you need is in the dynasm directory.
+For some complex examples take a peek at the
+*.dasc and *.dash files in LuaJIT, too.
+
+Note: yes, you usually get the assembler code as comments and proper
+CPP directives to match them up with the source. I've omitted
+them here for clarity. Oh and BTW: the pipe symbols probably
+line up much more nicely in your editor than in a browser.
+
+
+Here 123 is an offset into the action list buffer that
+holds the partially specified machine code. Without going
+into too much detail, the embedded C library implements a
+tiny bytecode engine that takes the action list as input and
+outputs machine code. It basically copies machine code snippets
+from the action list and merges them with the arguments
+passed in by dasm_put().
+
+
+The arguments can be any kind of C expressions. In practical
+use most of them evaluate to constants (e.g. structure offsets).
+Your C compiler should generate very compact code out of it.
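The copy-and-merge scheme can be illustrated with a toy action-list interpreter. The encoding (COPY/ARG/STOP) is invented for this sketch and is not DynASM's actual action-list format or the real dasm_put() signature.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Toy model of an action-list engine: A_COPY emits literal code bytes
 * from the action list, A_ARG merges in the next argument passed by
 * the caller, A_STOP ends the list.  Invented for illustration only. */
enum { A_STOP = 0, A_COPY = 1, A_ARG = 2 };

/* Returns the number of bytes emitted into out[]. */
static size_t toy_put(unsigned char *out, const unsigned char *actions,
                      size_t start, ...)
{
    va_list ap;
    size_t n = 0;
    const unsigned char *p = actions + start;
    va_start(ap, start);
    for (;;) {
        if (*p == A_STOP) break;
        if (*p == A_COPY) {              /* copy a machine code snippet */
            size_t len = p[1];
            for (size_t i = 0; i < len; i++) out[n++] = p[2 + i];
            p += 2 + len;
        } else {                          /* A_ARG: merge in an argument */
            out[n++] = (unsigned char)va_arg(ap, int);
            p++;
        }
    }
    va_end(ap);
    return n;
}
```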
+
+
+The embedded C library knows only what's absolutely needed to
+generate proper machine code for the target CPU (e.g. variable
+displacement sizes, variable branch offset sizes and so on).
+It doesn't have a clue about other atrocities like x86 opcode
+encodings — and it doesn't need to. This dramatically
+reduces the minimum required code size to around 2K [sic!].
+
+
+The action list buffer itself has a pretty compact encoding, too.
+E.g. the whole action list buffer for an early version of LuaJIT
+needs only around 3K.
+
+
+
Advanced Features
+
+Here's a real-life example taken from LuaJIT that shows some
+advanced features like type maps, macros and how to access
+C structures:
+
The toolchain is split into a portable subset and
+CPU-specific modules.
+
DynASM itself (the pre-processor) is written in Lua.
+
There is no machine-dependency for the pre-processor itself.
+It should work everywhere you can get Lua 5.1 up and running
+(i.e. Linux, *BSD, Solaris, Windows, ... you name it).
+
+
+
DynASM Assembler Features
+
+
C code and assembler code can be freely mixed.
+Readable, too.
+
All the usual syntax for instructions and operand modes
+you have come to expect from a standard assembler.
+
Access to C variables and CPP defines in assembler statements.
+
Access to C structures and unions via type mapping.
+
Convenient shortcuts for accessing C structures.
+
Local and global labels.
+
Numbered labels (e.g. for mapping bytecode instruction numbers).
+
Multiple code sections (e.g. for tailcode).
+
Defines/substitutions (inline and from command line).
+
Conditionals (translation time) with proper nesting.
+
Macros with parameters.
+
Macros can mix assembler statements and C code.
+
Captures (output diversion for code reordering).
+
Simple and extensible template system for instruction definitions.
+
+
+
Restrictions
+
+Currently only a subset of x86 (i386+) instructions is supported.
+Unsupported instructions are either not usable in user-mode or
+are slow on modern CPUs (i.e. not suited for a code generator).
+SSE, SSE2, SSE3 and SSSE3 are fully supported. MMX is not supported.
+
+
+The whole toolchain has been designed to support multiple CPU
+architectures. As LuaJIT gets support for more architectures,
+DynASM will be extended with new CPU-specific modules.
+
+
+The assembler itself will be extended with more features on an
+as-needed basis. E.g. I'm thinking about vararg macros.
+
+
+Note that runtime conditionals are not really needed, since you can
+just use plain C code for that (and LuaJIT does this a lot).
+It's not going to be more (time-) efficient if conditionals are done
+by the embedded C library (maybe a bit more space-efficient).
+
+DynASM —
+a Dynamic Assembler for code generation engines.
+
+
+
+
+
+
+
+Links to online resources are marked with a '»'.
+
+LuaJIT is a Just-In-Time Compiler for the Lua
+programming language.
+
+
+Lua is a powerful, light-weight programming language designed
+for extending applications. Lua is also frequently used as a
+general-purpose, stand-alone language. More information about
+Lua can be found at: » http://www.lua.org/
+
+
+LuaJIT 1.x is based on the Lua 5.1.x virtual machine and bytecode interpreter
+from lua.org. It compiles bytecode to native x86 (i386+) machine code
+to speed up the execution of Lua programs.
+
+
+LuaJIT depends on Coco to allow yielding
+from coroutines for JIT compiled functions. Coco is part of the
+LuaJIT distribution.
+
+All standard library functions have the same behaviour as
+in the Lua distribution LuaJIT is based on.
+
+
+The Lua loader used by the standard require() library
+function has been modified to turn off compilation of the main
+chunk of a module. The main chunk is only run once when the module
+is loaded for the first time. There is no point in compiling it.
+
+
+You might want to adapt this behaviour if you use your own utility
+functions (and not require()) to load modules.
+
+
+Note that the subfunctions defined in a loaded module are
+of course compiled. See below if you want to override this.
+
+
+
The jit.* Library
+
+This library holds several functions to control the behaviour
+of the JIT engine.
+
+
+
jit.on()
+jit.off()
+
+Turns the JIT engine on (default) or off.
+
+
+These functions are typically used with the command line options
+-j on or -j off.
+
+Enable (with jit.on, default) or disable (with jit.off)
+JIT compilation for a Lua function. The current function (the Lua function
+calling this library function) can be specified with true.
+
+
+If the second argument is true, JIT compilation is also
+enabled/disabled recursively for all subfunctions of a function.
+With false only the subfunctions are affected.
+
+
+Both library functions only set a flag which is checked when
+the function is executed for the first/next time. They do not
+trigger immediate compilation.
+
+
+Typical usage is jit.off(true, true) in the main chunk
+of a module to turn off JIT compilation for the whole module.
+Note that require() already turns off compilation for
+the main chunk itself.
+
+
+
status = jit.compile(func [,args...])
+
+Compiles a Lua function and returns the compilation status.
+Successful compilation is indicated with a nil status.
+Failure is indicated with a numeric status (see jit.util.status).
+
+
+The optimizer pass of the compiler tries to derive hints from the
+passed arguments. Not passing any arguments or passing untypical
+arguments (esp. the wrong types) reduces the efficiency of the
+optimizer. The compiled function will still run, but probably not
+with maximum speed.
+
+
+This library function is typically used for Ahead-Of-Time (AOT)
+compilation of time-critical functions or for testing/debugging.
+
+
+
status = jit.compilesub(func|true [,true])
+
+Recursively compile all subfunctions of a Lua function.
+The current function (the Lua function calling this library function)
+can be specified with true. Note that the function
+itself is not compiled (use jit.compile()).
+
+
+If the second argument is true, compilation will stop
+when the first error is encountered. Otherwise compilation will
+continue with the next subfunction.
+
+
+The returned status is nil, if all subfunctions have been
+compiled successfully. A numeric status (see jit.util.status)
+indicates that at least one compilation failed and gives the status
+of the last failure (this is only helpful when stop on error
+is true).
+
+
+
jit.debug([level])
+
+Set the debug level for JIT compilation. If no level is given,
+the maximum debug level is set.
+
+
+
Level 0 disables debugging: no checks for hooks are compiled
+into the code. This is the default when LuaJIT is started and
+provides the maximum performance.
+
Level 1 enables function call debugging: call hooks and
+return hooks are checked in the function prologue and epilogue.
+This slows down function calls somewhat (by up to 10%).
+
Level 2 enables full debugging: all hooks are checked.
+This slows down execution quite a bit, even when the hooks
+are not active.
+
+
+Note that some compiler optimizations are turned off when
+debugging is enabled.
+
+
+This function is typically used with the command line options
+-j debug or -j debug=level.
+
+
+
jit.attach(handler [, priority])
+
+Attach a handler to the compiler pipeline with the given priority.
+The handler is detached if no priority is given.
+
+
+The inner workings of the compiler pipeline and the API for handlers
+are still in flux. Please see the source code for more details.
+
+
+
jit.version
+
+Contains the LuaJIT version string.
+
+
+
jit.version_num
+
+Contains the version number of the LuaJIT core. Version xx.yy.zz
+is represented by the decimal number xxyyzz.
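The encoding is straightforward to model in C (a sketch; the helper names are hypothetical, the arithmetic simply inverts the xxyyzz scheme described above):

```c
#include <assert.h>

/* Version xx.yy.zz <-> decimal xxyyzz, as described above. */
static int version_num(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

static void version_parts(int num, int *major, int *minor, int *patch)
{
    *major = num / 10000;
    *minor = (num / 100) % 100;
    *patch = num % 100;
}
```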
+
+
+
jit.arch
+
+Contains the target architecture name (CPU and optional ABI).
+
+
+
+
The jit.util.* Library
+
+This library holds many utility functions used by the provided
+extension modules for LuaJIT (e.g. the optimizer). The API may
+change in future versions.
+
+
+
stats = jit.util.stats(func)
+
+Retrieves information about a function. Returns nil
+for C functions. Returns a table with the following fields for
+Lua functions:
+
+
+
status: numeric compilation status (see jit.util.status).
+
stackslots: number of stack slots.
+
params: number of fixed parameters (arguments).
+
consts: number of constants.
+
upvalues: number of upvalues.
+
subs: number of subfunctions (sub prototypes).
+
bytecodes: number of bytecode instructions.
+
isvararg: fixarg (false) or vararg (true) function.
+
env: function environment table.
+
mcodesize: size of the compiled machine code.
+
mcodeaddr: start address of the compiled machine code.
+
+
+mcodesize and mcodeaddr are not set if the
+function has not been compiled (yet).
+
+
+
op, a, b, c, test = jit.util.bytecode(func, pc)
+
+Returns the fields of the bytecode instruction at the given pc
+for a Lua function. The first instruction is at pc = 1.
+Nothing is returned if pc is out of range.
+
+
+The opcode name is returned as an uppercase string in op.
+The opcode arguments are returned as a, b and
+optionally c. Arguments that indicate an index into the
+array of constants are translated to negative numbers (the first
+constant is referred to with -1). Branch targets are signed numbers
+relative to the next instruction.
+
+
+test is true if the instruction is a test (i.e. followed
+by a JMP).
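A sketch of the constant-index translation, assuming the standard Lua 5.1 RK operand encoding (values with bit 8 set, i.e. >= 256, index the constant array; this assumption about the raw encoding is not stated in this document):

```c
#include <assert.h>

/* Sketch of the operand translation described above, assuming the
 * standard Lua 5.1 RK operand encoding.  The first constant (raw
 * index 0) becomes -1, the second -2, and so on; plain register
 * operands pass through unchanged. */
#define BITRK 256

static int translate_rk(int raw)
{
    if (raw & BITRK)
        return -((raw & ~BITRK) + 1);  /* constant: negative index */
    return raw;                        /* register slot: unchanged */
}
```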
+
+
+
const, ok = jit.util.const(func, idx)
+
+Returns a constant from the array of constants for a Lua function.
+ok is true if idx is in range. Otherwise nothing
+is returned.
+
+
+Constants are numbered starting with 1. A negative idx
+is mapped to a positive index.
+
+
+
upvalue, ok = jit.util.upvalue(func, idx)
+
+Returns an upvalue from the array of upvalues for a Lua function.
+ok is true if idx is in range. Otherwise nothing
+is returned. Upvalues are numbered starting with 0.
+
+
+
nup = jit.util.closurenup(func, idx)
+
+Returns the number of upvalues for the subfunction prototype with
+the given index idx for a Lua function. Nothing is returned
+if idx is out of range. Subfunctions are numbered starting
+with 0.
+
+Returns the numeric start address, the compiled machine code
+(converted to a string) and an iterator for the machine code fragment map
+for the specified machine code block associated with a Lua function.
+
+
+Returns nil and a numeric status code (see jit.util.status)
+if the function has not been compiled yet or compilation has failed
+or compilation is disabled. Returns nothing if the selected
+machine code block does not exist.
+
+
+The machine code fragment map is used for debugging and error handling.
+The format may change between versions and is an internal implementation
+detail of LuaJIT.
+
+
+
addr [, mcode] = jit.util.jsubmcode([idx])
+
+If idx is omitted or nil:
+Returns the numeric start address and the compiled machine code
+(converted to a string) for internal subroutines used by the
+compiled machine code.
+
+
+If idx is given:
+Returns the numeric start address of the machine code for a specific
+internal subroutine (0 based). Nothing is returned if idx is
+out of range.
+
+
+
jit.util.status
+
+This is a table that bidirectionally maps status numbers and
+status names (strings):
+
+
+
+
Status Name      Description
OK               Ok, code has been compiled.
NONE             Nothing analyzed or compiled, yet (default).
OFF              Compilation disabled for this function.
ENGINE_OFF       JIT engine is turned off.
DELAYED          Compilation delayed (recursive invocation).
TOOLARGE         Bytecode or machine code is too large.
COMPILER_ERROR   Error from compiler frontend.
DASM_ERROR       Error from DynASM engine.
+
+
+
+
jit.util.hints
+jit.util.fhints
+
+These two tables map compiler hint names to internal hint numbers.
+
+
+The hint system is an internal implementation detail of LuaJIT.
+Please see the source code for more info.
+
Remove a (sometimes) wrong assertion in luaJIT_findpc().
+
DynASM now allows labels for displacements and .aword.
+
Fix some compiler warnings for DynASM glue (internal API change).
+
Correct naming for SSSE3 (temporarily known as SSE4) in DynASM and x86 disassembler.
+
The loadable debug modules now handle redirection to stdout
+(e.g. -j trace=-).
+
+
+
LuaJIT 1.1.2 — 2006-06-24
+
+
Fix MSVC inline assembly: use only local variables with
+lua_number2int().
+
Fix "attempt to call a thread value" bug on Mac OS X:
+make values of consts used as lightuserdata keys unique
+to avoid joining by the compiler/linker.
+
The C stack is kept 16 byte aligned (faster).
+Mandatory for Mac OS X on Intel, too.
+
Faster calling conventions for internal C helper functions.
+
Better instruction scheduling for function prologue, OP_CALL and
+OP_RETURN.
+
+
+
Miscellaneous optimizations:
+
+
Faster loads of FP constants. Remove narrow-to-wide store-to-load
+forwarding stalls.
+
Use (scalar) SSE2 ops (if the CPU supports it) to speed up slot moves
+and FP to integer conversions.
+
Optimized the two-argument form of OP_CONCAT (a..b).
+
Inlined OP_MOD (a%b).
+With better accuracy than the C variant, too.
+
Inlined OP_POW (a^b). Unroll x^k or
+use k^x = 2^(log2(k)*x) or call pow().
+
+
+
Changes in the optimizer:
+
+
Improved hinting for table keys derived from table values
+(t1[t2[x]]).
+
Lookup hinting now works with arbitrary object types and
+supports index chains, too.
+
Generate type hints for arithmetic and comparison operators,
+OP_LEN, OP_CONCAT and OP_FORPREP.
+
Remove several hint definitions in favour of a generic COMBINE hint.
+
Complete rewrite of jit.opt_inline module
+(ex jit.opt_lib).
+
+
+
Use adaptive deoptimization:
+
+
If runtime verification of a contract fails, the affected
+instruction is recompiled and patched on-the-fly.
+Regular programs will trigger deoptimization only occasionally.
+
This avoids generating code for uncommon fallback cases
+most of the time. Generated code is up to 30% smaller compared to
+LuaJIT 1.0.3.
+
Deoptimization is used for many opcodes and contracts:
+
+
OP_CALL, OP_TAILCALL: type mismatch for callable.
+
Inlined calls: closure mismatch, parameter number and type mismatches.
+
OP_GETTABLE, OP_SETTABLE: table or key type and range mismatches.
+
All arithmetic and comparison operators, OP_LEN, OP_CONCAT,
+OP_FORPREP: operand type and range mismatches.
+
+
Complete redesign of the debug and traceback info
+(bytecode ↔ mcode) to support deoptimization.
+Much more flexible and needs only 50% of the space.
+
The modules jit.trace, jit.dumphints and
+jit.dump handle deoptimization.
+
+
+
Inlined many popular library functions
+(for commonly used arguments only):
+
+
Most math.* functions (the 18 most used ones)
+[2x-10x faster].
+
string.len, string.sub and string.char
+[2x-10x faster].
+
table.insert, table.remove and table.getn
+[3x-5x faster].
+
coroutine.yield and coroutine.resume
+[3x-5x faster].
+
pairs, ipairs and the corresponding iterators
+[8x-15x faster].
+
+
+
Changes in the core and loadable modules and the stand-alone executable:
+
+
Added jit.version, jit.version_num
+and jit.arch.
+
Reorganized some internal API functions (jit.util.*mcode*).
+
The -j dump output now shows JSUB names, too.
+
New x86 disassembler module written in pure Lua. No dependency
+on ndisasm anymore. Flexible API, very compact (500 lines)
+and complete (x87, MMX, SSE, SSE2, SSE3, SSSE3, privileged instructions).
+
luajit -v prints the LuaJIT version and copyright
+on a separate line.
+
+
+
Added SSE, SSE2, SSE3 and SSSE3 support to DynASM.
+
Miscellaneous doc changes. Added a section about
+embedding LuaJIT.
+
+
+LuaJIT is a rather complex application. There will undoubtedly
+be bugs lurking in there. You have been warned. :-)
+
+
+If you came here looking for information on how to debug
+your application (and not LuaJIT itself) then please
+check out jit.debug()
+and the -j debug
+command line option.
+
+
+But if you suspect a problem with LuaJIT itself, then try
+any of the following suggestions (in order).
+
+
+
Is LuaJIT the Problem?
+
+Try to run your application in several different ways:
+
+
+
luajit app.lua
+
luajit -O1 app.lua
+
luajit -O app.lua
+
luajit -j off app.lua
+
lua app.lua (i.e. with standard Lua)
+
+
+If the behaviour is the same as with standard Lua then ...
+well ... that's what LuaJIT is about: doing the same things,
+just faster. Even bugs fly faster. :-)
+
+
+So this is most likely a bug in your application then. It may be easier
+to debug this with plain Lua — the remainder of this page
+is probably not helpful for you.
+
+
+But if the behaviour is different, there is some likelihood
+that you caught a bug in LuaJIT. Oh dear ...
+
+
+Ok, so don't just give up. Please read on and help the community
+by finding the bug. Thank you!
+
+Please check if a newer version is available. Maybe the bug
+you have encountered has been fixed already. Always download the
+latest version and try it with your application before continuing.
+
+
+
Reproduce the Bug
+
+First try to make the bug reproducible. Try to isolate the module
+and the function the bug occurs in:
+
+
+Either selectively turn off compilation for some modules with
+ jit.off(true, true)
+until the bug disappears ...
+
+
+And/or turn the whole JIT engine off and selectively compile
+functions with
+ jit.compile(func)
+until it reappears.
+
+
+If you have isolated the point where it happens, it's most helpful
+to reduce the affected Lua code to a short code snippet that
+still shows the problem. You may need to print() some
+variables until you can pinpoint the exact spot where it happens.
+
+
+If you've got a reproducible and short test
+you can either send it directly to me or the mailing list
+(see the Contact Information)
+or you can try to debug this a bit further.
+
+
+Well — if you are brave enough. :-)
+
+
+
Look at the Generated Code
+
+You may want to have a look at the output of -j dumphints
+first. Try to change things around until you can see which hint
+or which instruction is the cause of the bug. If you suspect
+an optimizer bug then have a look at the backend (*.das[ch])
+and check how the hint is encoded.
+
+
+Otherwise have a look at -j dump and see whether
+you can spot the problem around the affected instruction.
+It's helpful to have a good knowledge of assembler, though
+(sorry).
+
+
+
Locate a Crash
+
+If you get a crash, you should compile LuaJIT with debugging
+turned on:
+
+
+Add -g to CFLAGS and MYLDFLAGS
+or whatever is needed to turn on debugging. For Windows you
+need both an executable and a DLL built with debugging.
+
+
+Then start LuaJIT with your debugger. Run it with
+-j dump=test.dump.
+
+
+Have a look at the backtrace and compare it with the generated
+dump file to find out exactly where it crashes. I'm sorry, but
+symbols or instructions for JIT compiled functions are not
+displayed in your debugger (this is really hard to solve).
+
+
+
Turn on Assertions
+
+Another way to debug LuaJIT is to turn on assertions.
+They can be turned on only for the JIT engine by adding
+-DLUAJIT_ASSERT to JITCFLAGS in src/Makefile.
+Then recompile with make clean and make.
+
+
+Add these two lines to src/luaconf.h to turn on all assertions in the Lua core:
+ #include <assert.h>
+ #define lua_assert(x) assert(x)
+This turns on the JIT engine assertions, too.
+Recompile and see whether any assertions trigger.
+Don't forget to turn off the (slow) assertions when you're done!
+
+
+
Use Valgrind
+
+A tremendously useful (and free) tool for runtime code analysis
+is » Valgrind. Regularly
+run your applications with valgrind --memcheck and
+your life will be better.
+
+
+To run LuaJIT under Valgrind you must add
+-DUSE_VALGRIND to MYCFLAGS
+and recompile LuaJIT. You will get random errors if you don't!
+Valgrind 3.x or later is required. Earlier versions
+do not work well with newly allocated C stacks.
+
+
+An executable built with this option runs fine without Valgrind
+and without a performance loss. But it needs the Valgrind header
+files for compilation (which is why it's not enabled by default).
+
+
+It's helpful to compile LuaJIT with debugging turned on, too
+(see above).
+
+
+If Valgrind spots many invalid memory accesses that involve
+memory allocation/free functions you've probably found a bug
+related to garbage collection. Some object reference must have
+gone astray.
+
+
+Try to find out which object is disappearing. You can force
+eager garbage collection with repeated calls to
+collectgarbage() or by setting a very low threshold
+with collectgarbage("setpause", 1).
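+Try-it sketch (the workload below is a hypothetical stand-in; plug in
+the code path you actually suspect):

```lua
-- Force eager collection while exercising the suspect code path.
-- A stray object reference tends to crash much sooner this way.
collectgarbage("setpause", 1)      -- collect as aggressively as possible

local t = {}
for i = 1, 1000 do
  t[i] = { payload = string.rep("x", 100) }  -- churn out garbage
  if i % 100 == 0 then
    collectgarbage("collect")      -- force a full collection cycle
  end
end

collectgarbage("setpause", 200)    -- restore the Lua 5.1 default
```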
+
+
+
Don't Despair
+
+If all of this doesn't help to find the bug, please send
+a summary of your findings to the mailing list. Describe as many
+of the circumstances as you think are relevant.
+
+
+Please don't send your whole application to me
+(without asking first) and especially not to the mailing list.
+Code snippets should preferably be less than 50 lines and
+to the point.
+
+
+All bug reports are helpful, even if no immediate solution
+is available. Often enough someone else finds the same bug
+in a different setting and together with your bug report
+this may help to track it down.
+
+
+Finally I have to say a BIG THANK YOU
+to everyone who has helped to make LuaJIT better by finding
+and fixing bugs!
+
Adaptive deoptimization is used to recompile individual bytecode
+instructions whose contracts have been broken. This avoids generating
+code for the generic fallback cases most of the time (faster
+compilation, reduced I-cache contention).
+
Special CPU features (such as conditional moves or SSE2)
+are automatically used when detected.
+
+
+The JIT compiler is very fast:
+
+
+
Compilation times vary a great deal (depending on the nature of
+the function to be compiled) but are generally in the
+microsecond range.
+
Even compiling large functions (hundreds of lines) with the
+maximum optimization level takes only a few milliseconds in the
+worst case.
+
+
+LuaJIT is very small:
+
+
+
The whole JIT compiler engine adds only around 32K
+of code to the Lua core (if compiled with -Os).
+
The optimizer is split into several optional modules that
+can be loaded at runtime if requested.
+
LuaJIT adds around 6,000 lines of C and assembler code and
+2,000 lines of Lua code to the Lua 5.1 core (17,000 lines of C).
+
Required build tools (DynASM)
+take another 2,500 lines of Lua code.
+
+
+
Compatibility
+
+LuaJIT is designed to be fully compatible with Lua 5.1.
+It accepts the same source code and/or precompiled bytecode.
+It supports all standard language semantics. In particular:
+
+
+
All standard types, operators and metamethods are supported.
+
Implicit type coercions (number/string) work as expected.
+
Full IEEE-754 semantics for floating point arithmetic
+(NaN, +-Inf, +-0, ...).
+
Full support for lexical closures.
+Proper tail calls do not consume a call frame.
No changes to the Lua 5.1 incremental garbage collector.
+
No changes to the standard Lua/C API.
+
Dynamically loaded C modules are link compatible with Lua 5.1
+(same ABI).
+
LuaJIT can be embedded
+into an application just like Lua.
+
+
+Some minor differences are related to debugging:
+
+
+
Debug hooks are only called if debug code generation is enabled.
+
There is no support for tailcall counting in JIT compiled code.
+HOOKTAILRET is not called either. Note: this won't affect you unless
+you are writing a Lua debugger. *
+
+
+* There is not much I can do to improve this situation without undue
+complications. A suggestion to modify the behaviour of standard Lua
+has been made on the mailing list (it would be beneficial there, too).
+
+
+
Restrictions
+
+
Only x86 (i386+) CPUs are supported right now (but see below).
+
Only the default type for lua_Number is supported
+(double).
+
The interrupt signal (Ctrl-C) is ignored unless you enable
+debug hooks (with -j debug). But this will seriously
+slow down your application. I'm looking for better ways to handle
+this. In the meantime you have to press Ctrl-C twice to interrupt
+a currently running JIT compiled function (just like C functions).
+
GDB, Valgrind and other debugging tools can't report symbols
+or stack frames for JIT compiled code. This is rather difficult to solve.
+Have a look at Debugging LuaJIT, too.
+
+
+
Caveats
+
+
LuaJIT allocates executable memory for the generated machine code
+if your OS has support for it: either HeapCreate() for Windows or
+mmap() on POSIX systems.
+The fallback is the standard Lua allocator (i.e. malloc()).
+But this usually means the allocated memory is not marked executable.
+Running compiled code will trap on CPUs/OS with the NX (No eXecute)
+extension if you can only use the fallback.
+
DynASM is needed to regenerate the
+ljit_x86.h file. But only in case you want to modify
+the *.dasc/*.dash files. A pre-processed *.h
+file is supplied with LuaJIT.
+DynASM is written in Lua and needs a plain copy of Lua 5.1
+(installed as lua). Or you can run it with LuaJIT built from
+the *.h file supplied with the distribution (modify
+DASM= in src/Makefile). It's a good idea to install
+a known good copy of LuaJIT under a different name for this.
+
LuaJIT ships with LUA_COMPAT_VARARG turned off.
+I.e. the implicit arg parameter is not created anymore.
+Please have a look at the comments in luaconf.h for
+this configuration option. You can turn it on if you really need it.
+Or better yet, convert your code to the new Lua 5.1 vararg syntax.
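+Converting to the new syntax is straightforward; a sketch of the old
+and new styles:

```lua
-- Old Lua 5.0 style (needs LUA_COMPAT_VARARG): an implicit table
-- named 'arg' holds the extra arguments.
--   function f(...) return arg.n, arg[1] end

-- New Lua 5.1 style: use '...' and select() directly.
local function f(...)
  local n = select("#", ...)  -- number of varargs
  local first = ...           -- first vararg (or nil)
  return n, first
end

print(f(10, 20, 30))  -- prints 3 and 10
```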
+
+LuaJIT is not much more difficult to install than Lua itself.
+Just unpack the distribution file, change into the newly created
+directory and follow the instructions below.
+
+
+For the impatient: make linux && sudo make install
+Replace linux with e.g. bsd or macosx depending on your OS.
+
+
+In case you've missed this in Features:
+LuaJIT only works on x86 (i386+) systems right now. Support for
+other architectures may be added in future versions.
+
+
+
Configuring LuaJIT
+
+LuaJIT is (deliberately) not autoconfigured — the
+defaults should work fine on most systems. But please check the
+system-specific instructions below.
+
+
+The following three files hold all configuration information:
+
+
+
Makefile holds settings for installing LuaJIT.
+
src/Makefile holds settings for compiling LuaJIT.
+
src/luaconf.h sets a multitude of configuration
+variables.
+
+
+If this is your first build then it's better not to give in to
+the temptation to tweak every little setting. The standard
+configuration provides sensible defaults (IMHO).
+
+
+One particular setting you might want to change is the installation
+path. Note that you need to modify both the top-level Makefile
+and src/luaconf.h (right at the start) for the change to take
+effect.
+
+
+If you have trouble getting Coco to work, you can disable it by
+uncommenting the COCOFLAGS= -DCOCO_DISABLE line in
+src/Makefile. But note that this effectively disables
+yielding from coroutines for JIT compiled functions.
+
+
+A few more settings need to be changed if you want to
+debug LuaJIT itself.
+Application debugging can be turned on/off at runtime.
+
+
+
Upgrading From Previous Versions
+
+It's important to keep the LuaJIT core and the add-on modules in sync.
+Be sure to delete any old versions of LuaJIT modules from the
+Lua module search path (check the current directory, too!).
+
+
+Lua files compiled to bytecode may be incompatible if the underlying
+Lua core has changed (like from Lua 5.1 alpha to Lua 5.1
+final between LuaJIT 1.0.3 and LuaJIT 1.1.0). The same
+applies to any
+» loadable C modules
+(shared libraries, DLLs) which need to be recompiled with the new
+Lua header files.
+
+
+Compiled bytecode and loadable C modules are fully compatible and
+can be freely exchanged between LuaJIT and the same
+version of Lua it is based on. Please verify that LUA_RELEASE
+in src/lua.h is the same in both distributions.
+
+
+
Building LuaJIT
+
+
Makefile Targets
+
+The Makefiles have a number of targets for various operating
+systems, e.g. make linux, make bsd and make macosx.
+You may want to enable interactive line editing for the stand-alone
+executable. There are extra targets for Linux, BSD and Mac OS X:
+make linux_rl, make bsd_rl
+and make macosx_rl.
+
+
+
MSVC (Win32)
+
+First check whether etc\luavs.bat suits your needs. Then try
+running it from the MSVC command prompt (start it from the top-level directory).
+
+
+Another option is to set up your own MSVC project:
+
+
+Change to the src directory
+and create a new DLL project for lua51.dll.
+Add all C files to it except for lua.c, luac.c
+and print.c. Add the ..\dynasm directory
+to the include path and build the DLL.
+
+
+Next create a new EXE project for luajit.exe.
+Add lua.c to it and link with the import library
+lua51.lib created for lua51.dll. Build
+the executable.
+
+
+
Installation
+
+
POSIX systems
+
+Run make install from the top-level directory.
+You probably need to be the root user before doing so, i.e. use
+sudo make install or su - root
+before the make install.
+
+
+By default this installs only:
+ /usr/local/bin/luajit — The stand-alone executable.
+ /usr/local/lib/lua/5.1 — C module directory.
+ /usr/local/share/lua/5.1 — Lua module directory.
+ /usr/local/share/lua/5.1/jit/*.lua —
+jit.* modules.
+
+
+The Lua docs and includes are not installed to avoid overwriting
+an existing Lua installation. In any case these are identical
+to the version of Lua that LuaJIT is based on. If you want
+to install them, edit the top-level makefile (look for ###).
+
+
+The stand-alone Lua bytecode compiler luac is neither
+built nor installed, for the same reason. If you really need it,
+you may be better off with luac built from the original Lua
+distribution (use the same version your copy of LuaJIT
+is based on). This avoids dragging in most of LuaJIT which is not
+needed for the pure bytecode compiler. You can also use the bare-bones
+Lua to bytecode translator luac.lua (look in the test
+directory of the original Lua distribution).
+
+
+
Windows
+
+Copy luajit.exe and lua51.dll
+to a newly created directory (any location is ok). Create lua
+and lua\jit directories below it and copy all Lua files
+from the jit directory of the distribution into lua\jit.
+
+
+There are no hardcoded
+absolute path names — all modules are loaded relative to the
+directory where luajit.exe is installed
+(see src/luaconf.h).
+
+
+
Embedding LuaJIT
+
+It's strongly recommended that you build the stand-alone executable
+with your toolchain and verify that it works before starting
+to embed LuaJIT into an application. The stand-alone executable is
+also useful later on, when you want to experiment with code snippets
+or try out some Lua files.
+
+
+Please consult the Lua docs for general information about how to
+embed Lua into your application. The following list only shows
+the additional steps needed for embedding LuaJIT:
+
+
+
You need to add the LuaJIT library functions by running
+luaopen_jit() after all the other standard library functions.
+The modified src/linit.c used by the stand-alone executable
+already does this for you.
+
Caveat: LuaJIT is based on Lua 5.1 which
+means the luaopen_*() functions must not
+be called directly. See src/linit.c for the proper way to
+run them. You'll get an error initializing the io library
+if you don't follow these instructions.
+
To use the optimizer (strongly recommended) you need to:
+
+
Install the optimizer modules jit.opt and
+jit.opt_inline relative to the Lua module path
+(you've probably modified it — see src/luaconf.h):
+jit/opt.lua
+jit/opt_inline.lua
+
If you want to ship a single executable then you may want to
+embed the optimizer modules into your application (but don't lose
+time on this during the early development phase). This involves:
+
+
Compile the two modules to bytecode
+(using luac -s from a plain Lua installation).
+
Convert them to C include files (search for "Lua bin2c").
+
On Windows you can also put the compiled bytecode into a resource
+(search for "Lua bin2res").
+
Load the bytecode with luaL_loadbuffer (but don't run it).
+
Put the resulting functions into package.preload["jit.opt"]
+and package.preload["jit.opt_inline"].
+
+
Activate the LuaJIT optimizer from Lua code to be run at startup:
+ require("jit.opt").start()
+Or use equivalent C code. See dojitopt() in src/lua.c.
+
+
All other LuaJIT specific modules (jit.*) are for debugging only.
+They do not need to be shipped with an application. But they may be quite
+useful, anyway (especially jit.trace).
+
DynASM is only needed while building LuaJIT. It's not
+needed while running LuaJIT and there is no point in shipping or
+installing it together with an application.
+
In case you want to strip some of the standard libraries from
+your application: The optimizer modules need several functions from
+the base library and the string library (and of course the LuaJIT
+core libraries). The io library is only used to print a fatal error
+message (you may want to replace it). The optional modules
+for debugging depend on a few more library functions —
+please check the source.
+
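+The preload step above can be sketched in Lua itself. In a real
+application the chunk string would be the embedded luac output, loaded
+from C with luaL_loadbuffer(); the tiny stand-in chunk here just makes
+the sketch self-contained:

```lua
-- Stand-in for the embedded bytecode of jit/opt.lua.
local OPT_CHUNK = "local M = {} function M.start() end return M"

-- Load (but don't run) the chunk and register it for require().
package.preload["jit.opt"] = assert(loadstring(OPT_CHUNK, "=jit/opt.lua"))

-- Later, require() resolves the module via package.preload.
local opt = require("jit.opt")
opt.start()
```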
+
+Although the very liberal LuaJIT
+» license
+does not require any acknowledgment whatsoever, it would be appreciated
+if you give some credit in the docs (or the "About" box) of your application.
+A simple line like:
+ This product includes LuaJIT, http://luajit.org/
+would be nice. Please do not include any E-Mail addresses. Thank you!
+
+
+I'm always interested in hearing where LuaJIT can be put to good
+use in applications. Please tell me
+or better yet write a few lines about your project to the
+» Lua mailing list.
+Thank you!
+
+This is a little essay that tries to answer the question:
+'So, how does LuaJIT really work?'.
+
+
+I tried to avoid going into all the gory details, but at the
+same time provide a deep enough explanation to let you find
+your way around LuaJIT's inner workings.
+
+
+The learning curve is maybe a little bit steep for newbies and
+compiler gurus will certainly fall asleep after two paragraphs.
+It's difficult to strike a balance here.
+
+
+
Acronym Soup
+
+As the name says LuaJIT is a Just-In-Time (JIT) compiler.
+This means that functions are compiled on demand, i.e. when they
+are first run. This ensures a quick application startup and
+also avoids useless work: unused functions are not compiled at all.
+
+
+The other alternative is known as Ahead-Of-Time (AOT)
+compilation. Here everything is compiled before running any function.
+This is the classic way for many languages, such as C or C++.
+
+
+In fact plain Lua allows you to pre-compile Lua source code into
+Lua bytecode and store it in a binary file that can be run
+later on. This is used only in specific settings (e.g. memory limited
+embedded systems), because the Lua bytecode compiler is really fast.
+The ability to run source files right away is part of what makes
+a dynamic language (aka scripting language) so powerful.
+
+
+JIT compilation has a few other advantages for dynamic languages
+that AOT compilation can only provide with a massive amount
+of code analysis. More can be found in the literature.
+One particular advantage is explained later.
+
+
+
Quick, JIT — Run!
+
+JIT compilation happens mostly invisibly. You'll probably never
+notice that a compilation is going on. Part of the secret is
+that everything happens in little pieces, intermixed with running
+the application itself. The other part of the secret
+is that JIT compilation can be made pretty fast.
+
+
+Most applications quickly converge to a stable state where
+everything that really needs to be compiled is compiled
+right away. Only occasional isolated compiles happen later on.
+
+
+Even though the name doesn't suggest it, LuaJIT can operate
+in AOT mode, too. But this is completely under user control
+(see jit.compile())
+and doesn't happen automatically.
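+A hedged sketch of user-controlled AOT compilation via jit.compile()
+(the guard makes the snippet harmless under plain Lua; argument
+conventions may differ between LuaJIT versions):

```lua
-- 'jit' is the library table provided by the luajit executable.
local function hot(x)
  return x * 2 + 1
end

if jit and jit.compile then
  jit.compile(hot)  -- compile now, ahead of the first call
end

print(hot(20))  -- prints 41
```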
+
+
+Unless you have good reason to suspect that AOT compilation
+might help for a specific application, I wouldn't bother though.
+Compilation speed is usually a non-argument, because LuaJIT
+is extremely fast. Compilation times are typically in the
+microsecond range for individual Lua functions.
+
+
+
Starting Up
+
+The next few paragraphs may not be exactly breaking news to you,
+if you are familiar with JIT compilers. Still, please read on,
+because some terms are introduced that are used later on.
+
+
+When you start LuaJIT everything proceeds like in standard Lua:
+the Lua core is initialized, the standard libraries are loaded and
+the command line is analyzed. Then usually the first Lua source
+code file is loaded and is translated to Lua bytecode. And finally
+the function for the initial main chunk is run ...
+
+
+
Kicking the Compiler
+
+This is where LuaJIT kicks in:
+
+
+All Lua functions carry an additional status code for LuaJIT.
+Initially this is set to 'NONE', i.e. the function has not been
+looked at (yet). If a function is run with this setting,
+the LuaJIT compiler pipeline is started up.
+
+
+If you haven't loaded any special LuaJIT modules and optimization
+is not turned on, the compiler pipeline only consists of the
+compiler backend.
+
+
+The compiler backend is the low-level encoding engine that translates
+bytecode instructions to machine code instructions. Without any
+further hints from other modules, the backend more or less does a
+1:1 translation. I.e. a single variant of a bytecode instruction
+corresponds to a single piece of machine code.
+
+
+If all goes well, these little code pieces are put together,
+a function prologue is slapped on and voila: your Lua function
+has been translated to machine code. Of course things are not
+that simple when you look closer, but hey — this is
+the theory.
+
+
+Anyway, the status code for the function is set to 'OK' and the
+machine code is run. If this function runs another Lua function
+which has not been compiled, that one is compiled, too. And so on.
+
+
+
Call Gates
+
+Ok, so what happens when a function is called repeatedly? After all
+this is the most common case.
+
+
+Simple: The status code is checked again. This time it's set to 'OK',
+so the machine code can be run directly. Well, that's not the
+whole truth: for calls that originate in a JIT compiled function
+a better mechanism, tentatively named call gates, is used.
+
+
+Every function has a call gate field (a function pointer). By default
+it's set to a function that does the above checks and runs the
+compiler. But as soon as a function is compiled, the call gate
+is modified to point to the just compiled machine code.
+
+
+Calling a function is then as easy as calling the code that the
+call gate points to. But due to special (faster) calling conventions
+this function pointer cannot be used directly from C. So calls from
+a non-compiled function or from a C function use an extra entry
+call gate which in turn calls the real call gate. But this is
+really a non-issue since most calls in typical applications
+are intra-JIT calls.
+
+
+
The Compiler Pipeline
+
+The compiler pipeline has already been mentioned. This sounds
+more complicated than it is. Basically this is a coroutine that
+runs a frontend function which in turn calls all functions
+from the pipeline table.
+
+
+The pipeline table is sorted by priorities. The standard
+backend has priority 0. Positive priorities are run before the
+backend and negative priorities are run after the backend. Modules
+can dynamically attach or detach themselves to the pipeline with
+the library function jit.attach().
+
+
+So a typical optimizer pass had better have a positive priority,
+because it needs to be run before the backend is run. E.g. the
+LuaJIT optimizer module registers itself with priority 50.
+
+
+On the other hand a typical helper module for debugging —
+a machine code disassembler — needs to be run after the
+backend and is attached with a negative priority.
+
+
+One special case occurs when compilation fails. This can be due to
+an internal error (ouch) or on purpose. E.g. the optimizer module
+checks some characteristics of the function to be compiled and
+may decide that it's just not worth it. In this case a status
+other than OK is passed back to the pipeline frontend.
+
+
+The easiest thing would be to abort pipeline processing and just
+give up. But this would remove the ability to trace the progress
+of the compiler (which had better include failed compilations, too).
+So there is a special rule that odd priorities are still run,
+but even priorities are not. That's why e.g. -j trace
+registers itself with priority -99.
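+A hedged sketch of both rules (the handler signatures are assumptions;
+the guard makes the snippet harmless under plain Lua):

```lua
-- 'jit' is the library table provided by the luajit executable.
local function my_pass(...)
  -- inspect the function about to be compiled (runs before the backend)
end

local function my_tracer(...)
  -- odd priority: still called when an earlier stage fails
end

if jit and jit.attach then
  jit.attach(my_pass, 60)     -- positive: before the backend (the optimizer uses 50)
  jit.attach(my_tracer, -97)  -- negative and odd: after the backend, even on failure
end
```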
+
+
+
The Optimizer
+
+Maybe it hasn't become clear from the above description,
+but a module can attach any Lua or C function to the compiler
+pipeline. In fact all of the loadable modules are Lua modules.
+Only the backend itself is written in C.
+
+
+So, yes — the LuaJIT optimizer is written in pure Lua!
+
+
+And no, don't worry, it's quite fast. One reason for this is
+that a very simple abstract interpretation algorithm
+is used. It mostly ignores control flow and/or basic block
+boundaries.
+
+
+Thus the results of the analysis are really only hints.
+The backend must check the preconditions (the contracts)
+for these hints (e.g. the object type). Still, the generated
+hints are pretty accurate and quite useful to speed up the
+compiled code (see below).
+
+
+Explaining how abstract interpretation works is not within the
+scope for this short essay. You may want to have a look at the
+optimizer source code and/or read some articles or books on
+this topic. The canonical reference is
+» Principles of Program Analysis.
+Ok, so this one is a bit more on the theoretical side (a gross
+understatement). Try a search engine with the keywords "abstract
+interpretation", too.
+
+
+Suffice to say the optimizer generates hints and passes these
+on to the backend. The backend then decides to encode different
+forms for the same bytecode instruction, to combine several
+instructions or to inline code for C functions. If the hints
+from the optimizer are good, the resulting code will perform
+better because shorter code paths are used for the typical cases.
+
+
+
The JIT Advantage
+
+One important feature of the optimizer is that it takes 'live'
+function arguments into account. Since the JIT compiler is
+called just before the function is run, the arguments for this
+first invocation are already present. This can be used to great
+advantage in a dynamically typed language, such as Lua.
+
+
+Here's a trivial example:
+
+
+function foo(t, k)
+ return t[k]
+end
+
+
+Without knowing the most likely arguments for the function
+there's not much to optimize.
+
+
+Ok, so 't' is most likely a table. But it could be userdata, too.
+In fact it could be any type since the introduction of generic
+metatables for types.
+
+
+And more importantly 'k' can be a number, a string
+or any other type. Oh and let's not forget about metamethods ...
+
+
+If you know a bit about Lua internals, it should be clear by now
+that the code for this function could potentially branch to half
+of the Lua core. And it's of course impossible to inline all
+these cases.
+
+
+On the other hand if it's known (or there's a good hint)
+that 't' is a table and that 'k' is a positive integer, then there
+is a high likelihood that the key 'k' is in the array part
+of the table. This lookup can be done with just a few machine code
+instructions.
+
+
+Of course the preconditions for this fast path have to be checked
+(unless there are definitive hints). But if the hints are right,
+the code runs a lot faster (about a factor of 3 in this case
+for the pure table lookup).
+
+
+
Optimizing the Optimizer
+
+A question that surely popped up in your mind while reading
+the above section: does the optimizer optimize itself? I.e.
+is the optimizer module compiled?
+
+
+The current answer is no. Mainly because the compiler pipeline
+is single-threaded only. It's locked during compilation and
+any parallel attempt to JIT compile a function results in
+a 'DELAYED' status code. In fact all modules that attach to
+the compiler pipeline disable compilation for the entire
+module (because LuaJIT would do that anyway). The main chunk
+of modules loaded with require() is never compiled,
+so there is no chicken-and-egg problem here.
+
+
+Of course you could do an AOT compilation in the main chunk of
+the optimizer module. But then only with the plain backend.
+Recompiling it later on with the optimizer attached doesn't work,
+because a function cannot be compiled twice (I plan to lift
+this restriction).
+
+
+The other question is whether it pays off to compile the optimizer
+at all. Honestly, I haven't tried, because the current optimizer
+is really simple. It runs very quickly, even under the bytecode
+interpreter.
+
+
+
That's All Folks
+
+Ok, that's all for now. I'll extend this text later on with
+new topics that come up in questions. Keep on asking these
+on the mailing list if you are interested.
+
+As is always the case with benchmarks, care must be taken to
+interpret the results:
+
+
+First, the standard Lua interpreter is already very fast.
+It's commonly the fastest of its class (interpreters) in the
+» Great Computer Language Shootout.
+Only true machine code compilers get a better overall score.
+
+
+Any performance improvements due to LuaJIT can only be incremental.
+You can't expect a speedup of 50x if the fastest compiled language
+is only 5x faster than interpreted Lua in a particular benchmark.
+LuaJIT can't do miracles.
+
+
+Also please note that most of the benchmarks below are not
+trivial micro-benchmarks, which are often cited with marvelous numbers.
+Micro-benchmarks do not realistically model the performance gains you
+can expect in your own programs.
+
+
+It's easy to make up a few one-liners like:
+ local function f(...) end; for i=1,1e7 do f() end
+This is more than 30x faster with LuaJIT. But you won't find
+this in a real-world program.
+
+
+
Measurement Methods
+
+All measurements have been taken on a Pentium III 1.139 GHz
+running Linux 2.6. Both Lua and LuaJIT have been compiled with
+GCC 3.3.6 with -O3 -fomit-frame-pointer.
+You'll definitely get different results on different machines or
+with different C compiler options. *
+
+
+The base for the comparison are the user CPU times as reported by
+/usr/bin/time. The runtime of each benchmark is parametrized
+and has been adjusted to minimize the variation between several runs.
+The ratio between the times for LuaJIT and Lua gives the speedup.
+Only this number is shown because it's less dependent on a specific system.
+
+
+E.g. a speedup of 6.74 means the same benchmark runs almost 7 times
+faster with luajit -O than with standard Lua (or with
+-j off). Your mileage may vary.
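+In other words, the speedup is just the ratio of the two user CPU
+times:

```lua
-- speedup = time(plain Lua) / time(luajit -O)
local function speedup(lua_time, luajit_time)
  return lua_time / luajit_time
end

print(speedup(13.48, 2.0))  -- 6.74
```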
+
+
+* Yes, LuaJIT relies on quite a bit of the Lua core infrastructure
+like table and string handling. All of this is written in C and
+should be compiled with full optimization turned on, or performance
+will suffer.
+
+Note that many of these benchmarks have changed over time (both spec
+and code). Benchmark results shown in previous versions of LuaJIT
+are not directly comparable. The next section compares different
+versions with the current set of benchmarks.
+
+
+
Comparing LuaJIT Versions
+
+This shows the improvements between the following versions:
+
+
+
LuaJIT 1.0.x
+
LuaJIT 1.1.x
+
+
+
+Benchmark     Speedup (1.0.x → 1.1.x)
+-----------   -----------------------
+fannkuch      3.96 → 5.37
+chameneos     2.25 → 5.08
+nsievebits    2.90 → 5.05
+pidigits      3.58 → 4.94
+nbody         4.16 → 4.63
+cheapconcr    1.46 → 4.46
+partialsums   1.71 → 3.73
+fasta         2.37 → 2.68
+cheapconcw    1.27 → 2.52
+revcomp       1.45 → 1.92
+knucleotide   1.32 → 1.59
+
+All other benchmarks show only minor performance differences.
+
+
+
Summary
+
+These results should give you an idea about what speedup
+you can expect depending on the nature of your Lua code:
+
+
+
+LuaJIT is really good at (floating-point) math and loops
+(mandelbrot, pidigits, spectralnorm, partialsums).
+
+
+Function calls (recursive), vararg calls, table lookups (nbody),
+table iteration and coroutine switching (chameneos, cheapconc)
+are a lot faster than with plain Lua.
+
+
+It's still pretty good for indexed table access (fannkuch, nsieve)
+and string processing (fasta, revcomp, knucleotide).
+But there is room for improvement in a future version.
+
+
+If your application spends most of the time in C code
+you won't see much of a difference (regexdna, sumfile).
+Ok, so write more code in pure Lua. :-)
+
+
+The real speedup may be shadowed by other dominant factors in a benchmark:
+
+
Common parts of the Lua core: e.g. memory allocation
+and GC (binarytrees).
+
Language characteristics: e.g. lack of bit operations (nsievebits).
+
System characteristics: e.g. CPU cache size and memory speed (nsieve).
+
+
+
+
+The best idea is of course to benchmark your own applications.
+Please report any interesting results you may find. Thank you!
+
+LuaJIT has only a single stand-alone executable, called luajit.
+It can be used to run simple Lua statements or whole Lua applications
+from the command line. It has an interactive mode, too.
+
+
+Note: The optimizer is not activated by default because it resides
+in an external module
+(see Installing LuaJIT).
+It's recommended to always use the optimizer, i.e.: luajit -O
+
+
+
Command Line Options
+
+The luajit stand-alone executable is just a slightly modified
+version of the regular lua stand-alone executable.
+It supports the same basic options, too. Please have a look at the
+Manual Page
+for the regular lua stand-alone executable.
+
+
+Two additional options control LuaJIT behaviour:
+
+
+
-j cmd[=value]
+
+This option performs a LuaJIT control command. LuaJIT has a small
+but extensible set of control commands. It's easy to add your own.
+
+
+The command is first searched for in the jit.* library.
+If no matching function is found, a module named jit.<cmd>
+is loaded. The module table must provide a start() function.
+
+
+For the -j cmd form the function is called without an argument.
+Otherwise the value is passed as the first argument (a string).
+
+
+Here are the built-in LuaJIT control commands:
+
+
+
-j on — Turns the JIT engine on (default).
+
-j off — Turns the JIT engine off.
+
-j debug[=level] — Set debug level. See
+jit.debug().
+
+
+The following control commands are loaded from add-on modules:
+
+
+
-j trace[=file] — Trace the progress of the JIT compiler.
+
-j dumphints[=file] — Dump bytecode + hints before compilation.
+
-j dump[=file] — Dump machine code after compilation.
+
+
+
+
-O[level]
+
+This option loads and runs the optimizer module jit.opt.
+The optimizer generates hints for the compiler backend to improve
+the performance of the compiled code. The optimizer slows down
+compilation slightly, but the end result should make up for it
+in almost every case.
+
+
+The -O form sets the default optimizer level, which is
+currently 2 (this may change in future versions
+of LuaJIT).
+
+
+The -Olevel form explicitly sets the optimizer level:
+
+
+
-O0 — disable the optimizer but leave it attached.
+
-O1 — perform standard optimizations (like hints for table lookups).
+
-O2 — like -O1 but also loads jit.opt_inline to enable result hints and inlining for standard library functions.