A Simple Example

To get you started, here is a simple code snippet to be pre-processed.
The lines starting with '|' (the pipe symbol) are for DynASM:

      if (ptr != NULL) {
        |  mov eax, foo+17
        |  mov edx, [eax+esi*2+0x20]
        |  add ebx, [ecx+bar(ptr, 9)]
      }

After pre-processing you get:

      if (ptr != NULL) {
        dasm_put(Dst, 123, foo+17, bar(ptr, 9));
      }

Note: yes, you usually get the assembler code as comments and proper
CPP directives to match them up with the source. I've omitted
them here for clarity. Oh and BTW: the pipe symbols probably
line up much more nicely in your editor than in a browser.

Here 123 is an offset into the action list buffer that
holds the partially specified machine code. Without going
into too much detail, the embedded C library implements a
tiny bytecode engine that takes the action list as input and
outputs machine code. It basically copies machine code snippets
from the action list and merges them with the arguments
passed in by dasm_put().
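
To make the copy-and-merge idea concrete, here is a minimal sketch of such an engine. Everything in it is an assumption for illustration: the names ToyState and toy_put, the 16-bit action words and the two action codes are invented here and do not reflect DynASM's real action list encoding or API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Toy action codes (hypothetical; not DynASM's real encoding).
   Action words below 0x100 are literal machine code bytes. */
enum { TOY_STOP = 0x100, TOY_IMM32 = 0x101 };

typedef struct ToyState {
  const unsigned short *actionlist;  /* Partially specified machine code. */
  unsigned char *out;                /* Output buffer for finished code. */
  size_t len, cap;                   /* Bytes written / buffer capacity. */
} ToyState;

/* Interpret the action list from offset `start` up to TOY_STOP:
   copy literal bytes and fill each TOY_IMM32 hole with the next
   int argument, stored in little-endian byte order. */
static void toy_put(ToyState *D, int start, ...)
{
  const unsigned short *p = D->actionlist + start;
  va_list ap;
  va_start(ap, start);
  for (; *p != TOY_STOP; p++) {
    if (*p < 0x100) {
      assert(D->len < D->cap);
      D->out[D->len++] = (unsigned char)*p;
    } else {
      unsigned int v = (unsigned int)va_arg(ap, int);
      int i;
      assert(*p == TOY_IMM32 && D->len + 4 <= D->cap);
      for (i = 0; i < 4; i++)
        D->out[D->len++] = (unsigned char)(v >> (8 * i));
    }
  }
  va_end(ap);
}
```

With the action list { 0xB8, TOY_IMM32, 0xC3, TOY_STOP }, a call like toy_put(&D, 0, foo+17) would emit the x86 encoding of "mov eax, imm32" with foo+17 as the immediate, followed by "ret".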

The arguments can be any kind of C expressions. In practical
use most of them evaluate to constants (e.g. structure offsets).
Your C compiler should generate very compact code out of it.

The embedded C library knows only what's absolutely needed to
generate proper machine code for the target CPU (e.g. variable
displacement sizes, variable branch offset sizes and so on).
It doesn't have a clue about other atrocities like x86 opcode
encodings, and it doesn't need to. This dramatically
reduces the minimum required code size to around 2K [sic!].
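
As an illustration of what "variable displacement sizes" means, the sketch below picks the shortest x86 encoding for a memory operand of the form [base+disp] once the displacement is known. The function name toy_modrm_disp is hypothetical, and this is not the library's actual code; it only applies the standard ModRM rules (no displacement, 8 bit displacement, 32 bit displacement).

```c
#include <assert.h>
#include <stddef.h>

/* Write the ModRM byte plus displacement for the operand [base+disp].
   reg and base are 3 bit x86 register numbers. base must not be
   esp (4), which needs a SIB byte that this sketch leaves out.
   Returns the number of bytes written: 1, 2 or 5. */
static size_t toy_modrm_disp(unsigned char *out, int reg, int base, int disp)
{
  size_t n = 0;
  int mod, i;
  assert(reg >= 0 && reg < 8 && base >= 0 && base < 8 && base != 4);
  if (disp == 0 && base != 5)
    mod = 0;  /* [base], no displacement (ebp always needs one). */
  else if (disp >= -128 && disp <= 127)
    mod = 1;  /* [base+disp8] */
  else
    mod = 2;  /* [base+disp32] */
  out[n++] = (unsigned char)((mod << 6) | (reg << 3) | base);
  if (mod == 1) {
    out[n++] = (unsigned char)disp;
  } else if (mod == 2) {
    for (i = 0; i < 4; i++)
      out[n++] = (unsigned char)((unsigned int)disp >> (8 * i));
  }
  return n;
}
```

The opcode byte in front of the operand is the same in all three cases, which is why a decision like this can be made without any knowledge of the opcode tables.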

The action list buffer itself has a pretty compact encoding, too.
E.g. the whole action list buffer for an early version of LuaJIT
needs only around 3K.

Advanced Features

Here's a real-life example taken from LuaJIT that shows some
advanced features like type maps, macros and how to access
C structures:

    |.type L,      lua_State,  esi  // L.
    |.type BASE,   TValue,     ebx  // L->base.
    |.type TOP,    TValue,     edi  // L->top.
    |.type CI,     CallInfo,   ecx  // L->ci.
    |.type LCL,    LClosure,   eax  // L->ci->func->value.
    |.type UPVAL,  UpVal

    |.macro copyslot, D, S, R1, R2, R3
    |  mov R1, S.value;  mov R2, S.value.na[1];  mov R3, S.tt
    |  mov D.value, R1;  mov D.value.na[1], R2;  mov D.tt, R3
    |.endmacro

    |.macro copyslot, D, S;  copyslot D, S, ecx, edx, eax; .endmacro

    |.macro getLCL, reg
    ||if (!J->pt->is_vararg) {
    |  mov LCL:reg, BASE[-1].value
    ||} else {
    |  mov CI, L->ci
    |  mov TOP, CI->func
    |  mov LCL:reg, TOP->value
    ||}
    |.endmacro

    |.macro getLCL;  getLCL eax; .endmacro

    [...]

    static void jit_op_getupval(jit_State *J, int dest, int uvidx)
    {
      |  getLCL
      |  mov UPVAL:ecx, LCL->upvals[uvidx]
      |  mov TOP, UPVAL:ecx->v
      |  copyslot BASE[dest], TOP[0]
    }

And here is the pre-processed output (stripped a bit for clarity):

    #define Dt1(_V) (int)&(((lua_State *)0)_V)
    [...]
    static void jit_op_getupval(jit_State *J, int dest, int uvidx)
    {
      if (!J->pt->is_vararg) {
        dasm_put(Dst, 1164, Dt2([-1].value));
      } else {
        dasm_put(Dst, 1168, Dt1(->ci), Dt4(->func), Dt3(->value));
      }
      dasm_put(Dst, 1178, Dt5(->upvals[uvidx]), DtF(->v), Dt3([0].value),
               Dt3([0].value.na[1]), Dt3([0].tt), Dt2([dest].value),
               Dt2([dest].value.na[1]), Dt2([dest].tt));
    }
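
The Dt macros in the output turn an access path such as [dest].tt into a plain byte offset by applying it to a null pointer of the mapped type. The sketch below shows the same idea on a stand-in structure. ToyValue, ToyDt and toy_tt_offset are hypothetical names, not LuaJIT's definitions, and the added size_t cast is an assumption made here to keep 64 bit compilers quiet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tagged value slot (not LuaJIT's TValue). */
typedef struct ToyValue {
  double value;
  int tt;
} ToyValue;

/* Same shape as the generated Dt macros: apply an access path to a
   null ToyValue pointer and take the address of the result. This is
   the traditional pre-offsetof idiom. ISO C does not strictly define
   it, but common compilers fold it into a constant offset. */
#define ToyDt(path) (int)(size_t)&(((ToyValue *)0)path)

/* Portable equivalent of ToyDt([idx].tt) for a non-negative idx. */
static int toy_tt_offset(int idx)
{
  assert(idx >= 0);
  return (int)((size_t)idx * sizeof(ToyValue) + offsetof(ToyValue, tt));
}
```

For a constant index the whole expression is a compile-time constant. For a runtime index such as dest it reduces to a multiply and an add, so it stays cheap either way.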