This document specifies some internal details about the implementation of malc (Mal compiler).
%mal_obj
is a 64-bit value which can hold one of the following:
- Integer
- Constant (nil, false, true)
- Pointer to object
Integers are represent as a %mal_obj
value which has its least significant
bit set. The upper 63 bits hold the integer itself. To represent an integer X
as %mal_obj
:
mal_obj_value = (X << 1) | 0x1
The three constant values are represented by the following 64-bit values:
- 2 -
nil
- 4 -
false
- 6 -
true
If a %mal_obj
is not an integer and not a constant, it is treated as a
pointer to a %mal_obj_header_t
struct (16 bytes struct):
struct %mal_obj_header_t {
uint32_t type;
uint32_t len;
%mal_obj meta;
void* data;
}
The data
field can point into either a bytearray (char*
) or an elementarray
(%mal_obj*
), depending on the value of type
.
Mal types symbol, string and keyword are held internally as bytearray
objects. For these objects, the len
field in the object header holds the
length of the string pointed by data
in bytes (not including a terminating
null char). data
of course points to the byte array itself, which is of size
len + 1
bytes, the last byte is always \x00
.
The type field indicates the Mal type:
- 17 - Symbol
- 18 - String
- 19 - Keyword
All other Mal types (list, vector, hash-map and atom), are held internally as
elementarray objects. In those objects, the len
fields holds the number of
elements in the data
, and data
points to an array of %mal_obj
elements
(which in turn can be of any type - integer, boolean, string, vector, etc.).
The type field indicates the Mal type:
- 33 - List
- 34 - Vector
- 35 - Hash-map
- 49 - Atom
Lists and vectors are simple - an N-element list (or vector) is stored as an elementarray with length N.
Hash-maps are stored as an element array of alternating keys and values (K1,
V1, K2, V2, ...); note that the len
field for hash-maps will be twice than
the number of keys in the hash. Also note that this is a very naive (read:
slow) implementation; key lookup is performed with an O(n) linear search.
Atoms are stored as an elementarray with exactly one element (the atom's value).
The three internal types Env, Func and NativeFunc are held as elementarrays with special designation for each "slot" (poor man's struct).
For Env objects, type
is 65 and len
is 2. The data
elements are:
- Index 0:
outer
- Outer environment (Env object), ornil
if this is the root environment - Index 1:
data
- Hash-map from variables names (symbols) to their values (any Mal value)
For Func objects, type
is 66 and len
is 3. The data
elements are:
- Index 0:
arg_names
- Arguments names (vector of symbols) - Index 1:
env
- Env object - Index 2:
func_ptr
- Function pointer (of type:%mal_obj fn(%mal_obj env)
)
For NativeFunc objects, type
is 67 and len
is 3. The data
elements are:
- Index 0:
arg_names
- Arguments names (vector of symbols) - Index 1:
name
- Function name (symbol) - Index 2:
func_ptr
- Function pointer, of one of the following types (according to length ofarg_names
):- 0 arguments:
%mal_obj()
- 1 arguments:
%mal_obj(%mal_obj)
- 2 arguments:
%mal_obj(%mal_obj,%mal_obj)
- 3 arguments:
%mal_obj(%mal_obj,%mal_obj,%mal_obj)
- 0 arguments:
The LLVM optimizer recognizes recursive tail calls and optimizes them to jumps; therefore malc performs no special compilation to produce iterative code.
malc uses the C++ exception handling library (unwind
which is part of
libstdc++
). The mal exception struct (%mal_exn_t
) consists of one
%mal_obj
field which holds the Mal value of the thrown object (the argument
of the throw
function) and a standard _Unwind_Exception
struct.
A try
clause is extracted to a function which is called with the invoke
LLVM instruction. The function might throw an exception; throw
is
implemented with _Unwind_RaiseException
. In such a case, control will
continue at the landingpad
(catch). Using pointer calculation the program
retrieves the original %mal_obj
that was thrown which is available then to
the catch clause (which is also extracted into a separate function).