Contents
- Introduction
- Hello World
- Fizz-buzz
- Statements
- Types
- Operators
- Clauses
- Automatic parallelization
- Caching
- Functions
- Lists
- Arrays
- Strings
- Exceptions
- I/O
- Units
- Type classes
- Threads
- FFI
- Performance considerations
- Verifying Ajla programs with Z3
- Hacking Ajla
Introduction
Ajla is a purely functional programming language that has the look and feel of traditional imperative languages. The return value of every function in Ajla depends only on its arguments. Ajla has mutable local variables and control flow statements (if, while, for, goto) that are known from imperative languages. Ajla doesn’t have mutable global variables because they break purity and they may be subject to race conditions.
Ajla is memory-safe — i.e. you can’t create a segmentation fault in Ajla. Ajla doesn’t have garbage collection; it uses reference counts to track allocated memory. Ajla has mutable arrays — if the array reference count is one, it is mutated in place, if it is different from one, a copy of the array is created.
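The in-place versus copy rule for mutable arrays can be modelled outside Ajla. The following is a minimal Python sketch, not Ajla's implementation: the class CowArray, its explicit "refs" counter and the methods share, set and to_list are hypothetical names invented for this illustration.

```python
class CowArray:
    """Toy model of an array handle with an explicit reference count."""

    def __init__(self, items):
        # The buffer is shared between handles; "refs" counts the handles.
        self._buf = {"data": list(items), "refs": 1}

    def share(self):
        """Create a second handle to the same buffer (reference count goes up)."""
        other = CowArray.__new__(CowArray)
        other._buf = self._buf
        self._buf["refs"] += 1
        return other

    def set(self, index, value):
        """Store a value; report whether the buffer was reused or copied."""
        if self._buf["refs"] == 1:
            self._buf["data"][index] = value      # sole owner: mutate in place
            return "in place"
        self._buf["refs"] -= 1                    # detach from the shared buffer
        self._buf = {"data": list(self._buf["data"]), "refs": 1}
        self._buf["data"][index] = value          # modify the private copy
        return "copied"

    def to_list(self):
        return list(self._buf["data"])


a = CowArray([1, 2, 3])
first = a.set(0, 9)      # only one handle exists
b = a.share()
second = b.set(1, 7)     # two handles exist, so b gets its own copy
```

After this runs, a still holds 9, 2, 3 while b holds 9, 7, 3, so the other handle never observes the write. That is what keeps mutation compatible with purity.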
Ajla can verify program correctness using the Z3 library.
Unpacking Ajla
Before compiling Ajla, install the packages libgmp-dev, libffi-dev, libnuma-dev. If libgmp-dev is not installed, Ajla will use a slow built-in version. If libffi-dev is not installed, the possibility to call C functions from Ajla will be disabled. If libnuma-dev is not installed, the optimization for non-uniform memory access (NUMA) will be disabled.
Download the newest Ajla version from the downloads directory. Unpack the tar.gz archive and compile it with "./configure && make". Install it with make install.
Alternatively, you can download the current development version with "git clone git://repo.or.cz/ajla.git". There is no "./configure" script in the git repository; you can create it by running "./autogen.sh". You need autoconf, automake and ed installed.
It is recommended to use gcc — the compilation will take several minutes and it will require about 9GB memory when compiling the files ipret.c and ipretc.c in parallel. Compilation with clang works, but it may be very slow, it may take an hour or more to compile the files ipret.c and ipretc.c. If you want to reduce the compilation time and memory consumption, here are some tips:
- Pass the flag --disable-computed-goto to ./configure. This will make the interpreter use one big switch statement instead of a table of labels; it will degrade performance, but it will decrease compilation time and memory usage.
- Pass the flag “-j1” to make, so that the files ipret.c and ipretc.c are not compiled in parallel. This will reduce memory consumption (compilation of each of these files consumes about 4GB RAM), but it will double compilation time.
- Run “CFLAGS=-O1 ./configure” instead of “./configure”. This will reduce optimizations performed by the compiler; the resulting code will be slower, but the compilation will take less time and consume less memory.
- If you are using clang, use version 19 or 20. The older versions compile Ajla very slowly; it may take an hour or more.
- If you are using clang, don’t use the options -Wall or -Wuninitialized. The uninitialized variable check on clang is very inefficient and it may add about 30 minutes to the compilation time.
When running Ajla programs, it is recommended to set the cpu governor to ‘performance’. Ajla starts and stops threads rapidly and the ‘ondemand’ governor underclocks the cores, resulting in slower performance.
You can compile and run a program directly by executing “ajla program.ajla”. The program will be compiled as it runs and the result will be saved in the file ~/.cache/ajla/program.sav. When you run the program a second time, it will be faster because there will be no compilation. If you change the source code file, the cache file will be invalidated and the program will be compiled again.
There’s a --compile switch — it will compile the whole program without running it and save it to the ~/.cache/ajla directory. You should use this switch when you are developing a program, because it will scan all the source code units and look for syntax errors.
The standard library is placed in the stdlib subdirectory, you can read it to find out what types and functions are there. The system.ajla file is implicitly included before compiling any Ajla source file.
In “programs/acmd/” there’s Ajla Commander, a Midnight Commander clone written in Ajla. You can run it with “./ajla programs/acmd/acmd.ajla”.
Hello World
Programming language tutorials usually start with a program that prints “Hello World”. Let’s have a look at “Hello World” in Ajla:
fn main(w : world, d : dhandle, h : list(handle), args : list(bytes),
        env : treemap(bytes, bytes), prereq len(h) >= 3) : world
[
    w := write(w, h[1], "Hello World!" + nl);
    return w;
]
Copy this piece of code to a file “hello.ajla” and run “ajla hello.ajla” to execute it.
Here we declare a function main with the following arguments. The symbol before the colon is the name of the argument and the expression after the colon is the type of the argument.
w : world
    This is a token that must be passed to and returned from all functions that do I/O to sequence the I/O properly.
d : dhandle
    This is a handle to the current working directory. The handle can be used when the user needs to open files relative to the working directory.
h : list(handle)
    The list of handles that were passed to the program when it was executed. Usually, it contains 3 entries, the first for standard input, the second for standard output and the third for standard error.
args : list(bytes)
    The arguments that were passed to the program. The type “bytes” represents a sequence of bytes, so list(bytes) is a list of sequences of bytes.
env : treemap(bytes, bytes)
    This represents the environment variables for the program. The environment is represented as a tree that maps a sequence of bytes (i.e. the variable name) to another sequence of bytes (i.e. the variable content). This is implemented as an AVL tree, so that searching it is efficient.
prereq len(h) >= 3
    This is not an argument; it is a precondition that says that the list “h” has at least three entries. It is used when verification is enabled (it will be explained later).
The function contains the following statements:
w := write(w, h[1], "Hello World!" + nl);
    This statement writes the string “Hello World!” and a newline to the handle 1 (standard output). The statement takes a w variable as an I/O token and returns the w variable back. Passing the world variable back and forth between functions that do I/O is required to maintain I/O ordering.
return w;
    This statement returns the world variable to the caller.
Passing the world variable
In order to show how the world passing works, let’s split the “Hello World” program into three write statements.
fn main(w : world, d : dhandle, h : list(handle), args : list(bytes),
        env : treemap(bytes, bytes), prereq len(h) >= 3) : world
[
    w := write(w, h[1], "Hello ");
    w := write(w, h[1], "World!");
    w := write(w, h[1], nl);
    return w;
]
In a functional language that has non-strict evaluation, we can’t just do I/O anywhere because the I/Os could be reordered or not executed at all. We need some mechanism to maintain I/O ordering. Haskell uses monads to maintain I/O ordering; Ajla uses a different mechanism, world passing. Every function that performs I/O takes “world” as an argument and returns “world” as a return value (functions can have multiple return values in Ajla). The “world” variable makes sure that the functions can’t be reordered. In this example, w := write(w, h[1], nl) may be executed only after w := write(w, h[1], "World!") finished. And w := write(w, h[1], "World!") may be executed only after w := write(w, h[1], "Hello ") finished.
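The ordering argument can be made concrete with a small model. This is a Python sketch rather than Ajla code: the World tuple, its step and output fields and this two-argument write are hypothetical stand-ins. It only shows that each call needs the value produced by the previous one, so the calls form a data-dependency chain.

```python
from typing import NamedTuple


class World(NamedTuple):
    """Immutable stand-in for the world token."""
    step: int      # how many I/O operations have happened so far
    output: str    # everything written so far


def write(w: World, text: str) -> World:
    """Return a new world; the old one is left untouched."""
    return World(w.step + 1, w.output + text)


def main(w: World) -> World:
    w = write(w, "Hello ")    # needs the initial world
    w = write(w, "World!")    # needs the result of the first write
    w = write(w, "\n")        # needs the result of the second write
    return w


final_world = main(World(0, ""))
```

Because the third call consumes the world returned by the second, no evaluation order can run it earlier, even in a non-strict setting.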
Using implicit variables
Now, let’s have a look at another Ajla feature: implicit variables. We add the implicit keyword to the function argument w : world and we drop the w variable from the code that does writes.
fn main(implicit w : world, d : dhandle, h : list(handle), args : list(bytes),
        env : treemap(bytes, bytes), prereq len(h) >= 3) : world
[
    write(h[1], "Hello ");
    write(h[1], "World!");
    write(h[1], nl);
]
If a variable is declared as implicit, the compiler will automatically add the variable to the function calls where it fits. The write function should have three arguments, here it has only two arguments, so the compiler will search for all implicit variables and it will use an implicit variable that fits into the remaining argument.
If we don’t specify a return value for the write function call, the compiler will search for all implicit variables and assign the return value to a variable where it fits.
If we don’t use the return statement at the end of the function, the compiler will search for implicit variables and automatically return a variable that fits into the return value.
Here we can see that with the implicit variables the code really looks as if it were written in a procedural programming language. But it is not procedural: the code is translated to the explicit world-passing form shown in the previous section, and that is what runs on the virtual machine.
Omitting the arguments
The compiler already knows what arguments the main function should have, so you can omit them. So, we can simplify our program to this:
fn main
[
    write(h[1], "Hello World!" + nl);
]
This is the simplest way to write a “Hello World” program in Ajla. Internally, it is translated to the explicit form with all the arguments and the world passing written out.
Fizz-buzz
Fizz-buzz is another standard programming test. The goal is to write a program that iterates from 1 to 100. If the number is divisible by both 3 and 5, “Fizz Buzz” is written; if it is divisible only by 3, “Fizz” is written; if it is divisible only by 5, “Buzz” is written; otherwise the number is written.
fn main
[
    for i := 1 to 101 do [
        if i mod 15 = 0 then write(h[1], "Fizz Buzz ");
        else if i mod 3 = 0 then write(h[1], "Fizz ");
        else if i mod 5 = 0 then write(h[1], "Buzz ");
        else write(h[1], ntos(i) + " ");
    ]
    write(h[1], nl);
]
The for statement iterates a variable over a given range. The starting value is inclusive, the ending value is exclusive — thus, if we want to iterate from 1 to 100, we need to specify 1 and 101. The operator mod is an arithmetic remainder, the if statements test if the value is divisible by 15, 3 and 5. If it is not divisible by these numbers, we print the number; the ntos function converts a number to a string.
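The half-open range is the detail that is easiest to get wrong. Python's range happens to use the same convention (inclusive start, exclusive end), so the following sketch mirrors the loop bounds of the program above; the function name fizz_buzz is a hypothetical helper, not part of Ajla.

```python
def fizz_buzz(start, end):
    """Return the words for start <= i < end (the end value is exclusive)."""
    words = []
    for i in range(start, end):
        if i % 15 == 0:
            words.append("Fizz Buzz")
        elif i % 3 == 0:
            words.append("Fizz")
        elif i % 5 == 0:
            words.append("Buzz")
        else:
            words.append(str(i))
    return words


words = fizz_buzz(1, 101)   # 1 and 101, exactly as in the Ajla loop
```

The call produces 100 entries, the last of which belongs to the number 100. Passing 100 as the end value would silently drop it.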
You can see that it looks very much like a procedural language.
Statements
Ajla has the following statements:
var x := 10;
    Creates a variable and assigns a value to it. The type is inferred; if we don’t want to infer the type, we use “var x : int := 10;”
x := 20;
    Modifies the variable.
const x := 10;
    Creates a constant and assigns a value to it. A constant can’t be modified.
if condition then statement;
if condition then statement; else statement;
    The “if” statement.
for i := 0 to 10 do statement;
    The “for” statement. It iterates from 0 to 9.
for i in [ 10, 20, 30, 40 ] do statement;
    The “for in” statement can iterate over a collection. In this example it iterates over a list that holds four values: 10, 20, 30, 40.
while condition do statement;
    The “while” statement.
break;
    Exits the current “for” or “while” loop.
continue;
    Starts a new iteration of the current “for” or “while” loop.
return expression;
    Exits the current function and returns the expression.
goto label;
    The “goto” statement. It doesn’t break purity, so it is allowed.
The if and while statements can accept multiple expressions separated by a comma — in this case, the expressions are evaluated from the first to the last and if one of them is evaluated as false, the evaluation is terminated and the conditional branch is not taken.
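The left-to-right evaluation with early termination behaves like a short-circuit conjunction. A minimal Python model of that rule follows; all_hold and third_is_30 are hypothetical names, and the conditions are passed as zero-argument callables so that a later one is not evaluated once an earlier one fails.

```python
def all_hold(*conditions):
    """Evaluate the callables from first to last; stop at the first false one."""
    for condition in conditions:
        if not condition():
            return False
    return True


def third_is_30(items):
    # The second test would raise IndexError on a short list,
    # but it is never reached when the first test fails.
    return all_hold(lambda: len(items) >= 3, lambda: items[2] == 30)
```

This is why a guard can safely be written first and the expression that depends on it second.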
The assignment also accepts multiple expressions. For example, “x, y := y, x” will swap the variables x and y.
Types
Ajla has the following primitive types:
int
    A general integer number with arbitrary length. It is implemented as a 32-bit or 64-bit signed number. If some arithmetic operation overflows, it is converted to a long integer (using the gmp library) and the arithmetic is done with arbitrary precision.
    On 32-bit architectures, int is 32-bit. On 64-bit architectures, int is 64-bit. On 64-bit architectures when the “--ptrcomp” switch is used, int is 32-bit. Note that these cases are semantically equivalent, because overflows are handled transparently.
nat
    A general unsigned integer number with arbitrary length. If we are not verifying (i.e. when not using the “--verify” switch), it is equivalent to int. If we are verifying, the verifier will assume that the value is non-negative when reading it and it will verify that the value is non-negative when assigning to it.
int8, int16, int32, int64, int128
    These types behave in the same way as the “int” type. The difference is in their implementation. int8 is implemented as an 8-bit integer and if it overflows, long integers using the gmp library are used. int16 is implemented as a 16-bit integer (with overflows handled by the gmp library), etc. If the compiler that was used to build Ajla doesn’t support 128-bit integers, int128 is equivalent to int64.
nat8, nat16, nat32, nat64, nat128
    Unsigned equivalents of int8, int16, int32, int64, int128.
sint8, sint16, sint32, sint64, sint128
    This is a signed integer with a given size. If some arithmetic operation overflows, it is wrapped modulo the size. If the compiler doesn’t support 128-bit integers, emulation using gmp is used.
uint8, uint16, uint32, uint64, uint128
    This is an unsigned integer with a given size. If some arithmetic operation overflows, it is wrapped modulo the size. If the compiler doesn’t support 128-bit integers, emulation using gmp is used.
real16, real32, real64, real80, real128
    A floating point number with a given size. If the compiler has 128-bit floating point numbers and doesn’t have 80-bit floating point numbers, then real80 is an alias for real128. If the compiler has neither 80-bit nor 128-bit floating point numbers, a slow software emulation is used.
    Floating point constants can have a suffix that specifies the type: ‘h’ for real16, ‘s’ for real32, no suffix for real64, ‘l’ for real80, ‘q’ for real128.
bool
    A Boolean type; it can hold values true or false.
type
    Ajla can pass types to functions as well as other values. We use the keyword type to specify that an argument is a type.
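The practical difference between the promoting integer types (int8, int16, ...) and the wrapping ones (sint8, uint8, ...) shows up only on overflow. The sketch below models both behaviours in Python, whose built-in integers are themselves arbitrary-precision. The helper names wrap_signed and wrap_unsigned are hypothetical, and "wrapped modulo the size" is assumed here to mean two's-complement wrapping for the signed case.

```python
def wrap_unsigned(value, bits):
    """Model of uintN: keep only the low 'bits' bits."""
    return value % (1 << bits)


def wrap_signed(value, bits):
    """Model of sintN: wrap into the range -2**(bits-1) .. 2**(bits-1) - 1."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


promoted = 127 + 1                      # int8-like: the result simply grows
wrapped_signed = wrap_signed(127 + 1, 8)      # sint8-like
wrapped_unsigned = wrap_unsigned(255 + 1, 8)  # uint8-like
```

Under these assumptions the promoting type yields 128, while the wrapping types yield -128 and 0 respectively.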
The following types are defined in the standard library:
byte
    An alias for uint8.
char
    An alias for int32.
real
    An alias for real64.
rational
    A rational number with an integer numerator and denominator. It is declared as:
    record rational [
        num den : int;
    ]
fixed_point(base, digits)
    A fixed point number with the specified base and the specified number of digits after the dot. The number of digits before the dot may be large; if it doesn’t fit, the gmp library is used.
decimal(digits)
    An alias for fixed_point(10, digits).
sint(bits)
    A signed integer with the specified number of bits. If some arithmetic operation overflows, it is wrapped modulo the size.
uint(bits)
    An unsigned integer with the specified number of bits. If some arithmetic operation overflows, it is wrapped modulo the size.
floating(ex_bits, sig_bits)
    An arbitrary-precision floating-point number. The exponent has ex_bits and the mantissa has sig_bits.
Ajla has the following composite types:
list(t)
    A list of elements where each element has a type t. Lists can be appended or sliced.
array(t, [ 10, 20, 30 ])
    An array with an arbitrary number of dimensions. In this example, it has three dimensions with the sizes 10, 20 and 30 elements. Arrays can’t change their size after they are created.
record [ element1 : type1; element2 : type2; element3 : type3; ... ]
    A record groups different types into a single type.
option [ element1 : type1; element2 : type2; element3 : type3; ... ]
    An option holds only one of the specified types. In this example, it can hold either an element of the type type1 or an element of the type type2 or an element of the type type3.
There’s an operator “is” that tests if the option holds a specified value. For example “o is element2” returns “true” if the option holds the value “element2”.
There’s an operator “ord” that returns the ordinal number of the value that the option holds (starting from 0). For example, if “o” holds the value “element3”, then “ord o” returns the value 2.
The following composite types are defined in the standard library:
bytes
    An alias for list(byte).
string
    An alias for list(char).
maybe(t)
    It can hold either the value of type t or nothing. It is declared as:
    option maybe(t : type) [
        j : t;
        n;
    ]
tuple2(t1, t2)
    A tuple holding 2 values of types t1 and t2.
tuple3(t1, t2, t3)
    A tuple holding 3 values of types t1, t2 and t3.
tuple4(t1, t2, t3, t4)
    A tuple holding 4 values of the specified types.
tuple5(t1, t2, t3, t4, t5)
    A tuple holding 5 values of the specified types. If you need larger tuples, you must declare them on your own with a record type.
treemap(key_type, value_type)
    A key-value store with the specified key type and value type. It is implemented as an AVL tree.
treeset(key_type)
    A set containing values of the specified type. It is implemented as an AVL tree.
heap(key_type)
    A binary heap that can quickly insert an element or extract the lowest element. It is implemented as a list.
unit_type
    This type may hold only one value, unit_value.
bottom_type
    This type can’t hold any value, it can only hold exceptions. It is used for functions that never return (for example for message loops). It is declared as:
    option bottom_type [
    ]
Operators
| Operator | Priority | Description |
|---|---|---|
| Unary + | 1000 | It just returns the passed value |
| Unary - | 1000 | Negation |
| * | 2000 | Multiplication |
| / | 2000 | Floating point division |
| div | 2000 | Integer division |
| mod | 2000 | Integer remainder |
| + | 3000 | Addition (or append when applied to lists) |
| - | 3000 | Subtraction |
| x +< y | 3000 | Append a value y to the list x |
| shl | 4000 | Bit shift left |
| shr | 4000 | Bit shift right |
| rol | 4000 | Bit rotation left |
| ror | 4000 | Bit rotation right |
| x bts y | 4000 | Set y-th bit in x |
| x btr y | 4000 | Clear y-th bit in x |
| x btc y | 4000 | Invert y-th bit in x |
| x bt y | 4000 | Test if y-th bit in x is set |
| Unary bswap | 4000 | Reverse bytes in a number |
| Unary brev | 4000 | Reverse bits in a number |
| Unary bsf | 4000 | Finds the lowest set bit |
| Unary bsr | 4000 | Finds the highest set bit |
| Unary popcnt | 4000 | Count the number of set bits |
| Unary is_negative | 5000 | Test if a real number is negative |
| Unary is_infinity | 5000 | Test if a real number is infinite |
| Unary is_exception | 5000 | Test if a value is an exception |
| = | 6000 | Test for equality |
| <> | 6000 | Test for non-equality |
| > | 6000 | Test if the first argument is greater |
| >= | 6000 | Test if the first argument is greater or equal |
| < | 6000 | Test if the first argument is less |
| <= | 6000 | Test if the first argument is less or equal |
| not | 7000 | Logical negation |
| and | 8000 | Logical and |
| xor | 9000 | Logical exclusive or |
| or | 10000 | Logical or |
| ==> | 11000 | Logical implication |
If we pass different types to an operator, the second argument is converted to the type of the first argument. For example, 2.5 + 1 will return a floating point value 3.5, while 1 + 2.5 will return an integer value 3.
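The rule that the first operand decides the result type can be modelled in a few lines. This Python sketch is only an illustration: add_like_first is a hypothetical helper, and Python's int() truncates toward zero, which matches the single example above (2.5 becomes 2) but is an assumption about the conversion in general.

```python
def add_like_first(a, b):
    """Convert b to the type of a, then add."""
    return a + type(a)(b)


float_first = add_like_first(2.5, 1)   # 1 is converted to 1.0
int_first = add_like_first(1, 2.5)     # 2.5 is converted to 2
```

The operand order therefore matters: swapping the arguments changes both the type and the value of the result.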
Clauses
Every Ajla source file consists of clauses. This is the list of the clauses:
fn
    Declares a function. For example:
    fn maximum(a b : int) : int := select(a < b, a, b);
    or
    fn maximum(a b : int) : int
    [
        if a < b then
            return b;
        else
            return a;
    ]
operator
    Declares an operator with a given priority. For example, this declares a unary postfix operator “!” that calculates a factorial:
    operator postfix ! 1000 (n : int) : int
    [
        var v := 1;
        for i := 2 to n + 1 do
            v *= i;
        return v;
    ]
const
    Declares a constant. A constant is a function that has no arguments. For example:
    const ten := 10;
    const hello := `Hello`;
type
    Declares a type. For example type byte := uint8; declares the type “byte” as an alias to “uint8”.
record
    Declares a record. For example:
    record person [
        name : string;
        surname : string;
        age : int;
    ]
option
    Declares an option. For example:
    option bool [
        false;
        true;
    ]
uses
    Imports a unit from the standard library or from the program directory. For example “uses heap;” imports the file “stdlib/heap.ajla”.
define
    Defines a macro. See for example “define int_instance” from stdlib/system.ajla.
Function, const and type declarations may be prefixed with:
private
    This declaration is only usable in the unit where it appears. It will not be imported to other units.
implicit
    If you pass fewer arguments to a function than what was specified in the function header, the compiler will attempt to infer the remaining arguments. The “implicit” keyword makes this function a candidate for inferring.
conversion
    This function converts one type to another. If there is a type mismatch, the compiler will scan all the “conversion” functions and try to resolve the mismatch automatically by adding the appropriate conversion.
Automatic parallelization
Let’s have a look at this program that does Fibonacci number calculation. It is deliberately written in an inefficient recursive way.
fn fib(n : int) : int
[
    if n <= 1 then
        return n;
    else
        return fib(n - 2) + fib(n - 1);
]
fn main
[
    var x := ston(args[0]);
    var f := fib(x);
    write(h[1], "fib " + ntos(x) + " = " + ntos(f) + nl);
]
If you run this program with some higher value, for example 45, you will notice that all the cores are busy. That’s because Ajla does automatic parallelization.
How does automatic parallelization work? We should parallelize only functions that take a long time. If we parallelized every function call, the overhead of the parallelization would cause a massive slowdown.
Ajla scans the stack every tick (by default, the tick is 10ms, it can be changed with the --tick argument). If some function stays on the stack for two ticks, it took long enough and it can be parallelized. For example, suppose that “Frame 4” in this diagram is there for 2 timer ticks. The stack is broken down into two stacks and both of these stacks are executed concurrently. The function “Frame 3” in the upper stack needs some return value, but we don’t know the return value yet (the return value would be returned by the topmost function in the lower stack) — so we return a structure called thunk to the upper stack. If the lowermost function in the upper stack attempts to evaluate the thunk, it waits for the lower stack to finish (in this situation, parallelization is not possible). If the lowermost function in the upper stack doesn’t attempt to evaluate the thunk, both stacks run concurrently.
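The role of the thunk can be approximated with a future. The sketch below is a loose Python analogy, not a description of the Ajla virtual machine: it submits the work up front instead of splitting an existing stack after two ticks, and lower_stack and upper_stack are hypothetical names. It does show the two cases from the paragraph above: the caller keeps running as long as it does not look at the placeholder, and blocks as soon as it does.

```python
from concurrent.futures import ThreadPoolExecutor


def lower_stack(n):
    """Stand-in for the long-running frames that were split off."""
    return sum(i * i for i in range(n))


def upper_stack(pool, n):
    """Stand-in for the caller that continues while the result is pending."""
    thunk = pool.submit(lower_stack, n)   # placeholder for the missing return value
    unrelated = n + 1                     # work that never looks at the thunk
    return unrelated, thunk


with ThreadPoolExecutor(max_workers=2) as pool:
    unrelated, thunk = upper_stack(pool, 1000)
    forced = thunk.result()               # evaluating the thunk waits for the lower stack
```

If upper_stack called thunk.result() immediately after submitting, the two pieces would run one after the other. That corresponds to the situation in which parallelization is not possible.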
Automatic parallelization can be disabled with the “--strict-calls” switch.
Caching
Let’s have a look at the Fibonacci number example again:
fn fib~cache(n : int) : int
[
    if n <= 1 then
        return n;
    else
        return fib(n - 2) + fib(n - 1);
]
fn main
[
    var x := ston(args[0]);
    var f := fib(x);
    write(h[1], "fib " + ntos(x) + " = " + ntos(f) + nl);
]
We added a ~cache specifier to the function fib. Because Ajla is purely functional, every function will return the same value if the same arguments are supplied. Thus, we can cache the return values. This is what the ~cache specifier does.
Now, you can see that you can pass large values to the function and the function will complete quickly. That’s because the Ajla virtual machine remembers what value was returned for what argument and if you call the function again with the same argument, it will just return a cached value.
In this example, the caching just turned an algorithm with O(2^n) complexity into an algorithm with O(n log n) complexity. The cache is implemented as a red-black tree, so operations on it have logarithmic complexity.
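The effect on the number of calls is easy to measure in a model. The following Python sketch uses functools.lru_cache as a stand-in for ~cache; note that lru_cache is backed by a hash table rather than a red-black tree, so it illustrates the drop in call count, not the logarithmic lookup cost. The calls dictionary and the two function names are hypothetical.

```python
from functools import lru_cache

calls = {"plain": 0, "cached": 0}


def fib_plain(n):
    calls["plain"] += 1
    if n <= 1:
        return n
    return fib_plain(n - 2) + fib_plain(n - 1)


@lru_cache(maxsize=None)
def fib_cached(n):
    calls["cached"] += 1          # counts only the calls that miss the cache
    if n <= 1:
        return n
    return fib_cached(n - 2) + fib_cached(n - 1)


plain_result = fib_plain(20)
cached_result = fib_cached(20)
```

Both versions return 6765, but the plain one executes its body 21891 times while the cached one executes it 21 times, once per distinct argument from 0 to 20.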
Functions
Function call specifiers
Ajla has the following function call specifiers:
~normal
    Default: attempt to parallelize after two timer ticks
~strict
    Don’t attempt to parallelize
~spark
    Parallelize immediately
~lazy
    Evaluate when needed (like in Haskell)
~inline
    Inline the function (i.e. insert it into the caller)
~cache
    Cache the results
~save
    Cache the results and save them to ~/.cache/ajla/
The specifiers may be specified either at function declaration or at function call. If different specifiers are specified at function declaration and at function call, the specifier from the function call wins.
~spec, ~nospec
    ~spec can be specified in a function declaration after a parameter name. It makes the function specialize on this parameter. If we pass a constant as this parameter, a new version of the function is created and the constant is inlined into it.
    This can also be specified at a function call (after the argument expression), in order to specialize just this call. The ~nospec specifier may be specified at a function call and it undoes the specialization.
Nested functions
Functions may be nested. The nested function has access to variables of the parent function that were visible at the point where the nested function was declared. If the variable is later changed in the parent function, the change is not propagated to the nested function. If the variable is later changed in the nested function, the change is not propagated to the parent function. This is an example of a nested function:
fn main
[
    fn sum(a b : int) : int
    [
        return a + b;
    ]
    write(h[1], ntos(sum(10, 20)) + nl);
]
Nested functions can’t be recursive.
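The capture rule described above (the nested function sees the parent's variables as they were at the point of declaration) differs from what many languages do. Python closures, for example, observe later changes to the enclosing variable. The sketch below therefore freezes the value with a default argument to imitate the described behaviour. It is a model only, and the names main, scaled and rate are hypothetical.

```python
def main():
    rate = 10

    def scaled(x, rate=rate):    # the default argument copies the current value of rate
        rate = rate + 1          # a change inside the nested function stays local
        return x * rate

    rate = 99                    # a later change in the parent is not seen by scaled
    return scaled(2), rate


result = main()
```

The nested function computes 2 * 11 from its own snapshot, and the parent still sees 99, so neither side's later assignment reaches the other.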
Lambda functions
Lambda functions are anonymous functions that are declared inside an expression in the parent function. Like nested functions, they may use parent function variables that were visible when the lambda function was declared. This is an example of lambda functions:
fn main
[
    var l := [ 10, 15, 20, 25, 30, 35, 40 ];
    l := list_filter(l, lambda(x : int) [ return not x bt 0; ]);
    var add := 1;
    l := map(l, lambda(x : int) [ return x + add; ]);
    var m := map(l, lambda(x : int) [ return ntos(x); ]);
    write(h[1], list_join(m, nl));
]
We start with a list of seven elements: 10, 15, 20, 25, 30, 35, 40. The function “list_filter” takes a list and a function that returns a Boolean value and returns the elements for which the function returned "true". In this example, it selects even numbers (the operator “bt 0” tests if bit 0 is set). So, the list has now only four elements: 10, 20, 30, 40. The function “map” takes a list and a function, applies the function to every element of the list and returns the list of the results. In this example, the first “map” function will add 1 to every element of the list. The next “map” takes a list and applies the “ntos” function to every element of the list, i.e. it converts the list of numbers to the list of byte strings. The “list_join” function joins the byte strings and separates them with the second argument, which is a newline. The program will print this:
11
21
31
41
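For readers who know filter and map from other languages, the same pipeline can be written in Python as a cross-check of the expected output. This is a transliteration for comparison only; the variable names are hypothetical and the bit test is written as x & 1.

```python
numbers = [10, 15, 20, 25, 30, 35, 40]

# Keep the elements whose bit 0 is clear, i.e. the even numbers.
evens = list(filter(lambda x: not (x & 1), numbers))

add = 1                                                # captured by the lambda below
incremented = list(map(lambda x: x + add, evens))
lines = list(map(str, incremented))                    # numbers to strings
output = "\n".join(lines)                              # separator between the items
```

The intermediate list evens holds 10, 20, 30, 40, and the joined result contains the four lines shown above.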
Currying
Currying is the operation where we take a function, pass fewer arguments to the function than what was specified in the function header and create a new function that takes the remaining arguments. In Ajla, currying is done by passing empty arguments from the right end of the argument list.
fn main
[
    fn sum(a b : int) : int
    [
        return a + b;
    ]
    var add_ten := sum(10,);
    write(h[1], ntos(add_ten(20)) + nl);
]
For example, here we have a function “sum” that takes two arguments and returns their sum. If we write “sum(10,)”, we create a new function that takes one argument and adds the value 10 to the argument. We assign this new function to the variable “add_ten”. Finally, we call “add_ten(20)”, which returns the value 30.
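The same idea exists in other languages under the name partial application. As a point of comparison, this Python sketch uses functools.partial to fix the first argument. It is an analogy to the “sum(10,)” syntax, not Ajla code, and sum_two is a hypothetical name chosen to avoid shadowing Python's built-in sum.

```python
from functools import partial


def sum_two(a, b):
    return a + b


add_ten = partial(sum_two, 10)   # fixes a = 10, leaves b open
result = add_ten(20)
```

As in the Ajla example, the fixed arguments are on the left and the remaining ones are supplied later.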
Lists
list(t) represents a list type whose elements have a type of t. All the elements in a list must have the same type.
var l := list(int).[ 10, 20, 30, 40 ];
    Creates a list with four members: 10, 20, 30, 40.
var l := [ 10, 20, 30, 40 ];
    Creates a list with four members: 10, 20, 30, 40. We can omit the type of the list; the type will be derived from the type of the first member.
var l := empty(int);
    Creates an empty list of integers.
var l := fill('a', 10);
    Creates a list with 10 elements equal to ‘a’.
var l := sparse('a', 1000000000);
    Functionally, it is equivalent to fill. But unlike fill, sparse creates a compressed list that consumes little memory even when it is very large. If you modify the compressed list, it will be stored as a b+tree with consecutive runs of the same value compressed into a single b+tree node. Sparse lists are slower than flat lists because the virtual machine has to walk the b+tree on every access.
var l := [ 10, 20, 30, 40 ] + [ 50, 60, 70, 80 ];
    Append two lists.
var l := [ 10, 20, 30, 40 ] +< 50;
    Append one value to a list.
var m := l[3];
    Pick a member at index 3. The indices start from 0.
l[3] := 100;
    Modify the list. If the list has a reference count different from 1, a copy of the list is created and modified. If the list has a reference count 1, it is modified in place.
var m := l[2 .. 4];
    Take a slice of the list, starting with member 2 (inclusive) and ending with member 4 (exclusive).
var m := l[ .. 4];
    Take a slice of the list, starting at the beginning of the list and ending with member 4 (exclusive).
var m := l[4 .. ];
    Take a slice of the list, starting with member 4 (inclusive) and ending at the list end.
var a := len(l);
    Get the length of the list.
var b := len_at_least(l, 10);
    Returns true if the length is 10 or more elements.
var b := len_greater_than(l, 10);
    Returns true if the length is greater than 10 elements.
len_at_least and len_greater_than are useful when dealing with infinite lists. We cannot use “if len(l) >= 10 then ...” on an infinite list, because it would attempt to evaluate the whole list, loop forever and exhaust memory. If we use “if len_at_least(l, 10) then ...”, the virtual machine will attempt to evaluate the first 10 entries and it returns true without attempting to evaluate further entries.
for i in [ 10, 20, 30, 40 ] do ...
    Iterate over a list. The loop body will be executed 4 times, with i being 10, 20, 30 and 40.
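Why len_at_least terminates on an infinite list while len does not can be seen in a small model with lazy iterators. The following Python sketch re-implements the two tests on top of itertools.islice. The function names are borrowed from the text, but these bodies are hypothetical and say nothing about how the Ajla standard library implements them.

```python
from itertools import count, islice


def len_at_least(iterable, n):
    """True if the iterable yields at least n items; looks at no more than n."""
    return sum(1 for _ in islice(iterable, n)) >= n


def len_greater_than(iterable, n):
    """True if the iterable yields more than n items; looks at no more than n + 1."""
    return len_at_least(iterable, n + 1)


on_infinite = len_at_least(count(), 10)   # count() never ends, yet this returns
```

Only a bounded prefix is ever evaluated, which is exactly the property that makes the test safe on an infinite sequence.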
Infinite lists
This is an example that creates an infinite list, iterates over it and prints the result.
fn inf_list~lazy(i : int) : list(int)
[
    return [ i ] + inf_list(i + 1);
]
fn main
[
    var l := inf_list(0);
    for e in l do
        write(h[1], ntos(e) + nl);
]
Note that we must not use len(list) because it would force evaluation of the whole list; such an evaluation never finishes and it exhausts memory.
Infinite lists can be created with these functions:
infinite(10)
    Creates an infinite list containing the values 10.
infinite_repeat([ 1, 2, 3])
    Creates an infinite list containing the values 1, 2, 3, 1, 2, 3, 1, 2, 3 ... etc.
infinite_uninitialized(int)
    Creates an infinite list with all members being exceptions. It may be useful to create associative arrays. You can test if a member is uninitialized with the function is_uninitialized.
Arrays
array(t, shape) represents an array type whose elements have a type of t. shape is a list of integers that represents dimensions of the array.
var a := array_fill(1, [ 3, 3, 3 ]);
    Creates a three-dimensional array and fills it with the value 1.
var a := array_sparse(1, [ 3, 3, 3 ]);
    Functionally, it is equivalent to array_fill. But it creates a compressed array.
m := a[0, 1, 2];
    Pick a value at a given index.
a[0, 1, 2] := 100;
    Modify a value at a given index.
list_to_array
    Converts a list to an array.
array_to_list
    Converts an array to a list.
Note: arrays are just syntactic sugar for lists. Internally, the virtual machine treats arrays as if they were lists.
Strings
Ajla has two kinds of strings. Byte strings are represented by the type “bytes” which is an alias for “list(byte)” which is an alias for “list(uint8)”. Character strings are represented by the type “string”, which is an alias for “list(char)” which is an alias for “list(int32)”.
Character constants are specified using single quotes, for example 'C'. Byte string constants are specified using quotation marks, for example "hello". String constants are specified using backquotes, for example `Hello`. String constants are always treated as UTF-8, regardless of the system locale, so that we get a consistent result if the user moves the source file between systems with different locales.
For byte strings, the characters are stored in the system-defined locale. It is usually UTF-8, but it may be different, depending on the operating system and the “LANG” variable.
The character strings are stored in Unicode. They use the “int32” type, which is an arbitrary-precision integer. If there are Unicode combining characters, they are not stored as separate characters; they are superimposed on the character they belong to. In the unit charset, there is “const combining_shift : char := 21”, which means that a combining character is shifted by 21 bits to the left and added to the base character. This is done so that text editors can treat each “char” as one visible character and don’t have to deal with combining characters in their logic.
The unit charset (stdlib/charset.ajla) contains the conversion routines between ASCII, UTF-8, locale-specific encodings and strings. If we want to write or read strings, we need to convert them to or from bytes using the system locale. The system locale is obtained with the function locale_init or locale_console_init. These functions are almost equivalent; the only difference is on Windows 9x, where locale_init returns the ANSI character set and locale_console_init returns the OEM character set. On Windows NT, both of these functions return the UTF-8 locale and the Ajla runtime translates UTF-8 names to the UTF-16 names that are used by the Windows NT syscalls. On Unix-based systems, both of these functions return the character set selected by the variables “LC_ALL”, “LC_CTYPE” or “LANG” and there is no translation of byte strings when they are passed to syscalls.
For example, this program converts my name to the system locale and prints it:
uses charset;
fn main
[
var loc := locale_console_init(env);
write(h[1], string_to_locale(loc, `Mikuláš Patočka`) + nl);
]
The first statement loads the current locale based on the environment variables that are set. The function string_to_locale will convert the string to bytes represented in the current locale. It will work not only on a UTF-8 system, but on an ISO-8859-2 system as well. If the system locale doesn’t have the characters ‘á’, ‘š’ or ‘č’, they are converted to appropriate ASCII characters.
Exceptions
Because Ajla can parallelize or reorder function calls, exceptions as we know them from Java or C++ wouldn’t be useful because they could be triggered at random points. Exceptions in Ajla are implemented differently. An exception is just a special value that can be stored in any variable.
For example “var x := 0 div 0;” will store the “invalid operation” exception into the variable x.
If we don’t use the variable x, the exception is quietly discarded.
If we perform arithmetic using the exception, the exception is propagated. For example, if we execute “var y := x + 1;”, the variable y will hold the exception as well. There is one exception to this rule: the operators “and” and “or” don’t propagate the exception when the other argument already determines the result. “false and exception” or “exception and false” evaluates to “false”. “true or exception” or “exception or true” evaluates to “true”.
If we attempt to perform a conditional branch that depends on the exception value, the current function is terminated and the exception is returned to the caller. For example “if x = 3 then something;” will terminate the current function.
There’s an operator “is_exception” that returns true if the argument is an exception and that returns false otherwise. It allows us to “handle” the exception. For example, we could write this code to report the exception to the user:
if is_exception x then
write(h[1], "Exception occurred" + nl);
else
write(h[1], "There's no exception, the value is " + ntos(x) + nl);
Floating point “NaN” values are treated like exceptions — is_exception will return true if the value is a NaN.
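The propagation rules described above can be tried out in one small program. This is a sketch built only from the constructs already shown in this section; the parentheses around the comparison are added because operator precedence is not covered here, and the use of curly braces for comments is an assumption.

```ajla
fn main
[
    var x := 0 div 0;
    { arithmetic on an exception yields an exception }
    var y := x + 1;
    if is_exception y then
        write(h[1], "y holds the exception propagated from x" + nl);
    { "false and exception" evaluates to false, so b is an ordinary boolean }
    var b := false and (x = 3);
    if is_exception b then
        write(h[1], "b holds an exception" + nl);
    else
        write(h[1], "b is an ordinary boolean" + nl);
]
```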
Every exception contains three values:
Class
One of ec_sync, ec_async, ec_syscall or ec_exit.
ec_sync
The exception happened due to execution of the program. For example, an invalid numeric calculation or an index outside the array or list bounds.
ec_async
The exception happened due to conditions not related to the program. For example, memory allocation failure falls into this category.
ec_syscall
The exception happened because some syscall failed.
ec_exit
The exception holds the return value that should be returned when the program exits.
Type
This is an exception code. Exception types are listed in the file stdlib/ex_codes.ajla; see the constants “error_*”.
Code
This is an auxiliary value. Its meaning depends on the exception type.
Type: error_system
Code is one of the system_error_* values.
Type: error_errno
Code is the errno value.
Type: error_os2
Code is the OS/2 error number.
Type: error_os2_socket
Code is the OS/2 socket error number.
Type: error_win32
Code is the Windows error number.
Type: error_h_errno
Code is the h_errno error number.
Type: error_gai
Code is the getaddrinfo return value.
Type: error_subprocess
Code is the subprocess exit number; if the code is negative, it is the number of the signal that terminated the subprocess.
Type: error_exit
Code is the return value that should be returned from the current process.
Note that because different systems (POSIX, OS/2, Windows) have different error codes, Ajla tries to translate common error codes to one of the system_error_* values. For example, a “file exists” error gets translated to system_error_eexist, so that the program that tests for it can be portable. However, not all error codes could be translated and if an unknown error code is received, it is reported as error_errno, error_os2 or error_win32 depending on the operating system.
Additionally, exceptions may contain an optional error string and an optional stack trace, so that the user can determine where the exception happened.
Operators that examine exceptions
is_exception
Returns true if the argument is an exception.
exception_class
Returns the class of an exception.
exception_type
Returns the type of an exception.
exception_aux
Returns the auxiliary value of an exception.
exception_string
Returns a string representing the type and aux values; you can use it to display the exception to the user.
exception_payload
Returns the raw string attached to the exception.
exception_stack
Returns the stack trace attached to the exception.
Functions that manipulate exceptions
The unit exception (located in the file stdlib/exception.ajla) contains the following functions:
exception_make
Makes an exception with the given class, type and code and an optional stack trace.
exception_make_str
Makes an exception with the given class, type, code and string and an optional stack trace.
exception_copy
Copies the exception from a variable s that has the type src_type to the return value that has the type dst_type. The variable s must hold an exception; if it does not, an invalid operation exception is returned.
Statements that manipulate exceptions
eval expression
Evaluates a given expression (or several expressions separated by commas) and discards the result. It may be used to print debugging messages, for example eval debug("message"). The debug statement writes the message to the standard error handle.
xeval expression
Evaluates a given expression (or several expressions separated by commas). If the result is not an exception, the result is discarded. If the result is an exception, the current function is terminated and the exception is returned as its return value.
abort
Terminates the current function with ec_sync, error_abort.
abort expression
Evaluates a given expression (or several expressions separated by commas). If the result is an exception, the current function is terminated and the exception is returned as its return value. If the result is not an exception, the current function exits with ec_sync, error_abort. It may be used with the statement “internal” to terminate the whole process if some internal error happens, for example abort internal("this shouldn't happen").
keep variables
Doesn’t evaluate the variables; it just marks them as live, so that the optimizer won’t discard them.
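As an illustration of xeval, the sketch below returns an exception to the caller as soon as it is detected. The function percent is a hypothetical helper invented for this example, and the use of curly braces for comments is an assumption.

```ajla
{ percent is a hypothetical helper used only for this illustration }
fn percent(part : int, total : int) : int
[
    var q := (part * 100) div total;
    { if q holds an exception, leave the function now and return it }
    xeval q;
    return q;
]

fn main
[
    var p := percent(0, 0);
    if is_exception p then
        write(h[1], "Exception occurred" + nl);
    else
        write(h[1], "The value is " + ntos(p) + nl);
]
```

Here the xeval statement only makes the early exit explicit, since returning q would hand the exception to the caller anyway. It matters more when later statements in the function should be skipped once an exception has appeared.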
Syntax errors
Note that syntax errors are also treated as exceptions: if a function with a syntax error is never called, the error is ignored; if it is called, the exception is returned as its return value. In this program, we define a function “syntax_error” that contains a syntax error:
fn syntax_error : int
[
bla bla bla;
]
fn main
[
var q := syntax_error;
if is_exception q then [
write(h[1], "Exception happened" + nl);
]
]
This program will write: “Exception happened”.
Reporting exceptions lazily when the function is called may not be useful during program development — when you are developing a program, it is recommended to use the “--compile” flag. It will attempt to compile all the functions in the program and it will report an error if any of them fails.
I/O
We have already seen the function write, which performs I/O. Let’s have a look at other I/O functions. I/O functions take and return a value of type world; this ensures that they are properly sequenced and that they are not reordered or executed in parallel. I/O functions take a handle on which the I/O is to be performed. handle represents a handle to a file (or pipe, character device, block device, socket). dhandle represents a handle to a directory. Handles are automatically closed when the handle variable is no longer referenced by the program.
The function main receives a dhandle argument that is the handle to the current working directory and a list of handle values that represents the standard input, output and error streams.
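The earlier examples write to h[1]. As a small sketch, the program below writes to both output streams. It assumes that the list is indexed in the order given above, so that h[0] is the standard input, h[1] the standard output and h[2] the standard error stream, and that curly braces delimit comments.

```ajla
fn main
[
    { h[1]: standard output }
    write(h[1], "normal output" + nl);
    { h[2]: standard error, assuming the order input, output, error }
    write(h[2], "diagnostic output" + nl);
]
```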
Handles may be manipulated in three modes:
Read mode
The handle is being read sequentially.
Write mode
The handle is being written sequentially.
Block mode
You can perform read and write operations on arbitrary offsets in the file. This mode only works for files and block devices.
You shouldn’t mix these modes on a single file because it may result in bugs. For example, some operating systems have the functions pread and pwrite, and the Ajla runtime uses them when doing I/O on a handle in block mode. However, other operating systems don’t have these functions, so the runtime will lock the file handle, perform lseek, perform read or write and unlock the file handle. If you mixed block mode with read mode, the read mode would read from invalid file offsets that were set up by the block mode.
The I/O functions are defined in the unit io. This unit is automatically included in the main program. If you need I/O in other units, you must import the io unit explicitly.
fn ropen(w : world, d : dhandle, f : bytes, flags : int) : (world, handle);
Opens a file in read mode and returns a handle to the file. w is the world token, d is the base directory that is used for file name lookup, f is the file name, flags is one of the open_flag_* flags. For this function, only the flag open_flag_no_follow is allowed.
fn read(w : world, h : handle, size : int) : (world, bytes);
Reads the specified number of bytes from the handle. If end-of-file is detected, it returns fewer bytes. If not enough bytes are available (in the case of a pipe or a character device), the function will sleep until the specified number of bytes is read.
fn read_partial(w : world, h : handle, size : int) : (world, bytes);
Reads the specified number of bytes from the handle. If not enough bytes are available, the function returns fewer bytes. If no bytes are available, the function sleeps until at least one byte is returned.
fn wopen(w : world, d : dhandle, f : bytes, flags : int, mode : int) : (world, handle);
Opens a file in write mode and returns a handle to it. flags contains one or more of open_flag_append, open_flag_create, open_flag_must_create, open_flag_no_follow. open_flag_append specifies that we want to append to the file rather than overwrite it. open_flag_create specifies that we want to create the file if it doesn’t exist. open_flag_must_create specifies that we want to fail with an error if the file exists. open_flag_no_follow suppresses the dereferencing of the last symlink in a file name. mode represents the permissions of the file if it is created; it may be open_mode_ro_current_user, open_mode_ro_all_users, open_mode_rw_current_user, open_mode_read_all_users, open_mode_default or another value.
fn write(w : world, h : handle, s : bytes) : world;
Writes the bytes to the write handle.
fn wcontiguous(w : world, h : handle, size : int64) : world;
Allocates contiguous space for size bytes. Only some operating systems (Linux and OS/2) support preallocation of file data. If the operating system doesn’t support it, the function returns success and does nothing.
fn pipe(w : world) : (world, handle, handle);
Creates a pipe and returns two handles. The first handle is used for reading from the pipe and the second handle is used for writing to the pipe.
fn bopen(w : world, d : dhandle, f : bytes, flags : int, mode : int) : (world, handle);
Opens a file in block mode. flags is a combination of open_flag_*. mode represents the permissions of the file if it is created.
fn bread(w : world, h : handle, position : int64, size : int) : (world, bytes);
Reads bytes from the specified position. If end-of-file is encountered, the function returns fewer bytes.
fn bwrite(w : world, h : handle, position : int64, s : bytes) : world;
Writes bytes to the specified position. If we write beyond the end of the file, the file is extended.
fn bsize(w : world, h : handle) : (world, int64);
Returns the size of the file.
fn bdata(w : world, h : handle, off : int64) : (world, int64);
Skips over a hole in the file. Returns the next offset at which data are allocated. Some filesystems do not support holes; for them this function returns off.
fn bhole(w : world, h : handle, off : int64) : (world, int64);
Skips over data in the file. Returns the next offset at which there is a hole in the file. Some filesystems do not support holes; for them this function returns the size of the file.
fn bsetsize(w : world, h : handle, size : int64) : world;
Truncates a file to the specified size or extends it.
fn bcontiguous(w : world, h : handle, pos : int64, size : int64) : world;
Allocates contiguous space at the specified offset. Some operating systems do not support file preallocation; for them, this function returns success without doing anything.
fn bclone(w : world, src_h : handle, src_pos : int64, dst_h : handle, dst_pos : int64, size : int64) : world;
Clones a byte range from src_h starting at src_pos to dst_h starting at dst_pos. Only some filesystems support cloning; if the filesystem doesn’t support it, an error is reported.
fn droot(w : world) : dhandle;
Returns a handle to the root directory. On Windows or OS/2, it returns a handle to C:\.
fn dnone(w : world) : dhandle;
Returns an invalid directory handle. It is useful if you want to open a file and you know that the file path is absolute. In that case, you can pass dnone() to ropen, wopen, bopen or dopen.
fn dopen(w : world, d : dhandle, f : bytes, flags