Playing Around with ARM Assembly

I started out programming in C, but had no idea what I was doing at the time. Since then, I’ve gone on to a career as a backend developer. But I’ve always missed the simplicity of C, and the awareness of memory and hardware that it gives you.

For fun, I decided to dive back into low level programming, and write some C mixed with some handwritten assembly.

Like any good side project, I started out by spending hours setting up my dev environment so that everything was just right.

I’m more familiar with bash than with Make, even though I know Makefiles are very popular in the C community.

After several iterations I ended up with a very simple bash script containing all of the dev commands needed:

test - build and run test suite
fmt - autoformatter
clean - delete build artifacts

This isn’t going to production or anything, so that’s all I need!

Here’s the “test” command I use to build and run my test suite:

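test() {
    clean
    # printf handles \n the same way in every shell; bash's echo prints it literally by default
    printf '\n---Building and running tests---\n\n'
    mkdir -p "$BUILD_DIR"
    # $CC and $CFLAGS stay unquoted so they split into separate words
    $CC $CFLAGS -c "$ASM_DIR"/*.s
    mv *.o "$BUILD_DIR"/
    $CC $CFLAGS "$SRC_DIR"/*.c "$TEST_DIR"/*.c "$BUILD_DIR"/*.o -o "$TARGET"
    ./build/app
}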

With this I can just run ‘test’ in my terminal, and it will build and link all of my C and assembly files, and then run the test executable!

One more fun note about my build process is the compiler flags I used.

I knew from past experience that the C community has a strange relationship with the C standard. Most languages have one major implementation, and most learning resources are centered around the newest version of that language (ok, not SQL, you got me there). Meanwhile C has many compiler implementations, and as far as I’m aware they all have some slight differences in what they allow and what features they support. Don’t even get me started on standard libraries, because I’m still confused by that situation. Anyways, it also seems like a lot of people prefer to use older versions of C, presumably for their portability. The classic version of C that I hear about most often is C89, so I decided to stick with that.

With my dev environment set up, I didn’t want to jump straight to assembly, so I started out by writing a few simple data structures in C.

You can see the full implementation of all the data structures I made here: GitHub Repo

I won’t go over it in too much detail, but I am amazed at how productive C can be even today, even for someone who admittedly is not very experienced with it. Especially when you consider the context of when the language was created, when lots of programmers handwrote assembly, I can’t imagine how productive it must have felt back then. I write a fair amount of Python, and the difference in productivity between Python and C is much, much smaller than the difference between C and assembly. I feel like with more experience, I could start to become almost as productive as I am with Python. Of course, this ignores the realities of production, where I would care a lot more about memory bugs and security issues. But while I’m in my honeymoon phase with C, it feels great!

Here’s a snippet of my hash map implementation:


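typedef enum { OK, ERR } ResultTag;

/* Tagged union: check tag first, then read ok or err. */
/* Note: the unnamed union is a C11 feature; in c89 mode */
/* compilers typically accept it as an extension. */
typedef struct ResultInt {
    ResultTag tag;
    union {
        int ok;
        const char* err;
    };
} ResultInt;

ResultInt map_get(Map map, int key) {
    int index = _hash(key, (int)map.size);
    ResultInt r;
    /* Each slot holds the head of a linked list of colliding keys. */
    MapNode* node = map.backing_array[index];
    while (node != 0) {
        if (node->key == key) {
            r.tag = OK;
            r.ok = node->val;
            return r;
        }
        node = node->next;
    }
    /* Reached the end of the chain without a match. */
    r.tag = ERR;
    r.err = "Not found";
    return r;
}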

This function finds the value for a given key. The hash map has a backing array that is initialized to a user-provided size, and collisions are handled by chaining nodes into a linked list in each slot. The code calculates the hash, looks up the slot in the array, and walks the linked list until it finds a node with a matching key or runs out of nodes. It returns a tagged union so that the caller can check whether the key was present before reading the value. This leaves error handling up to the caller.

It’s not the most cache-efficient hash map design. But it’s super simple, since the array never needs to be resized.
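To make the tagged-union idea concrete, here is a minimal sketch of how a caller might use map_get. Only ResultInt and the body of map_get come from the post. The MapNode and Map layouts, hash_key, map_new, and map_put are assumptions invented for this example, and the real definitions in the repo may well look different.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { OK, ERR } ResultTag;

typedef struct ResultInt {
    ResultTag tag;
    union {
        int ok;
        const char* err;
    };
} ResultInt;

/* Assumed layouts: the post does not show these definitions. */
typedef struct MapNode {
    int key;
    int val;
    struct MapNode* next;
} MapNode;

typedef struct Map {
    MapNode** backing_array;
    size_t size;
} Map;

/* Placeholder hash (non-negative modulo), not the author's _hash. */
static int hash_key(int key, int size) {
    int h = key % size;
    return h < 0 ? h + size : h;
}

/* Hypothetical constructor: every slot starts as an empty chain. */
static Map map_new(size_t size) {
    Map map;
    assert(size > 0);
    map.backing_array = calloc(size, sizeof(MapNode*));
    map.size = size;
    return map;
}

/* Hypothetical insert: overwrite an existing key or push a new node
   onto the front of the chain. Returns 0 on success, -1 on failure. */
static int map_put(Map map, int key, int val) {
    int index = hash_key(key, (int)map.size);
    MapNode* node = map.backing_array[index];
    while (node != 0) {
        if (node->key == key) {
            node->val = val;
            return 0;
        }
        node = node->next;
    }
    node = malloc(sizeof(MapNode));
    if (node == 0) {
        return -1;
    }
    node->key = key;
    node->val = val;
    node->next = map.backing_array[index];
    map.backing_array[index] = node;
    return 0;
}

/* Same logic as the post's map_get, using the placeholder hash. */
static ResultInt map_get(Map map, int key) {
    int index = hash_key(key, (int)map.size);
    ResultInt r;
    MapNode* node = map.backing_array[index];
    while (node != 0) {
        if (node->key == key) {
            r.tag = OK;
            r.ok = node->val;
            return r;
        }
        node = node->next;
    }
    r.tag = ERR;
    r.err = "Not found";
    return r;
}
```

With a map of size 4, keys 1 and 5 land in the same slot under this placeholder hash, so a lookup for 5 has to walk the chain. The caller checks the tag before touching the payload: read r.ok only when r.tag is OK, and r.err only when it is ERR. The sketch never frees its nodes, which is fine for a demo but not for anything long-running.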

After implementing a few data structures (stack, dynamic array, and hash map), I was ready for some assembly. In college I remember doing some projects in assembly, but we always ran it in emulators of some kind, so it never felt like I was really interacting with a CPU. It just felt pretend. It makes sense, since getting every student access to the same hardware is a lot less cost effective than just using an emulator, but for this project I really wanted to poke a real CPU.

I had a couple of false starts. I started out with inline assembly, which required switching from c89 to gnu89 (apparently c89 doesn’t have inline assembly). But the compiler started yelling at me the moment I wanted to use labels for loops. So I switched back to c89, and after trying about 6 different things I landed on the build script I shared earlier, which turns assembly files into object files and links them at build time. I’m working from an M-series MacBook, so of course I have to use 64-bit ARM assembly. Here was my first program:


.global _asm_add

_asm_add:
    // x0 is return value
    // x0, x1, etc are args in order
    add x0, x0, x1
    ret

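Yep, it just adds two numbers together. This was a good first test to make sure I could build and run my project. The “.global” directive exports the “_asm_add” symbol so the linker can see it, allowing me to call this function from C code! On macOS, C symbol names get a leading underscore in the object file, so the C side declares and calls it as “asm_add”.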

My next program was more ambitious, calculating the Nth Fibonacci number!


.global _asm_fib

_asm_fib:

    // fib numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34...
    // so each number is the sum of the last two numbers
    // all numbers 1 or below should return 1 (my rules)

    mov x1, #0 // last number
    mov x2, #1 // curr number

    cmp x0, #1
    ble exit

loop:
    add x3, x1, x2
    mov x1, x2
    mov x2, x3

    sub x0, x0, #1

    cmp x0, #0
    bgt loop

    // falls through to exit
exit:
    mov x0, x2
    ret
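One way to see exactly what the loop above computes is to mirror it line by line in C. This is only a sketch: fib_model is a made-up name, and it assumes the argument arrives as a signed 64-bit value, matching the signed comparisons (ble, bgt) on x0.

```c
/* C model of the assembly loop above.
   Register names in the comments refer to the assembly listing. */
long fib_model(long n) {
    long last = 0; /* x1 */
    long curr = 1; /* x2 */
    long next;     /* x3 */

    if (n <= 1) {  /* cmp x0, #1 / ble exit */
        return curr;
    }

    /* The assembly tests the counter at the bottom of the loop,
       so the body always runs at least once: a do-while. */
    do {
        next = last + curr; /* add x3, x1, x2 */
        last = curr;        /* mov x1, x2 */
        curr = next;        /* mov x2, x3 */
        n = n - 1;          /* sub x0, x0, #1 */
    } while (n > 0);        /* cmp x0, #0 / bgt loop */

    return curr;
}
```

Tracing it by hand, the body runs n times for n of 2 or more, so the inputs 0, 1, 2, 3, 4, 5 give 1, 1, 2, 3, 5, 8. In the 0, 1, 1, 2, 3, 5... numbering from the comment, that is the Fibonacci number at index n + 1.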

Now I have loops and conditionals, so we’re really in business! Poking at the hardware like this is really satisfying. I’m sure I have an off-by-one error here, but we’re just gonna say it’s Orion’s Fibonacci sequence because I’m happy with it. It was a challenge to get back into the mindset of writing assembly, but I got used to it faster than I thought.

My next challenge was to implement the FNV hash algorithm on a string, which involves reading from memory. I simplified it a bit by not using the exact constants that Wikipedia says to use, since they were too big to load with a single mov. So this is probably a sucky hash algorithm, but I still had fun! I’m guessing I have some overflow issues here depending on the input (the multiply silently wraps around at 64 bits), and I don’t really understand why this is a good hash algorithm, but that’s a later problem!


.global _asm_fnv_hash

_asm_fnv_hash:
    // args
    // x0 is pointer to string
    // x1 is len of string (the loop body always runs once, so len must be >= 1)

    // we return a number

    // x0 is the addr of the next char to read
    // x1 is the end addr (one past the last byte to read)
    // x2 holds the loaded byte
    // x3 holds result until the end
    // x4 holds the multiplier

    // end addr = start addr + len
    add x1, x1, x0
    mov x4, #97
    mov x3, #113
loop:

    // load next byte (ldrb zero-extends it into w2)
    ldrb w2, [x0]

    // increment address
    add x0, x0, #1

    mul x3, x3, x4
    // extend to wider register (redundant after ldrb, kept for clarity)
    uxtb x2, w2
    eor x3, x3, x2

    // check loop end
    cmp x0, x1
    blt loop

    mov x0, x3
    ret
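For comparison, here is a rough C model of the same loop, handy for checking the assembly's output from a test. The function names are invented for this sketch. It uses stdint.h, which is C99 rather than C89, and unlike the assembly it returns the starting value untouched for an empty string instead of reading one byte. The multiply-then-XOR order matches the FNV-1 variant, and the last function plugs in the standard 64-bit FNV-1 offset basis and prime in place of the post's small constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply, then XOR in the next byte (the FNV-1 ordering).
   Unsigned 64-bit multiplication wraps modulo 2^64, which is the same
   thing mul does on the x registers. */
uint64_t fnv1_with(const unsigned char* s, size_t len,
                   uint64_t basis, uint64_t prime) {
    uint64_t hash = basis;
    size_t i;
    for (i = 0; i < len; i++) {
        hash *= prime;           /* mul x3, x3, x4 */
        hash ^= (uint64_t)s[i];  /* eor x3, x3, x2 */
    }
    return hash;
}

/* The post's simplified constants: start at 113, multiply by 97. */
uint64_t fnv_post(const unsigned char* s, size_t len) {
    return fnv1_with(s, len, 113u, 97u);
}

/* Standard 64-bit FNV-1 parameters (offset basis and prime). */
uint64_t fnv1_64(const unsigned char* s, size_t len) {
    return fnv1_with(s, len, 0xcbf29ce484222325ULL, 0x100000001b3ULL);
}
```

For the one-byte string "a" (byte value 97), fnv_post computes 113 * 97 = 10961 and then XORs in 97, giving 10928. Neither standard constant fits in a single mov immediate, so in assembly they would have to be built up in pieces or loaded from memory.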

Anyways that’s all for now. Feel free to do whatever you want with my code. I encourage you to try to poke at your hardware a little bit if you’ve never done it. It’s a lot of fun!
