# Vulkan Graphics in OCaml vs. C

I convert my Vulkan test program from C to OCaml and compare the results, then continue the Vulkan tutorial in OCaml, adding 3D, textures and depth buffering.

Table of Contents

- Introduction
- Running it yourself
- The direct port
  - Labelled arguments
  - Enums and bit-fields
  - Optional fields
  - Loading shaders
  - Logging
  - Error handling
- Refactored version
  - Olivine wrappers
  - Using fibers / effects for control flow
  - Using the CPU and GPU in parallel
  - Resizing and resource lifetimes
- The 3D version
- Garbage collection
- Conclusions

## Introduction

In Investigating Linux graphics, I wrote a little C program to help me learn about GPUs by drawing a triangle. But I wondered if using OCaml instead would make my life easier. It didn’t, because there were no released OCaml Vulkan bindings, but I found some unfinished ones by Florian Angeletti. The bindings are generated mostly automatically from the Vulkan XML specification, and with a bit of effort I got them working well enough to continue with the Vulkan tutorial, which resulted in this nice Viking room:

(Image: Vulkan tutorial in OCaml)

In this post, I’ll be looking at how the C code compares to the OCaml. First, I did a direct line-by-line port of the C, then I refactored it to take better advantage of OCaml.

(Note: the Vulkan tutorial actually uses C++, but I’m comparing my C version to OCaml.)

## Running it yourself

If you want to try it yourself (note: it requires Wayland):

git clone https://github.com/talex5/vulkan-test -b ocaml
cd vulkan-test
nix develop
dune exec -- ./src/main.exe 200

As the OCaml Vulkan bindings (Olivine) are unreleased, I included a copy of my patched version in vendor/olivine. The dune exec command will build them automatically.

The ocaml branch above just draws one triangle. If you want to see the 3D room pictured above, use ocaml-3d instead:

git clone https://github.com/talex5/vulkan-test -b ocaml-3d
cd vulkan-test
nix develop
make download-example
dune exec -- ./src/main.exe 10000 viking_room.obj viking_room.png

## The direct port

Porting the code directly, line by line, was pretty straightforward:

(Screenshot: comparing the code with meld)

The code ended up slightly shorter, but not by much:

28 files changed, 1223 insertions(+), 1287 deletions(-)

This is only approximate; sometimes I added or removed blank lines, etc. Some things were a bit easier and others a bit harder. It mostly balanced out.

As an example, one thing that makes the OCaml shorter is that arrays are passed as a single item, whereas C takes the length separately. On the other hand, single-item arrays can be passed in C by just giving the address of the pointer, whereas OCaml requires an array to be constructed separately. Also, I had to include some bindings for the libdrm C library.

### Labelled arguments

The OCaml bindings use labelled arguments (e.g. the VK_TRUE argument in the screenshot above became ~wait_all:true in the OCaml), which is longer but clearer.

The OCaml code uses functions to create C structures, which looks pretty similar due to labels. For example:


```
const VkSemaphoreGetFdInfoKHR get_fd_info = {
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
    .semaphore = semaphore,
    .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
};
```


becomes:


```
let get_fd_info = Vkt.Semaphore_get_fd_info_khr.make ()
    ~semaphore
    ~handle_type:Vkt.External_semaphore_handle_type_flags.sync_fd
in
```

An advantage is that the `sType` field gets filled in automatically\.
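As a rough illustration of how such a constructor can fill in the structure type for you, here is a minimal sketch. The types below are hypothetical stand-ins, not Olivine's generated definitions, and the semaphore is just an `int`:

```ocaml
(* Hypothetical stand-ins for the generated types; not Olivine's real ones. *)
type structure_type = Semaphore_get_fd_info_khr
type handle_type = Sync_fd | Opaque_fd

type semaphore_get_fd_info = {
  s_type : structure_type;
  semaphore : int;            (* a real binding would use an opaque handle *)
  handle_type : handle_type;
}

(* The constructor takes labelled fields plus a unit argument and sets
   [s_type] itself, so callers cannot forget it or pass the wrong tag. *)
let make ~semaphore ~handle_type () =
  { s_type = Semaphore_get_fd_info_khr; semaphore; handle_type }

let get_fd_info =
  let semaphore = 42 in
  make () ~semaphore ~handle_type:Sync_fd
```

Because the application is complete, OCaml lets the labelled arguments follow the `()`, which is the layout used in the snippet above.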

### Enums and bit-fields

Enumerations and bit-fields are namespaced, which is a lot clearer as you can see which part is the name of the enum and which part is the particular value\. For example, `VK_ATTACHMENT_STORE_OP_STORE` becomes `Vkt.Attachment_store_op.Store`\. Also, OCaml usually knows the expected type and you can omit the module, so:


```
VkAttachmentDescription colorAttachment = {
    .format = format,
    .samples = VK_SAMPLE_COUNT_1_BIT,
    .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,  // Clear framebuffer before rendering
    .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
    .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
    .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    .finalLayout = VK_IMAGE_LAYOUT_GENERAL,
};
```

becomes


```
let color_attachment = Vkt.Attachment_description.make ()
    ~format:format
    ~samples:Vkt.Sample_count_flags.n1
    ~load_op:Clear  (* Clear framebuffer before rendering *)
    ~store_op:Store
    ~stencil_load_op:Dont_care
    ~stencil_store_op:Dont_care
    ~initial_layout:Undefined
    ~final_layout:General
in
```

Bit-fields and enums get their own types \(they're not just integers\), so you can't use them in the wrong place or try to combine things that aren't bit-fields \(and so the `_BIT` suffix isn't needed\)\. One particularly striking example of the difference is that


```
.colorWriteMask =
    VK_COLOR_COMPONENT_R_BIT |
    VK_COLOR_COMPONENT_G_BIT |
    VK_COLOR_COMPONENT_B_BIT |
    VK_COLOR_COMPONENT_A_BIT,
```

becomes


```
~color_write_mask:Vkt.Color_component_flags.(r + g + b + a)
```

The `Vkt.Color_component_flags.(...)` brings all the module's symbols into scope, including the `+` operator for combining the flags\.
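To show how a module can provide its own flag type and `+` operator, here is a small sketch. The module is hypothetical (it is not Olivine's generated code, and `mem` and `to_int` are invented for the example); only the bit values follow the Vulkan colour component bits:

```ocaml
(* A hypothetical flags module in the style described above. *)
module Color_component_flags : sig
  type t                      (* abstract: cannot be mixed with plain ints *)
  val r : t
  val g : t
  val b : t
  val a : t
  val ( + ) : t -> t -> t     (* union of two flag sets *)
  val mem : t -> t -> bool    (* [mem flag set]: is [flag] present in [set]? *)
  val to_int : t -> int
end = struct
  type t = int
  let r = 1
  let g = 2
  let b = 4
  let a = 8
  let ( + ) = ( lor )
  let mem flag set = flag land set = flag
  let to_int t = t
end

(* The local open brings [r], [g], [b], [a] and this module's [+] into scope. *)
let mask = Color_component_flags.(r + g + b + a)
```

Since `t` is abstract, an expression such as `Color_component_flags.r + 1` is rejected by the type checker rather than silently producing an invalid mask.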

### Optional fields

The specification says which fields are optional\. In C you can ignore that, but OCaml enforces it\. This can be annoying sometimes, e\.g\.


```
.blendEnable = VK_FALSE,
```

becomes


```
~blend_enable:false
~src_color_blend_factor:One
~dst_color_blend_factor:Zero
~color_blend_op:Add
~src_alpha_blend_factor:One
~dst_alpha_blend_factor:Zero
~alpha_blend_op:Add
```

because the spec says these are all non-optional, rather than that they are only needed when blending is enabled\.
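The difference between required and optional fields can be sketched with OCaml's two kinds of labelled argument. This is an illustrative, cut-down record with hypothetical types; treating `color_write_mask` as the optional field is an assumption made for the example:

```ocaml
type blend_factor = One | Zero
type blend_op = Add

type attachment_state = {
  blend_enable : bool;
  src_color_blend_factor : blend_factor;
  dst_color_blend_factor : blend_factor;
  color_blend_op : blend_op;
  color_write_mask : int option;   (* optional in this sketch *)
}

(* [~label] arguments are required: leaving one out means the call is not a
   complete application, so the result no longer has the record type.
   [?label] arguments may be omitted and then arrive as [None]. *)
let make ~blend_enable ~src_color_blend_factor ~dst_color_blend_factor
    ~color_blend_op ?color_write_mask () =
  { blend_enable; src_color_blend_factor; dst_color_blend_factor;
    color_blend_op; color_write_mask }

let no_blend =
  make ()
    ~blend_enable:false
    ~src_color_blend_factor:One
    ~dst_color_blend_factor:Zero
    ~color_blend_op:Add
```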

There's a similar situation with the Wayland code: the OCaml compiler requires you to provide a handler for all possible events\. For example, OCaml forced me to write a handler for the window `close` event \(and so closing the window works in the OCaml version, but not in the C one\)\. Likewise, if the compositor returns an error from `create_immed` the OCaml version logs it, while the C version ignored the error message, because the C compiler didn't remind me about that\.

### Loading shaders

Loading the shaders was easier\. The C version has code to load the shader bytecode from disk, but in the OCaml I used [ppx\_blob](https://github.com/johnwhitington/ppx_blob) to include it at compile time, producing a self-contained executable file:


```
load_shader_module device [%blob "./vert.spv"]
```

### Logging

OCaml has a somewhat standard logging library, so I was able to get the log messages shown as I wanted without having to pipe the output through `awk`. And, as a bonus, the log messages get written in the correct order now. For example, the C libwayland logs:

wl_display#1.delete_id(3) … wl_callback#3.done(59067)

which appears to show a callback firing some time after it was deleted, while [ocaml-wayland](https://github.com/talex5/ocaml-wayland) logs:

<- wl_callback@3.done callback_data:1388855 <- wl_display@1.delete_id id:3

### Error handling

The OCaml bindings return a `result` type for functions that can return errors, using polymorphic variants to say exactly which errors can be returned by each function\. That's clever, but I found it pretty useless in practice and I followed the Olivine example code in immediately turning every `Error` result into an exception\. You can then handle errors at a higher level \(unlike the C, which just calls `exit`\)\. Maybe Olivine should be changed to do that itself\.

I thought I'd been rigorous about checking for errors in the C, but I missed some places \(e\.g\. `vkMapMemory`\)\. The OCaml compiler forced me to handle those too, of course\.
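A minimal sketch of turning a `result` into an exception follows. The error set, the exception and the `map_memory` stand-in are all hypothetical; the `<?>` operator here is only written in the spirit of the one used in the post's code, not copied from it:

```ocaml
(* Hypothetical error set for one function, as a polymorphic variant. *)
type map_error = [ `Out_of_host_memory | `Memory_map_failed ]

exception Vulkan_error of string

(* Unwrap [Ok], or raise an exception naming the operation that failed. *)
let ( <?> ) result name =
  match result with
  | Ok v -> v
  | Error _ -> raise (Vulkan_error name)

(* Stand-in for a binding that returns a result instead of raising. *)
let map_memory ~size : (bytes, map_error) result =
  if size > 0 then Ok (Bytes.create size) else Error `Memory_map_failed

let mapped = map_memory ~size:16 <?> "map_memory"
```

Higher-level code can then catch `Vulkan_error` in one place instead of matching on every call.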

## Refactored version

One reason to switch to OCaml was that I was finding it hard to see how all the C code fit together. I felt that the overall structure was getting lost in the noise. While the initial OCaml version was similar to the C, I think [the refactored version](https://github.com/talex5/vulkan-test/tree/ocaml/src) is quite a bit easier to read.

Moving code to separate files is much easier than in C\. There, you typically need to write a header file too, and then include it from the other files\. But in the OCaml I could just move e\.g\. `export_semaphore` to `export` in a new file called `semaphore.ml` and refer to it as `Semaphore.export`\. Because each file gets its own namespace, you don't have to guess where functions are defined, and you don't get naming conflicts between symbols in different files\. The build system \(dune\) automatically builds all modules in the correct order\.

### Olivine wrappers

I added a `vulkan` directory with wrappers around the auto-generated Vulkan functions with the aim of removing some noise\. For example, the wrappers take OCaml lists and convert them to C arrays as needed, and raise exceptions on error instead of returning a result type\.

Sometimes they do more, as in the case of `queue_submit`\. That took separate `wait_semaphores` and `wait_dst_stage_mask` arrays, requiring them to be the same length\. By taking a list of tuples, the wrapper avoids the possibility of this error\. The old submit code:


```
let wait_semaphores = Vkt.Semaphore.array [t.image_available] in
let wait_stages = [Vkt.Pipeline_stage_flags.color_attachment_output] in
let submit_info = Vkt.Submit_info.make ()
    ~wait_semaphores
    ~wait_dst_stage_mask:(A.of_list Vkt.Pipeline_stage_flags.ctype wait_stages)
    ~command_buffers:(Vkt.Command_buffer.array [t.command_buffer])
    ~signal_semaphores:(Vkt.Semaphore.array [frame_state.render_finished])
in
Vkc.queue_submit t.graphics_queue ()
  ~submits:(Vkt.Submit_info.array [submit_info])
  ~fence:t.in_flight_fence <?> "queue_submit";
```

becomes:


```
Vulkan.Cmd.submit device t.command_buffer
  ~wait:[t.image_available, Vkt.Pipeline_stage_flags.color_attachment_output]
  ~signal_semaphores:[frame_state.render_finished]
  ~fence:t.in_flight_fence;
```

Sometimes the new API drops features I don't use \(or don't currently understand\)\. For example, my new `submit` only lets you submit one command buffer at a time \(though each buffer can have many commands\)\.
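The list-of-tuples idea can be sketched as follows. `raw_submit` is a hypothetical stand-in for the low-level call (it just returns the number of wait entries), and the types are simplified:

```ocaml
type semaphore = Semaphore of int
type stage = Color_attachment_output | Top_of_pipe

(* Stand-in for the low-level call, which needs two arrays of equal length. *)
let raw_submit ~wait_semaphores ~wait_dst_stage_mask =
  if Array.length wait_semaphores <> Array.length wait_dst_stage_mask then
    invalid_arg "raw_submit: array lengths differ";
  Array.length wait_semaphores

(* The wrapper takes pairs, so the two arrays always match by construction. *)
let submit ~(wait : (semaphore * stage) list) =
  let semaphores, stages = List.split wait in
  raw_submit
    ~wait_semaphores:(Array.of_list semaphores)
    ~wait_dst_stage_mask:(Array.of_list stages)
```

With this shape, a mismatched pair of arrays is no longer something the caller can express.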

I moved various generic helper functions like `find_memory_type` to the wrapper library, getting them out of the main application code\.
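For readers who have not met `find_memory_type`, this is roughly what such a helper does, shown over plain integers rather than the real binding types. The argument names and representation are assumptions for the sketch, not the wrapper library's actual signature:

```ocaml
(* [memory_types.(i)] holds the property flags of memory type [i];
   bit [i] of [type_filter] is set if that type is acceptable for the
   resource. Returns the first type that has all requested [properties]. *)
let find_memory_type ~memory_types ~type_filter ~properties =
  let n = Array.length memory_types in
  let rec search i =
    if i >= n then None
    else if type_filter land (1 lsl i) <> 0
         && memory_types.(i) land properties = properties
    then Some i
    else search (i + 1)
  in
  search 0

(* Property bits, using the Vulkan values: device-local = 1,
   host-visible = 2, host-coherent = 4. *)
let host_visible_coherent = 2 lor 4
```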

Separating out these libraries made the code longer, but I think it makes it easier to read:

20 files changed, 843 insertions(+), 663 deletions(-)

### Using fibers / effects for control flow

The C code has a single thread with a single stack, using callbacks to redraw when the compositor is ready\. OCaml has fibers \(light-weight cooperative threads\), so we can use a plain loop:


```
while t.frame < frame_limit do
  let next_frame_due = Window.frame window in
  draw_frame t;
  Promise.await next_frame_due;
  t.frame <- t.frame + 1
done
```

The `Promise.await` suspends this fiber, allowing e\.g\. the Wayland code to handle incoming events\. I find that makes the logic easier to follow\.
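To give a feel for what happens underneath, here is a toy version built directly on OCaml 5's effect handlers. It is not how Eio is implemented, and `Await_frame`, `render_loop` and `run` are names invented for the sketch; it only shows a plain loop being suspended and resumed by a scheduler:

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect: suspend the current fiber until the next frame. *)
type _ Effect.t += Await_frame : unit Effect.t

let log = ref []                         (* records the order of events *)
let note s = log := s :: !log

(* The rendering code is a plain loop, with no callbacks. *)
let render_loop ~frame_limit () =
  for frame = 1 to frame_limit do
    note (Printf.sprintf "draw %d" frame);
    perform Await_frame
  done

(* A tiny scheduler: run the fiber until it suspends, handle other events,
   then resume it, until the fiber finishes. *)
let run ~frame_limit =
  let suspended : (unit, unit) continuation option ref = ref None in
  match_with (render_loop ~frame_limit) ()
    { retc = (fun () -> ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Await_frame ->
          Some (fun (k : (a, unit) continuation) -> suspended := Some k)
        | _ -> None) };
  let rec event_loop () =
    match !suspended with
    | None -> ()
    | Some k ->
      suspended := None;
      note "handle events";
      continue k ();
      event_loop ()
  in
  event_loop ()
```

Calling `run ~frame_limit:2` interleaves drawing with event handling, even though `render_loop` itself reads as straight-line code.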

### Using the CPU and GPU in parallel

Next I split off the input handling from the huge `render.ml` file into [input\.ml](https://github.com/talex5/vulkan-test/tree/ocaml/src/input.ml)\.

The Vulkan tutorial creates one uniform buffer for the input data for each framebuffer, but this seems wasteful. I think we only need at most two: one for the GPU to read, and one for the CPU to write for the next frame, if we want to do that in parallel.

To allow this parallel operation I also had to create a pair of command buffers\. The [duo\.ml](https://github.com/talex5/vulkan-test/tree/ocaml/src/duo.ml) module holds the two \(input, command-buffer\) jobs and swaps them on submit\.
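The swapping can be pictured with a small two-slot holder. This is a hypothetical sketch, not the contents of `duo.ml`, and it leaves out the synchronisation a real version needs:

```ocaml
(* A hypothetical two-slot holder; the real duo.ml may be organised differently. *)
type 'job t = {
  mutable recording : 'job;   (* the CPU fills this one in for the next frame *)
  mutable in_flight : 'job;   (* the GPU may still be reading this one *)
}

let create a b = { recording = a; in_flight = b }

(* The job the CPU may safely write to. *)
let get t = t.recording

(* Hand the recorded job to the GPU and swap roles. A real version must first
   wait (e.g. on a fence) until the GPU has finished with the old in-flight job. *)
let submit t ~send =
  send t.recording;
  let finished = t.in_flight in
  t.in_flight <- t.recording;
  t.recording <- finished
```

After each `submit`, `get` returns the other job, so the CPU can prepare the next frame while the GPU works on the one just sent.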

### Resizing and resource lifetimes

When the window size changes we need to destroy the old swap-chain and recreate all the images, views and framebuffers\. My C code didn't bother, and just kept things at 640x480\.

The main problem here is how to clean up the old resources\. We could use the garbage collector, but the framebuffers are rather large and I'd like to get them freed promptly\. Also, Vulkan requires things to be freed in the correct order, which the GC wouldn't ensure\.

I added code to free resources by having each constructor take a `sw` switch argument\. When the switch is turned off, all resources attached to it are freed\. That makes it easy to scope things to the stack: when the `Switch.run` block ends, all resources it created are freed\.
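The idea of attaching resources to a switch can be sketched in a few lines. This is a minimal stand-in written for illustration (Eio's real `Switch` does much more), and `create` is a hypothetical constructor that only logs what it does:

```ocaml
(* A minimal stand-in for a switch: just a list of cleanup functions. *)
type switch = { mutable hooks : (unit -> unit) list }

(* Register a cleanup function; the most recently registered runs first. *)
let on_release sw fn = sw.hooks <- fn :: sw.hooks

(* Run [fn] with a fresh switch, then free everything attached to it,
   even if [fn] raises. *)
let run fn =
  let sw = { hooks = [] } in
  Fun.protect
    ~finally:(fun () -> List.iter (fun release -> release ()) sw.hooks)
    (fun () -> fn sw)

let log = ref []

(* Hypothetical constructor: creates a resource and ties its lifetime to [sw]. *)
let create ~sw name =
  log := ("create " ^ name) :: !log;
  on_release sw (fun () -> log := ("free " ^ name) :: !log)

let () =
  run (fun sw ->
    create ~sw "swapchain";
    create ~sw "framebuffer")
```

In this sketch resources are freed in the reverse of creation order, so the framebuffer goes before the swap-chain it depends on.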

But the life-cycle of the swap-chain is a little complicated\. I don't want to clutter the main application loop with the logic of adapting to size changes\. Again, OCaml's fibers system makes it easy to have multiple stacks so I have another fiber run:


```
let render_loop t duo =
  while true do
    let geometry = Window.geometry t.window in
    Switch.run @@ fun sw ->
    let framebuffers = create_swapchain ~sw t geometry in
    while geometry = Window.geometry t.window do
      let fb = Vulkan.Swap_chain.get_framebuffer framebuffers in
      let redraw_needed = next_as_promise t.redraw_needed in
      let job = Duo.get duo in
      record_commands t job fb;
      Duo.submit duo fb job.command_buffer;
      Window.attach t.window ~buffer:fb.wl_buffer;
      Promise.await redraw_needed
    done
  done
```

The C code created a fixed set of 4 framebuffers on each resize, but the OCaml only creates them as needed\. When dragging the window to resize that means we may only need to create one at each size, and when keeping a steady size, it seems I only need 3 framebuffers with Sway\.

The main loop changes slightly so that it just triggers the `render_loop` fiber:


```
while render.frame < frame_limit do
  let next_frame_due = Window.frame window in
  Render.trigger_redraw render;
  Promise.await next_frame_due;
  render.frame <- render.frame + 1
done
```

I'm not sure if freeing the framebuffers immediately is safe, since in theory the GPU might still be using them if the display server requests a new frame at a new size before the previous one has finished rendering on the GPU\. Possibly freed OCaml resources should instead get added to a list of things to free on the C side the next time the GPU is idle\.

## The 3D version

Although it looks a lot more impressive, the [3D version](https://github.com/talex5/vulkan-test/commit/ocaml-3d) isn't that much more work than the 2D triangle\.

I used the [Cairo](https://github.com/Chris00/ocaml-cairo) library to load the PNG file with the textures and then added a Vulkan *sampler* for it\. The shader code has to be modified to read the colour from the texture\. The most complex bit is that the texture needs to be copied from Cairo's memory to host memory that's visible to the GPU, and from there to fast local memory on the GPU \(see [texture\.ml](https://github.com/talex5/vulkan-test/tree/ocaml-3d/src/texture.ml)\)\.

Other changes needed:

- There's a bit of matrix stuff to position the model and project it in 3D\.
- I added [obj\_format\.ml](https://github.com/talex5/vulkan-test/tree/ocaml-3d/src/obj_format.ml) to parse the model data\.
- The pipeline adds a depth buffer so near things obscure things behind them, regardless of the drawing order.

I didn't get my C version to do the 3D bits, but for comparison here's the Vulkan tutorial's official [C++ version](https://docs.vulkan.org/tutorial/latest/_attachments/28_model_loading.cpp).

## Garbage collection

To render smoothly at 60Hz, we have about 16ms for each frame\. You might wonder if using a garbage collector would introduce pauses and cause us to miss frames, but this doesn't seem to be a problem\.

In C, you can improve performance for frame-based applications by using a [bump allocator](https://en.wikipedia.org/wiki/Region-based_memory_management):

1. Create a fixed buffer with enough space for every allocation needed for one frame.
2. Allocate memory just by allocating sequentially in the region (bumping the next-free-address pointer).
3. At the end of each frame, reset the pointer.

This makes allocation really fast and freeing things at the end costs nothing. Implementing this in C requires special code, but OCaml works this way by default, allocating new values sequentially onto the *minor heap*. At the end of each frame, we can call `Gc.minor` to reset the heap.
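A frame loop that forces a minor collection at the end of each frame might look like this. `draw_frame` is a hypothetical stand-in that only allocates some short-lived values; `Gc.minor` and `Gc.quick_stat` come from the standard library:

```ocaml
(* Stand-in for per-frame work that allocates short-lived values. *)
let draw_frame frame =
  let points = List.init 1000 (fun i -> (float_of_int i, float_of_int frame)) in
  List.fold_left (fun acc (x, y) -> acc +. x +. y) 0.0 points

let run_frames n =
  let total = ref 0.0 in
  for frame = 1 to n do
    total := !total +. draw_frame frame;
    (* End of frame: the per-frame values are dead, so little needs copying. *)
    Gc.minor ()
  done;
  !total

(* Count of minor collections so far, sampled before and after the frames. *)
let before = (Gc.quick_stat ()).Gc.minor_collections
let total = run_frames 3
let after = (Gc.quick_stat ()).Gc.minor_collections
```

Comparing `before` and `after` is one simple way to check how often the minor heap is being collected while experimenting with this.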

`Gc.minor` scans the stack looking for pointers to values that are still in use and copies any it finds to the *major heap*\. However, since we're at the end of the frame, the stack is pretty much empty and there's almost nothing to scan\. I captured a trace of running the 3D room version with a forced minor GC at the end of every frame:

make && eio-trace run ./_build/default/src/main.exe

[Tracing the full 3D version](https://roscidus.com/blog/images/vulkan-ocaml/trace-3d.png)

The four long grey horizontal bars are the main fibers\. From top to bottom they are:

- The main application loop \(incrementing the frame counter and triggering the render loop fiber\)\.
- An [ocaml-wayland](https://github.com/talex5/ocaml-wayland) fiber, receiving messages from the display server \(and spawning some short-lived sending fibers\)\.
- The `render_loop` fiber \(sending graphics commands to the GPU\)\.
- A fiber used internally by the IO system.

The green sections show when each fiber is running and the yellow background indicates when the process is sleeping. The thin red columns indicate time spent in GC (which we're here triggering after every frame).

If I remove the forced `Gc.minor` after each frame then the GC happens less often, but can take a bit longer when it does\. Still not nearly long enough to miss the deadline for rendering the frame though\.

Collection of the major heap is done incrementally in small slices and doesn't cause any trouble\.

So, we're only using a tiny fraction of the available time\. Also, I suspect the CPU is running in a slow power-saving mode due to all the sleeping; if we had more work to do then it would probably speed up\.

## Conclusions

Doing Vulkan programming in OCaml has advantages \(clearer code, easier refactoring\), but also disadvantages \(unfinished and unreleased Vulkan bindings, some friction using a C API from OCaml, and I had to write more support code, such as some bindings for libdrm\)\.

As a C API, Vulkan is not safe and will happily segfault if passed incorrect arguments\. The OCaml bindings do not fix this, and so care is still needed\. I didn't bother about that because it wasn't a problem in practice, and properly protecting against use-after-free will probably require some changes to OCaml \(e\.g\. [unmapping memory](https://github.com/ocaml/ocaml/pull/389) isn't safe without something like the "modes" being prototyped in [OxCaml](https://oxcaml.org/)\)\.

I'm slowly upstreaming my changes to Olivine; hopefully this will all be easier to use one day\!
