I convert my Vulkan test program from C to OCaml and compare the results, then continue the Vulkan tutorial in OCaml, adding 3D, textures and depth buffering.
Table of Contents
- Introduction
- Running it yourself
- The direct port
- Labelled arguments
- Enums and bit-fields
- Optional fields
- Loading shaders
- Logging
- Error handling
- Refactored version
- Olivine wrappers
- Using fibers / effects for control flow
- Using the CPU and GPU in parallel
- Resizing and resource lifetimes
- The 3D version
- Garbage collection
- Conclusions
## Introduction
In Investigating Linux graphics, I wrote a little C program to help me learn about GPUs by drawing a triangle. But I wondered if using OCaml instead would make my life easier. It didn’t, because there were no released OCaml Vulkan bindings, but I found some unfinished ones by Florian Angeletti. The bindings are generated mostly automatically from the Vulkan XML specification, and with a bit of effort I got them working well enough to continue with the Vulkan tutorial, which resulted in this nice Viking room:
In this post, I’ll be looking at how the C code compares to the OCaml. First, I did a direct line-by-line port of the C, then I refactored it to take better advantage of OCaml.
(Note: the Vulkan tutorial is actually using C++, but I’m comparing my C version to OCaml)
## Running it yourself
If you want to try it yourself (note: it requires Wayland):
git clone https://github.com/talex5/vulkan-test -b ocaml
cd vulkan-test
nix develop
dune exec -- ./src/main.exe 200
As the OCaml Vulkan bindings (Olivine) are unreleased, I included a copy of my patched version in `vendor/olivine`. The `dune exec` command will build them automatically.
The `ocaml` branch above just draws one triangle. If you want to see the 3D room pictured above, use `ocaml-3d` instead:
git clone https://github.com/talex5/vulkan-test -b ocaml-3d
cd vulkan-test
nix develop
make download-example
dune exec -- ./src/main.exe 10000 viking_room.obj viking_room.png
## The direct port
Porting the code directly, line by line, was pretty straightforward:
The code ended up slightly shorter, but not by much:
28 files changed, 1223 insertions(+), 1287 deletions(-)
This is only approximate; sometimes I added or removed blank lines, etc. Some things were a bit easier and others a bit harder. It mostly balanced out.
As an example, one thing that makes the OCaml shorter is that arrays are passed as a single item, whereas C takes the length separately. On the other hand, single-item arrays can be passed in C by just giving the address of the pointer, whereas OCaml requires an array to be constructed separately. Also, I had to include some bindings for the libdrm C library.
### Labelled arguments
The OCaml bindings use labelled arguments (e.g. the `VK_TRUE` argument in the screenshot above became `~wait_all:true` in the OCaml), which is longer but clearer.
The OCaml code uses functions to create C structures, which looks pretty similar due to labels. For example:
```
const VkSemaphoreGetFdInfoKHR get_fd_info = {
    .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
    .semaphore = semaphore,
    .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
};
```
becomes:
```
let get_fd_info = Vkt.Semaphore_get_fd_info_khr.make ()
    ~semaphore
    ~handle_type:Vkt.External_semaphore_handle_type_flags.sync_fd
in
```
An advantage is that the `sType` field gets filled in automatically.
### Enums and bit-fields
Enumerations and bit-fields are namespaced, which is a lot clearer as you can see which part is the name of the enum and which part is the particular value. For example, `VK_ATTACHMENT_STORE_OP_STORE` becomes `Vkt.Attachment_store_op.Store`. Also, OCaml usually knows the expected type and you can omit the module, so:
```
VkAttachmentDescription colorAttachment = {
    .format = format,
    .samples = VK_SAMPLE_COUNT_1_BIT,
    .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR, // Clear framebuffer before rendering
    .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
    .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
    .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    .finalLayout = VK_IMAGE_LAYOUT_GENERAL,
};
```
becomes
```
let color_attachment = Vkt.Attachment_description.make ()
    ~format:format
    ~samples:Vkt.Sample_count_flags.n1
    ~load_op:Clear (* Clear framebuffer before rendering *)
    ~store_op:Store
    ~stencil_load_op:Dont_care
    ~stencil_store_op:Dont_care
    ~initial_layout:Undefined
    ~final_layout:General
in
```
Bit-fields and enums get their own types (they're not just integers), so you can't use them in the wrong place or try to combine things that aren't bit-fields (and so the `_BIT` suffix isn't needed). One particularly striking example of the difference is that
```
.colorWriteMask =
    VK_COLOR_COMPONENT_R_BIT |
    VK_COLOR_COMPONENT_G_BIT |
    VK_COLOR_COMPONENT_B_BIT |
    VK_COLOR_COMPONENT_A_BIT,
```
becomes
```
~color_write_mask:Vkt.Color_component_flags.(r + g + b + a)
```
The `Vkt.Color_component_flags.(...)` brings all the module's symbols into scope, including the `+` operator for combining the flags.
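To make the mechanism concrete, here is a minimal sketch of how such a flags module could be built. It is a hypothetical stand-in, not Olivine's generated code: the module body, `mem` and `to_int` are assumptions for illustration. The abstract type is what stops flags being mixed with plain integers, and the local open is what lets `+` mean "combine flags".

```ocaml
(* Hypothetical stand-in for a generated flags module; not Olivine's code. *)
module Color_component_flags : sig
  type t                      (* abstract: cannot be confused with an int *)
  val r : t
  val g : t
  val b : t
  val a : t
  val ( + ) : t -> t -> t     (* union of two flag sets *)
  val mem : t -> t -> bool    (* [mem flag set] is true if [flag] is in [set] *)
  val to_int : t -> int       (* raw bit pattern, as the C API expects *)
end = struct
  type t = int
  let r = 0x1
  let g = 0x2
  let b = 0x4
  let a = 0x8
  let ( + ) = ( lor )
  let mem flag set = set land flag = flag
  let to_int t = t
end

(* The local open brings r, g, b, a and the flag-specific ( + ) into scope. *)
let mask = Color_component_flags.(r + g + b + a)

let () = Printf.printf "mask = 0x%x\n" (Color_component_flags.to_int mask)
```

Outside the local open, `+` is still ordinary integer addition, so `mask + 1` is rejected by the type checker.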
### Optional fields
The specification says which fields are optional. In C you can ignore that, but OCaml enforces it. This can be annoying sometimes, e.g.
```
.blendEnable = VK_FALSE,
```
becomes
```
~blend_enable:false
~src_color_blend_factor:One
~dst_color_blend_factor:Zero
~color_blend_op:Add
~src_alpha_blend_factor:One
~dst_alpha_blend_factor:Zero
~alpha_blend_op:Add
```
because the spec says these are all non-optional, rather than that they are only needed when blending is enabled.
There's a similar situation with the Wayland code: the OCaml compiler requires you to provide a handler for all possible events. For example, OCaml forced me to write a handler for the window `close` event (and so closing the window works in the OCaml version, but not in the C one). Likewise, if the compositor returns an error from `create_immed` the OCaml version logs it, while the C version ignored the error message, because the C compiler didn't remind me about that.
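As a simplified analogy for this kind of compiler-enforced completeness, consider a plain variant type. The `toplevel_event` type and `describe` function below are hypothetical, and ocaml-wayland may structure its handlers differently, but the effect is similar: leave a case out and the build complains.

```ocaml
(* Hypothetical event type, for illustration only. *)
type toplevel_event =
  | Configure of { width : int; height : int }
  | Close

(* Every constructor must be handled. Deleting the [Close] case makes the
   compiler report a non-exhaustive match (warning 8). *)
let describe = function
  | Configure { width; height } -> Printf.sprintf "resize to %dx%d" width height
  | Close -> "close requested"

let () = print_endline (describe Close)
```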
### Loading shaders
Loading the shaders was easier. The C version has code to load the shader bytecode from disk, but in the OCaml I used [ppx_blob](https://github.com/johnwhitington/ppx_blob) to include it at compile time, producing a self-contained executable file:
```
load_shader_module device [%blob "./vert.spv"]
```
### Logging
OCaml has a somewhat standard logging library, so I was able to get the log messages shown as I wanted without having to pipe the output through `awk`. And, as a bonus, the log messages get written in the correct order now. For example, the C libwayland logs:
wl_display#1.delete_id(3) … wl_callback#3.done(59067)
which appears to show a callback firing some time after it was deleted, while [ocaml-wayland](https://github.com/talex5/ocaml-wayland) logs:
<- wl_callback@3.done callback_data:1388855 <- wl_display@1.delete_id id:3
### Error handling
The OCaml bindings return a `result` type for functions that can return errors, using polymorphic variants to say exactly which errors can be returned by each function. That's clever, but I found it pretty useless in practice and I followed the Olivine example code in immediately turning every `Error` result into an exception. You can then handle errors at a higher level (unlike the C, which just calls `exit`). Maybe Olivine should be changed to do that itself.
I thought I'd been rigorous about checking for errors in the C, but I missed some places (e.g. `vkMapMemory`). The OCaml compiler forced me to handle those too, of course.
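The "turn every `Error` into an exception" pattern can be sketched in a few lines. Everything here is an assumption for illustration: the `Vk_error` exception, the three error variants, this definition of `<?>` and the `fake_map_memory` stand-in are not taken from Olivine or from my code.

```ocaml
(* Hypothetical sketch: names and error variants are illustrative only. *)
exception Vk_error of string

let error_name = function
  | `Out_of_host_memory -> "out of host memory"
  | `Out_of_device_memory -> "out of device memory"
  | `Device_lost -> "device lost"

(* [result <?> what] unwraps [Ok] and raises on [Error], naming the call. *)
let ( <?> ) result what =
  match result with
  | Ok x -> x
  | Error e -> raise (Vk_error (what ^ ": " ^ error_name e))

(* Stand-in for a binding that returns a result. *)
let fake_map_memory ~fail =
  if fail then Error `Out_of_host_memory else Ok "mapped"

let () =
  let m = fake_map_memory ~fail:false <?> "map_memory" in
  print_endline m;
  (* Errors can now be caught at whatever level makes sense. *)
  match fake_map_memory ~fail:true <?> "map_memory" with
  | _ -> ()
  | exception Vk_error msg -> prerr_endline msg
```

Because the result must be unwrapped before the value can be used, forgetting the check is a type error rather than a silent bug.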
## Refactored version
One reason to switch to OCaml was that I was finding it hard to see how all the C code fit together. I felt that the overall structure was getting lost in the noise. While the initial OCaml version was similar to the C, I think [the refactored version](https://github.com/talex5/vulkan-test/tree/ocaml/src) is quite a bit easier to read.
Moving code to separate files is much easier than in C. There, you typically need to write a header file too, and then include it from the other files. But in the OCaml I could just move e.g. `export_semaphore` to `export` in a new file called `semaphore.ml` and refer to it as `Semaphore.export`. Because each file gets its own namespace, you don't have to guess where functions are defined, and you don't get naming conflicts between symbols in different files. The build system (dune) automatically builds all modules in the correct order.
### Olivine wrappers
I added a `vulkan` directory with wrappers around the auto-generated Vulkan functions with the aim of removing some noise. For example, the wrappers take OCaml lists and convert them to C arrays as needed, and raise exceptions on error instead of returning a result type.
Sometimes they do more, as in the case of `queue_submit`. That took separate `wait_semaphores` and `wait_dst_stage_mask` arrays, requiring them to be the same length. By taking a list of tuples, the wrapper avoids the possibility of this error. The old submit code:
```
let wait_semaphores = Vkt.Semaphore.array [t.image_available] in
let wait_stages = [Vkt.Pipeline_stage_flags.color_attachment_output] in
let submit_info = Vkt.Submit_info.make ()
    ~wait_semaphores
    ~wait_dst_stage_mask:(A.of_list Vkt.Pipeline_stage_flags.ctype wait_stages)
    ~command_buffers:(Vkt.Command_buffer.array [t.command_buffer])
    ~signal_semaphores:(Vkt.Semaphore.array [frame_state.render_finished])
in
Vkc.queue_submit t.graphics_queue ()
  ~submits:(Vkt.Submit_info.array [submit_info])
  ~fence:t.in_flight_fence <?> "queue_submit";
```
becomes:
```
Vulkan.Cmd.submit device t.command_buffer
  ~wait:[t.image_available, Vkt.Pipeline_stage_flags.color_attachment_output]
  ~signal_semaphores:[frame_state.render_finished]
  ~fence:t.in_flight_fence;
```
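The idea behind taking a list of tuples can be shown without any Vulkan at all. In this sketch the `semaphore` and `stage` types and both functions are hypothetical stand-ins: `raw_submit` plays the role of a low-level call that needs two parallel arrays, and `submit` is a wrapper that makes a length mismatch impossible to express.

```ocaml
(* Hypothetical types standing in for Vulkan handles and stage flags. *)
type semaphore = Semaphore of int
type stage = Color_attachment_output | Top_of_pipe

(* Low-level style: two parallel arrays that must have the same length.
   Returns the number of semaphores waited on, just so we can observe it. *)
let raw_submit ~(wait_semaphores : semaphore array)
               ~(wait_dst_stage_mask : stage array) =
  if Array.length wait_semaphores <> Array.length wait_dst_stage_mask then
    invalid_arg "raw_submit: array lengths differ";
  Array.length wait_semaphores

(* Wrapper: each semaphore is paired with its stage, so the two arrays
   built from the list always have equal lengths. *)
let submit ~(wait : (semaphore * stage) list) =
  let semaphores, stages = List.split wait in
  raw_submit
    ~wait_semaphores:(Array.of_list semaphores)
    ~wait_dst_stage_mask:(Array.of_list stages)

let () =
  let n = submit ~wait:[Semaphore 1, Color_attachment_output] in
  Printf.printf "waiting on %d semaphore(s)\n" n
```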
Sometimes the new API drops features I don't use (or don't currently understand). For example, my new `submit` only lets you submit one command buffer at a time (though each buffer can have many commands).
I moved various generic helper functions like `find_memory_type` to the wrapper library, getting them out of the main application code.
Separating out these libraries made the code longer, but I think it makes it easier to read:
20 files changed, 843 insertions(+), 663 deletions(-)
### Using fibers / effects for control flow
The C code has a single thread with a single stack, using callbacks to redraw when the compositor is ready. OCaml has fibers (light-weight cooperative threads), so we can use a plain loop:
```
while t.frame < frame_limit do
  let next_frame_due = Window.frame window in
  draw_frame t;
  Promise.await next_frame_due;
  t.frame <- t.frame + 1
done
```
The `Promise.await` suspends this fiber, allowing e.g. the Wayland code to handle incoming events. I find that makes the logic easier to follow.
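For readers curious how a plain loop can be suspended and resumed, here is a toy round-robin scheduler built on OCaml 5's effect handlers. It is only a sketch of the underlying idea: the `Await_frame` effect, `spawn` and the two loops are hypothetical names, and a real fiber library handles promises, I/O and cancellation, none of which appear here.

```ocaml
(* Requires OCaml 5. A toy scheduler for illustration only. *)
open Effect
open Effect.Deep

type _ Effect.t += Await_frame : unit Effect.t   (* hypothetical effect *)

let run_queue : (unit -> unit) Queue.t = Queue.create ()
let log = Buffer.create 64

(* Run [f] as a fiber. When it performs [Await_frame], park the rest of
   its computation at the back of the queue so other fibers get a turn. *)
let spawn f =
  match_with f ()
    { retc = (fun () -> ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Await_frame ->
          Some (fun (k : (a, unit) continuation) ->
            Queue.push (fun () -> continue k ()) run_queue)
        | _ -> None) }

(* Each fiber is an ordinary loop with its own stack. *)
let draw_loop () =
  for frame = 1 to 3 do
    Buffer.add_string log (Printf.sprintf "draw %d\n" frame);
    perform Await_frame            (* suspend here until rescheduled *)
  done

let event_loop () =
  for i = 1 to 3 do
    Buffer.add_string log (Printf.sprintf "event %d\n" i);
    perform Await_frame
  done

let () =
  spawn draw_loop;
  spawn event_loop;
  (* Keep resuming parked fibers until none are left. *)
  while not (Queue.is_empty run_queue) do
    (Queue.pop run_queue) ()
  done;
  print_string (Buffer.contents log)
```

The two loops interleave (draw 1, event 1, draw 2, and so on) even though neither is written with callbacks.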
### Using the CPU and GPU in parallel
Next I split off the input handling from the huge `render.ml` file into [input.ml](https://github.com/talex5/vulkan-test/tree/ocaml/src/input.ml).
The Vulkan tutorial creates one uniform buffer for the input data for each frame-buffer, but this seems wasteful. I think we only need at most two: one for the GPU to read, and one for the CPU to write for the next frame, if we want to do that in parallel.
To allow this parallel operation I also had to create a pair of command buffers. The [duo.ml](https://github.com/talex5/vulkan-test/tree/ocaml/src/duo.ml) module holds the two (input, command-buffer) jobs and swaps them on submit.
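The swapping itself is simple enough to sketch. This is a hypothetical miniature and not the contents of `duo.ml`: the record, field names and functions are assumptions, and a real version would also need to wait for the GPU to finish with a job before handing it back to the CPU.

```ocaml
(* Hypothetical double-buffering sketch; the real duo.ml may differ. *)
type 'a duo = {
  mutable next : 'a;        (* the job the CPU may fill in now *)
  mutable in_flight : 'a;   (* the job the GPU may still be reading *)
}

let make a b = { next = a; in_flight = b }

(* The job to prepare for the upcoming frame. *)
let get t = t.next

(* Hand the prepared job to the GPU and reuse the other slot next time. *)
let submit t =
  let job = t.next in
  t.next <- t.in_flight;
  t.in_flight <- job

let () =
  let d = make "job A" "job B" in
  for _ = 1 to 3 do
    print_endline ("CPU writes " ^ get d);
    submit d
  done
```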
### Resizing and resource lifetimes
When the window size changes we need to destroy the old swap-chain and recreate all the images, views and framebuffers. My C code didn't bother, and just kept things at 640x480.
The main problem here is how to clean up the old resources. We could use the garbage collector, but the framebuffers are rather large and I'd like to get them freed promptly. Also, Vulkan requires things to be freed in the correct order, which the GC wouldn't ensure.
I added code to free resources by having each constructor take a `sw` switch argument. When the switch is turned off, all resources attached to it are freed. That makes it easy to scope things to the stack: when the `Switch.run` block ends, all resources it created are freed.
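The core of the switch idea fits in a few lines. The sketch below is a hypothetical miniature, not the real `Switch` module: `run`, `on_release` and `create` are illustrative names. It shows the two properties that matter here: cleanup happens promptly when the block ends, and in reverse order of creation.

```ocaml
(* Hypothetical miniature of a switch; a real one does much more. *)
type switch = { mutable on_release : (unit -> unit) list }

(* Register a cleanup action. New actions go at the front of the list,
   so they run in reverse order of registration. *)
let on_release sw f = sw.on_release <- f :: sw.on_release

(* Run [fn] with a fresh switch, then release everything attached to it,
   whether [fn] returns normally or raises. *)
let run fn =
  let sw = { on_release = [] } in
  Fun.protect (fun () -> fn sw)
    ~finally:(fun () -> List.iter (fun f -> f ()) sw.on_release)

let log = ref []

(* Stand-in for a constructor taking a [~sw] argument. *)
let create ~sw name =
  log := ("create " ^ name) :: !log;
  on_release sw (fun () -> log := ("free " ^ name) :: !log)

let () =
  run (fun sw ->
    create ~sw "image";
    create ~sw "view";           (* depends on the image *)
    create ~sw "framebuffer");   (* depends on the view *)
  List.iter print_endline (List.rev !log)
```

Freeing in reverse order means a resource is always released before the things it depends on.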
But the life-cycle of the swap-chain is a little complicated. I don't want to clutter the main application loop with the logic of adapting to size changes. Again, OCaml's fibers system makes it easy to have multiple stacks so I have another fiber run:
```
let render_loop t duo =
  while true do
    let geometry = Window.geometry t.window in
    Switch.run @@ fun sw ->
    let framebuffers = create_swapchain ~sw t geometry in
    while geometry = Window.geometry t.window do
      let fb = Vulkan.Swap_chain.get_framebuffer framebuffers in
      let redraw_needed = next_as_promise t.redraw_needed in
      let job = Duo.get duo in
      record_commands t job fb;
      Duo.submit duo fb job.command_buffer;
      Window.attach t.window ~buffer:fb.wl_buffer;
      Promise.await redraw_needed
    done
  done
```
The C code created a fixed set of 4 framebuffers on each resize, but the OCaml only creates them as needed. When dragging the window to resize that means we may only need to create one at each size, and when keeping a steady size, it seems I only need 3 framebuffers with Sway.
The main loop changes slightly so that it just triggers the `render_loop` fiber:
```
while render.frame < frame_limit do
  let next_frame_due = Window.frame window in
  Render.trigger_redraw render;
  Promise.await next_frame_due;
  render.frame <- render.frame + 1
done
```
I'm not sure if freeing the framebuffers immediately is safe, since in theory the GPU might still be using them if the display server requests a new frame at a new size before the previous one has finished rendering on the GPU. Possibly freed OCaml resources should instead get added to a list of things to free on the C side the next time the GPU is idle.
## The 3D version
Although it looks a lot more impressive, the [3D version](https://github.com/talex5/vulkan-test/commit/ocaml-3d) isn't that much more work than the 2D triangle.
I used the [Cairo](https://github.com/Chris00/ocaml-cairo) library to load the PNG file with the textures and then added a Vulkan *sampler* for it. The shader code has to be modified to read the colour from the texture. The most complex bit is that the texture needs to be copied from Cairo's memory to host memory that's visible to the GPU, and from there to fast local memory on the GPU (see [texture.ml](https://github.com/talex5/vulkan-test/tree/ocaml-3d/src/texture.ml)).
Other changes needed:
- There's a bit of matrix stuff to position the model and project it in 3D.
- I added [obj_format.ml](https://github.com/talex5/vulkan-test/tree/ocaml-3d/src/obj_format.ml) to parse the model data.
- The pipeline adds a depth buffer so near things obscure things behind them, regardless of the drawing order.
I didn't get my C version to do the 3D bits, but for comparison here's the Vulkan tutorial's official [C++ version](https://docs.vulkan.org/tutorial/latest/_attachments/28_model_loading.cpp).
## Garbage collection
To render smoothly at 60Hz, we have about 16ms for each frame. You might wonder if using a garbage collector would introduce pauses and cause us to miss frames, but this doesn't seem to be a problem.
In C, you can improve performance for frame-based applications by using a [bump allocator](https://en.wikipedia.org/wiki/Region-based_memory_management):
1. Create a fixed buffer with enough space for every allocation needed for one frame.
2. Allocate memory just by allocating sequentially in the region (bumping the next-free-address pointer).
3. At the end of each frame, reset the pointer.
This makes allocation really fast and freeing things at the end costs nothing. Implementing this in C requires special code, but OCaml works this way by default, allocating new values sequentially onto the *minor heap*. At the end of each frame, we can call `Gc.minor` to reset the heap.
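The per-frame pattern can be sketched using only the standard `Gc` module. The `simulate_frame` function is a made-up workload, not code from the renderer; it just allocates some short-lived values so there is something for the minor heap to hold.

```ocaml
(* Sketch: allocate short-lived data per frame, then empty the minor heap.
   [simulate_frame] is a hypothetical workload for illustration. *)
let simulate_frame n =
  (* These tuples and boxed floats are garbage as soon as we return. *)
  let points =
    List.init 1000 (fun i -> (float_of_int i, float_of_int (i + n))) in
  List.fold_left (fun acc (x, y) -> acc +. x +. y) 0.0 points

let () =
  for frame = 1 to 3 do
    let before = Gc.minor_words () in
    let total = simulate_frame frame in
    let allocated = Gc.minor_words () -. before in
    (* End of frame: promote any survivors and reset the minor heap. *)
    Gc.minor ();
    Printf.printf "frame %d: sum %.0f, about %.0f words allocated\n"
      frame total allocated
  done
```

`Gc.minor_words` reports the running total of words allocated in the minor heap, so the difference gives a rough idea of how much a frame allocates.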
`Gc.minor` scans the stack looking for pointers to values that are still in use and copies any it finds to the *major heap*. However, since we're at the end of the frame, the stack is pretty much empty and there's almost nothing to scan. I captured a trace of running the 3D room version with a forced minor GC at the end of every frame:
make && eio-trace run ./_build/default/src/main.exe
[Tracing the full 3D version](https://roscidus.com/blog/images/vulkan-ocaml/trace-3d.png)
The four long grey horizontal bars are the main fibers. From top to bottom they are:
- The main application loop (incrementing the frame counter and triggering the render loop fiber).
- An [ocaml-wayland](https://github.com/talex5/ocaml-wayland) fiber, receiving messages from the display server (and spawning some short-lived sending fibers).
- The `render_loop` fiber (sending graphics commands to the GPU).
- A fiber used internally by the IO system.
The green sections show when each fiber is running and the yellow background indicates when the process is sleeping. The thin red columns indicate time spent in GC (which we're here triggering after every frame).
If I remove the forced `Gc.minor` after each frame then the GC happens less often, but can take a bit longer when it does. Still not nearly long enough to miss the deadline for rendering the frame though.
Collection of the major heap is done incrementally in small slices and doesn't cause any trouble.
So, we're only using a tiny fraction of the available time. Also, I suspect the CPU is running in a slow power-saving mode due to all the sleeping; if we had more work to do then it would probably speed up.
## Conclusions
Doing Vulkan programming in OCaml has advantages (clearer code, easier refactoring), but also disadvantages (unfinished and unreleased Vulkan bindings, some friction using a C API from OCaml, and I had to write more support code, such as some bindings for libdrm).
As a C API, Vulkan is not safe and will happily segfault if passed incorrect arguments. The OCaml bindings do not fix this, and so care is still needed. I didn't bother about that because it wasn't a problem in practice, and properly protecting against use-after-free will probably require some changes to OCaml (e.g. [unmapping memory](https://github.com/ocaml/ocaml/pull/389) isn't safe without something like the "modes" being prototyped in [OxCaml](https://oxcaml.org/)).
I'm slowly upstreaming my changes to Olivine; hopefully this will all be easier to use one day!