Swift for C++ Practitioners: Flexible Array Members

Flexible array members are an odd feature of C that is intended to capture the pattern of a variable-length structure consisting of a header followed by an array of some element type. It’s a very specific pattern where the last element of a structure can be an array with no size information, like this:

struct Point {
    double x, y;
};

struct Path {
    unsigned num_points;
    _Bool isClosed;
    struct Point points[];
};

Flexible array members are not technically part of C++, but are available as an extension in most every C++ compiler because they’re common enough in C that you need them for compatibility with some C libraries.

I recently ran across some Swift code that was using a C API based on flexible array members. The code was necessarily laden with unsafe pointers, and I came across it because it was running into an issue with memory alignment. This post will walk through what flexible array members are and how to make use of them (or do something similar) in Swift. It will show off some of Swift’s unsafe pointer APIs along the way.

Spoiler: I ended up deciding that working with flexible array members was too hard, so I added a new module to the swift-collections package with some types that make it far easier to work with flexible array members. This post will introduce those new types as well.

Memory layout of structs with flexible array members

Structs with flexible array members are laid out “flat” in memory, as if there were a fixed-size array at the end of the struct. So if you had two points in the Path above, it would be laid out the same way as this struct:

struct Path2 {
    unsigned num_points;
    _Bool isClosed;
    struct Point points[2];
};

The trick, of course, is that it’s not a fixed-size array: C doesn’t know how long the array is. All it knows is that points is the beginning of an array of Point instances in memory. It’s up to you to figure out how to create such a variable-sized structure and to stay within the bounds.

Working with flexible array members in C

C structs with flexible array members can be pretty hard to use, even in C. You can’t “just” declare a new instance of the Path structure, because that won’t have storage for anything that goes into the array. The only valid length for the flexible array member in this C declaration:

struct Path my_path;

is zero. If you want to allocate for some number of points, you’re going to need to allocate the memory explicitly, like this:

struct Path *my_path = (struct Path*)malloc(sizeof(struct Path) + num_points * sizeof(struct Point));

Of course, you have to remember to free that later. If you don’t want to allocate on the heap, you can allocate storage on the stack using alloca... just remember not to use the pointer after you exit the current stack frame. There are other tricks, such as creating a local variable-length array, with similar restrictions to the alloca implementation.

Once you’ve allocated the pointer, you can access the elements of my_path->points within the bounds. Just be careful to never make a shallow copy of struct Path, because it won’t have space allocated for the elements.

Cheating with Swift inline arrays

Swift can model something like the Path struct using the new (in Swift 6.2) InlineArray type by making Path generic. The result would look like this:

struct FixedPath<let numPoints: Int> {
    var isClosed: Bool
    var points: [numPoints of Point]
}

This lets us create paths of different fixed lengths, e.g., FixedPath<2> or FixedPath<16>, using the same memory layout you would get from C types with inline fixed arrays. It’s nicer than the C Path2 type because at least it’s generic over the length, but it’s still the case that the length of the path needs to be a compile-time constant. So, while this approach is neat, it doesn’t actually solve the same problem as flexible array members.

Aside: There is a possible future for Swift where one can take a run-time value and plug it in as an argument to the FixedPath type. Swift already supports this for values with runtime type (any types, covered in part 5 on type erasure), but because this feature doesn’t exist today and would also have limitations in Embedded Swift, I’m going to ignore it.

The `ManagedBuffer` type

The Swift standard library provides a ManagedBuffer type that looks perfect for flexible array members. ManagedBuffer is a generic class that is parameterized over Header and Element types, like this:

class ManagedBuffer<Header, Element> where Element : ~Copyable { ... }

A ManagedBuffer instance is allocated by calling the static create method, which allocates enough storage for an instance of the Header type and some number of instances of Element. For our example path type, we’d do something like this:

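struct PathHeader {
    var isClosed: Bool
}

let myPath = ManagedBuffer<PathHeader, Point>.create(minimumCapacity: numPoints) { buffer in
    return PathHeader(isClosed: false)
}
// ManagedBuffer has no subscript; the elements are reached through a pointer.
myPath.withUnsafeMutablePointerToElements { points in
    points.initialize(to: Point(x: 3.14159, y: 2.71828))
}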

ManagedBuffer is a class, so Swift provides automatic memory management (via reference counting) for its instances. The create method tail-allocates all of the storage, so it’s a single heap allocation (whereas having a separate array would require two allocations). You can access the header with the header property, reach the elements through the pointer that withUnsafeMutablePointerToElements passes to its closure, and ask how many elements the buffer has room for with the capacity property (which may be more than the minimum you requested). Overall, it pretty much does what we want!

However, there are a few reasons why ManagedBuffer doesn’t quite work as a replacement for C flexible array members.

The layout of the managed buffer does not match that of a C flexible array member, so it cannot be used for interoperating with a C struct that has a flexible array member.
It is always allocated on the heap, which might not be acceptable in very low-level or high-performance code.
It uses reference counting, which might not be acceptable in very low-level or high-performance code.
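To make the ManagedBuffer discussion concrete, here is a minimal sketch of a full round trip: create the buffer, initialize the tail-allocated elements, and read them back. The numPoints field in the header is my own addition for illustration (ManagedBuffer does not track how many elements you have initialized), and Point is a Swift stand-in for the C type.

```swift
// Swift stand-in for the C Point type, so the sketch is self-contained.
struct Point {
    var x: Double
    var y: Double
}

// Assumption for illustration: the header records how many elements are in use.
struct PathHeader {
    var numPoints: Int
    var isClosed: Bool
}

let numPoints = 3

// One heap allocation holds the header and room for at least numPoints elements.
let path = ManagedBuffer<PathHeader, Point>.create(minimumCapacity: numPoints) { _ in
    PathHeader(numPoints: numPoints, isClosed: false)
}

// The element storage starts out uninitialized, so initialize each slot.
path.withUnsafeMutablePointerToElements { points in
    for i in 0..<numPoints {
        (points + i).initialize(to: Point(x: Double(i), y: Double(i) * 2))
    }
}

// Read the header and the elements together.
let lastY = path.withUnsafeMutablePointers { header, points in
    points[header.pointee.numPoints - 1].y
}
print(path.header.isClosed, lastY) // false 4.0
```

Point is trivially destructible, so nothing needs to be deinitialized here. With element types that own resources, you would typically subclass ManagedBuffer and deinitialize the elements in its deinit.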

Manually managing flexible array members, but in Swift

All of the ugly pointer manipulation we did in C, we can also do in Swift. It’ll look a bit like this:

let storage = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Path>.size + numPoints * MemoryLayout<Point>.stride,
    alignment: MemoryLayout<Path>.alignment
)
let myPath = storage.assumingMemoryBound(to: Path.self) // UnsafeMutablePointer<Path>

The above is the Swift equivalent to malloc with the appropriate size (or, more specifically, the C11 aligned_alloc that allows one to specify alignment as well), followed by a cast to a pointer to Path. UnsafeMutableRawPointer is effectively a void*. Then, assumingMemoryBound(to:) is a cast to a typed unsafe pointer, e.g., UnsafeMutablePointer<Path>. If that looks uncharacteristically long-winded for Swift, it’s working as intended: unsafe pointer types in Swift are meant to be highly explicit.

Now, we want to get at the points stored after the type. We can do so like this:

let myPathPoints = storage.advanced(by: MemoryLayout<Path>.size)
    .alignedUp(for: Point.self)

The advanced(by:) operation performs pointer arithmetic, so advancing by the size of the Path will go to the end of the value. Then, alignedUp(for:) will adjust the pointer to the next address that is suitable for a value of that type, so the pointer is ready to start an array. If you forget this second step and the alignment doesn’t work out, Swift will trap at runtime to indicate the bug.

Aside: In the size computations above, we used both the size and stride static properties in MemoryLayout. These two properties are related: size is the size of the object in memory, in bytes. stride is the size of the object plus any padding needed so that the next address in memory can point to another object of the same type, accounting for the type’s alignment. If we just want to store a value of a type T, the size is sufficient. If we want an array of values of type T, we use stride, as in the allocation call above. Note that this distinction doesn’t exist in C, where structures automatically get padded out to their alignment, so sizeof in C is effectively the same as stride. C also doesn’t have the notion of a zero-length type like Swift does (try MemoryLayout<Void> or an empty structure to see what I mean).
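A quick way to see the difference between size and stride is to print them for a struct that ends in a small field. Flagged is a made-up type for illustration, and the numbers in the comments assume a typical 64-bit platform where Double is 8-byte aligned.

```swift
// Hypothetical example type: 8 bytes of Double followed by 1 byte of Bool.
struct Flagged {
    var value: Double
    var flag: Bool
}

print(MemoryLayout<Flagged>.size)      // 9: the bytes actually occupied
print(MemoryLayout<Flagged>.stride)    // 16: distance between array elements
print(MemoryLayout<Flagged>.alignment) // 8

// A zero-sized type still gets a stride of 1 so array elements are distinct.
print(MemoryLayout<Void>.size)         // 0
print(MemoryLayout<Void>.stride)       // 1
```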

Now, this alignment discussion points out a bug in the original allocation: we allocated storage for the Path (using its size) and the right number of Points (using their stride), but we didn’t account for the extra bytes that might be needed for alignment! To actually account for it takes a bit more work, like this:

let pathAlignment = MemoryLayout<Path>.alignment
let pointAlignment = MemoryLayout<Point>.alignment

// If the point requires greater alignment than the path, we need
// some buffer for alignment.
let padding: Int
if pointAlignment > pathAlignment {
    padding = pointAlignment - pathAlignment
} else {
    padding = 0
}

let storage = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Path>.stride + numPoints * MemoryLayout<Point>.stride + padding,
    alignment: MemoryLayout<Path>.alignment
)

Here, if the Point’s alignment is greater than that of the Path, we add the difference as additional padding bytes, in addition to allocating based on the stride of the Path. It can over-allocate, but that’s better than under-allocating and writing past the end of the allocation.
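Putting the pieces of this section together, the sketch below wraps the size computation in a helper and then runs a full allocate, initialize, read, and deallocate cycle. The names trailingByteCount and makeAndSumPath are hypothetical, and Path and Point are Swift stand-ins for the imported C types so the example is self-contained.

```swift
// Swift stand-ins for the C types (the real ones would be imported from C).
struct Point { var x, y: Double }
struct Path { var num_points: UInt32; var isClosed: Bool }

// Hypothetical helper: bytes needed for one Header followed by `count` Elements,
// using the same stride-plus-padding computation described above.
func trailingByteCount<Header, Element>(
    header: Header.Type, element: Element.Type, count: Int
) -> Int {
    let padding = max(0, MemoryLayout<Element>.alignment - MemoryLayout<Header>.alignment)
    return MemoryLayout<Header>.stride + count * MemoryLayout<Element>.stride + padding
}

func makeAndSumPath(numPoints: Int) -> Double {
    let storage = UnsafeMutableRawPointer.allocate(
        byteCount: trailingByteCount(header: Path.self, element: Point.self, count: numPoints),
        alignment: MemoryLayout<Path>.alignment
    )
    defer { storage.deallocate() }

    // The header lives at the start of the allocation.
    let path = storage.bindMemory(to: Path.self, capacity: 1)
    path.initialize(to: Path(num_points: UInt32(numPoints), isClosed: false))

    // The elements start after the header, aligned up for Point.
    let points = storage.advanced(by: MemoryLayout<Path>.size)
        .alignedUp(for: Point.self)
        .bindMemory(to: Point.self, capacity: numPoints)
    for i in 0..<numPoints {
        (points + i).initialize(to: Point(x: Double(i), y: 0))
    }

    // Read the elements back, staying within the count stored in the header.
    var sum = 0.0
    for i in 0..<Int(path.pointee.num_points) {
        sum += points[i].x
    }

    points.deinitialize(count: numPoints)
    path.deinitialize(count: 1)
    return sum
}

print(makeAndSumPath(numPoints: 3)) // 3.0
```

This version uses bindMemory(to:capacity:) rather than assumingMemoryBound(to:) because the memory is freshly allocated and has not been bound to any type yet.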

The `TrailingArray` type

Alignment math can be tricky, so it’s best to do it once, in a generic type that can be reused. The new TrailingArray type in the swift-collections package looks roughly like this:

public struct TrailingArray<Header: TrailingElements>: ~Copyable
    where Header: ~Copyable
{
    public typealias Element = Header.Element

    public var header: Header { get set }
    public subscript(index: Int) -> Element { get set }
}

The Header in this case would be used with a type like Path, which has some data of its own and is followed by an array of an element type. The TrailingElements protocol has two requirements: the element type, and a property that gets the number of trailing elements:

public protocol TrailingElements: ~Copyable {
    associatedtype Element

    var trailingCount: Int { get }
}

We can make our Path type conform to this protocol like this:

extension Path: TrailingElements {
    public typealias Element = Point
    // num_points is imported from C as UInt32, so convert it to Int.
    public var trailingCount: Int { Int(num_points) }
}

Now we can create a TrailingArray<Path>, which includes both the Path instance (accessible via the header property) as well as the points that follow (via subscripting). For example:

var myPath = TrailingArray(
    header: Path(num_points: 3, isClosed: false),
    repeating: Point(x: 1, y: 1)
)

print(myPath.header.isClosed) // false
print(myPath[1].x)            // 1.0

The TrailingArray type is a noncopyable type (see my post on those). The init(header:repeating:) initializer called above will allocate memory on the heap for the header and its trailing elements, then free the memory in the deinit. Since it’s a noncopyable type, there is exactly one owner and no reference-counting overhead for managing the memory, addressing one of my concerns with ManagedBuffer.
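To illustrate the ownership model only, here is a deliberately simplified sketch of a noncopyable type that allocates in its initializer and frees in its deinit. OwnedBuffer and demo are hypothetical names of mine; this is not the TrailingArray implementation from swift-collections, and it stores only elements, with no header.

```swift
// Hypothetical, simplified owner of a heap buffer of Doubles.
struct OwnedBuffer: ~Copyable {
    private let storage: UnsafeMutablePointer<Double>
    let count: Int

    init(repeating value: Double, count: Int) {
        let pointer = UnsafeMutablePointer<Double>.allocate(capacity: count)
        pointer.initialize(repeating: value, count: count)
        self.storage = pointer
        self.count = count
    }

    subscript(index: Int) -> Double {
        get {
            precondition(index >= 0 && index < count, "index out of bounds")
            return storage[index]
        }
        set {
            precondition(index >= 0 && index < count, "index out of bounds")
            storage[index] = newValue
        }
    }

    // Runs exactly once, when the single owner goes away. No reference count needed.
    deinit {
        storage.deinitialize(count: count)
        storage.deallocate()
    }
}

func demo() -> Double {
    var values = OwnedBuffer(repeating: 1, count: 3)
    values[1] = 5
    return values[0] + values[1]
    // `values` is destroyed here, which frees the storage.
}

print(demo()) // 6.0
```

Because the struct cannot be copied, the compiler can decide statically where the deinit runs, which is why no retain or release traffic is involved.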

Temporary allocation on the stack

Many C libraries provide a function, alloca, that allocates from the stack (it is a widely available extension rather than part of the C standard). The Swift equivalent is called withUnsafeTemporaryAllocation, which provides a temporary allocation, on the stack when possible, within the scope of a closure. If we were working directly with the underlying storage, we could use it like this:

withUnsafeTemporaryAllocation(
    byteCount: MemoryLayout<Path>.stride + numPoints * MemoryLayout<Point>.stride + padding,
    alignment: MemoryLayout<Path>.alignment
) { storage in
    // storage is an UnsafeMutableRawBufferPointer usable only in this closure
}
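For a fuller picture of what goes inside that closure, here is a sketch that builds a temporary path and returns a plain value computed from it, so no pointer escapes the closure. The function name sumOfXs is hypothetical, and Path and Point are again Swift stand-ins for the C types.

```swift
// Swift stand-ins for the C types, so the sketch is self-contained.
struct Point { var x, y: Double }
struct Path { var num_points: UInt32; var isClosed: Bool }

func sumOfXs(numPoints: Int) -> Double {
    let padding = max(0, MemoryLayout<Point>.alignment - MemoryLayout<Path>.alignment)
    let elementBytes = numPoints * MemoryLayout<Point>.stride
    let byteCount = MemoryLayout<Path>.stride + elementBytes + padding

    return withUnsafeTemporaryAllocation(
        byteCount: byteCount,
        alignment: MemoryLayout<Path>.alignment
    ) { storage -> Double in
        // byteCount is nonzero here, so the buffer has a base address.
        let base = storage.baseAddress!

        let path = base.bindMemory(to: Path.self, capacity: 1)
        path.initialize(to: Path(num_points: UInt32(numPoints), isClosed: true))

        let points = base.advanced(by: MemoryLayout<Path>.size)
            .alignedUp(for: Point.self)
            .bindMemory(to: Point.self, capacity: numPoints)
        for i in 0..<numPoints {
            (points + i).initialize(to: Point(x: Double(i + 1), y: 0))
        }

        // Compute a value to return; the pointers must not outlive the closure.
        var sum = 0.0
        for i in 0..<Int(path.pointee.num_points) {
            sum += points[i].x
        }
        return sum
    }
}

print(sumOfXs(numPoints: 4)) // 10.0
```

Both stand-in types are trivial, so the sketch skips deinitialization; types that own resources would need to be deinitialized before the closure returns.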

The TrailingArray type makes this a little simpler with its withTemporaryValue static method:

TrailingArray<Path>.withTemporaryValue(
    header: Path(num_points: 3, isClosed: false),
    repeating: Point(x: 1, y: 1)
) { path in
    // path is a TrailingArray<Path> usable within this closure
}

Interoperating with C APIs

The design of TrailingArray means that it can work with existing C structures that use flexible array members. The C structure can be extended to add the TrailingElements conformance, specifying the trailing element type and how to get the count. Additionally, there is a TrailingArray initializer that takes in pointers that would come from a C function, adopting it as the storage for the trailing array, like so:

public init(
    consuming pointer: UnsafeMutablePointer<Header>,
    storage: UnsafeMutableRawPointer
)

Usually pointer and storage will be the same (see the documentation). There’s also the opposite API, which consumes a TrailingArray without freeing the storage. Instead, it returns the header and storage pointers so they can be passed along to a C API.

public consuming func leakStorage() -> (
    pointer: UnsafeMutablePointer<Header>,
    storage: UnsafeMutableRawPointer
)

Wrap-up

Flexible array members are tricky, whether you are in C or in Swift. It’s possible to use them correctly in both languages, by carefully reasoning about pointer arithmetic and the alignment of the various types. The pointer arithmetic is rather verbose in Swift, which is a deliberate choice to make all of these fundamentally unsafe operations clear and obvious in the code. When doing pointer arithmetic like this, it’s best to encapsulate it into a safe type like TrailingArray that provides a nice interface over an efficient, low-level implementation.