A proof of concept of a semistable vector container
#include <semistable/vector.hpp>
#include <iostream>
int main()
{
semistable::vector<int> x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
auto it = x.begin() + 5;
std::cout << *it << "\n"; // prints 5
x.erase(x.begin()); // erases first element
std::cout << *it << "\n"; // prints 5 again!
}
std::vector stores its contents in a contiguous block of memory, so mid insertions and erasures invalidate references and iterators to previous elements. semistable::vector is called semistable in the sense that, while references are still unstable, its iterators correctly track elements in situations like the above.
semistable::vector stores elements contiguously and provides the same API as std::vector with the extra guarante…
A proof of concept of a semistable vector container
#include <semistable/vector.hpp>
#include <iostream>
int main()
{
semistable::vector<int> x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
auto it = x.begin() + 5;
std::cout << *it << "\n"; // prints 5
x.erase(x.begin()); // erases first element
std::cout << *it << "\n"; // prints 5 again!
}
std::vector stores its contents in a contiguous block of memory, so mid insertions and erasures invalidate references and iterators to previous elements. semistable::vector is called semistable in the sense that, while references are still unstable, its iterators correctly track elements in situations like the above.
semistable::vector stores elements contiguously and provides the same API as std::vector with the extra guarantee of iterator stability (including end()). The library is header-only and depends solely on Boost.Config. C++11 or later required.
Implementation
From the point of view of stability, there are three types of operation that cause iterators to become invalid in a classical std::vector:
- insertion of elements before a given position,
- erasure of elements before a given position,
- reallocation to a new buffer (e.g. with a call to
reserve).
When any of these operations happens, semistable::vector creates a new epoch descriptor indicating the change. Outstanding iterators internally point to the epoch that was current when they were last used. All arrows in the diagram are std::shared_ptrs:
When the iterators are used, they follow the chain of epoch descriptors till the last one, making the necessary adjustments to their stored position along the way. This ensures that dereference (as well as other iterator operations) are consistent with the current state of the vector:
When an epoch descriptor is outdated (all outstanding iterators are past it), it gets automatically deleted (no shared_ptr points to it any longer).
Performance
The graph shows normalized execution times of the following operations:
- traversal with
for_each, - repeated insertion at the end,
erase_ifof odd elements,- sorting of elements,
for std::vector<int>, std::list<int> and semistable::vector<int> with 0.5M elements. Results are normalized to the execution time of std::vector. Benchmarks compiled with clang-cl for Visual Studio 2022 in release mode.
Some observations:
semistable::vectoriterators provide araw()member function returning a plain pointer to the element (this is equivalent to calling std::to_address on the iterator). When usingraw()for traversal and sorting (that is,std::for_each(x.begin().raw(), x.end.raw(), ...),std::sort(x.begin().raw(), x.end.raw()), execution times are of course the same as withstd::vector. C++20 introduces the notion of contiguous iterators, which standard algorithms could in principle take advantage of by internally converting contiguous iterators to pointers for increased performance. In reality, though, no standard library implementation does that except for a handful of algorithms with a high chance of being eligible for autovectorization.std::list::sortis not entirely equivalent to sorting a vector (semistable or otherwise), as the former sorts nodes whereas the latter sorts values. So, astd::listiterator will point to the exact same value after sorting, which is not the case for vectors.
Limitations and potential extensions
Thread safety
Like standard C++ containers, semistable::vector const and const-like member functions are thread safe. Iterator usage, however, requires extra precautions:
- The same iterator object can’t be used concurrently in different threads, even for nominally const operations such as dereferencing (internally, thread-unsafe epoch traversal is triggered).
- An iterator can’t be used concurrently with any thread-unsafe operation on the
semistable::vectorit belongs in, even if the operation does not touch the piece of memory the iterator points to.
These limitations could in principle be avoided by modifying the library’s implementation to use atomic shared pointers.
Exception safety
Currently, iterator stability is not enforced if an exception is thrown other than by the allocator (typically, by a value_type construction or assignment operation). There’s no internal impediment to evolving the library so as to properly cover these cases.
Dormant iterators
If an iterator it is kept in the program and never touched while its semistable::vector is being modified, the associated epoch chain will grow undefinitely because its head can’t be garbage-collected as long as it points at it.
Invalidation detection
Much as with std::vector, using a semistable::vector iterator pointing to an erased element is still undefined behavior. The internal epoch machinery, however, could be easily leveraged so that those illegal uses are detected and signaled via an exception or some other mechanism.
Acknowledgements
Thanks to Dmitry Arkhipov for his help setting the CI for this project.