Thank you for the questions and kicking this discussion off @ryanofsky! I’ll update the PR description with a better motivation re. the choice of a C vs a C++ header, but will also try to answer your questions here.
> This seems to offer a lot of nice features, but can you explain the tradeoffs of wrapping the C++ interface in C instead of using C++ from rust directly? It seems like having a C middle layer introduces a lot of boilerplate, and I’m wondering if it is really necessary. For example it seems like there is a rust cxx crate (https://docs.rs/cxx/latest/cxx/, https://chatgpt.com/share/dd4dde59-66d6-4486-88a6-2f42144be056) that lets you call C++ directly from Rust and avoid the need for C boilerplate. It looks like https://cppyy.readthedocs.io/en/latest/index.html is an even more full-featured way of calling c++ from python.
It is true that the interoperability between C++ and Rust has become very good. In fact there is someone working on wrapping the entirety of Bitcoin Core in Rust: https://github.com/klebs6/bitcoin-rs.
During the last Core Dev meeting in Berlin I also asked, during the libbitcoinkernel session, whether a C API is desirable in the first place (notes here). I moved forward with this implementation because the consensus among the many contributors in the room at the time was that it is. The reasons discussed during that session can be briefly summarised as follows:
- Shipping a shared library with a C++ header is hard (C++ has no stable ABI across compilers and standard library versions)
- Mature and well-supported tooling for integrating C exists for nearly every popular language
- C offers a reasonably stable ABI
So if we want the broadest possible support, across as many languages as possible and for both dynamically and statically linked libraries, a C header is the go-to option. I’m speculating here, but a C++ header might also make future standard version bumps and the adoption of new standard library features harder. If those trade-offs around compatibility, library portability, and language support are acceptable, a C++ header could still be an option though. It would be nice to hear more reviewers give their opinions here.
I’d also like to add that two libraries we use and depend on in this project, minisketch and zeromq, follow the same pattern: they are C++ codebases that expose only a C API, which in both cases can be used through a C++ RAII wrapper. So there is precedent in the free software ecosystem for doing things this way.
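For illustration, here is a minimal sketch of that pattern (the `example_*` names are made up for this comment, not the actual kernel or minisketch/zeromq API): an opaque handle created and destroyed through plain C functions, with an optional C++ RAII wrapper layered on top of the same header.

```cpp
/* --- C header: opaque handle with a create/destroy pair (illustrative names) --- */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct example_context example_context; /* opaque to the caller */

example_context* example_context_create(void);
void             example_context_destroy(example_context* ctx);

#ifdef __cplusplus
} // extern "C"

/* --- optional C++ RAII wrapper shipped alongside the C header --- */
#include <memory>

class ExampleContext
{
    std::unique_ptr<example_context, decltype(&example_context_destroy)> m_ctx;

public:
    ExampleContext() : m_ctx{example_context_create(), &example_context_destroy} {}
    example_context* get() const { return m_ctx.get(); }
};
#endif
```

C callers (and any language with a C FFI) use the two functions directly, while C++ users get automatic cleanup through the wrapper without the library itself ever exposing C++ types across the boundary.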
The quality of C++ language interop seems to vary a lot between languages. Python and Rust seem to have decent support; ziglang, on the other hand, has no support for C++ bindings. JVM-family languages are hit and miss, and many of the common academic and industrial data analysis languages, such as Julia, R, and MATLAB, have no support for direct C++ bindings. The latter category should not be disregarded as potential future users, since this library might be useful for accessing Bitcoin Core data in data analysis projects.
> Another drawback of going through a C API seems like not just increased boilerplate, but reduced safety. For example, the implementation is using reinterpret_cast everywhere
I feel like the reduced type safety due to casting is a bit of a red herring. The type casting can be harder to abuse if you always go through a dedicated helper function for interpreting the passed-in data types (as I believe is implemented here). Casting is also a pattern used in many other projects; both minisketch and libzmq use similar type casts extensively. It should definitely be possible to scrutinize the API in this PR to a point where it offers decent safety to its users as well as to contributors to and maintainers of this code base.
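As a sketch of what I mean by a dedicated helper (again with invented names, not the code in this PR): the `reinterpret_cast` lives in exactly one pair of functions, and every exported C function goes through them instead of casting inline.

```cpp
#include <cassert>

namespace internal { class ChainstateManager; } // stand-in for an internal C++ class (illustrative)

struct example_chainstate_manager; // opaque type declared in the C header (illustrative)

namespace {

internal::ChainstateManager* cast(example_chainstate_manager* handle)
{
    assert(handle);
    return reinterpret_cast<internal::ChainstateManager*>(handle);
}

example_chainstate_manager* cast(internal::ChainstateManager* chainman)
{
    return reinterpret_cast<example_chainstate_manager*>(chainman);
}

} // namespace

extern "C" void example_chainstate_manager_flush(example_chainstate_manager* handle)
{
    // Exported functions never spell out reinterpret_cast themselves.
    internal::ChainstateManager* chainman = cast(handle);
    (void)chainman; // ... call into the internal C++ API here ...
}
```

Keeping the cast in one place makes it easy to audit and hard to accidentally cast a handle to the wrong internal type.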
The concerns around boilerplate are more serious in my view, but at least with the current internal code and headers, exposing a safe C++ API is not trivial either. The current headers do not lend themselves to it well, for example because of tricky locking mechanics, exposed boost types, or confusing lifetimes. There also comes a point where we should probably stop extensively refactoring internal code for the kernel. I’ve heard concerns from some contributors during the last two Core Dev meetings that the kernel project might turn the validation code into a perpetual construction site. Having some boilerplate and glue to abstract away some of the ugliness and make it safe seems like an acceptable way out of this dilemma. If this means boilerplate is required anyway, I would personally prefer a C API.
Some of the boilerplate-y duplicate definitions in the header could eventually be dropped if some of the enums were converted from enum class to C-style enums. As long as they are properly namespaced, I don’t see a big drawback to this. Similarly, some of the structs could be defined in a way that allows them to be used on both sides, using pimpl or similar idioms. All in all, most of these translations seem very straightforward.
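For example (purely illustrative, not the current header), a prefixed C enum is valid in both C and C++, so a single definition could serve both sides instead of an enum class mirrored by a duplicate C enum plus a translation function:

```cpp
/* Valid C and valid C++: the prefix does the namespacing, so the internal
 * code could consume this definition directly rather than maintaining a
 * mirrored `enum class` and converting between the two at the boundary. */
typedef enum {
    example_SYNC_STATE_INIT_REINDEX,
    example_SYNC_STATE_INIT_DOWNLOAD,
    example_SYNC_STATE_POST_INIT,
} example_SynchronizationState;
```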
It might be interesting to see how some of the RPC methods could be re-implemented using the kernel header. There have been some RPC implementation bugs over the years that were due to unsafe usage of our internal code within the method implementations. Using the kernel header instead might make this safer and reduce boilerplate. To be clear, I am not suggesting replacing the implementations, but separately re-implementing some of them to show where the kernel header might shine.
> it seems like the exposed C functions use a kernel_ErrorCode enum type with the union of every possible error type, so callers don’t have a way to know which functions can return which errors.
We have disagreed on this design before. If I understood you correctly, the consolidation of all error codes into a single enumeration was one of the reasons you opened your version for handling fatal errors in the kernel, #29700, as an alternative to my original #29642. I am still a bit torn between the two approaches. I get that it may be useful to see exactly which errors can be encountered when invoking a certain routine, but at the same time I get the feeling this often splinters the error handling to the point where you end up with a catch-all approach after all. I also think it is nice to have a single, central list for looking up all error codes and for defining the routines that handle them in close proximity to their definition. It would be nice to finally hear some more voices besides the two of us discussing this.

real-or-random has recently provided some good points on error handling in the libsecp silent payments PR (which I mostly did not adopt in this PR) and argues that most error codes are not useful to the user. As mentioned in the description, error handling is a weak spot of this pull request and I would like to improve it.
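To make the trade-off concrete, here is a rough sketch of the two shapes being discussed (names invented for illustration, not taken from either PR): a single central error enum versus per-function result enums.

```cpp
/* Option A: one central list of every error the library can return.
 * Easy to look up, and handling routines can be documented right next
 * to the definitions, but a signature returning it does not say which
 * subset of codes actually applies. */
typedef enum {
    example_ERROR_OK = 0,
    example_ERROR_INVALID_POINTER,
    example_ERROR_INVALID_CONTEXT,
    example_ERROR_LOGGING_FAILED,
    /* ... */
} example_ErrorCode;

/* Option B: per-function result enums. The signature documents exactly
 * which failures a caller has to handle, at the cost of more types and
 * some overlap between them. */
typedef enum {
    example_CONTEXT_CREATE_OK = 0,
    example_CONTEXT_CREATE_INVALID_OPTIONS,
    example_CONTEXT_CREATE_OUT_OF_MEMORY,
} example_context_create_result;
```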