[Rpm-maint] [rpm-software-management/rpm] C++ transition, guidelines and todo (Issue #3103)

Panu Matilainen notifications at github.com
Fri May 17 07:15:22 UTC 2024


For the lack of better place and to allow commenting/questions, putting this here for now. Think of this as a forever work-in-progress doc on the subject.

Rpm being an old-school C project started in the nineties, transitioning it to anything resembling a modern C++ codebase will be a long road. We're doing this in multiple stages over years as time permits, with the initial priority being on *enabling* use of C++ STL types and algorithms internally throughout the code: stages 1 and 2. We are relative C++ newbies and will be missing tricks and best practises, so questions and suggestions are quite welcome, *but*, please understand the constraints we're living with:
- these guidelines are specifically for controlled, gradual transition towards C++, for this particular project. It's *not* how a new C++ project should be written.
- C++17 is the C++ standard versions we'll be using initially. We will upgrade the standard as tooling matures, but fussing and buzzing how this or that is done in C++23 is not helpful just now. I expect us to move to C++20 in a couple of years. There are several things to look forward in that.
- rpm needs to maintain a stable C API and ABI
- these guidelines are specicially for that

== Stage 1, native allocation ==

Convert heap-allocated data structures to use new/delete or STL containers instead of malloc/calloc/realloc/free. This is a practical requirement to allow those structs to contain non-trivial C++ types like strings and containers. 

Practicalities:
- things that API user needs to free() cannot be converted, new/delete must always pair up. This may not always not always be obvious at the outset: if we have fooNew() and fooFree() API, its okay to convert, but eg headerExport() returns a blob that must be free()'d, and sometimes the implementation leaks into the API in surprising ways.
- memset() cannot be used for initialization in C++, always use `new {}` to ensure similar effect (`mytype_s foo {}` for local structs)
- realloc() always calls for an STL container (or string, we realloc those a lot too)
- DON'T use raw C++ arrays (ie new[]/delete[]), they're just painful

STL notes:
- watch out when storing data that requires manual freeing in containers (see stage 2)
- vector is your Swiss army knife for transition work, in particular it exposes its raw data via non-const .data() which can be passed around to unsuspecting C code without modifications to keep changes local and minimal (as long as it's not free()'d)
- rpmhash is a flexible beast that supports multiple values per key and also set semantics, replacing rpmhash calls for one of unordered_multimap, unordered_map (depending on use), or unordered_set.
- when passing containers as arguments, prefer references over raw pointers (for safety, but also eg vector[index] becomes annoying otherwise)

== Stage 2, native objects ==

This is basically the flip-side of the stage 1 coin, the objective being to make rpm structs RAII compatible native objects and safe for use in STL containers. Meaning construcors/destructors and copy-control (see rule-of-five)

Practicalities:
- default-initialize the struct members in the struct declaration itself, this ensures you never end up with uninitialized members no matter what
- if unsure, mark it as deleted. The compiler will tell you if it needs it, but it will *not* tell you the default-generated method is unsafe for a given datatype

== Stage ??, fancy pants ==

Look at things like smart pointers. Rpm is full of pointers that we'd prefer automatically freed, and many of those are manually reference counted too. Many of them will also get passed to C which limits our options.

== Source (re)naming ==

We'll rename all the C++ sources to an appropriate extension somewhere around rpm 4.20 beta/rc release, at any rate during 2024. This is lessons learned from similar transition in gcc who only renamed things 10+ years after the fact. Having C++ in files called .c is painful for a whole number or reasons, including
- build systems will fight you back
- syntax highlighting will fight you back
- its just confusing

== C++ APIs ==

Sooner than later we'll need to start adding C++ native APIs to move things forward. Initially these will be internal only to allow us to learn the ropes before heading to open seas. 

Some immediate needs that have risen already and have a wide impact internally:
- macro expansion that accepts and returns STL strings
- argvSplit() and argvJoin() counterparts

Any new APIs should be written with RAII in mind. Macros make for a nice example: the macro context is mutex-protected, and each rpmExpand() call is forced to take and release a lock, and anything at all can happen between two expands. The sane way to do this is to acquire a local context object which is allows you to do your business and at the end of scope the variable goes out of scope and so does the lock.

== C++ hygiene ==

- change C standard library includes to C++ counterparts, eg `#include <stdlib.h>` to `#include <cstdlib>
- introduce rpm:: namespace to our own stuff
- figure out what to do with the struct foo_s mess in a way that's compatible with the C API, having rpmThisStruct_s::rpmThisStruct_s() and all gets really ugly and old real fast

== Useful references ==

- https://en.cppreference.com/ and https://cplusplus.com/ (the former is more thorough, the latter is easier to consume)
- https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

== General stuff ==

Keep it simple, Sam. We're using C++ mainly for STL convenience, not because OMG OOP! Polymorphism is good when it's called for, but this is not anybody's Design Patterns master thesis. There are at most a handful of places in rpm that call for it. Don't use a C++ feature just because it exists, some of them are just terrible.  As the core guidelines shows, C++ is so big it's easier to tell what to do than what not to. I expect us to develop some more detailed guidelines over time.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/3103
You are receiving this because you are subscribed to this thread.

Message ID: <rpm-software-management/rpm/issues/3103 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rpm.org/pipermail/rpm-maint/attachments/20240517/657e771f/attachment.html>


More information about the Rpm-maint mailing list