[Rpm-maint] [rpm-software-management/rpm] Rust bindings for rpmlib (#429)
Tony Arcieri
notifications at github.com
Thu Apr 12 01:33:31 UTC 2018
Hello,
I've written a nascent Rust binding for rpmlib:
https://crates.io/crates/rpmlib
First of all, I would love any feedback. This is my first deep dive into the internals of RPM, and while I found ample documentation about certain things, a lot of what I was doing was guesswork.
I'm also not super up-to-speed on RPM nomenclature but I've largely tried to mirror the structure and terminology rpmlib itself uses. I've already gotten some feedback that it's "Red Hat", not "RedHat", and I probably shouldn't be calling it the "RedHat Package Manager", so it seems I could use some advice on appropriate project branding. Is "rpmlib" a good name for the Rust crate?
I'd also be interested in upstreaming this if there's interest. I'm presently the sole author and am happy to relicense everything under LGPL/GPL, sign CLAs, transfer copyright or what have you, as well as transfer control of the packages on https://crates.io
## Lifetimes
If there's one thing in particular I'd like clarification and feedback on, it's my understanding of rpmlib's memory model. Rust's claim to fame is the compiler's ability to prove properties about the lifetime relationships of objects in the program, and thereby provide safe "zero cost" (i.e. zero copy) abstractions which might otherwise be deemed "risky" in practically any other language due to the potential for use-after-free errors. I am trying to apply this approach in my Rust binding.
In RPM nomenclature I am using `HEADERGET_MINMEM` with the goal of directly accessing memory owned by RPM and thereby providing a zero-copy API but also providing safe abstractions for doing so. Doing that correctly involves describing the precise memory relationships to the Rust compiler.
First I'll say Rust's notion of memory safety applies to multithreaded programs, and in that regard I have just thrown a big mutex across the whole rpmlib FFI. In particular right now creating a transaction set also acquires the mutex and does not release it until it is complete. tl;dr: assume sequential/single-threaded access for now.
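As a rough illustration of the "one big mutex" approach, here is a minimal sketch in which a transaction set holds a process-wide lock for its entire lifetime. The names `FFI_LOCK`, `TransactionSet::create`, and `ffi_is_idle` are hypothetical and are not the crate's actual API; the real FFI call is only indicated by a comment.

```rust
use std::sync::{Mutex, MutexGuard};

// One process-wide lock guarding every call into the C library.
// (Hypothetical name; the real crate may organize this differently.)
static FFI_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical wrapper: owning a transaction set means owning the lock.
pub struct TransactionSet {
    // Released automatically when the transaction set is dropped.
    _guard: MutexGuard<'static, ()>,
}

impl TransactionSet {
    pub fn create() -> TransactionSet {
        // Blocks until no other thread is inside the FFI. A poisoned lock
        // is recovered, since the guarded value is just `()`.
        let guard = FFI_LOCK
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        // A real binding would call into rpmlib here to create the handle.
        TransactionSet { _guard: guard }
    }
}

/// Returns true if nobody currently holds the FFI lock.
pub fn ffi_is_idle() -> bool {
    FFI_LOCK.try_lock().is_ok()
}
```

Because `MutexGuard` is not `Send`, a wrapper built this way also cannot be moved to another thread, which fits the single-threaded assumption.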
I've largely been following this guide, which provided much of the lifetime information I was looking for on things like transaction sets and iterators:
https://docs-old.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/RPM_Guide/ch15s04.html
(The combination of `docs-old` and `Draft` doesn't bode particularly well, but...)
If I have one immediate takeaway: transaction sets everywhere! Which is something I can get on board with. In almost all cases (aside from reading the initial config/rpmrc and configuring macro contexts) everything seems to begin with a transaction set, so that is the initial lifetime of importance from a Rust perspective.
Regarding that, the referenced documentation suggests things like:
> When you are done with a transaction set, call rpmtsFree:
> rpmts rpmtsFree(rpmts ts);
Rust has a trait that automatically frees things for you at the end of their lifetime called `Drop`, and it's the sort of RAII pattern you might expect. I have implemented free for both transaction sets and iterators using this trait based on this documentation.
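The `Drop` pattern can be sketched as follows. This is a minimal illustration and not the crate's code: `fake_rpmts_create` and `fake_rpmts_free` are stand-ins for the C calls (they only count live handles), so the example runs without linking against librpm.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the C side: counts live handles instead of calling librpm.
static LIVE_HANDLES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical replacement for the "create" FFI call.
fn fake_rpmts_create() -> usize {
    LIVE_HANDLES.fetch_add(1, Ordering::SeqCst) + 1
}

// Hypothetical replacement for the "free" FFI call.
fn fake_rpmts_free(_handle: usize) {
    LIVE_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

/// Owns one handle; the handle is released when this value goes out of scope.
pub struct TransactionSet {
    handle: usize,
}

impl TransactionSet {
    pub fn create() -> Self {
        TransactionSet { handle: fake_rpmts_create() }
    }
}

impl Drop for TransactionSet {
    fn drop(&mut self) {
        // Runs exactly once, at the end of the value's lifetime.
        fake_rpmts_free(self.handle);
    }
}

/// Number of handles that have been created but not yet freed.
pub fn live_handles() -> usize {
    LIVE_HANDLES.load(Ordering::SeqCst)
}
```

With this shape, callers never invoke the free function themselves; leaving the scope that owns the `TransactionSet` is enough.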
Where things got a bit hazy was actually using `HEADERGET_MINMEM`. Quoting the linked documentation:
> You do not need to free the Header returned by rpmdbNextIterator. Also, the next call to rpmdbNextIterator will reset the Header.
What I took this to mean, in Rustier nomenclature, is that the lifetime of a `Header` referenced by an iterator is valid until the next item is requested from the iterator. This is a bit different from a typical Rust `Iterator`, which provides a read-only view of a collection, where it's safe to have references to multiple items in the collection at once.
There is a Rust pattern for what I think this is describing, though: a `StreamingIterator`, which is the one I chose to use:
https://docs.rs/streaming-iterator/
The idea of `StreamingIterator` is you iterate by borrowing a value from the iterator. Before you move on, you must give that value back. From a nitty gritty perspective, it actually does this by splitting up the iteration into two steps: one where you ask the iterator to mutate itself and "preload" the value into a buffer, and another where you immutably borrow that value from the iterator, with a lifetime guaranteed to end before you request the next item.
The relevant code is here:
https://github.com/iqlusion-io/crates/blob/master/rpmlib/src/db.rs#L219
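The two-step "advance, then borrow" shape described above can be sketched like this. The trait is redefined locally so the example is self-contained; it mirrors the core `advance`/`get`/`next` methods of the `streaming-iterator` crate. `HeaderIter` and `collect_names` are hypothetical, with plain strings standing in for RPM headers.

```rust
/// Local copy of the core shape of a streaming iterator.
pub trait StreamingIterator {
    type Item: ?Sized;

    /// Step 1: mutate the iterator, loading the next item into its buffer.
    fn advance(&mut self);

    /// Step 2: immutably borrow whatever is currently in the buffer.
    fn get(&self) -> Option<&Self::Item>;

    /// Convenience: advance, then borrow.
    fn next(&mut self) -> Option<&Self::Item> {
        self.advance();
        (*self).get()
    }
}

/// Hypothetical iterator whose single buffer is overwritten on every step,
/// much like a header that is reset by the next iterator call.
pub struct HeaderIter {
    remaining: std::vec::IntoIter<String>,
    current: Option<String>,
}

impl HeaderIter {
    pub fn new(names: Vec<String>) -> Self {
        HeaderIter { remaining: names.into_iter(), current: None }
    }
}

impl StreamingIterator for HeaderIter {
    type Item = str;

    fn advance(&mut self) {
        // The previous item is dropped here; no borrow of it can still exist.
        self.current = self.remaining.next();
    }

    fn get(&self) -> Option<&str> {
        self.current.as_deref()
    }
}

/// Callers that want to keep data past the next step must copy it out.
pub fn collect_names(mut iter: HeaderIter) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(name) = iter.next() {
        out.push(name.to_owned());
    }
    out
}
```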
The two relevant lifetimes are `'db` and `'ts` for "database" and "transaction set" respectively, with transaction set having the longest lifetime.
In my usage of `HEADERGET_MINMEM` I have assumed that a borrowed `Header` is only valid until the next one is requested, and that users of the API must drop any references to the previous `Header` value before requesting the next. More nitty gritty details: it does this by means of Rust's affine type system. Since getting the next value requires a mutable reference to the iterator, it's explicitly disallowed to obtain one if the previous header value is in any way aliased. Programs which wish to make progress iterating must make the borrow checker happy by dropping the previous value first or they will be rejected by the compiler.
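To make the nesting of `'ts` and `'db` and the aliasing rule concrete, here is a minimal sketch. All types and methods (`TransactionSet::new`, `open_db`, `iter`, `next_header`) are hypothetical stand-ins backed by a `Vec<String>` rather than the real RPM database, and the actual crate's signatures may differ.

```rust
/// Stand-in for a transaction set; outlives everything derived from it.
pub struct TransactionSet {
    packages: Vec<String>,
}

/// A database handle may not outlive the transaction set it came from.
pub struct Database<'ts> {
    ts: &'ts TransactionSet,
}

/// An iterator may not outlive the database (and so not the transaction set).
pub struct MatchIterator<'db, 'ts> {
    db: &'db Database<'ts>,
    pos: usize,
    current: Option<Header>,
}

pub struct Header {
    name: String,
}

impl Header {
    pub fn name(&self) -> &str {
        &self.name
    }
}

impl TransactionSet {
    pub fn new(packages: Vec<String>) -> Self {
        TransactionSet { packages }
    }

    pub fn open_db(&self) -> Database<'_> {
        Database { ts: self }
    }
}

impl<'ts> Database<'ts> {
    pub fn iter<'db>(&'db self) -> MatchIterator<'db, 'ts> {
        MatchIterator { db: self, pos: 0, current: None }
    }
}

impl<'db, 'ts> MatchIterator<'db, 'ts> {
    /// The returned reference borrows `self` mutably, so it must be dead
    /// before this method can be called again.
    pub fn next_header(&mut self) -> Option<&Header> {
        self.current = self
            .db
            .ts
            .packages
            .get(self.pos)
            .map(|name| Header { name: name.clone() });
        self.pos += 1;
        self.current.as_ref()
    }
}

// The following would be rejected with error[E0499], because `first`
// is still in use when the iterator is borrowed mutably a second time:
//
//     let first = iter.next_header().unwrap();
//     let second = iter.next_header().unwrap();
//     println!("{}", first.name());
```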
## Totally Bogus Code: Tag Data Parsing
I mapped "tag data" (not quite sure that's the right term, but what I'm meaning to describe is the values in headers that correspond to tags) onto a Rust enum/sum type, which I'd like to say is pretty awesome:
https://github.com/iqlusion-io/crates/blob/master/rpmlib/src/td.rs
...except for the part where it's all half-implemented and untested. Strings work, I think?
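For readers unfamiliar with the idea, a sum type for tag data might look roughly like the sketch below. This is not the contents of `td.rs`: the variant set is deliberately incomplete, the names are hypothetical, and the string decoding assumes NUL-terminated UTF-8, which is exactly one of the open questions listed next.

```rust
use std::ffi::CStr;

/// Hypothetical, incomplete sum type over tag data. The `'hdr` lifetime
/// ties borrowed values to the header they were read from.
#[derive(Debug, PartialEq)]
pub enum TagData<'hdr> {
    Int32(i32),
    Str(&'hdr str),
    StrArray(Vec<&'hdr str>),
    Bin(&'hdr [u8]),
}

impl<'hdr> TagData<'hdr> {
    /// Borrow a NUL-terminated byte buffer as a string without copying.
    /// ASSUMPTION: the bytes are valid UTF-8; anything else yields `None`.
    pub fn str_from_c_bytes(bytes: &'hdr [u8]) -> Option<TagData<'hdr>> {
        CStr::from_bytes_with_nul(bytes)
            .ok()?
            .to_str()
            .ok()
            .map(TagData::Str)
    }

    /// Returns the string payload, if this value is a single string.
    pub fn as_str(&self) -> Option<&'hdr str> {
        match self {
            TagData::Str(s) => Some(*s),
            _ => None,
        }
    }
}
```

The appeal of the enum is that callers must match on the variant before touching the payload, so an integer tag cannot be misread as a string.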
Things I wasn't entirely clear on:
- What can I assume about the character sets of `STRING` vs. `I18NSTRING`?
- What exactly is the `count` member of the `rpmtd_s` struct for and how does it relate to the various data types?
- Where do I get the length of binary data? Is it `count`?
- What's the character encoding of char? Is it 1 byte? What is it used for? I presently assume it's a 1-byte ASCII character.
- How do string array types work? Is the length `count`? I've left them completely unimplemented for now.
## RPM Signing
Last but not least: if there's one particularly interesting thing I'd like to do, it's use `librpmsign.so` to sign an RPM, but swap in some Rust code to perform the actual digital signature operation. That is, I'd use GPG via librpmsign to handle the digest computation and serialization of the signature, but swap out the actual cryptographic primitive. I've been working on a Rust library for digital signatures supporting multiple software and hardware backends, and one of the use cases I'm most interested in is RPM signing, specifically with keys kept in secure enclaves / hardware devices.
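On the Rust side, the "swappable primitive" idea could take the shape of a small trait, sketched below. Everything here is hypothetical: `Signer`, `ToySigner`, and `sign_digest` are illustrative names, no hook of this kind is claimed to exist in librpmsign, and the toy backend performs an XOR that is NOT a real signature.

```rust
/// Hypothetical interface for a pluggable signing backend
/// (software key, hardware token, secure enclave, ...).
pub trait Signer {
    /// Takes an already-computed digest and returns raw signature bytes.
    fn sign(&self, digest: &[u8]) -> Vec<u8>;
}

/// Toy backend for demonstration only. XOR with a key byte is NOT a
/// cryptographic signature; it just makes the data flow observable.
pub struct ToySigner {
    pub key: u8,
}

impl Signer for ToySigner {
    fn sign(&self, digest: &[u8]) -> Vec<u8> {
        digest.iter().map(|b| b ^ self.key).collect()
    }
}

/// The caller computes the digest and serializes the result; only the
/// signing step is delegated to whichever backend is plugged in.
pub fn sign_digest(signer: &dyn Signer, digest: &[u8]) -> Vec<u8> {
    signer.sign(digest)
}
```

Taking `&dyn Signer` keeps the calling code independent of where the key material lives.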
I look forward to any input/clarifications, and again would be happy to work toward upstreaming this code if there is interest.
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/429