[Rpm-maint] [rpm-software-management/rpm] Add an sqlite based rpmdb backend (experimental) (#899)

Panu Matilainen notifications at github.com
Wed Oct 16 08:06:14 UTC 2019


For further background on the topic:

The world is full of key-value databases, but there are *very* few that support the kind of multiprocess concurrency that rpm requires, and even fewer that are license compatible. There's LMDB, but existing versions have severe limitations on key size (far below common PATH_MAX), which makes it unsuitable for rpm production use. Their upstream promised to raise the limit over three years ago, but it still hasn't happened, and we ran out of time five years ago. There's something known as WiredTiger, which is supposed to be the successor to BDB and according to the spec sheet is technically suitable, but it's hardly time-proven, nobody has ever heard of it, and its license is incompatible (GPLv3). From there on it gets even more exotic and incompatible.

And then there's sqlite, which obviously is not a key-value store but can serve the needs of rpm just fine. It's been there all these years, it's ubiquitous and has just the kind of track record we're looking for, but it's been overlooked because, well, we already had an sqlite backend and ripped it out, so there must be something wrong with it, right? There are a couple of important factors here:
- The old sqlite backend in the 4.4.x days emulated BDB in a very literal way, using multiple separate whole databases for the index tables and all, so it couldn't do what sqlite does best *at all*.
- The old sqlite backend was also limited concurrency-wise, because back then sqlite didn't have WAL support yet. These days, the amount of concurrency sqlite supports is *just* enough for rpm's needs.
- The old sqlite backend also did chroot in/out from the database code. That is what commit 0508e9a6e3311ed00c34887fd7715d3b469cdca6 is all about: eliminating the need for that. In reality, BDB needs that just as much. It currently only works through the smallest of slightly lucky margins, and doing transactions at the BDB level would require the same thing (which is something we could do now too, but that's another topic).
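
To make the first two points concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not the schema from this PR: the table and column names (`packages`, `name_index`, `hnum`, `header`) and the helper functions are made up for illustration. It only shows the general shape: header blobs and an index living as tables in one database file, with WAL enabled so readers can proceed while a single writer is active.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open (creating if needed) a single-file database and enable WAL."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Hypothetical schema: one table for header blobs, one index table,
    # both inside the same database file.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS packages (
            hnum INTEGER PRIMARY KEY,
            header BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS name_index (
            key TEXT NOT NULL,
            hnum INTEGER NOT NULL REFERENCES packages(hnum)
        );
        CREATE INDEX IF NOT EXISTS name_index_key ON name_index(key);
    """)
    return conn, mode


def add_package(conn, name, header):
    """Insert a header and its index entry in one transaction."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO packages (header) VALUES (?)", (header,))
        conn.execute(
            "INSERT INTO name_index (key, hnum) VALUES (?, ?)",
            (name, cur.lastrowid))
    return cur.lastrowid


def lookup(conn, name):
    """Return all header blobs indexed under the given name."""
    rows = conn.execute(
        "SELECT p.header FROM name_index AS i "
        "JOIN packages AS p ON p.hnum = i.hnum WHERE i.key = ?", (name,))
    return [row[0] for row in rows]


def demo():
    """Write through one connection, read through a second one."""
    path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
    writer, mode = open_db(path)
    add_package(writer, "bash", b"fake header bytes")
    reader = sqlite3.connect(path)  # separate connection to the same file
    found = lookup(reader, "bash")
    reader.close()
    writer.close()
    return mode, found
```

Because the index is just another table in the same file, the header insert and the index insert commit or roll back together, which the emulate-BDB-with-separate-databases layout could not offer.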

So in the end, going (back) to sqlite was not much of a choice, it simply was the last database standing: no arbitrary key limitations, multiprocess concurrency, proven track record and stable file format, good performance, sane API etc. It also has one *huge* bonus over the key-value alternatives: you can actually inspect the database contents interactively while sitting in your lazy chair. So it's actually rather pleasant to work with. (However, there's also the danger that it's *too* nice to work with and people will start poking directly into the sql instead of going through librpm.)
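
As a rough illustration of "no arbitrary key limitations" and of inspecting contents with plain SQL, here is a small sketch using Python's standard sqlite3 module. The `basenames` table is hypothetical and not the schema from this PR, and the PATH_MAX value of 4096 is an assumption (the usual Linux figure). It stores and looks up an indexed key of nearly PATH_MAX length, then lists the tables via `sqlite_master`.

```python
import sqlite3

PATH_MAX = 4096  # assumed value; typical on Linux

conn = sqlite3.connect(":memory:")
# Hypothetical index table keyed by file path.
conn.execute(
    "CREATE TABLE basenames (key TEXT NOT NULL, hnum INTEGER NOT NULL)")
conn.execute("CREATE INDEX basenames_key ON basenames(key)")

# A path one character short of PATH_MAX, used as an indexed key.
long_path = "/" + "x" * (PATH_MAX - 2)
conn.execute("INSERT INTO basenames VALUES (?, ?)", (long_path, 1))

# Look the long key up again and check what was stored.
stored_len = conn.execute(
    "SELECT length(key) FROM basenames WHERE key = ?",
    (long_path,)).fetchone()[0]

# The schema itself is queryable, which is what makes interactive
# inspection possible.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The same kind of query can be typed into the `sqlite3` command-line shell against a database file, where `.tables` and `.schema` give a quick overview.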

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/pull/899#issuecomment-542578228