[Rpm-maint] Rpm database backend benchmarks

Panu Matilainen pmatilai at redhat.com
Wed Oct 16 15:12:45 UTC 2019


On 10/16/19 4:28 PM, Michael Schroeder wrote:
> 
> Hi,
> 
> I wrote a little benchmarking tool to find out how the different
> database backends compare. The backends tested are bdb, ndb,
> lmdb, and the new sqlite backend.
> 
> I used the 2506 packages I have on my system as data set. The
> benchmark tool does the following:
> 
> Add all packages to an empty database, in forward/reverse/random
> order.
> 
> Remove all packages from a database that contains all packages,
> in forward/reverse/random order.
> 
> Update all packages in forward/reverse/random order. An update
> consists of a reinstall of the package plus a removal of the
> old header.
> 
> Simulate a dependency check of all packages (rpm -Va --nofiles).

Can you share this benchmark tool?

I ran similar comparisons (rpm -Uvh --justdb, rpmdb --rebuilddb, rpm -e 
--test all etc) manually when doing the initial work on the sqlite 
backend, sometimes with modifications to bdb/lmdb to make the 
comparisons more relevant (enabling transactions, syncs etc), and mostly 
just to see that it's in the same ballpark as the others.
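For reference, the forward/reverse/random insert methodology described above can be sketched roughly like this (a toy stand-in using Python's bundled sqlite3 with a single table; `bench_insert` and the dummy header blob are made up for illustration, not the actual tool):

```python
import random
import sqlite3
import time

def bench_insert(keys, order="forward"):
    """Time inserting `keys` into a fresh in-memory table in the given order."""
    keys = list(keys)
    if order == "reverse":
        keys.reverse()
    elif order == "random":
        random.shuffle(keys)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pkgs (id INTEGER PRIMARY KEY, blob BLOB)")
    start = time.perf_counter()
    with con:  # one transaction per run here; rpm itself commits per package
        con.executemany("INSERT INTO pkgs VALUES (?, ?)",
                        ((k, b"hdr" * 100) for k in keys))
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

for order in ("forward", "reverse", "random"):
    print(f"{order}: {bench_insert(range(2506), order):.3f}s")
```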

> 
> The install/erase/update lines look like this:
> 
> Operation
>      forward / reverse / random in seconds with fsync disabled
>      forward / reverse / random in seconds with fsync enabled
>      forward / reverse / random disk space used in MByte
>
> Here's the result:
> 
> BDB
> ---
> Adding all headers...
>      3.19s / 3.21s / 3.17s
>      85.69s / 88.58s / 84.99s
>      164.41M / 164.48M / 164.52M
> Erasing all headers...
>      3.74s / 3.70s / 3.70s
>      70.00s / 70.06s / 74.60s
>      164.29M / 164.30M / 164.29M
> Updating all headers...
>      7.23s / 7.31s / 7.26s
>      147.16s / 150.98s / 147.26s
>      174.17M / 174.07M / 174.17M
> Dep check
>      4.82s
> 
> NDB
> ---
> Adding all headers...
>      1.05s / 0.99s / 1.04s
>      46.91s / 41.94s / 42.09s
>      177.70M / 175.22M / 173.37M
> Erasing all headers...
>      1.10s / 0.95s / 1.09s
>      42.47s / 30.93s / 36.13s
>      0.67M / 0.71M / 0.82M
> Updating all headers...
>      2.34s / 2.42s / 2.61s
>      78.17s / 70.24s / 72.18s
>      164.97M / 170.09M / 168.97M
> Dep check
>      2.81s
> 
> LMDB
> ----
> Adding all headers...
>      1.06s / 1.03s / 1.03s
>      (not implemented)
>      268.44M / 268.44M / 268.44M
> Erasing all headers...
>      1.13s / 1.14s / 1.13s
>      (not implemented)
>      268.44M / 268.44M / 268.44M
> Updating all headers...
>      3.33s / 3.37s / 3.42s
>      (not implemented)
>      268.44M / 268.44M / 268.44M
> Dep check
>      2.36s
> 
> SQLITE
> ------
> Adding all headers...
>      4.24s / 4.28s / 4.24s
>      34.50s / 38.03s / 34.48s
>      158.58M / 158.54M / 158.58M
> Erasing all headers...
>      19.58s / 19.59s / 20.52s
>      51.12s / 55.65s / 51.83s
>      158.58M / 158.58M / 158.58M
> Updating all headers...
>      45.52s / 45.84s / 46.39s
>      108.55s / 114.18s / 113.97s
>      172.46M / 171.74M / 173.19M
> Dep check
>      12.50s
> 
> 
> Things to note:
> 
> - Berkeley db is actually not that fast, both lmdb and ndb are much
>    faster

Yup. And it gets slower if you enable transactions.

> - rpm's lmdb code does not implement fsync

Yup. Drop the MDB_MAPASYNC and MDB_WRITEMAP flags to level the playing 
field. It's still the fastest of the lot, but much more comparable.

> - there's something weird with the sqlite package erase, it takes
>    way too much time with fsync disabled

Right, I haven't tested erase with fsync disabled because to me it's a 
rare corner case (as opposed to install). With sqlite, if it's too slow 
there are simply too many transactions going on (which is one thing that 
makes it quite different from bdb/lmdb). Sqlite doesn't really have a 
"disable fsync" mode in the sense that eg bdb has; it's more a matter of 
finding the right balance of transaction sizes, pragmas etc.
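To illustrate what "pragmas and transaction sizes" means in practice, here is a minimal sketch (the pragma values are examples of the kind of knobs available, not what the rpm backend actually sets):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Durability/speed trade-offs are set via pragmas, not a single fsync flag:
con.execute("PRAGMA journal_mode = WAL")    # write-ahead log vs rollback journal
con.execute("PRAGMA synchronous = NORMAL")  # sync less aggressively than FULL
con.execute("CREATE TABLE pkgs (id INTEGER PRIMARY KEY, hdr BLOB)")

# Batching many row changes into one transaction amortizes the commit cost;
# committing per row (or per package) pays that cost every time.
with con:
    con.executemany("INSERT INTO pkgs VALUES (?, ?)",
                    ((i, b"x") for i in range(1000)))
print(con.execute("SELECT count(*) FROM pkgs").fetchone()[0])  # 1000
```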

Probably not even related, but for one thing the current rpmdb backend 
API forces sqlite to play by key-value db rules, whereas internally it 
could just do one sweeping "delete from ..." statement and skip a whole 
lot of the work currently done, if only the API permitted that (working 
on it).
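The difference the API change would enable looks roughly like this (toy schema and names, purely illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE requires (pkgid INTEGER, dep TEXT)")
con.executemany("INSERT INTO requires VALUES (?, ?)",
                [(1, "libc"), (1, "libz"), (2, "libc")])

# Key-value style: the generic backend API hands the SQL layer one key at
# a time, so erasing a package means one statement per index entry.
for (dep,) in con.execute(
        "SELECT dep FROM requires WHERE pkgid = ?", (1,)).fetchall():
    con.execute("DELETE FROM requires WHERE pkgid = ? AND dep = ?", (1, dep))

# SQL-native style: one sweeping statement does the same work.
con.execute("DELETE FROM requires WHERE pkgid = ?", (2,))

print(con.execute("SELECT count(*) FROM requires").fetchone()[0])  # 0
```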

> - sqlite is quite slow

It's also not an entirely apples-to-apples comparison, in many ways.

In my benchmarks, LMDB was the all-round winner even with syncing 
enabled and modified to use per-package transactions, but then it's a 
bit moot to compare as long as it can't be used for real due to the key 
size limitation.

Besides sqlite having to do horribly stupid things due to the key-value 
oriented internal backend API, the new backend is also not really 
optimized at all. So far I've only cared about getting it into the rough 
ballpark of the BDB backend, which is the thing it's eventually supposed 
to replace. For various things it's faster even in its current state, 
which is not a bad start. It's just a totally different animal from the 
others, so it needs rather different approaches to make it fast(er).

> - sqlite's "Adding all header" benchmark is quite fast with fsync
>    enabled. I wonder what sqlite guarantees if there is a crash

Generally speaking, all sqlite operations are transactionally protected 
and thus crash-resilient (unlike the current BDB backend). The rest 
depends on all manner of details; for example, when inserting headers 
into a freshly created database, rpm assumes fsync off (for all backends 
that implement it). Etc.
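That transactional protection means a half-finished operation simply rolls back rather than leaving a corrupt database. A small demonstration with Python's standard-library sqlite3 (the simulated crash is of course just an exception, not a real power loss):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pkgs (id INTEGER PRIMARY KEY, name TEXT)")

try:
    with con:  # opens a transaction: commits on success, rolls back on error
        con.execute("INSERT INTO pkgs VALUES (1, 'foo')")
        raise RuntimeError("simulated crash mid-update")
except RuntimeError:
    pass

# The partial insert never became visible: the database is unchanged.
print(con.execute("SELECT count(*) FROM pkgs").fetchone()[0])  # 0
```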

> - ndb is the only database that can shrink if packages get
>    removed

BDB supports shrinking, but the backend doesn't use it because I haven't 
dared enable it on a non-transactional database, dunno about LMDB. 
Sqlite does support shrink via vacuum as well, but not currently 
implemented in the backend.

> - lmdb won the dep check benchmark. I think this is because it
>    mmaps the complete database and thus does not have to copy the
>    header data.

And sqlite is slowest by far. I haven't looked into it at all, but my 
guess would be overhead from re-preparing sql statements over and over 
instead of caching them. Like I noted, it's unoptimized.
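The suspected overhead is easy to reproduce: re-parsing the same SQL text for every row versus letting a prepared statement be reused. Python's sqlite3 caches compiled statements keyed on the exact SQL string, so interpolating values into the string defeats the cache (illustrative only; the rpm backend is C, not Python):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(5000)))

def lookup_uncached():
    # Unique SQL text each time: sqlite must re-parse and re-plan every query.
    for i in range(5000):
        con.execute(f"SELECT k FROM t WHERE k = {i}").fetchone()

def lookup_cached():
    # Identical SQL text with a bound parameter: the compiled statement is reused.
    for i in range(5000):
        con.execute("SELECT k FROM t WHERE k = ?", (i,)).fetchone()

for fn in (lookup_uncached, lookup_cached):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```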

	- Panu -

> 
> If fsync is enabled (aka normal rpm operation), all implementations
> take very long. The question is how much of this is drowned out by
> the time spent unpacking/erasing all the files on disk.
> 
> Cheers,
>    Michael.
> 
