[Rpm-maint] Rpm database backend benchmarks
Panu Matilainen
pmatilai at redhat.com
Wed Oct 16 15:12:45 UTC 2019
On 10/16/19 4:28 PM, Michael Schroeder wrote:
>
> Hi,
>
> I wrote a little benchmarking tool to find out how the different
> database backends compare. The backends tested are bdb, ndb,
> lmdb, and the new sqlite backend.
>
> I used the 2506 packages I have on my system as data set. The
> benchmark tool does the following:
>
> Add all packages to an empty database, in forward/reverse/random
> order.
>
> Remove all packages from a database that contains all packages,
> in forward/reverse/random order.
>
> Update all packages in forward/reverse/random order. An update
> consists of a reinstall of the package plus a removal of the
> old header.
>
> Simulate a dependency check of all packages (rpm -Va --nofiles).
Can you share this benchmark tool?
I ran similar comparisons (rpm -Uvh --justdb, rpmdb --rebuilddb, rpm -e
--test all etc) manually when doing the initial work on the sqlite
backend, sometimes with modifications to bdb/lmdb to make the comparisons
more relevant (enable transactions, syncs etc), and mostly just to see
that it's in the ballpark with the others.
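
For reference, the forward/reverse/random timing loop described above
could look roughly like the following. This is only a sketch of the shape
of such a harness, not Michael's actual tool: `orderings` and
`time_orders` are made-up names, and the operation being timed (add,
erase, update) is left to the caller.

```python
import random
import time

def orderings(items, seed=0):
    """Return the three traversal orders used by the benchmark."""
    shuffled = list(items)
    # Fixed seed so the "random" order is repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    return {
        "forward": list(items),
        "reverse": list(reversed(items)),
        "random": shuffled,
    }

def time_orders(operation, items, seed=0):
    """Run operation(ordered_items) once per order; return seconds per order."""
    results = {}
    for name, ordered in orderings(items, seed).items():
        start = time.perf_counter()
        operation(ordered)
        results[name] = time.perf_counter() - start
    return results
```

A real run would also need to reset the database to a known state before
each order, which this sketch leaves out.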
>
> The install/erase/update lines look like this:
>
> Operation
> forward / reverse / random in seconds with fsync disabled
> forward / reverse / random in seconds with fsync enabled
> forward / reverse / random disk space used in MByte
>
> Here's the result:
>
> BDB
> ---
> Adding all headers...
> 3.19s / 3.21s / 3.17s
> 85.69s / 88.58s / 84.99s
> 164.41M / 164.48M / 164.52M
> Erasing all headers...
> 3.74s / 3.70s / 3.70s
> 70.00s / 70.06s / 74.60s
> 164.29M / 164.30M / 164.29M
> Updating all headers...
> 7.23s / 7.31s / 7.26s
> 147.16s / 150.98s / 147.26s
> 174.17M / 174.07M / 174.17M
> Dep check
> 4.82s
>
> NDB
> ---
> Adding all headers...
> 1.05s / 0.99s / 1.04s
> 46.91s / 41.94s / 42.09s
> 177.70M / 175.22M / 173.37M
> Erasing all headers...
> 1.10s / 0.95s / 1.09s
> 42.47s / 30.93s / 36.13s
> 0.67M / 0.71M / 0.82M
> Updating all headers...
> 2.34s / 2.42s / 2.61s
> 78.17s / 70.24s / 72.18s
> 164.97M / 170.09M / 168.97M
> Dep check
> 2.81s
>
> LMDB
> ----
> Adding all headers...
> 1.06s / 1.03s / 1.03s
> (not implemented)
> 268.44M / 268.44M / 268.44M
> Erasing all headers...
> 1.13s / 1.14s / 1.13s
> (not implemented)
> 268.44M / 268.44M / 268.44M
> Updating all headers...
> 3.33s / 3.37s / 3.42s
> (not implemented)
> 268.44M / 268.44M / 268.44M
> Dep check
> 2.36s
>
> SQLITE
> ------
> Adding all headers...
> 4.24s / 4.28s / 4.24s
> 34.50s / 38.03s / 34.48s
> 158.58M / 158.54M / 158.58M
> Erasing all headers...
> 19.58s / 19.59s / 20.52s
> 51.12s / 55.65s / 51.83s
> 158.58M / 158.58M / 158.58M
> Updating all headers...
> 45.52s / 45.84s / 46.39s
> 108.55s / 114.18s / 113.97s
> 172.46M / 171.74M / 173.19M
> Dep check
> 12.50s
>
>
> Things to note:
>
> - Berkeley db is actually not that fast, both lmdb and ndb are much
> faster
Yup. And it gets slower if you enable transactions.
> - rpm's lmdb code does not implement fsync
Yup. Drop the MDB_MAPASYNC and MDB_WRITEMAP flags to level the playing
field. It's still the fastest of the lot, but much more comparable.
> - there's something weird with the sqlite package erase, it takes
> way too much time with fsync disabled
Right, haven't tested erase with fsync disabled because to me it's a
rare corner case (as opposed to install). With sqlite, if it goes too
slow there are simply too many transactions going on (which is one thing
that makes it quite different from bdb/lmdb). Sqlite doesn't really have
a "disable fsync" mode in the sense that eg bdb has, it's more a matter
of finding the right balance of transaction sizes and pragmas etc.
Probably not even related, but for one, the current rpmdb backend API
forces it to play on key-value db terms, whereas internally it could
just do one sweeping "delete from ..." statement and skip a whole lot of
the work currently done, if only the API permitted that (working on it).
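
To illustrate the difference, here is a minimal sketch using Python's
sqlite3 module. The `idx` table and its columns are invented for the
example and are not the schema of the rpm sqlite backend; the point is
only the contrast between one small transaction per key and a single
sweeping delete.

```python
import sqlite3

KEYS = ("name", "provides", "requires")  # hypothetical index keys

def make_db(headers=100, keys=KEYS):
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE idx (key TEXT, hnum INTEGER)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO idx VALUES (?, ?)",
                     [(k, h) for h in range(headers) for k in keys])
    conn.execute("COMMIT")
    return conn

def erase_per_key(conn, hnum, keys=KEYS):
    """Key-value style: one DELETE and one transaction per index key."""
    for key in keys:
        conn.execute("BEGIN")
        conn.execute("DELETE FROM idx WHERE key = ? AND hnum = ?", (key, hnum))
        conn.execute("COMMIT")

def erase_sweeping(conn, hnum):
    """Relational style: one statement in one transaction."""
    conn.execute("BEGIN")
    conn.execute("DELETE FROM idx WHERE hnum = ?", (hnum,))
    conn.execute("COMMIT")
```

Both functions leave the table in the same state. On a file-backed
database every COMMIT can cost a sync, so the per-key variant pays that
price once per key instead of once per package.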
> - sqlite is quite slow
It's also not an entirely apples-to-apples comparison, in many ways.
In my benchmarks, LMDB was the all-round winner even with syncing
enabled and modified to use per-package transactions, but it's a bit
moot to compare as long as it can't be used for real due to the key
size limitation.
Besides sqlite having to do horribly stupid things due to the key-value
oriented internal backend API, the new backend is also not really
optimized at all. So far I've only cared about getting it into the rough
ballpark with BDB backend which is the thing it's supposed to be
eventually replacing. For various things it's faster even in its current
state, which is not a bad start. It's just a totally different animal
from the others so it needs rather different approaches to make it fast(er).
> - sqlite's "Adding all headers" benchmark is quite fast with fsync
> enabled. I wonder what sqlite guarantees if there is a crash
Generally speaking, all sqlite operations are transactionally protected
and thus crash-resilient (unlike the current BDB backend). The rest
depends on all manner of details. For example, when inserting headers
into a freshly created database, rpm assumes fsync off (for all backends
that implement it). Etc.
> - ndb is the only database that can shrink if packages get
> removed
BDB supports shrinking, but the backend doesn't use it because I haven't
dared enable it on a non-transactional database. Dunno about LMDB.
Sqlite supports shrinking via vacuum as well, but that's not currently
implemented in the backend.
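
As a rough illustration of the sqlite side, the sketch below fills a
database file, deletes everything and compares the file size with and
without VACUUM. It assumes sqlite's default auto_vacuum setting (off);
the `packages` table and the `size_after_erase` helper are made up for
the example and have nothing to do with the rpm backend code.

```python
import os
import sqlite3
import tempfile

def size_after_erase(vacuum):
    """Fill a db file with blobs, delete them all.

    Returns (size when full, size after the delete) in bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE packages (hnum INTEGER PRIMARY KEY, hdr BLOB)")
        conn.executemany("INSERT INTO packages (hdr) VALUES (?)",
                         [(os.urandom(64 * 1024),) for _ in range(50)])
        conn.commit()
        full = os.path.getsize(path)
        conn.execute("DELETE FROM packages")
        conn.commit()
        if vacuum:
            # VACUUM must run outside a transaction; it rebuilds the file
            # and hands the freed pages back to the filesystem.
            conn.execute("VACUUM")
        conn.close()
        return full, os.path.getsize(path)
```

Without the VACUUM the freed pages just go onto sqlite's free list for
reuse, which matches the flat 158.58M in the erase numbers above.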
> - lmdb won the dep check benchmark. I think this is because it
> mmaps the complete database and thus does not have to copy the
> header data.
And sqlite is slowest by far. I haven't looked into it at all, but my
guess would be overhead from recompiling sql statements over and over
instead of caching them. As noted, it's unoptimized.
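
The kind of difference I mean, sketched with Python's sqlite3 module
rather than the backend's C code: the module keeps a cache of compiled
statements keyed by SQL text, so a parameterized query is compiled once
and reused, while a query with the value pasted into the string is a new
statement every time. In C the equivalent would be keeping the statement
from sqlite3_prepare_v2() around and using sqlite3_reset() plus the
sqlite3_bind_*() calls. The `provides` table here is invented for the
example, and whether this is really what hurts the dep check is only my
guess.

```python
import sqlite3

def make_db(n=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE provides (name TEXT, hnum INTEGER)")
    conn.executemany("INSERT INTO provides VALUES (?, ?)",
                     [(f"cap{i}", i) for i in range(n)])
    conn.commit()
    return conn

def lookup_recompiled(conn, names):
    """Every distinct SQL string is parsed and compiled from scratch."""
    out = []
    for name in names:
        # Demo only: pasting values into SQL is unsafe with untrusted input.
        sql = "SELECT hnum FROM provides WHERE name = '%s'" % name
        out.extend(row[0] for row in conn.execute(sql))
    return out

def lookup_cached(conn, names):
    """Same SQL text each time; only the bound value changes."""
    sql = "SELECT hnum FROM provides WHERE name = ?"
    out = []
    for name in names:
        out.extend(row[0] for row in conn.execute(sql, (name,)))
    return out
```

Both functions return the same rows; only the amount of statement
compilation differs.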
- Panu -
>
> If fsync is enabled (aka normal rpm operation), all implementations
> take very long. The question is how much this is drowned out by the
> time spent in unpacking/erasing all the files on disk.
>
> Cheers,
> Michael.
>