[Rpm-maint] [rpm-software-management/rpm] RPM backend performance is limited by arrays of hdrNum's (#290)

Jeff Johnson notifications at github.com
Sat Jul 29 19:11:40 UTC 2017


The following callgraphs for BDB/LMDB/NDB all show a common hotspot retrieving arrays of hdrNum's from indices.

The performance problem shows up worst on add/del operations, where a read-modify-write (RMW) loop has to be performed to add or delete a single hdrNum item in an array. The array is then re-sorted (and perhaps uniquified) with qsort(3) on every update. Repeatedly re-sorting an almost-sorted array is the worst-case behavior for the algorithm; a merge sort, or even a home-rolled insertion loop, would be less costly.
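As a rough illustration of the "home-rolled insertion loop" idea, here is a minimal sketch that keeps an array of hdrNum's sorted and duplicate-free by binary-searching for the slot and shifting the tail with memmove(3), so no re-sort is needed after an add or del. The function names, the `uint32_t` element type, and the fixed-capacity array are assumptions made for the example; none of this is taken from the RPM sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an RPM function): index of the first
 * element >= hdrNum in a sorted array of n items. */
static size_t hdrnum_lower_bound(const uint32_t *arr, size_t n, uint32_t hdrNum)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < hdrNum)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert hdrNum into a sorted, duplicate-free array of *n items with
 * room for cap items. Returns 1 if inserted, 0 if already present,
 * -1 if the array is full. */
static int hdrnum_insert_sorted(uint32_t *arr, size_t *n, size_t cap, uint32_t hdrNum)
{
    size_t pos;

    assert(*n <= cap);
    pos = hdrnum_lower_bound(arr, *n, hdrNum);
    if (pos < *n && arr[pos] == hdrNum)
        return 0;               /* already present: nothing to uniquify later */
    if (*n == cap)
        return -1;
    /* Shift the tail up by one slot and drop the new item in place. */
    memmove(&arr[pos + 1], &arr[pos], (*n - pos) * sizeof(*arr));
    arr[pos] = hdrNum;
    (*n)++;
    return 1;
}

/* Remove hdrNum from a sorted array. Returns 1 if removed, 0 if absent. */
static int hdrnum_remove_sorted(uint32_t *arr, size_t *n, uint32_t hdrNum)
{
    size_t pos = hdrnum_lower_bound(arr, *n, hdrNum);

    if (pos == *n || arr[pos] != hdrNum)
        return 0;
    /* Shift the tail down over the removed item. */
    memmove(&arr[pos], &arr[pos + 1], (*n - pos - 1) * sizeof(*arr));
    (*n)--;
    return 1;
}
```

Each update costs one binary search plus one memmove of the tail, instead of a full sort of the array. This only addresses the sorting cost; the read-modify-write of the whole array against the backend remains.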

Maintaining the hdrNum's endianness is another flaw; exposing the hdrNum's through the RPM API is yet another flaw because the values will change with every --rebuilddb (i.e. the hdrNum's are not persistent).

The fundamental architectural problem that needs solving for better performance is the nesting of per-header and then per-tag operations performed by rpmdbAdd(). Ideally, a batch mode update for each index of all the headers would remove the need to constantly reread/modify/rewrite.
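To make the batch idea concrete, here is a minimal sketch that gathers (key, hdrNum) pairs for one index across all headers, sorts them once, and then hands each key's complete array to the backend in a single write. Everything in it is hypothetical: `struct idx_pair`, `idx_batch_flush()`, and the `idx_put_fn` callback are names invented for the example, and the callback merely stands in for whatever "put" a real backend would provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one (index key, hdrNum) pair gathered from a header. */
struct idx_pair {
    const char *key;
    uint32_t hdrNum;
};

/* Order by key first, then by hdrNum, so each key's items end up
 * contiguous and already sorted. */
static int idx_pair_cmp(const void *a, const void *b)
{
    const struct idx_pair *x = a, *y = b;
    int c = strcmp(x->key, y->key);
    if (c != 0)
        return c;
    return (x->hdrNum > y->hdrNum) - (x->hdrNum < y->hdrNum);
}

/* Hypothetical stand-in for a backend "put": receives the complete,
 * sorted, duplicate-free hdrNum array for one key. */
typedef void (*idx_put_fn)(const char *key, const uint32_t *hdrNums,
                           size_t n, void *ctx);

/* Sort all pairs once, then call put() exactly once per distinct key.
 * scratch must have room for npairs items. Returns the number of keys. */
static size_t idx_batch_flush(struct idx_pair *pairs, size_t npairs,
                              uint32_t *scratch, idx_put_fn put, void *ctx)
{
    size_t nkeys = 0, i = 0;

    assert(npairs == 0 || (pairs != NULL && scratch != NULL && put != NULL));
    if (npairs == 0)
        return 0;
    qsort(pairs, npairs, sizeof(*pairs), idx_pair_cmp);
    while (i < npairs) {
        const char *key = pairs[i].key;
        size_t n = 0;
        for (; i < npairs && strcmp(pairs[i].key, key) == 0; i++) {
            /* Skip duplicate hdrNum's within the same key. */
            if (n == 0 || scratch[n - 1] != pairs[i].hdrNum)
                scratch[n++] = pairs[i].hdrNum;
        }
        put(key, scratch, n, ctx);
        nkeys++;
    }
    return nkeys;
}

/* Example sink: tallies calls and items instead of touching a database. */
struct idx_stats {
    size_t puts;
    size_t items;
};

static void idx_count_put(const char *key, const uint32_t *hdrNums,
                          size_t n, void *ctx)
{
    struct idx_stats *st = ctx;
    (void)key;
    (void)hdrNums;
    st->puts++;
    st->items += n;
}
```

With this shape, the number of index writes is bounded by the number of distinct keys rather than by headers times tags, and each array is sorted exactly once as part of a single pass.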

One approach to removing the overhead associated with the array management that "works" with Berkeley DB is to tie the secondary index to the primary store using db->associate. Berkeley DB can then handle the caching and optimizations needed to maintain the indices, transparently to RPM.

Using db->associate in Berkeley DB is essentially the same as using a SQL trigger to maintain indices derived from a primary store, a very common abstraction used with RDBMSs.

Here are the callgraphs that show the performance bottleneck for all of BDB/NDB/LMDB:

BDB
===
```
[jbj@ji rpm]$ /usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb
208.17user 3.31system 3:32.94elapsed 99%CPU (0avgtext+0avgdata 74000maxresident)k
0inputs+492608outputs (0major+37493minor)pagefaults 0swaps
```
[bdb.cga.gz](https://github.com/rpm-software-management/rpm/files/1185163/bdb.cga.gz)

NDB
===
```
/usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb --ndb
99.59user 3.67system 8:35.82elapsed 20%CPU (0avgtext+0avgdata 93888maxresident)k
0inputs+3315224outputs (0major+461509minor)pagefaults 0swaps
```
[ndb.cga.gz](https://github.com/rpm-software-management/rpm/files/1185165/ndb.cga.gz)

LMDB
====
```
[jbj@ji rpm]$ /usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb --lmdb
113.50user 1.57system 1:55.07elapsed 99%CPU (0avgtext+0avgdata 393692maxresident)k
0inputs+455720outputs (1103major+129040minor)pagefaults 0swaps
```
[lmdb.cga.gz](https://github.com/rpm-software-management/rpm/files/1185166/lmdb.cga.gz)


https://github.com/rpm-software-management/rpm/issues/290