[Rpm-maint] rpmstrpool.c way too slow

Alexey Tourbin alexey.tourbin at gmail.com
Wed Jan 30 00:08:02 UTC 2013


Hello,

My previous patches were made by hacking on rpmio/ and injecting the new
code via LD_LIBRARY_PATH=$PWD/rpmio/.libs. The previous profiling
results were therefore obtained with the system librpm.so library, which
comes with rpm-libs-4.10.2-1.fc18.x86_64.  Today I ran a new "full"
profile of the rpm specfile parser - that is, without using the system
rpm-libs - and the result changed dramatically.

Here is the exact profiling command I used:
$ LD_LIBRARY_PATH=$(echo $PWD/*/.libs |sed 's| |:|g') valgrind \
    --tool=callgrind --dump-instr=yes ./.libs/rpmspec -P \
    ../fc18-specs/texlive.spec >/dev/null

Here are the top 20 routines from the callgrind_annotate(1) output:
57,420,754,719  PROGRAM TOTALS
10,035,890,516  rpmio/rpmstrpool.c:rpmstrPoolId
 5,874,365,680  rpmio/rpmstrpool.c:rstrhash
 3,062,539,970  glibc-2.16-75f0d304/stdlib/bsearch.c:bsearch
 3,029,616,175  lib/header.c:copyTdEntry
 2,824,229,185  rpmio/rpmstrpool.c:rpmstrPoolGet
 2,720,364,090  glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strcmp
 2,613,824,638  rpmio/rpmstrpool.c:poolHashAddHEntry
 2,594,167,218  glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strncmp
 1,828,340,892  glibc-2.16-75f0d304/string/../sysdeps/x86_64/memset.S:__GI_memset
 1,616,755,970  rpmio/rpmstrpool.c:rpmstrPoolStr
 1,611,388,096  lib/header.c:intGetTdEntry
 1,538,230,858  rpmio/rpmstrpool.c:rpmstrPoolPut
 1,387,732,877  lib/header.c:headerGet
 1,251,399,444  rpmio/rpmstrpool.c:poolHashFree.part.0
 1,244,486,470  glibc-2.16-75f0d304/string/../sysdeps/x86_64/rawmemchr.S:__GI___rawmemchr
 1,227,687,377  lib/header.c:findEntry
   978,801,050  lib/headerutil.c:headerGetString
   968,938,071  glibc-2.16-75f0d304/string/../sysdeps/x86_64/memcpy.S:__GI_memcpy
   785,113,238  lib/rpmds.c:rpmdsNIndex
   749,893,987  lib/rpmds.c:rpmdsNext

With the system librpm.so, bsearch was the most expensive routine; now
it is only #3, far behind the top two, which are both in rpmstrpool.c.
In fact, the new rpmstrpool.c code squanders about 45% of the PROGRAM
TOTALS:

$ Sum() { perl -MList::Util=sum -ln0 -e 'print sum(split)'; }
$ callgrind_annotate |fgrep rpmstrpool.c |sed 's/,//g' |Sum
25754696291
$ callgrind_annotate |fgrep TOTAL |sed 's/,//g'
57420754719  PROGRAM TOTALS
$ perl -le 'print 25754696291/57420754719'
0.448525910483688
$
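
The same arithmetic can be cross-checked directly from the listing. The
sketch below is only an illustration, not part of the profiling run: it
assumes the seven rpmstrpool.c rows shown in the top 20 are the ones that
fgrep matched, and the names RPMSTRPOOL_COSTS and pool_share are made up
for this example.

```python
# Instruction counts copied from the callgrind_annotate listing above.
PROGRAM_TOTAL = 57_420_754_719

# Assumption: these seven rows are all the rpmstrpool.c entries that
# the fgrep in the shell transcript picked up.
RPMSTRPOOL_COSTS = {
    "rpmstrPoolId": 10_035_890_516,
    "rstrhash": 5_874_365_680,
    "rpmstrPoolGet": 2_824_229_185,
    "poolHashAddHEntry": 2_613_824_638,
    "rpmstrPoolStr": 1_616_755_970,
    "rpmstrPoolPut": 1_538_230_858,
    "poolHashFree.part.0": 1_251_399_444,
}


def pool_share(costs, total):
    """Return the fraction of `total` spent in the given routines."""
    return sum(costs.values()) / total


pool_total = sum(RPMSTRPOOL_COSTS.values())
share = pool_share(RPMSTRPOOL_COSTS, PROGRAM_TOTAL)
print(f"rpmstrpool.c: {pool_total:,} of {PROGRAM_TOTAL:,} ({share:.1%})")
```

The seven rows add up to 25,754,696,291, the same figure the Sum helper
printed, so under that assumption the whole rpmstrpool.c cost is already
visible in the top 20.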

So I'm a little bit lost here. I need faster specfile parsing, because
faster parsing would make it possible to run extensive tests more
often (e.g. processing all Fedora specfiles). However, with
rpmstrpool.c, any possibility of faster specfile parsing becomes very
remote.

