[Rpm-maint] rpmstrpool.c way too slow
Panu Matilainen
pmatilai at laiskiainen.org
Wed Jan 30 08:53:34 UTC 2013
On 01/30/2013 02:08 AM, Alexey Tourbin wrote:
> Hello,
>
> My previous patches were made by hacking on rpmio/ and injecting the new
> code via LD_LIBRARY_PATH=$PWD/rpmio/.libs. Thus the previous profiling
> results were obtained with the system librpm.so library, which comes with
> rpm-libs-4.10.2-1.fc18.x86_64. Today I did a new "full" profiling of the
> rpm specfile parser - that is, without using the system rpm-libs - and the
> results changed dramatically.
>
> Here is the exact profiling command I used:
> $ LD_LIBRARY_PATH=$(echo $PWD/*/.libs |sed 's| |:|g') \
>     valgrind --tool=callgrind --dump-instr=yes \
>     ./.libs/rpmspec -P ../fc18-specs/texlive.spec >/dev/null
>
> Here are the top20 routines from callgrind_annotate(1) output:
> 57,420,754,719 PROGRAM TOTALS
> 10,035,890,516 rpmio/rpmstrpool.c:rpmstrPoolId
> 5,874,365,680 rpmio/rpmstrpool.c:rstrhash
> 3,062,539,970 glibc-2.16-75f0d304/stdlib/bsearch.c:bsearch
> 3,029,616,175 lib/header.c:copyTdEntry
> 2,824,229,185 rpmio/rpmstrpool.c:rpmstrPoolGet
> 2,720,364,090 glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strcmp
> 2,613,824,638 rpmio/rpmstrpool.c:poolHashAddHEntry
> 2,594,167,218 glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strncmp
> 1,828,340,892 glibc-2.16-75f0d304/string/../sysdeps/x86_64/memset.S:__GI_memset
> 1,616,755,970 rpmio/rpmstrpool.c:rpmstrPoolStr
> 1,611,388,096 lib/header.c:intGetTdEntry
> 1,538,230,858 rpmio/rpmstrpool.c:rpmstrPoolPut
> 1,387,732,877 lib/header.c:headerGet
> 1,251,399,444 rpmio/rpmstrpool.c:poolHashFree.part.0
> 1,244,486,470 glibc-2.16-75f0d304/string/../sysdeps/x86_64/rawmemchr.S:__GI___rawmemchr
> 1,227,687,377 lib/header.c:findEntry
> 978,801,050 lib/headerutil.c:headerGetString
> 968,938,071 glibc-2.16-75f0d304/string/../sysdeps/x86_64/memcpy.S:__GI_memcpy
> 785,113,238 lib/rpmds.c:rpmdsNIndex
> 749,893,987 lib/rpmds.c:rpmdsNext
>
> With system librpm.so, bsearch was the most expensive routine; now it
> is only #3, trailing the top two by a wide margin. Actually the new
> rpmstrpool.c code squanders as much as 45% of the PROGRAM TOTALS.
>
> $ Sum() { perl -MList::Util=sum -ln0 -e 'print sum(split)'; }
> $ callgrind_annotate |fgrep rpmstrpool.c |sed 's/,//g' |Sum
> 25754696291
> $ callgrind_annotate |fgrep TOTAL |sed 's/,//g'
> 57420754719 PROGRAM TOTALS
> $ perl -le 'print 25754696291/57420754719'
> 0.448525910483688
> $
>
> So I'm a little bit lost here. I need faster specfile parsing, because
> faster parsing would make it possible to run extensive tests more
> often (e.g. processing all Fedora specfiles). However, with
> rpmstrpool.c, any possibility of faster specfile parsing becomes very
> remote.
Nah... It's not the string pool code itself that is somehow horrendously
slow. The above figures originate from isNewDep(), which is stupid as
hell to begin with, and now that rpmds is using string pools internally,
doubly so.
Much of the rpm codebase hasn't been pool-optimized yet: only some of
the more critical bits in librpm were "converted" in the rpm-4.11 cycle,
and the build-side code got practically none of it, except for rpmfc.c
(which is not involved in the spec parse stage).
The worst offender in isNewDep() is almost certainly rpmdsNew(), which
recreates an entire rpmds from the header for each and every dependency
that gets checked, causing the same strings to be hashed over and over
again (a cost that previously wasn't there, and which is what tops your
figures above).
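Roughly, the pattern looks like this (a simplified sketch of the shape
of the code, not the literal rpm source; the comparison step in
particular is hand-waved):

#include <rpm/header.h>
#include <rpm/rpmds.h>

/* Sketch: for every dependency being added, the whole dependency set
 * is recreated from the header and scanned linearly. With rpmds now
 * pool-backed, rpmdsNew() re-interns (re-hashes) every string on each
 * call - that's the rpmstrPoolId/rstrhash time at the top of the
 * profile. */
static int isNewDep(Header h, rpmTagVal tag, rpmds newds)
{
    int isnew = 1;
    rpmds ds = rpmdsNew(h, tag, 0);    /* recreated per dependency */

    ds = rpmdsInit(ds);
    while (rpmdsNext(ds) >= 0) {
        if (rpmdsCompare(ds, newds)) { /* already in the header? */
            isnew = 0;
            break;
        }
    }
    rpmdsFree(ds);
    return isnew;
}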
The rest of it just wants the regular "string pool conversion"
treatment: using a spec-global pool and mostly operating on ids instead
of the actual strings. The necessary bits and pieces aren't currently
exported from librpm, so there's some other work to be done in that
area, but once it's pool-optimized it'll be considerably faster than the
non-pool variant ever was.
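To illustrate the gist (assumed usage of the rpmstrpool API for
illustration, not an actual patch - sameName() is a made-up helper):

#include "rpmio/rpmstrpool.h"  /* internal header, not exported yet */

/* With a spec-global pool, each distinct string is hashed and stored
 * exactly once, and equal strings always map to the same rpmsid, so
 * comparisons become integer compares instead of strcmp() calls. */
static int sameName(rpmstrPool pool, const char *n1, const char *n2)
{
    rpmsid id1 = rpmstrPoolId(pool, n1, 1); /* 1 = create if missing */
    rpmsid id2 = rpmstrPoolId(pool, n2, 1);
    return (id1 == id2);
}

/* Usage sketch:
 *   rpmstrPool pool = rpmstrPoolCreate();  // one pool per spec parse
 *   ... sameName(pool, "libfoo", "libfoo") ...
 *   pool = rpmstrPoolFree(pool);           // returns NULL
 */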
P.S. I'll try to get to looking at your patches properly in a few days /
next week, just a bit busy ATM. Just so you know they're not being
ignored :)
- Panu -