[Rpm-maint] rpmstrpool.c way too slow

Panu Matilainen pmatilai at laiskiainen.org
Wed Jan 30 08:53:34 UTC 2013


On 01/30/2013 02:08 AM, Alexey Tourbin wrote:
> Hello,
>
> My previous patches were made by hacking on rpmio/ and injecting new
> code via LD_LIBRARY_PATH=$PWD/rpmio/.libs. Thus previous profiling
> results were obtained using system librpm.so library, which comes with
> rpm-libs-4.10.2-1.fc18.x86_64.  Today I made new "full" profiling of
> rpm specfile parser - that is, without using system rpm-libs - and the
> result changed dramatically.
>
> Here is the exact profiling command I used:
> $ LD_LIBRARY_PATH=$(echo $PWD/*/.libs |sed 's| |:|g') valgrind
> --tool=callgrind --dump-instr=yes ./.libs/rpmspec -P
> ../fc18-specs/texlive.spec >/dev/null
>
> Here are the top20 routines from callgrind_annotate(1) output:
> 57,420,754,719  PROGRAM TOTALS
> 10,035,890,516  rpmio/rpmstrpool.c:rpmstrPoolId
>   5,874,365,680  rpmio/rpmstrpool.c:rstrhash
>   3,062,539,970  glibc-2.16-75f0d304/stdlib/bsearch.c:bsearch
>   3,029,616,175  lib/header.c:copyTdEntry
>   2,824,229,185  rpmio/rpmstrpool.c:rpmstrPoolGet
>   2,720,364,090
> glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strcmp
>   2,613,824,638  rpmio/rpmstrpool.c:poolHashAddHEntry
>   2,594,167,218
> glibc-2.16-75f0d304/string/../sysdeps/x86_64/strcmp.S:__GI_strncmp
>   1,828,340,892
> glibc-2.16-75f0d304/string/../sysdeps/x86_64/memset.S:__GI_memset
>   1,616,755,970  rpmio/rpmstrpool.c:rpmstrPoolStr
>   1,611,388,096  lib/header.c:intGetTdEntry
>   1,538,230,858  rpmio/rpmstrpool.c:rpmstrPoolPut
>   1,387,732,877  lib/header.c:headerGet
>   1,251,399,444  rpmio/rpmstrpool.c:poolHashFree.part.0
>   1,244,486,470
> glibc-2.16-75f0d304/string/../sysdeps/x86_64/rawmemchr.S:__GI___rawmemchr
>   1,227,687,377  lib/header.c:findEntry
>     978,801,050  lib/headerutil.c:headerGetString
>     968,938,071
> glibc-2.16-75f0d304/string/../sysdeps/x86_64/memcpy.S:__GI_memcpy
>     785,113,238  lib/rpmds.c:rpmdsNIndex
>     749,893,987  lib/rpmds.c:rpmdsNext
>
> With system librpm.so, bsearch was the most expensive routine, and now
> it is only #3, lagging behind by a great margin. Actually the new
> rpmstrpool.c code squanders as much as 45% of the PROGRAM TOTALS.
>
> $ Sum() { perl -MList::Util=sum -ln0 -e 'print sum(split)'; }
> $ callgrind_annotate |fgrep rpmstrpool.c |sed 's/,//g' |Sum
> 25754696291
> $ callgrind_annotate |fgrep TOTAL |sed 's/,//g'
> 57420754719  PROGRAM TOTALS
> $ perl -le 'print 25754696291/57420754719'
> 0.448525910483688
> $
>
> So I'm a little bit lost here. I need faster specfile parsing, because
> faster parsing would make it possible to run extensive tests more
> often (e.g. processing all Fedora specfiles). However, with
> rpmstrpool.c, any possibility of faster specfile parsing become very
> remote.

Nah... Its not the string pool code that is somehow horrendously slow in 
itself. The above figures originate from isNewDep() which is stupid as 
hell to begin with and now that rpmds is using string pools internally, 
doubly so.

Much of the rpm codebase hasn't yet been pool-optimized, only some of 
the more critical bits in librpm were "converted" in the rpm-4.11 cycle 
and the build-side code got practically none of it, except for rpmfc.c 
(which is not involved in spec parse stage).

The worst offender in isNewDep() is almost certainly rpmdsNew() which 
recreates an entire rpmds from the header for each and every dependency 
that gets checked, causing the same strings hashed (which is a cost that 
was previously not there and what is topping your figures above) over 
and over again.

The rest of it just wants a regular "string pool conversion" treatment: 
using a spec-global pool and mostly operating on id's instead of the 
actual strings. The necessary bits and pieces aren't exported from 
librpm currently so there's some other work to be done in that area, but 
once its pool-optimized it'll be considerably faster than the non-pool 
variant ever was.

P.S. I'll try to get to looking at your patches properly in a few days / 
next week, just a bit busy ATM. Just so you know they're not being 
ignored :)

	- Panu -


More information about the Rpm-maint mailing list