[Rpm-ecosystem] hawkey query performance
mls at suse.de
Fri Sep 11 15:19:24 UTC 2015
On Wed, Sep 09, 2015 at 10:52:17AM +0200, Daniel Mach wrote:
> I'm struggling with hawkey query performance.
> Hawkey performs a sequential scan through the whole package set on
> every query. Consider 100k+ RPMs and making a closure on
> dependencies (pairing Requires and Provides, incl. file deps). My
> use case is to create a package set for a compose, similar use
> case is to draw a dependency graph.
> Libsolv's depsolving can't be probably used, because compose can
> contain mutually conflicting packages (but individually installable)
> or broken deps (we need to deliver latest RPMs to quality
> engineering for every cost, even if the deps are broken).
> Do you have any idea how to resolve my problem?
> Can hawkey or libsolv be improved to index data for queries?
> Or should I be using it differently?
Libsolv has an index for provides, of course, so that's not the
issue. Looking at the callgrind output I think your problem is
that hawkey's query mechanism is super expensive and not written
to be called so often.
It should be easy and fun to change this, though. My suggestion
is to move away from using a map over all packages to store
the result of the query steps, as creating and working with
the map is O(npackages). Instead, store the result in a simple
list (aka libsolv Queue) and just use a map to combine lists
with many elements (which should be pretty rare).
callgrind says most time is spent in
2,189,796,993 strlen [/usr/lib64/libc-2.21.90.so]
1,754,407,974 __strncmp_sse42 [/usr/lib64/libc-2.21.90.so]
1,571,778,421 hy_query_apply [/usr/lib64/libhawkey.so.2]
strlen/strncmp_sse42 is from str_startswith, called from is_package,
called from FOR_PKG_SOLVABLES, which is used to initialize the
result map. hy_query_apply takes so long as it has to merge maps.
My 2 cents,
Michael Schroeder mls at suse.de
SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
More information about the Rpm-ecosystem