[Rpm-ecosystem] hawkey query performance
jsilhan at redhat.com
Fri Sep 11 16:13:04 UTC 2015
> From: "Michael Schroeder" <mls at suse.de>
> On Wed, Sep 09, 2015 at 10:52:17AM +0200, Daniel Mach wrote:
> > I'm struggling with hawkey query performance.
> > Hawkey performs a sequential scan through the whole package set on
> > every query. Consider 100k+ RPMs and making a closure on
> > dependencies (pairing Requires and Provides, incl. file deps). My
> > use case is to create a package set for a compose, similar use
> > case is to draw a dependency graph.
> > Libsolv's depsolving can't be probably used, because compose can
> > contain mutually conflicting packages (but individually installable)
> > or broken deps (we need to deliver latest RPMs to quality
> > engineering for every cost, even if the deps are broken).
> > Do you have any idea how to resolve my problem?
> > Can hawkey or libsolv be improved to index data for queries?
> > Or should I be using it differently?
> Libsolv has an index for provides, of course, so that's not the
> issue. Looking at the callgrind output I think your problem is
> that hawkey's query mechanism is super expensive and not written
> to be called so often.
> It should be easy and fun to change this, though. My suggestion
> is to move away from using a map over all packages to store
> the result of the query steps, as creating and working with
> the map is O(npackages). Instead, store the result in a simple
> list (aka libsolv Queue) and just use a map to combine lists
> with many elements (which should be pretty rare).
> callgrind says most time is spent in
> 2,189,796,993 strlen [/usr/lib64/libc-2.21.90.so]
> 1,754,407,974 __strncmp_sse42 [/usr/lib64/libc-2.21.90.so]
> 1,571,778,421 hy_query_apply [/usr/lib64/libhawkey.so.2]
> strlen/strncmp_sse42 is from str_startswith, called from is_package,
> called from FOR_PKG_SOLVABLES, which is used to initialize the
> result map. hy_query_apply takes so long as it has to merge maps.
Thanks for the audit, Michael. Michael Mraka also came with list solution.
It could be possible to run FOR_PKG_SOLVABLES just once in hy_query_apply
and append to the result list only packages that passes through all filters.
Hawkey queries performance is the issue but Dan was more likely to ask what
tools release engineers from SUSE are using. Do they use libsolv depsolver
directly? Or do they also make queries above packages? Maybe they can share
their efforts for the similar use case.
More information about the Rpm-ecosystem