[Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Wed Aug 8 16:43:25 UTC 2018

On Tue, Aug 07, 2018 at 08:50:25AM +0000, Michael Schroeder wrote:
> On Mon, Aug 06, 2018 at 04:36:07PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > This mail is a continuation of the FPC [1] and FESCo [2] tickets.
> > 
> > A proposal was made to disallow packages in Fedora from using file
> > deps, and to optimize dnf to not load filelists.xml. File deps would
> > still be supported, because external packages and users want to use
> > them, but they would not be allowed for distro packages.
> > 
> > Not downloading or loading filelists.xml, which is required for file
> > deps, would provide significant bandwidth savings (~47 MB compressed)
> > and noticeable runtime savings (~10s at dnf startup) in many common
> > cases.
> > 
> > So this is something that is worth exploring, but it's not clear if it
> > is at all feasible.
> There's also something that can easily be done and would make
> loading the filelists unnecessary in most cases: extend the
> primary filelist to include some whitelist of files. The whitelist
> must also be stored in the primary data, so that the solver knows
> what to expect.

Yep, that sounds like an excellent idea.
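To make the point about storing the whitelist concrete, here is a minimal Python sketch, assuming a hypothetical `PrimaryData` model (not the createrepo_c or libsolv API). The patterns used are the default ones mentioned later in this thread. Because the whitelist travels with the data, the solver can tell "no package ships this file" (empty set) apart from "primary simply does not list this file" (None, so filelists would be needed).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class PrimaryData:
    """Hypothetical model of primary metadata that also records the
    whitelist of path patterns whose matching files it lists."""
    whitelist: list[str]
    # package name -> file paths that primary lists for that package
    files: dict[str, set[str]] = field(default_factory=dict)

    def covers(self, path: str) -> bool:
        """True if primary is authoritative for `path`."""
        return any(fnmatchcase(path, pat) for pat in self.whitelist)

    def providers(self, path: str) -> set[str] | None:
        """Providers of `path` according to primary alone.

        None means the path is outside the whitelist, so only filelists.xml
        can answer; an empty set is a definitive "nothing provides it",
        because primary is known to be complete for whitelisted paths.
        """
        if not self.covers(path):
            return None
        return {pkg for pkg, paths in self.files.items() if path in paths}


primary = PrimaryData(
    whitelist=["/etc/*", "*bin/*", "/usr/lib/sendmail"],
    files={"bash": {"/usr/bin/bash"}, "setup": {"/etc/passwd"}},
)
print(primary.providers("/usr/bin/bash"))     # {'bash'}
print(primary.providers("/usr/bin/zsh"))      # set()
print(primary.providers("/usr/share/doc/x"))  # None
```

Extending the whitelist then only means adding patterns to the stored list; older clients that read the list would still know exactly what primary covers.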

> > It seems that dnf would need to support loading
> > filelists.xml lazily. In the mailing list discussions, some people
> > said that this would be hard, some people said that it would be
> > possible. What is the situation here?
> Lazy loading of primary extensions is supported in libsolv; the
> demo solver included in the package makes use of that feature.
> > IIUC, dnf would need to restart
> > the resolution of a transaction mid-flight once it encounters a file dep,
> > which would require support across the different layers.
> No, it works differently. At some point the solver creates the ruleset
> needed for dependency resolution. To do this, it has to find which
> packages provide a given dependency. If that's a filename dependency,
> it will check if it matches the default patterns (/etc/* *bin/*
> /usr/lib/sendmail). If it does not match, it will search the filelists.xml
> extension. Here's where libsolv can use a callback to make the lazy
> loading happen.
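The flow described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `LazyRepo`, `whatprovides_file` and the loader callback are hypothetical names, not the libsolv API. It shows that no restart of the resolution is needed, because the lookup itself triggers the one-time load of filelists when a file dep falls outside the default patterns.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Callable

# Default patterns mentioned above: primary.xml already lists matching files.
PRIMARY_PATTERNS = ("/etc/*", "*bin/*", "/usr/lib/sendmail")


class LazyRepo:
    """Hypothetical repository model (not the libsolv API) that loads the
    filelists extension through a callback only when it is first needed."""

    def __init__(self, primary_files: dict[str, set[str]],
                 load_filelists: Callable[[], dict[str, set[str]]]) -> None:
        self.primary_files = primary_files      # package -> paths from primary
        self._load_filelists = load_filelists   # callback, invoked at most once
        self._filelists: dict[str, set[str]] | None = None

    def _filelists_index(self) -> dict[str, set[str]]:
        if self._filelists is None:
            # First file dep outside the default patterns: fetch filelists now.
            self._filelists = self._load_filelists()
        return self._filelists

    def whatprovides_file(self, path: str) -> set[str]:
        if any(fnmatchcase(path, pat) for pat in PRIMARY_PATTERNS):
            index = self.primary_files        # primary data is enough
        else:
            index = self._filelists_index()   # may trigger the lazy load
        return {pkg for pkg, paths in index.items() if path in paths}


def fake_download() -> dict[str, set[str]]:
    print("downloading filelists.xml ...")
    return {"bash": {"/usr/bin/bash", "/usr/share/doc/bash/README"}}


repo = LazyRepo({"bash": {"/usr/bin/bash"}}, fake_download)
repo.whatprovides_file("/usr/bin/bash")               # no download
repo.whatprovides_file("/usr/share/doc/bash/README")  # downloads once
repo.whatprovides_file("/usr/share/doc/bash/NEWS")    # reuses loaded index
```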
> > If Fedora commits to making use of this, would it be possible to
> > implement this in dnf? What kind of changes would be required?
> > 
> > [1] https://pagure.io/packaging-committee/issue/714
> > [2] https://pagure.io/fesco/issue/1955
> I don't think this is hard to implement, but there's a little detail
> that needs to be discussed: what should happen if the filelists.xml
> download fails? This can happen because the metadata has been rewritten
> in the meantime. How should the error be propagated back to the user?

That's a good question. The time window where this can happen is not
that big, because without filelists the loading of metadata is
quicker. But it's still non-zero, so given enough machines and enough
updates, it'll be hit occasionally.

A dirty solution would be to simply error out, which is not nice.

I think the best solution, if part of the metadata cannot be
downloaded, is to restart the download of metadata in non-lazy mode. In
other words, if the lazy approach fails, repeat the process exactly
like it is done today.
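A minimal Python sketch of this fallback, assuming a hypothetical `resolve(lazy)` entry point and a hypothetical `MetadataDownloadError`; neither is an existing dnf or libsolv API. Starting over from scratch, rather than only re-fetching filelists, is presumably the safer choice: if the repo was rewritten, the primary data already loaded may not match the new filelists.

```python
from __future__ import annotations

from typing import Callable


class MetadataDownloadError(Exception):
    """Hypothetical error: a lazily fetched file (e.g. filelists.xml) is unavailable."""


def resolve_with_fallback(resolve: Callable[[bool], list[str]]) -> list[str]:
    """Run resolution lazily first; if lazy loading fails, redo it eagerly.

    `resolve(lazy)` is a hypothetical entry point: with lazy=True it loads
    filelists only on demand; with lazy=False it downloads all metadata up
    front, as dnf does today.
    """
    try:
        return resolve(True)
    except MetadataDownloadError:
        # The metadata changed under us: repeat the whole process non-lazily.
        return resolve(False)


def fake_resolve(lazy: bool) -> list[str]:
    if lazy:
        raise MetadataDownloadError("filelists.xml no longer matches repomd.xml")
    return ["pkg-a"]


print(resolve_with_fallback(fake_resolve))  # ['pkg-a']
```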

One thing that mitigates this issue is that we have multiple mirrors,
and they cannot all be updated at the same time, so some mirrors will
carry "stale" metadata, and dnf should be able to hit some other mirror
that still has the old filelists. Thus, I think it should be OK to start
with the "dirty solution", if implementing the fallback is complicated,
and implement the fallback later.
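The mirror retry combined with the "dirty solution" could look roughly like this Python sketch. `fetch_from_mirrors`, the `fetch(mirror)` downloader and `MetadataDownloadError` are hypothetical names for illustration only, and the mirror URLs are placeholders. Each mirror is tried in turn; only when all of them have moved on does the user get an error.

```python
from __future__ import annotations

from typing import Callable, Sequence


class MetadataDownloadError(Exception):
    """Hypothetical error: a mirror no longer carries the expected file."""


def fetch_from_mirrors(mirrors: Sequence[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in turn for the lazily needed filelists.

    `fetch(mirror)` is a hypothetical downloader that raises
    MetadataDownloadError when the mirror does not have the file matching
    the checksum in the repomd.xml that was already loaded.
    """
    errors: list[str] = []
    for mirror in mirrors:
        try:
            return fetch(mirror)
        except MetadataDownloadError as exc:
            errors.append(f"{mirror}: {exc}")  # stale mirror, try the next one
    # "Dirty solution": every mirror has moved on, so report an error.
    raise MetadataDownloadError(
        "filelists unavailable on all mirrors:\n" + "\n".join(errors))


def fake_fetch(mirror: str) -> bytes:
    if mirror == "https://a.example/":
        raise MetadataDownloadError("404")
    return b"<filelists/>"


print(fetch_from_mirrors(["https://a.example/", "https://b.example/"], fake_fetch))
```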

