[Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

Jeff Johnson n3npq at me.com
Thu Aug 9 14:28:21 UTC 2018



> On Aug 9, 2018, at 4:12 AM, Vít Ondruch <vondruch at redhat.com> wrote:
> 
> 
> 
> Dne 9.8.2018 v 07:34 Neal Gompa napsal(a):
>>> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan <pterjan at gmail.com> wrote:
>>>> On 7 August 2018 at 09:50, Michael Schroeder <mls at suse.de> wrote:
>>>>> On Mon, Aug 06, 2018 at 04:36:07PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>>>>> This mail is a continuation of an FPC [1] and a FESCo [2] ticket.
>>>>> 
>>>>> A proposal was made to disallow packages in Fedora from using file
>>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>>> still be supported, because external packages and users want to use
>>>>> them, but they would not be allowed for distro packages.
>>>>> 
>>>>> Not downloading or loading filelists.xml, which is required for file
>>>>> deps, would provide significant bandwidth savings (~47 MB compressed)
>>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>>> cases.
>>>>> 
>>>>> So this is something that is worth exploring, but it's not clear if it
>>>>> is at all feasible.
>>>> There's also something that can easily be done and would make
>>>> loading the filelist unneeded in most of the cases: extend the
>>>> primary filelist to include some whitelist of files. The whitelist
>>>> must also be stored in the primary data, so that the solver knows
>>>> what to expect.
>>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>>> years, there is a small file-deps file containing the ones we end up
>>> generating, mostly from scriptlets IIRC, and we end up with provides
>>> added for those in the main metadata when generating it. Then file
>>> lists are lazily loaded when people want to query them but not used
>>> for dependency resolution.
>>> 
>>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>>> /bin/csh
>>> /bin/grep
>>> /bin/perl
>>> /usr/bin/ln
>>> /usr/bin/rm
>>> /sbin/service
>>> /usr/bin/chattr
>>> /usr/bin/guile
>>> /usr/bin/openssl
>>> /usr/bin/pear
>>> /usr/bin/texhash
>>> /usr/bin/tr
>>> /usr/bin/which
>>> /usr/sbin/groupadd
>>> /usr/sbin/groupdel
>>> /usr/sbin/useradd
>>> /usr/sbin/userdel
>> So the primary.xml already includes all that. If you actually look in
>> the primary.xml.gz files in the Mageia rpm-md data, those are already
>> there. The problem is that there are people who actually request files
>> outside of the base whitelist as a means to be able to request
>> "things" without knowing how they are packaged, because the file path
>> is the consistent thing across distros.
> 
> 
> So couldn't createrepo actually be extended so that, if it
> identifies a package which has "Requires: /some/random/path" and
> "/some/random/path" is actually included in the
> repository, that file would be included in primary.xml.gz? This
> would help with huge repositories, since that is where the cost of
> downloading filelists.xml is highest.
> 

Creating a tool to automate generating the list of file dependencies in a whitelist is a sound idea.

A separate tool instead of bundling into createrepo may be simpler for two reasons:

1) the whitelist is not just a question of which paths exist, but also of policy control: some file dependencies may not be permitted because of policy.

2) the whitelist must be complete before the markup is generated: this forces two passes on the packages, first to find the whitelist, then to generate primary.xml with permitted file paths.
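
A minimal sketch of those two passes, in Python. Everything here is an assumption made for illustration: the in-memory package records, the DENIED_PATTERNS policy, and the function names are hypothetical, and a real tool would read rpm headers and emit primary.xml rather than a dict.

```python
from fnmatch import fnmatch

# Hypothetical package records; a real tool would read these from rpm headers.
PACKAGES = [
    {"name": "coreutils", "files": ["/usr/bin/ln", "/usr/bin/rm", "/usr/bin/tr"],
     "requires": []},
    {"name": "shadow-utils", "files": ["/usr/sbin/useradd", "/usr/sbin/groupadd"],
     "requires": []},
    {"name": "vendor-tool", "files": ["/opt/vendor/tool"], "requires": []},
    {"name": "myapp", "files": ["/usr/bin/myapp"],
     "requires": ["/usr/sbin/useradd", "/usr/bin/rm", "/opt/vendor/tool",
                  "/no/such/path", "libc.so.6"]},
]

# Hypothetical policy: path patterns that may not be used as file dependencies.
DENIED_PATTERNS = ["/opt/*"]


def is_file_dep(dep):
    """A dependency is a file dependency when it names an absolute path."""
    return dep.startswith("/")


def permitted(path, denied=DENIED_PATTERNS):
    """Policy check, kept separate from the existence check."""
    return not any(fnmatch(path, pattern) for pattern in denied)


def build_whitelist(packages):
    """Pass 1: file deps that exist in the repository and pass policy."""
    provided = {f for pkg in packages for f in pkg["files"]}
    wanted = {d for pkg in packages for d in pkg["requires"] if is_file_dep(d)}
    return sorted(path for path in wanted & provided if permitted(path))


def primary_file_entries(packages, whitelist):
    """Pass 2: per package, the file paths to list in the primary metadata."""
    allowed = set(whitelist)
    return {pkg["name"]: sorted(allowed.intersection(pkg["files"]))
            for pkg in packages}


whitelist = build_whitelist(PACKAGES)
entries = primary_file_entries(PACKAGES, whitelist)
```

Pass 2 cannot start until pass 1 has seen every package, since any package may require a path owned by any other. That is the reason for the two passes.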

(aside)
Note that file paths can appear in all dependencies, not just Requires:, even though Requires: is by far the most common use case for rpm depsolvers, which typically do not attempt backtracking (i.e. removing installed packages to avoid Conflicts:).
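
A rough sketch of what scanning every dependency kind could look like, in Python. The record layout and the file_deps name are hypothetical. The sketch assumes simple "name [op version]" dependency strings and does not attempt to parse rich (boolean) dependencies.

```python
# Dependency kinds to scan; the keys mirror the rpm tag names in lower case.
DEP_KINDS = ("requires", "conflicts", "obsoletes", "recommends",
             "suggests", "supplements", "enhances")


def file_deps(package):
    """Collect file paths used in any dependency kind of one package record."""
    found = set()
    for kind in DEP_KINDS:
        for dep in package.get(kind, []):
            parts = dep.split()  # drop any trailing "op version" part
            if parts and parts[0].startswith("/"):
                found.add(parts[0])
    return found


# Hypothetical record: one file path in Requires:, one in Conflicts:.
pkg = {
    "name": "example",
    "requires": ["/usr/bin/perl", "glibc >= 2.27"],
    "conflicts": ["/usr/bin/oldtool"],
}
paths = file_deps(pkg)
```

A whitelist built from Requires: alone would miss /usr/bin/oldtool here.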

73 de Jeff

