[Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

Vít Ondruch vondruch at redhat.com
Thu Aug 9 08:12:06 UTC 2018

On 9.8.2018 at 07:34, Neal Gompa wrote:
> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan <pterjan at gmail.com> wrote:
>> On 7 August 2018 at 09:50, Michael Schroeder <mls at suse.de> wrote:
>>> On Mon, Aug 06, 2018 at 04:36:07PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>>>> This mail is a continuation of the FPC [1] and FESCo [2] tickets.
>>>> The proposal is to disallow packages in Fedora from using file
>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>> still be supported, because external packages and users want to use
>>>> them, but they would not be allowed for distro packages.
>>>> Not downloading or loading filelists.xml, which is required for file
>>>> deps, would provide significant bandwidth savings (~47 MB compressed)
>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>> cases.
>>>> So this is something that is worth exploring, but it's not clear if it
>>>> is at all feasible.
>>> There's also something that can easily be done and would make
>>> loading the filelists unnecessary in most cases: extend the
>>> primary file list to include a whitelist of files. The whitelist
>>> must also be stored in the primary data, so that the solver knows
>>> what to expect.
>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>> years. There is a small file-deps file listing the file deps we end
>> up generating, mostly from scriptlets IIRC, and provides for those
>> are added to the main metadata when it is generated. File lists are
>> then lazily loaded when people want to query them, but they are not
>> used for dependency resolution.
>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>> /bin/csh
>> /bin/grep
>> /bin/perl
>> /usr/bin/ln
>> /usr/bin/rm
>> /sbin/service
>> /usr/bin/chattr
>> /usr/bin/guile
>> /usr/bin/openssl
>> /usr/bin/pear
>> /usr/bin/texhash
>> /usr/bin/tr
>> /usr/bin/which
>> /usr/sbin/groupadd
>> /usr/sbin/groupdel
>> /usr/sbin/useradd
>> /usr/sbin/userdel
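The whitelist scheme described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the `Package` type, the helper names and the sample data are hypothetical, and real rpm-md metadata is XML rather than in-memory sets. It shows why the whitelist itself has to be stored with the primary data: for a whitelisted path an empty answer from primary can be trusted, while any other path forces a lazy load of filelists.xml.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Package:
    # Hypothetical in-memory stand-in for a package's metadata.
    name: str
    files: Set[str]  # full file list (what filelists.xml would carry)


def build_primary_files(packages: List[Package], whitelist: Set[str]) -> Dict[str, Set[str]]:
    """Keep only whitelisted paths per package; these are the file entries
    that would be copied into the primary metadata."""
    return {p.name: p.files & whitelist for p in packages}


def resolve_file_dep(
    path: str,
    primary_files: Dict[str, Set[str]],
    whitelist: Set[str],
    load_filelists: Callable[[], Dict[str, Set[str]]],
) -> List[str]:
    """Return the sorted names of packages that contain `path`."""
    if path in whitelist:
        # The whitelist ships with the primary data, so primary alone is
        # authoritative here: an empty result really means "no provider".
        source = primary_files
    else:
        # Path not covered by the whitelist: fall back to the full file
        # lists, which are only fetched at this point (lazy loading).
        source = load_filelists()
    return sorted(name for name, files in source.items() if path in files)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    pkgs = [
        Package("pkg-a", {"/usr/bin/tr", "/usr/share/doc/pkg-a/README"}),
        Package("pkg-b", {"/usr/sbin/useradd", "/usr/sbin/userdel"}),
    ]
    whitelist = {"/usr/bin/tr", "/usr/sbin/useradd", "/usr/sbin/userdel"}
    primary = build_primary_files(pkgs, whitelist)

    loads = []

    def load_filelists() -> Dict[str, Set[str]]:
        loads.append(1)  # count how often the expensive download happens
        return {p.name: p.files for p in pkgs}

    print(resolve_file_dep("/usr/sbin/useradd", primary, whitelist, load_filelists))  # → ['pkg-b']
    print(len(loads))  # → 0
    print(resolve_file_dep("/usr/share/doc/pkg-a/README", primary, whitelist, load_filelists))  # → ['pkg-a']
    print(len(loads))  # → 1
```

The common case (scriptlet deps such as `/usr/sbin/useradd`) never touches filelists.xml; only a query for a path outside the whitelist pays for the download.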
> So the primary.xml already includes all that. If you actually look in
> the primary.xml.gz files in the Mageia rpm-md data, those are already
> there. The problem is that there are people who actually request files
> outside of the base whitelist as a means to be able to request
> "things" without knowing how they are packaged, because the file path
> is the consistent thing across distros.

So couldn't createrepo be extended so that, when it finds a package with
"Requires: /some/random/path" and "/some/random/path" is actually shipped
by a package in the same repository, that file would be included in
primary.xml.gz? This would help with huge repositories, where downloading
filelists.xml is the biggest cost.
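The createrepo extension suggested above could look roughly like this. It is a minimal Python sketch, not createrepo's actual code: `Package`, `file_requires` and `primary_file_entries` are hypothetical names. It collects the plain file requires of all packages in one repository and copies the matching files into each package's primary entry. The collected path set is returned too, since the solver would need it stored alongside the primary data. File deps coming from other repositories or from a user asking for a path are not covered by this and would still need filelists.xml.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Package:
    # Hypothetical in-memory stand-in for one package in the repository.
    name: str
    files: Set[str]
    requires: Set[str] = field(default_factory=set)


def file_requires(packages: Iterable[Package]) -> Set[str]:
    """Collect plain file requires (entries starting with '/').
    Rich/boolean deps such as '(a or /b)' are deliberately ignored here."""
    return {req for p in packages for req in p.requires if req.startswith("/")}


def primary_file_entries(
    packages: List[Package], base_whitelist: Iterable[str] = ()
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (paths to record as the whitelist, per-package files for primary).

    A package's file goes into primary only if it is in the static whitelist
    or some package in this same repository requires it by path."""
    wanted = set(base_whitelist) | file_requires(packages)
    entries = {p.name: sorted(p.files & wanted) for p in packages}
    return sorted(wanted), entries


if __name__ == "__main__":
    # Made-up sample repository.
    repo = [
        Package("pkg-a", {"/usr/bin/foo", "/usr/share/doc/pkg-a/README"}),
        Package("pkg-b", {"/usr/lib64/libb.so.1"}, {"/usr/bin/foo", "/opt/elsewhere"}),
    ]
    whitelist, entries = primary_file_entries(repo)
    print(whitelist)  # → ['/opt/elsewhere', '/usr/bin/foo']
    print(entries)    # → {'pkg-a': ['/usr/bin/foo'], 'pkg-b': []}
```

Here `/opt/elsewhere` ends up in the recorded path set even though no package in this repository ships it, so a solver reading this repository's primary data could conclude it has no provider here without fetching filelists.xml.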

