[Rpm-ecosystem] DNF's use cases of CAShe

James Antill james at fedoraproject.org
Thu Jul 2 03:56:33 UTC 2015


On Wed, 2015-07-01 at 09:10 -0400, Radek Holy wrote:
> Hi James (and others),
> 
> I've identified several DNF's use cases that, I believe, can be
> supported by the new CAShe project.
> 
> 
> Makecache
> ---------
> 
> DNF's metadata cache is often refreshed; on demand, regularly or lazily... 
> 1) download the checksums of the current metadata
> 2) if they do not match the cache, download the current metadata and store them in the cache
> 3) remove the old metadata of the given repositories from the cache
> 
> In this case, I think that DNF should remember the checksums of the
> metadata stored in CAShe for each repository (including the
> $RELEASEVER etc.). Then it could store the new metadata in CAShe, make
> the local hardlinks, remove the old hardlinks, trigger the cleanup and
> update the list of the checksums.

 Huh? I'm not sure why you want to remember the checksums of the metadata.

 The point of the CAShe is that when you are about to download something
with a checksum of XYZ you do:

1. Does an object with checksum XYZ exist in the CAShe?

2a. If yes, then get it into my program cache.

2b. If no, then I download it. When done I can put it into the CAShe.

...where hopefully 2a and 2b can use hardlinks, to save disk space and to
track in-use objects automatically.
 I wouldn't trigger the CAShe cleanup here, esp. as you didn't do
anything that would need cleaning up.

 The only time you need to think about the checksums is when you are
downloading (when you'd have them anyway).
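
 A minimal sketch of steps 1, 2a and 2b, to make the flow concrete. This
is not the CAShe API: the function names, the on-disk layout and the use
of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import os
import shutil


def cashe_path(cashe_root, checksum):
    # Hypothetical layout: <root>/<first two hex chars>/<checksum>
    return os.path.join(cashe_root, checksum[:2], checksum)


def link_or_copy(src, dst):
    # Prefer a hardlink; fall back to a copy (e.g. across filesystems).
    try:
        os.link(src, dst)
    except OSError:
        shutil.copy2(src, dst)


def file_sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(cashe_root, checksum, dest, download):
    """Place the object with this checksum at dest (which must not exist).

    download(dest) is a caller-supplied callable that writes the file.
    Returns True if the object came from the shared store.
    """
    stored = cashe_path(cashe_root, checksum)
    if os.path.exists(stored):       # 1. is it in the store?
        link_or_copy(stored, dest)   # 2a. yes: get it into the program cache
        return True
    download(dest)                   # 2b. no: download it ...
    if file_sha256(dest) != checksum:
        raise ValueError("checksum mismatch for %s" % dest)
    os.makedirs(os.path.dirname(stored), exist_ok=True)
    link_or_copy(dest, stored)       # ... and put it into the store
    return False
```

 Note that the caller only ever handles the checksum it already has from
the metadata it is downloading against; nothing is remembered between runs.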

> Download
> --------
> 
> People sometimes want to download an RPM.
> 1) "makecache"
> 2) is the package stored in the cache?
>     yes) copy it to the destination directory
>     no) just download it (without storing it in the cache)
> It may be configured to make a hardlink instead of copying the file
> and to store the package in the cache.
> 
> The default case is just about retrieving the RPM from CAShe. No rocket science.

 Sure, this seems fine.

> Install/upgrade etc.
> --------------------
> 
> These are the core use cases of DNF.
> 1) "makecache"
> 2) if the packages are not stored in the cache, download them and
> store them in the cache
> 3) once the transaction is successfully performed (after N attempts),
> all the downloaded packages are removed from the cache
> 
> Also people want to download the packages in advance and perform the
> upgrade later. The procedure is the same. The only difference is that
> there is a delay between (2) and (3) for sure.
> 
> In this case, DNF should persist the list of the checksums to be
> removed after the successful transaction in addition. Then, it will
> remove the hardlinks, trigger the cleanup and clean the list.

 Why do you want to explicitly/manually remove things from the CAShe?
That defeats the purpose. Just run the cleanup code, and if the packages
you just installed are chosen to be deleted so be it ... no reason to
manually delete them though.
 Indeed the installroot use cases break if you do this, as do the
multi-machine use cases. And while less likely, there are use cases where
a user would want the packages again.

 In yum (and I assume dnf) we have to remove the packages from the
application cache after they are installed in a transaction, because
there's no other good time to find and delete them. So there is only one
lifetime choice: delete when used, or keep forever. CAShe allows the
user to set more useful lifetime and size constraints.

> Upgrade other devices, virtual machines and containers from a single cache
> --------------------------------------------------------------------------
> 
> People want to download the data once and reuse them among the whole LAN.
> 1) "Install/upgrade etc." on all representative systems
> 2) but don't remove anything, or remove just those packages that are
> not needed any more (based on access times or using a depsolver)
>
> Here I think that it shouldn't be needed to hardlink the packages into
> DNF's cache since the instance which downloaded them probably does not
> need them any more.

 I'm not sure what you mean here. For each system just download them as
you would, and delete them from DNF's cache as you would.
 Then for each machine either have the CAShe mounted over NFS, or use
the cashe rsync-to/rsync-from commands.

> Undo/downgrade
> --------------
> 
> Fedora removes the old packages from the repositories but people
> sometimes need to undo a transaction or downgrade a package.
> 1) "Install/upgrade etc."
> 2) but remove only those packages that were persisted on the list but
> were *not* installed during the last successful transaction
> 
> This is the same as the "Install/upgrade etc." case, just the DNF's
> logic includes an additional condition. Also this may be a task for
> another tool.

 I'm not sure if you are trying to implement some kind of hidden/shadow
repos. with the CAShe data here, or something?
 If you want to be able to download/downgrade upgraded Fedora packages
then you also want to implement something similar to the "yum-local"
plugin. I wouldn't recommend using CAShe as a backend for this though.

 Again, just treat it as it works in DNF now. If the package is
available from a repo. with a checksum, then you don't need to download
it if you can look it up in the CAShe.


> What do you think? Are these use cases reasonable from the CAShe's
> POV? What do you think about the brief implementation proposals?
> 
> 
> 
> And some ideas:
> - I think that DNF should, by default, trigger the cleanup just for
> those packages of which it removed the hardlinks (since the operation
> may not be cheap).

 No, don't explicitly remove anything.
 You can decide not to call the cleanup operation unless you have
removed packages from DNF's cache (presumably due to a transaction), to
avoid doing the "expensive" operation needlessly.

 Note that "expensive" here is relative. In the best (hopefully normal)
case getting a file from the CAShe is a single syscall, and putting a
file into it is 2 syscalls. By comparison the cleanup operation requires
at least reading the config. and readdir+stat'ing all the files in the
CAShe. If that has to be loaded from disk, the readdir+stat'ing is
noticeable when someone runs "list foo", but not so much when someone
runs "upgrade firefox".

> - I think that there should be some other way how to mark that a file
> should stay in CAShe. E.g. in the "makecache" case, a user may clean
> the DNF's cache for some reason. Then something may trigger the
> automatic cleanup. It will remove the metadata even though they are
> still useful for DNF and for the other package managers that were not
> executed for some time, if the data are fresh.

 No, that's not how it works. Even if you manage to run "cashe cleanup"
just after you ran "dnf clean all" it won't delete everything; it just
obeys the limits (either the defaults or those changed by the user).
 By default that's the last 500MB of stuff used (up to another 1500MB of
stuff, if it's been accessed in the last 8 days). Obviously those can be
changed.
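
 To illustrate how limits like these could be applied, here is a rough
sketch of a cleanup pass. It is one reading of the defaults above, not
the actual CAShe code: the function names and the greedy
most-recently-used walk are assumptions.

```python
import os
import time

MB = 1024 * 1024


def scan(cashe_root):
    # The readdir+stat pass: yield (path, size, atime) for every stored file.
    for dirpath, _dirs, files in os.walk(cashe_root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            yield path, st.st_size, st.st_atime


def select_for_removal(entries, now, keep_bytes=500 * MB,
                       extra_bytes=1500 * MB, max_age_days=8):
    """Return the paths that fall outside the limits.

    Walks from most to least recently used. An object is kept while the
    running total fits in keep_bytes, or, if it was accessed within
    max_age_days, while the total fits in keep_bytes + extra_bytes.
    """
    cutoff = now - max_age_days * 86400
    used = 0
    doomed = []
    for path, size, atime in sorted(entries, key=lambda e: e[2], reverse=True):
        fits_base = used + size <= keep_bytes
        fits_extra = atime >= cutoff and used + size <= keep_bytes + extra_bytes
        if fits_base or fits_extra:
            used += size
        else:
            doomed.append(path)
    return doomed


def cleanup(cashe_root):
    # Delete whatever the policy selects; everything else is left alone.
    for path in select_for_removal(scan(cashe_root), time.time()):
        os.unlink(path)
```

 With a policy like this, running the cleanup right after a "dnf clean
all" removes nothing unless the store is already over its limits.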

>  The same goes for the "multiple devices" and "undo/downgrade" cases.
> Packages can be installed using multiple package managers and all of
> them should be able to contribute to these repositories for the other
> devices or to allow the future downgrades. If the metadata were marked
> as "latest metadata for given repository" and the packages as
> "packages for random/remote usage", all the package managers could
> better collaborate to achieve the goal.

 That's what the hardlinking does, without having to put any knowledge
about repos./packages/URLs/etc. in the CAShe layer. And in the cases
where hardlinking doesn't work all it needs to care about is data =>
checksum mapping and what is the most recently used data, and it should
mostly just work anyway (depending on what you set the limits to).
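
 As a small illustration of why the hardlink is enough: a stored file
that is still linked from some program's cache has a link count above
one, so nothing else needs recording. The helper below is hypothetical,
and whether CAShe inspects link counts in exactly this way is an
assumption.

```python
import os


def in_use(stored_path):
    # More than one link means some other directory entry (for example a
    # package manager's own cache) still refers to the same file.
    return os.stat(stored_path).st_nlink > 1
```

 Any package manager that links an object into its own cache marks it as
in use, and removing that link unmarks it, with no knowledge of repos. or
packages needed in the store itself.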


