[Rpm-ecosystem] DNF's use cases of CAShe

Radek Holy rholy at redhat.com
Wed Jul 1 13:10:26 UTC 2015


Hi James (and others),

I've identified several DNF use cases that, I believe, the new CAShe project can support.


Makecache
---------

DNF's metadata cache is refreshed often (on demand, regularly or lazily):
1) download the checksums of the current metadata
2) if they do not match the cache, download the current metadata and store them in the cache
3) remove the old metadata of the given repositories from the cache

In this case, I think that DNF should remember the checksums of the metadata stored in CAShe for each repository (including $RELEASEVER etc.). It could then store the new metadata in CAShe, create the local hardlinks, remove the old hardlinks, trigger the cleanup and update the list of checksums.
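
The refresh sequence above could be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use CAShe's real interface, and the toy store (one file per checksum in `store_dir`), the `known` mapping and the function names `store` and `refresh_repo` are all hypothetical. The store and the cache are assumed to live on the same filesystem so that hardlinks work.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(store_dir, path):
    """Copy `path` into the toy store under its checksum; return the checksum."""
    os.makedirs(store_dir, exist_ok=True)
    checksum = sha256_of(path)
    target = os.path.join(store_dir, checksum)
    if not os.path.exists(target):
        shutil.copyfile(path, target)
    return checksum


def refresh_repo(store_dir, cache_dir, known, repo_key, new_files):
    """Replace the cached metadata of `repo_key` with `new_files` ({name: path}).

    `known` maps a repository key (hypothetically including $RELEASEVER etc.,
    e.g. "fedora-22-x86_64") to {name: checksum}; it stands in for the
    checksum list that DNF would remember.
    """
    repo_dir = os.path.join(cache_dir, repo_key)
    os.makedirs(repo_dir, exist_ok=True)
    old = known.get(repo_key, {})
    new = {}
    for name, path in new_files.items():
        checksum = store(store_dir, path)
        link = os.path.join(repo_dir, name)
        if old.get(name) != checksum:
            # Remove the old hardlink (if any) and create the new one.
            if os.path.lexists(link):
                os.remove(link)
            os.link(os.path.join(store_dir, checksum), link)
        new[name] = checksum
    # Remove hardlinks of metadata files that no longer exist upstream.
    for name in set(old) - set(new):
        os.remove(os.path.join(repo_dir, name))
    # Cleanup: drop replaced store objects that nothing links to any more.
    for checksum in set(old.values()) - set(new.values()):
        obj = os.path.join(store_dir, checksum)
        if os.path.exists(obj) and os.stat(obj).st_nlink == 1:
            os.remove(obj)
    known[repo_key] = new
    return new
```

In this sketch the cleanup only looks at the objects that the refresh just replaced, and it keeps any object that another hardlink (for example from another package manager's cache) still references.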


Download
--------

People sometimes want to download an RPM.
1) "makecache"
2) is the package stored in the cache?
    yes) copy it to the destination directory
    no) just download it (without storing it in the cache)
DNF may be configured to make a hardlink instead of copying the file, and to store the downloaded package in the cache.

The default case is just about retrieving the RPM from CAShe. No rocket science.
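
A minimal sketch of that lookup, again against a hypothetical toy store (one file per checksum in `store_dir`) rather than CAShe's actual interface. The `download` callable, the `hardlink` and `keep` switches and the function name are assumptions made for illustration; checksum verification of the downloaded file is left out.

```python
import os
import shutil


def fetch_package(store_dir, checksum, dest_dir, filename, download,
                  hardlink=False, keep=False):
    """Place a package in `dest_dir`, preferring the shared store.

    `download` is a hypothetical callable that takes the destination path
    and writes the package there; it is only called on a cache miss.
    The destination file is assumed not to exist yet.
    Returns "cache" or "download" to say where the file came from.
    """
    dest = os.path.join(dest_dir, filename)
    cached = os.path.join(store_dir, checksum)
    if os.path.exists(cached):
        if hardlink:
            os.link(cached, dest)          # optional: link instead of copy
        else:
            shutil.copyfile(cached, dest)  # default: plain copy
        return "cache"
    download(dest)
    if keep:
        # Optional: also keep the freshly downloaded package in the store.
        shutil.copyfile(dest, cached)
    return "download"
```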


Install/upgrade etc.
--------------------

These are the core use cases of DNF.
1) "makecache"
2) if the packages are not stored in the cache, download them and store them in the cache
3) once the transaction is successfully performed (after N attempts), all the downloaded packages are removed from the cache

People also want to download the packages in advance and perform the upgrade later. The procedure is the same; the only difference is that there is certain to be a delay between (2) and (3).

In this case, DNF should additionally persist the list of checksums to be removed after the successful transaction. Then it will remove the hardlinks, trigger the cleanup and clear the list.
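
One way the persisted list could look, as a rough sketch only. The JSON file, the layout in which each hardlink in `pkg_dir` is named after its checksum, the `cleanup` callable and all function names are assumptions for illustration, not DNF's or CAShe's real behaviour.

```python
import json
import os


def load_pending(list_path):
    """Return the persisted list of checksums (empty if nothing is stored)."""
    if not os.path.exists(list_path):
        return []
    with open(list_path) as handle:
        return json.load(handle)


def remember(list_path, checksums):
    """Add `checksums` to the persisted list, replacing the file atomically."""
    pending = set(load_pending(list_path)) | set(checksums)
    tmp_path = list_path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(sorted(pending), handle)
    os.replace(tmp_path, list_path)


def finish_transaction(list_path, pkg_dir, cleanup):
    """After a successful transaction: remove the hardlinks, trigger the
    cleanup for exactly those checksums, and clear the list."""
    removed = []
    for checksum in load_pending(list_path):
        link = os.path.join(pkg_dir, checksum)
        if os.path.lexists(link):
            os.remove(link)
        removed.append(checksum)
    cleanup(removed)  # hypothetical hook into the shared store's cleanup
    if os.path.exists(list_path):
        os.remove(list_path)
    return removed
```

Because the list lives on disk, it survives the delay between downloading the packages and performing the transaction later.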


Upgrade other devices, virtual machines and containers from a single cache
--------------------------------------------------------------------------

People want to download the data once and reuse them across the whole LAN.
1) "Install/upgrade etc." on all representative systems
2) but don't remove anything, or remove just those packages that are not needed any more (based on access times or using a depsolver)

Here I think that it shouldn't be necessary to hardlink the packages into DNF's cache, since the instance which downloaded them probably does not need them any more. It's the other devices which *may* need them. On the other hand, DNF needs to know which packages may be cleaned once they become unneeded (according to the depsolver and the link count).

Maybe this should be some other tool with plugins for all package managers.
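
The "no longer needed" test mentioned in this section (link count, access time, depsolver) could be approximated like this. It is a sketch under assumptions: the store is a hypothetical directory with one file per checksum, `still_needed` stands in for whatever a depsolver would report, and the function name is made up. Access times are only meaningful if the filesystem actually records them.

```python
import os
import time


def removable(store_dir, max_age_seconds, still_needed=frozenset(), now=None):
    """List store objects that look safe to clean.

    An object is kept if anything else hardlinks it (link count > 1),
    if the (hypothetical) depsolver result `still_needed` names it,
    or if it was accessed within the last `max_age_seconds`.
    """
    now = time.time() if now is None else now
    result = []
    for name in sorted(os.listdir(store_dir)):
        info = os.stat(os.path.join(store_dir, name))
        if info.st_nlink > 1:
            continue  # some cache still links to it
        if name in still_needed:
            continue  # another device may still need it
        if now - info.st_atime < max_age_seconds:
            continue  # used recently
        result.append(name)
    return result
```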


Undo/downgrade
--------------

Fedora removes the old packages from the repositories but people sometimes need to undo a transaction or downgrade a package.
1) "Install/upgrade etc."
2) but remove only those packages that were persisted on the list but were *not* installed during the last successful transaction

This is the same as the "Install/upgrade etc." case, except that DNF's logic includes an additional condition. This, too, may be a task for another tool.
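
The additional condition amounts to a filter over the persisted list. A tiny sketch, with a hypothetical function name and with checksums represented as plain strings:

```python
def checksums_to_remove(pending, installed_last_transaction):
    """Keep what the last successful transaction installed (so that it can be
    undone or downgraded to later); everything else on the list may go."""
    installed = set(installed_last_transaction)
    return [checksum for checksum in pending if checksum not in installed]
```

For example, with `pending = ["a", "b", "c"]` and `"b"` installed by the last successful transaction, only `"a"` and `"c"` would be removed.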



And the problem of sharing the cache between multiple package managers and multiple users is solved out of the box, IIUIC.



What do you think? Are these use cases reasonable from CAShe's POV? What do you think about the brief implementation proposals?



And some ideas:
- I think that DNF should, by default, trigger the cleanup only for those packages whose hardlinks it removed (since the operation may not be cheap).
- I think that there should be some other way to mark that a file should stay in CAShe. E.g. in the "makecache" case, a user may clean DNF's cache for some reason, and then something may trigger the automatic cleanup. It will remove the metadata even though, if the data are fresh, they are still useful for DNF and for the other package managers that have not been executed for some time. The same goes for the "multiple devices" and "undo/downgrade" cases: packages can be installed using multiple package managers, and all of them should be able to contribute to these repositories for the other devices or to allow future downgrades.
  If the metadata were marked as "latest metadata for given repository" and the packages as "packages for random/remote usage", all the package managers could collaborate better to achieve the goal. But as I said, this can be solved by another tool with plugins for all the package managers.
  Also, IIUIC, people sometimes just make repositories from the cached packages (so they treat such a cache as a repository, not as a shared cache) and let the other devices upgrade just from these repositories (while disabling the Fedora repos so that the devices use only tested packages); in that case, the packages can be hardlinked there to allow all the package managers to collaborate there. The same can be applied to the repository metadata: there can be a shared directory with the latest metadata for each repository (and $RELEASEVER etc.).
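
The shared-directory idea could be sketched as below: a hardlink from a shared "latest" directory keeps the link count of the current metadata above one, so a link-count-based cleanup would leave it alone. Everything here is an assumption for illustration (the directory layout, the `publish_latest` name, the toy store object path), and the shared directory is assumed to be on the same filesystem as the store.

```python
import os


def publish_latest(shared_dir, repo_key, name, store_object):
    """Point shared_dir/repo_key/name at `store_object` via a hardlink.

    Returns False if the link already refers to that object, True if it
    was created or replaced.
    """
    repo_dir = os.path.join(shared_dir, repo_key)
    os.makedirs(repo_dir, exist_ok=True)
    link = os.path.join(repo_dir, name)
    if os.path.exists(link) and os.path.samefile(link, store_object):
        return False  # already the latest; nothing to do
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Link under a temporary name first, then rename over the old link so
    # readers never see the name missing.
    os.link(store_object, tmp_link)
    os.replace(tmp_link, link)
    return True
```

Replacing the link drops the reference to the previous metadata, which then becomes eligible for cleanup unless some other cache still links it.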

Best regards
-- 
Radek Holý
Associate Software Engineer
Software Management Team
Red Hat Czech

