[Rpm-ecosystem] DNF's use cases of CAShe
Radek Holy
rholy at redhat.com
Fri Jul 3 09:47:06 UTC 2015
----- Original Message -----
> From: "James Antill" <james at fedoraproject.org>
> To: "rpm-ecosystem" <rpm-ecosystem at lists.rpm.org>
> Sent: Friday, July 3, 2015 5:27:47 AM
> Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
>
> On Thu, 2015-07-02 at 01:20 -0400, Radek Holy wrote:
> > > On Wed, 2015-07-01 at 09:10 -0400, Radek Holy wrote:
> > > The point of the CAShe is that when you are about to download something
> > > with a checksum of XYZ you do:
> > >
> > > 1. does object with checksum XYZ exist in the CAShe.
> > >
> > > 2a. If yes, then get it into my program cache.
> > >
> > > 2b. If no, then I download it. When done I can put it into the CAShe.
> > >
> > > ...where hopefully 2a and 2b can use hardlinks, to save diskspace and do
> > > automatic tracking of inuse objects.
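The lookup-then-link flow of steps 1, 2a and 2b can be sketched in Python roughly as follows. This is a minimal sketch, not CAShe's actual API: the directory layout `<root>/sha256/<first two hex digits>/<checksum>`, the helper names, and the caller-supplied `download` callback are all hypothetical. It uses hardlinks where possible and falls back to a plain copy only when the program cache and the CAShe sit on different filesystems.

```python
import errno
import hashlib
import os
import shutil


def object_path(cashe_root, checksum):
    # Hypothetical layout: <root>/sha256/<first two hex digits>/<checksum>.
    return os.path.join(cashe_root, "sha256", checksum[:2], checksum)


def link_or_copy(src, dst):
    # Prefer a hardlink (no extra disk space, and st_nlink records the extra user);
    # fall back to copying only when src and dst are on different filesystems.
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    try:
        os.link(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)


def fetch(cashe_root, checksum, dest, download):
    cached = object_path(cashe_root, checksum)
    if os.path.exists(cached):            # 1. does object XYZ exist in the CAShe?
        link_or_copy(cached, dest)        # 2a. yes: get it into the program cache
        return "hit"
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    download(dest)                        # 2b. no: download it (hypothetical callback)
    with open(dest, "rb") as f:
        if hashlib.sha256(f.read()).hexdigest() != checksum:
            os.unlink(dest)
            raise ValueError("checksum mismatch for " + dest)
    try:
        link_or_copy(dest, cached)        # ...and when done, put it into the CAShe
    except FileExistsError:
        pass                              # another program stored the same object meanwhile
    return "miss"
```

Because the object is looked up purely by checksum, any program (or host sharing the store) that knows the checksum ends up with a link to the same inode.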
> > > I wouldn't trigger the CAShe cleanup here, esp. as you didn't do
> > > anything that would need cleaning up.
> >
> > It removes the hardlinks to the obsoleted metadata if there are any.
> > Even after that you wouldn't trigger the cleanup? (I don't insist on
> > it, I just expected that this is what users would prefer.)
>
> I'm not sure if it's a good idea to do a cleanup when metadata is
> deleted. I didn't do it on the yum side because it's very likely that in
> Fedora you'll be installing an update at some point in the near future,
> and we can cleanup a few extra metadata files then. It might be a good
> idea to do it more often though, if only to stop weird edge cases.
>
> [...]
> > I'd just remove the hardlinks from the DNF's cache and trigger the
> > cleanup. Is this wrong?
>
> Ok, yeh that's good, I thought you meant manually removing data from
> the CAShe after you deleted it in DNF.
>
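With hardlinks, "remove the hardlinks from DNF's cache and trigger the cleanup" needs no extra bookkeeping: the link count of the CAShe object tells whether any program cache still refers to it. A minimal sketch, assuming (hypothetically) that every user links rather than copies its objects; the function names are illustrative and not part of CAShe or DNF.

```python
import os


def in_use(cashe_object):
    # The CAShe's own name is one link; any count above that means some
    # program cache still holds a hardlink to the same inode.
    return os.stat(cashe_object).st_nlink > 1


def drop_from_program_cache(program_copy, cashe_object):
    # Remove only the program's hardlink. The CAShe object itself is left
    # for the next cleanup run; we just report whether it became unreferenced.
    os.unlink(program_copy)
    return not in_use(cashe_object)
```

A cleanup run can then treat every object for which `in_use()` is false as a candidate, subject to the configured limits.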
> > > > Upgrade other devices, virtual machines and containers from a single
> > > > cache
> > > > --------------------------------------------------------------------------
> > > >
> > > > People want to download the data once and reuse it across the whole
> > > > LAN.
> > > > 1) "Install/upgrade etc." on all representative systems
> > > > 2) but don't remove anything, or remove just those packages that are
> > > > not needed any more (based on access times or using a depsolver)
> > > >
> > > > Here I think that it shouldn't be necessary to hardlink the packages into
> > > > DNF's cache since the instance which downloaded them probably does not
> > > > need them any more.
> > >
> > > I'm not sure what you mean here. For each system just download them as
> > > you would, and delete them from DNF's cache as you would.
> > > Then for each machine either have the CAShe mounted over NFS, or use
> > > the cashe rsync-to/rsync-from commands.
> >
> > So, how would you set up a package manager and CAShe (and potentially
> > every other software which uses CAShe) to make sure that every single
> > package is downloaded only once in a potentially inhomogeneous
> > network?
>
> As I said above, you just set it up as normal and use a big NFS store
> or use the rsync mirroring. The objects are accessed by checksum, so as
> long as that doesn't change all the programs/hosts can access the same
> data.
Oh, I forgot that in this case, DNF will be configured to use a different CAShe instance than the other non-packaging software.
> > > > Undo/downgrade
> > > > --------------
> > > >
> > > > Fedora removes the old packages from the repositories but people
> > > > sometimes need to undo a transaction or downgrade a package.
> > > > 1) "Install/upgrade etc."
> > > > 2) but remove only those packages that were persisted on the list but
> > > > were *not* installed during the last successful transaction
> > > >
> > > > This is the same as the "Install/upgrade etc." case, except that DNF's
> > > > logic includes an additional condition. Also, this may be a task for
> > > > another tool.
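The additional condition in the undo/downgrade case boils down to a set difference. A minimal sketch; the package identifiers and where the two sets come from (a persisted list and the last successful transaction) are assumptions for illustration, not DNF's actual data structures.

```python
from __future__ import annotations


def packages_to_remove(persisted: set[str], installed_in_last_txn: set[str]) -> set[str]:
    # The rule as stated: of the packages persisted on the list, remove only
    # those that were not installed during the last successful transaction.
    return persisted - installed_in_last_txn


persisted = {"foo-1.0-1.noarch", "foo-1.1-1.noarch", "bar-2.0-3.x86_64"}
last_txn = {"foo-1.1-1.noarch"}
print(sorted(packages_to_remove(persisted, last_txn)))
# -> ['bar-2.0-3.x86_64', 'foo-1.0-1.noarch']
```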
> > >
> > > I'm not sure if you are trying to implement some kind of hidden/shadow
> > > repos. with the CAShe data here, or something?
> > > If you want to be able to download/downgrade upgraded Fedora packages
> > > then you also want to implement something similar to the "yum-local"
> > > plugin. I wouldn't recommend using CAShe as a backend for this though.
> >
> > Yes, that's it. I wanted to make the "local" plugin use CAShe.
>
> Having the plugin use it as well is OK (but I'm not sure it provides
> much benefit, as the local repo. is local anyway).
> The plugin can't use only the CAShe though, as it needs its own primary
> storage to control the lifetimes of the packages in the local repo. (Eg.
> last N versions of each package).
Yes, sure, it needs to keep the hardlinks in its internal cache and repeatedly check the number of versions etc. The benefit of having it backed by CAShe is that even other package managers will be able to perform the downgrades. What is missing is that it wouldn't take into account the packages installed by other package managers, and that's why I wonder about another specialized tool. But even without yet another tool, the user experience will improve.
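The "last N versions of each package" policy that such a local-repo plugin would enforce over its own hardlinks can be sketched like this. The version keys are plain tuples for simplicity; a real implementation would have to compare epoch/version/release using RPM's own rules (rpmvercmp), which this sketch does not do. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def select_expired(packages, keep=3):
    """Return paths of packages beyond the newest `keep` versions per name.

    packages: iterable of (name, version_key, path), where version_key is any
    sortable value (simplified here; this is NOT RPM version comparison).
    """
    by_name = defaultdict(list)
    for name, version_key, path in packages:
        by_name[name].append((version_key, path))
    expired = []
    for versions in by_name.values():
        versions.sort(key=lambda item: item[0], reverse=True)  # newest first
        expired.extend(path for _key, path in versions[keep:])
    return expired
```

The plugin would then unlink only its own hardlinks for the returned paths; the CAShe copies stay available to other package managers until a cleanup evicts them.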
> > > Again, just treat it as it works in DNF now. If the package is
> > > available from a repo. with a checksum, then you don't need to download
> > > it if you can look it up in the CAShe.
> > >
> >
> > In this case, you need to have every single package which has ever
> > been installed in CAShe. How would you achieve that if not the way I
> > proposed?
>
> All the packages go into the CAShe, if the user configures the storage
> to be big enough then they'll stay there ... if not they get removed.
So you wouldn't make the hardlinks? Then there is again the problem with the priority. Other software may have stored other, less important content in CAShe (in this case, the old packages are very important) which is no longer linked, but because it was stored later, CAShe will clean up the old packages first. Or do you assume that the packages are stored in a separate CAShe instance?
> > > No, don't explicitly remove anything.
> > > You can decide not to call the cleanup operation unless you have
> > > removed packages from DNF's cache (presumably due to a transaction), to
> > > not do the "expensive" operation.
> >
> > If I do unlink something, then I believe that I should be able to ask
> > CAShe to check whether the given content is needed somewhere else and
> > if not, clean it. But since there can be many unneeded items in CAShe,
> > I don't want to force the user to wait for the general cleanup after every
> > successful "dnf upgrade".
>
> The question mostly isn't "is this needed anymore" the question is "if
> we need to delete something which of the items we have are the least
> likely to be needed", and to answer that we need to look at everything
> and what the user configured limits/policy are.
Right, good point. So, we can start with this and check how it works in practice.
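The point that eviction has to look at everything against the configured limit can be illustrated with a least-recently-used pass over the whole store. This is a minimal sketch under stated assumptions, not CAShe's actual cleanup code: the limit is a single byte count, recency comes from st_atime (which relatime/noatime mounts may update lazily or not at all), and objects still hardlinked from a program cache are never evicted. The `cleanup` name is hypothetical.

```python
from __future__ import annotations

import os


def cleanup(cashe_root: str, max_bytes: int) -> list[str]:
    candidates = []  # (atime, size, path) for objects no program cache links to
    total = 0
    for dirpath, _dirnames, filenames in os.walk(cashe_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            total += st.st_size
            if st.st_nlink == 1:          # not hardlinked from any program cache
                candidates.append((st.st_atime, st.st_size, path))
    removed = []
    for _atime, size, path in sorted(candidates):  # least recently used first
        if total <= max_bytes:
            break
        os.unlink(path)
        total -= size
        removed.append(path)
    return removed
```

Sorting by ascending atime is what makes this LRU; sorting the other way round would give the MRU variant.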
> [...]
> > But the sysadmin should know then that they shouldn't set the CAShe's
> > time limit below the longest expiration period of all the enabled
> > repositories if they don't want to re-download the metadata (in
> > case they are out of disk space and run the CAShe cleanup often).
>
> One thing here is that CAShe doesn't have a timelimit in a way that
> would do that, data isn't deleted _just because_ it's N days old.
Sure. We'll see how it works in practice; i.e. what and how much content is being stored in CAShe and how often it is accessed on regular machines.
>
> > I mean, there might be less important data in CAShe than the
> > repository metadata (even if those data were accessed later) which
> > should be removed first if the limits are exceeded and CAShe currently
> > cannot recognize the priority of the content.
>
> I mean ... this is a problem with all caches that aren't clairvoyant,
> and any priorities will be different for different use cases so I didn't
> try that atm. (in theory you could hack it using utime, but again ...).
> I'm assuming that LRU is going to be better than MRU, and if you want
> to keep a lot of stuff you can always configure the storage size to be
> bigger etc. (disk is cheap in this case).
Yes, it depends on the other software which is going to be integrated with CAShe. I'm definitely not asking for packaging-like data to be privileged in CAShe. I just wondered about the possibility of attaching a priority (either subjective or objective) to given content, but let's not complicate it now.
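The utime hack mentioned above would work against an atime-based LRU cleanup like this: bumping an object's access time makes it look recently used, so it is evicted later. A minimal sketch; `touch_access_time` is a hypothetical helper, and it only helps if the cleanup actually orders objects by st_atime.

```python
import os
import time
from typing import Optional


def touch_access_time(path: str, now: Optional[float] = None) -> None:
    # Move only the access time forward; keep the modification time intact,
    # so the object still looks unchanged but recently used.
    st = os.stat(path)
    os.utime(path, (time.time() if now is None else now, st.st_mtime))
```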
--
Radek Holý
Associate Software Engineer
Software Management Team
Red Hat Czech