[Rpm-ecosystem] DNF's use cases of CAShe

Radek Holy rholy at redhat.com
Fri Sep 11 15:33:52 UTC 2015



----- Original Message -----
> From: "Radek Holy" <rholy at redhat.com>
> To: rpm-ecosystem at lists.rpm.org
> Sent: Friday, July 3, 2015 11:47:06 AM
> Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> 
> 
> 
> ----- Original Message -----
> > From: "James Antill" <james at fedoraproject.org>
> > To: "rpm-ecosystem" <rpm-ecosystem at lists.rpm.org>
> > Sent: Friday, July 3, 2015 5:27:47 AM
> > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > 
> > On Thu, 2015-07-02 at 01:20 -0400, Radek Holy wrote:
> > > > On Wed, 2015-07-01 at 09:10 -0400, Radek Holy wrote:
> > > > >  The point of the CAShe is that when you are about to download something
> > > > > with a checksum of XYZ you do:
> > > > > 
> > > > > 1. Does an object with checksum XYZ exist in the CAShe?
> > > > 
> > > > 2a. If yes, then get it into my program cache.
> > > > 
> > > > 2b. If no, then I download it. When done I can put it into the CAShe.
> > > > 
> > > > > ...where hopefully 2a and 2b can use hardlinks, to save disk space and
> > > > > do automatic tracking of in-use objects.
> > > >  I wouldn't trigger the CAShe cleanup here, esp. as you didn't do
> > > > anything that would need cleaning up.
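The lookup flow quoted above (steps 1, 2a and 2b) might be sketched roughly as follows. This is only an illustration under assumptions, not CAShe's actual API: the flat one-file-per-checksum store layout, the use of SHA-256, and the names `object_path`, `link_or_copy` and `fetch` are all hypothetical.

```python
import hashlib
import os
import shutil


def object_path(store_dir, checksum):
    # Hypothetical layout: one file per object, named by its checksum.
    return os.path.join(store_dir, checksum)


def link_or_copy(src, dst):
    # Prefer a hardlink (no extra disk space, link count tracks usage);
    # fall back to a plain copy, e.g. across filesystems.
    try:
        os.link(src, dst)
    except OSError:
        shutil.copy2(src, dst)


def fetch(store_dir, cache_dir, checksum, filename, download):
    """Return (path, downloaded) for an object in the program's own cache."""
    target = os.path.join(cache_dir, filename)
    stored = object_path(store_dir, checksum)
    if os.path.exists(stored):           # step 1: is it in the shared store?
        link_or_copy(stored, target)     # step 2a: get it into my cache
        return target, False
    data = download()                    # step 2b: download it myself
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("checksum mismatch")
    with open(target, "wb") as f:
        f.write(data)
    link_or_copy(target, stored)         # ...and put it into the shared store
    return target, True
```

Here `download` is any callable returning the object's bytes; a second program asking for the same checksum would get a hardlink instead of downloading again.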
> > > 
> > > It removes the hardlinks to the obsoleted metadata if there are any.
> > > Even after that you wouldn't trigger the cleanup? (I don't insist on
> > > > it, I just expected that this is what users would prefer.)
> > 
> >  I'm not sure if it's a good idea to do a cleanup when metadata is
> > deleted. I didn't do it on the yum side because it's very likely that in
> > Fedora you'll be installing an update at some point in the near future,
> > > and we can clean up a few extra metadata files then. It might be a good
> > idea to do it more often though, if only to stop weird edge cases.
> > 
> > [...]
> > > I'd just remove the hardlinks from the DNF's cache and trigger the
> > > cleanup. Is this wrong?
> > 
> >  Ok, yeh that's good, I thought you meant manually removing data from
> > the CAShe after you deleted it in DNF.
> > 
> > > > > Upgrade other devices, virtual machines and containers from a single
> > > > > cache
> > > > > --------------------------------------------------------------------------
> > > > > 
> > > > > > People want to download the data once and reuse it across the whole
> > > > > > LAN.
> > > > > 1) "Install/upgrade etc." on all representative systems
> > > > > 2) but don't remove anything, or remove just those packages that are
> > > > > not needed any more (based on access times or using a depsolver)
> > > > >
> > > > > > Here I think there should be no need to hardlink the packages into
> > > > > > DNF's cache, since the instance which downloaded them probably does
> > > > > > not need them any more.
> > > > 
> > > >  I'm not sure what you mean here. For each system just download them as
> > > > you would, and delete them from DNF's cache as you would.
> > > >  Then for each machine either have the CAShe mounted over NFS, or use
> > > > the cashe rsync-to/rsync-from commands.
> > > 
> > > > So, how would you set up a package manager and CAShe (and potentially
> > > > any other software which uses CAShe) to make sure that every single
> > > > package is downloaded only once in a potentially heterogeneous
> > > > network?
> > 
> >  As I said above, you just set it up as normal and use a big NFS store
> > or use the rsync mirroring. The objects are accessed by checksum, so as
> > long as that doesn't change all the programs/hosts can access the same
> > data.
> 
> Oh, I forgot that in this case, DNF will be configured to use a different
> CAShe instance than the other non-packaging software.
> 
> > > > > Undo/downgrade
> > > > > --------------
> > > > > 
> > > > > Fedora removes the old packages from the repositories but people
> > > > > sometimes need to undo a transaction or downgrade a package.
> > > > > 1) "Install/upgrade etc."
> > > > > 2) but remove only those packages that were persisted on the list but
> > > > > were *not* installed during the last successful transaction
> > > > > 
> > > > > > This is the same as the "Install/upgrade etc." case, except that DNF's
> > > > > > logic includes an additional condition. Also this may be a task for
> > > > > > another tool.
> > > > 
> > > >  I'm not sure if you are trying to implement some kind of hidden/shadow
> > > > repos. with the CAShe data here, or something?
> > > >  If you want to be able to download/downgrade upgraded Fedora packages
> > > > then you also want to implement something similar to the "yum-local"
> > > > plugin. I wouldn't recommend using CAShe as a backend for this though.
> > > 
> > > Yes, that's it. I wanted to make the "local" plugin use CAShe.
> > 
> > >  Having the plugin use it as well is ok (but I'm not sure it provides
> > > much benefit, as the local repo. is local anyway).
> > >  The plugin can't use only the CAShe though, as it needs its own primary
> > > storage to control the lifetimes of the packages in the local repo. (Eg.
> > > last N versions of each package).
> 
> > Yes, sure, it needs to keep the hardlinks in its internal cache and
> repeatedly check the number of versions etc. The benefit of having it backed
> by CAShe is that even other package managers will be able to perform the
> downgrades. What is missing is that it wouldn't take into account the
> packages installed by other package managers and that's why I wonder about
> another specialized tool. But even without yet another tool, the user
> experience will improve.
> 
> > > >  Again, just treat it as it works in DNF now. If the package is
> > > > available from a repo. with a checksum, then you don't need to download
> > > > it if you can look it up in the CAShe.
> > > > 
> > > 
> > > > In this case, you need to have every single package which has ever
> > > > been installed in CAShe. How would you achieve that if not the way I
> > > > proposed?
> > 
> >  All the packages go into the CAShe, if the user configures the storage
> > to be big enough then they'll stay there ... if not they get removed.
> 
> So you wouldn't make the hardlinks? Then there is again the problem with the
> > priority. Other software may have stored other, less important content in
> > CAShe (in this case, the old packages are very important) which is not
> > linked anymore, but because it was stored later, CAShe will clean the old
> > packages. Or do you assume that the packages are stored in another CAShe
> instance?
> 
> > > >  No, don't explicitly remove anything.
> > > >  You can decide not to call the cleanup operation unless you have
> > > > removed packages from DNF's cache (presumably due to a transaction), to
> > > > not do the "expensive" operation.
> > > 
> > > > If I do unlink something, then I believe that I should be able to ask
> > > > CAShe to check whether the given content is needed somewhere else and
> > > > if not, clean it. But since there can be many unneeded items in CAShe,
> > > > I don't want to force the user to wait for the general cleanup after
> > > > every successful "dnf upgrade".
> > 
> > >  The question mostly isn't "is this needed anymore"; the question is "if
> > > we need to delete something, which of the items we have are the least
> > > likely to be needed", and to answer that we need to look at everything
> > > and at what the user-configured limits/policy are.
> 
> Right, good point. So, we can start with this and check how it works in
> practice.
> 
> > [...]
> > >  But the sysadmin should know then that they shouldn't set the CAShe's
> > > time limit below the longest expiration period of all the enabled
> > > repositories if they don't want to re-download the metadata again (in
> > > case they are out of disk space and run the CAShe cleanup often).
> > 
> > >  One thing here is that CAShe doesn't have a time limit in a way that
> > > would do that; data isn't deleted _just because_ it's N days old.
> 
> Sure. We'll see how it works in practice; i.e. what and how much content is
> being stored in CAShe and how often it is accessed on regular machines.
> 
> > 
> > >  I mean, there might be less important data in CAShe than the
> > > repository metadata (even if those data were accessed later) which
> > > should be removed first if the limits are exceeded and CAShe currently
> > > cannot recognize the priority of the content.
> > 
> > >  I mean ... this is a problem with all caches that aren't clairvoyant,
> > > and any priorities will be different for different use cases, so I didn't
> > > try that atm. (in theory you could hack it using utime, but again ...).
> >  I'm assuming that LRU is going to be better than MRU, and if you want
> > to keep a lot of stuff you can always configure the storage size to be
> > bigger etc. (disk is cheap in this case).
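To make the cleanup policy discussed above concrete, here is a minimal sketch of a size-limited, least-recently-used cleanup that skips objects still hardlinked from some program's cache. It is not CAShe's real implementation: the flat store directory, the `cleanup` name and the single `max_bytes` limit are assumptions made for illustration only.

```python
import os


def cleanup(store_dir, max_bytes):
    """Remove unlinked objects, oldest access time first, until the
    store's total size is at most max_bytes. Returns the removed paths."""
    entries = []
    total = 0
    for name in os.listdir(store_dir):
        path = os.path.join(store_dir, name)
        st = os.stat(path)
        total += st.st_size
        entries.append((st.st_atime, st.st_nlink, st.st_size, path))
    removed = []
    for atime, nlink, size, path in sorted(entries):
        if total <= max_bytes:
            break              # under the configured limit: nothing more to do
        if nlink > 1:
            continue           # still hardlinked from a program cache: in use
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Nothing is removed merely because it is old; age only decides the order once the size limit is exceeded, and a larger `max_bytes` keeps more content around.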
> 
> Yes, it depends on the other software which is going to be integrated with
> CAShe. I'm definitely not asking for privileged treatment of packaging-like
> data in CAShe. I just wondered about a possibility to attach a priority
> (either subjective or objective) to given content, but let's not complicate
> it now.
> --
> Radek Holý
> Associate Software Engineer
> Software Management Team
> Red Hat Czech

James,

I was told that you and Tomas are going to integrate librepo (and hence DNF) with CAShe. Is that true?

Thank you in advance
-- 
Radek Holý
Associate Software Engineer
Software Management Team
Red Hat Czech