[Rpm-ecosystem] DNF's use cases of CAShe

Radek Holy rholy at redhat.com
Tue Sep 29 22:55:04 UTC 2015



----- Original Message -----
> From: "Radek Holy" <rholy at redhat.com>
> To: "James Antill" <james at fedoraproject.org>
> Cc: rpm-ecosystem at lists.rpm.org
> Sent: Friday, September 11, 2015 5:33:52 PM
> Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> 
> 
> 
> ----- Original Message -----
> > From: "Radek Holy" <rholy at redhat.com>
> > To: rpm-ecosystem at lists.rpm.org
> > Sent: Friday, July 3, 2015 11:47:06 AM
> > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "James Antill" <james at fedoraproject.org>
> > > To: "rpm-ecosystem" <rpm-ecosystem at lists.rpm.org>
> > > Sent: Friday, July 3, 2015 5:27:47 AM
> > > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > > 
> > > On Thu, 2015-07-02 at 01:20 -0400, Radek Holy wrote:
> > > > > On Wed, 2015-07-01 at 09:10 -0400, Radek Holy wrote:
> > > > > >  The point of the CAShe is that when you are about to download
> > > > > > something with a checksum of XYZ you do:
> > > > > > 
> > > > > > 1. Does an object with checksum XYZ exist in the CAShe?
> > > > > 
> > > > > 2a. If yes, then get it into my program cache.
> > > > > 
> > > > > 2b. If no, then I download it. When done I can put it into the CAShe.
> > > > > 
> > > > > > ...where hopefully 2a and 2b can use hardlinks, to save disk space
> > > > > > and do automatic tracking of in-use objects.
> > > > >  I wouldn't trigger the CAShe cleanup here, esp. as you didn't do
> > > > > anything that would need cleaning up.
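
A minimal sketch of the lookup flow quoted above, in Python. The directory layout (<root>/<first two hex digits>/<checksum>), the function names, and the choice of SHA-256 are assumptions made for illustration; none of them is taken from CAShe itself.

```python
import hashlib
import os
import shutil


def cashe_path(cashe_root, checksum):
    # Hypothetical layout: <root>/<first two hex digits>/<full checksum>
    return os.path.join(cashe_root, checksum[:2], checksum)


def sha256_of(path):
    # Hash the file in chunks so large packages are not read into memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_or_copy(src, dst):
    # Prefer a hardlink (no extra disk space, and the link count shows the
    # object is in use); fall back to a copy, e.g. across filesystems.
    try:
        os.link(src, dst)
    except OSError:
        shutil.copy2(src, dst)


def fetch(cashe_root, checksum, dest, download):
    """Place the object with the given checksum at dest.

    download(path) is a caller-supplied function that writes the object
    to path; it is called only on a cache miss.
    Returns True on a cache hit, False if a download was needed.
    """
    obj = cashe_path(cashe_root, checksum)
    if os.path.exists(obj):          # step 1: is it in the shared cache?
        link_or_copy(obj, dest)      # step 2a: get it into my program cache
        return True
    download(dest)                   # step 2b: download it ...
    if sha256_of(dest) != checksum:
        os.unlink(dest)
        raise ValueError("checksum mismatch for %s" % dest)
    os.makedirs(os.path.dirname(obj), exist_ok=True)
    link_or_copy(dest, obj)          # ... and put it into the shared cache
    return False
```

Note that nothing here triggers a cleanup: a lookup or a store does not remove any hardlink, so there is nothing new to clean up.
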
> > > > 
> > > > It removes the hardlinks to the obsoleted metadata if there are any.
> > > > > Even after that you wouldn't trigger the cleanup? (I don't insist on
> > > > > it, I just expected that this is what users would prefer.)
> > > 
> > >  I'm not sure if it's a good idea to do a cleanup when metadata is
> > > deleted. I didn't do it on the yum side because it's very likely that in
> > > Fedora you'll be installing an update at some point in the near future,
> > > > and we can clean up a few extra metadata files then. It might be a good
> > > idea to do it more often though, if only to stop weird edge cases.
> > > 
> > > [...]
> > > > I'd just remove the hardlinks from the DNF's cache and trigger the
> > > > cleanup. Is this wrong?
> > > 
> > >  Ok, yeh that's good, I thought you meant manually removing data from
> > > the CAShe after you deleted it in DNF.
> > > 
> > > > > > > Upgrade other devices, virtual machines and containers from a single cache
> > > > > > > --------------------------------------------------------------------------
> > > > > > 
> > > > > > > People want to download the data once and reuse it across the
> > > > > > > whole LAN.
> > > > > > > 1) "Install/upgrade etc." on all representative systems
> > > > > > > 2) but don't remove anything, or remove just those packages that
> > > > > > > are not needed any more (based on access times or using a
> > > > > > > depsolver)
> > > > > > >
> > > > > > > Here I think that it shouldn't be necessary to hardlink the
> > > > > > > packages into DNF's cache, since the instance which downloaded
> > > > > > > them probably does not need them any more.
> > > > > 
> > > > > >  I'm not sure what you mean here. For each system just download them
> > > > > > as you would, and delete them from DNF's cache as you would.
> > > > >  Then for each machine either have the CAShe mounted over NFS, or use
> > > > > the cashe rsync-to/rsync-from commands.
> > > > 
> > > > > So, how would you set up a package manager and CAShe (and potentially
> > > > > any other software that uses CAShe) to make sure that every single
> > > > > package is downloaded only once in a potentially heterogeneous
> > > > > network?
> > > 
> > >  As I said above, you just set it up as normal and use a big NFS store
> > > or use the rsync mirroring. The objects are accessed by checksum, so as
> > > long as that doesn't change all the programs/hosts can access the same
> > > data.
> > 
> > Oh, I forgot that in this case, DNF will be configured to use a different
> > CAShe instance than the other non-packaging software.
> > 
> > > > > > Undo/downgrade
> > > > > > --------------
> > > > > > 
> > > > > > Fedora removes the old packages from the repositories but people
> > > > > > sometimes need to undo a transaction or downgrade a package.
> > > > > > 1) "Install/upgrade etc."
> > > > > > > 2) but remove only those packages that were persisted on the list
> > > > > > > but were *not* installed during the last successful transaction
> > > > > > 
> > > > > > > This is the same as the "Install/upgrade etc." case, except that
> > > > > > > DNF's logic includes an additional condition. Also, this may be a
> > > > > > > task for another tool.
> > > > > 
> > > > > >  I'm not sure if you are trying to implement some kind of
> > > > > > hidden/shadow repos. with the CAShe data here, or something?
> > > > > >  If you want to be able to download/downgrade upgraded Fedora
> > > > > > packages then you also want to implement something similar to the
> > > > > > "yum-local" plugin. I wouldn't recommend using CAShe as a backend
> > > > > > for this though.
> > > > 
> > > > Yes, that's it. I wanted to make the "local" plugin use CAShe.
> > > 
> > > >  Having the plugin use it as well is ok (but I'm not sure it provides
> > > > much benefit, as the local repo. is local anyway).
> > > >  The plugin can't use only the CAShe though, as it needs its own primary
> > > > storage to control the lifetimes of the packages in the local repo. (Eg.
> > > > last N versions of each package).
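
A rough sketch of what "last N versions of each package" could look like for such primary storage, in Python. The function names are hypothetical, the package name is derived from the usual name-version-release.arch.rpm file name, and "last" is approximated by file modification time rather than by a real RPM version comparison; all of that is an assumption for illustration only.

```python
import os
from collections import defaultdict


def package_name(filename):
    # "foo-bar-1.0-1.noarch.rpm" -> "foo-bar" (name-version-release.arch.rpm)
    return filename.rsplit("-", 2)[0]


def prune_local_repo(repo_dir, keep=2):
    """Keep only the `keep` most recently modified files per package name.

    Assumption: newer mtime means newer version. A real plugin would
    compare epoch/version/release instead.
    Returns the list of paths that were unlinked.
    """
    by_name = defaultdict(list)
    for entry in os.scandir(repo_dir):
        if entry.is_file() and entry.name.endswith(".rpm"):
            by_name[package_name(entry.name)].append(
                (entry.stat().st_mtime, entry.path))
    removed = []
    for versions in by_name.values():
        versions.sort(reverse=True)          # newest first
        for _mtime, path in versions[keep:]:
            os.unlink(path)                  # drops only this repo's link
            removed.append(path)
    return removed
```

If the files in the local repo are hardlinks to objects in the CAShe, unlinking them here only drops the plugin's reference to them.
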
> > 
> > Yes, sure, it needs to keep the hardlinks in its internal cache and
> > repeatedly check the number of versions etc. The benefit of having it
> > backed by CAShe is that even other package managers will be able to
> > perform the downgrades. What is missing is that it wouldn't take into
> > account the packages installed by other package managers, and that's why
> > I wonder about another specialized tool. But even without yet another
> > tool, the user experience will improve.
> > 
> > > > > >  Again, just treat it as it works in DNF now. If the package is
> > > > > > available from a repo. with a checksum, then you don't need to
> > > > > > download it if you can look it up in the CAShe.
> > > > > 
> > > > 
> > > > > In this case, you need to have every single package that has ever
> > > > > been installed in CAShe. How would you achieve that, if not the way I
> > > > > proposed?
> > > 
> > >  All the packages go into the CAShe, if the user configures the storage
> > > to be big enough then they'll stay there ... if not they get removed.
> > 
> > So you wouldn't make the hardlinks? Then there is again the problem with
> > the priority. Other software may have stored other, less important
> > content in CAShe (in this case, the old packages are very important)
> > which is not linked anymore, but because it was stored later, CAShe will
> > clean the old packages. Or do you assume that the packages are stored in
> > another CAShe instance?
> > 
> > > > >  No, don't explicitly remove anything.
> > > > > >  You can decide not to call the cleanup operation unless you have
> > > > > > removed packages from DNF's cache (presumably due to a transaction),
> > > > > > to avoid doing the "expensive" operation.
> > > > 
> > > > > If I do unlink something, then I believe that I should be able to ask
> > > > > CAShe to check whether the given content is needed somewhere else and
> > > > > if not, clean it. But since there can be many unneeded items in CAShe,
> > > > > I don't want to force the user to wait for the general cleanup after
> > > > > every successful "dnf upgrade".
> > > 
> > > >  The question mostly isn't "is this needed anymore"; the question is "if
> > > > we need to delete something, which of the items we have are the least
> > > > likely to be needed", and to answer that we need to look at everything
> > > > and what the user-configured limits/policy are.
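
A hedged sketch of such a general cleanup, in Python. It assumes three things that the thread only hints at: that "in use" means a hardlink count above one, that "least likely to be needed" is approximated by the oldest access time (LRU), and that the only configured limit is a total size in bytes. The function name and parameters are hypothetical.

```python
import os


def cleanup(cashe_root, max_bytes):
    """Delete least-recently-used, unlinked objects until under max_bytes.

    The whole store is scanned first, because deciding what to delete
    requires looking at everything, not at a single object.
    Returns the list of removed paths, oldest first.
    """
    objects = []
    total = 0
    for dirpath, _dirs, files in os.walk(cashe_root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            total += st.st_size
            objects.append((st.st_atime, st.st_nlink, st.st_size, path))

    removed = []
    for _atime, nlink, size, path in sorted(objects):   # oldest atime first
        if total <= max_bytes:
            break                   # within the configured limit: stop
        if nlink > 1:
            continue                # still hardlinked from a program cache
        os.unlink(path)
        total -= size
        removed.append(path)
    return removed
```

Because the scan touches every object, this is the "expensive" operation; access times are also only meaningful on filesystems that are not mounted with noatime.
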
> > 
> > Right, good point. So, we can start with this and check how it works in
> > practice.
> > 
> > > [...]
> > > >  But the sysadmin should know then that they shouldn't set the CAShe's
> > > > time limit below the longest expiration period of all the enabled
> > > > repositories if they don't want to re-download the metadata again (in
> > > > case they are out of disk space and run the CAShe cleanup often).
> > > 
> > > >  One thing here is that CAShe doesn't have a time limit in a way that
> > > > would do that; data isn't deleted _just because_ it's N days old.
> > 
> > Sure. We'll see how it works in practice; i.e. what and how much content is
> > being stored in CAShe and how often it is accessed on regular machines.
> > 
> > > 
> > > > >  I mean, there might be less important data in CAShe than the
> > > > > repository metadata (even if those data were accessed later) which
> > > > > should be removed first if the limits are exceeded, but CAShe
> > > > > currently cannot recognize the priority of the content.
> > > 
> > >  I mean ... this is a problem with all caches that aren't clairvoyant,
> > > > and any priorities will be different for different use cases so I didn't
> > > try that atm. (in theory you could hack it using utime, but again ...).
> > >  I'm assuming that LRU is going to be better than MRU, and if you want
> > > to keep a lot of stuff you can always configure the storage size to be
> > > bigger etc. (disk is cheap in this case).
> > 
> > Yes, it depends on the other software which is going to be integrated with
> > CAShe. I'm definitely not asking for privileged treatment of packaging-like
> > data in CAShe. I just wondered about the possibility of attaching a
> > priority (either subjective or objective) to a given piece of content, but
> > let's not complicate it now.
> > --
> > Radek Holý
> > Associate Software Engineer
> > Software Management Team
> > Red Hat Czech
> 
> James,
> 
> I was told that you and Tomas are going to integrate librepo (and hence DNF)
> with CAShe. Is that true?
> 
> Thank you in advance

Ping?
-- 
Radek Holý
Associate Software Engineer
Software Management Team
Red Hat Czech

