[Rpm-ecosystem] DNF's use cases of CAShe

Jan Zelený jzeleny at redhat.com
Wed Sep 30 07:33:28 UTC 2015


On 29. 9. 2015 at 18:55:04, Radek Holy wrote:
> ----- Original Message -----
> 
> > From: "Radek Holy" <rholy at redhat.com>
> > To: "James Antill" <james at fedoraproject.org>
> > Cc: rpm-ecosystem at lists.rpm.org
> > Sent: Friday, September 11, 2015 5:33:52 PM
> > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > 
> > 
> > 
> > ----- Original Message -----
> > 
> > > From: "Radek Holy" <rholy at redhat.com>
> > > To: rpm-ecosystem at lists.rpm.org
> > > Sent: Friday, July 3, 2015 11:47:06 AM
> > > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > 
> > > > From: "James Antill" <james at fedoraproject.org>
> > > > To: "rpm-ecosystem" <rpm-ecosystem at lists.rpm.org>
> > > > Sent: Friday, July 3, 2015 5:27:47 AM
> > > > Subject: Re: [Rpm-ecosystem] DNF's use cases of CAShe
> > > > 
> > > > On Thu, 2015-07-02 at 01:20 -0400, Radek Holy wrote:
> > > > > > On Wed, 2015-07-01 at 09:10 -0400, Radek Holy wrote:
> > > > > >  The point of the CAShe is that when you are about to download
> > > > > >  something
> > > > > > 
> > > > > > with a checksum of XYZ you do:
> > > > > > 
> > > > > > 1. does object with checksum XYZ exist in the CAShe.
> > > > > > 
> > > > > > 2a. If yes, then get it into my program cache.
> > > > > > 
> > > > > > 2b. If no, then I download it. When done I can put it into the
> > > > > > CAShe.
> > > > > > 
> > > > > > ...where hopefully 2a and 2b can use hardlinks, to save diskspace
> > > > > > and
> > > > > > do
> > > > > > automatic tracking of inuse objects.
> > > > > > 
> > > > > >  I wouldn't trigger the CAShe cleanup here, esp. as you didn't do
> > > > > > 
> > > > > > anything that would need cleaning up.
> > > > > 
> > > > > It removes the hardlinks to the obsoleted metadata if there are any.
> > > > > Even after that you wouldn't trigger the cleanup? (I don't insist on
> > > > > it, I just expected that this is what would users prefer.)
> > > >  
> > > >  I'm not sure if it's a good idea to do a cleanup when metadata is
> > > > 
> > > > deleted. I didn't do it on the yum side because it's very likely that
> > > > in
> > > > Fedora you'll be installing an update at some point in the near
> > > > future,
> > > > and we can cleanup a few extra metadata files then. It might be a good
> > > > idea to do it more often though, if only to stop weird edge cases.
> > > > 
> > > > [...]
> > > > 
> > > > > I'd just remove the hardlinks from the DNF's cache and trigger the
> > > > > cleanup. Is this wrong?
> > > >  
> > > >  Ok, yeh that's good, I thought you meant manually removing data from
> > > > 
> > > > the CAShe after you deleted it in DNF.
> > > > 
> > > > > > > Upgrade other devices, virtual machines and containers from a
> > > > > > > single
> > > > > > > cache
> > > > > > > ----------------------------------------------------------------
> > > > > > > ----------
> > > > > > > 
> > > > > > > People want to download the data once and reuse them among the
> > > > > > > whole
> > > > > > > LAN.
> > > > > > > 1) "Install/upgrade etc." on all representative systems
> > > > > > > 2) but don't remove anything, or remove just those packages that
> > > > > > > are
> > > > > > > not needed any more (based on access times or using a depsolver)
> > > > > > > 
> > > > > > > Here I think that it shouldn't be needed to hardlink the
> > > > > > > packages
> > > > > > > into
> > > > > > > DNF's cache since the instance which downloaded them probably
> > > > > > > does
> > > > > > > not
> > > > > > > need them any more.
> > > > > >  
> > > > > >  I'm not sure what you mean here. For each system just download
> > > > > >  them
> > > > > >  as
> > > > > > 
> > > > > > you would, and delete them from DNF's cache as you would.
> > > > > > 
> > > > > >  Then for each machine either have the CAShe mounted over NFS, or
> > > > > >  use
> > > > > > 
> > > > > > the cashe rsync-to/rsync-from commands.
> > > > > 
> > > > > So, how would you set up a package manager and CAShe (and
> > > > > potentially
> > > > > every other software which uses CAShe) to make sure that every
> > > > > single
> > > > > package is downloaded only once in a potentially inhomogeneous
> > > > > network?
> > > >  
> > > >  As I said above, you just set it up as normal and use a big NFS store
> > > > 
> > > > or use the rsync mirroring. The objects are accessed by checksum, so
> > > > as
> > > > long as that doesn't change all the programs/hosts can access the same
> > > > data.
> > > 
> > > Oh, I forgot that in this case, DNF will be configured to use a
> > > different
> > > CAShe instance than the other non-packaging software.
> > > 
> > > > > > > Undo/downgrade
> > > > > > > --------------
> > > > > > > 
> > > > > > > Fedora removes the old packages from the repositories but people
> > > > > > > sometimes need to undo a transaction or downgrade a package.
> > > > > > > 1) "Install/upgrade etc."
> > > > > > > 2) but remove only those packages that were persisted on the
> > > > > > > list
> > > > > > > but
> > > > > > > were *not* installed during the last successful transaction
> > > > > > > 
> > > > > > > This is the same as the "Install/upgrade etc." case, just the
> > > > > > > DNF's
> > > > > > > logic includes an additional condition. Also this may be a task
> > > > > > > for
> > > > > > > another tool.
> > > > > >  
> > > > > >  I'm not sure if you are trying to implement some kind of
> > > > > >  hidden/shadow
> > > > > > 
> > > > > > repos. with the CAShe data here, or something?
> > > > > > 
> > > > > >  If you want to be able to download/downgrade upgraded Fedora
> > > > > >  packages
> > > > > > 
> > > > > > then you also want to implement something similar to the
> > > > > > "yum-local"
> > > > > > plugin. I wouldn't recommend using CAShe as a backend for this
> > > > > > though.
> > > > > 
> > > > > Yes, that's it. I wanted to make the "local" plugin use CAShe.
> > > >  
> > > >  Having the plugin use it as well, is ok (but I'm not sure it provides
> > > > 
> > > > much benefit, as the local repo. is local anyway).
> > > > 
> > > >  The plugin can't use only the CAShe though as it needs it's own
> > > >  primary
> > > > 
> > > > storage to control the lifetimes of the packages in the local repo.
> > > > (Eg.
> > > > last N versions of each package).
> > > 
> > > Yes, sure, it needs to keep the hardlinks in it's internal cache and
> > > repeatedly check the number of versions etc. The benefit of having it
> > > backed
> > > by CAShe is that even other package managers will be able to perform the
> > > downgrades. What is missing is that it wouldn't take into account the
> > > packages installed by other package managers and that's why I wonder
> > > about
> > > another specialized tool. But even without yet another tool, the user
> > > experience will improve.
> > > 
> > > > > >  Again, just treat it as it works in DNF now. If the package is
> > > > > > 
> > > > > > available from a repo. with a checksum, then you don't need to
> > > > > > download
> > > > > > it if you can look it up in the CAShe.
> > > > > 
> > > > > In this case, you need to have every single package which have ever
> > > > > been installed in CAShe. How would you achieve that if not the way I
> > > > > proposed.
> > > >  
> > > >  All the packages go into the CAShe, if the user configures the
> > > >  storage
> > > > 
> > > > to be big enough then they'll stay there ... if not they get removed.
> > > 
> > > So you wouldn't make the hardlinks? Then there is again the problem with
> > > the
> > > priority. Other software may have stored into CAShe another less
> > > important
> > > content (in this case, the old packages are very important) which is not
> > > linked anymore but because it was stored later, CAShe will clean the old
> > > packages. Or do you assume that the packages are stored in another CAShe
> > > instance?
> > > 
> > > > > >  No, don't explicitly remove anything.
> > > > > >  You can decide not to call the cleanup operation unless you have
> > > > > > 
> > > > > > removed packages from DNF's cache (presumably due to a
> > > > > > transaction),
> > > > > > to
> > > > > > not do the "expensive" operation.
> > > > > 
> > > > > if I do unlink something, then I believe that I should be able to
> > > > > ask
> > > > > CAShe to check whether the given content is needed somewhere else
> > > > > and
> > > > > if not, clean it. But since there can be many unneeded items in
> > > > > CAShe,
> > > > > I don't want to force user to wait for the general cleanup after
> > > > > every
> > > > > successful "dnf upgrade".
> > > >  
> > > >  The question mostly isn't "is this needed anymore" the question is
> > > >  "if
> > > > 
> > > > we need to delete something which of the items we have are the least
> > > > likely to be needed", and to answer that we need to look at everything
> > > > and what the user configured limits/policy are.
> > > 
> > > Right, good point. So, we can start with this and check how it works in
> > > practice.
> > > 
> > > > [...]
> > > > 
> > > > >  But the sysadmin should know then that they shouldn't set the
> > > > >  CAShe's
> > > > > 
> > > > > time limit below the longest expiration period of all the enabled
> > > > > repositories if they don't want to re-download the metadata again
> > > > > (in
> > > > > case they are out of disk space and run the CAShe cleanup often).
> > > >  
> > > >  One thing here is that CAShe doesn't have a timelimit in a way that
> > > > 
> > > > would do that, data isn't deleted _just because_ it's N days old.
> > > 
> > > Sure. We'll see how it works in practice; i.e. what and how much content
> > > is
> > > being stored in CAShe and how often it is accessed on regular machines.
> > > 
> > > > >  I mean, there might be less important data in CAShe than the
> > > > > 
> > > > > repository metadata (even if those data were accessed later) which
> > > > > should be removed first if the limits are exceeded and CAShe
> > > > > currently
> > > > > cannot recognize the priority of the content.
> > > >  
> > > >  I mean ... this is a problem with all caches that aren't clairvoyant,
> > > > 
> > > > and any priorities will be different for different usecases so I
> > > > didn't
> > > > try that atm. (in theory you could hack it using utime, but again
> > > > ...).
> > > > 
> > > >  I'm assuming that LRU is going to be better than MRU, and if you want
> > > > 
> > > > to keep a lot of stuff you can always configure the storage size to be
> > > > bigger etc. (disk is cheap in this case).
> > > 
> > > Yes, it depends on the other software which is going to be integrated
> > > with
> > > CAShe. I'm definitely not asking for a privilege of packaging-like data
> > > in
> > > CAShe. I just wondered about a possibility to attach a priority (either
> > > subjective or objective) to given content but let's not complicate it
> > > now.
> > > --
> > > Radek Holý
> > > Associate Software Engineer
> > > Software Management Team
> > > Red Hat Czech
> > 
> > James,
> > 
> > I was told that you and Tomas are going to integrate librepo (hence DNF)
> > with CAShe. Is it true?
> > 
> > Thank you in advance
> 
> Ping?

Long story short, we decided to temporarily suspend this effort in favor of 
integration of libhif and hawkey/librepo. The reason is that the conditions 
for CAShe integration can change quite a bit in context of libhif.

Thanks
Jan


More information about the Rpm-ecosystem mailing list