[Rpm-ecosystem] Containerization using rpm

Neil Horman nhorman at tuxdriver.com
Wed May 20 19:50:36 UTC 2015


On Tue, May 19, 2015 at 04:55:35PM +0200, Tomáš Smetana wrote:
> Hi.
> 
> Sorry for the slow response... 
> 
> Dne Fri, 15 May 2015 09:07:43 -0400
> Neil Horman <nhorman at tuxdriver.com> napsal(a):
> 
> ...
> > > I agree completely. However my impression is that "quicker" is actually
> > > the point of containers (and "dirtier" is the usual side-effect). I've
> > > already mentioned this: "time to market" is the cost function for
> > > container application developers. And if we want them to use rpm we need
> > > to offer them solution that would not be worse than the competing ones in
> > > this regard.
> > > 
> > 
> > That's an excellent point. I too hear that most people want containers that
> > are 'quick' to assemble, because getting a service stood up ASAP is
> > financially desirable.  I wonder if, given our proposed use of rpm here, it
> > might be worth trying to change the narrative such that containers don't
> > have to be quick to assemble if they are instead quick to tailor to your
> > needs?  That is to say, if we allow for a container to be built and curated
> > in a slow path of development, but provide an interface that allows for
> > derivative containers to quickly customize aspects that the parent
> > container marks as being customizable, would that be a worthwhile endeavor?
> 
> Maybe. It's up to the users... However if producing e.g. a Docker image would
> be significantly easier than producing a "container rpm", then I wouldn't bet
> on rpm.  And if there are no images to modify, the ease of modification
> doesn't matter...
> 
The availability of images to modify is really an adoption issue, I think, and
one that becomes a non-issue at critical mass.  Having done both, I'm not sure
assembling a docker container is any easier or harder than assembling an rpm
based container, but I'm hoping the tool suite I'm working on will make the
latter relatively simple.  I guess we will have to see.

At the moment, I've started a branch to develop ideas for derivative containers.
My first effort is to use subvolume snapshots in btrfs.  Not sure what kind of
legs it will have, but if you or anyone is interested, it will hopefully be up
and running in the next week or two.

> > > There is one more thing that anybody who tried to install a container from
> > > distribution packages encountered: the resulting containers were rather
> > > huge (314 MB only for httpd in my current test case...). Being able to
> > > remove the
> > Yup, the test container manifest that I have in freight is an httpd server,
> > and it clocks in right at 314MB.
> > 
> > > unnecessary stuff and still not lose the possibility to use rpm for the
> > > content management might actually make the "dirtier" aspect less
> > > dangerous. I could be able to find out there is a security update for the
> > > bits I ship in my container for example...
> > > 
> > Hm, do you think it would be worth adding a strip stage to the container
> > build, that could let a user specify a list of files to remove from the
> > container prior to packaging?  That would certainly be a dirty hack, but it
> > would allow for a reduction in size customized to a user's needs, and if
> > you did it in the input manifest, the freight-builder tool could easily scan
> > the list and warn/fail the operation if you tried to remove 'critical' files
> > like the rpm database, so you would be prevented from removing rpm query
> > functionality.
> 
> I think the users already do remove some files manually. At least those I
> know. So having this ability could be useful.
> 
Perhaps.  Docker containers seem to have a wide variance in their assembly
strategy, though.  Some are highly customized and tailored for specific uses
(i.e. "don't use feature X of this container service, because we removed the
support files for it").  Others are huge and bulky, because they are generated
by just doing yum installs, and are very robust for it (like the ones I'm
building).
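[Editor's sketch, not part of the original mail: the strip-stage check
discussed above might look roughly like this. The critical-file list and the
interface are hypothetical, not freight-builder's actual syntax.]

```shell
#!/bin/sh
# Sketch of a pre-packaging strip-stage check (hypothetical interface).
# A manifest would list files to drop from the container; anything
# under a "critical" path (like the rpm database) is refused, so rpm
# query functionality inside the container survives the strip.
CRITICAL="/var/lib/rpm /usr/bin/rpm"

check_strip_list() {
    for f in "$@"; do
        for c in $CRITICAL; do
            case "$f" in
                "$c"|"$c"/*)
                    # Refuse the whole operation rather than silently
                    # skipping the file, so the user sees the problem.
                    echo "refusing to strip critical file: $f" >&2
                    return 1 ;;
            esac
        done
    done
    return 0
}

# Stripping docs is fine; stripping the rpm database is rejected.
check_strip_list /usr/share/doc /usr/share/man && echo "strip list ok"
check_strip_list /var/lib/rpm/Packages || echo "strip list rejected"
```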

As I move along this path, I'm finding that the subvolume approach might have
the benefit that you can derive containers ad nauseam.  That is to say, you can
package just systemd in a single container, httpd in a derivative container,
and a custom configuration in a derivation of that container.  Each parent
container can be shared among several users, which is a nice space savings, and
your selinux contexts can be kept intact.
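[Editor's sketch, not part of the original mail: the derivation chain above
might look roughly like this, assuming /containers sits on a btrfs filesystem.
All paths and package names are hypothetical, and the commands are only
printed rather than executed, so the sketch is safe to run anywhere.]

```shell
#!/bin/sh
# Dry-run sketch of the derivation chain described above (hypothetical
# paths and package sets). Each snapshot shares unmodified extents with
# its parent, so many derivatives of one base cost little extra space,
# and file attributes (selinux labels included) carry over intact.
# "run" only prints each command so the plan can be inspected.
run() { echo "$@"; }

# Parent: a minimal systemd-only container root.
run btrfs subvolume create /containers/systemd-base
run yum --installroot=/containers/systemd-base -y install systemd

# First derivative: add httpd on top of the shared systemd base.
run btrfs subvolume snapshot /containers/systemd-base /containers/httpd
run yum --installroot=/containers/httpd -y install httpd

# Second derivative: just drop a custom configuration on top.
run btrfs subvolume snapshot /containers/httpd /containers/mysite
run cp mysite.conf /containers/mysite/etc/httpd/conf.d/
```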

> > > I was thinking also about another approach: merge the rpm whose file I
> > > modified into the "container rpm". Ie., for all the modified or removed
> > > non-config files do something like 'rpm -e --justdb' of their owners and
> > > then the subsequent container tree scan would identify them as unpackaged
> > > and add to my new container (it should be possible to save the scriptlets
> > > before the package removals too)... This should be possible without
> > > modifying the rpm tools but the resulting package would be an
> > > extraordinary mess and I'm afraid I would end-up with a single all-in-one
> > > rpm file quite often. The main advantage (tiny rpm that can compose its
> > > own container) would disappear.
> > > 
> > 
> > That's an interesting idea.  Another variant of the idea along the same lines
> > would be to create a derivative rpm with custom files, that does various
> > rpm/yum manipulations as part of a post install script.  That is to say,
> > imagine you have a container that wants to run httpd, and you want it to run
> > your custom website, which uses postgres, but the generic httpd container
> > ships with mysql. What if your custom container had a spec file that looked
> > like this:
> 
> Well... In the world of "microservices" such a situation should not happen:
> 
> * Container is exactly one service (IIRC Kubernetes even assumes a container
>   has only one network port open)
> * If more services are required for an application, it gets composed from more
>   containers and those are then glued together by some management
>   infrastructure (by creating their private virtual network, mounting common
>   filesystem, etc.), the customization is there just to add configuration or
>   content (web site files to httpd container, altered my.cnf in the database
>   container)
> 
> And in my world of "self-composing containers" it should not happen too,
> of course. :)
> 

Right, the way kubernetes does this is something of a hack (I suppose that's the
best word for it).  You codify the need for multiple applications in a service
by specifying them in a pod, and then kube just mandates they run on the same
host and drops them in the same network namespace, so they can talk over
loopback.  I personally see that as toeing the isolation line. If you want your
applications to talk, put them in the same container, if you want to isolate
them, put them in different containers.  I might be naive here, but by using the
above subvolume snapshots, I think composing the right combinations of
containers isn't that huge a deal.

> Eventually it would be nice if the management tooling could support anything
> between "microservice" to full OS container (complete userspace). Which is
> why I wanted to package only the changed parts of any container. Then it
> would be up to the user where to install them (separate or the same
> container, even outside of any container should work).
> 
I agree.  I lean toward the more robust container side, just because I hate the
idea of packaging a service that doesn't have all the advertised components
available, but I may be in the minority there.

Neil

> > 
> > Name: mywebsite
> > Version: 1
> > Release: 1
> > Requires: generic-httpd-container # houses httpd/mysql/php
> > Source0: mywebsite-html.tbz2 # extracts to /var/www
> > Source1: mywebsite.conf
> > 
> > %install
> > tar xvf %{SOURCE0}
> > 
> > %post
> > yum erase mysql
> > yum install postgres
> > cp %{SOURCE1} /etc/httpd/conf.d/
> > #any other config file edits go here
> > 
> > #make the container disk image a bit more lightweight
> > rpm -e bash libreoffice mongodb supertuxkart foobar gcc 
> 
> This is why I think composing container from small pieces is better than
> having to purge a big generic one.  If the bits vendor is trustworthy (unlike
> anonymous images in the upstream Docker registry) I don't have to worry about
> incompatibilities but get advantage of not having to ship everything on my
> own, include the latest fixes and eventually be able to make partial updates.
> (I know it's being considered non-issue right now but I expect people to
> discover soon that having to re-deploy several hundreds of MB each time there
> is a security update of a small library might be actually quite expensive.)
> 
> > Its incomplete of course, but theres no reason that, on install we can't
> > customize a container on the fly.  And the database remains consistent that
> > way (albeit changed from the parent database, which is needed).
> > 
> > The space savings on disk would be nice, though its still a bit messy
> > because you have to download the additional bits before removing them,
> > which is less than great
> 
> Yes.  I don't like that idea too much.  It would be nice if those cleanups
> weren't needed at all.  There is some initiative to break unnecessary
> dependencies, the soft dependencies might also help. However it's going to
> take time before we'll see the real effect and the result would still be
> worse than the "micro-distributions" in this regard. So ability to track what
> the user did to the "clean" system and being able to replay those actions
> would still be valuable.  Managing containers (container images) would
> require greater flexibility than managing Linux distribution.
> 
> Regards,
> -- 
> Tomáš Smetana
> 
