[Rpm-ecosystem] Containerization using rpm

Neil Horman nhorman at tuxdriver.com
Thu May 14 17:16:22 UTC 2015


On Thu, May 14, 2015 at 01:53:48PM +0200, Pavel Odvody wrote:
> On Wed, 2015-05-13 at 09:41 -0400, Neil Horman wrote:
> > On Wed, May 13, 2015 at 02:36:02PM +0200, Pavel Odvody wrote:
> > > Hey, very interesting project -- comments inline :)
> > > 
> > > On Tue, 2015-05-12 at 11:27 -0400, Neil Horman wrote:
> > > > On Tue, May 12, 2015 at 03:10:12PM +0200, Jan Zelený wrote:
> > > > > On 11. 5. 2015 at 15:09:50, Neil Horman wrote:
> > > > > > Hey all-
> > > > > > 	I was recently made aware of this list, and thought I would join, due to
> > > > > > a project I'm currently exploring.  I've been working on container
> > > > > > strategies for a bit, and found that the predominant container technologies
> > > > > > were re-inventing several wheels, and so I started this:
> > > > > > 
> > > > > > http://freightagent.github.io/freight-tools/
> > > > > > 
> > > > > > Freight is a very nascent project, but it's something I've been tinkering
> > > > > > with for a few weeks now, and due to my reuse of rpm as the container
> > > > > > format, I've managed to make quick progress.  The idea behind it is to
> > > > > > create an rpm-ostree-like environment (in which the entire container file
> > > > > > system is housed in an rpm, along with some metadata to allow a container
> > > > > > management system to execute instances of it).  The advantages of this
> > > > > > approach have been numerous:
> > > > > > 
> > > > > > - rpmsign allows for container verification
> > > > > > - distribution tools like yum just work to allow others to manage the
> > > > > >   containers
> > > > > > - using rpms allows the containers to be versioned
> > > > > > - by using rpm to install the file in the container itself, we can preserve
> > > > > >   the rpm database internal to the container and use it to check for the
> > > > > >   need for updates to container components.
> > > > > > 
> > > > > > Anywho, as I said, still very nascent, but I thought, given the email I
> > > > > > received about this list, that it might be of interest to others here.
> > > > > > 
> > > > > > Best
> > > > > > Neil
> > > > > 
> > > > > Hi Neil,
> > > > > I have been wondering about the idea of rpm-based containers for quite some 
> > > > > time. It is very interesting to see that I'm not alone and that someone is 
> > > > > actually working on this. I will be very interested to see your progress.
> > > > > 
> > > > > BTW, this is exactly the kind of thing this mailing list was created for, so
> > > > > thank you for sharing the information with us, and feel free to send us updates
> > > > > and/or ask any questions.
> > > > > 
> > > > 
> > > > Glad to hear it!
> > > > 
> > > > So far the tool works quite well for local installations, using yum repositories
> > > > to install the test web server container I have, and systemd-nspawn (the other
> > > > tool I'm reusing here) to execute it.
> > > > 
> > > > My roadmap over the upcoming weeks is roughly as follows:
> > > > 
> > > > 1) clustered use - allowing a cluster of nodes to execute containers in response
> > > > to a central administrator
> > > > 
> > > > 2) multi-tenancy - the ability for several tenants to share the same physical
> > > > cluster of systems
> > > > 
> > > How are you going to orchestrate these two? A system-wide daemon with
> > > a defined (REST) API a la Docker, or simply SSH? (or something else
> > > completely? :)
> > > 
> > Each node has a daemon running on it (freight-agent) that connects to an SQL
> > database (currently only postgres, but the code is written such that others
> > can be implemented without too much of an issue).  Postgres provides a
> > reasonably flexible authentication mechanism, transactional processing and a
> > notification system (avoiding the polling that has been a source of scaling
> > issues for projects like kubernetes).  SQL is also nice in that it allows a
> > RESTful API to be layered on top of it easily, so that client systems can
> > administer things via the web or directly via the command line (using the
> > freightctl utility).
> > 
> 
> +1 for postgres, I really like that DB. I wonder if freight-agent is
> easily auditable and takes care of authentication/authorization? 
> For example the Docker daemon is insane in that it runs as root, and
> doesn't do any usable logging (who/what/where).
> 
freight-agent is fairly small (less than 1500 lines of code right now), so while
it's pretty messy from rapid early development, I think it is largely still
auditable, yes.  Authentication/authorization I intentionally left outside the
scope of freight, which is to say that I rely on the database to provide that
security.  The admin and each tenant are codified as separate roles within the
db.  Regarding running as root, I'm afraid we share that limitation with Docker,
as executing a container requires several capabilities that need root
privileges, though we can certainly look at limiting risk exposure through the
use of selinux.  I have also incorporated the use of the systemd-nspawn --user
option so that containers can execute at a lower privilege level.  Regarding
logging, freight-agent is designed to run as a command line tool or as a systemd
unit.  Either way, logging is all done to stderr, which systemd will capture and
redirect to the journal.  I have made sure that common logging macros are used
throughout, so updates to the who/what/where should be pretty easy to handle.
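
As a rough illustration of how I expect the database to carry that security
model (the role and table names below are purely hypothetical, nothing is
settled yet):

  # a minimal sketch, assuming a database named "freight" and a hypothetical
  # per-tenant "tenant_a_containers" view; the admin owns the schema, each
  # tenant gets a restricted role
  psql -d freight -c "CREATE ROLE freight_admin LOGIN;"
  psql -d freight -c "CREATE ROLE tenant_a LOGIN;"
  psql -d freight -c "GRANT SELECT, INSERT, UPDATE ON tenant_a_containers TO tenant_a;"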

> > > > 3) derivative containers - the ability to codify, in the base manifest, a set of
> > > > derivative containers that rely on the base container.  The idea being that the
> > > > base container contains the base filesystem for the container, while the
> > > > derivative container contains overridden directories for the purpose of
> > > > specifying configuration for the container.
> > > > 
> > > This sounds very much like overlayfs, or CoW semantics in general --
> > > nspawn/importd uses BTRFS rather extensively, do you plan to build this
> > > around BTRFS CoW subvolume functionality? Or do you plan to simply copy
> > > the upper layer over the lower?
> > > 
> > It does, but systemd-nspawn no longer requires btrfs subvolume snapshots (i.e.
> > you can use any fs you like as the backing store).  The use of btrfs is a nice
> > idea, but I'm not sure I want to go that route yet.  I'm personally leaning
> > toward using overlayfs instead, as it can be applied to any fs. Comments/thoughts
> > appreciated here.
> 
> Hmm, I think that the situation with both overlayFS and BTRFS is almost
> the same -- a novel FS that nobody seems willing to maintain/extend.
> My preference is BTRFS though, as it's almost like ZFS, only without the
> license burden. What I like about BTRFS is that it's really a file
> system and volume manager in one, so working with snapshots is actually
> easy, which I couldn't get to work under LVM (well, maybe devmapper
> would be more appropriate ...)
> I'd go with whichever one properly supports selinux first :)
> 
Yeah, btrfs does provide selinux capabilities, which is very nice.  Bind
mounting would do that too, of course.  That said, I'm not quite there yet
(still dealing with early database population and querying).  If you would like
to investigate ways to handle derivative containers (if that's a good term),
I'd love the help.
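
To make the derivative container idea concrete, the overlayfs approach I'm
leaning toward would look roughly like this (the layout and paths are purely
illustrative):

  # a minimal sketch, assuming a hypothetical layout under /var/lib/freight:
  #   base/    - filesystem installed from the base container rpm
  #   webconf/ - overridden config directories from the derivative rpm
  #   work/    - empty overlayfs work dir (must be on the same fs as webconf/)
  d=/var/lib/freight
  mkdir -p $d/work $d/web0
  mount -t overlay overlay \
      -o lowerdir=$d/base,upperdir=$d/webconf,workdir=$d/work $d/web0
  # the merged tree under web0/ is then what systemd-nspawn would execute
  systemd-nspawn -D $d/web0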

> > 
> > > > 4) Networking - One of the things that I've taken away from kubernetes is that
> > > > the SDN components are somewhat difficult to work with in a multi-tenant
> > > > environment.  I'd like to integrate networking features into freight directly,
> > > > to better coordinate overlay networks with sets of containers.
> > > > 
> > > One of the problems with K8s & Docker is that Docker tries to include
> > > its own SDN by default (Socketplane, I guess) -- which is still far
> > > from complete -- and they've been thwarting any outside effort toward
> > > a 3rd-party solution (Weave, Flannel ...).
> > > Which SDN is it going to be for Freight?
> > 
> > That's a big question.  The one decision that K8s and Docker have led me to is
> > that networking needs to have some modicum of integration with the clustering
> > technology if you want to have a true multi-tenant solution.  That is to say,
> > you need a way to tie container instances to a particular network at run
> > time, as configured from a central location (in my case the SQL database).
> > Otherwise multi-tenancy can only be achieved by running multiple copies of
> > flannel/docker/socketplane/etc., which is somewhat less than scalable.
> > Alternatively you have to run multiple VMs on a system to represent your
> > container hosts, which adds its own layer of latency.  What, in very general
> > terms, I would like to do (and have as yet left completely untouched) is:
> > 
> > 1) Create a table in my central database describing various private networks
> > per tenant.
> > 
> > 2) Each entry in (1) describes the type of network (physical, vxlan, geneve,
> > ipsec, etc.) and general type-specific parameters, so that a node running an
> > instance of that container can create an interface to attach the container to
> > that network.
> > 
> > 3) Maintain another per-tenant table mapping container instance IP addresses to
> > node IP addresses for the purposes of forwarding in the l2-in-l3 tunnel case.
> > 
> > 4) Require that any container set that operates on a private network deploy an
> > infrastructure container configured to do things like serve DHCP addresses and
> > DNS queries as needed.
> > 
> > I guess that's a long-winded way of saying it will likely be a custom solution,
> > but to be honest, I feel like SDN is a bit too nascent for much reuse here.
> > We could use OVS or some such, I suppose, but IMHO the amount of
> > programming required to make OVS work in such an environment really reduces the
> > ease of use of such a solution.  Feel free to disagree/correct me here, though.
> > 
> My only practical experience with SDN was snabbswitch [1], and I think it's
> cool (a somewhat different take on SDN than OVS). Do you have some
> abstraction for a container set, so that I can group containers into
> logical groups? Though thinking about it, I guess the logical unit
> could be a tenant?
> 
Yes, correct.  My thought was that pods (to use the kubernetes terminology)
are something of a false construct.  There's nothing wrong with them, mind you,
but under the covers a pod is really not much more than a bit of data indicating
that two containers share the same network namespace (and therefore the same
host), but live in different process (and possibly mount) namespaces.  Given the
above model, in which the network namespace is decided at run time (as is the
owning host that executes a container), if a given tenant wishes to run two
containers on the same network, but in different process and mount namespaces,
they can simply execute two containers and specify in the database that the
executing host should be the same between them, as well as the network.  The
executing host can then recognize that both containers are operating in the same
network and merge their namespaces, or simply attach them to the same overlay
network.  We might also consider being able to dynamically indicate at run time
that two disparate containers wish to share a given network namespace, if we
choose (though I haven't thought that through yet).

Lastly, and I think this is actually advantageous, this install method that
relies on systemd-nspawn provides the ability to place several applications in
the same container, which implicitly gives them atomicity of host and network
namespace.  I understand that somewhat goes against the typical containerization
mantra of isolating everything from everything else, but I think there is
something pragmatic and advantageous about, for instance, installing an entire
LAMP stack in a single container so that an instance of it can run in its own
network namespace, but share the process and filesystem namespace.
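
To sketch the data model I have in mind here (the table and column names are
hypothetical, just to show the shape of the data):

  # a minimal sketch, assuming a database named "freight"
  psql -d freight -c "
    CREATE TABLE tenant_networks (
        tenant     text NOT NULL,
        network    text NOT NULL,
        net_type   text NOT NULL,   -- 'physical', 'vxlan', 'geneve', 'ipsec'
        net_params text,            -- type-specific parameters
        PRIMARY KEY (tenant, network)
    );
    CREATE TABLE containers (
        tenant   text NOT NULL,
        name     text NOT NULL,
        host     text,              -- requested/assigned executing node
        network  text,              -- one of the tenant's networks above
        PRIMARY KEY (tenant, name)
    );"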


Again, all very pie in the sky.  This was more about exploring how we can use
rpm to run containers, but it seems like there might be more here.
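
For example, because the container is just an rpm, all of the stock tooling
applies (the package name and container root below are placeholders):

  # a minimal sketch, assuming a hypothetical "webserver-container" package
  # with its filesystem installed under /var/lib/machines/webserver
  rpmsign --addsign webserver-container-1.0-1.noarch.rpm   # sign/verify
  yum install webserver-container                          # distribute/version
  rpm --root /var/lib/machines/webserver -qa               # rpm db inside the container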

More thoughts appreciated!
Neil

> [1]: https://github.com/SnabbCo/snabbswitch/wiki
> > > 
> > > > 
> > > > I'll keep you up to date on progress, but if anyone is interested in
> > > > participating, please let me know
> > > > Best
> > > > Neil
> > > > 
> > > > > Regards
> > > > > Jan
> > > > > 
> > > > _______________________________________________
> > > > Rpm-ecosystem mailing list
> > > > Rpm-ecosystem at lists.rpm.org
> > > > http://lists.rpm.org/mailman/listinfo/rpm-ecosystem
> > > 
> > > 
> > > -- 
> > > Pavel Odvody <podvody at redhat.com>
> > > Software Engineer - EMEA ENG Developer Experience
> > > 5EC1 95C1 8E08 5BD9 9BBF 9241 3AFA 3A66 024F F68D
> > > Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno
> > > 
> > 
> > 
> 
> 
> -- 
> Pavel Odvody <podvody at redhat.com>
> Software Engineer - EMEA ENG Developer Experience
> 5EC1 95C1 8E08 5BD9 9BBF 9241 3AFA 3A66 024F F68D
> Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno
> 



