Running systemd within a Docker Container

I have been working on Docker for the last few months, mainly getting SELinux added to help CONTAIN Containers.

libvirt-sandbox – virt-sandbox-service

For the last couple of years I was working on a different container technology using libvirt-lxc, in addition to my regular SELinux job. I built the virt-sandbox-service tool, which would carve up your host system into a bunch of service containers. My idea was to run systemd within a container, so that systemd would start services the same way inside a container as it would outside one. When running a virt-sandbox-service container with an Apache unit file, you would see only the systemd, journald, and httpd processes running. There was very little overhead, and creating a service container was simple: you only needed to specify the unit file of the service you wanted to put in the container.

Docker stopped virt-sandbox-service development:

I put virt-sandbox-service on the backburner when it became obvious that the community was moving towards Docker.

While working with Docker, I looked at the great work that Scott Collier was doing for getting services to run within a container.  Scott provides the fedora-dockerfiles package in docker with lots of “Dockerfile” examples. You can build Docker images by running “docker build” on these examples.

It seemed a little difficult, and I wondered whether getting systemd to run within a Docker container, as I did with virt-sandbox-service, might make this simpler.

The Docker model suggests that it is better to run a single service within a container. If you wanted to build an application that required an Apache service and a MariaDB database, you should generate two different containers. Some people insist on running multiple services within a container, and for this Docker suggested using the supervisord tool. In RHEL we do not want to support supervisord: it is written in Python, we do not want to pull a Python requirement into containers, and it is just a package used to monitor multiple services. We already have a tool for monitoring multiple services, called systemd.

Can systemd run within a Docker container?

Anyways, I went off to investigate running systemd in a docker container.

First thing I noticed was a few bug reports by others who had failed.

https://github.com/dotcloud/docker/issues/3629

https://bugzilla.redhat.com/show_bug.cgi?id=1033604

After investigating the failures, I found that systemd requires the CAP_SYS_ADMIN capability, but Docker drops that capability in non-privileged containers in order to add more security. This means that, for now, you have to run systemd within a privileged container, since privileged containers do not drop any capabilities. There is a patch upstream to allow users to add capabilities to a Docker container. Once this patch gets merged, I think you will be able to run in a non-privileged container by turning on the CAP_SYS_ADMIN capability.
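
To check whether a given environment actually holds CAP_SYS_ADMIN, one option is to look at the effective capability mask that the kernel reports under /proc. The following is only a sketch: the helper names has_cap and effective_caps are made up for illustration, and it assumes CAP_SYS_ADMIN is capability number 21, as defined in the kernel's linux/capability.h.

```shell
#!/bin/sh
# Capability number of CAP_SYS_ADMIN in linux/capability.h.
CAP_SYS_ADMIN=21

# has_cap HEXMASK NUMBER: succeed if capability NUMBER is set in HEXMASK.
# (hypothetical helper, not part of Docker or systemd)
has_cap() {
    [ "$(( (0x$1 >> $2) & 1 ))" -eq 1 ]
}

# effective_caps: print the effective capability mask (CapEff) as hex.
# awk runs as a child of this shell, so it reports the capabilities
# that the shell passed on to it.
effective_caps() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Usage, inside the container:
#   has_cap "$(effective_caps)" "$CAP_SYS_ADMIN" && echo "CAP_SYS_ADMIN is available"
```

If the check fails inside a container, systemd would be expected to hit the same missing capability described above.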

Running a privileged container got me a little further, but systemd was still crashing within Docker. It turns out systemd insists on looking at the cgroup file system within a container. I added the cgroup file system to the container using the volume mount option:

-v /sys/fs/cgroup:/sys/fs/cgroup:ro

which allows systemd to look at the cgroup file system, but only in read-only mode.
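
One way to confirm, from inside the container, that the mount at /sys/fs/cgroup really is read-only is to inspect the mount table. This is a minimal sketch: mount_is_ro is a hypothetical helper name, and it assumes the usual /proc/mounts layout (device, mount point, type, options). It only looks at the mount at the exact path given, not at anything mounted beneath it.

```shell
#!/bin/sh
# mount_is_ro MOUNTPOINT [MOUNTS_FILE]
# Succeed if the mount at MOUNTPOINT carries the "ro" option.
# MOUNTS_FILE defaults to /proc/mounts; it is a parameter only so the
# function can be tried against a sample file.
mount_is_ro() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }   # a later entry for the same path wins
        END {
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}

# Usage, inside the container:
#   mount_is_ro /sys/fs/cgroup && echo "cgroup file system is read-only"
```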

Systemd ran fully within the container.

But systemd starts tons of services in the container, like udev and getty logins. I only wanted to run systemd, journald, and httpd within the container, so I had to stop systemd from starting these other services.

Looking at what I had done to prevent this in virt-sandbox-service, I saw that I needed to remove unit file links from the /lib/systemd/system/*wants/ and  /etc/systemd/system/*wants/ directories within a systemd based docker container.  Removing these links got me to a systemd container image that would only run systemd and journald.
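
To see which wants links are still present in an image (and so which units systemd would still pull in), a small listing helper can be useful. This is a sketch under assumptions: list_wants is a hypothetical name, and it looks only in the two directories mentioned above, relative to a root directory passed as an argument.

```shell
#!/bin/sh
# list_wants ROOT: print every entry found in the systemd *.wants
# directories under ROOT/lib/systemd/system and ROOT/etc/systemd/system.
# Use ROOT="" to inspect the running system.
list_wants() {
    for dir in "$1"/lib/systemd/system/*.wants "$1"/etc/systemd/system/*.wants; do
        [ -d "$dir" ] || continue
        for link in "$dir"/*; do
            # -L also catches dangling symlinks, which -e alone would miss
            if [ -e "$link" ] || [ -L "$link" ]; then
                printf '%s\n' "$link"
            fi
        done
    done
}

# Usage, inside the container:
#   list_wants ""
```

An empty listing means no wants links remain in those two locations; anything printed is a unit that some target still pulls in.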

SUCCESS

Here is the Dockerfile I wrote to implement a systemd based docker image.

FROM fedora:rawhide
MAINTAINER "Dan Walsh" <dwalsh@redhat.com>
# systemd reads the "container" variable to detect that it runs in a container
ENV container docker
RUN yum -y update; yum clean all
# Install systemd, then remove the wants links for units a container does not need
RUN yum -y install systemd; yum clean all; \
(cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -f $i; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;
VOLUME [ "/sys/fs/cgroup" ]
CMD ["/usr/sbin/init"]

Notice that my Dockerfile uses Fedora Rawhide, but this should also work on a RHEL7 or Fedora 20 system. systemd likes to know that it is running within a container, and it checks the container environment variable for this; the ENV line causes the “container” environment variable to be set. Next, make sure the image is up to date by running yum -y update, and clean up the cache for this image layer. Then install the systemd package and remove the unneeded wants link files. Finally, tell the image that it requires the /sys/fs/cgroup volume mounted into it, and execute the init command.
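
To see what value PID 1 in a running container actually received for that variable, one can read its environment from /proc. This is only an illustrative sketch: container_value is a hypothetical helper, it assumes the NUL-separated format of /proc/PID/environ, and reading /proc/1/environ normally requires root.

```shell
#!/bin/sh
# container_value [ENVIRON_FILE]: print the value of the "container"
# variable found in a NUL-separated environment file, if there is one.
# ENVIRON_FILE defaults to the environment of PID 1.
container_value() {
    tr '\000' '\n' < "${1:-/proc/1/environ}" | sed -n 's/^container=//p'
}

# Usage, inside the container, as root:
#   container_value
```

With the ENV line from the Dockerfile above in effect, this would be expected to print docker.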

I use this command to build the image in the directory containing the Dockerfile

docker build -t systemd_rawhide .

Building an Apache Container Image based on the systemd image.

Now I want to set up an httpd image based on this image, so I create a new Dockerfile.

FROM systemd_rawhide
RUN yum -y install httpd; yum clean all; systemctl enable httpd.service
EXPOSE 80
CMD ["/usr/sbin/init"]

Notice how simple this Dockerfile is when it builds on the systemd_rawhide image. You only have to install the service you want and enable it. Since this service binds to port 80, you need to specify this in the image. Finally, since systemd will be managing my service within the container, you specify /usr/sbin/init as the command to run within the container.

In a different directory, put the new Dockerfile and build the image

docker build -t httpd_rawhide .

Running and testing the Apache Image. 

Now to test the httpd application container, start it by executing:

docker run --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 80:80 httpd_rawhide

Note that I bind the host's port 80 to the container's port 80. You can test it by executing:

curl http://localhost

This command should return the default Apache page from inside the container.
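
systemd may need a moment to bring httpd up after the container starts, so a first curl can fail simply because it ran too early. A small retry loop is one way around that. This is a sketch, not part of the setup above: retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/bin/sh
# retry TRIES DELAY COMMAND...: run COMMAND until it succeeds,
# giving up (and returning failure) after TRIES attempts.
retry() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$tries" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage, on the host:
#   retry 10 1 curl -sS -o /dev/null http://localhost && echo "httpd is answering"
# curl's -f option is left out on purpose: some distributions serve the
# default test page with an error status, which -f would treat as a failure.
```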

Running multiple services within a systemd based container image.

If you wanted to run multiple services, you can just install multiple services within the Dockerfile.

RUN yum -y install httpd mariadb ; yum clean all; systemctl enable httpd.service mariadb.service

systemd will start the httpd and mariadb service when you start the container.  Please test and give me feedback on how this works for you.

 

Also posted on my personal blog:

http://rhatdan.wordpress.com/2014/04/30/running-systemd-within-a-docker-container/

 



  1. Tried this on CentOS 6.5 running a fedora container and failed with following error… when I try to use the systemd_rawhide container in any way…

    ]# docker run systemd_rawhide cat /etc/fedora-release
    lxc-start: No such file or directory - failed to mount '/var/lib/docker/vfs/dir/60010e7b4406939c8450e5d937f94dee992aa11c4c71b95587f365c41f4f50f8' on '/usr/lib64/lxc/rootfs///sys/fs/cgroup'
    lxc-start: failed to setup the mount entries for 'b5b83d742fb25aa403f90986f5ef23d12c71c4e94052c7c7981d90903fba35f4'
    lxc-start: failed to setup the container
    lxc-start: invalid sequence number 1. expected 2
    lxc-start: failed to spawn 'b5b83d742fb25aa403f90986f5ef23d12c71c4e94052c7c7981d90903fba35f4'

    Where as the default fedora image works…

    # docker run fedora:latest cat /etc/fedora-release
    Fedora release 20 (Heisenbug)

    # docker images
    REPOSITORY   TAG         IMAGE ID       CREATED          VIRTUAL SIZE
    systemd      latest      73997002d227   14 minutes ago   441.1 MB
    fedora       rawhide     5cc9e91966f7   13 days ago      372.7 MB
    fedora       20          b7de3133ff98   4 weeks ago      372.7 MB
    fedora       heisenbug   b7de3133ff98   4 weeks ago      372.7 MB
    fedora       latest      b7de3133ff98   4 weeks ago      372.7 MB
    centos       centos6     0b443ba03958   5 weeks ago      297.6 MB
    centos       latest      0b443ba03958   5 weeks ago      297.6 MB
    centos       6.4         539c0211cd76   13 months ago    300.6 MB


      1. Oh! Now that I read your reply… of course! It makes perfect sense. I have been doing OS isolation based virtualization for too long, I guess. LXC/Docker instances still has to rely on the core/host OS extensively. Duh.


  2. Thanks very much for the notes. They have been working very well on Fedora/RHEL7/CentOS 7 hosts for getting systemd running in a container.

    One note: I have been able to run systemd *without* mounting the host’s “/sys/fs/cgroup” directory, while starting the container. Things seem fine. Is there any reason to do so? Thanks!


    I know that systemd was making some changes to allow us to run systemd without using cgroups. Maybe this was fixed, or maybe we were mistaken. I like the idea of not mounting cgroups into the container, to stop information leaking into the container.

    I am hoping the next generation of images from CentOS 7, RHEL7, and Fedora come with systemd, so that you can just use systemd out of the box if you want.


  4. Hi!

    I am trying to build a RHEL 7 base image with Java support (OpenJDK 7). The OpenJDK 7 packages depend on systemd (not directly, but transitively). So, in order to install OpenJDK in my RHEL 7 Docker base image, I was forced to remove fakesystemd. Now I have a question: do I have to follow your instructions to avoid issues with the systemd that was installed as a dependency of OpenJDK in my RHEL7 base image?

    See my approach here: http://rafaeltuelho.net.br/2014/12/11/installing-the-openjdk-7-in-a-rhel-7-docker-image-systemd-dep-issue/

    Thanks in advance.


    You only need to follow these instructions if you run systemd as your init command (“CMD /sbin/init”). If you are running your OpenJDK 7 app directly or via httpd, you don’t need to do anything.


  6. I was able to get systemd to run without a completely privileged Docker container using:

    docker run -ti --cap-add=SYS_ADMIN -e container=lxc bradlaue/centos-systemd

    FWIW – this is using the latest Docker 1.4.1. Not sure when --cap-add was added, but this gets me to a running Docker image without a SEGV.

    Oh, I should add – I did not bind in the /sys/fs/cgroup or /run trees – systemd created these on its own within the container seemingly without issue.

    From here I was able to set up most of the OpenStack Juno control plane software in the image – no functional issues.


  7. The problem with this is it greatly decreases the security of the container. SYS_ADMIN means that Root can pretty much do what it wants within the container. We have a patch to create /run as a TMPFS which we are trying to get upstream and then volume mounting in /sys/fs/cgroup RO fixes the problem with running systemd containers without priv.


    1. Agreed – looking forward to that. I’m not sure if adding SYS_ADMIN is the same as running docker with --privileged=true – either way, it’s going to be nice to be able to avoid.

      For the moment it’s working well as a host for OpenStack components, which has to be privileged anyway to do some of the things it needs to do.

      Would such an enhancement to systemd be backported to RHEL 7 and derivatives? That’s the one thing that worries me. It would be quite a functional enhancement but it would be years away in all practical terms…


  8. I’m getting the following error.

    Failed to get D-Bus connection: Unknown error -1 with fedora 21. Any idea what could be the problem?


  9. [root@centos ~]# docker run -d -e POSTGRES_USER=odoo -e POSTGRES_PASSWORD=odoo --name db postgres

    [root@centos ~]# docker run -p 127.0.0.1:8069:8069 --name odoo --link db:db -t odoo

    I ran the above commands to get the Docker odoo image, and I worked with Odoo CRM. Then I saved the changes and switched off the machine. The next day, when I tried to run odoo in Docker again, it showed the error below.
    [root@centos ~]# docker start odoo
    Error response from daemon: Cannot start container odoo: Cannot link to a non running container: /db AS /odoo/db
    2015/03/29 11:23:39 Error: failed to start one or more containers

    Please explain why this happens and how I can fix it.


  10. I’ve been trying this with a CentOS 7 container image I am building and not having a whole lot of luck. Can build a container, but systemd refuses to run. Did something change recently?


    1. What command line are you running, and what was the output? We are adding a

      docker run --init=systemd

      switch to configure a container to run in systemd mode. This is experimental, since it has not gotten into upstream docker yet.


  11. Can you run a script for your service in the same way, e.g. to start Jenkins?

    # Setup for Jenkins master running
    RUN printf "#!/bin/bash\n\nexport PS1=\"[\\\\u@\\h \\W]\\$ \"\nservice jenkins restart\nwhile true; do sleep 1d; done\n" > $HOME/startJenkins.sh
    RUN chmod +x $HOME/startJenkins.sh

    CMD ["/root/startJenkins.sh"]


  12. Hi All – I’m getting an error when I try to run “docker run --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 80:80 httpd_rawhide”: “/bin/sh: [“/usr/sbin/init”]: No such file or directory”. Apologies if this is solved above and I missed it, but I can’t figure out how to get past this (using a CentOS 7 host in VirtualBox; Rawhide container; Docker version 1.8.2). I can see /usr/sbin/init in my CentOS 7 host as a symlink to /lib/systemd/systemd (I tried creating a symlink via RUN in the Dockerfile but got a message saying the file exists). Has something changed in 22, or is this self-inflicted somehow? I’ve tried various things, and am wondering if I need to tweak environment variables (I am using “ENV container docker” to create the image) or erase the symlink and recreate it.

    Was also hoping to ask: I found another post where systemd commands are “masked” so they don’t start. I can get the container to work without the --privileged parameter, but was wondering if this introduces any security concerns (the command sequence also stops midway on the mask commands, so I have to start another command prompt to access the container).


  13. I created a fresh image out of the above Dockerfile, and ran w/ privileged and cgroup volume options, and i don’t get systemctl to run

    # docker run -it -privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro myfedora_rh_systemd bash
    Warning: '-privileged' is deprecated, it will be replaced by '--privileged' soon. See usage.
    [
    # systemctl
    Failed to connect to bus: No such file or directory


    1. Trying with --privileged doesn’t help either.

      [root@kube-node1-f21 ~]# docker run -it --privileged=true -v /sys/fs/cgroup/:/sys/fs/cgroup/:ro myfedora_rh_systemd
      /bin/sh: [“/usr/sbin/init”]: No such file or directory

      [root@kube-node1-f21 ~]# docker run -it --privileged=true -v /sys/fs/cgroup/:/sys/fs/cgroup/:ro myfedora_rh_systemd bash

      [root@kube-node1-f21 ~]# systemctl
      Failed to connect to bus: No such file or directory


  14. The “/bin/sh: [“/usr/sbin/init”]: No such file or directory” error happens when you copy and paste the code from above into a terminal to create the Dockerfile. The reason is that font formatting on websites generates “ and ”, which are left and right quotation marks. Linux/Unix requires the neutral quotation mark ". Just replace all copied quotation marks manually in the terminal to correct the problem.

    https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html


  15. Good post. I found that the --privileged flag gives access to the host’s /dev, which may not be required and may cause issues. If you are just trying to run systemd, a better way is to run containers with the --cap-add=SYS_ADMIN flag and not use the --privileged flag.
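
As a footnote to the quotation-mark problem described in comment 14: instead of fixing each character by hand, the substitution can be done with a small filter. This is a sketch only. The function name normalize_quotes is made up for illustration, it assumes the pasted text is UTF-8, and the dash rule is a guess at intent: it rewrites an en dash only where it looks like the start of a command-line flag (a space before it and a lowercase letter after it).

```shell
#!/bin/sh
# normalize_quotes: read text on stdin and write it to stdout with
# typographic punctuation replaced by the ASCII characters that
# Dockerfiles and shells expect.
normalize_quotes() {
    sed -e 's/“/"/g' -e 's/”/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g" \
        -e 's/ –\([a-z]\)/ --\1/g'
}

# Usage:
#   normalize_quotes < Dockerfile.pasted > Dockerfile
```

It is worth reading the result before building, since the filter cannot tell a mangled flag from a dash that was meant as punctuation in every case.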

