Containers for grown-ups

10 things to avoid in Docker containers

So you finally surrendered to containers and discovered that they solve a lot of problems and have a lot of advantages:

  1. Containers are immutable – The OS, library versions, configurations, folders, and the application are all wrapped inside the container image. You guarantee that the same image that was tested in QA will reach the production environment with the same behaviour.
  2. Containers are lightweight – The memory footprint of a container is small: instead of hundreds or thousands of MBs, the container allocates only the memory needed for its main process.
  3. Containers are fast – You can start a container as quickly as a typical Linux process starts. Instead of minutes, you can start a new container in a few seconds.

However, many users are still treating containers just like typical virtual machines and forget that containers have an important characteristic: Containers are disposable.

The mantra around containers:

“Containers are ephemeral”.

This characteristic forces users to change their mindset about how containers should be handled and managed, so I’ll explain what you should NOT do in order to keep getting the most out of containers:

1) Don’t store data in containers – A container can be stopped, destroyed, or replaced. Version 1.0 of an application running in a container should be easy to replace with version 1.1 without any impact or loss of data. For that reason, if you need to store data, do it in a volume. Be careful, though, if two containers write data to the same volume, as that could cause corruption. Make sure your applications are designed to write to a shared data store.
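
For example, a minimal sketch (volume and image names are hypothetical) of keeping the data in a named volume outside the container:

    # create a named volume and mount it into the container
    docker volume create app-data
    docker run -d --name myapp-1.0 -v app-data:/var/lib/data myapp:1.0

    # replace the container with a newer version; the data in the volume survives
    docker rm -f myapp-1.0
    docker run -d --name myapp-1.1 -v app-data:/var/lib/data myapp:1.1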

2) Don’t ship your application in two pieces – Because some people treat containers like virtual machines, they tend to think that they should deploy their application into existing running containers. That can be true during the development phase, where you need to deploy and debug continuously; but for a continuous delivery (CD) pipeline to QA and production, your application should be part of the image. Remember: Containers are immutable.
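
A minimal Dockerfile sketch of baking the application into the image (the base image and artifact names are hypothetical):

    # the application artifact is copied into the image at build time,
    # so the image that passes QA is exactly what reaches production
    FROM jboss/wildfly:10.0.0.Final
    COPY target/myapp.war /opt/jboss/wildfly/standalone/deployments/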

3) Don’t create large images – A large image will be harder to distribute. Make sure that you include only the files and libraries required to run your application/process. Don’t install unnecessary packages or run “updates” (yum update) that download many files into a new image layer.

UPDATE: There’s another post that explains this recommendation in more detail: “Keep it small: a closer look at Docker image sizing”.
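
For instance, a rough sketch of keeping a layer small by installing only what is needed and cleaning the package cache in the same RUN instruction (the package name is illustrative):

    # install only the required package and remove the yum cache in the same layer,
    # so the cached metadata never ends up in the image
    RUN yum install -y httpd && \
        yum clean all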

4) Don’t use a single layer image – To make effective use of the layered filesystem, always create your own base image layer for your OS, another layer for the username definition, another layer for the runtime installation, another layer for the configuration, and finally another layer for your application. It will be easier to recreate, manage, and distribute your image.
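
A rough sketch of such a layered Dockerfile, assuming a RHEL-based base image and a Java runtime (all names are illustrative):

    # OS base layer
    FROM rhel7
    # user definition layer
    RUN useradd -m appuser
    # runtime layer
    RUN yum install -y java-1.8.0-openjdk-headless && yum clean all
    # configuration layer
    COPY conf/app.properties /etc/myapp/
    # application layer
    COPY target/myapp.jar /opt/myapp/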

5) Don’t create images from running containers – In other words, don’t use “docker commit” to create an image. This way of creating an image is not reproducible and should be avoided completely. Always use a Dockerfile or another S2I (source-to-image) approach that is totally reproducible, and that lets you track changes to the Dockerfile if you store it in a source control repository (git).
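
For example, instead of committing a running container, the image can be rebuilt at any time from a versioned Dockerfile (the tag is illustrative):

    # reproducible: the same Dockerfile, stored in git, always yields the same image
    docker build -t myapp:1.1 .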

6) Don’t use only the “latest” tag – The latest tag is just like “SNAPSHOT” for Maven users. Tags are encouraged because of the layered filesystem nature of containers. You don’t want any surprises when you build your image some months later and discover that your application can’t run because a parent layer (FROM in the Dockerfile) was replaced by a new version that is not backward compatible, or because the wrong “latest” version was retrieved from the build cache. The “latest” tag should also be avoided when deploying containers in production, as you can’t track which version of the image is running.
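
A small sketch of pinning versions on both sides (image names and tags are hypothetical):

    # in the Dockerfile: pin the parent image instead of relying on "latest"
    FROM registry.access.redhat.com/rhel7:7.3

    # when building and deploying: tag and run an explicit version
    docker build -t myapp:1.1 .
    docker run -d myapp:1.1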

7) Don’t run more than one process in a single container – Containers are perfect for running a single process (http daemon, application server, database), but if you run more than one process in a container, you may have more trouble managing the processes, retrieving their logs, and updating them individually.
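
For example, rather than bundling a web server and an application in one container, each process could run in its own container (names are hypothetical):

    # one container per process; each can be managed, logged, and updated independently
    docker run -d --name web httpd:2.4
    docker run -d --name app myapp:1.1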

8) Don’t store credentials in the image. Use environment variables – You don’t want to hardcode any username/password in your image. Use environment variables to retrieve that information from outside the container. A great example of this principle is the Postgres image.
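
A minimal sketch with the official Postgres image, which reads its superuser password from an environment variable at runtime instead of baking it into the image:

    # the credential is supplied when the container starts, not stored in the image
    docker run -d --name mydb -e POSTGRES_PASSWORD=mysecretpassword postgres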

9) Don’t run processes as a root user – “By default docker containers run as root. (…) As docker matures, more secure default options may become available. For now, requiring root is dangerous for others and may not be available in all environments. Your image should use the USER instruction to specify a non-root user for containers to run as.” (From Guidance for Docker Image Authors)
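
A short Dockerfile sketch (the user name is illustrative):

    # create an unprivileged user and switch to it;
    # everything after USER runs as that user, not as root
    RUN useradd -r -u 1001 appuser
    USER appuser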

10) Don’t rely on IP addresses – Each container has its own internal IP address, and it can change when the container is stopped and started. If your application or microservice needs to communicate with another container, use environment variables to pass the proper hostname and port from one container to the other.
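
For example, the hostname and port of a dependency could be passed in at run time (the variable names are hypothetical):

    # the application reads DB_HOST and DB_PORT instead of a hard-coded IP address
    docker run -d --name myapp -e DB_HOST=mydb -e DB_PORT=5432 myapp:1.1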

For more information about containers, visit and register at http://developers.redhat.com/containers/

Rafael Benevides

About the author:

Rafael Benevides is a Director of Developer Experience at Red Hat. In his current role he helps developers worldwide be more effective in software development, and he promotes tools and practices that help them be more productive. He has worked in several fields, including application architecture and design. He is also a member of the Apache DeltaSpike PMC – a Duke’s Choice Award-winning project – and a speaker at conferences such as JUDCon, TDC, JavaOne and Devoxx. Twitter | LinkedIn | rafabene.com

  1. I love #4.

    Comments on #5 and #7:
    1. All new applications should be designed with good separation of code, configuration and data. These apps should always follow #5 and #7.

    2. Existing applications may not have good separation of code, configuration and data. With these apps, you can still make your life better by adopting Docker and Kubernetes, but you may not be able to separate everything into separate containers. This matters less, though, because these traditional apps are often an all-or-nothing installation (i.e., they can’t be broken apart).

  2. Why is this wrong? “Don’t install unnecessary packages or run “updates” (yum update) during builds.” Isn’t it good to have an up-to-date image?

    1. Not that it is wrong, but it’s a bad idea if you intend to push your image and/or move it around constantly and at a fast pace. The updates generally add extra data that you don’t actually need (cache), so you can certainly run yum update on your image, but always remember to flush the cached data you won’t really be using.

    1. Also, if yum update is invoked in a Dockerfile, the resulting container can no longer be considered a 100% known quantity; unknown changes will have been introduced. If complete predictability is important to you, yum update is bad.

    1. The large yum update is a symptom, not the problem:
      1. The upstream base image maintainers aren’t doing their job of keeping the image up to date, so it is huge. If the upstream maintainer kept this base image updated, there would be no updates. Not the developer’s fault. Also, if the developer puts it into production, it’s their responsibility to make sure there are no CVEs in their code, and hence their responsibility to do a yum update. You touched it, you own it.

      2. The Docker tools don’t have a way of flattening the image easily (without doing a save/load). This can be fixed down the road.

      Telling developers not to do yum updates terrifies me as a security person….

      1. Containers are ephemeral. If you need a security update from your base distribution, chances are most or all of your images need it. Build a new base image, rebuild the user/app/etc. layers on top of that new base image, deploy, and then destroy the outdated containers (see the sketch below). This way, all your containers will have the same update.

        Run ‘yum update’ on 100 containers roughly simultaneously and chances are you’ll have 100 identical image update layers, one per container, rather than one updated image shared between them. Disk is cheap, but I/O is less so (yes, maybe your store can dedupe). And if today was not your lucky day, you may end up with 60 copies of one base layer, 30 copies with a slight update to one package, and 10 more with that slight update plus another slight update to another package. Have fun troubleshooting that.
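
        For example (image names are hypothetical), that rebuild flow is roughly:

            # rebuild the base image with the security fix, once
            docker build -t mybase:7.3-2 base/
            # rebuild the application image on top of the new base and roll it out
            docker build -t myapp:1.1-2 app/
            docker run -d --name myapp-new myapp:1.1-2
            # then destroy the outdated container
            docker rm -f myapp-old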

  3. I don’t recommend env vars for credentials; they are not safe for concurrent read/write access. Take the time to set up something like Vault.

    1. I don’t like it either. Init scripts with passwords, yuck!! (At least in the PostgreSQL example you can set the password hashed.) Just “docker build” a new image from the base, adding the password for your application, and start containers based on it. This guarantees you don’t need to store credentials in init scripts.

  4. Item #7 has unwanted consequences:

    7) Don’t run more than one process in a single container – Containers are perfect for running a single process (http daemon, application server, database), but if you have more than a single process, you will have trouble managing them, getting logs, and updating them individually.

    http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#MereAggregation

    What is the difference between “mere aggregation” and “combining two modules into one program”?

    Mere aggregation of two programs means putting them side by side on the same CD-ROM or hard disk. We use this term in the case where they are separate programs, not parts of a single program. In this case, if one of the programs is covered by the GPL, it has no effect on the other program.

    Combining two modules means connecting them together so that they form a single larger program. If either part is covered by the GPL, the whole combination must also be released under the GPL—if you can’t, or won’t, do that, you may not combine them.

    I.e., if you create a Docker container whose purpose is to run a single application only, you are effectively bundling libraries statically with your application, so you are required to obey the GPL license of any library or tool in your container.

    If you create a virtual host in a Docker container that can run any application, you are not required to obey the GPL license of a library.

  5. So basically, what this article is saying is, “if you want to use containers as virtual machines, but without all the penalties described above, use SmartOS zones.” Got it.

  6. Nice list! About #7, “one process per container”, I don’t entirely agree; “it depends” is the answer. If you want to decouple release frequency, (auto)scale separate tiers/processes independently, use different tech stacks, or split up teams, then I would split into several containers; otherwise I would avoid the added complexity and keep the processes in one container (like you would do with a VM). Yes, logs are an issue with distributed, scalable, ephemeral systems (i.e. “microservices”), but you can solve this much better using dedicated health ports and metrics that can be checked in real time and acted upon. Our container canary-testing-and-releasing and micro-scaling solution VAMP (www.vamp.io) also works this way.

  7. I’m pretty interested: what is the proper way of handling updates with Docker?
    A base image is a good idea, but it won’t cover all cases; only base packages will be updated – what about dependent images, e.g. one with PHP, another with Nginx, etc.?
    A totally different case is that just running yum install will refresh the lists of available packages and install their current versions. So if you install some packages as dependencies in your app image, every time you build it you may end up with package versions different from what you expected. I have seen only a few Dockerfiles that pin package versions.
    Finally – how do you do it well?

    You also write that configuration should be separated – but I have mostly seen it done as environment variables parsed by a lot of bash. There are really great tools like Chef, Puppet, etc. to do this in a consistent way – using bash for that is such a pain. What is the Docker way of doing this?

  8. Isn’t #7 (one process only) a contradiction of #2 (don’t ship the application in pieces)? At least, with the increasing complexity of the applications we are dealing with, I can hardly remember when a single process was last sufficient for me.

    1. In the Java world, #7 means: use the container to run the application server only (no other servers). And #2 means: don’t ship the application server image and your WAR file separately – package your application (WAR file) inside the image. 🙂
