In professional IT circles, especially among data center specialists, Docker has been a hot topic for years. Containers themselves have a long history in computer science, and unlike other types of virtualization, containers run directly on top of the operating system kernel.

That's why container-based virtualization is often called operating-system-level virtualization. This kind of virtualization allows multiple isolated instances to run on a single machine.

Containers are revolutionizing not only the way we develop and deliver applications, but also the way we deliver IT infrastructure.
In this article we're going to start from the top - we'll explain what a container actually is, and we'll take a look at Docker, the company and the technology at the forefront of it all. We'll also get into ways in which this will impact you, your business, and your career, as well as touch on some of the ways in which you can prepare.
Besides all of that, we'll look at some of the major concepts and technologies - things like container registries, what they are and what difference they make. The goal being by the end of this article you'll be well up to speed on containers, more than able to hold your own when discussing them and when doing your own investigations.
What are Containers
First of all, you'll need a decent grasp of what a container actually is. This section explains containers at the big-picture level so that you can follow along nicely with the rest of the article. This knowledge should also give you the confidence to hold your own in a container-related conversation.
To do this properly, we really need a quick lesson on IT history:
Applications Run Businesses and Applications Run on Servers
At the highest level, applications are what run our businesses - Internet banking, online retailing, airline booking, air traffic control, media broadcasting, and education. Whatever your business is, applications are a massive part of it, and we rely heavily on them to make our work possible.
In today's world, we can no longer distinguish between our business and the applications that power it. They are one and the same. No applications, no business. These applications run for the most part on servers. And back in the day, the early to mid-2000s, most of the time we ran one application on one physical server - so IT worked a little bit like this:
If a business needs a new application, for whatever reason, be it a new product launch or a new service they offer, a new server must be procured to run the new application on. Of course, the server has an upfront CAPEX cost, with the addition of a bunch of OPEX costs that kick in later – the cost of powering and cooling it, administration, and all that jazz.
This raises many questions:
"What kind of server does the application require?"
"How big does it have to be?"
"How fast does it have to be?"
Answers to questions like those were often "We don't know". In that case, people erred on the side of caution and opted for big, fast servers. The last thing anybody wanted, including the business, was dreadful performance - the inability to execute and the potential loss of customers and revenue.

Because of that fear, a lot of companies ended up with overpowered physical servers running at a fraction of their capacity.

However you look at it, that's a shameful waste of company capital and resources.
VMware and Hypervisors
It was almost overnight that we had technology that would let us take the same physical server and squeeze so much more out of it. Instead of dedicating one physical server to one lonely application, suddenly we could safely and securely run multiple apps on a single server.
Unlike before, when a business grew, expanded, or diversified by adding new services and applications, there was no need for a brand new, sparkling physical server. This led to reduced CAPEX and OPEX costs, as well as more efficient use of the powerful servers that already existed.
At this point, buying new servers occurred only when we actually genuinely needed them. Smaller businesses were able to flourish due to reduced costs, and bigger businesses could focus more on development and progress due to the same reasoning.
However, ultimately, even this wasn't the ideal solution.
Hypervisors allow multiple apps per server.
Let's take this server for an example. It has processors, memory, and disk space. We can run multiple applications on this server, in our case – four different applications.
For this purpose, we create four different virtual machines or virtual servers. These are essentially slices of the physical server's hardware.
Just for argument's sake, let's say that each of these virtual machines utilizes 25% of the processing power, memory, and disk space respectively.
Each virtual operating system uses a chunk of the processing power, memory, and disk space; it may carry license costs, and it requires time to set up and maintain. Because each application runs in its own virtual machine, a big chunk of the physical server's resources is consumed just to run the operating systems, before a single application is even deployed.

Some Linux distributions aren't free, and Windows most certainly isn't - this makes a dent in both resources and the budget. Each virtual machine also needs administrator supervision: security patching, anti-virus management, and so on.
There's a whole realm of operational baggage that comes with each one, and VMware and other hypervisors, as great as they are, don't do anything to help us with these kinds of problems.
They revolutionized the way we develop and deploy applications, but they still have issues. There are more efficient solutions to be found if we just keep moving to better technologies and methodologies.
All of this leads us to containers as the current best solution to these problems. Let's take a look at the differences between using containers and hypervisors:
The same four business applications need to be deployed on the same physical server as before. However, this time, instead of installing a hypervisor and four individual virtual operating systems on top of it – we install a single operating system for all of them.
On this operating system, we create four containers, one for each application. It's inside these containers, on the same operating system, that we run our applications. We're not getting into microservices architecture here - just a simple application or service per container.
Containers are a lot smaller and a lot more efficient than virtual machines, so this approach costs less and allows us to use our resources more efficiently.
The end result is that we get rid of the VMs from the hypervisor architecture and end up with a lot of free capacity to spin up more containers. This means we can deploy even more applications on the same physical server as before. There are no virtual machines, no extra operating systems that need to boot before the application even starts, and, most importantly, no more wasteful resource consumption.
A container is launched by running an image. An image is an executable package that includes everything needed to run an application - the code, a runtime, libraries, environment variables, and configuration files. A container is a running instance of an image, and images are built using layers.
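To make the layering concrete, here's a minimal, hypothetical Dockerfile for a small Node.js service - every instruction adds a layer to the resulting image (the base image tag and file names are example values):

```dockerfile
# Base layer: a minimal Node.js runtime (example tag)
FROM node:18-alpine

# Layer: set the working directory inside the image
WORKDIR /app

# Layers: copy the dependency manifest and install dependencies first,
# so this expensive step is cached as its own layer
COPY package.json .
RUN npm install

# Layer: copy the application code itself
COPY . .

# Metadata: the command the container runs on start
CMD ["node", "server.js"]
```

Because each instruction produces its own layer, rebuilding after a code change reuses the cached dependency layers and only rebuilds from the final `COPY` onward.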
Your software is added to an image, after which other people on your development team can build on top of it to extend it and add functionality.

Much has been said and written about the persistence of containers, or their supposed lack of it. It's true that containers are an outstanding fit for non-persistent workloads, but it's not as if containers can't persist data by design - they can, and they do.

The first problem that Docker containers run into is security. The host's file system needs to be completely separated from the file system of any container, and the containers' file systems shouldn't be connected to each other either, if your applications represent different services.
Making a good isolation system was crucial for the security of both the containers and the host server. To answer this problem, Docker adopted the Union file system.
This is what makes container images layered - the union file system combines different directories, which act as separate layers.
When you create a container with `docker run`, it goes into the running state. From there, we can stop it, restart it, and also remove it. The thing is, though, when you stop a container, it's not gone forever or wiped out of existence. It's still there, along with its data, on the Docker host. So when you restart it, it comes back to life with all of its data intact. As long as you don't explicitly remove it with the `docker rm` command, you shouldn't be afraid of losing any data.
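The lifecycle described above can be walked through with a few commands - a sketch assuming a local Docker daemon and the official alpine image; the container name is just an example:

```shell
# Start a named container in the background (runs a long-lived process)
docker run -d --name lifecycle-demo alpine sleep 3600

# Stop it - the container and its writable layer remain on the host
docker stop lifecycle-demo

# It still shows up when listing *all* containers, not just running ones
docker ps -a

# Restart it - it comes back with its data intact
docker start lifecycle-demo

# Only an explicit remove actually deletes the container and its data
docker rm -f lifecycle-demo
```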
If you're interested in reading more about managing persistence for Docker containers, here's a great article to turn to. It's very detailed, covers a great variety of topics and concerns, and is well worth a read.
What is Docker
Docker is an open-source platform that automates the deployment of applications in containers. With Docker, developers take care of the applications that run inside containers, rather than container management itself.
There are definitely other container technologies out there, and good ones at that, but Docker is where most of the development and most of the action is. We can say that Docker is to containers what VMware is to hypervisors.
Many of you have heard the Java phrase "WORA" - write once, run anywhere - even if you're not a Java developer. This is possible because the JVM acts as a mediator between compiled Java code and the operating system. It's enough to compile your code into the `.class` file format, or a package like a JAR, WAR, or EAR file.

The JVM runs on a variety of operating systems and translates this bytecode into the instructions that the underlying platform requires.

Along the same lines, Docker introduces the phrase "PODA" - package once, deploy anywhere.
Your application can be written in any language, deployed on any operating system, and depend on a whole variety of drivers, plugins, extensions, and libraries that need to be packaged up.

The entire application is packaged into a single image, and that image can be run on a wide variety of systems.

Although Docker promotes the concept of "PODA", it's not true "PODA" in the strictest sense: an image created on Linux runs on Linux, and similarly, an image created on Windows runs on Windows.
How Docker Started
Docker Inc., the major company and the main sponsor behind container technology currently changing the world, is a tech startup based in San Francisco. However, Docker wasn't originally in the business of changing the way we build, ship, and run applications.
The company started out as a platform-as-a-service provider called dotCloud Inc. The idea behind the business was to offer a developer platform on top of Amazon Web Services. Behind the scenes at dotCloud, they were using this funky container management and runtime technology as their main deployment engine.

So while their core business of selling a developer platform on top of AWS was waning, they were sitting in silence on top of something special. In 2013, they decided to make a major pivot and bet the business on this container technology that they were calling Docker, and today Docker Inc. is seen as a leading technology company with a market valuation of around a billion dollars.
The "Docker Project" is absolutely not the same as Docker Inc. They are a major sponsor and the driving force behind it, but Docker, the container technology, belongs to the community. If you look at it, you'll notice that everyone is contributing to it from the likes of IBM, Cisco, Microsoft, Red Hat, etc.
The first and foremost good thing is that it's open source. Everyone and anyone is free to contribute to it, download it, tweak it, and use it, as long as they adhere to the terms of the Apache License, Version 2.0.

The code is up on GitHub for the world to see. Core Docker components are written in Go (Golang), the programming language from Google that's been amassing quite a bit of popularity recently. You can also see the planned release cycle, which the project pretty much sticks to.
The Docker project is all about providing awesome open tools to build, ship, and run modern applications better. And there's more than one tool and technology to the Docker project. The same way that VMware is a ton more than just the hypervisor, well the Docker project is way more than just the Docker engine.
The Docker engine is the core piece of software for building images and for starting, stopping, and running containers. It's the core technology that all the other Docker project technologies, plus third-party tooling, build on and around.
If we stick the Docker engine here in the middle as a core technology, then everything else like clustering, orchestration, registry and security all build around the engine and plug into it.
Docker is available on multiple platforms. Here you can see supported platforms and what to know before you install Docker.
Docker Hub and Other Container Registries
Docker Hub, the public Docker registry, is a place where you can store and retrieve Docker images.
There are over 250,000 repositories on Docker Hub, and images from those repositories have been downloaded well over a billion times. Container registries (or image registries), particularly Docker Hub, are becoming the App Stores or Google Play Stores of enterprise IT.
Just like the App Store is central to everything that you do on your iPhone, Docker Hub, or potentially whatever third-party container registry you decide to use, is also dead center of everything you do with containers.
Container registries, at the highest and most basic level, are places to store and retrieve container images. Docker Hub is the official Docker registry from Docker Inc. but there are lots of third-party registries out there too.
When you visit Docker Hub, you'll notice a bunch of official repositories. As long as you've got a Docker host with an Internet connection, you can get to any of these. But how exactly is this done?

Let's say that you're hosting Docker on your laptop. Your Docker installation is clean - it doesn't have any images on it.

Containers run from images, so having none means that you can't run any containers. Naturally, the first thing you'd want to do is pull (download) an image to your Docker host.

As an example, let's pull a MongoDB image to our Docker host. Once you've downloaded and installed Docker for your platform:
First, run the `docker ps` command - this lists the containers currently running on your host:

To pull MongoDB, run the `docker pull mongo` command - this pulls the latest version of the repository:

Lastly, run the `docker images` command - this shows your list of local images:

Once we've pulled MongoDB, it would be suitable to run it:

`docker run --name some-mongo -d mongo:tag`

- `--name`: The flag that assigns a name to your container
- `some-mongo`: The name itself, in this example
- `-d`: The flag that runs the container in the background (detached mode)
- `tag`: The tag specifying the MongoDB version

Running `docker ps` again will now show us:
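Put together, the whole session might look like this - a sketch assuming a local Docker daemon with Internet access; the container name is just an example value:

```shell
# List running containers - a fresh host shows none
docker ps

# Pull the official MongoDB image from Docker Hub (latest tag by default)
docker pull mongo

# Confirm the image is now stored locally
docker images

# Run MongoDB in the background, using the image we just pulled
docker run --name some-mongo -d mongo

# The new container now appears in the running list
docker ps
```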
Once that's done, each of our Docker hosts has its very own copy of the Mongo image, so all of them can run MongoDB-based containers.

You can also push images, uploading an updated image to a central registry. That way, any host that wants to run your custom image can just pull it down and crack on.
Does this mean that anyone with Docker and Internet access can access and download my stuff?
For public repositories, yes. They are wide-open to the world, or at least they're wide-open to the world to pull.
Some repositories are marked as private, which means they're not open to everyone. If you'd like your repositories to stay closed off from the world, just mark them as private.
Can everyone push to my own repository and submit code?
Public repositories are there for everyone to pull from, but only you, or accounts you've authorized, can actually push to them.
Private repositories can only be accessed by people or accounts that are specifically given permissions.
Most registries these days let you define organizations and teams.
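Pushing to your own repository typically looks like this - a sketch assuming a Docker Hub account named `myuser` and a hypothetical local image called `myapp`:

```shell
# Log in to Docker Hub with your account credentials
docker login

# Tag the local image with your repository name and a version tag
docker tag myapp myuser/myapp:1.0

# Push it - only your account, or accounts you've authorized, may do this
docker push myuser/myapp:1.0
```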
Are my repositories safe?
There is more to registry security than setting permissions and hiding behind firewalls - there's also the question of trust.

When pulling images down, how do we know that we can trust what we're getting? You absolutely need to know that you can trust what you're pulling, and Docker has a technology called Docker Content Trust for exactly this purpose.

It lets you verify both the integrity of the image you're pulling and the identity of its publisher. Nowadays, automated workflows look something like this:
An application is written, modified, patched, and updated, then pushed to your software repository. From there, you can run tests - after all, you want to make sure that none of the changes you just made broke the application.

Assuming the tests pass, we push the code to our container registry, which performs an automated build. This gives us an updated container image to deploy from - and from there, we deploy the updated application, whether to our own on-premises data centers or to the cloud.

The container registry in the middle is the pivot point, or rather the dead center, of these types of workflows. Of course, all of this can be automated, so you can test every change you make to your application and push it to a development, test, or even production environment.
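The trust check mentioned earlier can be switched on at the client side: with Docker Content Trust enabled, the Docker client only accepts images whose signatures verify. A minimal sketch:

```shell
# Enable Docker Content Trust for this shell session
export DOCKER_CONTENT_TRUST=1

# Pulls now verify the publisher's signature before accepting the image;
# unsigned images are rejected with an error
docker pull mongo
```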
Container Orchestration

If the concept of container orchestration is completely or partially foreign to you, let's make a real-life comparison:
In basketball, there are a lot of players. At any given point of the game, some players are on the court and some aren't. Each player has a specific role, or job, so having a lot of players means that there are a lot of different jobs on and off the court.
To work as an efficient team, they need some sort of organization, or rather orchestration. This is the job of a coach or a coaching team. They orchestrate everyone, tell people what to do, where to go, make play calls, etc.
The team is doing a lot of specific, unique tasks, organized by a coach or coaching team. The same goes for our applications – they're usually made up of a bunch of individual or small services that are orchestrated in such a way to act as a single unified application. Just like a basketball team.
So, just about any containerized app out there, certainly any production worthy app, is going to be composed of multiple interlinked containers, probably spanning multiple hosts, and maybe even multiple cloud infrastructures. And if we're talking about a lot of component parts to our app - many microservices spanning thousands of containers on tens or hundreds of hosts, honestly, we don't want to be manually hand-stitching all of that.
What we need is a game plan of sorts, something that composes everything into the overall app. We're thinking of things like:
- Defining all the different components or the services that make up the application
- How to fit them together
- Message Queues
- API calls, etc...
Then, once our app's defined, we need a way of deploying and scaling it. We definitely don't want to be manually choosing which containers run on which hosts. We just want a pool of hosts, and then be able to fire-up containers and have our orchestration tool put the right containers on the right hosts.
This is all high-level, but this is what container orchestration is about. Defining our application, how all the parts interact, provisioning the infrastructure, and then deploying the application potentially with a single click.
After this, you can kick your feet up and enjoy the performance.
From the Docker Inc. perspective, they've got four products that do all of this for us.
- Docker Machine provisions Docker hosts for us, on-premises or in the cloud.
- Docker Compose is used to define and compose our multi-container application: which images to use, which network ports to open, and the config that glues our application containers together.
- Docker Swarm is used to take care of actually scheduling our containers across our estate of Docker hosts.
- Docker Tutum gives us a pretty UI and lets us control and manage everything, on top of all the above.
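To make Docker Compose concrete, here's a minimal, hypothetical docker-compose.yml for a two-container application - a web service and a MongoDB backend (the service names, ports, and build context are example values):

```yaml
version: "3"
services:
  # The application container, built from a Dockerfile in this directory
  web:
    build: .
    ports:
      - "8080:8080"   # host port : container port
    depends_on:
      - db

  # The database container, using the official MongoDB image from Docker Hub
  db:
    image: mongo
```

Running `docker-compose up` would then build the web image, pull mongo, and start both containers wired together on a shared network.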
As always, there's the wider ecosystem. Technologies and frameworks like Kubernetes, Mesosphere DC/OS, CoreOS, and OpenStack Magnum can all be used to orchestrate containerized apps, and obviously each has its own pros and cons.

For example, Kubernetes was developed at Google and is now an open-source framework.
Some of you are going to be hands-on techies, developers, sys-admins and DevOps, while others are going to focus more on management and generally less hands-on. If you're one of the hands-on types, do just that, get your hands on this stuff.
All you need is a virtual machine, on your computer or in the cloud, it really doesn't matter where.
Get Docker installed and do what you do - play with it, develop and dockerize an app, build images, start containers, smash them, trash them, just get your hands dirty messing around with it.