In this Tech Talk, Blake, a Software Developer here at Fog Creek, gives an introduction to Docker. He covers what Docker is, why you might want to use it and how it works, as well as explaining some key terminology. He finishes up with a few demos showing Docker in action.
About Fog Creek Tech Talks
At Fog Creek, we have weekly Tech Talks from our own staff and invited guests. These are short, informal presentations on something of interest to those involved in software development. We try to share these with you whenever we can.
Content and Timings
- What is Docker? (0:00)
- Basic Terms (3:00)
- Why Use Docker? (6:45)
- How Does Docker Work? (10:08)
- Docker Artifact Server (11:01)
- Docker Demos (14:45)
What is Docker?
At its core, Docker is a platform for developing, deploying and running services with Linux containers. So what does that mean? Linux containers – this is a feature of the kernel as of, I believe, 2.6.24 or so; it has been around for a few years. It’s a way of isolating processes from each other, and you can do a lot of cool things with it.
So one way to look at it is as chroot on steroids. It’s not just filesystem rooting; it also isolates you from all of the other processes on the machine, and that’s a pretty cool thing when you think about it. You can do things like running unsafe code, running lots of stuff on your server without really vetting it, or just running multiple instances of something on the same machine and having them be isolated from each other.
It also seems, when I’m describing it like this, like it’s a VM. That’s usually the first way people think about Docker. ‘Oh, it’s a VM, let’s go ahead and set one up. I need SSH, I need to run Apache, I need to run my Python service, let’s also install those things on this VM and let’s SSH into it.’ Well, it’s not a VM. It’s really, like I said, a lightweight way of running a process in total isolation. So you can treat it like a VM, but that would be wrong.
Their metaphor, and the true metaphor for Docker, is the shipping problem from a long time ago. You have rocking chairs, and you have couches, and you have cars, and you have golf balls… and you have all of these things. How am I going to ship all of these things? You can pile them into a giant pile on a big ship, but that isn’t going to work. I’m sure they started using boxes, then they started using crates, and then they realized that all of these crates are different shapes, so they came up with a standard: the containers that we see passing us on the road all of the time, and on ships, if you get a chance to go to the docks. There’s a standard size for these things, there’s a standard place for where the doors go, for how the locks work, and for where the mount points are to pick it up, so that they all fit on ships and they can get them on and off efficiently, and there are weight restrictions and all this. The shipping company doesn’t need to know what’s inside these things, as long as each one hits all of the specs and adheres to all of the standards. And there’s a lot that you can do with that, right. You can watch the orchestration of these containers off trucks and on to ships, and you can do some really cool things.
So that’s what Docker is. It takes all your services, your apps, or other people’s apps and it bundles them up in a standard way where you can now have orchestration at the software level. You can deploy these to different machines, you can do all kinds of cool things with this. So that’s the Docker metaphor, it’s containers on container ships.
So to give you some basic terms, just so we’re clear from the start. A Docker Image is a static filesystem, and in every case except the top level, it’s going to have a parent image. So my basic filesystem might just have, let’s say, a home directory. So I put a home directory in there. It might also have a user banner, it’s going to hold a whole bunch of binaries, I don’t know. And then I create another image on top of that, where I add another layer of the filesystem. And another layer on that, where I am adding this and that and that. And then at some point I’m installing a bunch of services, and every one of these actions is another layer on top. The filesystem is copy-on-write. So if I’m going to overwrite something from a parent image, that’s no problem; it’s just going to overlay on top of that. There are benefits to them being static, which I’m going to get to in a little bit.
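You can actually inspect this layer stack for any image you have locally; a quick sketch, assuming a working Docker install with the stock debian image already pulled:

```shell
# Each row in the output is one read-only layer; the image is the
# whole stack, with child layers overlaying their parents.
docker history debian
```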
An Image is like, you can think of it like, if you installed Debian or something, snapshotted it and put it on a DVD, there’s your image. It’s something that can’t be modified at this point. And now when I put that DVD in the machine (and this is kind of a weird metaphor, because I don’t want anyone to think of this as a machine), when you put the DVD in the machine and boot from it, well, now you have the Docker equivalent of a Container. A Container is a writable instance of an Image. It’s based on an Image, and so in this metaphor, if it holds out, as you’re running you’re able to overwrite files that are effectively on the DVD, but at the time when you’re running, they’re in memory or in some kind of mount. So when you’re running a Container you can write to any file, and again it’s copy-on-write; it’s going to be writing its files somewhere. In another metaphor, you can think of an Image like a Class. It’s a definition, it’s ‘here’s what this thing is,’ and then a Container is an instance of that Class. So just like in programming, where you can have a hundred different person objects, in Docker you can have a hundred different Containers based on one Image. And that might just be that I want to run bin/bash inside a stock Debian image. So I’m going to run bin/bash inside a Container, and that bash command is going to see a fresh filesystem that no other instance sees, because each has its own instance of that image. And then you can throw away the container, but the image always exists in your registry.
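The Class/instance metaphor maps straight onto the command line. A minimal sketch, assuming a local Docker install and the stock debian image; the container names are made up for the demo:

```shell
# The image is pulled once and stays read-only.
docker pull debian

# Two containers from the same image; each gets its own writable
# copy-on-write layer on top of the shared image.
docker run --name demo-a debian bash -c 'echo hello > /tmp/f; cat /tmp/f'
docker run --name demo-b debian cat /tmp/f   # fails: demo-a's file isn't here

# Containers are disposable; the image stays on your machine.
docker rm demo-a demo-b
```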
And one more point about Containers is that they should be ephemeral. An Image is something that is recorded, it’s archived, it’s shared, it’s deployed to different machines. A Container is just a service running on top of that Image, and when you’re done you should be able to delete the Container; there should be nothing special about the Container. And so you start thinking about, you know, in production, you have logs – I can’t throw away logs. You have secrets – I can’t throw away secrets. Those things aren’t stored in the Container, or in the Image; those things are separate – I’ll get to that later. But start thinking of Images as definitions that are deployed, shared, checked in somewhere. Containers are just started up, and then I should be able to delete the Container. You shouldn’t design Containers in such a way that you’re holding on to them; they’re ephemeral.
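One common way to keep things like logs outside the ephemeral container is a volume mount, so the container itself stays disposable. A sketch, where the myapp:1.0 image name and its /app/logs directory are hypothetical:

```shell
# Mount a host directory into the container for the logs.
docker run -d --name web -v /var/log/myapp:/app/logs myapp:1.0

# The container can be deleted and recreated at will;
# the logs survive on the host in /var/log/myapp.
docker rm -f web
docker run -d --name web -v /var/log/myapp:/app/logs myapp:1.0
```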
Why Use Docker?
So why Docker? It’s a way to ship software in a standard way. And the system requirement is only Linux, with a minimum kernel of, I think, 3.8. Then you install Docker, and that’s all you need. I can create an image of Postgres, of Redis, of Apache, of my own custom software, and I can just give it to you as an image and say ‘just run this image as a container on your machine, oh, and all you need is Docker.’ There’s a whole bunch of cool things you can do with that by running multiple containers at the same time. And it’s not just a deployment tool; there are benefits at all stages in the lifecycle. In dev, in test, in continuous integration, in integration tests, staging and then obviously in production.
For test and QA, they are able to run multiple versions of your containers, so they can run multiple versions of your app at the same time on the same machine, without worrying about port collisions, without worrying about library collisions. So you can run Python 3 and Python 2.7 at the same time. And I know that you can do that with virtual environments, this is just a way of stepping back and making a standardized virtual environment that works for any kind of script. So you can run a bash script, Python, whatever you want.
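Port mapping is what makes those side-by-side versions possible. A sketch, assuming a hypothetical myapp image whose server listens on port 80 inside the container:

```shell
# Same internal port, different host ports: no collision.
docker run -d --name app-v1 -p 8081:80 myapp:1.0
docker run -d --name app-v2 -p 8082:80 myapp:2.0

curl http://localhost:8081/   # served by v1
curl http://localhost:8082/   # served by v2
```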
Another thing is that they can run test suites in parallel without worrying about what else is running on that system, because you can set them up in such a way that they don’t interfere with each other.
And one thing that is pretty cool, though I’m not entirely sure yet whether it’s an anti-pattern; I need to spend more time using it and playing with it. If you have your whole back-end system in containers, you can actually set your system up in a particular state and snapshot it. Say you have a particular customer that gets into a situation, and we want to write some integration test to check what happens when I do this or do that. You can snapshot that, archive it, and then run tests specifically against that state. It’s pretty spooky when you see it work. In production, there are a lot of cool things you can do. You can limit the resources of a container: limit the memory, the CPU, device access and read/write speed, which is pretty cool.
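Those resource limits are just flags on docker run. A sketch using current Docker CLI flag names, with a hypothetical myapp:1.0 image:

```shell
# Cap memory at 512 MB and CPU at one and a half cores, and throttle
# read throughput on /dev/sda to 10 MB/s, for this one container only.
docker run -d --name limited \
  --memory 512m \
  --cpus 1.5 \
  --device-read-bps /dev/sda:10mb \
  myapp:1.0
```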
Also, Docker has a remote API, so you can start querying different boxes and asking what containers do you have on there? And I’ll talk a little bit more about that in a minute.
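The remote API is plain HTTP. Locally the daemon listens on a Unix socket; a sketch of asking a box what containers it has (the remote hostname is hypothetical):

```shell
# The same data `docker ps` shows, as JSON from the daemon's REST API.
curl --unix-socket /var/run/docker.sock http://localhost/containers/json

# If a remote daemon is configured to listen on TCP (protect it with TLS),
# you can ask other boxes the same question.
curl https://some-host:2376/containers/json
```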
There’s also since you have a standard format of shipping these things, there’s orchestration tools. And then you have things like where you can add new containers to a resource pool behind an HAProxy server and dynamically scale up and scale down if you set up the dynamic container discovery.
How Does Docker Work?
So how does this work? I mentioned briefly that it’s using LXC, the Linux container technology. That laid the groundwork; apparently working with LXC directly is kind of cumbersome, so Docker put a nice abstraction over it to make the thing easier to work with. I mentioned that you have filesystem and process isolation, and I mentioned also that this isn’t a VM. But what’s nice is, when you start playing with this, at first you’re thinking there’s an overhead here, right? Well, there’s not as much overhead as you might think, and in many cases it’s almost zero. All Docker is doing is orchestrating: it says ‘hey Kernel, I’m going to run this process, I want you to run it in isolation and here’s the filesystem you’re going to use,’ and then Docker steps away. Processes are still running on the host system’s Kernel, which is great because there’s very little latency to access memory and very little CPU overhead. And no Hypervisor.
Docker Artifact Server
And now I have all of these images everywhere, so what do I do with them? Let’s say that we have a developer on our team building these images, and we plan on using them in staging, prod, test, dev, all of these places. So what do we do with them? How do I give you my image?
Well, you push them to a thing called the Docker Registry. Registry is a terrible word; it conjures up images of the Windows Registry. So I’m calling it the Docker Artifact Server. It’s just a web server that you can run and push versioned images to. So what does that mean? I can download the Debian image, which is a series of layers, but in the end I don’t care, I just see the layer stack. I can build on top of that, and I can call this fogcreek.com/blake-docker-demo, and then I can push that with a version 1.0 into our own copy of the artifact server. Now I can make a change and push a 1.1, same thing. The Docker Registry is open source, and they have a public hosted version of it, called the Docker Hub. Basically, anyone who wants you to use their software on a server is creating an image; you can create any image, post it on there and publish it for free. If you want to do private images you can pay them some money; it’s the same business model as GitHub. So public is free, and you can push your own private repos in there.

Here’s an example of what you see on their main page. If you went there now and said ‘I want to download pre-made Docker images,’ Redis, Ubuntu, MySQL, they’re all just sitting there. There are thousands; anything you want is in there. If you want to prototype something real quick, and you know you’re going to need a Redis server, a MySQL, WordPress, whatever, you can put together a series of images that are talking to each other as containers. And they come with documentation, so you can be up and running with a Redis server in minutes. You’re not installing Redis on your machine, which is awesome. You’re just spinning up an instance of Redis. Say you don’t like the version, you can roll back to a version. You can have multiple versions on your system all running at the same time, for any of these things. This is a standard way to install whatever you need.
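On the command line, that tag-and-push workflow looks something like this; the image name and registry host are the hypothetical ones from the talk:

```shell
# Tag a locally built image with the registry host and a version, then push it.
docker tag blake-docker-demo fogcreek.com/blake-docker-demo:1.0
docker push fogcreek.com/blake-docker-demo:1.0

# Make a change, rebuild, and push the next version alongside the first.
docker tag blake-docker-demo fogcreek.com/blake-docker-demo:1.1
docker push fogcreek.com/blake-docker-demo:1.1

# Anyone on the team can now pull whichever version they need.
docker pull fogcreek.com/blake-docker-demo:1.1
```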
Now pushing makes sense, right, this is easy to understand. During development, I’m going to make my version of my Python app, I’m going to box it up in a container, image it, check in the definition of the image, which is called a Dockerfile, and then I’m going to push this to our Artifact Server. Then I tell the testers, and hopefully I’ll have an automated system that’s running all of the tests on these. Once I’ve got their blessing, all that means is production would then just say ‘oh, I need to start up these five containers from these images, I need Blake’s Docker Demo v1.1, I need the official MySQL 1.2, whatever,’ and it’ll go out on to the Internet, go to the Docker Registry and just pull down those images for you. So it’s a nice way of pushing stuff to different environments and having the exact same build.
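The checked-in image definition, the Dockerfile, is just a text file. A minimal hypothetical one for a Python app; the base image tag, file names and command are all assumptions for illustration:

```dockerfile
# Start from an official base image on the Docker Hub.
FROM python:2.7

# Copy the app into the image and install its dependencies;
# each instruction here becomes another read-only layer.
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt

# What a container started from this image runs by default.
CMD ["python", "app.py"]
```

Running docker build -t blake-docker-demo . against this file produces an image you can then tag and push like any other.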
It’s easy to get carried away with this. I’m still trying to figure out best practices, but you can definitely go too far with this. There’s some guidelines that I’ve read, like ‘would you want 50 instances of this running on this machine’, If the answer is ‘hell no’ then you probably shouldn’t be Docker containerizing this. So somethings are better off not in containers. It may be your database server, that is always going to be on one machine, maybe it is Redis. I don’t know exactly, but I think when you have a new tech like this you have to be careful that you don’t overdo it.