
Please note: still in draft, expected to be finalised in Jan 2019
Introduction
The basic idea of Docker is that it gives you complete control over your tech stack. Maybe you want a particular version of Linux with really specific software versions installed. Or maybe, if you are one of those data science types, you always like to work with a particular version of RStudio or Jupyter Notebook and specific versions of libraries. Docker provides a way to create the exact setup you need, and then lets you easily pass this specification to other people, so you can be confident that all your stuff will just work on different machines.
Many of the tutorials you will come across for Docker assume you are trying to set up complex, multi-server environments because you might want to deploy applications across different machines and networks. But for the data science types, the concerns are often a little different, and I will focus on how Docker can help you create data science environments that help with your workflow. At the end of the tutorial I cover some of that other stuff and can point you to further resources if you would like to go deeper (and I recommend you do, as Docker is just an amazing piece of tech).
First steps
Now I am going to go ahead and assume you have Docker up and running. You will find that Docker is generally pretty plug and play on Mac, Linux and Windows. I am more of a Linux guy, but I mostly use Docker on my work machine, which runs Windows. I had to configure a couple of things to get that working, but there is lots of documentation out there on how to do this. There are also a ton of tutorials around to help you set it up. Note that you need admin rights on your machine to use Docker.
Perhaps the simplest thing I can do to give you an intuition about Docker is to have you open a command shell (such as PowerShell on Windows, the Terminal on Mac, or Bash on Linux) and try something like the following to start up a container (and stay calm, I will explain what a container is in a moment):
docker container run -p 80:80 nginx
Ok, let’s now shut the container down. You should be able to just press ctrl-C. If you run into any issues, the easiest thing to do right now is to first run a command to list all the running containers (you will see a big long ID string). Then, run a second command to stop the running container. Note that if you just type the first few characters of the container ID, Docker will know which one you are talking about.
docker container ls
docker container stop [CONTAINER ID]
Let’s try this again and start up a new container. But this time, we will give the container a name with the --name flag and also run it in detached mode with the --detach flag (-d for short).
docker container run --publish 80:80 --detach --name webhost nginx
Close it down by running those close down commands above.
Before we go much further, let’s introduce a couple of key ideas from Docker: image and container. You can think of an image as a master copy of the stuff needed for a particular tech setup. Maybe you have an Ubuntu image which has all the libraries and binaries needed to run Ubuntu. Or maybe your image holds all the specs for a Debian Linux environment with React.js, Node.js and MongoDB installed.
The people that created Docker also created this neat thing called Docker Hub, which is a repository of all kinds of different images. You can think of Docker Hub as a really cool place that holds a huge variety of master copies for all the different tech setups you want. It has images that hold Linux machines with the Python Anaconda stack, or an image that has the setup for an R Shiny Server, or an image that has a setup for Elasticsearch. You can grab them and use them, or change them and create your own images.
A container is just an instance of an image. So let’s say you get hold of the Ubuntu image from Docker Hub. Then you can run an instance of it. Or you might run a bunch of instances. To provide an intuition, imagine someone handed you a copy of the Ubuntu operating system on a CD or flash drive (an image). You could then install it on a whole bunch of desktop computers, and each of these desktops would be like a container.
Now that we have a little more context, let’s break down what happened when I ran that last run command:
docker container run --publish 80:80 --detach --name webhost nginx
- First, Docker tried to create a container from an image called nginx. It could not find it on my local machine, so it went looking for it on Docker Hub
- Docker Hub had a look and found an image called nginx, and sent it across the network to my local machine
- Now armed with the image, my local machine then set about creating an instance of it (called a container)
- Along the way it set up some things based on the instructions I provided (--publish, --detach and --name). These tell Docker to forward my local port 80 to the container’s port 80, to run it in detached mode (in the background), and to give it the name ‘webhost’
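The steps above can also be run one at a time, which makes the pull, create and start sequence visible. This is only a sketch of the idea, not a description of what Docker does internally. The function name `run_in_steps` is made up for illustration, and it reuses the `webhost` name, so any earlier container with that name needs to be removed first.

```shell
# Hypothetical helper that spells out what `docker container run` does in one go.
run_in_steps() {
  # 1. Fetch the image from Docker Hub if it is not already cached locally
  docker image pull nginx
  # 2. Create a container from the image, with the same port and name options as before
  docker container create --publish 80:80 --name webhost nginx
  # 3. Start it; `start` does not attach to the container, much like --detach
  docker container start webhost
}

# This needs a running Docker daemon, so the call is left commented out:
# run_in_steps
```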
Now let’s get a little more comfortable with using different images to create different containers. You can stop the containers as you did above, by listing them and stopping them. Note you can stop multiple containers by typing:
docker container stop [FIRST CONTAINER ID] [SECOND CONTAINER ID] ...
Ok, create and stop the following containers now:
docker container run -p 3306:3306 --detach --name mysql -e MYSQL_RANDOM_ROOT_PASSWORD=yes mysql
docker container run -p 8080:80 -d --name apachehttpd httpd
docker container run -p 80:80 -d --name nginx nginx
There are also some neat ways to get hold of information about containers. Seeing what is happening in your container is just like running the Task Manager on Windows, or Activity Monitor on Mac, to see the kinds of processes that are running. There are different commands you can use to get different types of information:
docker container logs [CONTAINER NAME]
docker container top [CONTAINER NAME]
docker container inspect [CONTAINER NAME]
docker container stats [CONTAINER NAME]
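Each of these commands takes options that narrow down what you get back. A small sketch, assuming the container named `nginx` from above is still running (the function name `inspect_demo` is just for illustration):

```shell
inspect_demo() {
  # One-off snapshot of CPU and memory use instead of the live-updating view
  docker container stats --no-stream nginx
  # Pull a single field out of the inspect JSON using a Go template
  docker container inspect --format '{{.State.Status}}' nginx
  # Show only the last 20 lines of the logs
  docker container logs --tail 20 nginx
}

# Needs a running Docker daemon and a container called nginx:
# inspect_demo
```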
You might be starting to notice that Docker is full of handy ways to give you info about what is going on. You can give it all kinds of instructions such as -p and -d. Docker has really nice documentation for all of this, and if you put --help at the end of any command, you can access it.
docker container stats --help
Ok, so right now we still haven’t done very much. But I hope you will at least agree that the idea behind all this does seem pretty cool for data scientists. I like the idea of being able to spin up different machines and customise the software in them for my data science needs. I can have a container with all my favourite Python libraries and a Jupyter notebook setup. I can have another one with R Shiny Server set up. I can have another one with graph DBs. And I can get all these to communicate with each other too (and we will get to all of this soon).
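One housekeeping point before moving on: stopping a container does not delete it, and its name stays taken until the container is removed. A minimal cleanup sketch for the named containers created so far (the function name `cleanup_containers` is illustrative only):

```shell
cleanup_containers() {
  # Stop anything that is still running
  docker container stop webhost mysql apachehttpd nginx
  # List everything, including stopped containers
  docker container ls --all
  # Remove them so the names can be reused by later commands
  docker container rm webhost mysql apachehttpd nginx
}

# Uncomment once Docker is running:
# cleanup_containers
```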
Next steps
Given that these containers are kind of like machines we have spun up, a logical next question might be how to get into the machines to run commands. And of course you can do this. So close down all your running containers, and then run the following command to try this now:
docker container run -it --name nginx nginx bash
Much of this will be stuff we have seen before. The new thing though is -it. This tells Docker to start up a container and put us into it in interactive mode. The bash at the end tells Docker that we want to be dropped into a bash prompt. This image is based on Debian Linux, which has bash, so it’s no problem.
You will notice, though, that we didn’t run this container with the -d flag. So it’s not running in the background, and as soon as we jump out of the container (by pressing ctrl-D or just typing ‘exit’) the container itself just stops. It would make more sense if we could start a container, leave it running, and jump in and out of it as needed. So let’s stop any running containers, and first start up a new container.
docker container run -d -it --name myUbuntu ubuntu
We can see that it is running by doing a docker container ls. (Without -it, the ubuntu image’s default bash process would exit straight away and the container would stop.) Then we can jump into this container using the exec command. Note that we could do this by using the name of the container that we created, or the container ID (or even the first few characters of the container ID).
docker container exec -it myUbuntu bash
Once you are in the container it is just the same as being on the Ubuntu command line. You can create directories, install things (in this case using apt-get, as it’s Ubuntu Linux), and do anything you might do on a standard Linux machine.
Now don’t get fooled by a little gotcha here. The above calls exec and then passes it bash. So you need bash to be available in the container. If I was running an Alpine Linux image (which does not have a bash shell, but does have sh), things would be a little different:
docker container run -d -it --name alpine alpine
docker container exec -it alpine sh
Dockerfiles
Pretty soon you will want to customise some of the images that you are using. You might want a standard Debian image, but you also want to install a little extra software. You don’t want to have to use the exec command to jump into a container and do all this setup. And this is where a Dockerfile comes in. It is kind of like a shell script that has a bunch of simple commands that specify your image, and any extra setup you would like.
The best way to provide an intuition around this is to start with a super simple example. Start by creating a new directory (call it DockerTest or something logical) and inside of that, create a new file and call it “Dockerfile”. Note it doesn’t need any kind of extension, it’s just “Dockerfile”. Now open it up in your text editor of choice (I mostly use Atom or Sublime Text these days but anything you have is fine) and type in the following:
#
# A first super simple example of a Dockerfile
#
FROM ubuntu:22.04
MAINTAINER Jamie Gabriel "email@someemail.com"
RUN apt-get update
RUN apt-get install -y python3 python3-pip wget
RUN pip3 install Flask
EXPOSE 5000
ADD hello.py /home/hello.py
WORKDIR /home
The syntax here is pretty straightforward. Let’s take a walk through it. The # symbol makes whatever follows it a comment. All the all-capital-letter words, things like FROM, MAINTAINER and RUN, are instructions that tell Docker how to build an image and what to add to it.
The above Dockerfile starts with a comment, then uses the FROM instruction to specify a base image, which Docker will get from Docker Hub if it is not found on my local machine. FROM has to come before the other instructions. I have pinned ubuntu:22.04 rather than ubuntu:latest so that the package names below keep working. Next, the MAINTAINER instruction lets me include a little metadata saying who is building and maintaining this Dockerfile (it still works but is deprecated, and newer Dockerfiles use LABEL for this).
Then, it uses a series of RUN instructions to run commands while the image is being built. These tell the image (an Ubuntu image in this case) to do an apt-get update to refresh the package lists, then an apt-get install to get some packages (in this case python3, python3-pip and wget), and then a pip3 install Flask to install the Flask web framework. It then uses the EXPOSE instruction to indicate that the container will listen on port 5000. Finally, ADD copies hello.py from my local directory into /home in the image, and WORKDIR makes /home the working directory.
Once done, we have taken a base image of Ubuntu, customised it to our own needs, and kept all this in a handy config script. Let’s now build this image by doing the following:
docker image build -t mycustomimage .
Note the use of the . in the above command. This tells Docker to use the current directory as the build context and to look there for a file named “Dockerfile”. If your file lives somewhere else or has a different name, you can point to it with the -f flag, for example docker image build -t mycustomimage -f Dockerfile .
Now you can run a container from this customised image in the way we would expect:
docker container run -p 5000:5000 mycustomimage
Note another little gotcha here. Even though we specified the port with the EXPOSE instruction in the Dockerfile, we still needed to put it on the command line with the -p flag. This is a quirk of the Dockerfile: you can think of EXPOSE as being there to tell anyone using the image that they will need to use the -p flag. Just accept it as a weird quirk. There is a neat way to get around this with docker-compose that I will explain later.
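The Dockerfile copies in a hello.py that is not shown here. As a stand-in, the sketch below writes a minimal Flask app into the DockerTest directory. The contents of the file are my own assumption about what such a script might look like, not the original.

```shell
mkdir -p DockerTest
cat > DockerTest/hello.py <<'EOF'
# Hypothetical stand-in for the hello.py that the Dockerfile copies into /home
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from inside a container!"

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable through the published port
    app.run(host="0.0.0.0", port=5000)
EOF

# The image defines no default command, so pass one when starting the container
# (use python instead of python3 if that is what your image installs):
# docker container run -p 5000:5000 mycustomimage python3 hello.py
```

With the container running, http://localhost:5000 should show the greeting.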
There is great documentation on the keywords used in Dockerfiles, and I encourage you to check it out here. Even better, get hold of some Dockerfiles (like this one, this one, or this one) and play around with them. Once you spend a few days getting used to creating Dockerfiles and customising them to your needs, you will really start to see the benefits of how this might help you do data science.
Interacting with your local file system
So we can do quite a bit now. We can build images. We can create containers from them and go into those containers and run commands. We can build custom images to suit our needs using Dockerfiles. As a data scientist, this gives me the control to have any kind of tech stack I can think of set up and ready to go.
But there are still some other things we really want to be able to do here. Right at the top of the list is the ability to interact with my local files. Imagine this scenario: I have a bunch of .R files on my local machine. But I want to use a version of RStudio Server (like the image here). I want this container to be able to see my files, and make changes to them.
This question is really related to how you deal with persistent data. You want to be able to spin up containers as needed, throw them away, and grab new ones at any time. But you don’t want to do that with your data. To deal with this, Docker provides some really powerful solutions. The first solution is volumes. These can be created at the same time as a container, to hold data created by the container. When you throw out a container, its volumes are still around, and you can check them out with the following:
docker volume ls
The second solution, which is far more handy for data science, is bind mounting. This lets you access local files and directories from inside a Docker container.
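As a rough sketch of bind mounting, the snippet below creates a local directory with one HTML file and defines a command that serves it from an nginx container. The directory name `bind-demo`, the container name `bindtest` and the function name are all made up for the example, and it assumes a bash-like shell where `$(pwd)` expands to the current directory.

```shell
mkdir -p bind-demo
cat > bind-demo/index.html <<'EOF'
<h1>Served from a file on my own machine</h1>
EOF

serve_local_dir() {
  # -v HOST_PATH:CONTAINER_PATH maps the local directory into the container;
  # /usr/share/nginx/html is where the official nginx image serves files from
  docker container run -d --name bindtest -p 8080:80 \
    -v "$(pwd)/bind-demo":/usr/share/nginx/html nginx
}

# Needs a running Docker daemon:
# serve_local_dir
```

After calling `serve_local_dir`, edits to bind-demo/index.html on the host should show up at http://localhost:8080 on refresh, with no rebuild.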
Using docker-compose for bind mounts and ports
Bind mounting: useful when using Docker for local development. It maps a host file or directory to a file or directory in the container, so both locations point to the same files. If the two conflict, the host files win. You can’t specify a bind mount in a Dockerfile; it has to be given at docker container run, but you can do this:
… run -v //c/users/bret:/path/container
check out dockerfile sample-2 - no volumes in here….
docker container run -d --name nginx -p 80:80 -v $(pwd):/usr/share/nginx/html nginx
Open another terminal, edit a file on the host, and check it in the container.
Go into the container, check, and edit there too, so you can edit from both places.
Now think about complicated dev setups: this solves all that!
Note you can look at the logs of the container while you are doing dev.
Assignment (Lecture 50): bind mounts
Idea - mount data into….
Docker compose makes bind mounting simpler and ports simpler. Jekyll is what powers GitHub Pages.
Use bindmount-sample-1.
Edit these files while the container is running in the background.
docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve # this will start things up; refresh the browser to see changes
You can change the _posts directory.
Assignment: use the bindmount-sample-1 directory and run docker container run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve to start the server.
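The -p and -v options from the command above can be written down once in a compose file instead of being retyped. A minimal sketch, assuming a recent Compose release (older versions of the standalone docker-compose tool may also want a top-level version key); the directory name `compose-demo` and the service name `jekyll` are made up:

```shell
mkdir -p compose-demo
cat > compose-demo/docker-compose.yml <<'EOF'
services:
  jekyll:
    image: bretfisher/jekyll-serve
    ports:
      - "80:4000"   # host port 80 forwards to container port 4000
    volumes:
      - .:/site     # bind-mount the directory holding this file to /site
EOF

# From inside a site directory containing this file (needs Docker running):
# docker compose up
```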
Going deeper into Docker
So at about this point, as a data science type, you should have enough to go forward. You really have everything you need to take control of your data science stack. Download different images and check them out. But of course Docker has some other huge features, things like networks, docker-compose and Docker Swarm, which lets …
Volumes: you can put these into a Dockerfile.
Check out the mysql image on Docker Hub and look at the latest Dockerfile.
VOLUME /var/lib/mysql creates a volume and mounts it at that directory. Note that volumes need manual deletion.
docker pull mysql
docker image inspect mysql # you can see the volume
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
docker container inspect mysql # shows the volume and where it lives on the host (which is really inside a Linux VM on Mac and Windows)
docker volume ls
docker volume inspect [VOLUME NAME]
Consider: rerun mysql and you will see two volumes now. When you stop or remove containers, the volumes are still there.
Named volumes:
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql mysql # note the pair of values either side of the colon: volume name, then path in the container
docker volume ls
docker volume inspect mysql-db # much easier to work with
docker volume create # there are a couple of cases where you create a volume ahead of time, but that’s a future thing
Take home: create containers with named volumes.
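To make the take-home concrete, here is a sketch of a named volume outliving its container. The second container name, `mysql2`, and the function name are made up for the example; everything else comes from the commands above.

```shell
named_volume_demo() {
  # First container writes its database files into the named volume mysql-db
  docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True \
    -v mysql-db:/var/lib/mysql mysql
  # Throw the container away (-f stops it first); the volume is left untouched
  docker container rm -f mysql
  docker volume ls
  # A new container given the same volume name picks up the same data
  docker container run -d --name mysql2 -e MYSQL_ALLOW_EMPTY_PASSWORD=True \
    -v mysql-db:/var/lib/mysql mysql
}

# Needs a running Docker daemon:
# named_volume_demo
```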
Networks in Docker
There is a super cool thing in Docker called docker-compose that lets you spin up multiple containers at once, and it becomes really handy in some situations. But to get into this, we need to talk about some other aspects of Docker first! So let’s shift gears a little and talk about networks in Docker.
Because Docker is not just about images and containers, it’s about networks too. Docker lets us spin up any kind of containers, and then gives us a way to set these containers up on networks so they can communicate with each other and the outside world. Docker is geared up to make working with networks easy, but it’s also completely customisable so you can do really advanced setups.
docker container run -p # this -p means you can expose ports
- Need to know a bit about TCP/IP and networks
- detach
- Batteries included but removable: you can do huge customisation
- -p exposes a container’s ports on the physical network
- All containers are connected to a private virtual network, “bridge”, by default
- Each virtual network routes through a NAT firewall on the host IP
- -p is not needed for containers on the same network to talk to each other
- All settings are customisable
- You can attach a container to multiple virtual networks
- You can skip virtual networks and use the host IP (--net=host)
- You can use different network drivers
docker container run -p 80:80 --name webhost -d nginx
docker container port webhost # shows which ports are forwarding traffic
The container uses a different IP from the host. To get the IP:
docker container inspect --format '{{.NetworkSettings.IPAddress}}' webhost
docker container inspect webhost # can also get all the specs on the container
- Docker is on a 172 network
- The host is connected to the physical network via a firewall (which blocks traffic), and any traffic is NATed
- Concept of virtual networks inside the machine (bridge, or docker0)
- A new container connects to the virtual network, and that network connects to your computer’s address; -p 80:80 means any traffic on port 80 is passed through the network to the container
- A second container can be set up on the same network, but it needs its own published port to be reached externally
- You can create another network with its own -p, which will take traffic from outside
- You can have separate networks and use published ports on the machine, or containers can connect via a network
- Anywhere I do a docker container run nginx, where nginx is the image, replace that with nginx:alpine, which still has the ping command in it
- bridge is the default network that connects through the firewall
docker network create my_app_net # creates a new virtual network; the default driver is bridge, nothing advanced
docker network create --help
docker container run -d --name new_nginx --network my_app_net nginx # creates a container and connects it to the created network
docker network connect
So let’s create another container:
docker container run -p 8080:80 -d --name apachehttpd httpd
and then connect it to the same network, and then run docker network inspect. You can also disconnect:
docker network disconnect app_stack 37e
This gives you lots of options and makes things safer.
How containers find each other: DNS and its effect on containers
- You can’t rely on IP addresses, as things are dynamic
- Naming is crucial: Docker uses container names to let containers talk to each other
- Bridge networks that you create have automatic DNS
docker container run -d --name nginx --network my_app_net nginx
docker container run -d --name nginxTwo --network my_app_net nginx
Then I can pop into one and ping the other:
docker container exec nginx ping nginxTwo
This solves the dynamic IP problem. To do this you need to create a new network, as the default bridge doesn’t have this.
ASSIGNMENT: open 2 Linux boxes and get them to talk to each other. Note that you need to run debian / ubuntu with -it bash:
docker container run -it --name debian --network my_app_net debian bash
Round robin testing. Create a network:
docker network create dude
Create the containers:
docker container run -d --net dude --net-alias search elasticsearch:2
docker container run -d --net dude --net-alias search elasticsearch:2
docker container run --rm --net dude alpine nslookup search
docker container run --rm --net dude centos curl -s search:9200
IMAGES AND DOCKERHUB
An image is the application binaries and dependencies, plus metadata on how to run them
Not a complete OS in the image (no kernel), just the binaries the application needs
Not a virtual machine
You can get images from Docker Hub
Log in and see all the images
Use images that are popular first
Start with official images, which will be labelled as such
Official images have no forward slash in the name
Official images are created at Docker
Official images have good docs
Note there are different versions of official images
latest is a special tag for getting the latest version
docker pull nginx:1.11.9 # specify a version, which might be the latest; make sure you control the version
# notes on Docker Hub will list the different tags
docker pull nginx:1.11.0-alpine # smaller version
# look at the number of stars and number of pulls
# images are designed around the idea of a union file system
docker image ls # lists images
docker image history neo4j # shows the layers that come together to make the image
# each layer of an image gets its own unique SHA, and images share layers
# you can see Dockerfile-style commands in here
# images share layers where possible: the idea of a cache of image layers
# when you run a container off an image, Docker creates a read-write layer on top of the image,
# and that’s all the container is
# copy on write means you can have multiple containers
# that each change config from an image, but the change is kept in the
# container
# this means these are not images, just layers, so they don’t need an image ID
docker image inspect neo4j # all metadata, env variables, ports etc.
# Tagging and uploading images
# tagging needs specific formatting
# images don’t have a name
docker image ls # no names, but shows us the repo (username/repo, or just repo)
# a tag isn’t quite a version or a branch, kind of in between: a pointer
# a tag might be 1.11.10 or 1.11.10-alpine
# you can add a tag to an image that you didn’t make
docker image push jgab3103/centos # create new repo etc.
---
title: "Docker for Data Scientists"
output: html_notebook
---



#![Hello docker whale!](dockerLogo.png)
<br/>

<i>Please note: still in draft, expected to be finalised in Jan 2019</i>
<br/>
<h4>Introduction</h4>
<div>The basic idea of Docker is that it gives you complete control over your tech stack. Maybe you want a particular version of Linux with really specific software versions installed. Or maybe, if you are one of those data science types, you always like to work with a particular version of RStudio or Jupyter Notebook and specific versions of libraries. Docker provides a way to create the exact setup you need, and then lets you easily pass this specification to other people, so you can be confident that all your stuff will just work on different machines.</div>

<br/>
<div>Many of the tutorials you will come across for Docker assume you are trying to set up complex, multi-server environments because you might want to deploy applications across different machines and networks. But for the data science types, the concerns are often a little different, and I will focus on how Docker can help you create data science environments that help with your workflow. At the end of the tutorial I cover some of that other stuff and can point you to further resources if you would like to go deeper (and I recommend you do, as Docker is just an amazing piece of tech).</div>
<br/>
<h4>First steps</h4>
<div>Now I am going to go ahead and assume you have Docker up and running. You will find that Docker is generally pretty plug and play on Mac, Linux and Windows. I am more of a Linux guy, but I mostly use Docker on my work machine, which runs Windows. I had to configure a couple of things to get that working, but there is lots of documentation out there on how to do this. There are also a <i>ton</i> of tutorials around to help you set it up. Note that you need admin rights on your machine to use Docker.</div>
<br/>
<div>Perhaps the simplest thing I can do to give you an intuition about Docker is to have you open a command shell (such as PowerShell on Windows, the Terminal on Mac, or Bash on Linux) and try something like the following to start up a container (and stay calm, I will explain what a container is in a moment):</div>
<br/>
<code>docker container run -p 80:80 nginx</code>
<br/><br/>
<div>Ok, let's now shut the container down. You should be able to just press <code>ctrl-C</code>. If you run into any issues, the easiest thing to do right now is to first run a command to list all the running containers (you will see a big long ID string). Then, run a second command to stop the running container. Note that if you just type the first few characters of the container ID, Docker will know which one you are talking about.</div>
<br/>
<code>docker container ls </code>
<br/>
<code>docker container stop [CONTAINER ID] </code>
<br/><br/>
<div>Let's try this again and start up a new container. But this time, we will give the container a name with the <code>--name</code> flag and also run it in detached mode with the <code>--detach</code> flag (<code>-d</code> for short).</div>
<br/>
<code> docker container run --publish 80:80 --detach --name webhost nginx </code>
<br/><br/>
<div>Close it down by running those close down commands above.</div>
<br/>
<div>Before we go much further, let's introduce a couple of key ideas from Docker: <strong>image</strong> and <strong>container</strong>. You can think of an image as a master copy of the stuff needed for a particular tech setup. Maybe you have an Ubuntu image which has all the libraries and binaries needed to run Ubuntu. Or maybe your image holds all the specs for a Debian Linux environment with React.js, Node.js and MongoDB installed.</div>
<br/>
<div>The people that created Docker also created this neat thing called Docker Hub, which is a repository of all kinds of different images. You can think of Docker Hub as a really cool place that holds a huge variety of master copies for all the different tech setups you want. It has images that hold Linux machines with the Python Anaconda stack, or an image that has the setup for an R Shiny Server, or an image that has a setup for Elasticsearch. You can grab them and use them, or change them and create your own images.</div>

<br/>
<div>A container is just an instance of an image. So let's say you get hold of the Ubuntu image from Docker Hub. Then you can run an instance of it. Or you might run a bunch of instances. To provide an intuition, imagine someone handed you a copy of the Ubuntu operating system on a CD or flash drive (an image). You could then install it on a whole bunch of desktop computers, and each of these desktops would be like a container.</div>
<br/>
<br/>
<div>Now that we have a little more context, let's break down what happened when I ran that last run command: </div>

<br/>
<code> docker container run --publish 80:80 --detach --name webhost nginx </code>
<br/>
<br/>

1. First, Docker tried to create a container from an image called nginx. It could not find it on my local machine, so it went looking for it on Docker Hub
2. Docker Hub had a look and found an image called nginx, and sent it across the network to my local machine
3. Now armed with the image, my local machine then set about creating an instance of it (called a container)
4. Along the way it set up some things based on the instructions I provided (<code>--publish</code>, <code>--detach</code> and <code>--name</code>). These tell Docker to forward my local port 80 to the container's port 80, to run it in detached mode (in the background), and to give it the name 'webhost'

<div>Now let's get a little more comfortable with using different images to create different containers. You can stop the containers as you did above, by listing them and stopping them. Note you can stop multiple containers by typing:</div> <br/><br/>
<code>docker container stop [FIRST CONTAINER ID] [SECOND CONTAINER ID] ...</code>
<br/>

<div>Ok, create and stop the following containers now:</div><br/>
<code>docker container run -p 3306:3306 --detach --name mysql -e MYSQL_RANDOM_ROOT_PASSWORD=yes mysql</code>
<br/>
<code>docker container run -p 8080:80 -d --name apachehttpd httpd</code>
<br/>
<code>docker container run -p 80:80 -d --name nginx nginx</code>
<br/><br/>

<div>There are also some neat ways to get hold of information about containers. Seeing what is happening in your container is just like running the Task Manager on Windows, or Activity Monitor on Mac, to see the kinds of processes that are running. There are different commands you can use to get different types of information:</div>
<br/>

<code>docker container logs [CONTAINER NAME]</code><br/>
<code>docker container top [CONTAINER NAME]</code><br/>
<code>docker container inspect [CONTAINER NAME]</code><br/>
<code>docker container stats [CONTAINER NAME]</code><br/>
<br/><br/>

<div>You might be starting to notice that Docker is full of handy ways to give you info about what is going on. You can give it all kinds of instructions such as <code>-p</code> and <code>-d</code>. Docker has really nice documentation for all of this, and if you put <code>--help</code> at the end of any command, you can access it.
</div>

<br/>
<code>docker container stats --help</code><br/>
<br/>

<div>Ok, so right now we still haven't done very much. But I hope you will at least agree that the idea behind all this does seem pretty cool for data scientists. I like the idea of being able to spin up different machines and customise the software in them for my data science needs. I can have a container with all my favourite Python libraries and a Jupyter notebook setup. I can have another one with R Shiny Server set up. I can have another one with graph DBs. And I can get all these to communicate with each other too (and we will get to all of this soon).</div>

<br/>
<h4>Next steps</h4>
<div>Given that these containers are kind of like machines we have spun up, a logical next question might be how to get into the machines to run commands. And of course you can do this. So close down all your running containers, and then run the following command to try this now: </div>

<br/>
<code>docker container run -it --name nginx nginx bash</code><br/>
<br/>
<div> 
 Much of this will be stuff we have seen before. The new thing though is <code>-it</code>. This tells Docker to start up a container and put us into it in interactive mode. The <code>bash</code> at the end tells Docker that we want to be dropped into a bash prompt. This image is based on Debian Linux, which has bash, so it's no problem.
</div>
<br/>
<div>
You will notice, though, that we didn't run this container with the <code>-d</code> flag. So it's not running in the background, and as soon as we jump out of the container (by pressing <code>ctrl-D</code> or just typing 'exit') the container itself just stops. It would make more sense if we could start a container, leave it running, and jump in and out of it as needed. So let's stop any running containers, and first start up a new container.
</div>

<br/>
<code>docker container run -d  --name myUbuntu ubuntu</code><br/>
<br/>
<div>We can see that it is running by doing a <code>docker container ls</code>. Then we can jump into this container using the <code>exec</code> command. Note that we could do this by using the name of the container that we created, or the container ID (or even the first few characters of the container ID)</div>
<br/>
<code>docker container exec -it ubuntu bash</code><br/>
<br/>
<div>Once you are in the container it is just the same as being on the Ubuntu command line. You can create directories, install things (in this case using apt-get, as it's Ubuntu Linux), and do anything you might do on a standard Linux machine.
</div>
<br/>
<div>
Now don't get fooled by a little gotcha here. The above calls <code>exec</code> and then passes it <code>bash</code>. So you need the bash shell to be available in the container. If I was running an Alpine Linux image (which does not have a bash shell, only sh), things would be a little different:
</div>
<br/>
<code>docker container run -d -it --name alpine alpine</code><br/>
<code>docker container exec -it alpine sh</code><br/>
<br/>



<h4>Dockerfiles</h4>


<div>Pretty soon you will want to customise some of the images that you are using. You might want a standard Debian image, but you also want to install a little extra software. You don't want to have to use the <code>exec</code> command to jump into a container and do all this set up by hand every time. And this is where a Dockerfile comes in. It is kind of like a shell script that has a bunch of simple commands that specify your image, and any extra set up you would like.</div>
<br/>
<div>The best way to provide an intuition around this is to start with a super simple example. Start by creating a new directory (call it DockerTest or something logical) and inside of that, create a new file and call it "Dockerfile". Note it doesn't need any kind of extension, it's just "Dockerfile". Now open it up in your text editor of choice (I mostly use Atom or Sublime Text these days but anything you have is fine) and type in the following: </div>

<code>
<br/>
#<br/>
# A first super simple example of a Dockerfile <br/>
#<br/>
<br/>FROM ubuntu:22.04
<br/>MAINTAINER Jamie Gabriel "email@someemail.com"
<br/>
<br/>RUN apt-get update
<br/>RUN apt-get install -y python3 python3-pip wget
<br/>RUN pip3 install Flask
<br/>EXPOSE 5000
<br/>ADD hello.py /home/hello.py
<br/>WORKDIR /home
<br/>CMD ["python3", "hello.py"]
</code>
<br/>
<br/>

<div>The syntax here is pretty straightforward. Let's take a walk through it. The <code>#</code> symbol makes whatever follows it a comment. The all-capital-letter words, things like <b>FROM</b>, <b>MAINTAINER</b> and <b>RUN</b>, are instructions that tell Docker how to build an image and what to add into it. </div><br/>

<div>The above Dockerfile starts with a comment, then uses the <b>FROM</b> instruction to specify a base image, which Docker will get from Docker Hub if it is not found on my local machine. <b>FROM</b> has to be the first instruction in the file. I have pinned the Ubuntu version here rather than using <code>latest</code>, so that the package names further down keep working. Next comes the <b>MAINTAINER</b> instruction, which lets me include a little metadata saying who is building and maintaining this Dockerfile (newer Dockerfiles tend to use <b>LABEL</b> for this, but <b>MAINTAINER</b> still works). </div>
<br/>
<div>Then it uses a series of <b>RUN</b> instructions, which run commands inside the image while it is being built. They tell the image (an Ubuntu image in this case) to do an <code>apt-get update</code> to refresh the list of available Ubuntu packages, then an <code>apt-get install</code> to get some software (in this case python3, python3-pip and wget), and then a <code>pip3 install Flask</code> to install the Flask web framework. The <b>EXPOSE</b> instruction indicates which port the application in the container listens on (in this case 5000, the Flask default). <b>ADD</b> copies <code>hello.py</code> from the directory containing the Dockerfile into the image, so that file needs to exist before you build. <b>WORKDIR</b> sets <code>/home</code> as the working directory, and <b>CMD</b> sets the command that runs when a container is started from the image, here starting the Flask app.</div> 
<br/>
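<div>The Dockerfile adds a file called <code>hello.py</code>, but its contents are not shown in this tutorial. As a minimal sketch, assuming it is a one-route Flask app (the route and the message are made up for illustration), you could create it from the shell like this, in the same directory as the Dockerfile:</div>

```shell
# Write a hypothetical hello.py next to the Dockerfile.
# The app contents are an assumption; only the file name comes from the Dockerfile.
cat > hello.py <<'EOF'
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello from inside a Docker container!"


if __name__ == "__main__":
    # Listen on all interfaces so the app is reachable through a published port
    app.run(host="0.0.0.0", port=5000)
EOF
```

<div>The <code>host="0.0.0.0"</code> part matters: if Flask only listens on localhost inside the container, requests forwarded from your machine never reach it. If your image does not start the app automatically, you can jump into the container with <code>exec</code> and run the file with the Python interpreter installed there.</div>
<br/>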
<div>Once done, we have taken a base image of Ubuntu, customised it to our own needs, and kept all this in a handy config script. Let's now build this image by doing the following:</div> 

<br/>
<code>docker image build -t mycustomimage . </code>
<br/>
<br/>
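<div>Note the use of the <code>.</code> in the above command. This tells Docker to use the current directory as the build context and to look in it for a file with the name "Dockerfile". If your Dockerfile has a different name or lives somewhere else, you can point to it with the <code>-f</code> flag, for example <code>docker image build -t mycustomimage -f Dockerfile . </code> 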
</div>
<br/>
<div>
Now you can run a container from this customised image in the way we would expect, as follows:
</div>
<br/>
<code>docker container run -p 5000:5000 mycustomimage </code><br/>
<br/>
<div>
Note another little gotcha here. Even though we declared port 5000 in the Dockerfile, we still needed to publish it on the command line with the <code>-p</code> flag. This is a quirk of Dockerfiles: the instruction for declaring a port (<b>EXPOSE</b>) doesn't actually publish anything, so you can think of it as documentation, there to tell anyone using the image which port they will need to pass to <code>-p</code>. Just accept it as a weird quirk. There is a neat way to get around this with docker-compose that I will explain later. 
</div>
<br/>
<div>There is great documentation on the keywords used in Dockerfiles, and I encourage you to check it out <a href="">here</a>. Even better, get hold of some Dockerfiles (like this one, this one, or this one) and play around with them. Once you spend a few days getting used to creating Dockerfiles and customising them to your needs, you will really start to see the benefits of how this might help you do data science.</div>


<br/>
<h4>Interacting with your local file system</h4>
<div>So we can do quite a bit now. We can build images. We can create containers from them and go into those containers and run commands. We can build custom images to suit our needs using Dockerfiles. As a data scientist, this gives me the control to have any kind of tech stack I can think of set up and ready to go. </div>
<br/>
<div>
But there are still some other things we really want to be able to do here. Right at the top of the list is the ability to interact with my local files. Imagine this scenario: I have a bunch of .R files on my local machine, but I want to use a particular version of RStudio Server (like the image here). I want this container to be able to see my files, and make changes to them.</div>
<br/>

<div>This question is really related to how you deal with persistent data. You want to be able to spin up containers as needed, throw them away, and grab new ones at any time. But you don't want to do that with your data. To deal with this Docker provides some really powerful solutions. The first solution is <strong>volumes</strong>. These are created alongside containers to hold the data a container creates. When you throw out a container, its volumes are still around, and you can check them out with the following: 
<br/>
<code>docker volume ls</code>
</div>

<div>
The second solution, which is far more handy for data science, is <strong>bind mounting</strong>. This lets you access local files and directories from inside a Docker container, and it is the one you will reach for most often in data science setups. 
</div>
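
<div>To make the RStudio scenario above a little more concrete, here is a minimal sketch of a bind mount. It assumes the <code>rocker/rstudio</code> image from Docker Hub (this tutorial does not name a specific image), a placeholder password, and a made-up folder name inside the container. Run it from the directory that holds your .R files:</div>

```shell
# Run from the directory that holds your .R files.
# Assumptions: the rocker/rstudio image; the password and the
# target folder name (my-r-files) are placeholders for illustration.
docker container run -d --name rstudio \
  -p 8787:8787 \
  -e PASSWORD=changeme \
  -v "$(pwd)":/home/rstudio/my-r-files \
  rocker/rstudio
```

<div>If the image behaves as its documentation describes, RStudio Server should then be reachable in your browser on port 8787 of your machine (user <code>rstudio</code>, with the password you set), and any file you save in the mounted folder shows up in your local directory straight away.</div>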






<h4>Using docker-compose for bind mounts and ports</h4>


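<div>Bind mounting is what makes Docker useful for local development. A bind mount maps a file or directory on the host to a file or directory in the container, so that both locations point at the same files. If the container already has something at that path, the host files win for as long as the bind mount is in place. Because a bind mount depends on a path on your particular machine, you can't specify one in a Dockerfile. Instead you set it up when you run the container, using the <code>-v</code> flag with a host path and a container path separated by a colon, something like <code>... run -v //c/users/bret:/path/container</code> on Windows.</div>
<br/>
<div>To try it out, open a terminal in a directory with some web files in it (the course sample I worked from was called dockerfile-sample-2, and note that its Dockerfile has no volumes in it) and start an nginx container with that directory mounted where nginx looks for its web pages:</div>
<br/>
<code>docker container run -d --name nginx -p 80:80 -v $(pwd):/usr/share/nginx/html nginx</code><br/>
<br/>
<div>Now open another terminal, edit a file on the host, and check it from inside the container. Then go into the container, edit the file there, and check it on the host. You can edit from both places, because they really are the same files. Now think about complicated dev setups: this solves all of that! Note that you can also look at the logs of the container while you are doing your development, using <code>docker container logs nginx</code>.</div>
<br/>
<div>The idea is the same whenever you want to mount data or code into a container. A nice example to practise on is Jekyll, the static site generator behind GitHub Pages. From a directory containing a Jekyll site (the sample directory I used was called bindmount-sample-1), run the following to start the Jekyll server:</div>
<br/>
<code>docker container run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve</code><br/>
<br/>
<div>With the server running, edit a file in the <code>_posts</code> directory on the host, then refresh your browser to see the change.</div>
<br/>
<div>Typing out the <code>-p</code> and <code>-v</code> flags every time gets old quickly. This is where docker-compose helps: it makes both bind mounts and ports simpler, because you write them down once in a file instead of retyping them on the command line.</div>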
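
<div>As a rough sketch of what that looks like, the Jekyll command from the notes above can be written as a <code>docker-compose.yml</code> file. The image, port mapping and mount path are taken from that command; the service name <code>jekyll</code> is made up for illustration, and the <code>version</code> line is only there for older docker-compose releases (newer ones ignore it with a warning).</div>

```shell
# Write a docker-compose.yml in the directory that holds the Jekyll site.
# The service name "jekyll" is hypothetical; the image, ports and mount
# mirror the docker container run command used for Jekyll.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  jekyll:
    image: bretfisher/jekyll-serve
    ports:
      - "80:4000"
    volumes:
      - .:/site
EOF
```

<div>From the same directory, <code>docker-compose up -d</code> starts the container with the port and bind mount already set, and <code>docker-compose down</code> stops and removes it again. Note that the bind mount can use a relative path (<code>.</code>) here, so there is no need for <code>$(pwd)</code>.</div>
<br/>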
<h4>Going deeper into Docker</h4>
<div>So at about this point, as a data science type, you should have enough to go forward. You really have everything you need to take control of your data science stack. Download different images and check them out. But of course Docker has some other huge features, things like <strong>networks</strong>, <strong>docker-compose</strong> and <strong>docker swarm</strong>.</div>
<br/>
<hr/>

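<h4>Volumes</h4>
<div>A volume can be declared in a Dockerfile. To see an example, find the official mysql image on Docker Hub and look at the Dockerfile for the latest version. It contains the line <code>VOLUME /var/lib/mysql</code>, which tells Docker to create a volume and mount it at that directory in the container. Note that volumes need to be deleted manually; removing a container does not remove its volumes.</div>
<br/>
<div>Pull the image and inspect it, and you can see the volume in its configuration:</div>
<br/>
<code>docker pull mysql</code><br/>
<code>docker image inspect mysql</code><br/>
<br/>
<div>Now start a container from it. Inspecting the container will show the volume, along with where it lives on the host (which on Windows and Mac is really inside a Linux VM):</div>
<br/>
<code>docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql</code><br/>
<code>docker container inspect mysql</code><br/>
<br/>
<div>You can also list the volumes and inspect any one of them by name:</div>
<br/>
<code>docker volume ls</code><br/>
<code>docker volume inspect &lt;volume name&gt;</code><br/>
<br/>
<div>Consider what happens if you start a second mysql container: you will now see two volumes. And when you stop the containers, the volumes are still there. The catch is that each volume is identified only by a long ID, so it is hard to tell which one belonged to which container.</div>
<br/>
<div>This is where <strong>named volumes</strong> come in. With the <code>-v</code> flag you pass a pair of values separated by a colon, a bit like a key and value in a dict: the name you want for the volume, then the path in the container.</div>
<br/>
<code>docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql mysql</code><br/>
<code>docker volume ls</code><br/>
<code>docker volume inspect mysql-db</code><br/>
<br/>
<div>This is much easier to work with. There are a couple of cases where you need to create a volume ahead of time with <code>docker volume create</code>, but that is a topic for later. The take home: create your containers with named volumes.</div>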
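
<div>The notes above say that volumes need to be deleted manually, but stop short of showing how. Here is a minimal sketch, assuming the container name <code>mysql</code> and the volume name <code>mysql-db</code> used above. Be careful: removing a volume deletes the data in it.</div>

```shell
# Stop and remove the container first; a volume that is still
# attached to a container cannot be removed.
docker container rm -f mysql

# Remove the named volume (this deletes the database files in it)
docker volume rm mysql-db

# Clean up volumes that no container is using. Docker asks for
# confirmation, and depending on your Docker version this may
# only cover anonymous (unnamed) volumes.
docker volume prune
```

<br/>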


<h4>Networks in Docker</h4>
<div>There is a super cool thing in Docker called <strong>docker-compose</strong> that lets you spin up multiple containers at once, and it becomes really handy in some situations. But to get into this, we need to talk about some other aspects of Docker first! So let's shift gears a little and talk about networks in Docker.</div>
<br/>
<div>Because Docker is not just about images and containers, it's about networks too. Docker lets us spin up any kind of containers, and then gives us a way to set these containers up on networks so the containers can communicate with each other and the outside world. Docker is geared up to make working with networks easy, but it's also completely customisable so you can do really advanced setups.</div>
<br/>
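<div>We have already met the <code>-p</code> flag of <code>docker container run</code>, which publishes a container's ports. To follow what is really going on there, it helps to know a little about TCP/IP networking. Docker's approach is "batteries included, but removable": the defaults work out of the box, but you can do huge amounts of customisation.</div>
<br/>
<div>Here are the defaults. Every container is connected to a private virtual network, which is called "bridge" unless you say otherwise. Each virtual network routes through a NAT firewall on the host's IP address. Containers on the same virtual network can talk to each other without <code>-p</code>. What <code>-p</code> does is expose a container's port on the host, and so on the physical network. All of these settings are customisable: you can attach a container to multiple virtual networks, you can skip virtual networks altogether and use the host IP (<code>--net=host</code>), and you can use different network drivers.</div>
<br/>
<div>Let's start a container and see which ports are forwarding traffic to it:</div>
<br/>
<code>docker container run -p 80:80 --name webhost -d nginx</code><br/>
<code>docker container port webhost</code><br/>
<br/>
<div>The container does not use the same IP address as the host. To get the container's IP, or to get all the specs of the container, use <code>inspect</code>. You will typically see an address starting with 172.</div>
<br/>
<code>docker container inspect --format '{{.NetworkSettings.IPAddress}}' webhost</code><br/>
<code>docker container inspect webhost</code><br/>
<br/>
<div>So the picture is this. The host is connected to the physical network through a firewall, which blocks incoming traffic, and any outgoing traffic is NATed. Inside the host there is a virtual network (called bridge, or docker0). A new container connects to that virtual network, and the virtual network connects to your computer's address. The <code>-p 80:80</code> means any traffic arriving on port 80 of the host is passed through the virtual network to port 80 of the container. A second container can be set up on the same network and talk to the first one freely, but it still needs a published port to be reachable from outside. You can also create another network, with its own containers and its own published ports taking traffic from outside. Containers on separate networks can then reach each other through the ports published on the host, or you can connect a container to both networks.</div>
<br/>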
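<div>A practical note first: anywhere I do a <code>docker container run &lt;stuff&gt; nginx</code>, where nginx is the image, you should replace that with <code>nginx:alpine</code>, which still has the ping command in it.</div>
<br/>
<div>The default network is called bridge, and it is the one that routes through the NAT firewall. You can create a new virtual network of your own as follows. It will use the default driver, which is also called bridge and is nothing too advanced. The help shows the other options.</div>
<br/>
<code>docker network create my_app_net</code><br/>
<code>docker network create --help</code><br/>
<br/>
<div>You can create a container and connect it to this network in one go:</div>
<br/>
<code>docker container run -d --name new_nginx --network my_app_net nginx</code><br/>
<br/>
<div>You can also connect a container that already exists, using <code>docker network connect &lt;network&gt; &lt;container to connect&gt;</code>. So let's create another container, connect it to the same network, and then inspect the network to see both containers on it:</div>
<br/>
<code>docker container run -p 8080:80 -d --name apachehttpd httpd</code><br/>
<code>docker network connect my_app_net apachehttpd</code><br/>
<code>docker network inspect my_app_net</code><br/>
<br/>
<div>You can disconnect a container again too, using its name or the first few characters of its ID:</div>
<br/>
<code>docker network disconnect my_app_net apachehttpd</code><br/>
<br/>
<div>All of this gives you lots of options, and it makes your setup safer: you can keep containers on their own private networks and only publish the ports that really need to be reachable from outside.</div>
<br/>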

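<div>So how do containers find each other? You can't rely on IP addresses, because containers come and go and their addresses are dynamic. Instead, Docker uses DNS, with container names as the host names containers use to talk to each other. This makes naming crucial. Networks that you create yourself get automatic DNS. So I can start two containers on the same network (using <code>nginx:alpine</code> here, because it includes ping):</div>
<br/>
<code>docker container run -d --name nginx --network my_app_net nginx:alpine</code><br/>
<code>docker container run -d --name nginxTwo --network my_app_net nginx:alpine</code><br/>
<br/>
<div>Then I can pop into one and ping the other by name:</div>
<br/>
<code>docker container exec nginx ping -c 3 nginxTwo</code><br/>
<br/>
<div>This solves the problem of dynamic addresses. Note that to do this you need to create a new network, as the default bridge network doesn't have automatic DNS.</div>
<br/>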
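<div>Here is an exercise to try: start two Linux containers on the same network and get them to talk to each other. Note that you need to run Debian or Ubuntu images with <code>-it</code> and <code>bash</code>, for example:</div>
<br/>
<code>docker container run -it --name debian --network my_app_net debian bash</code><br/>
<br/>
<div>A second thing to try is round robin DNS. Several containers can share the same network alias, and lookups for that alias are spread across them. First create a network, then create two containers on it with the same alias:</div>
<br/>
<code>docker network create dude</code><br/>
<code>docker container run -d --net dude --net-alias search elasticsearch:2</code><br/>
<code>docker container run -d --net dude --net-alias search elasticsearch:2</code><br/>
<br/>
<div>Now look up the alias from another container on the same network, and then send it a few requests. Run the second command several times and you should see the responses coming from both containers.</div>
<br/>
<code>docker container run --rm --net dude alpine nslookup search</code><br/>
<code>docker container run --rm --net dude centos curl -s search:9200</code><br/>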





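<h4>Images and Docker Hub</h4>
<div>An image is the application binaries and dependencies, plus the metadata about how to run them. There is no complete OS in an image, just the binaries the application needs (the kernel comes from the host). This is one reason a container is not a virtual machine.</div>
<br/>
<div>You can get images from Docker Hub. Log in and you can see all the images. Use the popular images first, and start with the official ones. They are marked as official, and you can also spot them because there is no forward slash in the name (nginx rather than username/nginx). Official images are looked after at Docker and have good docs. For other images, look at the number of stars and the number of pulls.</div>
<br/>
<div>Note that there are different versions of each official image, identified by tags. The <code>latest</code> tag is special: it is what you get if you don't ask for anything else. Make sure you control the version you are using by specifying a tag (which might, of course, be the same as latest right now). The image's page on Docker Hub lists the tags that are available. Many images also offer a smaller Alpine-based version:</div>
<br/>
<code>docker pull nginx:1.11.9</code><br/>
<code>docker pull nginx:1.11.0-alpine</code><br/>
<br/>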
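<div>Images are designed around the idea of a union file system: an image is a stack of layers. You can list your images, and then look at the layers that come together to make one up. In the history you can see the Dockerfile-style commands that created each layer.</div>
<br/>
<code>docker image ls</code><br/>
<code>docker image history neo4j</code><br/>
<br/>
<div>Each layer of an image gets its own unique SHA, and images share layers where possible, so a layer only needs to be stored once. This is the idea of a cache of image layers. The rows marked <code>&lt;missing&gt;</code> in the history are not images in their own right, just layers inside the image, so they don't need an image ID.</div>
<br/>
<div>When you run a container off an image, Docker creates a read-write layer on top of the image, and that is all the container is. This works by copy on write, meaning you can have multiple containers running off the same image, and if one of them changes a config file that came from the image, the change is kept in that container and the image is untouched.</div>
<br/>
<div>To see all the metadata for an image (environment variables, ports and so on), use inspect:</div>
<br/>
<code>docker image inspect neo4j</code><br/>
<br/>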
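<div>The last piece is tagging and uploading images. Tagging needs specific formatting, because images don't really have names. If you run <code>docker image ls</code> you will see there is no name column. What it shows is the repository (username/repo, or just repo for official images) and the tag. A tag isn't quite a version or a branch, it is kind of in between: a pointer to a specific image. A tag might be 1.11.10 or 1.11.10-alpine. You can add a tag to an image that you didn't make, using <code>docker image tag</code>, and then push it to your own account on Docker Hub, which creates a new repo there if it doesn't exist yet:</div>
<br/>
<code>docker image push jgab3103/centos</code><br/>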
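
<div>Putting the tagging and uploading steps together, a minimal sketch of the whole round trip might look like the following. It assumes the Docker Hub account <code>jgab3103</code> from the push command above (swap in your own username) and that the official <code>centos</code> image is still available to pull.</div>

```shell
# Get an image that someone else made
docker pull centos

# Add a second tag pointing at the same image, under your own account.
# With no tag after the repo name, Docker uses "latest".
docker image tag centos jgab3103/centos

# Both repositories now appear in the list, sharing one image ID
docker image ls

# Log in to Docker Hub, then upload; the repo is created if needed
docker login
docker image push jgab3103/centos
```

<br/>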


