R and GitHub

In the previous section, we went through the somewhat laborious steps of setting up your computer to talk to GitHub. We will now look at how to integrate this with R/RStudio, how to set up a R project with GitHub, and how to clone an existing repository.

Setup SSH

To avoid having to type in your GitHub password every time you make a request (commit, clone, etc), we can set up an SSH key pair. The key pair consists of a public and private key. The public key can be given to any server you want to connect to. When this is paired with a client that has the private key (your computer), the server unlocks without need for a password.

To do this, we need to

There are a variety of ways of doing this, but the easiest is to rely on built-in tools in RStudio, so start this up now.

First, it’s worth checking to see if there is already an existing key pair. Go to [Tools] > [Global Options] > [Git/SVN]. If you see something like ~/.ssh/id_rsa in the SSH RSA Key box, you have existing keys that can be used, and you can skip the next section. If you don’t see anything, but you know that there should be a key already present, you can check this by looking at the contents of the .ssh directory on your computer:

ls -al ~/.ssh/

If you see a different key pair present, you can use these later on.

If not, you can generate a key pair by clicking on the [Create RSA Key…] button. RStudio will prompt you for a passphrase, but it is probably easiest to skip this for now and add it later. Click [Create] and RStudio will generate an SSH key pair, stored in the files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub (and the first of these should be shown in the SSH RSA Key box.

Now let’s add the private key to the local ssh agent. You will need to do this from a terminal. Again, the easiest way is to open the terminal in RStudio (go to [Tools] > [Terminal] > [New Terminal]) to open a terminal in the console panel. Here, first type the following to check that ssh-agent (the local SSH client) is running:

eval "$(ssh-agent -s)"

And you should see a process ID like this:

# Agent pid 95727

Now add the key you have created using the following command:

ssh-add ~/.ssh/id_rsa

Note that if your key pair has a different name, you will need to use that here.

Finally, we need to add the public key to GitHub. Go to [Tools] > [Global Options] > [Git/SVN]. If your key pair has the usual name, id_rsa.pub and id_rsa, RStudio will see it and offer to “View public key”. Do that and accept the offer to copy to your clipboard. If your key pair is named differently, open it using a text editor and copy the contents to your clipboard.

Now open a browser, go to GitHub and sign in. Click on the profile picture in the top right hand corner and go to [Settings] > [SSH and GPG keys]. Click [New SSH key] and then paste what you just copied into this box. Give it a title (this can be anything you like) and click [Add SSH key].

And that’s it! If you’ve made it this far, everything should be in place for you to connect to GitHub directly as you use R. To test the connection, go back to the terminal on your computer and enter the following:

ssh -T git@github.com

And you should see a confirmation that the RSA host key has been permanently added.

RStudio and GitHub

Finally, we can now connect up RStudio with GitHub. There are a couple of ways of doing this, but the following method is pretty reliable.

1. Make a new repository on GitHub

We covered this briefly in the previous instructions, but let’s do this again. Go to GitHub, make sure you are logged in and go to [Repositories]. Click the green button to make a new repository. Call this myrrepo, and add a short description (something like testing RStudio setup). Make sure the check-box to initialize the repository with a README is checked and click [Create repository]. Once created, click the button [Clone or download], and copy the HTTPS URL (if you click on the little clipboard icon this will get copied to your clipboard).

2. Make a new RStudio project

To connect this to RStudio, we need to make an RStudio project. This is a little different from the regular R scripts, and acts like a git repository to store a bunch of files in a common setting. Open RStudio, and make a new project from [File] > [New Project]. Click on [Version Control], then [Git] (Clone a project from a Git repository).

On the next screen, add the URL you copied from GitHub to the box labeled [Repository URL]. For the next box, accept the given project directory name (this should be the same as the GitHub repository). The final box gives the directory where the repository will be cloned to on your computer. Make sure this is ok, and that the check-box [Start in new session] is checked. Now click [Create project].

RStudio will appear to restart, launching a new local RStudio Project that connects to the repository on GitHub. It will download the README file, and this should appear in the file browser panel.

If all this has worked, congratulations - you have successfully linked RStudio to your GitHub account, and can now make changes and push these to GitHub. Note that nearly all these instructions only need to be carried out once. The only part of this you will do routinely is the last part: creating new projects for different bits of work you are carrying out.

3. Start making changes to your project

Let’s try and make some changes. First, make sure the README.md file is open in the files panel. If not, click on the file browser, which will show you the contents of the project directory, and double click the README.md file. Add a small amount of text to this (I’ll leave this up to you).

If you now look at the RStudio panel with the Environment, History and Connections tabs, you should see a new one labelled ‘Git’. Click on this and you should see your READ.md file with a small blue ‘M’ next to it. This indicates that this file has been modified and is different from the version downloaded from GitHub. If you’re happy with the changes you made, click on the check-box next to the ‘M’ (under the heading ‘Staged’) to stage these changes for uploading.

Now click the [Commit] button. A new window will open showing a) all the files; b) the staged files with a check mark; c) a blank panel to add a commit message; d) a display of which lines have been changed in the currently highlit file (green for additions, red for deletions). Check that your file is still listed as ‘Staged’ and add a commit message; a short description of the changes. Once this is done, click [Commit] to commit your changes. This effectively creates an updated local version of the repository, including the staged changes. However, if you check your GitHub repository, and look at the README file there, you will see that the changes have not yet been uploaded. To do this, we need to push the changes, so click this button – you should see a short message detailing the upload. Go back to your browser and recheck the README file and you should see your changes appear (refresh the browser page if they are not there).

Now let’s add an R script. In RStudio, click [File] > [New File] > [R Script] to open a blank R script. Add the following R code to this script:

x = rnorm(100, mean = 20, sd = 5)
y = x + rnorm(100, mean = 0, sd = 2)

fit = lm(y ~ x)

summary(fit)

This simply generates some correlated random numbers and builds a linear model between them, and prints out the results of that model. Save the script file as ‘lmExample.R’ and then click ‘Run’ from the scripting window. This will run all the code and allow you to check for mistakes. If all is ok, go through the same steps to commit the changes:

  • Click on [Commit] from the ‘Git’ tab
  • Stage the files you want to push
  • Add a commit message
  • Push to GitHub
  • Check that your changes made it

Let’s add a couple more changes. First, we’ll add code to plot the data, and the model fit to the end of the file:

plot(x,y)
abline(fit, col=2)

Run to check your code (and the beautiful plot you get), and then commit and push your changes.

Finally, get the Mauna Loa csv file from the slack channel and add it to the project directory. If you’ve forgotten where this is, typing the following command will print R’s current directory

getwd()

Now if you check your Git tab, you should now see this file, with a small ‘A’ to indicate this is a new file added to the repository. Open a new script to load and plot this

co2 = read.csv("co2_mm_mlo.csv")
plot(co2$decdate, co2$interpolated, type='l', main="Mauna Loa CO2",
     xlab="Time", ylab="ppm")
lines(co2$decdate, co2$trend, col=2, lwd=2)

Save and run your script, and if it all works, commit and push this to GitHub. Finally, go back to your browser and check that everything has uploaded correctly.

Deleting GitHub repositories

If you need to delete a git repository, you will need to both delete the local copy and the one hosted by GitHub. The local copy can be deleted like any normal folder on your computer. The remote copy can be deleted by going to the repository page on GitHub, clicking on [Settings], scrolling all the way down to the Danger Zone1, and selecting [Delete this repository].


  1. Sterling Archer approves of GitHub