Source file ⇒ lec38.Rmd

Today

  1. Announcements
  2. Practice: Git/GitHub with RStudio
  3. Theory: Version control with Git/GitHub

0. Announcements

  1. Remember to complete course evaluations (important).
  2. Lab this week: final exam review.
  3. I will be out of town Wednesday and Friday, but I will have access to email.
  4. I will send information about presentations soon.

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

Renaming files (e.g., report_final.Rmd, report_final2.Rmd) isn’t an effective form of version control.

Git is a version control system that lets you save snapshots of your code throughout the entire development process. GitHub is a hosting service: you push your code from your local repository to GitHub, where it is stored online.

Git is a distributed version control system (GitHub acts as an optional central server). Each user has a complete repository on their local computer. Users can clone each other’s repositories or clone a central repository. Picture:

1. Practice: Git/GitHub with RStudio

In lecture 37 (steps 1-4, http://rpubs.com/alucas/173612) you made a GitHub account, installed Git on your computer, and configured RStudio to use Git.

Assuming all of this is working, we are now going to use Git/GitHub with RStudio.

Here is a good YouTube video on how to use GitHub with RStudio:

https://www.youtube.com/watch?v=uHYcDQDbMY8

Hadley Wickham has also written a very complete resource on Git and GitHub with RStudio: http://r-pkgs.had.co.nz/git.html

Creating a new project based on a remote GitHub repository

In R Studio:

  1. Execute the New Project command (from the Project menu)
  2. Choose to create a new project from Version Control
  3. Choose Git
  4. Enter the Repository URL from GitHub
  5. Project Directory Name should match the ending of the Repository URL
  6. Click on Open in new window (circled below)

Above, I have circled my repository URL.

The remote repository will be cloned into the specified directory and RStudio’s version control features will then be available for that directory.

For this lecture, let’s call the project directory myProj. This creates a local repository on your computer in a folder named myProj.
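Under the hood, creating a project from version control clones the remote repository. Here is a minimal sketch of the command-line equivalent, assuming Git is installed and on your PATH and using Python's standard subprocess module only to run it; the helper name clone_project and the URL in the usage comment are hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def clone_project(repo_url: str, project_dir: str) -> Path:
    """Clone repo_url into project_dir, roughly what RStudio's
    New Project -> Version Control -> Git dialog does for you."""
    # "git clone" copies the full history, so project_dir becomes a
    # complete local repository, not just a copy of the latest files.
    subprocess.run(["git", "clone", repo_url, project_dir], check=True)
    return Path(project_dir).resolve()


# Placeholder URL: substitute your own repository's HTTPS URL.
# clone_project("https://github.com/<username>/myProj.git", "myProj")
```

Because git clone copies the entire history, the new folder is a full repository of its own, which is what makes Git distributed.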

Open an existing project

  • Open an existing project
    • Click File -> Open Project

Alternatively, you can click on the project you wish to open using the drop down menu circled below.

If you already have a tab labeled Git next to the Environment and History tabs, skip the instructions below.

  • Enable Git for this project
    • Click Tools -> Version Control -> Project Setup
    • Click the dropdown box Version control system and select Git
    • If you don’t have a Git option, go back to lecture 37, part 4 (Open RStudio and set path to Git executable).

Create and commit a file

  • Make your first commit
    • Create a new Rmd file
    • Click File -> New File -> R Markdown
    • Edit the file and change the title
    • Save the file
    • Check Staged (the Status changes from “? ?”, meaning untracked, to “A”, meaning added)
    • Click Commit, enter a commit message in the window that opens, and click Commit again
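For reference, the Staged checkbox and the Commit button correspond to git add and git commit. A minimal sketch, assuming Git is on your PATH and your name and email are configured as in lecture 37; the helper names below are hypothetical.

```python
import subprocess


def git(repo_dir: str, *args: str) -> str:
    """Run one git command inside repo_dir and return its text output."""
    result = subprocess.run(["git", *args], cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout


def short_status(repo_dir: str) -> str:
    # "??" marks an untracked file, "A" a file staged to be added.
    return git(repo_dir, "status", "--short")


def stage_and_commit(repo_dir: str, path: str, message: str) -> str:
    """Stage one file (the Staged checkbox), commit it with a message
    (the Commit button), and return the new commit's hash."""
    git(repo_dir, "add", path)
    git(repo_dir, "commit", "-m", message)
    return git(repo_dir, "rev-parse", "HEAD").strip()
```

Calling short_status before and after git add shows the same “? ?” to “A” change that RStudio displays in the Status column.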

Push your commit to GitHub

  • Click on Push (to the right of Commit).
  • Refresh your repository’s page on GitHub to see your newly committed file.
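The Push button corresponds to git push. In this sketch, origin is Git's default name for the repository you cloned from (an assumption if you set up your remotes differently), and pushing HEAD sends whichever branch is checked out to the branch of the same name on GitHub; push_current_branch is a hypothetical helper.

```python
import subprocess


def push_current_branch(repo_dir: str, remote: str = "origin") -> None:
    """Upload local commits on the current branch to the same-named
    branch on `remote` (what RStudio's Push button does)."""
    # "HEAD" here means "whatever branch is checked out", so this works
    # whether the branch is called master or main.
    subprocess.run(["git", "push", remote, "HEAD"], cwd=repo_dir, check=True)
```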

Make a change and revert it

  • Make an erroneous change to the file and save it
  • Click Diff and then Revert
    • The erroneous change has been undone and the previous version restored
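Revert discards uncommitted edits. Here is a sketch of a command-line equivalent, assuming the erroneous change has not been staged (git checkout -- restores the staged version, which is the last committed one unless you staged something); discard_changes is a hypothetical name.

```python
import subprocess


def discard_changes(repo_dir: str, path: str) -> None:
    """Throw away unstaged edits to `path` (like Diff -> Revert).
    Warning: the discarded edits cannot be recovered."""
    # "git checkout -- <path>" copies the staged version back over the file;
    # with nothing staged, that is the last committed version.
    # Git 2.23 and later also accept the equivalent "git restore <path>".
    subprocess.run(["git", "checkout", "--", path], cwd=repo_dir, check=True)
```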

Delete a file

  • Create a new file named doomed.Rmd
    • Enter some text and save it
    • Stage it, commit it, and push it to GitHub
  • Delete this doomed file
    • Under the Files tab check the box next to doomed.Rmd
    • Click Delete
    • Under the Git tab, a red D appears next to the deleted file
    • Stage the change by clicking the checkbox and commit it
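Deleting a file, staging the deletion, and committing can also be done with git rm followed by git commit. A sketch under the same assumptions as before (Git on your PATH, identity configured); delete_and_commit is a hypothetical helper.

```python
import subprocess


def delete_and_commit(repo_dir: str, path: str, message: str) -> None:
    """Delete a tracked file, stage the deletion, and commit it."""
    # "git rm" removes the file from disk and stages the deletion in one
    # step (RStudio shows this as a red D until you commit).
    subprocess.run(["git", "rm", "--quiet", path], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "--quiet", "-m", message],
                   cwd=repo_dir, check=True)
```

The file disappears from later snapshots, but the commit that added it is still in the history, so an old version can be recovered if needed.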

Pushing to and Pulling from GitHub

  • Click on one of your pushed files on GitHub.
  • Edit the file (look for the pencil icon).
  • Select Commit directly to the master branch.
  • Click Commit Changes.
  • In RStudio, click on Pull to bring GitHub’s new commit into your local repository.
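Pull corresponds to git pull, which fetches new commits from GitHub and merges them into your current branch. A minimal sketch; the --no-rebase flag asks for a merge explicitly, since recent Git versions may refuse to pull diverged histories until you choose between merging and rebasing. The helper name pull is hypothetical.

```python
import subprocess


def pull(repo_dir: str) -> None:
    """Download new commits from the tracked remote branch and merge them
    into the current branch (RStudio's Pull button)."""
    # --no-rebase requests a merge; when your branch has no new local
    # commits, Git simply moves it forward to the remote's latest commit.
    subprocess.run(["git", "pull", "--no-rebase"], cwd=repo_dir, check=True)
```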

Single remote with shared access

For this exercise, you are going to set up a shared collaboration with one partner (the person sitting next to you). For the sake of this exercise, we will call the two partners Alice and Bob. We will be using Alice’s GitHub account. This shows the basic workflow for collaborating on a project with a small team in which everyone has write privileges to the same public GitHub repository.

Task for You: synchronization example

  1. Choose who is Alice and who is Bob.
  2. Alice: make Bob a collaborator on your Stat 133 GitHub repository (go to the repository’s Settings, then Collaborators).
  3. Bob: clone Alice’s Stat 133 GitHub repository (in RStudio: File, New Project, Version Control, and give the HTTPS URL of Alice’s repository).
  4. Bob: make a change to one of the files in the cloned Stat 133 repository.
  5. Bob: stage, commit, and push your modified file to Alice’s GitHub repository.
  6. Alice: pull Bob’s changes into your local repository.

Next, each partner will make a non-conflicting change and commit it locally. Then both will try to push their changes:

Task for You: Merging example

  1. Alice: add a new file, alice.Rmd, to your repository and commit it.
  2. Bob: add a new file, bob.Rmd, and commit it.
  3. Alice: push to GitHub.
  4. Bob: push to GitHub.

What happened? Read the error message and hint provided by git to see if you can figure it out.

Answer:
Because Alice and Bob are working on the same branch of the repository, Bob’s push is rejected: his local branch doesn’t have Alice’s most recent commit, but the branch on GitHub does, since Alice already pushed it. To push a branch to a remote, your local repository must already contain the entire history of that branch on the remote. The solution is for Bob to pull first and then push again. When Bob pulls from GitHub, Git merges Alice’s history into his local branch (automatically here, because the two new files don’t conflict). Now when he pushes, his repository contains all the history that the repository on GitHub has.
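The rule in the answer can be modeled in a few lines. This is a toy sketch, not Git's implementation: commits are hypothetical string IDs (base, alice1, bob1, merge1) mapped to their parents, and a plain push is accepted only if the remote branch's current commit is already reachable from the commit being pushed (a fast-forward).

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """All commits reachable from `start` by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def push_allowed(parents: Dict[str, List[str]], local_tip: str, remote_tip: str) -> bool:
    """A plain push is a fast-forward only if the remote branch's current
    commit is already part of the local history."""
    return remote_tip in reachable(parents, local_tip)


# Bob's local history before pulling: he has never seen Alice's commit.
bob = {"base": [], "bob1": ["base"]}
print(push_allowed(bob, "bob1", "alice1"))    # False: GitHub's tip is alice1

# After pulling: Alice's commit plus a merge commit with two parents.
bob.update({"alice1": ["base"], "merge1": ["bob1", "alice1"]})
print(push_allowed(bob, "merge1", "alice1"))  # True
```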

Try to fix the problem.

2. Theory: Version control with Git and GitHub

Core concepts

The working tree (or working directory) is the tree of files you are working on in the current directory. For example, here is the working tree of a project called proj1:

proj1
├── code
│   ├── analysis.R
│   ├── eda.R
│   └── preprocess.R
├── data
│   └── blob.dat
├── figs
│   └── diagram.png
├── paper
│   ├── Makefile
│   ├── report.bib
│   └── report.tex
└── slides
    └── presentation.md
    

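A commit consists of a snapshot of the entire directory tree at a given point in time, some metadata (e.g., references to the previous commits it builds on, the author’s name), and an identifier called a hash (by default a 40-character SHA-1 checksum computed from the commit’s contents).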
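To make the hash concrete: Git names a file’s contents (a blob) by the SHA-1 of a short header plus the bytes, which git_blob_hash below reproduces, so its output should match git hash-object for the same bytes. The commit-level function is a deliberately simplified stand-in (toy_commit_hash is hypothetical and does not use Git’s real commit format), but it shows the key property: change any file, parent, or piece of metadata and the hash changes.

```python
import hashlib
from typing import Dict, List


def git_blob_hash(content: bytes) -> str:
    """SHA-1 of a Git blob: the header "blob <size>" and a NUL byte,
    followed by the raw file contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_hash(snapshot: Dict[str, bytes], parents: List[str],
                    author: str, message: str) -> str:
    """Simplified model (NOT Git's byte format): a single SHA-1 over the
    snapshot's paths and blob hashes plus the commit metadata."""
    h = hashlib.sha1()
    for path in sorted(snapshot):  # sort so the hash ignores dict order
        h.update(f"{path} {git_blob_hash(snapshot[path])}\n".encode())
    for parent in parents:
        h.update(f"parent {parent}\n".encode())
    h.update(f"author {author}\n\n{message}\n".encode())
    return h.hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty file)
```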

A repository is essentially a group of linked commits that form a directed acyclic graph (DAG). It is stored in a hidden .git directory at the top level of your working directory.
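A small sketch of the DAG idea, with hypothetical commit names C0 to C2: each commit lists its parents, following parent links reaches all of its history, and the graph has no cycles because a commit’s hash covers its parents’ hashes, so a commit can only point to commits that already exist.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit maps to the list of its parents.
history: Dict[str, List[str]] = {"C0": [], "C1": ["C0"], "C2": ["C1"]}


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """Every commit reachable from `commit` by following parent links,
    including `commit` itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_acyclic(parents: Dict[str, List[str]]) -> bool:
    """True if no commit can reach itself by following its parents."""
    return all(child not in ancestors(parents, parent)
               for child, ps in parents.items() for parent in ps)


print(sorted(ancestors(history, "C2")))  # ['C0', 'C1', 'C2']
print(is_acyclic(history))               # True
```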

To understand what a Git branch is, we first need to introduce the idea of a head. A head is an easy-to-remember label (e.g., HEAD, master, feature1) that references a commit.

By default, every repository has a head called master (repositories created on GitHub now default to main instead). In this figure, master refers to the commit whose hash begins with f30ab. This allows you to refer to the commit by the easy-to-remember name master rather than by f30ab.

A repository can contain any number of heads. Each branch is associated with exactly one head, which points to the most recent commit on that branch; the head is essentially the branch’s name.
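Heads as movable labels can be sketched with a toy class (TinyRepo is hypothetical, and real commits are named by hashes rather than counters): committing on a branch creates a commit whose parent is the branch’s current head and then moves only that branch’s label.

```python
from typing import Dict, List, Optional


class TinyRepo:
    """Toy model: commits point to parents, heads are movable labels."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}
        self.heads: Dict[str, str] = {}
        self._counter = 0

    def commit(self, branch: str) -> str:
        """Make a new commit on `branch` and move that branch's head to it."""
        parent: Optional[str] = self.heads.get(branch)
        new_id = f"C{self._counter}"
        self._counter += 1
        self.parents[new_id] = [parent] if parent is not None else []
        self.heads[branch] = new_id
        return new_id

    def create_branch(self, name: str, at: str) -> None:
        """A new branch is just a new label on an existing commit."""
        self.heads[name] = at


repo = TinyRepo()
repo.commit("master")                                    # C0
repo.commit("master")                                    # C1
repo.create_branch("feature1", at=repo.heads["master"])
repo.commit("feature1")                                  # C2, parent C1
print(repo.heads)  # {'master': 'C1', 'feature1': 'C2'}
```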

In this example, the branches master and iss53 share a common history up to commit C2. However, iss53 differs from master because it has the additional commit C3.

This allows the histories of the two branches to diverge. For instance, your master branch may be where you integrate new features and bug fixes into the main trunk of development. If you have a bug report (perhaps labeled issue 53) that you are trying to fix, you might create a branch (labeled iss53). While you are preparing a fix for issue 53, the main trunk gains another bug fix or feature.

Here, master and iss53 share a history up to commit C2. After that shared ancestor, however, each branch has commits that the other lacks.
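Finding where two branches diverged is a reachability question. This sketch encodes the diverged iss53 example; C2, C3 and C5 come from the text, while C0, C1 and master’s new commit C4 are assumed names. Git’s own command for this is git merge-base; the function here is a simplified stand-in that assumes a single best common ancestor.

```python
from typing import Dict, List, Set

# Diverged history: C0 and C1 are assumed earlier commits, and C4 is the
# (assumed) extra bug fix or feature that landed on master.
parents: Dict[str, List[str]] = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C4": ["C2"], "C5": ["C3"],
}
heads: Dict[str, str] = {"master": "C4", "iss53": "C5"}


def ancestors(commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a: str, b: str) -> str:
    """Newest shared ancestor: a common ancestor that is not itself an
    ancestor of another common ancestor (unique in this simple graph)."""
    common = ancestors(a) & ancestors(b)
    best = [c for c in common
            if not any(c != d and c in ancestors(d) for d in common)]
    return best[0]


print(merge_base(heads["master"], heads["iss53"]))  # C2
print(sorted(ancestors("C4") - ancestors("C5")))    # ['C4']
print(sorted(ancestors("C5") - ancestors("C4")))    # ['C3', 'C5']
```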

Once you’ve completed fixing issue 53 on branch iss53, you want to integrate your work back into the main trunk represented by the master branch. The process of integrating your work is called merging and often happens automatically. Once the iss53 branch has been merged back into the master branch, you can continue with a unified line of development:

The result of merging the iss53 branch into the master branch. At this point, you could safely delete the iss53 branch, which would remove only the label iss53 pointing to commit C5; the commits themselves remain part of master’s history.
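Continuing the same kind of toy model, a merge records a new commit with two parents and moves master’s label to it; deleting iss53 afterwards just deletes a dictionary entry. C0, C1 and C4 are assumed names as before, and C6 is a hypothetical name for the merge commit.

```python
from typing import Dict, List, Set

# Diverged iss53 history (C0, C1 and C4 are assumed names).
parents: Dict[str, List[str]] = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C4": ["C2"], "C5": ["C3"],
}
heads: Dict[str, str] = {"master": "C4", "iss53": "C5"}


def reachable(commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_into(target: str, source: str, merge_id: str) -> None:
    """Record a merge commit whose two parents are the tips of both
    branches, then move the target branch's head to it."""
    parents[merge_id] = [heads[target], heads[source]]
    heads[target] = merge_id


merge_into("master", "iss53", "C6")
del heads["iss53"]  # deleting a branch removes only its label

print(heads)                                # {'master': 'C6'}
print(sorted(reachable(heads["master"])))   # ['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6']
```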