Source file ⇒ lec38.Rmd
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
File renaming isn’t an effective form of version control.
Git is a version control system that lets you save snapshots of your code throughout the development process. GitHub lets you push your code from your local workspace to a repository hosted online.
Git is a distributed version control system; a central server, such as GitHub, is optional. Each user has their own repository on their local computer. Users can clone each other’s repositories or clone a central repository.
In lecture 37 (steps 1-4), http://rpubs.com/alucas/173612, you made a GitHub account, installed Git on your computer, and configured RStudio to use Git.
Assuming this is all working, we are now going to use Git and GitHub with RStudio.
Here is a good YouTube video on how to use GitHub with RStudio:
https://www.youtube.com/watch?v=uHYcDQDbMY8
Also, here is a very complete resource about GitHub with RStudio by Hadley Wickham. http://r-pkgs.had.co.nz/git.html
In RStudio:
Choose Git
Enter the Repository URL from GitHub
Above I have circled my repository URL
The remote repository will be cloned into the specified directory and RStudio’s version control features will then be available for that directory.
For this lecture, let’s call the project directory name myProj. This will create a local repository on your computer called myGitHub.
Alternatively, you can click on the project you wish to open using the drop-down menu circled below.
If you already have a tab labeled Git next to the tabs Environment and History, skip these instructions.
Commit directly to the master branch
The working tree (or working directory) is the tree of files you are working on in the current directory. For example, a working tree might look like this:
proj1
├── code
│  ├── analysis.R
│  ├── eda.R
│  └── preprocess.R
├── data
│  └── blob.dat
├── figs
│  └── diagram.png
├── paper
│  ├── Makefile
│  ├── report.bib
│  └── report.tex
└── slides
   └── presentation.md
A commit is a snapshot of an entire directory tree at a given point in time, together with some metadata (e.g., references to previous commits, the author’s name) and an identifier (called a hash).
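To make the three parts of a commit concrete, here is a minimal Python sketch. It is a toy model, not Git's real object format: the `Commit` class, the `make_commit` helper, the author name, and the commit messages are all assumptions made for illustration. The point it shows is that the identifier is computed from the snapshot and the metadata, so changing either produces a different hash.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit: a snapshot of files plus metadata (NOT Git's real format)."""
    tree: Tuple[Tuple[str, str], ...]  # (path, contents) pairs, sorted by path
    parents: Tuple[str, ...]           # hashes of the previous commits
    author: str
    message: str

    @property
    def hash(self) -> str:
        # Hash everything that defines the commit, so any change to the
        # snapshot or the metadata yields a different identifier.
        payload = json.dumps(
            {"tree": self.tree, "parents": self.parents,
             "author": self.author, "message": self.message},
            sort_keys=True,
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def make_commit(files: Dict[str, str], parents: Tuple[str, ...],
                author: str, message: str) -> Commit:
    """Hypothetical helper: build a commit from a dict of path -> contents."""
    return Commit(tuple(sorted(files.items())), parents, author, message)


# Two snapshots of the same file; the second records the first as its parent.
first = make_commit({"code/eda.R": "summary(x)\n"}, (),
                    "A. Student", "Add EDA script")
second = make_commit({"code/eda.R": "summary(x)\nplot(x)\n"}, (first.hash,),
                     "A. Student", "Add plot")

# Print abbreviated hashes, the way Git usually displays them.
print(first.hash[:5], second.hash[:5])
```

Because each commit stores the hash of its parent, a chain of commits is linked together, which is what lets a repository be described as a graph of commits.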
A repository is essentially a group of linked commits (forming a directed acyclic graph, DAG). It is stored in a hidden .git directory at the top level of your working tree.
To understand what a Git branch is, we first need to visit the idea of a head. A head is an easy-to-remember label (e.g., HEAD, master, feature1) that references a commit.
By default, every repository has a head called master. In this figure, master refers to the commit whose hash begins f30ab. This allows you to refer to the commit by the easy-to-remember name master rather than f30ab.
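One way to picture heads is as a small lookup table from names to commit hashes. The sketch below is an illustration under assumptions, not how Git stores references on disk: the `refs` dictionary and the `resolve` function are hypothetical names, and only the abbreviated hash f30ab comes from the text. It also models HEAD as a label that points at the current branch name rather than directly at a commit, which is the usual situation in Git.

```python
from typing import Dict

# Each head is an easy-to-remember name for one commit hash.
# HEAD is modelled as pointing at a branch name (an assumption of this sketch).
refs: Dict[str, str] = {
    "master": "f30ab",
    "HEAD": "master",
}


def resolve(name: str, refs: Dict[str, str]) -> str:
    """Follow labels until we reach something that is not a label (a hash)."""
    seen = set()
    while name in refs and name not in seen:
        seen.add(name)
        name = refs[name]
    return name


# Creating a new head just adds another label for an existing commit.
refs["feature1"] = refs["master"]

print(resolve("master", refs))
print(resolve("HEAD", refs))
print(resolve("feature1", refs))
```

A raw hash passed to `resolve` comes back unchanged, which mirrors the idea that a name and the hash it labels are interchangeable ways of referring to the same commit.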
A repository can contain any number of heads. Each branch is associated with exactly one head. A head is essentially a label on the most recent commit (the tip) of a branch.
In this example, the branches master and iss53 share a common history up to commit C2. However, the branch iss53 differs from master in that it has the additional commit C3.
This allows the histories of the two branches to diverge. For instance, your master branch may be where you integrate new features and bug fixes into the main trunk of development. If you have a bug report (perhaps labeled issue 53) that you are trying to fix, you might create a branch (labeled iss53). While you are preparing a fix for issue 53, the main trunk gains another bug fix or feature.
Here both master and iss53 have a shared history up to the commit C2. However, after that shared ancestor, each branch has commits that the other lacks.
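The idea of shared and diverged history can be sketched by walking the commit graph from each head. This is a toy model under assumptions: the commit labels C0, C1 and C4 are placeholders not named in the text (only C2 and C3 are), and `ancestors` is a hypothetical helper, not a Git command.

```python
from typing import Dict, List, Set

# Each commit maps to the list of its parent commits.
parents: Dict[str, List[str]] = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],   # last commit shared by both branches
    "C3": ["C2"],   # made on iss53
    "C4": ["C2"],   # made on master (label assumed for this sketch)
}
heads: Dict[str, str] = {"master": "C4", "iss53": "C3"}


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


master_history = ancestors(heads["master"], parents)
iss53_history = ancestors(heads["iss53"], parents)

shared = master_history & iss53_history        # history common to both
only_master = master_history - iss53_history   # commits iss53 lacks
only_iss53 = iss53_history - master_history    # commits master lacks

print(sorted(shared), sorted(only_master), sorted(only_iss53))
```

A branch's history is simply everything reachable from its head, so two branches have diverged exactly when each of the two difference sets is non-empty.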
Once you’ve completed fixing issue 53 on branch iss53, you want to integrate your work back into the main trunk represented by the master branch. The process of integrating your work is called merging and often happens automatically. Once the iss53 branch has been merged back into the master branch, you can continue with a unified line of development:
The result of merging the iss53 branch into the master branch. At this point, you could safely delete the iss53 branch, which would result in removing the label iss53 pointing to commit C5.
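The effect of a merge on the commit graph, and why deleting the branch afterwards is safe, can be sketched as follows. This toy model only tracks the graph: real Git also combines file contents, may report conflicts, and may fast-forward instead of creating a merge commit. The labels C0, C1, C4 and C6 and the functions `merge` and `ancestors` are assumptions made for this sketch.

```python
from typing import Dict, List, Set

parents: Dict[str, List[str]] = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C5": ["C3"],   # work on iss53
    "C4": ["C2"],                 # work on master (label assumed)
}
heads: Dict[str, str] = {"master": "C4", "iss53": "C5"}


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge(into: str, other: str, new_commit: str,
          parents: Dict[str, List[str]], heads: Dict[str, str]) -> None:
    """Record a merge commit with two parents and move the `into` head to it."""
    parents[new_commit] = [heads[into], heads[other]]
    heads[into] = new_commit


merge("master", "iss53", "C6", parents, heads)

# Deleting the branch removes only the label; no commit is removed.
del heads["iss53"]

reachable = ancestors(heads["master"], parents)
print(heads, sorted(reachable))
```

After the merge, every commit made on iss53 is reachable from master through the merge commit's second parent, which is why removing the iss53 label loses no work.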