This tutorial shows how to commit the files in your working directory to a GitHub repository.


1. First, sign in to your GitHub account and create a new repository



2. Next, name your repository and give it a short description

  • Choose whether the repository is ‘Public’ or ‘Private’

  • Make sure you tick the box beside ‘Initialize this repository with a README’ (newer versions of the GitHub page label it ‘Add a README file’)

  • Then click the green ‘Create repository’ button



3. Install Git Bash from https://gitforwindows.org/ (accepting all defaults), then open the Git Bash shell

  • Some common Git Bash shell commands are listed at https://smbc-nzp.github.io/dataSci/git_bash.html

  • Use the ‘pwd’ (print working directory) command to see which directory Git Bash is currently pointed at
  • Use ‘cd ..’ (change directory) to move up one level
  • Then use ‘cd’ followed by a path to navigate to your directory of interest
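In Git Bash, step 3 might look something like the sketch below. The project path is an assumption: the sketch makes a throwaway folder with mktemp so it runs anywhere, and you would swap in the path to your own project (in Git Bash, a Windows folder such as C:\Users\you\Documents appears as /c/Users/you/Documents).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical project folder: a throwaway directory stands in for your own
# (e.g. /c/Users/you/Documents/myProject in Git Bash on Windows).
PROJECT_DIR="$(mktemp -d)/myProject"
mkdir -p "$PROJECT_DIR"

pwd                 # print the directory Git Bash is currently pointed at
cd ..               # move up one level
pwd                 # confirm where we ended up
cd "$PROJECT_DIR"   # navigate straight to the directory of interest
pwd
```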



4. Git Bash is now pointed at the directory that holds our R project and data files

  • We can check what files are located in our directory with ‘ls’

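  • Next, create a nested sub-directory (folder) called gits with ‘mkdir gits’, move into it with ‘cd gits’, and clone the GitHub repository into it with ‘git clone’ followed by the repository URL shown on its GitHub page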
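A minimal sketch of the listing and folder-creation part of step 4. The temporary folder and the two empty files in it (analysis.R, fishData.csv) are made-up stand-ins for your R project; only ‘ls’ and ‘mkdir’ matter here, and ‘mkdir -p’ is used so the command does not fail if gits already exists.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the R project directory reached in step 3 (hypothetical files).
cd "$(mktemp -d)"
touch analysis.R fishData.csv

ls              # list the files in the current directory
mkdir -p gits   # create the nested sub-directory; -p: no error if it already exists
ls              # gits now appears alongside the project files
```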



5. Next, we’ll navigate into the repository folder that ‘git clone’ created inside gits, and check its status with ‘git status’

  • If ‘git status’ reports nothing to commit, we’re good to go!
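Cloning into gits and checking the status could be sketched as follows. To keep it runnable offline, a local bare repository (myRepo.git, a hypothetical name) stands in for the GitHub repository and is seeded with a README, much like a repository created with the README box ticked; with a real repository you would clone the URL from its GitHub page instead, something like https://github.com/<user>/<repo>.git. The exported name and email are placeholders that only let the demo create its seed commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder identity so the demo can create its seed commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"

# Stand-in for the GitHub repository: a bare repository holding one README commit.
git init --quiet seed
echo "# myRepo" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed myRepo.git
REPO_URL="$PWD/myRepo.git"   # real use: https://github.com/<user>/<repo>.git

mkdir -p gits
cd gits
git clone --quiet "$REPO_URL"   # creates gits/myRepo
cd myRepo                       # navigate into the cloned repository
git status                      # a fresh clone should report nothing to commit
```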



6. Now we need to add the files/documents that we want to push to our repository using ‘git add’

  • There is a way to do this in Git Bash (the ‘cp’ command), but I think it’s easier to just copy and paste

    • So, copy all the files/docs you want to include in your GitHub repository and paste them into the repository folder nested under gits/

    • Then, use ‘git add’ followed by the file names (or ‘git add .’ for everything in the folder) to stage the files you’d like to commit in the next step
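For readers who prefer to stay in Git Bash, step 6 can use ‘cp’ instead of copy-and-paste. In this sketch the file names and contents are made up, and gits/myRepo is created with ‘git init’ purely as a stand-in for the repository cloned in step 5; ‘git status --short’ then marks the staged new files with an A.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
# Hypothetical project files sitting next to gits/ (made-up contents)
printf 'site,count\nA,3\n' > fishData.csv
echo 'dat <- read.csv("fishData.csv")' > analysis.R

# Stand-in for the repository cloned in step 5
mkdir -p gits/myRepo
git init --quiet gits/myRepo

# Copy the files into the repository folder (the Git Bash alternative to copy-and-paste)
cp fishData.csv analysis.R gits/myRepo/

cd gits/myRepo
git add fishData.csv analysis.R   # stage named files; 'git add .' would stage everything here
git status --short                # staged new files are listed with an 'A'
```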



7. Now we’re ready to commit our files and push them to our repository on GitHub!

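  • Use ‘git commit -m’ in Git Bash followed by a short message in quotation marks describing the change; this records the commit in your local copy of the repository

  • The last step is to push the commits to the GitHub repository using ‘git push origin master’ (newer GitHub repositories name the default branch ‘main’, in which case use ‘git push origin main’)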
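Here is step 7 end to end, as a sketch that runs without a network: a local bare repository (hypothetical myRepo.git) again stands in for GitHub, and the identity values are placeholders (on a new machine you would normally set your own once with git config --global). ‘git push origin HEAD’ pushes whichever branch is checked out to the branch of the same name on the remote, so it behaves like ‘git push origin master’ on a master branch or ‘git push origin main’ on a main branch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder identity; normally set once per machine with
#   git config --global user.name "Your Name"
#   git config --global user.email "you@example.com"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"

# Stand-in for the GitHub repository (bare repository with a README commit)
git init --quiet seed
echo "# myRepo" > seed/README.md
git -C seed add README.md
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed myRepo.git

# Clone it into gits/ and stage a made-up data file (steps 4 to 6)
mkdir -p gits
git clone --quiet "$PWD/myRepo.git" gits/myRepo
cd gits/myRepo
printf 'site,count\nA,3\n' > fishData.csv
git add fishData.csv

# Step 7: commit locally with a message, then push to the remote
git commit -m "Add fish count data"
git push origin HEAD   # same as 'git push origin master' (or 'main') on that branch
```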



That’s it! The files/docs now appear in your GitHub repository, and if the repository is public, anyone can clone or download it!




One extremely useful feature is sharing data via GitHub. Read on for instructions:


Now we simply use the read_csv function and the objects saved above to import our data into R!

# Read in the data (read_csv() comes from the readr package):

library(readr)

read_csv(
  paste0(
    gitUrl,
    '2020-04-03_SRBC.csv'))
## Parsed with column specification:
## cols(
##   STR_STATION_ID = col_character(),
##   tag = col_character()
## )
## # A tibble: 100 x 2
##    STR_STATION_ID            tag     
##    <chr>                     <chr>   
##  1 20150709-1514-ablascovich Impaired
##  2 20150818-0959-jeremmille  Impaired
##  3 20150901-1557-ablascovic  Impaired
##  4 20150902-0836-ablascovic  Impaired
##  5 20150902-0844-ablascovic  Impaired
##  6 20150902-1310-ablascovic  Impaired
##  7 20150902-1330-ablascovic  Impaired
##  8 20150902-1351-ablascovic  Impaired
##  9 20150902-1502-ablascovic  Impaired
## 10 20150902-1507-jeremmille  Impaired
## # ... with 90 more rows



So, what happens if you send someone a dataset as an email attachment, then two hours later realize you have more data, or that outliers need to be removed?


You end up sending another email with another dataset, named ‘fishData2.csv’. Then another change is made. For complex analyses, before long you’re working with ‘fishData6.csv’.

The GitHub alternative is to make changes to the dataset in your local directory and push them to your repository. This way, colleagues who access your data are always working with an up-to-date dataset, and Git keeps every earlier version in the repository’s history.
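A sketch of that workflow: instead of emailing fishData2.csv, you overwrite fishData.csv, commit, and push, and the earlier version stays retrievable from the history. The CSV values are invented for illustration, the bare repository (hypothetical fishRepo.git) stands in for GitHub, and the identity is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder identity for the demo commits
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init --quiet --bare fishRepo.git                        # stand-in for the GitHub repository
git clone --quiet "$PWD/fishRepo.git" fishRepo 2>/dev/null  # hide the "empty repository" warning
cd fishRepo

# First version of the dataset (invented values, including an outlier)
printf 'fish,length_cm\n1,12\n2,250\n' > fishData.csv
git add fishData.csv
git commit --quiet -m "Add fish data"
git push --quiet origin HEAD

# Two hours later: drop the outlier and add new records in the SAME file
printf 'fish,length_cm\n1,12\n3,14\n' > fishData.csv
git commit --quiet -am "Remove outlier and add new records"
git push --quiet origin HEAD

# One file name, full history: no fishData2.csv needed
git log --oneline -- fishData.csv
git show HEAD~1:fishData.csv   # print the previous version from the history
```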


This logic can be extended to entire directories and RStudio projects using the steps above! You can give others access to your R scripts, which are constantly being updated. Staff in different regions/agencies can access up-to-date datasets and scripts, and run your analysis exactly as you did, making collaboration much easier.
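On a colleague’s side, a one-time clone followed by ‘git pull’ whenever you push changes is one way to stay current. In this sketch two local folders (author and colleague, hypothetical names) play the two people, a bare repository stands in for GitHub, and the R lines written into analysis.R are placeholders. The ‘--ff-only’ flag makes the pull refuse to merge if the colleague has diverging local commits, which keeps the example predictable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder identity for the demo commits
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init --quiet --bare projectRepo.git                       # stand-in for the GitHub repository
git clone --quiet "$PWD/projectRepo.git" author 2>/dev/null   # the author's working copy

# The author publishes a first version of an R script (placeholder contents)
echo 'dat <- read.csv("fishData.csv")' > author/analysis.R
git -C author add analysis.R
git -C author commit --quiet -m "Add analysis script"
git -C author push --quiet origin HEAD

# A colleague in another office clones the repository once...
git clone --quiet "$PWD/projectRepo.git" colleague

# ...the author keeps improving the script and pushes again...
echo 'summary(dat)' >> author/analysis.R
git -C author commit --quiet -am "Summarise the data"
git -C author push --quiet origin HEAD

# ...and a single pull brings the colleague up to date
git -C colleague pull --quiet --ff-only
cat colleague/analysis.R
```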