This tutorial walks through the steps needed to commit your working directory to a GitHub repository
1. First, sign in to your GitHub account and create a new repository

2. Next, name your repository and give it a short description
Choose whether the repository is ‘Public’ or ‘Private’
Make sure you check the box beside ‘Initialize this repository with a README’

3. Now, after installing Git Bash from https://gitforwindows.org/ (accept all defaults), open the Git Bash shell
Use the ‘pwd’ (print working directory) command to see where Git Bash is currently pointed
Use ‘cd ..’ (change directory) to move back one step
Then use ‘cd’ to navigate to your directory of interest, as sketched below
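A minimal sketch of those commands (the example path is hypothetical; substitute your own, and note that Git Bash writes Windows paths as /c/... rather than C:\...):

# Show where Git Bash is currently pointed
pwd

# Move back one step if needed
cd ..

# Navigate to the directory of interest (hypothetical path)
cd /c/Users/yourName/Documents/fishProject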

4. Our directory is now set to where our R scripts and data files live
We can check which files are located in our directory with ‘ls’
We must now create a nested sub-directory (folder) called gits
We can then clone our repository into the gits directory using ‘git clone https://github.com/FWeco/dataRequests.git’ (the URL comes from our newly created GitHub repository), as shown below
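Putting step 4 together in Git Bash (‘mkdir’ is one straightforward way to create the gits folder; the clone URL comes from the repository created in step 2):

# List the files in the current directory
ls

# Create the nested sub-directory and move into it
mkdir gits
cd gits

# Clone the newly created repository
git clone https://github.com/FWeco/dataRequests.git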

5. Next, we’ll navigate to the directory where we cloned our repository and check its status
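In Git Bash, that looks like this (the folder name ‘dataRequests’ comes from the clone URL above):

# Move into the cloned repository
cd dataRequests

# Check the status of the working directory
git status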

6. Now we need to stage the files/documents that we want to push to our repository using ‘git add’
There is a way to move files in Git Bash, but I think it’s easier to just copy and paste
So, copy all the files/docs you want to include in your GitHub repository and paste them into the repository folder nested under gits/
Then use ‘git add’ to specify which files you’d like to commit in the next step, as shown below
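A quick sketch of the staging step (the file name below is just an example; it matches the CSV we share later in this tutorial):

# Stage a single file
git add 2020-04-03_SRBC.csv

# Or stage everything in the directory at once
git add .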


7. Now we’re ready to commit our directory to our repository on GitHub!
Use the ‘git commit -m’ command in Git Bash with a message in quotation marks
The last step is to push the commits to the GitHub repository using ‘git push origin master’
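For example (the commit message below is a placeholder; write one that describes your own changes):

# Commit the staged files with a descriptive message
git commit -m "Add data files for sharing"

# Push the commit to the GitHub repository
git push origin master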

That’s it! The files/docs magically appear in your GitHub repository! Any GitHub user can clone or download the repository (as long as it is set to ‘Public’)!

One extremely useful feature is sharing data via GitHub. Read on for instructions:
We can leverage the raw file URL to import datasets directly from GitHub!
In the repository, click on the .csv file that we pushed from our directory (called ‘2020-04-03_SRBC.csv’), then click the ‘Raw’ button
The raw view is a direct link to our data, which can be accessed in R
After copying the link, we save the base of the URL to the ‘gitUrl’ object
Then we can use the paste0() function to complete the URL, which is useful if we have many datasets saved in the same repository
All we have to do is change the file name (the base URL will not change)
gitUrl <- 'https://raw.githubusercontent.com/FWeco/dataRequests/master/'

paste0(gitUrl, '2020-04-03_SRBC.csv')
## [1] "https://raw.githubusercontent.com/FWeco/dataRequests/master/2020-04-03_SRBC.csv"
Now we simply use the read_csv() function (from the readr package) and the objects saved above to import our data into R!
# Read in the data (read_csv comes from the readr package):
library(readr)

read_csv(paste0(gitUrl, '2020-04-03_SRBC.csv'))
## Parsed with column specification:
## cols(
##   STR_STATION_ID = col_character(),
##   tag = col_character()
## )
## # A tibble: 100 x 2
##    STR_STATION_ID            tag
##    <chr>                     <chr>
##  1 20150709-1514-ablascovich Impaired
##  2 20150818-0959-jeremmille  Impaired
##  3 20150901-1557-ablascovic  Impaired
##  4 20150902-0836-ablascovic  Impaired
##  5 20150902-0844-ablascovic  Impaired
##  6 20150902-1310-ablascovic  Impaired
##  7 20150902-1330-ablascovic  Impaired
##  8 20150902-1351-ablascovic  Impaired
##  9 20150902-1502-ablascovic  Impaired
## 10 20150902-1507-jeremmille  Impaired
## # ... with 90 more rows
So, what happens if you send someone a dataset as an email attachment, then two hours later realize you have more data, outliers need to be removed, and so on?
You end up sending another email with another dataset named ‘fishData2.csv’. Then another change is made. For complex analyses, before long you’re working with ‘fishData6.csv’.
The alternative using GitHub is making changes to the dataset in your local directory and pushing it to your repository. This way, the colleagues accessing your data are always working with an up-to-date dataset.
This logic can be extended to entire directories and RStudio projects using the steps above! You can allow access to your R scripts (which are constantly being updated). Staff in different regions/agencies can access up-to-date datasets and scripts, and run your analysis exactly as you ran it, enhancing the ability to collaborate.