Collaborative Coding and Version Control in R with GitHub: A Zero to Hero Guide
Imagine working on an R script, analysing some very complex data. You spend days cleaning the data, building models and creating elegant visualisations. Then one small change breaks everything, and you cannot remember what the code looked like two days ago.
Now imagine another scenario in which three other people are working on the same project, each making changes at the same time. Without coordination, chaos ensues.
Version control is like a “time machine” for your code. It tracks every change you make, lets you revert to any previous version, and allows multiple people to collaborate without overwriting each other’s work.
Think of it as the version history of a Microsoft 365 Word document saved on OneDrive, but for code, and far more powerful.
Git: a local version control system installed on your computer. It manages changes to your files.
GitHub: a cloud platform that hosts your Git repositories (repos), enabling collaboration, backup and sharing.
When combined with RStudio, the most popular IDE for R, Git and GitHub become seamless tools for collaborative data science:
✅Track changes in scripts, data, and reports
✅Collaborate with team-mates without file conflicts
✅Reproduce results by revisiting earlier versions
✅Share packages or research openly
In this tutorial, you will go from zero experience to confidently managing team-based R projects using Git and GitHub, all within RStudio.
Before diving into version control, we need to set up our environment.
“🧰 You’ll need: R, RStudio, Git and a GitHub account.”
Git must be installed so that RStudio can communicate with GitHub.
Go to https://git-scm.com.
Download Git for your OS (Windows, macOS or Linux).
Run the installer. On macOS you can instead install the Xcode Command Line Tools, which include Git (xcode-select --install), or download the GUI installer.
Open Terminal (macOS/Linux) or Command Prompt/PowerShell (Windows) and run:
git --version
✅Expected output: git version 2.x.x
❌If not found: reinstall Git and ensure ‘Add to PATH’ is selected.
Next, install R and RStudio, in that order: RStudio runs on top of R, so it is recommended to install R first.
Open RStudio and go to Tools > Global Options > Git/SVN.
✅You should see the path to the Git executable, e.g. /usr/bin/git (macOS/Linux) or C:\Program Files\Git\bin\git.exe (Windows).
If it is missing, reinstall Git or set the path manually.
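If you would rather check from the R console, here is a minimal sketch using only base R: Sys.which() looks for the git executable on your PATH and system2() runs git --version. It assumes RStudio finds Git the same way your PATH does, which is usually but not always true, so the Git/SVN setting above remains the final word.

```r
# Look for the git executable on the PATH ("" if it cannot be found)
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # Run `git --version` and capture its output as a character vector
  git_version <- system2("git", "--version", stdout = TRUE)
  message("Git found at ", git_path, ": ", git_version[1])
} else {
  message("Git was not found on the PATH; reinstall Git or set the path manually")
}
```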
We shall use modern R tools to simplify Git/GitHub setup.
In the RStudio console:
# install required packages (c() passes them all as one vector)
install.packages(c("usethis", "devtools", "gert", "credentials", "gitcreds"))
These packages help automate your Git/GitHub version control setup:
usethis: automates common project tasks
gert: a lightweight Git interface in R
credentials & gitcreds: handle authentication securely
devtools: for advanced package workflows
“Restart your R session after installing the packages.”
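To confirm the installation worked, a small sketch like the following checks each package with requireNamespace() and reports any that are missing. It deliberately does not install anything; the install.packages() line is left commented out for you to run if needed.

```r
# Packages used in this tutorial
pkgs <- c("usethis", "devtools", "gert", "credentials", "gitcreds")

# requireNamespace() returns TRUE if a package can be loaded (without attaching it)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing <- pkgs[!installed]
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # uncomment to install only the missing ones
} else {
  message("All packages are installed")
}
```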
Now that we have installed and verified Git, R and RStudio, let’s configure Git globally and authenticate with GitHub.
Git needs your name and email to label commits. Use usethis to set them globally:
library(usethis)
use_git_config(
  user.name = "Your Name",
  user.email = "your.email@example.com"
)
Replace these with your real name and the email linked to your GitHub account.
“🔗Pro Tip: Use the same email as your GitHub account to ensure commits are properly attributed.”
Verify with:
# verify identity details (user.name, user.email) and other Git/GitHub settings
git_sitrep()
# OR, for a token-focused report
gh_token_help()
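You can also read the values back directly. This sketch assumes gert’s git_config_global(), which returns your global Git settings as a data frame with name and value columns; it only reads the configuration and changes nothing.

```r
library(gert)

# Read your global Git configuration (one row per setting)
cfg <- git_config_global()

# Show only the identity settings written by use_git_config()
cfg[cfg$name %in% c("user.name", "user.email"), c("name", "value")]
```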
Instead of typing your password every time you connect to GitHub, we will use an SSH key: a secure digital key pair. Run this:
library(credentials)
ssh_setup_github()
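This will:
generate an SSH key pair in ~/.ssh if you do not have one yet (e.g. ~/.ssh/id_ecdsa and ~/.ssh/id_ecdsa.pub; older setups may use ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub);
show your public key and offer to open GitHub’s SSH key settings, where you click "New SSH Key";
paste the public key into the Key box (it starts with ecdsa-sha2 ...) and click "Add SSH key".
“Never share your private key (id_ecdsa) – only the public one (id_ecdsa.pub) goes online.”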
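Before (or after) running ssh_setup_github(), you may want to see which public keys already exist. This base R sketch lists the .pub files in ~/.ssh. It assumes your keys live in the conventional ~/.ssh folder; on Windows, R’s ~ may point to your Documents folder rather than your user profile, so the keys can sit elsewhere.

```r
# Conventional folder for SSH keys (~ is your home directory)
ssh_dir <- path.expand("~/.ssh")

# Public key files only; list.files() returns character(0) if the folder is missing
pub_keys <- list.files(ssh_dir, pattern = "\\.pub$", full.names = TRUE)

if (length(pub_keys) == 0) {
  message("No public SSH keys found in ", ssh_dir)
} else {
  # Only the .pub files are safe to share; never print or upload a private key
  print(basename(pub_keys))
}
```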
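Even with SSH, some operations still require a Personal Access Token (PAT); for example, usethis functions such as use_github() call the GitHub API, which authenticates with a PAT rather than an SSH key. More information on why to use a PAT is found here.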
Generate a PAT like this:
create_github_token()
This opens GitHub’s token creation page with recommended scopes pre-selected. After generating the token, copy it and keep it somewhere safe for the next step (gitcreds will store it in your system’s Git credential store).
Then register it with:
library(gitcreds)
gitcreds_set()
Paste the token when prompted. You should see a short confirmation message if everything is successful.
Test and verify the connection:
# verify token
gitcreds_get()
#> <gitcreds>
#>   protocol: https
#>   host    : github.com
#>   username: PersonalAccessToken
#>   password: <-- hidden -->
# test connection (gh_token_help() comes from usethis)
gh_token_help()
These commands should confirm your GitHub login and the validity of your token.
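If you need the same check inside a script (for example, to skip GitHub steps when no token is available), a small helper can wrap gitcreds_get(). The name has_github_token() is hypothetical, not part of any package; it simply returns FALSE instead of an error when no credential is found.

```r
# Hypothetical helper: TRUE if gitcreds finds a stored token for the given URL
has_github_token <- function(url = "https://github.com") {
  cred <- tryCatch(
    gitcreds::gitcreds_get(url = url),
    # gitcreds_get() signals an error when Git or a stored credential is missing
    error = function(e) NULL
  )
  !is.null(cred) && isTRUE(nzchar(cred$password))
}

if (has_github_token()) {
  message("A GitHub token is available")
} else {
  message("No GitHub token found; run gitcreds::gitcreds_set()")
}
```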
“Analogy: SSH key = house key; PAT = master keycard. Both unlock GitHub access securely.”
Now we will create a new project and connect it to GitHub:
# choose a directory (~ is your home directory)
proj_path <- file.path("~", "my_first_git_project")
# create the project
create_project(proj_path) # creates and opens a new R project
If you are prompted with “This directory is not a Git repository. Create one?”, select Yes.
Otherwise, to turn your newly created project into a Git repository, type this in the RStudio console of the new "my_first_git_project" project:
usethis::use_git()
This will initialise your project as a git repository and prompt you to commit any file/folder found in the project directory.
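usethis::use_git() also offers to commit files and restart RStudio. If you only want the repository itself, gert::git_init() performs the core step. The sketch below uses a throwaway folder under tempdir() as a stand-in for your project, so it is safe to run.

```r
library(gert)

# Throwaway folder standing in for your real project folder
demo_path <- tempfile("git-init-demo")
dir.create(demo_path)

# Initialise an empty Git repository (creates a hidden .git folder)
git_init(demo_path)

# Git stores the project history inside .git; this should return TRUE
dir.exists(file.path(demo_path, ".git"))
```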
You now have a local R project, with its .Rproj file, under Git version control.
Now that we have a local Git repo, we can push this project to GitHub:
usethis::use_github()
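This does several things: it creates a new repository on GitHub, adds it to your local repo as the remote named origin, and pushes your existing commits to it.
“✅You’ll see the repo creation steps with tick marks, and a webpage showing the new repo on GitHub will open automatically.”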
You can also clone a GitHub repository into your local environment. Example: Clone a public repo
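repo_url <- "https://github.com/rstudio/learnr"
# make sure destdir exists; the repo is cloned into a subfolder of it
local_path <- file.path(tempdir(), "learnr-demo")
dir.create(local_path, showWarnings = FALSE)
# fork = FALSE: clone only, without forking the repo into your GitHub account
usethis::create_from_github(repo_url, destdir = local_path, fork = FALSE)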
This clones the repo, opens it as an R project and then sets up Git remotes.
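To see what cloning does without touching GitHub, this sketch uses gert::git_clone() on a throwaway local repository instead of a URL. It assumes libgit2 (which gert wraps) accepts a local path as the clone source; the identity "Demo User"/"demo@example.com" is a placeholder set only for the demo repo.

```r
library(gert)

# A throwaway local repository with one commit, standing in for a GitHub repo
src <- tempfile("clone-source")
dir.create(src)
git_init(src)
git_config_set("user.name", "Demo User", repo = src)         # placeholder identity
git_config_set("user.email", "demo@example.com", repo = src)
writeLines("# Demo project", file.path(src, "README.md"))
git_add("README.md", repo = src)
git_commit("Initial commit", repo = src)

# Clone it into a new folder; with GitHub you would pass the repo URL instead
dest <- tempfile("clone-copy")
git_clone(src, path = dest)

list.files(dest)   # the cloned README.md is there
```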
“Exercise: Try cloning your own GitHub repo or a friend’s. Make a tiny edit and try to push it.”
Time to make your first collaborative change!
Let’s simulate a typical workflow.
In the newly created project directory:
Open the .Rproj file.
Create a new R script: analysis.R.
Add these lines of code to the script:
# First analysis script
data(mtcars)
summary(mtcars)
plot(mtcars$wt, mtcars$mpg)
Save the file.
Look at the Git pane in RStudio. You will see analysis.R listed with the status ‘untracked’ (hover your mouse over the yellow question mark symbol).
Click the checkbox next to it to stage it (it is now ready to commit). Click Commit. In the commit message box, type:
Add initial mtcars analysis script
Click Commit again (analysis.R disappears from the list) and close the dialogue box.
The above process can also be done programmatically with the gert package:
library(gert)
#check overview of staged and unstaged files
git_status()
This shows all files that are unstaged or not yet tracked by Git. You should see the analysis.R file. Stage it like this:
# stage a specific file
git_add("analysis.R")
# OR stage all changed files at once
git_add(".")
# check status again after staging
git_status() # analysis.R should now show staged = TRUE
Now we can commit the staged changes with a message
git_commit("Add initial mtcars analysis script")
“Best Practice: Write clear, concise commit messages such as ‘Fix typo’ or ‘Add regression model’.”
“Use gert::git_reset_mixed() to unstage files: it resets the staging area to the last commit while keeping your edits.”
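Here is the whole stage-and-commit cycle in one runnable sketch. It works in a throwaway repository under tempdir() and sets a placeholder identity ("Demo User", "demo@example.com") for that repo only, so your real project and global settings are untouched. The comments describe what git_status() should show at each step.

```r
library(gert)

# Throwaway practice repository (a stand-in for your real project)
repo <- tempfile("gert-commit-demo")
dir.create(repo)
git_init(repo)

# Placeholder identity, set only for this demo repo (not globally)
git_config_set("user.name", "Demo User", repo = repo)
git_config_set("user.email", "demo@example.com", repo = repo)

# Create the analysis script from the tutorial
writeLines(c("data(mtcars)", "summary(mtcars)"), file.path(repo, "analysis.R"))

git_status(repo = repo)   # analysis.R listed as new, staged = FALSE
git_add("analysis.R", repo = repo)
git_status(repo = repo)   # analysis.R now staged = TRUE
git_commit("Add initial mtcars analysis script", repo = repo)
git_status(repo = repo)   # no rows: nothing left to commit

# The history now contains a single commit
git_log(repo = repo)[, c("commit", "message")]
```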
Once the new analysis.R file has been committed, push it so that it appears in the GitHub repo. Click Push (up arrow icon) in the Git pane, and RStudio sends your commit to GitHub.
Go to your repo online: you will see analysis.R!
Use gert to explore from R:
library(gert)
# check overview of staged and unstaged files
git_status()
# view commit history
git_log()
“Analogy: Each commit is a snapshot in a photo album. Pushing uploads the album to the cloud.”
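“Exercise: Create a new script, explore.R, then use gert commands to stage it, commit it with the message "Add exploration script" and push it to GitHub.”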
Now imagine two team mates: You want to test a new model, while your colleague updates the report. How do you avoid stepping on each other’s toes?
Branches are like parallel universes for your code: you can experiment safely without breaking the main version.
We can create and switch to a new branch using the git_branch_create() function from gert. This function creates a branch from your current commit, but you can equally branch from a specific commit by passing its commit hash to the ref argument.
# create and switch to a new branch
git_branch_create("feature/new-model")
# check which branch you are on
git_branch()
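To try branching without risking your project, this sketch builds a throwaway repository with one commit (a branch needs an existing commit to start from), then creates and checks out feature/new-model. The temporary folder and placeholder identity are assumptions for the demo.

```r
library(gert)

# Throwaway repository with one commit
repo <- tempfile("gert-branch-demo")
dir.create(repo)
git_init(repo)
git_config_set("user.name", "Demo User", repo = repo)         # placeholder identity
git_config_set("user.email", "demo@example.com", repo = repo)
writeLines("data(mtcars)", file.path(repo, "analysis.R"))
git_add("analysis.R", repo = repo)
git_commit("Add analysis script", repo = repo)

# Default branch name ("master" or "main", depending on your Git setup)
default_branch <- git_branch(repo = repo)

# Create the branch from the current commit; checkout = TRUE (the default) switches to it
git_branch_create("feature/new-model", repo = repo)
git_branch(repo = repo)   # now "feature/new-model"

# Both branches exist locally
git_branch_list(local = TRUE, repo = repo)$name
```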
Now we can make changes freely and they will not affect the main branch. Let us edit the analysis.R file by adding these lines of code:
# Try a linear model
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)
Save the file, then stage and commit it with the message “Add multiple regression model”. Do not push yet.
# create a branch from a commit hash
commits <- git_log()
hash1 <- commits$commit[2] # second most recent commit (git_log() lists newest first)
# create a new branch at that commit; checkout = FALSE keeps you on feature/new-model
git_branch_create("fix-bug", ref = hash1, checkout = FALSE)
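The sketch below checks that a branch created from a commit hash really points at that commit. It makes two commits in a throwaway repo, branches from the older one with checkout = FALSE and compares hashes via git_branch_list(); the repo, file contents and identity are all placeholders.

```r
library(gert)

repo <- tempfile("gert-hash-demo")
dir.create(repo)
git_init(repo)
git_config_set("user.name", "Demo User", repo = repo)         # placeholder identity
git_config_set("user.email", "demo@example.com", repo = repo)

# Two commits, so there is an older commit to branch from
for (i in 1:2) {
  writeLines(paste("version", i), file.path(repo, "analysis.R"))
  git_add("analysis.R", repo = repo)
  git_commit(paste("Commit", i), repo = repo)
}

# git_log() lists newest first, so row 2 is the commit before HEAD
hash1 <- git_log(repo = repo)$commit[2]

# Create the branch at that commit but stay on the current branch
git_branch_create("fix-bug", ref = hash1, checkout = FALSE, repo = repo)

branches <- git_branch_list(local = TRUE, repo = repo)
branches[branches$name == "fix-bug", c("name", "commit")]
```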
Once we have created a local Git branch, we can tell Git to create the same branch on GitHub using git_push(). Think of git_push() as the command you use to send your work from your computer to the remote server (in this case, GitHub). When you push a branch that doesn’t exist on GitHub yet, Git creates it for you automatically.
git_push(set_upstream = TRUE)
Setting set_upstream = TRUE tells Git: “Push this branch, and from now on, remember that the local branch named my-branch is connected to the remote branch named my-branch.” You only need to do this the first time you push a new branch.
Now your team mates can see and review your work.
Meanwhile, another person on the team has updated README.md on the main branch and pushed it. To get those changes, you can pull from the remote (GitHub).
# pull recent changes on GitHub
git_pull()
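But wait! We are on the "feature/new-model" branch. Won’t that create a conflict? No! Everything will work out just fine 😁. Pulling updates your current branch with the latest changes from its remote counterpart (here, feature/new-model on GitHub), so the README.md update on main is not pulled in yet.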
To update your local main branch, first switch back to it (it is usually called main, or sometimes master) with the git_branch_checkout() function. This command tells Git to switch your working directory to the state of that branch.
# switch to main branch (use "main" or "master", whichever your repo uses)
git_branch_checkout("master")
# update local main branch
git_pull()
“Before you switch branches, make sure you have committed any work you want to save on your current branch. If you have uncommitted changes, Git will usually prevent you from switching branches to avoid losing your work.”
Once your model is approved by all members of the team, you can merge the branch into the main/master branch.
The git_merge() function is what you use to combine the changes from one branch into another. The key thing to remember is that you always merge into the branch you are currently on.
Before you merge, follow these steps to avoid issues.
Switch to the main branch: you must be on the main branch because this is the branch you are merging into.
Run the git_merge() command: now you can run the merge command with the name of the branch you want to merge, e.g. ‘feature/new-model’.
# make sure you are on the branch you are merging into
git_branch_checkout("master")
# merge branch into master
git_merge("feature/new-model")
# push merged changes
git_push()
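After running this, your main branch will have all the changes from the feature/new-model branch 🎉. If the merge is successful and main has new commits of its own (such as the README.md update), Git automatically creates a merge commit; if main has not moved since you branched, Git simply fast-forwards it. Either way, you will see the updates in your files.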
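The following sketch replays this section end to end in a throwaway repo: a feature branch adds the model, the default branch gets a README commit (standing in for your team mate’s change), and git_merge() combines them. Because both branches have new commits, this is a normal merge rather than a fast-forward. The default branch name is read with git_branch() because it may be main or master depending on your Git setup; the identity is a placeholder.

```r
library(gert)

repo <- tempfile("gert-merge-demo")
dir.create(repo)
git_init(repo)
git_config_set("user.name", "Demo User", repo = repo)         # placeholder identity
git_config_set("user.email", "demo@example.com", repo = repo)

# Starting point on the default branch
writeLines("data(mtcars)", file.path(repo, "analysis.R"))
git_add("analysis.R", repo = repo)
git_commit("Add analysis script", repo = repo)
main_branch <- git_branch(repo = repo)   # "master" or "main"

# Feature branch: add the regression model
git_branch_create("feature/new-model", repo = repo)
cat("model <- lm(mpg ~ wt + hp, data = mtcars)\n",
    file = file.path(repo, "analysis.R"), append = TRUE)
git_add("analysis.R", repo = repo)
git_commit("Add multiple regression model", repo = repo)

# Meanwhile on the default branch: a team mate adds a README
git_branch_checkout(main_branch, repo = repo)
writeLines("# My first Git project", file.path(repo, "README.md"))
git_add("README.md", repo = repo)
git_commit("Add README", repo = repo)

# Merge the feature branch into the branch we are currently on
git_merge("feature/new-model", repo = repo)

readLines(file.path(repo, "analysis.R"))   # now includes the lm() line
git_log(repo = repo, max = 1)$message       # the automatic merge commit message
```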
Merge conflicts often happen when team members edit the same line of code in a script. For instance, in the analysis.R file, team mate 1 edits the line containing the linear model and team mate 2 edits the same line. Git then marks both versions in the file:
<<<<<<< HEAD
summary(lm(mpg ~ wt, data = mtcars))
=======
summary(lm(mpg ~ wt + hp, data = mtcars))
>>>>>>> feature/new-model
You have to resolve this manually, which you can do in RStudio by editing the file:
Keep the version of the line you want (or combine them) and delete the other.
Remove the <<<<<<<, ======= and >>>>>>> markers.
Stage and commit the resolved file:
git_add(".")
git_commit("Resolve merge conflict")
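Before staging a resolved file, it helps to confirm that no markers are left behind, since Git will commit them as ordinary text once the file is staged. has_conflict_markers() below is a hypothetical helper, not part of gert; it flags any line that starts with seven <, = or > characters.

```r
# Hypothetical helper: TRUE if a file still contains Git conflict markers
has_conflict_markers <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Git writes 7-character markers at the start of a line
  any(grepl("^(<{7}|={7}|>{7})", lines))
}

# Example: an unresolved file versus a resolved one
unresolved <- tempfile(fileext = ".R")
writeLines(c("<<<<<<< HEAD",
             "summary(lm(mpg ~ wt, data = mtcars))",
             "=======",
             "summary(lm(mpg ~ wt + hp, data = mtcars))",
             ">>>>>>> feature/new-model"), unresolved)

resolved <- tempfile(fileext = ".R")
writeLines("summary(lm(mpg ~ wt + hp, data = mtcars))", resolved)

has_conflict_markers(unresolved)   # TRUE
has_conflict_markers(resolved)     # FALSE
```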
“Analogy: Two people editing a Google Doc at once. The editor highlights conflicts – you decide the final text.”