Version Control For Data Science Projects

Part 2: Synchronization of GitHub Server and RStudio Desktop

Introduction

In the previous blog post (Here), we explained what version control is and how Git works as a version control tool. Now we are going to learn how to install Git and get it up and running. We will also install the other software this short tutorial relies on.

Installation and Set Up

In this section we install and set up all the software used throughout this post.

Installation of Git VCS

The files needed to install Git are available at Download Git for all systems. Follow the link and choose your operating system, as shown in Figure 1 below:

Figure 1: The download section of git-scm.com as of July 2022

NOTE: Git is bundled with two GUI tools: gitk for reviewing history and git-gui for basic commands. Select the icon that matches your operating system and download the installer.

  • Windows Systems

First, let’s go through the steps to download, install, and set up Git on Windows:

After opening the link, you will arrive at the confirmation page shown in Figure 2. Download the executable file that corresponds to your Windows version. The download begins automatically, and the file is saved in the Downloads folder on your PC.

Figure 2: Git Download Screen for Windows OS

Next, locate the executable (.exe) file and double-click it to begin the installation. The first screen shows the licence declaration with the terms and conditions; make sure to read it to the end. Then click Next to reach the screen shown in Figure 3.

Figure 3: Git Install

Here, you are prompted to select which components to install. I recommend leaving the default options on. Click Next after making your choices, and you will see the default editor selection shown in Figure 4 below. Git asks you to define a default editor because it opens one whenever you need to write a commit message.

Figure 4: Custom Editor Selection

Note: In this tutorial we use the default editor, which is Vim.

Once Vim is selected, move on to the next installation screen, the PATH environment adjustment shown in Figure 5 below. PATH is an environment variable whose value is a list of directories in which the system looks for executable programs.

Figure 5: Choosing to add Git to PATH or not

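This screen asks whether Git should be added to PATH so that it can be used from any console. If you don’t want this and only want to use Git from its own isolated console, Git Bash, select the first option. In that case, you will have to launch Git Bash from the applications list whenever you want to use Git.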
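To make the idea of PATH more concrete, here is a small sketch you could run in Git Bash or any other bash-like console. It only reads your environment and changes nothing; the variable name git_location is our own choice for illustration.

```shell
# Print each directory on the PATH on its own line.
echo "$PATH" | tr ':' '\n'

# Show which executable the shell would run for "git", if any.
git_location="$(command -v git || true)"
echo "git resolves to: ${git_location:-not found on PATH}"
```

If the last line reports that git was not found, then Git is either not installed or was not added to PATH during installation.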

Configuring Git Bash

Finally, keep clicking Next (leave the default options unless you have a reason not to) until you reach the last step. Then launch the installation and let it finish. And that’s it! Git is installed on your Windows system. Before using it, jump to the Setting up Git section to configure it properly.

  • MAC Systems

Mac OS X 10.9 (Mavericks) comes with Git installed. If you are on a different version, follow these instructions to install it.

  • Using XCode

On a Mac, Git can be installed together with Xcode, which is available at this LINK. Afterwards, you can check whether Git is present by running the following command in your console:

$ git --version

It should print the version of Git currently installed. For example, the output below shows the version of Git installed on my MacBook.

Installed Git Version
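If you prefer a check that also tells you what to do when Git is missing, the same idea can be wrapped in a short script. This is only a sketch that should work in any bash-like console; the variable name git_status is an assumption made for this example.

```shell
# Report whether Git is available on the PATH.
if command -v git >/dev/null 2>&1; then
    git_status="installed"
    git --version
else
    git_status="missing"
    echo "Git was not found; install it using one of the methods in this section."
fi
```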

If Git is not yet installed, navigate to the LINK and the download will begin automatically, as shown in Figure 6 below.

Figure 6: Git Download Screen

Drag and drop the downloaded file into the Applications directory and run it to start the installation; it’s pretty easy.

  • Using Homebrew

Homebrew describes itself as the missing package manager for Mac OS X. Go to the Homebrew website to install it. After installing Homebrew, you can install Git by running this command in the terminal:

$ brew install git

And that’s it! On Mac OS X, installing Git is way easier.

  • UNIX Systems

On Linux, Git is installed with the distribution’s package manager. On Ubuntu and Debian, apt is used to install Git:

$ sudo apt-get install git
or
$ sudo apt install git 

For more details, check HERE for a list of commands to install Git on each popular distribution.

Setting up Git

Before you start using Git, a little bit of setup is needed. Since Git is a distributed Version Control System, you will one day need to connect to other remote repositories. So that your work is attributed to the right person, you need to tell Git a bit about yourself.

To set up Git, open Git Bash (on Windows) or the default console window. At the command prompt, tell Git your name and email address:

# For all systems (Windows, Linux, and Mac)
$ git config --global user.name "Murera Gisa"
$ git config --global user.email "elgisamur@gmail.com"
# On Mac, additionally configure Git to remember your password
$ git config --global credential.helper osxkeychain

Notice the --global option; it means that the setup applies to all future Git repositories, so you don’t have to do this again.
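To see what the global scope means in practice, the sketch below creates a throwaway repository and sets a different email for that repository only. The directory, the email address, and the variable names are all hypothetical and exist only for this demonstration; your real global settings are read but never modified.

```shell
# Create a throwaway repository to experiment in.
repo_dir="$(mktemp -d)"
git init --quiet "$repo_dir"
cd "$repo_dir"

# Without --global, the setting is stored in this repository's .git/config only.
git config user.email "work@example.com"   # hypothetical address

# Inside this repository the local value takes precedence over the global one.
effective_email="$(git config user.email)"
echo "Email used in this repository: $effective_email"

# Show the global value, if any, for comparison.
git config --global user.email || echo "(no global email set)"
```

In every other repository, Git keeps using the global name and email, which is why the global setup only has to be done once.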

Finally, the .bash_history file records the commands you previously typed in the console. You can consult this file if you want to look up a command you forgot.

And that’s it! You are now ready to use Git in all its glory.
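As a small illustration, the sketch below looks up the most recent Git commands recorded in the history file. It assumes a bash console with the default history location; the variable names are ours. Keep in mind that bash usually writes the file when the console is closed, so commands from the current session may not appear yet.

```shell
# Location of the history file; bash uses ~/.bash_history unless HISTFILE says otherwise.
history_file="${HISTFILE:-$HOME/.bash_history}"

if [ -f "$history_file" ]; then
    # Collect the ten most recent recorded commands that start with "git".
    recent_git_commands="$(grep '^git ' "$history_file" | tail -n 10 || true)"
    echo "$recent_git_commands"
else
    recent_git_commands=""
    echo "No history file found at $history_file"
fi
```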

Installation of R and RStudio

R is a programming language for statistical computing and graphics. It was designed by Ross Ihaka and Robert Gentleman and is developed and supported by the R Core Team and the R Foundation for Statistical Computing. It first appeared in August 1993, and the most recent version, which we install in this tutorial, is 4.2.1, released on 23 June 2022. To install base R for Windows, follow this LINK, then download and run the .exe file.

RStudio, on the other hand, is an Integrated Development Environment (IDE) for R. It is available in two formats:

  • RStudio Desktop is a regular desktop application.
  • RStudio Server runs on a remote server and allows accessing RStudio using a web browser.

RStudio is developed by RStudio, PBC and runs on all major operating systems, including Ubuntu, Fedora, Red Hat Linux, openSUSE, macOS, and Windows NT. It was initially released on 28 February 2011 and also supports languages other than R, for example C++, Python, Stan, SQL, HTML, and JavaScript (D3). To install the latest RStudio version, released on 27 April 2022, visit this WEBSITE and scroll down until you see the screen shown in Figure 7 below, then download and install the version for your operating system.

Figure 7: All Installers of RStudio IDE

NOTE: RStudio requires a 64-bit operating system. If you are on a 32-bit system, you can use an older version of RStudio.

Configuration of GitHub Account

The first steps in starting with GitHub are to create an account, choose a product that fits your needs best, verify your email, set up two-factor authentication, and view your profile.

There are several types of accounts on GitHub. Every person who uses GitHub has their own personal account, which can be part of multiple organizations and teams. Your personal account is your identity on GitHub.com and represents you as an individual.

  • Creating an account

To sign up for an account on GitHub.com, navigate to GitHub and follow the prompts. It is highly recommended to keep your GitHub account secure by using a strong and unique password. For details on how to create one, visit Creating a strong GitHub Password.

  • Choosing your GitHub Product

There are two GitHub products for personal accounts: Free and Pro. You can choose GitHub Free or GitHub Pro to get access to different features for your personal account. If you are unsure at first which product you want, you can upgrade at any time.

  • Verifying an Email Address

To ensure you can use all the features in your GitHub plan, verify your email address after signing up for a new account. For more details, visit Verifying your Email Address.

  • Configuring Two-factor Authentication

Two-factor authentication (2FA) is an extra layer of security used when logging into websites or apps. We strongly recommend configuring 2FA for the safety of your account. For more details, visit Two-factor authentication.

  • GitHub Profile and Contribution Graph

Your GitHub profile tells people the story of your work through the repositories and gists you’ve pinned, the organization memberships you’ve chosen to publicize, the contributions you’ve made, and the projects you’ve created. For details, visit GitHub Profile and GitHub contributions on Profile.

WOW!! We have installed all the required applications and created a GitHub account. It is time to link and synchronize the RStudio project with the GitHub server so we can regularly track changes and collaborate with others.

Synchronizing RStudio and GitHub Server

This section is mostly a reminder for R users of how to put an existing R project on GitHub, so that changes are tracked and the project is easy to share and collaborate on. The steps to synchronize them are as follows:

Step1: Create a GitHub Repository

Go to your GitHub account and click the + button to create a new repository and name it (Figure 8). I typically do not initialize it with the .gitignore, readme.md, or license.md files, but add them manually after the project is up and running.

Figure 8: Creating New GitHub Repo
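Since the .gitignore file is added by hand later, here is a minimal sketch of what that could look like for an R project. The four entries are the ones commonly used for RStudio projects; the throwaway directory and the variable names are assumptions for this example, and your own project may need more entries.

```shell
# Throwaway project directory, only for this demonstration.
project_dir="$(mktemp -d)"
git init --quiet "$project_dir"
cd "$project_dir"

# Write a small .gitignore covering RStudio session files.
cat > .gitignore <<'EOF'
.Rproj.user/
.Rhistory
.RData
.Ruserdata
EOF

# Create a file that should be ignored and ask Git whether it is.
touch .RData
if git check-ignore --quiet .RData; then
    ignored="yes"
else
    ignored="no"
fi
echo ".RData ignored: $ignored"
```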

Step2: Enable Git in RStudio

  • Open a new project in RStudio and navigate to Tools -> Version Control -> Project Setup

  • Click the Git/SVN tab and select Git as the version control system. RStudio will ask you to initialize a new Git repository and restart

  • After RStudio reopens, confirm that there is a Git tab in the environment pane (by default, it is in the upper right of the IDE)
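If you would rather do this step from a terminal, initializing a repository comes down to a single git init inside the project directory. The sketch below uses a throwaway directory (an assumption for this example) and then asks Git to confirm that the directory is now a repository.

```shell
# Throwaway directory standing in for the R project folder.
project_dir="$(mktemp -d)"
cd "$project_dir"

# Initialize the repository; this creates the hidden .git directory.
git init --quiet

# Ask Git whether we are inside a work tree.
is_repo="$(git rev-parse --is-inside-work-tree)"
echo "Inside a Git work tree: $is_repo"
```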

Step3: Synchronize with the GitHub Repository

Open a terminal (the macOS or Linux terminal, Git Bash on Windows, or the Terminal tab in RStudio) and run the following commands:

# Move to the R project directory

cd ~/Desktop/Projects/...

# Link the local project to the GitHub repository (replace the URL with your own)

git remote add origin https://github.com/hansenjohnson/website.git

# Pull all files from the GitHub repository (typically just readme, license, gitignore);
# use master instead of main if that is your default branch, and skip this step if the repository is empty

git pull origin main

# Push the local branch and set it to track the branch on GitHub

git push -u origin main
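If you want to rehearse these commands without touching GitHub, the sketch below replays them against a local stand-in. A bare repository in a temporary directory plays the role of the GitHub repository; all paths, the demo identity, and the variable names are hypothetical. The branch name is detected instead of hard-coded, since it may be master or main depending on your Git defaults.

```shell
# Stand-in for the GitHub repository: a bare repository in a temp directory.
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/remote.git"

# Stand-in for the R project directory.
git init --quiet "$work_dir/project"
cd "$work_dir/project"
git config user.name "Demo User"            # hypothetical identity, local to this repo
git config user.email "demo@example.com"

# Make a first commit so there is something to push.
echo "# Demo project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Link the project to the remote and push, setting the upstream branch.
git remote add origin "$work_dir/remote.git"
branch="$(git rev-parse --abbrev-ref HEAD)"
git push --quiet -u origin "$branch"

# Confirm the link: the remote URL and the upstream branch.
git remote -v
upstream="$(git rev-parse --abbrev-ref "$branch@{upstream}")"
echo "Upstream: $upstream"
```

With a real GitHub repository, only the URL passed to git remote add changes; the rest of the sequence stays the same.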

Step4: Push local files to GitHub

Click on the Git tab in RStudio, and then click Commit. This opens a window where you can stage the files to be tracked (and synced on GitHub). Select all the files you would like to track, write a commit message, click Commit, and then click Push. This sends all changes to the GitHub repository.
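For readers who like to know what the buttons do, here is a rough command-line equivalent of staging and committing. It is a sketch in a throwaway repository; the file names, the demo identity, and the variable names are made up for illustration.

```shell
# Throwaway repository, only for this demonstration.
project_dir="$(mktemp -d)"
git init --quiet "$project_dir"
cd "$project_dir"
git config user.name "Demo User"            # hypothetical identity, local to this repo
git config user.email "demo@example.com"

# Two new files: one we want to track, one we leave alone for now.
echo 'summary(cars)' > analysis.R
echo 'scratch notes' > scratch_notes.txt

# Ticking the Staged box corresponds to git add.
git add analysis.R

# The Commit button corresponds to git commit with your message.
git commit --quiet -m "Add analysis script"

# Files that were not staged remain untracked.
pending="$(git status --short)"
commit_count="$(git rev-list --count HEAD)"
echo "Commits so far: $commit_count"
echo "Still pending: $pending"

# In a project linked to GitHub, the Push button corresponds to: git push
```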

Step5: Up and Running

From now on, the most important thing is to always remember to commit your changes and push them to the GitHub repository. Please don’t forget it!

Finally, sometimes we need to remove a directory that we no longer want to keep in a GitHub repository. That directory, hypothetically called /public for this example, can be removed from version control (while staying on your local disk) using:

git rm -r --cached public

As before, commit the change and push it. For more details, visit THIS.
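The whole sequence can be tried safely in a throwaway repository. The sketch below commits a public directory, stops tracking it, and adds it to .gitignore so it is not picked up again. The .gitignore step is our own addition rather than something the command requires, and the paths, demo identity, and variable names are hypothetical.

```shell
# Throwaway repository with a tracked public/ directory.
repo_dir="$(mktemp -d)"
git init --quiet "$repo_dir"
cd "$repo_dir"
git config user.name "Demo User"            # hypothetical identity, local to this repo
git config user.email "demo@example.com"
mkdir public
echo "<html></html>" > public/index.html
git add public
git commit --quiet -m "Add generated site"

# Stop tracking the directory; --cached leaves the files on disk.
git rm -r --quiet --cached public

# Ignore the directory from now on, then commit both changes together.
echo "public/" >> .gitignore
git add .gitignore
git commit --quiet -m "Stop tracking public/"

# Check the result: what Git tracks, and whether the file still exists locally.
tracked_files="$(git ls-files)"
if [ -f public/index.html ]; then still_on_disk="yes"; else still_on_disk="no"; fi
echo "Tracked files: $tracked_files"
echo "public/index.html still on disk: $still_on_disk"
```

In a project linked to GitHub, a final git push then removes the directory from the remote repository as well.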

WOW, you’re all done!