Git intro day 1, or how I learned to stop worrying and love the command line
Nerissa Nance
9/22/2021
Hi, my name is:
Nerissa Nance
I’m a:
- PhD Student in Epidemiology at University of California, Berkeley
- Senior Analyst at Kaiser Permanente Northern California Division of Research
- Note: all opinions/errors/faults in judgement are my own.
Training goals:
This two-part training hosted at the University of York is part of a series of transparency and reproducibility workshops. By the end of the workshop, you will:
- Be familiar with what Git is and be able to define key terms
- Be able to identify pieces of a Git workflow
- Be able to talk to your computer! (i.e. use a few basic commands on the command line)
What is “version control”?
- my_code.R
- my_code_v2.R
- my_code_v3.R
- my_code_v3_23Jul2021.R
- my_code_v3_25Jul2021.R
- my_code_v4.R
- my_code_v4_final.R
- my_code_v4_final_final.R
- my_code_v4_final_final_update_8Aug2021.R
What is “version control”, really, though?
“Version control” generally refers to a system that records changes to files so that each file’s change history is known and earlier versions of it can be recalled or reinstated.
So what?
You may be thinking, “that’s all fine, but I’m an economist, do I really need to use something like version control software?”
Let’s consider some motivating examples…
Reviewer #3
You have submitted what you think is a publication-ready paper to a journal; reviewers 1 and 2 wholeheartedly agree. Reviewer #3, however, gets a little snarky about your model, insists that it needs to take into account a nonlinear response, and asks you to run a sensitivity analysis with a squared term.
This wouldn’t be a problem, except for the fact that you can’t find the version of the code you submitted to the journal! Was it mycode_v4_update.R or mycode_v4_Aug2020.R or mycode_v4_update_Aug2020.R?
Collaborator’s code
Imagine that you work in a big group that has collected a significant amount of data about quantities of drugs used, their prices, overhead costs, recurrent supplies and utilities used in health facilities of five African countries. Your team is in charge of calculating the average cost of cancer treatment. The leader of your team asks you to check whether transportation costs were included in the cost estimation, as she would like to estimate costs with and without them. But the estimation was done by someone who has left the group, and you cannot find the last version of the estimations!
What is “Git”?
What do you already know about Git, if anything?
What is “Git”?
- Git is an open-source “distributed version control system”: software that allows you to track the versions of a (text) file and share these versions with different users
- It was created in 2005 by Linus Torvalds, the creator of Linux, after a falling-out between the Linux kernel developers and BitKeeper, the proprietary version control system they had been using
- GitHub, a separate company that hosts Git repositories online, launched in 2008 and was acquired by Microsoft in 2018 for $7.5B in stock
- It is one of the most popular version control systems to date, used among software engineers, data scientists, and yes, even academics.
OK, but… what does that mean?
- Essentially, your files and the “track changes” to your files will be saved in a Git repository.
- Repositories are collections of files (and their histories), generally organized by project.
- Git repositories exist locally (on your computer) and remotely (on a hosting service such as GitHub).
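As a rough sketch of what a local repository looks like on disk, the commands below create an empty project folder and turn it into a Git repository. The folder name `my_project` and the temporary location are placeholders made up for illustration; they are not part of this training’s materials.

```shell
# Work in a throwaway folder so nothing else on your computer is affected
workdir="$(mktemp -d)"
cd "$workdir"

# Make a project folder (hypothetical name) and turn it into a local repository
mkdir my_project
cd my_project
git init

# Git keeps the project's history in a hidden .git folder
ls -a

# No remote repository is linked yet, so this prints nothing
git remote -v
```

The remote copy only comes into play once a remote has been linked to the local repository.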
Git workflow
How does this work in practice?
What is the “command line”?
The command line is a small but mighty text-based prompt that allows you to interact directly with your computer and its applications, without a graphical user interface. It can do basically all the same things you do on a daily basis, but without the pretty windows and buttons you’re used to dealing with.
Git Bash vs. GitHub Desktop
You can use Git on your personal computer through either the GUI GitHub Desktop, or directly through the terminal/Git Bash.
Preferences are personal, but I teach Git on the command line because:
- If you are an analyst or work with data regularly, then learning new code should excite you (not scare you!);
- The commands are simple and in my opinion GitHub Desktop can overcomplicate things; and
- Using the command line will make you feel like a cool kid.
The command line on different operating systems
- On Macs: we use the program Terminal, which uses Unix commands (there will NOT be a test on this later, so don’t worry)
- On PCs: the built-in Windows Command Prompt is not Unix-based, so to use Git we additionally install Git Bash (more on this in a minute).
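Whichever program you end up in (Terminal on a Mac, Git Bash on a PC), one quick way to check that Git is installed is to ask for its version. This is only a small sanity check, and the version number printed will differ from machine to machine.

```shell
# Print the installed Git version; an error here usually means Git is not installed
git --version
```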
Yes You Can!
Operating on the command line will be empowering!
Vocabulary review:
- Version Control System
- Repository
- Remote vs. local repository
- Staging area
- Command line
Important commands for operating on the command line
General navigation:
- ls - see what’s in your current directory
- cd - navigate to a different directory
- mkdir - make a new directory
- sudo - run a command as another user, by default the administrator (see below)
Cheatsheet of basic Unix commands here
![ref: www.xkcd.com]()
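A short practice run with the navigation commands above might look like the sketch below. The directory name `git_training` is just an example, and `pwd` and `mktemp` are two extra commands not listed above, used here only to show where you are and to keep the practice run in a throwaway folder. `sudo` is left out because it asks for an administrator password.

```shell
# Start in a throwaway folder (mktemp -d creates a new temporary directory)
cd "$(mktemp -d)"

# pwd prints where you are; handy when you feel lost
pwd

# mkdir: make a new directory (the name "git_training" is just an example)
mkdir git_training

# ls: see what's in your current directory
ls

# cd: navigate into the new directory, then back up one level with ".."
cd git_training
cd ..
```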
Operating Git on the command line
- git add
- git commit -m "add message here"
- git push
- git pull
- git checkout
See Git cheatsheet here
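To see how the commands above fit together, here is a minimal sketch that runs entirely on your own machine. It assumes a stand-in “remote” created in a temporary folder rather than on a hosting service, and every folder, file, and branch name in it is made up for the demo.

```shell
# Set up a throwaway "remote" and a local repository linked to it
# (all folder, file, and branch names here are made up for the demo)
demo="$(mktemp -d)"
git init --bare "$demo/remote.git"
git init "$demo/local"
cd "$demo/local"
git remote add origin "$demo/remote.git"

# Tell Git who you are, for this repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create a file, stage it, and commit it
echo 'print("hello")' > my_code.R
git add my_code.R
git commit -m "Add first version of my_code.R"

# Send the commit to the remote; HEAD means "the branch I am on"
git push -u origin HEAD

# Fetch and merge anything new from the remote (nothing new in this demo)
git pull

# Switch to a new branch to try something without touching the main line
git checkout -b sensitivity-analysis
```

In real use the remote would typically live on a hosting service, but the add, commit, push, pull cycle is the same.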
Let’s do this!
Without further ado, it’s time for us to get up close and personal with our own command lines. Please follow along as I demo.