What is version control?

What is Git? What is GitHub?

  • Git is a version control system (local, on your computer)
  • GitHub is a hosting service for Git repositories (remote, on the internet)
  • “GitHub is like Dropbox or Google Drive, but more structured, powerful, and programmatic”

Why does this matter for research?

  • Git was built for collaborative software development; it has been repurposed for data science and various kinds of empirical research, including in economics
  • Git is very powerful, maybe “too powerful” for our purposes
  • Our aim is not total mastery of Git, but learning how to use it to meet our needs

Setting up Git and GitHub

Set up a GitHub account

Introduce yourself to Git

Run the code below in your shell.

  • user.name: use some version of your real name so we know who is making changes
  • user.email: you MUST use the email associated with your GitHub account
git config --global user.name 'weiyangtham'
git config --global user.email 'weiyang.tham@gmail.com'
git config --global --list

Git client

  • GitKraken is a good option for us as it works on Windows, Mac, and Linux
  • Minimizes the need to use the command line
  • Makes version control a lot more intuitive by visualizing it
  • You can have a GUI and still work from the command line when you have to
  • If you use R, RStudio has a simple Git GUI
  • DO NOT use the free GitHub client
  • More Git clients here

Your first GitHub commit

Create a repository on GitHub

  • Go to your GitHub account: https://github.com/yourname
  • Click on “Repositories”
  • Click on the green “New” button
  • Check “Initialize this repository with a README”
  • Click “Create Repository”
  • Congratulations! You have made your first GitHub repository! 🎉 🎊

Clone your GitHub repo

  • Click “Clone or download” and copy the URL that appears
  • Open GitKraken
  • Click the folder 📂 icon and select “Clone”
  • Select the directory where you want to store your project
    • For example, I have a “Projects” directory where I keep all my projects
    • Note, you DO NOT have to manually create a folder for the workshop_example repo
  • Go to the directory where you chose to clone your project and check that you now have a new folder called workshop_example
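If you ever need to clone from the command line instead of GitKraken, the equivalent is a single git clone command. The sketch below is only an illustration: it uses an empty local "bare" repository as a stand-in for GitHub so that it runs without a network connection, and the sandbox and Projects directories are hypothetical. With a real project you would pass the URL copied from GitHub in place of "$remote".

```shell
# Stand-in for GitHub: an empty "bare" repository in a temporary directory.
# (Assumption for illustration; a real repo would be the URL copied from GitHub.)
sandbox=$(mktemp -d)
remote="$sandbox/workshop_example.git"
git init --quiet --bare "$remote"

# Clone into a (hypothetical) Projects directory. Git creates the
# workshop_example folder itself, so there is no need to make it by hand.
mkdir "$sandbox/Projects"
cd "$sandbox/Projects"
git clone --quiet "$remote" workshop_example

# The new folder is a full Git repository that remembers where it came from.
git -C workshop_example remote get-url origin
```

Git may warn that the cloned repository is empty; that is expected here because the stand-in has no README yet.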

Your first commit

  • In the directory where your project resides, open the README.md file with a plain text editor (like Notepad for Windows or TextEdit for Mac)
  • Delete the current text, then type in the text from the image below and then save the file
  • In GitKraken, go to “Unstaged Files” and click on README.md. What do you see?
  • Click on the “Stage File” button to the right of README.md, then click in the Commit Message box below where it says “Summary”. Type “First commit from GitKraken”.
  • Now you’re ready to commit your first file!
    • Click on the giant green button in the bottom right corner
    • Notice how your commit has been added to the master branch (middle of the GitKraken app)
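For reference, the same stage-and-commit steps can be done from the shell. This is a minimal sketch in a throwaway repository; the name, email, file contents, and commit message are placeholders rather than anything from the workshop repo.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Identity for this repository only (placeholder values).
git config user.name "Example User"
git config user.email "user@example.com"

# Edit the file, as you would in a plain text editor.
echo "My first edit to the README" > README.md

git status --short      # README.md is listed as a change Git has not recorded yet
git add README.md       # same as clicking "Stage File"
git commit --quiet -m "First commit from the command line"   # same as the commit button
git log --oneline       # the new commit now appears in the history
```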

GitHub as a project website

  • Go to the GitHub page for the repository you just created; notice how the README file has been rendered nicely
  • You can use GitHub Pages to create a project website. Example
  • If you need to share data, GitHub renders .tsv and .csv files nicely (if they are smaller than 512 KB)
  • Browsing code and files: a convenient and pleasant way to browse through your files or even a manuscript
  • GitHub issues: use as a to-do list. Comes with Markdown formatting
    • Go to your repo and try filing an issue

Workflow

Working with Git and GitHub is like working on a Google Doc, where a single copy of the document lives in the cloud, but with the ability to work offline and separately and then integrate changes made locally on different computers.

  1. In this system, GitHub is the clearinghouse and holds the master copy of the project. Each collaborator has their own complete copy of the repo and its history.
  2. You pull regularly to receive and integrate changes from your collaborators. Likewise, you push regularly to GitHub so that it maintains its status as the master copy.
  3. A simple and helpful rule: If you have a project that exists on multiple computers (even if you are the only collaborator!), pull every time you open it up to work on it.
  4. Commit small changes and push to GitHub frequently.
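The pull, edit, commit, push loop described above can be sketched in the shell as follows. To keep the example runnable offline, a local bare repository stands in for GitHub; the directory names (remote.git, laptop), identity, and file contents are all hypothetical.

```shell
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"     # stand-in for GitHub (assumption)
git clone --quiet "$sandbox/remote.git" "$sandbox/laptop"
cd "$sandbox/laptop"
git config user.name "Example User"               # placeholder identity
git config user.email "user@example.com"

# One-time setup: publish a first commit and link the local branch to the remote.
echo "# Project" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# Everyday loop: pull first, make a small change, commit it, push it.
git pull --quiet
echo "Notes from today" >> README.md
git add README.md
git commit --quiet -m "Add notes from today"
git push --quiet
```

Keeping each pass through the loop small is what makes the later integration step painless.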

What happens if two people try to push changes to GitHub?

  • Suppose your collaborator has made a change and pushed it to GitHub, but you haven’t pulled the latest version to your computer. Now if you make and commit a change locally and try to push that to GitHub, the push will be rejected and you will be prompted to pull first.
  • Usually Git can integrate your changes with your collaborator’s changes smoothly when you pull - if not, you will have to resolve the merge conflict
  • Better to avoid merge conflicts by committing small changes and syncing regularly with GitHub
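The scenario above can be reproduced on one machine. In this sketch a local bare repository plays the role of GitHub and two clones play "you" and your collaborator; the make_clone helper, directory names, and file names are hypothetical. The two people edit different files, so the pull merges cleanly without a conflict.

```shell
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"     # stand-in for GitHub (assumption)

# Hypothetical helper: clone the remote and set a placeholder identity.
make_clone() {
  git clone --quiet "$sandbox/remote.git" "$sandbox/$1"
  git -C "$sandbox/$1" config user.name "$1"
  git -C "$sandbox/$1" config user.email "$1@example.com"
}

# You create the project and push the first commit.
make_clone you
cd "$sandbox/you"
echo "# Project" > README.md
git add README.md && git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# Your collaborator clones, adds a file, and pushes.
make_clone collaborator
cd "$sandbox/collaborator"
echo "analysis" > analysis.txt
git add analysis.txt && git commit --quiet -m "Add analysis"
git push --quiet

# You commit without pulling first, so your push should be rejected.
cd "$sandbox/you"
echo "More detail" >> README.md
git add README.md && git commit --quiet -m "Expand README"
if git push --quiet 2>/dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi
echo "First push: $first_push"

# Pull (merging the collaborator's work), then push again.
git pull --quiet --no-rebase --no-edit
git push --quiet
```

Had both of you edited the same lines of the same file, the pull would stop and ask you to resolve a merge conflict before pushing.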

Which files to commit?

Types of files:

  • Source files: Code (Stata .do files, R scripts), Markdown, LaTeX files
  • Configuration files: These files modify the behavior of a tool. For example, .gitignore identifies files Git should not track, and some-project.Rproj records RStudio project settings.
  • Derived products (e.g. reports, images, intermediate data products): .pdf, .docx, .png, .csv, .tsv
  • Intermediates: e.g. LaTeX .aux or .log files - generally don’t commit these files; consider adding them to .gitignore

Most importantly, is it useful to someone? If it is, track it!
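As an illustration of keeping intermediates out of the repo, here is a small sketch of a .gitignore for LaTeX by-products, tried out in a throwaway repository. The patterns and file names are examples only; adjust them to your own project.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Tell Git not to track LaTeX intermediates (example patterns).
cat > .gitignore <<'EOF'
*.aux
*.log
EOF

# One source file and two intermediates (hypothetical names).
touch paper.tex paper.aux paper.log

# Git lists .gitignore and paper.tex as new files, but not the intermediates.
git status --short

# check-ignore prints the paths that match an ignore rule.
git check-ignore paper.aux paper.log
```

The .gitignore file itself is a configuration file, so it is worth committing along with your source files.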

Next steps

  • Start out with a solo project and get used to the practice of committing your work regularly and pushing it to GitHub
  • Once you get comfortable with the idea of pushing and pulling, you can try more advanced commands like git branch
  • Don’t be afraid to keep things simple - for what we do, knowing how to push, pull, and work with GitHub will get you 90% of the value