Learning Goals

By the end of this workshop, participants will be able to:

  1. Understand where different files should be stored.

  2. Perform basic Git workflows using GitHub Desktop.

  3. Understand how the team is using Box.

  4. Handle large files effectively using Box LFS.

  5. Collaborate smoothly without merge conflicts or file chaos.


Part 1: Data Storage Workflows

In the Wildfire Water Security project we primarily use two tools to store and share files, each with its own benefits and limitations:

Box

Git/GitHub

Both of these tools are preferable to an organization-specific network drive because they:

  • allow easy collaboration across organizations

  • automatically back up work and save version history

I have a file. Where does it go?

Generally, the following rules apply:

  • Box: Large files, collaborative Office files, files not related to a specific research project

  • GitHub: Project files, code, manuscript files (besides the document itself)

If you’re unsure of where to put a file you can follow the flow chart below:

Where to NOT put files:

  • Network drive: even keeping shared GitHub repositories on the network drive can cause issues. Keep your own copies of files on a local drive (C:).
  • OneDrive
  • SharePoint
  • Google Drive
  • Dropbox

Data belongs to the project, not individuals. Store it where the team can find it.

Activity:

Where should the following files be stored?

  1. Word document containing text for a manuscript

    • Box in 02_Nodes/your node/Publications_Presentations/Manuscripts/folder for your paper
  2. meeting notes from Node 1

    • Box in 02_Nodes/01_Empirical/01_Meetings
  3. exploratory figure associated with the Bedrock project

    • GitHub repo: WWS-Node1-BDRK-bedrock-microbes/figures/exploratory
  4. large dataset associated with the Sonde project

    • GitHub repo: WWS-Node1-SONDE-postfire-sonde-network/data
  5. SOP for filtering water samples

    • GitHub repo: WWS-standard-methods/filtering
  6. figure for a manuscript

    • GitHub repo for project, on branch for associated manuscript in figure folder

Part 2: Folder and File Naming Conventions

Keeping your files and folders organized makes it easier for everyone on the team to find what they need and avoids confusion down the road.

Folder Organization

  1. Git Repo First
    Your top-level project folder (root directory) should be a GitHub repository. This ensures you have version control and backups from the start.

  2. Limit the Number of Folders in the Root
    Aim for fewer than 10 top-level folders — for example: data/, code/, figures/, methods/

  3. Use Nested Folders for Subcategories
    For example inside data/:

    • raw-data/
    • processed-data/
    • metadata/
  4. Avoid Spaces & Special Characters
    Use - or _ instead of spaces. Avoid characters like .:*?"<>|[]&$.

  5. Descriptive Names
    Name folders so someone unfamiliar with your project can still guess what’s inside.

  6. Organize by Date (if needed)

  7. Force Folder Order with Numbers
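
As a rough sketch of the layout described above, the following R snippet builds the example folders inside a temporary directory. The project name example-project is a placeholder, not a real WWS repository; in practice the root would be your cloned GitHub repository.

```r
# Placeholder root; in a real project this would be the cloned repository folder
root <- file.path(tempdir(), "example-project")

# Top-level folders named in the guidelines, plus the nested data/ subfolders
folders <- c(
  "code",
  "figures",
  "methods",
  file.path("data", c("raw-data", "processed-data", "metadata"))
)

# recursive = TRUE also creates the parent data/ folder
for (folder in folders) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the top-level folders that now exist under the root
list.dirs(root, full.names = FALSE, recursive = FALSE)
```

Keeping the subcategories nested under data/ leaves only four folders in the root, well under the suggested limit of 10.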

File Organization

  1. Avoid Spaces & Special Characters
    Use - or _ instead of spaces. Avoid characters like .:*?"<>|[]&$.

  2. Be Concise but Descriptive

    • DON’T: use words like “the” or “and”

    • DO: use standard abbreviations and keywords

  3. Self-Contained File Names – name files so they still make sense outside the folder:

  4. Let Git (or Box) Handle Versions – don’t add dates, initials, or “final_v2” to file names.

  5. No Duplicate Files – edit the original and commit often instead of making copies.
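
The first rule is easy to check automatically. The helper below is a hypothetical sketch, not part of any WWS tooling, and both example paths are made up. It flags spaces and a subset of the special characters listed above; the "." is left out of the check because it also separates the file extension.

```r
# Hypothetical helper: report whether a path contains spaces or
# any of the characters : * ? " < > | [ ] & $
check_path <- function(path) {
  c(
    has_space   = grepl(" ", path, fixed = TRUE),
    has_special = grepl("[\\[\\]:*?\"<>|&$]", path, perl = TRUE)
  )
}

# A made-up name that follows the rules
check_path("data/raw-data/sonde-temperature.csv")

# A made-up name that breaks them (spaces and an ampersand)
check_path("field data/site map & notes.csv")
```

A check like this only covers characters; whether a name is concise, descriptive, and self-contained still needs a human eye.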

Activity:

What is wrong with these file paths?

  1. sonde & other instrument data/08-12-2025 data_JS.csv

    - special characters

    - spaces in path

    - initials used for versioning

    - non-descriptive name

  2. SWAT-modeling/final map (2).png

    - spaces in path

    - non-nested file structure

    - multiple copies of a single file

    - non-descriptive name

  3. Aqualog/methods/running EEM's analysis SOP_25_12_01.docx

    - special characters

    - spaces in path

    - dates used for versioning


Part 3: Setting up Git and GitHub

Now that we know where to put files and how to name them, let’s start working with Git and GitHub.

In this activity we’ll follow a tutorial to make sure that everyone has all their tools set up (R, RStudio, Git, GitHub, and GitHub Desktop).

You’ll need to do these steps on every computer you work on.

Activity:

Open up the tutorial and follow the directions to do the following:

  1. Install R and RStudio (if needed)

    • Check that the installation worked by opening RStudio, typing the following in the console, and hitting Enter:

      print("hello world")

  2. Install Git

    • Check that the installation worked by switching to the terminal in RStudio, typing the following, and hitting Enter:

      git --version

  3. Create GitHub account (if needed)

    • ensure you have access to the WWS GitHub
    • if you don’t have access, send your email to
    • check your email to accept the invite to the organization
  4. Tell Git who you are

  5. Connect RStudio to GitHub

    • once you’ve got your token, you can run the following line in R to check that your credentials are stored correctly:

      gitcreds::gitcreds_get()
  6. Install GitHub Desktop


Part 4: Using Git and GitHub

It’s time to start learning how to use Git and GitHub! In the following activities we’re going to learn how to:

Activity: Clone a Repository

  1. In GitHub Desktop: File → Clone repository (CTRL + SHIFT + O)

  2. Choose the following repo to clone: WWS-TEST-example-repo

  3. Set the local path to C:\Users\your username\Documents

  4. Hit Clone

  5. Find the repository you just cloned in file explorer. Check out the folders and files in the repository.

Activity: Create a Branch

Find a partner.

Partner 1:

  • Create a new repository branch with the name: your name-their name

    • Go to GitHub Desktop

    • Click the Current branch button

    • Click New Branch

  • Publish the branch so your partner can see it by clicking the Publish branch button in the top right.

Partner 2:

  • Pull the repository using the Pull origin button in the top right of GitHub Desktop to get the new branch.

  • Check out the branch by clicking the Current branch button in the middle of GitHub Desktop and selecting the one with your names.

This is where you’ll be working for the rest of the GitHub activities. The changes you make to this branch will not affect anyone else’s files unless they are also on this branch.

Activity: Turn the Repository into an R Project

R Projects make your data/code portable and keep file paths consistent across computers.

  1. Open up code/example-script1.R

    • Edit line 3 with your username then run lines 4-5 to get the file path to the .csv file.

    • Notice this is a long file path to have to type, and is unique to the user so this isn’t ideal for sharing code across the project.

  2. Let’s fix this by turning the repo into an R project by running the following code in the R console:

     # install.packages(c("usethis", "fs")) # only need to run the first time
     library(usethis)

     # the location of your project folder (the repo cloned earlier)
     repo <- file.path(fs::path_home(), "Documents", "WWS-TEST-example-repo")

     # create an R project from the directory
     usethis::create_project(repo)
  3. The new R project should open up in a new R window. In this project go back to code/example-script1.R and change the file path to data/example-csv.csv. Run the rest of the lines. What happens?

HINT: In the project, the Files pane in the lower right of RStudio shows all the files in the repository. You can use it to open example-code1.R.

Because the repository is now an R project, R will automatically treat WWS-TEST-example-repo as the working directory and base all file paths from there. This means the file paths will work for anyone who clones the repository.
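
The idea can be sketched with a throwaway folder. Everything here is a stand-in: the folder lives in a temporary directory, the data are made up, and setwd() plays the role that opening the .Rproj file plays in RStudio.

```r
# Stand-in for a cloned repository, created in a temporary directory
project_root <- file.path(tempdir(), "example-repo")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Made-up data written to data/example-csv.csv inside the stand-in repo
write.csv(
  data.frame(site = c("A", "B", "C"), value = c(2.1, 3.4, 1.8)),
  file.path(project_root, "data", "example-csv.csv"),
  row.names = FALSE
)

# Opening an R project sets the working directory to the project root;
# setwd() imitates that here and returns the previous directory
old_wd <- setwd(project_root)

# The relative path contains no user name, so it is the same on every computer
dat <- read.csv("data/example-csv.csv")

# Restore the original working directory
setwd(old_wd)

nrow(dat)
```

In a real project you would not call setwd() yourself. Opening the .Rproj file handles it, which is what keeps the script free of user-specific paths.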

Activity: Add a .gitignore File

We changed some files, so let’s go see the changes in GitHub Desktop. On the left it shows the files that have been added, changed, or removed in your repo. You should see four:

  • .gitignore
  • example-code1.R
  • example-csv.csv
  • WWS-TEST-example-repo.Rproj

HINT: If you don’t see example-code1.R, make sure the script is saved.

What is the .gitignore file? It tells Git which files/folders not to track.

  1. Open up the file via file explorer, right now it only has one line:

    • .Rproj.user (used to store project-specific temporary files)
  2. There are many other temporary files we may not want to include in our repository (e.g., lock files for Office files). The easiest way to ignore all of these files is to use gitignore.io.

    • Open the link, enter: MicrosoftOffice, R, Windows, and macOS, then hit Create
    • Copy all the text and paste into the .gitignore file. Save the file.
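
For a sense of what the generated file contains, the sketch below writes a handful of representative patterns to a temporary file and reads them back. It is only an excerpt chosen for illustration; the text from gitignore.io is much longer and is what should go in the real .gitignore.

```r
# A few representative ignore patterns (illustrative excerpt only)
patterns <- c(
  ".Rproj.user/",  # RStudio project-specific temporary files
  ".Rhistory",     # R console history
  ".RData",        # saved R workspace
  "~$*",           # lock files Office creates while a document is open
  ".DS_Store",     # macOS folder metadata
  "Thumbs.db"      # Windows thumbnail cache
)

# Written to a temporary file so a real .gitignore is not touched
ignore_file <- file.path(tempdir(), "example.gitignore")
writeLines(patterns, ignore_file)

readLines(ignore_file)
```

Each line is one pattern, and any file or folder matching a pattern is left out of the list of changes in GitHub Desktop.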

Activity: Committing Changes

  1. We’ve now made some changes to the files in this project, will others see them? Go check out the project on GitHub, do you see the changes you’ve made?

    • No, we need to commit and push those changes first!
  2. Go back to the project in GitHub Desktop, on the left side you can see the files that have been changed. Select all the files.

  3. Add a commit message:

    • Summary: GitHub Workshop
    • Description:
      • creating R project
      • changing file path to relative to be more collaborative
      • ignoring nuisance files from R, Office, Windows, and Mac
  4. Commit your changes.

  5. Go back to the repository on GitHub, do you see your changes?

Activity: Pushing Changes

Your changes are tracked locally by Git after you make a commit, but in order to share your commits with others, you need to push those changes to the remote (GitHub).

Partner 1:

  1. In GitHub Desktop, click the Push origin button in the upper right to send your changes to GitHub.

  2. Go back to the repo on GitHub, do you see your changes now?

Partner 2:

  1. Without pulling your partner’s changes, push your changes.

  2. You’ll get a warning about needing to fetch changes, go ahead and click Fetch, then Pull origin. This will attempt to reconcile your changes with the changes you just pulled.

Uh Oh! We’ve got a merge conflict! This means that Git has two different versions of the same file and it doesn’t know which one to keep.

This is why it’s always best to pull changes right before you push changes, to avoid merge conflicts. We’ll practice that again later.

Activity: Merge Conflicts

When you have a merge conflict you’ll have to manually tell Git which version of the file you want to keep. You can do that in two ways. We’ll explore using both ways:

  1. Merge in GitHub Desktop:

    • GitHub Desktop will let you keep a specific version.
      • Use the modified files from <branch>: keep the version on the GitHub branch
      • Use the modified file: keep the version in your commit.
    • Choose the local version for the .Rproj file.
  2. Merge Manually:

    • GitHub Desktop also gives you the option to open in the default program, click that option on the .csv file.

    • Take a look at the file, take note of the conflict markers, which show the conflicting file lines:

      <<<<<<< HEAD
      your local version
      =======
      remote version
      >>>>>>> branch-name

    • To resolve the conflict we can choose to delete the lines for one version, or keep both. Let’s keep both, so we’ll just remove the conflict marker lines (shown above) and save the file.

  3. Go back to GitHub Desktop, you should now see that your merge conflicts are all resolved, so now we can continue the merge:

  4. Check out the history of your commits, you should now see your initial commit and a commit from merging the commits.

  5. Push both to GitHub.
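
The "keep both" resolution from the manual merge amounts to deleting the three marker lines and leaving everything else. A minimal sketch in R, using made-up file contents rather than the actual workshop .csv:

```r
# Made-up contents of a conflicted text file
conflicted <- c(
  "site,value",
  "<<<<<<< HEAD",
  "A,1.2",
  "=======",
  "B,3.4",
  ">>>>>>> branch-name"
)

# Conflict markers start with seven "<", "=", or ">" characters
is_marker <- grepl("^(<{7}|={7}|>{7})", conflicted)

# Dropping only the markers keeps the lines from both versions
resolved <- conflicted[!is_marker]
resolved
```

For a real file it is usually safer to edit by hand as described above, since this pattern would also remove any data line that happened to start with the same characters.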

Activity: Working with Binary Files

Git was originally designed for code and works really well with files that can be opened in a text editor, like .txt, .csv, .R, and .py. However, in our project you may find yourself working with binary files (e.g., .docx, .pdf, .xlsx).

Let’s see how Git treats these different kinds of files by making some changes:

HINT: We’re about to make changes to files in our repo, pull the current version to avoid merge conflicts!

  1. Open up data/example-code2.R in RStudio.

  2. In your merge conflict, we decided to keep both sets of data, so let’s use those as groups in our plot by adding the following code after line 5:

      data$X <- c(rep("Group 1", 4), rep("Group 2", 4))
  3. Then edit line 7 (previously line 6) to look like this:

      ggplot(data, aes(x=site, y=value, color=X)) + geom_point() + geom_line()
  4. Run the whole code to create a new figure 1. Make sure to save your changes to data/example-code2.R.

  5. Next, open methods/example-SOP.docx, on a new line type: Written by <your name>. Then close and save the file.

  6. Open up GitHub Desktop to look at how it displays the changes you made. On the left you should see three changed files, click on each and see what is displayed on the right.

    • example-code2.R: notice that it shows the line by line changes that were made?

      • Green and + indicate a line was added

      • Red and - indicate a line was removed

    • figure1.png: notice that it shows what the previous figure looked like and what it looks like now. At the top you can also choose additional options to compare the two figures.

    • example-SOP.docx: we just get a message that the binary file has changed. Git doesn’t natively know how to reconcile changes in binary files, making them more challenging to work on collaboratively through Git.

      IMPORTANT: Merge conflicts are also more challenging with binary files. If you choose to keep the remote version during a conflict, you’ll lose your local work, so merge carefully!

Activity: Stashing Changes

Great, we’ve made new changes to our files, so let’s share them again.

Partner 2:

  1. Commit your changes, practice writing a descriptive commit message

  2. In GitHub Desktop, click the Push origin button in the upper right to send your changes to GitHub.

Partner 1:

  1. We’re going to use best practices and pull the changes from the branch before we make a commit so we have the most recent version.

  2. Uh Oh! We get an error saying that pulling those changes will overwrite some of our changes.

  3. It gives us the option to stash our changes and continue with the pull. This means that it will temporarily save your current changes in a separate location so we can pull without a merge conflict. Click to stash your changes.

  4. Great, now we have the most recent copy of the repo, but where did our changes go? Notice right above the commit message, there’s a bar that says stashed changes. Clicking this will show all those changes you made, and you can click restore to bring them back to your changes.

  5. Alright, now the copies of our files are up to date with the repo, and we’ve still got our changes. Now we can commit and push to GitHub without merge conflicts!

    • When you do this, it’s a good idea to check out the new commit messages and file changes to make sure you don’t write over others’ work.

    • Git will combine your work on non-binary files, but when you push, your version of any binary file will be the one in the repo.

IMPORTANT: These are the main GitHub workflows you should use. Other Git commands exist, but using them without guidance can cause problems for the whole repository. If you’re unsure or run into an issue, stop and ask Katie, we’ll work through it together.


Part 5: Using Box

Now let’s explore the WWS Box folder and go over a few best practices when using Box:

Working Online

Box has the following options for working on documents:

  1. Microsoft (preferred)

    • Requires downloading Box Tools
    • Work in normal Office, changes are saved directly to Box
  2. Microsoft Online (preferred)

    • Multiple people can work on the document at the same time
  3. Google Suite

    • Multiple people can work on the document at the same time
    • Need to manually unlock the file when finished

Best Practices for Sharing Files and Handling Feedback

  1. Let Box manage versions

    • Don’t create multiple copies with dates or initials in the file name. Stick with a single file and let Box track versions automatically.
  2. Share through Box whenever possible

    • If you’re sharing files within the project, share the file using a Box link and ask collaborators to edit directly in Box. This keeps all feedback in one place and preserves version history.
  3. If collaborators can’t use Box (Box requires an account to edit)

    • Send them the document by email. When you get it back, upload their edits as a new version in Box
    • Rename the file if needed so Box recognizes it as the same file

Activity: Working Online

  1. Open up the data-management-workshop folder in Box and click on example-doc.docx

  2. In the top right, click the open button and choose Microsoft Word Online and type your name in the document.

  3. Close the document and refresh, notice how everyone’s edits now show up.

  4. We can also check out the version history

    • Click the three dots at the top → version history

    • Click through the different versions to see the changes over time

Activity: Handling Feedback and Edits Online

Partner 1

  1. Upload the example-SOP.docx we worked on before to the data-management-workshop folder.

  2. Share the Box link with your partner.

Partner 2

  1. Edit the shared file directly in Box.

Together

  1. Refresh the file and check version history to see how Box has tracked the change.

Activity: Handling Feedback and Edits Offline

Partner 2

  1. Download the file and send to your partner.

Partner 1

  1. Open the file and make an edit to the file, save with your initials and send back.

Partner 2

  1. Edit the file name to remove the initials.

  2. Upload the file as a new version to Box.

  3. Notice how Box stores the history of both online and manual updates.


Part 6: Handling Large Files (Box LFS)

GitHub caps file sizes at 100 MB, which makes it less ideal for storing certain kinds of files (e.g., geospatial datasets, images, or zipped archives). Most project files will be smaller than this, but we still want to be able to:

To achieve this, we’ll use Box LFS. This is an R package (with plans to eventually make it a standalone executable) that works similarly to Git LFS. It links large files stored on Box with GitHub so that Git maintains versioning, while the actual files live outside GitHub’s storage limits.

How it works:

With Box LFS:

Activity: Explore the structure of a Box LFS tracked project

We’ll use the example repo WWS-TEST-Box-LFS. It’s similar to WWS-TEST-example-repo, but now includes a large file tracked with Box LFS.

  1. Open the repo and compare its structure to WWS-TEST-example-repo.

    • Notice the new box-lfs folder and the files inside.

    • Open a .boxtracker file to see how GitHub stores “pointers” to large files.

    • Check the README — it tells you Box LFS is in use.

    • Confirm that the large file itself is not in GitHub. It lives in Box instead.
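
To make the pointer idea concrete, here is a loose sketch in R. It does not reproduce the actual .boxtracker format or Box LFS internals; the fields (path, md5, size_bytes) are assumptions chosen only to show how a few lines of text can stand in for a large file.

```r
# Stand-in for a large file (kept tiny so the example runs quickly)
big_file <- file.path(tempdir(), "large-file-example.bin")
writeBin(as.raw(rep(0:255, times = 40)), big_file)

# Hypothetical pointer: small text metadata that identifies the file's contents
pointer <- data.frame(
  path       = basename(big_file),
  md5        = unname(tools::md5sum(big_file)),
  size_bytes = file.size(big_file)
)
pointer

# If the file changes, its hash changes, so the small pointer changes too;
# that change is what version control can track instead of the file itself
writeBin(as.raw(rep(255:0, times = 40)), big_file)
unname(tools::md5sum(big_file)) != pointer$md5
```

The pointer stays a few bytes long no matter how large the real file is, which is why it fits comfortably in a Git repository.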

Activity: Clone a Box LFS tracked project

  1. First, install the Box LFS R package:

    remotes::install_github("wildfire-water-security/WWS-box-lfs", subdir="blfs")
  2. Then clone the repo as usual in GitHub Desktop:

    • File → Clone Repository (CTRL + SHIFT + O)

    • Choose WWS-TEST-Box-LFS

    • Set the local path: C:\Users\your username\Documents

    • Click Clone

  3. Now use Box LFS to fetch the large files:

    library(blfs)
    dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
    clone_repo_blfs(dir = dir, download=NULL)

    NOTE: If your downloads go to a different folder than the default Downloads folder, update the download argument with the correct path.

  4. You’ll get a message with a Box link to download the large files.

    • Download the .zip from the location given by the function.

    • Once downloaded, hit Enter in R to confirm.

    • The function places files automatically in the right locations.

  5. Check the data folder. Notice the new large-file1.docx. Box LFS renamed and restored it into the correct location.

Activity: Push and Pull Changes in a Box LFS tracked project

Now let’s practice collaborative editing with Box LFS.

Partner 1

  1. Create a new branch with your names (like before).

  2. Open example-csv.csv, make a change, and commit it. Don’t push yet.

  3. Run Box LFS push_repo_blfs to check for updates:

    dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
    push_repo_blfs(dir = dir)
    • If tracked files changed, you’ll see messages.
    • If not, it runs silently.
  4. Now push your changes in GitHub Desktop.

Partner 2

  1. Pull Partner 1’s changes in GitHub Desktop.

  2. Run Box LFS pull_repo_blfs to check for updates:

    dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
    pull_repo_blfs(dir = dir, download = NULL)
    • Runs silently unless large files need updating.
  3. Make your own changes:

    • Edit large-file1.docx.

    • Save and close.

    • Commit (but don’t push yet).

  4. Run Box LFS push check:

    dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
    push_repo_blfs(dir = dir)
    • You’ll get a message to upload the updated large file to Box.

    • Follow the link and upload your version of large-file1.docx.

Note: Box LFS doesn’t support branches yet. Uploading overwrites the file in Box for all branches, but older versions are still stored as history.

Activity: Initialize Box LFS in a repo

Up till now we’ve been working with pre-made Git repositories. Let’s create a new one and use Box LFS to start tracking any large files in the repo. This can be done in two ways:

  1. From the start, before you have any files

  2. From an existing folder

Work with a partner to try both ways using this tutorial.

Method 1: Brand New Project

  1. Create the repository on GitHub

    • Call it WWS-TEST-New-yournames
  2. Clone the repo to your computer

  3. Make into an R project

  4. Copy the files from WWS-TEST-Box-LFS/data into the repository

Method 2: Existing Project

  1. Create a folder in C:/Users/your username/Documents called WWS-TEST-Existing-yournames

  2. Copy the files from WWS-TEST-Box-LFS/data into the folder

  3. Make into an R project

  4. Initialize Git in the project

  5. Add to GitHub Desktop

  6. Publish the repo to GitHub

Now on both repositories initialize Box LFS by running the following code after updating the folder name:

library(blfs)
folder <- "WWS-TEST-New-yournames" # replace with your folder name
dir <- file.path(fs::path_home(), "Documents", folder)
new_repo_blfs(dir = dir, size = 10) # size = 10: files larger than 10 MB will be tracked

After running:

  • GitHub stops tracking your large files

  • box-lfs folder is created with:

    • .boxtracker files

    • upload folder with real large files (hashed names)

    • path-hash.csv

  • A message tells you to upload your files from box-lfs/upload into the right Box project folder.

  • Paste the Box share link when prompted.
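
Before initializing, it can help to see which files would cross the size threshold. The helper below is a hypothetical sketch in base R and is not how Box LFS itself finds files; it also assumes 1 MB = 1024^2 bytes, which may differ from the definition the package uses.

```r
# Hypothetical helper: list files under `dir` larger than `size_mb` megabytes
find_large_files <- function(dir, size_mb = 10) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes_mb <- file.size(files) / 1024^2
  data.frame(file = files, size_mb = sizes_mb)[sizes_mb > size_mb, ]
}

# Demo on a temporary folder with one small and one larger file
demo_dir <- file.path(tempdir(), "size-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(100), file.path(demo_dir, "small.bin"))
writeBin(raw(5000), file.path(demo_dir, "larger.bin"))

# A tiny threshold (about 1 KB) is used so the demo files stay small
find_large_files(demo_dir, size_mb = 0.001)
```

Called on a project folder with the default size_mb = 10, the helper gives a preview comparable to the size = 10 threshold passed to new_repo_blfs.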


Best practices for Box LFS:

  • Pull first, push often → prevents merge conflicts with .boxtracker files

  • Box LFS doesn’t store file diffs — each update is a full new file

  • If two people change the same large file without pulling first, you’ll get a merge conflict

Box LFS is a new project and may still have bugs. If you run into issues or have suggestions for new features, please email Katie or create an issue on GitHub.