By the end of this workshop, participants will be able to:
Understand where different files should be stored.
Perform basic Git workflows using GitHub Desktop.
Understand how the team is using Box.
Handle large files effectively using Box LFS.
Collaborate smoothly without merge conflicts or file chaos.
In the Wildfire Water Security project we primarily use two tools to store and share files, each with its own benefits and limitations:
Box
Pros: Good for big files, real time Office document collaboration, easy to use
Cons: Clunky backups, limited version history, confusing shared folders
Git/GitHub
Pros: Full version history, easy backups, easy collaboration on non-binary files
Cons: Steep learning curve, can’t store large files
Both of these tools are preferable to an organization-specific network drive because they:
allow easy collaboration across organizations
automatically back up work and save version history
Generally, the following rules apply:
Box: Large files, collaborative Office files, files not related to a specific research project
GitHub: Project files, code, manuscript files (besides the document itself)
If you’re unsure of where to put a file you can follow the flow chart below:
Where to NOT put files:
Data belongs to the project, not individuals. Store it where the team can find it.
Where should the following files be stored?
Word document containing text for a manuscript
meeting notes from Node 1
exploratory figure associated with the Bedrock project
large dataset associated with the Sonde project
SOP for filtering water samples
figure for a manuscript
Keeping your files and folders organized makes it easier for everyone on the team to find what they need and avoids confusion down the road.
Git Repo First
Your top-level project folder (root directory) should be a GitHub
repository. This ensures you have version control and backups from the
start.
Limit the Number of Folders in the Root
Aim for fewer than 10 top-level folders — for example:
data/, code/, figures/,
methods/
Use Nested Folders for Subcategories
For example inside data/:
raw-data/
processed-data/
metadata/
Avoid Spaces & Special Characters
Use - or _ instead of spaces. Avoid characters
like .:*?"<>|[]&$.
Descriptive Names Name folders so someone unfamiliar with your project can still guess what’s inside.
Organize by Date (if needed)
Force Folder Order with Numbers
Avoid Spaces & Special Characters
Use - or _ instead of spaces. Avoid characters
like .:*?"<>|[]&$.
Be Concise but Descriptive –
DON’T: use words like “the” or “and”
DO: use standard abbreviations and keywords
Self-Contained File Names – name files so they still make sense outside the folder:
Let Git (or Box) Handle Versions – don’t add dates, initials, or “final_v2” to file names.
No Duplicate Files – edit the original and commit often instead of making copies.
What is wrong with these file paths?
sonde & other instrument data/08-12-2025 data_JS.csv
- special characters
- spaces in path
- initials used for versioning
- non-descriptive name
SWAT-modeling/final map (2).png
- spaces in path
- non-nested file structure
- multiple copies of a single file
- non-descriptive name
Aqualog/methods/running EEM's analysis SOP_25_12_01.docx
- special characters
- spaces in path
- dates used for versioning
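Some of the checks from this exercise can be roughly automated. The base R sketch below is only an illustration: check_path() is a hypothetical helper (it is not part of any project package), and its patterns are assumptions drawn from the naming rules above. The dot is left out of the special-character list so that file extensions are not flagged, and the apostrophe is added because it appears in one of the exercise paths.

```r
# Hypothetical helper: flag common naming problems in a relative file path
check_path <- function(path) {
  problems <- character(0)

  # Rule: use - or _ instead of spaces
  if (grepl(" ", path, fixed = TRUE)) {
    problems <- c(problems, "spaces in path")
  }

  # Rule: avoid special characters (assumed list; "." omitted on purpose)
  special <- c(":", "*", "?", "\"", "<", ">", "|", "[", "]", "&", "$", "'")
  found <- vapply(special, function(ch) grepl(ch, path, fixed = TRUE), logical(1))
  if (any(found)) {
    problems <- c(problems, "special characters")
  }

  # Rule: let Git or Box handle versions
  # (looks for "final", copy markers like "(2)", or a date in the file name)
  file_name <- basename(path)
  version_pattern <- "final|\\([0-9]+\\)|[0-9]{2}[-_][0-9]{2}[-_][0-9]{2,4}"
  if (grepl(version_pattern, file_name, ignore.case = TRUE)) {
    problems <- c(problems, "version marker in file name")
  }

  problems
}

# Try it on one path from the exercise and one path that follows the rules
check_path("SWAT-modeling/final map (2).png")
check_path("data/raw-data/sonde-temperature.csv")
```

A path that follows the rules returns an empty character vector. Judgement calls such as "non-descriptive name" still need a human reader.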
Now that we know where to put files and how to name them, let’s start working with Git and GitHub.
In this activity we’ll follow a tutorial to make sure that everyone has all their tools set up (R, RStudio, Git, GitHub, and GitHub Desktop).
You’ll need to do these steps on every computer you work on.
Open up the tutorial and follow the directions to do the following:
Install R and RStudio (if needed)
Check that the installation worked: open RStudio, type the following in the console, and hit Enter:
print("hello world")
Install Git
Check that the installation worked: switch to the terminal in RStudio, type the following, and hit Enter:
git --version
Create GitHub account (if needed)
Tell Git who you are
Connect RStudio to GitHub
gitcreds::gitcreds_get()
Install GitHub Desktop
It’s time to start learning how to use Git and GitHub! In the following activities we’re going to learn how to:
Clone: Make a local copy of a GitHub repository you can work in
Pull: Update your local copy with the latest changes
Commit: Save your file changes locally in your Git history
Push: Send your changes to GitHub so others can see them
Branch: Create a parallel version of your repository
Merge: Combine changes across two branches
Stash: Temporarily store changes
In GitHub Desktop: File → Clone repository
(CTRL + SHIFT + O)
Choose the following repo to clone:
WWS-TEST-example-repo
Set the local path to
C:\Users\your username\Documents
Hit Clone
Find the repository you just cloned in file explorer. Check out the folders and files in the repository.
Find a partner.
Partner 1:
Create a new repository branch with the name:
your name-their name
Go to GitHub Desktop
Click the Current branch button
Click New Branch
Publish the branch so your partner can see it by clicking the
Publish branch button in the top right.
Partner 2:
Pull the repository using the
Pull origin button in the top right of GitHub desktop to
get the new branch.
Check out the branch by clicking the middle
Current branch button in GitHub Desktop and selecting the
branch with your names.
This is where you’ll be working for the rest of the GitHub activities. The changes you make to this branch will not affect anyone else’s files unless they are also on this branch.
R Projects make your data/code portable and keep file paths consistent across computers.
Open up code/example-script1.R
Edit line 3 with your username then run lines 4-5 to get the file path to the .csv file.
Notice that this is a long file path to type and that it is specific to each user, so it isn’t ideal for sharing code across the project.
Let’s fix this by turning the repo into an R project by running the following code in the R console:
# install.packages("usethis") # only need to run the first time
library(usethis)
# the location of your project folder
repo <- file.path(fs::path_home(), "Documents", "WWS-TEST-example-repo")
# create an R project from the directory
usethis::create_project(repo)
The new R project should open in a new RStudio window. In this
project, go back to code/example-script1.R and change the
file path to data/example-csv.csv. Run the rest of the
lines. What happens?
HINT: In the project, the lower right of RStudio shows all the files in the repository. You can use this to open
example-code1.R.
Because the repository is now an R project, RStudio automatically
treats WWS-TEST-example-repo as the working directory and
resolves all relative file paths from there. This means the file paths will work for
anyone who clones the repository.
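To see why relative paths travel well, here is a small base R sketch. It builds a stand-in project in a temporary folder (the folder name and the data values are made up for illustration) and shows that a relative path such as data/example-csv.csv resolves against the working directory, wherever that folder happens to live on disk.

```r
# Build a throwaway "project" in a temporary folder (illustration only)
project_root <- file.path(tempdir(), "demo-project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(site = 1:3, value = c(2.1, 3.4, 1.8)),
  file.path(project_root, "data", "example-csv.csv"),
  row.names = FALSE
)

# Opening an R project sets the working directory to the project root;
# here we mimic that with setwd() and remember the previous directory
old_wd <- setwd(project_root)

# The relative path works no matter where project_root is located
demo_data <- read.csv("data/example-csv.csv")
nrow(demo_data)

# Restore the previous working directory
setwd(old_wd)
```

In a real R project you would not call setwd() yourself. It is used here only so the sketch can run outside of a project.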
We changed some files, so let’s look at the changes in GitHub Desktop. On the left it shows the files that have been added, changed, or removed in your repo. You should see four:
.gitignore
example-code1.R
example-csv.csv
WWS-TEST-example-repo.Rproj
HINT: If you don’t see example-code1.R, make sure the script is saved.
What is the .gitignore file? It tells Git which
files/folders not to track.
Open up the file via file explorer, right now it only has one line:
There are many other temporary files we may not want to include in our repository (e.g., lock files for Office documents). The easiest way to ignore all of these files is to use gitignore.io.
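For a handful of patterns you can also add lines to .gitignore yourself. The sketch below uses base R and writes to a throwaway folder so that no real repository is touched. The three patterns are assumed examples, not the project's official ignore list. Office lock files have names that start with ~$, which is what the first pattern matches.

```r
# Work in a throwaway folder so no real repository is touched
demo_repo <- file.path(tempdir(), "demo-gitignore")
dir.create(demo_repo, showWarnings = FALSE)
ignore_file <- file.path(demo_repo, ".gitignore")

# Patterns to ignore (assumed examples; adjust for your project)
patterns <- c(
  "~$*",        # Office lock files, e.g. ~$example-SOP.docx
  ".Rhistory",  # R console history
  ".DS_Store"   # macOS folder metadata
)

# Keep any existing lines and add only the patterns that are missing
existing <- if (file.exists(ignore_file)) readLines(ignore_file) else character(0)
writeLines(union(existing, patterns), ignore_file)

readLines(ignore_file)
```

Because union() drops duplicates, running the code a second time leaves the file unchanged.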
We’ve now made some changes to the files in this project. Will others see them? Go check out the project on GitHub. Do you see the changes you’ve made?
Go back to the project in GitHub Desktop, on the left side you can see the files that have been changed. Select all the files.
Add a commit message:
Commit your changes.
Go back to the repository on GitHub, do you see your changes?
Your changes are tracked locally by Git after you make a commit, but in order to share your commits with others, you need to push those changes to the remote (GitHub).
Partner 1:
In GitHub Desktop, click the Push origin button in
the upper right to send your changes to GitHub.
Go back to the repo on GitHub, do you see your changes now?
Partner 2:
Without pulling your partner’s changes, push your changes.
You’ll get a warning about needing to fetch changes. Go ahead and
click Fetch, then Pull origin. This will
attempt to reconcile your changes with the changes you just pulled.
Uh Oh! We’ve got a merge conflict! This means that Git has two different versions of the same file and it doesn’t know which one to keep.
This is why it’s always best to pull changes right before you push changes, to avoid merge conflicts. We’ll practice that again later.
When you have a merge conflict you’ll have to manually tell Git which version of the file you want to keep. You can do that in two ways. We’ll explore using both ways:
Merge in GitHub Desktop:
.Rproj file.
Merge Manually:
GitHub Desktop also gives you the option to open a conflicted file in its default
program. Click that option for the .csv file.
Take a look at the file and note the conflict markers, which show the conflicting lines:
<<<<<<< HEAD
your local version
=======
remote version
>>>>>>> branch-name
To resolve the conflict we can delete the lines for one version or keep both. Let’s keep both, so we’ll just remove the conflict markers (shown above) and save the file.
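The "keep both" choice amounts to deleting only the three marker lines. The base R sketch below illustrates this on a made-up conflicted file. keep_both() is a hypothetical helper and the CSV rows are invented, so treat it as a picture of what the manual edit does, not as a replacement for reviewing the file yourself.

```r
# Hypothetical helper: resolve a conflict by keeping both versions,
# i.e. drop only the marker lines and leave every other line in place
keep_both <- function(lines) {
  is_marker <- grepl("^(<<<<<<<|=======|>>>>>>>)", lines)
  lines[!is_marker]
}

# A made-up conflicted file, one element per line
conflicted <- c(
  "site,value",
  "<<<<<<< HEAD",
  "1,2.5",
  "=======",
  "2,3.1",
  ">>>>>>> branch-name",
  "3,4.0"
)

resolved <- keep_both(conflicted)
resolved
```

On a real file you could read the lines with readLines(), apply the helper, and write the result back with writeLines(). Always check the result, since keeping both versions is not the right answer for every conflict.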
Go back to GitHub Desktop, you should now see that your merge conflicts are all resolved, so now we can continue the merge:
Check out the history of your commits, you should now see your initial commit and a commit from merging the commits.
Push both to GitHub.
GitHub was originally designed for code and works really well with
files that can be opened in a text editor, like .txt,
.csv, .R, and .py. However, in our
project you may find yourself working with binary files
(e.g., .docx, .pdf, .xlsx).
Let’s see how Git treats these different kinds of files by making some changes:
HINT: We’re about to make changes to files in our repo, so pull the current version first to avoid merge conflicts!
Open up data/example-code2.R in RStudio.
In your merge conflict, we decided to keep both sets of data. Let’s use those as groups in our plot by adding the following code after line 5:
data$X <- c(rep("Group 1", 4), rep("Group 2", 4))
Then edit line 7 (previously line 6) to look like this:
ggplot(data, aes(x=site, y=value, color=X)) + geom_point() + geom_line()
Run the whole script to create a new figure 1. Make sure to save
your changes to data/example-code2.R.
Next, open methods/example-SOP.docx, on a new line
type: Written by <your name>. Then close and save the
file.
Open up GitHub Desktop to look at how it displays the changes you made. On the left you should see three changed files. Click on each and see what is displayed on the right.
example-code2.R: notice that it shows the line by
line changes that were made?
Green and + indicate a line was added
Red and - indicate a line was removed
figure1.png: notice that it shows what the previous
figure looked like and what it looks like now. At the top you can also
choose additional options to compare the two figures.
example-SOP.docx: we just get a message that the
binary file has changed. Git doesn’t natively know how to reconcile
changes in binary files, making them more challenging to work on
collaboratively through Git.
IMPORTANT: Merge conflicts are also more challenging with binary files. If you choose to keep the remote version during a conflict, you’ll lose your local work, so merge carefully!
Great, we’ve made new changes to our files, so let’s share them again.
Partner 2:
Commit your changes, practice writing a descriptive commit message
In GitHub Desktop, click the Push origin button in
the upper right to send your changes to GitHub.
Partner 1:
We’re going to use best practices and pull the changes from the branch before we make a commit so we have the most recent version.
Uh Oh! We get an error saying that pulling those changes will overwrite some of our changes.
It gives us the option to stash our changes and continue with the pull. This means that it will temporarily save your current changes in a separate location so we can pull without a merge conflict. Click to stash your changes.
Great, now we have the most recent copy of the repo, but where did our changes go? Notice right above the commit message, there’s a bar that says stashed changes. Clicking this will show all those changes you made, and you can click restore to bring them back to your changes.
Alright, now the copies of our files are up to date with the repo, and we’ve still got our changes. Now we can commit and push to GitHub without merge conflicts!
When you do this, it’s a good idea to check out the new commit messages and file changes to make sure you don’t overwrite others’ work.
Git will combine your work on non-binary files, but when you push, your version of any binary file will be the one kept in the repo.
IMPORTANT: These are the main GitHub workflows you should use. Other Git commands exist, but using them without guidance can cause problems for the whole repository. If you’re unsure or run into an issue, stop and ask Katie. We’ll work through it together.
Now let’s explore the WWS Box folder and go over a few best practices when using Box:
Same file and folder naming rules apply
Do not make changes or add folders to the first four levels of the folder directories
Wildfire_Water Security
02_Nodes
<0*_Your Node>
01_Meetings → 06_Projects
Take advantage of Box’s integration with Office
Box has the following options for working on documents:
Microsoft (preferred)
Microsoft Online (preferred)
Google Suite
Let Box manage versions
Share through Box whenever possible
If collaborators can’t use Box [box requires an account to edit]
Open up the data-management-workshop
folder in Box and click on example-doc.docx
In the top right, click the open button and choose
Microsoft Word Online and type your name in the
document.
Close the document and refresh, notice how everyone’s edits now show up.
We can also check out the version history
Click the three dots at the top → version history
Click through the different versions to see the changes over time
Partner 1
Upload the example-SOP.docx we worked on before to
the data-management-workshop folder.
Share the Box link with your partner.
Partner 2
Together
Partner 2
Partner 1
Partner 2
Edit the file name to remove the initials.
Upload the file as a new version to Box.
Notice how Box stores the history of both online and manual updates.
GitHub caps file sizes at 100 MB, which makes it less ideal for storing certain kinds of files (e.g., geospatial datasets, images, or zipped archives). Most project files will be smaller than this, but we still want to be able to:
Keep version control on large files
Make them easily shareable
Ensure they are accessible through the local repository clones
To achieve this, we’ll use Box LFS. This is an R package (with plans to eventually make it a standalone executable) that works similarly to Git LFS. It links large files stored on Box with GitHub so that Git maintains versioning, while the actual files live outside GitHub’s storage limits.
How it works:
Large files are stored in Box, inside a
box-lfs folder within your project.
GitHub only stores .boxtracker pointer
files, which record the file’s location and version
history.
When working in a GitHub repository, Box LFS makes sure your local copy has the correct large files.
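The pointer idea can be pictured with a short base R sketch. This is a conceptual illustration only: the workshop does not show the real .boxtracker layout or how Box LFS identifies file versions, so the pointer fields and the use of MD5 hashes below are assumptions.

```r
# Two versions of a "large" file in a throwaway folder (illustration only)
pointer_demo <- file.path(tempdir(), "demo-pointer")
dir.create(pointer_demo, showWarnings = FALSE)
v1 <- file.path(pointer_demo, "large-file-v1.txt")
v2 <- file.path(pointer_demo, "large-file-v2.txt")
writeLines("original contents", v1)
writeLines("edited contents", v2)

# A hash is a short fingerprint of a file's contents
hash_v1 <- unname(tools::md5sum(v1))
hash_v2 <- unname(tools::md5sum(v2))

# A pointer only needs to store something small, such as a path and a hash
# (assumed fields, not the real .boxtracker layout)
pointer <- list(path = "data/large-file.txt", hash = hash_v1)

# Comparing hashes shows whether the file changed, without storing the file
file_changed <- !identical(pointer$hash, hash_v2)
file_changed
```

The point of the design is that the small pointer can live in Git and be versioned like any text file, while the large file itself stays in Box.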
With Box LFS:
You will still manually download and upload files to Box.
Box LFS will:
Tell you when to download or upload files
Tell you where the files are located in Box
Place downloaded files in the correct location with the correct name
We’ll use the example repo WWS-TEST-Box-LFS. It’s similar to the WWS-TEST-example-repo, but now includes a large file tracked with Box LFS.
Open the repo and compare its structure to WWS-TEST-example-repo.
Notice the new box-lfs folder and the files inside.
Open a .boxtracker file to see how GitHub stores “pointers” to large files.
Check the README — it tells you Box LFS is in use.
Confirm that the large file itself is not in GitHub. It lives in Box instead.
First, install the Box LFS R package:
remotes::install_github("wildfire-water-security/WWS-box-lfs", subdir="blfs")
Then clone the repo as usual in GitHub Desktop:
File → Clone Repository (CTRL + SHIFT + O)
Choose WWS-TEST-Box-LFS
Set the local path:
C:\Users\your username\Documents
Click Clone
Now use Box LFS to fetch the large files:
library(blfs)
dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
clone_repo_blfs(dir = dir, download=NULL)
NOTE: If your downloads go to a folder other than the default Downloads folder, update the
download argument with the correct path.
You’ll get a message with a Box link to download the large files.
Download the .zip from the location given by the
function.
Once downloaded, hit Enter in R to confirm.
The function places files automatically in the right locations.
Check the data folder. Notice the new
large-file1.docx. Box LFS renamed and restored it into the
correct location.
Now let’s practice collaborative editing with Box LFS.
Partner 1
Create a new branch with your names (like before).
Open example-csv.csv, make a change, and
commit it. Don’t push yet.
Run Box LFS push_repo_blfs to check for updates:
dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
push_repo_blfs(dir = dir)
Now push your changes in GitHub Desktop.
Partner 2
Pull Partner 1’s changes in GitHub Desktop.
Run Box LFS pull_repo_blfs to check for updates:
dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
pull_repo_blfs(dir = dir, download = NULL)
Make your own changes:
Edit large-file1.docx.
Save and close.
Commit (but don’t push yet).
Run Box LFS push check:
dir <- file.path(fs::path_home(), "Documents", "WWS-TEST-Box-LFS")
push_repo_blfs(dir = dir)
You’ll get a message to upload the updated large file to Box.
Follow the link and upload your version of
large-file1.docx.
Note: Box LFS doesn’t support branches yet. Uploading overwrites the file in Box for all branches, but older versions are still stored as history.
Up till now we’ve been working with pre-made Git repositories. Let’s create a new one and use Box LFS to start tracking any large files in the repo. This can be done in two ways:
From the start, before you have any files
From an existing folder
Work with a partner to try both ways using this tutorial.
Method 1: Brand New Project
Create the repository on GitHub
WWS-TEST-New-yournames
Clone the repo to your computer
Make into an R project
Copy the files from WWS-TEST-Box-LFS/data into the
repository
Method 2: Existing Project
Create a folder in C:/Users/your username/Documents
called WWS-TEST-Existing-yournames
Copy the files from WWS-TEST-Box-LFS/data into the
folder
Make into an R project
Initialize Git in the project
Add to GitHub Desktop
Publish the repo to GitHub
Now on both repositories initialize Box LFS by running the following code after updating the folder name:
folder <- "WWS-TEST-New-yournames" #replace with folder name
dir <- file.path(fs::path_home(), "Documents", folder)
new_repo_blfs(dir = dir, size = 10) # files larger than 10 MB will be tracked
After running:
Git stops tracking your large files
box-lfs folder is created with:
.boxtracker files
upload folder with real large files (hashed
names)
path-hash.csv
A message tells you to upload your files from
box-lfs/upload into the right Box project folder.
Paste the Box share link when prompted.
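Before initializing Box LFS it can help to preview which files would cross the size threshold. The base R sketch below is not how Box LFS does it internally. find_large_files() is a hypothetical helper, and treating 1 MB as 1024^2 bytes is an assumption. The demo uses a tiny threshold and small dummy files so that it runs quickly.

```r
# Hypothetical helper: list files in a folder above a size threshold (in MB)
find_large_files <- function(dir, size_mb = 10) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes_mb <- file.size(files) / 1024^2  # assumes 1 MB = 1024^2 bytes
  is_large <- sizes_mb > size_mb
  data.frame(
    file = files[is_large],
    size_mb = round(sizes_mb[is_large], 3),
    stringsAsFactors = FALSE
  )
}

# Demo in a throwaway folder with two dummy files
size_demo <- file.path(tempdir(), "demo-large-files")
dir.create(size_demo, showWarnings = FALSE)
writeBin(raw(5000), file.path(size_demo, "big.bin"))   # 5000 bytes
writeBin(raw(10), file.path(size_demo, "small.bin"))   # 10 bytes

# Threshold of 0.001 MB (about 1 KB) so only big.bin is reported
large <- find_large_files(size_demo, size_mb = 0.001)
large$file
```

On a real project you would point the helper at the repository folder and keep the default of 10 MB to match the size argument used above.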
Pull first, push often → prevents merge
conflicts with .boxtracker files
Box LFS doesn’t store file diffs — each update is a full new file
If two people change the same large file without pulling first, you’ll get a merge conflict
Box LFS is a new project and may still have bugs. If you run into issues or have suggestions for new features please email Katie or create an issue on GitHub.