In the Wildfire Water Security project we primarily use two tools to store and share files, each with its own benefits and limitations:
Box
Pros: Good for big files, real time Office document collaboration, easy to use
Cons: Clunky backups, limited version history, confusing shared folders
Git/GitHub
Pros: Full version history, easy backups, easy collaboration on non-binary files
Cons: Steep learning curve, can’t store large files
Both of these tools are preferable to an organization-specific network drive because they:
allow easy collaboration across organizations
automatically back up work and save version history
Generally, the following rules apply:
Box: Large files, collaborative Office files, files not related to a specific research project
GitHub: Project files, code, manuscript files (besides the document itself)
If you’re unsure of where to put a file you can follow the flow chart below:
Where to NOT put files:
Data belongs to the project, not individuals. Store it where the team can find it.
Where should the following files be stored?
Word document containing text for a manuscript
meeting notes from Node 1
exploratory figure associated with the Bedrock project
large dataset associated with the Sonde project
SOP for filtering water samples
figure for a manuscript
Keeping your files and folders organized makes it easier for everyone on the team to find what they need and avoids confusion down the road.
Git Repo First
Your top-level project folder (root directory) should be a GitHub
repository. This ensures you have version control and backups from the
start.
Limit the Number of Folders in the Root
Aim for fewer than 10 top-level folders, for example:
data/, code/, figures/, methods/
Use Nested Folders for Subcategories
For example, inside data/:
raw-data/, processed-data/, metadata/
Avoid Spaces & Special Characters
Use - or _ instead of spaces. Avoid characters
like .:*?"<>|[]&$.
Descriptive Names
Name folders so someone unfamiliar with your project can still guess what’s inside.
Organize by Date (if needed)
Force Folder Order with Numbers
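The folder layout described above can be sketched in R. The folder names are the examples given in this section; the `create_project_folders()` helper and the temporary demo directory are assumptions made for illustration, not part of the project's tooling.

```r
# Hypothetical helper: create the example folder layout inside `root`.
create_project_folders <- function(root) {
  folders <- c(
    file.path("data", "raw-data"),
    file.path("data", "processed-data"),
    file.path("data", "metadata"),
    "code",
    "figures",
    "methods"
  )
  for (folder in folders) {
    # recursive = TRUE also creates the parent data/ folder when needed
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Return the names of the top-level folders that now exist
  sort(list.dirs(root, full.names = FALSE, recursive = FALSE))
}

# Example: build the layout in a temporary directory, not a real repository
demo_root <- file.path(tempdir(), "demo-project")
top_level <- create_project_folders(demo_root)
print(top_level)
```

In a real project, `root` would be the folder that is already a GitHub repository, so the new folders are tracked from the start.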
Avoid Spaces & Special Characters
Use - or _ instead of spaces. Avoid characters
like .:*?"<>|[]&$.
Be Concise but Descriptive –
DON’T: use words like “the” or “and”
DO: use standard abbreviations and keywords
Self-Contained File Names – name files so they still make sense outside the folder:
Let Git (or Box) Handle Versions – don’t add dates, initials, or “final_v2” to file names.
Note: Do use dates and initials when emailing files or working outside Git and Box.
No Duplicate Files – edit the original and commit often instead of making copies.
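As a rough illustration of the naming rules above, a small R function can flag some common problems in a relative file path. The `check_path()` helper and its patterns are assumptions for this sketch: it only looks for spaces, a subset of the special characters listed above, and a few version markers such as "final", "v2", or "(2)", so it is not an exhaustive check.

```r
# Hypothetical helper: return a character vector of naming problems found
# in a relative file path (empty if none of the checks trigger).
check_path <- function(path) {
  issues <- character(0)

  # Spaces anywhere in the folder or file names
  if (grepl(" ", path, fixed = TRUE)) {
    issues <- c(issues, "spaces in path")
  }

  # A subset of the special characters to avoid; "." and "/" are left out
  # so file extensions and folder separators are not flagged
  if (grepl("[*?\"<>|&$:\\[\\]]", path, perl = TRUE)) {
    issues <- c(issues, "special characters")
  }

  # Version markers in the file name itself (extension removed first)
  stem <- tools::file_path_sans_ext(basename(path))
  version_pattern <- "(^|[-_ ])(final|v[0-9]+)([-_ ]|$)|\\([0-9]+\\)"
  if (grepl(version_pattern, stem, ignore.case = TRUE, perl = TRUE)) {
    issues <- c(issues, "version marker in name")
  }

  issues
}

print(check_path("SWAT-modeling/final map (2).png"))
print(check_path("data/raw-data/sonde-2025.csv"))
```

Checks for initials, dates, or non-descriptive names are left out because they are hard to detect reliably with a pattern; those still need a human reviewer.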
What is wrong with these file paths?
sonde & other instrument data/08-12-2025 data_JS.csv
- special characters
- spaces in path
- initials used for versioning
- non-descriptive name
SWAT-modeling/final map (2).png
- spaces in path
- non-nested file structure
- multiple copies of a single file
- non-descriptive name
Aqualog/methods/running EEM's analysis SOP_25_12_01.docx
- special characters
- spaces in path
- dates used for versioning
Store .shp files and other ‘multi-file’ geospatial layers within their own folder
To share these files:
zip the files together before sharing
upload to the Data Sharing folder in Box
Maintain Metadata
Immediately after downloading a dataset, create a
readme.txt file which lists at a minimum:
When the file was downloaded
Where the file was downloaded from: the link and the owner in case the link breaks
A short description of the dataset
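The minimum metadata listed above could be written from R at the moment of download. This is only a sketch: the `write_readme()` helper, its argument names, and the example link, owner, and description are placeholders, not real project values.

```r
# Hypothetical helper: write a minimal readme.txt next to a downloaded dataset.
write_readme <- function(dir, url, owner, description) {
  lines <- c(
    paste("Downloaded on:", format(Sys.Date(), "%Y-%m-%d")),
    paste("Source link:", url),
    paste("Data owner:", owner),
    paste("Description:", description)
  )
  readme_path <- file.path(dir, "readme.txt")
  writeLines(lines, readme_path)
  # Return the path invisibly so it can be reused without printing
  invisible(readme_path)
}

# Example with placeholder values, written to a temporary directory
readme_path <- write_readme(
  dir = tempdir(),
  url = "https://example.org/dataset",
  owner = "Example Agency",
  description = "Placeholder description of the dataset"
)
print(readLines(readme_path))
```

Recording the owner alongside the link means the dataset can still be traced if the link later breaks.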
Directly load when possible
For smaller datasets, read directly into the code to preserve file provenance:
You can read many types of files (.csv, .txt, .xlsx) directly into R using the link instead of a file path
If you can’t read in directly, consider downloading to a temporary file and then loading it in:
download.file(url, destfile = file.path(tempdir(), basename(url)))
There are many R packages which allow direct access to useful data:
dataRetrieval: USGS and Water Quality Portal data
FedData: Land cover database, SSURGO soil data, Daymet meteorological data
nhdplusTools: Stream and HUC layers
elevatr: DEM layers
climateR: Many different kinds of gridded climate data
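For datasets that are not covered by one of these packages, reading from the link with a temporary download as a fallback might look like the sketch below. The `read_csv_from_link()` helper is hypothetical and handles only .csv files; the commented example link is a placeholder.

```r
# Hypothetical helper: read a CSV straight from a link; if that fails,
# download it to a temporary file and read it from there.
read_csv_from_link <- function(url) {
  tryCatch(
    read.csv(url),
    error = function(e) {
      # destfile must be a file path, so use tempfile() rather than tempdir()
      tmp <- tempfile(fileext = ".csv")
      download.file(url, destfile = tmp, quiet = TRUE)
      read.csv(tmp)
    }
  )
}

# Example call (placeholder link, so it is left commented out):
# dat <- read_csv_from_link("https://example.org/dataset.csv")
```

Because the link stays in the script, anyone reading the code can see where the data came from.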
data folder of the test repository.
readme.txt to accompany the data set.
code/load-data-nicely.R from the test repository
Avoid storing very large files on Box unless needed. Instead:
Create a subset of the data to work with
Include information or code detailing:
where to get data
how to subset data
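A subsetting script that documents both points could be sketched as follows. The `subset_large_csv()` helper, the `site` column, and the stand-in data are assumptions for illustration; a real script would name the actual source of the full dataset in its header comment.

```r
# Where to get the data: <link to the full dataset and its owner goes here>
# How to subset: keep only the rows for the site of interest.

# Hypothetical helper: read the full file, keep one site, save the subset.
subset_large_csv <- function(source_path, subset_path, site_id) {
  full <- read.csv(source_path)
  small <- full[full$site == site_id, , drop = FALSE]
  write.csv(small, subset_path, row.names = FALSE)
  invisible(small)
}

# Stand-in for a very large file so the sketch runs on its own
full_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(site = rep(c("site-1", "site-2"), each = 3), value = 1:6),
  full_path,
  row.names = FALSE
)

subset_path <- tempfile(fileext = ".csv")
small <- subset_large_csv(full_path, subset_path, site_id = "site-1")
print(small)
```

Only the subset and the script need to be stored; the full dataset can be regenerated from its source whenever it is needed.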
code/subsetting-vlarge-data.R
data folder in the test repository
The goals for code we write are that it be:
Replicable: Anyone should be able to pick up the code and have it run
Use R Projects, which automatically set the working directory to the project folder so file paths work for anyone who opens the project
Use relative file paths which specify the location of the file relative to the project directory
Understandable: Use comments to describe what the code is doing so that both future you and others can follow it
Organized: Create functions and loops to avoid repeating the same code over and over again
Flexible: Avoid ‘hard-coding’ or manually specifying values as these can be easy to overlook if your data changes
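A minimal sketch of these goals in practice is shown below. The data frame, column names, and `mean_by_site()` helper are made up for illustration; in a real project the data would be read with a relative path such as `file.path("data", "processed-data", "sonde.csv")` (a hypothetical file).

```r
# Stand-in data so the sketch runs on its own
sonde <- data.frame(
  site = c("A", "A", "B", "B"),
  temp_c = c(10.2, 11.0, 9.5, 9.9),
  do_mgl = c(8.1, 7.9, 8.6, 8.4)
)

# Organized: one function instead of copying the same summary for each column
mean_by_site <- function(df, column) {
  tapply(df[[column]], df$site, mean)
}

# Flexible: find the measurement columns from the data instead of
# hard-coding their names or positions
measure_cols <- setdiff(names(sonde), "site")

# Apply the function to every measurement column and label the results
site_means <- lapply(measure_cols, function(col) mean_by_site(sonde, col))
names(site_means) <- measure_cols
print(site_means)
```

If a new measurement column is added to the data, the summary picks it up without any change to the code.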
messy-code.R script from WWS-TEST-example-repo/code
To avoid cluttering project repositories, create a new branch in the project repository
Name the branch: lastname-manuscript-year
All work should be stored on this branch
Keep files organized for a future data package:
Data
input
output
Figures
exploratory
manuscript
Code
If you’re co-writing, store the manuscript text on Box
02_Nodes/your node/Publications_Presentations/Manuscripts
Standard methods (SOPs and QA/QC scripts) are valuable files, allowing consistency and knowledge transfer across the project.
Don’t hide them within project folders.
Store in the standard-methods GitHub repository
Follow directions in the README for where to store files.
Keeping detailed sample records is critical to ensure high quality data.
Note method deviations
can help explain outliers during analysis
important in publishing high quality data
Keep track of processing steps and storage locations so samples aren’t lost
To do this, create a copy of the sample-tracking.xlsx
spreadsheet for your project.
Feel free to add columns as needed to keep detailed records
Store it in Box within the project folder so multiple people can edit it