Introduction
- What not to do: Ceci’s MSc ~~R Project~~ chaos
R Projects
- How to create an R Project
- File management within an R Projects folder
Working with R scripts: A beginner’s guide

Introduction

Today, you’ll learn how to get started with R Projects, a powerful tool for keeping your work organized and reproducible. As your projects grow, with multiple inputs, scripts, and outputs, managing files can become overwhelming.

R Projects create a structured, self-contained working directory with consistent file paths, environments, and settings. Starting with R Projects from the outset helps you stay organized and work efficiently as your research progresses.

What not to do: Ceci’s MSc R Project chaos

👎 What’s wrong with this?

All analyses are contained within a single 2,533-line R script 🫠
- This is overwhelming and makes it difficult to find relevant code
- Terrible for reproducibility and accountability
Poor file naming conventions
- R scripts:
  - sandfish.stats.R
  - sandfish.stats.2.R
  - sandfish.thesis.final.R
- Excel spreadsheets:
  - DATA_MASTER.xlsx
  - DATA_MASTER_OLD (before fixing FL/TL issue).xlsx
- All types of files (scripts, figures, and input data) are located in a single folder (chaos!)

🤷🏻‍♀️ Any positives?

In R script:
- Use of indented section headings and subheadings makes navigation possible via the outline tab (though it’s still challenging with a script this large)
- Liberal use of comments throughout script

R Projects

Put simply, an R Project is just a folder (directory) on your machine that organizes all your files for a specific project.

How to create an R Project

In RStudio…

Go to File > New Project…
Select New Directory > New Project
- Set a name for your directory under “Directory name:”
  - Ex: CB_project
- Under “Create project as subdirectory of:”, click “Browse”, and choose a location on your machine to store the project, along with all your input data, scripts, and outputs.
Click “Create project”
- I always check the “Open in new session” box in case I have another project open.

This will open a new R session and set your working directory to the folder that you just created. You can use the getwd() function to double check that it’s set correctly.

getwd()

## [1] "/Users/ceciliacerrilla/My Drive/Employment/Freelance ecology/2025/UCT/Data clinic/R Projects/Intro-to-R-Projects"
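
As an optional extra check, here is a minimal sketch that looks for a .Rproj file in the current working directory. It assumes you are running it from the session that opened with your project; the messages are only illustrative.

```r
# List any .Rproj files in the current working directory
rproj_files <- list.files(getwd(), pattern = "\\.Rproj$")

if (length(rproj_files) == 0) {
  # Nothing found: the working directory is probably not the project folder
  message("No .Rproj file found in ", getwd())
} else {
  message("Working inside project: ", rproj_files[1])
}
```

If no .Rproj file is found, the session was most likely not opened through the project file.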

Now it’s time to set up your file management system outside of R to keep your workflow neat and tidy moving forward.

File management within an R Projects folder

How you choose to organize your files is entirely up to you, but I’ll show you how I do it. You may find it useful to start with this setup as a template and adjust it to suit your needs.

Within my R project folder, I have the following subfolders:

input
scripts
output

And the following file:

R_project_name.Rproj

^ This is automatically generated when you create an R project. To open an R project, simply double-click this file.

You can create these folders just like you would any other folders on your machine (you do not need to do this within R).
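
If you would rather create the folders from the R console, the following is a minimal sketch of one way to do it. It assumes your working directory is already the project folder and uses the three folder names from the layout above.

```r
# Folder names taken from the layout described above
project_folders <- c("input", "scripts", "output")

# Create each folder; showWarnings = FALSE keeps this quiet if a folder already exists
for (folder in project_folders) {
  dir.create(folder, showWarnings = FALSE)
}

# Confirm that all three folders now exist
dir.exists(project_folders)
```

Running this more than once is harmless, because existing folders are left untouched.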

input

This is where all of my data goes. In my case, it’s Excel files. When you want to load a particular dataset in RStudio, you will know the exact file path to call, because all of your data will be stored in the input folder within your R project folder (directory).

This is how you would access data from your input folder:

# df_name <- read_excel("input/data_file_name.xlsx")

Be sure to replace data_file_name.xlsx with the name of the file you want to load.
Make sure the readxl package is installed and loaded. If it isn’t installed, install it with install.packages("readxl"), then run this code (without the #):

# library(readxl)
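
To check whether a package is installed without triggering an installation, something like the sketch below can help. The vector name needed_packages is just an illustrative choice; the check itself uses base R's requireNamespace().

```r
# Packages this script relies on (extend the vector for your own scripts)
needed_packages <- c("readxl")

# TRUE for each package that is installed, FALSE otherwise
is_installed <- vapply(needed_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- needed_packages[!is_installed]

if (length(missing_packages) > 0) {
  # Print the install command rather than running it automatically
  message("Install first: install.packages(c(",
          paste0('"', missing_packages, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```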

scripts

This is where all of your R scripts are stored.

TIP: Naming your R scripts. I number my scripts sequentially as I create them within an R project. You’ll see why this system is useful when we get to the output folder. For example, an R script in one of my projects might be named “10_Discharge-calibration.R”.

output

This is where your exported output files will be stored. In my case, my only output files were figures (PDF and JPG files). However, you might be exporting manipulated datasets, tables, or other types of output files, and you can house them here as well.

TIP: Naming your figures. This is where the sequential naming system I use in my scripts becomes useful. When naming each figure that I export, I use the same “prefix” as the script it came from. For example, if I create a regression plot in my “10_Discharge-calibration.R” script, I’ll name the figure “10_level-discharge_regression.pdf”. This way, if I look at the figure inside my output folder and want to access the source script to make changes, I know I need to open the script starting with “10_”.
TIP: Creating sub-folders. You can create as many sub-folders as needed within the input, scripts, and output folders. I find this especially useful within the output folder to keep my figures organized.
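
The prefix convention can also be used from within R. The sketch below builds a figure path from a script prefix and then looks up the matching script. The prefix "10" and the file names follow the example above; the variable names are illustrative assumptions.

```r
# Prefix shared by the script "10_Discharge-calibration.R" and its figures
script_prefix <- "10"

# Build the figure's file name and its path inside the output folder
figure_file <- paste0(script_prefix, "_level-discharge_regression.pdf")
figure_path <- file.path("output", figure_file)
figure_path

# Going the other way: recover the prefix from a figure name...
prefix_from_figure <- sub("_.*$", "", basename(figure_path))

# ...and list the script(s) in the scripts folder that start with it
list.files("scripts", pattern = paste0("^", prefix_from_figure, "_"))
```

If the scripts folder does not exist yet, list.files() simply returns an empty result.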

Example: PhD Chapter 5 R Project

Working with R scripts: A beginner’s guide

Now that we know how to create and manage R Projects, let’s dive into working with R scripts.

Script header

Whenever I start a new R script, I include a script header with important information about the script. Below is an example of what a header on one of my scripts looks like. Feel free to use this as a template and adjust it as needed.

## TITLE:               05_Returns_Data-prep
## PURPOSE:             Prepare PIT tag antenna data for analysis & produce summary stats
## AUTHOR:              Cecilia Cerrilla (cecilia.cerrilla@gmail.com)
## DATE STARTED:        6-Dec-2022
## DATE LAST UPDATED:   19-Sep-2024

## Libraries
# library(tidyverse)
# library(readxl)
# library(ggpubr)
# library(janitor)
# library(lubridate)
# library(ggplot2)
# library(openxlsx)

## Master data files
# releases <- read_excel("input/PIT_antenna_hits.xlsx")

## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Top five lines: These lines give anyone (including your future self) a clear overview of what the script does. I make sure to update the ## DATE LAST UPDATED line each time I work on the script.
Libraries: As I write the code, I often come across functions that require packages I either don’t have installed or haven’t loaded. In such cases, I install the required package and add the corresponding library() function to the ## Libraries section of the script header. This ensures all the necessary packages are loaded at the start of the script. Note that I’ve commented out the library() calls here (with #), but in your own script, you would remove the # to run them.
Master data files: I load and assign any “master” data files used in the script at the beginning, so I know exactly what data I’m working with.
- Pay attention to the file path: it refers to the input folder to locate the specified file. If you have subfolders within the input folder, just add them to the path (e.g., input/subfolder_name/filename.xlsx).
- Also note that I’ve included the package required for the read_excel() function in the ## Libraries section, with the library(readxl) call.
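
When a file sits in a subfolder, it can help to build the path once and confirm the file is there before reading it. This is a minimal sketch; subfolder_name and filename.xlsx are the placeholder names from the bullet above, not real files.

```r
# Placeholder subfolder and file names, used only for illustration
data_path <- file.path("input", "subfolder_name", "filename.xlsx")

if (file.exists(data_path)) {
  message("Found: ", data_path)
  # releases <- readxl::read_excel(data_path)  # uncomment to load the data
} else {
  # Show the working directory, since a wrong one is a common cause
  message("Not found: ", data_path, " (working directory: ", getwd(), ")")
}
```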

Script navigation

When working with R, a well-structured script makes it easier to navigate, debug, and collaborate. Without proper organization, an R script can quickly become overwhelming, especially as it grows in length and complexity.

One simple but effective way to improve script organization is by using section headings and subheadings. These help break your script into logical parts, making it easier to find specific sections and understand the overall workflow.

Headings & the navigation bar

In an R script, any line starting with the # symbol is a comment, and you can use comments as headings. The more # symbols you use, the lower the heading sits in the hierarchy:

# Main Section (Top Level)
## Subsection
### Sub-subsection
#### Detailed Breakdown

While using # symbols helps visually separate sections in your script, RStudio’s Outline view provides an even better way to navigate your code efficiently. To make your section headings appear in the Outline pane for quick access, you need to add four dashes ---- after the section name, like this:

# Main Section (Top Level) ----
## Subsection ----
### Sub-subsection ----
#### Detailed Breakdown ----

This formatting allows you to see an outline of your script in RStudio’s navigation bar, making it easier to jump between sections, especially in long scripts. You can find the navigation bar in two places in the Source Editor pane (where you write your script):

Top-right corner – Click the button with a symbol of stacked, offset horizontal lines. This opens a clickable table of contents, allowing you to quickly navigate to a specific section.
Bottom-left corner – You’ll see a # symbol followed by the name of the section where your cursor is currently located, along with an up/down arrow. Clicking this opens a dropdown menu with a structured list of your script’s sections.
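
To see how this looks in practice, here is a short illustrative script that uses sectioned headings. It relies only on the built-in mtcars data as a stand-in for real input data, and the section names are arbitrary examples.

```r
# 1. Load data ----
cars <- mtcars  # built-in dataset, standing in for a file from the input folder

# 2. Summarise ----

## 2.1 Mean fuel efficiency by cylinder count ----
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

## 2.2 Number of cars by cylinder count ----
n_by_cyl <- table(cars$cyl)

# 3. Inspect results ----
print(mpg_by_cyl)
print(n_by_cyl)
```

Pasting this into a new script in RStudio should show three top-level sections, with two subsections nested under the second.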

Exporting files

Let’s finish by looking at how to properly export files to the correct folders. You’ve already seen how to access files in your input folder by specifying the correct file path (refer to the Master data files bullet under the Script Header section). Now, let’s do the reverse—export files to the appropriate directories.

By saving outputs (such as cleaned datasets, figures, or reports) in designated folders, you maintain an organized workflow and ensure your files are easy to locate.
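
For a dataset rather than a figure, a minimal sketch using base R might look like the one below. The tables subfolder and the file name are hypothetical choices, and the summary of the built-in mtcars data simply stands in for a cleaned dataset.

```r
# Hypothetical subfolder inside output for exported tables
out_dir <- file.path("output", "tables")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a cleaned dataset: mean mpg per cylinder count
cars_summary <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Write the table to the output folder, without row numbers
out_file <- file.path(out_dir, "01_cars_mpg_by_cyl.csv")
write.csv(cars_summary, out_file, row.names = FALSE)
```

The "01_" prefix follows the same script-numbering idea used for figures.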

Sidenote: It’s up to you which packages you use to export files, but I like to do most of my coding within the Tidyverse. The Tidyverse is a collection of R packages designed for data science, emphasizing a consistent and streamlined approach to data manipulation, visualization, and analysis. It provides a coherent set of tools that follow a shared philosophy, making it easier to work with data in an intuitive and readable way. At its core, the Tidyverse is built around the idea of tidy data, where each variable is in its own column, each observation is in its own row, and each value has its own cell. This structure makes data easier to manipulate and visualize. When you load the Tidyverse with library(tidyverse), you get access to several core packages. My favourites include:

ggplot2 – for data visualization
dplyr – for data manipulation (e.g., filtering, summarizing, grouping)

Next, I’ll show you how to export figures using the ggsave() function, which is part of the ggplot2 package (a key component of the Tidyverse).

EXAMPLE:

Load the ggplot2 package if you haven’t done so already in your Script header

library(ggplot2)

Let’s use the mtcars built-in data frame from R for our example. No need to load it as it’s built-in. First, let’s take a look at what the data look like:

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now, let’s create a scatter plot to visualize the relationship between fuel efficiency (mpg) and horsepower (hp). Name this plot “cars_scatterplot”, and view the plot.

cars_scatterplot <- ggplot(mtcars, aes(x = mpg, y = hp)) + 
  geom_point()

cars_scatterplot # View the plot

Finally, save the plot using the ggsave() function.

TIP: You can hover over a function to see its formatting and input variables. This gives you a quick preview of how to use the function, including the expected arguments and their structure.

# ggsave(plot = cars_scatterplot,
#        filename = "01_cars_mpg_hp_scatterplot.pdf",
#        path = "output/figures/cars",
#        width = 30, height = 22, units = "cm")

Understanding the ggsave() function and its arguments:

Paste the name of your plot into the plot argument.
Name your output file using the filename argument. The file type (e.g., .pdf, .jpg, .png) is determined by the file extension you choose.
The path argument specifies where your figure will be saved. Make sure it’s directed to the correct folder (or subfolder).
The width, height, and units arguments control the dimensions of your plot. Feel free to play around with these until you end up with your desired size.
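
One thing to watch for is that the folder given in path needs to exist before saving. The sketch below is one way to handle that, assuming ggplot2 is installed: it creates the folder first and then saves the plot. The name fig_dir is an illustrative choice.

```r
library(ggplot2)

# Recreate the example plot so this block runs on its own
cars_scatterplot <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()

# Create the target folder (and any parent folders) if it is missing
fig_dir <- file.path("output", "figures", "cars")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

# Save the plot as a PDF in that folder
ggsave(plot = cars_scatterplot,
       filename = "01_cars_mpg_hp_scatterplot.pdf",
       path = fig_dir,
       width = 30, height = 22, units = "cm")

# Confirm that the file was written
file.exists(file.path(fig_dir, "01_cars_mpg_hp_scatterplot.pdf"))
```

Changing the extension in filename (for example to .png) should be enough to switch file type.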

Introduction to R Projects and basic script organisation

Cecilia Cerrilla

2025-03-19