RStudio Global General Options

  • A few simple options can greatly help you.

  • Workspace:

    • Set save workspace to Never
  • Maintain defaults for

    • Restore most recently opened project at startup

    • Always save history

    • These two options can help you if R crashes or your computer does.

General Options


RStudio Markdown Global Options

  • Quarto files are one type of Markdown file.

  • The R Markdown options shown here will make your Quarto file easier to navigate and read.

  • R Notebooks are a basic draft Markdown file that we don’t use in this course.

    • Leave R Notebooks section as is.

R Markdown Options


RStudio Appearance Global Options

  • By default the RStudio panes are white.

  • Changing the appearance is completely optional but can help with eye fatigue.

  • It also makes working in the RStudio environment a little more interesting.

Appearance Options


RStudio Code Options

  • There are many code options that can be changed.

  • I recommend maintaining the defaults until you know what you want to change and why.

  • There are, however, three options under Syntax in the Display tab that I recommend changing.

  • Checking all three of these options makes it easier to write and proofread your code.

Code Syntax Options


Reminders:

Assignment Reminders

  • Pre-class Survey Due 9/3/25

  • HW 1 (Parts 1, 2, and 3) Due 9/3/25

Week 1 - File Management

  • Creating a Quarto Project

    • Quarto project automatically creates a Quarto (.qmd) file.

    • Adding data and img folders to your project.

    • Creating and editing a setup chunk in your Quarto file.

    • Creating and editing code chunks.

Week 1 - Data Management

  • Selecting data by rows and columns with square brackets

  • Examining data with R commands: glimpse, summary, unique, table

  • Types of variables

    • numeric variables (<dbl>, <int>)

    • categorical variables (<chr>, <fct>, <ord>)

    • Type of variable dictates how we examine, summarize and present the data

  • Using piping, |> to write R code more efficiently.

  • Using the c() function to combine a group of values into a vector

  • Using $ or pull or select to specify a variable within a dataset
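
A minimal sketch of the Week 1 ideas listed above. The `pets` data frame and its columns are made up for illustration (they are not course data), and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data for illustration only; c() combines values into a vector
pets <- data.frame(
  name   = c("Rex", "Tom", "Ivy"),
  weight = c(30.5, 4.2, 5.1)
)

# Square brackets select by [rows, columns]
pets[1:2, ]         # rows 1 and 2, all columns
pets[, "weight"]    # all rows, one column

# Three ways to specify a variable within a dataset
w_dollar <- pets$weight              # $ returns a vector
w_pull   <- pets |> pull(weight)     # pull also returns a vector
w_select <- pets |> select(weight)   # select returns a one-column dataset
```

Note the difference in what comes back: `$` and `pull` return the values themselves, while `select` returns a dataset that still has rows and a column name.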

Additional R syntax

Operators are used to filter data or create new variables

  • For example:

    • Filter a dataset of heights to heights <= 6 feet

    • Filter a dataset of cars to exclude SUVs

Operators in R is a good reference for some of the common operators used for data management in R.
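
As a small illustration of the first example above, here is a sketch that applies the `<=` operator to a made-up vector of heights in feet. The name `heights_ft` and its values are hypothetical.

```r
# Hypothetical heights in feet, for illustration only
heights_ft <- c(5.2, 6.0, 6.4, 5.9)

# A comparison operator returns TRUE or FALSE for each value
heights_ft <= 6                                # TRUE TRUE FALSE TRUE

# Use the TRUE/FALSE result to keep only heights at or below 6 feet
short_enough <- heights_ft[heights_ft <= 6]
short_enough                                   # 5.2 6.0 5.9
```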

💥 Week 2 In-class Exercises - Q1 💥

Poll Everywhere

Use the Operators in R reference link to find the operator that is put before = to indicate not equal to.

This same operator can be put before any value, e.g., X, to indicate not X.

Introduction to dplyr

Recall the starwars data from Week 1 (see the online dataset documentation).

Original Data

#|label: original data
my_starwars |> glimpse(width=40)
Rows: 87
Columns: 14
$ name       <chr> "Luke Skywalker", "…
$ height     <int> 172, 167, 96, 202, …
$ mass       <dbl> 77, 75, 32, 136, 49…
$ hair_color <chr> "blond", NA, NA, "n…
$ skin_color <chr> "fair", "gold", "wh…
$ eye_color  <chr> "blue", "yellow", "…
$ birth_year <dbl> 19.0, 112.0, 33.0, …
$ sex        <chr> "male", "none", "no…
$ gender     <chr> "masculine", "mascu…
$ homeworld  <chr> "Tatooine", "Tatooi…
$ species    <chr> "Human", "Droid", "…
$ films      <list> <"A New Hope", "Th…
$ vehicles   <list> <"Snowspeeder", "I…
$ starships  <list> <"X-wing", "Imperi…

Modified Data

#|label: modified data
my_starwars_plot_dat |>        
  glimpse(width=40)
Rows: 24
Columns: 5
$ species <chr> "Human", "Droid", "Dro…
$ sex     <chr> "male", "none", "none"…
$ height  <int> 172, 167, 96, 202, 150…
$ mass    <dbl> 77, 75, 32, 136, 49, 1…
$ bmi     <dbl> 26.02758, 26.89232, 34…

Data Mgmt for a Boxplot Visualization

  • In Week 1, we previewed some data management of the starwars data for a boxplot visualization.

  • Today we will examine each data management step in the code below, one step per panel of this slide.

#|label: starwars data management

# select, filter, and mutate commands are part of tidyverse suite
# bmi = weight(kg)/height(m)^2

my_starwars_plot_dat <- my_starwars |>         # my_starwars_plot_dat created for plot
  select(species, sex, height, mass) |>        # select specific variables
  filter(species %in% c("Human", "Droid")) |>  # filter data to humans and droids only
  mutate(bmi = mass/(height/100)^2) |>         # use mutate to create new variable, bmi
  filter(!is.na(bmi))                          # filter data to remove missing BMI values
  • Use the select command in the dplyr package to select variables.

  • The select command also orders the variables as written in the command.

  • We save this dataset with fewer variables as a NEW dataset, my_starwars_plot_dat.

  • glimpse is NOT required at each step, but we will use it here to examine the dataset modifications.

#|label: selecting variables

my_starwars_plot_dat <- my_starwars |>            # save as new dataset my_starwars_plot_dat
  select(species, sex, height, mass) |>           # select variables of interest         
  glimpse(width=60)
Rows: 87
Columns: 4
$ species <chr> "Human", "Droid", "Droid", "Human", "Human…
$ sex     <chr> "male", "none", "none", "male", "female", …
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183,…
$ mass    <dbl> 77, 75, 32, 136, 49, 120, 75, 32, 84, 77, …
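
The ordering behavior of select, and the minus sign for deselecting, can be sketched on a tiny made-up dataset. The `toy` data frame and its column names are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data for illustration only
toy <- data.frame(id = 1:2, grp = c("x", "y"), flag = c(TRUE, FALSE))

reordered <- toy |> select(flag, id)   # keeps flag and id, in the order written
dropped   <- toy |> select(-grp)       # a minus sign deselects grp

names(reordered)   # "flag" "id"
names(dropped)     # "id" "flag"
```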
  • The filter command in the dplyr package is one common way to filter data.

  • Datasets can be filtered by numeric values, character (text) values, or factor levels.

  • A very useful operator for selecting data from specific categories is %in%, contained in.

#|label: filter observations

# filter data to include only two species categories Human and Droid
my_starwars_plot_dat <- my_starwars_plot_dat |>                     # overwrite dataset
  filter(species %in% c("Human", "Droid")) |>                       # filter data
  glimpse(width=60)
Rows: 41
Columns: 4
$ species <chr> "Human", "Droid", "Droid", "Human", "Human…
$ sex     <chr> "male", "none", "none", "male", "female", …
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183,…
$ mass    <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0…
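
To see what `%in%` does on its own, outside of filter, here is a short sketch on a made-up vector of species names. The vector is hypothetical and uses base R only.

```r
# Hypothetical vector for illustration only
species <- c("Human", "Droid", "Wookiee", "Human")

# %in% asks, for each value on the left, whether it appears in the set on the right
keep <- species %in% c("Human", "Droid")
keep            # TRUE TRUE FALSE TRUE

species[keep]   # "Human" "Droid" "Human"
```

Inside filter, only the rows where this test is TRUE are kept.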
  • The mutate command in the dplyr package can be used to create a new variable.

  • New variables can be created from other variables or can overwrite variables (be careful).

  • We will use mutate for many varied tasks throughout this course.

#|label: mutate or create variable

my_starwars_plot_dat <- my_starwars_plot_dat |>           # overwrite dataset
  mutate(bmi = mass/(height/100)^2) |>                    # create new calculated variable, bmi
  glimpse()
Rows: 41
Columns: 5
$ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Human",…
$ sex     <chr> "male", "none", "none", "male", "female", "male", "female", "n…
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 180,…
$ mass    <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.0, …
$ bmi     <dbl> 26.02758, 26.89232, 34.72222, 33.33007, 21.77778, 37.87401, 27…
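
As a check on the formula, the BMI for the first row of the output above (height 172 cm, mass 77 kg) can be worked by hand. The variable names below are for illustration only.

```r
# Values taken from the first row of the glimpse output above
height_cm <- 172
mass_kg   <- 77

height_m <- height_cm / 100     # convert centimeters to meters: 1.72
bmi      <- mass_kg / height_m^2

round(bmi, 5)                   # 26.02758, matching the first bmi value above
```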
  • A common task in data management is removing missing values.

  • In R, missing values are denoted as NA

  • Missing values can be filtered out using filter, the command is.na, and the operator !.

#|label: remove NAs

my_starwars_plot_dat <- my_starwars_plot_dat |>     # overwrite dataset
  filter(!is.na(bmi)) |>                            # filter out NA's with !is.na(...)
  glimpse(width=60)
Rows: 24
Columns: 5
$ species <chr> "Human", "Droid", "Droid", "Human", "Human…
$ sex     <chr> "male", "none", "none", "male", "female", …
$ height  <int> 172, 167, 96, 202, 150, 178, 165, 97, 183,…
$ mass    <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0…
$ bmi     <dbl> 26.02758, 26.89232, 34.72222, 33.33007, 21…
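
The pieces of `!is.na(...)` can be examined separately on a short made-up vector. The name `bmi_values` is hypothetical and the sketch uses base R only.

```r
# Hypothetical values for illustration only; NA marks a missing value
bmi_values <- c(26.0, NA, 34.7)

is.na(bmi_values)     # FALSE TRUE FALSE  (TRUE where the value is missing)
!is.na(bmi_values)    # TRUE FALSE TRUE   (! flips each TRUE/FALSE)

kept <- bmi_values[!is.na(bmi_values)]   # keep only the non-missing values
kept                  # 26.0 34.7
```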

💥 Week 2 In-class Exercises - Q2 💥

Session ID: bua455s25

Fill in the blanks to make this sentence correct:


The select command is used to select ___ (or deselect them) and can be used to ___ them.

Comparing slice and filter

In both examples below, three variables of the my_starwars data are selected.

#|label: slice by row number
# slice data to first 5 rows and last 5 rows
(my_starwars_sliced <- my_starwars |>
  select(name, height, species) |>
  slice(1:5, 83:87)) 
# A tibble: 10 × 3
   name           height species
   <chr>           <int> <chr>  
 1 Luke Skywalker    172 Human  
 2 C-3PO             167 Droid  
 3 R2-D2              96 Droid  
 4 Darth Vader       202 Human  
 5 Leia Organa       150 Human  
 6 Finn               NA Human  
 7 Rey                NA Human  
 8 Poe Dameron        NA Human  
 9 BB8                NA Droid  
10 Captain Phasma     NA Human  
#|label: filter by height
# filter data to heights >= 200 cm
(my_starwars_tall <- my_starwars |> 
  select(name, height, species) |>
  filter(height >= 200))
# A tibble: 11 × 3
   name         height species 
   <chr>         <int> <chr>   
 1 Darth Vader     202 Human   
 2 Chewbacca       228 Wookiee 
 3 IG-88           200 Droid   
 4 Roos Tarpals    224 Gungan  
 5 Rugor Nass      206 Gungan  
 6 Yarael Poof     264 Quermian
 7 Lama Su         229 Kaminoan
 8 Taun We         213 Kaminoan
 9 Grievous        216 Kaleesh 
10 Tarfful         234 Wookiee 
11 Tion Medon      206 Pau'an  

Why do these data management tasks?

  • Filtering data to a subset by value

  • Slicing data by row number

  • Selecting variables

  • Removing missing values

  • Creating new variables


These are among the most common tasks performed on raw data to make it usable.

Uses for Managed Usable Data

Usable data can communicate information:

  • can be summarized in a table for presentation.
  • can be visualized in a plot.
  • can be analyzed using statistical models.
  • can be presented or published.

In the next demonstration we review

  • creating a simple plot from managed data.
  • formatting the plot for presentation.



Plot is saved as sw_box_1

Plot is NOT printed in this column.

#|label: save sw plot
sw_box_1 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi))

Code is hidden in this column.

Unformatted plot is shown.

Hidden code chunk calls plot by name:

sw_box_1



Plot is saved as sw_box_2

Plot is NOT printed in this column.

#|label: plot with fill option
sw_box_2 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi, fill=sex))

Code is hidden in this column.

Hidden code chunk calls plot by name:

sw_box_2



Plot is saved as sw_box_3

Plot is NOT printed in this column.

#|label: plot with classic theme
sw_box_3 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species,y=bmi,fill=sex)) +
  theme_classic()

Code is hidden in this column.

Hidden code chunk calls plot by name:

sw_box_3

Previous plot code from sw_box_3 is on lines 9 - 12.

The rest of the code above and below includes formatting details.

#|label: final complete plot code 
#| code-line-numbers: "9-12"

my_starwars_plot_dat <- my_starwars_plot_dat |>
  mutate(sexF = factor(sex,                                    # create factor variable, sexF
                       levels = c("male", "female", "none"),   # specify order (levels)
                       labels =c("Male", "Female", "None")))   # specify labels

sw_box_final <- my_starwars_plot_dat |>
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi, fill=sexF)) + 
  theme_classic() + 
  labs(title="Comparison of Human and Droid BMI",              # labs specifies text labels
       subtitle="22 Humans and 4 Droids from Star Wars Universe",
       caption="Data Source: dplyr package in R",
       x="",y="BMI", fill="Sex") + 
  theme(plot.title = element_text(size = 20),                  # theme formats plot elements
        plot.subtitle = element_text(size = 15),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 15),
        panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
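
The factor step at the top of the code above can be tried on its own. This is a minimal sketch on a made-up vector, using base R only; `sex` and `sexF` here are standalone vectors rather than dataset columns.

```r
# Hypothetical values for illustration only
sex <- c("none", "male", "female", "male")

sexF <- factor(sex,
               levels = c("male", "female", "none"),   # order of the categories
               labels = c("Male", "Female", "None"))   # text shown for each level

levels(sexF)         # "Male" "Female" "None"
as.character(sexF)   # "None" "Male" "Female" "Male"
```

The order given in levels controls the order of the categories in the plot legend, and labels controls how they are printed.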

Showing Plots in a 2x2 Grid

  • Another common presentation task is to show a grid, row, or column of plots.
  • Final plot is simplified for showing in the 2x2 grid
#|label: 4 plots in a grid 
grid.arrange(sw_box_1, sw_box_2, sw_box_3, sw_box4_grid, ncol=2) 

Exporting a Plot: Two Methods

Method 1 (quick for one plot):

  1. Right-click on the plot on the right
  2. Select ‘Save image as…’
  3. Save image as .png file (or other preference) to img folder

Method 2: (ideal for multiple plots)

  • Use ggsave command
  • Defaults to last plot displayed
  • In the code shown, I override the default to specify the plot I want.
#|label: export final plot 
ggsave("img/StarWars_BMI_Boxplot.png", 
       plot=sw_box_final, width = 8, height = 6)

sw_box_final

Creating a README File

In HW 2 you will:

  • create and modify an R Quarto (.qmd) file.

  • render the Quarto file to create an HTML file.

  • create a README file.

A README file documents all files in your R project.

  • README files can be simple or complex.

  • BUA 455 will use one README file format.

Editing a README.txt file

  • in RStudio: File > Open File > click on file

  • in Notepad (Windows OS) or TextEdit (Mac OS)

💥 Week 2 In-class Exercises 💥

Session ID: bua455s25

Question 3. What type of file should the README file be saved as?


Question 4. We will use the pacman R package in every lecture and assignment because it simplifies installing and loading other packages.

There is a package suite that includes both dplyr and ggplot2 that we will use in every lecture, assignment, and quiz in BUA 455.

The name of this package suite is ____.

Introduction to HW 2

In class we will work through HW 2 Instructions

Students are encouraged, but not required, to collaborate with classmates at least once for HW 2, HW 3, or HW 4.

  • All students are responsible for understanding all coding.

Collaboration Options

  • An easy collaboration option for now:

    • Each student makes their own R Quarto project.

    • Students can email or share .qmd files in a cloud drive.

  • Posit Cloud allows for “google-drive” style collaboration on an R project.

    • Posit Cloud will be used for the course projects.
  • GitHub is used for collaboration and version control in more advanced courses and other disciplines (Not required in this course).

Homework 2 NOTES

  • This assignment will not take too long but it includes important data management details.

  • Completing this assignment will allow you to practice the skills covered in class in Weeks 1 and 2.

  • Your completed assignment will create an HTML file with a clickable Table of Contents.

    • This HTML will provide concise notes on the code and concepts we have covered.

    • OPTIONAL: You can also render HW 1 for your notes.

  • NOTE: I provide code for you to copy, paste and MODIFY BUT you are responsible for reviewing and understanding this code before Quiz 1.

  • I will also provide a set of short demo videos that guide you through the assignment.

  • The remainder of this week’s lecture time will be devoted to working through this assignment.

Homework 2 INSTRUCTIONS

Instructions HTML File