R Projects, R Quarto Files, and R Syntax

Author

Penelope Pooler Eisenbies

Published

August 19, 2025

Why this course is essential

Data Management is a public-facing aspect of all industries.

  • This course focuses on data management NOT database management.

  • You will learn how to acquire, combine, ‘wrangle’ and curate data.

  • ALL assignments, tests, and the final dashboard project are done using three software environments together: R, RStudio, and Quarto.

BUA 455 Hardware Requirements

All students must have the following to complete this course:

  • Hardware

    • Personal laptop with Windows or macOS required

      • Laptop can’t be a loaner.

      • Chromebooks and iPads are insufficient.

      • Ideally should have a minimum of 4 GB RAM and 256 GB of storage (more preferred)

        • Storage of a little less than 256 GB may be okay.
  • Note: Temporary laptop issues of one week or less can be managed using a cloud option, but you cannot complete this course without your own laptop.

BUA 455 Software Requirements

  • Software:

    • Install the latest versions of R, RStudio, and Quarto.

    • Uninstall previous versions and reinstall the latest versions, if needed.

    • If you have trouble installing or reinstalling R, RStudio, or Quarto, the TAs and/or I will help you.

    • You are responsible for maintaining up-to-date software on your laptop.

    • All of these software environments are open-source and post updates every few months.

    • Students should always do their best to update their software as soon as possible.

BUA 455 Course Website

  • Course Website Includes Links to:

    • Syllabus

    • Schedule of Assignments and Project Deadlines

    • Lecture Slides and PDFs

    • Homework Assignments

  • If substantial changes are made, I will post an announcement on Blackboard.

Demo Videos

  • Installation Demos for Windows

  • Creating a new project with correct file structure (Required for all assignments.)

  • Using Quarto Files

  • More to come as semester progresses.

    • I updated all aspects of this course in 2024.

    • I updated all demo videos in 2024.

    • This semester I will update videos as needed.

HW 1 - Part 1

This Blackboard Assignment counts as 20% of HW Assignment 1.

  • Six questions including questions about:

    • some course policies from the syllabus.

    • the hardware requirements for this course (not negotiable).

    • the current versions of R and RStudio.

Week 1 In-class Exercises - Q1

Poll Everywhere

Each version of R is given a unique number to differentiate it from other versions.

What is the current version of R?

  • 4.1.0
  • 4.3.0
  • 4.5.0
  • 4.5.1
  • 4.5.3


  • Code to see the current R version: R.version.string
  • Code to see the current RStudio version: RStudio.Version()$version
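
The version checks above can be run together in one chunk. This is a minimal sketch using only base R. The comparison to 4.1.0 is included because the |> pipe used later in this lecture was added to R in version 4.1.0. RStudio.Version() is left as a comment because it is only available when the code runs inside RStudio.

```{r}
#| label: check-r-version

R.version.string             # full version label as a text string
getRversion()                # version number only
getRversion() >= "4.1.0"     # TRUE if the installed R supports the |> pipe

# RStudio.Version()$version  # run in the RStudio Console to see the RStudio version
```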

HW 1 - Parts 2 and 3

Introduced today and covered in-depth in Lecture 2

During Lecture 1 we will:

  • Create a folder where all coursework will be stored:

    • Create a Quarto Project for HW 1 with file structure required for all work in this course.

      • Note that creating a Quarto Project also creates a Quarto (.qmd) file.
    • Create data and img folders within the project.

    • Modify the Quarto (.qmd) file header and create a setup code block.

HW 1 - Parts 2 and 3 Continued

During Lecture 2 we will:

  • Create, edit and run R code blocks in our Quarto (.qmd) file.

  • Use provided code to export a plot to the img folder.

When work in R is completed students will:

  • Answer Blackboard questions based on R output.

  • Submit the zipped HW 1 R project with your name, correct file structure, and edited .qmd file.

  • Instructions for HW 1 - Parts 2 & 3 are on the BUA Course Website Assignments page.

R Project Structure

We will use the same project structure for assignments in this course.

These steps will also be shown in a demo video.

  1. Create a folder named BUA 455 on your desktop or somewhere convenient.

    • This folder should NOT be on your OneDrive or Google Drive.
  2. Open RStudio, which also opens R, and select:

    • File > New Project... > New Directory > Quarto Project

    • Create project as HW 1 <Your Full Name> in your desktop BUA 455 folder.

    • See example on next slide.

Saving a New Project

Screenshot Example of Saving HW 1


Adding Folders to a Project

Every R Project in this course will have two additional folders within the project folder.

  1. In the Files tab in the lower-right pane, click the folder with the green plus sign to create a new folder.

Create a new folder

  2. Create a new folder named data where data files will be stored.

  3. Create a new folder named img where image files will be stored.
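
The two folders can also be created with code instead of the Files tab. This is an optional sketch, not part of the HW 1 instructions. It assumes the working directory is the HW 1 project folder, which is the case when the project is open in RStudio.

```{r}
#| label: create-folders-with-code

# dir.create() makes a folder; showWarnings = FALSE keeps it quiet if the folder already exists
dir.create("data", showWarnings = FALSE)
dir.create("img", showWarnings = FALSE)

dir.exists(c("data", "img"))   # TRUE for each folder that exists
```

Either method gives the same file structure, so use whichever is easier to remember.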

Basic File Structure for HW 1


Unedited Quarto (.qmd) file header.

  • Creating a Quarto project also creates a basic Quarto (.qmd) file with the same name as the project.

  • Soon we will learn how to create these files.

  • First, let’s edit this file header so it is more functional.

  • Here is the unedited file header.

    • Note that the dashed lines are required.

Unedited .qmd header

  • Note that this header is NOT code. It is document metadata written in YAML.

Editing the Quarto (.qmd) file header

Replace the unedited header with this text, which is also included in the HW 1 Instructions.

This header will add a table of contents and provide options for seeing or hiding code.

---
title: "HW 1 Penelope Pooler"                                                                               
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced 
---

Visual view and Source view

  • There are two different ways to view your .qmd file, Source and Visual.

  • The header will look the same in BOTH views.

  • Source view is easier to edit and is recommended for most editing.

  • Visual view gives you a preview of what the document will look like and allows for some editing.

Visual view


Source view


Editing the Quarto File

In Source view, we will delete the default text and code block so the file is blank below the header.

Then we will add this text:

## Setup

- This code block (code chunk) will include code to specify some options for the whole document.

- It will also install and load required packages using the pacman package.

- The final command, p_loaded(), will list the loaded packages so we can verify that all required packages have been loaded correctly.

Adding an empty code chunk

  • Quarto files are the newest type of R Markdown files.

  • Markdown files allow the user to have text, code, and output together in one document.

  • All Quarto files in this course will start with a code chunk labeled setup.

  • To create an empty code chunk:

    • Press Ctrl+Alt+I (Cmd+Option+I on Mac)

    • OR click the green C icon (add chunk) at the top of the file pane.

Empty R code chunk


Adding options to a code chunk

  • In code chunks in Quarto, options can be added in two ways:

    • using fences, #|

    • within the top curly brackets

  • You will see both methods used throughout this course.

  • In the setup chunk we add the label using a fence, #|, so that it appears in the printout.

HW 1 Setup

  • label: setup labels the chunk so we can find it easily.
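
As a small illustration of the first method, the chunk below sets two options with #| lines. The label options-demo and the echo option are chosen only for this example. With the second method, the same options would go inside the curly brackets instead, as {r options-demo, echo=TRUE}.

```{r}
#| label: options-demo
#| echo: true

demo_sum <- 1 + 1   # any R code goes below the option lines
demo_sum
```

Each chunk label in a document should be unique, so a label used here should not be reused in another chunk.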

Complete setup chunk code

  • In addition to specifying options, we use the setup to install and load packages and set options for the whole document.

  • The final command, p_loaded() shows all the packages that are loaded.

  • Commands are explained with comments (# followed by text).

```{r}
#|label: setup

knitr::opts_chunk$set(echo=T, highlight=T)  # specifies default options for all chunks
options(scipen=100)                         # suppress scientific notation
                                            # install pacman if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
```
Loading required package: pacman
```{r}
pacman::p_load(pacman, tidyverse, gridExtra, magrittr) # install and load required packages 
p_loaded()                                  # verify loaded packages
```
 [1] "magrittr"  "gridExtra" "lubridate" "forcats"   "stringr"   "dplyr"    
 [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
[13] "pacman"   
  • We run the whole chunk of code by clicking the green arrow in the upper-right corner of the chunk or by pressing Ctrl+Shift+Enter.

  • Output from setup should show 13 packages are loaded.
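
To confirm specific packages rather than count them by eye, the character vector returned by p_loaded() can be compared to a list of required names. This is a sketch, not part of the HW 1 code. It repeats the p_load() line from setup so that it can run on its own, and the object name required_pkgs is made up for this example.

```{r}
#| label: check-required-packages

pacman::p_load(pacman, tidyverse, gridExtra, magrittr)   # same call as in setup

required_pkgs <- c("tidyverse", "gridExtra", "magrittr") # packages we expect to see
required_pkgs %in% p_loaded()        # TRUE for each required package that is loaded
all(required_pkgs %in% p_loaded())   # a single TRUE if everything is loaded
```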

Document after Setup

  • In HW 1 we will look at two R datasets to learn some basic skills.

  • I will also provide example code from class to

    • export a dataset to your data folder.

    • create a formatted plot and export it to your img folder.

  • You are not expected to learn these export commands for HW 1 or to understand this plot code yet.

  • The plot and export commands are provided to verify that your data and img folders are set up correctly to receive exported files.

HW 1 - Part 2

Importing and examining data

```{r}
#|label: import and examine cars data                                          

# cars is an internal R dataset
# this code saves a copy of the cars data in the Global Environment
my_cars <- cars

# examine the dataset my_cars using glimpse
glimpse(my_cars)

# same command with piping:
# read |> as 'is sent to' or 'goes into'
my_cars |> glimpse()
```
Rows: 50
Columns: 2
$ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
$ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…
Rows: 50
Columns: 2
$ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
$ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…

Week 1 In-class Exercises - Q2

Poll Everywhere

NOTE: This question is also Question 1 of the Blackboard portion of HW 1.

cars is a dataset that is internal to the R software. The previous chunk of code saves a copy of this dataset with a new name.


  • Where is this new copy of the cars dataset temporarily stored after running the previous chunk of code so we can work with it?

Week 1 In-class Exercises - Q3

Poll Everywhere

NOTE: This question is also Question 3 of the Blackboard portion of HW 1.

The my_cars dataset, which is a saved copy of the R dataset cars, has

  • ____ rows (observations)

  • ____ columns (variables)

The number of rows and columns of a dataset can be seen in the Global Environment or by viewing the output from glimpse.
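
Base R also has commands that return these counts directly, which is handy when a count is needed inside other code. A minimal sketch using the same my_cars copy:

```{r}
#| label: count-rows-and-columns

my_cars <- cars    # cars is an internal R dataset

nrow(my_cars)      # number of rows (observations)
ncol(my_cars)      # number of columns (variables)
dim(my_cars)       # rows and columns together, rows first
```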

Using Square Brackets to Select Data

  • Datasets in R are usually stored as data frames, which are arranged like matrices with rows and columns.

  • Using square brackets is the most basic and reliable way to select rows, columns, and observations.

  • For example, we created a dataset called my_cars.

  • Square brackets are placed after name of dataset.

    • Values BEFORE the comma in the square bracket specify selected ROW(S).

    • Values AFTER the comma in the square bracket specify selected COLUMN(S).

    • If space before or after comma is left blank, that indicates ALL

    • Examples:

      • my_cars[3,2]: observation in 3rd row and 2nd column of my_cars

      • my_cars[3,]: entire 3rd row of my_cars including ALL columns

      • my_cars[,2]: entire 2nd column of my_cars including ALL rows

Using Square Brackets to Select Data

  • Additional examples that demonstrate how to select any part of a dataset matrix.
```{r}
#|label: square bracket examples
my_cars[3:5,]  # select rows 3, 4 and 5 both columns
my_cars[,1]    # select all rows of column 1
my_cars[10:12, 1] # select obs 10, 11, and 12 within col 1
my_cars[c(20,30,40),2] # select obs 20, 30, and 40, within col 2                        
```
  speed dist
3     7    4
4     7   22
5     8   16
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
[1] 11 11 12
[1] 26 40 48
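
One detail explains why the first result above prints as a small table while the others print as plain lists of numbers: selecting a single column from a data frame like my_cars returns a vector. A short sketch using class() to check the type of each result:

```{r}
#| label: square-bracket-result-types

my_cars <- cars

class(my_cars[3:5, ])                 # "data.frame": rows and columns are kept
class(my_cars[, 1])                   # "numeric": a single column becomes a vector
class(my_cars[, 1, drop = FALSE])     # "data.frame": drop = FALSE keeps the table layout
```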

HW 1 Questions 6 and 7 on Blackboard

Students will use the above examples to add code to the chunk below as specified in Blackboard Questions 6 and 7.

We will work on Question 7 together in class.


```{r}
#|label: square bracket exercises

# HW 1 Question 6                                                              


# HW 1 Question 7                                                               
```

Week 1 In-class Exercises - Q4

Poll Everywhere

What is the correct R command below to select and print rows 10, 15, and 20 of column 1 of the my_cars dataset to the screen?

  1. Remove # from the incomplete R command.

  2. Remove ____ (blank spaces) and replace with correct values.

    • Note that c() is used to group non-consecutive elements.
```{r}
#|label: square bracket in-class exercise

# completed command should be added to HW 1                                                  
# it will also provide answers to HW 1 - Question 7

# my_cars[c(____,____,____), ____]
```

HW 1 - Part 3

  • The third part of HW 1 uses the starwars dataset from R.

  • We use this dataset to

    • examine different types of variables.

    • learn basic ways to summarize and examine data.

    • preview data and image export skills to verify internal project file structure.

Star Wars Logo


R starwars dataset

  • First we save the R dataset, starwars, to our Global Environment as my_starwars and examine the data.
```{r}
#|label: save and examine starwars data
my_starwars <- starwars
glimpse(my_starwars, width=60) # width option added to fit screen
```
Rows: 87
Columns: 14
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Da…
$ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 1…
$ mass       <dbl> 77, 75, 32, 136, 49, 120, 75, 32, 84, 7…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brow…
$ skin_color <chr> "fair", "gold", "white, blue", "white",…
$ eye_color  <chr> "blue", "yellow", "red", "yellow", "bro…
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47…
$ sex        <chr> "male", "none", "none", "male", "female…
$ gender     <chr> "masculine", "masculine", "masculine", …
$ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatoo…
$ species    <chr> "Human", "Droid", "Droid", "Human", "Hu…
$ films      <list> <"A New Hope", "The Empire Strikes Bac…
$ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike…
$ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>…

Default Variable Types

  • Variable type dictates how variable is managed and presented.

  • By default, R assumes all variables are numeric or character (text) variables.

  • Numeric variables:

    • decimal, dbl

    • integer, int

  • Non-numeric variables (may appear as numbers):

    • Character variables, chr are text strings

      • Numeric data may be classified as character variables.

      • Numeric character variables can be converted to numeric values.
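
A minimal sketch of that conversion, using a small made-up vector (prices_chr is not part of HW 1):

```{r}
#| label: character-to-numeric

prices_chr <- c("10", "25.5", "40")   # numbers stored as text (made-up values)
class(prices_chr)                     # "character"

prices_num <- as.numeric(prices_chr)  # convert text to numeric values
class(prices_num)                     # "numeric"
mean(prices_num)                      # calculations now work
```

Text that does not look like a number, such as "ten", is converted to NA with a warning.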

Star Wars Var Types


Created Variable Types

  • Factor (common) and logical variables can be created from numeric or character variables as needed.

    • Factor variables, fct or ord

      • Can be created from character variables or numeric variables

      • Interpreted as categorical variables by R

      • ord refers to ordinal or ordered factor variables

      • Factors are essential for plots, tables, and analyses

    • Logical variables, lgl

      • TRUE/FALSE (in calculations, R treats TRUE as 1 and FALSE as 0)
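
A short sketch of both created types, using made-up data (the object names are for illustration only):

```{r}
#| label: factor-and-logical-examples

sizes_chr <- c("small", "large", "medium", "small")   # made-up character data

# ordered factor: levels sets the category order
sizes_ord <- factor(sizes_chr,
                    levels = c("small", "medium", "large"),
                    ordered = TRUE)
levels(sizes_ord)                  # categories in the order specified
table(sizes_ord)                   # tally follows level order, not alphabetical order

is_small <- sizes_chr == "small"   # logical variable: TRUE or FALSE for each value
sum(is_small)                      # TRUE counts as 1, so this counts the small items
```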

Star Wars Var Types


unique for Character or Factor Variables

  • The unique command lists all levels (categories) in a text variable.
  • Great for detecting typos.
```{r}
#|label: unique command without piping                                            
unique(my_starwars$species) # use $ to specify species within my_starwars
```
 [1] "Human"          "Droid"          "Wookiee"        "Rodian"        
 [5] "Hutt"           NA               "Yoda's species" "Trandoshan"    
 [9] "Mon Calamari"   "Ewok"           "Sullustan"      "Neimodian"     
[13] "Gungan"         "Toydarian"      "Dug"            "Zabrak"        
[17] "Twi'lek"        "Aleena"         "Vulptereen"     "Xexto"         
[21] "Toong"          "Cerean"         "Nautolan"       "Tholothian"    
[25] "Iktotchi"       "Quermian"       "Kel Dor"        "Chagrian"      
[29] "Geonosian"      "Mirialan"       "Clawdite"       "Besalisk"      
[33] "Kaminoan"       "Skakoan"        "Muun"           "Togruta"       
[37] "Kaleesh"        "Pau'an"        
```{r}
#|label: unique command with piping
my_starwars |> pull(species) |> unique() # pull species var then pipe to unique()
```

table for Character or Factor Variables

  • table outputs a tally of the number of observations in each category of a variable.
  • Useful for finding typos and making data management decisions
```{r}
#|label: table command  with and without piping                                            
table(my_starwars$species) # use $ to specify species within my_starwars
my_starwars |> pull(species) |> table() # pull species var then pipe to table()
```

        Aleena       Besalisk         Cerean       Chagrian       Clawdite 
             1              1              1              1              1 
         Droid            Dug           Ewok      Geonosian         Gungan 
             6              1              1              1              3 
         Human           Hutt       Iktotchi        Kaleesh       Kaminoan 
            35              1              1              1              2 
       Kel Dor       Mirialan   Mon Calamari           Muun       Nautolan 
             1              2              1              1              1 
     Neimodian         Pau'an       Quermian         Rodian        Skakoan 
             1              1              1              1              1 
     Sullustan     Tholothian        Togruta          Toong      Toydarian 
             1              1              1              1              1 
    Trandoshan        Twi'lek     Vulptereen        Wookiee          Xexto 
             1              2              1              2              1 
Yoda's species         Zabrak 
             1              2 

        Aleena       Besalisk         Cerean       Chagrian       Clawdite 
             1              1              1              1              1 
         Droid            Dug           Ewok      Geonosian         Gungan 
             6              1              1              1              3 
         Human           Hutt       Iktotchi        Kaleesh       Kaminoan 
            35              1              1              1              2 
       Kel Dor       Mirialan   Mon Calamari           Muun       Nautolan 
             1              2              1              1              1 
     Neimodian         Pau'an       Quermian         Rodian        Skakoan 
             1              1              1              1              1 
     Sullustan     Tholothian        Togruta          Toong      Toydarian 
             1              1              1              1              1 
    Trandoshan        Twi'lek     Vulptereen        Wookiee          Xexto 
             1              2              1              2              1 
Yoda's species         Zabrak 
             1              2 
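
Note that the tallies above leave out missing values: by default, table does not count NA. The useNA option adds them. This sketch uses a small made-up vector, with one deliberate typo, rather than the starwars data:

```{r}
#| label: table-with-missing-values

species_demo <- c("Human", "Droid", "Human", "human", NA)   # "human" is a deliberate typo

table(species_demo)                    # NA is not counted; the typo appears as its own category
table(species_demo, useNA = "ifany")   # adds a count for NA when any are present
```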

table for examining 2 variables

  • To save an object: assign it to a name with <-
    • Objects can be values, vectors, datasets, plots, tables, etc.
  • To print an object to the screen as you save it, enclose it in parentheses.
```{r}
#|label: using table to summarize 2 vars
table(my_starwars$sex, my_starwars$hair_color) # $ specifies variables in dataset
```
```{r}
#|label: table for 2 vars with piping and saving result
(sex_haircolor_smry <- my_starwars |> select(sex, hair_color) |> table())
```
                hair_color
sex              auburn auburn, grey auburn, white black blond blonde brown
  female              1            0             0     3     0      1     5
  hermaphroditic      0            0             0     0     0      0     0
  male                0            1             1     9     3      0    11
  none                0            0             0     0     0      0     0
                hair_color
sex              brown, grey grey none white
  female                   0    0    5     1
  hermaphroditic           0    0    0     0
  male                     1    1   29     3
  none                     0    0    3     0
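
The parentheses trick works for any object, not only tables. A minimal sketch with a made-up object name:

```{r}
#| label: save-and-print-with-parentheses

demo_total <- 3 + 6      # saved to the Global Environment, nothing is printed
(demo_total <- 3 + 6)    # saved AND printed because of the outer parentheses
```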

Displaying a Saved Object

You may want to save a plot or a summary and display it elsewhere in a document or website.

```{r}
#|label: display saved object
sex_haircolor_smry # type the name of an object to display it
```
                hair_color
sex              auburn auburn, grey auburn, white black blond blonde brown
  female              1            0             0     3     0      1     5
  hermaphroditic      0            0             0     0     0      0     0
  male                0            1             1     9     3      0    11
  none                0            0             0     0     0      0     0
                hair_color
sex              brown, grey grey none white
  female                   0    0    5     1
  hermaphroditic           0    0    0     0
  male                     1    1   29     3
  none                     0    0    3     0

Summarizing Numeric Data

  • summary and mean are two quick ways to summarize numeric variables.
```{r}
#| label: summary and mean examples
my_starwars |> select(height) |> summary()   # summarize height of starwars characters
mean(my_starwars$height, na.rm=T)            # calculate the mean (without piping)
# na.rm=T is required to remove missing obs before calculation
my_starwars |> pull(height) |> mean()        # mean with piping, NA's not removed
my_starwars |> pull(height) |> mean(na.rm=T) # mean with piping, NA's removed
```
     height     
 Min.   : 66.0  
 1st Qu.:167.0  
 Median :180.0  
 Mean   :174.6  
 3rd Qu.:191.0  
 Max.   :264.0  
 NA's   :6      
[1] 174.6049
[1] NA
[1] 174.6049
  • na.rm=T used in mean, var, sd, min, max, median, etc.
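
A short sketch of the same option in a few of those other commands, using a made-up vector with one missing value. TRUE is the full spelling of T.

```{r}
#| label: na-rm-with-other-functions

heights_demo <- c(150, 172, NA, 180)   # made-up values with one missing observation

median(heights_demo)                   # NA: the missing value blocks the calculation
median(heights_demo, na.rm = TRUE)     # median of the three non-missing values
max(heights_demo, na.rm = TRUE)        # largest non-missing value
sd(heights_demo, na.rm = TRUE)         # standard deviation of the non-missing values
```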

Week 1 In-class Exercises - Q5

Poll Everywhere

The output from the glimpse(my_starwars) command lists each variable and shows its type. This dataset has

  • ____ character variables (labeled <chr>) and
  • ____ numeric variables (labeled <int>, <dbl>, or <num>).


  • NOTES:

    • To answer this question, examine the glimpse output and count the number of variables of each type.

    • RECALL: Numeric variables include both decimal (<dbl>) AND integer (<int>) variables.

    • The glimpse command is a newer alternative to str and will only work if the tidyverse package suite is installed and loaded.

Data Mgmt for a Basic Boxplot

“The greatest value of a picture is when it forces us to notice what we never expected to see.” -John W. Tukey

  • This is a preview of

    • data management skills we will cover soon in this course

    • data visualization skills using ggplot, which we will use throughout this course.

  • This preview of plot code will be included in HW 1 for you to run.

  • You do not have to fully understand this code in HW 1, but you do have to run it.

  • The HW 1 code will

    • create the final plot from this week’s lecture and export it to your HW 1 img folder.

    • create a small summary dataset and save it to your HW 1 data folder.

  • Running the provided code to export the plot and table is a required part of HW 1.

Data Management Demonstration

  • Recall the original raw my_starwars dataset:
```{r}
#|label: R starwars dataset
# data can also be viewed in Global Environment
my_starwars |> glimpse(width=60)                                              
```
Rows: 87
Columns: 14
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Da…
$ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 1…
$ mass       <dbl> 77, 75, 32, 136, 49, 120, 75, 32, 84, 7…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brow…
$ skin_color <chr> "fair", "gold", "white, blue", "white",…
$ eye_color  <chr> "blue", "yellow", "red", "yellow", "bro…
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47…
$ sex        <chr> "male", "none", "none", "male", "female…
$ gender     <chr> "masculine", "masculine", "masculine", …
$ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatoo…
$ species    <chr> "Human", "Droid", "Droid", "Human", "Hu…
$ films      <list> <"A New Hope", "The Empire Strikes Bac…
$ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike…
$ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>…

Code shows data management for a plot with comments

Piping with |> results in efficient, streamlined data management.

```{r}
#|label: starwars data management

# select, filter, and mutate commands are part of tidyverse suite
# bmi = weight(kg)/height(m)^2

my_starwars_plot_dat <- my_starwars |>         # my_starwars_plot_dat created for plot
  select(species, sex, height, mass) |>        # select specific variables
  filter(species %in% c("Human", "Droid")) |>  # filter data to humans and droids only
  mutate(bmi = mass/(height/100)^2) |>         # use mutate to create new variable, bmi
  filter(!is.na(bmi))                          # filter data to remove missing BMI values
```
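
As a check on the bmi formula, the calculation can be done by hand for one character. The first row of the glimpse output above lists a height of 172 (cm) and a mass of 77 (kg). The object names here are made up for this example.

```{r}
#| label: bmi-by-hand

height_cm <- 172                       # first height value in my_starwars
mass_kg   <- 77                        # first mass value in my_starwars

height_m  <- height_cm / 100           # convert centimeters to meters
bmi_check <- mass_kg / height_m^2      # bmi = weight(kg) / height(m)^2
round(bmi_check, 5)                    # 26.02758
```

Dividing height by 100 inside mutate does the same unit conversion for every row at once.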

Original Data

```{r}
#|label: original data
my_starwars |> glimpse(width=40)
```
Rows: 87
Columns: 14
$ name       <chr> "Luke Skywalker", "…
$ height     <int> 172, 167, 96, 202, …
$ mass       <dbl> 77, 75, 32, 136, 49…
$ hair_color <chr> "blond", NA, NA, "n…
$ skin_color <chr> "fair", "gold", "wh…
$ eye_color  <chr> "blue", "yellow", "…
$ birth_year <dbl> 19.0, 112.0, 33.0, …
$ sex        <chr> "male", "none", "no…
$ gender     <chr> "masculine", "mascu…
$ homeworld  <chr> "Tatooine", "Tatooi…
$ species    <chr> "Human", "Droid", "…
$ films      <list> <"A New Hope", "Th…
$ vehicles   <list> <"Snowspeeder", "I…
$ starships  <list> <"X-wing", "Imperi…

Modified Data

```{r}
#|label: modified data
my_starwars_plot_dat |> glimpse(width=40)
```
Rows: 24
Columns: 5
$ species <chr> "Human", "Droid", "Dro…
$ sex     <chr> "male", "none", "none"…
$ height  <int> 172, 167, 96, 202, 150…
$ mass    <dbl> 77, 75, 32, 136, 49, 1…
$ bmi     <dbl> 26.02758, 26.89232, 34…
```{r fig.align='center',fig.dim=c(5,4)}
#|label: unformatted unsaved plot
my_starwars_plot_dat |>                     # most basic boxplot code
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi))
```



Plot is saved as sw_box_1

Plot is NOT printed in this column.

```{r}
#|label: save sw plot
sw_box_1 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi))
```

Code is hidden in this column.

Unformatted plot is shown.

Hidden code chunk calls plot by name:

sw_box_1



Plot is saved as sw_box_2

Plot is NOT printed in this column.

```{r}
#|label: plot with fill option
sw_box_2 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi, fill=sex))
```

Code is hidden in this column.

Hidden code chunk calls plot by name:

sw_box_2



Plot is saved as sw_box_3

Plot is NOT printed in this column.

```{r}
#|label: plot with classic theme
sw_box_3 <- my_starwars_plot_dat |> 
  ggplot() +
  geom_boxplot(aes(x=species,y=bmi,fill=sex)) +
  theme_classic()
```

Code is hidden in this column.

Hidden code chunk calls plot by name:

sw_box_3

Previous plot code from sw_box_3 is on lines 9 - 12.

The rest of the code above and below includes formatting details.

```{r}
#|label: final complete plot code 
#| code-line-numbers: "9-12"

my_starwars_plot_dat <- my_starwars_plot_dat |>
  mutate(sexF = factor(sex,                                    # create factor variable, sexF
                       levels = c("male", "female", "none"),   # specify order (levels)
                       labels =c("Male", "Female", "None")))   # specify labels

sw_box_final <- my_starwars_plot_dat |>
  ggplot() +
  geom_boxplot(aes(x=species, y=bmi, fill=sexF)) + 
  theme_classic() + 
  labs(title="Comparison of Human and Droid BMI",              # labs specifies text labels
       subtitle="22 Humans and 4 Droids from Star Wars Universe",
       caption="Data Source: dplyr package in R",
       x="",y="BMI", fill="Sex") + 
  theme(plot.title = element_text(size = 20),                  # theme formats plot elements
        plot.subtitle = element_text(size = 15),
        axis.title = element_text(size=18),
        axis.text = element_text(size=15),
        plot.caption = element_text(size = 10),
        legend.text = element_text(size = 12),
        legend.title = element_text(size = 15),
        panel.border = element_rect(colour = "lightgrey", fill=NA, linewidth=2),
        plot.background = element_rect(colour = "darkgrey", fill=NA, linewidth=2))
```
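
The plot code above saves sw_box_final but does not export it. The export commands provided in HW 1 are not shown in these slides, so the following is only a sketch of how such an export could look, using write_csv and ggsave from the tidyverse. The object names and file names (sw_demo_dat, sw_demo_box, sw_demo_dat.csv, sw_demo_box.png) are made up for this example, and the code assumes the working directory is the project folder.

```{r}
#| label: export-sketch

library(tidyverse)

# small dataset: humans and droids with non-missing BMI
sw_demo_dat <- starwars |>
  select(species, sex, height, mass) |>
  filter(species %in% c("Human", "Droid")) |>
  mutate(bmi = mass / (height / 100)^2) |>
  filter(!is.na(bmi))

# basic boxplot saved as an object
sw_demo_box <- sw_demo_dat |>
  ggplot() +
  geom_boxplot(aes(x = species, y = bmi, fill = sex)) +
  theme_classic()

# create the folders if they do not exist yet
dir.create("data", showWarnings = FALSE)
dir.create("img", showWarnings = FALSE)

# export the dataset to the data folder and the plot to the img folder
write_csv(sw_demo_dat, "data/sw_demo_dat.csv")
ggsave("img/sw_demo_box.png", plot = sw_demo_box, width = 6, height = 4)
```

If both files appear in the Files tab after running the chunk, the data and img folders are set up correctly.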

Key Points from This Week

  • File Management:

    • REQUIRED: current versions of R, RStudio, Quarto

    • Create a BUA 455 folder on your laptop

      • NOT in your Downloads folder.
    • R Projects and Quarto files are used for all coursework

  • Data Management:

    • Examining data:

      • glimpse, unique, table, summary
    • Selecting data by row and column using square brackets

    • Different types of variables and how to summarize them

You may submit an ‘Engagement Question’ about each lecture until midnight on the day of the lecture. A minimum of four submissions are required during the semester.