R for Beginners

Alex Manos, M.S.

Pinellas County Division of Environmental Management

July 10, 2025

DISCLAIMER

I am not a computer scientist

Outline

If you want to follow along with the slides (optional): https://rpubs.com/aman11/r_workshop

Introductions
R vs. RStudio
Use cases
Installing R & RStudio
R coding basics
Coding practice
Tidy data principles

Introductions

Name
Division
Experience with R (if any)

Goals for This Workshop

Basic understanding of R and RStudio
Become comfortable with the RStudio interface and running code
Develop a VERY basic understanding of R coding principles
Create a community of practice for R users within Pinellas County

Background

What is R?

Wikipedia says:

“R is a programming language for statistical computing and data visualization.”

Data wrangling/tidying
Statistical analysis
Data visualization
Automation

Why use R?

R is one of the most popular programming languages in environmental science.
Open-source (free!) with a community of users and developers around the world and throughout numerous sectors.
Provides powerful tools for data analysis, visualization, and reporting.

What is RStudio?

RStudio is an IDE (Integrated Development Environment) for R.
Used for writing, running, and debugging R code.
Increases productivity and efficiency.

R vs. RStudio

R is the programming language
RStudio is the user interface for R (IDE)

R is the engine

RStudio is the informational display

CRAN

CRAN (Comprehensive R Archive Network) is the official repository for the R language and R packages
CRAN is a network of servers that store R packages and documentation
Maintained by the CRAN team on behalf of the R Foundation (Posit sponsors the widely used cloud mirror)
When you install a package, it is (usually) downloaded from CRAN
As of 07/10/2025, there are 22,392 packages on CRAN!

R and R Package Security

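R and RStudio are widely used and actively maintained: R by the R Core Team, RStudio by Posit.
CRAN runs automated checks on packages before they are published, but these checks confirm that a package builds and runs; they are not a full security review.
Be careful about installing packages from GitHub or other sources unless you have reviewed the code or it comes from a verified, trusted source.
Sources like Stack Overflow and other coding websites are great for snippets, but read and understand any code before you run it.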

Use Cases

Use Case - Water Quality Monitoring

Sampling divided into:
- Streams
- Lakes/Coastal

Nutrients, chlorophyll, turbidity, bacteria, dissolved oxygen, etc.

Sampling divided into “runs” with tidally influenced streams

Stream Sampling Dates

No previous automation

Tide charts inspected for each site for optimal sampling dates

rtide package used to automatically pull tidal height

Algorithm developed to select optimal sampling dates based on time of outgoing tide
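The date-selection idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the County’s actual algorithm: the tide heights are synthetic (a sine curve standing in for real rtide output), and the 08:00-15:00 work window and the "most usable hours wins" rule are hypothetical choices.

```r
# Synthetic hourly tide heights for 3 days (a stand-in for real rtide output)
times <- seq(as.POSIXct("2025-07-01 00:00", tz = "UTC"),
             by = "hour", length.out = 72)
hours_elapsed <- as.numeric(difftime(times, times[1], units = "hours"))
height <- sin(2 * pi * hours_elapsed / 12.42)  # ~12.42 h semidiurnal cycle

# The tide is outgoing when the next reading is lower than the current one
outgoing <- c(diff(height) < 0, FALSE)  # last hour has no "next" reading

# Assumed work window: 08:00 to 15:00
hour_of_day <- as.integer(format(times, "%H"))
usable <- outgoing & hour_of_day >= 8 & hour_of_day <= 15

# Count usable hours per day and pick the day with the most
usable_per_day <- tapply(usable, format(times, "%Y-%m-%d"), sum)
best_day <- names(which.max(usable_per_day))
usable_per_day
best_day
```

With real data, the synthetic `height` vector would be replaced by tide predictions for each site, and the selection rule would follow the team's own sampling criteria.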

Randomized Strata Sampling – Old

SAS code 20+ years old and divided into multiple files
Only 1 license available within our group
Limited staff knowledge of SAS

Randomized Strata Sampling – New

Old SAS code converted to R
Code available to all staff to view, edit, run
In-house expertise allows for customization
Combined with the stream sampling date code to generate the yearly sampling schedule in one streamlined process
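Stratified random sampling in general (draw a fixed number of sites at random from within each stratum) takes only a few lines of base R. This is a generic sketch, not the converted SAS code: the site names, the three strata, and the "2 sites per stratum" rule are all hypothetical.

```r
# Hypothetical list of 12 sites, 4 in each of 3 made-up strata
sites <- data.frame(
  site    = paste0("S", 1:12),
  stratum = rep(c("North", "Central", "South"), each = 4)
)

set.seed(2025)  # makes the random draw repeatable

# split() makes one data frame per stratum; lapply() draws 2 rows from each
draws <- lapply(split(sites, sites$stratum), function(s) {
  s[sample(nrow(s), size = 2), ]
})
picked <- do.call(rbind, draws)  # stack the per-stratum picks back together
picked
```

Setting a seed means anyone who reruns the script gets the same schedule, which is one practical advantage of keeping this step in shared code.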

Sample Bottle Kits

Data QA/QC Semi-Automation

Previous automation was limited to HACH systems; all other QA/QC checks (e.g., time/date and missing data) were done manually

The current process performs all checks automatically and generates a PDF report detailing each check conducted
- Reproducible QA/QC
- Provides digital paper trail
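The two manual checks named above (time/date and missing data) can be sketched in base R. This is an illustrative example, not the County’s QA/QC code: the data frame `wq`, its columns, and the `qc_log` summary are hypothetical, and the PDF report step is left out.

```r
# Hypothetical field records with deliberate problems
wq <- data.frame(
  site      = c("A1", "A2", "A3", "A4"),
  date_time = c("2025-07-01 09:15", "2025-07-01 09:40",
                "2025-07-01 25:00", NA),
  do_mgl    = c(6.8, NA, 7.1, 5.9)   # dissolved oxygen, mg/L
)

# Check 1: count missing values in every column
missing_counts <- colSums(is.na(wq))
missing_counts

# Check 2: find date/times that are present but cannot be parsed (hour 25)
parsed <- as.POSIXct(wq$date_time, format = "%Y-%m-%d %H:%M", tz = "UTC")
bad_time_rows <- which(is.na(parsed) & !is.na(wq$date_time))
bad_time_rows

# Collect results in a small table that could feed a report
qc_log <- data.frame(
  check  = c("missing values", "unparseable date/time"),
  issues = c(sum(missing_counts), length(bad_time_rows))
)
qc_log
```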

Processing time before automation: 5-8 weeks

Processing time after automation: 2-4 weeks

Dashboards

https://pcdem.shinyapps.io/dashboard/

Presentations

This entire presentation was created using R and RStudio!
The code used to create this presentation is available on GitHub
The presentation is in Quarto format, which allows for easy conversion to HTML, PDF, and Word formats

---
title: "R for Beginners"
author: 
  - name: "Alex Manos, M.S."
    affiliation: "Pinellas County Division of Environmental Management" 
date: "2025-07-10"
format: 
  revealjs:
    controls: false
    date-format: "MMMM D, YYYY"
    theme: ["styles.scss"]
    logo: www/pinellas_logo.png
---

AI and Coding

AI is a powerful tool for generating code, but it is not perfect
You can use it to help you learn to code, but try to figure things out yourself first

Don’t vibe-code!
- You can get code that runs and generates results that are wrong
- You need to understand the code and know how to problem-solve

Installing R & RStudio

Installing R

R can be downloaded for free from Posit’s RStudio Desktop download page (Posit maintains RStudio; R itself is maintained by the R Core Team)
Click “Download and Install R”, which will take you to the R CRAN page
Click “Download R for (your OS)”
- Windows: Click “base” and then “Download R x.x.x for Windows”
- macOS: Select the .pkg file that matches your Mac’s chip (Apple silicon/arm64 or Intel/x86_64)
Follow the installation instructions

Installing RStudio

RStudio is downloaded from the same Posit page where you downloaded R
Click “Download RStudio Desktop For (your OS)”
Open the downloaded installer (.exe on Windows, .dmg on macOS) and follow the instructions
You can open RStudio by clicking the icon or searching your applications

Changing R Version

RStudio allows you to change the version of R you are using (on Windows: Tools > Global Options > General)
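To confirm which version of R your session is actually running, you can ask R directly. A minimal check (your version string will differ):

```r
R.version.string          # prints something like "R version 4.x.x (<release date>)"
getRversion() >= "4.1.0"  # TRUE if this R includes the native |> pipe
```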

R Coding Basics

Common Terms

R-specific

Object: A variable, data frame, or other data structure
Script: A file containing R code
Function: A block of code that performs a specific task (function())
Argument: A value passed to a function
Package: A collection of functions and data sets

RStudio-specific

Panel: A section of the RStudio interface
Project: A collection of files and settings for a specific task
Environment: A list of objects currently in memory

Navigating RStudio

- Source pane: code can be written here
- Environment pane: imported data and variables are here
- Files/Plots/Packages pane: files, plots, and packages are here
- Console: output from code is displayed here

Installing Packages

Packages can be installed in two ways: from the console/a script or through the RStudio interface
Console/Script: install.packages("package_name")
- To install multiple packages at once, use install.packages(c("package1", "package2"))
RStudio: Click the “Packages” tab in the bottom-right pane, then click “Install” and search for the package you want to install.
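If you share a script, it helps to install only the packages someone is missing. A minimal sketch; `find_missing()` is a hypothetical helper name, not a built-in function:

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
find_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if a package is installed and loadable
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

find_missing(c("stats", "utils"))  # character(0): both ship with R
```

In the workshop you might then run `missing_pkgs <- find_missing(c("tidyverse", "psych", "palmerpenguins"))` followed by `if (length(missing_pkgs) > 0) install.packages(missing_pkgs)`, so only the packages you lack are downloaded.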

Loading Packages

Once the packages are installed, we need to load them into each new R session with the library() function

# Use the '#' symbol to write notes to ourselves ('comments').
# R ignores everything after a '#' on a line, so commented lines are not run.
library(package1)
library(package2)

Notice that you don’t need quotes around the package names here (although quotes also work).
library() treats the unquoted name as the name of the package to load
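Because library() reads the bare name you type, a package name stored in a variable needs the `character.only = TRUE` argument. A minimal sketch using two packages that ship with R (stats and utils), so it runs anywhere:

```r
pkgs <- c("stats", "utils")  # packages that come with every R install
for (p in pkgs) {
  # Without character.only = TRUE, library() would look for a package named "p"
  library(p, character.only = TRUE)
}
search()  # attached packages are listed as "package:<name>"
```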

Setting Your Working Directory

Your working directory is the folder where R looks for files (and saves them) by default
R does not know where your files are unless you tell it
If you want to use any data that does not come with a package, you need to tell R where it lives

# The working directory where R will look for files 
getwd()

[1] "C:/Users/bcc105701/OneDrive - Pinellas County/Desktop/workshops/R_intro_workshop"

# Set the working directory to the folder where your files are
setwd("C:/Users/YourName/Documents/YourProject")
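To see the working directory in action without touching your real folders, the sketch below switches to the session’s temporary folder (`tempdir()`), writes a small made-up CSV file there, and reads it back by file name alone:

```r
old_wd <- getwd()           # remember where we started
setwd(tempdir())            # a temporary folder R creates for every session
writeLines(c("site,temp_c", "A1,28.4", "A2,29.1"), "example.csv")
file.exists("example.csv")  # TRUE: R looks for the file in the working directory
example <- read.csv("example.csv")
example
setwd(old_wd)               # switch back to the original working directory
```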

Running Code

You can run a line of code by placing the cursor anywhere on that line and pressing ‘Ctrl + Enter’ (Windows) or ‘Cmd + Enter’ (Mac).

[1] 500

Notice the ‘500’ printed to the console.

You can also highlight a block of code and run it all at once.

[1] 100

[1] 200

[1] 300

Basic Math

R is equipped with many mathematical operations

5 + 5 # Addition

[1] 10

5 - 5 # Subtraction

[1] 0

5 * 5 # Multiplication

[1] 25

5 / 5 # Division

[1] 1

5 ^ 5 # Exponentiation (or **)

[1] 3125

log(100) # Natural log

[1] 4.60517

6*4/(2^2*3)-2 # PEMDAS

[1] 0
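A few more built-in operations that come up often in data work, all part of base R:

```r
7 %/% 2           # integer division: 3
7 %% 2            # remainder (modulo): 1
sqrt(16)          # square root: 4
log10(1000)       # base-10 log: 3
log(8, base = 2)  # log with any base: 3
exp(1)            # e raised to the power 1: 2.718282
abs(-4.5)         # absolute value: 4.5
```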

Logical Statements & Booleans

Test	Meaning
`x < y`	Less than
`x > y`	Greater than
`x == y`	Equal to
`x <= y`	Less than or equal to
`x >= y`	Greater than or equal to
`x != y`	Not equal to
`x | y`	Or
`x & y`	And
`x %in% y`	Is in
`is.na(x)`	Is missing
`!is.na(x)`	Is not missing

Logical Statements & Booleans (cont.)

1 > 2

[1] FALSE

1 < 2

[1] TRUE

1 == 2

[1] FALSE

1 != 2

[1] TRUE

1 < 2 | 3 > 4 ## only one test needs to be true to return TRUE

[1] TRUE

1 < 2 & 3 > 4 ## both tests need to be true to return TRUE

[1] FALSE
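The last three rows of the table (`%in%`, `is.na()`, and `!is.na()`) work the same way. A short example with made-up site names and readings:

```r
sites <- c("A1", "B2", "C3")
"B2" %in% sites           # TRUE
c("A1", "Z9") %in% sites  # TRUE FALSE: one answer per value on the left

readings <- c(7.2, NA, 6.8)
is.na(readings)           # FALSE  TRUE FALSE
!is.na(readings)          #  TRUE FALSE  TRUE
sum(is.na(readings))      # 1: TRUE counts as 1, so this counts missing values
```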

Logicals and Boolean Precedence

Like most programming languages, R evaluates comparison operators (==, >, etc.) before the logical operators (&, |, etc.).

1 > 0.5 & 2

[1] TRUE

In this case, R is evaluating two separate logical statements:

- 1 > 0.5, which is TRUE

- 2, which R converts with as.logical(2); any nonzero number becomes TRUE (only 0 becomes FALSE)

1 > 0.5 & 1 > 2 # & is TRUE only if both comparisons are TRUE

[1] FALSE

1 > 0.5 | 1 > 2 # | is TRUE if at least one comparison is TRUE

[1] TRUE
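Two more precedence details are worth knowing: `&` is evaluated before `|`, and each side of `&` must be a complete comparison. A short demonstration (the variable `depth` is just an example):

```r
TRUE | TRUE & FALSE     # TRUE: & runs first, so this is TRUE | (TRUE & FALSE)
(TRUE | TRUE) & FALSE   # FALSE: parentheses make | run first
as.logical(0)           # FALSE: zero converts to FALSE
as.logical(-3)          # TRUE: nonzero numbers convert to TRUE
depth <- 1.5
depth > 1 & depth < 2   # TRUE: repeat the variable in each comparison
```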

Cool, Now What?

Many different tasks can be done by combining the basic math and logical statements.

But we may need to set up a group of tests to do something.

To reuse them later in functions, loops, etc., we need to assign them to a name.

Assignment

The most popular assignment operator in R is <-, which is just < followed by - (RStudio shortcut: Alt + - on Windows, Option + - on Mac)
- Read aloud as “gets”

x <- "Hello, World!" # note that text needs to be wrapped in quotes
x # here you would "call" the variable to return the assigned value

[1] "Hello, World!"

x is the variable and "Hello, World!" is the value assigned to it

Using = as an assignment operator also works, but it is not recommended because = is also used to set arguments inside functions
It is mostly a matter of preference, but <- is the common convention, so your code will be easier for other R programmers to read
Whichever you choose, keep it consistent
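Once a value has a name, it can be reused in later calculations. The example below (a made-up water temperature) also shows a common surprise: changing one variable does not update values that were calculated from it earlier.

```r
temp_f <- 77                     # water temperature in degrees F (example value)
temp_c <- (temp_f - 32) * 5 / 9  # reuse temp_f in a calculation
temp_c                           # 25

temp_f <- 86  # reassigning replaces the old value
temp_c        # still 25: temp_c is not recalculated automatically
```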

Naming Variables

Variable names can be anything you want but there are some rules to help make your code more readable:

Use descriptive names for objects and functions

Use camelCase or snake_case for naming objects and functions

Keep names short but descriptive

Avoid using special characters (e.g. $, %, &, etc.) in variable names

Don’t start variable names with a number

Some of these rules must be followed to avoid errors, others are just good practice

Naming Variables (cont.)

Good Names

avgTemp <- 75 # camelCase
avg_temp <- 75 # snake_case

Bad Names

avarageTemperature_1_2  <- 75 # misspelled "average" and too long
a.T <- 75 # not descriptive enough

Note the errors in the naming below

1stAvgTemp <- 75 # Variable name cannot start with a number

Error in parse(text = input): <text>:1:2: unexpected symbol
1: 1stAvgTemp
     ^

avgTemp% <- 75 # Variable name cannot contain special characters

Error in parse(text = input): <text>:1:8: unexpected input
1: avgTemp% <- 75 # Variable name cannot contain special characters
           ^

Naming Variables (cont.)

There are some names we can never use because they are reserved for R

# Control structures
if
else
while
for

# Function definition
function

# Constants
TRUE
FALSE
NULL
Inf
NaN
NA

There are more that can’t be used; R will give an error if you try (the full list is in the help page ?Reserved)

Functions

Functions are blocks of code that perform a specific task
Functions can take arguments (inputs) and return values (outputs)
Functions can be built-in or user-defined

mean(x = 1:10)

[1] 5.5

min(x = 1:10)

[1] 1

sd(1:10) # note the argument is not named

[1] 3.02765

Arguments can often be passed by position without naming them, but naming them becomes more important as a function takes more arguments.
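Here is the difference in practice with round(), whose first two arguments are `x` and `digits`, and seq(), which takes several numeric arguments:

```r
round(3.14159, 2)               # by position: first x, then digits -> 3.14
round(digits = 2, x = 3.14159)  # by name: order no longer matters -> 3.14
seq(from = 0, to = 20, by = 5)  # names make the intent clear: 0 5 10 15 20
```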

Use the ? operator followed by the function name to view documentation about that function

?mean # don't need to include parentheses
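Besides built-in functions like mean(), you can write your own with function(). A minimal sketch; `f_to_c` is a hypothetical example name, not a built-in R function:

```r
# f_to_c(): convert temperatures from degrees Fahrenheit to Celsius
f_to_c <- function(temp_f) {
  (temp_f - 32) * 5 / 9  # the last value computed is returned
}

f_to_c(77)          # 25
f_to_c(c(32, 212))  # 0 100: works on a whole vector at once
```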

Working with objects

e <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # use c() to combine values into a vector

length(e) # How many things are in there?

[1] 10

sum(e)/length(e) # Hand calculate mean

[1] 5.5

mean(e) # R function to calculate mean

[1] 5.5

e <- data.frame(x = 1:5,
                y = 11:15)
e

mean(y)

Error: object 'y' not found

Global Environment

Error: object 'y' not found

The global environment is where all your objects live, and it gives us a hint about what went wrong: it contains e, but no standalone object called y

Fixing Our Issue

To get the mean of y we need to “index” e using the $ operator

mean(e$y)

[1] 13

R will look for named objects in the environment
If the interpreter can’t find y or any other object, it will give up because it does not think it exists
You need to tell the interpreter what to look for inside of the object
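The $ operator is not the only way to point R at a column inside an object. These base R alternatives all return the same column of `e`:

```r
e <- data.frame(x = 1:5, y = 11:15)
mean(e$y)         # 13
mean(e[["y"]])    # 13: column chosen by its name as a string
mean(e[, "y"])    # 13: [row, column] style, all rows of column y
with(e, mean(y))  # 13: with() looks up y inside e
```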

What are Objects?

Objects are what we work with in R

 [1] "is"                      "is.array"               
 [3] "is.atomic"               "is.call"                
 [5] "is.character"            "is.complex"             
 [7] "is.data.frame"           "is.double"              
 [9] "is.element"              "is.empty.model"         
[11] "is.environment"          "is.expression"          
[13] "is.factor"               "is.finite"              
[15] "is.finite.POSIXlt"       "is.function"            
[17] "is.hashtab"              "is.infinite"            
[19] "is.infinite.POSIXlt"     "is.integer"             
[21] "is.language"             "is.leaf"                
[23] "is.list"                 "is.loaded"              
[25] "is.logical"              "is.matrix"              
[27] "is.mts"                  "is.na"                  
[29] "is.na.data.frame"        "is.na.numeric_version"  
[31] "is.na.POSIXlt"           "is.na<-"                
[33] "is.na<-.default"         "is.na<-.factor"         
[35] "is.na<-.numeric_version" "is.name"                
[37] "is.nan"                  "is.nan.POSIXlt"         
[39] "is.null"                 "is.numeric"             
[41] "is.numeric.Date"         "is.numeric.difftime"    
[43] "is.numeric.POSIXt"       "is.numeric_version"     
[45] "is.object"               "is.ordered"             
[47] "is.package_version"      "is.pairlist"            
[49] "is.primitive"            "is.qr"                  
[51] "is.R"                    "is.raster"              
[53] "is.raw"                  "is.recursive"           
[55] "is.relistable"           "is.single"              
[57] "is.stepfun"              "is.symbol"              
[59] "is.table"                "is.ts"                  
[61] "is.tskernel"             "is.unsorted"            
[63] "is.vector"               "isa"                    
[65] "isatty"                  "isBaseNamespace"        
[67] "isClass"                 "isClassDef"             
[69] "isClassUnion"            "isdebugged"             
[71] "isFALSE"                 "isGeneric"              
[73] "isGrammarSymbol"         "isGroup"                
[75] "isIncomplete"            "islands"                
[77] "isNamespace"             "isNamespaceLoaded"      
[79] "ISOdate"                 "ISOdatetime"            
[81] "isOpen"                  "isoreg"                 
[83] "isRematched"             "isRestart"              
[85] "isS3method"              "isS3stdGeneric"         
[87] "isS4"                    "isSealedClass"          
[89] "isSealedMethod"          "isSeekable"             
[91] "isSymmetric"             "isSymmetric.matrix"     
[93] "isTRUE"                  "isVirtualClass"         
[95] "isXS3Class"
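The names above include many is.* functions that test what kind of object you have. Together with class(), they are a quick way to check an object before working with it (the object names below are just examples):

```r
num     <- 3.5
txt     <- "Pinellas"
flag    <- TRUE
site_df <- data.frame(x = 1:3)

class(num)              # "numeric"
class(txt)              # "character"
class(flag)             # "logical"
class(site_df)          # "data.frame"
is.numeric(num)         # TRUE
is.character(num)       # FALSE
is.data.frame(site_df)  # TRUE
```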

Vectors

Vectors are the most basic data structure in R and are a sequence of values
They come in two types:
- Atomic: all elements must be the same type
- Lists: elements can be different types

myVec <- 1:10
is.vector(myVec) # Check if myVec is a vector

[1] TRUE

myList <- list(a = 1:4, b = "Hello, World!", c = data.frame(x = 1:5, y = 11:15))
is.vector(myList) # Check if myList is a vector

[1] TRUE

Atomic Vectors

Can come in a variety of types:
- Numeric: can contain whole numbers and decimals
- Logicals: can only be TRUE or FALSE (or NA when missing)
- Characters: hold text (character strings)
- Factors: used to store categorical data
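One example of each atomic type is below. The last lines show what "must be the same type" means in practice: if you mix types with c(), R silently converts everything to the most flexible type, here character.

```r
num_vec <- c(1.5, 2, 3)
lgl_vec <- c(TRUE, FALSE, TRUE)
chr_vec <- c("Adelie", "Gentoo")
fct_vec <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(num_vec)   # "numeric"
class(fct_vec)   # "factor"
levels(fct_vec)  # "low" "high"

# Mixing types coerces every element to one type
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
class(mixed)     # "character"
```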

Accessing Vector Elements

You can access the elements of a vector by “indexing” the position of that element
We use the [] operator to index vectors
The first element of a vector is at position 1, not 0
You can use : to index a range of elements
You can use c() to index multiple, non-consecutive elements
You can also use negative indexing to exclude elements
Lists use [[]] to extract an element (such as a vector), followed by [] to access the elements of that vector

Indexing Vectors

vec <- rnorm(n = 10, mean = 30, sd = 10) # vector of random values
vec

 [1] 33.42356 34.28651 18.16700 41.23607 22.98001 30.52817 22.30879 20.68921
 [9] 23.96772 26.46790

vec[1] # first element

[1] 33.42356

vec[1:5] # first 5 elements

[1] 33.42356 34.28651 18.16700 41.23607 22.98001

vec[c(1, 3, 5)] # first, third, and fifth elements

[1] 33.42356 18.16700 22.98001

vec[-c(1, 3, 5)] # all but the first, third, and fifth elements

[1] 34.28651 41.23607 30.52817 22.30879 20.68921 23.96772 26.46790

vec[vec > 30] # all elements greater than 30

[1] 33.42356 34.28651 41.23607 30.52817
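Because rnorm() draws random values, your numbers will differ from the ones shown above. set.seed() makes a random draw repeatable, which is useful when you want others to get the same results (the seed value 123 is arbitrary):

```r
set.seed(123)
first_draw <- rnorm(n = 3, mean = 30, sd = 10)
set.seed(123)
second_draw <- rnorm(n = 3, mean = 30, sd = 10)
identical(first_draw, second_draw)  # TRUE: same seed, same "random" numbers
```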

Indexing Lists

myList <- list(num = 1:5, name = c('Joe','Bob','Mary'))
myList

$num
[1] 1 2 3 4 5

$name
[1] "Joe"  "Bob"  "Mary"

myList[[1]] # first element of the list (here, a vector)

[1] 1 2 3 4 5

myList[[2]][1] # first element of the second vector in the list

[1] "Joe"

You can also use the $ operator to access list elements by name

myList$num # first vector in the list

[1] 1 2 3 4 5

myList$name[1] # first element of the second vector in the list

[1] "Joe"
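Single and double brackets return different things from a list, which is a common source of confusion. Using the same `myList`:

```r
myList <- list(num = 1:5, name = c("Joe", "Bob", "Mary"))
class(myList[1])     # "list": single brackets return a smaller list
class(myList[[1]])   # "integer": double brackets return the vector itself
myList[["name"]][2]  # "Bob": [[ ]] also accepts an element's name
```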

Your Turn

Install and load the packages tidyverse, psych, and palmerpenguins using the install.packages() and library() functions.
Make a vector of 50 random values with a mean and standard deviation of your choice using the rnorm() function and assign it to a variable named vec (hint: use ?rnorm to read the argument descriptions).
Get summary statistics for the vector using the describe() function from the psych package.
Use [] indexing to return the first 10 values of the vector.
Plot your data using the plot() function.

(10 minutes)

Starting a New Project

Create New Project

Follow these steps to create a new project:
1. Go to File in the upper left corner of RStudio
2. Select New Project
3. Select New Directory
4. Select New Project
5. Set the Directory name: to penguins
6. Click Browse... and set the directory to your desktop or other location
7. Click Create Project

Create New File

Once the new project session has opened make a new file to write code in:
1. Go to File in the upper left corner of RStudio
2. Select New File
3. Select R Script
4. Save the file as penguins.R

You can also use keyboard shortcuts to make a new file (Ctrl + Shift + N) and save it (Ctrl + S)

Data We Will Use

Artwork by @allison_horst

Our Data

library(palmerpenguins)

head(penguins) # view the first 6 rows (the default)

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>

str(penguins) # view the structure of the data

tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

View(penguins) # view the data in a new window

Indexing Data Frames

We can use row and column positions to index data frames.
There are two slots we can use to extract data: rows and columns
object_name[row, column]
We can also subset out data by column position using : or c(column_1, column_2)

penguins[1,1]

# A tibble: 1 × 1
  species
  <fct>  
1 Adelie

penguins[1, 1:2]

# A tibble: 1 × 2
  species island   
  <fct>   <fct>    
1 Adelie  Torgersen

penguins[1, c(1, 4)]

# A tibble: 1 × 2
  species bill_depth_mm
  <fct>           <dbl>
1 Adelie           18.7

Negative Indexing

We can also exclude various elements using - and/or logical tests

penguins[,-1]

# A tibble: 344 × 7
   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex    year
   <fct>           <dbl>         <dbl>             <int>       <int> <fct> <int>
 1 Torge…           39.1          18.7               181        3750 male   2007
 2 Torge…           39.5          17.4               186        3800 fema…  2007
 3 Torge…           40.3          18                 195        3250 fema…  2007
 4 Torge…           NA            NA                  NA          NA <NA>   2007
 5 Torge…           36.7          19.3               193        3450 fema…  2007
 6 Torge…           39.3          20.6               190        3650 male   2007
 7 Torge…           38.9          17.8               181        3625 fema…  2007
 8 Torge…           39.2          19.6               195        4675 male   2007
 9 Torge…           34.1          18.1               193        3475 <NA>   2007
10 Torge…           42            20.2               190        4250 <NA>   2007
# ℹ 334 more rows

names(penguins)

[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"

`$` Indexing

A more common way to index data is to use the $ operator to reference data by name rather than position.

penguins$species

  [1] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
  [8] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [15] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [22] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [29] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [36] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [43] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [50] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [57] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [64] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [71] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [78] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [85] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [92] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
 [99] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[106] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[113] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[120] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[127] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[134] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[141] Adelie    Adelie    Adelie    Adelie    Adelie    Adelie    Adelie   
[148] Adelie    Adelie    Adelie    Adelie    Adelie    Gentoo    Gentoo   
[155] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[162] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[169] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[176] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[183] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[190] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[197] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[204] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[211] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[218] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[225] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[232] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[239] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[246] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[253] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[260] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[267] Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo    Gentoo   
[274] Gentoo    Gentoo    Gentoo    Chinstrap Chinstrap Chinstrap Chinstrap
[281] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[288] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[295] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[302] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[309] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[316] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[323] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[330] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[337] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
[344] Chinstrap
Levels: Adelie Chinstrap Gentoo

penguins$species[1:10]

 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo

Indexing by Tests

penguins[penguins["sex"] == 'female', c('species', 'island')]

# A tibble: 176 × 2
   species island   
   <fct>   <fct>    
 1 Adelie  Torgersen
 2 Adelie  Torgersen
 3 <NA>    <NA>     
 4 Adelie  Torgersen
 5 Adelie  Torgersen
 6 <NA>    <NA>     
 7 <NA>    <NA>     
 8 <NA>    <NA>     
 9 <NA>    <NA>     
10 Adelie  Torgersen
# ℹ 166 more rows

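However, this is not the most efficient or (more importantly) readable way to do this… Notice also the <NA> rows: sex is missing for some penguins, and a comparison with NA returns NA, so those rows come back filled with NA.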
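The stray NA rows can be reproduced on a tiny made-up data frame. Wrapping the test in which() keeps only the rows where the test is actually TRUE, because which() skips NA:

```r
# Small made-up data frame with one missing sex value
birds <- data.frame(species = c("Adelie", "Gentoo", "Adelie"),
                    sex     = c("female", NA, "male"))

birds[birds$sex == "female", ]         # 2 rows: the real match plus an all-NA row
birds[which(birds$sex == "female"), ]  # 1 row: which() drops the NA result
```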

A (very) Brief Introduction to Tidyverse

The tidyverse is a collection of packages that work together to make data analysis easier and more efficient.
The tidyverse is built around the idea of “tidy data” which is a way of organizing data so that it is easy to work with.
Part of the tidyverse is the idea of “piping” which allows you to chain together multiple functions to create a single workflow.
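- The pipe operator can be written as %>% (from the magrittr package, available once the tidyverse is loaded) or |> (built into R since version 4.1)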
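The native |> pipe also works with base R functions, so you can try it without installing anything. A minimal example, assuming R 4.1 or later: the value on the left becomes the first argument of the function on the right.

```r
nested <- sum(sqrt(c(1, 4, 9)))          # read inside-out
piped  <- c(1, 4, 9) |> sqrt() |> sum()  # read left to right
piped                                    # 6
```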

The Tidyverse Ecosystem

The tidyverse is a package that installs a set of other packages for data wrangling, analysis, and visualization, including:

ggplot2: data visualization
dplyr: data manipulation
tidyr: data tidying
readr: data import
purrr: functional programming
tibble: modern data frames
stringr: string manipulation
forcats: categorical variables (factors)
lubridate: date and time manipulation

Piping Functions

Any guesses?

round(log(mean(penguins$body_mass_g[penguins$species == 'Adelie' & penguins$island == 'Torgersen'], 
           na.rm = TRUE)), 0)

[1] 8

library(tidyverse)

penguins |> # start with the dataset
  filter(species == 'Adelie', island == 'Torgersen') |> # filter the data 
  summarise(mean_mass = mean(body_mass_g, na.rm = TRUE)) |> # calculate the mean
  pull(mean_mass) |> # pull out the mean mass
  log() |> # the piped value is the first argument, so the parentheses can be empty
  round(0) # round the result

[1] 8

Although the piping adds lines, it makes the code much more readable