PSY460: Advanced Quantitative Methods

Week #3: Getting Started with R

Today, we’ll dive into R programming and ensure that you feel comfortable with the basics. Then, you’ll work in teams to finalize your research question(s) and your plan for variable operationalizations.

Happy Groundhog Day!

Quiz

What does R do if you type mean(Scooby$imdb, na.rm = TRUE)?
What does R do if you type filter(Scooby, imdb == NA)?
What does R do if you type Scooby2 <- Scooby %>% mutate(type = "cartoon")?
What is the difference between a <chr> variable and an <int> variable?

Why Switch to R?

RStudio provides a user-friendly interface for running code to analyze data.
R is open-source, and its power derives heavily from packages created by members of the data science community.
R has greatly surpassed other programs like SPSS in its abilities to run complex analyses, and it can create truly stunning graphs with ease.

Installing packages

Beyond its basic functions, R does not come preloaded with all of its possible functionality. You will need to install the relevant packages depending on your goals.
- NOTE: R is case-sensitive, and there are some cases where quotation marks are necessary (for example, around the package name in install.packages()).
- NOTE: Hash symbols (#) allow you to write comments to yourself; R ignores everything after a # on that line.
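
The two notes above can be tried out in a few lines. This is only a small sketch; the object names score, Score, and label are made up for illustration.

```r
# Everything after a hash symbol is a comment, so R skips it.
score <- 10        # an object with a lower-case name
Score <- 20        # a different object, because R is case-sensitive

total <- score + Score   # adds the two separate objects together
total

# Quotation marks tell R that something is text rather than an object name.
label <- "score"   # the text "score", not the object called score
class(label)       # text is stored as "character"
class(score)       # a plain number is stored as "numeric"
```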

install.packages("tidyverse", repos = "https://cloud.r-project.org/")
# This gives you access to many useful functions
install.packages("palmerpenguins", repos = "https://cloud.r-project.org/")
# This gives us some data about penguins
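
If you are not sure whether a package is already installed, one possible approach is to check first. The sketch below only reports what is missing; the names needed, already_installed, and to_install are hypothetical, and the install line is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical list of the packages used in this session.
needed <- c("tidyverse", "palmerpenguins")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise.
already_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install <- needed[!already_installed]

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Remove the hash on the next line to install whatever is missing:
  # install.packages(to_install, repos = "https://cloud.r-project.org/")
} else {
  message("All packages are already installed.")
}
```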

Loading packages

The first necessary step for data analysis is to load packages and data into the RStudio environment.
You only need to install each package once; it will then be available to you to load in the future. However, you will need to load relevant packages every time you restart R.

library("tidyverse") 
library("palmerpenguins")

Loading data

If you want to work with data that you have saved locally on your computer, you will need to direct R to the correct folder.
- This can be done with the following code, but if you have any issues, you can instead use the RStudio menu: Session > Set Working Directory.

setwd("~/Desktop/PSY 460 - Advanced Quantitative Methods/Slides")
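
To confirm that R is looking in the folder you expect, you can ask it directly. This is a quick check rather than a required step, and it assumes the file you want is called penguins.csv.

```r
current_folder <- getwd()        # the folder R is currently using
current_folder

list.files(pattern = "\\.csv$")  # CSV files that R can see in that folder

# TRUE only if penguins.csv is in the working directory
found <- file.exists("penguins.csv")
found
```

If the last line returns FALSE, the working directory is probably not the folder that contains your data.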

Once you have set your working directory, you can read a CSV file and save it as a dataframe within the R environment with the following code:

penguins <- read.csv("penguins.csv", header = TRUE, 
                     stringsAsFactors = FALSE)
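
To see what the stringsAsFactors argument changes, here is a small sketch that reads the same made-up CSV text twice. The text = argument stands in for a file so that the example runs without anything saved on your computer; csv_text, as_text, and as_factor are illustrative names.

```r
# A tiny CSV written out as text: a header row plus three rows of data.
csv_text <- "species,body_mass_g
Adelie,3750
Adelie,3800
Adelie,3250"

as_text   <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
as_factor <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

class(as_text$species)    # "character": species is kept as plain text
class(as_factor$species)  # "factor": species is converted to categories
```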

Time to Learn about Penguins!

Let's learn about penguins!

Penguin Data

In some of the examples we’ll use in class, you will import data from an R package rather than loading it from your files.

myownpenguins <- palmerpenguins::penguins
# This moves the publicly available dataset into your own environment.

Inspecting the dataset

glimpse(myownpenguins) # This allows you to look at the dataset.

Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
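
glimpse() is not the only way to inspect a dataframe. The sketch below uses a few base R functions on a small stand-in dataframe, first_rows, built by hand from three of the columns in the first five rows shown above; the same functions should work on the full dataset.

```r
# Stand-in data: the first five rows of three columns from the output above.
first_rows <- data.frame(
  species        = factor(rep("Adelie", 5)),
  bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7),
  body_mass_g    = c(3750L, 3800L, 3250L, NA, 3450L)
)

dim(first_rows)             # number of rows, then number of columns
names(first_rows)           # the column names
head(first_rows, 3)         # the first three rows
summary(first_rows)         # a quick summary of every column
colSums(is.na(first_rows))  # how many values are missing in each column
```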

Variable structure

Each line provides R with a command, often in the form of “object <- function(arguments)”.

new.variable <- c(1, 2, 3)

To check a variable’s structure, use the str() function.
To change the structure of a variable, you can convert it into a new object that has a structure that you specify.

str(new.variable)

 num [1:3] 1 2 3

new.variable.factor <- factor(new.variable, levels = c(1, 2, 3),
                              labels = c("Not at all", 
                                         "Sorta", "A lot"))
str(new.variable.factor)

 Factor w/ 3 levels "Not at all","Sorta",..: 1 2 3
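
Conversions also work in the other direction, with one common trap when turning a factor back into numbers. The following is a small sketch with made-up values; ratings, codes, and original are illustrative names.

```r
# A factor whose labels happen to look like numbers.
ratings <- factor(c("10", "20", "30"))

# Converting directly returns the internal level codes, not the labels.
codes <- as.numeric(ratings)
codes        # 1 2 3

# Converting to text first recovers the original values.
original <- as.numeric(as.character(ratings))
original     # 10 20 30

# Whole numbers can likewise be turned into text.
year_int <- c(2007L, 2008L)
year_chr <- as.character(year_int)
class(year_int)   # "integer"
class(year_chr)   # "character"
```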

Summarizing variables of interest

myownpenguins %>% 
  group_by(species) %>% 
  summarize(mean.bodymass = mean(body_mass_g, na.rm = TRUE), 
            sd.bodymass = sd(body_mass_g, na.rm = TRUE),
            totalnumber = n())

# A tibble: 3 × 4
  species   mean.bodymass sd.bodymass totalnumber
  <fct>             <dbl>       <dbl>       <int>
1 Adelie            3701.        459.         152
2 Chinstrap         3733.        384.          68
3 Gentoo            5076.        504.         124
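
The na.rm = TRUE argument matters because a single missing value otherwise makes the whole result missing. Here is a small sketch using the first five body mass values from the dataset; body_mass is an illustrative name.

```r
# The first five body masses from the dataset, including one missing value.
body_mass <- c(3750, 3800, 3250, NA, 3450)

mean(body_mass)                 # NA, because one value is missing
mean(body_mass, na.rm = TRUE)   # 3562.5, the mean of the four known values
sd(body_mass, na.rm = TRUE)     # the standard deviation of the known values

length(body_mass)               # 5: counts every entry, missing or not
sum(!is.na(body_mass))          # 4: counts only the non-missing entries
```

In the same way, n() in the summary above counts every row in a group, including penguins whose body mass is missing.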

Saving your work

Make sure to save your code often, so that you don’t lose anything!
When you quit RStudio, it will ask if you would like to save your workspace. You don’t need to do that, because you can reproduce everything next time by rerunning your code.
If you want to save a new CSV file based on a dataframe you’ve created, you can use the following command. However, you will typically work only with the raw data.

write.csv(myownpenguins, "ExcitingData.csv", row.names = FALSE)
# This saves the dataframe as a CSV file in your working directory.
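
To check that a file was saved as expected, you can write a dataframe and then read it straight back in. The sketch below uses a made-up dataframe, small_data, and a temporary file so that nothing is left behind on your computer.

```r
# A small made-up dataframe to save.
small_data <- data.frame(
  id     = 1:3,
  rating = c("Not at all", "Sorta", "A lot")
)

out_file <- tempfile(fileext = ".csv")   # a temporary file location

# row.names = FALSE is optional; it stops R from writing the row numbers
# as an extra column.
write.csv(small_data, out_file, row.names = FALSE)

read_back <- read.csv(out_file, header = TRUE, stringsAsFactors = FALSE)
names(read_back)   # the same column names as small_data
nrow(read_back)    # the same number of rows as small_data
```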