Load built-in datasets
Import data
Create and export a data table
Understand the use of basic data types
Understand and use the basic container types
- Vectors
- Lists
- Matrices
Use vectorized operations
Extra-spicy, fully optional vectorized operations problem for those interested

Load built-in datasets

List the datasets in dplyr.

data(package='dplyr')

Load the built-in dataset starwars and use glimpse() to see an overview.

library(dplyr) # provides the starwars dataset and glimpse()

data(starwars)
glimpse(starwars)

## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
## $ films      <list> <"A New Hope", "The Empire Strikes Back", "Return of the J…
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…

Convert the built-in base R mtcars dataset to a tibble (you will need to find the function for this; it isn’t in the chapter), and store it in the object mt.

mt <- tibble::as_tibble(mtcars)
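
One detail worth knowing here: mtcars stores the car models as row names, and as_tibble() drops row names by default. The following is a minimal sketch, assuming the tibble package is installed, of keeping them through the rownames argument. The column name model and the object mt_named are illustrative choices, not part of the exercise.

```r
# as_tibble() drops row names unless told where to put them.
mt <- tibble::as_tibble(mtcars)
tibble::has_rownames(mt)   # check whether any row names survived

# Move the car names into a regular column ("model" is a made-up name).
mt_named <- tibble::as_tibble(mtcars, rownames = "model")
names(mt_named)[1]
```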

Import data

Download the zip file and unzip it into a “data” folder inside your working directory (the working directory itself might be a folder called “4.2” or something similar).

Read “disgust_scores.csv” into a table.

disgust <- readr::read_csv("data/disgust_scores.csv")

How many rows and columns are in the disgust dataset?

disgust_rows <- nrow(disgust)
disgust_cols <- ncol(disgust)

In the space provided directly below, write down what type of variable disgust_rows and disgust_cols are. You should be able to tell just by looking at their values in the environment.

disgust_rows and disgust_cols are both _______ variables.
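
If you would like to confirm your answer with code rather than by eye, typeof() reports how R stores a value. The sketch below uses the built-in mtcars dataset as a stand-in, since the disgust file is not bundled with this page; the names n_rows and n_cols are illustrative only.

```r
# nrow() and ncol() work the same way on any data frame or tibble.
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)

# typeof() shows the underlying storage type of each count.
typeof(n_rows)
typeof(n_cols)

# dim() returns both counts at once, rows first.
dim(mtcars)
```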

Create and export a data table

Create a tibble with the columns name, age, and country of origin for 2 people you know.

people <- tibble::tibble(
  name = c("Alex", "Jamie"),
  age = c(22, 24),
  country = c("Canada", "India")
)

Export this data table in your “data” folder as a CSV and an RDS file. When you save the RDS file, use “gz” compression to reduce file size.

readr::write_csv(people, "data/people.csv")
saveRDS(people, "data/people.rds", compress = "gzip")
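
To see why both formats are worth knowing, it can help to read the files back in. The sketch below is only an illustration: it assumes a base R data frame (people_df) and temporary files in place of the tibble and the “data” folder, so that it runs without any extra packages.

```r
# Illustrative stand-ins: a base R data frame and temporary file paths.
people_df <- data.frame(
  name = c("Alex", "Jamie"),
  age = c(22, 24),
  country = c("Canada", "India")
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(people_df, csv_path, row.names = FALSE)
saveRDS(people_df, rds_path, compress = "gzip")

from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)

# RDS stores the R object itself, so the round trip is exact.
identical(from_rds, people_df)

# CSV stores only text, so column types are guessed again on import.
typeof(people_df$age)
typeof(from_csv$age)
```

The practical upshot is that CSV is the safer choice for sharing data with other software, while RDS is the safer choice for preserving exactly what you had in R.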

Understand the use of basic data types

Set the following objects to the number 1 with the indicated data type:

one_int (integer)
one_dbl (double)
one_chr (character)

one_int <- 1L
one_dbl <- 1
one_chr <- "1"
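
A quick way to check these assignments is to ask R for each object's type. The sketch below also shows that values can compare as equal while still being stored differently; it uses only base R functions.

```r
one_int <- 1L
one_dbl <- 1
one_chr <- "1"

typeof(one_int)
typeof(one_dbl)
typeof(one_chr)

# == compares values after coercion, so the integer and the double match ...
one_int == one_dbl
# ... but identical() also compares the storage type.
identical(one_int, one_dbl)

# A character "1" must be converted before it can be used in arithmetic.
as.integer(one_chr) + one_int
```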

Understand and use the basic container types

Vectors

Create a vector of the numbers 3, 6, and 9.

threes <- c(3, 6, 9)

The built-in vector letters contains the letters of the English alphabet. Use an indexing vector of integers to extract the letters that spell ‘cat’.

cat <- letters[c(3, 1, 20)]
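
If you would rather not count through the alphabet by hand, base R's match() can look the positions up for you. This is a small sketch of that idea; the name cat_idx is illustrative.

```r
# match() returns the position of each value within the lookup vector.
cat_idx <- match(c("c", "a", "t"), letters)
cat_idx

# Indexing with those positions extracts the same letters.
letters[cat_idx]
```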

The function colors() returns all of the color names that R is aware of. What is the length of the vector returned by this function? (Use code to find the answer.)

color_length <- length(colors())

Lists

Create a named list called random_list that lists the objects “cat”, “threes”, and “color_length” (i.e., the three objects you just saved above).

random_list <- list(
  cat = cat,
  threes = threes,
  color_length = color_length
)

random_list

## $cat
## [1] "c" "a" "t"
## 
## $threes
## [1] 3 6 9
## 
## $color_length
## [1] 657

Run the code below and consider what it is doing, keeping in mind the output shown above when random_list is displayed.

Then, write a new line of code that only prints the letter “a” from within the random_list.

random_list$cat[2]

## [1] "a"
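
The difference between the list-indexing operators is easy to mix up, so here is a short sketch comparing them. The list is rebuilt inline with the same values so that the block runs on its own.

```r
random_list <- list(
  cat = c("c", "a", "t"),
  threes = c(3, 6, 9),
  color_length = length(colors())
)

# $ and [[ ]] both pull out the element itself (here, a character vector).
random_list$cat[2]
random_list[["cat"]][2]

# Single brackets keep the list wrapper, returning a list of length 1.
class(random_list["cat"])
class(random_list[["cat"]])
```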

Matrices

The following code defines a matrix as a field of “O”s with a single “X” (X marks the spot!). Write a single line of code to extract the “X” from the field of “O”s.

given_mat <- matrix(c(rep("O", times = 39), "X", rep("O", times = 9)),
                    nrow = 7, ncol = 7)

given_mat[given_mat == "X"]

## [1] "X"
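
Logical indexing returns the value but not where it sits. As a sketch of one way to recover the position, which() with arr.ind = TRUE reports row and column numbers. Because matrix() fills column by column, the 40th element of a 7 x 7 matrix lands in row 5, column 6. The name x_pos is illustrative.

```r
given_mat <- matrix(c(rep("O", times = 39), "X", rep("O", times = 9)),
                    nrow = 7, ncol = 7)

# arr.ind = TRUE reports matrix positions as row/column pairs.
x_pos <- which(given_mat == "X", arr.ind = TRUE)
x_pos

# The same element, extracted by its row and column.
given_mat[x_pos[1, "row"], x_pos[1, "col"]]
```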

Use vectorized operations

Set the object x to a vector containing the integers 1 to 100 (increasing by 1).

Use vectorized operations to define y as x squared. Use plot(x, y) to visualize the relationship between these two vectors.

x <- 1:100
y <- x^2

plot(x, y)
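
For comparison, here is a sketch of what the vectorized line saves you from writing. The explicit loop produces the same result; y_loop is an illustrative name, and the loop is shown only to make the contrast concrete.

```r
x <- 1:100

# Vectorized: one expression squares every element.
y <- x^2

# Equivalent explicit loop, shown only for comparison.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i]^2
}

all.equal(y, y_loop)
```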

Extra-spicy, fully optional vectorized operations problem for those interested

The function call runif(n, min, max) will draw n numbers from a uniform distribution from min to max. If you set n to 10000, min to 0 and max to 1, this simulates the p-values that you would get from 10000 experiments where the null hypothesis is true. Create the following objects:

pvals: 10000 simulated p-values using runif()
is_sig: a logical vector that is TRUE if the corresponding element of pvals is less than .05, FALSE otherwise
sig_vals: a vector of just the significant p-values
prop_sig: the proportion of those p-values that were significant

set.seed(8675309) # ensures you get the same random numbers each time you run this code chunk

pvals    <- NULL
is_sig   <- NULL
sig_vals <- NULL
prop_sig <- NULL
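
The placeholders above are left for you to fill in. If you get stuck, the following is one possible sketch built directly from the descriptions in the problem, not the only valid answer. It relies on the fact that the mean of a logical vector is the proportion of TRUE values.

```r
set.seed(8675309)

pvals    <- runif(10000, min = 0, max = 1)  # simulated p-values under the null
is_sig   <- pvals < .05                     # TRUE where p < .05
sig_vals <- pvals[is_sig]                   # keep only the significant values
prop_sig <- mean(is_sig)                    # TRUE counts as 1, so the mean is a proportion

prop_sig
```

Since the p-values are uniform between 0 and 1, prop_sig should come out close to .05.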

4.2: Working with Data