We have spent the first few weeks of the course visualizing and transforming data. Now we will start learning to “wrangle” data using the tidyverse: obtaining data from various sources and cleaning, or “tidying,” it into a standard, usable form. Load the tidyverse before starting.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
Chapter 10 - Tibbles
1. For some time, the standard table format in R has been the data frame, and we have already worked with a couple. For instance, run
class(mtcars)
print(mtcars)
The output shows that mtcars is a data frame; note how it prints.
Convert mtcars to a tibble using as_tibble() and run the same two commands. Tibbles are data frames with slightly different rules, one of which is that they print differently. From the class of a tibble, you can see where they get their name. Using the mtcars example, describe below how tibbles print differently from data frames.
Your answer here.
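A minimal sketch for checking your answer, assuming the tibble package (attached by the tidyverse). One thing worth knowing: as_tibble() drops row names by default, so the car names in mtcars disappear unless you keep them with the rownames argument. The column name `model` used here is only an illustrative choice.

```r
library(tibble)

# Convert the data frame, keeping the row names (car models) as a column.
# Without rownames = "model", as_tibble() silently discards them.
mtcars_tbl <- as_tibble(mtcars, rownames = "model")

class(mtcars)      # "data.frame"
class(mtcars_tbl)  # "tbl_df" "tbl" "data.frame"

# Tibbles print only the first rows and the columns that fit the screen,
# along with each column's type; n controls how many rows are shown.
print(mtcars_tbl, n = 5)
```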
Tibbles print only the columns that fit on screen, but sometimes you might want to see them all anyway. Try
print(nycflights13::flights, width=Inf)
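The output is hard to make sense of, which shows why tibbles print only as many columns as fit the screen width by default. A better option is View(), which opens the data in a spreadsheet-style viewer in RStudio. Try it on flights with View(nycflights13::flights):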
Nice, huh?
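When View() is not available (for example, in a knitted document), glimpse() is another compact way to look at a wide tibble: it turns the display sideways so that each column gets one line. A minimal sketch, assuming dplyr and the nycflights13 package are installed:

```r
library(dplyr)
library(nycflights13)

# One line per column: name, type, and the first few values.
glimpse(flights)

# print() can also limit the rows while still showing every column.
print(flights, n = 3, width = Inf)

dim(flights)
```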
2. Creating Tibbles from scratch: You can create a tibble directly by naming each column inside a call to tibble(). For example:
dolphin <- tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y,
  w = sample(letters, 5)
)
Look at dolphin and work out what sample(letters, 5) is doing.
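One way to investigate, sketched with set.seed() so the random draws are repeatable (the seed value 42 is arbitrary). The comments also point out two tibble() rules at work in dolphin: columns are built in order, so z can use x and y, and a length-1 value such as y = 1 is recycled to the length of the other columns.

```r
library(tibble)

set.seed(42)  # make the random draws repeatable

dolphin <- tibble(
  x = 1:5,
  y = 1,                  # length 1, recycled to length 5
  z = x ^ 2 + y,          # can refer to columns defined above it
  w = sample(letters, 5)  # 5 letters drawn without replacement
)

dolphin

# sample() draws without replacement by default, so the letters are distinct.
length(unique(dolphin$w))

# With replace = TRUE, repeated letters become possible.
sample(letters, 5, replace = TRUE)
```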
Create a tibble where:
Column 1 is called “rand” and is a random draw from a uniform distribution on [0, 2].
Column 2 is called “cat” and consists of the sequence of integers from 10 to 20.
Column 3 is called “test_score” and is a random draw from the integers between 1 and 100 (this may take a bit of work).
#Your amazing code here
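Tibbles can also have column names that are not valid R variable names (non-syntactic names), as long as you wrap them in backticks:
tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)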
It’s awkward, though, because you have to use backticks whenever you refer to these variables by name. (Personally, I would avoid this without a compelling reason.) I have put an exercise with this kind of column naming below, but usually we will just use syntactically valid R names for columns/variables. Practice referring to non-syntactic names in the following data frame by:
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)
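A sketch of the usual ways to refer to non-syntactic names, assuming dplyr and ggplot2 are loaded. Each one either wraps the name in backticks or passes it as a quoted string inside [[ ]]; the new names `3`, one, two, and three are only illustrative choices.

```r
library(tibble)
library(dplyr)
library(ggplot2)

annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

# Extract the column named 1: backticks with $, a string with [[ ]]
annoying$`1`
annoying[["1"]]

# Scatterplot of 1 vs 2; backticks work inside aes() too (print p to see it)
p <- ggplot(annoying, aes(x = `1`, y = `2`)) +
  geom_point()

# New column 3 = column 2 divided by column 1
annoying <- mutate(annoying, `3` = `2` / `1`)

# Rename to syntactic names so backticks are no longer needed (new = old)
annoying <- rename(annoying, one = `1`, two = `2`, three = `3`)
names(annoying)
```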
3. Subsetting tibbles: A) You can pull out a single variable from a tibble using $. For example, look at dolphin$z. You may be thinking, “Wait a minute, I already know how to pull out a column using select().” Compare the results of pulling out z using select() and $, and describe the difference. You may find the class() function useful for this.
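A minimal sketch for comparing the two approaches, using a cut-down version of dolphin without the random column w so the output is reproducible. It also shows dplyr::pull(), which returns a plain vector like $ does but fits naturally at the end of a pipe.

```r
library(tibble)
library(dplyr)

dolphin <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)

z_tbl <- select(dolphin, z)  # a one-column tibble
z_vec <- dolphin$z           # a plain numeric vector

class(z_tbl)  # "tbl_df" "tbl" "data.frame"
class(z_vec)  # "numeric"

# pull() gives the same vector as $, but works inside a pipe
dolphin %>% pull(z)
```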
B) Pull out z from dolphin both ways (by name and by position) using the subsetting operator [[ ]]. Then compare the results of the following commands and note the differences in output:
dolphin[, 3]
dolphin[[3]]
dolphin[3, ]
dolphin[["z"]]
dolphin[, "z"]
Summarize the different outputs in a textbox.
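One way to check your summary is to collect the results and inspect the class of each, sketched here with the same cut-down dolphin (no random column). The list names are just descriptive labels. The last lines convert to an ordinary data frame to show the contrast: for a data frame, [, 3] drops to a vector, while a tibble keeps it as a one-column tibble.

```r
library(tibble)

dolphin <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)

results <- list(
  single_bracket_col  = dolphin[, 3],     # [, j] by position
  double_bracket_pos  = dolphin[[3]],     # [[j]] by position
  single_bracket_row  = dolphin[3, ],     # [i, ] selects a row
  double_bracket_name = dolphin[["z"]],   # [[name]]
  single_bracket_name = dolphin[, "z"]    # [, name]
)

# First class of each result: [[ ]] returns a vector, [ ] returns a tibble
sapply(results, function(r) class(r)[1])

# Compare with a plain data frame, where [, 3] drops to a vector
df <- as.data.frame(dolphin)
class(df[, 3])
```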
w column of dolphin
# Your amazing code here
Give Section 10.5 Exercise 2 a try.
What does the following line do? (See the .Rmd file for the commands.)
Note: This suggests that at least some HTML commands work in R Markdown documents.
4. Run the following code and, for each variable in the resulting tibble, write a sentence to your (hypothetical) younger sibling describing what it is.
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
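To check your sentences, a minimal sketch that stores the same tibble and reports each column's type, assuming lubridate is installed (the seed value 1 is arbitrary). A helpful fact for column a: 86400 is the number of seconds in a day (24 * 60 * 60), so runif(1e3) * 86400 adds a random number of seconds within the next 24 hours.

```r
library(tibble)
library(lubridate)

set.seed(1)  # arbitrary seed so the draws are repeatable

df <- tibble(
  a = now() + runif(1e3) * 86400,          # date-times within the next day
  b = today() + runif(1e3) * 30,           # dates within the next 30 days
  c = 1:1e3,                               # the integers 1 to 1000
  d = runif(1e3),                          # uniform random numbers in (0, 1)
  e = sample(letters, 1e3, replace = TRUE) # random lowercase letters
)

# First class of each column
sapply(df, function(col) class(col)[1])

# Spread of the date-times in column a (always less than one day)
diff(range(df$a))
```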
5. R Projects - Enlighten your instructor. I have read carefully about R projects in three different sources and I think I have the idea, but I need you to verify it with another student in the class.
Put the code defining the tibble dolphin into an R script and save the script in a new directory. Give dolphin a different name, though (be clever; you can do it).
Create a new R Project in that same directory.
Now zip up the entire directory and send it to your frenemy. See if they can open the project and run the script.
We will report on the results, I will tell you what I think R Projects are really for, and you can help clarify my understanding.
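One reason a zipped project can travel, offered as a hedged sketch rather than the final word for our discussion: when RStudio opens a .Rproj file, it sets the working directory to the project folder, so a script that uses only relative paths should run unchanged on your frenemy's machine. The script below assumes readr is installed; the names porpoise and porpoise.csv are hypothetical placeholders for your own choices.

```r
library(tibble)
library(readr)

# Inside an opened project, this is the project folder
getwd()

porpoise <- tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y
)

# A relative path works on any machine that opens the project;
# an absolute path such as "C:/Users/me/..." would break after zipping.
write_csv(porpoise, "porpoise.csv")

# Read it back; col_types = cols() guesses types without printing a message
porpoise_again <- read_csv("porpoise.csv", col_types = cols())
```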