Reminder: Make sure to save this file (File > Save As …) in the folder you created for this class (NOT in the Downloads folder!).
Knit your document to HTML frequently so you can find your mistakes more easily. [Things to consider when you knit your document: Does everything look as you expected? If not, try to figure out the problem and fix it.]
Be sure to change the author in the YAML to your name. Remember to keep it inside the quotes.
Tidy Data
The United Nations compiles estimates of international migration by age, sex, and origin. In this Challenge Problem, you are going to examine the 2015 data, which report migrants by destination country and origin country.
For Questions 1 to 3, refer to the United Nations’ migration spreadsheet (worksheet titled Table 16) found on the Canvas page (titled UN_MigrantStockByOriginAndDestination_2015.xlsx).
Extra challenge (not required): Re-write the spreadsheet in a tidy form for the first 10 cases using Google Sheets and provide a link to the data in this document. Be sure to change the Share settings to “Anyone with the link can view” so that the link works.
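As a rough illustration of what "tidy form" could mean here, the sketch below reshapes a small wide table into one row per destination-origin pair. It assumes the worksheet lays out destinations as rows and origins as columns; the country labels and counts are placeholders, not values from the UN spreadsheet, and the names `wide`, `tidy`, `origin`, and `migrants` are hypothetical.

```r
library(tidyr)

# Toy wide table: one row per destination, one column per origin.
# Labels and counts are placeholders, not values from the UN file.
wide <- data.frame(
  destination = c("Country A", "Country B"),
  "Country X" = c(10, 30),
  "Country Y" = c(20, 40),
  check.names = FALSE
)

# Tidy form: each row is one case (a destination-origin pair),
# each column is one variable.
tidy <- pivot_longer(
  wide,
  cols = -destination,
  names_to = "origin",
  values_to = "migrants"
)
tidy
```

In the tidy version, every origin country becomes a value of the `origin` variable instead of a column heading, which is the same restructuring the Google Sheets exercise asks for by hand.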
R Programming
Note: The following questions are adapted from the exercises presented in Chapter 3 of the Data Computing textbook.
Result <- %>% filter(BabyNames, name == "Prince")
library(dplyr)
library(ggplot2)
# BabyNames is assumed to be loaded already
# Wrangling the data: count the number of babies named Prince, grouped by year and sex
Princes <-
  BabyNames %>%
  filter(name == "Prince") %>%
  group_by(year, sex) %>%
  summarise(yearlyTotal = sum(count))
# Graphing the results
# (Princes is supplied once, through data =, rather than also being piped in)
ggplot(data = Princes, aes(x = year, y = yearlyTotal)) +
  geom_point(aes(color = sex)) +
  geom_vline(xintercept = 1978)
There are several kinds of components in the above expressions.
(a) function name
(b) data table name
(c) variable name
(d) argument name
(e) constant
Match each of the following to what kind of component ( (a) through (e) ) it is.
ggplot (a)
data = (d)
Princes (b)
aes (a)
x = (d)
year (c)
geom_point (a)
color = (d)
xintercept = (d)
1978 (e)
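To see the component kinds in a form that runs without the BabyNames table, here is a minimal sketch on a made-up stand-in. `ToyNames` and `ToyPrinces` are hypothetical names and the counts are invented for illustration; the comments label each piece using the categories (a) through (e).

```r
library(dplyr)

# Hypothetical stand-in for BabyNames; the counts are made up.
ToyNames <- data.frame(
  name  = c("Prince", "Prince", "Prince", "Mary"),
  sex   = c("M", "M", "F", "F"),
  year  = c(1978, 1978, 1979, 1978),
  count = c(5, 7, 2, 100)
)

ToyPrinces <-
  ToyNames %>%                              # (b) data table name
  filter(name == "Prince") %>%              # (a) function, (c) variable, (e) constant
  group_by(year, sex) %>%                   # (a) function, (c) variables
  summarise(yearlyTotal = sum(count),       # (d) argument name, (a) function
            .groups = "drop")               # (d) argument name, (e) constant
ToyPrinces
```

A quick rule of thumb: names followed by `(` are functions, names followed by `=` inside a call are arguments, and quoted strings or bare numbers are constants.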
Putting It All Together
Now let’s put the topics of R Markdown, R programming, and tidy data together. You are going to do this by using the msleep data table from the {ggplot2} package.
Load the {ggplot2} package and the msleep dataset. Be sure to show your work in the code chunk below.
library(ggplot2)
data(msleep)
Examine the msleep dataset in at least two ways (using at least two functions). Also, modify the code chunk options so that only the output, not the code, appears in the knitted document. Be sure to show your work.
## tibble [83 × 11] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:83] "Cheetah" "Owl monkey" "Mountain beaver" "Greater short-tailed shrew" ...
## $ genus : chr [1:83] "Acinonyx" "Aotus" "Aplodontia" "Blarina" ...
## $ vore : chr [1:83] "carni" "omni" "herbi" "omni" ...
## $ order : chr [1:83] "Carnivora" "Primates" "Rodentia" "Soricomorpha" ...
## $ conservation: chr [1:83] "lc" NA "nt" "lc" ...
## $ sleep_total : num [1:83] 12.1 17 14.4 14.9 4 14.4 8.7 7 10.1 3 ...
## $ sleep_rem : num [1:83] NA 1.8 2.4 2.3 0.7 2.2 1.4 NA 2.9 NA ...
## $ sleep_cycle : num [1:83] NA NA NA 0.133 0.667 ...
## $ awake : num [1:83] 11.9 7 9.6 9.1 20 9.6 15.3 17 13.9 21 ...
## $ brainwt : num [1:83] NA 0.0155 NA 0.00029 0.423 NA NA NA 0.07 0.0982 ...
## $ bodywt : num [1:83] 50 0.48 1.35 0.019 600 ...
## [1] 83
## [1] "name" "genus" "vore" "order" "conservation"
## [6] "sleep_total" "sleep_rem" "sleep_cycle" "awake" "brainwt"
## [11] "bodywt"
## [1] "carni" "omni" "herbi" NA "insecti"
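The chunk that produced the output above is hidden in the knitted document, so the exact calls are not shown. The sketch below is one plausible set of calls that would give output of this shape; treating `nrow()` as the source of the `83` line is an assumption.

```r
library(ggplot2)  # provides the msleep data table

# In an R Markdown document, a chunk header such as {r, echo=FALSE}
# hides this code in the knitted output while still showing the results.
str(msleep)            # structure: dimensions, variable names, and types
nrow(msleep)           # number of cases (rows)
names(msleep)          # variable names
unique(msleep$vore)    # distinct values of vore, including NA
```

Setting `echo=FALSE` differs from `eval=FALSE`: the former runs the code but hides it, while the latter shows the code without running it.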
How many cases are in the msleep dataset? How many variables?
[…] brainwt variable.
How many categories are in the vore variable and what do they represent? Note: the documentation in the help file for the msleep dataset is not consistent with the number of groups in the data. Please base your answer on the number of categories in the actual data.
There are 5 categories in the vore variable (four diet types plus NA).
* carni represents carnivores
* omni represents omnivores
* herbi represents herbivores
* insecti represents insectivores (insect-eating animals)
* NA represents missing values
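One way to check the number of categories directly from the data, rather than from the help file, is to tabulate `vore`. This is a sketch using `count()` and `n_distinct()` from dplyr; `vore_counts` is a hypothetical name.

```r
library(ggplot2)  # provides msleep
library(dplyr)

# One row per distinct value of vore; NA appears as its own row.
vore_counts <- msleep %>%
  count(vore)
vore_counts

# Number of distinct values, counting NA as one of them
n_distinct(msleep$vore)
```

Because `n_distinct()` keeps `NA` by default, the total here includes the missing-value group; passing `na.rm = TRUE` would count only the named diet types.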
[… eval=FALSE in the options of the code chunk so that the code executes in your assignment]:
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ggplot(data = msleep) +
  geom_point(mapping = aes(x = bodywt, y = sleep_total))
omnivore <- msleep %>%
  filter(vore == "omni")
omnivore