Loading Libraries:
knitr::opts_chunk$set(warning = FALSE, fig.align = "center", out.width = "85%",
message = FALSE, cache = TRUE)
library(openintro)
library(tidyverse)
Global Theme Set Up:
my_theme <- theme(
plot.title = element_text(size = rel(1.8), hjust = 0.5,
margin = margin(10,15,10,10)),
axis.title = element_text(size = rel(1.5), hjust = 0.5),
axis.title.x = element_text(margin = margin(10,5,5,5)),
axis.title.y = element_text(margin = margin(5,10,5,5)),
axis.text = element_text(size = rel(1.1)))
Lab 1:
Run the following code to create a random number list. Then use
logical subsetting to find all numbers that are greater than one and
less than two.
Code:
rand_num <- runif(100, 0, 3)
rand_num[rand_num > 1 & rand_num < 2]
## [1] 1.082171 1.285525 1.829140 1.714602 1.477539 1.477167 1.281946 1.315248
## [9] 1.943766 1.477770 1.001603 1.792081 1.263464 1.544191 1.624726 1.915867
## [17] 1.408941 1.073963 1.798901 1.843595 1.231767 1.901173 1.036204 1.763537
## [25] 1.084374 1.142768 1.171027 1.826148 1.693367 1.275600 1.032825 1.175974
## [33] 1.897936
Lab 2: Create a tibble in R for the following data set
Student Name Test 1 Score Test 2 Score Test 3 Score
Alice 80 90 85
Bob 70 75 80
Charlie 85 88 91
Denise 60 65 70
Code:
score <- tribble(
~`Student Name`, ~`Test 1 Score`, ~`Test 2 Score`, ~`Test 3 Score`,
#---------------|---------------|----------------|----------------
"Alice", 80, 90, 85,
"Bob", 70, 75, 80,
"Charlie", 85, 88, 91,
"Denise", 60, 65, 70
)
Lab 3: Tell for each table of Table 2-4,
what symptoms of untidiness are there?
Answer: Table 2 combines multiple variables into a
single column; the cases
and population
values
should be separated into distinct columns. Similarly, Table 3 faces the
similar issue and requires separate the values of rate
column into two different varaibles – cases
and
population
. In Table 4a, the variables 1999
and 2000
should be merged into a single column with clearly
specified labels (e.g year
) for the values. Table 4b has
the same issue as Table 4a and requires the same adjustment.
Lab 4: Compute the rates per 10,000 in
population for each year and country using table2 instead of table1.
What is the difficulty?
Answer: A direct mutation of the data is challenging
because multiple variables are stored in a single column. To enable
easier further data transformation, the cases
and
population
values need to be separated into two distinct
columns.
Lab 5: Use pivot_longer() to tidy the
following data set
Student Name Test 1 Score Test 2 Score Test 3 Score
Alice 80 90 85
Bob 70 75 80
Charlie 85 88 91
Denise 60 65 70
Code:
pivot_longer(score, c(`Test 1 Score`,`Test 2 Score`, `Test 3 Score`),
names_to = "test_label", values_to = "score") -> new_score
new_score
## # A tibble: 12 × 3
## `Student Name` test_label score
## <chr> <chr> <dbl>
## 1 Alice Test 1 Score 80
## 2 Alice Test 2 Score 90
## 3 Alice Test 3 Score 85
## 4 Bob Test 1 Score 70
## 5 Bob Test 2 Score 75
## 6 Bob Test 3 Score 80
## 7 Charlie Test 1 Score 85
## 8 Charlie Test 2 Score 88
## 9 Charlie Test 3 Score 91
## 10 Denise Test 1 Score 60
## 11 Denise Test 2 Score 65
## 12 Denise Test 3 Score 70
Lab 6: Tidy the simple tibble below. Do you
need to make it wider or longer?
Code:
preg <- tribble(
~pregnant, ~male, ~female,
"yes", NA, 10,
"no", 20, 12
)
pivot_longer(preg, c(`male`,`female`), names_to = "gender",
values_to = "preg_count") -> preg_new
preg_new
## # A tibble: 4 × 3
## pregnant gender preg_count
## <chr> <chr> <dbl>
## 1 yes male NA
## 2 yes female 10
## 3 no male 20
## 4 no female 12
0. Do the data tidying of who data set step by step by
yourself, and then make the following plot with your
summary:
Code:
glimpse(who)
## Rows: 7,240
## Columns: 60
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ iso2 <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
## $ iso3 <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ year <dbl> 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 198…
## $ new_sp_m014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_m65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sp_f65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_m65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_sn_f65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_m65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_ep_f65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_m65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f014 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f1524 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f2534 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f3544 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f4554 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f5564 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ newrel_f65 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
who1 <- who %>%
pivot_longer(
cols = new_sp_m014:newrel_f65,
names_to = "key",
values_to = "cases",
values_drop_na = TRUE) %>%
print()
## # A tibble: 76,046 × 6
## country iso2 iso3 year key cases
## <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 Afghanistan AF AFG 1997 new_sp_m014 0
## 2 Afghanistan AF AFG 1997 new_sp_m1524 10
## 3 Afghanistan AF AFG 1997 new_sp_m2534 6
## 4 Afghanistan AF AFG 1997 new_sp_m3544 3
## 5 Afghanistan AF AFG 1997 new_sp_m4554 5
## 6 Afghanistan AF AFG 1997 new_sp_m5564 2
## 7 Afghanistan AF AFG 1997 new_sp_m65 0
## 8 Afghanistan AF AFG 1997 new_sp_f014 5
## 9 Afghanistan AF AFG 1997 new_sp_f1524 38
## 10 Afghanistan AF AFG 1997 new_sp_f2534 36
## # ℹ 76,036 more rows
who2 <- who1 %>%
mutate(key = str_replace(key, "mewrel", "new_rel")) %>%
print()
## # A tibble: 76,046 × 6
## country iso2 iso3 year key cases
## <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 Afghanistan AF AFG 1997 new_sp_m014 0
## 2 Afghanistan AF AFG 1997 new_sp_m1524 10
## 3 Afghanistan AF AFG 1997 new_sp_m2534 6
## 4 Afghanistan AF AFG 1997 new_sp_m3544 3
## 5 Afghanistan AF AFG 1997 new_sp_m4554 5
## 6 Afghanistan AF AFG 1997 new_sp_m5564 2
## 7 Afghanistan AF AFG 1997 new_sp_m65 0
## 8 Afghanistan AF AFG 1997 new_sp_f014 5
## 9 Afghanistan AF AFG 1997 new_sp_f1524 38
## 10 Afghanistan AF AFG 1997 new_sp_f2534 36
## # ℹ 76,036 more rows
who3 <- who2 %>%
separate(key, c("new", "type", "genderage"), sep = "_") %>%
print()
## # A tibble: 76,046 × 8
## country iso2 iso3 year new type genderage cases
## <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 Afghanistan AF AFG 1997 new sp m014 0
## 2 Afghanistan AF AFG 1997 new sp m1524 10
## 3 Afghanistan AF AFG 1997 new sp m2534 6
## 4 Afghanistan AF AFG 1997 new sp m3544 3
## 5 Afghanistan AF AFG 1997 new sp m4554 5
## 6 Afghanistan AF AFG 1997 new sp m5564 2
## 7 Afghanistan AF AFG 1997 new sp m65 0
## 8 Afghanistan AF AFG 1997 new sp f014 5
## 9 Afghanistan AF AFG 1997 new sp f1524 38
## 10 Afghanistan AF AFG 1997 new sp f2534 36
## # ℹ 76,036 more rows
who4 <- who3 %>%
select(-new, -iso2, -iso3) %>%
print()
## # A tibble: 76,046 × 5
## country year type genderage cases
## <chr> <dbl> <chr> <chr> <dbl>
## 1 Afghanistan 1997 sp m014 0
## 2 Afghanistan 1997 sp m1524 10
## 3 Afghanistan 1997 sp m2534 6
## 4 Afghanistan 1997 sp m3544 3
## 5 Afghanistan 1997 sp m4554 5
## 6 Afghanistan 1997 sp m5564 2
## 7 Afghanistan 1997 sp m65 0
## 8 Afghanistan 1997 sp f014 5
## 9 Afghanistan 1997 sp f1524 38
## 10 Afghanistan 1997 sp f2534 36
## # ℹ 76,036 more rows
who5 <- who4 %>%
separate(genderage, c("gender", "age"), sep = 1) %>%
print()
## # A tibble: 76,046 × 6
## country year type gender age cases
## <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 Afghanistan 1997 sp m 014 0
## 2 Afghanistan 1997 sp m 1524 10
## 3 Afghanistan 1997 sp m 2534 6
## 4 Afghanistan 1997 sp m 3544 3
## 5 Afghanistan 1997 sp m 4554 5
## 6 Afghanistan 1997 sp m 5564 2
## 7 Afghanistan 1997 sp m 65 0
## 8 Afghanistan 1997 sp f 014 5
## 9 Afghanistan 1997 sp f 1524 38
## 10 Afghanistan 1997 sp f 2534 36
## # ℹ 76,036 more rows
1. Plot the total number of TB cases in the world across
years.
Code:
ggplot(who5) +
stat_summary(aes(x = year, y = cases), fun = "sum", boundary = 1,
bins = 17, geom = "bar", fill = "darkorange3") +
labs(title = "World Tuberculosis Cases From 1997-2013",
x = "Years",
y = "Mean of cases per year") +
scale_x_continuous(limits = c(1995, 2015), breaks = seq(1995, 2015, 5)) +
scale_y_continuous(limits = c(0, 4.5e+06), breaks = seq(0, 5000000, 500000)) +
theme(plot.title = element_text(color = "magenta4"),
axis.title = element_text(color = "magenta3")) +
my_theme
2. Find out which country has the highest male-to-female
ratio of TB cases in 2010.
who5 %>%
filter(year == "2010", cases != 0, !is.na(cases)) %>%
group_by(country) %>%
summarise(tb_ratio = cases[gender == "m"]/cases[gender == "f"]) %>%
arrange(desc(tb_ratio)) %>%
print()
## # A tibble: 2,934 × 2
## # Groups: country [196]
## country tb_ratio
## <chr> <dbl>
## 1 Bosnia and Herzegovina 154
## 2 Azerbaijan 109.
## 3 Hungary 90
## 4 Belarus 65
## 5 Armenia 36
## 6 Qatar 32
## 7 Azerbaijan 29.8
## 8 Bahrain 27
## 9 Croatia 24
## 10 Netherlands 23
## # ℹ 2,924 more rows