The data set has information about 1538 skeletons kept in different locations across the world.
There are 3 categorical variables:
The remaining 8 variables are the lengths of 8 limb bones (in mm). The limb bones are
Note: Most skeletons have at least 1 missing measurement.
str_detect()
functionFor a couple of questions, you’ll be using the
str_detect()
function in the stringr
package,
which takes 2 or 3 arguments:
string =
the character (or column of characters) that
we want to check contains certain character of characterspattern =
the character(s) we want to check if it is in
the value given to the string
argumentnegate = F
if set to FALSE
, it will return
TRUE
if the character contains the pattern. If set to
TRUE
, it will return FALSE
if the character
contains the patternFor example,
str_detect("abcdef", pattern = "a", negate = F)
will return
TRUE
since “a” is in “abcdef”. If you change
negate = T
, it will return FALSE
instead
Using the correct dplyr
verb and
str_detect()
, only keep the rows where the sex of the
skeleton is known (no “0u” or “1u”). Save the resulting data frame as
bones1 and use count(bones1, sex)
to display the results in
the knitted document.
bones1 <-
bones |>
filter(str_detect(sex, "u", negate = T))
count(bones1, sex)
## sex n
## 1 Female 540
## 2 Male 981
Create the bar chart seen in Brightspace using the bones1 data set and two additional dplyr verbs. Make sure to place the order of the age ranges in the proper order!
bones1 |>
# Only keeping the age groups for 30+
filter(
age %in% c("30-39", "40-49", "50+")
# age >= 30
) |>
# Reordering the age group to be in numeric order
mutate(
# age = factor(age, levels = c("30-39", "40-49", "50+"))
) |>
# Creating the conditional bar chart
ggplot(
mapping = aes(
x = age,
fill = sex
)
) +
geom_bar(
position = "fill"
) +
# Adding a title, removing the labels for y and fill
labs(
x = "Age Range",
y = NULL,
fill = NULL,
title = "Age and Sex of Skeletons in Museums"
) +
# Changing the tickmarks to percentages and removing the buffer space
scale_y_continuous(
expand = c(0, 0, 0.05, 0),
labels = scales::label_percent()
) +
# Changing the theme to theme_classic()
theme_classic() +
# Centering the title
theme(
plot.title = element_text(hjust = 0.5)
)
The code chunk below will change the format of the
data from a wide format to a long format and save the
data set as boness. To get the code chunk below to work, fill
in the space to keep the ID, age, and the 8 bone columns only. We’ll be
looking at what the pivot_longer()
function does
later
If done correctly, the data set should have 11,032 rows and 5 columns
bones2 <-
bones |>
### Enter your code below:
# Use the correct dplyr verb in the line below to keep the specified columns
dplyr::select(ID, age, where(is.numeric)) |>
# Changing the format from wide to long
pivot_longer(
cols = where(is.numeric),
names_to = "bone",
values_to = "length",
values_drop_na = T # Removing the new row if the value is NA
) |>
# Separating the bone column into two columns: limb bone and the side
mutate(
side = str_trunc(bone, 1, ellipsis = ""),
bone = str_trunc(bone, 3, side = "left", ellipsis = ""),
bone = factor(bone, levels = c("hum", "rad", "fem", "tib"))
)
tibble(bones2)
## # A tibble: 11,032 × 5
## ID age bone length side
## <chr> <chr> <fct> <dbl> <chr>
## 1 1 40-49 hum 308. l
## 2 1 40-49 rad 229 l
## 3 1 40-49 fem 443 l
## 4 1 40-49 fem 452 r
## 5 1 40-49 tib 366. r
## 6 2 50+ fem 386 l
## 7 2 50+ fem 383 r
## 8 2 50+ tib 283 l
## 9 2 50+ tib 282. r
## 10 3 40-49 hum 311 l
## # ℹ 11,022 more rows
Create a column in the bones data set called
country that has just the country the skeleton is currently
located in (aka, for any row with location in the United States, it has
the state, United States). You should use the appropriate
dplyr
verb to add a new column along with the
if_else()
and str_detect()
functions, and no
others.
Name the resulting data set bones3a.
bones3a <-
bones |>
## Your code here ###
# Remove the state from the column
mutate(
country = if_else(str_detect(location, "United States"),
"United States",
location)
)
tibble(bones3a)
## # A tibble: 1,531 × 13
## ID sex age location lhum rhum lrad rrad lfem rfem ltib rtib
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Male 40-49 Alaska, U… 308. NA 229 NA 443 452 NA 366.
## 2 2 Female 50+ Alaska, U… NA NA NA NA 386 383 283 282.
## 3 3 Male 40-49 Alaska, U… 311 310 NA 222. 415 416. 330 332.
## 4 4 Male 25-29 Alaska, U… 289 298 224. 227 398 400 312 320.
## 5 5 Female 50+ Alaska, U… 295 302 208 205 395 396 306 305
## 6 6 Female 30-39 Alaska, U… 270. 281 196. NA NA 375 298 294.
## 7 7 Male 40-49 Alaska, U… 314. 306. 237 226 430 431 351 352
## 8 8 Male 30-39 Alaska, U… 318 322 234 238 434. 437 361 364
## 9 9 Male 30-39 Alaska, U… 304. 307 233 235 402. 407 338 338
## 10 10 Female 50+ Alaska, U… 287 292. 210. 214 390 386 311 315
## # ℹ 1,521 more rows
## # ℹ 1 more variable: country <chr>
Using the bones3a data set, create a data set named bones3b that has the following columns:
1) Country: Each country listed only once 2) count: How many times each country appears in the bones3a data set 3) prop: The proportion of skeletons in the bones3a data that are in each country
Then keep only the 10 countries with the highest proportion and arrange the rows from highest proportion to lowest proportoin. Make sure it appears in the knitted document
bones3b <-
bones3a |>
# Counting how frequently each country appears
count(
country,
name = "count"
) |>
# Calculating the proportion with mutate()
mutate(
prop = count/sum(count)
) |>
# Keeping the top 10 countries
slice_max(prop, n = 10) |>
# arranging them from highest to lowest prop
arrange(-prop)
bones3b
## country count prop
## 1 United States 722 0.47158720
## 2 Germany 108 0.07054213
## 3 Japan 94 0.06139778
## 4 United Kingdom 85 0.05551927
## 5 Egypt 81 0.05290660
## 6 Austria 78 0.05094709
## 7 Sudan 53 0.03461790
## 8 Italy 51 0.03331156
## 9 Belgium 41 0.02677988
## 10 Philippine Islands 31 0.02024820
If done correctly, the code to create the graph below should run
# Creating the bar chart
ggplot(
data = bones3b,
mapping = aes(
y = fct_reorder(country, prop), # Ordering largest to smallest prop
x = prop
)
) +
geom_col(
fill = "wheat1"
) +
# Removing the x and y-axis label
labs(
y = NULL,
x = NULL
) +
# Having the bars touch the y-axis and display percent on the x-axis
scale_x_continuous(
expand = c(0, 0, 0.05, 0),
labels = scales::label_percent()
)
Calculate the correlation between the length of the left arm and left leg by using the appropriate dplyr verbs in the following order:
1) Add two columns to the bones data set that are: a) larm = length of the left arm b) lleg = length of the left leg
2) Remove a row if larm or lleg is missing
3) Calculate the number of remaining skeletons and correlation between larm and lleg rounded to 3 decimal places. Both should be in the same data frame that has 1 row
bones |>
# Adding the larm and lleg columns
mutate(
larm = lhum + lrad,
lleg = lfem + ltib
) |>
# Remove any rows that have missing values
filter(
!is.na(larm),
!is.na(lleg)
) |>
# Counting the number of skeletons with both left bones and the correlation
summarize(
skeletons = n(),
left_limb_cors = round(cor(larm, lleg), digits = 3)
)
## skeletons left_limb_cors
## 1 1093 0.928
To create the graph seen in Brightspace in the second code chunk, create a data set named bones5 using the bones2 data set created in question 4. It will need 4 columns:
1) The age range (18-24, 25-29, 30-39, 40-49, 50+)
2) The bone (hum/rad/fem/tib)
3) The side of the body (l/r)
4) The median of the length column for each combination of age, bone, and side
Make sure the data frame appears in your knitted document by
using tibble(bones5)
at the bottom of the first code chunk.
Then check that it is correct by running the second code chunk, which
should create the graph
bones5 <-
bones2 |>
# Calculating the median of bone length for each age and bone combo
summarize(
.by = c(age, bone, side),
length_med = median(length)
)
tibble(bones5)
## # A tibble: 40 × 4
## age bone side length_med
## <chr> <fct> <chr> <dbl>
## 1 40-49 hum l 302
## 2 40-49 rad l 232.
## 3 40-49 fem l 425
## 4 40-49 fem r 424
## 5 40-49 tib r 350.
## 6 50+ fem l 428
## 7 50+ fem r 426
## 8 50+ tib l 351
## 9 50+ tib r 352.
## 10 40-49 hum r 306.
## # ℹ 30 more rows
# Creating the graph
ggplot(
data = bones5,
mapping = aes(
x = age,
y = length_med
)
) +
# Adding the bars
geom_col(
fill = "steelblue"
) +
# Creating a separate graph for each bone
facet_grid(
rows = vars(side),
cols = vars(bone),
# Changing the names that appear in the facet name plates
labeller = labeller(
side = c("l" = "Left Side", "r" = "Right Side"),
bone = c("fem" = "Femur", "rad" = "Radius",
"tib" = "Tibia", "hum" = "Humerus")
)
) +
# Changing the labels and adding a title
labs(
x = NULL,
title = "Medians of Bone Lengths by Age",
y = "Median length (mm)"
) +
# Centering the title and rotating the labels on the x-axis
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
plot.title = element_text(hjust = 0.5)
) +
# Removing the buffer space on the bottom of the graph
scale_y_continuous(
expand = c(0, 0, 0.05, 0)
)