See the Brightspace post for a description of the data.
If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.
The final document should not show any warnings.
Skim the data set.
skim(bones)
Name | bones |
Number of rows | 1531 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
age | 0 | 1 | 3 | 5 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
humerus | 161 | 0.89 | 303.88 | 22.97 | 229.5 | 287.5 | 303.5 | 319 | 376.0 | ▁▅▇▅▁ |
radius | 217 | 0.86 | 233.14 | 18.98 | 179.0 | 219.0 | 233.0 | 247 | 290.5 | ▁▅▇▅▁ |
femur | 117 | 0.92 | 427.28 | 31.46 | 345.0 | 405.0 | 428.0 | 449 | 531.0 | ▂▆▇▃▁ |
tibia | 135 | 0.91 | 353.21 | 28.06 | 276.0 | 333.0 | 353.0 | 372 | 446.0 | ▁▆▇▃▁ |
iliac | 66 | 0.96 | 262.55 | 18.05 | 184.0 | 251.0 | 263.0 | 274 | 324.0 | ▁▂▇▆▁ |
Answer the following questions:
There are two categorical variables: sex and age
The five other variables are numeric: humerus, radius, femur, tibia, iliac
iliac has the fewest missing values (66)
radius has the most missing values (217)
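The two answers above can be read directly off the n_missing column of the skim() output. As a minimal sketch, the counts below are copied by hand from that table, and `toy` is a made-up data frame used only to show how `colSums(is.na(...))` produces such counts:

```r
# Missing-value counts copied from the n_missing column of the skim() output
n_missing <- c(humerus = 161, radius = 217, femur = 117, tibia = 135, iliac = 66)

fewest <- names(which.min(n_missing))  # column with the fewest NAs
most   <- names(which.max(n_missing))  # column with the most NAs
print(c(fewest = fewest, most = most))

# On the real data the same counts could be obtained with colSums(is.na(bones));
# a hypothetical two-column data frame shows the idea
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
toy_counts <- colSums(is.na(toy))
print(toy_counts)
```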
Start by creating a new column in bones called arm, the total length of the humerus and radius bones, then display the first 6 rows of the bones data set
bones$arm <- bones$humerus + bones$radius
head(bones)
## sex age humerus radius femur tibia iliac arm
## 1 Male 20-29 289.0 223.5 398.0 312 240.5 512.5
## 2 Female 20-29 NA 197.5 376.0 288 246.5 NA
## 3 Male 20-29 305.5 228.5 421.0 337 295.0 534.0
## 4 Male 20-29 287.0 221.0 407.5 320 279.0 508.0
## 5 Male 20-29 352.5 259.0 513.5 377 287.0 611.5
## 6 Male 20-29 294.5 228.0 403.0 318 269.0 522.5
What happens in R when you add a missing value to a non-missing value?
If either value is missing, R returns NA (missing)
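A short sketch of this behaviour, using the humerus and radius values from the first three rows of the head() output. The `rowSums()` line is included only as a contrast and is not part of the assignment: it treats a missing bone as length 0, which would give a misleading arm length.

```r
# First three rows of humerus and radius, copied from the head() output
humerus <- c(289.0, NA, 305.5)
radius  <- c(223.5, 197.5, 228.5)

# Element-wise addition: any sum that involves NA is itself NA
arm <- humerus + radius
print(arm)

# Contrast: rowSums() with na.rm = TRUE skips the NA, so row 2 becomes
# just the radius length rather than a true arm length
partial <- rowSums(cbind(humerus, radius), na.rm = TRUE)
print(partial)
```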
Start by creating a blank graph saved as gg_arm with:
arm on the x-axis
A white background with grey grid lines
The x-axis labelled as “Arm Length (mm)”
gg_arm <-
  ggplot(
    data = bones,
    mapping = aes(x = arm)
  ) +
  theme_bw() +
  labs(x = "Arm Length (mm)")
gg_arm
Create and save a histogram named gg_arm_hist with
Bars colored with “seagreen”
A black outline for each bar
Each bin 10 millimeters wide
gg_arm_hist <-
  gg_arm +
  geom_histogram(
    color = "black",
    binwidth = 10,
    fill = "seagreen",
    na.rm = TRUE # silently drop rows where arm is NA so no warning is knitted
  )
gg_arm_hist
Create and save a density plot as gg_arm_den with the region under the line shaded “seagreen” that is partly transparent
gg_arm_den <-
  gg_arm +
  geom_density(
    fill = "seagreen",
    alpha = 0.75,
    na.rm = TRUE # silently drop rows where arm is NA so no warning is knitted
  )
gg_arm_den
Using the graphs created in part 2A, describe any important features of the arm variable.
How do you expect the mean and median to compare to each other?
Calculate the mean and median of arm.
mean(x = bones$arm, na.rm = TRUE)
## [1] 537.4375
median(x = bones$arm, na.rm = TRUE)
## [1] 537.5
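On the full data the mean (537.44) and median (537.5) are nearly equal, which is what a roughly symmetric distribution would produce. As a contrast, here is a small sketch using only the six arm values from the head() output. This excerpt is not representative of the full data; it only illustrates how a single long arm pulls the mean above the median, and why `na.rm = TRUE` is needed.

```r
# arm values for the first six rows, copied from the head() output
arm <- c(512.5, NA, 534.0, 508.0, 611.5, 522.5)

# Without na.rm = TRUE a single NA makes the result NA
print(mean(arm))

# With na.rm = TRUE the NA is dropped before the calculation
arm_mean   <- mean(arm, na.rm = TRUE)
arm_median <- median(arm, na.rm = TRUE)
print(c(mean = arm_mean, median = arm_median))

# The one large value (611.5) raises the mean but barely moves the median
print(arm_mean > arm_median)
```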
Calculate the 5 number summary for iliac. Are there any unusually narrow or wide iliacs?
fivenum(x = bones$iliac, na.rm = TRUE)
## [1] 184 251 263 274 324
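One common convention for flagging unusual values is the 1.5 × IQR rule, which is also the default rule boxplots use to draw individual points. The sketch below applies it to the five numbers printed above. The vector names are chosen here for illustration. Note that fivenum() returns Tukey hinges, which can differ slightly from quantile-based quartiles, although here they agree with p25 and p75 in the skim() output.

```r
# Five-number summary of iliac, copied from the fivenum() output
five <- c(min = 184, q1 = 251, median = 263, q3 = 274, max = 324)

iqr <- five[["q3"]] - five[["q1"]]       # spread of the middle half
lower_fence <- five[["q1"]] - 1.5 * iqr  # values below this are flagged
upper_fence <- five[["q3"]] + 1.5 * iqr  # values above this are flagged
print(c(iqr = iqr, lower = lower_fence, upper = upper_fence))

# Compare the extremes with the fences
has_low  <- five[["min"]] < lower_fence
has_high <- five[["max"]] > upper_fence
print(c(unusually_narrow = has_low, unusually_wide = has_high))
```

Under this rule both the minimum (184) and the maximum (324) fall outside the fences, so at least one iliac on each end would be flagged.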
Create a boxplot for iliac using ggplot so that it matches the plot on Blackboard. The color of the box can be approximate. Are there any unusually narrow or wide iliacs? Explain your answer!
ggplot(
  data = bones,
  mapping = aes(x = iliac)
) +
  # na.rm = TRUE silently drops rows where iliac is NA so no warning is knitted
  geom_boxplot(fill = "orchid", na.rm = TRUE) +
  theme_light() +
  labs(
    x = "Width of Iliac (mm)",
    title = "Iliac Width Boxplot"
  ) +
  # Add the following line of code to your boxplot to remove the labels on the y-axis:
  scale_y_continuous(breaks = NULL)
Create boxplots as they appear in the attached PDF comparing iliac width between males and females. Describe any important differences between the two sexes.
ggplot(
  data = bones,
  mapping = aes(
    y = iliac,
    x = sex,
    fill = sex
  )
) +
  # na.rm = TRUE silently drops rows where iliac is NA so no warning is knitted
  geom_boxplot(show.legend = FALSE, na.rm = TRUE) +
  labs(
    y = "Width of Iliac",
    x = NULL
  ) +
  theme_light()
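A numeric summary by group can back up what the side-by-side boxplots show. The sketch below uses only the six rows from the head() output, stored in a hypothetical `bones_excerpt` data frame. With a single female in the excerpt, it illustrates the `tapply()` call rather than the actual difference between the sexes.

```r
# sex and iliac for the first six rows, copied from the head() output
bones_excerpt <- data.frame(
  sex   = c("Male", "Female", "Male", "Male", "Male", "Male"),
  iliac = c(240.5, 246.5, 295.0, 279.0, 287.0, 269.0)
)

# Median iliac width within each sex; na.rm = TRUE is passed on to median()
med_by_sex <- tapply(bones_excerpt$iliac, bones_excerpt$sex, median, na.rm = TRUE)
print(med_by_sex)
```

On the full data the same call with `bones` in place of `bones_excerpt` would give the medians drawn as the centre lines of the two boxes.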
Create vertical boxplots for iliac with age ranges on the x-axis and color of the boxes representing sex. Describe any associations you notice in the boxplots
ggplot(
  data = bones,
  mapping = aes(
    y = iliac,
    fill = sex,
    x = age
  )
) +
  # na.rm = TRUE silently drops rows where iliac is NA so no warning is knitted
  geom_boxplot(na.rm = TRUE) +
  labs(
    y = "Width of Iliac",
    x = NULL,
    fill = NULL
  ) +
  theme_bw()