See the Blackboard post for a description of the data.
If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.
The final document should not show any warnings
Skim the data set.
skim(bones)
Name | bones |
Number of rows | 1531 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
age | 0 | 1 | 3 | 5 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
humerus | 161 | 0.89 | 303.88 | 22.97 | 229.5 | 287.5 | 303.5 | 319 | 376.0 | ▁▅▇▅▁ |
radius | 217 | 0.86 | 233.14 | 18.98 | 179.0 | 219.0 | 233.0 | 247 | 290.5 | ▁▅▇▅▁ |
femur | 117 | 0.92 | 427.28 | 31.46 | 345.0 | 405.0 | 428.0 | 449 | 531.0 | ▂▆▇▃▁ |
tibia | 135 | 0.91 | 353.21 | 28.06 | 276.0 | 333.0 | 353.0 | 372 | 446.0 | ▁▆▇▃▁ |
iliac | 66 | 0.96 | 262.55 | 18.05 | 184.0 | 251.0 | 263.0 | 274 | 324.0 | ▁▂▇▆▁ |
Answer the following questions:
How many categorical variables are in the data set?
How many numeric variables?
Which numeric column has the fewest missing values?
Which numeric column has the most missing values?
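The counts in the skim output can also be checked directly in base R. The sketch below uses a small made-up data frame named toy (a hypothetical stand-in, not the real bones data) to show one way of counting columns by type and missing values per column.

```r
# Toy stand-in for `bones`; the values are made up for illustration only
toy <- data.frame(
  sex     = c("Male", "Female", "Male", "Female", "Male"),
  humerus = c(289.0, NA, 305.5, NA, 352.5),
  radius  = c(NA, 197.5, NA, NA, 259.0),
  iliac   = c(240.5, 246.5, 295.0, NA, 287.0),
  stringsAsFactors = FALSE
)

# Count the columns of each type
n_character <- sum(sapply(toy, is.character))
n_numeric   <- sum(sapply(toy, is.numeric))

# Count the missing values in each numeric column
n_missing <- colSums(is.na(toy[sapply(toy, is.numeric)]))
n_missing

# Names of the columns with the fewest and the most missing values
fewest_missing <- names(n_missing)[which.min(n_missing)]
most_missing   <- names(n_missing)[which.max(n_missing)]
```

Applied to bones itself, the same calls should agree with the n_missing column that skim() reports.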
Start by creating a new column in bones called arm, the total length of the humerus and radius bones, then display the first 6 rows of the bones data set
bones$arm <- bones$humerus + bones$radius
head(bones)
## sex age humerus radius femur tibia iliac arm
## 1 Male 20-29 289.0 223.5 398.0 312 240.5 512.5
## 2 Female 20-29 NA 197.5 376.0 288 246.5 NA
## 3 Male 20-29 305.5 228.5 421.0 337 295.0 534.0
## 4 Male 20-29 287.0 221.0 407.5 320 279.0 508.0
## 5 Male 20-29 352.5 259.0 513.5 377 287.0 611.5
## 6 Male 20-29 294.5 228.0 403.0 318 269.0 522.5
What happens in R when you add a missing value to a non-missing value?
The total will also be missing (NA).
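A minimal sketch of this behaviour, using the humerus and radius values from the first three rows printed by head(bones). The vector names x, y, and total are illustrative only.

```r
# Humerus and radius lengths from the first three rows of head(bones)
x <- c(289.0, NA, 305.5)
y <- c(223.5, 197.5, 228.5)

# Element-wise addition: any NA input gives an NA result
total <- x + y
total
is.na(total)

# na.rm = TRUE drops the missing totals before summarising
mean_total <- mean(total, na.rm = TRUE)
mean_total
```

This is why the later summaries of arm pass na.rm = TRUE: without it, a single missing total would make the mean and median NA as well.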
Start by creating a blank graph saved as gg_arm with:
arm on the x-axis
A white background with grey grid lines
The x-axis labelled as “Arm Length (mm)”
gg_arm <-
ggplot(
data = bones,
mapping = aes(x = arm)
) +
theme_bw() +
labs(x = "Arm Length (mm)") +
# Remove the gap below the bars on the y-axis and keep 5% padding at the top
scale_y_continuous(expand = c(0, 0, 0.05, 0))
gg_arm
Create and save a histogram named gg_arm_hist with
Bars colored with “seagreen”
A black outline for each bar
Each bin 10 millimeters wide
gg_arm_hist <-
gg_arm +
geom_histogram(
color = "black",
binwidth = 10,
fill = "seagreen"
)
gg_arm_hist
Create and save a density plot as gg_arm_den with the region under the line shaded “seagreen” that is partly transparent
gg_arm_den <-
gg_arm +
geom_density(
fill = "seagreen",
alpha = 0.75
)
gg_arm_den
Using the graphs created in part 2A, describe any important features of the arm variable.
Arm length has one peak (unimodal) and is approximately symmetric.
How do you expect the mean and median to compare to each other?
Since the distribution is approximately symmetric, the mean and median of arm length should be similar.
Calculate the mean and median of arm.
# Calculate the mean and median of arm, ignoring missing values
c(
"mean" = mean(x = bones$arm, na.rm = TRUE),
"median" = median(x = bones$arm, na.rm = TRUE)
)
## mean median
## 537.4375 537.5000
Yes, the mean (537.44) and median (537.50) are nearly identical.
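To illustrate why symmetry matters here, the sketch below compares the mean and median of two small made-up vectors (not taken from bones): one symmetric, and one with a single large value that creates right skew.

```r
# Made-up values for illustration only
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 3, 4, 50)

# Symmetric data: mean and median agree
sym_summary <- c(mean = mean(symmetric), median = median(symmetric))
sym_summary

# Right-skewed data: the large value pulls the mean above the median
skew_summary <- c(mean = mean(right_skewed), median = median(right_skewed))
skew_summary
```

The median stays at 3 in both cases, while the mean moves from 3 to 12 once the large value is introduced.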
Calculate the 5 number summary for iliac.
quantile(
x = bones$iliac,
probs = c(0, 0.25, 0.5, 0.75, 1),
na.rm = TRUE
)
## 0% 25% 50% 75% 100%
## 184 251 263 274 324
Create a boxplot for iliac using ggplot so it appears as the plot in Brightspace. The color of the box can be approximate.
ggplot(data = bones,
mapping = aes(x = iliac)) +
geom_boxplot(fill = "orchid") +
theme_light() +
labs(x = "Width of Iliac (mm)",
title = "Iliac Width Boxplot") +
# Remove the labels on the y-axis
scale_y_continuous(breaks = NULL)
Are there any unusually narrow or wide iliacs? Explain your answer!
Yes. The boxplot shows individual points beyond both whiskers, which indicates outliers at both ends: some iliacs are unusually narrow and some are unusually wide.
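The same conclusion can be checked numerically. The sketch below applies the usual 1.5 × IQR rule, which is the default whisker rule in geom_boxplot(), to the quartiles from the five-number summary of iliac reported above. The variable names are illustrative, and ggplot computes its hinges slightly differently from quantile(), so the fences in the plot may differ a little.

```r
# Quartiles from the five-number summary of iliac
q1 <- 251
q3 <- 274
iqr <- q3 - q1

# Fences under the 1.5 * IQR rule
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)

# Minimum and maximum from the same summary
iliac_min <- 184
iliac_max <- 324

# TRUE means at least one value falls outside the fence
has_narrow_outlier <- iliac_min < lower_fence
has_wide_outlier   <- iliac_max > upper_fence
c(narrow = has_narrow_outlier, wide = has_wide_outlier)
```

Under these assumptions the fences are 216.5 and 308.5, so the minimum (184) and the maximum (324) both fall outside them.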
Create boxplots as they appear in Brightspace comparing iliac width between the age ranges.
ggplot(
data = bones,
mapping = aes(
y = iliac,
x = age
)
) +
geom_boxplot(
fill = "steelblue"
) +
labs(
y = "Width of Iliac",
x = NULL
) +
theme_light()
Create horizontal boxplots for iliac with age ranges on the y-axis and the boxes filled by sex. Describe any associations you notice in the boxplots.
ggplot(
data = bones,
mapping = aes(
y = iliac,
fill = sex,
x = age
)
) +
geom_boxplot() +
labs(
y = "Width of Iliac",
x = NULL,
fill = NULL
) +
theme_bw() +
coord_flip()