See the blackboard post for a description of the data.

If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.

The final document should not show any warnings.

Question 1: Skimming the data set

Skim the data set.

skim(bones)
Data summary
Name                     bones
Number of rows           1531
Number of columns        7
_______________________
Column type frequency:
  character              2
  numeric                5
________________________
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
sex            0          1              4    6    0      2         0
age            0          1              3    5    0      4         0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd     p0     p25    p50    p75  p100   hist
humerus        161        0.89           303.88  22.97  229.5  287.5  303.5  319  376.0  ▁▅▇▅▁
radius         217        0.86           233.14  18.98  179.0  219.0  233.0  247  290.5  ▁▅▇▅▁
femur          117        0.92           427.28  31.46  345.0  405.0  428.0  449  531.0  ▂▆▇▃▁
tibia          135        0.91           353.21  28.06  276.0  333.0  353.0  372  446.0  ▁▆▇▃▁
iliac          66         0.96           262.55  18.05  184.0  251.0  263.0  274  324.0  ▁▂▇▆▁

Answer the following questions:

Question 2: Arm Length

Start by creating a new column in bones called arm, the total length of the humerus and radius bones, then display the first 6 rows of the bones data set.

# arm = humerus + radius (NA whenever either bone is missing)
bones$arm <- bones$humerus + bones$radius

head(bones)
##      sex   age humerus radius femur tibia iliac   arm
## 1   Male 20-29   289.0  223.5 398.0   312 240.5 512.5
## 2 Female 20-29      NA  197.5 376.0   288 246.5    NA
## 3   Male 20-29   305.5  228.5 421.0   337 295.0 534.0
## 4   Male 20-29   287.0  221.0 407.5   320 279.0 508.0
## 5   Male 20-29   352.5  259.0 513.5   377 287.0 611.5
## 6   Male 20-29   294.5  228.0 403.0   318 269.0 522.5

What happens in R when you add a missing value to a non-missing value?

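The result is also missing (NA). R propagates missing values through arithmetic, so any row missing either the humerus or the radius has NA for arm, as in row 2 above.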
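A minimal base-R sketch of this behavior, using the humerus and radius values from the first two rows of `head(bones)` above in a small stand-in data frame (`toy` is a hypothetical name). It also shows why `rowSums(..., na.rm = TRUE)` would be a poor shortcut here: it treats the missing humerus as contributing nothing and reports the radius alone as the arm length.

```r
# NA propagates through arithmetic: if one part is unknown, so is the total
NA + 223.5
is.na(NA + 223.5)   # TRUE

# Stand-in data using rows 1 and 2 of head(bones) shown above
toy <- data.frame(humerus = c(289.0, NA), radius = c(223.5, 197.5))

# Element-wise addition keeps the NA, just like bones$arm
toy$humerus + toy$radius   # 512.5 and NA

# rowSums(na.rm = TRUE) skips the NA, so row 2 becomes the radius alone (197.5),
# which understates the true arm length
rowSums(toy[, c("humerus", "radius")], na.rm = TRUE)
```

Keeping the NA is the safer default: a missing arm length is honest, while a silently shortened one would bias the mean and median downward.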

Part 2A: Graphs for Arm

2A i) Blank graph

Start by creating a blank graph saved as gg_arm with:

  • arm on the x-axis

  • A white background with grey grid lines

  • The x-axis labelled as “Arm Length (mm)”

gg_arm <- 
  ggplot(
    data = bones,
    mapping = aes(x = arm)
  ) + 
  
  theme_bw() +
  
  labs(x = "Arm Length (mm)") + 
  # No padding below the y-axis minimum, so the bars sit on the x-axis;
  # keep 5% padding above the maximum
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
  
gg_arm
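ggplot2 reads a four-number `expand` vector as (lower multiplier, lower addition, upper multiplier, upper addition), so `c(0, 0, 0.05, 0)` means no padding below the bars and 5% headroom above them. A sketch, assuming ggplot2 3.3.0 or later, of the `expansion()` helper, which builds the same vector from named arguments (`expand_y` and `y_scale` are hypothetical names):

```r
library(ggplot2)

# expansion() returns c(mult_lower, add_lower, mult_upper, add_upper)
expand_y <- expansion(mult = c(0, 0.05))
expand_y   # same values as c(0, 0, 0.05, 0)

# Equivalent y-scale for the blank graph
y_scale <- scale_y_continuous(expand = expand_y)
```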

Part 2A ii) Histogram

Create and save a histogram named gg_arm_hist with:

  • Bars colored with “seagreen”

  • A black outline for each bar

  • Each bin 10 millimeters wide

gg_arm_hist <- 
  gg_arm +
  geom_histogram(
    color = "black",
    binwidth = 10,
    fill = "seagreen"
  )

gg_arm_hist

Part 2A iii) Density Plot

Create and save a density plot as gg_arm_den with the region under the curve shaded “seagreen” and partly transparent.

gg_arm_den <- 
  gg_arm +
  geom_density(
    fill = "seagreen",
    alpha = 0.75
  )

gg_arm_den
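If it helps to compare the two shapes directly, the histogram can be rescaled to the density scale with `after_stat(density)` (ggplot2 3.3.0 or later) and overlaid with the density curve. A sketch on simulated stand-in data, since `bones` is not included here; `toy_arm` and `gg_overlay` are hypothetical names, the mean is loosely based on the value reported in Part 2C, and the standard deviation is an arbitrary choice:

```r
library(ggplot2)

# Simulated stand-in for bones$arm (illustrative only)
set.seed(1)
toy_arm <- data.frame(arm = rnorm(n = 500, mean = 537, sd = 40))

# Histogram on the density scale, so both layers share one y-axis
gg_overlay <-
  ggplot(data = toy_arm, mapping = aes(x = arm)) +
  geom_histogram(
    mapping = aes(y = after_stat(density)),
    binwidth = 10,
    fill = "seagreen",
    color = "black"
  ) +
  geom_density() +
  theme_bw() +
  labs(x = "Arm Length (mm)", y = "Density")

gg_overlay
```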

Part 2B: Arm Shape

  • Using the graphs created in part 2A, describe any important features of the arm variable. Arm length has one peak (unimodal) and is approximately symmetric.

  • How do you expect the mean and median to compare to each other? Since the distribution is approximately symmetric, the mean and median of arm length should be similar.
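The reasoning above can be checked with a quick simulation that does not use the bones data: a symmetric sample has nearly equal mean and median, while a right-skewed sample has its mean pulled toward the long tail. The distributions, seed, and sample sizes are arbitrary choices for illustration.

```r
set.seed(2024)
symmetric  <- rnorm(n = 10000, mean = 0, sd = 1)   # symmetric around 0
right_skew <- rexp(n = 10000, rate = 1)            # long right tail

# Symmetric: mean and median nearly equal
c(mean = mean(symmetric), median = median(symmetric))

# Right-skewed: mean > median (in theory 1 versus log(2), about 0.69)
c(mean = mean(right_skew), median = median(right_skew))
```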

Part 2C: Measures of Center

Calculate the mean and median of arm.

# Mean and median of arm length, ignoring the missing values
c(
  "mean"   = mean(x = bones$arm, na.rm = TRUE),
  "median" = median(x = bones$arm, na.rm = TRUE)
)
##     mean   median 
## 537.4375 537.5000
  • Do they meet your expectations from your answers in part 2B?

Yes. The mean (537.44 mm) and median (537.50 mm) differ by less than 0.1 mm, as expected for an approximately symmetric distribution.

Question 3: Iliac Width

Part 3A: Five Number Summary

Calculate the five-number summary for iliac.

quantile(
  x = bones$iliac,
  probs = c(0, 0.25, 0.5, 0.75, 1),
  na.rm = TRUE
)
##   0%  25%  50%  75% 100% 
##  184  251  263  274  324
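`quantile()` with its default `type = 7` is only one of several quartile definitions. Base R's `fivenum()` returns Tukey's five-number summary, whose hinges are the medians of the lower and upper halves of the sorted data. On a sample as large as iliac the two should agree closely, but on small samples they can differ, as this made-up vector of 1 to 10 shows:

```r
x <- 1:10

# Default quantile() (type 7) interpolates between order statistics
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))   # 1.00 3.25 5.50 7.75 10.00

# fivenum() uses Tukey's hinges: median of 1..5 is 3, median of 6..10 is 8
fivenum(x)                                       # 1.0 3.0 5.5 8.0 10.0
```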

Part 3B: Boxplot for Iliac

Create a boxplot for iliac using ggplot so that it matches the plot in Brightspace. The color of the box can be approximate.

ggplot(data = bones,
       mapping = aes(x = iliac)) + 
  
  geom_boxplot(fill = "orchid") + 
  
  theme_light() + 
  
  labs(x = "Width of Iliac (mm)",
       title = "Iliac Width Boxplot") +

  # Remove the y-axis breaks and labels, which carry no information for a single boxplot
  scale_y_continuous(breaks = NULL)

Are there any unusually narrow or wide iliacs? Explain your answer!

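Yes. The boxplot shows individual points beyond both whiskers. ggplot2 draws a point for any observation more than 1.5 × IQR below the first quartile or above the third quartile, so the data contain both unusually narrow and unusually wide iliacs.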
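The same 1.5 × IQR rule can be applied by hand to the five-number summary from Part 3A. `geom_boxplot()` computes its quartiles with `quantile()`'s default method and uses whiskers of 1.5 × IQR by default (its `coef` argument), so these fences should match the plotted ones. The variable names are hypothetical, and the last comment sketches how the points could be counted with the full data.

```r
# Quartiles of iliac from the five-number summary in Part 3A
q1  <- 251
q3  <- 274
iqr <- q3 - q1                   # 23

lower_fence <- q1 - 1.5 * iqr    # 216.5
upper_fence <- q3 + 1.5 * iqr    # 308.5

# Minimum (184) and maximum (324) from the same summary fall outside the fences
c(narrow_outlier = 184 < lower_fence,
  wide_outlier   = 324 > upper_fence)   # both TRUE

# With the full data set (not run here):
# sum(bones$iliac < lower_fence | bones$iliac > upper_fence, na.rm = TRUE)
```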

Part 3C: Iliac by Age

Create boxplots, matching those in Brightspace, that compare iliac width across the age ranges.

ggplot(
  data = bones,
  mapping = aes(
    y = iliac,
    x = age
  )
) + 
  
  geom_boxplot(
    fill = "steelblue"
  ) + 
  
  labs(
    y = "Width of Iliac",
    x = NULL
  ) + 
  
  theme_light()
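To put numbers behind the visual comparison, a sketch that computes the median iliac width for each age range with base R `aggregate()`. It runs on a small invented data frame (`toy_iliac`, a hypothetical name) that reuses the bones column names; with the real data, `data = bones` would take its place. The formula interface drops rows with a missing iliac by default.

```r
# Hypothetical toy data with the same column names as bones (values invented)
toy_iliac <- data.frame(
  age   = c("20-29", "20-29", "30-39", "30-39", "30-39"),
  iliac = c(250, 260, 265, NA, 275)
)

# Median iliac width per age range (the NA row is dropped)
med_by_age <- aggregate(iliac ~ age, data = toy_iliac, FUN = median)
med_by_age   # 20-29: 255, 30-39: 270
```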

Part 3D: Iliac by sex and age

Create horizontal boxplots for iliac with the age ranges on the y-axis and the boxes filled by sex. Describe any associations you notice in the boxplots.

ggplot(
  data = bones,
  mapping = aes(
    y = iliac,
    fill = sex,
    x = age
  )
) + 
  
  geom_boxplot() + 
  
  labs(
    y = "Width of Iliac",
    x = NULL,
    fill = NULL
  ) + 
  
  theme_bw() + 
  # Swap the axes so the age ranges run along the y-axis
  coord_flip()
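Since ggplot2 3.3.0, `geom_boxplot()` can infer a horizontal orientation when the discrete variable is mapped to `y`, so `coord_flip()` is not strictly needed. A sketch with a small invented data frame that reuses the bones column names (`toy_iliac` and `gg_iliac_h` are hypothetical names):

```r
library(ggplot2)

# Hypothetical toy data with the same column names as bones (values invented)
toy_iliac <- data.frame(
  sex   = rep(c("Female", "Male"), times = 4),
  age   = rep(c("20-29", "30-39"), each = 4),
  iliac = c(250, 265, 255, 270, 258, 272, 262, 280)
)

# Map age to y directly: boxes are drawn horizontally without coord_flip()
gg_iliac_h <-
  ggplot(data = toy_iliac, mapping = aes(x = iliac, y = age, fill = sex)) +
  geom_boxplot() +
  labs(x = "Width of Iliac", y = NULL, fill = NULL) +
  theme_bw()

gg_iliac_h
```

With this mapping the axis labels follow the visible axes (`x` is the horizontal axis), which can be easier to reason about than relabeling before a flip.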