Question 1 (Preparation) [2 marks] (a) Which R functions would you use to produce the following plots of numeric data?

• Histogram = hist()

• Box plot = boxplot()

• Scatter plot = plot()
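The three plotting functions can be tried out on R's built-in `mtcars` dataset. This is an assumed stand-in chosen only for illustration and is not part of this assignment's data:

```r
# Illustration only: the base-R plotting functions from Question 1(a),
# applied to the built-in mtcars dataset (a stand-in, not assignment data)

# Histogram of one numeric variable
hist(mtcars$mpg, main = "Histogram of fuel economy", xlab = "Miles per gallon")

# Box plot of the same variable
boxplot(mtcars$mpg, main = "Box plot of fuel economy", ylab = "Miles per gallon")

# Scatter plot of two numeric variables (x first, then y)
plot(mtcars$wt, mtcars$mpg, main = "Fuel economy vs weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Both `hist()` and `boxplot()` also accept `plot = FALSE`, which returns the computed bins or box statistics without drawing anything.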

(b) Which R function can you use to calculate the sample standard deviation of a numeric variable?

sd()
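`sd()` computes the *sample* standard deviation, which divides by n − 1: s = sqrt(Σ(x − x̄)² / (n − 1)). The short check below compares `sd()` with that formula written out by hand. The vector `x` is made-up example data:

```r
# Made-up example values (not from the assignment)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Built-in sample standard deviation
s_builtin <- sd(x)

# Same quantity from the formula, using the n - 1 denominator
s_manual <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

all.equal(s_builtin, s_manual)   # TRUE
```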

Question 2 [2 marks] The following data are odontoblast measurements (in microns) for ten guinea pigs after having orange juice for 42 days: 22.5 26.4 22.4 24.5 24.8 30.9 26.4 27.3 29.4 23.0. Load the data into a variable. (a) Calculate the sample mean, 𝑥̅, of the odontoblast measurements. Show working in R. (b) Calculate the sample median, 𝑚, of the odontoblast measurements. Show working in R.

guineaPigs = c(22.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)
mean(guineaPigs)
## [1] 25.76
median(guineaPigs)
## [1] 25.6
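The same two answers can also be worked out step by step, which makes the working explicit. The mean is the sum divided by the count. The median here is the average of the two middle sorted values, and that shortcut only applies because n = 10 is even:

```r
guineaPigs <- c(22.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0)

# Mean: add up the measurements and divide by how many there are
n <- length(guineaPigs)          # 10
sum(guineaPigs) / n              # 257.6 / 10 = 25.76

# Median: sort, then average the two middle values (n is even here)
sorted <- sort(guineaPigs)       # 22.4 22.5 23.0 24.5 24.8 26.4 26.4 27.3 29.4 30.9
(sorted[n / 2] + sorted[n / 2 + 1]) / 2   # (24.8 + 26.4) / 2 = 25.6
```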

Question 3 [2 marks] The following is the R output from summarising the gross weekly income of New Zealanders who live in Wellington and are 15 years old or older.

     income
 Min.   :-3551.0
 1st Qu.:  254.5
 Median :  574.0
 Mean   :  729.5
 3rd Qu.: 1001.0
 Max.   :18369.0

(a) Calculate the range of the gross weekly income variable. (b) Calculate the interquartile range of the gross weekly income variable. Show your working.

Range = 18369.0 - (-3551.0) = 21920

IQR = 1001 - 254.5 = 746.5
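Only the `summary()` output is available here, not the raw incomes, so the working in R has to start from the printed values. Those values may have been rounded for display. With the raw vector, `diff(range(x))` and `IQR(x)` would give the same quantities directly. Both `IQR()` and `summary()` default to type 7 quantiles. A minimal sketch:

```r
# Values read off the summary() output above (possibly rounded for display)
income_min <- -3551.0
income_q1  <- 254.5
income_q3  <- 1001.0
income_max <- 18369.0

range_income <- income_max - income_min   # 18369 - (-3551)
iqr_income   <- income_q3 - income_q1     # 1001 - 254.5

range_income   # 21920
iqr_income     # 746.5
```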

Question 4 [4 marks] We will again work with the penguin dataset here. Load the dataset in R. (a) Produce a histogram of the penguins’ body mass. Your histogram must include the appropriate axis label(s). (b) Use the histogram produced for Question 4(a) to describe any features of the penguins’ body mass. (c) Produce a box plot of the penguins’ body mass. Your box plot must include the appropriate axis label(s). (d) Use the box plot produced for Question 4(c) to describe any features of the penguins’ body mass.

data2 = read.csv("G:/Jeremy Andre - Uni/DATAX 121/Week 2/penguins.csv")
hist(data2$body_mass_g, main="Histogram of Penguin Body Mass Data", ylab = "Frequency", xlab = "Penguin Weight (g)")

#The histogram shows that the data are right-skewed (a longer tail towards heavier penguins), with the tallest bars, i.e. the most common body masses, between 3500 and 4000 g.
boxplot(data2$body_mass_g, main="Boxplot of Penguin Body Mass Data", ylab = "Penguin Weight (g)")

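#While the body masses span a range of 3600 g, the IQR is only 1200 g, so the middle 50% of penguins fall within a relatively narrow band. No points are drawn beyond the whiskers, so the box plot does not flag any outliers: the rest of the spread is covered by the whiskers rather than by isolated extreme values.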
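The features described above can be checked numerically. Below is a minimal sketch using a hypothetical helper, `describe_spread()`, which is not part of the assignment. It returns the range and IQR, lists the points `boxplot()` would draw beyond its whiskers, and gives the mean minus the median. A positive value there is a rough hint of right skew, not a formal test. The stand-in vector is made-up data, not penguin measurements:

```r
# Hypothetical helper: numeric summary of spread, outliers and skew
describe_spread <- function(x) {
  x <- x[!is.na(x)]                         # drop missing values first
  list(
    range = diff(range(x)),                 # max - min
    iqr = IQR(x),                           # Q3 - Q1 (type 7 quantiles)
    outliers = boxplot.stats(x)$out,        # points beyond 1.5 x hinge spread
    mean_minus_median = mean(x) - median(x) # > 0 loosely suggests right skew
  )
}

# Made-up stand-in values, not penguin data
stand_in <- c(10, 12, 13, 14, 15, 16, 40)
describe_spread(stand_in)

# With the penguin data loaded as in the answer above:
# describe_spread(data2$body_mass_g)
```

`boxplot.stats()` computes outliers from Tukey's hinges (via `fivenum()`). These can differ slightly from the quartiles reported by `IQR()` on some data sets, although they agree for the stand-in vector.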