```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Instructions to candidates.

Please type your SID here: 2213352

Below are a series of questions you need to answer based on the dataset you have been provided within Canvas.

Answer all of the questions, and then knit this to a .docx file. Canvas will not accept any other file type. If the .docx file is created outside of R, 6 marks will be deducted (1 for each question).

Issues with ‘knitting’ the document: If a code block fails to run successfully, the document will not knit. In this scenario, it is better to comment out the problematic code block using the # key at the start of each line. R will tell you at which point code fails. The document should then knit successfully.
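As an illustration of the advice above, here is a minimal sketch of commenting out a failing line so that the rest of a chunk still runs. The object names (`undefined_object`, `heights`) are hypothetical and are not part of the dataset.

```r
# This line would stop the knit because undefined_object does not exist,
# so it is disabled with a leading '#':
# result <- mean(undefined_object)

# Lines that run correctly are left unchanged.
heights <- c(160, 172, 181)   # hypothetical values, for illustration only
mean_height <- mean(heights)
print(mean_height)
```

Only the failing line needs the `#`; the rest of the chunk still runs and appears in the knitted document.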

One of the questions explicitly asks you for your GPT prompt and the resulting code. The rest should be attempted as far as possible using your notes and the help sheets provided on Canvas.

## Question 1. Data Extraction Tasks. (3 points)

a\. Load the dataset provided to you in the Canvas quiz into R and display the first six rows. (1 point)

b\. Extract all rows with the eye colour “Blue” and display the result. (1 point)

c\. Create a new dataset that only includes females with Brown eyes. (1 point)

PUT YOUR CODE IN THE BLOCK BELOW AND RUN IT.

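```{r Q1answers}
# Assumes R_workbook_2 has already been imported into the R session
# a. Display the first six rows
head(R_workbook_2)

# b. Extract all rows with blue eyes
blue_eyes <- subset(R_workbook_2, `Eye Colour` == "Blue")
print(blue_eyes)

# c. New dataset containing only females with brown eyes
females_brown <- subset(R_workbook_2, Sex == "F" & `Eye Colour` == "Brown")
print(females_brown)
```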


## Question 2. Data summary task. (2 points)

**Using the full dataset:** Calculate the following summary statistics
for height for each sex. (2 points).

a\. Mean

b\. Median

c\. Standard deviation.

PUT YOUR CODE IN THE BLOCK BELOW AND RUN IT.

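```{r Q2Answers}
# Height (cm) summary statistics for each sex; na.rm = TRUE skips missing values
mean_male <- mean(subset(R_workbook_2, Sex == "M")$`Height/cm`, na.rm = TRUE)
mean_female <- mean(subset(R_workbook_2, Sex == "F")$`Height/cm`, na.rm = TRUE)
median_male <- median(subset(R_workbook_2, Sex == "M")$`Height/cm`, na.rm = TRUE)
median_female <- median(subset(R_workbook_2, Sex == "F")$`Height/cm`, na.rm = TRUE)
sd_male <- sd(subset(R_workbook_2, Sex == "M")$`Height/cm`, na.rm = TRUE)
sd_female <- sd(subset(R_workbook_2, Sex == "F")$`Height/cm`, na.rm = TRUE)

# Print the results so they appear in the knitted document
print(c(mean_male = mean_male, mean_female = mean_female))
print(c(median_male = median_male, median_female = median_female))
print(c(sd_male = sd_male, sd_female = sd_female))
```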

## Question 3: Data Visualisation (5 points)

**Using the full dataset:**

a\. Create a bar plot of the number of individuals for each eye colour. Use purple colour bars. (2 points)

b\. Create a scatter plot of Height_cm versus Weight_kg with different colours for each sex. Use purple for males and green for females, with diamond shape symbols. (3 points)

PUT YOUR CODE IN THE BLOCK BELOW AND RUN IT.

```{r Q3answers}
library(ggplot2)

# Bar plot of the number of individuals for each eye colour
ggplot(R_workbook_2, aes(x = `Eye Colour`)) +
  geom_bar(fill = "purple") +
  labs(title = "Number of Individuals by Eye Colour", x = "Eye Colour", y = "Count") +
  theme_minimal()

# Scatter plot of height versus weight, coloured by sex (shape 18 is a filled diamond)
ggplot(R_workbook_2, aes(x = `Weight/kg`, y = `Height/cm`, color = Sex, shape = Sex)) +
  geom_point(size = 4) +
  scale_color_manual(values = c("M" = "purple", "F" = "green")) +
  scale_shape_manual(values = c("M" = 18, "F" = 18)) +
  labs(title = "Height vs. Weight by Sex", x = "Weight (kg)", y = "Height (cm)") +
  theme_minimal()
```


## Question 4. Investigating height differences. (6 points)

**Using the full dataset:**

a\. Test the difference in average height between males and females.
Assume the data are normally distributed and pick the most appropriate
test. (2 points)

PUT YOUR CODE IN THE BLOCK BELOW AND RUN IT.

```{r 4aanswer}
# Perform a t-test for height difference between males and females
t_test_result <- t.test(`Height/cm` ~ Sex, data = R_workbook_2, var.equal = TRUE)
print(t_test_result)
```

b\. Report if there is a statistically significant difference in heights. (1 point)

On average, females are 15 cm shorter than males.

**ANSWER**

c\. Determine if there is a significant difference in average height among different eye colours. Test for normality of your data and use the most appropriate test. (2 points)

PUT YOUR CODE IN THE BLOCK BELOW AND RUN IT.

```{r 4canswer}
# Shapiro-Wilk test for normality of height
shapiro_test <- shapiro.test(R_workbook_2$`Height/cm`)
print(shapiro_test)

# ANOVA test
anova_result <- aov(`Height/cm` ~ `Eye Colour`, data = R_workbook_2)
summary(anova_result)

# Kruskal-Wallis test (if data isn't normally distributed)
kruskal_result <- kruskal.test(`Height/cm` ~ `Eye Colour`, data = R_workbook_2)
print(kruskal_result)
```


d\. Is the difference between groups statistically significant? (1
point)

Yes, it is.

**ANSWER**
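One way to read significance off these tests is to compare each p-value with a chosen threshold. The sketch below is illustrative only: the data frame `toy`, its columns `height` and `eye`, and the 0.05 threshold `alpha` are assumptions, not values from the dataset. It mirrors the approach of checking normality first and then choosing between ANOVA and Kruskal-Wallis.

```r
set.seed(1)

# Hypothetical toy data: three eye-colour groups of ten simulated heights (cm)
toy <- data.frame(
  height = c(rnorm(10, 165, 6), rnorm(10, 172, 6), rnorm(10, 178, 6)),
  eye    = rep(c("Blue", "Brown", "Green"), each = 10)
)

alpha <- 0.05  # assumed significance threshold

# Shapiro-Wilk: a p-value above alpha gives no evidence against normality
looks_normal <- shapiro.test(toy$height)$p.value > alpha

if (looks_normal) {
  # One-way ANOVA; the p-value sits in the "Pr(>F)" column of the summary table
  p_value <- summary(aov(height ~ eye, data = toy))[[1]][["Pr(>F)"]][1]
} else {
  # Kruskal-Wallis rank-sum test as the non-parametric alternative
  p_value <- kruskal.test(height ~ eye, data = toy)$p.value
}

# The group difference is called significant when p_value is below alpha
significant <- p_value < alpha
print(p_value)
print(significant)
```

A written answer would normally quote the test used and its p-value alongside the yes/no conclusion.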

## Question 5: BMI Calculation (4 points)

**Using the full dataset:**

a\. Convert all heights to m and create a new column named height_m (1
point)

**USING CHAT GPT** Construct an algorithm in R to calculate Body Mass
Index (BMI) for each individual in the dataset. (2 points)

BMI is calculated as:

$BMI=\frac{weight\,in\,kg}{(height\,in\,m)^2}$
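A small worked example of the formula, using hypothetical values rather than the real dataset (the data frame `toy` is an assumption; its column names only mirror those used in this workbook):

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  `Height/cm` = c(160, 180),
  `Weight/kg` = c(64, 81),
  check.names = FALSE  # keep the "/" in the column names
)

# Convert height from centimetres to metres
toy$height_m <- toy$`Height/cm` / 100

# BMI = weight in kg divided by the square of height in m
toy$BMI <- toy$`Weight/kg` / toy$height_m^2

print(toy)
```

For the first row this is 64 / 1.6^2 = 64 / 2.56, which is 25 up to floating-point rounding.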

**PUT YOUR GPT PROMPT HERE:** I asked ChatGPT to give me R code based on
the R_workbook_2 file, and asked it to convert all heights to m, create a
new column named height_m, and construct an algorithm in R to calculate
Body Mass Index (BMI) for each individual in the dataset.

b\. Add a new column to the dataset for BMI and display the updated
dataset. (1 point)

PUT YOUR CODE FOR 5a AND 5b IN THE BLOCK BELOW AND RUN IT.

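```{r Q5answers}
# 5a: add a new column for height in metres
R_workbook_2$height_m <- R_workbook_2$`Height/cm` / 100
# 5b: calculate BMI as weight (kg) divided by height (m) squared
R_workbook_2$BMI <- R_workbook_2$`Weight/kg` / (R_workbook_2$height_m^2)
# Display the updated dataset
print(R_workbook_2)
```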

## Question 6: Regression and correlation (5 points)

**Using the full dataset:**

a\. Regress height against weight. (1 point)

b\. Plot this as a scattergraph with the following parameters:

- Filled circles, size 4. Males coloured chocolate1 and females coloured aquamarine.
- A navy line for the regression equation, thickness 3.
- The regression equation printed in the top left corner. (3 points)

c\. Calculate the Correlation and the Product Moment Correlation Coefficient for your regression. (1 point)

PUT YOUR CODE IN THE BLOCK BELOW FOR Q6a,b,c AND RUN IT.

```{r Q6answers}
# Linear regression of height against weight
height_weight_model <- lm(`Height/cm` ~ `Weight/kg`, data = R_workbook_2)
summary(height_weight_model)
```
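Parts b and c could be approached along the following lines. This is a sketch under stated assumptions, not a definitive answer: it uses base R graphics rather than ggplot2, the data frame `toy` holds hypothetical values standing in for the real dataset, and `cex = 4` and `lwd = 3` are one reading of "size 4" and "thickness 3".

```r
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  Sex         = c("M", "F", "M", "F", "M", "F"),
  `Weight/kg` = c(80, 60, 85, 55, 90, 65),
  `Height/cm` = c(180, 165, 178, 160, 185, 168),
  check.names = FALSE  # keep the "/" in the column names
)

# 6a: regress height against weight
model <- lm(`Height/cm` ~ `Weight/kg`, data = toy)
b <- coef(model)

# Text of the fitted regression equation (intercept, then slope)
eq <- sprintf("Height = %.2f + %.2f x Weight", b[1], b[2])

# 6b: filled circles (pch = 16), coloured by sex
point_cols <- ifelse(toy$Sex == "M", "chocolate1", "aquamarine")
plot(toy$`Weight/kg`, toy$`Height/cm`,
     pch = 16, cex = 4, col = point_cols,
     xlab = "Weight (kg)", ylab = "Height (cm)")
abline(model, col = "navy", lwd = 3)       # navy regression line
legend("topleft", legend = eq, bty = "n")  # equation in the top left corner

# 6c: Pearson product moment correlation with its significance test
cor_result <- cor.test(toy$`Weight/kg`, toy$`Height/cm`, method = "pearson")
print(cor_result$estimate)  # correlation coefficient r
print(cor_result$p.value)   # p-value for the test that r differs from 0
```

The p-value from `cor.test()` is what supports a "why" in part d: the correlation is conventionally called significant when it falls below the chosen threshold, often 0.05.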

d\. Is the correlation between height and weight statistically significant? Why? (1 point)

Yes.

**ANSWER:**

NOW KNIT YOUR DOCUMENT AND UPLOAD IT TO YOUR CANVAS QUIZ.