Introduction

This report explores whether the height, weight and level of physical activity of men and women aged 26-45 depend on gender. Firstly, a linear regression model is used to investigate whether a linear relationship exists between height and weight. Secondly, a t-test is employed to investigate whether gender affects height, by comparing the mean heights of males and females. Lastly, a chi-squared test is used to investigate whether there is any association between gender and the amount of physical activity. By examining these three research questions, the relationships between gender and height, weight and physical activity will be revealed.

Data

The data set used is found in “A4.csv”, which contains 1000 observations of men and women aged 26-45. The data set is provided in csv format and is read with the code below. The information within the data set is categorized into 5 variables: ID, Gender, Height, Weight and Physical Activity. These are extracted from the csv file and saved as variables with appropriate names through the code below. The variables Height and Weight are arrays of numerical values, while ID, Gender and Physical Activity are string variables that have been assigned numerical values. Lastly, the kable function (from the knitr package) is used to incorporate a table.

Reading the file and saving variables:

file_name <- "A4.csv" 
file <- read.csv(file_name)


#Setting up the variables from the csv file (assuming that there are going to be 5 columns in the data set)
id <- file[,1]
name_id <- names(file[1])
gender <- file[,2]
name_gender <- names(file[2])
height <- file[,3]
name_height <- names(file[3])
weight <- file[,4]
name_weight <- names(file[4])
phys <- file[,5]
name_phys <- names(file[5])

The code above reads the data set, extracts the relevant variables and stores them with appropriate names.
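
Before any analysis it can help to confirm that the file was read as expected. The sketch below is a minimal check run on a small stand-in for A4.csv. The data frame `demo`, its values and its column names (id, gender, height, weight, phys) are hypothetical and are not taken from the real file.

```r
# Hypothetical stand-in for A4.csv (made-up values, not real data)
demo <- data.frame(
  id     = 1:6,
  gender = c("Male", "Female", "Male", "Female", "Male", "Female"),
  height = c(178, 165, 182, 160, 175, 168),
  weight = c(80, 62, 88, 55, 77, 64),
  phys   = c("High", "Low", "Moderate", "High", "Low", "Moderate")
)

# Check the dimensions: rows are observations, columns are variables
n_obs  <- nrow(demo)
n_vars <- ncol(demo)

# Check that the measurement columns were read as numbers
numeric_ok <- is.numeric(demo$height) && is.numeric(demo$weight)

# Count missing values per column and observations per gender
missing_counts <- colSums(is.na(demo))
gender_counts  <- table(demo$gender)

print(c(observations = n_obs, variables = n_vars))
print(numeric_ok)
print(missing_counts)
print(gender_counts)
```

Applied to the real data, the same calls would be made on `file` instead of `demo`.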

Relationship Between Height and Weight

The relationship between height and weight is investigated with the aid of linear regression. After exploring the data set, it was evident that if any relationship between height and weight existed it would be linear; hence, a linear regression was chosen. To conduct a linear regression on the variables Height and Weight, a null hypothesis about the slope of the regression is first established. The p-value for the slope is then taken from the t-test that the linear regression produces for that coefficient. This method has been implemented in the R code below and designed so that the hypothesis, decision and conclusion can be called upon separately and are outputted automatically.

Conducting a Linear Regression on Height And Weight:

# Stating the Null Hypothesis and Alternative Hypothesis
hypothesis1 <- paste("For the data set ", file_name, " the null hypothesis is that the slope is 0 while the alternative hypothesis is that the slope is different from 0", sep= "")

#Fitting the variables of height and weight to a linear regression
fit <- lm(height ~ weight)

# Using the t-test on the slope reported by the regression summary
# (row 2 is the slope for weight, column 4 is its two-sided p-value)
slope_p_value <- summary(fit)$coefficients[2, 4]

#Creating an if and else statement that uses the p-value and accordingly outputs result and conclusion
if (slope_p_value>=0.05) {
  decision1 <- paste("Do not reject null hypothesis as we have a p-value of ", round(slope_p_value,4), sep= "")
  conclusion1 <- paste("At a 5% significance level there is no evidence that slope is different than 0. This signifies that there is no significant linear relationship between ", name_height, " and ", name_weight, ".", sep="")
} else {
  decision1 <- paste("Reject null hypothesis as we have a p-value of ", round(slope_p_value,4), sep= "")
  conclusion1 <- paste("At a 5% significance level there is evidence that slope is different than 0. This signifies that there is a significant linear relationship between ", name_height, " and ", name_weight, ".", sep="")
}

The investigation of the relationship between Height and Weight uses the following null hypothesis.

print(hypothesis1)
## [1] "For the data set A4.csv the null hypothesis is that the slope is 0 while the alternative hypothesis is that the slope is different from 0"

Results:

After running the regression, the automatically produced result below states whether we reject or fail to reject the null hypothesis. A graph is also included so that any linear relationship can be depicted visually.

print(decision1)
## [1] "Reject null hypothesis as we have a p-value of 0"
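
A scatter plot with the fitted line overlaid is one way to draw such a graph. The sketch below uses simulated values in place of A4.csv (the vectors `sim_weight` and `sim_height` are hypothetical), so it illustrates the plotting calls rather than reproducing the report's figure.

```r
# Simulated stand-in data (hypothetical; not the values in A4.csv)
set.seed(42)
sim_weight <- rnorm(100, mean = 75, sd = 10)
sim_height <- 120 + 0.7 * sim_weight + rnorm(100, sd = 5)

# Fit height on weight, matching the direction used in the report
sim_fit <- lm(sim_height ~ sim_weight)

# Scatter plot of the observations with the fitted regression line
plot(sim_weight, sim_height,
     xlab = "Weight", ylab = "Height",
     main = "Relationship between Height and Weight")
abline(sim_fit, col = "red", lwd = 2)
```

With the real data, `plot(weight, height)` followed by `abline(fit)` would draw the same kind of figure.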

Figure: Relationship between Height and Weight

Conclusion:

Lastly, the conclusion on whether a linear relationship exists between Height and Weight is given below.

print(conclusion1)
## [1] "At a 5% significance level there is evidence that slope is different than 0. This signifies that there is a significant linear relationship between height and weight."
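
A p-value printed as 0 is a product of rounding, since a p-value from a t-test is never exactly zero. The sketch below, run on simulated data (the names `sim_weight`, `sim_height` and `sim_fit` are hypothetical), shows one way to read the slope's p-value from a fitted model and to report a very small value without rounding it to 0.

```r
# Simulated stand-in data (hypothetical; not the values in A4.csv)
set.seed(123)
sim_weight <- rnorm(200, mean = 75, sd = 10)
sim_height <- 120 + 0.7 * sim_weight + rnorm(200, sd = 5)

sim_fit <- lm(sim_height ~ sim_weight)

# Coefficient table: one row per term, with columns
# Estimate, Std. Error, t value and Pr(>|t|)
coef_table <- summary(sim_fit)$coefficients

# The slope is the row for the predictor; its p-value tests slope = 0
slope_estimate <- coef_table["sim_weight", "Estimate"]
slope_p        <- coef_table["sim_weight", "Pr(>|t|)"]

# round() turns a tiny p-value into 0, which reads as "exactly zero";
# format.pval() reports it as being below a threshold instead
rounded_p   <- round(slope_p, 4)
formatted_p <- format.pval(slope_p, digits = 4, eps = 1e-4)

print(rounded_p)
print(formatted_p)
```

Reporting the p-value as below a threshold keeps the decision the same while avoiding the impression that it is exactly zero.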

Exploring Mean Heights of Males and Females

As a linear relationship has been established between the variables Height and Weight, we now investigate whether height differs between males and females. For ease of analysis only the means are considered. To analyze the mean height for males and females, two separate variables called maleHeight and femaleHeight are first created from the Height variable, containing the male and female heights respectively. A two-sample t-test is then used to analyze whether the mean height for males and females is the same. The p-value extracted from the t-test shows whether the mean heights of the two genders differ. This method has been implemented in the R code below and designed so that the hypothesis, decision and conclusion can be called upon separately and are outputted automatically.

Method:

# Splitting the heights by gender using the vectors extracted earlier
maleHeight <- height[gender == "Male"]
femaleHeight <- height[gender == "Female"]
t_test2 <- t.test(femaleHeight, maleHeight, var.equal = TRUE)

# Stating the Null Hypothesis and Alternative Hypothesis
hypothesis2 <- paste("For the data set ", file_name, " the null hypothesis is that the difference between means is zero while the alternative hypothesis is that the difference between means is not equal to zero", sep= "")

#Creating an if and else statement that extracts p-value and accordingly outputs result and conclusion
if (t_test2$p.value>=0.05) {
  decision2 <- paste("Do not reject null hypothesis as we have a p-value of ", round(t_test2$p.value,4), sep= "")
  conclusion2 <- paste("At a 5% significance level there is no evidence that the difference between means is different from 0. This signifies that the mean heights of males and females do not differ significantly")
} else {
  decision2 <- paste("Reject null hypothesis as we have a p-value of ", round(t_test2$p.value,4), sep= "")
  conclusion2 <- paste("At a 5% significance level there is evidence that the difference between means is different from 0. This signifies that the mean heights of males and females are not the same")
}

To explore whether the mean height is the same for both genders, the comparison of maleHeight and femaleHeight uses the following null hypothesis.

print(hypothesis2)
## [1] "For the data set A4.csv the null hypothesis is that the difference between means is zero while the alternative hypothesis is that the difference between means is not equal to zero"

Results:

After running the t-test, the automatically produced result below states whether we reject or fail to reject the null hypothesis.

print(decision2)
## [1] "Reject null hypothesis as we have a p-value of 0"

Conclusion:

Lastly, the conclusion on whether the mean height differs between the genders is given below.

print(conclusion2)
## [1] "At a 5% significance level there is evidence that the difference between means is different from 0. This signifies that the mean heights of males and females are not the same"
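
The t-test in this section is run with var.equal = TRUE, which pools the two groups' variances. As a hedged illustration of what that choice means, the sketch below compares the pooled test with Welch's test (R's default when var.equal is not set) on simulated heights. The vectors `sim_male` and `sim_female` and their parameters are hypothetical and are not drawn from A4.csv.

```r
# Simulated stand-in data (hypothetical; not the values in A4.csv)
set.seed(7)
sim_male   <- rnorm(60, mean = 178, sd = 7)
sim_female <- rnorm(60, mean = 165, sd = 6)

# Pooled two-sample t-test: assumes both groups share one variance
pooled <- t.test(sim_female, sim_male, var.equal = TRUE)

# Welch two-sample t-test: no equal-variance assumption
welch <- t.test(sim_female, sim_male)

# Ratio of sample variances as a rough check on the assumption
variance_ratio <- var(sim_male) / var(sim_female)

print(c(pooled = pooled$p.value, welch = welch$p.value))
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
print(variance_ratio)
```

When the two sample variances are similar, the two tests give nearly the same p-value. When they differ noticeably, Welch's test is usually the safer choice.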

Concluding Remarks

This report explored whether the height, weight and physical activity levels of men and women aged 26-45 depend on gender. Firstly, it was revealed that there is indeed a linear relationship between height and weight. Secondly, through a t-test we determined that the mean height for males and females differs. Combining our first two findings, we concluded that height and weight are dependent on gender. Lastly, a chi-squared test was used to draw the conclusion that physical activity is also dependent on gender. Thus, we can say that height, weight and level of physical activity are dependent on gender.