This report explores whether the height, weight and level of physical activity of men and women aged 26-45 depend on gender. Firstly, a linear regression model is used to investigate whether a linear relationship exists between height and weight. Secondly, a t-test is employed to investigate whether gender impacts height by comparing the mean heights of males and females. Lastly, a chi-squared test is used to investigate whether there is any association between gender and the amount of physical activity. By examining these three research questions, the report assesses how height, weight and level of physical activity relate to gender.
The data set used is found in “A4.csv”, which contains 1000 observations of men and women aged 26-45. The data set has been made available in csv format and is accessed with the code below. The information within the data set is organised into 5 variables: ID, Gender, Height, Weight and Physical Activity. These variables are extracted from the csv file and saved under appropriate names through the code below. The variables Height and Weight are arrays of numerical values, while ID, Gender and Physical Activity are string variables that have been assigned numerical values. Lastly, the kable function from the knitr package is used to present a table.
file_name <- "A4.csv"
file <- read.csv(file_name)
#Setting up the variables from the csv file (assuming that there are going to be 5 columns in the data set)
id <- file[,1]
name_id <- names(file[1])
gender <- file[,2]
name_gender <- names(file[2])
height <- file[,3]
name_height <- names(file[3])
weight <- file[,4]
name_weight <- names(file[4])
phys <- file[,5]
name_phys <- names(file[5])
From the above code we have accessed the data set, extracted the relevant variables and stored them with appropriate names.
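As an illustration of the loading step, the sketch below reads a small CSV and extracts the columns by name rather than by position, which avoids relying on the column order. It is a minimal sketch only: the four rows are invented, the `demo_` names are hypothetical, and the column name `id` is an assumption (the names `gender`, `height`, `weight` and `phys` follow the code used in this report).

```r
# Hypothetical miniature of the data set; all values are invented for illustration
demo <- data.frame(
  id     = 1:4,                                   # column name "id" is an assumption
  gender = c("Male", "Female", "Female", "Male"),
  height = c(180.2, 165.4, 170.1, 175.8),
  weight = c(82.5, 60.3, 65.0, 77.1),
  phys   = c("None", "Moderate", "Intense", "Moderate")
)

# Write the demo data to a temporary csv file and read it back
demo_path <- tempfile(fileext = ".csv")
write.csv(demo, demo_path, row.names = FALSE)
demo_file <- read.csv(demo_path, stringsAsFactors = FALSE)

# Check that every expected column is present before extracting anything
expected_cols <- c("id", "gender", "height", "weight", "phys")
stopifnot(all(expected_cols %in% names(demo_file)))

# Extract by name; the categorical columns are converted to factors
demo_height <- demo_file[["height"]]
demo_weight <- demo_file[["weight"]]
demo_gender <- factor(demo_file[["gender"]])
demo_phys   <- factor(demo_file[["phys"]])
```

Extracting by name means the analysis fails early with a clear error if a column is missing, instead of silently using the wrong column.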
The relationship between height and weight is investigated with the aid of linear regression. After exploring the data set, it was evident that if any relationship between height and weight existed it would be linear; hence, a linear regression was chosen. To conduct a linear regression on the variables Height and Weight, a null hypothesis about the slope of the regression is first established. The p-value of the t-test on the slope coefficient is then extracted from the regression output. This method has been implemented in the R code below and designed so that the hypothesis, decision and conclusion can be called upon separately and are output automatically.
# Stating the Null Hypothesis and Alternative Hypothesis
hypothesis1 <- paste("For the data set ", file_name, " the null hypothesis is that the slope is 0 while the alternative hypothesis states that the slope is different from 0", sep= "")
#Fitting the variables of height and weight to a linear regression
fit <- lm(height ~ weight)
# Extracting the p-value of the t-test on the slope coefficient from the regression summary
slope_p <- summary(fit)$coefficients["weight", "Pr(>|t|)"]
#Creating an if and else statement that uses the p-value and accordingly outputs result and conclusion
if (slope_p>=0.05) {
  decision1 <- paste("Do not reject null hypothesis as we have a p-value of ", round(slope_p,4), sep= "")
  conclusion1 <- paste("At a 5% significance level there is no evidence that slope is different than 0. This signifies that there is no significant linear relationship between ", name_height, " and ", name_weight, ".", sep="")
} else {
  decision1 <- paste("Reject null hypothesis as we have a p-value of ", round(slope_p,4), sep= "")
  conclusion1 <- paste("At a 5% significance level there is evidence that slope is different than 0. This signifies that there is a significant linear relationship between ", name_height, " and ", name_weight, ".", sep="")
}
In the investigation of the relationship between Height and Weight, the following null hypothesis is used.
print(hypothesis1)
## [1] "For the data set A4.csv the null hypothesis is that the slope is 0 while the alternative hypothesis states that the slope is different from 0"
After running the regression, the automatically produced decision on whether to reject the null hypothesis is shown below. In addition, a graph is included so that any linear relationship can be depicted visually.
print(decision1)
## [1] "Reject null hypothesis as we have a p-value of 0"
Relationship between Height and Weight
Lastly, the conclusion on whether a linear relationship exists between the variables Height and Weight is given below.
print(conclusion1)
## [1] "At a 5% significance level there is evidence that slope is different than 0. This signifies that there is a significant linear relationship between height and weight."
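To make the slope test concrete, the sketch below shows how the p-value for a regression slope can be recomputed from the slope estimate and its standard error. It runs on simulated data (not the A4.csv values), and all `sim_` names and the simulation parameters are assumptions chosen for illustration.

```r
# Simulated data only; these are not the A4.csv observations
set.seed(1)
n <- 200
sim_weight <- rnorm(n, mean = 75, sd = 12)
sim_height <- 150 + 0.3 * sim_weight + rnorm(n, sd = 6)

# Fit the regression and pull out the coefficient table
sim_fit <- lm(sim_height ~ sim_weight)
coefs <- summary(sim_fit)$coefficients

# The slope t-statistic is the estimate divided by its standard error
slope_est <- coefs["sim_weight", "Estimate"]
slope_se  <- coefs["sim_weight", "Std. Error"]
t_manual  <- slope_est / slope_se

# Two-sided p-value on the residual degrees of freedom (n - 2 here)
p_manual <- 2 * pt(abs(t_manual), df = df.residual(sim_fit), lower.tail = FALSE)

# The same quantities as reported by summary()
t_reported <- coefs["sim_weight", "t value"]
p_reported <- coefs["sim_weight", "Pr(>|t|)"]
```

The manually computed values agree with the `t value` and `Pr(>|t|)` columns of the summary table, which is the test of the null hypothesis that the slope is 0.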
As a linear relationship has been established between the variables Height and Weight, we now investigate whether height differs between males and females. For ease of analysis only the means are considered. To analyze the mean height for males and females, two separate variables called maleHeight and femaleHeight, containing the male and female heights respectively, are first created from the Height variable. A two-sample t-test assuming equal variances is then used to test whether the mean height for males and females is the same. The p-value extracted from the t-test indicates whether the mean heights of the genders differ. This method has been implemented in the R code below and designed so that the hypothesis, decision and conclusion can be called upon separately and are output automatically.
# Splitting the heights by gender
maleHeight <- file$height[file$gender=="Male"]
femaleHeight <- file$height[file$gender=="Female"]
# Two-sample t-test assuming equal variances in the two groups
t_test2 <- t.test(femaleHeight, maleHeight, var.equal = TRUE)
# Stating the Null Hypothesis and Alternative Hypothesis
hypothesis2 <- paste("For the data set ", file_name, " the null hypothesis is that the difference between means is zero while the alternative hypothesis states that the difference between means is not equal to zero", sep= "")
#Creating an if and else statement that extracts p-value and accordingly outputs result and conclusion
if (t_test2$p.value>=0.05) {
  decision2 <- paste("Do not reject null hypothesis as we have a p-value of ", round(t_test2$p.value,4), sep= "")
  conclusion2 <- paste("At a 5% significance level there is no evidence that the difference between means is different from 0. This signifies that the mean heights of males and females are not significantly different")
} else {
  decision2 <- paste("Reject null hypothesis as we have a p-value of ", round(t_test2$p.value,4), sep= "")
  conclusion2 <- paste("At a 5% significance level there is evidence that the difference between means is different from 0. This signifies that the mean heights of males and females are not the same")
}
In the investigation of the variables maleHeight and femaleHeight, exploring whether the mean height of the two genders is the same, the following null hypothesis has been used.
print(hypothesis2)
## [1] "For the data set A4.csv the null hypothesis is that the difference between means is zero while the alternative hypothesis states that the difference between means is not equal to zero"
After running the t-test, the automatically produced decision on whether to reject the null hypothesis is shown below.
print(decision2)
## [1] "Reject null hypothesis as we have a p-value of 0"
Lastly, the conclusion on whether the mean height differs between the genders is given below.
print(conclusion2)
## [1] "At a 5% significance level there is evidence that the difference between means is different from 0. This signifies that the mean heights of males and females are not the same"
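The equal-variance t-test used above can be reproduced by hand from the two group means and a pooled variance. The sketch below does this on simulated heights (not the A4.csv values); the `sim_` names, group sizes and distribution parameters are assumptions made for illustration. It also runs the Welch version, which is what `t.test` does by default and which may be preferable when the group variances are not clearly equal.

```r
# Simulated heights only; these are not the A4.csv observations
set.seed(2)
sim_male   <- rnorm(60, mean = 178, sd = 7)
sim_female <- rnorm(55, mean = 165, sd = 6)

n_f <- length(sim_female)
n_m <- length(sim_male)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_f - 1) * var(sim_female) + (n_m - 1) * var(sim_male)) /
  (n_f + n_m - 2)

# t-statistic for the difference in means and its two-sided p-value
t_manual <- (mean(sim_female) - mean(sim_male)) /
  sqrt(pooled_var * (1 / n_f + 1 / n_m))
p_manual <- 2 * pt(abs(t_manual), df = n_f + n_m - 2, lower.tail = FALSE)

# Built-in equal-variance test (as used in this report) and the Welch alternative
pooled_test <- t.test(sim_female, sim_male, var.equal = TRUE)
welch_test  <- t.test(sim_female, sim_male)
```

Comparing `pooled_test$p.value` with `welch_test$p.value` is a quick way to see whether the equal-variance assumption changes the decision.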
The first two findings show that height is linearly related to weight and that mean height differs by gender, which suggests that weight may also vary with gender. We now investigate whether there exists any association between gender and amount of physical activity. As both variables being explored, Gender and Physical Activity, are categorical, the method used to investigate them differs significantly from the methods implemented above. Firstly, a table was created for ease of analysis, which can be seen below:
data <- table(file$phys, file$gender)
knitr::kable(data, caption = "Number of males and females participating in each level of physical activity")
| | Female | Male |
|---|---|---|
| Intense | 130 | 133 |
| Moderate | 268 | 226 |
| None | 109 | 134 |
After extracting the above table from the data set, the counts were fed into a chi-squared test in order to analyse whether there is a link between gender and level of physical activity. A chi-squared test has been chosen as it allows us to determine whether Physical Activity and Gender are dependent or independent variables. The method discussed has been implemented in the R code below.
chi_test <- chisq.test(data)
# Stating the Null Hypothesis and Alternative Hypothesis
hypothesis3 <- paste("For the dataset ", file_name, " the null hypothesis is that gender and level of physical activity are independent of each other while the alternative hypothesis is that the levels of physical activity are dependent on gender", sep= "")
#Creating an if and else statement that extracts p-value and accordingly outputs result and conclusion
if (chi_test$p.value>=0.05) {
  decision3 <- paste("Do not reject NULL hypothesis as we have a p-value of ", round(chi_test$p.value,4), sep= "")
  conclusion3 <- paste("At a 5% significance level there is no evidence that gender and levels of physical activity are dependent on each other. Meaning there is no evidence of an association between gender and level of physical activity")
} else {
  decision3 <- paste("Reject NULL hypothesis as we have a p-value of ", round(chi_test$p.value,4), sep= "")
  conclusion3 <- paste("At a 5% significance level there is evidence that gender and levels of physical activity are dependent on each other. Meaning that there is an association between gender and level of physical activity")
}
In the investigation into any association between the variables Gender and Physical Activity, the following null hypothesis has been used.
print(hypothesis3)
## [1] "For the dataset A4.csv the null hypothesis is that gender and level of physical activity are independent of each other while the alternative hypothesis is that the levels of physical activity are dependent on gender"
After running the chi-squared test, the automatically produced decision on whether to reject the null hypothesis is shown below.
print(decision3)
## [1] "Do not reject NULL hypothesis as we have a p-value of 0.0502"
Lastly, the conclusion on whether there is an association between gender and level of physical activity is given below.
print(conclusion3)
## [1] "At a 5% significance level there is no evidence that gender and levels of physical activity are dependent on each other. Meaning there is no evidence of an association between gender and level of physical activity"
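The chi-squared result can be checked by hand from the counts in the table above. The sketch below, which assumes the six counts shown in that table, computes the expected counts under independence, the test statistic and the p-value, and compares them with `chisq.test`. The names `counts`, `expected_counts`, `chi_manual` and `chi_builtin` are introduced here for illustration only.

```r
# Observed counts copied from the table (rows: activity level, columns: gender)
counts <- matrix(
  c(130, 268, 109,   # Female: Intense, Moderate, None
    133, 226, 134),  # Male:   Intense, Moderate, None
  nrow = 3,
  dimnames = list(Activity = c("Intense", "Moderate", "None"),
                  Gender   = c("Female", "Male"))
)

# Expected count for each cell under independence: row total * column total / grand total
expected_counts <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic and its degrees of freedom
chi_manual <- sum((counts - expected_counts)^2 / expected_counts)
chi_df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-squared distribution
p_manual <- pchisq(chi_manual, df = chi_df, lower.tail = FALSE)

# Built-in test for comparison (no continuity correction applies to a 3 x 2 table)
chi_builtin <- chisq.test(counts)
```

With these counts the statistic is about 5.98 on 2 degrees of freedom, giving a p-value of about 0.0502. This agrees with the decision printed above and sits only just above the 5% threshold.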
This report explored whether the height, weight and level of physical activity of men and women aged 26-45 depend on gender. Firstly, it was revealed that there is indeed a linear relationship between height and weight. Secondly, through a t-test we were able to determine that mean height differs between males and females. Taken together, these first two findings suggest that height, and through it weight, vary with gender. Lastly, the chi-squared test gave a p-value of 0.0502, so at a 5% significance level there is no evidence that level of physical activity depends on gender. Thus, height differs by gender and is linearly related to weight, while no significant association between gender and level of physical activity was found.