Research question:

In this report, we explore the research question: Is there a significant difference in age between Labour and Conservative voters? We will run a t-test, a one-way ANOVA and a regression analysis a) to test for an age difference between the two voting groups and b) to check whether all three analyses give the same result.

The following chunk of code loads the packages and reads in the data for the analysis:

# na.strings = "." reads the full stops in the file as NA; header = TRUE uses the first row as column names
# if you copy and paste this code, note that the "" does not always copy correctly, so retype the quotation marks if needed
library(ggplot2)
library(datasets)

df_data <- read.csv(file = "Data_W1.csv", header = TRUE, na.strings=".")
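To illustrate what na.strings does, here is a minimal sketch using a small inline CSV. The csv_text string and its three rows are made up for illustration and are not the contents of Data_W1.csv.

```r
# Hypothetical inline CSV standing in for Data_W1.csv; "." marks a missing value
csv_text <- "ID,Age,Vote
1,76,Conservative
2,.,Labour
3,18,Refused"

# With na.strings = ".", the full stop becomes NA and Age is read as integer
with_na <- read.csv(text = csv_text, header = TRUE, na.strings = ".")

# Without it, the "." stops Age from being read as a number
# (character in R 4.0 and later, factor in earlier versions)
without_na <- read.csv(text = csv_text, header = TRUE)

class(with_na$Age)
class(without_na$Age)
is.na(with_na$Age)
```

Reading the missing-value marker correctly at this stage is what allows is.na() to find the missing ages later on.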

Step 1: Cleaning the data:

First, we need to clean the data by removing participants whose age is missing and those who declined to state a vote preference:

# is.na() returns TRUE where a value is missing; ! reverses this, so the vector is TRUE where a value is present and FALSE where it is missing
# square brackets subset the data, keeping only the rows where the condition is TRUE

df_clean2 <- df_data[!is.na(df_data$Age) & df_data$Vote != "Refused",]
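As a rough sketch of how this row filter behaves, the same condition can be applied to a small made-up data frame. The toy data frame and the keep vector are hypothetical names used only for this illustration.

```r
# Hypothetical toy data, not the contents of Data_W1.csv
toy <- data.frame(
  ID   = 1:5,
  Age  = c(76, NA, 18, 33, 40),
  Vote = c("Conservative", "Labour", "Refused", "Labour", "Conservative")
)

# TRUE where Age is present AND the participant did not refuse to answer
keep <- !is.na(toy$Age) & toy$Vote != "Refused"

# Rows go before the comma; leaving the column position empty keeps all columns
toy_clean <- toy[keep, ]

print(keep)
print(toy_clean)
```

Note that the surviving rows keep their original row names, so gaps in the row numbering of a cleaned data frame simply mark the rows that were dropped.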

Step 2: Checking the variable types (classes) are defined appropriately.

A factor is categorical; we will use this for the IV in the ANOVA (in this case, vote preference). Numeric is continuous; this will be used for the DV in the ANOVA (in this case, age). Running the check shows that Age is integer and Vote is character, so we convert them (along with pet) to the appropriate types.

class(df_clean2$Age)
## [1] "integer"
class(df_clean2$Vote)
## [1] "character"
# convert the variable classes; note that the result must be assigned back to the column for the change to be saved
df_clean2$Age <- as.numeric(df_clean2$Age)
df_clean2$Vote <- as.factor(df_clean2$Vote)
df_clean2$pet <- as.factor(df_clean2$pet)

head(df_clean2)
##   ID Age         Vote pet
## 1  1  76 Conservative Yes
## 2  2  64 Conservative Yes
## 3  3  18 Conservative  No
## 5  5  33       Labour  No
## 6  6  38       Labour Yes
## 7  7  68       Labour Yes
# check that the variables are now classed correctly
class(df_clean2$Age)
## [1] "numeric"
class(df_clean2$Vote)
## [1] "factor"
class(df_clean2$pet)
## [1] "factor"

Step 3: Run the t-test analysis

t = 1.7724, df = 35.92, p = 0.0848 (not significant at the 0.05 level), so we fail to reject the null hypothesis that Conservative and Labour voters have the same mean age.

ttest <- t.test(df_clean2$Age[df_clean2$Vote == "Conservative"],
                df_clean2$Age[df_clean2$Vote == "Labour"],
                paired = FALSE)
print(ttest)
## 
##  Welch Two Sample t-test
## 
## data:  df_clean2$Age[df_clean2$Vote == "Conservative"] and df_clean2$Age[df_clean2$Vote == "Labour"]
## t = 1.7724, df = 35.922, p-value = 0.08481
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.557563 23.136510
## sample estimates:
## mean of x mean of y 
##  49.10526  38.31579

Step 4: Run One Way ANOVA:

A one-way ANOVA is not usually run when there are only two groups, but we include it here for comparison. F(1, 36) = 3.141, p = 0.0848.

test_Vote<-aov(Age~Vote,df_clean2)
summary(test_Vote)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Vote         1   1106  1105.9   3.141 0.0848 .
## Residuals   36  12674   352.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Step 5: Check that F equals t squared and compare the p-values

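Yes, confirmed: F = 3.141 and t squared = 3.141. The match is exact here because the two groups appear to be the same size, in which case the Welch statistic equals the pooled-variance statistic. The p-values agree to four decimal places (0.0848) but differ very slightly beyond that (0.08481 for the t-test, 0.08479 for the F-test), because the Welch test uses 35.922 degrees of freedom rather than 36.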

ttest$statistic^2
##        t 
## 3.141351
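The relationship between t and F can be checked on any two-group data set. The sketch below uses made-up, balanced toy data (the age and group vectors are hypothetical) and compares both the pooled-variance t-test (var.equal = TRUE) and R's default Welch t-test with a one-way ANOVA.

```r
# Hypothetical balanced toy data (two groups of five), not the survey data
age   <- c(60, 72, 45, 51, 66, 30, 41, 38, 55, 29)
group <- factor(rep(c("A", "B"), each = 5))

# Pooled-variance (Student) t-test: the version that a two-group ANOVA reproduces
t_pooled <- t.test(age ~ group, var.equal = TRUE)
# Welch t-test: R's default when var.equal is not set
t_welch <- t.test(age ~ group)

# Extract F and its p-value from the one-way ANOVA table
aov_table <- summary(aov(age ~ group))[[1]]
f_value <- aov_table[["F value"]][1]
p_value <- aov_table[["Pr(>F)"]][1]

# The pooled t squared equals F, and the p-values are identical
all.equal(unname(t_pooled$statistic)^2, f_value)
all.equal(t_pooled$p.value, p_value)

# With equal group sizes the Welch t squared also equals F,
# but its degrees of freedom (and so its p-value) can differ slightly
all.equal(unname(t_welch$statistic)^2, f_value)
t_welch$parameter
```

With unequal group sizes, only the pooled-variance version is guaranteed to satisfy F = t squared.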

Step 6: Regression analysis:

The regression gives F = 3.141 and p = 0.08479 (the same as the one-way ANOVA).

# first, create an effect-coded variable that codes the two groups as 1 (Conservative) vs -1 (Labour)
df_clean2$Vote_c<-ifelse(df_clean2$Vote=="Conservative", 1, -1)

head(df_clean2)
##   ID Age         Vote pet Vote_c
## 1  1  76 Conservative Yes      1
## 2  2  64 Conservative Yes      1
## 3  3  18 Conservative  No      1
## 5  5  33       Labour  No     -1
## 6  6  38       Labour Yes     -1
## 7  7  68       Labour Yes     -1
#Next, we use this effect coding variable to run a regression analysis:
reg<-lm(Age~Vote_c,data=df_clean2)
summary(reg)
## 
## Call:
## lm(formula = Age ~ Vote_c, data = df_clean2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.105 -14.066  -3.316  13.934  39.684 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   43.711      3.044  14.361   <2e-16 ***
## Vote_c         5.395      3.044   1.772   0.0848 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.76 on 36 degrees of freedom
## Multiple R-squared:  0.08026,    Adjusted R-squared:  0.05471 
## F-statistic: 3.141 on 1 and 36 DF,  p-value: 0.08479
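
The two regression coefficients can be recovered by hand from the group means printed in the t-test output. The short sketch below does that arithmetic; the variable names mean_con, mean_lab, intercept and slope are chosen here for illustration only.

```r
# Group means as reported in the Step 3 t-test output
mean_con <- 49.10526  # Conservative (coded +1)
mean_lab <- 38.31579  # Labour (coded -1)

# With +1 / -1 coding, the fitted value is intercept + slope for Conservative
# and intercept - slope for Labour, so solving the two equations gives:
intercept <- (mean_con + mean_lab) / 2  # midpoint of the two group means
slope     <- (mean_con - mean_lab) / 2  # half the difference between them

round(c(intercept = intercept, slope = slope), 3)
```

These values match the estimates in the regression table (43.711 and 5.395), so with effect coding the slope is half the mean age difference between the groups, and its t value is the same as the t statistic from Step 3.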