The Goal
This week didn’t leave much time for R unfortunately so my goal this week was simply to familiarise myself with statistics in R.
Progress
As age is often correlated with changes in personality dimensions, I was wondering whether this was shown within the current data and whether it had any effect on perceived risk.
Initially, I’d like to observe these correlations using the corrplot package.
library(tidyverse) # for ggplot and dplyr
library(corrplot) # for correlation plot
library(rstatix) # for statistical analysis
# extracting data set
covid <- "Covid_Data.csv" %>%
read_csv()
# selecting relevant variables
aiya <- covid %>%
select(Openness, Conscientiousness, Extraversion, Agreeableness,
Em_Stability, risk_self, risk_pop, risk_comp, age) %>%
cor()
corrplot(aiya, type = "lower", addCoef.col = "black", method = "color", diag = FALSE)Some quite substantial correlations between age and conscientiousness, agreeableness and emotional stability. Also, interesting to note that many personality dimensions correlate with each other, with a quite high correlation between emotional stability and agreeableness.
For further clarification, let’s run a linear regression between age and personality dimensions.
# data set
ach <- covid %>%
select(Openness, Conscientiousness, Extraversion, Agreeableness,
Em_Stability, risk_self, risk_pop, risk_comp, age)
# linear regression of main effect
reg1 <- lm(age ~ Openness + Conscientiousness +
Extraversion + Agreeableness + Em_Stability, ach)
summary(reg1)##
## Call:
## lm(formula = age ~ Openness + Conscientiousness + Extraversion +
## Agreeableness + Em_Stability, data = ach)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.854 -12.917 0.913 13.256 37.664
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.8594 3.0337 5.557 3.57e-08 ***
## Openness -0.3666 0.4493 -0.816 0.414737
## Conscientiousness 1.5265 0.4281 3.566 0.000381 ***
## Extraversion 0.5389 0.3370 1.599 0.110093
## Agreeableness 2.7227 0.4575 5.952 3.74e-09 ***
## Em_Stability 1.3624 0.3882 3.510 0.000470 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.73 on 939 degrees of freedom
## Multiple R-squared: 0.1268, Adjusted R-squared: 0.1222
## F-statistic: 27.28 on 5 and 939 DF, p-value: < 2.2e-16
That is about as nice as it gets I suppose
It appears there is a significant main effect between agreeableness, emotional stability and conscientiousness on age (p < 0.001) and a significant regression equation was found (F(5, 939) = 27.28, p < 0.001).
# new data set with average risk variable
new <- covid %>%
select(Openness, Conscientiousness, Extraversion, Agreeableness,
Em_Stability, risk_self, risk_pop, risk_comp, age) %>%
mutate(avg_risk = (risk_self + risk_comp + risk_pop)/3) %>%
select(-starts_with("risk"))
# linear regression of main effect
abba <- lm(avg_risk ~ Openness + Conscientiousness +
Extraversion + Agreeableness + Em_Stability, new)
summary(abba)##
## Call:
## lm(formula = avg_risk ~ Openness + Conscientiousness + Extraversion +
## Agreeableness + Em_Stability, data = new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.57099 -0.62359 0.02178 0.56369 2.59794
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.61375 0.17092 15.292 < 2e-16 ***
## Openness 0.02187 0.02531 0.864 0.388
## Conscientiousness 0.03112 0.02412 1.290 0.197
## Extraversion -0.01380 0.01898 -0.727 0.468
## Agreeableness 0.03418 0.02577 1.326 0.185
## Em_Stability -0.08872 0.02187 -4.056 5.4e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8862 on 939 degrees of freedom
## Multiple R-squared: 0.01863, Adjusted R-squared: 0.01341
## F-statistic: 3.565 on 5 and 939 DF, p-value: 0.003364
There is a main effect between emotional stability and average perceived risk (p > 0.001) with a relatively significant regression equation found (F(5, 939) = 3.57, p < 0.01)
To visualise this, I thought it would be appropriate to add a quick plot.
why <- covid %>%
select(Openness, Conscientiousness, Extraversion, Agreeableness,
Em_Stability, risk_self, risk_pop, risk_comp) %>%
mutate(avg_risk = (risk_self + risk_comp + risk_pop)/3) %>%
pivot_longer(
cols = Openness:Em_Stability,
names_to = "Dimension",
values_to = "mean"
) %>%
mutate(Dimension2 = ifelse(Dimension == "Em_Stability",
"Emotional Stability", Dimension)) %>%
select(Dimension2, mean, avg_risk)
ggplot(why, aes(x = mean, y = avg_risk, colour = Dimension2)) +
theme_minimal() +
geom_smooth(method = "lm", se = FALSE) +
theme(panel.background = element_rect(colour = "black", size = 1/40),
axis.title.x = element_text(family ="Arial Narrow",
size = 14, face = "bold"),
axis.title.y = element_text(family = "Arial Narrow",
size = 14, face = "bold")) +
xlab("Score") +
ylab("Average Perceived Risk")I guess the effect is a bit more obvious now
Challenges/Successes
I successfully managed to conduct some linear regressions and make a correlation plot using the packages corrplot and rstatix. However, I am quite unsure as to whether or not I completely butchered the reporting of the test results but time will tell.
Next steps
I intend to continue learning statistics in R and start making a huge dent into my verification report.