Learning Log 9

Joshua Pham

2021-04-19

The Goal

This week didn’t leave much time for R unfortunately so my goal this week was simply to familiarise myself with statistics in R.

Progress

As age is often correlated with changes in personality dimensions, I was wondering whether this was shown within the current data and whether it had any effect on perceived risk.

Initially, I’d like to observe these correlations using the corrplot package.

library(tidyverse) # for ggplot and dplyr
library(corrplot) # for correlation plot
library(rstatix) # for statistical analysis

# extracting data set

covid <- "Covid_Data.csv" %>% 
  read_csv()

# selecting relevant variables

aiya <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age) %>% 
  cor()

corrplot(aiya, type = "lower", addCoef.col = "black", method = "color", diag = FALSE)

Some quite substantial correlations between age and conscientiousness, agreeableness and emotional stability. Also, interesting to note that many personality dimensions correlate with each other, with a quite high correlation between emotional stability and agreeableness.

For further clarification, let’s run a linear regression between age and personality dimensions.

# data set

ach <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age)

# linear regression of main effect 

reg1 <- lm(age ~ Openness + Conscientiousness + 
             Extraversion + Agreeableness + Em_Stability, ach)

summary(reg1)
## 
## Call:
## lm(formula = age ~ Openness + Conscientiousness + Extraversion + 
##     Agreeableness + Em_Stability, data = ach)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.854 -12.917   0.913  13.256  37.664 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        16.8594     3.0337   5.557 3.57e-08 ***
## Openness           -0.3666     0.4493  -0.816 0.414737    
## Conscientiousness   1.5265     0.4281   3.566 0.000381 ***
## Extraversion        0.5389     0.3370   1.599 0.110093    
## Agreeableness       2.7227     0.4575   5.952 3.74e-09 ***
## Em_Stability        1.3624     0.3882   3.510 0.000470 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.73 on 939 degrees of freedom
## Multiple R-squared:  0.1268, Adjusted R-squared:  0.1222 
## F-statistic: 27.28 on 5 and 939 DF,  p-value: < 2.2e-16

That is about as nice as it gets I suppose

It appears there is a significant main effect between agreeableness, emotional stability and conscientiousness on age (p < 0.001) and a significant regression equation was found (F(5, 939) = 27.28, p < 0.001).

# new data set with average risk variable

new <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age) %>% 
  mutate(avg_risk = (risk_self + risk_comp + risk_pop)/3) %>% 
  select(-starts_with("risk"))

# linear regression of main effect 

abba <- lm(avg_risk ~ Openness + Conscientiousness + 
             Extraversion + Agreeableness + Em_Stability, new)

summary(abba)
## 
## Call:
## lm(formula = avg_risk ~ Openness + Conscientiousness + Extraversion + 
##     Agreeableness + Em_Stability, data = new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.57099 -0.62359  0.02178  0.56369  2.59794 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.61375    0.17092  15.292  < 2e-16 ***
## Openness           0.02187    0.02531   0.864    0.388    
## Conscientiousness  0.03112    0.02412   1.290    0.197    
## Extraversion      -0.01380    0.01898  -0.727    0.468    
## Agreeableness      0.03418    0.02577   1.326    0.185    
## Em_Stability      -0.08872    0.02187  -4.056  5.4e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8862 on 939 degrees of freedom
## Multiple R-squared:  0.01863,    Adjusted R-squared:  0.01341 
## F-statistic: 3.565 on 5 and 939 DF,  p-value: 0.003364

There is a main effect between emotional stability and average perceived risk (p > 0.001) with a relatively significant regression equation found (F(5, 939) = 3.57, p < 0.01)

To visualise this, I thought it would be appropriate to add a quick plot.

why <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp) %>% 
  mutate(avg_risk = (risk_self + risk_comp + risk_pop)/3) %>% 
  pivot_longer(
    cols = Openness:Em_Stability,
    names_to = "Dimension",
    values_to = "mean"
  ) %>% 
  mutate(Dimension2 = ifelse(Dimension == "Em_Stability", 
                             "Emotional Stability", Dimension)) %>% 
  select(Dimension2, mean, avg_risk)
  
ggplot(why, aes(x = mean, y = avg_risk, colour = Dimension2)) +
  theme_minimal() +
  geom_smooth(method = "lm", se = FALSE) +
  theme(panel.background = element_rect(colour = "black", size = 1/40),
        axis.title.x = element_text(family ="Arial Narrow", 
                                    size = 14, face = "bold"),
        axis.title.y = element_text(family = "Arial Narrow", 
                                    size = 14, face = "bold")) +
  xlab("Score") +
  ylab("Average Perceived Risk")

I guess the effect is a bit more obvious now

Challenges/Successes

I successfully managed to conduct some linear regressions and make a correlation plot using the packages corrplot and rstatix. However, I am quite unsure as to whether or not I completely butchered the reporting of the test results but time will tell.

Next steps

I intend to continue learning statistics in R and start making a huge dent into my verification report.