Diana Piskareva and Daria Rukosueva contributed equally to this work. The project was done jointly, so it is difficult to separate who did what.

Let’s choose 3 continuous and 1 categorical variables

continuous

happy - How happy are you. C1 Taking all things together, how happy would you say you are?

stfgov - How satisfied with the national government

trstlgl - Trust in the legal system B6-12a Using this card, please tell me on a score of 0-10 how much you personally trust each of the institutions I read out. 0 means you do not trust an institution at all, and 10 means you have complete trust. Firstly… …the legal system?

categorical

psppipla - Political system allows people to have influence on politics. And how much would you say that the political system in [country] allows people like you to have an influence on politics?

greece <- select(greece, c("stfgov", "happy", "trstlgl", "psppipla"))

skim(greece)
Data summary
Name greece
Number of rows 2799
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
stfgov 21 0.99 4.12 2.27 0 2 4 6 10 ▇▇▇▅▁
happy 5 1.00 6.58 1.54 0 6 7 8 10 ▁▁▅▇▁
trstlgl 13 1.00 6.43 2.26 0 5 7 8 10 ▂▃▆▇▅
psppipla 78 0.97 1.90 0.96 1 1 2 3 5 ▇▅▅▁▁
summary(greece) 
##      stfgov          happy          trstlgl         psppipla  
##  Min.   : 0.00   Min.   : 0.00   Min.   : 0.00   Min.   :1.0  
##  1st Qu.: 2.00   1st Qu.: 6.00   1st Qu.: 5.00   1st Qu.:1.0  
##  Median : 4.00   Median : 7.00   Median : 7.00   Median :2.0  
##  Mean   : 4.12   Mean   : 6.58   Mean   : 6.43   Mean   :1.9  
##  3rd Qu.: 6.00   3rd Qu.: 8.00   3rd Qu.: 8.00   3rd Qu.:3.0  
##  Max.   :10.00   Max.   :10.00   Max.   :10.00   Max.   :5.0  
##  NA's   :21      NA's   :5       NA's   :13      NA's   :78

Using the summary function, we see that the missing values are encoded correctly as NA, so we can remove them from the dataset so that they do not distort the results.

greece <- greece[complete.cases(greece),]
summary(greece)
##      stfgov          happy          trstlgl         psppipla  
##  Min.   : 0.00   Min.   : 0.00   Min.   : 0.00   Min.   :1.0  
##  1st Qu.: 2.00   1st Qu.: 6.00   1st Qu.: 5.00   1st Qu.:1.0  
##  Median : 4.00   Median : 7.00   Median : 7.00   Median :2.0  
##  Mean   : 4.11   Mean   : 6.59   Mean   : 6.43   Mean   :1.9  
##  3rd Qu.: 6.00   3rd Qu.: 8.00   3rd Qu.: 8.00   3rd Qu.:3.0  
##  Max.   :10.00   Max.   :10.00   Max.   :10.00   Max.   :5.0
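The effect of listwise deletion can be illustrated on a toy data frame. The sketch below uses made-up values (the data frame `toy` is hypothetical, not the ESS sample) and shows that `complete.cases()` keeps only rows without any NA, so the number of dropped rows can be smaller than the total number of NA cells when several NAs fall in the same row.

```r
# Hypothetical toy data, not the ESS sample
toy <- data.frame(
  stfgov   = c(4, NA, 6, 2, NA),
  happy    = c(7, 8, NA, 6, 5),
  psppipla = c(1, 2, 3, NA, NA)
)

na_per_column <- colSums(is.na(toy))   # NA count in each column
keep          <- complete.cases(toy)   # TRUE for rows without any NA
toy_complete  <- toy[keep, ]

n_dropped <- nrow(toy) - nrow(toy_complete)
print(na_per_column)
print(n_dropped)
```

The same pattern appears in the report: the four columns contain 21 + 5 + 13 + 78 = 117 missing values, but only 2799 - 2685 = 114 rows are removed.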

EDA

descriptive statistics

continuous variables

We get general information about variables using the describe function

greece %>% 
  dplyr::select(-4) %>% 
  describe() 
##         vars    n mean   sd median trimmed  mad min max range  skew kurtosis
## stfgov     1 2685 4.11 2.28      4    4.08 2.97   0  10    10  0.10    -0.75
## happy      2 2685 6.59 1.53      7    6.70 1.48   0  10    10 -0.73     0.96
## trstlgl    3 2685 6.43 2.26      7    6.65 1.48   0  10    10 -0.72    -0.13
##           se
## stfgov  0.04
## happy   0.03
## trstlgl 0.04
greece %>% 
  dplyr::select(-4) %>% 
  sjmisc::descr(show = c('n', "mean","sd", "md", "range")) %>% 
  rename("variable" = "var",
         "Number of obs." = "n",
         "Mean" = "mean",
         "SD" = "sd",
         "Median" = "md",
         "Range" = "range") 
## 
## ## Basic descriptive statistics
## 
##  variable Number of obs. Mean   SD Median     Range
##    stfgov           2685 4.11 2.28      4 10 (0-10)
##     happy           2685 6.59 1.53      7 10 (0-10)
##   trstlgl           2685 6.43 2.26      7 10 (0-10)
greece %>% 
  pivot_longer(c(stfgov, happy, trstlgl),
               names_to = 'Var', values_to = 'Score') %>% 
  ggplot(aes(y=Score)) + 
  geom_boxplot() +
  ggtitle("Distribution of scores") +
  xlab("Variable") + 
  ylab("Score") +
  theme_bw()+
  theme(legend.position="none") +
  facet_wrap(~Var)

The boxplots show several outliers in the values of happy and trstlgl.

greece %>% 
  pivot_longer(c(stfgov, happy, trstlgl),
               names_to = 'Var', values_to = 'Score') %>% 
  ggplot(aes(x=Score, fill=Var)) + 
  geom_histogram(aes(y = after_stat(density)), bins = 10) +
  geom_density(alpha = .5, color="blue")+
  ggtitle("Distribution of scores") +
  xlab("Score") + 
  ylab("Density") +
  theme_bw()+
  theme(legend.position="none") +
  facet_wrap(~Var)

As the histograms show, the distributions of satisfaction with government and happiness are reasonably close to normal. The distribution of trust in the legal system is not normal.

categorical

Let’s look at the categorical variable

table(greece$psppipla)
## 
##    1    2    3    4    5 
## 1205  710  614  146   10

Category 5 ('A great deal') contains only 10 observations. This may make the estimate of its coefficient unstable. Therefore, let’s combine categories 4 and 5.

greece$psppipla <- car::recode(greece$psppipla, "1 = 1;
                                      2 = 2;
                                      3 = 3;
                                      4 = 4;
                                      5 = 4")
greece %>% 
  group_by(psppipla) %>% 
  count()
## # A tibble: 4 × 2
## # Groups:   psppipla [4]
##   psppipla     n
##      <dbl> <int>
## 1        1  1205
## 2        2   710
## 3        3   614
## 4        4   156
greece$psppipla = factor(greece$psppipla) 
greece %>% 
  ggplot(aes(x = psppipla)) +
  geom_bar(fill = "lightblue", color = "blue") +
  xlab("Category") +
  ylab("Frequency") +
  theme_bw() 
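The merge of the sparse category can also be written in base R. The sketch below rebuilds a vector from the category counts printed by `table()` above (the names `x`, `x_merged` and `x_factor` are illustrative, and the vector is only a stand-in for the real column) and collapses category 5 into category 4.

```r
# Rebuild a vector with the category counts reported by table() above
x <- rep(1:5, times = c(1205, 710, 614, 146, 10))

# Collapse category 5 into category 4, leaving 1-4 unchanged
x_merged <- ifelse(x == 5, 4, x)

# Convert to a factor so that lm() treats it as categorical
x_factor <- factor(x_merged)

counts <- table(x_factor)
print(counts)
```

Converting to a factor before modelling matters here: a numeric psppipla would enter `lm()` as a single slope instead of one coefficient per category.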

scatter plot

# par(mfrow = ...) only affects base graphics, so each ggplot2 figure below is drawn separately
greece %>% 
  ggplot(aes(x=stfgov, y=happy)) +
  geom_point(size=2) +
  geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'

greece %>% 
  ggplot(aes(x=stfgov, y=trstlgl)) +
  geom_point(size=2) +
  geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'

greece %>% 
  ggplot(aes(x=trstlgl, y=happy)) +
  geom_point(size=2) +
  geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'

The scatter plots suggest a positive relationship in all three pairs: between happy and satisfaction with government, between trust in the legal system and satisfaction with government, and between happy and trust in the legal system.

Correlations

chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
                  histogram = TRUE) # by default Pearson
## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
                  histogram = TRUE,
                  method = "spearman") # Spearman's method
## Warning in cor.test.default(as.numeric(x), as.numeric(y), method = method):
## Cannot compute exact p-value with ties
## Warning in par(usr): argument 1 does not name a graphical parameter

chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
                  histogram = TRUE,
                  method = "kendall") # Kendall's method
## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

heatmap

heatmaply_cor(
  cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "spearman"),
  Colv=NA, Rowv=NA)

Correlation matrix

cor.test(greece$happy, greece$stfgov)
## 
##  Pearson's product-moment correlation
## 
## data:  greece$happy and greece$stfgov
## t = 12, df = 2683, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.195 0.267
## sample estimates:
##   cor 
## 0.231
cor.test(greece$happy, greece$stfgov,method="spearman")
## Warning in cor.test.default(greece$happy, greece$stfgov, method = "spearman"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  greece$happy and greece$stfgov
## S = 3e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##  rho 
## 0.21
cor_matrix <- cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "spearman")

stargazer(cor_matrix, title="Correlation Matrix", type = "latex")
## 
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Fri, May 12, 2023 - 14:51:22
## \begin{table}[!htbp] \centering 
##   \caption{Correlation Matrix} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}} cccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
##  & stfgov & happy & trstlgl \\ 
## \hline \\[-1.8ex] 
## stfgov & $1$ & $0.210$ & $0.313$ \\ 
## happy & $0.210$ & $1$ & $0.324$ \\ 
## trstlgl & $0.313$ & $0.324$ & $1$ \\ 
## \hline \\[-1.8ex] 
## \end{tabular} 
## \end{table}
sjPlot::tab_corr(greece[,c('stfgov', 'happy', 'trstlgl')],
                 corr.method = "spearman") 
          stfgov     happy      trstlgl
stfgov               0.210***   0.313***
happy     0.210***              0.324***
trstlgl   0.313***   0.324***
Computed correlation used spearman-method with listwise-deletion.
cor.test(greece$happy, greece$stfgov,method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  greece$happy and greece$stfgov
## z = 11, p-value <2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##   tau 
## 0.168
cor_matrix <- cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "kendall")

stargazer(cor_matrix, title="Correlation Matrix", type = "latex")
## 
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Fri, May 12, 2023 - 14:51:23
## \begin{table}[!htbp] \centering 
##   \caption{Correlation Matrix} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}} cccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
##  & stfgov & happy & trstlgl \\ 
## \hline \\[-1.8ex] 
## stfgov & $1$ & $0.168$ & $0.238$ \\ 
## happy & $0.168$ & $1$ & $0.256$ \\ 
## trstlgl & $0.238$ & $0.256$ & $1$ \\ 
## \hline \\[-1.8ex] 
## \end{tabular} 
## \end{table}
sjPlot::tab_corr(greece[,c('stfgov', 'happy', 'trstlgl')],
                 corr.method = "kendall")
          stfgov     happy      trstlgl
stfgov               0.168***   0.238***
happy     0.168***              0.256***
trstlgl   0.238***   0.256***
Computed correlation used kendall-method with listwise-deletion.
rcorr(as.matrix(greece[,c('stfgov', 'happy', 'trstlgl')]), type = "spearman")
##         stfgov happy trstlgl
## stfgov    1.00  0.21    0.31
## happy     0.21  1.00    0.32
## trstlgl   0.31  0.32    1.00
## 
## n= 2685 
## 
## 
## P
##         stfgov happy trstlgl
## stfgov          0     0     
## happy    0            0     
## trstlgl  0      0
cor_mat <- greece[,-4] %>% 
  rstatix::cor_mat()

cor_mat %>% 
  rstatix::cor_get_pval()
## # A tibble: 3 × 4
##   rowname   stfgov    happy  trstlgl
##   <chr>      <dbl>    <dbl>    <dbl>
## 1 stfgov  0        7.09e-34 1.03e-68
## 2 happy   7.09e-34 0        2.03e-63
## 3 trstlgl 1.03e-68 2.03e-63 0
cor_mat %>% 
  rstatix::cor_gather()
## # A tibble: 9 × 4
##   var1    var2      cor        p
##   <chr>   <chr>   <dbl>    <dbl>
## 1 stfgov  stfgov   1    0       
## 2 happy   stfgov   0.23 7.09e-34
## 3 trstlgl stfgov   0.33 1.03e-68
## 4 stfgov  happy    0.23 7.09e-34
## 5 happy   happy    1    0       
## 6 trstlgl happy    0.32 2.03e-63
## 7 stfgov  trstlgl  0.33 1.03e-68
## 8 happy   trstlgl  0.32 2.03e-63
## 9 trstlgl trstlgl  1    0
greece[,-4] %>% 
  apa.cor.table(filename = "cor_matrix_Greece.doc")
## 
## 
## Means, standard deviations, and correlations with confidence intervals
##  
## 
##   Variable   M    SD   1          2         
##   1. stfgov  4.11 2.28                      
##                                             
##   2. happy   6.59 1.53 .23**                
##                        [.19, .27]           
##                                             
##   3. trstlgl 6.43 2.26 .33**      .32**     
##                        [.29, .36] [.28, .35]
##                                             
## 
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval.
## The confidence interval is a plausible range of population correlations 
## that could have caused the sample correlation (Cumming, 2014).
##  * indicates p < .05. ** indicates p < .01.
## 
greece[,-4] %>% 
  sjPlot::sjp.corr()
## Warning: 'sjp.corr' is deprecated. Please use 'correlation::correlation()' and
## its related plot()-method.
## Computing correlation using pearson-method with listwise-deletion...
## Warning: Removed 6 rows containing missing values (`geom_text()`).

From what we can see, all relationships between our variables are positive and weak to moderate in size (Pearson r between 0.23 and 0.33). With Pearson's method the highest coefficient is between trstlgl and stfgov (0.33), while with Spearman's and Kendall's methods it is between trstlgl and happy (0.324 and 0.256). The presented values confirm the picture on the scatter plots. All correlations are highly significant (p < 0.001).
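The three correlation methods can be compared side by side in base R. The following is a minimal sketch on simulated 0-10 scores (the variables `a` and `b` are made up and only mimic the tied, discrete nature of the survey items); it collects the estimate and p-value for each method in one table.

```r
set.seed(1)  # reproducible simulated data; not the ESS sample
n <- 500
latent <- rnorm(n)
# Two positively related 0-10 scores built from a shared latent factor
a <- pmin(pmax(round(5 + 2 * latent + rnorm(n, sd = 2)), 0), 10)
b <- pmin(pmax(round(6 + 1.5 * latent + rnorm(n, sd = 2)), 0), 10)

methods <- c("pearson", "spearman", "kendall")
# exact = FALSE requests the asymptotic p-value, which avoids the
# "cannot compute exact p-value with ties" warning for rank methods
res <- sapply(methods, function(m) {
  ct <- cor.test(a, b, method = m, exact = FALSE)
  c(estimate = unname(ct$estimate), p.value = ct$p.value)
})
print(round(res, 3))
```

Kendall's tau is measured on a different scale than Spearman's rho and is usually numerically smaller for the same data, which matches the report (0.168 versus 0.210 for happy and stfgov), so the coefficients should be compared within a method rather than across methods.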

a boxplot for the categorical predictor and the outcome

greece %>% 
  ggplot(aes(x = factor (psppipla),
             y = happy,
             fill = factor (psppipla))) + 
  geom_boxplot() +
  ggtitle("Distribution of happy level") +
  xlab("Category") + 
  ylab("Happy level") +
  theme_bw()+
  theme(legend.position="none") 

Regardless of the level of psppipla, the distribution of happiness looks very similar across the categories.

Linear regression

model1 = lm(happy ~ stfgov, data = greece)
sjPlot::tab_model(model1)
  happy
Predictors Estimates CI p
(Intercept) 5.96 5.84 – 6.07 <0.001
stfgov 0.15 0.13 – 0.18 <0.001
Observations 2685
R2 / R2 adjusted 0.053 / 0.053

We standardize the variables so that the coefficients can be compared with each other and interpreted as effect sizes.
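What standardization does to a slope can be checked numerically. The sketch below uses simulated data (the variables `x` and `y` are made up) to show that, in a regression with one predictor, the standardized slope equals Pearson's r and can be recovered from the raw slope as b * sd(x) / sd(y).

```r
set.seed(42)  # simulated data; not the ESS sample
n <- 1000
x <- rnorm(n, mean = 4, sd = 2)
y <- 6 + 0.15 * x + rnorm(n, sd = 1.5)

zx <- scale(x)[, 1]   # (x - mean(x)) / sd(x)
zy <- scale(y)[, 1]

b_raw <- coef(lm(y ~ x))[["x"]]     # slope in original units
b_std <- coef(lm(zy ~ zx))[["zx"]]  # slope in standard-deviation units

# In simple regression the standardized slope equals Pearson's r,
# and it can be recovered from the raw slope as b * sd(x) / sd(y)
r <- cor(x, y)
print(c(b_std = b_std, r = r, rescaled = b_raw * sd(x) / sd(y)))
```

This is consistent with the report, where the Pearson correlation between happy and stfgov is 0.231 and the standardized slope of Zstfgov is 0.23.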

greece <- greece %>% 
  mutate(Zhappy = scale(happy)[,1],
         Zstfgov = scale(stfgov)[,1],
         Ztrstlgl = scale(trstlgl)[,1])
model1_std = lm(Zhappy ~ Zstfgov, data = greece)
sjPlot::tab_model(model1_std)
  Zhappy
Predictors Estimates CI p
(Intercept) 0.00 -0.04 – 0.04 1.000
Zstfgov 0.23 0.19 – 0.27 <0.001
Observations 2685
R2 / R2 adjusted 0.053 / 0.053
# let's add a categorical variable
model2 = lm(happy ~ stfgov + psppipla, data = greece)
sjPlot::tab_model(model2)
  happy
Predictors Estimates CI p
(Intercept) 5.99 5.86 – 6.11 <0.001
stfgov 0.15 0.13 – 0.18 <0.001
psppipla [2] -0.05 -0.19 – 0.09 0.497
psppipla [3] -0.09 -0.24 – 0.05 0.206
psppipla [4] 0.18 -0.07 – 0.43 0.161
Observations 2685
R2 / R2 adjusted 0.055 / 0.054
# with standardized coefficients
model2_std = lm(Zhappy ~ Zstfgov + psppipla, data = greece)
sjPlot::tab_model(model2_std)
  Zhappy
Predictors Estimates CI p
(Intercept) 0.02 -0.04 – 0.07 0.582
Zstfgov 0.23 0.19 – 0.27 <0.001
psppipla [2] -0.03 -0.12 – 0.06 0.497
psppipla [3] -0.06 -0.16 – 0.03 0.206
psppipla [4] 0.12 -0.05 – 0.28 0.161
Observations 2685
R2 / R2 adjusted 0.055 / 0.054
# comparison of models
anova(model1_std, model2_std)
## Analysis of Variance Table
## 
## Model 1: Zhappy ~ Zstfgov
## Model 2: Zhappy ~ Zstfgov + psppipla
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)
## 1   2683 2541                         
## 2   2680 2536  3      4.48 1.58   0.19

Conclusion: adding psppipla does not significantly improve the fit (F = 1.58, p = 0.19 > 0.05), so the simpler model 1 is preferred over model 2.
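The F statistic of the nested model comparison can be reproduced by hand from the printed ANOVA table. This is a sketch that uses the rounded values from the output above, so the result agrees with `anova()` only up to rounding; the variable names are illustrative.

```r
# Values copied from the anova() table above (rounded as printed)
rss_full <- 2536   # residual sum of squares, model with psppipla
df_full  <- 2680   # residual degrees of freedom of that model
ss_added <- 4.48   # reduction in RSS from adding psppipla
df_added <- 3      # number of added coefficients

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- (ss_added / df_added) / (rss_full / df_full)
p_val  <- pf(f_stat, df_added, df_full, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_val), 2))
```

A p-value above 0.05 means that the null hypothesis "all added coefficients are zero" is not rejected, i.e. the extra terms do not detectably improve the fit.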

summary(model1)
## 
## Call:
## lm(formula = happy ~ stfgov, data = greece)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.732 -0.887  0.113  0.958  4.043 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.9571     0.0591   100.7   <2e-16 ***
## stfgov        0.1550     0.0126    12.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.49 on 2683 degrees of freedom
## Multiple R-squared:  0.0534, Adjusted R-squared:  0.053 
## F-statistic:  151 on 1 and 2683 DF,  p-value: <2e-16

For a model with non-standardized coefficients

The model is statistically significant (F(1, 2683) = 151, p < 2e-16). Adjusted R-squared is 0.053, i.e. the independent variable (stfgov) explains about 5.3% of the variance of the dependent variable (happy). The coefficient of stfgov is 0.15 (p < 0.001): with each one-unit increase in stfgov, the predicted happiness level increases by 0.15. The intercept is 5.96; this is the predicted value of happy when stfgov is 0. The regression equation looks like this: happy = 5.96 + 0.15*stfgov

Linear regression model with 2 continuous predictors

Now we add another predictor to our model.

# Let's add another continuous variable to the model
model3 = lm(happy ~ stfgov + trstlgl, data = greece)
sjPlot::tab_model(model3)
  happy
Predictors Estimates CI p
(Intercept) 5.03 4.86 – 5.20 <0.001
stfgov 0.10 0.07 – 0.12 <0.001
trstlgl 0.18 0.16 – 0.21 <0.001
Observations 2685
R2 / R2 adjusted 0.118 / 0.117
model3_std = lm(Zhappy ~ Zstfgov + Ztrstlgl, data = greece)
sjPlot::tab_model(model3_std)
  Zhappy
Predictors Estimates CI p
(Intercept) 0.00 -0.04 – 0.04 1.000
Zstfgov 0.14 0.10 – 0.18 <0.001
Ztrstlgl 0.27 0.23 – 0.31 <0.001
Observations 2685
R2 / R2 adjusted 0.118 / 0.117
# model comparison
anova(model1_std, model3_std)
## Analysis of Variance Table
## 
## Model 1: Zhappy ~ Zstfgov
## Model 2: Zhappy ~ Zstfgov + Ztrstlgl
##   Res.Df  RSS Df Sum of Sq   F Pr(>F)    
## 1   2683 2541                            
## 2   2682 2367  1       174 197 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Model 3 fits the data statistically significantly better than model 1 (F = 197, p < 0.001)

summary(model3_std)
## 
## Call:
## lm(formula = Zhappy ~ Zstfgov + Ztrstlgl, data = greece)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.365 -0.519  0.034  0.619  3.252 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.50e-15   1.81e-02    0.00        1    
## Zstfgov     1.43e-01   1.92e-02    7.42  1.5e-13 ***
## Ztrstlgl    2.69e-01   1.92e-02   14.03  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.939 on 2682 degrees of freedom
## Multiple R-squared:  0.118,  Adjusted R-squared:  0.117 
## F-statistic:  180 on 2 and 2682 DF,  p-value: <2e-16

For a model with standardized coefficients

The model is statistically significant (F(2, 2682) = 180, p < 2e-16). Adjusted R-squared is 0.117, i.e. the independent variables (stfgov and trstlgl) together explain about 11.7% of the variance of the dependent variable (happy). The standardized coefficients are 0.14 for Zstfgov and 0.27 for Ztrstlgl (both p < 0.001), so trust in the legal system has the larger effect. The intercept is 0.00. The regression equation looks like this: Zhappy = 0.14*Zstfgov + 0.27*Ztrstlgl

summary(model3)
## 
## Call:
## lm(formula = happy ~ stfgov + trstlgl, data = greece)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.670 -0.792  0.053  0.945  4.969 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0305     0.0873   57.62  < 2e-16 ***
## stfgov        0.0956     0.0129    7.42  1.5e-13 ***
## trstlgl       0.1821     0.0130   14.03  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.44 on 2682 degrees of freedom
## Multiple R-squared:  0.118,  Adjusted R-squared:  0.117 
## F-statistic:  180 on 2 and 2682 DF,  p-value: <2e-16

For a model with non-standardized coefficients

The model is statistically significant (F(2, 2682) = 180, p < 2e-16). Adjusted R-squared is 0.117, i.e. the independent variables (stfgov and trstlgl) together explain about 11.7% of the variance of the dependent variable (happy). The coefficients are 0.10 for stfgov and 0.18 for trstlgl (both p < 0.001). The intercept is 5.03; this is the predicted value of happy when both stfgov and trstlgl are 0.

The regression equation looks like this: happy = 5.03 + 0.10*stfgov + 0.18*trstlgl

With each one-unit increase in stfgov, predicted happy rises by 0.10, holding trstlgl constant. With each one-unit increase in trstlgl, predicted happy rises by 0.18, holding stfgov constant.
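The regression equation can be turned into a small prediction helper. The sketch below copies the coefficients from `summary(model3)` above; the function `predict_happy` is a hypothetical helper written for illustration and is not part of the analysis.

```r
# Coefficients copied from summary(model3) above
b0        <- 5.0305
b_stfgov  <- 0.0956
b_trstlgl <- 0.1821

# Hypothetical helper: predicted happiness for given predictor scores
predict_happy <- function(stfgov, trstlgl) {
  b0 + b_stfgov * stfgov + b_trstlgl * trstlgl
}

# Illustrative respondent at the sample medians (stfgov = 4, trstlgl = 7)
at_median <- predict_happy(4, 7)

# Raising trstlgl by one point with stfgov fixed shifts the prediction by b_trstlgl
effect_trust <- predict_happy(4, 8) - predict_happy(4, 7)

print(c(at_median = at_median, effect_trust = effect_trust))
```

With the fitted model object at hand, `predict(model3, newdata = data.frame(stfgov = 4, trstlgl = 7))` should give the same prediction up to the rounding of the copied coefficients.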

Checking linear regression assumptions

Linear regression makes several assumptions about the data, such as linearity, normality of the residuals, homoscedasticity, and the absence of multicollinearity.

autoplot(model3_std)

Let’s check in a little more detail whether the assumptions of linear regression are fulfilled

# 1) normality of the residual distribution

res <- resid(model3_std)
hist(res, breaks = 20, col = 'lightblue', freq = FALSE)
lines(density(res), col = 'red', lwd = 2)

shapiro.test(res) # the residuals are NOT normally distributed
## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 1, p-value <2e-16
#QQ-plot
par(mfrow = c(1, 1))
qqnorm(res)
qqline(res)

car::qqPlot(model3_std)

## 1402 1484 
## 1346 1424
# The histogram, the Shapiro-Wilk test and the Q-Q plots do NOT indicate normally distributed residuals
# 2) homoscedasticity
plot(fitted(model3_std), res)
abline(0,0)

ggplot(data = model3_std, aes(x = .fitted, y = .stdresid)) + 
  geom_point() + 
  geom_hline(yintercept = 0)

bptest(model3_std) # Breusch-Pagan test
## 
##  studentized Breusch-Pagan test
## 
## data:  model3_std
## BP = 112, df = 2, p-value <2e-16
# we can say that homoscedasticity does NOT hold
# let's check multicollinearity
car::vif(model3_std)
##  Zstfgov Ztrstlgl 
##     1.12     1.12
# there is NO multicollinearity
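The VIF values can be verified by hand. For predictor j the variance inflation factor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the others; with exactly two predictors R_j^2 is simply their squared Pearson correlation. The sketch below uses the rounded correlation between stfgov and trstlgl reported earlier, so it matches `car::vif()` only up to rounding.

```r
# Pearson correlation between stfgov and trstlgl, as reported above (rounded)
r_predictors <- 0.33

# With exactly two predictors, R^2 from regressing one on the other is r^2
r2_j <- r_predictors^2
vif  <- 1 / (1 - r2_j)

print(round(vif, 2))
```

A value this close to 1 is far below the commonly used rule-of-thumb thresholds of 5 or 10.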

Linearity assumption: the Residuals vs. Fitted plot shows a roughly horizontal line without distinct patterns, which supports a linear relationship.

Normality of residuals: the histogram, the Shapiro-Wilk test and the Q-Q plots do NOT indicate normally distributed residuals.

Homoscedasticity: the Scale-Location and Residuals vs. Leverage plots do NOT show a horizontal line with equally spread points. This agrees with the Breusch-Pagan test: homoscedasticity does not hold for our data.
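The logic of the Breusch-Pagan test can be sketched in base R. The example below uses simulated data with error variance that grows with one predictor (all variable names are made up). It computes the studentized (Koenker) version of the statistic, n times the R-squared of an auxiliary regression of squared residuals on the predictors, which to our understanding is what `lmtest::bptest()` reports by default.

```r
set.seed(7)  # simulated heteroskedastic data; not the ESS sample
n  <- 800
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
# Error spread grows with x1, so the constant-variance assumption is violated
y  <- 5 + 0.1 * x1 + 0.2 * x2 + rnorm(n, sd = 0.5 + 0.3 * x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of squared residuals on the same predictors
aux     <- lm(resid(fit)^2 ~ x1 + x2)
bp_stat <- n * summary(aux)$r.squared   # LM statistic
bp_df   <- length(coef(aux)) - 1        # number of predictors
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p = bp_p))
```

A small p-value, as in the report (BP = 112, df = 2), indicates that the residual variance depends on the predictors.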