Diana Piskareva and Daria Rukosueva contributed equally to this work. The project was done jointly, so it is difficult to separate who did what.
happy - How happy are you. C1 Taking all things together, how happy would you say you are?
stfgov - How satisfied with the national government
trstlgl - Trust in the legal system B6-12a Using this card, please tell me on a score of 0-10 how much you personally trust each of the institutions I read out. 0 means you do not trust an institution at all, and 10 means you have complete trust. Firstly… …the legal system?
psppipla - Political system allows people to have influence on politics. And how much would you say that the political system in [country] allows people like you to have an influence on politics?
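# Packages used below (inferred from the function calls in this report)
library(Hmisc)                 # rcorr(); loaded first so that psych::describe() is the one used
library(dplyr)                 # select(), mutate(), %>%
library(tidyr)                 # pivot_longer()
library(ggplot2)
library(skimr)                 # skim()
library(psych)                 # describe()
library(PerformanceAnalytics)  # chart.Correlation()
library(heatmaply)             # heatmaply_cor()
library(stargazer)
library(apaTables)             # apa.cor.table()
library(ggfortify)             # autoplot() for lm objects
library(lmtest)                # bptest()
greece <- select(greece, c("stfgov", "happy", "trstlgl", "psppipla"))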
skim(greece)
| Name | greece |
| Number of rows | 2799 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| stfgov | 21 | 0.99 | 4.12 | 2.27 | 0 | 2 | 4 | 6 | 10 | ▇▇▇▅▁ |
| happy | 5 | 1.00 | 6.58 | 1.54 | 0 | 6 | 7 | 8 | 10 | ▁▁▅▇▁ |
| trstlgl | 13 | 1.00 | 6.43 | 2.26 | 0 | 5 | 7 | 8 | 10 | ▂▃▆▇▅ |
| psppipla | 78 | 0.97 | 1.90 | 0.96 | 1 | 1 | 2 | 3 | 5 | ▇▅▅▁▁ |
summary(greece)
## stfgov happy trstlgl psppipla
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :1.0
## 1st Qu.: 2.00 1st Qu.: 6.00 1st Qu.: 5.00 1st Qu.:1.0
## Median : 4.00 Median : 7.00 Median : 7.00 Median :2.0
## Mean : 4.12 Mean : 6.58 Mean : 6.43 Mean :1.9
## 3rd Qu.: 6.00 3rd Qu.: 8.00 3rd Qu.: 8.00 3rd Qu.:3.0
## Max. :10.00 Max. :10.00 Max. :10.00 Max. :5.0
## NA's :21 NA's :5 NA's :13 NA's :78
The summary function shows that missing values are encoded correctly as NA, so we can remove the incomplete rows from the dataset so that they do not distort the results.
greece <- greece[complete.cases(greece),]
summary(greece)
## stfgov happy trstlgl psppipla
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :1.0
## 1st Qu.: 2.00 1st Qu.: 6.00 1st Qu.: 5.00 1st Qu.:1.0
## Median : 4.00 Median : 7.00 Median : 7.00 Median :2.0
## Mean : 4.11 Mean : 6.59 Mean : 6.43 Mean :1.9
## 3rd Qu.: 6.00 3rd Qu.: 8.00 3rd Qu.: 8.00 3rd Qu.:3.0
## Max. :10.00 Max. :10.00 Max. :10.00 Max. :5.0
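To make the row-removal step concrete, here is a minimal sketch of what `complete.cases()` does, using a small made-up data frame (the values are illustrative, not ESS data). It also shows why the number of dropped rows can be smaller than the sum of the per-column NA counts: in the output above the columns contain 21 + 5 + 13 + 78 = 117 NAs, while only 2799 - 2685 = 114 rows were removed, because some rows have more than one missing value.

```r
# Toy data frame (made-up values, NOT the ESS data) with NAs in different columns
toy <- data.frame(
  stfgov  = c(4, NA, 6, 2),
  happy   = c(7, NA, NA, 6),
  trstlgl = c(5, 7, 8, 3)
)

keep <- complete.cases(toy)      # TRUE only for rows with no NA in any column
toy_complete <- toy[keep, ]      # same idiom as greece[complete.cases(greece), ]

n_na_total <- sum(is.na(toy))    # total number of missing cells
n_dropped  <- sum(!keep)         # number of rows removed

print(toy_complete)
cat("missing cells:", n_na_total, " rows dropped:", n_dropped, "\n")
```

Here row 2 has two missing cells but is dropped only once, so three missing cells lead to two dropped rows.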
We get general descriptive statistics for the continuous variables using the describe function.
greece %>%
dplyr::select(-4) %>%
describe()
## vars n mean sd median trimmed mad min max range skew kurtosis
## stfgov 1 2685 4.11 2.28 4 4.08 2.97 0 10 10 0.10 -0.75
## happy 2 2685 6.59 1.53 7 6.70 1.48 0 10 10 -0.73 0.96
## trstlgl 3 2685 6.43 2.26 7 6.65 1.48 0 10 10 -0.72 -0.13
## se
## stfgov 0.04
## happy 0.03
## trstlgl 0.04
greece %>%
dplyr::select(-4) %>%
sjmisc::descr(show = c('n', "mean","sd", "md", "range")) %>%
rename("variable" = "var",
"Number of obs." = "n",
"Mean" = "mean",
"SD" = "sd",
"Median" = "md",
"Range" = "range")
##
## ## Basic descriptive statistics
##
## variable Number of obs. Mean SD Median Range
## stfgov 2685 4.11 2.28 4 10 (0-10)
## happy 2685 6.59 1.53 7 10 (0-10)
## trstlgl 2685 6.43 2.26 7 10 (0-10)
greece %>%
pivot_longer(c(stfgov, happy, trstlgl),
names_to = 'Var', values_to = 'Score') %>%
ggplot(aes(y=Score)) +
geom_boxplot() +
ggtitle("Distribution of scores") +
xlab("Variable") +
ylab("Score") +
theme_bw()+
theme(legend.position="none") +
facet_wrap(~Var)
In the boxplots we see several outliers in the values of happy and trstlgl (trust in the legal system).
greece %>%
pivot_longer(c(stfgov, happy, trstlgl),
names_to = 'Var', values_to = 'Score') %>%
ggplot(aes(x=Score, fill=Var)) +
geom_histogram(aes(y = after_stat(density), fill = Var), bins = 10) +
geom_density(alpha = .5, color="blue")+
ggtitle("Distribution of scores") +
xlab("Score") +
ylab("Density") +
theme_bw()+
theme(legend.position="none") +
facet_wrap(~Var)
As can be seen from the histograms, satisfaction with government (stfgov)
and happiness (happy) are close to a normal distribution. Trust in the
legal system (trstlgl), by contrast, is not normally distributed.
Let’s look at the categorical variable
table(greece$psppipla)
##
## 1 2 3 4 5
## 1205 710 614 146 10
# Category 5 ('A great deal') contains only 10 observations. This may affect the estimation of the coefficients. Therefore, let's combine categories 4 and 5
greece$psppipla <- car::recode(greece$psppipla, "1 = 1;
2 = 2;
3 = 3;
4 = 4;
5 = 4")
greece %>%
group_by(psppipla) %>%
count()
## # A tibble: 4 × 2
## # Groups: psppipla [4]
## psppipla n
## <dbl> <int>
## 1 1 1205
## 2 2 710
## 3 3 614
## 4 4 156
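As a cross-check of the merge, the same collapsing can be sketched in base R. The vector below is rebuilt from the frequencies printed by `table()` above rather than from the real data, and `pmin()` is used here only as an illustrative alternative to `car::recode()`.

```r
# Rebuild the category vector from the frequencies printed by table() above
psppipla_raw <- rep(1:5, times = c(1205, 710, 614, 146, 10))

# Collapse category 5 into category 4; values 1-4 are unchanged
psppipla_merged <- pmin(psppipla_raw, 4)

counts <- table(psppipla_merged)
print(counts)
```

The merged category 4 holds 146 + 10 = 156 observations, which agrees with the tibble above.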
greece$psppipla = factor(greece$psppipla)
greece %>%
ggplot(aes(x = psppipla)) +
geom_bar(fill = "lightblue", color = "blue") +
xlab("Category") +
ylab("Frequency") +
theme_bw()
par(mfrow = c(1, 3))
greece %>%
ggplot(aes(x=stfgov, y=happy)) +
geom_point(size=2) +
geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'
greece %>%
ggplot(aes(x=stfgov, y=trstlgl)) +
geom_point(size=2) +
geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'
greece %>%
ggplot(aes(x=trstlgl, y=happy)) +
geom_point(size=2) +
geom_smooth(method=lm)
## `geom_smooth()` using formula = 'y ~ x'
The scatterplots show a positive correlation between happy and
satisfaction with government, between trust in the legal system and
satisfaction with government, and between happy and trust in the legal
system.
chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
histogram = TRUE) # by default Pearson
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
histogram = TRUE,
method = "spearman") # Spearman's method
## Warning in cor.test.default(as.numeric(x), as.numeric(y), method = method):
## Cannot compute exact p-value with ties
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in cor.test.default(as.numeric(x), as.numeric(y), method = method):
## Cannot compute exact p-value with ties
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in cor.test.default(as.numeric(x), as.numeric(y), method = method):
## Cannot compute exact p-value with ties
## Warning in par(usr): argument 1 does not name a graphical parameter
chart.Correlation(greece[,c('stfgov', 'happy', 'trstlgl')],
histogram = TRUE,
method = "kendall") # Kendall's method
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
heatmaply_cor(
cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "spearman"),
Colv=NA, Rowv=NA)
cor.test(greece$happy, greece$stfgov)
##
## Pearson's product-moment correlation
##
## data: greece$happy and greece$stfgov
## t = 12, df = 2683, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.195 0.267
## sample estimates:
## cor
## 0.231
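The t statistic and the confidence interval reported above follow directly from the correlation coefficient and the sample size. The sketch below recomputes them from the rounded values printed in the output (r = 0.231, n = 2685), so the results agree with the output only up to rounding. The interval uses the Fisher z-transformation, which is the standard approach for a Pearson correlation.

```r
# Rounded values taken from the cor.test() output above
r <- 0.231
n <- 2685

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
df     <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

cat("t  =", round(t_stat, 2), "on", df, "df\n")
cat("CI =", round(ci, 3), "\n")
```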
cor.test(greece$happy, greece$stfgov,method="spearman")
## Warning in cor.test.default(greece$happy, greece$stfgov, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: greece$happy and greece$stfgov
## S = 3e+09, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.21
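Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving average ranks. A minimal sketch on simulated 0-10 scores (made-up data, chosen only because it contains many ties like the survey items) illustrates this:

```r
# Simulated 0-10 scores with many ties (NOT the ESS data)
set.seed(1)
x <- sample(0:10, 200, replace = TRUE)
y <- pmin(10, pmax(0, x + sample(-3:3, 200, replace = TRUE)))

rho_builtin <- cor(x, y, method = "spearman")
rho_manual  <- cor(rank(x), rank(y))   # Pearson correlation of average ranks

cat("built-in:", rho_builtin, " manual:", rho_manual, "\n")
```

Because survey scores are heavily tied, an exact p-value is not available, which is what the warning above reports. Passing `exact = FALSE` to `cor.test()` requests the asymptotic p-value directly and should avoid the warning.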
cor_matrix <- cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "spearman")
stargazer(cor_matrix, title="Correlation Matrix", type = "latex")
##
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Fri, May 12, 2023 - 14:51:22
## \begin{table}[!htbp] \centering
## \caption{Correlation Matrix}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}} cccc}
## \\[-1.8ex]\hline
## \hline \\[-1.8ex]
## & stfgov & happy & trstlgl \\
## \hline \\[-1.8ex]
## stfgov & $1$ & $0.210$ & $0.313$ \\
## happy & $0.210$ & $1$ & $0.324$ \\
## trstlgl & $0.313$ & $0.324$ & $1$ \\
## \hline \\[-1.8ex]
## \end{tabular}
## \end{table}
sjPlot::tab_corr(greece[,c('stfgov', 'happy', 'trstlgl')],
corr.method = "spearman")
| | stfgov | happy | trstlgl |
|---|---|---|---|
| stfgov | | 0.210*** | 0.313*** |
| happy | 0.210*** | | 0.324*** |
| trstlgl | 0.313*** | 0.324*** | |
| Computed correlation used spearman-method with listwise-deletion. | | | |
cor.test(greece$happy, greece$stfgov,method="kendall")
##
## Kendall's rank correlation tau
##
## data: greece$happy and greece$stfgov
## z = 11, p-value <2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.168
cor_matrix <- cor(greece[,c('stfgov', 'happy', 'trstlgl')], method = "kendall")
stargazer(cor_matrix, title="Correlation Matrix", type = "latex")
##
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Fri, May 12, 2023 - 14:51:23
## \begin{table}[!htbp] \centering
## \caption{Correlation Matrix}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}} cccc}
## \\[-1.8ex]\hline
## \hline \\[-1.8ex]
## & stfgov & happy & trstlgl \\
## \hline \\[-1.8ex]
## stfgov & $1$ & $0.168$ & $0.238$ \\
## happy & $0.168$ & $1$ & $0.256$ \\
## trstlgl & $0.238$ & $0.256$ & $1$ \\
## \hline \\[-1.8ex]
## \end{tabular}
## \end{table}
sjPlot::tab_corr(greece[,c('stfgov', 'happy', 'trstlgl')],
corr.method = "kendall")
| | stfgov | happy | trstlgl |
|---|---|---|---|
| stfgov | | 0.168*** | 0.238*** |
| happy | 0.168*** | | 0.256*** |
| trstlgl | 0.238*** | 0.256*** | |
| Computed correlation used kendall-method with listwise-deletion. | | | |
rcorr(as.matrix(greece[,c('stfgov', 'happy', 'trstlgl')]), type = "spearman")
## stfgov happy trstlgl
## stfgov 1.00 0.21 0.31
## happy 0.21 1.00 0.32
## trstlgl 0.31 0.32 1.00
##
## n= 2685
##
##
## P
## stfgov happy trstlgl
## stfgov 0 0
## happy 0 0
## trstlgl 0 0
cor_mat <- greece[,-4] %>%
rstatix::cor_mat()
cor_mat %>%
rstatix::cor_get_pval()
## # A tibble: 3 × 4
## rowname stfgov happy trstlgl
## <chr> <dbl> <dbl> <dbl>
## 1 stfgov 0 7.09e-34 1.03e-68
## 2 happy 7.09e-34 0 2.03e-63
## 3 trstlgl 1.03e-68 2.03e-63 0
cor_mat %>%
rstatix::cor_gather()
## # A tibble: 9 × 4
## var1 var2 cor p
## <chr> <chr> <dbl> <dbl>
## 1 stfgov stfgov 1 0
## 2 happy stfgov 0.23 7.09e-34
## 3 trstlgl stfgov 0.33 1.03e-68
## 4 stfgov happy 0.23 7.09e-34
## 5 happy happy 1 0
## 6 trstlgl happy 0.32 2.03e-63
## 7 stfgov trstlgl 0.33 1.03e-68
## 8 happy trstlgl 0.32 2.03e-63
## 9 trstlgl trstlgl 1 0
greece[,-4] %>%
apa.cor.table(filename = "cor_matrix_Greece.doc")
##
##
## Means, standard deviations, and correlations with confidence intervals
##
##
## Variable M SD 1 2
## 1. stfgov 4.11 2.28
##
## 2. happy 6.59 1.53 .23**
## [.19, .27]
##
## 3. trstlgl 6.43 2.26 .33** .32**
## [.29, .36] [.28, .35]
##
##
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval.
## The confidence interval is a plausible range of population correlations
## that could have caused the sample correlation (Cumming, 2014).
## * indicates p < .05. ** indicates p < .01.
##
greece[,-4] %>%
sjPlot::sjp.corr()
## Warning: 'sjp.corr' is deprecated. Please use 'correlation::correlation()' and
## its related plot()-method.
## Computing correlation using pearson-method with listwise-deletion...
## Warning: Removed 6 rows containing missing values (`geom_text()`).
From what we can see, all the relationships between our variables are
weak to moderate (coefficients between roughly 0.2 and 0.33) and positive.
The highest Pearson correlation is between trstlgl and stfgov (0.33).
The presented values confirm what the scatterplots show. It is also worth
noting that all correlations are highly significant (p < 0.001).
greece %>%
ggplot(aes(x = factor (psppipla),
y = happy,
fill = factor (psppipla))) +
geom_boxplot() +
ggtitle("Distribution of happy level") +
xlab("Category") +
ylab("Happy level") +
theme_bw()+
theme(legend.position="none")
Regardless of the level of psppipla, we observe a very similar distribution
of the level of happiness of citizens.
model1 = lm(happy ~ stfgov, data = greece)
sjPlot::tab_model(model1)
| happy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 5.96 | 5.84 – 6.07 | <0.001 |
| stfgov | 0.15 | 0.13 – 0.18 | <0.001 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.053 / 0.053 | ||
greece <- greece %>%
mutate(Zhappy = scale(happy)[,1],
Zstfgov = scale(stfgov)[,1],
Ztrstlgl = scale(trstlgl)[,1])
model1_std = lm(Zhappy ~ Zstfgov, data = greece)
sjPlot::tab_model(model1_std)
| Zhappy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.00 | -0.04 – 0.04 | 1.000 |
| Zstfgov | 0.23 | 0.19 – 0.27 | <0.001 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.053 / 0.053 | ||
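The standardized slope can be recovered from the unstandardized one without refitting. The sketch below uses the rounded values reported in this document (slope 0.1550 from the unstandardized model, SDs from the descriptive statistics). In a simple regression with one predictor, this standardized slope equals the Pearson correlation between the two variables.

```r
# Rounded values reported elsewhere in this document
b_unstd <- 0.1550   # slope of stfgov in happy ~ stfgov
sd_x    <- 2.28     # SD of stfgov
sd_y    <- 1.53     # SD of happy

# Standardized slope: rescale by the ratio of standard deviations
beta_std <- b_unstd * sd_x / sd_y

cat("standardized slope:", round(beta_std, 3), "\n")
```

The result rounds to 0.23, in line with the Zstfgov estimate in the table above and with the Pearson correlation between happy and stfgov.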
# let's add a categorical variable
model2 = lm(happy ~ stfgov + psppipla, data = greece)
sjPlot::tab_model(model2)
| happy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 5.99 | 5.86 – 6.11 | <0.001 |
| stfgov | 0.15 | 0.13 – 0.18 | <0.001 |
| psppipla [2] | -0.05 | -0.19 – 0.09 | 0.497 |
| psppipla [3] | -0.09 | -0.24 – 0.05 | 0.206 |
| psppipla [4] | 0.18 | -0.07 – 0.43 | 0.161 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.055 / 0.054 | ||
# with standardized coefficients
model2_std = lm(Zhappy ~ Zstfgov + psppipla, data = greece)
sjPlot::tab_model(model2_std)
| Zhappy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.02 | -0.04 – 0.07 | 0.582 |
| Zstfgov | 0.23 | 0.19 – 0.27 | <0.001 |
| psppipla [2] | -0.03 | -0.12 – 0.06 | 0.497 |
| psppipla [3] | -0.06 | -0.16 – 0.03 | 0.206 |
| psppipla [4] | 0.12 | -0.05 – 0.28 | 0.161 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.055 / 0.054 | ||
# comparison of models
anova(model1_std, model2_std)
## Analysis of Variance Table
##
## Model 1: Zhappy ~ Zstfgov
## Model 2: Zhappy ~ Zstfgov + psppipla
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2683 2541
## 2 2680 2536 3 4.48 1.58 0.19
Conclusion: adding psppipla does not significantly improve the fit (F = 1.58, p = 0.19 > 0.05), so the simpler model 1 is preferred over model 2.
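The F statistic of the nested-model comparison can be reproduced from the quantities printed in the ANOVA table. This is a sketch based on the rounded table values (Sum of Sq 4.48, RSS 2536, 3 and 2680 degrees of freedom), so small rounding differences are expected.

```r
# Values taken from the anova() table above (rounded as printed)
ss_diff  <- 4.48   # reduction in RSS when psppipla is added
df_diff  <- 3      # number of added parameters (3 dummy variables)
rss_full <- 2536   # residual sum of squares of the larger model
df_full  <- 2680   # residual degrees of freedom of the larger model

# F = (extra explained variation per added parameter) / (residual variance)
f_stat <- (ss_diff / df_diff) / (rss_full / df_full)
p_val  <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

cat("F =", round(f_stat, 2), " p =", round(p_val, 2), "\n")
```

A p-value above 0.05 means there is no evidence that the larger model fits better. It does not mean that the smaller model is "significantly better".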
summary(model1)
##
## Call:
## lm(formula = happy ~ stfgov, data = greece)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.732 -0.887 0.113 0.958 4.043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9571 0.0591 100.7 <2e-16 ***
## stfgov 0.1550 0.0126 12.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.49 on 2683 degrees of freedom
## Multiple R-squared: 0.0534, Adjusted R-squared: 0.053
## F-statistic: 151 on 1 and 2683 DF, p-value: <2e-16
For the model with unstandardized coefficients:
The overall model is statistically significant (F-test p-value < 2.2e-16, below 0.05). Adjusted R-squared is 0.053, i.e. the independent variable (stfgov) explains about 5.3% of the variance in the dependent variable (happy). The stfgov coefficient is 0.15 (p < 0.001): each one-unit increase in stfgov is associated with a 0.15-point increase in happy. The intercept is 5.96, which is the predicted value of happy when stfgov is 0. The regression equation is: happy = 5.96 + 0.15*stfgov
Linear regression model with 2 continuous predictors
Now we add another predictor to our model.
# Let's add another continuous variable to the model
model3 = lm(happy ~ stfgov + trstlgl, data = greece)
sjPlot::tab_model(model3)
| happy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 5.03 | 4.86 – 5.20 | <0.001 |
| stfgov | 0.10 | 0.07 – 0.12 | <0.001 |
| trstlgl | 0.18 | 0.16 – 0.21 | <0.001 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.118 / 0.117 | ||
model3_std = lm(Zhappy ~ Zstfgov + Ztrstlgl, data = greece)
sjPlot::tab_model(model3_std)
| Zhappy | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 0.00 | -0.04 – 0.04 | 1.000 |
| Zstfgov | 0.14 | 0.10 – 0.18 | <0.001 |
| Ztrstlgl | 0.27 | 0.23 – 0.31 | <0.001 |
| Observations | 2685 | ||
| R2 / R2 adjusted | 0.118 / 0.117 | ||
# model comparison
anova(model1_std, model3_std)
## Analysis of Variance Table
##
## Model 1: Zhappy ~ Zstfgov
## Model 2: Zhappy ~ Zstfgov + Ztrstlgl
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2683 2541
## 2 2682 2367 1 174 197 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: Model 3 is statistically significantly better suited to the data than model 1 (p<0.05)
summary(model3_std)
##
## Call:
## lm(formula = Zhappy ~ Zstfgov + Ztrstlgl, data = greece)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.365 -0.519 0.034 0.619 3.252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50e-15 1.81e-02 0.00 1
## Zstfgov 1.43e-01 1.92e-02 7.42 1.5e-13 ***
## Ztrstlgl 2.69e-01 1.92e-02 14.03 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.939 on 2682 degrees of freedom
## Multiple R-squared: 0.118, Adjusted R-squared: 0.117
## F-statistic: 180 on 2 and 2682 DF, p-value: <2e-16
For the model with standardized coefficients:
The overall model is statistically significant (F-test p-value < 2.2e-16, below 0.05). Adjusted R-squared is 0.1174, i.e. the independent variables (stfgov and trstlgl) together explain about 11.74% of the variance in the dependent variable (happy). The standardized coefficients are 0.14 (Zstfgov) and 0.27 (Ztrstlgl), both with p < 0.001, so trust in the legal system is the stronger of the two predictors. The intercept is 0.00. The regression equation is: Zhappy = 0.14*Zstfgov + 0.27*Ztrstlgl
summary(model3)
##
## Call:
## lm(formula = happy ~ stfgov + trstlgl, data = greece)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.670 -0.792 0.053 0.945 4.969
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0305 0.0873 57.62 < 2e-16 ***
## stfgov 0.0956 0.0129 7.42 1.5e-13 ***
## trstlgl 0.1821 0.0130 14.03 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.44 on 2682 degrees of freedom
## Multiple R-squared: 0.118, Adjusted R-squared: 0.117
## F-statistic: 180 on 2 and 2682 DF, p-value: <2e-16
For the model with unstandardized coefficients:
The overall model is statistically significant (F-test p-value < 2.2e-16, below 0.05). Adjusted R-squared is 0.1174, i.e. the independent variables (stfgov and trstlgl) together explain about 11.74% of the variance in the dependent variable (happy). The coefficients are 0.10 (stfgov) and 0.18 (trstlgl), both with p < 0.001. The intercept is 5.03, which is the predicted value of happy when both stfgov and trstlgl are 0.
The regression equation is: happy = 5.03 + 0.10*stfgov + 0.18*trstlgl
With each one-unit increase in stfgov, happy rises by 0.10, holding trstlgl constant. With each one-unit increase in trstlgl, happy rises by 0.18, holding stfgov constant.
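To illustrate how the regression equation is used, here is a small sketch that plugs values into it. The helper `predict_happy()` is a hypothetical name introduced only for this example. Its coefficients are copied from the `summary(model3)` output above, and the input scores are arbitrary.

```r
# Hypothetical helper: coefficients copied from summary(model3) above
predict_happy <- function(stfgov, trstlgl) {
  5.0305 + 0.0956 * stfgov + 0.1821 * trstlgl
}

# Predicted happiness for a respondent scoring 5 on both predictors
pred_mid <- predict_happy(stfgov = 5, trstlgl = 5)

# Effect of a one-unit increase in stfgov with trstlgl held constant
effect_stfgov <- predict_happy(6, 5) - predict_happy(5, 5)

cat("prediction at (5, 5):", pred_mid, "\n")
cat("one-unit stfgov effect:", effect_stfgov, "\n")
```

With the fitted model object available, `predict(model3, newdata = data.frame(stfgov = 5, trstlgl = 5))` would give the same kind of prediction without copying coefficients by hand.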
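Checking linear regression assumptions
Linear regression makes several assumptions about the data: linearity of the relationship, normality of the residuals, homoscedasticity (constant residual variance) and absence of multicollinearity. We check them below.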
autoplot(model3_std)
# 1) normality of the residual distribution
res <- resid(model3_std)
hist(res, breaks = 20, col = 'lightblue', freq = FALSE)
lines(density(res), col = 'red', lwd = 2)
shapiro.test(res) # the residuals are NOT normally distributed
##
## Shapiro-Wilk normality test
##
## data: res
## W = 1, p-value <2e-16
#QQ-plot
par(mfrow = c(1, 1))
qqnorm(res)
qqline(res)
car::qqPlot(model3_std)
## 1402 1484
## 1346 1424
# The histogram, the Shapiro-Wilk test and the Q-Q plots indicate that the residuals are NOT normally distributed
# homoscedasticity
plot(fitted(model3_std), res)
abline(0,0)
ggplot(data = model3_std, aes(x = .fitted, y = .stdresid)) +
geom_point() +
geom_hline(yintercept = 0)
bptest(model3_std) # Breusch-Pagan test
##
## studentized Breusch-Pagan test
##
## data: model3_std
## BP = 112, df = 2, p-value <2e-16
# we can say that homoscedasticity does NOT hold
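For intuition about what the studentized Breusch-Pagan test measures, the sketch below computes its statistic by hand on simulated data (made-up variables `x1`, `x2`, `y`, not the ESS data). The squared residuals are regressed on the predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution whose degrees of freedom equal the number of predictors. This is the studentized (Koenker) form of the test, which is what the "studentized Breusch-Pagan test" header in the output above refers to.

```r
# Simulated data with error variance that depends on x1 (NOT the ESS data)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = exp(0.5 * x1))

fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the same predictors
aux <- lm(resid(fit)^2 ~ x1 + x2)

# Studentized BP statistic: n * R^2 of the auxiliary regression
bp_stat <- n * summary(aux)$r.squared
bp_df   <- 2   # number of predictors in the auxiliary regression
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

cat("BP =", bp_stat, " df =", bp_df, " p =", bp_p, "\n")
```

A small p-value indicates that the residual variance changes with the predictors, i.e. heteroscedasticity.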
# let's check multicollinearity
car::vif(model3_std)
## Zstfgov Ztrstlgl
## 1.12 1.12
# there is NO multicollinearity
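With only two predictors, the VIF has a closed form: 1 / (1 - r^2), where r is the correlation between the two predictors. The sketch below plugs in the rounded Pearson correlation between stfgov and trstlgl reported earlier (0.33), so it matches the output above only up to rounding.

```r
# Rounded Pearson correlation between stfgov and trstlgl reported earlier
r_predictors <- 0.33

# With two predictors, both share the same VIF
vif_two <- 1 / (1 - r_predictors^2)

cat("VIF:", round(vif_two, 2), "\n")
```

A VIF this close to 1 means the predictors share little variance, consistent with the conclusion that there is no multicollinearity.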
Linearity: the Residuals vs. Fitted plot shows a roughly horizontal line without distinct patterns, which supports the linearity assumption.
Normality: the histogram, the Shapiro-Wilk test and the Q-Q plots indicate that the residuals are NOT normally distributed.
Homoscedasticity: the Scale-Location and Residuals vs. Leverage plots do NOT show a horizontal line with evenly spread points. Together with the Breusch-Pagan test, this indicates that the residuals are heteroscedastic.
Multicollinearity: the VIF values (1.12 for both predictors) give no sign of multicollinearity.