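The preview below shows the first 6 rows of the data set, which has 123 columns. Each row represents one country's human freedom index measurements for a single year (2016 in the rows shown).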
## # A tibble: 6 × 123
## year ISO_code countries region pf_rol_procedural pf_rol_civil pf_rol_criminal
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2016 ALB Albania Easte… 6.66 4.55 4.67
## 2 2016 DZA Algeria Middl… NA NA NA
## 3 2016 AGO Angola Sub-S… NA NA NA
## 4 2016 ARG Argentina Latin… 7.10 5.79 4.34
## 5 2016 ARM Armenia Cauca… NA NA NA
## 6 2016 AUS Australia Ocean… 8.44 7.53 7.36
## # ℹ 116 more variables: pf_rol <dbl>, pf_ss_homicide <dbl>,
## # pf_ss_disappearances_disap <dbl>, pf_ss_disappearances_violent <dbl>,
## # pf_ss_disappearances_organized <dbl>,
## # pf_ss_disappearances_fatalities <dbl>, pf_ss_disappearances_injuries <dbl>,
## # pf_ss_disappearances <dbl>, pf_ss_women_fgm <dbl>,
## # pf_ss_women_missing <dbl>, pf_ss_women_inheritance_widows <dbl>,
## # pf_ss_women_inheritance_daughters <dbl>, pf_ss_women_inheritance <dbl>, …
A scatter plot with a fitted linear regression line can be used to display the relationship between pf_score and pf_expression_control.
ggplot(hfi_2016, aes(x = pf_expression_control, y = pf_score)) +
  geom_point() +
  geom_smooth(method = "lm") + # adds the fitted regression line to the scatter plot
  theme_bw() +
  labs(x = "Freedom of Expression",
       y = "Personal Freedom (score)",
       title = "Scatterplot of Freedom of Expression to Personal Freedom",
       caption = "Source: Ian Vasquez and Tanja Porcnik, The Human Freedom Index 2018: A Global Measurement of Personal, Civil, and Economic Freedom (Washington: Cato Institute, Fraser Institute, and the Friedrich Naumann Foundation for Freedom, 2018).")
## `geom_smooth()` using formula = 'y ~ x'
## # A tibble: 1 × 1
## `cor(pf_expression_control, pf_score)`
## <dbl>
## 1 0.845
We can see a linear, positive, strong association (r = 0.845): as freedom of expression increases, personal freedom tends to increase with it. There are a few outliers with very low personal freedom and freedom of expression.
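The column name in the correlation output, `cor(pf_expression_control, pf_score)`, suggests the value came from a `summarise()` call. As a minimal base R sketch of the same idea, the block below computes a correlation on made-up data and checks that its square matches the R-squared of the simple linear model. The `toy` data frame and its values are hypothetical and are not taken from hfi_2016.

```r
# Hypothetical toy data standing in for hfi_2016 (values are invented)
toy <- data.frame(
  pf_expression_control = c(1, 2, 3, 4, 5, 6, 7, 8),
  pf_score              = c(4.9, 5.1, 6.2, 6.0, 7.1, 7.4, 8.3, 8.6)
)

# Pearson correlation between the two variables
r <- cor(toy$pf_expression_control, toy$pf_score)

# In simple linear regression, R-squared equals the squared correlation
fit <- lm(pf_score ~ pf_expression_control, data = toy)
r_squared <- summary(fit)$r.squared

all.equal(r^2, r_squared)  # TRUE
```

Applied to the lab's reported correlation, 0.845^2 is about 0.714.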
## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept) x
## 4.2838 0.5418
##
## Sum of Squares: 102.213
The smallest sum of squares I got when choosing the line myself, after changing the Global Tools setting, was 105.968. This is slightly above the 102.213 reported in the output above.
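To illustrate what the sum of squares measures, here is a small base R sketch. The helper `sum_sq()` and the `toy` data are hypothetical and not part of the lab. The sketch only shows that a hand-picked line cannot have a smaller sum of squared residuals than the least squares line from `lm()`.

```r
# Hypothetical toy data (values are invented for illustration)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(4.9, 5.1, 6.2, 6.0, 7.1, 7.4, 8.3, 8.6)
)

# Sum of squared residuals for a line with the given intercept and slope
sum_sq <- function(intercept, slope, data) {
  predicted <- intercept + slope * data$x
  sum((data$y - predicted)^2)
}

# Least squares line
fit <- lm(y ~ x, data = toy)
ss_least_squares <- sum_sq(coef(fit)[["(Intercept)"]], coef(fit)[["x"]], toy)

# A hand-picked line, as if chosen by clicking two points
ss_hand_picked <- sum_sq(4.0, 0.6, toy)

ss_least_squares <= ss_hand_picked  # TRUE
```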
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.28 0.149 28.8 4.23e-65
## 2 pf_expression_control 0.542 0.0271 20.0 2.31e-45
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.714 0.712 0.799 400. 2.31e-45 1 -193. 391. 400.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Equation of the regression line: pf_score = 4.28 + 0.542 * pf_expression_control
The slope tells us that the predicted personal freedom score increases by 0.542 for each 1-unit increase in the pf_expression_control rating.
ggplot(data = hfi_2016, aes(x = pf_expression_control, y = pf_score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
The predicted personal freedom score for a pf_expression_control rating of 3 is about 5.9 (4.28 + 0.542 * 3). This seems to be an overestimate because it is higher than all of the observed personal freedom scores at a pf_expression_control rating of 3. It overestimates by about 0.5, so the residual is about -0.5.
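The prediction can be reproduced from the rounded coefficients in the regression output. In this sketch, the `observed` value of 5.4 is an assumption chosen to be about 0.5 below the prediction, as read off the plot. It is not a value taken from the data.

```r
# Coefficients as reported (rounded) in the regression output
intercept <- 4.28
slope <- 0.542

# Predicted personal freedom score at pf_expression_control = 3
predicted <- intercept + slope * 3
predicted  # 5.906

# Assumed observed score, roughly 0.5 below the prediction (hypothetical)
observed <- 5.4

# Residual = observed - predicted; a negative value means an overestimate
residual <- observed - predicted
```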
ggplot(data = m1_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  xlab("Fitted values") +
  ylab("Residuals")
The residuals plot shows points scattered in no particular pattern about the horizontal line at y = 0. This indicates that the relationship between the two variables is linear.
The nearly normal residuals condition does not appear to be violated. The histogram of the residuals is roughly bell-shaped and centered near 0.
I think the residuals vs. fitted plot shows that the constant variability condition is not violated, as the points are scattered randomly with roughly even spread across the fitted values.
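The three diagnostic checks discussed above can be sketched in base R as follows. This is an illustration on simulated data, not the lab's actual code: the variables `x` and `y` and the simulation parameters are assumptions, and base graphics are used here in place of the ggplot2 calls shown earlier.

```r
# Simulated data from a linear model (hypothetical, for illustration only)
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 4 + 0.5 * x + rnorm(100, sd = 0.8)

fit <- lm(y ~ x)
res <- resid(fit)

# Linearity and constant variability: residuals vs. fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = "dashed", col = "red")

# Nearly normal residuals: histogram and normal Q-Q plot
hist(res, main = "Histogram of residuals", xlab = "Residuals")
qqnorm(res)
qqline(res)
```

A pattern-free residual plot with even spread supports linearity and constant variability, while a roughly bell-shaped histogram and a Q-Q plot close to the line support nearly normal residuals.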