library(tidyverse)
library(openintro)
library(statsr)
library(broom)
library(ggplot2)
Exercise 1
There are 1458 rows and 123 columns, each row represents a single
country in a specific year.
## [1] 1458 123
Exercise 2
hfi_2016 <- hfi %>%
filter(year == 2016) %>%
select(pf_score, pf_expression_control, hf_score, region, countries, ISO_code)
head(hfi_2016)
## # A tibble: 6 × 6
## pf_score pf_expression_control hf_score region countries ISO_code
## <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 7.60 5.25 7.57 Eastern Europe Albania ALB
## 2 5.28 4 5.14 Middle East & Nort… Algeria DZA
## 3 6.11 2.5 5.64 Sub-Saharan Africa Angola AGO
## 4 8.10 5.5 6.47 Latin America & th… Argentina ARG
## 5 6.91 4.25 7.24 Caucasus & Central… Armenia ARM
## 6 9.18 7.75 8.58 Oceania Australia AUS
Exercise 3
The form of the relationship is linear, the direction is positive,
and the strength is strong, with a correlation of 0.845. There are no
significant unusual observations, and the data points are fairly tightly
clustered along the trend line, supporting the use of a linear
model.
ggplot(hfi_2016, aes(x = pf_expression_control, y = pf_score)) +
geom_point() +
labs(title = "pf_score vs pf_expression_control")

hfi_2016 %>%
summarise(correlation = cor(pf_expression_control, pf_score, use = "complete.obs"))
## # A tibble: 1 × 1
## correlation
## <dbl>
## 1 0.845
Exercise 4
The closer the fitted line is to the actual data points, the lower
the sum of squared residuals (SSR).
plot_ss(x = pf_expression_control, y = pf_score, data = hfi_2016, showSquares = TRUE)

## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept) x
## 4.2838 0.5418
##
## Sum of Squares: 102.213
Exercise 5
The regression model shows that for each 1-unit increase in
expression control, the personal freedom score increases by about 0.542
points. This indicates a strong, positive linear relationship between
the two variables.
m1 <- lm(pf_score ~ pf_expression_control, data = hfi_2016)
tidy(m1)
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.28 0.149 28.8 4.23e-65
## 2 pf_expression_control 0.542 0.0271 20.0 2.31e-45
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.714 0.712 0.799 400. 2.31e-45 1 -193. 391. 400.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Exercise 6
The predicted personal freedom score for a country with a
pf_expression_control of 3 is approximately 5.91. If the actual score
is, for example, 5.20, then the residual is −0.71, indicating the model
overestimated the true value. This helps us assess how accurate the
model is for specific observations.
ggplot(hfi_2016, aes(x = pf_expression_control, y = pf_score)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

predicted <- predict(m1, newdata = data.frame(pf_expression_control = 3))
actual <- hfi_2016$pf_score[hfi_2016$pf_expression_control == 3]
residual <- actual - predicted
predict(m1, newdata = data.frame(pf_expression_control = 3))
## 1
## 5.909351
## [1] 5.465632
## 1
## -0.4437186
Exercise 7
Using the regression line, the predicted personal freedom score for a
country with a pf_expression_control score of 3 is approximately 5.91.
If the actual score is 5.20, the model overestimates the value by 0.71.
This results in a residual of −0.71, which indicates that the model’s
prediction was too high.
m1_aug <- augment(m1)
ggplot(m1_aug, aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
xlab("Fitted values") +
ylab("Residuals")

Exercise 8
There is no apparent skew or strong pattern in the histogram of
residuals, and the shape appears roughly symmetric and bell-shaped. This
suggests that the residuals are approximately normally distributed,
which supports the validity of the linearity assumption in the
regression model.
ggplot(data = m1_aug, aes(x = .resid)) +
geom_histogram(binwidth = 0.25, fill = "skyblue") +
labs(title = "Histogram of Residuals", x = "Residuals", y = "Count")

Exercise 9
The histogram looks roughly symmetric and bell-shaped, so the nearly
normal residuals condition is satisfied.
Exercise 10
Based on the residuals vs. fitted values plot, there is no clear
pattern or funnel shape, and the spread of residuals appears fairly
consistent across all fitted values. This suggests that the constant
variability condition is not violated, and the linear regression model
satisfies the assumption of homoscedasticity.
m2 <- lm(pf_score ~ hf_score, data = hfi_2016)
glance(m2)$r.squared
## [1] 0.8978487
LS0tCnRpdGxlOiAiTGFiIDE6IEludHJvIHRvIFIiCmF1dGhvcjogIkF1dGhvciBOYW1lIgpkYXRlOiAiYHIgU3lzLkRhdGUoKWAiCm91dHB1dDogb3BlbmludHJvOjpsYWJfcmVwb3J0Ci0tLQoKYGBge3IgbG9hZC1wYWNrYWdlcywgbWVzc2FnZT1GQUxTRX0KbGlicmFyeSh0aWR5dmVyc2UpCmxpYnJhcnkob3BlbmludHJvKQpsaWJyYXJ5KHN0YXRzcikKbGlicmFyeShicm9vbSkKbGlicmFyeShnZ3Bsb3QyKQpgYGAKCiMjIyBFeGVyY2lzZSAxCgpUaGVyZSBhcmUgMTQ1OCByb3dzIGFuZCAxMjMgY29sdW1ucywgZWFjaCByb3cgcmVwcmVzZW50cyBhIHNpbmdsZSBjb3VudHJ5IGluIGEgc3BlY2lmaWMgeWVhci4KCmBgYHtyIHZpZXctZ2lybHMtY291bnRzfQpkYXRhKGhmaSkKZGltKGhmaSkKYGBgCgoKIyMjIEV4ZXJjaXNlIDIKCmBgYHtyIHRyZW5kLWdpcmxzfQpoZmlfMjAxNiA8LSBoZmkgJT4lCiAgZmlsdGVyKHllYXIgPT0gMjAxNikgJT4lCiAgc2VsZWN0KHBmX3Njb3JlLCBwZl9leHByZXNzaW9uX2NvbnRyb2wsIGhmX3Njb3JlLCByZWdpb24sIGNvdW50cmllcywgSVNPX2NvZGUpCmhlYWQoaGZpXzIwMTYpCmBgYAoKCiMjIyBFeGVyY2lzZSAzCgpUaGUgZm9ybSBvZiB0aGUgcmVsYXRpb25zaGlwIGlzIGxpbmVhciwgdGhlIGRpcmVjdGlvbiBpcyBwb3NpdGl2ZSwgYW5kIHRoZSBzdHJlbmd0aCBpcyBzdHJvbmcsIHdpdGggYSBjb3JyZWxhdGlvbiBvZiAwLjg0NS4gVGhlcmUgYXJlIG5vIHNpZ25pZmljYW50IHVudXN1YWwgb2JzZXJ2YXRpb25zLCBhbmQgdGhlIGRhdGEgcG9pbnRzIGFyZSBmYWlybHkgdGlnaHRseSBjbHVzdGVyZWQgYWxvbmcgdGhlIHRyZW5kIGxpbmUsIHN1cHBvcnRpbmcgdGhlIHVzZSBvZiBhIGxpbmVhciBtb2RlbC4KCmBgYHtyIHBsb3QtcHJvcC1ib3lzLWFyYnV0aG5vdH0KZ2dwbG90KGhmaV8yMDE2LCBhZXMoeCA9IHBmX2V4cHJlc3Npb25fY29udHJvbCwgeSA9IHBmX3Njb3JlKSkgKwogIGdlb21fcG9pbnQoKSArCiAgbGFicyh0aXRsZSA9ICJwZl9zY29yZSB2cyBwZl9leHByZXNzaW9uX2NvbnRyb2wiKQoKaGZpXzIwMTYgJT4lCiAgc3VtbWFyaXNlKGNvcnJlbGF0aW9uID0gY29yKHBmX2V4cHJlc3Npb25fY29udHJvbCwgcGZfc2NvcmUsIHVzZSA9ICJjb21wbGV0ZS5vYnMiKSkKYGBgCgoKIyMjIEV4ZXJjaXNlIDQKClRoZSBjbG9zZXIgdGhlIGZpdHRlZCBsaW5lIGlzIHRvIHRoZSBhY3R1YWwgZGF0YSBwb2ludHMsIHRoZSBsb3dlciB0aGUgc3VtIG9mIHNxdWFyZWQgcmVzaWR1YWxzIChTU1IpLgoKYGBge3IgZGltLXByZXNlbnR9CnBsb3Rfc3MoeCA9IHBmX2V4cHJlc3Npb25fY29udHJvbCwgeSA9IHBmX3Njb3JlLCBkYXRhID0gaGZpXzIwMTYsIHNob3dTcXVhcmVzID0gVFJVRSkKYGBgCgoKIyMjIEV4ZXJjaXNlIDUKClRoZSByZWdyZXNzaW9uIG1vZGVsIHNob3dzIHRoYXQgZm9yIGVhY2ggMS11bml0IGluY3JlYXNlIGluIGV4cHJlc3Npb24gY29udHJvbCwgdGhlIHBlcnNvbmFsIGZyZWVkb20gc2NvcmUgaW5jcmVhc2VzIGJ5IGFib3V0IDAuNTQyIHBvaW50cy4gVGhpcyBpbmRpY2F0ZXMgYSBzdHJvbmcsIHBvc2l0aXZlIGxpbmVhciByZWxhdGlvbnNoaXAgYmV0d2VlbiB0aGUgdHdvIHZhcmlhYmxlcy4KCmBgYHtyIGNvdW50LWNvbXBhcmV9Cm0xIDwtIGxtKHBmX3Njb3JlIH4gcGZfZXhwcmVzc2lvbl9jb250cm9sLCBkYXRhID0gaGZpXzIwMTYpCnRpZHkobTEpCmdsYW5jZShtMSkKYGBgCgoKIyMjIEV4ZXJjaXNlIDYKClRoZSBwcmVkaWN0ZWQgcGVyc29uYWwgZnJlZWRvbSBzY29yZSBmb3IgYSBjb3VudHJ5IHdpdGggYSBwZl9leHByZXNzaW9uX2NvbnRyb2wgb2YgMyBpcyBhcHByb3hpbWF0ZWx5IDUuOTEuIElmIHRoZSBhY3R1YWwgc2NvcmUgaXMsIGZvciBleGFtcGxlLCA1LjIwLCB0aGVuIHRoZSByZXNpZHVhbCBpcyDiiJIwLjcxLCBpbmRpY2F0aW5nIHRoZSBtb2RlbCBvdmVyZXN0aW1hdGVkIHRoZSB0cnVlIHZhbHVlLiBUaGlzIGhlbHBzIHVzIGFzc2VzcyBob3cgYWNjdXJhdGUgdGhlIG1vZGVsIGlzIGZvciBzcGVjaWZpYyBvYnNlcnZhdGlvbnMuCgpgYGB7ciBwbG90LXByb3AtYm95cy1wcmVzZW50fQpnZ3Bsb3QoaGZpXzIwMTYsIGFlcyh4ID0gcGZfZXhwcmVzc2lvbl9jb250cm9sLCB5ID0gcGZfc2NvcmUpKSArCiAgZ2VvbV9wb2ludCgpICsKICBnZW9tX3Ntb290aChtZXRob2QgPSAibG0iLCBzZSA9IEZBTFNFKQoKcHJlZGljdGVkIDwtIHByZWRpY3QobTEsIG5ld2RhdGEgPSBkYXRhLmZyYW1lKHBmX2V4cHJlc3Npb25fY29udHJvbCA9IDMpKQphY3R1YWwgPC0gaGZpXzIwMTYkcGZfc2NvcmVbaGZpXzIwMTYkcGZfZXhwcmVzc2lvbl9jb250cm9sID09IDNdCnJlc2lkdWFsIDwtIGFjdHVhbCAtIHByZWRpY3RlZAoKcHJlZGljdChtMSwgbmV3ZGF0YSA9IGRhdGEuZnJhbWUocGZfZXhwcmVzc2lvbl9jb250cm9sID0gMykpCgpwcmludChhY3R1YWwpCnByaW50KHJlc2lkdWFsKQpgYGAKCgojIyMgRXhlcmNpc2UgNwoKVXNpbmcgdGhlIHJlZ3Jlc3Npb24gbGluZSwgdGhlIHByZWRpY3RlZCBwZXJzb25hbCBmcmVlZG9tIHNjb3JlIGZvciBhIGNvdW50cnkgd2l0aCBhIHBmX2V4cHJlc3Npb25fY29udHJvbCBzY29yZSBvZiAzIGlzIGFwcHJveGltYXRlbHkgNS45MS4gSWYgdGhlIGFjdHVhbCBzY29yZSBpcyA1LjIwLCB0aGUgbW9kZWwgb3ZlcmVzdGltYXRlcyB0aGUgdmFsdWUgYnkgMC43MS4gVGhpcyByZXN1bHRzIGluIGEgcmVzaWR1YWwgb2Yg4oiSMC43MSwgd2hpY2ggaW5kaWNhdGVzIHRoYXQgdGhlIG1vZGVsJ3MgcHJlZGljdGlvbiB3YXMgdG9vIGhpZ2guCgpgYGB7ciBmaW5kLW1heC10b3RhbH0KbTFfYXVnIDwtIGF1Z21lbnQobTEpCgpnZ3Bsb3QobTFfYXVnLCBhZXMoeCA9IC5maXR0ZWQsIHkgPSAucmVzaWQpKSArCiAgZ2VvbV9wb2ludCgpICsKICBnZW9tX2hsaW5lKHlpbnRlcmNlcHQgPSAwLCBsaW5ldHlwZSA9ICJkYXNoZWQiLCBjb2xvciA9ICJyZWQiKSArCiAgeGxhYigiRml0dGVkIHZhbHVlcyIpICsKICB5bGFiKCJSZXNpZHVhbHMiKQpgYGAKCgojIyMgRXhlcmNpc2UgOCAKClRoZXJlIGlzIG5vIGFwcGFyZW50IHNrZXcgb3Igc3Ryb25nIHBhdHRlcm4gaW4gdGhlIGhpc3RvZ3JhbSBvZiByZXNpZHVhbHMsIGFuZCB0aGUgc2hhcGUgYXBwZWFycyByb3VnaGx5IHN5bW1ldHJpYyBhbmQgYmVsbC1zaGFwZWQuIFRoaXMgc3VnZ2VzdHMgdGhhdCB0aGUgcmVzaWR1YWxzIGFyZSBhcHByb3hpbWF0ZWx5IG5vcm1hbGx5IGRpc3RyaWJ1dGVkLCB3aGljaCBzdXBwb3J0cyB0aGUgdmFsaWRpdHkgb2YgdGhlIGxpbmVhcml0eSBhc3N1bXB0aW9uIGluIHRoZSByZWdyZXNzaW9uIG1vZGVsLgoKYGBge3J9CmdncGxvdChkYXRhID0gbTFfYXVnLCBhZXMoeCA9IC5yZXNpZCkpICsKICBnZW9tX2hpc3RvZ3JhbShiaW53aWR0aCA9IDAuMjUsIGZpbGwgPSAic2t5Ymx1ZSIpICsKICBsYWJzKHRpdGxlID0gIkhpc3RvZ3JhbSBvZiBSZXNpZHVhbHMiLCB4ID0gIlJlc2lkdWFscyIsIHkgPSAiQ291bnQiKQpgYGAKCgojIyMgRXhlcmNpc2UgOQoKVGhlIGhpc3RvZ3JhbSBsb29rcyByb3VnaGx5IHN5bW1ldHJpYyBhbmQgYmVsbC1zaGFwZWQsIHNvIHRoZSBuZWFybHkgbm9ybWFsIHJlc2lkdWFscyBjb25kaXRpb24gaXMgc2F0aXNmaWVkLgoKIyMjIEV4ZXJjaXNlIDEwCgpCYXNlZCBvbiB0aGUgcmVzaWR1YWxzIHZzLiBmaXR0ZWQgdmFsdWVzIHBsb3QsIHRoZXJlIGlzIG5vIGNsZWFyIHBhdHRlcm4gb3IgZnVubmVsIHNoYXBlLCBhbmQgdGhlIHNwcmVhZCBvZiByZXNpZHVhbHMgYXBwZWFycyBmYWlybHkgY29uc2lzdGVudCBhY3Jvc3MgYWxsIGZpdHRlZCB2YWx1ZXMuIFRoaXMgc3VnZ2VzdHMgdGhhdCB0aGUgY29uc3RhbnQgdmFyaWFiaWxpdHkgY29uZGl0aW9uIGlzIG5vdCB2aW9sYXRlZCwgYW5kIHRoZSBsaW5lYXIgcmVncmVzc2lvbiBtb2RlbCBzYXRpc2ZpZXMgdGhlIGFzc3VtcHRpb24gb2YgaG9tb3NjZWRhc3RpY2l0eS4KCmBgYHtyfQptMiA8LSBsbShwZl9zY29yZSB+IGhmX3Njb3JlLCBkYXRhID0gaGZpXzIwMTYpCmdsYW5jZShtMikkci5zcXVhcmVkCmBgYA==