The data used in this analysis are from the Human Freedom Index 2023
data <- read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/main/Human%20Freedom%20Index_2023.csv")
## Rows: 495 Columns: 146
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): iso, countries, region, ef_government_tax_income_data, ef_governm...
## dbl (139): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...
## lgl (2): pf_identity_inheritance_widows, pf_identity_inheritance_daughters
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hfi <- data
hfi
## # A tibble: 495 × 146
## year iso countries region hf_score hf_rank hf_quartile pf_rol_procedural
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2021 ALB Albania Easte… 7.67 49 2 NA
## 2 2021 DZA Algeria Middl… 4.82 155 4 NA
## 3 2021 AGO Angola Sub-S… 5.76 122 3 NA
## 4 2021 ARG Argentina Latin… 6.85 77 2 NA
## 5 2021 ARM Armenia Cauca… 7.99 33 1 NA
## 6 2021 AUS Australia Ocean… 8.52 14 1 NA
## 7 2021 AUT Austria Weste… 8.24 24 1 NA
## 8 2021 AZE Azerbaijan Cauca… 5.65 126 4 NA
## 9 2021 BHS Bahamas, T… Latin… 7.82 41 2 NA
## 10 2021 BHR Bahrain Middl… 5.47 137 4 NA
## # ℹ 485 more rows
## # ℹ 138 more variables: pf_rol_civil <dbl>, pf_rol_criminal <dbl>,
## # pf_rol_vdem <dbl>, pf_rol <dbl>, pf_ss_homicide <dbl>,
## # pf_ss_homicide_data <dbl>, pf_ss_disappearances_disap <dbl>,
## # pf_ss_disappearances_violent <dbl>,
## # pf_ss_disappearances_violent_data <dbl>,
## # pf_ss_disappearances_organized <dbl>, …
dim(hfi)
## [1] 495 146
head(hfi)
## # A tibble: 6 × 146
## year iso countries region hf_score hf_rank hf_quartile pf_rol_procedural
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2021 ALB Albania Eastern … 7.67 49 2 NA
## 2 2021 DZA Algeria Middle E… 4.82 155 4 NA
## 3 2021 AGO Angola Sub-Saha… 5.76 122 3 NA
## 4 2021 ARG Argentina Latin Am… 6.85 77 2 NA
## 5 2021 ARM Armenia Caucasus… 7.99 33 1 NA
## 6 2021 AUS Australia Oceania 8.52 14 1 NA
## # ℹ 138 more variables: pf_rol_civil <dbl>, pf_rol_criminal <dbl>,
## # pf_rol_vdem <dbl>, pf_rol <dbl>, pf_ss_homicide <dbl>,
## # pf_ss_homicide_data <dbl>, pf_ss_disappearances_disap <dbl>,
## # pf_ss_disappearances_violent <dbl>,
## # pf_ss_disappearances_violent_data <dbl>,
## # pf_ss_disappearances_organized <dbl>,
## # pf_ss_disappearances_fatalities <dbl>, …
names(hfi)
## [1] "year"
## [2] "iso"
## [3] "countries"
## [4] "region"
## [5] "hf_score"
## [6] "hf_rank"
## [7] "hf_quartile"
## [8] "pf_rol_procedural"
## [9] "pf_rol_civil"
## [10] "pf_rol_criminal"
## [11] "pf_rol_vdem"
## [12] "pf_rol"
## [13] "pf_ss_homicide"
## [14] "pf_ss_homicide_data"
## [15] "pf_ss_disappearances_disap"
## [16] "pf_ss_disappearances_violent"
## [17] "pf_ss_disappearances_violent_data"
## [18] "pf_ss_disappearances_organized"
## [19] "pf_ss_disappearances_fatalities"
## [20] "pf_ss_disappearances_fatalities_data"
## [21] "pf_ss_disappearances_injuries"
## [22] "pf_ss_disappearances_injuries_data"
## [23] "pf_ss_disappearances_torture"
## [24] "pf_ss_killings"
## [25] "pf_ss_disappearances"
## [26] "pf_ss"
## [27] "pf_movement_vdem_foreign"
## [28] "pf_movement_vdem_men"
## [29] "pf_movement_vdem_women"
## [30] "pf_movement_vdem"
## [31] "pf_movement_cld"
## [32] "pf_movement"
## [33] "pf_religion_freedom_vdem"
## [34] "pf_religion_freedom_cld"
## [35] "pf_religion_freedom"
## [36] "pf_religion_suppression"
## [37] "pf_religion"
## [38] "pf_assembly_entry"
## [39] "pf_assembly_freedom_house"
## [40] "pf_assembly_freedom_bti"
## [41] "pf_assembly_freedom_cld"
## [42] "pf_assembly_freedom"
## [43] "pf_assembly_parties_barriers"
## [44] "pf_assembly_parties_bans"
## [45] "pf_assembly_parties_auton"
## [46] "pf_assembly_parties"
## [47] "pf_assembly_civil"
## [48] "pf_assembly"
## [49] "pf_expression_direct_killed"
## [50] "pf_expression_direct_killed_data"
## [51] "pf_expression_direct_jailed"
## [52] "pf_expression_direct_jailed_data"
## [53] "pf_expression_direct"
## [54] "pf_expression_vdem_cultural"
## [55] "pf_expression_vdem_harass"
## [56] "pf_expression_vdem_gov"
## [57] "pf_expression_vdem_internet"
## [58] "pf_expression_vdem_selfcens"
## [59] "pf_expression_vdem"
## [60] "pf_expression_house"
## [61] "pf_expression_bti"
## [62] "pf_expression_cld"
## [63] "pf_expression"
## [64] "pf_identity_same_m"
## [65] "pf_identity_same_f"
## [66] "pf_identity_same"
## [67] "pf_identity_divorce"
## [68] "pf_identity_inheritance_widows"
## [69] "pf_identity_inheritance_daughters"
## [70] "pf_identity_inheritance"
## [71] "pf_identity_fgm"
## [72] "pf_identity"
## [73] "pf_score"
## [74] "pf_rank"
## [75] "ef_government_consumption"
## [76] "ef_government_consumption_data"
## [77] "ef_government_transfers"
## [78] "ef_government_transfers_data"
## [79] "ef_government_investment"
## [80] "ef_government_investment_data"
## [81] "ef_government_tax_income"
## [82] "ef_government_tax_income_data"
## [83] "ef_government_tax_payroll"
## [84] "ef_government_tax_payroll_data"
## [85] "ef_government_tax"
## [86] "ef_government_soa"
## [87] "ef_government"
## [88] "ef_legal_judicial"
## [89] "ef_legal_courts"
## [90] "ef_legal_protection"
## [91] "ef_legal_military"
## [92] "ef_legal_integrity"
## [93] "ef_legal_enforcement"
## [94] "ef_legal_regulatory"
## [95] "ef_legal_police"
## [96] "ef_gender"
## [97] "ef_legal"
## [98] "ef_money_growth"
## [99] "ef_money_growth_data"
## [100] "ef_money_sd"
## [101] "ef_money_sd_data"
## [102] "ef_money_inflation"
## [103] "ef_money_inflation_data"
## [104] "ef_money_currency"
## [105] "ef_money"
## [106] "ef_trade_tariffs_revenue"
## [107] "ef_trade_tariffs_revenue_data"
## [108] "ef_trade_tariffs_mean"
## [109] "ef_trade_tariffs_mean_data"
## [110] "ef_trade_tariffs_sd"
## [111] "ef_trade_tariffs_sd_data"
## [112] "ef_trade_tariffs"
## [113] "ef_trade_regulatory_nontariff"
## [114] "ef_trade_regulatory_costs"
## [115] "ef_trade_regulatory"
## [116] "ef_trade_black"
## [117] "ef_trade_movement_open"
## [118] "ef_trade_movement_capital"
## [119] "ef_trade_movement_visit"
## [120] "ef_trade_movement_assets"
## [121] "ef_trade_movement"
## [122] "ef_trade"
## [123] "ef_regulation_credit_ownership"
## [124] "ef_regulation_credit_private"
## [125] "ef_regulation_credit_interest"
## [126] "ef_regulation_credit"
## [127] "ef_regulation_labor_minwage"
## [128] "ef_regulation_labor_firing"
## [129] "ef_regulation_labor_bargain"
## [130] "ef_regulation_labor_hours"
## [131] "ef_regulation_labor_dismissal"
## [132] "ef_regulation_labor_conscription"
## [133] "ef_regulation_labor_foreign"
## [134] "ef_regulation_labor"
## [135] "ef_regulation_business_burden"
## [136] "ef_regulation_business_costs"
## [137] "ef_regulation_business_impartial"
## [138] "ef_regulation_business_compliance"
## [139] "ef_regulation_business"
## [140] "ef_regulation_enter_openness"
## [141] "ef_regulation_enter_permits"
## [142] "ef_regulation_enter_distortion"
## [143] "ef_regulation_enter"
## [144] "ef_regulation"
## [145] "ef_score"
## [146] "ef_rank"
# Scatter plot
ggplot(hfi, aes(x = pf_score, y = ef_legal_police))+
geom_point(color = 4)
# correlation strength and direction
hfi %>%
summarise(cor(hfi$pf_score, hfi$ef_legal_police , use = "complete.obs"))
## # A tibble: 1 × 1
## `cor(hfi$pf_score, hfi$ef_legal_police, use = "complete.obs")`
## <dbl>
## 1 0.381
# Fitting the regression line
model <- lm(ef_legal_police ~ pf_score , data = hfi)
model
##
## Call:
## lm(formula = ef_legal_police ~ pf_score, data = hfi)
##
## Coefficients:
## (Intercept) pf_score
## 1.2836 0.5593
# Summary statistics of the model
summary(model)
##
## Call:
## lm(formula = ef_legal_police ~ pf_score, data = hfi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7577 -1.6531 0.0122 1.3590 6.4086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2836 0.4370 2.937 0.00347 **
## pf_score 0.5593 0.0612 9.138 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.244 on 493 degrees of freedom
## Multiple R-squared: 0.1449, Adjusted R-squared: 0.1431
## F-statistic: 83.51 on 1 and 493 DF, p-value: < 2.2e-16
# Fit our regression line
ggplot(data = hfi, aes(x = ef_legal_police, y = pf_score)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
model <- lm(ef_legal_police ~ pf_score , data = hfi)
# Get fitted values and residuals from the model
fitted_values <- fitted(model)
residuals <- resid(model)
# Create residual plot
ggplot(data = NULL, aes(x = fitted_values, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
xlab("Fitted values") +
ylab("Residuals")
3.Let’s create a quantile vs.quantile plot
# Fit a linear regression model
model <- lm(ef_legal_police ~ pf_score , data = hfi)
# Create Q-Q plot of residuals
qqnorm(resid(model))
qqline(resid(model))
model <- lm(ef_legal_police ~ pf_score , data = hfi)
model.lm <- model
# Set up the plot layout (to plot one graph at a time)
par(mfrow=c(1,1))
# Plot diagnostics for the linear regression model
plot(model.lm)
The scatter plot shows that many values are clustered far away from (but somehow around) the regression line. This indicates that our model may not be perfect and may not explain everything.
The homoscedasticity of the model cannot be confirmed either, although the residuals versus fitted values plot shows that the variability across the different values of x is almost about the same. However, that variability is not perfect and needs to be improved before we move ahead with inferential statistics.
Nonetheless, the quantile vs quantile plot shows residuals forming an almost straight line, which indicates a normal distribution with no outliers. As a result the model does not need any log transformation for normalization.