Project 1

Author

Davi

World Freedom Index

The Cato Institute, according to its own website, is a public policy research organization, or think tank, that creates a presence for and promotes libertarian ideas in policy debates.

It has more than 40 years of research and development in understanding the world and analysing it to help make it better. It stands for “individual liberty, limited government and peace.”

It releases an annual report on the world, with indexes and comparisons for every country and region.

This project analyses the 2023 report and tries to extract correlations between economic freedoms and personal freedoms, with a particular focus on freedom of speech.

Summary

The data set is large, and an initial criticism is the lack of explanation of its variables. Another oddity is that the 2023 report contains only data from 2021 and earlier, at least in the CSV file.

I chose to use the final indexes for all the major criteria of the dataset, namely: Rule of law; Security and safety; Movement; Freedom of religion; Association, assembly, and civil society; Expression and information; Relationships; Size of government; Sound money; Legal system and property rights; Freedom to trade internationally; Regulation.

Running Code

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(htmltools)
library(ggplot2)
library(forecast)
Warning: package 'forecast' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(plotly)
Warning: package 'plotly' was built under R version 4.3.3

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.3.3
Registered S3 methods overwritten by 'ggfortify':
  method                 from    
  autoplot.Arima         forecast
  autoplot.acf           forecast
  autoplot.ar            forecast
  autoplot.bats          forecast
  autoplot.decomposed.ts forecast
  autoplot.ets           forecast
  autoplot.forecast      forecast
  autoplot.stl           forecast
  autoplot.ts            forecast
  fitted.ar              forecast
  fortify.ts             forecast
  residuals.ar           forecast
initial <- read_csv('2023-Human-Freedom-Index-Data.csv')
Rows: 3630 Columns: 146
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (5): iso, countries, region, ef_government_tax_income_data, ef_governm...
dbl (141): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(initial)
# A tibble: 6 × 146
   year iso   countries region    hf_score hf_rank hf_quartile pf_rol_procedural
  <dbl> <chr> <chr>     <chr>        <dbl>   <dbl>       <dbl>             <dbl>
1  2021 ALB   Albania   Eastern …     7.67      49           2                NA
2  2021 DZA   Algeria   Middle E…     4.82     155           4                NA
3  2021 AGO   Angola    Sub-Saha…     5.76     122           3                NA
4  2021 ARG   Argentina Latin Am…     6.85      77           2                NA
5  2021 ARM   Armenia   Caucasus…     7.99      33           1                NA
6  2021 AUS   Australia Oceania       8.52      14           1                NA
# ℹ 138 more variables: pf_rol_civil <dbl>, pf_rol_criminal <dbl>,
#   pf_rol_vdem <dbl>, pf_rol <dbl>, pf_ss_homicide <dbl>,
#   pf_ss_homicide_data <dbl>, pf_ss_disappearances_disap <dbl>,
#   pf_ss_disappearances_violent <dbl>,
#   pf_ss_disappearances_violent_data <dbl>,
#   pf_ss_disappearances_organized <dbl>,
#   pf_ss_disappearances_fatalities <dbl>, …

Let's remove the excess information from the dataset and keep only the summary indexes:

initial2 <- select(initial, year, countries, region, hf_score, hf_rank,
  pf_rol, pf_ss, pf_movement, pf_religion, pf_assembly, pf_expression, pf_identity, pf_score, pf_rank,
  ef_government, ef_legal, ef_money, ef_trade, ef_regulation, ef_score, ef_rank)
head(initial2)
# A tibble: 6 × 21
   year countries region   hf_score hf_rank pf_rol pf_ss pf_movement pf_religion
  <dbl> <chr>     <chr>       <dbl>   <dbl>  <dbl> <dbl>       <dbl>       <dbl>
1  2021 Albania   Eastern…     7.67      49   4.86  9.24        6.58        9.76
2  2021 Algeria   Middle …     4.82     155   4.43  8.75        5.41        4.87
3  2021 Angola    Sub-Sah…     5.76     122   3.44  8.47        5.83        6.75
4  2021 Argentina Latin A…     6.85      77   5.19  8.58        7.72        9.49
5  2021 Armenia   Caucasu…     7.99      33   7.01  9.27        8.2         8.58
6  2021 Australia Oceania      8.52      14   7.58  9.77        6.24        9.86
# ℹ 12 more variables: pf_assembly <dbl>, pf_expression <dbl>,
#   pf_identity <dbl>, pf_score <dbl>, pf_rank <dbl>, ef_government <dbl>,
#   ef_legal <dbl>, ef_money <dbl>, ef_trade <dbl>, ef_regulation <dbl>,
#   ef_score <dbl>, ef_rank <dbl>

There are still many values, so let's keep only the last year of the dataset, which is 2021, and analyse it. One possible extension of the project is to look at the first year as well and compare how the indexes evolved over the roughly 20 years in between.

end <- initial2[initial2$year == 2021,]
 
head(end)
# A tibble: 6 × 21
   year countries region   hf_score hf_rank pf_rol pf_ss pf_movement pf_religion
  <dbl> <chr>     <chr>       <dbl>   <dbl>  <dbl> <dbl>       <dbl>       <dbl>
1  2021 Albania   Eastern…     7.67      49   4.86  9.24        6.58        9.76
2  2021 Algeria   Middle …     4.82     155   4.43  8.75        5.41        4.87
3  2021 Angola    Sub-Sah…     5.76     122   3.44  8.47        5.83        6.75
4  2021 Argentina Latin A…     6.85      77   5.19  8.58        7.72        9.49
5  2021 Armenia   Caucasu…     7.99      33   7.01  9.27        8.2         8.58
6  2021 Australia Oceania      8.52      14   7.58  9.77        6.24        9.86
# ℹ 12 more variables: pf_assembly <dbl>, pf_expression <dbl>,
#   pf_identity <dbl>, pf_score <dbl>, pf_rank <dbl>, ef_government <dbl>,
#   ef_legal <dbl>, ef_money <dbl>, ef_trade <dbl>, ef_regulation <dbl>,
#   ef_score <dbl>, ef_rank <dbl>
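
The first-versus-last-year comparison mentioned above could be sketched roughly as follows. This is a minimal sketch on made-up data, not the real index: the `toy` data frame, its values, and the `hf_change` column are assumptions for illustration, and base R is used so it runs without extra packages.

```r
# Toy data standing in for the real index (all values are made up).
toy <- data.frame(
  year      = c(2000, 2021, 2000, 2021, 2000, 2021),
  countries = c("A", "A", "B", "B", "C", "C"),
  hf_score  = c(6.0, 7.5, 8.0, 7.0, NA, 5.5)
)

# Pick the first and last years present in the data instead of hardcoding them.
first_year <- min(toy$year)
last_year  <- max(toy$year)
first <- toy[toy$year == first_year, c("countries", "hf_score")]
last  <- toy[toy$year == last_year,  c("countries", "hf_score")]

# Join the two years by country and compute the change in score.
# Countries with a missing score in either year get NA for the change.
change <- merge(first, last, by = "countries", suffixes = c("_first", "_last"))
change$hf_change <- change$hf_score_last - change$hf_score_first
change
```

Countries with missing early data end up with an NA change, so they would need to be dropped or handled separately before plotting.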

We can't clearly see whether any country has NA values, but let's clean the data regardless.

For a future project, it is worth noting that the earlier the data, the more NAs it has, so for a comparison over time it would make sense to use the most distant year that still has usable data.

# Remove rows with NAs in the main score columns
endnona <- end |>
  filter(!is.na(hf_score) & !is.na(hf_rank) & !is.na(pf_rol) & !is.na(pf_ss) & !is.na(ef_money) & !is.na(pf_score) & !is.na(ef_legal))

head(endnona)
# A tibble: 6 × 21
   year countries region   hf_score hf_rank pf_rol pf_ss pf_movement pf_religion
  <dbl> <chr>     <chr>       <dbl>   <dbl>  <dbl> <dbl>       <dbl>       <dbl>
1  2021 Albania   Eastern…     7.67      49   4.86  9.24        6.58        9.76
2  2021 Algeria   Middle …     4.82     155   4.43  8.75        5.41        4.87
3  2021 Angola    Sub-Sah…     5.76     122   3.44  8.47        5.83        6.75
4  2021 Argentina Latin A…     6.85      77   5.19  8.58        7.72        9.49
5  2021 Armenia   Caucasu…     7.99      33   7.01  9.27        8.2         8.58
6  2021 Australia Oceania      8.52      14   7.58  9.77        6.24        9.86
# ℹ 12 more variables: pf_assembly <dbl>, pf_expression <dbl>,
#   pf_identity <dbl>, pf_score <dbl>, pf_rank <dbl>, ef_government <dbl>,
#   ef_legal <dbl>, ef_money <dbl>, ef_trade <dbl>, ef_regulation <dbl>,
#   ef_score <dbl>, ef_rank <dbl>
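
The remark that earlier years contain more NAs can be checked by counting missing values per year. The sketch below uses a made-up `toy` data frame (an assumption, not the real CSV) and base R only; `score_cols` is a hypothetical name for whichever columns matter to the analysis.

```r
# Toy data with more gaps in the early year (all values are made up).
toy <- data.frame(
  year     = c(2000, 2000, 2000, 2021, 2021, 2021),
  hf_score = c(NA, 6.1, NA, 7.2, 6.8, NA),
  pf_score = c(NA, 6.5, 7.0, 7.4, 6.9, 5.0)
)
score_cols <- c("hf_score", "pf_score")

# Count missing values per year across the chosen score columns.
na_by_year <- sapply(split(toy[score_cols], toy$year),
                     function(d) sum(is.na(d)))
na_by_year

# Keep only rows that are complete in those columns.
clean <- toy[complete.cases(toy[score_cols]), ]
nrow(clean)
```

On the real data, the same idea with the full list of score columns would give one NA count per year, which could guide the choice of the earliest usable year.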

Linear Regression Analysis

library(psych)
Warning: package 'psych' was built under R version 4.3.3

Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':

    %+%, alpha
pairs.panels(endnona[4:21], # plot distributions and correlations for all the data
gap = 0,
pch = 21,
lm = TRUE)

The columns are ordered with the personal freedom indexes first and the economic freedom indexes last, so the top left and bottom left of the plot show a lot of collinearity because they describe basically the same information. The most interesting findings are in the top right, which shows the relationships between economic and personal freedoms.

Since our purpose is to analyse freedom of speech (pf_expression), which is the 8th row and column, the strongest relationships found are with the overall economic freedom score (0.58) and with the legal system and property rights index (0.60). Some relationships are redundant, such as ranks and scores within the same field or within the human freedom fields.

One very interesting finding is that almost everything is correlated with everything else: nearly all the correlation coefficients are high. One notable exception is the size of government index (ef_government), which seems almost unrelated to the rest.
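
The correlations read off the pairs plot can also be computed and ranked directly with `cor()`. The following is a minimal sketch on synthetic data: the `toy` data frame and the way its columns are generated are assumptions made only so the example runs on its own.

```r
set.seed(1)
n <- 100

# Synthetic stand-ins: pf_expression depends on ef_legal but not on ef_government.
ef_legal <- runif(n, 2, 9)
toy <- data.frame(
  pf_expression = 0.6 * ef_legal + rnorm(n),
  ef_legal      = ef_legal,
  ef_government = runif(n, 2, 9)
)

# Correlation of every column with pf_expression, ignoring missing pairs.
cors <- cor(toy, use = "pairwise.complete.obs")[, "pf_expression"]
cors <- cors[names(cors) != "pf_expression"]

# Rank predictors from strongest to weakest absolute correlation.
ranked <- sort(abs(cors), decreasing = TRUE)
ranked
```

Applied to the cleaned 2021 data, the same ranking would list the candidate predictors for the freedom of speech models without reading values off the plot.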

Model 1 for freedom of speech

The first model uses only the economic freedom variables with the highest correlations:

fit2 <- lm(pf_expression ~ ef_score + ef_legal, data = endnona)

summary(fit2)

Call:
lm(formula = pf_expression ~ ef_score + ef_legal, data = endnona)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7728 -0.9743  0.4697  1.0874  3.9268 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.8759     1.0799  -0.811 0.418488    
ef_score      0.5911     0.2503   2.362 0.019387 *  
ef_legal      0.5679     0.1570   3.617 0.000398 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.899 on 162 degrees of freedom
Multiple R-squared:  0.381, Adjusted R-squared:  0.3733 
F-statistic: 49.85 on 2 and 162 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)

It gives us a significant relationship with the legal system and property rights index (ef_legal), but the R-squared is low (0.38) even though the p-value is very small.

Next, let's try all the economic indexes except the ranks.
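
A low R-squared together with a tiny p-value is not contradictory: R-squared measures how much of the variance the model explains, while the F-test only asks whether the predictors explain more than nothing, which is easy to establish with 165 observations. A minimal sketch on synthetic data (the variables `x` and `y` are made up for illustration) shows how both numbers can be pulled out of a fitted model:

```r
set.seed(42)
n <- 165

# A real but noisy relationship between x and y.
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 1.5)

fit <- lm(y ~ x)
s <- summary(fit)

# Share of variance explained, raw and adjusted for the number of predictors.
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# Overall p-value from the F statistic (value, numdf, dendf).
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

c(r2 = r2, adj_r2 = adj_r2, p_value = p_value)
```

The same extraction works on any of the `lm` fits in this report, which makes it easier to compare models side by side.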

For reference, the candidate columns are: hf_score, hf_rank, pf_rol, pf_ss, pf_movement, pf_religion, pf_assembly, pf_expression, pf_identity, pf_score, ef_government, ef_legal, ef_money, ef_trade, ef_regulation, ef_score, ef_rank

fit2 <- lm(pf_expression ~ ef_score + ef_legal + ef_trade + ef_money + ef_regulation + ef_government, data = endnona)

summary(fit2)

Call:
lm(formula = pf_expression ~ ef_score + ef_legal + ef_trade + 
    ef_money + ef_regulation + ef_government, data = endnona)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2042 -0.7467  0.2632  1.0422  3.5678 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.9927     1.2537   0.792  0.42962    
ef_score       10.4380     1.8635   5.601 9.23e-08 ***
ef_legal       -1.2968     0.3948  -3.285  0.00126 ** 
ef_trade       -2.0449     0.4449  -4.597 8.74e-06 ***
ef_money       -2.2862     0.3877  -5.897 2.18e-08 ***
ef_regulation  -1.9003     0.4592  -4.138 5.68e-05 ***
ef_government  -1.8521     0.4014  -4.614 8.12e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.732 on 158 degrees of freedom
Multiple R-squared:  0.4974,    Adjusted R-squared:  0.4783 
F-statistic: 26.06 on 6 and 158 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)

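It again gives a significant relationship with the legal system and property rights index (ef_legal). Although the R-squared has increased (to 0.50), it is still low despite the very small p-value. The large positive coefficient on ef_score next to negative coefficients on its components points to collinearity, so the individual coefficients should be read with care.

Next, let's try all the indexes except the ranks.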
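
One way to check how much the predictors overlap is the variance inflation factor (VIF), computed here by hand as 1 / (1 - R²) from regressing each predictor on the others. This is a sketch on synthetic data, assuming a composite that is close to the average of its parts; the names `part1`, `part2`, `part3` and `composite` are hypothetical, not columns of the index.

```r
set.seed(7)
n <- 165

# Three independent parts plus a composite that is nearly their average.
toy <- data.frame(part1 = rnorm(n), part2 = rnorm(n), part3 = rnorm(n))
toy$composite <- rowMeans(toy[c("part1", "part2", "part3")]) + rnorm(n, sd = 0.05)

# VIF for each predictor: regress it on all the others and use 1 / (1 - R^2).
vif <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})
round(vif, 1)
```

Values far above about 10 are commonly taken as a sign of strong collinearity. If the real economic indexes behave similarly, dropping either the overall score or its components from the model would be a reasonable next step.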

fit2 <- lm(pf_expression ~ ef_score + ef_legal + ef_trade + ef_money + ef_regulation + ef_government  + pf_rol + pf_ss + pf_movement + pf_religion + pf_assembly  + pf_identity + pf_score, data = endnona)

summary(fit2)

Call:
lm(formula = pf_expression ~ ef_score + ef_legal + ef_trade + 
    ef_money + ef_regulation + ef_government + pf_rol + pf_ss + 
    pf_movement + pf_religion + pf_assembly + pf_identity + pf_score, 
    data = endnona)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.047025 -0.017217 -0.000003  0.017900  0.038519 

Coefficients:
               Estimate Std. Error  t value Pr(>|t|)    
(Intercept)   -0.015036   0.017474   -0.860    0.391    
ef_score      -0.040179   0.029832   -1.347    0.180    
ef_legal       0.010531   0.006363    1.655    0.100    
ef_trade       0.008609   0.006462    1.332    0.185    
ef_money       0.007715   0.006176    1.249    0.214    
ef_regulation  0.005847   0.006798    0.860    0.391    
ef_government  0.009141   0.006192    1.476    0.142    
pf_rol        -1.001843   0.003328 -301.013   <2e-16 ***
pf_ss         -0.997200   0.002631 -378.978   <2e-16 ***
pf_movement   -0.996939   0.002754 -361.934   <2e-16 ***
pf_religion   -0.996922   0.003157 -315.785   <2e-16 ***
pf_assembly   -0.996911   0.003644 -273.560   <2e-16 ***
pf_identity   -0.997905   0.002307 -432.478   <2e-16 ***
pf_score       6.987773   0.015056  464.127   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02131 on 151 degrees of freedom
Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9999 
F-statistic: 1.598e+05 on 13 and 151 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)

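Well, a very nice adjusted R-squared of 0.9999, but this comes from strong collinearity among the predictors. The coefficients suggest that pf_score is itself built from the personal freedom categories, including pf_expression, so the model mostly recovers a definition rather than a relationship.

Other correlations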
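
The pattern in the coefficients above (about 7 for pf_score and about -1 for each other personal freedom category) is what a composite average would produce. A minimal sketch with made-up data, assuming a composite that is exactly the mean of three parts (the names `part_a`, `part_b`, `target` and `composite` are hypothetical):

```r
set.seed(3)
n <- 50

# Three unrelated scores and a composite defined as their plain average.
parts <- as.data.frame(matrix(runif(n * 3, 0, 10), ncol = 3))
names(parts) <- c("part_a", "part_b", "target")
parts$composite <- rowMeans(parts[c("part_a", "part_b", "target")])

# Regressing one part on the composite and the remaining parts
# simply inverts the definition: target = 3 * composite - part_a - part_b.
fit <- lm(target ~ part_a + part_b + composite, data = parts)
round(coef(fit), 3)
```

The fitted coefficients are the number of parts for the composite and -1 for each remaining part, with a near-perfect fit. A model like this says nothing about how the scores relate in the real world, so the composite should be left out when the target is one of its own components.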

Economic Freedom from personal freedom

Estimating the score from all indexes

fit2 <- lm(ef_score ~ pf_rol + pf_ss + pf_movement + pf_religion + pf_assembly  + pf_identity + pf_score + pf_expression, data = endnona)

summary(fit2)

Call:
lm(formula = ef_score ~ pf_rol + pf_ss + pf_movement + pf_religion + 
    pf_assembly + pf_identity + pf_score + pf_expression, data = endnona)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3526 -0.3213  0.0285  0.3634  1.4167 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.5918     0.3441  10.439   <2e-16 ***
pf_rol          0.4581     2.6147   0.175    0.861    
pf_ss           0.2973     2.6039   0.114    0.909    
pf_movement     0.2590     2.6036   0.099    0.921    
pf_religion     0.1256     2.6052   0.048    0.962    
pf_assembly     0.2236     2.6059   0.086    0.932    
pf_identity     0.2778     2.6076   0.107    0.915    
pf_score       -1.3868    18.2467  -0.076    0.940    
pf_expression   0.2482     2.6092   0.095    0.924    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6907 on 156 degrees of freedom
Multiple R-squared:  0.5761,    Adjusted R-squared:  0.5544 
F-statistic:  26.5 on 8 and 156 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)

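A fairly high adjusted R-squared of 0.55 and an overall p-value below 2.2e-16. None of the individual coefficients is significant, however, which is consistent with the collinearity among the personal freedom variables.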

Personal Freedom From Economic freedom

fit2 <- lm(pf_score ~ ef_score + ef_legal + ef_trade + ef_money + ef_regulation + ef_government , data = endnona)

summary(fit2)

Call:
lm(formula = pf_score ~ ef_score + ef_legal + ef_trade + ef_money + 
    ef_regulation + ef_government, data = endnona)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9233 -0.4647  0.1576  0.5453  2.7521 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     2.6424     0.7372   3.584  0.00045 ***
ef_score        8.3158     1.0958   7.589 2.63e-12 ***
ef_legal       -1.2186     0.2322  -5.249 4.87e-07 ***
ef_trade       -1.5184     0.2616  -5.804 3.44e-08 ***
ef_money       -1.7843     0.2280  -7.826 6.79e-13 ***
ef_regulation  -1.4259     0.2700  -5.280 4.20e-07 ***
ef_government  -1.5578     0.2361  -6.600 5.92e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.019 on 158 degrees of freedom
Multiple R-squared:  0.633, Adjusted R-squared:  0.619 
F-statistic: 45.41 on 6 and 158 DF,  p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)

A high adjusted R-squared of 0.619, an overall p-value below 2.2e-16, and a residual standard error of 1.019, which is also good. All of this suggests that there is a meaningful correlation between economic and personal freedom.

Plots

p2 <- ggplot(endnona, aes(x = pf_score, y = ef_score, size = ef_government, text = paste("Country:", countries, "\nregion:", region, "\npersonal freedom:", pf_score, "\neconomic freedom:", ef_score, "\nfreedom of speech:", pf_expression))) +
  labs(title = "Countries' Economic Freedom vs Personal Freedom indexes",
       caption = "Source: Cato Institute") +
  # x maps to pf_score and y to ef_score, so the axes are labelled accordingly
  xlab("Personal freedom index") +
  ylab("Economic freedom index") +
  geom_point(aes(color = region)) + scale_color_brewer(palette = "Paired") + theme_bw()
p2

p2 <- ggplotly(p2)
p2

Final

The data set is very interesting. I initially had bigger plans for the project, such as comparing the evolution of the index over the years for every country, but the process overwhelmed me as time grew short. The data set is very complete, at least in its newest part, which is one of the reasons I chose it: it would allow more countries to be included in the whole project.

The cleaning process consisted of first separating the category-level indexes from the underlying indicators that feed into them, keeping the category scores together with the personal, economic, and human freedom indexes, which took the data from 146 variables to 21. I then kept only 2021 and removed possible NAs.

I was surprised by how strongly everything correlated. I expected some correlation, but not this much. Another surprise was that the size of government was not strongly correlated with anything.

The process became problematic in two places. The first was understanding all the variables and choosing among them: the Cato Institute does not provide a summary of them, so I spent a long time working them out from the reports. The second was that I chose to use many variables, which made it very difficult to find good visualizations for the linear model analysis. That time cost made me deviate from the original plan of looking at the 20-year evolution of the indexes for each country.