The Cato Institute, according to its own website, is a public policy research organization, or think tank, that creates a presence for and promotes libertarian ideas in policy debates.
It releases an annual report on the world, with indexes and comparisons for every country and region.
This project analyses the 2023 report and tries to extract correlations between economic freedoms and personal freedoms, with a particular focus on freedom of speech.
Summary
The dataset is large, and an initial criticism is the lack of explanation of its variables. Another oddity is that the 2023 report only contains information for 2021 and earlier, at least in the CSV file.
I chose to use the final indexes for all the major criteria of the dataset, namely: Rule of law; Security and safety; Movement; Freedom of religion; Association, assembly, and civil society; Expression and information; Relationships; Size of government; Sound money; Legal system and property rights; Freedom to trade internationally; Regulation.
Running Code
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Warning: package 'forecast' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
library(plotly)
Warning: package 'plotly' was built under R version 4.3.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.3.3
Rows: 3630 Columns: 146
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso, countries, region, ef_government_tax_income_data, ef_governm...
dbl (141): year, hf_score, hf_rank, hf_quartile, pf_rol_procedural, pf_rol_c...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(initial)
# A tibble: 6 × 146
year iso countries region hf_score hf_rank hf_quartile pf_rol_procedural
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2021 ALB Albania Eastern … 7.67 49 2 NA
2 2021 DZA Algeria Middle E… 4.82 155 4 NA
3 2021 AGO Angola Sub-Saha… 5.76 122 3 NA
4 2021 ARG Argentina Latin Am… 6.85 77 2 NA
5 2021 ARM Armenia Caucasus… 7.99 33 1 NA
6 2021 AUS Australia Oceania 8.52 14 1 NA
# ℹ 138 more variables: pf_rol_civil <dbl>, pf_rol_criminal <dbl>,
# pf_rol_vdem <dbl>, pf_rol <dbl>, pf_ss_homicide <dbl>,
# pf_ss_homicide_data <dbl>, pf_ss_disappearances_disap <dbl>,
# pf_ss_disappearances_violent <dbl>,
# pf_ss_disappearances_violent_data <dbl>,
# pf_ss_disappearances_organized <dbl>,
# pf_ss_disappearances_fatalities <dbl>, …
Let's remove the excess information in the dataset:
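The selection code is not shown in the rendered output, so the following is only a minimal sketch of how such a reduction could be done with dplyr. The toy data frame and the names `initial_toy`, `keep` and `reduced_toy` are hypothetical stand-ins; the real table has 146 columns and the real selection keeps 21 of them.

```r
library(dplyr)

# Toy stand-in for the full table (illustrative values, not real Cato data).
initial_toy <- data.frame(
  year = c(2021, 2021, 2020),
  countries = c("A", "B", "A"),
  region = c("R1", "R2", "R1"),
  hf_score = c(7.6, 4.8, 7.5),
  pf_rol_procedural = c(NA, NA, 6.1),    # low-level indicator, to be dropped
  pf_rol = c(5.2, 4.1, 5.3),
  pf_expression = c(8.1, 5.0, 8.0),
  ef_government = c(7.0, 5.5, 6.9),
  ef_government_tax_income = c(8, 6, 8), # low-level indicator, to be dropped
  ef_score = c(7.4, 5.1, 7.3)
)

# Keep the identifiers plus the top-level indexes only.
keep <- c("year", "countries", "region", "hf_score",
          "pf_rol", "pf_expression", "ef_government", "ef_score")

reduced_toy <- initial_toy |> select(all_of(keep))
names(reduced_toy)
```

Using `all_of()` makes the call fail loudly if one of the expected columns is missing, which is helpful when the column names come from an undocumented file.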
There are still many values, so let's separate out only the last year of the dataset, which is 2021, and analyse it. One possible extension of the project is to look at the first year as well, and to compare how the indexes evolved over the 20 years.
We can't clearly see whether there are NA values for the countries, but let's clean them up regardless.
For a future project, it is worth noting that the earlier the data, the more NAs it has, which should be kept in mind when choosing the most distant year to compare against.
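As a rough illustration of this step (the original filtering code is not shown), the latest year can be picked without hard-coding it. The data frame `scores_toy` and its values are invented for the example.

```r
library(dplyr)

# Toy data: several years for country A, one year for country B.
scores_toy <- data.frame(
  year = c(2019, 2020, 2021, 2021),
  countries = c("A", "A", "A", "B"),
  hf_score = c(7.1, 7.2, 7.3, 5.0)
)

# Keep only the rows from the most recent year present in the data.
latest_toy <- scores_toy |> filter(year == max(year))
latest_toy
```

Replacing `max(year)` with `min(year)` would give the first year instead, which is the starting point for the year-over-year comparison mentioned above.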
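One way to check how missingness changes over time is to count NAs per column for each year. This is only a sketch on invented data; `toy` and `na_by_year` are hypothetical names, and the real table would have many more columns.

```r
library(dplyr)

# Toy data: the early year has missing values, the late year does not.
toy <- data.frame(
  year = c(2000, 2000, 2021, 2021),
  pf_rol = c(NA, NA, 5, 6),
  ef_money = c(NA, 7, 8, 9)
)

# Count NAs in every non-grouping column, separately for each year.
na_by_year <- toy |>
  group_by(year) |>
  summarise(across(everything(), ~ sum(is.na(.x))))
na_by_year
```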
endnona <- end |>
  filter(!is.na(hf_score) & !is.na(hf_rank) & !is.na(pf_rol) & !is.na(pf_ss) &
           !is.na(ef_money) & !is.na(pf_score) & !is.na(ef_legal))  # drop rows with NAs in the key indexes
head(endnona)
Warning: package 'psych' was built under R version 4.3.3
Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':
%+%, alpha
pairs.panels(endnona[4:21],  # plot distributions and correlations for all the data
             gap = 0,
             pch = 21,
             lm = TRUE)
The columns are ordered with the personal freedom indexes first and the economic freedom indexes last, so the top-left and bottom-right of the plot show a lot of collinearity because they describe basically the same information. The most interesting findings are in the top right, which shows the relationships between economic and personal freedoms.
Since our purpose is to analyse freedom of speech, which is the 8th row and column, the strongest relationships found are with the overall economic score (0.58) and with the economic legal system and property rights index (0.6). Some relationships are redundant, such as ranks and scores in the same field, or those in the human freedom fields.
One very interesting finding is that almost everything is correlated with everything else: the correlation coefficients are almost all high. One notable exception is government spending, which seems almost irrelevant to everything else.
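The coefficients read off the pairs plot can also be extracted numerically. The sketch below uses simulated data, not the Cato values, and the data frame `toy` is hypothetical: `pf_expression` is built to depend on `ef_legal` and to be unrelated to `ef_government`, which mimics the pattern described above.

```r
set.seed(1)
n <- 50
ef_legal <- runif(n, 2, 9)

# Simulated data: expression depends on ef_legal, not on ef_government.
toy <- data.frame(
  pf_expression = 0.8 * ef_legal + rnorm(n),
  ef_legal = ef_legal,
  ef_government = runif(n, 2, 9)
)

# Pearson correlation matrix, ignoring NAs pair by pair.
cors <- cor(toy, use = "pairwise.complete.obs")

# Correlations of every column with freedom of expression, strongest first.
sort(cors["pf_expression", ], decreasing = TRUE)
```

On the real data, the same call on `endnona[4:21]` should return the matrix behind the numbers shown in the upper triangle of the pairs plot.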
Model 1 for freedom of speech
The first model uses only the economic freedom variables with the highest correlations:
fit2 <- lm(pf_expression ~ ef_score + ef_legal, data = endnona)
summary(fit2)
Call:
lm(formula = pf_expression ~ ef_score + ef_legal, data = endnona)
Residuals:
Min 1Q Median 3Q Max
-5.7728 -0.9743 0.4697 1.0874 3.9268
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8759 1.0799 -0.811 0.418488
ef_score 0.5911 0.2503 2.362 0.019387 *
ef_legal 0.5679 0.1570 3.617 0.000398 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.899 on 162 degrees of freedom
Multiple R-squared: 0.381, Adjusted R-squared: 0.3733
F-statistic: 49.85 on 2 and 162 DF, p-value: < 2.2e-16
autoplot(fit2, 1:4, nrow=2, ncol=2)
It shows a very strong relationship with the economic legal system index (ef_legal), but the model has a fairly low R-squared (0.381) despite a very small p-value.
Next, let's fit a model with all the economic indexes except the ranks.
This gives a high adjusted R-squared of 0.619, a p-value of effectively 0 and a residual of 1.019, which is also good. All of this shows that there is a meaningful correlation between these variables.
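The code for this second model does not appear in the rendered output. A minimal sketch of what such a fit could look like is given below, on simulated data. The column names `ef_trade` and `ef_regulation` are assumed from the dataset's naming pattern, and `fit_all_toy` is a hypothetical name. The overall `ef_score` is left out of the sketch because, if it is an average of the five area indexes, including it alongside them would make the predictors perfectly collinear.

```r
set.seed(42)
n <- 100

# Simulated economic indexes on a 2-9 scale (not real Cato values).
toy <- data.frame(
  ef_government = runif(n, 2, 9),
  ef_legal = runif(n, 2, 9),
  ef_money = runif(n, 2, 9),
  ef_trade = runif(n, 2, 9),
  ef_regulation = runif(n, 2, 9)
)

# Simulated response: driven by ef_legal and ef_money plus noise.
toy$pf_expression <- 0.6 * toy$ef_legal + 0.3 * toy$ef_money + rnorm(n)

# Linear model with all five economic area indexes as predictors.
fit_all_toy <- lm(
  pf_expression ~ ef_government + ef_legal + ef_money + ef_trade + ef_regulation,
  data = toy
)

s <- summary(fit_all_toy)
s$adj.r.squared  # adjusted R-squared
s$sigma          # residual standard error
```

Reading `adj.r.squared` and `sigma` from the summary object avoids copying numbers by hand from the printed output.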
Plots
p2 <- ggplot(endnona, aes(x = pf_score, y = ef_score, size = ef_government,
                          text = paste("Country:", countries,
                                       "\nregion:", region,
                                       "\npersonal freedom:", pf_score,
                                       "\neconomic freedom:", ef_score,
                                       "\nfreedom of speech:", pf_expression))) +
  labs(title = "Countries' Economic Freedom vs Personal Freedom indexes",
       caption = "Source: Cato Institute") +
  xlab("Personal freedom index") +  # x is pf_score
  ylab("Economic freedom index") +  # y is ef_score
  geom_point(aes(color = region)) +
  scale_color_brewer(palette = "Paired") +
  theme_bw()
p2
p2 <- ggplotly(p2)  # convert to an interactive plotly chart
p2
Final
The dataset is very interesting. I initially had bigger plans for the project, such as comparing the evolution of the indexes over the years for every country, but the process overwhelmed me as time grew short. The dataset is very complete, at least in its most recent part, which is one of the reasons I chose it: it would cover more countries across the whole project.
The cleaning process consisted of first dropping the lowest-level indicators, which feed the intermediate indexes that in turn produce the economic and personal freedom indexes, and keeping only those last two levels, taking the dataset from 146 variables to 21. I then kept only 2021 and removed the remaining NAs.
I was surprised by how strongly everything correlates. I expected some correlation, but not this much. Another surprise was that the government index is not strongly correlated with anything.
The process was problematic in two places. The first was understanding all the variables and choosing among them: the Cato Institute does not provide a summary of them, so I spent a long time working them out from the reports. The second was that I chose to use many variables, which made it very difficult to find good visualizations for the linear model analysis. The time this took made me deviate from the original plan of looking at the 20-year evolution of the indexes for each country.