The Human Freedom Index is an annual study co-published by the Cato Institute, the Fraser Institute, and the Liberales Institut at the Friedrich Naumann Foundation for Freedom. It summarizes “freedom” in countries across the world and relates social and economic circumstances to several forms of freedom, including political, religious, economic, and personal freedom.
To identify factors associated with personal freedom, this lab examines data from 2008 to 2016.
library(tidyverse)
library(openintro)
data('hfi', package='openintro')
dim(hfi)
## [1] 1458 123
We use a scatterplot to explore the relationship between the personal freedom score (pf_score) and the score for political pressures and controls on media content (pf_expression_control).
ggplot(hfi, aes(x = pf_expression_control, y = pf_score)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 80 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 80 rows containing missing values or values outside the scale range
## (`geom_point()`).
This plot helps show whether a linear relationship exists. A linear model may be appropriate if the points follow an approximately straight-line pattern.
hfi %>%
summarise(cor(pf_expression_control, pf_score, use = "complete.obs"))
## # A tibble: 1 × 1
## `cor(pf_expression_control, pf_score, use = "complete.obs")`
## <dbl>
## 1 0.796
The correlation coefficient is 0.796. Because values close to 1 or -1 indicate a strong linear relationship, this points to a strong positive linear association between the two variables.
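To make the correlation less of a black box, here is a minimal sketch of the Pearson correlation computed directly from its definition and compared with `cor()`. The vectors `x` and `y` and the helper `pearson_r()` are hypothetical, made up for illustration rather than taken from `hfi`. In the call above, `use = "complete.obs"` simply drops pairs where either value is missing before applying the same formula.

```r
# Pearson correlation from its definition, compared with base R's cor().
# x and y are small made-up vectors (illustration only, not the hfi data).
x <- c(2.1, 3.4, 4.0, 5.5, 6.2, 7.8, 8.5)
y <- c(5.0, 5.9, 6.1, 7.2, 7.4, 8.3, 8.9)

# Hypothetical helper: covariance of deviations scaled by their spreads
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_r(x, y)
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```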
ggplot(hfi, aes(x = pf_expression_control, y = pf_score)) +
geom_point(alpha = 0.5) +
labs(x = "Political Pressure on Expression",
y = "Personal Freedom Score",
title = "Relationship Between Political Pressure and Personal Freedom")
## Warning: Removed 80 rows containing missing values or values outside the scale range
## (`geom_point()`).
This plot shows a clear upward trend: as political pressure decreases (higher values on the pf_expression_control scale), the personal freedom score tends to increase.
Numerical inspection: sum of squared residuals
# Install the DATA606 course package from GitHub (only needed once)
devtools::install_github("jbryer/DATA606")
library(DATA606)
# Keep only rows where both variables are non-missing
hfi <- hfi %>% filter(complete.cases(pf_expression_control, pf_score))
DATA606::plot_ss(x = hfi$pf_expression_control, y = hfi$pf_score)
## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept) x
## 4.6171 0.4914
##
## Sum of Squares: 952.153
The sum of squared residuals reported by the plot_ss tool is 952.153. The line it reports has the same coefficients as the least squares model fitted below (intercept 4.6171, slope 0.4914), so this is the smallest sum of squared residuals that any straight line can achieve for these data. A smaller sum of squares means the line passes closer to the observed points; it would equal 0 only if every point fell exactly on the line.
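The idea that the least squares line minimizes the sum of squared residuals can be checked with a short sketch. It uses simulated data (not `hfi`) and a hypothetical helper `ssr()` that scores any candidate line; the alternative line is an arbitrary hand-picked perturbation of the fitted one.

```r
# Sum of squared residuals (SSR) for a candidate line y = b0 + b1 * x.
# Simulated data for illustration only (not the hfi data).
set.seed(606)
x <- runif(100, min = 0, max = 10)
y <- 4.6 + 0.5 * x + rnorm(100, sd = 0.8)

# Hypothetical helper: squared vertical distances from points to the line
ssr <- function(b0, b1, x, y) {
  resid <- y - (b0 + b1 * x)  # observed minus predicted
  sum(resid^2)
}

fit <- lm(y ~ x)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

ssr_ls <- ssr(b[1], b[2], x, y)                  # least squares line
ssr_other <- ssr(b[1] + 0.3, b[2] - 0.05, x, y)  # an arbitrary alternative line
print(c(least_squares = ssr_ls, alternative = ssr_other))
```

Any line other than the least squares one, like the alternative here, yields a larger SSR, which is why clicking points in plot_ss cannot beat the reported minimum.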
Fitting the least squares regression model
m1 <- lm(pf_score ~ pf_expression_control, data = hfi)
summary(m1)
##
## Call:
## lm(formula = pf_score ~ pf_expression_control, data = hfi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8467 -0.5704 0.1452 0.6066 3.2060
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.61707 0.05745 80.36 <2e-16 ***
## pf_expression_control 0.49143 0.01006 48.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8318 on 1376 degrees of freedom
## Multiple R-squared: 0.6342, Adjusted R-squared: 0.634
## F-statistic: 2386 on 1 and 1376 DF, p-value: < 2.2e-16
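Reading the rounded estimates from the summary, the fitted line can be written as below. For simple linear regression, $R^2$ equals the square of the correlation coefficient, which agrees (up to rounding) with the correlation of 0.796 computed earlier. The slope means that each 1-point increase in pf_expression_control is associated with an average increase of about 0.491 points in pf_score.

$$
\begin{aligned}
\widehat{\text{pf\_score}} &= 4.617 + 0.491 \times \text{pf\_expression\_control} \\
R^2 &= r^2 \approx 0.796^2 \approx 0.634
\end{aligned}
$$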
Plotting the data with the regression line
ggplot(hfi, aes(x = pf_expression_control, y = pf_score)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
This plot shows the observed data together with the least squares regression line from the model.
predict(m1, newdata = data.frame(pf_expression_control = 6.7))
## 1
## 7.909663
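`predict()` plugs the new value into the fitted equation. The same prediction can be reproduced by hand in a minimal sketch; the coefficients are copied (rounded) from `summary(m1)`, so the result differs from `predict()` only in the later decimal places. If the observed score for such a country were known, the residual would be observed minus predicted, and a positive residual would mean the model underestimates.

```r
# Prediction from the fitted line, computed by hand.
# Coefficients are copied (rounded) from summary(m1) above.
b0 <- 4.61707  # intercept
b1 <- 0.49143  # slope for pf_expression_control

x_new <- 6.7
y_hat <- b0 + b1 * x_new  # predicted pf_score
print(round(y_hat, 3))
```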
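Checking linearity: if a linear model is appropriate, the residuals should scatter randomly around zero in a plot of residuals versus fitted values, with no curved or fanning pattern.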
ggplot(data = m1, aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
xlab("Fitted values") +
ylab("Residuals")
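In the plot above, `.fitted` and `.resid` are the fitted values and residuals of `m1`, the same quantities returned by `fitted(m1)` and `residuals(m1)`. A minimal sketch on simulated data (not `hfi`) shows that each residual is the observed value minus the fitted value, and that with an intercept in the model the residuals sum to essentially zero, which is why the dashed line at 0 is the natural reference.

```r
# Residuals are observed values minus fitted values.
# Simulated data for illustration only (not the hfi data).
set.seed(1)
x <- runif(50, 0, 10)
y <- 4.6 + 0.5 * x + rnorm(50, sd = 0.8)
fit <- lm(y ~ x)

manual_resid <- y - fitted(fit)

# With an intercept, least squares residuals sum to (numerically) zero
print(sum(manual_resid))
```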
## Checking Normality of Residuals
ggplot(data = m1, aes(x = .resid)) +
geom_histogram(binwidth = 0.25)
ggplot(data = m1, aes(sample = .resid)) +
stat_qq() +
stat_qq_line()
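A normal Q-Q plot pairs the sorted residuals with quantiles of the standard normal distribution; if the points lie close to a straight line, the residuals are approximately normal. The sketch below, on simulated residuals rather than those of `m1`, builds those pairs with `qnorm(ppoints(n))` and checks them against base R's `qqnorm()`. To our understanding `stat_qq()` uses the same `ppoints()` plotting positions by default, but treat that as an assumption.

```r
# How a normal Q-Q plot pairs points.
# Simulated residuals for illustration only (not residuals from m1).
set.seed(42)
res <- rnorm(30, mean = 0, sd = 0.8)
n <- length(res)

theoretical <- qnorm(ppoints(n))  # x-axis: standard normal quantiles
sample_q <- sort(res)             # y-axis: sorted residuals

# qqnorm() computes the same pairs, returned in the original data order
qq <- qqnorm(res, plot.it = FALSE)
```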
## More Practice
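Visualizing the relationship between the payroll tax score (ef_government_tax_payroll) and the personal freedom score. Like the other HFI components, this variable is scored from 0 to 10 with higher scores indicating more freedom, which here means lower top marginal income and payroll tax rates.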
ggplot(data = hfi, aes(x = ef_government_tax_payroll, y = pf_score)) +
geom_point(alpha = 0.6) +
labs(
title = "Scatterplot of Payroll Tax Score vs Personal Freedom Score",
x = "Payroll Tax Score (higher = lower tax rates)",
y = "Personal Freedom Score"
)
## Warning: Removed 113 rows containing missing values or values outside the scale range
## (`geom_point()`).
Each point represents one country in one year; because the data cover 2008 to 2016, most countries appear several times.
Fitting a linear model to quantify the relationship
m_tax <- lm(pf_score ~ ef_government_tax_payroll, data = hfi)
summary(m_tax)
##
## Call:
## lm(formula = pf_score ~ ef_government_tax_payroll, data = hfi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8324 -0.8937 0.0953 1.0816 2.5947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.21696 0.08518 96.47 <2e-16 ***
## ef_government_tax_payroll -0.17458 0.01417 -12.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.317 on 1263 degrees of freedom
## (113 observations deleted due to missingness)
## Multiple R-squared: 0.1073, Adjusted R-squared: 0.1066
## F-statistic: 151.9 on 1 and 1263 DF, p-value: < 2.2e-16
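The Multiple R-squared of 0.1073 here, versus 0.6342 for the first model, is the share of variability in pf_score that each model explains. A minimal sketch on simulated data (not `hfi`) shows the definition $R^2 = 1 - \text{SSR}/\text{SST}$ and, for a single predictor, its equality with the squared correlation.

```r
# R-squared as the share of variability in y explained by the model.
# Simulated data for illustration only (not the hfi data).
set.seed(2016)
x <- runif(200, 0, 10)
y <- 8.2 - 0.17 * x + rnorm(200, sd = 1.3)
fit <- lm(y ~ x)

ssr <- sum(residuals(fit)^2)  # variability left unexplained by the line
sst <- sum((y - mean(y))^2)   # total variability around the mean
r2_manual <- 1 - ssr / sst
print(c(manual = r2_manual, summary = summary(fit)$r.squared))
```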
lm(pf_score ~ ef_government_tax_payroll, data = hfi)
##
## Call:
## lm(formula = pf_score ~ ef_government_tax_payroll, data = hfi)
##
## Coefficients:
## (Intercept) ef_government_tax_payroll
## 8.2170 -0.1746
Using data from the Human Freedom Index, this lab applied simple linear regression to examine how two policy-related measures relate to personal freedom. The findings indicate:
- Personal freedom has a strong positive association with the score for political pressures and controls on media content (pf_expression_control), where higher scores indicate less political control (r = 0.80, R² = 0.63).
- The payroll tax score (ef_government_tax_payroll) has a much weaker negative association with personal freedom (R² = 0.11), although the slope is statistically significant (p < 2.2e-16). Because higher scores correspond to lower tax rates, countries with lower payroll tax rates tend to have slightly lower personal freedom scores.
These results suggest that personal freedom is more closely associated with how governments treat political expression than with payroll taxation. For the first model, visual inspection of the residual plots supports the linearity and nearly normal residuals assumptions.