Load any other packages you’ll use in the chunk below:
library(tidyverse)
library(ggplot2)
library(readxl)
Download a publicly available dataset related to some concept (or concepts) in political science. Be sure to download the codebook associated with the dataset as well. Below are some suggested sources for datasets, but feel free to choose any dataset with at least 1 categorical and 1 continuous (or ordinal) variable.
I highly recommend working with cross-sectional data. If you choose time-series cross-sectional data, you can easily turn it into a cross section by keeping only one time period (e.g., one year).
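As a minimal sketch of that last step, assuming the data has a `year` column (the `panel` data frame, its values, and the choice of 2019 are hypothetical and only for illustration):

```r
# Hypothetical country-year panel; the values are made up for illustration
panel <- data.frame(
  year    = rep(c(2018, 2019, 2020), times = 2),
  country = rep(c("A", "B"), each = 3),
  vturn   = c(70.1, 70.1, 68.4, 81.0, 79.5, 79.5)
)

# Keep a single year so that each country appears exactly once
cross_section <- panel[panel$year == 2019, ]

# Sanity check: no country is repeated in the cross section
stopifnot(!anyDuplicated(cross_section$country))
print(cross_section)
```

With the tidyverse loaded, `filter(panel, year == 2019)` should give the same rows.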
Take a look at the codebook and load your dataset into R. Print out the first 5 rows and first 5 variables (i.e., columns) of your dataset below:
# Import the Excel file
data <- read_excel("political_data.xlsx")
# Show the first 5 rows of the first 5 columns
print(data[1:5, 1:5])
## # A tibble: 5 × 5
## year country countryn iso iso3n
## <dbl> <chr> <dbl> <chr> <dbl>
## 1 1960 Australia 1 AUS 36
## 2 1961 Australia 1 AUS 36
## 3 1962 Australia 1 AUS 36
## 4 1963 Australia 1 AUS 36
## 5 1964 Australia 1 AUS 36
The Comparative Political Data Set (CPDS) 1960-2022 is a collection of political and institutional data assembled for research projects led by Klaus Armingeon and funded by the Swiss National Science Foundation. It is designed for cross-national, longitudinal, and time-series analyses of democratic countries.
The dataset covers the years 1960 to 2022 and includes 36 OECD and/or EU-member countries. The units of analysis are country-year observations, meaning data is recorded annually for each country, but only for democratic periods.
This dataset has 335 variables.
The variable vturn measures voter turnout in national parliamentary (lower house) elections. This concept represents the level of electoral participation within a country, indicating the proportion of eligible voters who cast a ballot in an election.
The concept is operationalized as the percentage of eligible voters who participated in the election. To keep one value per country-year, the dataset records turnout for a single election per year: in years with multiple elections, only the final one is included.
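The operationalization described above can be sketched as follows. This is an illustration only: the `elections` data frame, its column names, and its numbers are hypothetical and are not taken from the CPDS.

```r
# Hypothetical election records; two elections fall in 2019
elections <- data.frame(
  year       = c(2019, 2019, 2021),
  date       = as.Date(c("2019-04-28", "2019-11-10", "2021-09-26")),
  votes_cast = c(750, 700, 820),
  eligible   = c(1000, 1000, 1000)
)

# Turnout as a percentage of eligible voters
elections$turnout <- 100 * elections$votes_cast / elections$eligible

# Sort by date, then keep only the last election in each year
elections <- elections[order(elections$date), ]
last_per_year <- elections[!duplicated(elections$year, fromLast = TRUE), ]
print(last_per_year[, c("year", "turnout")])
```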
Choose a categorical variable from the dataset you’ve just loaded. You can use a variable with ordered categories if needed, but avoid id variables such as “country”.
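The categorical variable gov_type represents the type of government in a country. It has seven categories, each indicating a different government formation based on party composition and parliamentary majority: (1) single-party majority, (2) minimal winning coalition, (3) surplus coalition, (4) single-party minority, (5) multi-party minority, (6) caretaker, and (7) technocratic government.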
This variable helps classify the structure and stability of governments across countries and time.
# 1. Build the frequency table (table() drops missing values by default)
gov_counts <- table(data$gov_type)
# 2. Sort the categories from most to least frequent
gov_counts <- sort(gov_counts, decreasing = TRUE)
# 3. Print the table
print(gov_counts)
##
## 2 1 3 4 5 6 7
## 601 503 329 217 178 14 11
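The numeric codes in this table are easier to read once they carry labels. A minimal sketch, assuming the label wording below matches your copy of the codebook (the labels are my reading of the CPDS coding and the `codes` vector is hypothetical):

```r
# Assumed labels for codes 1 to 7; verify against the codebook before use
gov_type_labels <- c(
  "Single-party majority", "Minimal winning coalition", "Surplus coalition",
  "Single-party minority", "Multi-party minority", "Caretaker", "Technocratic"
)

# Hypothetical vector of codes, including one missing value
codes <- c(2, 1, 2, 4, 7, NA, 3)

# Convert the codes to a labelled factor and tabulate (NA is dropped)
gov_type_f <- factor(codes, levels = 1:7, labels = gov_type_labels)
labelled_counts <- sort(table(gov_type_f), decreasing = TRUE)
print(labelled_counts)
```

On the real data, the same idea would apply to `data$gov_type` in place of `codes`.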
The most common type of government is type 2 (minimal winning coalition), with 601 observations.
Within this type, gov_party category 1 accounts for the most observations (224), followed by categories 3 and 2 (185 and 111 observations, respectively).
Type 1 (single-party majority) is also common, with 503 observations, 502 of which have a recorded gov_party value. It is especially frequent in gov_party category 1 (323 observations).
Type 6 (caretaker) and type 7 (technocratic) governments are the least common, with only 14 and 11 cases, respectively.
Note that gov_party does not identify individual parties: in the CPDS codebook it codes the ideological composition of the cabinet, from 1 (hegemony of right-wing and centre parties) to 5 (hegemony of social-democratic and other left parties). Category 1 is the most frequent overall (784 observations), while category 4 is the least frequent (193 observations).
Category 1 cabinets appear under every government type, whereas categories 4 and 5 never appear in caretaker or technocratic governments.
Type 4 (single-party minority) governments have 217 cases; category 5 is the most common in this type (79 observations), closely followed by category 1 (74 observations).
One plausible reading is that a single-party cabinet is usually either entirely right-wing/centre or entirely left, so these governments cluster at the two ends of the gov_party scale.
# Build the cross-tabulation of gov_type (rows) by gov_party (columns)
cross_tab <- table(data$gov_type, data$gov_party)
# Add row and column totals
cross_tab_with_totals <- addmargins(cross_tab)
# Print the resulting table
print(cross_tab_with_totals)
##
## 1 2 3 4 5 Sum
## 1 323 19 16 32 112 502
## 2 224 111 185 69 12 601
## 3 78 132 101 10 8 329
## 4 74 6 7 51 79 217
## 5 71 44 14 31 18 178
## 6 6 4 4 0 0 14
## 7 8 1 2 0 0 11
## Sum 784 317 329 193 229 1852
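Raw counts are hard to compare across government types of very different sizes, so row percentages can help. The sketch below re-enters the counts from the table above by hand so that it runs on its own; the object names `counts` and `row_share` are mine.

```r
# Counts copied from the cross-tabulation (rows: gov_type, columns: gov_party)
counts <- matrix(
  c(323,  19,  16, 32, 112,
    224, 111, 185, 69,  12,
     78, 132, 101, 10,   8,
     74,   6,   7, 51,  79,
     71,  44,  14, 31,  18,
      6,   4,   4,  0,   0,
      8,   1,   2,  0,   0),
  nrow = 7, byrow = TRUE,
  dimnames = list(gov_type = as.character(1:7), gov_party = as.character(1:5))
)

# Share of each gov_party category within each government type, in percent
row_share <- round(100 * prop.table(counts, margin = 1), 1)
print(row_share)
```

On the real data, `prop.table(cross_tab, margin = 1)` should give the same shares without retyping the counts.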
The variable gov_sup measures total government support: the combined seat share of all parties in government, weighted by the number of days each party spent in office during a given year. It reflects the parliamentary strength of the governing parties over time. Its minimum value is 0 and its maximum value is 95.2.
# 1. Minimum and maximum values
# Get the minimum value of gov_sup, ignoring missing values
min_value <- min(data$gov_sup, na.rm = TRUE)
# Get the maximum value of gov_sup, ignoring missing values
max_value <- max(data$gov_sup, na.rm = TRUE)
# Print the results
print(paste("Minimum value: ", min_value))
## [1] "Minimum value: 0"
print(paste("Maximum value: ", max_value))
## [1] "Maximum value: 95.2"
The mean is 54.82 and the standard deviation is 12.29.
# Get the mean of gov_sup, ignoring missing values
mean_value <- mean(data$gov_sup, na.rm = TRUE)
# Get the standard deviation of gov_sup, ignoring missing values
sd_value <- sd(data$gov_sup, na.rm = TRUE)
# Print the results
print(paste("Mean value: ", mean_value))
## [1] "Mean value: 54.8161672602247"
print(paste("Standard deviation: ", sd_value))
## [1] "Standard deviation: 12.2905454636257"
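To make clear what `mean()` and `sd()` compute with `na.rm = TRUE`, here is a small sketch on a hypothetical vector `x` (not CPDS data). It drops the missing value, then applies the sample formulas by hand, with n - 1 in the denominator of the variance.

```r
# Hypothetical values with one missing observation
x <- c(40, 50, NA, 60, 70)

# Drop missing values, as na.rm = TRUE does
x_obs <- x[!is.na(x)]
n <- length(x_obs)

# Sample mean and sample standard deviation computed by hand
manual_mean <- sum(x_obs) / n
manual_sd <- sqrt(sum((x_obs - manual_mean)^2) / (n - 1))

# The manual results should agree with the built-in functions
stopifnot(isTRUE(all.equal(manual_mean, mean(x, na.rm = TRUE))))
stopifnot(isTRUE(all.equal(manual_sd, sd(x, na.rm = TRUE))))

# Round only for display, not before further calculations
cat("Mean:", round(manual_mean, 2), "SD:", round(manual_sd, 2), "\n")
```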
# Bar chart of government types
barplot(gov_counts, las = 2, col = "steelblue",
        main = "Distribution of Government Types",
        ylab = "Number of Observations")
Create a plot showing the relationship between two or more variables.
# Scatter plot with a regression line and smaller title
plot(data$gov_party, data$gov_type,
     xlab = "Government Party", ylab = "Government Type",
     main = "Relationship between Government Type and Government Party",
     pch = 19, col = "blue", cex.main = 0.8) # Adjust the title size with cex.main
# Add a regression line
abline(lm(data$gov_type ~ data$gov_party), col = "red")
# Load ggplot2 if not already done
library(ggplot2)
# Scatter plot with a smoothing line
ggplot(data, aes(x = gov_party, y = gov_type)) +
  geom_point(color = "blue") +
  geom_smooth(method = "loess", color = "red", se = FALSE) + # Smoothing line
  labs(title = "Relationship between Government Type and Government Party",
       x = "Government Party",
       y = "Government Type") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 14 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 0.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 2.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 3.7173e-15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 4
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_point()`).
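The loess warnings above arise because both variables take only a handful of integer codes, so the points pile up on a small grid and the smoother has very few distinct x values to work with. One possible alternative for two categorical variables is a count-based display such as a mosaic plot. The sketch below uses base R and a randomly generated, hypothetical `toy` data frame rather than the CPDS data.

```r
# Hypothetical data: random codes standing in for gov_party and gov_type
set.seed(1)
toy <- data.frame(
  gov_party = sample(1:5, 200, replace = TRUE),
  gov_type  = sample(1:7, 200, replace = TRUE)
)

# Count how many observations fall on each combination of codes
combo_counts <- table(gov_party = toy$gov_party, gov_type = toy$gov_type)

# Tile area is proportional to the count in each cell
mosaicplot(combo_counts,
           main = "Government Type by Government Party (toy data)",
           xlab = "Government Party", ylab = "Government Type",
           color = TRUE)
```

On the real data, the table could be built from `data$gov_party` and `data$gov_type` in the same way.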