Setup

Load any other packages you’ll use in the chunk below:

library(tidyverse)
library(ggplot2)
library(readxl)

Working with Data and Codebooks

Download a publicly available dataset related to some concept (or concepts) in political science. Be sure to download the codebook associated with the dataset as well. Below are some suggested sources for datasets, but feel free to choose any dataset with at least 1 categorical and 1 continuous (or ordinal) variable.

I highly recommend working with cross-sectional data. If you choose time-series cross-sectional data, you can easily make it a cross section by keeping only one time period (e.g., one year).
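
Reducing a panel to a single cross section could look roughly like the sketch below. It uses a made-up toy data frame: the column names `country`, `year`, and `vturn` mirror the dataset used later, but the values and the choice of 2020 are placeholders, not taken from the real data.

```r
# Toy country-year panel (made-up values, for illustration only)
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(2018:2020, times = 3),
  vturn   = c(61, 63, 60, 75, 74, 77, 58, 59, 57)
)

# Keep a single time period to obtain a cross section
cross_section <- panel[panel$year == 2020, ]

# Each country should now appear exactly once
stopifnot(!any(duplicated(cross_section$country)))
print(cross_section)
```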

Take a look at the codebook and load your dataset into R. Print out the first 5 rows and first 5 variables (i.e., columns) of your dataset below:

# Import the Excel file
data <- read_excel("political_data.xlsx")
# Show the first 5 rows of the first 5 columns
print(data[1:5, 1:5])
## # A tibble: 5 × 5
##    year country   countryn iso   iso3n
##   <dbl> <chr>        <dbl> <chr> <dbl>
## 1  1960 Australia        1 AUS      36
## 2  1961 Australia        1 AUS      36
## 3  1962 Australia        1 AUS      36
## 4  1963 Australia        1 AUS      36
## 5  1964 Australia        1 AUS      36
  1. Describe the dataset. Who collected or assembled the data, and for what purpose?

The Comparative Political Data Set (CPDS) 1960-2022 is a collection of political and institutional data assembled for research projects led by Klaus Armingeon and funded by the Swiss National Science Foundation. It is designed for cross-national, longitudinal, and time-series analyses of democratic countries.

  1. What are the time and space dimensions of the dataset? What are the units?

The dataset covers the years 1960 to 2022 and includes 36 OECD and/or EU-member countries. The units of analysis are country-year observations, meaning data is recorded annually for each country, but only for democratic periods.

  1. How many variables are in your dataset?

This dataset has 335 variables.
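
The time span, the number of countries, and the number of variables reported above can be checked directly in R. The following is a minimal sketch on a made-up stand-in for the loaded dataset; `toy` and its values are hypothetical, and with the real data the same calls would be applied to `data`.

```r
# Toy stand-in for the loaded dataset (made-up values)
toy <- data.frame(
  year    = c(1960, 1961, 1960, 1961),
  country = c("Australia", "Australia", "Austria", "Austria")
)

year_range  <- range(toy$year)              # first and last year covered
n_countries <- length(unique(toy$country))  # number of distinct countries
n_vars      <- ncol(toy)                    # number of variables (columns)

# Number of repeated country-year pairs; 0 means country-year identifies each row
n_dup <- sum(duplicated(toy[, c("country", "year")]))

print(year_range)
print(c(countries = n_countries, variables = n_vars, duplicates = n_dup))
```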

  1. Choose one variable that is of interest to you. (Do not choose an id variable such as “country name,” “respondent id” or “year.”) What concept does that variable measure? How is that concept operationalized?

The variable vturn measures voter turnout in national parliamentary (lower house) elections. This concept represents the level of electoral participation within a country, indicating the proportion of eligible voters who cast a ballot in an election.

The concept is operationalized as the percentage of eligible voters who participated in the election. The dataset ensures consistency by only recording turnout for the most significant election in a given year, meaning that in years with multiple elections, only the final one is included.

Descriptive Statistics

Categorical Variable

Choose a categorical variable from the dataset you’ve just loaded. You can use a variable with ordered categories if needed, but avoid id variables such as “country”.

  1. Briefly describe the categorical variable you’ve chosen. How many categories are there and what do they indicate?

The categorical variable gov_type represents the type of government in a country. It has seven categories, each indicating a different government formation based on party composition and parliamentary majority:

  1. Single-party majority – One party holds all government seats and a majority.
  2. Minimal winning coalition – A coalition just large enough to secure a majority.
  3. Surplus coalition – A coalition larger than the minimal majority required.
  4. Single-party minority – One party governs without a parliamentary majority.
  5. Multi-party minority – Multiple parties govern without a majority.
  6. Caretaker government – Temporary government maintaining the status quo.
  7. Technocratic government – Led by technocrats with a mandate for reform.

This variable helps classify the structure and stability of governments across countries and time.
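
Because gov_type is stored as numeric codes, tables and plots show only the numbers 1 to 7. One possible way to attach the category names listed above is to convert the codes to a labeled factor. The sketch below uses a made-up vector `codes` in place of `data$gov_type`.

```r
# Labels for the seven gov_type codes, in code order (taken from the list above)
gov_type_labels <- c(
  "Single-party majority", "Minimal winning coalition", "Surplus coalition",
  "Single-party minority", "Multi-party minority",
  "Caretaker government", "Technocratic government"
)

# Made-up vector of codes standing in for data$gov_type
codes <- c(2, 1, 2, 4, 7, NA, 3)

# Convert the codes to a factor so output shows names instead of numbers
gov_type_f <- factor(codes, levels = 1:7, labels = gov_type_labels)

# Frequency table with readable category names (NA is dropped by default)
print(table(gov_type_f))
```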

  1. Create a table showing the number of observations in each category. If there are more than, say, 10 categories, just show the categories with the most observations. (See the descriptive statistics lab for an example of this.)
# 1. Create the frequency table
gov_counts <- table(data$gov_type)

# 2. Sort the categories from most to least frequent
gov_counts <- sort(gov_counts, decreasing = TRUE)

# 3. Show the table
print(gov_counts)
## 
##   2   1   3   4   5   6   7 
## 601 503 329 217 178  14  11
  1. Choose another categorical variable of interest. Present a cross tabulation of the two variables. What does this tell you?

The second variable is gov_party. In the CPDS codebook it does not identify an individual party; it codes the ideological composition of the cabinet, from 1 (hegemony of right-wing and centre parties) to 5 (hegemony of social-democratic and other left parties).

The most common type of government is type 2 (minimal winning coalition), with 601 observations. Within this type, gov_party category 1 is the most frequent (224 observations), followed by categories 3 (185) and 2 (111).

Type 1 (single-party majority) is also common, with 502 observations in the cross tabulation (503 in the frequency table, so one observation has no gov_party value). It is concentrated in category 1 (323 observations), with a second cluster in category 5 (112).

Types 6 (caretaker) and 7 (technocratic) are the least common, with only 14 and 11 cases, respectively.

Across all government types, category 1 is the most frequent value of gov_party (784 observations), while category 4 is the least frequent (193 observations).

Type 4 (single-party minority) has 217 cases, most often in category 5 (79), closely followed by category 1 (74).

Overall, single-party governments (types 1 and 4) are concentrated in the two hegemony categories, 1 and 5, while surplus coalitions (type 3) fall mostly in the mixed categories 2 and 3.

# Create the cross tabulation of gov_type and gov_party
cross_tab <- table(data$gov_type, data$gov_party)

# Add row and column totals
cross_tab_with_totals <- addmargins(cross_tab)

# Show the resulting table
print(cross_tab_with_totals)
##      
##          1    2    3    4    5  Sum
##   1    323   19   16   32  112  502
##   2    224  111  185   69   12  601
##   3     78  132  101   10    8  329
##   4     74    6    7   51   79  217
##   5     71   44   14   31   18  178
##   6      6    4    4    0    0   14
##   7      8    1    2    0    0   11
##   Sum  784  317  329  193  229 1852
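
Raw counts are hard to compare across government types of very different sizes, so row percentages can help. The sketch below rebuilds the cross tabulation from the counts printed above (with the real data, `prop.table(cross_tab, margin = 1)` would do the same job) and computes the share of each gov_party category within each government type. The object names `counts` and `row_pct` are illustrative.

```r
# Counts copied from the cross tabulation above (rows: gov_type, columns: gov_party)
counts <- matrix(
  c(323,  19,  16, 32, 112,
    224, 111, 185, 69,  12,
     78, 132, 101, 10,   8,
     74,   6,   7, 51,  79,
     71,  44,  14, 31,  18,
      6,   4,   4,  0,   0,
      8,   1,   2,  0,   0),
  nrow = 7, byrow = TRUE,
  dimnames = list(gov_type = as.character(1:7), gov_party = as.character(1:5))
)

# Percentage of each gov_party category within each government type
# (margin = 1 normalizes by row, so each row sums to about 100)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
print(row_pct)
```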

Continuous Variable

  1. Choose a continuous variable from your dataset. For the purposes of this assignment, it's fine if you pick an ordinal variable with more than 4 categories (you are likely to find these in survey data). Briefly describe your variable: What does it measure? What are its minimum and maximum values?

The variable gov_sup measures total government support, defined as the combined seat share of all parties in government, weighted by the number of days each party spent in office during a given year. It reflects the parliamentary strength of the governing parties over time. Its minimum value is 0 and its maximum value is 95.2.

# 1. Minimum and maximum values
# Get the minimum value of gov_sup
min_value <- min(data$gov_sup, na.rm = TRUE)

# Get the maximum value of gov_sup
max_value <- max(data$gov_sup, na.rm = TRUE)

# Print the results
print(paste("Minimum value: ", min_value))
## [1] "Minimum value:  0"
print(paste("Maximum value: ", max_value))
## [1] "Maximum value:  95.2"
  1. Report the mean and standard deviation of the continuous variable you’ve selected.

The mean is 54.82 and the standard deviation is 12.29.

# Get the mean of gov_sup
mean_value <- mean(data$gov_sup, na.rm = TRUE)

# Get the standard deviation of gov_sup
sd_value <- sd(data$gov_sup, na.rm = TRUE)

# Print the results
print(paste("Mean value: ", mean_value))
## [1] "Mean value:  54.8161672602247"
print(paste("Standard deviation: ", sd_value))
## [1] "Standard deviation:  12.2905454636257"
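
For reporting, the statistics can be rounded when they are printed. A small sketch, using the two values copied from the output above, shows one way to do this with `sprintf()`; the names `mean_text` and `sd_text` are illustrative.

```r
# Values copied from the printed output above
mean_value <- 54.8161672602247
sd_value   <- 12.2905454636257

# sprintf() rounds to two decimals and avoids the double space that
# paste() inserts after a label that already ends in a space
mean_text <- sprintf("Mean: %.2f", mean_value)
sd_text   <- sprintf("Standard deviation: %.2f", sd_value)

print(mean_text)
print(sd_text)
```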

Data Visualization

  1. Create an informative visualization for one of the variables you discussed above. Be sure to title your plot and label its axes.
# Bar chart of government types
barplot(gov_counts, las = 2, col = "steelblue",
        main = "Distribution of Government Types",
        xlab = "Government Type (gov_type code)",
        ylab = "Number of Observations")

Bonus: Bivariate Data Visualization

Create a plot showing the relationship between two or more variables.

# Scatter plot with a regression line and smaller title
plot(data$gov_party, data$gov_type,
     xlab = "Government Party", ylab = "Government Type",
     main = "Relationship between Government Type and Government Party",
     pch = 19, col = "blue", cex.main = 0.8)  # Adjust the title size with cex.main

# Add a regression line
abline(lm(data$gov_type ~ data$gov_party), col = "red")

# Load ggplot2 if not already done
library(ggplot2)

# Scatter plot with a smoothing line
ggplot(data, aes(x = gov_party, y = gov_type)) +
  geom_point(color = "blue") +
  geom_smooth(method = "loess", color = "red", se = FALSE) +  # Smoothing line
  labs(title = "Relationship between Government Type and Government Party",
       x = "Government Party",
       y = "Government Type") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 14 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 0.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 2.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 3.7173e-15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 4
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_point()`).
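
Because both variables are categorical codes, a scatter plot stacks all observations on a 7 × 5 grid of points, and a regression or loess line treats the codes as numbers. A stacked bar chart of shares is one possible alternative. The sketch below is an illustration rather than part of the original analysis: it rebuilds the counts from the cross tabulation printed earlier and uses base R graphics only, and the names `counts` and `shares` are illustrative.

```r
# Counts copied from the cross tabulation (rows: gov_type, columns: gov_party)
counts <- matrix(
  c(323,  19,  16, 32, 112,
    224, 111, 185, 69,  12,
     78, 132, 101, 10,   8,
     74,   6,   7, 51,  79,
     71,  44,  14, 31,  18,
      6,   4,   4,  0,   0,
      8,   1,   2,  0,   0),
  nrow = 7, byrow = TRUE,
  dimnames = list(gov_type = as.character(1:7), gov_party = as.character(1:5))
)

# Share of each gov_party category within each government type
shares <- prop.table(counts, margin = 1)

# One stacked bar per government type; segments are gov_party categories.
# xlim leaves empty space on the right so the legend does not cover the bars.
barplot(t(shares),
        col = gray.colors(5),
        xlim = c(0, 11),
        legend.text = paste("gov_party", colnames(counts)),
        args.legend = list(x = "topright", bty = "n", cex = 0.7),
        main = "gov_party Category by Government Type",
        xlab = "Government Type (gov_type code)",
        ylab = "Share of Observations")
```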