Polabroad: preliminary

Dataset

First, I display the annotated datasets from the Irish and UK parliamentary debates so that we can browse the labeled sentences. At this stage, I have selected data only from 2018 and 2019 for both countries. Initially, I trained ChatGPT-4 to label the datasets, and I manually coded 500 sentences. I then calculated inter-rater reliability between the two sets of labels using Cohen’s kappa. The model’s classifications agreed with the human annotations at κ = 0.71, which falls in the 0.61 to 0.80 range conventionally described as substantial agreement.
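For readers who want to see how Cohen’s kappa is computed, here is a minimal base-R sketch. The two label vectors are hypothetical toy data with placeholder categories "A" and "B", not the 500 hand-coded sentences, and `cohen_kappa()` is a helper written for this illustration rather than the function used for the reported value.

```r
# Hypothetical toy labels (placeholder categories "A" and "B");
# the real hand-coded sentences are not reproduced here.
human <- c("A", "A", "B", "B", "A", "B", "A", "B", "A", "A")
model <- c("A", "A", "B", "A", "A", "B", "B", "B", "A", "A")

# Illustrative helper: Cohen's kappa for two raters on the same items.
cohen_kappa <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2))
  lev <- union(rater1, rater2)
  # Square confusion table with identical levels on both margins
  tab <- table(factor(rater1, levels = lev), factor(rater2, levels = lev))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                      # share of items with the same label
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_observed - p_expected) / (1 - p_expected)
}

kappa_toy <- cohen_kappa(human, model)
round(kappa_toy, 2)
```

Kappa corrects the raw agreement rate for the agreement that two raters would reach by chance given their marginal label frequencies, so it is lower than simple percent agreement.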

datatable(ir, options = list(scrollX = TRUE, pageLength = 10))

## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

datatable(uk, options = list(scrollX = TRUE, pageLength = 10))

Descriptive Information

uk <- uk %>% 
  filter(!category %in% c('No Benchmarking', 'Performance Benchmarking')) %>%
  filter(found_countries != 'Ireland')

ggplot(uk, aes(x = category)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Benchmark categories in UK parliamentary debates: 2018-2019",
       x = "Categories",
       y = "Frequency") +
  theme_minimal()

ir <- ir %>% 
  filter(!category %in% c('No Benchmarking', 'Performance Benchmarking'))

ggplot(ir, aes(x = category)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Benchmark categories in Irish parliamentary debates: 2018-2019",
       x = "Categories",
       y = "Frequency") +
  theme_minimal()

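# Number of distinct policy areas, used to size the colour palette
# (`coded` is assumed to have been loaded in an earlier chunk)
n_colors <- length(unique(coded$policyarea))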
ggplot(uk, aes(x = category, fill = as.factor(policyarea))) +
    geom_bar(position = "dodge") +
    labs(title = "Policy Area Distribution: UK Parliament, 2018-2019",
         x = "Category",
         y = "Count",
         fill = "Policy Areas") +
    scale_fill_manual(values = pals::stepped3(n_colors)) +  # Use stepped3 from pals
    theme_minimal()

ggplot(ir, aes(x = category, fill = as.factor(policyarea))) +
    geom_bar(position = "dodge") +
    labs(title = "Policy Area Distribution: Irish Parliament, 2018-2019",
         x = "Category",
         y = "Count",
         fill = "Policy Areas") +
    scale_fill_manual(values = pals::stepped3(n_colors)) +  # Use stepped3 from pals
    theme_minimal()

The CAP codebook is attached here for reference.

Number	Policy
1	Macroeconomics
2	Civil Rights
3	Health
4	Agriculture
5	Labor
6	Education
7	Environment
8	Energy
9	Immigration
10	Transportation
12	Law and Crime
13	Social Welfare
14	Housing
15	Domestic Commerce
16	Defence
17	Technology
18	Foreign Trade
19	International Affairs
20	Government Operations
21	Public Lands
23	Culture

Parliamentary Debates in Singapore

Thanks to Oliver and Simon, we have now scraped all the parliamentary debates in Singapore from 2018 to 2019. Of course, we could expand to more years, but we are presenting these as a starting point.

Unlike the Singapore data, the UK and Irish parliamentary debates, sourced from the ParlEE datasets, are already coded according to the CAP codebook. We therefore began by coding the policy area of each Singapore sentence using two approaches. The first is the Babel Machine, a state-of-the-art tool. The second is the party-press-monolingual-uk model, available on Hugging Face, which we apply for automated coding of our new dataset.

setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
sing <- read.csv('3_data/Singapore/Classified_SingaporeTest.csv')
datatable(sing, options = list(scrollX = TRUE, pageLength = 10))

ggplot(sing, aes(x = policy)) +
    geom_bar(fill = "skyblue") +
    labs(title = "Policy Area Distribution: Singapore Parliament, 2018-2019 \n(only sentences that mention foreign countries)",
         x = "Category",
         y = "Count") +
    theme_minimal() +
    theme(
        axis.text.x = element_text(angle = 45, hjust = 1, size = 10)
    )

Here, we present some preliminary analysis using the ChatGPT chatbot that we shared earlier. One adjustment we made was to instruct the chatbot to respond with “I don’t know” when it is uncertain which category a sentence belongs to. We cannot set the chatbot’s temperature directly, so we asked it in the prompt to operate in a low-temperature mode. As you can see, the results so far are not satisfactory, but they provide a starting point for further improvement.

sing_cleaned <- sing %>%
  filter(!classification %in% c('No Benchmarking', 'Performance Benchmarking'))

ggplot(sing_cleaned, aes(x = classification)) +
  geom_bar(fill = "skyblue", color = "black", width = 0.2) +
  labs(title = "Benchmark categories in Singapore parliamentary debates: 2018-2019",
       x = "Categories",
       y = "Frequency") +
  theme_minimal()

Legislator Information

In the future, we will theorize whether partisan identity, gender, and educational background (such as whether an MP studied abroad or which university they attended) can explain preferences for how benchmark cues are constructed. We are therefore exploring options to obtain the relevant information on MPs.

We obtained the lists of Irish and UK MPs since 2000 through the legislatoR package. The dataset includes information on their parliamentary sessions, gender, party affiliation, and constituencies.

# The MP information was obtained with the legislatoR package (library(legislatoR))
setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
irmp <- read_excel("3_data/Politician_Info/IRMPs_Fulllist.xlsx")

reactable(irmp, filterable = TRUE, searchable = TRUE, highlight = TRUE, 
           bordered = TRUE, striped = TRUE)

# The MP information was obtained with the legislatoR package (library(legislatoR))
setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
ukmp <- read_excel("3_data/Politician_Info/UKMPs_Fulllist.xlsx")

reactable(ukmp, filterable = TRUE, searchable = TRUE, highlight = TRUE, 
           bordered = TRUE, striped = TRUE)

I also scraped the current list of Singapore MPs from the government website.

University Visualization

I have created two network plots: one showing whether MPs attended the same university, and one showing whether they share an occupational background.

library(igraph)
library(ggraph)
library(tidygraph)
# Filter out rows with missing information in 'name', 'PartyInfo', or 'uni'
clean_data <- simp %>%
  filter(!is.na(name), !is.na(PartyInfo), !is.na(uni))

# Create edges based on shared university attendance
# (name.x < name.y drops self-matches and keeps each undirected pair once,
#  instead of storing both A-B and B-A)
edges <- clean_data %>%
  select(name, PartyInfo, uni) %>%
  inner_join(clean_data, by = "uni") %>%
  filter(name.x < name.y) %>%
  select(from = name.x, to = name.y, uni, PartyInfo = PartyInfo.x) %>%
  distinct()

# Convert edges into a tidygraph object and add PartyInfo to nodes
g <- as_tbl_graph(edges, directed = FALSE) %>%
  activate(nodes) %>%
  left_join(clean_data %>% select(name, PartyInfo), by = c("name" = "name")) %>%
  mutate(degree = centrality_degree())

library(plotly)
# `g` is the tidygraph object built above
network_plot <- ggraph(g, layout = "fr") +
  geom_edge_link(aes(color = uni), alpha = 0.6, width = 0.8) +  # Add lines with transparency and adjust width
  geom_node_point(aes(color = PartyInfo, text = name), size = 2) +  # Nodes with smaller size
  geom_node_text(aes(label = name), repel = TRUE, max.overlaps = 50) +
  theme_minimal() +
  ggtitle("Network Analysis by Shared University Attendance and Party")

# Convert to an interactive plot with ggplotly, including hover text
interactive_network <- ggplotly(network_plot, tooltip = c("text")) %>%
  layout(width = 800, height = 650)

# Display the interactive plot
interactive_network
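To make the edge construction easier to follow, here is a small base-R sketch of the same idea: a self-join on the shared attribute turns a table of people into a table of pairs. The data frame `toy` and its names and universities are invented for illustration. The sketch uses `merge()` instead of dplyr so that it runs without extra packages, and it compares names with `<` so that each undirected pair appears once.

```r
# Hypothetical toy data; names and universities are invented for illustration.
toy <- data.frame(
  name = c("MP1", "MP2", "MP3", "MP4"),
  uni  = c("Uni X", "Uni X", "Uni Y", "Uni X"),
  stringsAsFactors = FALSE
)

# Self-join on `uni`: every pair of rows with the same university is matched,
# including each row with itself and both orderings of each pair.
pairs <- merge(toy, toy, by = "uni", suffixes = c(".x", ".y"))

# Keep each unordered pair once; this also drops the self-matches.
edges_toy <- pairs[pairs$name.x < pairs$name.y, c("name.x", "name.y", "uni")]
names(edges_toy) <- c("from", "to", "uni")
rownames(edges_toy) <- NULL
edges_toy
```

In this toy example MP3 is the only person from "Uni Y" and therefore has no edge. Because the graph is built from the edge list, such MPs do not appear as nodes in the network plot.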

Occupation Visualization

# Create edges based on shared occupational background (occu_back)
# (name.x < name.y drops self-matches and keeps each undirected pair once)
edges_occu <- clean_data %>%
  select(name, PartyInfo, occu_back) %>%
  inner_join(clean_data, by = "occu_back") %>%
  filter(name.x < name.y) %>%
  select(from = name.x, to = name.y, occu_back, PartyInfo = PartyInfo.x) %>%
  distinct()

# Convert edges into a tidygraph object and add PartyInfo to nodes
g_occu <- as_tbl_graph(edges_occu, directed = FALSE) %>%
  activate(nodes) %>%
  left_join(clean_data %>% select(name, PartyInfo), by = c("name" = "name")) %>%
  mutate(degree = centrality_degree())

# Create the network plot based on shared occupational background
network_plot_occu <- ggraph(g_occu, layout = "fr") +
  geom_edge_link(aes(color = occu_back), alpha = 0.6, width = 0.8) +  # Edges based on occu_back
  geom_node_point(aes(color = PartyInfo, text = name), size = 2) +  # Nodes with PartyInfo coloring
  geom_node_text(aes(label = name), repel = TRUE, max.overlaps = 50) +
  theme_minimal() +
  ggtitle("Network Analysis by Shared Occupational Background and Party")

# Convert to an interactive plot
interactive_network_occu <- ggplotly(network_plot_occu, tooltip = c("text"))  %>%
    layout(width = 750, height = 620)

# Display the interactive plot
interactive_network_occu

Polabroad: preliminary_analysis

2024-11-11