First, I display the annotated datasets of Irish and UK parliamentary debates so that we can browse the labeled sentences. At this stage, I have only selected data from 2018 and 2019 for both countries. I first used ChatGPT-4 to label the datasets and then manually coded 500 sentences myself. I calculated inter-rater reliability between the model's labels and my own using Cohen's kappa. The model's classifications showed substantial agreement with the human annotations (κ = 0.71), indicating that the automated labels are reasonably reliable.
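For reference, Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). The base-R function below is a minimal sketch of that formula; `cohens_kappa`, `human`, and `model` are hypothetical names, and the toy labels are illustrative, not the 500 coded sentences. Packages such as irr also provide a ready-made `kappa2()`.

```r
# Minimal sketch: Cohen's kappa for two raters labelling the same items.
# `human` and `model` are hypothetical toy labels, not the study's data.
cohens_kappa <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2))
  # Use one shared set of categories so the agreement table is square
  lv <- union(rater1, rater2)
  tab <- table(factor(rater1, levels = lv), factor(rater2, levels = lv))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from marginals
  (p_o - p_e) / (1 - p_e)
}

human <- c("A", "A", "B", "B", "C", "C", "A", "B")
model <- c("A", "A", "B", "C", "C", "C", "A", "A")
cohens_kappa(human, model)
```

Here the raters agree on 6 of 8 items (p_o = 0.75), but because both use category A often, part of that agreement is expected by chance, so κ is lower than the raw agreement rate.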
library(DT)
# Interactive table of the annotated Irish debates (2018-2019)
datatable(ir, options = list(scrollX = TRUE, pageLength = 10))
# Interactive table of the annotated UK debates (2018-2019)
datatable(uk, options = list(scrollX = TRUE, pageLength = 10))
library(dplyr)
library(ggplot2)
# Exclude 'No Benchmarking' and 'Performance Benchmarking' sentences,
# and drop sentences whose detected country is Ireland
uk <- uk %>%
  filter(!category %in% c('No Benchmarking', 'Performance Benchmarking')) %>%
  filter(found_countries != 'Ireland')
ggplot(uk, aes(x = category)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Benchmark categories in UK parliament debate: 2018-2019",
       x = "Categories",
       y = "Frequency") +
  theme_minimal()
# Apply the same category filter to the Irish data
ir <- ir %>%
  filter(!category %in% c('No Benchmarking', 'Performance Benchmarking'))
ggplot(ir, aes(x = category)) +
geom_bar(fill = "skyblue", color = "black") +
labs(title = "Benchmark categories in Irish parliament debate: 2018-2019",
x = "Categories",
y = "Frequency") +
theme_minimal()
# One colour per policy area appearing in either the UK or the Irish data
n_colors <- length(unique(c(uk$policyarea, ir$policyarea)))
ggplot(uk, aes(x = category, fill = as.factor(policyarea))) +
  geom_bar(position = "dodge") +
  labs(title = "Policy Area Distribution: UK Parliament, 2018-2019",
       x = "Category",
       y = "Count",
       fill = "Policy Areas") +
  scale_fill_manual(values = pals::stepped3(n_colors)) + # Use stepped3 from pals
  theme_minimal()
ggplot(ir, aes(x = category, fill = as.factor(policyarea))) +
geom_bar(position = "dodge") +
labs(title = "Policy Area Distribution: Irish Parliament, 2018-2019",
x = "Category",
y = "Count",
fill = "Policy Areas") +
scale_fill_manual(values = pals::stepped3(n_colors)) + # Use stepped3 from pals
theme_minimal()
I attach the major-topic codebook of the Comparative Agendas Project (CAP) below for reference.
| Code | Policy area |
|---|---|
| 1 | Macroeconomics |
| 2 | Civil Rights |
| 3 | Health |
| 4 | Agriculture |
| 5 | Labor |
| 6 | Education |
| 7 | Environment |
| 8 | Energy |
| 9 | Immigration |
| 10 | Transportation |
| 12 | Law and Crime |
| 13 | Social Welfare |
| 14 | Housing |
| 15 | Domestic Commerce |
| 16 | Defence |
| 17 | Technology |
| 18 | Foreign Trade |
| 19 | International Affairs |
| 20 | Government Operations |
| 21 | Public Lands |
| 23 | Culture |
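Because the policy-area plots above label bars with raw codes, a lookup from code to topic name can make the legends readable. The sketch below copies the codes and labels from the table; `cap_codes`, `cap_names`, and `cap_label` are hypothetical helper names, and it assumes `policyarea` stores these numeric major-topic codes, which is not confirmed here.

```r
# Hypothetical helper: map numeric CAP major-topic codes to readable labels.
# Codes and labels are copied from the codebook table (codes 11 and 22 are not used).
cap_codes <- c(1:10, 12:21, 23)
cap_names <- c("Macroeconomics", "Civil Rights", "Health", "Agriculture",
               "Labor", "Education", "Environment", "Energy", "Immigration",
               "Transportation", "Law and Crime", "Social Welfare", "Housing",
               "Domestic Commerce", "Defence", "Technology", "Foreign Trade",
               "International Affairs", "Government Operations", "Public Lands",
               "Culture")
stopifnot(length(cap_codes) == length(cap_names))

# Codes outside the codebook become NA rather than a spurious level
cap_label <- function(code) {
  factor(code, levels = cap_codes, labels = cap_names)
}

cap_label(c(3, 19, 23))
```

Under that assumption, `fill = cap_label(policyarea)` could replace `fill = as.factor(policyarea)` in the plots; keeping all 21 levels in codebook order also gives both countries the same level ordering.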
Thanks to Oliver and Simon, we have now scraped all parliamentary debates in Singapore from 2018 and 2019. The corpus could be extended to more years, but these two years serve as a starting point.
Unlike the UK and Irish debates, which come from the ParlEE datasets and are already coded according to the CAP codebook, the Singapore debates have no policy-area codes. We therefore coded the policy area of each sentence using two approaches. The first is the Babel Machine, a state-of-the-art tool. The second applies the party-press-monolingual-uk model, available on Hugging Face, to code our new dataset automatically.
setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
sing <- read.csv('3_data/Singapore/Classified_SingaporeTest.csv')
datatable(sing, options = list(scrollX = TRUE, pageLength = 10))
# Distribution of policy areas among Singapore sentences that mention foreign countries
ggplot(sing, aes(x = policy)) +
  geom_bar(fill = "skyblue") +
  labs(title = "Policy Area Distribution: Singapore Parliament, 2018-2019\n(only sentences mentioning foreign countries)",
       x = "Policy area",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10))
Here, we present a preliminary analysis using the ChatGPT chatbot that we shared earlier. One adjustment we made was to instruct the chatbot to respond with “I don’t know” when it is uncertain which category a sentence belongs to. Because the chatbot's temperature cannot be adjusted directly, we instead instructed it to behave as if operating at a low temperature. The results so far are not satisfactory, but they provide a starting point for further improvement.
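Since the chatbot may now abstain, it can help to track how often it does so alongside the category counts. The sketch below assumes the abstention is stored as the text "I don't know" in a label column such as `classification`; `abstention_share` and `toy` are hypothetical, and the normalisation guards against curly apostrophes, stray spaces, and case differences in the chatbot's output.

```r
# Sketch: share of sentences where the chatbot answered "I don't know".
# Assumes abstentions are stored verbatim in a label column; `toy` is made up.
abstention_share <- function(labels, abstain = "i don't know") {
  # Normalise whitespace, curly apostrophes and case before comparing
  norm <- tolower(gsub("\u2019", "'", trimws(labels), fixed = TRUE))
  mean(norm == abstain, na.rm = TRUE)
}

toy <- c("No Benchmarking", "I don't know", "I don\u2019t know ", "No Benchmarking")
abstention_share(toy)
```

Under the stated assumption, `abstention_share(sing$classification)` would give the abstention rate for the Singapore data.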
# Exclude 'No Benchmarking' and 'Performance Benchmarking' before plotting
sing_cleaned <- sing %>%
  filter(!classification %in% c('No Benchmarking', 'Performance Benchmarking'))
ggplot(sing_cleaned, aes(x = classification)) +
geom_bar(fill = "skyblue", color = "black", width=0.2) +
labs(title = "Benchmark categories in Singapore parliament debate: 2018-2019",
x = "Categories",
y = "Frequency") +
theme_minimal()
In future work, we will examine whether MPs' partisan identity, gender, and educational background (such as whether they studied abroad or which university they attended) explain their preferences for how benchmark cues are constructed. We are therefore exploring ways to obtain relevant information on MPs.
Using the legislatoR package, we obtained the lists of MPs in Ireland and the UK since 2000. The data include each MP's parliamentary sessions, gender, party affiliation, and constituency.
# MP information was obtained with the legislatoR package; the exported list is read from Excel
library(readxl)
library(reactable)
setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
irmp <- read_excel("3_data/Politician_Info/IRMPs_Fulllist.xlsx")
# Searchable, filterable table of Irish MPs
reactable(irmp, filterable = TRUE, searchable = TRUE, highlight = TRUE,
          bordered = TRUE, striped = TRUE)
# MP information was obtained with the legislatoR package; the exported list is read from Excel
library(readxl)
library(reactable)
setwd("/Users/au760950/Library/CloudStorage/OneDrive-SharedLibraries-Aarhusuniversitet/Roman Senninger - POLABROAD/Benchmarking")
ukmp <- read_excel("3_data/Politician_Info/UKMPs_Fulllist.xlsx")
# Searchable, filterable table of UK MPs
reactable(ukmp, filterable = TRUE, searchable = TRUE, highlight = TRUE,
          bordered = TRUE, striped = TRUE)
I also scraped the current list of Singapore MPs from the government website.
I have created two network plots: one connects MPs who attended the same university, and the other connects MPs who share an occupational background.
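Both networks are built by joining the MP table to itself on a shared attribute, so that every pair of MPs with the same value becomes an edge. The toy sketch below shows that self-join in base R with made-up MPs and universities (all names are hypothetical). Filtering with `name.x != name.y` removes self-pairs but keeps both directions of every pair (A-B and B-A), which in an undirected graph doubles each edge and each degree; filtering with `name.x < name.y` keeps each pair once.

```r
# Toy sketch of a shared-attribute edge list; the MPs and universities are made up.
mps <- data.frame(
  name = c("MP_A", "MP_B", "MP_C", "MP_D"),
  uni  = c("Uni X", "Uni X", "Uni X", "Uni Y"),
  stringsAsFactors = FALSE
)

# Self-join: every combination of MPs that attended the same university
pairs <- merge(mps, mps, by = "uni", suffixes = c(".x", ".y"))

# Keep each unordered pair once (drops self-pairs and mirrored duplicates)
edges <- pairs[pairs$name.x < pairs$name.y, c("name.x", "name.y", "uni")]
names(edges) <- c("from", "to", "uni")
edges
```

Note that MP_D, who shares a university with nobody, does not appear in the edge list, so MPs without a match are absent from a graph built only from edges.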
library(igraph)
library(ggraph)
library(tidygraph)
# Keep MPs with complete information on name, party, and university
clean_data <- simp %>%
  filter(!is.na(name), !is.na(PartyInfo), !is.na(uni))
# Create edges between MPs who attended the same university;
# name.x < name.y keeps each unordered pair once (no self-loops, no mirrored duplicates)
edges <- clean_data %>%
  select(name, PartyInfo, uni) %>%
  inner_join(clean_data, by = "uni") %>%
  filter(name.x < name.y) %>%
  select(from = name.x, to = name.y, uni, PartyInfo = PartyInfo.x) %>%
  distinct()
# Convert edges into a tidygraph object and add each MP's party to the nodes
g <- as_tbl_graph(edges, directed = FALSE) %>%
  activate(nodes) %>%
  left_join(clean_data %>% distinct(name, PartyInfo), by = "name") %>%
  mutate(degree = centrality_degree())
library(plotly)
# Static ggraph plot: edges coloured by university, nodes coloured by party
network_plot <- ggraph(g, layout = "fr") +
  geom_edge_link(aes(color = uni), alpha = 0.6, width = 0.8) + # Semi-transparent edges
  geom_node_point(aes(color = PartyInfo, text = name), size = 2) + # `text` feeds the plotly tooltip
  geom_node_text(aes(label = name), repel = TRUE, max.overlaps = 50) +
  theme_minimal() +
  ggtitle("Network Analysis by Shared University Attendance and Party")
# Convert to an interactive plot with hover text; size is set in ggplotly()
# because setting width/height in plotly::layout() is deprecated
interactive_network <- ggplotly(network_plot, tooltip = "text", width = 800, height = 650)
# Display the interactive plot
interactive_network
# Create edges between MPs with the same occupational background (occu_back);
# missing values are dropped first because dplyr joins match NA to NA by default
edges_occu <- clean_data %>%
  filter(!is.na(occu_back)) %>%
  select(name, PartyInfo, occu_back) %>%
  inner_join(clean_data, by = "occu_back") %>%
  filter(name.x < name.y) %>% # keep each unordered pair once
  select(from = name.x, to = name.y, occu_back, PartyInfo = PartyInfo.x) %>%
  distinct()
# Convert edges into a tidygraph object and add each MP's party to the nodes
g_occu <- as_tbl_graph(edges_occu, directed = FALSE) %>%
  activate(nodes) %>%
  left_join(clean_data %>% distinct(name, PartyInfo), by = "name") %>%
  mutate(degree = centrality_degree())
# Static ggraph plot: edges coloured by occupational background, nodes by party
network_plot_occu <- ggraph(g_occu, layout = "fr") +
  geom_edge_link(aes(color = occu_back), alpha = 0.6, width = 0.8) + # Edges based on occu_back
  geom_node_point(aes(color = PartyInfo, text = name), size = 2) + # Nodes with PartyInfo coloring
  geom_node_text(aes(label = name), repel = TRUE, max.overlaps = 50) +
  theme_minimal() +
  ggtitle("Network Analysis by Shared Occupational Background and Party")
# Convert to an interactive plot; size is set in ggplotly() rather than layout()
interactive_network_occu <- ggplotly(network_plot_occu, tooltip = "text", width = 750, height = 620)
# Display the interactive plot
interactive_network_occu