Programming Final

Author

Grace Flandreau

This dataset details the political giving of board members and CEOs of Fortune 500 companies. It was created by Adam Bonica, a professor of political science at Stanford, to show how corporate PACs and elites support different political parties and legislators. Each row represents one individual who was a CEO or director of a Fortune 500 company as of 2012. Each of the 5414 individuals (rows) is described by the 32 variables (columns) in the dataset. The most important of these variables are age, gender, industry, total, total.dem, and total.rep. The columns starting with “total” give the amount each person contributed in that category. For example, total.dem shows how much someone gave to Democrats and total.rep shows how much they gave to Republicans.

One potential challenge with this dataset is the number of NAs. Of the 5414 rows, 454 are missing age and 408 are missing industry. With thousands of rows, enough complete observations remain to examine overall trends, but the gaps are still something to consider.

Another potential issue is that the dataset covers CEOs and directors as of 2012 but examines spending from 2002 to 2012, so it may miss major trends from earlier CEOs and directors who retired or left their positions before 2012.

There is also a high outlier in this dataset. Margaret C Whitman, of Hewlett-Packard Co and Procter & Gamble Co, has donated far more than anyone else on the list: about $146,000,000 in political spending from 2002 to 2012. The next biggest spender, Jon M Huntsman, spent about $13,600,000. Both amounts are large, but Whitman's is more than ten times greater, which should be kept in mind when viewing this data.

dime <- read.csv("bod_fortune_500_DIME.csv")

# colnames
colnames(dime)
 [1] "dime.cid"       "corp.person.id" "ticker"         "corp.name"     
 [5] "last.name"      "first.name"     "middle.name"    "age"           
 [9] "gender"         "ceo"            "chairman"       "privatefirm"   
[13] "sector"         "industry"       "dime.cfscore"   "total"         
[17] "num.conts"      "self.funded"    "total.dem"      "total.rep"     
[21] "pct.to.dems"    "total.2002"     "total.2004"     "total.2006"    
[25] "total.2008"     "total.2010"     "total.2012"     "to.incumbs"    
[29] "to.open.seat"   "to.challs"      "to.winner"      "to.losers"     
# number of observations
nrow(dime)
[1] 5414
# number of variables
ncol(dime)
[1] 32
# number of NAs per column
colSums(is.na(dime))
      dime.cid corp.person.id         ticker      corp.name      last.name 
             0              0              0              0              0 
    first.name    middle.name            age         gender            ceo 
             0             31            454              0              0 
      chairman    privatefirm         sector       industry   dime.cfscore 
             0              0            408            408           1088 
         total      num.conts    self.funded      total.dem      total.rep 
            13             13              0              0              0 
   pct.to.dems     total.2002     total.2004     total.2006     total.2008 
          1166             48             42             15             52 
    total.2010     total.2012     to.incumbs   to.open.seat      to.challs 
            16             18             13             13             13 
     to.winner      to.losers 
            13             13 
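
The raw NA counts above are easier to compare as shares of the 5414 rows. For example, the 454 missing ages are about 8.4% of the data. The sketch below shows one way to compute such shares with base R. The `toy` data frame and its values are made up for illustration and only stand in for a few columns of `dime`.

```r
# Toy data standing in for a few columns of `dime` (values are invented).
toy <- data.frame(
  age      = c(63, NA, 58, 71, NA),
  industry = c("Banks", "Retail", NA, "Banks", "Retail"),
  total    = c(1000, 250, NA, 5000, 0)
)

# Percentage of missing values in each column.
na_share <- round(100 * colMeans(is.na(toy)), 1)
na_share

# Number of rows with no missing value in any column.
complete_rows <- sum(complete.cases(toy))
complete_rows
```

Running the same two calls on `dime` instead of `toy` would give the missing share for all 32 columns.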

The first important variable in this dataset is the “industry” column, which places each company into a category based on the good or service it provides. Grouping by industry makes it easier to see which sectors of the economy contribute the most to political spending. Because every company here is on the Fortune 500 list, industries that are heavily represented in this dataset are likely to be highly financially successful. Using the “industry” and “total” columns, the bar chart below shows that individuals in the Computer Hardware and Casinos & Gaming industries have contributed far more in total than those in other industries, even compared to the rest of the top 10. This finding is interesting because it suggests that these industries may be highly affected by politics, which would give them an incentive to invest in the party or candidate that will support their needs.

library(dplyr)

library(ggplot2)
library(scales)

# Sum total contributions within each industry and keep the 10 largest.
top10 <- dime %>%
  filter(!is.na(industry)) %>%
  group_by(industry) %>%
  summarise(total_spent = sum(total, na.rm = TRUE), .groups = "drop") %>%
  slice_max(order_by = total_spent, n = 10)

# Horizontal bar chart, with spending shown in millions of dollars.
ggplot(top10, aes(x = reorder(industry, total_spent),
                  y = total_spent / 1000000)) +
  geom_col(fill = "#218380") +
  coord_flip() +
  scale_y_continuous(
    breaks = scales::pretty_breaks(n = 3)
  ) +
  labs(
    title = "Top 10 Industries by Total Political Spending",
    x = "Industry",
    y = "Total Spending (in millions)"
  ) +
  theme_bw(base_size = 12) + 
  theme(
    plot.title = element_text(hjust = 1)
  )
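
Because the introduction notes one very large donor, an industry total can be driven by a single person. As a rough check, the sketch below compares each industry's total with its median contribution and with the share of the total that comes from its largest donor. The `toy` data and the industry labels "A" and "B" are invented for illustration, and this is only one possible check, not part of the original analysis.

```r
# Toy data: industry "A" has one very large donor, "B" does not.
toy <- data.frame(
  industry = c("A", "A", "A", "B", "B", "B"),
  total    = c(100, 200, 90000, 3000, 4000, 5000)
)

# Total and median contribution per industry.
totals  <- tapply(toy$total, toy$industry, sum)
medians <- tapply(toy$total, toy$industry, median)

# Share of each industry's total that comes from its single largest donor.
top_share <- tapply(toy$total, toy$industry, function(x) max(x) / sum(x))

totals
medians
round(top_share, 2)
```

In this toy example "A" has the larger total but the smaller median, because almost all of its total comes from one person.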

The next important variables are the “total.dem” and “total.rep” columns, which give each person's total contributions to the Democratic Party and the Republican Party, respectively. These variables matter when studying political spending because they show which party receives more funding from the leaders of the highest-revenue companies in the United States. According to the jitter plot, the CEOs and directors of these companies spend significantly more on the Republican Party than on the Democratic Party. This makes sense because the Republican Party favors laissez-faire policies that allow businesses more freedom in their practices, so many of these people most likely see the Republican Party as a way to continue their business practices with less interference. One value that is not shown on the plot is the outlier Margaret C Whitman. Her spending was so much larger than that of the next highest spender that including her stretched the scale and made the trends among everyone else much harder to see. Since I was interested in the overall trends, the plot keeps only amounts below the 99th percentile, which leaves her out and gives a clearer picture of the data.

library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(stringr)
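# Keep one row per last name (ignoring case and surrounding whitespace).
# Note: this also drops different people who share a last name.
dime_cap <- dime %>%
  mutate(
    last_clean = str_to_lower(str_trim(last.name))
  ) %>%
  distinct(last_clean, .keep_all = TRUE)

dime2 <- dime_cap %>%
  mutate(
    # Anyone with any Republican giving is labeled "Republican",
    # even if they also gave to Democrats.
    party = case_when(
      total.rep > 0 ~ "Republican",
      total.dem > 0 ~ "Democrat",
      TRUE ~ NA_character_
    ),
    # Amount given to the party assigned above.
    total_spent = if_else(total.rep > 0, total.rep, total.dem)
  ) %>%
  filter(!is.na(party)) %>%
  # Drop the top 1% of amounts so the largest outliers do not stretch the scale.
  filter(total_spent < quantile(total_spent, 0.99, na.rm = TRUE))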

ggplot(dime2, aes(x = party, y = total_spent, color = party)) +
  geom_jitter(width = 0.15, alpha = 0.6, size = 2) +
  scale_color_manual(values = c("Republican" = "#D74A4A", 
                                "Democrat" = "#3875D7")) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Political Contributions by Party",
    x = NULL,
    y = "Total Spending"
  ) +
  theme_bw(base_size = 12) +
  theme(plot.title = element_text(hjust = 0.5))
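
The code above assigns each person to a single party, so someone who gave to both parties appears only under "Republican". As a sketch of an alternative, the example below reshapes the data to long form so that each person contributes one row per party they gave to. It uses base R and an invented `toy` data frame. In the tidyverse, `tidyr::pivot_longer()` could do the same reshaping. This is an illustration under those assumptions, not the method used for the plot above.

```r
# Toy data: Adams gave to both parties, Baker and Chen to one each.
toy <- data.frame(
  last.name = c("Adams", "Baker", "Chen"),
  total.dem = c(500, 0, 2000),
  total.rep = c(1500, 300, 0)
)

# Stack the two party columns into one long table: one row per person and party.
long <- rbind(
  data.frame(last.name = toy$last.name, party = "Democrat",
             amount = toy$total.dem),
  data.frame(last.name = toy$last.name, party = "Republican",
             amount = toy$total.rep)
)

# Keep only positive contributions.
long <- long[long$amount > 0, ]

# Total given to each party across all people.
party_totals <- tapply(long$amount, long$party, sum)
party_totals
```

With this layout, Adams appears once under each party, and `long` could be passed to `geom_jitter()` with `x = party` and `y = amount`.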

The final variable that I found interesting was the “age” column. Since the individuals in this dataset lead very successful companies, I wanted to see whether there were any trends in their ages. I expected the ages to skew older, since these people are in positions of power, but I was surprised by how old they skewed. The median age was 63, only four years younger than the US full retirement age of 67. Considering this and the fact that these individuals contribute heavily to politics, it suggests that older generations did, and probably still do, have a significant impact on American politics.

library(ggplot2)
library(dplyr)

dime_cap <- dime %>%
  mutate(
    last_clean = str_to_lower(str_trim(last.name))
  ) %>%
  distinct(last_clean, .keep_all = TRUE)

dime_age <- dime_cap %>%
  # Drop missing ages, including the "--" placeholder, before converting.
  filter(!is.na(age), age != "--") %>%
  mutate(age = as.numeric(age))

# Summary statistics used for the annotations below.
age_min <- min(dime_age$age, na.rm = TRUE)
age_max <- max(dime_age$age, na.rm = TRUE)
age_median <- median(dime_age$age, na.rm = TRUE)

ggplot(dime_age, aes(x = age)) +
  geom_density(fill = "#9163cb", alpha = 0.5) +
  # Dashed line at the median, matching the "Median" label below.
  geom_vline(
    xintercept = age_median,
    color = "black",
    linetype = "dashed",
    linewidth = 1) +
  annotate("text",
           x = age_min,
           y = 0,
           label = paste0("Min: ", age_min),
           hjust = -0.1,
           vjust = 1.5,
           size = 3) +
  annotate("text",
           x = age_median,
           y = 0,
           label = paste0("Median: ", age_median),
           hjust = 0.5,
           vjust = 1.5,
           size = 3) +
  annotate("text",
           x = age_max,
           y = 0,
           label = paste0("Max: ", age_max),
           hjust = 1.1,
           vjust = 1.5,
           size = 3) +
  scale_x_continuous(
    breaks = seq(5, max(dime_age$age, na.rm = TRUE), by = 5)
  ) +
  labs(
    title = "Distribution of Ages for Fortune 500 Board Members (as of 2012)",
    x = "Age",
    y = "Density"
  ) +
  theme_bw(base_size = 12) +
  theme(plot.title = element_text(hjust = 1))
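
The code above removes repeated rows by last name, which also removes different people who happen to share a surname and can shift summary statistics such as the median age. The sketch below contrasts that with deduplicating on a person identifier. It assumes that `dime.cid` identifies one individual across boards, which is an assumption about the dataset rather than something confirmed here. The `toy` data is invented.

```r
# Toy data: Ann Smith sits on two boards; Bob Smith is a different person.
toy <- data.frame(
  dime.cid   = c(1, 1, 2, 3),
  last.name  = c("Smith", "Smith", "Smith", "Jones"),
  first.name = c("Ann", "Ann", "Bob", "Cara"),
  age        = c(60, 60, 45, 70)
)

# Deduplicate by cleaned last name: Bob Smith is dropped along with the repeat.
by_last_name <- toy[!duplicated(tolower(trimws(toy$last.name))), ]

# Deduplicate by person identifier (assumed unique per individual).
by_person <- toy[!duplicated(toy$dime.cid), ]

median(by_last_name$age)
median(by_person$age)
```

In this toy example the two approaches keep 2 and 3 people respectively, and the median age differs between them.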

Citation

Bonica, Adam. 2016. “Avenues of Influence: On the Political Expenditures of Corporations and Their Directors and Executives.” Business and Politics 18(4): 367-394.

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6R1HAS