MyersAssignment7

Comparing Sustainable Policies between California, Vermont, and New York

Background on Data Retrieval and Site

The US Climate Alliance is a “bipartisan coalition of 24 governors securing America’s net-zero future by advancing state-led, high-impact climate action”. Its site hosts a database of actions and policies for the states participating in the alliance. Sustainability has always been one of my values. I see it as important in our individual lives and as a responsibility at the state and federal level.

My goal is to use the API behind this database to collect information on the actions cataloged for California, Vermont, and New York. I first web-scraped the site. When I got to the action descriptions, I realized they were not part of the static page: an API loaded them dynamically after the rest of the page had loaded. So I used the API URL I had found to build my own URLs from the base URL and each action ID. Unfortunately, the IDs did not follow a consistent pattern, so I selected them by hand.
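The URL-building step might look like the sketch below. It assumes the endpoint takes the action ID appended directly to the base URL. Both the base URL and the IDs are hypothetical placeholders, because this report does not reproduce the real endpoint or the hand-picked IDs.

```r
# Hypothetical placeholder; the real API base URL is not given in this report
base_url <- "https://example.org/api/actions/"

# Illustrative IDs only; the actual IDs were hand-selected from the site
action_ids <- c(101, 245, 3090)

# Build one request URL per action ID by appending the ID to the base URL
build_action_urls <- function(base, ids) {
  paste0(base, ids)
}

action_urls <- build_action_urls(base_url, action_ids)

# Fetch and parse one action's JSON response (needs network access and the
# jsonlite package; defined here but not called)
fetch_action <- function(url) {
  jsonlite::fromJSON(url, flatten = TRUE)
}
```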

From there, I created a loop that requested each individual page for NY, CA, and VT actions from the last 5 years. The API also returns related actions or policies, some of them from other states, so I filtered the dataset further to keep only the three states of interest. I removed some unnecessary columns and converted a few date columns from the character data type to the date data type.
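The loop and the state filter could be sketched as follows. The sketch makes two assumptions. First, each API response parses to a data frame with a `state` column holding full state names. Second, the columns to drop are passed in through a hypothetical `drop_cols` argument, because the report doesn't list which columns were removed. The fetch function is a parameter, so a stub can stand in for the network call.

```r
library(dplyr)

# Assumption: responses use full state names in a `state` column
target_states <- c("California", "New York", "Vermont")

# Loop over the action URLs, fetch each one, and stack the results into one
# data frame. `fetch` defaults to jsonlite::fromJSON() but can be swapped out.
collect_actions <- function(urls,
                            fetch = function(u) jsonlite::fromJSON(u, flatten = TRUE)) {
  bind_rows(lapply(urls, fetch))
}

# Keep only the three target states (related actions from other states are
# dropped) and remove columns not needed for the analysis.
clean_actions <- function(actions, drop_cols = character(), states = target_states) {
  actions %>%
    filter(state %in% states) %>%
    select(-any_of(drop_cols))
}
```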

Questions Posed:

  1. Does the positive and negative sentiment of words differ between the three states?
  2. Based on the terms used in policy definitions, are there any similar trends in sustainability initiatives between the three states?
  3. How do positive policy sentiment scores change per year?

Prep for Analysis

Loading the Libraries:

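library(lubridate)    # date parsing helpers such as mdy()
library(ggwordcloud)  # word-cloud geom for ggplot2
library(textdata)     # lexicon downloads (used for NRC)
library(tidyverse)    # readr, dplyr, ggplot2, and related packages
library(tidytext)     # unnest_tokens(), stop_words, get_sentiments()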

Downloading the CSV:

policy_data <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/myersa14_xavier_edu/ER-MwEqpcmNGtQnDB5wLn88BPgEjGGkskyGrnNLkGvHf1A?download=1")

Before continuing with the analysis, I made a few more changes. Some values came in as character data types and needed to be converted to date data types using lubridate's mdy() function.

# Convert the effective, enacted, and modified dates from character to Date.
# gsub() removes periods from the raw strings before mdy() parses month-day-year.
policy_data$date_effective <- mdy(gsub("\\.", "", policy_data$date_effective))
policy_data$date_enacted <- mdy(gsub("\\.", "", policy_data$date_enacted))
policy_data$modified_date_ <- mdy(gsub("\\.", "", policy_data$modified_date_))

I also cleaned the data so that sentiment could be measured accurately. I broke each action description down to the word level (one row per word), removed stop words, and stored the result in a new dataframe. I'll use the Bing and NRC lexicons for my sentiment and emotion analyses and inner-join them when needed. I also filtered out the words “california”, “vermont”, and “york”. State names appear very often and co-occur with a wide variety of words, so they would crowd the counts without saying anything about policy content.
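A minimal sketch of this cleaning step with tidytext is below. It assumes the description text lives in a column named `description`, which is hypothetical because the CSV's actual column name isn't shown. `unnest_tokens()` lowercases tokens by default, which is why the state-name filter uses lowercase words.

```r
library(dplyr)
library(tidytext)

# State names appear constantly and would crowd out policy vocabulary.
# Lowercase because unnest_tokens() lowercases tokens by default.
state_words <- c("california", "vermont", "york")

tokenize_descriptions <- function(policies) {
  policies %>%
    unnest_tokens(word, description) %>%     # one row per word; other columns kept
    anti_join(stop_words, by = "word") %>%   # drop tidytext's stop words
    filter(!(word %in% state_words))         # drop state-name tokens
}
```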

Question 1:

Does the positive and negative sentiment of the words used differ between the three states?

By answering this question, I hope to see what terminology the descriptions use and whether most of it carries positive sentiment. My hypothesis is that it will, since sustainable policies are often proposed to limit or change something in hopes of a better outcome. However, I’m curious whether policies mention negative impacts or current legislation that they are trying to reverse or improve. For this, I joined the Bing lexicon with my unnested token dataframe. This shows the sentiment attached to each word used 15 or more times and how the positive and negative words balance against each other.
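The join-and-count step could be sketched as follows, assuming a token dataframe with `state` and `word` columns like the one described above. The report doesn't say whether the 15-use threshold applies per state or across all three, so this sketch applies it per state. `reorder_within()` and `scale_y_reordered()` are tidytext helpers that sort bars within each facet.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Bing lexicon: one row per word, labeled "positive" or "negative".
# It ships with tidytext, so no textdata download is needed.
bing <- get_sentiments("bing")

# Attach a sentiment to each token, count per state/word/sentiment, and keep
# words used at least `min_n` times within a state.
frequent_sentiment_words <- function(tokens, min_n = 15) {
  tokens %>%
    inner_join(bing, by = "word") %>%
    count(state, word, sentiment, name = "n") %>%
    filter(n >= min_n)
}

# Horizontal bar chart per state, bars sorted by count within each panel
plot_sentiment_words <- function(word_counts) {
  ggplot(word_counts,
         aes(x = n, y = reorder_within(word, n, state), fill = sentiment)) +
    geom_col() +
    scale_y_reordered() +
    facet_wrap(~ state, scales = "free_y") +
    labs(x = "Number of uses", y = NULL, fill = "Bing sentiment")
}
```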

Looking at the results, all three states are fairly similar across the board. Many of the words with positive sentiment are ones that, to me, clearly relate to improving our future and our structures. Support, protect, progress, protection, advance, and improve are all words I associate with sustainable policies. It doesn’t surprise me that “clean” is the positive word with the highest count. Phrases like clean energy, clean cars, clean water, and clean resources are popular terms and popular initiatives, and they typically carry a positive connotation. The words with negative sentiment, such as “risks”, “limit”, and “disadvantaged”, also make sense in this context. “Risks” and “limit” don’t necessarily have completely negative connotations here. They mostly refer to limiting something that may be harming the environment or society.

Additionally, many of the sentiment-bearing words used 15 or more times are the same across the three states. This suggests that policies commonly draw on a shared vocabulary for legislation, and for sustainable legislation in particular.

Question 2:

Based on the terms used in policy definitions, are there any similar trends in sustainability initiatives between the three states?

To answer this question, I compare the three states by word frequency, looking at words used more than 40 times. I created a separate dataframe, grouped the data by state and word, summarized the count, and kept only words with a frequency above 40. I then plotted all three states on a single column graph, using the dodge position so the differences and similarities are easy to see.
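The frequency table and dodged column chart could be sketched as follows, assuming the same token dataframe with `state` and `word` columns. The report doesn't specify `coord_flip()`; it is an optional addition that keeps long word labels readable.

```r
library(dplyr)
library(ggplot2)

# Count each word per state and keep words used more than `min_n` times
word_frequency_by_state <- function(tokens, min_n = 40) {
  tokens %>%
    group_by(state, word) %>%
    summarise(n = n(), .groups = "drop") %>%
    filter(n > min_n)
}

# One column per state for each word, placed side by side ("dodge")
plot_word_frequency <- function(freq) {
  ggplot(freq, aes(x = word, y = n, fill = state)) +
    geom_col(position = "dodge") +
    coord_flip() +
    labs(x = NULL, y = "Count", fill = "State")
}
```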

The data points to some potential trends in each state’s policies. All three states, and California especially, appear concerned with climate or have policies that interact with the climate in some way. All three also seem concerned with energy and the emissions they produce. California and Vermont both have policies that mention “greenhouse”, presumably as part of “greenhouse gases” or “greenhouse gas emissions”. Vermont also has a high count of “GHG”, the abbreviation for greenhouse gases, which shows a difference in terminology. California also appears to have actions focused on carbon, resources, and transportation.

Question 3:

How do positive policy sentiment scores change per year?

For this question, I want to look at each state’s positive sentiment score and see whether it changes over the years 2019-2024. I think it would be interesting to see whether language or sentiment shifts around election years, and whether it returns to a normal balance afterward. To do so, I created a dataframe of policy sentiment by year, extracting the year from the date_enacted value. I then grouped it by state and year and subtracted the negative sentiment count from the positive count to get a net score.
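The yearly scoring could be sketched as follows, assuming the token dataframe still carries the `date_enacted` Date column from the conversion step. The sketch sums positive and negative matches within each state-year group rather than pivoting. That way it still works when a group has no negative words.

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(tidytext)

# Net sentiment per state and year: positive Bing matches minus negative ones
yearly_net_sentiment <- function(tokens) {
  tokens %>%
    mutate(year = year(date_enacted)) %>%              # extract year from the Date
    inner_join(get_sentiments("bing"), by = "word") %>%
    group_by(state, year) %>%
    summarise(positive = sum(sentiment == "positive"),
              negative = sum(sentiment == "negative"),
              .groups = "drop") %>%
    mutate(net_sentiment = positive - negative)
}

# Side-by-side bars per year, one per state
plot_yearly_sentiment <- function(scores) {
  ggplot(scores, aes(x = factor(year), y = net_sentiment, fill = state)) +
    geom_col(position = "dodge") +
    labs(x = "Year enacted", y = "Positive minus negative words", fill = "State")
}
```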

This data shows that California’s policies used a lot of positive terminology. This could be attributed to a push for sustainable initiatives ahead of the uncertain results of the 2021 election. All three states showed less positive sentiment in 2021. 2022 was mixed: California had high scores, while New York and Vermont had lower positivity scores. 2023 followed the same pattern. In 2024, positive sentiment rose sharply in the policies passed in both New York and Vermont.