Finding Articles

We collected articles from eight newspapers across the country. We focused on major cities, since they were likely to publish the most articles on climate change, and we made sure to include newspapers from each of the major regions of the US so we could check for regional differences. Because climate change gained momentum as a public issue starting in 2006, we focused on the period from 2006 to 2020. We selected 100 articles from each paper, making sure to include a variety of dates across that time frame. After downloading each article, we stripped it of everything except its title and body text. The items removed included each article's publication-type classification, language, and main topics, along with other noise in the file. In every case, the file downloaded from Nexis Uni was a Rich Text Format (.rtf) file, which we converted to a plain-text .txt file so it could be read into R. Once a .txt file was read into R, we removed any remaining extraneous text and performed the sentiment analysis using the Bing, NRC, and AFINN lexicons.
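The cleanup step can be sketched as follows. This is a Python illustration of the idea, not the R code we actually used, and the marker lines it relies on (a line reading "Body" before the text, "Classification" or "End of Document" after it, and metadata lines such as "Language:" or "Load-Date:") are assumptions about the export layout that should be checked against real files. The sample input is made up.

```python
import re

# Assumed metadata prefixes in a plain-text export (not taken from Nexis Uni docs).
METADATA_PREFIXES = ("Language:", "Publication-Type:", "Subject:",
                     "Load-Date:", "Byline:", "Section:", "Length:")
# Assumed marker lines that end the body text.
END_MARKERS = {"Classification", "End of Document"}


def clean_article(raw: str) -> dict:
    """Keep only the title and body text of one exported article (sketch)."""
    lines = [line.strip() for line in raw.splitlines()]
    non_empty = [line for line in lines if line]
    title = non_empty[0] if non_empty else ""
    # Assumption: the body starts after a line reading exactly "Body";
    # if there is no such line, start right after the title.
    if "Body" in lines:
        start = lines.index("Body") + 1
    elif title:
        start = lines.index(title) + 1
    else:
        start = 0
    body_lines = []
    for line in lines[start:]:
        if line in END_MARKERS:
            break
        if line and not line.startswith(METADATA_PREFIXES):
            body_lines.append(line)
    # Collapse runs of whitespace left over from the RTF-to-text conversion.
    body = re.sub(r"\s+", " ", " ".join(body_lines)).strip()
    return {"title": title, "body": body}


# Made-up example input, for illustration only.
sample = """Warming Seas Threaten Coast
The New York Times
Body
Sea levels are rising.
Scientists   warn of more flooding.
Classification
Language: ENGLISH
End of Document
"""
print(clean_article(sample))
```

Keeping the marker lists as constants makes it easy to adjust them if a particular paper's exports use different field names.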

Article Analysis

The New York Times

Most Commonly Used Words

## # A tibble: 12,477 x 2
##    word          n
##    <chr>     <int>
##  1 climate    2139
##  2 change     1408
##  3 global      535
##  4 warming     422
##  5 carbon      404
##  6 times       376
##  7 york        369
##  8 emissions   315
##  9 energy      307
## 10 people      305
## # … with 12,467 more rows

Bing Analysis

## 
## negative positive 
##     1105      599

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          394          367          279          491          256          987 
##     positive      sadness     surprise        trust 
##          955          376          199          558

AFINN Analysis

## # A tibble: 1,036 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 united      299     1
##  2 fire        156    -2
##  3 risk        105    -2
##  4 natural      89     1
##  5 threat       69    -2
##  6 risks        65    -2
##  7 clean        64     2
##  8 crisis       61    -3
##  9 agreement    59     1
## 10 increase     56     1
## # … with 1,026 more rows

Insights

Overall, the sentiment in the New York Times articles was strongly negative. Of the words that the Bing lexicon classifies as negative or positive, 1,105 were negative and only 599 were positive, so about 65% of the classifiable words have a negative connotation. Looking beyond positive and negative, the most common NRC emotion category was "trust," with 558 words. This is somewhat surprising, given the apparent lack of trust surrounding this issue. A likely cause is the frequency of the word "united," which mostly appears as part of "United States" and is therefore scored out of context. Another frequent emotion category was "fear," which makes sense since many people are worried about the effects of climate change over the coming years. The sentiment may also be somewhat more negative than the analysis shows: "united" is the most frequent positively scored word in the AFINN results and likely also counts toward "positive" and "trust" in the NRC results, yet as part of "United States" it should be treated as neutral and excluded. One word that stood out in the AFINN analysis was "fire," which appeared 156 times across the articles and is clearly a major concern related to climate change. This also suggests that the New York Times covers climate change beyond its own region, since many of the fires are occurring in California. The words "threat," "risk," and "crisis" are also among the 10 most common AFINN-scored words, each scored -2 or -3 on the scale from -5 to 5.
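The negativity percentage used throughout this report is the Bing negative count divided by the sum of the negative and positive counts. A minimal Python sketch of that arithmetic, using the New York Times counts from the table above (the function name is our own):

```python
def negative_share(negative: int, positive: int) -> float:
    """Fraction of Bing-classified words that are negative."""
    total = negative + positive
    if total == 0:
        raise ValueError("no classified words")
    return negative / total


# New York Times Bing counts from the table above.
share = negative_share(1105, 599)
print(f"{share:.1%}")  # 64.8%
```

Rounded to a whole number this gives the 65% quoted above.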

The Washington Post

Most Commonly Used Words

## # A tibble: 9,850 x 2
##    word           n
##    <chr>      <int>
##  1 climate     1652
##  2 change      1124
##  3 global       297
##  4 warming      288
##  5 washington   271
##  6 post         255
##  7 scientists   214
##  8 people       208
##  9 report       201
## 10 science      200
## # … with 9,840 more rows

Bing Analysis

## 
## negative positive 
##      780      433

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          290          312          171          371          212          703 
##     positive      sadness     surprise        trust 
##          782          291          170          465

AFINN Analysis

## # A tibble: 842 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 united      130     1
##  2 natural      68     1
##  3 increase     62     1
##  4 risk         57    -2
##  5 risks        52    -2
##  6 threat       49    -2
##  7 growing      44     1
##  8 support      43     2
##  9 clean        38     2
## 10 disasters    37    -2
## # … with 832 more rows

Insights

A sentiment analysis of the Washington Post articles on climate change produced results very similar to those for the New York Times. Of the words that could be classified, 780 were negative and 433 positive, so roughly 64% were negative. Interestingly, fewer words in these articles could be classified as either positive or negative than in the New York Times articles (1,213 versus 1,704). This could mean that the New York Times articles on climate change were longer on average, or that they were more opinionated. On the AFINN scale from -5 to 5, -2 was the most common score. As with the New York Times, the two most common NRC categories beyond positive and negative were "trust" and "fear." In the AFINN results, words such as "increase" and "growing" appear more often, while "crisis" is not nearly as common as it was in the New York Times. This suggests that climate change is a growing concern in DC but perhaps not as much at the forefront of attention as it is in New York.
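The "beyond positive and negative" comparison amounts to dropping the two polarity categories from the NRC counts and ranking what remains. A small Python sketch using the Washington Post counts from the table above (the helper name is hypothetical):

```python
# Washington Post NRC counts from the table above.
nrc_counts = {
    "anger": 290, "anticipation": 312, "disgust": 171, "fear": 371,
    "joy": 212, "negative": 703, "positive": 782, "sadness": 291,
    "surprise": 170, "trust": 465,
}


def top_emotions(counts: dict[str, int], k: int = 2) -> list[tuple[str, int]]:
    """Rank NRC categories after dropping the two polarity categories."""
    emotions = {cat: n for cat, n in counts.items()
                if cat not in ("positive", "negative")}
    return sorted(emotions.items(), key=lambda item: item[1], reverse=True)[:k]


print(top_emotions(nrc_counts))  # [('trust', 465), ('fear', 371)]
```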

Chicago Daily Herald

Most Commonly Used Words

## # A tibble: 7,364 x 2
##    word        n
##    <chr>   <int>
##  1 climate  1018
##  2 change    741
##  3 global    230
##  4 warming   202
##  5 chicago   180
##  6 people    142
##  7 daily     124
##  8 herald    120
##  9 carbon    119
## 10 u.s       118
## # … with 7,354 more rows

Bing Analysis

## 
## negative positive 
##      564      345

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          196          250          137          252          181          508 
##     positive      sadness     surprise        trust 
##          620          207          127          358

AFINN Analysis

## # A tibble: 687 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 united       51     1
##  2 clean        46     2
##  3 agreement    44     1
##  4 increase     39     1
##  5 support      35     2
##  6 natural      32     1
##  7 care         27     2
##  8 risk         27    -2
##  9 poor         23    -2
## 10 deniers      22    -2
## # … with 677 more rows

Insights

The results are very similar to those for New York, DC, and Philadelphia. Of the words classified as negative or positive, 564 were negative and just 345 positive, so roughly 62% were negative. As with the Washington Post, far fewer words could be classified overall (909), which could mean the Chicago coverage takes a more neutral perspective than the New York Times despite a similar percentage. One word that appears often is "lake," with 117 occurrences, suggesting that lakes are among Chicago's largest climate change concerns given the city's proximity to the Great Lakes. Consistent with this, "clean" is the second most frequent word in the AFINN analysis, right behind "united," which can mostly be ignored. "Deniers" and "agreement" also appear among the 10 most common AFINN-scored words, which suggests that the Chicago Daily Herald's coverage may be aimed at building consensus on climate change and convincing readers who still doubt the seriousness of the issue.

The San Diego Union-Tribune

Most Commonly Used Words

## # A tibble: 8,374 x 2
##    word        n
##    <chr>   <int>
##  1 climate  1199
##  2 change    901
##  3 san       439
##  4 diego     387
##  5 global    293
##  6 warming   288
##  7 union     223
##  8 tribune   213
##  9 carbon    197
## 10 report    193
## # … with 8,364 more rows

Bing Analysis

## 
## negative positive 
##      651      398

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          246          263          154          305          177          586 
##     positive      sadness     surprise        trust 
##          684          241          141          413

AFINN Analysis

## # A tibble: 726 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 united       73     1
##  2 risk         58    -2
##  3 threat       45    -2
##  4 natural      43     1
##  5 increase     41     1
##  6 increased    36     1
##  7 support      36     2
##  8 growing      34     1
##  9 clean        32     2
## 10 cut          30    -1
## # … with 716 more rows

Insights

The sentiment in the San Diego Union-Tribune was very similar to that of the previous newspapers: 651 words were classified as negative and 398 as positive, so about 62% were negative. In the AFINN analysis, setting aside "united," the two most frequent scored words were "risk" and "threat," appearing 58 and 45 times respectively. Although each scores only -2 on the scale from -5 to 5, these words show a clear level of concern in the area. It was also surprising that "fire" was not among the most common AFINN-scored words: it appeared only 24 times, much further down the list than in the New York Times.
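"Setting aside united" can be made explicit by filtering a list of words we choose to treat as neutral before re-ranking. In our R pipeline this would be a filter step on the AFINN tibble. The Python sketch below uses the San Diego top-10 rows from the table above; the `NEUTRAL` set is our own assumption, not part of the AFINN lexicon.

```python
# San Diego Union-Tribune AFINN top-10 rows (word, n, value) from the table above.
afinn_rows = [
    ("united", 73, 1), ("risk", 58, -2), ("threat", 45, -2),
    ("natural", 43, 1), ("increase", 41, 1), ("increased", 36, 1),
    ("support", 36, 2), ("growing", 34, 1), ("clean", 32, 2), ("cut", 30, -1),
]

# Words we choose to treat as neutral because they mostly appear in proper
# nouns such as "United States" (an assumption, not an AFINN rule).
NEUTRAL = {"united"}


def rerank(rows, neutral=NEUTRAL):
    """Drop neutral-treated words and sort the rest by frequency."""
    kept = [row for row in rows if row[0] not in neutral]
    return sorted(kept, key=lambda row: row[1], reverse=True)


for word, n, value in rerank(afinn_rows)[:2]:
    print(word, n, value)
# risk 58 -2
# threat 45 -2
```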

The Daily Oklahoman

Most Commonly Used Words

## # A tibble: 5,584 x 2
##    word          n
##    <chr>     <int>
##  1 climate     927
##  2 change      671
##  3 oklahoma    486
##  4 city        239
##  5 oklahoman   225
##  6 global      174
##  7 water       161
##  8 report      150
##  9 warming     149
## 10 weather     143
## # … with 5,574 more rows

Bing Analysis

## 
## negative positive 
##      380      278

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          166          199          115          212          138          384 
##     positive      sadness     surprise        trust 
##          502          163           94          300

AFINN Analysis

## # A tibble: 502 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 severe       55    -2
##  2 united       48     1
##  3 natural      44     1
##  4 increase     30     1
##  5 fear         22    -2
##  6 hoax         22    -2
##  7 intense      21     1
##  8 risk         21    -2
##  9 disasters    19    -2
## 10 increased    19     1
## # … with 492 more rows

Insights

The results for the Daily Oklahoman (Oklahoma City) are slightly more positive than those of the newspapers examined so far. In the Bing analysis, 380 words are classified as negative and 278 as positive, so about 58% are negative, roughly 4 percentage points lower than any of the other newspapers examined so far. The number of words that can be classified as negative or positive is also relatively small (658), so these articles are likely shorter than those in the New York Times and Washington Post. This could be because the paper's articles are generally shorter, or because climate change is not as big an issue in Oklahoma City. Furthermore, "hoax" was one of the most common AFINN-scored words, appearing 22 times; it does not appear among the top 10 AFINN-scored words for New York, Philadelphia, San Diego, Chicago, or DC.
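Because every paper contributed 100 articles, the total number of Bing-classified words can be compared directly across papers, alongside each paper's negative share. The Python sketch below uses the Bing counts from the tables above. A smaller total is consistent with shorter or less emotive articles, but the counts alone cannot tell those explanations apart.

```python
# Bing (negative, positive) counts for the first five papers, from the tables above.
bing = {
    "New York Times": (1105, 599),
    "Washington Post": (780, 433),
    "Chicago Daily Herald": (564, 345),
    "San Diego Union-Tribune": (651, 398),
    "Daily Oklahoman": (380, 278),
}

nyt_total = sum(bing["New York Times"])
for paper, (neg, pos) in bing.items():
    classified = neg + pos
    print(f"{paper:<24} {classified:>5} classified "
          f"({classified / nyt_total:.0%} of NYT), {neg / classified:.1%} negative")

# Gap between the Oklahoman's negative share and the lowest of the other four.
ok_neg, ok_pos = bing["Daily Oklahoman"]
ok_share = ok_neg / (ok_neg + ok_pos)
lowest_other = min(neg / (neg + pos) for paper, (neg, pos) in bing.items()
                   if paper != "Daily Oklahoman")
gap_points = 100 * (lowest_other - ok_share)
print(f"Gap: {gap_points:.1f} percentage points")
```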

The Tampa Times

Most Commonly Used Words

## # A tibble: 9,653 x 2
##    word         n
##    <chr>    <int>
##  1 climate   1621
##  2 change    1266
##  3 global     401
##  4 energy     351
##  5 florida    318
##  6 times      315
##  7 warming    306
##  8 trump      234
##  9 tampa      229
## 10 national   207
## # … with 9,643 more rows

Bing Analysis

## 
## negative positive 
##      755      420

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          294          268          192          356          171          700 
##     positive      sadness     surprise        trust 
##          705          285          134          428

AFINN Analysis

## # A tibble: 801 x 3
##    word          n value
##    <chr>     <int> <dbl>
##  1 united      117     1
##  2 natural      79     1
##  3 clean        73     2
##  4 hoax         71    -2
##  5 increase     59     1
##  6 threat       58    -2
##  7 support      53     2
##  8 true         46     2
##  9 free         41     1
## 10 agreement    37     1
## # … with 791 more rows

Insights

The sentiment in the Tampa Bay coverage is similar to that of Chicago: 755 of the classified words were negative and just 420 positive, giving about 64% negative. The Tampa paper also has many more words that can be classified as negative or positive (1,175), suggesting that climate change is likely a bigger issue in Tampa Bay than in places like Oklahoma City. Interestingly, however, the fourth most common AFINN-scored word was "hoax," which is more in line with the Oklahoma point of view. "Natural" and "clean" ranked second and third; their context is hard to determine, but it appears that, as in Chicago, climate change may be seen less as a current "crisis" and more as a growing issue.
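Ranking by frequency alone hides how strongly each word is scored. Weighting each word by its count times its AFINN value shows which of the top-10 words move the score the most. This Python sketch uses only the top-10 rows shown in the table above, so it is not the paper's overall AFINN total.

```python
# Tampa AFINN top-10 rows (word, n, value) from the table above.
afinn_rows = [
    ("united", 117, 1), ("natural", 79, 1), ("clean", 73, 2), ("hoax", 71, -2),
    ("increase", 59, 1), ("threat", 58, -2), ("support", 53, 2),
    ("true", 46, 2), ("free", 41, 1), ("agreement", 37, 1),
]

# Contribution of each word = frequency * AFINN value, most negative first.
contributions = sorted(((word, n * value) for word, n, value in afinn_rows),
                       key=lambda item: item[1])
for word, score in contributions:
    print(f"{word:<10} {score:+d}")
```

Here "hoax" (-142) is the largest negative contributor even though it ranks fourth by frequency, because of its -2 value.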

USA Today

Most Commonly Used Words

## # A tibble: 9,366 x 2
##    word          n
##    <chr>     <int>
##  1 climate     292
##  2 change      214
##  3 people      189
##  4 president   146
##  5 u.s         142
##  6 world       117
##  7 global      106
##  8 time         99
##  9 house        97
## 10 obama        94
## # … with 9,356 more rows

Bing Analysis

## 
## negative positive 
##      820      459

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          309          330          204          394          246          747 
##     positive      sadness     surprise        trust 
##          793          322          176          464

AFINN Analysis

## # A tibble: 840 x 3
##    word         n value
##    <chr>    <int> <dbl>
##  1 fire        90    -2
##  2 united      62     1
##  3 risk        37    -2
##  4 care        35     2
##  5 growing     35     1
##  6 natural     33     1
##  7 increase    28     1
##  8 war         28    -2
##  9 paradise    26     3
## 10 support     25     2
## # … with 830 more rows

Insights

Overall, the sentiment in the USA Today articles we examined was fairly negative. Of the 1,279 words classified as negative or positive, our Bing analysis found 820 to be negative, so approximately 64.11% of the classified words have a negative connotation. The NRC analysis appears to dispute this, with positive being the single category with the highest score. However, we feel this may be misleading, as terms such as "U.S.," "President," "World," and "Change," which may generally carry a positive connotation, were likely used in a negative context in many of these pieces. As with the previously analyzed papers, the high trust score in the NRC analysis carries a similar caveat: it is likely a case of our analysis tools failing to understand the context in which words are used. Other emotions with strong representation include fear, anticipation, and sadness, which is unsurprising given the general attitude toward climate change in the U.S. today. Finally, our AFINN analysis of USA Today revealed a negatively skewed distribution of sentiment scores, with "fire" being the scored word with the most mentions. Other negative words included "risk" and "war." As noted for previous papers, "united" appears as the second most frequent scored word, earning a positive score. However, a significant share of its occurrences likely refer to the United States, so it should probably be scored as neutral rather than positive. Overall, the analyses indicate decidedly negative sentiment across the USA Today articles we reviewed, in line with papers published in major East Coast markets such as the New York Times, Washington Post, and Philadelphia Inquirer.
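The apparent disagreement between the two lexicons can be quantified by computing a negative share from each. In the Python sketch below, the Bing share is 820 / (820 + 459). For NRC we use the "negative" and "positive" category counts from the table above. The two lexicons have different word lists, and NRC can assign one word to several categories, so the shares are not expected to match.

```python
def share(part: int, other: int) -> float:
    """Fraction part / (part + other)."""
    return part / (part + other)


bing_negative = share(820, 459)  # Bing: negative vs positive
nrc_negative = share(747, 793)   # NRC: "negative" vs "positive" categories
print(f"Bing negative share: {bing_negative:.2%}")  # 64.11%
print(f"NRC negative share:  {nrc_negative:.2%}")   # 48.51%
```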

The Philadelphia Inquirer

Most Commonly Used Words

## # A tibble: 11,788 x 2
##    word              n
##    <chr>         <int>
##  1 climate         277
##  2 change          239
##  3 people          172
##  4 time            144
##  5 city            135
##  6 energy          132
##  7 philadelphia    131
##  8 environmental   103
##  9 johnson         102
## 10 epa              99
## # … with 11,778 more rows

Bing Analysis

## 
## negative positive 
##      878      579

NRC Analysis

## 
##        anger anticipation      disgust         fear          joy     negative 
##          360          387          240          435          300          843 
##     positive      sadness     surprise        trust 
##          959          365          203          544

AFINN Analysis

## # A tibble: 943 x 3
##    word         n value
##    <chr>    <int> <dbl>
##  1 natural     46     1
##  2 united      45     1
##  3 support     43     2
##  4 top         40     2
##  5 care        34     2
##  6 pay         29    -1
##  7 hard        27    -1
##  8 crisis      26    -3
##  9 clean       23     2
## 10 benefits    22     2
## # … with 933 more rows

Insights

Based on the Bing sentiment analysis, the Philadelphia Inquirer's climate change coverage over the 14 years examined has been mostly negative, with 878 of 1,457 scored words categorized as negative. These 878 words represent approximately 60.26% of the words rated positive or negative. None of the most commonly used words stand out as especially positive or negative; all are fairly neutral words such as "climate," "time," and "city." The NRC analysis shows a very high count for trust, which, as mentioned previously, could result from a lack of contextual interpretation of words like "united" and "trump," both of which the sentiment analysis scores positively even though they can be used in very different contexts. Fear, anticipation, sadness, and anger also scored highly in the NRC analysis, which is unsurprising given the public concern and uncertainty over the implications of climate change. In line with the other analyses, the distribution of AFINN scores for the Inquirer articles is bimodal with a decided negative skew. However, many of the most frequent scored words were positive, including "natural," "support," and "clean," while the top negative words included "crisis," "pay," and "hard." As with the other papers, some positively scored words, such as "united," may be taken out of context, but the same may be true of certain negative words. Overall, the sentiment around climate change in the Philadelphia Inquirer appears similar to that in papers from other East Coast cities, such as the New York Times, Washington Post, and USA Today.

Recommendations

1) Air Pollution in the Northeast

Because the New York Times articles have the highest percentage of negatively categorized words, at 65%, we recommend focusing a significant amount of our effort on the Northeast. Furthermore, "crisis" was among the 10 most common AFINN-scored words in the New York Times, with a score of -3, whereas in other cities we saw words such as "growing" and "increase." This suggests that while climate change is on the radar in those cities, it may not be the primary issue that it is in New York. One area to focus on in the Northeast is policy to reduce emissions from cities like New York, Boston, and Philadelphia, all of which produce a large amount of air pollution. This is clearly a central issue for the New York Times in particular, since "carbon" and "emissions" appeared 404 and 315 times respectively across its 100 articles. Changing laws to reduce air pollution should be one of the primary goals of our efforts.
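Ranking all eight papers by Bing negative share gives a quick check on which paper is most negative. The Python sketch below uses the Bing counts reported in each section above. The New York Times does come first, but the top four papers fall within about one percentage point of one another, so its lead is narrow.

```python
# Bing (negative, positive) counts for all eight papers, from the tables above.
bing = {
    "New York Times": (1105, 599),
    "Washington Post": (780, 433),
    "Chicago Daily Herald": (564, 345),
    "San Diego Union-Tribune": (651, 398),
    "Daily Oklahoman": (380, 278),
    "Tampa Times": (755, 420),
    "USA Today": (820, 459),
    "Philadelphia Inquirer": (878, 579),
}

# Sort papers from most to least negative.
ranking = sorted(bing.items(),
                 key=lambda item: item[1][0] / sum(item[1]),
                 reverse=True)
for paper, (neg, pos) in ranking:
    print(f"{paper:<24} {neg / (neg + pos):.1%}")
```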

2) Fires in California

Since "fire" was the second most common AFINN-scored word in the New York Times (and the most common in USA Today), it would be worthwhile to support not just the Northeast but also the regions of California hit hardest by the fires. The New York Times appears to care not only about its own region but about climate change across the United States, and since California's fires seem to be one of its top concerns, this is an issue where the paper would likely support our efforts. Although "fire" appeared less often in the San Diego Union-Tribune, it is likely a major issue there too, and cities we did not examine, such as Los Angeles and San Francisco, would likely see it as their top priority.

3) Water Pollution near Bodies of Water

After reducing air pollution in the Northeast and forest fires in California, our third recommendation and priority would be addressing water pollution in places like Tampa Bay and Chicago. Based on our Bing analysis, Chicago and Tampa Bay had negativity percentages of 62% and 64% respectively, indicating that climate change is an important issue for them. Although it may not be seen as the crisis it is in New York, it is a growing issue in both cities, as "increase" is among the most common AFINN-scored words in both papers. Furthermore, "lake" appears 117 times in the 100 Chicago articles, while "sea" appears 141 times in the Tampa newspaper. Given how common these words are and how clearly climate change is a growing issue in both cities, it would be worthwhile to reduce pollution in the nearby bodies of water, for example through ad campaigns or by encouraging new community organizations focused on trash pickup in these waterways.

4) Lack of Support in the Southwest

Oklahoma City, and likely nearby places, appear to care less about these environmental issues than the other cities we examined. This hypothesis rests on two findings: in our Bing analysis only 58% of the categorized words were negative, noticeably less than for the other newspapers, and "hoax" was the sixth most common word in our AFINN analysis, used 22 times across the 100 articles. While investing in anything specific would probably not get much support in this region, it may be worthwhile to raise awareness of environmental issues as a whole, for example with TV ads showing the fires in California or how polluted specific waterways are, to build support for environmental issues in the region.