Here I will explore how different stations report on a topic with different words, and how sentiment changes with time.
The TV News dataset about climate change contains almost 600 closed captioning snippets and four columns:
- station, the TV news station where the text is from,
- show, the show on that station where the text was spoken,
- show_date, the broadcast date of the spoken text, and
- text, the actual text spoken on TV.
After the text is tokenized into one word per row, the data has 41,076 rows, with a word column in place of text:
## Classes 'tbl_df', 'tbl' and 'data.frame': 41076 obs. of 4 variables:
## $ station : chr "MSNBC" "MSNBC" "MSNBC" "MSNBC" ...
## $ show : chr "Morning Meeting" "Morning Meeting" "Morning Meeting" "Morning Meeting" ...
## $ show_date: POSIXct, format: "2009-09-22 13:00:00" "2009-09-22 13:00:00" ...
## $ word : chr "the" "interior" "positively" "oozes" ...
## # A tibble: 6 x 4
## station show show_date word
## <chr> <chr> <dttm> <chr>
## 1 MSNBC Morning Meeting 2009-09-22 13:00:00 the
## 2 MSNBC Morning Meeting 2009-09-22 13:00:00 interior
## 3 MSNBC Morning Meeting 2009-09-22 13:00:00 positively
## 4 MSNBC Morning Meeting 2009-09-22 13:00:00 oozes
## 5 MSNBC Morning Meeting 2009-09-22 13:00:00 class
## 6 MSNBC Morning Meeting 2009-09-22 13:00:00 raves
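The word column in the output above comes from splitting each caption into one word per row. A minimal sketch of that step, assuming the raw data frame is called `climate_text` (a hypothetical name) and using a two-row stand-in whose second row is invented for illustration:

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the raw captions; the second row is made up
climate_text <- tibble(
  station   = c("MSNBC", "CNN"),
  show      = c("Morning Meeting", "Example Show"),
  show_date = as.POSIXct(c("2009-09-22 13:00:00", "2016-01-05 11:00:00"), tz = "UTC"),
  text      = c("The interior positively oozes class", "Climate change is real.")
)

# unnest_tokens() lowercases the text, strips punctuation, and
# returns one row per word, dropping the original text column
tidy_tv <- climate_text %>%
  unnest_tokens(word, text)

tidy_tv
```

Each word keeps its station, show, and broadcast date, which is what makes the grouped counts later on possible.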
First, find out which words are most common when climate change is discussed on TV news, and how many words in total come from each station.
## # A tibble: 3,699 x 2
## word n
## <chr> <int>
## 1 climate 1627
## 2 change 1615
## 3 people 139
## 4 real 125
## 5 president 112
## 6 global 107
## 7 issue 87
## 8 trump 86
## 9 warming 85
## 10 issues 69
## # ... with 3,689 more rows
Aside from “climate” and “change” themselves, the most common words include “people”, “real”, “president”, and “global”.
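A word count like the one above can be sketched as follows. The table contains no words such as “the”, so this sketch assumes stop words were removed before counting; the toy `tidy_tv` data is invented for illustration.

```r
library(dplyr)
library(tidytext)

# Toy one-word-per-row data (invented); the real data has 41,076 rows
tidy_tv <- tibble(
  word = c("the", "climate", "is", "change", "climate",
           "of", "warming", "climate", "change")
)

# Drop common stop words (tidytext's built-in stop_words table),
# then count the remaining words, most frequent first
word_counts <- tidy_tv %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

word_counts
```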
## # A tibble: 3 x 2
## station station_total
## <chr> <int>
## 1 MSNBC 19487
## 2 FOX News 10876
## 3 CNN 10713
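The station totals above add up to 41,076, the full number of rows, so they appear to be computed before any stop words are removed. A minimal sketch under that assumption, with invented station labels and words:

```r
library(dplyr)

# Toy one-word-per-row data (invented station labels and words)
tidy_tv <- tibble(
  station = c("Station A", "Station A", "Station A",
              "Station B", "Station B", "Station C"),
  word    = c("the", "climate", "change", "climate", "is", "warming")
)

# One row per station with its total word count, largest first
station_totals <- tidy_tv %>%
  count(station, name = "station_total", sort = TRUE)

station_totals
```

These totals serve as the denominator when comparing sentiment across stations, since stations with more airtime on the topic will naturally have more sentiment words.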
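Which station uses the largest share of negative words, and which the largest share of positive words? The tables below give each sentiment's word count as a proportion of the station's total words.
## # A tibble: 3 x 5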
## station sentiment station_total n percent
## <chr> <chr> <int> <int> <dbl>
## 1 MSNBC negative 19487 526 0.0270
## 2 CNN negative 10713 331 0.0309
## 3 FOX News negative 10876 403 0.0371
## # A tibble: 3 x 5
## station sentiment station_total n percent
## <chr> <chr> <int> <int> <dbl>
## 1 FOX News positive 10876 514 0.0473
## 2 CNN positive 10713 522 0.0487
## 3 MSNBC positive 19487 953 0.0489
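MSNBC used the lowest proportion of negative words (2.7%) and the highest proportion of positive words (4.9%). The reverse is true of FOX News (3.7% negative, 4.7% positive), and CNN falls between the two on both measures. The differences in positive words are small.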
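The percent column in the tables above is each sentiment's word count divided by the station's total word count. A minimal sketch of that calculation, assuming a tiny hand-made lexicon (`toy_lexicon`, hypothetical) in place of a full one such as those returned by `tidytext::get_sentiments()`:

```r
library(dplyr)

# Hypothetical mini-lexicon; a real analysis would use a full lexicon
toy_lexicon <- tibble(
  word      = c("threat", "hoax", "crisis", "hope", "clean"),
  sentiment = c("negative", "negative", "negative", "positive", "positive")
)

# Toy one-word-per-row data (invented): 5 words for A, 10 for B
tidy_tv <- tibble(
  station = c(rep("Station A", 5), rep("Station B", 10)),
  word    = c("climate", "threat", "hoax", "hope", "change",
              "climate", "change", "hope", "clean", "crisis",
              "warming", "global", "climate", "change", "energy")
)

sentiment_share <- tidy_tv %>%
  group_by(station) %>%
  mutate(station_total = n()) %>%          # total words per station
  ungroup() %>%
  inner_join(toy_lexicon, by = "word") %>% # keep only lexicon words
  count(station, sentiment, station_total) %>%
  mutate(percent = n / station_total)

# Stations ordered by their share of negative words
sentiment_share %>%
  filter(sentiment == "negative") %>%
  arrange(percent)
```

Computing `station_total` before the join matters: after the join only lexicon words remain, so counting at that point would give the wrong denominator.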
It’s important to understand which words specifically are driving sentiment scores.
Notice that proper names like Gore and Trump appear among these words, although they should be treated as neutral, and that “change” was a strong driver of fear sentiment, even though it is by definition part of these texts on climate change.
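One way to see which words drive each sentiment is to count word and sentiment pairs and keep the top few per sentiment. The sketch below uses an invented mini-lexicon and invented counts; the sentiment labels attached to each toy word are illustrative and not taken from any real lexicon.

```r
library(dplyr)

# Hypothetical mini-lexicon; labels are illustrative only
toy_lexicon <- tibble(
  word      = c("change", "threat", "gore", "crisis", "hope", "clean"),
  sentiment = c("fear", "fear", "negative", "negative", "positive", "positive")
)

# Toy one-word-per-row data (invented counts)
tidy_tv <- tibble(
  word = c(rep("change", 4), rep("threat", 2), rep("gore", 3),
           "crisis", rep("hope", 2), "clean", "climate")
)

# Count how often each lexicon word occurs under each sentiment
word_contributions <- tidy_tv %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(sentiment, word, sort = TRUE)

# Keep the single biggest contributor for each sentiment
top_words <- word_contributions %>%
  group_by(sentiment) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

top_words
```

If a topic word or a proper name dominates a sentiment this way, one option is to filter it out before scoring, for example with `filter(!word %in% c("change", "gore", "trump"))`.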
Now it’s time to explore the different words that each station used in the context of discussing climate change. Which negative words did each station use when talking about climate change on the air?
Some words, like “threat”, are used by all three stations, but other word choices are quite different. FOX News talks about terrorism and hurricanes, while CNN discusses hoaxes.
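A per-station comparison of negative words can be sketched as below. The station labels, counts, and the `toy_lexicon` table are all invented for illustration and do not reproduce the results described above.

```r
library(dplyr)

# Hypothetical mini-lexicon
toy_lexicon <- tibble(
  word      = c("threat", "hoax", "terrorism", "hurricane", "hope"),
  sentiment = c("negative", "negative", "negative", "negative", "positive")
)

# Toy one-word-per-row data (invented)
tidy_tv <- tibble(
  station = c(rep("Station A", 3), rep("Station B", 4), rep("Station C", 4)),
  word    = c("threat", "threat", "hoax",
              "threat", "terrorism", "terrorism", "hurricane",
              "threat", "hoax", "hoax", "hope")
)

# Negative word counts for each station
negative_counts <- tidy_tv %>%
  inner_join(toy_lexicon, by = "word") %>%
  filter(sentiment == "negative") %>%
  count(station, word)

# Most frequent negative word per station
top_negative <- negative_counts %>%
  group_by(station) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

# Negative words that every station used at least once
shared_words <- negative_counts %>%
  count(word, name = "n_stations") %>%
  filter(n_stations == n_distinct(negative_counts$station))

top_negative
shared_words
```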
Now it is time to see how sentiment is changing over time.
The proportion of positive words looks flat, and the proportion of negative words may be increasing.
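Tracking sentiment over time follows the same pattern as the station comparison, with a time bin in place of the station. The sketch below assumes monthly bins built with `lubridate::floor_date()`; the bin width actually used for the result above is not stated, and the data and lexicon here are invented.

```r
library(dplyr)
library(lubridate)

# Hypothetical mini-lexicon
toy_lexicon <- tibble(
  word      = c("threat", "crisis", "hope", "clean"),
  sentiment = c("negative", "negative", "positive", "positive")
)

# Toy one-word-per-row data (invented): 6 words in January, 4 in February
tidy_tv <- tibble(
  show_date = as.POSIXct(c(rep("2016-01-05 13:00:00", 4),
                           rep("2016-01-20 13:00:00", 2),
                           rep("2016-02-10 13:00:00", 4)), tz = "UTC"),
  word      = c("threat", "hope", "climate", "change",
                "crisis", "climate",
                "hope", "clean", "threat", "climate")
)

sentiment_by_time <- tidy_tv %>%
  mutate(date = floor_date(show_date, unit = "month")) %>% # bin by month
  group_by(date) %>%
  mutate(total_words = n()) %>%            # all words spoken in that bin
  ungroup() %>%
  inner_join(toy_lexicon, by = "word") %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  count(date, sentiment, total_words) %>%
  mutate(percent = n / total_words)

sentiment_by_time
```

Dividing by the words in each bin, and not plotting raw counts, keeps a busy news month from looking like a shift in sentiment.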
We can also explore how individual words have been used over time.
You can see that words like “hoax” and “denier” have been used only recently, and “warming” is decreasing in monthly uses. You can see when a hurricane was being discussed as well.
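Monthly use of individual words can be sketched by binning dates to the month, keeping only the words of interest, and counting. The data below is invented; the word list is taken from the words mentioned above.

```r
library(dplyr)
library(lubridate)

# Toy one-word-per-row data (invented dates and counts)
tidy_tv <- tibble(
  show_date = as.POSIXct(c(rep("2016-01-05 13:00:00", 3),
                           rep("2016-02-10 13:00:00", 3),
                           rep("2016-02-20 13:00:00", 2)), tz = "UTC"),
  word      = c("warming", "warming", "climate",
                "hoax", "warming", "hurricane",
                "hoax", "denier")
)

# Number of uses of each selected word in each month
words_by_month <- tidy_tv %>%
  mutate(date = floor_date(show_date, unit = "month")) %>%
  filter(word %in% c("hoax", "denier", "warming", "hurricane")) %>%
  count(date, word)

words_by_month
```

A result in this shape, with one row per month and word, is convenient for plotting one panel per word.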