This truth discovery analysis focuses on the role of the media in covering the current administration in today’s society. In particular, it focuses on identifying trustworthy information by detecting unreliable sources. This scenario analyzes the amount of coverage of the current administration, and whether that coverage is positive or negative, from sources such as the big media corporations and the Big Tech content producers (Google, YouTube). Negative coverage is determined by words with a negative connotation or context; positive coverage is determined by words with a positive connotation, or by anything other than a negative connotation. The media outlets are gathered by running the same search words on both Google and YouTube and taking the first 10 search results from each. The sample size is based on the recommendation of 5 to 25 samples in the research paper “Sample Size Planning for Classification Models,” https://arxiv.org/pdf/1211.1323.pdf.
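The labeling rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the procedure used to produce the results in this report: the word list `negative_words` and the helper `label_coverage()` are hypothetical, and the 0 = negative, 1 = not negative coding is assumed here.

```r
# Hypothetical word list; a real analysis would need a vetted sentiment lexicon.
negative_words <- c("botched", "downplaying", "blame", "failed")

# Return 0 when the text contains at least one negative word, otherwise 1.
label_coverage <- function(text, lexicon = negative_words) {
  cleaned <- tolower(gsub("[[:punct:]]", "", text))
  tokens <- strsplit(cleaned, "\\s+")[[1]]
  if (any(tokens %in% lexicon)) 0L else 1L
}

# Made-up example sentences.
label_coverage("The administration botched its response to the outbreak.")
label_coverage("WHO defends coronavirus response.")
```

Under this rule, anything that is not flagged as negative falls into the second category, which matches the "anything other than negative" definition given above.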
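library(tm)      # Corpus(), VectorSource(), tm_map()
library(corpus)  # term_stats()
# source0 is assumed to hold the text of the first article
source1Corpus <- Corpus(VectorSource(source0))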
docs <- tm_map(source1Corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(source1Corpus, removePunctuation): transformation
## drops documents
docs <- tm_map(docs, removeNumbers)
## Warning in tm_map.SimpleCorpus(docs, removeNumbers): transformation drops
## documents
docs <- tm_map(docs, tolower)
## Warning in tm_map.SimpleCorpus(docs, tolower): transformation drops documents
docs <- tm_map(docs, stripWhitespace)
## Warning in tm_map.SimpleCorpus(docs, stripWhitespace): transformation drops
## documents
# List the 10-grams whose first word is "warning", with one column per word
tdm <- term_stats(docs, ngrams = 10, types = TRUE, subset = type1 == "warning")
tdm
##   term                                                              type1
## 1 warning signs with some of trumps public comments interspersed in warning
## 2 warning that apparently wasnt heeded in the early days of         warning
##   type2 type3      type4 type5  type6  type7  type8    type9        type10 count
## 1 signs with       some  of     trumps public comments interspersed in         1
## 2 that  apparently wasnt heeded in     the    early    days         of         1
##   support
## 1       1
## 2       1
The first link from the Google search “trump coronavirus response” is a Vox article titled “The Trump administration’s botched coronavirus response, explained.” The document search shows various negative words and connotations in the results. The prediction was negative and, as displayed above, the actual coverage shows a negative bias.
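For readers without the corpus package, the kind of search shown above (every n-gram that starts with a chosen word) can be approximated in base R. This is only a sketch: `ngrams_starting_with()` is a hypothetical helper, the sample sentence is made up for illustration, and the cleaning steps mirror the `tm_map` calls only loosely.

```r
# Return every n-gram in `text` whose first word is `keyword`.
ngrams_starting_with <- function(text, keyword, n = 5) {
  # Drop punctuation and digits, lowercase, and split on whitespace.
  cleaned <- tolower(gsub("[[:punct:][:digit:]]", "", text))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  # Keep start positions where the keyword occurs and a full n-gram fits.
  starts <- which(tokens == keyword)
  starts <- starts[starts + n - 1 <= length(tokens)]
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Made-up sentence, not taken from any of the articles.
sample_text <- "Experts issued a warning that was not heeded in the early days."
ngrams_starting_with(sample_text, "warning", n = 5)
```

Reading the words that follow the keyword is what lets a reviewer judge whether the surrounding context is negative.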
##   term                        type1   type2   type3    type4 type5 count support
## 1 botched its response to the botched its     response to    the       1       1
## 2 botched testing process â € botched testing process  â     €         1       1
The second link from the Google search “trump coronavirus response” is a Washington Post article titled “The Trump administration’s botched coronavirus response, explained.” The document search shows various negative words and connotations in the results. The prediction was negative and the actual coverage is negative.
##   term                                      type1       type2 type3
## 1 downplaying the escalating outbreak trump downplaying the   escalating
##   type4    type5 count support
## 1 outbreak trump     1       1
The third link from the Google search is a Politico article titled “Trump reworks to write narrative on Coronavirus response.” There are a couple of negative connotations here, both in the title and in the results of the document search. The prediction was negative and the actual coverage is negative.
##   term                                           type1    type2 type3
## 1 response at headquarters regional headquarters response at    headquarters
## 2 response to the coronavirus pandemic           response to    the
## 3 response to the outbreak has                   response to    the
##   type4       type5        count support
## 1 regional    headquarters     1       1
## 2 coronavirus pandemic         1       1
## 3 outbreak    has              1       1
The fourth result is a CNN article titled “WHO defends coronavirus response after Trump criticism.” The search shows no negative connotation or negative wording; the article concerns only the World Health Organization’s response. The prediction was for a negative bias; however, the actual result is not negative.
##   term                                                type1     type2 type3
## 1 continued to try to shift blame for his response to continued to    try
##   type4 type5 type6 type7 type8 type9    type10 count support
## 1 to    shift blame for   his   response to         1       1
The fifth article is from the news outlet ABC and concerns the government’s response to the coronavirus outbreak. The word search shows some negative connotation regarding how the President is trying to shift blame. The prediction was for a negative bias and the actual result is negative.
##   term                                                    type1 type2   type3
## 1 cast himself as the wise leader who rejected the advice cast  himself as
##   type4 type5 type6  type7 type8    type9 type10 count support
## 1 the   wise  leader who   rejected the   advice     1       1
In this CNN article, titled “Fact-checking Trump’s attempt to erase his previous coronavirus response,” the negative connotation is immediately visible in the title and in some of the content. The prediction was for a negative bias and the actual result shows a negative bias.
##   term                                                 type1  type2 type3 type4
## 1 burden is on the governor and her team to distribute burden is    on    the
##   type5    type6 type7 type8 type9 type10     count support
## 1 governor and   her   team  to    distribute     1       1
This Fox News article shows no negative connotations in the search. The prediction was for a positive article, and the result is a false positive: the article is not positive, but it is not negative either.
##   term                                                               type1
## 1 concerned about infection has risen steadily since the virus began concerned
## 2 concerned about the economy grew dramatically in the second half   concerned
## 3 concerned americans are and what they think about the governmentâ  concerned
## 4 concerned americans say they are about the coronavirusâ € ™        concerned
## 5 concerned americans say they are that they someone in their        concerned
## 6 concerned that you or someone youâ € ™ re close                    concerned
##   type2     type3     type4   type5   type6        type7 type8        type9
## 1 about     infection has     risen   steadily     since the          virus
## 2 about     the       economy grew    dramatically in    the          second
## 3 americans are       and     what    they         think about        the
## 4 americans say       they    are     about        the   coronavirusâ €
## 5 americans say       they    are     that         they  someone      in
## 6 that      you       or      someone youâ         €     ™            re
##   type10      count support
## 1 began           1       1
## 2 half            1       1
## 3 governmentâ     1       1
## 4 ™               1       1
## 5 their           1       1
## 6 close           1       1
This article is from the site FiveThirtyEight and is titled “How Americans View The Coronavirus Crisis And Trump’s Response.” The search of the article gives no indication of a negative or dishonest message. The prediction was for a negative article; however, the actual result is a false positive: the article is not positive, but it is not negative either.
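The stray â, € and ™ tokens in the table above are what a UTF-8 right single quotation mark (’) looks like when its three bytes are decoded as Windows-1252 and the result is then split into tokens. One way to repair such text before building the corpus is sketched below. It assumes the whole string was mis-decoded in exactly that way, and `fix_mojibake()` is a hypothetical helper, not part of tm or corpus.

```r
# Undo a UTF-8 -> Windows-1252 mis-decoding (assumed to be the cause here).
fix_mojibake <- function(x) {
  # Map the characters back to their original bytes, then declare them UTF-8.
  bytes <- iconv(x, from = "UTF-8", to = "WINDOWS-1252")
  Encoding(bytes) <- "UTF-8"
  bytes
}

garbled <- "you\u00e2\u20ac\u2122re close"  # displays as "youâ€™re close"
fix_mojibake(garbled)                        # "you’re close" if the assumption holds
```

Applying a repair like this before `removePunctuation` would let the apostrophe be stripped with the other punctuation instead of surviving as three separate tokens.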
##   term                                                            type1    type2
## 1 messages from the mercurial president have left state and local messages from
##   type3 type4     type5     type6 type7 type8 type9 type10 count support
## 1 the   mercurial president have  left  state and   local      1       1
The word search for this article shows a negative message. The prediction was for a negative article and the actual coverage is negative.
##   term                                                        type1   type2
## 1 calling on congress to increase funding for this program by calling on
##   type3    type4 type5    type6   type7 type8 type9   type10 count support
## 1 congress to    increase funding for   this  program by         1       1
This article is from the official White House website. The document word search shows no overt bias. The prediction was negative; however, the actual result is a false positive.
library(readr)
##
## Attaching package: 'readr'
## The following object is masked from 'package:tau':
##
## tokenize
library(caTools)
library(caret)
## Warning: package 'caret' was built under R version 3.6.3
## Loading required package: lattice
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
##
## annotate
print("Google Search")
## [1] "Google Search"
# Labels for the 10 Google results: 0 = negative coverage, 1 = not negative
predicted <- c(0,0,0,0,0,0,0,0,1,0) # first 10 elements
actual <- c(0,0,0,1,0,0,1,0,1,0) # first 10 elements
u <- union(predicted, actual)
t <- table(factor(predicted, u), factor(actual, u))
print(confusionMatrix(t))
## Confusion Matrix and Statistics
##
##
##     0 1
##   0 7 2
##   1 0 1
##
## Accuracy : 0.8
## 95% CI : (0.4439, 0.9748)
## No Information Rate : 0.7
## P-Value [Acc > NIR] : 0.3828
##
## Kappa : 0.4118
##
## Mcnemar's Test P-Value : 0.4795
##
## Sensitivity : 1.0000
## Specificity : 0.3333
## Pos Pred Value : 0.7778
## Neg Pred Value : 1.0000
## Prevalence : 0.7000
## Detection Rate : 0.7000
## Detection Prevalence : 0.9000
## Balanced Accuracy : 0.6667
##
## 'Positive' Class : 0
##
total = 10
googlescore = 8 / total
print("Google Score")
## [1] "Google Score"
print(googlescore)
## [1] 0.8
print("Coverage either positive or negative")
## [1] "Coverage either positive or negative"
faircoverage = .5
print(faircoverage)
## [1] 0.5
The confusion matrix above covers the Google search “Trump Coronavirus response.” The accuracy is 80%: 8 of the 10 predictions about whether a link would cover the administration negatively matched the actual coverage, which is also what the Google score of 0.8 measures. Of the first 10 links provided by Google, the matrix counts 7 as portraying the current administration in a negative light (a prevalence of 70%) and 3 as showing no negative connotation. None of the news organizations returned by Google’s search engine gave any indication of positive coverage. Seven of the results were major media or newspaper outlets. One of these is considered conservative, while the others are mainly viewed as left-leaning. Six of the seven major media outlets were correctly predicted as negative reporting, while one was a false positive. The positive predictive value (Pos Pred Value) shows that, of the 9 links predicted to give an unfavorable view of the current administration, about 78% (7 of 9) actually did. The negative predictive value (Neg Pred Value) of 1.0 shows that the single link predicted to be not negative was indeed not negative. The specificity of 33% shows that only 1 of the 3 links that were actually neutral or positive was predicted that way; the other 2 were labeled negative in advance but turned out to be neutral or positive. The balanced accuracy, the average of sensitivity and specificity, is about 67%. Compared with a benchmark of 50% positive and 50% negative coverage of the administration, the 70% share of negative links means Google returned negative articles 20 percentage points more often than an even split would. Lastly, we can conclude from these results that Google’s algorithm tends to favor articles with more leftward or liberal leanings. Below is the link for the Google search results.
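The statistics reported for the Google confusion matrix follow directly from its four cells. The sketch below recomputes them in base R, treating class 0 as the positive class as the caret output does; class 0 appears to stand for negative coverage in this report, and the variable names are chosen here for illustration.

```r
# Cells of the Google confusion matrix (rows = predicted, columns = actual).
tp <- 7  # predicted 0, actual 0
fp <- 2  # predicted 0, actual 1
fn <- 0  # predicted 1, actual 0
tn <- 1  # predicted 1, actual 1
n  <- tp + fp + fn + tn

accuracy    <- (tp + tn) / n                     # share of correct predictions
sensitivity <- tp / (tp + fn)                    # actual 0s predicted as 0
specificity <- tn / (tn + fp)                    # actual 1s predicted as 1
ppv         <- tp / (tp + fp)                    # predicted 0s that are 0
npv         <- tn / (tn + fn)                    # predicted 1s that are 1
prevalence  <- (tp + fn) / n                     # share of actual 0s
balanced    <- (sensitivity + specificity) / 2   # balanced accuracy

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        prevalence = prevalence, balanced = balanced), 4)
```

The rounded values should agree with the Accuracy, Sensitivity, Specificity, Pos Pred Value, Neg Pred Value, Prevalence and Balanced Accuracy lines printed by `confusionMatrix()`.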
library(readr)
library(caTools)
library(caret)
print("YouTube")
## [1] "YouTube"
# Sample Data
predicted <- c(0,0,0,0,0,0,0,0,0,0) # elements 1 - 10
actual <- c(1,1,1,0,0,0,0,1,0,0) # elements 1 - 10
u <- union(predicted, actual)
t <- table(factor(predicted, u), factor(actual, u))
print(confusionMatrix(t))
## Confusion Matrix and Statistics
##
##
##     0 1
##   0 6 4
##   1 0 0
##
## Accuracy : 0.6
## 95% CI : (0.2624, 0.8784)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.6331
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 0.1336
##
## Sensitivity : 1.0
## Specificity : 0.0
## Pos Pred Value : 0.6
## Neg Pred Value : NaN
## Prevalence : 0.6
## Detection Rate : 0.6
## Detection Prevalence : 1.0
## Balanced Accuracy : 0.5
##
## 'Positive' Class : 0
##
total = 10
YouTubeScore = 6 / total
print("YouTube Score")
## [1] "YouTube Score"
print(YouTubeScore)
## [1] 0.6
print("Coverage either positive or negative")
## [1] "Coverage either positive or negative"
faircoverage = .5
print(faircoverage)
## [1] 0.5
The confusion matrix above covers the YouTube search “Trump Coronavirus response.” The accuracy is 60%: 6 of the 10 predictions about whether a video would cover the administration negatively matched the actual coverage. Of the first 10 links provided by YouTube, 6 portrayed the current administration negatively and 4 had no positive or negative connotation. None of the search results from YouTube’s search engine gave any indication of positive coverage. All 10 results, however, were from major media outlets. The positive predictive value (Pos Pred Value) shows that, of the 10 links predicted to give an unfavorable view of the current administration, 60% actually did. The negative predictive value (Neg Pred Value) is NaN because none of the first 10 search results was predicted to be anything other than negative, so the value is undefined. The prevalence of 60% is the share of links that were actually negative; the remaining 4 out of 10 were false positives, predicted negative but actually neutral, which is also why the specificity is 0. The balanced accuracy is 50%. Compared with a benchmark of 50% positive and 50% negative coverage of the administration, the 60% share of negative links means YouTube returned negative videos 10 percentage points more often than an even split would. Lastly, looking at YouTube itself as a search engine, all 10 results come from left-leaning outlets and none from outlets deemed conservative: 3 are MSNBC, 2 are NBC, 2 are CBS, 1 is CNN, 1 is Global News, and 1 is The Late Show with Stephen Colbert. What we can speculate from this is that YouTube’s algorithm tends to favor links with more leftward or liberal leanings. Below is the link to the first ten YouTube search results.
https://www.youtube.com/results?search_query=Trump+coronavirus+response
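Two values in the YouTube output, the NaN for Neg Pred Value and the Kappa of 0, are consequences of every prediction being 0. A short base R sketch (variable names are illustrative) shows where both come from.

```r
predicted <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0)

# Neg Pred Value = correct "1" predictions / all "1" predictions.
# No result was predicted as 1, so this is 0 / 0, which R reports as NaN.
npv <- sum(predicted == 1 & actual == 1) / sum(predicted == 1)

# Cohen's kappa compares observed agreement with agreement expected by chance.
observed <- mean(predicted == actual)
expected <- mean(predicted == 0) * mean(actual == 0) +
            mean(predicted == 1) * mean(actual == 1)
kappa <- (observed - expected) / (1 - expected)

c(npv = npv, kappa = kappa)
```

Because a constant prediction agrees with the actual labels exactly as often as chance would, kappa is 0 even though the raw accuracy is 60%.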
library(readr)
library(caTools)
library(caret)
print("Both Google and YouTube")
## [1] "Both Google and YouTube"
# Sample Data
predicted <- c(0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0) # elements 1 - 20
actual <- c(0,0,0,1,0,0,1,0,1,0,1,1,1,0,0,0,0,1,0,0) # elements 1 - 20
u <- union(predicted, actual)
t <- table(factor(predicted, u), factor(actual, u))
print(confusionMatrix(t))
## Confusion Matrix and Statistics
##
##
##      0  1
##   0 13  6
##   1  0  1
##
## Accuracy : 0.7
## 95% CI : (0.4572, 0.8811)
## No Information Rate : 0.65
## P-Value [Acc > NIR] : 0.41663
##
## Kappa : 0.1781
##
## Mcnemar's Test P-Value : 0.04123
##
## Sensitivity : 1.0000
## Specificity : 0.1429
## Pos Pred Value : 0.6842
## Neg Pred Value : 1.0000
## Prevalence : 0.6500
## Detection Rate : 0.6500
## Detection Prevalence : 0.9500
## Balanced Accuracy : 0.5714
##
## 'Positive' Class : 0
##
total = 20
Both = 14 / total
print("Both Search Engines Together")
## [1] "Both Search Engines Together"
print(Both)
## [1] 0.7
print("Coverage either positive or negative")
## [1] "Coverage either positive or negative"
faircoverage = .5
print(faircoverage)
## [1] 0.5
Above is the confusion matrix for the combined results of the Google and YouTube searches. It shows that the model has an accuracy of 70%: 14 of the 20 predictions matched the actual coverage. The positive predictive value (Pos Pred Value) shows that, of the 19 links predicted to give an unfavorable view of the current administration, about 68% (13 of 19) actually did. The negative predictive value (Neg Pred Value) of 1.0 shows that the single link predicted to be not negative, out of the first 20 search results, was indeed not negative. The prevalence of 65% is the share of links that were actually negative (13 of 20); the 6 misclassified links were all false positives, predicted negative but actually neutral or positive, which gives a specificity of about 14%. The balanced accuracy for both search engines together is about 57%. Compared with a benchmark of 50% positive and 50% negative coverage of the administration, the 65% share of negative links means the two companies together returned negative articles 15 percentage points more often than an even split would.
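With only 20 results, the accuracy estimate carries wide uncertainty. caret derives the 95% CI and P-Value [Acc > NIR] lines from an exact binomial test, so both can be reproduced in base R as a check. The counts below are read off the combined confusion matrix; the variable names are illustrative.

```r
correct <- 13 + 1   # diagonal of the combined confusion matrix
total   <- 20
nir     <- 13 / 20  # no-information rate: share of the most common actual class

# Exact (Clopper-Pearson) 95% confidence interval for the accuracy.
ci <- binom.test(correct, total)$conf.int

# One-sided test of whether the accuracy exceeds the no-information rate.
p_value <- binom.test(correct, total, p = nir, alternative = "greater")$p.value

round(c(lower = ci[1], upper = ci[2], p_value = p_value), 4)
```

The interval should match the (0.4572, 0.8811) printed above, which is wide enough that the accuracy cannot be distinguished from the no-information rate of 0.65.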
This shows that both sets of search results tend to favor articles and videos with a more leftward view, and therefore display a tendency toward content that is more critical of the current administration. What we can derive from these results is that, if these companies are considered places of truth and sources of reliability, we have to look at the nature of truth, though, and consider that truth is relative and that how one views the world will influence one’s actions. If we look beyond these companies we see only individuals, all of whom have their own ideologies and beliefs, which can influence how they write code, build programs, and apply mathematical models. While it may seem that mathematics is objective, the way in which it is implemented, or the way code is written, will always reflect the individual or individuals who created it.