library(ggplot2)
library(dplyr)
library(statsr)
library(lattice)
load("gss.Rdata")
The data for this analysis come from the General Social Survey (GSS), which is conducted biennially by the National Opinion Research Center (NORC). The project has been ongoing since its inception in 1972, with the purpose of “monitoring societal change and studying the growing complexity of the American society” (NORC). Ultimately, the survey aims to gather information on the attitudes, behaviors, and attributes of contemporary American society, and to compare these responses in the U.S. to those of other nation-states. More information can be found on the NORC website: http://www.norc.org/Research/Projects/Pages/general-social-survey.aspx.
Methodology: According to the GSS Codebook (http://gss.norc.org/get-documentation), the sampling method is not a simple random sample of the population. Rather, samples were taken from metropolitan and non-metropolitan areas using a blocking technique, and interviews were conducted by canvassing blocks within randomly selected Block Groups. Although a simple random sample was not used, the study applies weighting techniques to compensate. The codebook further states that “the GSS samples closely resemble distributions reported in the Census and other authoritative sources.” Thus, while not a simple random sample, the sample is sufficiently representative and randomly chosen for the results to be generalized to the U.S. adult population.
Generalizability: Given the methodology, the data are generalizable; however, there are some caveats. Since the data were collected through in-person interviews, responses may not be completely accurate: interviewees may have incentives to answer certain questions in certain ways, depending on how comfortable they feel with the interviewer. In addition, the interview format itself may exclude individuals who do not have the time to participate in a lengthy in-person interview.
Correlation/Causation: Since this data is observational with no random assignment of the subjects to specific groups, the data CANNOT be used to evaluate causation. Rather, it can only be used as observational data to measure relationships and correlations between variables.
Research Question: Is there a relationship between the amount of media consumption (news) and the confidence in the executive branch of the federal government? What about confidence in the press?
“You’re fake news.” (quote by Donald Trump on multiple occasions). The current President of the United States has greatly antagonized the media, even calling it the “greatest enemy” of the U.S. Those who support Trump seem to agree in disbelieving the press. This question is interesting because it asks whether there is any relationship between the amount of media consumption (news) and confidence in the executive branch of the federal government. It would also be interesting to see whether there is any correlation between news consumption and confidence in the press. While the most recent data (from 2016 to 2018) are unavailable, this analysis will use data from 2008 onwards (the election year of the previous administration) to see whether there are any such correlations over the past decade.
dim(gss)
## [1] 57061 114
Initially, there are over 57,000 responses in the GSS, but for the purposes of this analysis we will keep only the data from 2008 onwards for three variables: news (news consumption), confed (confidence in the executive branch of the federal government), and conpress (confidence in the press).
First, I will filter the data to figure out if there’s a correlation between news and confed.
gss_newsfed <- gss %>%
  filter(year >= 2008,
         !is.na(news),
         !is.na(confed)) %>%
  select(news, confed)
dim(gss_newsfed)
## [1] 2047 2
The data set has been reduced significantly, to 2,047 responses.
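As a minimal sketch of what this filtering step does, the toy data frame below (hypothetical values, not GSS records) is passed through the same filter() and select() calls. Rows from before 2008, or with a missing answer to either question, are dropped.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from the GSS: five made-up respondents.
toy <- data.frame(
  year   = c(2006, 2008, 2010, 2012, 2012),
  news   = c("Everyday", NA, "Never", "Everyday", "Once A Week"),
  confed = c("Only Some", "Hardly Any", NA, "A Great Deal", "Only Some"),
  stringsAsFactors = FALSE
)

# Keep 2008 onwards and drop rows where either answer is missing.
toy_kept <- toy %>%
  filter(year >= 2008, !is.na(news), !is.na(confed)) %>%
  select(news, confed)

nrow(toy_kept)  # only the two complete 2012 rows remain
```

The same pattern explains why the GSS data shrink so much: a respondent is kept only if both questions were asked and answered in a survey year from 2008 onwards.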
Next, I will filter the data to see whether there is a correlation between news and confidence in the press.
gss_newspress <- gss %>%
  filter(year >= 2008,
         !is.na(news),
         !is.na(conpress)) %>%
  select(news, conpress)
dim(gss_newspress)
## [1] 2063 2
Again, the data set has been reduced significantly, to 2,063 responses.
Taking this filtered data, we’ll look at the data in table form.
First: for news consumption
table(gss_newsfed$news, useNA = "ifany")
##
##          Everyday  Few Times A Week       Once A Week Less Than Once Wk
##               650               386               293               321
##             Never
##               397
Based on the results, “Everyday” is the most common response (650 of 2,047 respondents, about 32%), although it is not a majority.
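To put the counts on a common scale, the sketch below converts them to percentages with prop.table(). The counts are copied by hand from the table output above; the variable names are only illustrative.

```r
# Counts copied from the table(gss_newsfed$news) output above.
news_counts <- c(
  "Everyday"          = 650,
  "Few Times A Week"  = 386,
  "Once A Week"       = 293,
  "Less Than Once Wk" = 321,
  "Never"             = 397
)

# Share of each response, in percent, rounded to one decimal place.
news_share <- round(100 * prop.table(news_counts), 1)
news_share
```

With the gss_newsfed data frame loaded, the same result should come from prop.table(table(gss_newsfed$news)) without retyping the counts.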
Then we’ll see how confident these respondents feel about the federal government.
table(gss_newsfed$confed, useNA = "ifany")
##
## A Great Deal    Only Some   Hardly Any
##          255          966          826
Just looking at the numbers, we can see that relatively few respondents (255 of 2,047, about 12%) have “a great deal” of confidence in the executive branch of the federal government; most have only some or hardly any.
Next we’ll look at how the respondents’ responses to news consumption and their confidence in the press compare:
table(gss_newspress$news, useNA = "ifany")
##
##          Everyday  Few Times A Week       Once A Week Less Than Once Wk
##               654               390               297               324
##             Never
##               398
Again, “Everyday” is the most common response (654 of 2,063 respondents, about 32%), though not a majority.
table(gss_newspress$conpress, useNA = "ifany")
##
## A Great Deal    Only Some   Hardly Any
##          183          894          986
Yet very few (183 of 2,063, about 9%) have “a great deal” of confidence in the press; the large majority of respondents express only some or hardly any confidence.
+++++++++++++
Now we cross-tabulate and plot confed against news:
table(gss_newsfed$news,gss_newsfed$confed)
##
##                     A Great Deal Only Some Hardly Any
##   Everyday                    92       276        282
##   Few Times A Week            56       194        136
##   Once A Week                 37       151        105
##   Less Than Once Wk           32       170        119
##   Never                       38       175        184
g <- ggplot(data = gss_newsfed, aes(x=news))
g <- g + geom_bar(aes(fill=confed), position = "dodge")
g + theme(axis.text.x = element_text(angle=60,hjust = 1))
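Observations on news and confed: in every news-consumption group, “A Great Deal” is the least common response, and “Only Some” or “Hardly Any” is the most common.
A mosaic plot is another way to compare news and confed: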
plot(table(gss_newsfed$news,gss_newsfed$confed))
The mosaic plot above shows that the proportion of respondents who have “A Great Deal” of confidence in the executive branch of the federal government tends to decrease as news consumption decreases, from about 14% among daily readers to about 10% among those who never read the news.
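The trend read off the mosaic plot can be checked numerically. The sketch below rebuilds the cross-tabulation from the counts printed earlier and computes row percentages, so each news-consumption group sums to 100%. The matrix name is illustrative only.

```r
# Cross-tabulation copied from the table(news, confed) output above.
confed_by_news <- matrix(
  c(92, 276, 282,
    56, 194, 136,
    37, 151, 105,
    32, 170, 119,
    38, 175, 184),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    news   = c("Everyday", "Few Times A Week", "Once A Week",
               "Less Than Once Wk", "Never"),
    confed = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

# margin = 1 divides each cell by its row total.
row_pct <- round(100 * prop.table(confed_by_news, margin = 1), 1)
row_pct
```

The “A Great Deal” column falls from roughly 14% in the two most frequent reading groups to roughly 10% in the two least frequent ones, although the decrease is not strictly monotone from one group to the next.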
+++++++++++++++++
Now let’s look at the plots for news consumption and confidence in the press:
g2 <- ggplot(data = gss_newspress, aes(x=news))
g2 <- g2 + geom_bar(aes(fill=conpress), position = "dodge")
g2 + theme(axis.text.x = element_text(angle=60,hjust = 1))
plot(table(gss_newspress$news,gss_newspress$conpress))
Observations:
1. Similar to the plots for confed and news, the share of respondents with a great deal of confidence in the press seems to decrease progressively as news consumption decreases.
2. In addition, the proportion of respondents with the least confidence (hardly any) seems to increase as news consumption decreases.
For confidence in the exec branch of the federal government and news consumption:
Null hypothesis: News consumption and confidence in the executive branch of the federal government are independent.
Alternative hypothesis: News consumption and confidence in the executive branch of the federal government are associated.
+++++++++++++++
Independence: The GSS data are collected through random sample surveys, so we can assume the observations are independent.
Sample Size: The sample is collected without replacement, and at 2,047 responses it is well under 10% of the entire U.S. adult population.
Degrees of Freedom: There are 3 confidence levels (hardly any, only some, a great deal) and 5 levels of news consumption (every day, a few times a week, once a week, less than once a week, never). Since there are two categorical variables, each with more than 2 levels, the chi-squared test of independence is the appropriate test, with (5 - 1) × (3 - 1) = 8 degrees of freedom.
Expected Counts: To conduct a chi-square test (GOF or independence), the expected counts for each cell should be at least 5.
We can check this below:
chisq.test(gss_newsfed$news,gss_newsfed$confed)$expected
## gss_newsfed$confed
## gss_newsfed$news    A Great Deal Only Some Hardly Any
##   Everyday              80.97215  306.7416   262.2863
##   Few Times A Week      48.08500  182.1573   155.7577
##   Once A Week           36.49976  138.2697   118.2306
##   Less Than Once Wk     39.98779  151.4831   129.5291
##   Never                 49.45530  187.3483   160.1964
Every cell in the table above has an expected count well above 5, so this condition is met.
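For reference, each expected count is the row total times the column total divided by the overall sample size. The sketch below reproduces the table by hand from the observed counts printed earlier. The object names are illustrative.

```r
# Observed counts copied from the table(news, confed) output above.
observed <- matrix(
  c(92, 276, 282,
    56, 194, 136,
    37, 151, 105,
    32, 170, 119,
    38, 175, 184),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    news   = c("Everyday", "Few Times A Week", "Once A Week",
               "Less Than Once Wk", "Never"),
    confed = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

n <- sum(observed)

# Expected count under independence: (row total * column total) / n.
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 2)

# Condition check: every expected count should be at least 5.
all(expected >= 5)
```

For example, the “Everyday” / “A Great Deal” cell is 650 × 255 / 2047, which matches the 80.97 reported by chisq.test().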
We can now conduct the chi-square test:
chisq.test(gss_newsfed$news,gss_newsfed$confed)
##
## Pearson's Chi-squared test
##
## data: gss_newsfed$news and gss_newsfed$confed
## X-squared = 25.022, df = 8, p-value = 0.001541
The results are as follows: the X-squared statistic is 25.022. The corresponding p-value for 8 degrees of freedom is 0.001541, which is much lower than the significance level of 0.05.
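As a cross-check on these numbers, the statistic can be rebuilt by hand from the observed counts printed earlier: sum (observed - expected)² / expected over all cells, then take the upper tail of the chi-squared distribution. This is a sketch of the standard Pearson calculation, with illustrative object names.

```r
# Observed counts copied from the table(news, confed) output above
# (rows: news consumption, columns: confidence level).
observed <- matrix(
  c(92, 276, 282,
    56, 194, 136,
    37, 151, 105,
    32, 170, 119,
    38, 175, 184),
  nrow = 5, byrow = TRUE
)

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic and its degrees of freedom.
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability gives the p-value.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 3)
df
signif(p_value, 4)
```

These values should agree with the chisq.test() output above, since no continuity correction is applied to tables larger than 2 × 2.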
Conclusion for confed - news: Based on the data analysis, there is convincing evidence to reject the null hypothesis in favor of the alternative hypothesis: news consumption and confidence in the executive branch of the federal government are associated (not independent). Because the study is observational, only an association between the two variables can be inferred (NOT causation!).
+++++++++++++++++++++++++++++
For confidence in the press and news consumption:
Null hypothesis: News consumption and confidence in the press are independent.
Alternative hypothesis: News consumption and confidence in the press are associated.
+++++++++++++++
Independence: The GSS data are collected through random sample surveys, so we can assume the observations are independent.
Sample Size: The sample is collected without replacement, and at 2,063 responses it is well under 10% of the entire U.S. adult population.
Degrees of Freedom: There are 3 confidence levels (hardly any, only some, a great deal) and 5 levels of news consumption (every day, a few times a week, once a week, less than once a week, never). Since there are two categorical variables, each with more than 2 levels, the chi-squared test of independence is the appropriate test, with (5 - 1) × (3 - 1) = 8 degrees of freedom.
Expected Counts: To conduct a chi-square test (GOF or independence), the expected counts for each cell should be at least 5.
We can check this below:
chisq.test(gss_newspress$news,gss_newspress$conpress)$expected
## gss_newspress$conpress
## gss_newspress$news  A Great Deal Only Some Hardly Any
##   Everyday              58.01357  283.4106   312.5759
##   Few Times A Week      34.59525  169.0063   186.3984
##   Once A Week           26.34561  128.7048   141.9496
##   Less Than Once Wk     28.74067  140.4052   154.8541
##   Never                 35.30490  172.4731   190.2220
Every cell in the table above has an expected count well above 5, so this condition is met.
We can now conduct the chi-square test:
chisq.test(gss_newspress$news,gss_newspress$conpress)
##
## Pearson's Chi-squared test
##
## data: gss_newspress$news and gss_newspress$conpress
## X-squared = 10.329, df = 8, p-value = 0.2427
The results are as follows: the X-squared statistic is 10.329. The corresponding p-value for 8 degrees of freedom is 0.2427, which is much higher than the significance level of 0.05.
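Another way to read this result is to compare the statistic with the critical value of the chi-squared distribution at the 0.05 level. The sketch below takes the statistic from the chisq.test() output above (it is not recomputed here, since the observed cross-tabulation for conpress is not printed) and uses illustrative variable names.

```r
alpha <- 0.05
df    <- (5 - 1) * (3 - 1)  # 5 news levels, 3 confidence levels

# Statistic copied from the chisq.test() output above.
chi_sq_press <- 10.329

# Smallest statistic that would be significant at the 0.05 level.
critical_value <- qchisq(1 - alpha, df = df)

# p-value recovered from the statistic.
p_press <- pchisq(chi_sq_press, df = df, lower.tail = FALSE)

round(critical_value, 3)
round(p_press, 4)
chi_sq_press > critical_value  # FALSE means we do not reject the null
```

With 8 degrees of freedom the critical value is about 15.5, so a statistic of 10.329 falls short of it, consistent with the p-value above 0.05.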
Conclusion for conpress - news: Based on the data analysis, there is no convincing evidence to reject the null hypothesis in favor of the alternative hypothesis. We fail to reject the null hypothesis: the data do not show an association between news consumption and confidence in the press. This is not proof that the two are independent, only that this sample gives no convincing evidence otherwise.