library(ggplot2)
library(dplyr)
library(statsr)
load("gss.Rdata")
Since 1972, the General Social Survey (GSS) has collected data for monitoring and explaining the growing complexity of US society. The GSS has become the single best source of sociological and attitudinal trend data covering the United States (http://gss.norc.org). The GSS is an observational study that has been conducted since 1972, and respondents are not randomly assigned to different groups, so we cannot draw any cause-and-effect conclusions. However, we can generalize the results to the population of the United States, because the respondents are randomly sampled by household address to represent a cross-section of the country. In short, the results generalize to the US population but do not establish causal relationships.
According to an article by Erik Voeten in the Washington Post (Dec 14, 2016), people aged 50 or older have become more cynical about Congress than younger people. He reports a gap of almost 20 percentage points between the youngest (18 to 49 years old) and the oldest (50 and older) generations, and notes that there were barely any age differences for most of the past four decades. So is it really true that the attitude toward Congress differs by generation? In my experience in my country, Korea, most young people tend to be less interested or involved in politics, and they pay little attention to the legislature, which leaves them with almost no stance on it. In short, the reported difference may be due to chance or to different levels of political interest among generations.
First, let’s select the variables of interest for the research question and create a new variable called age_cat, which turns the numerical age variable into a categorical variable matching the age generations in our research question. To check that the data are selected and modified correctly, we print a summary of the variables. In our data, 2012 is the latest year I can work with. Since the Washington Post article on which my research question is built reflects the political climate after Trump’s election, this analysis might not reach the same result as the article.
gss_r1 <- gss %>%
  select(year, age, conlegis) %>%
  # Split respondents into the two age generations of the research question
  mutate(age_cat = ifelse(age <= 49, "Youngest", "Oldest"),
         # Flag "Hardly Any" answers per generation; the flag is NA when the
         # respondent belongs to the generation but conlegis is missing
         young_cynic = ifelse(age <= 49 & conlegis == "Hardly Any", "Young cynical", "Not"),
         old_cynic = ifelse(age >= 50 & conlegis == "Hardly Any", "Old cynical", "Not"))
head(gss_r1)
##   year age conlegis  age_cat young_cynic old_cynic
## 1 1972  23     <NA> Youngest        <NA>       Not
## 2 1972  70     <NA>   Oldest         Not      <NA>
## 3 1972  48     <NA> Youngest        <NA>       Not
## 4 1972  27     <NA> Youngest        <NA>       Not
## 5 1972  61     <NA>   Oldest         Not      <NA>
## 6 1972  26     <NA> Youngest        <NA>       Not
str(gss_r1)
## 'data.frame': 57061 obs. of 6 variables:
##  $ year       : int 1972 1972 1972 1972 1972 1972 1972 1972 1972 1972 ...
##  $ age        : int 23 70 48 27 61 26 28 27 21 30 ...
##  $ conlegis   : Factor w/ 3 levels "A Great Deal",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ age_cat    : chr "Youngest" "Oldest" "Youngest" "Youngest" ...
##  $ young_cynic: chr NA "Not" NA NA ...
##  $ old_cynic  : chr "Not" NA "Not" "Not" ...
summary(gss_r1)
##       year           age               conlegis        age_cat
##  Min.   :1972   Min.   :18.0   A Great Deal: 4899   Length:57061
##  1st Qu.:1983   1st Qu.:31.0   Only Some   :21756   Class :character
##  Median :1993   Median :43.0   Hardly Any  :10959   Mode  :character
##  Mean   :1992   Mean   :45.7   NA's        :19447
##  3rd Qu.:2002   3rd Qu.:59.0
##  Max.   :2012   Max.   :89.0
##                 NA's   :202
##  young_cynic         old_cynic
##  Length:57061       Length:57061
##  Class :character   Class :character
##  Mode  :character   Mode  :character
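The age split used here (49 or younger versus 50 or older) can also be expressed with `cut()`, which returns a factor and makes the bin boundaries explicit. The following is only a sketch on made-up ages, not on the GSS data; the vector `age` and its values are hypothetical.

```r
# Hypothetical ages, including one missing value
age <- c(18, 49, 50, 89, NA)

# Right-closed bins: (-Inf, 49] -> "Youngest", (49, Inf] -> "Oldest"
age_cat <- cut(age, breaks = c(-Inf, 49, Inf),
               labels = c("Youngest", "Oldest"))

# useNA = "ifany" also counts respondents whose age is missing
print(table(age_cat, useNA = "ifany"))
```

Tabulating the new variable with `useNA = "ifany"` is a quick way to confirm that every non-missing age landed in exactly one group.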
Second, we visualize the trend in the cynical attitude toward Congress by plotting it from 1972 to 2012, which the summary shows to be the first and last years in the data. The plot will show whether the youngest and oldest generations follow visibly different patterns in their attitude toward Congress.
gss_r12 <- gss_r1 %>%
  # Keep "Hardly Any" answers from respondents with a known age group
  filter(!is.na(age_cat), conlegis == "Hardly Any") %>%
  group_by(year, age_cat, conlegis) %>%
  summarise(count = n())
print(gss_r12)
## Source: local data frame [54 x 4]
## Groups: year, age_cat [?]
##
## year age_cat conlegis count
## <int> <chr> <fctr> <int>
## 1 1973 Oldest Hardly Any 82
## 2 1973 Youngest Hardly Any 140
## 3 1974 Oldest Hardly Any 128
## 4 1974 Youngest Hardly Any 180
## 5 1975 Oldest Hardly Any 145
## 6 1975 Youngest Hardly Any 229
## 7 1976 Oldest Hardly Any 167
## 8 1976 Youngest Hardly Any 214
## 9 1977 Oldest Hardly Any 103
## 10 1977 Youngest Hardly Any 155
## # ... with 44 more rows
# Total number of respondents per year (both age groups, all answers including NA)
gss_r12_t <- summarise(group_by(gss_r1, year), total = n())
print(gss_r12_t)
## # A tibble: 29 × 2
## year total
## <int> <int>
## 1 1972 1613
## 2 1973 1504
## 3 1974 1484
## 4 1975 1490
## 5 1976 1499
## 6 1977 1530
## 7 1978 1532
## 8 1980 1468
## 9 1982 1860
## 10 1983 1599
## # ... with 19 more rows
gss_r12 <- merge(gss_r12, gss_r12_t, by = "year")
print(gss_r12)
## year age_cat conlegis count total
## 1 1973 Oldest Hardly Any 82 1504
## 2 1973 Youngest Hardly Any 140 1504
## 3 1974 Oldest Hardly Any 128 1484
## 4 1974 Youngest Hardly Any 180 1484
## 5 1975 Oldest Hardly Any 145 1490
## 6 1975 Youngest Hardly Any 229 1490
## 7 1976 Oldest Hardly Any 167 1499
## 8 1976 Youngest Hardly Any 214 1499
## 9 1977 Oldest Hardly Any 103 1530
## 10 1977 Youngest Hardly Any 155 1530
## 11 1978 Oldest Hardly Any 119 1532
## 12 1978 Youngest Hardly Any 199 1532
## 13 1980 Oldest Hardly Any 190 1468
## 14 1980 Youngest Hardly Any 296 1468
## 15 1982 Oldest Hardly Any 155 1860
## 16 1982 Youngest Hardly Any 279 1860
## 17 1983 Oldest Hardly Any 125 1599
## 18 1983 Youngest Hardly Any 245 1599
## 19 1984 Oldest Hardly Any 79 1473
## 20 1984 Youngest Hardly Any 133 1473
## 21 1986 Oldest Hardly Any 113 1470
## 22 1986 Youngest Hardly Any 185 1470
## 23 1987 Oldest Hardly Any 120 1819
## 24 1987 Youngest Hardly Any 211 1819
## 25 1988 Oldest Hardly Any 87 1481
## 26 1988 Youngest Hardly Any 106 1481
## 27 1989 Oldest Hardly Any 97 1537
## 28 1989 Youngest Hardly Any 127 1537
## 29 1990 Oldest Hardly Any 80 1372
## 30 1990 Youngest Hardly Any 124 1372
## 31 1991 Oldest Hardly Any 104 1517
## 32 1991 Youngest Hardly Any 155 1517
## 33 1993 Oldest Hardly Any 160 1606
## 34 1993 Youngest Hardly Any 267 1606
## 35 1994 Oldest Hardly Any 276 2992
## 36 1994 Youngest Hardly Any 511 2992
## 37 1996 Oldest Hardly Any 282 2904
## 38 1996 Youngest Hardly Any 536 2904
## 39 1998 Oldest Hardly Any 219 2832
## 40 1998 Youngest Hardly Any 352 2832
## 41 2000 Oldest Hardly Any 223 2817
## 42 2000 Youngest Hardly Any 306 2817
## 43 2002 Oldest Hardly Any 94 2765
## 44 2002 Youngest Hardly Any 127 2765
## 45 2004 Oldest Hardly Any 110 2812
## 46 2004 Youngest Hardly Any 140 2812
## 47 2006 Oldest Hardly Any 335 4510
## 48 2006 Youngest Hardly Any 366 4510
## 49 2008 Oldest Hardly Any 284 2023
## 50 2008 Youngest Hardly Any 239 2023
## 51 2010 Oldest Hardly Any 329 2044
## 52 2010 Youngest Hardly Any 257 2044
## 53 2012 Oldest Hardly Any 350 1974
## 54 2012 Youngest Hardly Any 289 1974
gss_r12_y <- gss_r12 %>%
filter(age_cat == "Youngest") %>%
mutate(cynical_perc_y = count/total*100)
print(gss_r12_y)
## year age_cat conlegis count total cynical_perc_y
## 1 1973 Youngest Hardly Any 140 1504 9.308511
## 2 1974 Youngest Hardly Any 180 1484 12.129380
## 3 1975 Youngest Hardly Any 229 1490 15.369128
## 4 1976 Youngest Hardly Any 214 1499 14.276184
## 5 1977 Youngest Hardly Any 155 1530 10.130719
## 6 1978 Youngest Hardly Any 199 1532 12.989556
## 7 1980 Youngest Hardly Any 296 1468 20.163488
## 8 1982 Youngest Hardly Any 279 1860 15.000000
## 9 1983 Youngest Hardly Any 245 1599 15.322076
## 10 1984 Youngest Hardly Any 133 1473 9.029192
## 11 1986 Youngest Hardly Any 185 1470 12.585034
## 12 1987 Youngest Hardly Any 211 1819 11.599780
## 13 1988 Youngest Hardly Any 106 1481 7.157326
## 14 1989 Youngest Hardly Any 127 1537 8.262850
## 15 1990 Youngest Hardly Any 124 1372 9.037901
## 16 1991 Youngest Hardly Any 155 1517 10.217535
## 17 1993 Youngest Hardly Any 267 1606 16.625156
## 18 1994 Youngest Hardly Any 511 2992 17.078877
## 19 1996 Youngest Hardly Any 536 2904 18.457300
## 20 1998 Youngest Hardly Any 352 2832 12.429379
## 21 2000 Youngest Hardly Any 306 2817 10.862620
## 22 2002 Youngest Hardly Any 127 2765 4.593128
## 23 2004 Youngest Hardly Any 140 2812 4.978663
## 24 2006 Youngest Hardly Any 366 4510 8.115299
## 25 2008 Youngest Hardly Any 239 2023 11.814137
## 26 2010 Youngest Hardly Any 257 2044 12.573386
## 27 2012 Youngest Hardly Any 289 1974 14.640324
gss_r12_o <- gss_r12 %>%
filter(age_cat == "Oldest") %>%
mutate(cynical_perc_o = count/total*100)
print(gss_r12_o)
## year age_cat conlegis count total cynical_perc_o
## 1 1973 Oldest Hardly Any 82 1504 5.452128
## 2 1974 Oldest Hardly Any 128 1484 8.625337
## 3 1975 Oldest Hardly Any 145 1490 9.731544
## 4 1976 Oldest Hardly Any 167 1499 11.140761
## 5 1977 Oldest Hardly Any 103 1530 6.732026
## 6 1978 Oldest Hardly Any 119 1532 7.767624
## 7 1980 Oldest Hardly Any 190 1468 12.942779
## 8 1982 Oldest Hardly Any 155 1860 8.333333
## 9 1983 Oldest Hardly Any 125 1599 7.817386
## 10 1984 Oldest Hardly Any 79 1473 5.363204
## 11 1986 Oldest Hardly Any 113 1470 7.687075
## 12 1987 Oldest Hardly Any 120 1819 6.597031
## 13 1988 Oldest Hardly Any 87 1481 5.874409
## 14 1989 Oldest Hardly Any 97 1537 6.310995
## 15 1990 Oldest Hardly Any 80 1372 5.830904
## 16 1991 Oldest Hardly Any 104 1517 6.855636
## 17 1993 Oldest Hardly Any 160 1606 9.962640
## 18 1994 Oldest Hardly Any 276 2992 9.224599
## 19 1996 Oldest Hardly Any 282 2904 9.710744
## 20 1998 Oldest Hardly Any 219 2832 7.733051
## 21 2000 Oldest Hardly Any 223 2817 7.916223
## 22 2002 Oldest Hardly Any 94 2765 3.399638
## 23 2004 Oldest Hardly Any 110 2812 3.911807
## 24 2006 Oldest Hardly Any 335 4510 7.427938
## 25 2008 Oldest Hardly Any 284 2023 14.038557
## 26 2010 Oldest Hardly Any 329 2044 16.095890
## 27 2012 Oldest Hardly Any 350 1974 17.730496
# Draw both series on one shared y-axis so the two lines are directly comparable
plot(gss_r12_y$year, gss_r12_y$cynical_perc_y, type = "l", col = "red", ylim = range(gss_r12_y$cynical_perc_y, gss_r12_o$cynical_perc_o), xlab = "year", ylab = "Hardly Any confidence in Congress [%]")
lines(gss_r12_o$year, gss_r12_o$cynical_perc_o, col = "blue")
legend("topleft", legend = c("Youngest", "Oldest"), col = c("red", "blue"), lty = 1)
In the graph, the red line shows the trend of the cynical attitude toward Congress in the youngest generation and the blue line shows that of the oldest. Both percentages are computed relative to all respondents surveyed in a given year, not relative to the size of each generation. Based on the graph, the oldest generation has become more cynical toward Congress than the youngest in recent years (2008 to 2012). In the earlier years the youngest generation showed the stronger cynical attitude, so the pattern between the generations has reversed.
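Because the percentages plotted above use every respondent in a given year as the denominator, they depend partly on how large each generation is. One alternative is the share of "Hardly Any" answers within each generation. Below is a minimal base R sketch of that calculation on made-up data; the `toy` data frame and its values are hypothetical and are not GSS results.

```r
# Toy data standing in for gss_r1 (hypothetical values, not GSS results)
toy <- data.frame(
  year     = c(2010, 2010, 2010, 2010, 2012, 2012, 2012, 2012),
  age_cat  = c("Youngest", "Youngest", "Oldest", "Oldest",
               "Youngest", "Youngest", "Oldest", "Oldest"),
  conlegis = c("Hardly Any", "Only Some", "Hardly Any", "Hardly Any",
               "Only Some", "Only Some", "Hardly Any", "A Great Deal")
)

# Count answers per year and age group; table() drops rows with NA by default
counts <- table(toy$year, toy$age_cat, toy$conlegis)

# Divide each count by the total of its own year-by-age-group cell
within_group <- prop.table(counts, margin = c(1, 2)) * 100

# Percentage of "Hardly Any" answers within each age group, by year
hardly_any_pct <- within_group[, , "Hardly Any"]
print(hardly_any_pct)
```

With this denominator, respondents who did not answer the confidence question are excluded, and a larger generation no longer produces a larger percentage simply because of its size.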
H0: Generation and attitude toward Congress are independent. The attitude does not vary by generation.
HA: Generation and attitude toward Congress are dependent. The attitude does vary by generation.
Since both variables are categorical and one of them (confidence in Congress) has more than two levels, we use the chi-square test of independence to evaluate the relationship between generation and the attitude toward Congress.
gss_r1 %>%
  # Drop respondents with a missing age group or a missing answer
  filter(!is.na(age_cat), !is.na(conlegis)) %>%
  group_by(age_cat, conlegis) %>%
  summarise(count = n())
## Source: local data frame [6 x 3]
## Groups: age_cat [?]
##
## age_cat conlegis count
## <chr> <fctr> <int>
## 1 Oldest A Great Deal 1861
## 2 Oldest Only Some 7900
## 3 Oldest Hardly Any 4556
## 4 Youngest A Great Deal 3021
## 5 Youngest Only Some 13782
## 6 Youngest Hardly Any 6368
gss_chi <- table(gss_r1$age_cat, gss_r1$conlegis)
print(gss_chi)
##
## A Great Deal Only Some Hardly Any
## Oldest 1861 7900 4556
## Youngest 3021 13782 6368
chisq.test(gss_chi)
##
## Pearson's Chi-squared test
##
## data: gss_chi
## X-squared = 85.497, df = 2, p-value < 2.2e-16
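To see where this statistic comes from, the test can be reproduced by hand from the contingency table printed above. The sketch below rebuilds the observed counts as a matrix (the names `observed`, `expected`, `x_squared`, `df` and `p_value` are chosen here for illustration), computes the expected counts under independence, and sums the squared deviations. The result should come out close to the reported X-squared of 85.497 with 2 degrees of freedom.

```r
# Observed counts copied from the contingency table printed above
observed <- matrix(
  c(1861,  7900, 4556,
    3021, 13782, 6368),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Oldest", "Youngest"),
                  c("A Great Deal", "Only Some", "Hardly Any"))
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic, degrees of freedom and upper-tail p-value
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

print(round(expected, 1))
print(c(x_squared = x_squared, df = df))
```

The expected counts also make it easy to check the usual sample-size condition for the chi-square test, namely that every cell has an expected count of at least 5.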