Setup

Load packages

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
library(dplyr)
library(statsr)

Load data

Make sure your data and R Markdown files are in the same directory. When loaded, the data file will be called gss.

load("gss.Rdata")

Part 1: Data

Since 1972, the General Social Survey (GSS) has collected data for monitoring and explaining the growing complexity of US society. The GSS has become the single best source of sociological and attitudinal trend data covering the United States (http://gss.norc.org). Because the GSS is an observational study that has been conducted repeatedly since 1972 and does not randomly assign respondents to different groups, we cannot draw any cause-and-effect conclusions. However, we can generalize the results to the population of the United States, because the data come from random sampling of household addresses designed to represent a cross-section of the country. In short, the results can be generalized to the population, but they cannot establish causal relationships.


Part 2: Research question

According to a Washington Post article by Erik Voeten (Dec 14, 2016), people aged 50 or older have become more cynical about Congress than younger people. He reports a gap of almost 20 percentage points between the youngest (18 to 49 years old) and the oldest (50 years old and over) generations, and notes that there were barely any age differences for most of the past four decades. So is it really true that the attitude toward Congress differs by generation? In my experience in my country, Korea, most younger people tend to be less interested or involved in politics or democracy, and they pay little attention to Congress, which leaves them with almost no stance on Congress or democracy. In short, the reported difference may have occurred by chance or may reflect the different levels of interest among generations.


Part 3: Exploratory data analysis

First, let’s select the variables of interest for the research question and create a new variable called age_cat, which turns the numerical age variable into a categorical variable matching the age generations in our research question. To check that the data are selected and modified correctly, we print a summary of the variables. In our data, 2012 is the latest year available. Since the Washington Post article on which my research question is built reflects the political climate after Trump’s election, this analysis might not reach the same result as the article.

gss_r1 <- gss %>%
    select(year, age, conlegis) %>%
    # Compare the numeric age (not the character age_cat) against the cutoffs;
    # rows with a missing conlegis give NA when the age condition is TRUE
    mutate(age_cat = ifelse(age <= 49, "Youngest", "Oldest"),
           young_cynic = ifelse(age <= 49 & conlegis == "Hardly Any", "Young cynical", "Not"),
           old_cynic = ifelse(age >= 50 & conlegis == "Hardly Any", "Old cynical", "Not"))
head(gss_r1)
##   year age conlegis  age_cat young_cynic old_cynic
## 1 1972  23     <NA> Youngest        <NA>       Not
## 2 1972  70     <NA>   Oldest         Not      <NA>
## 3 1972  48     <NA> Youngest        <NA>       Not
## 4 1972  27     <NA> Youngest        <NA>       Not
## 5 1972  61     <NA>   Oldest         Not      <NA>
## 6 1972  26     <NA> Youngest        <NA>       Not
str(gss_r1)
## 'data.frame':    57061 obs. of  6 variables:
##  $ year       : int  1972 1972 1972 1972 1972 1972 1972 1972 1972 1972 ...
##  $ age        : int  23 70 48 27 61 26 28 27 21 30 ...
##  $ conlegis   : Factor w/ 3 levels "A Great Deal",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ age_cat    : chr  "Youngest" "Oldest" "Youngest" "Youngest" ...
##  $ young_cynic: chr  NA "Not" NA NA ...
##  $ old_cynic  : chr  "Not" NA "Not" "Not" ...
summary(gss_r1)
##       year           age               conlegis       age_cat         
##  Min.   :1972   Min.   :18.0   A Great Deal: 4899   Length:57061      
##  1st Qu.:1983   1st Qu.:31.0   Only Some   :21756   Class :character  
##  Median :1993   Median :43.0   Hardly Any  :10959   Mode  :character  
##  Mean   :1992   Mean   :45.7   NA's        :19447                     
##  3rd Qu.:2002   3rd Qu.:59.0                                          
##  Max.   :2012   Max.   :89.0                                          
##                 NA's   :202                                           
##  young_cynic         old_cynic        
##  Length:57061       Length:57061      
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
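
The NA entries in the derived columns above come from how R's logical operators treat missing values. The following is a minimal sketch on a made-up three-row data frame (the name toy and its values are illustrative assumptions, not GSS data) showing the rule that FALSE & NA is FALSE while TRUE & NA is NA:

```r
# Hypothetical toy data, not GSS values
toy <- data.frame(
  age = c(23, 70, 61),
  conlegis = c(NA, NA, "Hardly Any"),
  stringsAsFactors = FALSE
)

# FALSE & NA is FALSE, but TRUE & NA is NA, so ifelse() returns NA
# whenever the age condition holds and the answer is missing.
toy$old_cynic <- ifelse(toy$age >= 50 & toy$conlegis == "Hardly Any",
                        "Old cynical", "Not")
print(toy)

# Count the missing answers explicitly before interpreting "Not"
n_missing <- sum(is.na(toy$conlegis))
print(n_missing)
```

A "Not" in these columns therefore mixes respondents outside the age range with respondents who gave another answer, which is worth keeping in mind before using the columns.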

Second, we visualize the trend in cynical attitudes toward Congress by plotting the years from 1972 to 2012, which the summary shows to be the first and last years of the data. The plot will show whether the youngest and oldest generations follow visibly different patterns in their attitude toward Congress.

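# Keep respondents with a known age group who answered "Hardly Any",
# then count them per year and age group
gss_r12 <- gss_r1 %>%
    filter(!is.na(age_cat), conlegis == "Hardly Any") %>%
    group_by(year, age_cat, conlegis) %>%
    summarise(count = n())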
print(gss_r12)
## Source: local data frame [54 x 4]
## Groups: year, age_cat [?]
## 
##     year  age_cat   conlegis count
##    <int>    <chr>     <fctr> <int>
## 1   1973   Oldest Hardly Any    82
## 2   1973 Youngest Hardly Any   140
## 3   1974   Oldest Hardly Any   128
## 4   1974 Youngest Hardly Any   180
## 5   1975   Oldest Hardly Any   145
## 6   1975 Youngest Hardly Any   229
## 7   1976   Oldest Hardly Any   167
## 8   1976 Youngest Hardly Any   214
## 9   1977   Oldest Hardly Any   103
## 10  1977 Youngest Hardly Any   155
## # ... with 44 more rows
gss_r12_t <- summarise(group_by(gss_r1, year), total=n())
print(gss_r12_t)
## # A tibble: 29 × 2
##     year total
##    <int> <int>
## 1   1972  1613
## 2   1973  1504
## 3   1974  1484
## 4   1975  1490
## 5   1976  1499
## 6   1977  1530
## 7   1978  1532
## 8   1980  1468
## 9   1982  1860
## 10  1983  1599
## # ... with 19 more rows
gss_r12 <- merge(gss_r12, gss_r12_t, by = "year")
print(gss_r12)
##    year  age_cat   conlegis count total
## 1  1973   Oldest Hardly Any    82  1504
## 2  1973 Youngest Hardly Any   140  1504
## 3  1974   Oldest Hardly Any   128  1484
## 4  1974 Youngest Hardly Any   180  1484
## 5  1975   Oldest Hardly Any   145  1490
## 6  1975 Youngest Hardly Any   229  1490
## 7  1976   Oldest Hardly Any   167  1499
## 8  1976 Youngest Hardly Any   214  1499
## 9  1977   Oldest Hardly Any   103  1530
## 10 1977 Youngest Hardly Any   155  1530
## 11 1978   Oldest Hardly Any   119  1532
## 12 1978 Youngest Hardly Any   199  1532
## 13 1980   Oldest Hardly Any   190  1468
## 14 1980 Youngest Hardly Any   296  1468
## 15 1982   Oldest Hardly Any   155  1860
## 16 1982 Youngest Hardly Any   279  1860
## 17 1983   Oldest Hardly Any   125  1599
## 18 1983 Youngest Hardly Any   245  1599
## 19 1984   Oldest Hardly Any    79  1473
## 20 1984 Youngest Hardly Any   133  1473
## 21 1986   Oldest Hardly Any   113  1470
## 22 1986 Youngest Hardly Any   185  1470
## 23 1987   Oldest Hardly Any   120  1819
## 24 1987 Youngest Hardly Any   211  1819
## 25 1988   Oldest Hardly Any    87  1481
## 26 1988 Youngest Hardly Any   106  1481
## 27 1989   Oldest Hardly Any    97  1537
## 28 1989 Youngest Hardly Any   127  1537
## 29 1990   Oldest Hardly Any    80  1372
## 30 1990 Youngest Hardly Any   124  1372
## 31 1991   Oldest Hardly Any   104  1517
## 32 1991 Youngest Hardly Any   155  1517
## 33 1993   Oldest Hardly Any   160  1606
## 34 1993 Youngest Hardly Any   267  1606
## 35 1994   Oldest Hardly Any   276  2992
## 36 1994 Youngest Hardly Any   511  2992
## 37 1996   Oldest Hardly Any   282  2904
## 38 1996 Youngest Hardly Any   536  2904
## 39 1998   Oldest Hardly Any   219  2832
## 40 1998 Youngest Hardly Any   352  2832
## 41 2000   Oldest Hardly Any   223  2817
## 42 2000 Youngest Hardly Any   306  2817
## 43 2002   Oldest Hardly Any    94  2765
## 44 2002 Youngest Hardly Any   127  2765
## 45 2004   Oldest Hardly Any   110  2812
## 46 2004 Youngest Hardly Any   140  2812
## 47 2006   Oldest Hardly Any   335  4510
## 48 2006 Youngest Hardly Any   366  4510
## 49 2008   Oldest Hardly Any   284  2023
## 50 2008 Youngest Hardly Any   239  2023
## 51 2010   Oldest Hardly Any   329  2044
## 52 2010 Youngest Hardly Any   257  2044
## 53 2012   Oldest Hardly Any   350  1974
## 54 2012 Youngest Hardly Any   289  1974
gss_r12_y <- gss_r12 %>%
    filter(age_cat == "Youngest") %>%
    mutate(cynical_perc_y = count/total*100)
print(gss_r12_y)
##    year  age_cat   conlegis count total cynical_perc_y
## 1  1973 Youngest Hardly Any   140  1504       9.308511
## 2  1974 Youngest Hardly Any   180  1484      12.129380
## 3  1975 Youngest Hardly Any   229  1490      15.369128
## 4  1976 Youngest Hardly Any   214  1499      14.276184
## 5  1977 Youngest Hardly Any   155  1530      10.130719
## 6  1978 Youngest Hardly Any   199  1532      12.989556
## 7  1980 Youngest Hardly Any   296  1468      20.163488
## 8  1982 Youngest Hardly Any   279  1860      15.000000
## 9  1983 Youngest Hardly Any   245  1599      15.322076
## 10 1984 Youngest Hardly Any   133  1473       9.029192
## 11 1986 Youngest Hardly Any   185  1470      12.585034
## 12 1987 Youngest Hardly Any   211  1819      11.599780
## 13 1988 Youngest Hardly Any   106  1481       7.157326
## 14 1989 Youngest Hardly Any   127  1537       8.262850
## 15 1990 Youngest Hardly Any   124  1372       9.037901
## 16 1991 Youngest Hardly Any   155  1517      10.217535
## 17 1993 Youngest Hardly Any   267  1606      16.625156
## 18 1994 Youngest Hardly Any   511  2992      17.078877
## 19 1996 Youngest Hardly Any   536  2904      18.457300
## 20 1998 Youngest Hardly Any   352  2832      12.429379
## 21 2000 Youngest Hardly Any   306  2817      10.862620
## 22 2002 Youngest Hardly Any   127  2765       4.593128
## 23 2004 Youngest Hardly Any   140  2812       4.978663
## 24 2006 Youngest Hardly Any   366  4510       8.115299
## 25 2008 Youngest Hardly Any   239  2023      11.814137
## 26 2010 Youngest Hardly Any   257  2044      12.573386
## 27 2012 Youngest Hardly Any   289  1974      14.640324
gss_r12_o <- gss_r12 %>%
    filter(age_cat == "Oldest") %>%
    mutate(cynical_perc_o = count/total*100)
print(gss_r12_o)
##    year age_cat   conlegis count total cynical_perc_o
## 1  1973  Oldest Hardly Any    82  1504       5.452128
## 2  1974  Oldest Hardly Any   128  1484       8.625337
## 3  1975  Oldest Hardly Any   145  1490       9.731544
## 4  1976  Oldest Hardly Any   167  1499      11.140761
## 5  1977  Oldest Hardly Any   103  1530       6.732026
## 6  1978  Oldest Hardly Any   119  1532       7.767624
## 7  1980  Oldest Hardly Any   190  1468      12.942779
## 8  1982  Oldest Hardly Any   155  1860       8.333333
## 9  1983  Oldest Hardly Any   125  1599       7.817386
## 10 1984  Oldest Hardly Any    79  1473       5.363204
## 11 1986  Oldest Hardly Any   113  1470       7.687075
## 12 1987  Oldest Hardly Any   120  1819       6.597031
## 13 1988  Oldest Hardly Any    87  1481       5.874409
## 14 1989  Oldest Hardly Any    97  1537       6.310995
## 15 1990  Oldest Hardly Any    80  1372       5.830904
## 16 1991  Oldest Hardly Any   104  1517       6.855636
## 17 1993  Oldest Hardly Any   160  1606       9.962640
## 18 1994  Oldest Hardly Any   276  2992       9.224599
## 19 1996  Oldest Hardly Any   282  2904       9.710744
## 20 1998  Oldest Hardly Any   219  2832       7.733051
## 21 2000  Oldest Hardly Any   223  2817       7.916223
## 22 2002  Oldest Hardly Any    94  2765       3.399638
## 23 2004  Oldest Hardly Any   110  2812       3.911807
## 24 2006  Oldest Hardly Any   335  4510       7.427938
## 25 2008  Oldest Hardly Any   284  2023      14.038557
## 26 2010  Oldest Hardly Any   329  2044      16.095890
## 27 2012  Oldest Hardly Any   350  1974      17.730496
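# Draw both lines on one shared y-axis so they are directly comparable
y_range <- range(c(gss_r12_y$cynical_perc_y, gss_r12_o$cynical_perc_o))
plot(gss_r12_y$year, gss_r12_y$cynical_perc_y, type = "l", ylim = y_range, xlab = "year", ylab = "Hardly Any confidence in Congress [%]", col = "red")
lines(gss_r12_o$year, gss_r12_o$cynical_perc_o, col = "blue")
legend("topleft", legend = c("Youngest (18-49)", "Oldest (50+)"), col = c("red", "blue"), lty = 1)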

In the graph, the red line represents the trend in cynical attitudes toward Congress among the youngest generation and the blue line the trend among the oldest; each value is the group's "Hardly Any" count as a percentage of all respondents in that year. Based on the graph, the oldest generation has become more cynical toward Congress than the youngest in recent years. In earlier years the youngest generation showed the stronger cynical attitude, so the pattern between generations has reversed.
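
Because the denominator used above is every respondent in a year, the plotted percentages also depend on how large each age group is and on how many respondents were asked the question. One possible alternative, sketched here in base R on made-up data (the toy data frame and its values are hypothetical, not GSS results), is to compute the share of "Hardly Any" answers within each age group among those who answered:

```r
# Hypothetical toy responses, not GSS values
toy <- data.frame(
  year     = c(2010, 2010, 2010, 2010, 2010, 2010),
  age_cat  = c("Oldest", "Oldest", "Youngest", "Youngest", "Youngest", "Youngest"),
  conlegis = c("Hardly Any", "Only Some", "Hardly Any", "Only Some", "Only Some", NA)
)

# Drop unanswered rows, then count by year, age group, and answer
answered <- toy[!is.na(toy$conlegis), ]
counts <- table(answered$year, answered$age_cat, answered$conlegis)

# Percentages within each year-by-age-group combination (margins 1 and 2)
within_group <- prop.table(counts, margin = c(1, 2)) * 100
print(within_group["2010", , "Hardly Any"])
```

With this denominator, a change in the line for one generation reflects a change in that generation's answers rather than a change in the age mix of the sample.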


Part 4: Inference

  1. The hypotheses for our research question are as follows:

H0: Generation and attitude toward Congress are independent. The attitude does not vary by generation.

HA: Generation and attitude toward Congress are dependent. The attitude does vary by generation.

  2. Since there are two categorical variables and one of them has more than 2 levels, we use the Chi-square independence test to evaluate the relationship between generation and attitude toward Congress.

  3. Conditions for the Chi-square independence test:
    1. Independence: The data come from a random sample without replacement, with n < 10% of the population. In addition, each case contributes to only one cell of the table formed by generation and attitude toward Congress.
    2. Sample size: Each cell has at least 5 expected cases.
# Observed counts per cell, excluding respondents with a missing age group or answer
gss_r1 %>%
    filter(!is.na(age_cat), !is.na(conlegis)) %>%
    group_by(age_cat, conlegis) %>%
    summarise(count = n())
## Source: local data frame [6 x 3]
## Groups: age_cat [?]
## 
##    age_cat     conlegis count
##      <chr>       <fctr> <int>
## 1   Oldest A Great Deal  1861
## 2   Oldest    Only Some  7900
## 3   Oldest   Hardly Any  4556
## 4 Youngest A Great Deal  3021
## 5 Youngest    Only Some 13782
## 6 Youngest   Hardly Any  6368
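
The sample-size condition refers to expected counts, while the table above lists observed counts. As a minimal sketch, the expected counts can be derived from the observed ones; the matrix below is typed in from the table above, and the object names observed and expected are illustrative:

```r
# Observed counts copied from the table above
observed <- matrix(
  c(1861,  7900, 4556,
    3021, 13782, 6368),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Oldest", "Youngest"),
                  c("A Great Deal", "Only Some", "Hardly Any"))
)

# Expected count under independence = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 1))

# The sample-size condition asks for at least 5 expected cases in every cell
condition_met <- all(expected >= 5)
print(condition_met)
```

The same values are available from chisq.test(observed)$expected, which can serve as a cross-check.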
  4. The Chi-square test only supports a hypothesis test. We cannot report a confidence interval, since the Chi-square distribution is always positive and right-skewed. Because the sample size is large, the theoretical test is appropriate.
gss_chi <- table(gss_r1$age_cat, gss_r1$conlegis)
print(gss_chi)
##           
##            A Great Deal Only Some Hardly Any
##   Oldest           1861      7900       4556
##   Youngest         3021     13782       6368
chisq.test(gss_chi)
## 
##  Pearson's Chi-squared test
## 
## data:  gss_chi
## X-squared = 85.497, df = 2, p-value < 2.2e-16
  5. The Chi-square statistic is 85.497 with df = 2. The p-value is nearly zero, so we reject the null hypothesis. In other words, there is convincing evidence that the attitude toward Congress varies by generation.
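
To make the reported statistic easier to check, the calculation can be reproduced by hand. This is a sketch using the counts from the contingency table printed above; the variable names (observed, expected, contrib, chi_sq, p_value) are illustrative:

```r
# Observed counts from the contingency table (rows: Oldest, Youngest)
observed <- matrix(c(1861,  7900, 4556,
                     3021, 13782, 6368), nrow = 2, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell contributes (observed - expected)^2 / expected to the statistic
contrib <- (observed - expected)^2 / expected
chi_sq <- sum(contrib)

# Degrees of freedom = (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the Chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(contrib, 3))
print(c(chi_sq = chi_sq, df = df))
print(p_value)
```

Printing contrib shows which cells account for most of the statistic, which can help in describing where the generations differ.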