Instructions

Exercises: 1,3 (Pg. 227); 2 (Pg. 232); 1,2 (Pg. 235);

Submission: Submit an electronic document on Sakai. It must be an HTML file generated in RStudio. All assigned problems are taken from the textbook R for Data Science. You do not need R code to answer every question. If you answer without using R code, delete the code chunk. If the question requires R code, make sure you display the R code. If the question requires a figure, make sure you display the figure. Many of the questions can be answered in a written response but still require R code and/or figures to support and explain the answer.

Chapter 12 (Pg. 227)

Exercise 1

library(tidyverse)

gss_cat %>% ggplot(aes(rincome)) + geom_bar()

The default bar chart of the distribution of rincome (reported income) is hard to read because rincome has so many levels that the x-axis labels overlap.

gss_cat %>%
  mutate(rincome = fct_relevel(rincome, c("Not applicable", "Refused", "Don't know", "No answer"))) %>%
  ggplot(aes(rincome)) +
  geom_bar() +
  coord_flip()

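I used fct_relevel() to move the non-income levels ("Not applicable", "Refused", "Don't know", "No answer") to the front of the level order so that they sit together, and I used coord_flip() to flip the axes so that the labels no longer overlap. This reorganization made the bar chart both more comprehensible and more visually appealing.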
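As a minimal sketch of what fct_relevel() does to the level order, the toy factor below uses a handful of hypothetical values (it is not taken from gss_cat). The named levels move to the front in the order given, and the remaining levels keep their existing order.

```r
library(forcats)

# Hypothetical toy factor with an explicit starting level order
income <- factor(
  c("Lt $1000", "Refused", "$25000 or more", "Not applicable", "No answer"),
  levels = c("No answer", "Refused", "$25000 or more", "Lt $1000", "Not applicable")
)

# Move two non-income levels to the front; the other levels keep their order
releveled <- fct_relevel(income, "Not applicable", "Refused")

levels(releveled)
```

Only the level order changes; the values stored in the factor stay the same.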

Exercise 3

gss_cat_denom <- gss_cat %>%
  filter(!denom %in% c("No answer", "Other", "Don't know", "Not applicable", "No denomination"))
gss_cat_denom
## # A tibble: 7,025 x 9
##     year marital      age race  rincome    partyid      relig  denom     tvhours
##    <int> <fct>      <int> <fct> <fct>      <fct>        <fct>  <fct>       <int>
##  1  2000 Never mar…    26 White $8000 to … Ind,near rep Prote… Southern…      12
##  2  2000 Divorced      48 White $8000 to … Not str rep… Prote… Baptist-…      NA
##  3  2000 Married       25 White $20000 - … Strong demo… Prote… Southern…      NA
##  4  2000 Divorced      44 White $7000 to … Ind,near dem Prote… Lutheran…      NA
##  5  2000 Married       47 White $25000 or… Strong repu… Prote… Southern…       3
##  6  2000 Married       52 White $25000 or… Strong demo… Prote… Southern…       1
##  7  2000 Married       51 White $25000 or… Strong repu… Prote… United m…      NA
##  8  2000 Married       40 Black $25000 or… Strong demo… Prote… Baptist-…       7
##  9  2000 Married       45 Black Not appli… Independent  Prote… United m…      NA
## 10  2000 Married       49 White Refused    Strong repu… Prote… United m…       2
## # … with 7,015 more rows

The table above keeps only the respondents who reported a specific denomination. The code filters out every denom value that does not name one ("No answer", "Other", "Don't know", "Not applicable", "No denomination").

gss_cat_denom %>% ggplot(aes(relig)) + geom_bar() + coord_flip()

The graph above takes the filtered dataset, which drops respondents without a specific denomination, and plots the count of each remaining religion. Protestant is the only religion that appears in the graph, so denom applies only to Protestant respondents.
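The same conclusion can be checked without a plot. This is a minimal sketch that counts the religions left after the filter; the names non_denoms and relig_with_denom are my own and are used only for illustration.

```r
library(dplyr)
library(forcats)

# denom values that do not name a specific denomination
non_denoms <- c("No answer", "Other", "Don't know", "Not applicable", "No denomination")

# Count the religions that remain once those values are filtered out
relig_with_denom <- gss_cat %>%
  filter(!denom %in% non_denoms) %>%
  count(relig)

relig_with_denom
```

If denom applies to a single religion, the result has exactly one row.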

Chapter 12 (Pg. 232)

Exercise 2

gss_cat %>% ggplot(aes(tvhours)) + geom_bar()
## Warning: Removed 10146 rows containing non-finite values (stat_count).

The mean would not be a good summary because the distribution of tvhours is right-skewed and contains implausibly high values (outliers) that pull the mean upward. The median is more robust to those values, so it is a better summary of tvhours.
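To compare the two summaries directly, the sketch below computes the mean, median, and maximum of tvhours while ignoring missing values. The name tv_summary and the column names are my own choices.

```r
library(dplyr)
library(forcats)

# Summaries of tvhours; na.rm = TRUE drops the missing responses
tv_summary <- gss_cat %>%
  summarise(
    mean_tv   = mean(tvhours, na.rm = TRUE),
    median_tv = median(tvhours, na.rm = TRUE),
    max_tv    = max(tvhours, na.rm = TRUE)
  )

tv_summary
```

With a right-skewed variable, the mean is expected to sit above the median.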

Chapter 12 (Pg. 235)

Exercise 1

gss_cat %>%
  mutate(partyid = fct_collapse(partyid,
    Republican = c("Strong republican", "Not str republican"),
    Independent = c("Ind,near rep", "Independent", "Ind,near dem"),
    Democrat = c("Not str democrat", "Strong democrat"),
    Other = c("No answer", "Don't know", "Other party")
  )) %>%
  count(partyid, year) %>%
  ggplot(mapping = aes(x = year, y = n, color = partyid)) +
  geom_point() +
  geom_line()

Over time, the number of Republicans gradually increased, peaked around 2006, and then slowly declined. The number of Democrats followed the same pattern. The number of Independents slowly decreased at first, spiked around 2006, and then gradually increased. In every survey year from 2000 to 2014, Independents were the largest of the three groups and Republicans were the smallest. Democrats and Republicans followed rather similar trends, while Independents moved in the opposite direction apart from the shared peak in 2006 and the gradual decrease after 2012. Note that the plot shows counts of survey respondents by party identification, not proportions of registered voters, so part of the 2006 peak may reflect a larger number of respondents that year rather than a real shift in party identification.
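Because raw counts depend on how many people were surveyed each year, a within-year proportion may be easier to compare over time. The following is a sketch under that assumption; the names party_props, prop, and p are my own.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Collapse partyid as above, then convert counts to within-year proportions
party_props <- gss_cat %>%
  mutate(partyid = fct_collapse(partyid,
    Republican = c("Strong republican", "Not str republican"),
    Independent = c("Ind,near rep", "Independent", "Ind,near dem"),
    Democrat = c("Not str democrat", "Strong democrat"),
    Other = c("No answer", "Don't know", "Other party")
  )) %>%
  count(year, partyid) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%  # share of that year's respondents
  ungroup()

# Same plot as before, with proportions on the y-axis
p <- ggplot(party_props, aes(x = year, y = prop, color = partyid)) +
  geom_point() +
  geom_line()
p
```

The proportions within each year sum to 1, so a change in one group's line reflects a change in its share rather than a change in sample size.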

Exercise 2

gss_cat %>%
  mutate(rincome = fct_collapse(rincome,
    Other = c("Not applicable", "Refused", "Don't know", "No answer"),
    "Less than $1k to $9999" = c("Lt $1000", "$1000 to 2999", "$3000 to 3999", "$4000 to 4999", "$5000 to 5999", "$6000 to 6999", "$7000 to 7999", "$8000 to 9999"),
    "$10k to $19999" = c("$10000 - 14999", "$15000 - 19999"),
    "$20k or more" = c("$20000 - 24999", "$25000 or more")
  )) %>%
  count(rincome)
## # A tibble: 4 x 2
##   rincome                    n
##   <fct>                  <int>
## 1 Other                   8468
## 2 $20k or more            8646
## 3 $10k to $19999          2216
## 4 Less than $1k to $9999  2153

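I collapsed the income levels into bands of roughly $10k (under $10k, $10k to $19999, and $20k or more), plus an Other category for the non-response levels.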
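Because fct_collapse() matches level names exactly, it helps to list the levels of rincome first and compare them against the names about to be collapsed. This is a minimal sketch; income_levels and wanted are hypothetical helper names.

```r
library(forcats)

# Exact spellings of the rincome levels
income_levels <- levels(gss_cat$rincome)
income_levels

# Names intended for the higher-income groups
wanted <- c("$10000 - 14999", "$15000 - 19999", "$20000 - 24999", "$25000 or more")

# Any name printed here is not a level of rincome and would trigger
# an "Unknown levels" warning in fct_collapse()
setdiff(wanted, income_levels)
```

An empty result from setdiff() means every name matches an existing level.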