suppressPackageStartupMessages(library("tidyverse"))
package ‘tidyverse’ was built under R version 3.6.3

1. Explore the distribution of rincome (reported income). What makes the default bar chart hard to understand? How could you improve the plot?

My first attempt is to use geom_bar() with the default settings.

rincome_plot <-
  gss_cat %>%
  ggplot(aes(x = rincome)) +
  geom_bar()
rincome_plot

The problem with the default bar chart is that the x-axis labels overlap and are impossible to read. I’ll try changing the angle of the x-axis labels to vertical so that they will not overlap.

rincome_plot +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

This is better because the labels no longer overlap, but they are still difficult to read because they are vertical. I could try angling the labels at 45 degrees so that they are easier to read but still do not overlap.

rincome_plot +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

But the solution I prefer for bar charts with long labels is to flip the axes, so that the bars are horizontal. Then the category labels are also horizontal, and easy to read.

rincome_plot +
  coord_flip()

Though more than asked for in this question, I could further improve this plot by

  1. removing the “Not applicable” responses,
  2. renaming “Lt $1000” to “Less than $1000”,
  3. using color to distinguish non-response categories (“Refused”, “Don’t know”, and “No answer”) from income levels (“Lt $1000”, …),
  4. adding meaningful y- and x-axis titles, and
  5. formatting the counts axis labels to use commas.

gss_cat %>%
  filter(!rincome %in% c("Not applicable")) %>%
  mutate(rincome = fct_recode(rincome,
    "Less than $1000" = "Lt $1000"
  )) %>%
  mutate(rincome_na = rincome %in% c("Refused", "Don't know", "No answer")) %>%
  ggplot(aes(x = rincome, fill = rincome_na)) +
  geom_bar() +
  coord_flip() +
  scale_y_continuous("Number of Respondents", labels = scales::comma) +
  scale_x_discrete("Respondent's Income") +
  scale_fill_manual(values = c("FALSE" = "black", "TRUE" = "gray")) +
  theme(legend.position = "none")

If I were only interested in non-missing responses, then I could drop all respondents who answered “Not applicable”, “Refused”, “Don’t know”, or “No answer”.

gss_cat %>%
  filter(!rincome %in% c("Not applicable", "Don't know", "No answer", "Refused")) %>%
  mutate(rincome = fct_recode(rincome,
    "Less than $1000" = "Lt $1000"
  )) %>%
  ggplot(aes(x = rincome)) +
  geom_bar() +
  coord_flip() +
  scale_y_continuous("Number of Respondents", labels = scales::comma) +
  scale_x_discrete("Respondent's Income")

2. What is the most common relig in this survey? What’s the most common partyid?

The most common relig is “Protestant”.

gss_cat %>%
  count(relig) %>%
  arrange(desc(n)) %>%
  head(1)

The most common partyid is “Independent”.

gss_cat %>%
  count(partyid) %>%
  arrange(desc(n)) %>%
  head(1)
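
Both answers use the same pattern: count, sort, take the first row. As a cross-check, the same idea could be wrapped in a small base R helper. This is only a minimal sketch on made-up data; `most_common()` and the `toy` factor are hypothetical names introduced here for illustration and are not part of the original solution.

```r
# Hypothetical helper: return the most frequent level of a factor
# (or value of a vector). Ties resolve to the first entry in table order.
most_common <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Made-up data for illustration only
toy <- factor(c("a", "b", "b", "c", "b", "a"))
most_common(toy)
```

With the tidyverse loaded, calling `most_common(gss_cat$relig)` or `most_common(gss_cat$partyid)` should agree with the `count()`-based results above.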

3. Which relig does denom (denomination) apply to? How can you find out with a table? How can you find out with a visualization?

levels(gss_cat$denom)
 [1] "No answer"            "Don't know"           "No denomination"     
 [4] "Other"                "Episcopal"            "Presbyterian-dk wh"  
 [7] "Presbyterian, merged" "Other presbyterian"   "United pres ch in us"
[10] "Presbyterian c in us" "Lutheran-dk which"    "Evangelical luth"    
[13] "Other lutheran"       "Wi evan luth synod"   "Lutheran-mo synod"   
[16] "Luth ch in america"   "Am lutheran"          "Methodist-dk which"  
[19] "Other methodist"      "United methodist"     "Afr meth ep zion"    
[22] "Afr meth episcopal"   "Baptist-dk which"     "Other baptists"      
[25] "Southern baptist"     "Nat bapt conv usa"    "Nat bapt conv of am" 
[28] "Am bapt ch in usa"    "Am baptist asso"      "Not applicable"      

From the level names it is clear that denom refers to “Protestant” (which is unsurprising, given that it is the most common relig in the data). To confirm this with a table, let’s filter out the levels that do not name a denomination (“No answer”, “Other”, “Don’t know”, “Not applicable”, and “No denomination”) and count relig among the remaining rows. After doing that, the only remaining relig is “Protestant”.

gss_cat %>%
  filter(!denom %in% c(
    "No answer", "Other", "Don't know", "Not applicable",
    "No denomination"
  )) %>%
  count(relig)
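
Another way to read the same information from a table is to cross-tabulate relig against a flag for whether a specific denomination was recorded. The sketch below uses a small made-up data frame (toy values, not GSS data); the names `toy`, `non_denoms`, `has_denom`, and `denom_table` are assumptions introduced for illustration.

```r
# Made-up data for illustration only; not taken from gss_cat
toy <- data.frame(
  relig = c("Protestant", "Protestant", "Catholic", "None", "Protestant"),
  denom = c("Southern baptist", "No denomination", "Not applicable",
            "Not applicable", "United methodist"),
  stringsAsFactors = FALSE
)

# Levels of denom that do not name an actual denomination
non_denoms <- c("No answer", "Other", "Don't know", "Not applicable",
                "No denomination")

# Flag rows that name a real denomination, then cross-tabulate
toy$has_denom <- !toy$denom %in% non_denoms
denom_table <- table(toy$relig, toy$has_denom)
denom_table

# Religions with at least one named denomination
rownames(denom_table)[denom_table[, "TRUE"] > 0]
```

In this toy example only “Protestant” has a non-zero count in the `TRUE` column, which is the pattern the filtered `count(relig)` shows for the real data.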

This is also clear in a scatter plot of relig vs. denom where the size of each point is proportional to the number of responses (since otherwise there would be overplotting).

gss_cat %>%
  count(relig, denom) %>%
  ggplot(aes(x = relig, y = denom, size = n)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
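
As a possible alternative, ggplot2 provides `geom_count()`, which counts the observations at each location and maps that count to point size, so the explicit `count()` step is not needed. This is a minimal sketch, assuming the ggplot2 and forcats packages are installed (forcats supplies `gss_cat`); `denom_plot` is a name introduced here for illustration.

```r
library(ggplot2)
library(forcats) # provides the gss_cat data

# geom_count() tallies rows at each (relig, denom) pair and sizes the points
denom_plot <-
  ggplot(gss_cat, aes(x = relig, y = denom)) +
  geom_count() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
denom_plot
```

The result should look much like the plot above, with the legend reporting the counts computed by `geom_count()` rather than a precomputed `n` column.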
