library(tidyverse)
library(openintro)
data(acs12)
options(scipen = 10000)  # print large incomes in full instead of scientific notation

In this homework we will be using the acs12 data frame. The data come from the American Community Survey, which the US Census Bureau conducts to gather information on American households. Our data are from 2012. After running the code above, type ?acs12 in your Console to find a description of each variable. Note that by making visualizations we are only exploring what we see; we will not jump to conclusions yet!

Throughout the homework, make sure you use a reasonable font size and label axes. You are welcome to use colors and themes whenever you would like. Adhere to the tidyverse style guide.

Question 1

How many observations and variables are there? What does each observation represent? Answer with inline code when needed.

## The data frame has 2000 observations and 13 variables.

## Each observation represents one individual surveyed in the 2012 American Community Survey, described by socioeconomic and demographic characteristics such as income, employment, race, and education.
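The counts above can be computed rather than copied from the help page. This is a minimal sketch using base R's dim(), nrow(), and ncol(), assuming the openintro package is installed; the names n_obs and n_vars are only illustrative. In the R Markdown source, the same numbers can be written inline as `` `r nrow(acs12)` `` and `` `r ncol(acs12)` `` so that the text updates if the data change.

```r
library(openintro)  # provides the acs12 data frame
data(acs12)

# dim() returns c(number of rows, number of columns);
# each row is one surveyed person
dim(acs12)

n_obs <- nrow(acs12)   # number of observations
n_vars <- ncol(acs12)  # number of variables
n_obs
n_vars
```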

Question 2

For each variable, classify it as categorical or numerical and determine the type: factor, logical, integer, double, or character. Read descriptions for each variable using ?acs12 if needed.

## Categorical (all stored as factors): employment, race, gender, citizen, lang, married, edu, disability, birth_qtr. Note that employment has three levels (employed, unemployed, not in labor force), so it is not logical.

## Numerical (all stored as integers): income, hrs_work, age, time_to_work
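Rather than guessing types, they can be read directly from the data. A minimal sketch, assuming tidyverse and openintro are installed: glimpse() shows each column's type, and map_chr() collects one class per column so the categorical/numerical split can be printed. The names var_classes and var_kind are illustrative, and the rule used here (integer or numeric columns count as numerical, everything else as categorical) is an assumption that fits this data set, not a general definition.

```r
library(tidyverse)
library(openintro)
data(acs12)

# glimpse() lists every column with its type, e.g. <int> or <fct>
glimpse(acs12)

# One class per column, e.g. "integer" or "factor"
var_classes <- map_chr(acs12, ~ class(.x)[1])
var_classes

# Treat integer/double columns as numerical, the rest as categorical
var_kind <- if_else(var_classes %in% c("integer", "numeric"),
                    "numerical", "categorical")
split(names(var_classes), var_kind)
```

Running this in your own session confirms the types for the installed version of openintro.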

Question 3

To get an idea about who has taken this survey, make a dodged bar plot that shows gender of participants for each race group.

ggplot(acs12, aes(x = race, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(x = "Race", y = "Number of respondents", fill = "Gender")
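The bar heights in a dodged bar plot of race and gender are simply counts of respondents in each race and gender combination. A short sketch, assuming tidyverse and openintro are installed, that prints those counts with dplyr's count() so the plot can be checked against the numbers; race_gender_counts is an illustrative name.

```r
library(tidyverse)
library(openintro)
data(acs12)

# Number of respondents for each race-gender combination;
# these are the heights of the dodged bars
race_gender_counts <- acs12 %>%
  count(race, gender)
race_gender_counts
```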

Question 4

Make a plot that can help the viewer compare income based on race. In addition to the plot, write what you see in this plot.

ggplot(acs12) +
  geom_col(aes(x = race, y = income)) +
  labs(x = "Race", y = "Total annual income")
## Warning: Removed 377 rows containing missing values (position_stack).

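## Because the bars stack every respondent's income, each bar shows the total income of a race group rather than a typical income. The white group has by far the tallest bar (roughly 30,000,000 in total), followed by asian, black, and other. Since most respondents in the sample are white, this ordering largely reflects group size, so the plot alone does not show how individual incomes compare across groups.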
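To compare income across race groups without the group-size effect of stacked totals, one option is to summarise each group's distribution. A minimal sketch, assuming tidyverse and openintro are installed: it drops respondents with missing income, reports each group's size, median, and mean, and draws a boxplot of the distributions. The name income_by_race is illustrative, and the choice of a boxplot is one reasonable option rather than the only correct plot.

```r
library(tidyverse)
library(openintro)
data(acs12)

# Drop respondents with missing income before summarising
with_income <- acs12 %>%
  filter(!is.na(income))

# Group size and typical income for each race group
income_by_race <- with_income %>%
  group_by(race) %>%
  summarise(
    n = n(),
    median_income = median(income),
    mean_income = mean(income)
  )
income_by_race

# Boxplots compare the distributions rather than the totals
ggplot(with_income, aes(x = race, y = income)) +
  geom_boxplot() +
  labs(x = "Race", y = "Annual income")
```

Income is usually right-skewed, so the median and the boxes tend to describe a typical respondent better than the mean does.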

Question 5

Make a plot to examine the relationship between hours worked per week and income. What can you tell from this plot?

ggplot(data = acs12) +
  geom_point(aes(x = hrs_work, y = income), color = "thistle2") +
  labs(x = "Hours worked per week", y = "Annual income")
## Warning: Removed 1041 rows containing missing values (geom_point).
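A number can support what the scatterplot suggests. This sketch, assuming tidyverse and openintro are installed, keeps respondents with both hours and income recorded, computes the Pearson correlation with base R's cor(), and redraws the scatterplot with semi-transparent points and a straight-line fit from geom_smooth(method = "lm") so overlapping points are easier to see. The names worked and hrs_income_cor are illustrative. The correlation only describes a linear association in this sample.

```r
library(tidyverse)
library(openintro)
data(acs12)

# Keep only respondents with both values recorded
worked <- acs12 %>%
  filter(!is.na(hrs_work), !is.na(income))

# Pearson correlation between weekly hours and annual income
hrs_income_cor <- cor(worked$hrs_work, worked$income)
hrs_income_cor

# Transparency reveals overplotting; the line shows the linear trend
ggplot(worked, aes(x = hrs_work, y = income)) +
  geom_point(alpha = 0.3, color = "thistle4") +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Hours worked per week", y = "Annual income")
```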

Question 6

Remake this plot. I almost did something very similar in the lecture video. This is a little bit more challenging. Challenge yourself. If you cannot figure it out, ask us during office hours.

ggplot(data = acs12, aes(x = hrs_work, y = income)) +
  geom_point(size = 1, color = "coral") +
  labs(x = "Hours worked per week", y = "Income") +
  facet_grid(race ~ gender)
## Warning: Removed 1041 rows containing missing values (geom_point).

Question 7

Write a question that you are interested in answering. Answer with a visual. Note that we are only exploring data at this stage. In this homework, comment on what you see in the data, do not generalize to the population.

## How does annual income compare between citizens and non-citizens in this sample?

ggplot(filter(acs12, !is.na(income)), aes(x = citizen, y = income)) +
  geom_boxplot(color = "royalblue1") +
  labs(x = "Citizen", y = "Annual income")
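Citizens and non-citizens are unlikely to be equally represented in the sample, so it helps to list each group's size next to its typical income before commenting on the visual. A minimal sketch, assuming tidyverse and openintro are installed; income_by_citizen is an illustrative name.

```r
library(tidyverse)
library(openintro)
data(acs12)

# Group size and median income for citizens and non-citizens,
# ignoring respondents with missing income
income_by_citizen <- acs12 %>%
  filter(!is.na(income)) %>%
  group_by(citizen) %>%
  summarise(n = n(), median_income = median(income))
income_by_citizen
```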