Problem 1

Create a vector of five numbers of your choice between 0 and 10, save that vector to an object, and use the sum() function to calculate the sum of the numbers.

vector1 <- c(1, 2, 3, 4, 5)
sum(vector1)

## [1] 15

Problem 2

Create a data frame that includes two columns. One column must have the numbers 1 through 5, and the other column must have the numbers 6 through 10. The first column must be named “alpha” and the second column must be named “beta”. Name the object “my_dat”. Display the data.

Put your code and solution here:

my_dat <- data.frame(alpha = 1:5,
                     beta = 6:10)

my_dat

##   alpha beta
## 1     1    6
## 2     2    7
## 3     3    8
## 4     4    9
## 5     5   10

Problem 3

Using the data frame created in Problem 2, use the summary() command to create a five-number summary for the column named “beta”.

Put your code and solution here:

summary(my_dat$beta)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       6       7       8       8       9      10
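summary() reports six values because it also includes the mean. If only the classic five numbers are wanted, base R's fivenum() is one option. A minimal sketch, using a standalone vector with the same values as the beta column (the names `beta` and `five` are just for illustration):

```r
# Same values as the beta column built in Problem 2
beta <- c(6, 7, 8, 9, 10)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(beta)
five
```

For these values the hinges coincide with the quartiles shown by summary(); on other data the two can differ slightly because they are computed differently.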

Problem 4

There is code for importing the example survey data that will run automatically in the setup chunk for this report (Line 13). Using that data, make a boxplot of the Family Income column using the base R function (not a figure drawn using qplot). Include your name in the title of the plot. Relabel the x-axis as “Family Income”.

Hint: consult the codebook to identify the correct column name for the family income question.

Put your code and solution here:

boxplot(dat$faminc_new,
        frame = TRUE,
        main = "Maca's Boxplot for Family Income",
        xlab = "Family Income")

Problem 5

Using the survey data, filter to subset the survey data so you only have male survey respondents who live in the northwest or midwest of the United States, are married, and identify as being interested in the news most of the time.

Use the str() function to provide information about the resulting dataset.

Put your code and solution here:

dat_filtered <- dat %>%
  filter(region == 1 | region == 2) %>%
  filter(gender == 1 & marstat == 1 & newsint == 1)

str(dat_filtered)

## 'data.frame':    75 obs. of  25 variables:
##  $ caseid      : int  420208067 412948037 411855595 414480371 416806180 414729651 412021973 412348831 412929385 412047867 ...
##  $ region      : int  2 1 1 1 2 2 1 2 1 2 ...
##  $ gender      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ educ        : int  3 5 6 5 3 2 6 5 5 5 ...
##  $ edloan      : int  2 1 2 1 1 2 2 2 2 2 ...
##  $ race        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ hispanic    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ employ      : int  1 1 1 1 1 5 5 1 1 1 ...
##  $ marstat     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pid7        : int  4 1 4 4 6 2 1 1 1 7 ...
##  $ ideo5       : int  5 1 3 3 3 1 2 3 1 5 ...
##  $ pew_religimp: int  4 4 4 1 2 4 4 2 4 1 ...
##  $ newsint     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ faminc_new  : int  10 11 10 9 10 2 13 8 7 11 ...
##  $ union       : int  1 2 3 3 1 3 2 2 2 3 ...
##  $ investor    : int  2 1 1 1 1 2 1 2 1 2 ...
##  $ CC18_308a   : int  4 4 4 1 2 3 4 4 4 1 ...
##  $ CC18_310a   : int  2 3 3 3 2 2 3 2 3 2 ...
##  $ CC18_310b   : int  2 3 3 3 2 3 3 5 3 3 ...
##  $ CC18_310c   : int  3 3 3 3 3 3 2 5 3 3 ...
##  $ CC18_310d   : int  2 5 3 3 3 2 2 2 3 3 ...
##  $ CC18_325a   : int  1 2 1 1 1 2 2 2 2 1 ...
##  $ CC18_325b   : int  2 2 1 1 1 1 2 1 2 2 ...
##  $ CC18_325c   : int  1 2 2 1 1 2 2 1 2 2 ...
##  $ CC18_325d   : int  1 1 1 1 1 1 1 1 2 1 ...

Problem 6

Filter the data the same way as in Problem 5. Use an R function to create a frequency table of the responses to the question asking whether these survey respondents are invested in the stock market.

Put your code and solution here:

dat_stock <- dat %>%
  filter(region == 1 | region == 2) %>%
  filter(gender == 1 & marstat == 1 & newsint == 1) %>%
  group_by(investor) %>%
  # n() counts rows per group; sum(investor) would add up the response codes
  summarise(n = n())

dat_stock

## # A tibble: 2 × 2
##   investor     n
##      <int> <int>
## 1        1    57
## 2        2    18
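Base R's table() is another way to count how often each response code occurs. A minimal sketch with a made-up vector (the name `responses` and its values are hypothetical, not taken from the survey data):

```r
# Hypothetical response codes (1 = yes, 2 = no); not the survey data
responses <- c(1, 2, 1, 1, 2)

# table() counts how many times each distinct value appears
freq <- table(responses)
freq

# sum() adds the code values themselves (here 7), so it is not a count
total <- sum(responses)
```

The distinction matters for coded answers: a frequency table needs the number of rows per code, not the sum of the codes.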

Problem 7

Going back to using all rows in the dataset, create a new column in the data using mutate that is equal to either 0, 1, or 2, to reflect whether the respondent supports increasing the standard deduction from 12,000 to 25,000, supports cutting the corporate income tax rate from 39 to 21 percent, or both (so, support for neither policy equals 0, one of the two policies equals 1, and both policies equals two). Name the column “tax_scale”. Hint: you’ll need to use recode() as well.

Display the first twenty elements of the new column you create.

Put your code and solution here:

dat1 <- dat %>%
  mutate(tax_scale = ifelse(CC18_325d == 1 & CC18_325a == 1,
                            2,
                            ifelse(CC18_325a == 1 | CC18_325d == 1,
                                   1,
                                   0)))

head(dat1$tax_scale, 20)

##  [1] 2 1 1 1 2 2 2 1 0 1 0 2 0 1 0 1 1 1 2 1
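The hint in the problem mentions recode(). One way that approach might look is to recode each item to 0 or 1 and add the two results. This is only a sketch on a hypothetical toy data frame (`toy`), and it assumes that 1 means support and 2 means oppose, as the solution above does:

```r
library(dplyr)

# Hypothetical toy data; assumes 1 = support and 2 = oppose
toy <- data.frame(CC18_325a = c(1, 1, 2, 2),
                  CC18_325d = c(1, 2, 1, 2))

toy <- toy %>%
  # Recode each item to 1 (support) or 0 (oppose), then add the two items
  mutate(tax_scale = recode(CC18_325a, `1` = 1, `2` = 0) +
                     recode(CC18_325d, `1` = 1, `2` = 0))

toy$tax_scale
```

Adding two 0/1 indicators gives 0, 1, or 2 directly, which avoids nesting ifelse() calls.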

Problem 8

Use a frequency table command to show how many 0s, 1s, and 2s are in the column you created in Problem 7.

Put your code and solution here:

tax_freq <- dat1 %>%
  group_by(tax_scale) %>%
  count(tax_scale)

tax_freq

## # A tibble: 3 × 2
## # Groups:   tax_scale [3]
##   tax_scale     n
##       <dbl> <int>
## 1         0   130
## 2         1   408
## 3         2   331

Problem 9

Again using all rows in the original dataset, use summarise and group_by to calculate the average (mean) job approval for President Trump in each of the four regions listed in the “region” column.

Put your code and solution here:

Trump_Approval <- dat %>%
  group_by(region) %>%
  summarise(`Mean Approval` = mean(CC18_308a))

Trump_Approval

## # A tibble: 4 × 2
##   region `Mean Approval`
##    <int>           <dbl>
## 1      1            2.77
## 2      2            2.76
## 3      3            2.71
## 4      4            3.03

Problem 10

Again starting with all rows in the original dataset, use summarise() to create a summary table for survey respondents who are not investors and who have an annual family income between $40,000 and $119,999. The table should have the mean, median, and standard deviation for the importance of religion column.

Put your code and solution here:

Religion <- dat %>%
  # Keep non-investors (investor == 2) in income brackets 5 through 10
  filter(faminc_new %in% 5:10, investor == 2) %>%
  # Bare column names refer to the filtered rows; dat$... would use all rows
  summarise(Mean = mean(pew_religimp),
            Median = median(pew_religimp),
            `Std. Deviation` = sd(pew_religimp))

Religion
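When filtering on several codes at once, note that R does not distribute `==` across `|`. An expression such as `x == 5|6|7` is parsed as `(x == 5) | 6 | 7`, and any nonzero number counts as TRUE, so every row passes. A minimal sketch with a made-up vector (`income` is hypothetical, not the survey column):

```r
# Hypothetical income codes, not the survey data
income <- c(3, 5, 8, 12)

# Parsed as (income == 5) | 6 | 7; nonzero numbers are treated as TRUE
wrong <- income == 5 | 6 | 7
wrong
# [1] TRUE TRUE TRUE TRUE

# %in% tests membership in the whole set of codes
right <- income %in% 5:10
right
# [1] FALSE  TRUE  TRUE FALSE
```

Inside filter(), the `%in%` form keeps only rows whose code falls in the listed set.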

Problem 11

Use kable() and the summarise() function to create a table with one row and three columns that provides the mean, median, and standard deviation for the column named faminc_new in the survey data.

Put your code and solution here:

Income <- dat %>%
  summarise(Mean = mean(faminc_new),
            Median = median(faminc_new),
            `Std. Deviation` = sd(faminc_new))

Income

##       Mean Median Std. Deviation
## 1 6.581128      6       3.247035

knitr::kable(Income, 
             "pipe",
             align = "c")

Mean	Median	Std. Deviation
6.581128	6	3.247035

Problem 12

With the survey data, use qplot() to make a histogram of the column named pid7. Change the x-axis label to “Seven Point Party ID” and the y-axis label to “Count”.

Note: you can ignore the “stat_bin()” message that R generates when you draw this. The setup for the code chunk will suppress the message.

Put your code and solution here:

ggplot(dat) +
  geom_histogram(aes(x = pid7)) +
  labs(x = "Seven Point Party ID",
       y = "Count")

Final Report Exercise

Macarena L. Fernandez Carro
