Problem 1

Create a vector of five numbers of your choice between 0 and 10, save that vector to an object, and use the sum() function to calculate the sum of the numbers.

# Create a vector of five numbers
my_vector <- c(2, 5, 7, 1, 3)

# Calculate the sum of the numbers
sum(my_vector)

## [1] 18

Problem 2

Create a data frame that includes two columns. One column must have the numbers 1 through 5, and the other column must have the numbers 6 through 10. The first column must be named “alpha” and the second column must be named “beta”. Name the object “my_dat”. Display the data.

Put your code and solution here:

# Create a data frame
my_dat <- data.frame(alpha = 1:5, beta = 6:10)

# Display the data
my_dat

##   alpha beta
## 1     1    6
## 2     2    7
## 3     3    8
## 4     4    9
## 5     5   10

Problem 3

Using the data frame created in Problem 2, use the summary() command to create a five-number summary for the column named “beta”.

Put your code and solution here:

#Put your code here, then delete this commented line before submission. Don't modify the setup code for this chunk - you want your code and output to both display.
# Create a summary for the column "beta"
summary(my_dat$beta)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       6       7       8       8       9      10
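Strictly speaking, summary() returns six values, because it adds the mean to Tukey's five numbers. The minimal sketch below reuses the data frame from Problem 2 to contrast summary() with base R's fivenum(), which returns only the minimum, lower hinge, median, upper hinge, and maximum:

```r
# Rebuild the Problem 2 data frame so this sketch runs on its own
my_dat <- data.frame(alpha = 1:5, beta = 6:10)

# summary() reports Min, 1st Qu., Median, Mean, 3rd Qu., Max (six values)
beta_summary <- summary(my_dat$beta)

# fivenum() reports Tukey's five-number summary (no mean)
beta_fivenum <- fivenum(my_dat$beta)

print(beta_summary)
print(beta_fivenum)   # 6 7 8 9 10
```

For this column the hinges and the quartiles coincide. On other data they can differ slightly, because fivenum() and quantile() use different rules.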

Problem 4

The setup chunk for this report (Line 13) contains code that imports the example survey data automatically. Using that data, make a boxplot of the Family Income column with the base R boxplot() function (not a figure drawn using qplot()). Include your name in the plot title, and relabel the x-axis as “Family Income”.

Hint: consult the codebook to identify the correct column name for the family income question.

Put your code and solution here:

# Boxplot of Family Income
boxplot(dat$faminc_new, main = "Family Income Boxplot by Carlos Galvsi", xlab = "Family Income")
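The box and whiskers that boxplot() draws come from boxplot.stats(), and calling that function directly shows the numbers behind the figure. The following minimal sketch uses a made-up vector (not the survey data), with one deliberately extreme value so that the outlier rule is visible:

```r
# Made-up values for illustration only; 100 is deliberately extreme
x <- c(1:9, 100)

bs <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)   # 1.0 3.0 5.5 8.0 9.0

# Points more than 1.5 * IQR beyond the hinges, drawn individually
print(bs$out)     # 100
```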

Problem 5

Using the survey data, use filter() to keep only male respondents who live in the Northeast or Midwest of the United States, are married, and report being interested in the news most of the time.

Use the str() function to provide information about the resulting dataset.

Put your code and solution here:

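# Subset the survey data:
# gender 1 = male, region 1/2 = Northeast/Midwest, marstat 1 = married,
# newsint 1 = follows the news most of the time
filtered_data <- dat %>%
  filter(gender == 1 & (region == 1 | region == 2) & marstat == 1 & newsint == 1)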

# Display the structure of the resulting dataset
str(filtered_data)

## 'data.frame':    75 obs. of  25 variables:
##  $ caseid      : int  420208067 412948037 411855595 414480371 416806180 414729651 412021973 412348831 412929385 412047867 ...
##  $ region      : int  2 1 1 1 2 2 1 2 1 2 ...
##  $ gender      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ educ        : int  3 5 6 5 3 2 6 5 5 5 ...
##  $ edloan      : int  2 1 2 1 1 2 2 2 2 2 ...
##  $ race        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ hispanic    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ employ      : int  1 1 1 1 1 5 5 1 1 1 ...
##  $ marstat     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pid7        : int  4 1 4 4 6 2 1 1 1 7 ...
##  $ ideo5       : int  5 1 3 3 3 1 2 3 1 5 ...
##  $ pew_religimp: int  4 4 4 1 2 4 4 2 4 1 ...
##  $ newsint     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ faminc_new  : int  10 11 10 9 10 2 13 8 7 11 ...
##  $ union       : int  1 2 3 3 1 3 2 2 2 3 ...
##  $ investor    : int  2 1 1 1 1 2 1 2 1 2 ...
##  $ CC18_308a   : int  4 4 4 1 2 3 4 4 4 1 ...
##  $ CC18_310a   : int  2 3 3 3 2 2 3 2 3 2 ...
##  $ CC18_310b   : int  2 3 3 3 2 3 3 5 3 3 ...
##  $ CC18_310c   : int  3 3 3 3 3 3 2 5 3 3 ...
##  $ CC18_310d   : int  2 5 3 3 3 2 2 2 3 3 ...
##  $ CC18_325a   : int  1 2 1 1 1 2 2 2 2 1 ...
##  $ CC18_325b   : int  2 2 1 1 1 1 2 1 2 2 ...
##  $ CC18_325c   : int  1 2 2 1 1 2 2 1 2 2 ...
##  $ CC18_325d   : int  1 1 1 1 1 1 1 1 2 1 ...
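The two region conditions can also be written with %in%, which is more compact. In filter(), comma-separated conditions are combined with AND. The minimal sketch below uses a small invented data frame whose column names mirror the survey data and assumes dplyr is installed. It checks that both forms keep the same rows, including when region is missing:

```r
library(dplyr)

# Invented rows for illustration; codes follow the filter in Problem 5
toy <- data.frame(
  gender  = c(1, 1, 2, 1, 1),
  region  = c(1, 2, 1, 3, NA),
  marstat = c(1, 1, 1, 1, 1),
  newsint = c(1, 1, 1, 1, 1)
)

# Explicit OR of the two region codes, all conditions joined with &
with_or <- toy %>%
  filter(gender == 1 & (region == 1 | region == 2) & marstat == 1 & newsint == 1)

# Same subset: %in% for the region codes, commas instead of &
with_in <- toy %>%
  filter(gender == 1, region %in% c(1, 2), marstat == 1, newsint == 1)

identical(with_or, with_in)   # TRUE
```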

Problem 6

Filter the data as in Problem 5. Use an R function to create a frequency table of the responses to the question asking whether these respondents are invested in the stock market.

Put your code and solution here:

# Frequency table for stock market investment
table(filtered_data$investor)

## 
##  1  2 
## 57 18
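table() reports raw counts. prop.table() converts them to shares, and turning the codes into a labeled factor makes the output easier to read. The minimal sketch below rebuilds the 57/18 split printed above from the counts alone. It assumes the coding 1 = investor and 2 = not an investor:

```r
# Recreate the investor responses from the counts printed above
investor <- rep(c(1, 2), times = c(57, 18))

# Assumed labels: 1 = investor, 2 = not an investor
investor_f <- factor(investor, levels = c(1, 2), labels = c("Yes", "No"))

counts <- table(investor_f)
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))   # 57/75 = 0.76 and 18/75 = 0.24
```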

Problem 7

Going back to using all rows in the dataset, use mutate() to create a new column equal to 0, 1, or 2 that counts how many of two tax policies the respondent supports: increasing the standard deduction from $12,000 to $25,000, and cutting the corporate income tax rate from 39 to 21 percent. Support for neither policy equals 0, support for one of the two equals 1, and support for both equals 2. Name the column “tax_scale”. Hint: you’ll need to use recode() as well.

Display the first twenty elements of the new column you create.

Put your code and solution here:

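# Create a new column using mutate and recode: convert each item to a support
# indicator (1 = support -> 1, 2 = oppose -> 0, missing or other -> 0),
# then add the two indicators so tax_scale counts supported policies (0, 1, or 2)
survey_data <- dat %>%
  mutate(tax_scale = recode(CC18_325a, `1` = 1, `2` = 0, .default = 0, .missing = 0) +
           recode(CC18_325d, `1` = 1, `2` = 0, .default = 0, .missing = 0))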

# Display the first twenty elements of the new column
head(survey_data$tax_scale, 20)

##  [1] 2 0 0 0 2 2 2 0 0 0 0 2 0 0 0 0 0 0 2 0
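Assume each item is coded 1 = support and 2 = oppose. Under that coding, adding the raw codes does not count supporters. Support plus oppose gives 1 + 2 = 3, and a 0/1/2 recode sends that value to the default instead of to 1. The minimal sketch below uses a made-up data frame and assumes dplyr is installed. It compares adding the raw codes with first recoding each item to a 0/1 support indicator:

```r
library(dplyr)

# Made-up responses; assumes 1 = support, 2 = oppose, NA = skipped
toy <- data.frame(
  CC18_325a = c(1, 1, 2, 2, NA),
  CC18_325d = c(1, 2, 1, 2, 1)
)

toy <- toy %>%
  mutate(
    # Add raw codes, then recode: a 1 + 2 = 3 mix falls through to .default
    raw_sum_scale = recode(coalesce(CC18_325a, 0) + coalesce(CC18_325d, 0),
                           `0` = 0, `1` = 1, `2` = 2, .default = 0),
    # Recode each item to a 0/1 support indicator first, then add
    tax_scale = recode(CC18_325a, `1` = 1, `2` = 0, .missing = 0) +
                recode(CC18_325d, `1` = 1, `2` = 0, .missing = 0)
  )

toy
# raw_sum_scale: 2 0 0 0 1   tax_scale: 2 1 1 0 1
```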

Problem 8

Use a frequency table command to show how many 0s, 1s, and 2s are in the column you created in Problem 7.

Put your code and solution here:

# Frequency table for tax_scale
table(survey_data$tax_scale)

## 
##   0   2 
## 538 331
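By default, table() lists only the values that actually occur, so a category with no observations disappears from the output. Converting to a factor with explicit levels keeps every possible score visible, including zero counts. The following minimal sketch uses a made-up vector:

```r
# Made-up scores in which no respondent has a score of 1
scores <- c(0, 2, 2, 0, 0)

table(scores)                                  # lists only 0 and 2

tab_all <- table(factor(scores, levels = 0:2)) # lists 0, 1, and 2
tab_all                                        # counts 3, 0, 2
```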

Problem 9

Again using all rows in the original dataset, use summarise() and group_by() to calculate the average (mean) job approval rating for President Trump in each of the four regions listed in the “region” column.

Put your code and solution here:

# Group by region and calculate the average job approval
survey_data %>%
  group_by(region) %>%
  summarise(average_approval = mean(CC18_308a, na.rm = TRUE))

## # A tibble: 4 × 2
##   region average_approval
##    <int>            <dbl>
## 1      1             2.77
## 2      2             2.76
## 3      3             2.71
## 4      4             3.03
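With na.rm = TRUE, missing answers are dropped from each mean without warning, so it helps to report how many responses sit behind each average. The minimal sketch below uses made-up data and assumes dplyr is installed. It adds the group size with n() and the number of non-missing answers with sum(!is.na()):

```r
library(dplyr)

# Invented responses; NA stands in for a skipped question
toy <- data.frame(
  region    = c(1, 1, 2, 2, 2),
  CC18_308a = c(1, 3, 4, NA, 2)
)

by_region <- toy %>%
  group_by(region) %>%
  summarise(
    average_approval = mean(CC18_308a, na.rm = TRUE),
    n_rows     = n(),                    # all rows in the group
    n_answered = sum(!is.na(CC18_308a))  # rows that contributed to the mean
  )

by_region
```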

Problem 10

Again starting with all rows in the original dataset, use summarise() to create a summary table for survey respondents who are not investors and who have an annual family income between $40,000 and $119,999. The table should show the mean, median, and standard deviation of the importance-of-religion column.

Put your code and solution here:

# Filter and summarise the data:
# investor == 2 keeps non-investors; faminc_new codes 5-10 cover $40,000-$119,999
survey_data %>%
  filter(investor == 2 & faminc_new > 4 & faminc_new < 11) %>%
  summarise(mean_religion = mean(pew_religimp, na.rm = TRUE),
            median_religion = median(pew_religimp, na.rm = TRUE),
            sd_religion = sd(pew_religimp, na.rm = TRUE))

##   mean_religion median_religion sd_religion
## 1      2.325893               2    1.188906
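The income condition keeps codes 5 through 10, the codes that the filter above maps to $40,000 to $119,999. dplyr's between() expresses the same inclusive range directly. The minimal sketch below assumes dplyr is installed and, for illustration only, that the codes run from 1 to 16. It checks that the two forms agree:

```r
library(dplyr)

# Illustrative range of income codes (an assumption for this sketch)
faminc_codes <- 1:16

strict_form  <- faminc_codes > 4 & faminc_codes < 11
between_form <- between(faminc_codes, 5, 10)   # inclusive on both ends

identical(strict_form, between_form)   # TRUE
faminc_codes[between_form]             # 5 6 7 8 9 10
```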

Problem 11

Use kable() and the summarise() function to create a table with one row and three columns that provides the mean, median, and standard deviation for the column named faminc_new in the survey data.

Put your code and solution here:

# Create a summary table with kable
summary_table <- survey_data %>%
  summarise(mean_faminc = mean(faminc_new, na.rm = TRUE),
            median_faminc = median(faminc_new, na.rm = TRUE),
            sd_faminc = sd(faminc_new, na.rm = TRUE))

kable(summary_table)

| mean_faminc| median_faminc| sd_faminc|
|-----------:|-------------:|---------:|
|    6.581128|             6|  3.247035|
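kable() also accepts `digits` and `col.names`, which round the statistics and give the columns readable headers. The minimal sketch below assumes knitr is installed and rebuilds the one-row table from the values printed above. It sets `format = "pipe"` so that the output is the same whether or not the code runs inside a knitted document:

```r
library(knitr)

# One-row summary rebuilt from the values printed above
summary_table <- data.frame(
  mean_faminc   = 6.581128,
  median_faminc = 6,
  sd_faminc     = 3.247035
)

faminc_table <- kable(
  summary_table,
  format    = "pipe",
  digits    = 2,                          # round numeric columns to 2 places
  col.names = c("Mean", "Median", "SD")   # human-readable headers
)

faminc_table
```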

Problem 12

With the survey data, use qplot() to make a histogram of the column named pid7. Change the x-axis label to “Seven Point Party ID” and the y-axis label to “Count”.

Note: you can ignore the “stat_bin()” message that R generates when you draw this. The setup for the code chunk will suppress the message.

Put your code and solution here:

# Histogram of pid7
qplot(survey_data$pid7, geom = "histogram", xlab = "Seven Point Party ID", ylab = "Count")

## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
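The warning above shows that qplot() is deprecated as of ggplot2 3.4.0. The same histogram can be built with ggplot() and geom_histogram(). The minimal sketch below assumes ggplot2 is installed and uses a made-up stand-in for survey_data. `binwidth = 1` is an added choice that gives each of the seven party-ID codes its own bar:

```r
library(ggplot2)

# Invented stand-in for survey_data with seven-point party ID codes 1-7
toy <- data.frame(pid7 = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7))

p <- ggplot(toy, aes(x = pid7)) +
  geom_histogram(binwidth = 1) +      # one bar per integer code
  labs(x = "Seven Point Party ID", y = "Count")

p
```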