STEP 1 (Loading the R Packages and the Data)

library(readr)
## Warning: package 'readr' was built under R version 3.6.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
data <- read.csv("F:/Skills Drill 2 COVID Survey Data.csv")

STEP 2 (Labeling the Data and Storing the Data)

new_data <- data%>%
  mutate(Likelihood_Infected=ifelse(Likelihood_Infected==0, "Not likely at all",
                             ifelse(Likelihood_Infected==1,"Not too likely",
                             ifelse(Likelihood_Infected==2, "Somewhat likely",
                             ifelse(Likelihood_Infected==3,"Very likely",
                             ifelse(Likelihood_Infected==4,"I have already contracted the virus",NA))))))
new_data <- new_data%>%
  mutate(Facemask_Wear = ifelse(Facemask_Wear==0, "No",
                         ifelse(Facemask_Wear==1,"Yes", NA)))
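
The nested ifelse() calls above can also be written as a named lookup vector, which keeps each code next to its label. The following is a minimal base R sketch, not the method used in this assignment: the `codes` vector is hypothetical toy data standing in for the survey column, and the names `likelihood_labels` and `labelled` are illustrative.

```r
# Hypothetical toy codes standing in for the survey column (not the real data)
codes <- c(0, 3, 4, 2, 1, 9, NA)

# Named lookup: names are the numeric codes as strings, values are the labels
likelihood_labels <- c(
  "0" = "Not likely at all",
  "1" = "Not too likely",
  "2" = "Somewhat likely",
  "3" = "Very likely",
  "4" = "I have already contracted the virus"
)

# Index the lookup by code; unmatched codes (here 9) and missing codes become NA
labelled <- unname(likelihood_labels[as.character(codes)])
print(labelled)
```

As with the final NA in the nested ifelse() version, any code outside 0 to 4 ends up as NA rather than raising an error.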

STEP 2 (Extra Credit)

new_data <- new_data%>%
  mutate(Likelihood_Infected = factor(Likelihood_Infected,
                                      levels=c("I have already contracted the virus",
                                               "Very likely",
                                               "Somewhat likely",
                                               "Not too likely",
                                               "Not likely at all")))

STEP 3 (Comparing the Perceived Likelihood of Contracting the Virus with Facemask Wearing Behavior)

prop.table(table(new_data$Likelihood_Infected, new_data$Facemask_Wear),1)
##                                      
##                                              No       Yes
##   I have already contracted the virus 0.2142857 0.7857143
##   Very likely                         0.1851852 0.8148148
##   Somewhat likely                     0.2148148 0.7851852
##   Not too likely                      0.2848837 0.7151163
##   Not likely at all                   0.2812500 0.7187500
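Based on the table generated above, about 81% of respondents who think they are very likely to contract the coronavirus wear face masks, compared with about 72% of those who think they are not too likely or not likely at all to contract it.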
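
The second argument of prop.table() decides which totals the counts are divided by, and that determines how the percentages should be read. The sketch below uses hypothetical toy responses (not the survey data) to contrast row proportions with column proportions; the variable names are illustrative.

```r
# Hypothetical toy responses (not the survey data)
likelihood <- c("Very likely", "Very likely", "Very likely", "Very likely",
                "Not too likely", "Not too likely", "Not too likely", "Not too likely")
mask <- c("Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "No")

counts <- table(likelihood, mask)

# margin = 1 divides each cell by its row total, so every row sums to 1:
# "of the people in this likelihood group, what share wear a mask?"
row_props <- prop.table(counts, margin = 1)
print(row_props)

# margin = 2 divides by column totals instead:
# "of the people who wear a mask, what share are in this likelihood group?"
col_props <- prop.table(counts, margin = 2)
print(col_props)
```

The table in this step uses margin 1, so each percentage describes mask wearing within a likelihood group, not the other way around.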

STEP 3 (Visualization)

new_data%>%
  group_by(Likelihood_Infected, Facemask_Wear)%>%
  summarize(n=n())%>%
  mutate(percent =n/sum(n))%>%
  ggplot()+geom_col(aes(x=Likelihood_Infected, y=percent, fill=Facemask_Wear))

According to the stacked bar chart, respondents who consider themselves more likely to contract the coronavirus are somewhat more likely to say they wear face masks.

STEP 4 (Comparing the Perceived Likelihood of Contracting the Virus with the Average Household Size)

new_data%>%
  group_by(Likelihood_Infected)%>%
  summarize(avg_householdsize = mean(Household_Size, na.rm=TRUE))
## # A tibble: 5 x 2
##   Likelihood_Infected                 avg_householdsize
##   <fct>                                           <dbl>
## 1 I have already contracted the virus              4.36
## 2 Very likely                                      4.04
## 3 Somewhat likely                                  3.91
## 4 Not too likely                                   3.66
## 5 Not likely at all                                3.24
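Based on the table and the averages computed above, people who live in larger households tend to believe that they are more likely to contract the virus or that they have already contracted it. The average household size falls steadily from 4.36 among those who have already contracted the virus to 3.24 among those who think they are not likely at all to contract it.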
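
The na.rm=TRUE argument matters for these averages because a single missing household size would otherwise turn a whole group's mean into NA. A minimal base R sketch with hypothetical toy data (not the survey data) shows the difference; `toy`, `group`, and `household_size` are illustrative names.

```r
# Hypothetical toy data (not the survey data)
toy <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  household_size = c(2, 4, 3, NA, 5)
)

# Without na.rm, one NA makes the whole group mean NA
with_na <- tapply(toy$household_size, toy$group, mean)

# With na.rm = TRUE, missing values are dropped before averaging
without_na <- tapply(toy$household_size, toy$group, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```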

STEP 5 (Comparison of the Perceived Likelihood of Contracting COVID-19 by Household Size Using Histograms)

new_data%>%
  ggplot()+geom_histogram(aes(x=Household_Size))+
  facet_wrap(~Likelihood_Infected)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 3 rows containing non-finite values (stat_bin).

Based on the histograms above, people who live in smaller households tend to believe they are either somewhat likely or not too likely to contract the coronavirus.
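
Because household size is a whole number, the default 30 bins noted in the message above leave gaps between bars. Setting binwidth to 1 in geom_histogram(), as the message suggests, would give one bar per household size. The same counts can be checked numerically with a cross-tabulation, sketched here in base R on hypothetical toy data (not the survey data).

```r
# Hypothetical toy data (not the survey data)
toy <- data.frame(
  likelihood = c("Very likely", "Very likely", "Very likely",
                 "Not too likely", "Not too likely", "Not too likely"),
  household_size = c(4, 5, 4, 2, 3, NA)
)

# One row per likelihood group, one column per household size.
# table() leaves out NA values by default, so the row with a missing
# household size is not counted.
size_counts <- table(toy$likelihood, toy$household_size)
print(size_counts)
```

Each row of the result corresponds to one facet of the histogram, with one count per household size.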