Load packages

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.0     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Read in vaccinations data

vaccines <- read.csv("Vaccination survey .csv")

vaccines

Glimpse data to look at variables.

glimpse(vaccines)
## Observations: 304
## Variables: 16
## $ Timestamp                                                                                                   <fct> …
## $ Gender                                                                                                      <fct> …
## $ Age                                                                                                         <fct> …
## $ Highest.Education.Level                                                                                     <fct> …
## $ Major.in.College                                                                                            <fct> …
## $ Children.                                                                                                   <fct> …
## $ What.state.are.you.from.                                                                                    <fct> …
## $ Are.you.religious.                                                                                          <int> …
## $ I.believe.children.should.be.vaccinated.                                                                    <int> …
## $ I.trust.the.information.I.receive.about.shots.                                                              <int> …
## $ I.believe.that.there.could.be.a.link.between.the.MMR.vaccination.and.autism.                                <int> …
## $ I.worry.about.possible.side.effects.of.vaccinations.                                                        <int> …
## $ I.believe.the.media.exaggerates.reports.about.disease.outbreak.and.vaccinations.                            <int> …
## $ If.I.were.to.have.a.child.today..I.would.want.them.to.have.all.of.the.recommended.vaccinations.             <int> …
## $ Healthy.children.should.be.required.to.be.vaccinated.to.attend.school.because.of.potential.risks.to.others. <int> …
## $ If.you.wish.to.expand.on.any.of.your.answers.above..do.so.here.                                             <fct> …

Change Gender to a factor for further analysis.

vaccines <- vaccines %>% 
  mutate(Gender = as.factor(Gender))


vaccines %>% 
  count(Gender)

As the table above shows, the Gender data is messy and inconsistent. We are going to clean it up so that we can analyze this factor properly.

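Recode Gender into categorical labels for further analysis. Note that the warning below indicates the levels “1” and “2” do not exist in this column, so this step leaves Gender unchanged.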

vaccines <- vaccines %>% 
  mutate(Gender = fct_recode(Gender, "Male"  = "1", 
                                  "Female"= "2"))
## Warning: Unknown levels in `f`: 1, 2
vaccines %>% 
  count(Gender)
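One way to avoid the warning above is to look at the levels a factor actually contains before choosing what to recode. The following is a minimal sketch on a small made-up vector (`gender_demo` is hypothetical and is not the survey data):

```r
library(forcats)

# Hypothetical stand-in for a free-text Gender column
gender_demo <- factor(c("Male", "f", "Female", "M", "Female"))

# List the levels that actually exist before deciding what to recode
levels(gender_demo)

# Recode only levels that are present: new label on the left, old level on the right
gender_recoded <- fct_recode(gender_demo, "Female" = "f", "Male" = "M")

# Tabulate the recoded factor
fct_count(gender_recoded)
```

Because every level named on the right-hand side exists in `gender_demo`, this call should run without the “Unknown levels” warning.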

We want to clean up the Gender category and condense its levels.

vaccines <- vaccines %>% 
  mutate(Gender_simple = fct_collapse(Gender,
                                      Male = c("M", "male", "Male", "mucho", "Republican ( Male)"),
                                      Female = c("f", "F", "Femail", "Femal", "female ", "Female ", "Female", "FEMALE", "Girl", "Girl", "Feme", "Gemale", "replace sex with gender! Female")))
## Warning: Unknown levels in `f`: FEMALE, Girl, Girl
vaccines %>% 
  count(Gender_simple)
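Several of the variants listed above differ only in capitalization or stray spaces, so normalizing the text first can shorten the list of levels that need to be collapsed. This is only a sketch of that idea on made-up values (the tibble `gender_raw` and the column `Gender_norm` are hypothetical), not the approach used in this analysis:

```r
library(dplyr)
library(stringr)
library(forcats)

# Hypothetical free-text answers with inconsistent case and spacing
gender_raw <- tibble(Gender = c("F", "female ", "Female", "M", "male", "Femal"))

gender_clean <- gender_raw %>%
  mutate(
    # Trim whitespace and lower-case so "Female " and "female" become one level
    Gender_norm = factor(str_to_lower(str_trim(Gender))),
    # Collapse the remaining spelling variants into two levels
    Gender_simple = fct_collapse(
      Gender_norm,
      Male = c("m", "male"),
      Female = c("f", "female", "femal")
    )
  )

gender_clean %>%
  count(Gender_simple)
```

Misspellings such as “Femal” still have to be listed by hand, but the case and spacing variants no longer do.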

Now let’s convert the survey question “I.believe.the.media.exaggerates.reports.about.disease.outbreak.and.vaccinations.” into a factor so we can analyze its responses.

vaccines <- vaccines %>%
  mutate(I.believe.the.media.exaggerates.reports.about.disease.outbreak.and.vaccinations. = as.factor(I.believe.the.media.exaggerates.reports.about.disease.outbreak.and.vaccinations.))

vaccines %>% 
  count()

This gives the total number of rows in the data (304 observations), which includes any missing responses for this column.
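If the goal is the number of people who actually answered a question, one option is to count only the non-missing values. A minimal sketch, assuming a made-up column `answer` in a hypothetical tibble `responses_demo`:

```r
library(dplyr)

# Hypothetical responses with one missing answer
responses_demo <- tibble(answer = factor(c("1", "5", NA, "3", "5")))

# count() with no arguments returns the total number of rows
responses_demo %>%
  count()

# sum(!is.na(...)) counts only the rows where an answer was given
answered <- responses_demo %>%
  summarise(n_answered = sum(!is.na(answer)))

answered
```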

Next we will count the number of responses for each answer choice in the survey. These responses were recorded numerically as 1, 2, 3, 4, and 5, which I translated to “Strongly Agree”, “Agree”, “Undecided/Not Sure”, “Disagree”, and “Strongly Disagree”, respectively.

vaccines <- vaccines %>%
  mutate(media_concerns = fct_recode(I.believe.the.media.exaggerates.reports.about.disease.outbreak.and.vaccinations.,
                              "Strongly Agree" = "1",
                              "Agree" = "2",
                              "Undecided/Not Sure" = "3",
                              "Disagree" = "4",
                              "Strongly Disagree" = "5",
                              NULL = "8",
                              NULL = "9"))

vaccines %>% 
  count(media_concerns)

This table displays the number of responses at each level of media_concerns.
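In fct_recode(), assigning NULL to a level removes that level, so the corresponding values become missing. The sketch below illustrates this on a small made-up vector (`codes_demo` is hypothetical):

```r
library(forcats)

# Hypothetical survey codes, including two codes (8 and 9) that should be dropped
codes_demo <- factor(c("1", "2", "5", "8", "9"))

recoded_demo <- fct_recode(codes_demo,
  "Strongly Agree" = "1",
  "Agree" = "2",
  "Strongly Disagree" = "5",
  NULL = "8",
  NULL = "9"
)

# The values that had the removed codes are now NA
sum(is.na(recoded_demo))
```

This is why drop_na() is useful before plotting a recoded column.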

You can also illustrate these responses in a bar graph.

vaccines %>% 
  drop_na(media_concerns) %>% 
  ggplot(aes(x = media_concerns)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people", x = "Response", title = "Response to the Media and Their Exaggeration on Disease Outbreaks")

We can also take a closer look at gender in relation to media_concerns. The table below shows the participants’ responses based on their gender.

vaccines %>% 
  count(Gender_simple, media_concerns)
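Raw counts can be hard to compare when the groups differ in size. One option is to add the proportion of each response within each gender. This is a sketch on made-up data (the tibble `demo` is hypothetical; only the column names match the ones used here):

```r
library(dplyr)

# Hypothetical responses for illustration only
demo <- tibble(
  Gender_simple = c("Female", "Female", "Female", "Male", "Male"),
  media_concerns = c("Agree", "Disagree", "Disagree", "Agree", "Disagree")
)

gender_props <- demo %>%
  count(Gender_simple, media_concerns) %>%
  group_by(Gender_simple) %>%
  # Share of each response within its gender group
  mutate(prop = n / sum(n)) %>%
  ungroup()

gender_props
```

Within each gender the `prop` values sum to 1, so the rows can be compared across groups directly.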

We can also mutate and change age into a factor for further analysis.

vaccines <- vaccines %>%
  mutate(Age = as.factor(Age))

vaccines %>% 
  count(Age)

Let’s also convert one more survey question into a factor.

vaccines <- vaccines %>%
  mutate(I.worry.about.possible.side.effects.of.vaccinations. = as.factor(I.worry.about.possible.side.effects.of.vaccinations.))

vaccines %>% 
  count()

This question will be called side_effects and will be coded in the same way as the media_concerns question.

vaccines <- vaccines %>%
  mutate(side_effects = fct_recode(I.worry.about.possible.side.effects.of.vaccinations.,
                              "Strongly Agree" = "1",
                              "Agree" = "2",
                              "Undecided/Not Sure" = "3",
                              "Disagree" = "4",
                              "Strongly Disagree" = "5",
                              NULL = "8",
                              NULL = "9"))

vaccines %>% 
  count(side_effects)

First, let’s graph the responses to the media_concerns question by gender, using the gender descriptions that I previously condensed. It appears that the highest number of people chose 5, meaning that they “strongly disagree” with this statement.

vaccines %>% 
  drop_na(media_concerns, Gender_simple) %>% 
  ggplot(aes(x = media_concerns, fill = Gender_simple)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Response to Media Concerns in Relation to Gender" , y = "Number of People", x = "Response")

As one can see, most female participants responded “Strongly Disagree” to the statement that the media exaggerates outbreaks. We don’t yet know whether this response affects whether parents choose to get their child vaccinated.
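If the gender groups differ in size, stacked counts can make the groups hard to compare. One alternative is to rescale each bar to proportions with position = "fill" in geom_bar(). The following is a sketch on made-up data (the data frame `plot_demo` is hypothetical), not a replacement for the graph above:

```r
library(ggplot2)

# Hypothetical responses for illustration only
plot_demo <- data.frame(
  media_concerns = c("Agree", "Disagree", "Disagree", "Agree", "Disagree"),
  Gender_simple = c("Female", "Female", "Female", "Male", "Male")
)

p <- ggplot(plot_demo, aes(x = media_concerns, fill = Gender_simple)) +
  geom_bar(position = "fill") +  # each bar is rescaled so its segments sum to 1
  coord_flip() +
  theme_minimal() +
  labs(y = "Proportion of people", x = "Response")

p
```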

The next graph shows participants’ responses to the media concerns question and how they relate to whether or not the participants have children.

vaccines %>% 
  drop_na(media_concerns) %>% 
  ggplot(aes(x = media_concerns, fill = Children.)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people", x = "Response", title = "Responses to Survey Question Based on Whether Participants Have Children")

Also, let’s condense the responses to compare agreement and disagreement on media concerns. We collapsed the agree and disagree responses into two categories, Agree_concerns and Disagree_concerns, while “Undecided/Not Sure” is left as it is.

vaccines <- vaccines %>% 
  mutate(media_concerns_simple = fct_collapse(media_concerns,
                                             Agree_concerns = c("Strongly Agree", "Agree") ,
                                             Disagree_concerns = c("Disagree", "Strongly Disagree")))

vaccines %>% 
  count(media_concerns_simple)

From this table we found that more people chose “Disagree” in response to this statement.

vaccines %>% 
  drop_na(media_concerns_simple, Children.) %>% 
  ggplot(aes(x = media_concerns_simple, fill = Children.)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people", x = "Response", title = "Simplified Responses Based on Participants Having Children")

As one can see from the graph above, participants both with and without children disagreed with the statement that the media exaggerates reports about disease outbreaks and vaccinations.

Lastly, let’s compare side effect responses to whether or not the participants have children.

vaccines %>% 
  drop_na(side_effects, Children.) %>% 
  ggplot(aes(x = side_effects, fill = Children.)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(y = "Number of people", x = "Response", title = "Side Effect Responses Based on Participants Having Children")

More participants with kids responded that they do not worry about potential side effects, which is interesting since one might expect the opposite. It also appears that those with children gave similar responses to those without children.