library(tidyverse)
library(corrr)
library(plotly)
library(ggplot2)
library(dplyr)
library(reshape2)
library(gplots)
library(data.table)
library(ggpubr)
library(xtable)
library(corrplot)
library(corrgram)

A survey was conducted to examine how different subsets of people view childhood vaccinations. Respondents were grouped by sex, age, education level, whether or not they had children, and location (state). They were asked whether they believed in vaccinating, whether they trusted the information they receive about vaccinations, whether they believed in a link between the MMR vaccination and autism, and how concerned they were about side effects. The survey was distributed via a link, which means that participants volunteered to take it. They were not compensated.

The link to the responses appears below, in the call that imports the data.

survey <- read_csv("https://docs.google.com/spreadsheets/d/138AJq8uaPl0XlF3Bjj3q0g_Pb3RbmBQLFazj4hhJalw/export?format=csv")
Parsed with column specification:
cols(
  Timestamp = col_character(),
  `What is your sex?` = col_character(),
  `What is your age?` = col_character(),
  `What is your highest education completed?` = col_character(),
  `If you went to college, what was your major?` = col_character(),
  `Do you have children?` = col_character(),
  `What state are you from?` = col_character(),
  `Are you religious?` = col_double(),
  `I believe children should be vaccinated.` = col_double(),
  `I trust the information I receive about shots.` = col_double(),
  `I believe that there could be a link between the MMR vaccination and autism.` = col_double(),
  `I worry about possible side effects of vaccinations.` = col_double(),
  `I believe the media exaggerates reports about disease outbreak and vaccinations.` = col_double(),
  `If I were to have a child today, I would want them to have all of the recommended vaccinations.` = col_double(),
  `Healthy children should be required to be vaccinated to attend school because of potential risks to others.` = col_double(),
  `If you wish to expand on any of your answers above, do so here:` = col_character()
)

This glimpse provides an overview of the data in its raw form. Because participants typed several answers as free text, the data is not yet formatted in a way that allows individuals to be grouped as intended.

glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp                                                                                                     <chr> …
$ `What is your sex?`                                                                                           <chr> …
$ `What is your age?`                                                                                           <chr> …
$ `What is your highest education completed?`                                                                   <chr> …
$ `If you went to college, what was your major?`                                                                <chr> …
$ `Do you have children?`                                                                                       <chr> …
$ `What state are you from?`                                                                                    <chr> …
$ `Are you religious?`                                                                                          <dbl> …
$ `I believe children should be vaccinated.`                                                                    <dbl> …
$ `I trust the information I receive about shots.`                                                              <dbl> …
$ `I believe that there could be a link between the MMR vaccination and autism.`                                <dbl> …
$ `I worry about possible side effects of vaccinations.`                                                        <dbl> …
$ `I believe the media exaggerates reports about disease outbreak and vaccinations.`                            <dbl> …
$ `If I were to have a child today, I would want them to have all of the recommended vaccinations.`             <dbl> …
$ `Healthy children should be required to be vaccinated to attend school because of potential risks to others.` <dbl> …
$ `If you wish to expand on any of your answers above, do so here:`                                             <chr> …

The question text used as column names was replaced with short names to simplify the analysis and the code.

survey <- survey %>%
  rename(Sex = `What is your sex?`,
         Age = `What is your age?`,
         CollegeEducation = `What is your highest education completed?`,
         CollegeMajor = `If you went to college, what was your major?`,
         Children = `Do you have children?`,
         HomeState = `What state are you from?`,
         Religious = `Are you religious?`,
         VaccinationBeliefs = `I believe children should be vaccinated.`,
         TrustIt = `I trust the information I receive about shots.`,
         AutismLink = `I believe that there could be a link between the MMR vaccination and autism.`,
         SideEffectsWorry = `I worry about possible side effects of vaccinations.`,
         Exaggeration = `I believe the media exaggerates reports about disease outbreak and vaccinations.`,
         WantVaccinations = `If I were to have a child today, I would want them to have all of the recommended vaccinations.`,
         RequiredVaccinations = `Healthy children should be required to be vaccinated to attend school because of potential risks to others.`,
         Expansion = `If you wish to expand on any of your answers above, do so here:`)

A review of data in altered form.

glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp            <chr> "2/19/2019 19:39:42", "2/19/2019 20…
$ Sex                  <chr> "female", "Female", "Female", "Fema…
$ Age                  <chr> "21", "26", "40", "41", "31", "55",…
$ CollegeEducation     <chr> "Some college", "Some college", "So…
$ CollegeMajor         <chr> "criminal justice", "Psychology and…
$ Children             <chr> "No", "Yes", "Yes", "Yes", "No", "N…
$ HomeState            <chr> "montana", "Missouri", "Montana", "…
$ Religious            <dbl> 1, 4, 4, 2, 3, 3, 3, 2, 1, 2, 4, 4,…
$ VaccinationBeliefs   <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1,…
$ TrustIt              <dbl> 2, 1, 4, 2, 2, 1, 2, 1, 1, 3, 2, 1,…
$ AutismLink           <dbl> 5, 5, 5, 3, 3, 3, 5, 5, 5, 3, 5, 5,…
$ SideEffectsWorry     <dbl> 5, 5, 2, 4, 3, 3, 3, 5, 4, 3, 4, 4,…
$ Exaggeration         <dbl> 1, 3, 2, 4, 4, 3, 4, 5, 1, 3, 3, 1,…
$ WantVaccinations     <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ RequiredVaccinations <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ Expansion            <chr> NA, NA, NA, NA, "I would do my rese…

Count of sex. Note the variations that exist.

survey %>%
  count(Sex)

To correct and consistently categorize the data, the variant responses were merged into their appropriate category. For example, “F” and “Female” were both recoded as “female”.

survey <- survey %>%
  mutate(Sex = str_to_lower(Sex))
# Values are already lower case at this point, so only lower-case keys can match.
survey$Sex <- recode(survey$Sex, "m" = "male",
                                 "f" = "female", 
                                 "femail" = "female", 
                                 "femal" = "female", 
                                 "feme" = "female",
                                 "gemale" = "female",
                                 "girl" = "female",
                                 "replace sex with gender! female" = "female",
                                 "republican ( male)" = "male",
                                 "mucho" = "male",
                                 "apache attack helicopter" = "other",
                                 "californian" = "other",
                                 "not enough" = "other",
                                 "snap-on tool box" = "other",
                                 "the rough, passionate kind" = "other")

Count of the different sex categories

survey %>%
  count(Sex)

Table of sex by views on vaccination (1 = strongly agree; 5 = strongly disagree)

tbl = table(survey$Sex, survey$VaccinationBeliefs) 
tbl
        
           1   2   3   4   5
  female 192  21  12   5  18
  male    36   3   6   1   3
  other    4   1   0   0   0
glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp            <chr> "2/19/2019 19:39:42", "2/19/2019 20…
$ Sex                  <chr> "female", "female", "female", "fema…
$ Age                  <chr> "21", "26", "40", "41", "31", "55",…
$ CollegeEducation     <chr> "Some college", "Some college", "So…
$ CollegeMajor         <chr> "criminal justice", "Psychology and…
$ Children             <chr> "No", "Yes", "Yes", "Yes", "No", "N…
$ HomeState            <chr> "montana", "Missouri", "Montana", "…
$ Religious            <dbl> 1, 4, 4, 2, 3, 3, 3, 2, 1, 2, 4, 4,…
$ VaccinationBeliefs   <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1,…
$ TrustIt              <dbl> 2, 1, 4, 2, 2, 1, 2, 1, 1, 3, 2, 1,…
$ AutismLink           <dbl> 5, 5, 5, 3, 3, 3, 5, 5, 5, 3, 5, 5,…
$ SideEffectsWorry     <dbl> 5, 5, 2, 4, 3, 3, 3, 5, 4, 3, 4, 4,…
$ Exaggeration         <dbl> 1, 3, 2, 4, 4, 3, 4, 5, 1, 3, 3, 1,…
$ WantVaccinations     <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ RequiredVaccinations <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ Expansion            <chr> NA, NA, NA, NA, "I would do my rese…

Chi-squared test of independence to determine significance.

chisq.test(tbl) 
Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  tbl
X-squared = 5.8045, df = 8, p-value = 0.6691

To be statistically significant at the .05 level, the p-value must be below .05. Here the p-value was 0.6691, so the sex categories do not differ significantly in their views, and this sample provides no evidence that sex is associated with views on vaccination. The warning that the chi-squared approximation may be incorrect reflects the very small counts in several cells, so the result should be read with caution.
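
The approximation warning can be examined directly. The sketch below re-enters the counts from the table printed above, looks at the expected cell counts, and computes a Monte Carlo p-value as one possible alternative. The object names `tbl_sex`, `expected`, and `sim` and the seed are assumptions made for illustration; this is not part of the original analysis.

```r
# Counts copied from the sex-by-belief table printed above.
tbl_sex <- matrix(
  c(192, 21, 12, 5, 18,
     36,  3,  6, 1,  3,
      4,  1,  0, 0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Sex = c("female", "male", "other"),
                  Belief = as.character(1:5))
)

# Expected counts under independence. Cells below 5 are what
# trigger the "approximation may be incorrect" warning.
expected <- suppressWarnings(chisq.test(tbl_sex))$expected
round(expected, 2)
sum(expected < 5)

# Monte Carlo p-value, which does not rely on the large-sample approximation.
set.seed(1)  # arbitrary seed, used only for reproducibility
sim <- chisq.test(tbl_sex, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

If the simulated p-value leads to the same conclusion as the asymptotic one, the warning matters less in practice for this table.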

The following is an analysis of age versus beliefs on vaccination:

survey<-survey%>%
  mutate(Age = as.numeric(Age)) %>%
  mutate(Children = as_factor(Children))
NAs introduced by coercion

Mean vaccination agreement by whether or not the participant has children (yes or no).

survey %>%
  drop_na(Children, VaccinationBeliefs) %>%
  group_by(Children) %>%
  summarize(VaccinationBeliefs = mean(VaccinationBeliefs))

Mean agreement with required vaccinations at each age.

survey %>%
  drop_na(Age, RequiredVaccinations) %>%
  group_by(Age) %>%
  summarize(RequiredVaccinations = mean(RequiredVaccinations))

Chi-squared analysis of whether age is associated with a person’s beliefs on vaccinating children.

tbl = table(survey$Age, survey$VaccinationBeliefs) 
chisq.test(tbl)
Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  tbl
X-squared = 41.257, df = 16, p-value = 0.0005088

The p-value is less than .05; therefore age is significantly associated with beliefs on vaccination, although the approximation warning again calls for caution.

Different age groups are established.

setDT(survey)
# Bin the numeric ages in one step; intervals are left-closed: [20, 30), [30, 40), ...
survey[, Age := cut(Age,
                    breaks = c(-Inf, 20, 30, 40, 50, 60, Inf),
                    labels = c("under 20", "20-29", "30-39", "40-49", "50-59", "60 and over"),
                    right = FALSE)]

survey %>%
  drop_na(Age, VaccinationBeliefs) %>%
  count(Age, VaccinationBeliefs)

Visual Representation of age groups that believe in childhood vaccinations.

survey %>% 
  drop_na(Age, VaccinationBeliefs) %>% 
  ggplot(aes(x = VaccinationBeliefs, fill = Age)) +
  scale_fill_viridis_d() +
  coord_flip() +
  geom_bar(position = "fill")+
  labs(title = "Difference in beliefs based off of different age categories")

1 denotes strongly agree, whereas 5 denotes strongly disagree.

Individuals in the 20-29 and 50-59 age range tend to agree more with vaccinations.

An analysis on whether or not having children makes a difference in beliefs on vaccinating.

survey %>%
  drop_na(Children, VaccinationBeliefs) %>%
  group_by(Children) %>%
  summarize(VaccinationBeliefs = mean(VaccinationBeliefs)) %>%
  ggplot(aes(x = Children, y = VaccinationBeliefs)) +
  geom_col()

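Individuals with children have a higher mean score than those without. Because 1 denotes strongly agree, the higher mean indicates that they agree with vaccinations somewhat less strongly, although not by a large measure.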

t.test(survey$VaccinationBeliefs~survey$Children)

    Welch Two Sample t-test

data:  survey$VaccinationBeliefs by survey$Children
t = -2.7542, df = 237.19, p-value = 0.00634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6365902 -0.1056679
sample estimates:
 mean in group No mean in group Yes 
         1.366460          1.737589 

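The t-test shows that the difference in means is significant at an alpha of .05 (t = -2.75, p = 0.006). Because 1 denotes strongly agree, the higher mean for participants who have children (1.74 versus 1.37) indicates that they agree with vaccinating children somewhat less strongly than those who do not.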

Analysis of the relationship between vaccination beliefs and belief in a link between the MMR vaccination and autism spectrum disorder

cor.test(survey$VaccinationBeliefs, survey$AutismLink, method = "pearson")

    Pearson's product-moment correlation

data:  survey$VaccinationBeliefs and survey$AutismLink
t = -13.857, df = 300, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6889943 -0.5506707
sample estimates:
      cor 
-0.624709 

There is a moderately strong negative correlation (r = -0.62) between agreement with childhood vaccination and belief in a link between the MMR vaccination and autism. Participants who believe that the vaccination could lead to autism spectrum disorder are less likely to agree that children should be vaccinated. The survey measures stated beliefs, not vaccination behavior.
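
Because both items are 5-point ordinal scales, a rank-based correlation is a reasonable cross-check on the Pearson result. The sketch below uses a small set of made-up responses (the vectors `belief` and `autism` are hypothetical, not the survey data) to show the call.

```r
# Hypothetical 1-5 Likert responses, NOT the survey data.
belief <- c(1, 1, 2, 1, 3, 5, 4, 1, 2, 1)
autism <- c(5, 5, 4, 5, 3, 1, 2, 4, 3, 5)

# Spearman's rho works on ranks. exact = FALSE avoids the warning about
# ties, which are unavoidable on a 5-point scale.
res <- cor.test(belief, autism, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```

On the real data the same call would take `survey$VaccinationBeliefs` and `survey$AutismLink` as its first two arguments.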

ggscatter(survey, x = "VaccinationBeliefs", y = "AutismLink",
          color = "black",shape = 1, size = 1,
          add = "reg.line",
          add.params = list(color = "blue", fill = "light blue"),
          conf.int = TRUE,
          cor.coef = TRUE, 
          cor.method = "pearson",
          xlab = "Believe in childhood vaccinations", 
          ylab = "MMR vaccination is linked to Autism",
          caption="Note:1=Strongly agree; 5=Strongly disagree")

An analysis of whether or not being religious has an impact on agreement with vaccinations

cor.test(survey$VaccinationBeliefs, survey$Religious,  method = "pearson")

    Pearson's product-moment correlation

data:  survey$VaccinationBeliefs and survey$Religious
t = 0.47608, df = 302, p-value = 0.6344
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.0853704  0.1394476
sample estimates:
       cor 
0.02738489 

In this sample there is no significant correlation between religiosity and beliefs in vaccinating (r = 0.03, p = 0.63).

Data separated by region/state

survey$HomeState <- recode(survey$HomeState, "montana" = "Montana",
                                 "Missouri" = "Missouri", 
                                 "406" = "Montana", 
                                 "Ca" = "California", 
                                 "CA" = "California", 
                                 "Calif" = "California",
                                 "California" = "California",
                                 "Co" = "Colorado",
                                 "Co Down" = "Colorado",
                                 "Colrado" = "Colorado",
                                 "Hard to say. MT I guess" = "Montana",
                                 "Grew up in Illinois" = "Illinois",
                                 "From Bulgaria actually" = "International",
                                 "IL" = "Illinois",
                                 "illinois/now living in california." = "Illinois",
                                 "International Student" = "International",
                                 "Mont9" = "Montana",
                                 "Va" = "Virginia",
                                 "VA" = "Virginia",
                                 "wa" = "Washington",
                                 "Wa" = "Washington",
                                 "WA" = "Washington",
                                 "Wa." = "Washington",
                                 "TX" = "Texas",
                                 "WAshington" = "Washington",
                                 "province of Ontario, in Canada" = "International",
                                 "OR" = "Oregon",
                                 "PA" = "Pennsylvania",
                                 "ID" = "Idaho",
                                 "Md" = "Maryland",
                                 "MN" = "Minnesota",
                                 "MO" = "Missouri",
                                 "MONTANA" = "Montana",
                                 "mt" = "Montana",
                                 "Mt" = "Montana", 
                                 "MT" = "Montana",
                                 "N.Y." = "New York",
                                 "Omaha" = "Nebraska",
                                 "SC" = "South Carolina",
                                 "SD" = "South Dakota",
                                 "Switzerland" = "International",
                                 "Not" = "International")

Counts of location table.

survey %>%
  count(HomeState) %>%
  drop_na()
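
The state clean-up above lists every spelling variant by hand. One way to shorten such a list is to normalize case, whitespace, and trailing periods first and then look the result up in a small table. The sketch below uses hypothetical raw answers and a hypothetical lookup (`raw_state`, `state_lookup`), not the survey data, and relies only on base R.

```r
# Hypothetical raw answers, similar in style to the free-text state field.
raw_state <- c("montana", "MT", " Mt ", "Wa.", "WA", "California", "Calif")

# Lookup keyed on the normalized form: trimmed, lower case, no trailing period.
state_lookup <- c(mt = "Montana", montana = "Montana",
                  wa = "Washington", washington = "Washington",
                  ca = "California", calif = "California",
                  california = "California")

key <- sub("\\.$", "", tolower(trimws(raw_state)))
clean_state <- unname(state_lookup[key])
clean_state
```

Keys that are missing from the lookup come back as `NA`, which makes unhandled answers easy to spot and review by hand.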

Education as an impacting factor on beliefs on vaccinations.

survey %>% 
  drop_na(CollegeEducation, VaccinationBeliefs) %>% 
  count(CollegeEducation, VaccinationBeliefs) 

Proportion of each vaccination belief within each education level.

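survey %>%
  drop_na(CollegeEducation, VaccinationBeliefs) %>%
  count(CollegeEducation, VaccinationBeliefs) %>%
  group_by(CollegeEducation) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%   # drop raw counts so each education level stays on one row
  spread(key = VaccinationBeliefs, value = prop)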
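
In current versions of tidyr, `spread()` is superseded by `pivot_wider()`. The sketch below shows the same within-group proportion table built with `pivot_wider()` on a small made-up data set (`toy` is hypothetical, not the survey data).

```r
library(dplyr)
library(tidyr)

# Hypothetical responses, NOT the survey data.
toy <- tibble(
  CollegeEducation   = c("Some college", "Some college", "Some college",
                         "Some college", "Graduate degree", "Graduate degree"),
  VaccinationBeliefs = c(1, 1, 2, 5, 1, 1)
)

wide <- toy %>%
  count(CollegeEducation, VaccinationBeliefs) %>%
  group_by(CollegeEducation) %>%
  mutate(prop = n / sum(n)) %>%   # proportion within each education level
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = VaccinationBeliefs,
              values_from = prop,
              values_fill = 0)    # belief levels nobody chose become 0
wide
```

Each row of `wide` sums to 1 across the belief columns, which is a quick check that the proportions were computed within education level.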

Graph of level of education and the beliefs on vaccination.

survey %>% 
  drop_na(CollegeEducation, VaccinationBeliefs) %>% 
  ggplot(aes(x = VaccinationBeliefs, fill = CollegeEducation)) +
  scale_fill_viridis_d() +
  coord_flip() +
  geom_bar(position = "fill")+
  labs(title = "Vaccination beliefs based off of level of education")+
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

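Respondents with some college make up the largest share of most belief categories, which mainly reflects their share of the sample. Because each bar shows the education mix within a belief level, the chart does not directly show which education group is most likely to favor vaccination; the row counts in the table that follows are better suited to that comparison.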

Table of level of education and vaccination beliefs.

tbl1 = table(survey$CollegeEducation, survey$VaccinationBeliefs) 
tbl1
                                 
                                   1  2  3  4  5
  Associate's or technical degree 22  1  2  2  3
  Bachelor's degree               45  5  1  0  5
  Graduate degree                 32  3  0  0  3
  High school degree              30  5  4  1  5
  No high school degree            4  0  1  0  0
  Some college                    81 10 10  3  5
  Some graduate school            19  1  0  0  0
chisq.test(tbl1) 
Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  tbl1
X-squared = 23.989, df = 24, p-value = 0.4622

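Education level is not significantly associated with vaccination beliefs, as the p-value of 0.4622 is greater than .05. The approximation warning again reflects the small counts in many cells.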
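
With so many near-empty cells, one option is to collapse the belief scale before testing. The sketch below re-enters the counts from the table printed above and groups responses 1-2 as "agree" and 3-5 as "other". That grouping, the object names, and the seed are assumptions made for illustration, not part of the original analysis.

```r
# Counts copied from the education-by-belief table printed above.
tbl_edu <- matrix(
  c(22,  1,  2, 2, 3,
    45,  5,  1, 0, 5,
    32,  3,  0, 0, 3,
    30,  5,  4, 1, 5,
     4,  0,  1, 0, 0,
    81, 10, 10, 3, 5,
    19,  1,  0, 0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    Education = c("Associate's or technical degree", "Bachelor's degree",
                  "Graduate degree", "High school degree",
                  "No high school degree", "Some college",
                  "Some graduate school"),
    Belief = as.character(1:5))
)

# Illustrative grouping (an assumption): 1-2 = agree, 3-5 = neutral or disagree.
collapsed <- cbind(
  agree = rowSums(tbl_edu[, c("1", "2")]),
  other = rowSums(tbl_edu[, c("3", "4", "5")])
)
collapsed

# Fisher's test with a simulated p-value does not depend on the
# large-sample chi-squared approximation.
set.seed(1)  # arbitrary seed, used only for reproducibility
fisher.test(collapsed, simulate.p.value = TRUE, B = 10000)
```

Collapsing categories trades detail for larger cell counts, so the choice of cut point should be stated alongside any result.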

survey1 <- survey %>%
  mutate(CollegeEducation_simple = fct_collapse(CollegeEducation,
                                         degree = 
                                           c("Associate's or technical degree",
                                             "Bachelor's degree",
                                             "Some graduate school",
                                             "Graduate degree"),
                                         no_degree = 
                                           c("No high school degree",
                                             "High school degree",
                                             "Some college")))

survey1 %>%
  drop_na(CollegeEducation_simple, VaccinationBeliefs) %>%
  count(CollegeEducation_simple, VaccinationBeliefs)

Dividing the groups between respondents who hold a college degree and those who do not.

 survey1 %>% 
  drop_na(CollegeEducation_simple, VaccinationBeliefs) %>% 
  ggplot(aes(x = VaccinationBeliefs, fill = CollegeEducation_simple)) +
  scale_fill_viridis_d() +
  coord_flip() +
  geom_bar(position = "fill")+
  labs(title = "Difference in belief in vaccination based on college education")

Limitations: While more than 300 individuals participated, the study cannot determine which variables affect vaccination beliefs, as the N within each subset was too small to be certain. As a preliminary study, however, it provides useful data that could be explored later. Because participation was voluntary, responses could be biased for or against vaccination, depending on the preference or hypothesis that the individual promoting the survey might have indicated. The method for distributing the survey was not controlled.

