library(tidyverse)
library(corrr)
library(plotly)
library(ggplot2)
library(dplyr)
library(reshape2)
library(gplots)
library(data.table)
library(ggpubr)
library(xtable)
library(corrplot)
library(corrgram)
A survey was conducted to examine how different subsets of people view childhood vaccinations. Respondents were grouped by sex, age, education level, whether or not they had children, and location (state). They were also asked whether they believe in vaccinating, whether they trust the information they receive, whether they believe there is a link between the MMR vaccination and autism, and how concerned they are about side effects. The survey was distributed via a link, so participants volunteered to take it. They were not compensated.
The data is imported from the link below.
survey <- read_csv("https://docs.google.com/spreadsheets/d/138AJq8uaPl0XlF3Bjj3q0g_Pb3RbmBQLFazj4hhJalw/export?format=csv")
Parsed with column specification:
cols(
Timestamp = col_character(),
`What is your sex?` = col_character(),
`What is your age?` = col_character(),
`What is your highest education completed?` = col_character(),
`If you went to college, what was your major?` = col_character(),
`Do you have children?` = col_character(),
`What state are you from?` = col_character(),
`Are you religious?` = col_double(),
`I believe children should be vaccinated.` = col_double(),
`I trust the information I receive about shots.` = col_double(),
`I believe that there could be a link between the MMR vaccination and autism.` = col_double(),
`I worry about possible side effects of vaccinations.` = col_double(),
`I believe the media exaggerates reports about disease outbreak and vaccinations.` = col_double(),
`If I were to have a child today, I would want them to have all of the recommended vaccinations.` = col_double(),
`Healthy children should be required to be vaccinated to attend school because of potential risks to others.` = col_double(),
`If you wish to expand on any of your answers above, do so here:` = col_character()
)
This glimpse provides an overview of the data in its raw form. Because participants typed several of their answers as free text, the data is not yet formatted in a way that allows individuals to be examined in the groups they were meant to fall into.
glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp <chr> …
$ `What is your sex?` <chr> …
$ `What is your age?` <chr> …
$ `What is your highest education completed?` <chr> …
$ `If you went to college, what was your major?` <chr> …
$ `Do you have children?` <chr> …
$ `What state are you from?` <chr> …
$ `Are you religious?` <dbl> …
$ `I believe children should be vaccinated.` <dbl> …
$ `I trust the information I receive about shots.` <dbl> …
$ `I believe that there could be a link between the MMR vaccination and autism.` <dbl> …
$ `I worry about possible side effects of vaccinations.` <dbl> …
$ `I believe the media exaggerates reports about disease outbreak and vaccinations.` <dbl> …
$ `If I were to have a child today, I would want them to have all of the recommended vaccinations.` <dbl> …
$ `Healthy children should be required to be vaccinated to attend school because of potential risks to others.` <dbl> …
$ `If you wish to expand on any of your answers above, do so here:` <chr> …
The full question text used as column names was replaced with short names to make the data easier to analyze and the code easier to read.
survey <- survey %>%
rename(Sex = `What is your sex?`,
Age = `What is your age?`,
CollegeEducation = `What is your highest education completed?`,
CollegeMajor = `If you went to college, what was your major?`,
Children = `Do you have children?`,
HomeState = `What state are you from?`,
Religious = `Are you religious?`,
VaccinationBeliefs = `I believe children should be vaccinated.`,
TrustIt = `I trust the information I receive about shots.`,
AutismLink = `I believe that there could be a link between the MMR vaccination and autism.`,
SideEffectsWorry = `I worry about possible side effects of vaccinations.`,
Exaggeration = `I believe the media exaggerates reports about disease outbreak and vaccinations.`,
WantVaccinations = `If I were to have a child today, I would want them to have all of the recommended vaccinations.`,
RequiredVaccinations = `Healthy children should be required to be vaccinated to attend school because of potential risks to others.`,
Expansion = `If you wish to expand on any of your answers above, do so here:`)
A review of the data after renaming the columns.
glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp            <chr> "2/19/2019 19:39:42", "2/19/2019 20…
$ Sex                  <chr> "female", "Female", "Female", "Fema…
$ Age                  <chr> "21", "26", "40", "41", "31", "55",…
$ CollegeEducation     <chr> "Some college", "Some college", "So…
$ CollegeMajor         <chr> "criminal justice", "Psychology and…
$ Children             <chr> "No", "Yes", "Yes", "Yes", "No", "N…
$ HomeState            <chr> "montana", "Missouri", "Montana", "…
$ Religious            <dbl> 1, 4, 4, 2, 3, 3, 3, 2, 1, 2, 4, 4,…
$ VaccinationBeliefs   <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1,…
$ TrustIt              <dbl> 2, 1, 4, 2, 2, 1, 2, 1, 1, 3, 2, 1,…
$ AutismLink           <dbl> 5, 5, 5, 3, 3, 3, 5, 5, 5, 3, 5, 5,…
$ SideEffectsWorry     <dbl> 5, 5, 2, 4, 3, 3, 3, 5, 4, 3, 4, 4,…
$ Exaggeration         <dbl> 1, 3, 2, 4, 4, 3, 4, 5, 1, 3, 3, 1,…
$ WantVaccinations     <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ RequiredVaccinations <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ Expansion            <chr> NA, NA, NA, NA, "I would do my rese…
Count of sex. Note the variations that exist.
survey %>%
count(Sex)
To correct and properly categorize the data, the free-text variations were merged into their appropriate designation. For example, “F” and “Female” were both recoded as “female”.
survey <- survey %>%
mutate(Sex = str_to_lower(Sex))
# Values were lowercased above, so only lowercase variants need to be recoded.
survey$Sex <- recode(survey$Sex, "m" = "male",
"f" = "female",
"femail" = "female",
"femal" = "female",
"feme" = "female",
"gemale" = "female",
"girl" = "female",
"replace sex with gender! female" = "female",
"republican ( male)" = "male",
"mucho" = "male",
"apache attack helicopter" = "other",
"californian" = "other",
"not enough" = "other",
"snap-on tool box" = "other",
"the rough, passionate kind" = "other")
Count of the different sex categories
survey %>%
count(Sex)
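The recoding step above can also be sketched as a lookup table in base R. This is a minimal illustration, not the method used in this analysis: the `raw_sex` values and the `lookup` vector are hypothetical, and anything not found in the lookup is assigned to "other".

```r
# Hypothetical free-text answers, similar in spirit to the survey responses
raw_sex <- c("Female", "F", "femail", "M", "male ", "apache attack helicopter")

# Named vector mapping cleaned values to the category they belong to (assumed mapping)
lookup <- c(f = "female", female = "female", femail = "female",
            m = "male", male = "male")

# Lowercase and trim whitespace before looking values up
cleaned <- tolower(trimws(raw_sex))
sex <- unname(lookup[cleaned])

# Anything not covered by the lookup becomes "other"
sex[is.na(sex)] <- "other"

print(table(sex))
```

Keeping the mapping in one named vector makes it easier to review which raw answers were assigned to which category.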
Table reflecting the data on each gender and their views on vaccination
tbl = table(survey$Sex, survey$VaccinationBeliefs)
tbl
         1   2   3   4   5
female 192  21  12   5  18
male    36   3   6   1   3
other    4   1   0   0   0
glimpse(survey)
Observations: 305
Variables: 16
$ Timestamp            <chr> "2/19/2019 19:39:42", "2/19/2019 20…
$ Sex                  <chr> "female", "female", "female", "fema…
$ Age                  <chr> "21", "26", "40", "41", "31", "55",…
$ CollegeEducation     <chr> "Some college", "Some college", "So…
$ CollegeMajor         <chr> "criminal justice", "Psychology and…
$ Children             <chr> "No", "Yes", "Yes", "Yes", "No", "N…
$ HomeState            <chr> "montana", "Missouri", "Montana", "…
$ Religious            <dbl> 1, 4, 4, 2, 3, 3, 3, 2, 1, 2, 4, 4,…
$ VaccinationBeliefs   <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1,…
$ TrustIt              <dbl> 2, 1, 4, 2, 2, 1, 2, 1, 1, 3, 2, 1,…
$ AutismLink           <dbl> 5, 5, 5, 3, 3, 3, 5, 5, 5, 3, 5, 5,…
$ SideEffectsWorry     <dbl> 5, 5, 2, 4, 3, 3, 3, 5, 4, 3, 4, 4,…
$ Exaggeration         <dbl> 1, 3, 2, 4, 4, 3, 4, 5, 1, 3, 3, 1,…
$ WantVaccinations     <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ RequiredVaccinations <dbl> 1, 1, 2, 1, 2, 1, 1, 1, 1, 3, 1, 1,…
$ Expansion            <chr> NA, NA, NA, NA, "I would do my rese…
Chi-squared test to determine whether vaccination beliefs differ significantly by sex.
chisq.test(tbl)
Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: tbl
X-squared = 5.8045, df = 8, p-value = 0.6691
To be statistically significant, the p-value must be below .05. Here the p-value was 0.6691, so the sex categories do not differ significantly in their views. In this sample there is no evidence that sex is a contributing factor to the way individuals view vaccinations. The warning arises because several cells have small expected counts, so the chi-squared approximation should be treated with caution.
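The "Chi-squared approximation may be incorrect" warning can be examined directly. The sketch below rebuilds the sex-by-belief table from the counts printed above, inspects the expected counts, and, as one possible alternative that was not part of the original analysis, computes a Monte Carlo p-value with `simulate.p.value = TRUE`. The seed and the number of replicates `B` are arbitrary choices.

```r
# Counts copied from the sex-by-belief table printed above
tbl <- matrix(
  c(192, 21, 12, 5, 18,
     36,  3,  6, 1,  3,
      4,  1,  0, 0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Sex = c("female", "male", "other"),
                  Belief = as.character(1:5))
)

# Standard test; the warning is suppressed here because it is inspected explicitly below
result <- suppressWarnings(chisq.test(tbl))

# Expected counts under independence; cells below 5 trigger the warning
print(round(result$expected, 2))
low_cells <- sum(result$expected < 5)
print(low_cells)

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(1)
simulated <- chisq.test(tbl, simulate.p.value = TRUE, B = 10000)
print(simulated$p.value)
```

Merging or dropping the very small "other" group before testing would be another way to reduce the number of sparse cells.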
The following is an analysis of age versus beliefs on vaccination:
survey<-survey%>%
mutate(Age = as.numeric(Age)) %>%
mutate(Children = as_factor(Children))
NAs introduced by coercion
Mean vaccination belief score for participants with and without children (1 = strongly agree, 5 = strongly disagree).
survey %>%
drop_na(Children, VaccinationBeliefs) %>%
group_by(Children) %>%
summarize(VaccinationBeliefs = mean(VaccinationBeliefs))
Mean agreement with required vaccinations, broken down by age.
survey %>%
drop_na(Age, RequiredVaccinations) %>%
group_by(Age) %>%
summarize(RequiredVaccinations = mean(RequiredVaccinations))
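Chi-squared analysis of whether age is a determining factor in a person’s beliefs on vaccinating children. The reported df = 16 is consistent with a 5 x 5 table, which suggests the test was run on the grouped ages established further below rather than on individual ages.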
tbl = table(survey$Age, survey$VaccinationBeliefs)
chisq.test(tbl)
Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: tbl
X-squared = 41.257, df = 16, p-value = 0.0005088
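The p-value (0.0005088) is less than .05, so age is associated with beliefs on vaccinations in this sample. As the warning indicates, some expected cell counts are small, so this result should be interpreted with caution.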
Different age groups are established.
# Bin the numeric ages into labelled groups with cut(); assigning text labels
# into a numeric column one range at a time would coerce the labels to NA.
survey <- survey %>%
mutate(Age = cut(Age,
breaks = c(-Inf, 19, 29, 39, 49, 59, Inf),
labels = c("under 20", "20-29", "30-39", "40-49", "50-59", "above 60")))
survey %>%
drop_na(Age, VaccinationBeliefs) %>%
count(Age, VaccinationBeliefs)
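The age grouping can be illustrated in isolation with base R's `cut()`. This is a small sketch with hypothetical ages. The break points assume whole-number ages, so that 19 falls in "under 20" and 59 falls in "50-59".

```r
# Hypothetical ages chosen to sit on the group boundaries
ages <- c(18, 19, 20, 29, 30, 59, 60, 75)

# Intervals are closed on the right by default: (-Inf, 19], (19, 29], ...
age_group <- cut(
  ages,
  breaks = c(-Inf, 19, 29, 39, 49, 59, Inf),
  labels = c("under 20", "20-29", "30-39", "40-49", "50-59", "above 60")
)

print(data.frame(ages, age_group))
```

Because `cut()` returns a factor with ordered levels, the groups appear in age order in tables and plot legends rather than alphabetically.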
Visual Representation of age groups that believe in childhood vaccinations.
survey %>%
drop_na(Age, VaccinationBeliefs) %>%
ggplot(aes(x = VaccinationBeliefs, fill = Age)) +
scale_fill_viridis_d() +
coord_flip() +
geom_bar(position = "fill")+
labs(title = "Difference in beliefs based off of different age categories")
1 denotes strongly agree, whereas 5 denotes strongly disagree.
Individuals in the 20-29 and 50-59 age range tend to agree more with vaccinations.
An analysis on whether or not having children makes a difference in beliefs on vaccinating.
survey %>%
drop_na(Children, VaccinationBeliefs) %>%
group_by(Children) %>%
summarize(VaccinationBeliefs = mean(VaccinationBeliefs)) %>%
ggplot(aes(x = Children, y = VaccinationBeliefs)) +
geom_col()
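The mean score is higher for individuals with children, although not by a large measure. Because 1 denotes strong agreement and 5 strong disagreement, a higher mean indicates less agreement with vaccinations.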
t.test(survey$VaccinationBeliefs~survey$Children)
Welch Two Sample t-test
data: survey$VaccinationBeliefs by survey$Children
t = -2.7542, df = 237.19, p-value = 0.00634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.6365902 -0.1056679
sample estimates:
mean in group No mean in group Yes
1.366460 1.737589
The t-test shows that participants who have children agree less with vaccinating children than those who do not: their mean score is higher (1.74 versus 1.37, where a higher score means less agreement), and the difference is significant at alpha = .05 (t = -2.7542, p = 0.00634).
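Since the belief scores are ordinal (1 to 5), a rank-based test is sometimes reported alongside the Welch t-test. The sketch below uses hypothetical scores, not the survey data, to show both tests with the formula interface. `wilcox.test` is offered here as an assumption about a reasonable robustness check, not as part of the original analysis.

```r
# Hypothetical belief scores (1 = strongly agree, 5 = strongly disagree)
scores <- data.frame(
  belief = c(1, 1, 1, 2, 1, 1, 3, 1, 2, 1,   # participants without children
             1, 2, 2, 1, 3, 2, 5, 1, 2, 4),  # participants with children
  children = factor(rep(c("No", "Yes"), each = 10))
)

# Welch two-sample t-test (unequal variances is the default in t.test)
welch <- t.test(belief ~ children, data = scores)

# Rank-based alternative; exact = FALSE because the scores contain ties
rank_test <- wilcox.test(belief ~ children, data = scores, exact = FALSE)

print(welch$estimate)
print(welch$p.value)
print(rank_test$p.value)
```

A negative t statistic here means the "No" group has the lower mean, that is, stronger agreement with vaccination.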
Analysis of the relationship between vaccination beliefs and the belief that the MMR vaccination is linked to autism spectrum disorder
cor.test(survey$VaccinationBeliefs, survey$AutismLink, method = "pearson")
Pearson's product-moment correlation
data: survey$VaccinationBeliefs and survey$AutismLink
t = -13.857, df = 300, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.6889943 -0.5506707
sample estimates:
cor
-0.624709
There is a negative correlation (r = -0.62) between vaccination beliefs and belief in a link between the MMR vaccination and autism. This indicates that individuals are less likely to agree with vaccinating children when they believe that it could lead to autism spectrum disorder.
ggscatter(survey, x = "VaccinationBeliefs", y = "AutismLink",
color = "black",shape = 1, size = 1,
add = "reg.line",
add.params = list(color = "blue", fill = "light blue"),
conf.int = TRUE,
cor.coef = TRUE,
cor.method = "pearson",
xlab = "Believe in childhood vaccinations",
ylab = "MMR vaccination is linked to Autism",
caption="Note:1=Strongly agree; 5=Strongly disagree")
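Pearson's correlation treats the 1 to 5 responses as interval data. As a hedged illustration, the sketch below computes both Pearson's and Spearman's (rank-based) correlation on hypothetical scores. The vectors are invented for the example, and Spearman's method is suggested only as a possible cross-check.

```r
# Hypothetical responses (1 = strongly agree, 5 = strongly disagree)
vaccination <- c(1, 1, 2, 1, 3, 5, 4, 1, 2, 5)  # "children should be vaccinated"
autism_link <- c(5, 5, 4, 5, 3, 1, 2, 4, 5, 1)  # "MMR could be linked to autism"

# Pearson correlation, as used in the analysis above
pearson <- cor.test(vaccination, autism_link, method = "pearson")

# Spearman rank correlation; exact = FALSE because of tied values
spearman <- cor.test(vaccination, autism_link, method = "spearman", exact = FALSE)

print(pearson$estimate)
print(spearman$estimate)
```

If both coefficients have the same sign and similar size, the conclusion does not depend on treating the scale as interval data.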
An analysis of whether or not being religious has an impact on agreement with vaccinations
cor.test(survey$VaccinationBeliefs, survey$Religious, method = "pearson")
Pearson's product-moment correlation
data: survey$VaccinationBeliefs and survey$Religious
t = 0.47608, df = 302, p-value = 0.6344
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.0853704 0.1394476
sample estimates:
cor
0.02738489
In this sample there is no significant correlation between how religious participants are and their beliefs in vaccinating (r = 0.03, p = 0.6344).
Data separated by region/state
# recode() matches against the original values only, so every variant must map
# directly to its final spelling (no chained replacements).
survey$HomeState <- recode(survey$HomeState, "montana" = "Montana",
"Missouri" = "Missouri",
"406" = "Montana",
"Ca" = "California",
"CA" = "California",
"Calif" = "California",
"California" = "California",
"Co" = "Colorado",
"Co Down" = "Colorado",
"Colrado" = "Colorado",
"Hard to say. MT I guess" = "Montana",
"Grew up in Illinois" = "Illinois",
"From Bulgaria actually" = "International",
"IL" = "Illinois",
"illinois/now living in california." = "Illinois",
"International Student" = "International",
"Mont9" = "Montana",
"Va" = "Virginia",
"VA" = "Virginia",
"wa" = "Washington",
"Wa" = "Washington",
"WA" = "Washington",
"Wa." = "Washington",
"TX" = "Texas",
"WAshington" = "Washington",
"province of Ontario, in Canada" = "International",
"OR" = "Oregon",
"PA" = "Pennsylvania",
"ID" = "Idaho",
"Md" = "Maryland",
"MN" = "Minnesota",
"MO" = "Missouri",
"MONTANA" = "Montana",
"mt" = "Montana",
"Mt" = "Montana",
"MT" = "Montana",
"N.Y." = "New York",
"Omaha" = "Nebraska",
"SC" = "South Carolina",
"SD" = "South Dakota",
"Switzerland" = "International",
"Not" = "International")
Table of counts by location.
survey %>%
count(HomeState) %>%
drop_na()
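Many of the state variants above are abbreviations or differ only in case and punctuation, so part of the recoding could be automated. The sketch below is one possible approach using R's built-in `state.abb` and `state.name` vectors. The helper `normalize_state` is a hypothetical name, and answers it cannot match (such as cities or countries) are returned as `NA` and would still need manual handling.

```r
# Hypothetical helper: map abbreviations and differently-cased names to full state names
normalize_state <- function(x) {
  # Remove punctuation and digits, then trim surrounding whitespace
  cleaned <- trimws(gsub("[^A-Za-z ]", "", x))
  # Try to match two-letter abbreviations first
  by_abbrev <- state.name[match(toupper(cleaned), state.abb)]
  # Otherwise try to match the full name, ignoring case
  by_name <- state.name[match(tolower(cleaned), tolower(state.name))]
  ifelse(is.na(by_abbrev), by_name, by_abbrev)
}

raw_states <- c("mt", "MT", "montana", "Wa.", "N.Y.", "Switzerland")
print(normalize_state(raw_states))
```

Misspellings such as "Colrado" are not caught by this approach, so a manual mapping like the one above remains necessary for those cases.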
Education as an impacting factor on beliefs on vaccinations.
survey %>%
drop_na(CollegeEducation, VaccinationBeliefs) %>%
count(CollegeEducation, VaccinationBeliefs)
Separating the groups by education, with the proportion of each belief score within each education level.
survey %>%
count(VaccinationBeliefs, CollegeEducation) %>%
group_by(CollegeEducation) %>%
mutate(prop = n / sum(n)) %>%
# Drop the raw count so spread() returns one row per education level.
select(-n) %>%
spread(key = VaccinationBeliefs, value = prop)
Graph of level of education and the beliefs on vaccination.
survey %>%
drop_na(CollegeEducation, VaccinationBeliefs) %>%
ggplot(aes(x = VaccinationBeliefs, fill = CollegeEducation)) +
scale_fill_viridis_d() +
coord_flip() +
geom_bar(position = "fill")+
labs(title = "Vaccination beliefs based off of level of education")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Respondents with some college make up the largest share of most belief scale categories. It appears that respondents with an associate’s or technical degree tend to be most likely to vaccinate.
Table of level of education and vaccination beliefs.
tbl1 = table(survey$CollegeEducation, survey$VaccinationBeliefs)
tbl1
                                 1  2  3  4  5
Associate's or technical degree 22  1  2  2  3
Bachelor's degree               45  5  1  0  5
Graduate degree                 32  3  0  0  3
High school degree              30  5  4  1  5
No high school degree            4  0  1  0  0
Some college                    81 10 10  3  5
Some graduate school            19  1  0  0  0
chisq.test(tbl1)
Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: tbl1
X-squared = 23.989, df = 24, p-value = 0.4622
Education has no significant impact on vaccination beliefs in this sample, as the p-value (0.4622) was greater than .05.
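Because the education groups differ considerably in size, row proportions can be easier to compare than raw counts. The sketch below rebuilds the education-by-belief table from the counts printed above and applies base R's `prop.table()` with `margin = 1`. It is an illustration added for clarity rather than a step from the original analysis.

```r
# Counts copied from the education-by-belief table printed above
tbl1 <- matrix(
  c(22,  1,  2, 2, 3,
    45,  5,  1, 0, 5,
    32,  3,  0, 0, 3,
    30,  5,  4, 1, 5,
     4,  0,  1, 0, 0,
    81, 10, 10, 3, 5,
    19,  1,  0, 0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    Education = c("Associate's or technical degree", "Bachelor's degree",
                  "Graduate degree", "High school degree",
                  "No high school degree", "Some college",
                  "Some graduate school"),
    Belief = as.character(1:5)
  )
)

# Proportion of each belief score within each education level (rows sum to 1)
row_props <- prop.table(tbl1, margin = 1)
print(round(row_props, 2))
```

Column "1" of the result gives the share of each education level that strongly agrees that children should be vaccinated.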
survey1 <- survey %>%
mutate(CollegeEducation_simple = fct_collapse(CollegeEducation,
degree =
c("Associate's or technical degree",
"Bachelor's degree",
"Some graduate school",
"Graduate degree"),
no_degree =
c("No high school degree",
"High school degree",
"Some college")))
survey1 %>%
drop_na(CollegeEducation_simple, VaccinationBeliefs) %>%
count(CollegeEducation_simple, VaccinationBeliefs)
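The collapsing step performed by `fct_collapse` above can also be expressed in base R by assigning a named list to `levels()`. The sketch below uses a short hypothetical vector of education answers. It is an equivalent illustration, not the code used in this analysis.

```r
# Hypothetical education answers
education <- factor(c("Some college", "Bachelor's degree", "High school degree",
                      "Graduate degree", "Some college"))

# Each list element names a new level and lists the old levels merged into it
levels(education) <- list(
  degree = c("Associate's or technical degree", "Bachelor's degree",
             "Some graduate school", "Graduate degree"),
  no_degree = c("No high school degree", "High school degree", "Some college")
)

print(table(education))
```

Old levels that do not occur in the data are simply ignored, so the same list can be reused on subsets of the survey.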
Dividing the groups between participants with a college degree and those without one.
survey1 %>%
drop_na(CollegeEducation_simple, VaccinationBeliefs) %>%
ggplot(aes(x = VaccinationBeliefs, fill = CollegeEducation_simple)) +
scale_fill_viridis_d() +
coord_flip() +
geom_bar(position = "fill")+
labs(title = "Difference in belief in vaccination based on college education")
Limitations: Although more than 300 individuals participated, the study cannot pinpoint which variables influence beliefs, because the N within each subset was too small to be certain. As a preliminary study, however, it provides useful data that could be explored later. Because participation was voluntary, responses could be biased toward proving or disproving whatever preference or hypothesis the individual promoting the survey may have indicated. The method for distributing the survey was not controlled.