library(readr)
library(dplyr)
library(texreg)
library(ggplot2)
library(Zelig)
library(ggrepel)
library(HistData)
library(tidyverse)
library(ggthemes)
The data in this assignment come from an online survey in which participants chose the candy they would prefer to receive. The website displayed two fun-sized candies at a time, and each respondent picked the one they preferred. The survey collected more than 269,000 votes from 8,371 different IP addresses. The dataset was obtained from Kaggle.com, where it is titled “The Ultimate Halloween Candy Power Ranking”.
The variables in this dataset include attributes for each candy along with its rankings, for example chocolate, fruity, and caramel. Each candy (variable competitorname) has a 1 or 0 for each attribute, where 1 means yes and 0 means no. The chocolate variable answers “Does it contain chocolate?”, fruity answers “Is it fruit flavored?”, caramel answers “Is there caramel in the candy?”, and so on. The variable sugarpercent gives the percentile of sugar the candy falls under within the dataset, pricepercent gives the unit price percentile compared to the rest of the set, and winpercent gives the overall win percentage according to the 269,000 votes.
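To make the percentile variables concrete, here is a minimal sketch of how a percentile rank can be computed in base R. The candy names and gram values are made up for illustration, and the dataset's authors may well have used a different percentile definition.

```r
# Hypothetical sugar content in grams for five made-up candies
# (illustrative values only, not taken from the dataset).
sugar_grams <- c(candyA = 4, candyB = 10, candyC = 7, candyD = 15, candyE = 1)

# Percentile rank: the share of candies whose sugar content is at or
# below each value. ecdf() builds the empirical cumulative distribution
# function from the data, which is then evaluated at the same values.
sugar_percentile <- ecdf(sugar_grams)(sugar_grams)
names(sugar_percentile) <- names(sugar_grams)

sugar_percentile
```

Under this definition the candy with the most sugar gets a value of 1, and a candy at 0.6 has at least as much sugar as 60% of the set.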
As mentioned above, this dataset gives us information on different candies, including how expensive they are and how much sugar they contain. This assignment asks whether there is a relationship between a candy’s price and the amount of sugar in it.
candyRankings <- read_csv("/Users/Deepakie/Documents/Queens College/SOC712/Data/candy-data.csv")
head(candyRankings)
ggplot(data = candyRankings, aes(x = sugarpercent, y = pricepercent)) +
geom_point()
The scatter plot represents each observation as a single point: the position of each point is determined by its values of sugarpercent (on the x-axis) and pricepercent (on the y-axis). We can keep adding layers to this plot to make it more informative, so let's look at some more graphs that indicate the relationship between the sugar and price of each candy.
ggplot(data = candyRankings, aes(x = sugarpercent, y = pricepercent)) +
geom_point() +
geom_smooth(method = "lm")
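The `geom_smooth(method = "lm")` layer draws an ordinary least-squares line. As a rough sketch, the same kind of fit can be inspected numerically with `lm()` and `cor()`. The small data frame below uses made-up values so that the block runs on its own; with the real data one would presumably pass `candyRankings` instead of `toy`.

```r
# Toy data with made-up values (hypothetical, for illustration only).
toy <- data.frame(
  sugarpercent = c(0.10, 0.25, 0.40, 0.60, 0.75, 0.90),
  pricepercent = c(0.20, 0.15, 0.45, 0.50, 0.80, 0.70)
)

# Fit the same kind of linear model that geom_smooth(method = "lm") plots.
fit <- lm(pricepercent ~ sugarpercent, data = toy)

# Slope: expected change in price percentile per unit change
# in sugar percentile.
slope <- coef(fit)[["sugarpercent"]]

# Pearson correlation between the two variables.
r <- cor(toy$sugarpercent, toy$pricepercent)

slope
r
summary(fit)$r.squared  # for a one-predictor model this equals r^2
```

A positive slope with a small R-squared would mean the trend exists but explains little of the variation in price, which is worth checking before reading too much into the fitted line.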
ggplot(data = candyRankings, aes(x = sugarpercent, y = pricepercent)) +
geom_point() + geom_bin2d()
ggplot(data = candyRankings, aes(x = sugarpercent, y = pricepercent)) +
geom_point() + geom_smooth()
ggplot(data = candyRankings, aes(x = sugarpercent,
y = pricepercent,
label = competitorname)) +
geom_point() +
geom_smooth(method = "lm") +
geom_text()
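With `geom_text()` alone, the labels are drawn directly on top of the points and of each other. One option is `geom_text_repel()` from the ggrepel package (loaded at the top), which pushes labels away from the points and from one another. A minimal sketch on made-up data follows; the candy names and values are hypothetical, and the `size` and `max.overlaps` settings are only starting points.

```r
library(ggplot2)
library(ggrepel)

# Toy data with made-up candies (hypothetical, for illustration only).
toy <- data.frame(
  competitorname = c("Candy A", "Candy B", "Candy C",
                     "Candy D", "Candy E", "Candy F"),
  sugarpercent = c(0.10, 0.12, 0.40, 0.42, 0.75, 0.90),
  pricepercent = c(0.20, 0.21, 0.45, 0.44, 0.80, 0.70)
)

p <- ggplot(toy, aes(x = sugarpercent,
                     y = pricepercent,
                     label = competitorname)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # Repel labels from points and from each other instead of overplotting;
  # max.overlaps = Inf keeps every label rather than dropping crowded ones.
  geom_text_repel(size = 3, max.overlaps = Inf)

p
```

Unlike `check_overlap = TRUE`, which silently drops labels that would collide, this approach keeps every candy name visible at the cost of a busier plot.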
ggplot(data = candyRankings, aes(x = sugarpercent,
y = pricepercent,
label = competitorname
)) +
geom_point() +
geom_smooth(method = "lm") +
geom_text(check_overlap = TRUE,
vjust = "bottom",
nudge_y = 0.01,
angle = 30,
size = 2) +
labs(title = "More sugary candies are more expensive",
x = "Sugar content (percentile)",
y = "Price (percentile)")
ggplot(data = candyRankings, aes(x = sugarpercent,
y = pricepercent,
label = competitorname
)) +
geom_smooth(method = "lm") +
geom_text(check_overlap = TRUE,
angle = 30,
size = 2.5) +
labs(title = "More sugary candies are more expensive",
x = "Sugar content (percentile)",
y = "Price (percentile)" )
candyFeatures <- candyRankings %>% select(2:10)
candyFeatures[] <- lapply(candyFeatures, as.logical)
# make a bar plot
ggplot(candyFeatures, aes(x = chocolate)) +
geom_bar()
ggplot(candyFeatures, aes(x = chocolate,
fill = caramel
)) + geom_bar()
ggplot(candyFeatures, aes(x = chocolate,
fill = caramel)) +
geom_bar(position = "dodge")
ggplot(candyFeatures, aes(x = chocolate,
fill = caramel)) +
geom_bar(position = "dodge") +
facet_wrap(c("caramel"))
ggplot(candyFeatures, aes(x = chocolate,
fill = caramel )) +
geom_bar(position = "dodge", size = 2) +
facet_wrap(c("caramel")) +
scale_fill_manual(values=c("#BBBBBB",
"#E69F00")) +
labs(title = "Chocolate candies are more likely to have caramel",
x = "Is the candy chocolate?",
y = "Count of candies") +
theme(legend.position = c(0.9, 0.9),
strip.background = element_blank(),
strip.text.x = element_blank())
# make a bar plot
ggplot(candyFeatures, aes(x = chocolate,
fill = caramel # map the fill color to caramel
)) + # set up the plot
geom_bar(position = "dodge", size = 2) + # add the bar plot
facet_wrap(c("caramel")) + # put each level of "caramel" in a different facet
scale_fill_manual(values=c("#BBBBBB", # a nice, neutral grey
"#E69F00")) + # a gold caramel color
labs(title = "Chocolate candies are more likely to have caramel", # title
x = "Is the candy chocolate?", # x axis
y = "Count of candies") + # y axis
theme(legend.position = c(0.9, 0.9), # move legend inside plot
strip.background = element_blank(), # remove strip from top of facets
strip.text.x = element_blank()) # remove text from top of facets
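The plot title makes a claim that a cross-tabulation can check numerically. Here is a minimal sketch, using a made-up logical data frame in place of `candyFeatures` (the values are hypothetical, chosen only so that the block runs on its own):

```r
# Toy stand-in for candyFeatures (made-up values, for illustration only).
toy_features <- data.frame(
  chocolate = c(TRUE, TRUE, TRUE, TRUE, FALSE,
                FALSE, FALSE, FALSE, FALSE, FALSE),
  caramel   = c(TRUE, TRUE, FALSE, FALSE, TRUE,
                FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Counts of candies in each chocolate/caramel combination;
# these are the bar heights in the dodged bar plot.
counts <- table(chocolate = toy_features$chocolate,
                caramel   = toy_features$caramel)

# Row-wise proportions: within each chocolate group,
# the share of candies that do or do not contain caramel.
shares <- prop.table(counts, margin = 1)

counts
shares
```

Comparing the two rows of `shares` is what supports or undercuts the title: raw counts alone can mislead when the chocolate and non-chocolate groups differ in size.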