Chinedu Onyeka “Datase2” GitHub RPubs
Krutika Patel “Human Trafficking” GitHub RPubs
Peter Phung “Jiho Kim’s Stock Market Data from India” Github RPubs
# load library
library(stringr)
library(readr)
library(tidyverse)
library(tidyr)
library(jpeg)
library(gridExtra)
This analysis was sourced from The Science Creative Quarterly, written by David Ng and published at BoingBoing. Halloween is right around the corner, and everyone is stocking up on sweets. Why would people keep buying candy in bulk during a pandemic? Simple! We all love candy: Kit Kat, Hershey, Snickers, Twix, and a handful of others. The researchers, Ng & Cohen, have compiled a hierarchy of candy preferences over several years as a geology joke.
The “Candy Hierarchy Data 2017” dataset was compiled from survey responses in which participants rated how they feel when they receive each item in their Halloween haul.
# load the data from GitHub
data <- read.csv(url("https://github.com/candrewxs/Project2/blob/main/candyhierarchy2017.csv?raw=true"), header = FALSE)
# change the blank cells to NA
data <- data %>% mutate_all(na_if, "")
# show the first four rows and first five columns of the data
head(data[1:4, 1:5, drop = FALSE])
## V1 V2 V3 V4 V5
## 1 Internal ID Q1: GOING OUT? Q2: GENDER Q3: AGE Q4: COUNTRY
## 2 90258773 <NA> <NA> <NA> <NA>
## 3 90272821 No Male 44 USA
## 4 90272829 <NA> Male 49 USA
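# NOTE: `df` is not defined in this excerpt; it is assumed to be a data frame created
# earlier with the same columns as `data` (presumably one row of cleaned column names)
newdf <- rbind(df, data) # combine rows from the two data frames into a new one
names(newdf) <- newdf[1, ] # copy the first row to the header
newdf <- newdf[-1:-2, ] # delete the first and second rows
newdf <- select(newdf, -1, -114) # delete the unnecessary first column (originally "V1") and the empty, unnamed column 114 (originally "V114")
newdf <- mutate_all(newdf, .funs = toupper) # change the entire data frame to uppercase
# display the rows that are not entirely blank or entirely NA
# (the results are not assigned, so newdf itself is unchanged)
newdf[!apply(newdf == "", 1, all), ]
newdf[rowSums(is.na(newdf)) != ncol(newdf), ]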
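To illustrate the row filter on a small scale, here is a minimal sketch using a made-up data frame. The name `toy` and its values are hypothetical and are not taken from the survey. It shows how `rowSums(is.na(...))` flags rows in which every column is missing, and that the filtered result has to be assigned in order to be kept.

```r
# Hypothetical toy data: row 2 is entirely missing
toy <- data.frame(
  GENDER = c("MALE", NA, "FEMALE"),
  AGE = c("44", NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for rows where every column is NA
all_missing <- rowSums(is.na(toy)) == ncol(toy)

# Keep the rows that have at least one value; assigning stores the result
toy_clean <- toy[!all_missing, ]
nrow(toy_clean)
```

Row 3 is kept because only one of its two columns is missing.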
A total of 2459 individuals completed the survey. The survey told respondents that they could skip a question, leave it blank, or indicate that they did not know the candy.
Feeling Values:
JOY - Does it make you happy?
DESPAIR - Is it something that you automatically place in the junk pile?
MEH - Indifference
BLANK - No idea what the item is.
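As a small illustration of these categories, the sketch below counts the responses for a single candy. The vector `responses` is made-up example data, not survey results, and treating a missing answer (`NA`) as BLANK is an assumption made for this example only.

```r
# Hypothetical responses for one candy; NA stands for a question left blank
responses <- c("JOY", "MEH", NA, "DESPAIR", "JOY", NA, "JOY")

# Assumption: recode missing answers as "BLANK" so they form their own category
responses[is.na(responses)] <- "BLANK"

# Fix the category order, then count each feeling
feelings <- factor(responses, levels = c("JOY", "DESPAIR", "MEH", "BLANK"))
feeling_counts <- table(feelings)
print(feeling_counts)
```

Using a factor with explicit levels keeps the four categories in a fixed order, including any category that happens to have zero responses.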
We are interested in the preference for two candies: Butterfinger (Plot 1) and Snickers (Plot 2). Each bar plot shows the distribution of feelings: JOY, DESPAIR, MEH, and BLANK.
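# First, clean up age: convert to numeric (non-numeric entries become NA)
newdf$AGE <- as.numeric(newdf$AGE)
## Warning: NAs introduced by coercion
# replace missing ages with 0
newdf$AGE[is.na(newdf$AGE)] <- 0
# keep the age column and the candy columns of interest
age_candy <- newdf %>% select(AGE, Butterfinger, Snickers, `Heath Bar`)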
library(ggplot2)
library(gcookbook)
# First plot: count of each feeling for Butterfinger
# (geom_bar() counts categories directly, avoiding the warning that geom_histogram(stat = "count") raises)
ggplot(age_candy, aes(x = Butterfinger)) +
  geom_bar()
# Second plot: count of each feeling for Snickers
ggplot(age_candy, aes(x = Snickers)) +
  geom_bar()
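The two candies can also be compared side by side in one figure. The following is a minimal sketch under stated assumptions, not the method used in this analysis. The data frame `toy` and its values are hypothetical, and reshaping with `pivot_longer()` followed by `facet_wrap()` is one possible approach.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy responses, not taken from the survey
toy <- data.frame(
  Butterfinger = c("JOY", "MEH", "JOY", "DESPAIR"),
  Snickers = c("JOY", "JOY", "DESPAIR", "MEH"),
  stringsAsFactors = FALSE
)

# Reshape to long form: one row per (candy, feeling) response
long <- pivot_longer(toy, cols = c(Butterfinger, Snickers),
                     names_to = "candy", values_to = "feeling")

# One bar chart per candy, sharing the same axes
p <- ggplot(long, aes(x = feeling)) +
  geom_bar() +
  facet_wrap(~ candy)
p
```

Because both panels share the same y-axis, differences in the number of JOY, DESPAIR, and MEH responses between the candies are easier to read than in two separate plots.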