Project 2

Project 2 - Group Members

Chinedu Onyeka “Datase2” GitHub RPubs

Krutika Patel “Human Trafficking” GitHub RPubs

Peter Phung “Jiho Kim’s Stock Market Data from India” GitHub RPubs

Candy Hierarchy Data 2017

Load packages

# load libraries
library(stringr)
library(readr)
library(tidyverse)
library(tidyr)
library(jpeg)
library(gridExtra)

So Much Candy Data, Seriously

The data for this analysis was sourced from The Science Creative Quarterly, written by David Ng and published at BoingBoing. Halloween is right around the corner, and everyone is stocking up on sweets. Why would individuals continue to buy candy in bulk during a pandemic? Simple! We all love candy: Kit Kat, Hershey, Snickers, Twix, and a handful of others. The researchers, Ng & Cohen, have compiled a hierarchy of candy preferences over several years, originally as a geology joke.

The “Candy Hierarchy Data 2017” dataset was compiled from survey responses in which each respondent rated how they feel when they receive a given item in their Halloween haul.

Reading in Data

# load the data from GitHub; header = FALSE keeps the original header as row 1
data <- read.csv(url("https://github.com/candrewxs/Project2/blob/main/candyhierarchy2017.csv?raw=true"), header = FALSE)
# change the blank cells to NA
data <- data %>% mutate_all(na_if, "")
# show the first four rows and first five columns of the data frame
head(data[1:4, 1:5, drop = FALSE])

##            V1             V2         V3      V4          V5
## 1 Internal ID Q1: GOING OUT? Q2: GENDER Q3: AGE Q4: COUNTRY
## 2    90258773           <NA>       <NA>    <NA>        <NA>
## 3    90272821             No       Male      44        USA 
## 4    90272829           <NA>       Male      49         USA
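The blank-to-NA step above can be tried on a small stand-in table. This is a minimal sketch, assuming every column is read as character (as happens with `header = FALSE`); the `toy` data frame and its values are hypothetical, and `across()` is used here as the newer dplyr spelling of `mutate_all()`.

```r
library(dplyr)

# Hypothetical stand-in for the survey file: all columns are character
toy <- data.frame(
  V1 = c("Internal ID", "1", "2"),
  V2 = c("Q2: GENDER", "", "Male"),
  V3 = c("Q3: AGE", "", "44"),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA in every column
toy_na <- toy %>% mutate(across(everything(), ~ na_if(.x, "")))

# Count the missing cells in each column
colSums(is.na(toy_na))
```

Counting the NA cells per column is a quick way to confirm that the blanks were actually converted before any further cleaning.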

Clean candy names

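# NOTE: df is not defined in this section; it is assumed to be a one-row
# data frame of cleaned column names created elsewhere
newdf <- rbind(df, data)   # stack the rows of the two data frames
names(newdf) <- as.character(unlist(newdf[1, ])) # copy first row to the header
newdf <- newdf[-1:-2, ]    # delete first and second rows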

newdf <- select(newdf, -1, -114) # delete the unnecessary "V1" column and the empty, unnamed column "V114"
newdf <- mutate_all(newdf, .funs = toupper) # change every value in the data frame to uppercase

# drop rows in which every column is NA (blank cells were converted to NA earlier)
newdf <- newdf[rowSums(is.na(newdf)) != ncol(newdf), ]
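The header promotion and empty-row removal described above can be sketched on a toy data frame. This is only an illustration in base R under assumed values; `toy` and its contents are hypothetical and do not come from the survey file.

```r
# Hypothetical table whose first row holds the column names
toy <- data.frame(
  V1 = c("AGE", "44", NA, "49"),
  V2 = c("Snickers", "JOY", NA, "MEH"),
  stringsAsFactors = FALSE
)

# Promote the first row to the header, then remove that row
names(toy) <- as.character(unlist(toy[1, ]))
toy <- toy[-1, ]

# Keep only rows that have at least one non-missing value
toy <- toy[rowSums(is.na(toy)) != ncol(toy), ]
toy
```

Note that the filtered result has to be assigned back to the data frame; calling the subsetting expression on its own only prints the rows.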

Analysis - Type of Feelings when Candy is Received

A total of 2459 individuals completed the survey. The survey told respondents that they could skip a question, leave it blank, or indicate that they did not know the candy.

Feeling Values:

JOY - Does it make you happy?

DESPAIR - Is it something that you automatically place in the junk pile?

MEH - Indifference

BLANK - No idea what the item is.

We are interested in the preference for two candies: Butterfinger (Plot 1) and Snickers (Plot 2). Each bar plot shows the distribution of feelings: JOY, DESPAIR, MEH, and BLANK.
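Before plotting, the same distribution can be tabulated directly. The sketch below uses a hypothetical vector of responses and assumes that a BLANK answer is stored as NA in the data; that mapping is an assumption for illustration, not something the survey file confirms.

```r
# Hypothetical responses for one candy; NA stands in for a BLANK answer (assumption)
responses <- c("JOY", "DESPAIR", "MEH", NA, "JOY", "JOY", NA, "MEH")

# Label missing answers explicitly so they appear in the tally
responses[is.na(responses)] <- "BLANK"

# Fix the order of the categories so tables and plots stay consistent
feelings <- factor(responses, levels = c("JOY", "DESPAIR", "MEH", "BLANK"))

counts <- table(feelings)          # number of responses per feeling
shares <- prop.table(counts)       # share of responses per feeling
counts
round(shares, 2)
```

Using a factor with explicit levels keeps categories with zero responses in the table, which makes the tallies for different candies directly comparable.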

# First, clean up age
newdf$AGE <- as.numeric(newdf$AGE)

## Warning: NAs introduced by coercion

newdf$AGE[is.na(newdf$AGE)] <- 0 # replace missing or non-numeric ages with 0

# keep age plus the candy columns of interest
age_candy <- newdf %>% select(AGE, Butterfinger, Snickers, `Heath Bar`)
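The effect of the age clean-up can be seen on a few made-up values. This is a small sketch with hypothetical ages, showing why `as.numeric()` warns and how replacing NA with 0 changes a summary such as the mean compared with leaving the values missing.

```r
# Hypothetical age entries, including free text and a missing value
ages_raw <- c("44", "49", "old enough", NA)

# Non-numeric text becomes NA; suppressWarnings() hides the coercion warning
ages <- suppressWarnings(as.numeric(ages_raw))
n_missing <- sum(is.na(ages))

# Option used in this analysis: treat unknown ages as 0
ages_zero <- ages
ages_zero[is.na(ages_zero)] <- 0

mean_with_zero <- mean(ages_zero)        # zeros pull the average down
mean_without   <- mean(ages, na.rm = TRUE) # ignores unknown ages instead
c(n_missing = n_missing, with_zero = mean_with_zero, without = mean_without)
```

If age is later used in a summary or plot, filtering out the zero placeholder (or keeping NA and using `na.rm = TRUE`) avoids counting unknown ages as real values.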

library(ggplot2)
library(gcookbook)
# First plot: bar chart of response counts for Butterfinger
ggplot(age_candy, aes(x = Butterfinger)) +
  geom_bar()

# Second plot: bar chart of response counts for Snickers
ggplot(age_candy, aes(x = Snickers)) +
  geom_bar()
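To compare the two candies directly, the plots can be placed side by side. The sketch below is one possible layout using `grid.arrange()` from gridExtra, which is loaded at the top of this document; the `toy` data frame is hypothetical and only stands in for `age_candy`.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical responses standing in for the survey columns
toy <- data.frame(
  Butterfinger = c("JOY", "JOY", "MEH", "DESPAIR", NA),
  Snickers     = c("JOY", "JOY", "JOY", "MEH", NA),
  stringsAsFactors = FALSE
)

# geom_bar() counts the rows in each category, so no stat argument is needed
p1 <- ggplot(toy, aes(x = Butterfinger)) +
  geom_bar() +
  labs(title = "Plot 1: Butterfinger", y = "Responses")

p2 <- ggplot(toy, aes(x = Snickers)) +
  geom_bar() +
  labs(title = "Plot 2: Snickers", y = "Responses")

# Draw both plots in a single row
grid.arrange(p1, p2, ncol = 2)
```

For categorical responses, `geom_bar()` is the usual choice; `geom_histogram()` is intended for binning continuous values, which is why it warns about unused binning parameters when given `stat = "count"`.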

Source GitHub RPubs