Assignment 7/8 - Tidying and Transforming Data

1) Create an R data frame with 2 observations to store this data in its current “messy” state.
2) Use the functionality in the tidyr package to convert the data frame to be “tidy data.”
3) Analyze the data and answer the questions.
Create an R data frame with 2 observations to store this data in its current “messy” state.
Yes is defined as: the voter prefers Cullen Skink over Partan Bree
No is defined as: the voter does not prefer Cullen Skink over Partan Bree
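library(tidyr)      # gather()
library(dplyr)      # %>%, group_by(), summarise(), filter()
library(gridExtra)  # grid.table()
# Rows: Edinburgh 16-24, Edinburgh 25+, Glasgow 16-24, Glasgow 25+
df <- data.frame(yes = c(80100, 143000, 99400, 150400),
                 no  = c(35900, 214800, 43000, 207000))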
df
##      yes     no
## 1  80100  35900
## 2 143000 214800
## 3  99400  43000
## 4 150400 207000
Start tidying the data:
gather the yes and no columns into a Response column and a Count column
df <- df %>%
  gather(Response, Count, yes:no)
df
##   Response  Count
## 1      yes  80100
## 2      yes 143000
## 3      yes  99400
## 4      yes 150400
## 5       no  35900
## 6       no 214800
## 7       no  43000
## 8       no 207000
Add a City column to the data (the four values are recycled to fill all eight rows)
df$City <- c("Edinburgh", "Edinburgh", "Glasgow", "Glasgow")
df
##   Response  Count      City
## 1      yes  80100 Edinburgh
## 2      yes 143000 Edinburgh
## 3      yes  99400   Glasgow
## 4      yes 150400   Glasgow
## 5       no  35900 Edinburgh
## 6       no 214800 Edinburgh
## 7       no  43000   Glasgow
## 8       no 207000   Glasgow
Add an AgeGroup column to the data (the two values are recycled to fill all eight rows)
df$AgeGroup <- c("16-24", "25+")
df
##   Response  Count      City AgeGroup
## 1      yes  80100 Edinburgh    16-24
## 2      yes 143000 Edinburgh      25+
## 3      yes  99400   Glasgow    16-24
## 4      yes 150400   Glasgow      25+
## 5       no  35900 Edinburgh    16-24
## 6       no 214800 Edinburgh      25+
## 7       no  43000   Glasgow    16-24
## 8       no 207000   Glasgow      25+
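The two column assignments above depend on the row order that gather() produces and on R recycling the shorter vectors. A minimal alternative sketch, assuming tidyr 1.0.0 or later, stores City and AgeGroup in the wide frame first and then reshapes with pivot_longer(), so each count keeps its labels regardless of row order. The names messy and tidy are illustrative and not part of the original assignment.

```r
library(tidyr)

# Wide ("messy") frame with the identifying columns stored up front.
# Row order is assumed to match the original df:
# Edinburgh 16-24, Edinburgh 25+, Glasgow 16-24, Glasgow 25+.
messy <- data.frame(
  City     = c("Edinburgh", "Edinburgh", "Glasgow", "Glasgow"),
  AgeGroup = c("16-24", "25+", "16-24", "25+"),
  yes      = c(80100, 143000, 99400, 150400),
  no       = c(35900, 214800, 43000, 207000)
)

# pivot_longer() is the newer tidyr counterpart to gather():
# the yes/no column names go to Response, their values to Count.
tidy <- pivot_longer(messy, cols = c(yes, no),
                     names_to = "Response", values_to = "Count")
tidy
```

Note that pivot_longer() returns the rows in a different order from gather() (yes and no alternate for each City/AgeGroup pair), which is why the labels are attached before reshaping here rather than afterwards.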
Show the tidy data in grid form
grid.table(df)

Data Analysis

1) Is Cullen Skink more preferred than Partan Bree overall?

____________________________________________________________
Sum of Total Responses
TotalResponses <- df %>%
  summarise(TotalResponses = sum(Count))
TotalResponses
##   TotalResponses
## 1         973600
What are the total counts by responses?
Response <- df %>%
  group_by(Response) %>%
  summarise(Total = sum(Count, na.rm = TRUE))
Response
## Source: local data frame [2 x 2]
## 
##   Response  Total
## 1      yes 472900
## 2       no 500700
Let’s see only Yes Responses
Yes <- filter(Response, Response == "yes")
Yes
## Source: local data frame [1 x 2]
## 
##   Response  Total
## 1      yes 472900
Let’s see only No Responses
No <- filter(Response, Response == "no")
No
## Source: local data frame [1 x 2]
## 
##   Response  Total
## 1       no 500700
Fewer voters prefer Cullen Skink over Partan Bree (472900 yes) than do not (500700 no)
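Since the two totals are close, expressing them as shares of all responses may make the comparison easier to read. A small base R sketch, with the totals copied from the summary output above (the names totals and shares are illustrative):

```r
# Totals by response, copied from the summary output above.
totals <- c(yes = 472900, no = 500700)

# Share of all responses, as a percentage rounded to one decimal place.
shares <- round(100 * totals / sum(totals), 1)
shares
```

This works out to roughly 48.6% yes against 51.4% no.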

____________________________________________________________

Data Analysis

2) Which city prefers Cullen Skink over Partan Bree, Edinburgh or Glasgow?

_____________________________________________________________________________
Sum of Responses by City
CityResponses <- df %>%
  group_by(City, Response) %>%
  summarise(Total = sum(Count, na.rm = TRUE))
CityResponses
## Source: local data frame [4 x 3]
## Groups: City
## 
##        City Response  Total
## 1 Edinburgh      yes 223100
## 2 Edinburgh       no 250700
## 3   Glasgow      yes 249800
## 4   Glasgow       no 250000
Let’s see the yes responses by city
CityYes <- CityResponses %>%
  filter(Response == "yes")
CityYes
## Source: local data frame [2 x 3]
## Groups: City
## 
##        City Response  Total
## 1 Edinburgh      yes 223100
## 2   Glasgow      yes 249800
More people in Glasgow (249800) prefer Cullen Skink over Partan Bree than in Edinburgh (223100)
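Edinburgh and Glasgow have different numbers of respondents, so the raw yes counts are not the only way to compare them. As a hedged sketch, the share of yes responses within each city can be computed with base R's xtabs() and prop.table(). The data frame below re-creates the tidy data from this document under the illustrative name tidy.

```r
# Re-create the tidy data shown earlier (same values, built directly).
tidy <- data.frame(
  Response = rep(c("yes", "no"), each = 4),
  Count    = c(80100, 143000, 99400, 150400, 35900, 214800, 43000, 207000),
  City     = rep(c("Edinburgh", "Edinburgh", "Glasgow", "Glasgow"), times = 2),
  AgeGroup = rep(c("16-24", "25+"), times = 4)
)

# Cross-tabulate counts by City and Response, then convert each row
# (each city) to percentages that sum to 100.
city_tab   <- xtabs(Count ~ City + Response, data = tidy)
city_share <- round(100 * prop.table(city_tab, margin = 1), 1)
city_share
```

On these numbers the yes share is about 47.1% in Edinburgh and 50.0% in Glasgow, which points the same way as the raw counts.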
_____________________________________________________________________________

Data Analysis

3) Which age group prefers Cullen Skink over Partan Bree, 16-24 or 25+?

_____________________________________________________________________________
Sum of Responses by AgeGroup
AgeResponses <- df %>%
  group_by(AgeGroup, Response) %>%
  summarise(Total = sum(Count, na.rm = TRUE))
AgeResponses
## Source: local data frame [4 x 3]
## Groups: AgeGroup
## 
##   AgeGroup Response  Total
## 1    16-24      yes 179500
## 2    16-24       no  78900
## 3      25+      yes 293400
## 4      25+       no 421800
Let’s see the yes responses by age group
AgeYes <- AgeResponses %>%
  filter(Response == "yes")
AgeYes
## Source: local data frame [2 x 3]
## Groups: AgeGroup
## 
##   AgeGroup Response  Total
## 1    16-24      yes 179500
## 2      25+      yes 293400
In absolute numbers, more people in the 25+ age group (293400) prefer Cullen Skink over Partan Bree than in the 16-24 age group (179500)
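The 25+ group is much larger than the 16-24 group, so a raw-count comparison and a within-group comparison can differ. The sketch below, assuming dplyr 1.0.0 or later (for the .groups argument), computes the percentage of yes and no responses inside each age group. The names tidy and age_share are illustrative.

```r
library(dplyr)

# Re-create the tidy data shown earlier (same values, built directly).
tidy <- data.frame(
  Response = rep(c("yes", "no"), each = 4),
  Count    = c(80100, 143000, 99400, 150400, 35900, 214800, 43000, 207000),
  City     = rep(c("Edinburgh", "Edinburgh", "Glasgow", "Glasgow"), times = 2),
  AgeGroup = rep(c("16-24", "25+"), times = 4)
)

age_share <- tidy %>%
  group_by(AgeGroup, Response) %>%
  summarise(Total = sum(Count), .groups = "drop") %>%
  group_by(AgeGroup) %>%
  # Percentage of each response within its own age group.
  mutate(Share = round(100 * Total / sum(Total), 1)) %>%
  ungroup()
age_share
```

Here the yes share comes out near 69.5% for 16-24 and 41.0% for 25+, so within each group the younger voters lean more strongly toward Cullen Skink even though the older group contributes more yes votes in total.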

Conclusion

1) Overall, fewer people prefer Cullen Skink over Partan Bree than do not
2) More people in Glasgow prefer Cullen Skink over Partan Bree than in Edinburgh
3) More people in the 25+ age group prefer Cullen Skink over Partan Bree than in the 16-24 age group