In a recent mythical poll in Scotland, voters were asked if they preferred Cullen skink over Partan bree. Here are the results of the poll, with results broken down by city and by age group.
3 Questions: 1. Percentage of responses by group 2. Total Yes/No responses with percentage of total 3. Percent responses for each location
Read data
require(tidyr)
## Loading required package: tidyr
## Warning: package 'tidyr' was built under R version 3.1.3
require(dplyr)
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 3.1.3
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- read.csv("c:/Users/Public/data.csv", sep = ",", header = F, skip = 2, stringsAsFactor = F) #read csv, skip top two rows
colnames(data) <- c("Response", "Edinburgh.1624", "Edinburgh.25", "Glasgow.1624", "Glasgow.25") #set colnames
print(data)
## Response Edinburgh.1624 Edinburgh.25 Glasgow.1624 Glasgow.25
## 1 Yes 80100 143000 99400 150400
## 2 No 35900 214800 43000 207000
Tidy data
data2 <- data %>% gather(Location.Age, Frequency, 2:5, convert = F)
data3 <- data2 %>% separate(Location.Age, into = c("Location", "Age"))
data3[data3=="1624"] <- "16-24"
data3[data3=="25"] <- "25+"
data4 <- data3[c(2, 3, 1, 4)]
print(data4)
## Location Age Response Frequency
## 1 Edinburgh 16-24 Yes 80100
## 2 Edinburgh 16-24 No 35900
## 3 Edinburgh 25+ Yes 143000
## 4 Edinburgh 25+ No 214800
## 5 Glasgow 16-24 Yes 99400
## 6 Glasgow 16-24 No 43000
## 7 Glasgow 25+ Yes 150400
## 8 Glasgow 25+ No 207000
Answer Questions
data4 %>%
mutate("% Total Response" = round((Frequency / sum(Frequency)*100), 2))
## Location Age Response Frequency % Total Response
## 1 Edinburgh 16-24 Yes 80100 8.23
## 2 Edinburgh 16-24 No 35900 3.69
## 3 Edinburgh 25+ Yes 143000 14.69
## 4 Edinburgh 25+ No 214800 22.06
## 5 Glasgow 16-24 Yes 99400 10.21
## 6 Glasgow 16-24 No 43000 4.42
## 7 Glasgow 25+ Yes 150400 15.45
## 8 Glasgow 25+ No 207000 21.26
data4 %>%
group_by(Response) %>%
summarise(Frequency = sum(Frequency)) %>%
mutate("% Total Response" = round((Frequency / sum(Frequency)*100), 2))
## Source: local data frame [2 x 3]
##
## Response Frequency % Total Response
## 1 No 500700 51.43
## 2 Yes 472900 48.57
data4 %>%
group_by(Location, Response) %>%
summarise(Frequency = sum(Frequency)) %>%
mutate("% Total Response" = round((Frequency / sum(Frequency)*100), 2))
## Source: local data frame [4 x 4]
## Groups: Location
##
## Location Response Frequency % Total Response
## 1 Edinburgh No 250700 52.91
## 2 Edinburgh Yes 223100 47.09
## 3 Glasgow No 250000 50.02
## 4 Glasgow Yes 249800 49.98
Additional Question
Calculate Yes/No responses for each age group (easier after tidying data)
data4 %>%
group_by(Age, Response) %>%
summarise(Frequency = sum(Frequency)) %>%
mutate("% Total Response" = round((Frequency / sum(Frequency)*100), 2))
## Source: local data frame [4 x 4]
## Groups: Age
##
## Age Response Frequency % Total Response
## 1 16-24 No 78900 30.53
## 2 16-24 Yes 179500 69.47
## 3 25+ No 421800 58.98
## 4 25+ Yes 293400 41.02