Week7_8Assignment

Load necessary packages

#install.packages("tidyr")
library(tidyr)

## Warning: package 'tidyr' was built under R version 3.1.3

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.1.3

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

1. Write down 3 questions that you might want to answer based on this data

a) How many people participated in the poll per city?

b) How many people voted yes in each city that are under 25?

c) How many people 25+ participated in the poll?

2. Create an R data frame with 2 observations to store this data in its current “messy” state. Use whatever method you want to re-create and/or load the data.

v.1 <- c("Edin.1624.Y","Edin.1624.N","Edin.25.Y","Edin.25.N","Glasgow.1624.Y","Glasgow.1624.N","Glasgow.25.Y","Glasgow.25.N")
v.2 <- c(80100,35900,143000,214800,99400,43000,150400,207000)

df.polldata <-data.frame(v.1,v.2)
names(df.polldata) <- c("Info","VoteCount")

3. Use the functionality in the tidyr package to convert the data frame to be “tidy data.”

df.tidyr1 <- separate(df.polldata, Info, into = c("City", "Under25", "VotedYN"), sep = "\\.")

df.tidyr1$Under25[df.tidyr1$Under25=="1624"] <- "y"
df.tidyr1$Under25[df.tidyr1$Under25=="25"] <- "n"

df.tidyr1$VotedYN[df.tidyr1$VotedYN=="Y"] <- "y"
df.tidyr1$VotedYN[df.tidyr1$VotedYN=="N"] <- "n"

4. Use the functionality in the dplyr package to answer the questions that you asked in step 1

df.tidyr1 %>% 
  group_by(City) %>% 
  summarise(
    Participants = sum(VoteCount)
    )

## Source: local data frame [2 x 2]
## 
##      City Participants
## 1    Edin       473800
## 2 Glasgow       499800

df.tidyr1 %>% 
  filter(Under25=="y") %>% 
  group_by(City) %>% 
  summarise(
    Under25 = sum(VoteCount)
  )

## Source: local data frame [2 x 2]
## 
##      City Under25
## 1    Edin  116000
## 2 Glasgow  142400

df.tidyr1 %>% 
  filter(Under25=="n") %>% 
  summarise(
    Above25 = sum(VoteCount)
  )

##   Above25
## 1  715200

5. Having gone through the process, would you ask different questions and/or change the way that you structured your data frame?

Since the dataset is really small, there cannot be lot of questions except asking about data slicing and dicing on few fields. But overall transforming data from wide format to long format makes sense for easy reporting and charting.

Week7_8Assignment

Senthil Dhanapal

Thursday, March 19, 2015

Load necessary packages

1. Write down 3 questions that you might want to answer based on this data

a) How many people participated in the poll per city?

b) How many people voted yes in each city that are under 25?

c) How many people 25+ participated in the poll?

2. Create an R data frame with 2 observations to store this data in its current “messy” state. Use whatever method you want to re-create and/or load the data.

3. Use the functionality in the tidyr package to convert the data frame to be “tidy data.”

4. Use the functionality in the dplyr package to answer the questions that you asked in step 1

5. Having gone through the process, would you ask different questions and/or change the way that you structured your data frame?

Since the dataset is really small, there cannot be lot of questions except asking about data slicing and dicing on few fields. But overall transforming data from wide format to long format makes sense for easy reporting and charting.