In this assignment, we will practice collaborating around a code project with Github as a class.
Using several TidyVerse packages, and the “How Popular is Trump?” dataset from
fivethirtyeight.com, I’m going to show a programming sample “vignette” that demonstrates how to use the
capabilities of ggplot2, dplyr, readr packages with this particular dataset.
# Load
library("tidyverse")
library(ggplot2)
library(dplyr)
polls <- read_csv("https://raw.githubusercontent.com/ssufian/Data_607/master/approval_topline.csv")
## Parsed with column specification:
## cols(
## president = col_character(),
## subgroup = col_character(),
## modeldate = col_character(),
## approve_estimate = col_double(),
## approve_hi = col_double(),
## approve_lo = col_double(),
## disapprove_estimate = col_double(),
## disapprove_hi = col_double(),
## disapprove_lo = col_double(),
## timestamp = col_character()
## )
polls1 <- read_csv('https://raw.githubusercontent.com/ssufian/Data_607/master/polling_data.csv')
## Parsed with column specification:
## cols(
## .default = col_character(),
## samplesize = col_double(),
## weight = col_double(),
## influence = col_double(),
## approve = col_double(),
## disapprove = col_double(),
## adjusted_approve = col_double(),
## adjusted_disapprove = col_double(),
## tracking = col_logical(),
## poll_id = col_double(),
## question_id = col_double()
## )
## See spec(...) for full column specifications.
polls1 <- as_tibble(polls1)
polls1 %>% ggplot(aes(x=pollster, y=adjusted_approve, fill=pollster))+
geom_bar(stat = "identity", position = "dodge") +
guides(fill = FALSE) +
ggtitle("Trump's approval ratings by Pollster types")+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
polls1 %>% ggplot(aes(x=pollster, y=adjusted_disapprove, fill=pollster))+
geom_bar(stat = "identity", position = "dodge") +
guides(fill = FALSE) +
ggtitle("Trump's disapproval ratings by Pollster types")+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
polls1 %>% ggplot(aes(x=subgroup, y=adjusted_approve, fill=subgroup))+
geom_bar(stat = "identity", position = "dodge") +
guides(fill = FALSE) +
ggtitle("Trump's approval ratings by voter types")+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
polls1 %>% ggplot(aes(x=subgroup, y=adjusted_disapprove, fill=subgroup))+
geom_bar(stat = "identity", position = "dodge") +
guides(fill = FALSE) +
ggtitle("Trump's disapproval ratings by voter types")+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
Observation:
mid 45% range while his disapproval ratings is in the mid 60% range
registered Voters; this group have an average higher adj. dis-approval ratings for Trump relatively
speaking.
relative to the dis-approvals among their peers.
Description:
Using filter we can select rows of the data frame matching conditions.
filter capability tutorial
pollfilter <- filter(polls1, samplesize >800 & samplesize < 1992)
head(pollfilter)
## # A tibble: 6 x 22
## president subgroup modeldate startdate enddate pollster grade samplesize
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Donald T~ All pol~ 11/5/2019 1/20/2017 1/22/2~ Gallup B 1500
## 2 Donald T~ All pol~ 11/5/2019 1/21/2017 1/23/2~ Gallup B 1500
## 3 Donald T~ All pol~ 11/5/2019 1/20/2017 1/24/2~ Ipsos B- 1632
## 4 Donald T~ All pol~ 11/5/2019 1/22/2017 1/24/2~ Gallup B 1500
## 5 Donald T~ All pol~ 11/5/2019 1/22/2017 1/24/2~ Rasmuss~ C+ 1500
## 6 Donald T~ All pol~ 11/5/2019 1/20/2017 1/25/2~ Quinnip~ B+ 1190
## # ... with 14 more variables: population <chr>, weight <dbl>,
## # influence <dbl>, approve <dbl>, disapprove <dbl>,
## # adjusted_approve <dbl>, adjusted_disapprove <dbl>,
## # multiversions <chr>, tracking <lgl>, url <chr>, poll_id <dbl>,
## # question_id <dbl>, createddate <chr>, timestamp <chr>
Description:
Using select we can keep the selected variables
select capability tutorial
df<- select(polls1, c("subgroup","pollster","approve", "disapprove"))
head(df)
## # A tibble: 6 x 4
## subgroup pollster approve disapprove
## <chr> <chr> <dbl> <dbl>
## 1 All polls Morning Consult 46 37
## 2 All polls Gallup 45 45
## 3 All polls Gallup 45 46
## 4 All polls Ipsos 42.1 45.2
## 5 All polls Gallup 46 45
## 6 All polls Rasmussen Reports/Pulse Opinion Research 57 43
weather <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/us-weather-history/KCLT.csv")
head(weather,n=2)
## date actual_mean_temp actual_min_temp actual_max_temp
## 1 2014-7-1 81 70 91
## 2 2014-7-2 85 74 95
## average_min_temp average_max_temp record_min_temp record_max_temp
## 1 67 89 56 104
## 2 68 89 56 101
## record_min_temp_year record_max_temp_year actual_precipitation
## 1 1919 2012 0
## 2 2008 1931 0
## average_precipitation record_precipitation
## 1 0.1 5.91
## 2 0.1 1.53
weather2 <- weather %>% separate(date, c("year", "month", "day"), sep = "-")
head(weather2,n=2)
## year month day actual_mean_temp actual_min_temp actual_max_temp
## 1 2014 7 1 81 70 91
## 2 2014 7 2 85 74 95
## average_min_temp average_max_temp record_min_temp record_max_temp
## 1 67 89 56 104
## 2 68 89 56 101
## record_min_temp_year record_max_temp_year actual_precipitation
## 1 1919 2012 0
## 2 2008 1931 0
## average_precipitation record_precipitation
## 1 0.1 5.91
## 2 0.1 1.53
weather3 <- select(weather2, year, actual_mean_temp, record_min_temp, record_max_temp, record_precipitation)
head(weather3,n=2)
## year actual_mean_temp record_min_temp record_max_temp
## 1 2014 81 56 104
## 2 2014 85 56 101
## record_precipitation
## 1 5.91
## 2 1.53
Description:
Using mutate to create another column based on existing column
For ex: creating a z-score of another column
weather4 <- weather3 %>%
mutate( precipitation_z_score = (record_precipitation - mean(record_precipitation)/sd(record_precipitation)))
head(weather4)
## year actual_mean_temp record_min_temp record_max_temp
## 1 2014 81 56 104
## 2 2014 85 56 101
## 3 2014 82 56 99
## 4 2014 75 55 99
## 5 2014 72 57 100
## 6 2014 74 57 99
## record_precipitation precipitation_z_score
## 1 5.91 3.3981049
## 2 1.53 -0.9818951
## 3 2.50 -0.0118951
## 4 2.63 0.1181049
## 5 1.65 -0.8618951
## 6 1.95 -0.5618951
Description:
weather5 <- weather4 %>% top_n(actual_mean_temp, n= 5)
(weather5)
## year actual_mean_temp record_min_temp record_max_temp
## 1 2015 86 53 100
## 2 2015 87 52 100
## 3 2015 87 53 100
## 4 2015 88 55 102
## 5 2015 86 53 102
## record_precipitation precipitation_z_score
## 1 2.76 0.2481049
## 2 1.86 -0.6518951
## 3 1.82 -0.6918951
## 4 1.25 -1.2618951
## 5 2.76 0.2481049