Harold Nelson
2/26/2018
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
load("~/Dropbox/RProjects/Oly Weather/olywthr.rdata")
Download the file demo1 from Moodle and inspect it. Then write R code using dplyr to create this file based on olywthr.
olywthr %>%
  filter((yr == 2015 | yr == 2016), mo == 2, dy >= 14, dy <= 29) %>%
  select(DATE, TMIN) %>%
  mutate(TMIN_C = (5/9)*(TMIN - 32)) %>%
  arrange(TMIN) -> demo1
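As a minimal sketch of how the filter above behaves, the following uses a small made-up data frame (toy_wthr is hypothetical; only the column names yr, mo, dy and TMIN come from the code above). It shows that comma-separated conditions in filter() are combined with AND, that %in% and between() give an equivalent, more compact form, and that the Celsius conversion maps 32 degrees F to 0 degrees C.

```r
library(dplyr)

# Hypothetical stand-in for olywthr; the values are invented for illustration
toy_wthr <- data.frame(
  yr   = c(2014, 2015, 2015, 2016, 2016),
  mo   = c(2, 2, 3, 2, 2),
  dy   = c(20, 14, 15, 29, 10),
  TMIN = c(30, 32, 41, 50, 28)
)

# Conditions separated by commas must all be TRUE for a row to be kept
original_style <- toy_wthr %>%
  filter((yr == 2015 | yr == 2016), mo == 2, dy >= 14, dy <= 29)

# Equivalent form: %in% replaces the OR, between() is inclusive at both ends
compact_style <- toy_wthr %>%
  filter(yr %in% c(2015, 2016), mo == 2, between(dy, 14, 29))

# Fahrenheit to Celsius, using the same formula as above
converted <- compact_style %>%
  mutate(TMIN_C = (5/9) * (TMIN - 32))

print(converted)
```

Only the rows for 2015-02-14 and 2016-02-29 survive both versions of the filter; the 2014 row, the March row, and the row for day 10 are dropped.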
The dataframe demo2, available in Moodle, was constructed from the diamonds dataframe found in ggplot2. Download and inspect it. Then write dplyr statements to recreate it. Note: ppc stands for price per carat (dollars/carat).
diamonds %>%
  filter(cut == "Ideal") %>%
  select(cut, color, clarity, carat, price) %>%
  mutate(dollars_per_carat = price/carat) %>%
  group_by(cut, color, clarity) %>%
  # Note: despite the med_ prefix, this is the mean price per carat in each cell
  summarize(med_ppc = mean(dollars_per_carat),
            cell_count = n()) %>%
  ungroup() %>%
  arrange(med_ppc) -> demo2
glimpse(demo2)
## Observations: 56
## Variables: 5
## $ cut <ord> Ideal, Ideal, Ideal, Ideal, Ideal, Ideal, Ideal, Id...
## $ color <ord> J, I, J, D, I, I, E, H, F, I, G, D, H, E, H, D, J, ...
## $ clarity <ord> VVS1, IF, IF, I1, VVS1, I1, I1, VVS1, I1, VVS2, I1,...
## $ med_ppc <dbl> 2612.756, 2722.393, 2774.645, 2898.389, 2954.667, 3...
## $ cell_count <int> 29, 95, 25, 13, 179, 17, 18, 326, 42, 178, 16, 738,...
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following object is masked from 'package:ggplot2':
##
## diamonds
## The following objects are masked from 'package:datasets':
##
## cars, trees
cc <- countyComplete %>%
  select(name, state, pop2010, per_capita_income, area, density, bachelors) %>%
  mutate(name = as.character(name),
         state = as.character(state),
         total_income = per_capita_income * pop2010)
glimpse(cc)
## Observations: 3,143
## Variables: 8
## $ name <chr> "Autauga County", "Baldwin County", "Barbour...
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", ...
## $ pop2010 <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 2...
## $ per_capita_income <dbl> 24568, 26469, 15875, 19918, 21070, 20289, 16...
## $ area <dbl> 594.44, 1589.78, 884.88, 622.58, 644.78, 622...
## $ density <dbl> 91.8, 114.6, 31.0, 36.8, 88.9, 17.5, 27.0, 1...
## $ bachelors <dbl> 21.7, 26.8, 13.5, 10.0, 12.5, 12.0, 11.0, 16...
## $ total_income <dbl> 1340700328, 4824372285, 435879875, 456420970...
summary(cc)
## name state pop2010 per_capita_income
## Length:3143 Length:3143 Min. : 82 Min. : 7772
## Class :character Class :character 1st Qu.: 11104 1st Qu.:19030
## Mode :character Mode :character Median : 25857 Median :21773
## Mean : 98233 Mean :22505
## 3rd Qu.: 66699 3rd Qu.:24814
## Max. :9818605 Max. :64381
## area density bachelors total_income
## Min. : 2.0 Min. : 0.0 Min. : 3.70 Min. :3.462e+06
## 1st Qu.: 430.7 1st Qu.: 16.9 1st Qu.:13.10 1st Qu.:2.203e+08
## Median : 615.6 Median : 45.2 Median :16.90 Median :5.370e+08
## Mean : 1123.7 Mean : 259.3 Mean :19.03 Mean :2.687e+09
## 3rd Qu.: 924.0 3rd Qu.: 113.8 3rd Qu.:22.60 3rd Qu.:1.503e+09
## Max. :145504.8 Max. :69467.5 Max. :71.00 Max. :2.685e+11
Create a states dataset using group_by and summarize.
In this dataset, compute the average per capita income for each state using two different methods.
mean_county_pci is the simple (unweighted) average of the county per capita income values, so every county counts equally regardless of its population.
state_pci is constructed by adding up total income and population across the counties and dividing total income by population at the state level, which weights each county by its population.
cc %>%
  group_by(state) %>%
  summarize(state_pci = sum(total_income)/sum(pop2010),
            mean_county_pci = mean(total_income/pop2010)
  ) %>%
  arrange(desc(mean_county_pci)) -> states
glimpse(states)
## Observations: 51
## Variables: 3
## $ state <chr> "District of Columbia", "Connecticut", "New Je...
## $ state_pci <dbl> 42078.00, 36792.41, 34853.30, 33982.91, 28703....
## $ mean_county_pci <dbl> 42078.00, 34873.25, 34391.19, 33547.07, 32741....
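To see why the two columns differ, here is a small sketch on made-up numbers (toy_cc and its values are hypothetical; the column names mirror cc). With one small, poorer county and one large, richer county, the simple average sits halfway between the two county values, while the state-level ratio is pulled toward the larger county. The sketch also checks that the ratio is the same thing as a population-weighted mean computed with weighted.mean().

```r
library(dplyr)

# Hypothetical data: one made-up state with two counties of very different size
toy_cc <- data.frame(
  state             = c("A", "A"),
  pop2010           = c(1000, 9000),
  per_capita_income = c(20000, 40000),
  stringsAsFactors  = FALSE
) %>%
  mutate(total_income = per_capita_income * pop2010)

toy_states <- toy_cc %>%
  group_by(state) %>%
  summarize(
    # Ratio of state totals: each county counts in proportion to its population
    state_pci = sum(total_income) / sum(pop2010),
    # Simple average: each county counts equally
    mean_county_pci = mean(per_capita_income),
    # Population-weighted mean, which should match state_pci
    weighted_pci = weighted.mean(per_capita_income, pop2010)
  )

print(toy_states)
```

Here state_pci is 380,000,000 / 10,000 = 38,000, while mean_county_pci is (20,000 + 40,000) / 2 = 30,000. In the District of Columbia row above the two values agree, as they must for any state that contains a single county.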
Create a subset, pnw, of cc consisting of counties from Washington, Oregon and Idaho. Produce an appropriate graphic to compare the counties in these states on the basis of the percentage of people holding a bachelor's degree.
cc %>%
  filter(state %in% c("Washington", "Oregon", "Idaho")) -> pnw
pnw %>% ggplot(aes(x = state, y = bachelors)) +
  geom_boxplot() +
  coord_flip()
# or
pnw %>% ggplot(aes(x = bachelors, fill = state)) +
  geom_density() +
  facet_wrap(~state, ncol = 1)
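A numeric summary can be read alongside either graphic. The sketch below is one possible companion, not part of the original solution: it uses a made-up data frame (toy_pnw and its values are hypothetical; the column names state and bachelors match pnw) and reports the county count, median and interquartile range per state, which are the quantities a boxplot draws.

```r
library(dplyr)

# Hypothetical stand-in for pnw; the percentages are invented for illustration
toy_pnw <- data.frame(
  state = c("Washington", "Washington", "Washington",
            "Oregon", "Oregon", "Idaho", "Idaho"),
  bachelors = c(30, 20, 25, 22, 18, 15, 21),
  stringsAsFactors = FALSE
)

toy_summary <- toy_pnw %>%
  group_by(state) %>%
  summarize(n_counties       = n(),
            median_bachelors = median(bachelors),
            iqr_bachelors    = IQR(bachelors)) %>%
  arrange(desc(median_bachelors))

print(toy_summary)
```

Running the same pipeline on pnw instead of toy_pnw would give the corresponding figures for the real counties.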