Rithika Kumar (joint with Alex Toolkin)
September 26, 2019
Review your problem sets. As Professor Hopkins noted, most people get 2s (checks). Everyone got feedback on the problem set, so read it: the feedback is the important part.
install.packages("dplyr")
library(dplyr)
How do we get all the names of first generation pokemon?
pokedex[pokedex$generation == 1,]$name
pokedex %>% filter(generation == 1) %>% select(name)
Remember, piping does not modify your objects or create new objects!
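To see both approaches side by side, here is a minimal sketch on a toy stand-in for pokedex. The four rows below are made up for illustration; only the column names name and generation come from the code above.

```r
library(dplyr)

# Toy stand-in for the pokedex data (illustrative rows only)
pokedex <- data.frame(
  name = c("Bulbasaur", "Charmander", "Chikorita", "Treecko"),
  generation = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Base R: subset the rows, then pull the name column (a character vector)
base_names <- pokedex[pokedex$generation == 1, ]$name

# dplyr: same rows, but the result is a one-column data frame
piped_names <- pokedex %>% filter(generation == 1) %>% select(name)

# The pipe left pokedex untouched: it still has all four rows
nrow(pokedex)

# To keep a piped result, assign it to a name yourself
gen1 <- pokedex %>% filter(generation == 1)
```

Note that the base R version returns a vector while the dplyr version returns a data frame, so the two are not interchangeable in later steps.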
filter(), select(), group_by(), summarise()
https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
And you can keep stacking transformations!
Example: What is the average sp_attack + sp_defense of each generation of pokemon?
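One way this example might be written is sketched below, assuming the data has columns named generation, sp_attack and sp_defense as in the question. The toy rows and the names sp_total and avg_sp_total are illustrative choices, not part of the original data.

```r
library(dplyr)

# Toy stand-in for the pokedex data (illustrative rows only)
pokedex <- data.frame(
  name = c("Bulbasaur", "Charmander", "Chikorita", "Cyndaquil"),
  generation = c(1, 1, 2, 2),
  sp_attack = c(65, 60, 49, 60),
  sp_defense = c(65, 50, 65, 50)
)

avg_special <- pokedex %>%
  mutate(sp_total = sp_attack + sp_defense) %>%  # add the two stats per pokemon
  group_by(generation) %>%                       # one group per generation
  summarise(avg_sp_total = mean(sp_total))       # one row per generation

avg_special
```

Each step feeds its result into the next, so the transformations read top to bottom in the order they happen.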
mean(), median(), var(), sd(), quantile(), cor(), cov(), table()
Say we’re interested in 1st gen pokemon attack
Note that most summary statistics return NA by default if your data contain NAs!
mean(merged_lfp$k5)
uh oh!
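A small sketch of the problem and the usual fix, na.rm = TRUE. The merged_lfp data frame here is a made-up stand-in with a single k5 column; the real merged data will look different.

```r
# Toy stand-in for merged_lfp with one missing value in k5
merged_lfp <- data.frame(k5 = c(0, 1, NA, 2, 0))

# With an NA present, mean() returns NA
mean_with_na <- mean(merged_lfp$k5)

# na.rm = TRUE drops the NAs before computing the mean
mean_no_na <- mean(merged_lfp$k5, na.rm = TRUE)

# Count the NAs so you know how many rows were dropped
n_missing <- sum(is.na(merged_lfp$k5))
```

It is worth checking n_missing before relying on na.rm = TRUE, since dropping many rows can change what the statistic means.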
Find the IDs that don’t overlap between lfp1 and lfp2. Hint: Merge lfp1 and lfp2 using anti_join()
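As a rough illustration of how anti_join() behaves, here is a sketch on toy data. The key column name id and all values are assumptions; check the actual column names in lfp1 and lfp2 before adapting it.

```r
library(dplyr)

# Toy stand-ins; the key column name "id" is an assumption
lfp1 <- data.frame(id = c(1, 2, 3, 4), k5 = c(0, 1, 0, 2))
lfp2 <- data.frame(id = c(3, 4, 5), income = c(20, 35, 50))

# Rows of lfp1 whose id has no match in lfp2
only_in_lfp1 <- anti_join(lfp1, lfp2, by = "id")

# anti_join() is not symmetric, so run it the other way too
only_in_lfp2 <- anti_join(lfp2, lfp1, by = "id")
```

Because anti_join() only keeps rows from its first argument, both directions are needed to find every non-overlapping ID.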
Use inner_join() to merge lfp1 and lfp2. Now, find the terciles of income in the merged dataset. Hint: Use inner_join() and quantile()
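The same pattern on toy data might look like the sketch below. The column names id and income and all values are assumptions made for illustration, and the tercile cut points are taken as the 1/3 and 2/3 quantiles.

```r
library(dplyr)

# Toy stand-ins; column names "id" and "income" are assumptions
lfp1 <- data.frame(id = 1:6, k5 = c(0, 1, 0, 2, 1, 0))
lfp2 <- data.frame(id = 2:7, income = c(10, 20, 30, 40, 50, 60))

# inner_join() keeps only the ids present in both data frames
merged_lfp <- inner_join(lfp1, lfp2, by = "id")

# Tercile cut points: the values splitting income into three equal-sized parts
tercile_cuts <- quantile(merged_lfp$income, probs = c(1/3, 2/3))
tercile_cuts
```

If income contains NAs in the real data, quantile() will stop with an error unless na.rm = TRUE is added.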
Explore the Pokemon data set - some sample questions to try.