Pokemon dataset from https://www.openintro.org/stat/data/?data=pokemon.
Option 1, after downloading the csv file to the same directory as this script (PokemonGO.Rmd):
pokemon <- read.csv("pokemon.csv")
Option 2, downloading from the internet:
pokemon = read.csv("https://www.openintro.org/stat/data/pokemon.csv")
pfit = lm(cp ~ species + hp + weight + height, data=pokemon)
pfit2 = lm(cp ~ (species + hp + weight + height)^2, data=pokemon) #all 2-way interactions
pfit3 = lm(cp ~ . - notes - name - attack_weak, data=pokemon) # all available predictors except for notes, names, and attack_weak.
dplyr introductionHelp at https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
Create a dataset that excludes all columns whose name ends in “new” (notice the - before ends_with():
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
pokemon2 = select(pokemon, -ends_with("new"))
Exclude columns whose name ends in new, then keep only rows where the species is “Pidgey”:
pokemon2 = pokemon %>%
select(-ends_with("new")) %>%
filter(species == "Pidgey")
summary(pokemon$species)
## Caterpie Eevee Pidgey Weedle
## 10 6 39 20
summary(pokemon2$species)
## Caterpie Eevee Pidgey Weedle
## 0 0 39 0
ggplot2 introductionHelp at https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
ggplot2 plots are built up piece by piece. The following creates a blank plot, and store all the data to make that plot and build on it in the myplot object:
library(ggplot2)
myplot = ggplot(aes(cp, cp_new), data=pokemon)
To add a scatterplot with colors and a legend, just add a geom_point() call. We could save this to a new object, like myplot2, but in this example we won’t create any new object, just make a scatterplot:
myplot + geom_point(aes(color=species))
Try doing the following one by one, just adding new things to the existing plot. These functions are all documented in the ggplot2 cheat sheet. You can also try skipping some of the lines. Note that nothing will happen until there is a line that doesn’t end in “+”
ggplot(aes(cp, cp_new), data=pokemon) +
geom_point(aes(color=species)) + #scatterplot
geom_smooth(method="lm") + #linear regression line and confidence bands
theme_bw() #get rid of the grey background
Now let’s make a boxplot:
ggplot(aes(species, cp), data=pokemon) +
geom_boxplot(fill="grey") +
ggtitle("Combat Power by Species") +
xlab("Species") +
ylab("Combat Power") +
theme_bw()
Try other kinds of plots, geom_violin and geom_dotplot…