# Identify which variables in your data set are numeric and which are categorical (factors).
require(ggplot2)
## Loading required package: ggplot2
str(mpg)
## 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: Factor w/ 15 levels "audi","chevrolet",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ model : Factor w/ 38 levels "4runner 4wd",..: 2 2 2 2 2 2 2 3 3 3 ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : Factor w/ 10 levels "auto(av)","auto(l3)",..: 4 9 10 1 4 9 1 9 4 10 ...
## $ drv : Factor w/ 3 levels "4","f","r": 2 2 2 2 2 2 2 1 1 1 ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : Factor w/ 5 levels "c","d","e","p",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ class : Factor w/ 7 levels "2seater","compact",..: 2 2 2 2 2 2 2 2 2 2 ...
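The `str()` output can be read by eye, but R can also compute the numeric/categorical split directly. This is a minimal sketch. The output above appears to come from an older ggplot2 release where text columns were factors. Recent releases store them as character vectors instead. `is.numeric()` returns `FALSE` for both types, so the split is the same either way:

```r
library(ggplot2)

# TRUE for numeric columns (double or integer), FALSE for character/factor columns
is_num <- vapply(mpg, is.numeric, logical(1))

numeric_vars     <- names(mpg)[is_num]
categorical_vars <- names(mpg)[!is_num]

numeric_vars
categorical_vars
```

`year` and `cyl` land in the numeric group because they are stored as integers. However, each takes only a handful of distinct values, so treating them as categorical (for example with `factor(mpg$cyl)`) is a reasonable analysis choice.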
summary(mpg)
## manufacturer model displ year
## dodge :37 caravan 2wd : 11 Min. :1.600 Min. :1999
## toyota :34 ram 1500 pickup 4wd: 10 1st Qu.:2.400 1st Qu.:1999
## volkswagen:27 civic : 9 Median :3.300 Median :2004
## ford :25 dakota pickup 4wd : 9 Mean :3.472 Mean :2004
## chevrolet :19 jetta : 9 3rd Qu.:4.600 3rd Qu.:2008
## audi :18 mustang : 9 Max. :7.000 Max. :2008
## (Other) :74 (Other) :177
## cyl trans drv cty hwy
## Min. :4.000 auto(l4) :83 4:103 Min. : 9.00 Min. :12.00
## 1st Qu.:4.000 manual(m5):58 f:106 1st Qu.:14.00 1st Qu.:18.00
## Median :6.000 auto(l5) :39 r: 25 Median :17.00 Median :24.00
## Mean :5.889 manual(m6):19 Mean :16.86 Mean :23.44
## 3rd Qu.:8.000 auto(s6) :16 3rd Qu.:19.00 3rd Qu.:27.00
## Max. :8.000 auto(l6) : 6 Max. :35.00 Max. :44.00
## (Other) :13
## fl class
## c: 1 2seater : 5
## d: 5 compact :47
## e: 8 midsize :41
## p: 52 minivan :11
## r:168 pickup :33
## subcompact:35
## suv :62
table(mpg$model)
##
## 4runner 4wd a4 a4 quattro
## 6 7 8
## a6 quattro altima c1500 suburban 2wd
## 3 6 5
## camry camry solara caravan 2wd
## 7 7 11
## civic corolla corvette
## 9 5 5
## dakota pickup 4wd durango 4wd expedition 2wd
## 9 7 3
## explorer 4wd f150 pickup 4wd forester awd
## 6 7 6
## grand cherokee 4wd grand prix gti
## 8 5 5
## impreza awd jetta k1500 tahoe 4wd
## 8 9 4
## land cruiser wagon 4wd malibu maxima
## 2 5 3
## mountaineer 4wd mustang navigator 2wd
## 4 9 3
## new beetle passat pathfinder 4wd
## 6 7 4
## ram 1500 pickup 4wd range rover sonata
## 10 4 7
## tiburon toyota tacoma 4wd
## 7 7
table(mpg$model, mpg$class)
##
## 2seater compact midsize minivan pickup subcompact
## 4runner 4wd 0 0 0 0 0 0
## a4 0 7 0 0 0 0
## a4 quattro 0 8 0 0 0 0
## a6 quattro 0 0 3 0 0 0
## altima 0 2 4 0 0 0
## c1500 suburban 2wd 0 0 0 0 0 0
## camry 0 0 7 0 0 0
## camry solara 0 7 0 0 0 0
## caravan 2wd 0 0 0 11 0 0
## civic 0 0 0 0 0 9
## corolla 0 5 0 0 0 0
## corvette 5 0 0 0 0 0
## dakota pickup 4wd 0 0 0 0 9 0
## durango 4wd 0 0 0 0 0 0
## expedition 2wd 0 0 0 0 0 0
## explorer 4wd 0 0 0 0 0 0
## f150 pickup 4wd 0 0 0 0 7 0
## forester awd 0 0 0 0 0 0
## grand cherokee 4wd 0 0 0 0 0 0
## grand prix 0 0 5 0 0 0
## gti 0 5 0 0 0 0
## impreza awd 0 4 0 0 0 4
## jetta 0 9 0 0 0 0
## k1500 tahoe 4wd 0 0 0 0 0 0
## land cruiser wagon 4wd 0 0 0 0 0 0
## malibu 0 0 5 0 0 0
## maxima 0 0 3 0 0 0
## mountaineer 4wd 0 0 0 0 0 0
## mustang 0 0 0 0 0 9
## navigator 2wd 0 0 0 0 0 0
## new beetle 0 0 0 0 0 6
## passat 0 0 7 0 0 0
## pathfinder 4wd 0 0 0 0 0 0
## ram 1500 pickup 4wd 0 0 0 0 10 0
## range rover 0 0 0 0 0 0
## sonata 0 0 7 0 0 0
## tiburon 0 0 0 0 0 7
## toyota tacoma 4wd 0 0 0 0 7 0
##
## suv
## 4runner 4wd 6
## a4 0
## a4 quattro 0
## a6 quattro 0
## altima 0
## c1500 suburban 2wd 5
## camry 0
## camry solara 0
## caravan 2wd 0
## civic 0
## corolla 0
## corvette 0
## dakota pickup 4wd 0
## durango 4wd 7
## expedition 2wd 3
## explorer 4wd 6
## f150 pickup 4wd 0
## forester awd 6
## grand cherokee 4wd 8
## grand prix 0
## gti 0
## impreza awd 0
## jetta 0
## k1500 tahoe 4wd 4
## land cruiser wagon 4wd 2
## malibu 0
## maxima 0
## mountaineer 4wd 4
## mustang 0
## navigator 2wd 3
## new beetle 0
## passat 0
## pathfinder 4wd 4
## ram 1500 pickup 4wd 0
## range rover 4
## sonata 0
## tiburon 0
## toyota tacoma 4wd 0
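Most cells of the model-by-class table are zero because each model usually belongs to a single class. The following short sketch lists the exceptions directly by counting the non-zero cells in each row:

```r
library(ggplot2)

tab <- table(mpg$model, mpg$class)

# Number of distinct classes each model appears in
classes_per_model <- rowSums(tab > 0)

# Models that span more than one class
multi_class <- names(classes_per_model)[classes_per_model > 1]
multi_class
```

In the table above, these models are `altima` (compact and midsize) and `impreza awd` (compact and subcompact).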
Plot the distribution of city mpg with a histogram in base R.
hist(mpg$cty)
Look at the same single numeric variable in ggplot2.
qplot(cty, data = mpg)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
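The message above suggests setting the bin width explicitly. `cty` is recorded in whole miles per gallon, so a bin width of 1 gives one bar per value. The sketch below uses `geom_histogram()`, which is the current ggplot2 spelling of this plot. Note that `qplot()` is deprecated in ggplot2 3.4.0 and later:

```r
library(ggplot2)

# One bar per integer value of city mpg
p <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1) +
  labs(x = "City miles per gallon", y = "Number of cars")
p
```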
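Next, plot two numeric variables against each other: number of cylinders versus city mpg. First, in base R:
plot(mpg$cyl, mpg$cty, xlab = "Number of cylinders", ylab = "City miles per gallon", pch = 19)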
Look at the same scatterplot in ggplot2:
ggplot(mpg, aes(cyl, cty)) + geom_point()
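`cyl` takes only a few distinct values, so many cars land on exactly the same point and the scatterplot hides how many there are. The following sketch spreads the points sideways with `geom_jitter()`. Setting `height = 0` leaves the mpg values themselves unchanged:

```r
library(ggplot2)

set.seed(1)  # make the random horizontal jitter reproducible
p <- ggplot(mpg, aes(x = cyl, y = cty)) +
  # shift each point left or right by at most 0.2; y is not jittered
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "Number of cylinders", y = "City miles per gallon")
p
```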
# City mpg by manufacturer (coord_flip() puts the manufacturer names on the y-axis so they stay readable)
ggplot(mpg, aes(x = manufacturer, y = cty)) + geom_boxplot() + coord_flip()
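With 15 manufacturers, the boxplot is easier to compare when the boxes are sorted. This sketch uses base R's `reorder()` to order the manufacturer levels by median city mpg before plotting:

```r
library(ggplot2)

# Copy the data and turn manufacturer into a factor whose levels
# are sorted by each manufacturer's median city mpg
mpg_sorted <- mpg
mpg_sorted$manufacturer <- reorder(factor(mpg$manufacturer), mpg$cty, FUN = median)

p <- ggplot(mpg_sorted, aes(x = manufacturer, y = cty)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "Manufacturer", y = "City miles per gallon")
p
```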
# Distribution of city mpg, with each bar stacked by manufacturer
ggplot(mpg, aes(x = cty, fill = manufacturer)) + geom_bar()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
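# Mean city mpg by year, one bar per manufacturer. stat = "identity" would draw one
# overlapping bar per car, so each bar is averaged with stat_summary() instead
# (the fun argument needs ggplot2 >= 3.3.0; older releases call it fun.y)
ggplot(mpg, aes(x = factor(year), y = cty, fill = manufacturer)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge()) +
  labs(x = "Year", y = "Mean city mpg")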
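The numbers behind a per-year, per-manufacturer bar chart can also be printed as a table. This minimal sketch uses base R's `aggregate()` with its formula interface. It averages city mpg within each manufacturer for the two model years in the data (1999 and 2008):

```r
library(ggplot2)

# Mean city mpg for every manufacturer/year combination present in the data
cty_by_year <- aggregate(cty ~ manufacturer + year, data = mpg, FUN = mean)

# List each manufacturer's years next to each other
cty_by_year[order(cty_by_year$manufacturer, cty_by_year$year), ]
```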