Load Dataset, Identify Variable Types

  1. Choose and load any R dataset (except for diamonds!) that has at least two numeric variables and at least two categorical variables. Identify which variables in your data set are numeric, and which are categorical (factors).
#install.packages("ggplot2")
require(ggplot2)
## Loading required package: ggplot2
data(mpg)
head(mpg)
##   manufacturer model displ year cyl      trans drv cty hwy fl   class
## 1         audi    a4   1.8 1999   4   auto(l5)   f  18  29  p compact
## 2         audi    a4   1.8 1999   4 manual(m5)   f  21  29  p compact
## 3         audi    a4   2.0 2008   4 manual(m6)   f  20  31  p compact
## 4         audi    a4   2.0 2008   4   auto(av)   f  21  30  p compact
## 5         audi    a4   2.8 1999   6   auto(l5)   f  16  26  p compact
## 6         audi    a4   2.8 1999   6 manual(m5)   f  18  26  p compact
#Identify which variables in the set are numeric, and which are categorical (factors)
str(mpg)
## 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: Factor w/ 15 levels "audi","chevrolet",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ model       : Factor w/ 38 levels "4runner 4wd",..: 2 2 2 2 2 2 2 3 3 3 ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : Factor w/ 10 levels "auto(av)","auto(l3)",..: 4 9 10 1 4 9 1 9 4 10 ...
##  $ drv         : Factor w/ 3 levels "4","f","r": 2 2 2 2 2 2 2 1 1 1 ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : Factor w/ 5 levels "c","d","e","p",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ class       : Factor w/ 7 levels "2seater","compact",..: 2 2 2 2 2 2 2 2 2 2 ...
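The numeric/categorical split can also be computed instead of read off the str() output. The following is a minimal sketch, assuming the mpg data set that ships with ggplot2; the names mpg_df, is_num, numeric_vars, and categorical_vars are illustrative and not part of the original analysis. Depending on the ggplot2 version, the categorical columns may be stored as character rather than factor, so the sketch simply treats every non-numeric column as categorical.

```r
library(ggplot2)

# Plain data frame copy so column handling is the same across ggplot2 versions
mpg_df <- as.data.frame(ggplot2::mpg)

# TRUE for numeric (num/int) columns, FALSE for factor or character columns
is_num <- sapply(mpg_df, is.numeric)

numeric_vars <- names(mpg_df)[is_num]
categorical_vars <- names(mpg_df)[!is_num]

numeric_vars
categorical_vars
```

Note that year and cyl are stored as integers and so count as numeric here, although they could reasonably be treated as categorical in later steps.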

Generate Descriptive Statistics

  1. Generate summary-level descriptive statistics: show the mean, median, 25th and 75th percentiles (first and third quartiles), min, and max for each of the applicable variables in your data set.
#Generate summary level descriptive statistics
summary(mpg)
##      manufacturer                 model         displ            year     
##  dodge     :37    caravan 2wd        : 11   Min.   :1.600   Min.   :1999  
##  toyota    :34    ram 1500 pickup 4wd: 10   1st Qu.:2.400   1st Qu.:1999  
##  volkswagen:27    civic              :  9   Median :3.300   Median :2004  
##  ford      :25    dakota pickup 4wd  :  9   Mean   :3.472   Mean   :2004  
##  chevrolet :19    jetta              :  9   3rd Qu.:4.600   3rd Qu.:2008  
##  audi      :18    mustang            :  9   Max.   :7.000   Max.   :2008  
##  (Other)   :74    (Other)            :177                                 
##       cyl               trans    drv          cty             hwy       
##  Min.   :4.000   auto(l4)  :83   4:103   Min.   : 9.00   Min.   :12.00  
##  1st Qu.:4.000   manual(m5):58   f:106   1st Qu.:14.00   1st Qu.:18.00  
##  Median :6.000   auto(l5)  :39   r: 25   Median :17.00   Median :24.00  
##  Mean   :5.889   manual(m6):19           Mean   :16.86   Mean   :23.44  
##  3rd Qu.:8.000   auto(s6)  :16           3rd Qu.:19.00   3rd Qu.:27.00  
##  Max.   :8.000   auto(l6)  : 6           Max.   :35.00   Max.   :44.00  
##                  (Other)   :13                                          
##  fl             class   
##  c:  1   2seater   : 5  
##  d:  5   compact   :47  
##  e:  8   midsize   :41  
##  p: 52   minivan   :11  
##  r:168   pickup    :33  
##          subcompact:35  
##          suv       :62
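
summary() mixes counts for the categorical columns with statistics for the numeric ones. If only the numeric statistics are wanted, they can be collected into one table. This is a sketch under the same assumption (the mpg data from ggplot2); the names num_cols and stats are illustrative.

```r
library(ggplot2)

mpg_df <- as.data.frame(ggplot2::mpg)

# Keep only the numeric columns
num_cols <- mpg_df[sapply(mpg_df, is.numeric)]

# One column per variable, one row per statistic
stats <- sapply(num_cols, function(x) {
  c(min    = min(x),
    q25    = unname(quantile(x, 0.25)),
    median = median(x),
    mean   = mean(x),
    q75    = unname(quantile(x, 0.75)),
    max    = max(x))
})

round(stats, 2)
```

The values should agree with the numeric columns of the summary() output above, up to rounding.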

Determine Frequencies

  1. Determine the frequency for one of the categorical variables.
#Determine frequency for one variable
table(mpg$manufacturer)
## 
##       audi  chevrolet      dodge       ford      honda    hyundai 
##         18         19         37         25          9         14 
##       jeep land rover    lincoln    mercury     nissan    pontiac 
##          8          4          3          4         13          5 
##     subaru     toyota volkswagen 
##         14         34         27
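
As an optional extension, the same counts can be sorted or expressed as proportions, which makes the largest groups easier to spot. A short sketch, assuming the mpg data from ggplot2; the name freq is illustrative.

```r
library(ggplot2)

# Counts per manufacturer, as in the table above
freq <- table(ggplot2::mpg$manufacturer)

# Most frequent manufacturers first
sort(freq, decreasing = TRUE)

# Share of all 234 observations per manufacturer
round(prop.table(freq), 3)
```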
  1. Determine the frequency for one of the categorical variables, by a different categorical variable.
#Determine frequency for one variable by another
table(mpg$manufacturer, mpg$class)
##             
##              2seater compact midsize minivan pickup subcompact suv
##   audi             0      15       3       0      0          0   0
##   chevrolet        5       0       5       0      0          0   9
##   dodge            0       0       0      11     19          0   7
##   ford             0       0       0       0      7          9   9
##   honda            0       0       0       0      0          9   0
##   hyundai          0       0       7       0      0          7   0
##   jeep             0       0       0       0      0          0   8
##   land rover       0       0       0       0      0          0   4
##   lincoln          0       0       0       0      0          0   3
##   mercury          0       0       0       0      0          0   4
##   nissan           0       2       7       0      0          0   4
##   pontiac          0       0       5       0      0          0   0
##   subaru           0       4       0       0      0          4   6
##   toyota           0      12       7       0      7          0   8
##   volkswagen       0      14       7       0      0          6   0
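
Row and column totals are often useful when reading a two-way table like this one. The sketch below adds them with addmargins(); it assumes the mpg data from ggplot2, and the names xt and with_totals are illustrative.

```r
library(ggplot2)

# Two-way frequency table: manufacturer (rows) by class (columns)
xt <- table(ggplot2::mpg$manufacturer, ggplot2::mpg$class)

# Append a "Sum" row and a "Sum" column
with_totals <- addmargins(xt)
with_totals
```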

Graph a Single Variable

  1. Create a graph for a single numeric variable.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
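
The code that produced this message is not shown above. The message itself comes from a ggplot2 histogram drawn without an explicit bin width. One way such a graph could be produced is sketched below; the choice of hwy, the bin width of 2, and the name p_hist are assumptions made for illustration, not the original author's settings.

```r
library(ggplot2)

# Histogram of a single numeric variable (highway mileage)
p_hist <- ggplot(ggplot2::mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +   # explicit binwidth avoids the default-bins message
  labs(x = "Highway miles per gallon", y = "Count")

print(p_hist)
```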

Graph Two Variables

  1. Create a scatterplot of two numeric variables.
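
No code is shown for this step. A minimal sketch of a scatterplot of two numeric variables follows, assuming the mpg data from ggplot2; the choice of displ against hwy and the name p_scatter are illustrative assumptions.

```r
library(ggplot2)

# Scatterplot: engine displacement (x) against highway mileage (y)
p_scatter <- ggplot(ggplot2::mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

print(p_scatter)
```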