`mpg` is a dataset that ships with the ggplot2 package. Our goal is to explore the structure of this dataset and see what we can learn from it. This exploration could be done in Spotfire or directly in R, which is the approach we take here.
First we examine the dimensions and structure of the dataset.
library('ggplot2') # needed for mpg dataset and later plotting
dim(mpg)
## [1] 234 11
str(mpg)
## 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: Factor w/ 15 levels "audi","chevrolet",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ model : Factor w/ 38 levels "4runner 4wd",..: 2 2 2 2 2 2 2 3 3 3 ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : Factor w/ 10 levels "auto(av)","auto(l3)",..: 4 9 10 1 4 9 1 9 4 10 ...
## $ drv : Factor w/ 3 levels "4","f","r": 2 2 2 2 2 2 2 1 1 1 ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : Factor w/ 5 levels "c","d","e","p",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ class : Factor w/ 7 levels "2seater","compact",..: 2 2 2 2 2 2 2 2 2 2 ...
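The qplot() function from the ggplot2 package produces static plots. Below we examine the distribution of highway mileage (hwy) and how it relates to engine displacement (displ), mapping other variables to colour and point size: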
qplot(hwy, data = mpg) + theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(displ, hwy, data = mpg) + theme_bw()
qplot(displ, hwy, data = mpg, colour = factor(cyl)) + theme_bw()
qplot(displ, hwy, data = mpg, colour = drv) + theme_bw()
qplot(displ, hwy, data = mpg, colour = factor(cyl), size = cty) + theme_bw()
qplot(displ, hwy, data = mpg, colour = factor(cyl), size = cty) +
  stat_smooth(method = "lm") + theme_bw()
?qplot
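qplot() was formally deprecated in ggplot2 3.4.0, so newer code usually calls ggplot() directly. The following is a sketch of the last plot above in that style, assuming a recent ggplot2. It is roughly, not exactly, equivalent: colour is mapped globally so stat_smooth() still fits one regression line per cylinder group, while size is mapped only in the point layer so the smoother does not have to drop it.

```r
library(ggplot2)

# colour is a global aesthetic, so stat_smooth() fits one line per cyl group;
# size = cty applies only to the points
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point(aes(size = cty)) +
  stat_smooth(method = "lm", formula = y ~ x) +
  theme_bw()
print(p)
```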
Explore the economics dataset (type ?economics to see its description). What are some of the interesting relationships among the numerical variables? Can you see any trends over time?
The first thing you will probably want to do is read in data. Here we will simulate some data, write it out, and then read it back in.
# simulate some data
X <- matrix(rnorm(25), nrow = 5)
# ?matrix
# ?rnorm
X
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.2126 -0.3187 -0.04648 -0.2172 -0.553741
## [2,] 0.1559 0.6345 1.80849 1.4361 1.555234
## [3,] 0.8712 0.8077 0.08187 0.6804 -0.006837
## [4,] 0.8579 -0.2901 0.08815 -0.1529 0.261421
## [5,] 1.3025 0.4982 0.03750 0.0970 -0.189652
pairs(X, panel = panel.smooth)
heatmap(X) # ?heatmap
gender <- sample(c("M", "F"), 5, replace = TRUE) # ?sample
gender
## [1] "M" "M" "M" "F" "M"
X <- data.frame(X, gender)
colnames(X)[1:5] <- LETTERS[1:5]
class(X)
## [1] "data.frame"
str(X)
## 'data.frame': 5 obs. of 6 variables:
## $ A : num -0.213 0.156 0.871 0.858 1.303
## $ B : num -0.319 0.635 0.808 -0.29 0.498
## $ C : num -0.0465 1.8085 0.0819 0.0882 0.0375
## $ D : num -0.217 1.436 0.68 -0.153 0.097
## $ E : num -0.55374 1.55523 -0.00684 0.26142 -0.18965
## $ gender: Factor w/ 2 levels "F","M": 2 2 2 1 2
# write out CSV file
write.csv(X, "rand.csv", row.names = FALSE)
# read the CSV file back in with read.csv()
read.csv("rand.csv", stringsAsFactors = FALSE)
## A B C D E gender
## 1 -0.2126 -0.3187 -0.04648 -0.2172 -0.553741 M
## 2 0.1559 0.6345 1.80849 1.4361 1.555234 M
## 3 0.8712 0.8077 0.08187 0.6804 -0.006837 M
## 4 0.8579 -0.2901 0.08815 -0.1529 0.261421 F
## 5 1.3025 0.4982 0.03750 0.0970 -0.189652 M
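The write.csv() call above leaves rand.csv in the working directory. Below is a sketch of the same round trip that writes to a temporary file from tempfile() instead, and checks the result with all.equal(), which tolerates the tiny rounding that writing doubles as text can introduce. The set.seed() call is an addition, only there to make the sketch reproducible.

```r
set.seed(1)  # reproducible simulated data (not in the original example)
X <- data.frame(matrix(rnorm(25), nrow = 5))
colnames(X) <- LETTERS[1:5]
X$gender <- sample(c("M", "F"), 5, replace = TRUE)

path <- tempfile(fileext = ".csv")  # temporary file, not the working directory
write.csv(X, path, row.names = FALSE)
Y <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)  # clean up the temporary file

# same columns, and values equal within a numeric tolerance
all.equal(X, Y)
```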
Take a look at the iris dataset. Do you see any clustering patterns? Try using the heatmap() function to find the clusters.
qplot(Sepal.Length, Sepal.Width, colour = Species, data = iris) + theme_bw()
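One possible approach, as a sketch: heatmap() reorders rows and columns using hierarchical clustering (hclust() on dist() by default). Passing standardized measurements from scale() keeps any single variable from dominating the distances. The second half makes the clustering explicit with cutree(), cutting the tree into three groups (chosen here because iris has three species) and cross-tabulating against Species.

```r
# numeric measurements only; Species is kept aside as the "answer key"
meas <- as.matrix(iris[, 1:4])

# standardize each column, then let heatmap() cluster rows and columns
heatmap(scale(meas), scale = "none", labRow = as.character(iris$Species))

# the same idea made explicit: cluster standardized rows into 3 groups
hc <- hclust(dist(scale(meas)))
clusters <- cutree(hc, k = 3)
table(clusters, iris$Species)
```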
Most R functions are vectorized: they operate on whole vectors and matrices (or other rectangular data structures) at once. For example:
x <- 10
y <- 5
x * y
## [1] 50
x <- 1:10
y <- 11:20
x; y
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 11 12 13 14 15 16 17 18 19 20
# what do you expect from the following?
x * y
## [1] 11 24 39 56 75 96 119 144 171 200
# %*% is a matrix multiplication operator
# what do you expect from the following?
x %*% y
## [,1]
## [1,] 935
if (sum(x * y) == x %*% y) {
  cat("The sum of the element-wise products equals the dot product\n")
} else {
  cat("Something is seriously wrong with the universe\n")
}
## The sum of the element-wise products equals the dot product
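The 1 x 1 matrix returned by %*% works inside if(), but base R offers tidier options. A short sketch: crossprod(x, y) computes t(x) %*% y, and drop() turns the 1 x 1 result into a plain number. For non-integer data, comparing with all.equal() rather than == is the safer habit, since floating-point rounding can make == fail (the matrix-inverse check later in this section shows this).

```r
x <- 1:10
y <- 11:20

# t(x) %*% y without forming the transpose explicitly
dot <- drop(crossprod(x, y))
dot

# compare with a tolerance instead of ==
isTRUE(all.equal(sum(x * y), dot))
```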
# now x and y are different lengths.
# what do you expect from the following?
y <- 1:5
x ; y
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 1 2 3 4 5
# notice the recycling "feature" of R
x * y
## [1] 1 4 9 16 25 6 14 24 36 50
# this obviously should fail as a matrix operation
# x %*% y
## Error in x %*% y : non-conformable arguments
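Recycling is silent only when the longer length is a multiple of the shorter one. A small sketch of the other case: R still recycles, but emits the warning "longer object length is not a multiple of shorter object length".

```r
x <- 1:10

# 10 is a multiple of 5: 1:5 is recycled silently
x * 1:5

# 10 is not a multiple of 3: 1:3 is recycled as 1 2 3 1 2 3 1 2 3 1, with a warning
x * 1:3
```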
# the matrix inverse is available through solve(),
# which calls the LAPACK routines DGESV (real) and ZGESV (complex)
m <- matrix(sample(25), nrow = 5)
m
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 23 10 17 14
## [2,] 19 25 2 22 7
## [3,] 21 1 15 20 3
## [4,] 24 12 13 11 16
## [5,] 9 8 4 6 18
t(m) # transpose
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 19 21 24 9
## [2,] 23 25 1 12 8
## [3,] 10 2 15 13 4
## [4,] 17 22 20 11 6
## [5,] 14 7 3 16 18
solve(m)
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.044322 0.02371 -0.009286 0.04531 -0.01347
## [2,] 0.029042 0.01215 -0.045558 0.04739 -0.06184
## [3,] 0.059292 -0.05843 0.004562 0.05846 -0.07612
## [4,] 0.001269 0.01996 0.057921 -0.08977 0.06139
## [5,] -0.004345 -0.01092 0.004570 -0.02678 0.08623
# was this really an inverse?
# how would you check?
m == solve(solve(m))
## [,1] [,2] [,3] [,4] [,5]
## [1,] FALSE FALSE FALSE TRUE TRUE
## [2,] TRUE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE
## [4,] TRUE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE
m - solve(solve(m))
## [,1] [,2] [,3] [,4] [,5]
## [1,] -5.329e-15 3.553e-15 -1.776e-15 0.000e+00 0.000e+00
## [2,] 0.000e+00 3.553e-15 2.442e-15 -3.553e-15 5.329e-15
## [3,] -1.066e-14 -1.177e-14 -3.553e-15 -1.066e-14 -5.773e-15
## [4,] 0.000e+00 -3.553e-15 3.553e-15 -3.553e-15 3.553e-15
## [5,] 3.553e-15 8.882e-16 2.665e-15 8.882e-16 7.105e-15
m %*% solve(m) # should be the identity matrix (up to rounding)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000e+00 -2.776e-17 -1.943e-16 2.220e-16 -2.22e-16
## [2,] 1.041e-16 1.000e+00 -1.527e-16 4.441e-16 0.00e+00
## [3,] -1.492e-16 -5.551e-17 1.000e+00 -2.776e-17 -1.11e-16
## [4,] -1.388e-17 0.000e+00 0.000e+00 1.000e+00 0.00e+00
## [5,] -1.388e-17 0.000e+00 2.776e-17 -1.110e-16 1.00e+00
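The outputs above show why == is the wrong test for an inverse. A sketch of a tolerance-based check, plus the usual way to solve a linear system A x = b: pass b to solve() directly instead of computing solve(A) %*% b, which is generally faster and more accurate. The 3 x 3 matrix A is an illustrative choice, picked to be diagonally dominant and therefore non-singular.

```r
# a small, clearly non-singular (diagonally dominant) matrix for illustration
A <- matrix(c(4, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)
b <- c(1, 2, 3)

# check the inverse with a tolerance rather than ==
isTRUE(all.equal(A %*% solve(A), diag(3)))

# solve A x = b without forming the inverse
x <- solve(A, b)
isTRUE(all.equal(drop(A %*% x), b))
```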
For efficient sparse (and structured dense) matrix computations, see the Matrix package.
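A minimal sketch of what sparse storage buys, assuming the Matrix package (a recommended package that ships with standard R installations). sparseMatrix() builds a matrix from row indices, column indices, and values, storing only the non-zero entries; the sizes and values here are arbitrary.

```r
library(Matrix)

# a 1000 x 1000 matrix with only three non-zero entries, stored in compressed form
S <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(2, 4, 8),
                  dims = c(1000, 1000))
nnzero(S)

# the usual %*% operator works for sparse-times-dense products
v <- as.vector(S %*% rep(1, 1000))
head(v)

# memory use compared with the equivalent dense base-R matrix
object.size(S)
object.size(as.matrix(S))
```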
The most common data structure is the data frame, which is like a matrix but may contain columns of different types. In that sense it is like an Excel spreadsheet.
d.f <- data.frame(poison = rpois(20, 1),
                  category = sample(c("M", "F"), 20, replace = TRUE))
dim(d.f)
## [1] 20 2
class(d.f)
## [1] "data.frame"
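The difference from a matrix in one small sketch: each data frame column keeps its own type, while a matrix holds a single type, so converting a mixed data frame with as.matrix() coerces everything to character.

```r
df <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)

# each column keeps its own type
sapply(df, class)

# a matrix holds one type, so the numbers become character strings
is.character(as.matrix(df))
```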
Perhaps the most flexible data structure is the list. A list can contain other data structures of arbitrary types. Lists are often used when a function needs to return several objects of different shapes and types (e.g., regression coefficients and residuals).
l <- list(vec = 1:10, mat = m, df = d.f)
class(l)
## [1] "list"
str(l)
## List of 3
## $ vec: int [1:10] 1 2 3 4 5 6 7 8 9 10
## $ mat: int [1:5, 1:5] 5 19 21 24 9 23 25 1 12 8 ...
## $ df :'data.frame': 20 obs. of 2 variables:
## ..$ poison : int [1:20] 2 1 0 1 2 1 1 1 1 0 ...
## ..$ category: Factor w/ 2 levels "F","M": 1 2 2 1 2 2 1 1 1 2 ...
dim(l)
## NULL
l
## $vec
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $mat
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 23 10 17 14
## [2,] 19 25 2 22 7
## [3,] 21 1 15 20 3
## [4,] 24 12 13 11 16
## [5,] 9 8 4 6 18
##
## $df
## poison category
## 1 2 F
## 2 1 M
## 3 0 M
## 4 1 F
## 5 2 M
## 6 1 M
## 7 1 F
## 8 1 F
## 9 1 F
## 10 0 M
## 11 1 M
## 12 1 F
## 13 1 F
## 14 0 M
## 15 3 M
## 16 4 M
## 17 1 F
## 18 1 M
## 19 2 M
## 20 0 M
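A sketch of the three ways to get at list elements, using a smaller list of the same shape as l above. Single brackets return a sub-list, while double brackets and $ return the element itself.

```r
l <- list(vec = 1:10, mat = matrix(1:4, nrow = 2), df = data.frame(a = 1:2))

class(l["vec"])    # "list": single brackets return a sub-list
class(l[["vec"]])  # "integer": double brackets return the element itself
l$vec[3]           # 3: $ is shorthand for [[ with a literal name
names(l)
```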
Often we don't need the entire matrix, data frame, or list. In this section we look at some ways of extracting individual elements.
# Here is our normal matrix m
m # same as m[, ]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 23 10 17 14
## [2,] 19 25 2 22 7
## [3,] 21 1 15 20 3
## [4,] 24 12 13 11 16
## [5,] 9 8 4 6 18
# first row of m
m[1, ]
## [1] 5 23 10 17 14
# first two rows of m
m[1:2, ]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 23 10 17 14
## [2,] 19 25 2 22 7
# last two columns of m, in reverse order (last column first)
# figure out why this works...
m[, c(ncol(m), ncol(m) - 1)]
## [,1] [,2]
## [1,] 14 17
## [2,] 7 22
## [3,] 3 20
## [4,] 16 11
## [5,] 18 6
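A few more indexing forms, sketched on a predictable matrix (1:25 filled column by column) so the results are easy to check by hand: single elements, negative indices, logical indices, and drop = FALSE.

```r
m <- matrix(1:25, nrow = 5)   # filled column by column

m[2, 3]               # single element: 12
m[, -1]               # a negative index drops the first column
m[m > 20]             # a logical index returns a vector: 21 22 23 24 25
m[, 1]                # one column is simplified to a plain vector
m[, 1, drop = FALSE]  # drop = FALSE keeps it as a 5 x 1 matrix
```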
You can also compute on rows and columns.
rowSums(m)
## [1] 69 75 60 76 45
colSums(m)
## [1] 78 69 44 76 58
# What if you wanted to compute the average of each row or column?
# ?apply
# Notice that I can match arguments by name (order does not matter) and by position (order does matter)
apply(X = m, MARGIN = 1, FUN = mean)
## [1] 13.8 15.0 12.0 15.2 9.0
apply(m, 2, mean)
## [1] 15.6 13.8 8.8 15.2 11.6
apply(m, 2, sd)
## [1] 8.173 10.134 5.630 6.611 6.348
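For sums and means specifically, base R also provides rowMeans() and colMeans() (alongside rowSums() and colSums()), which its documentation describes as equivalent to apply() with mean or sum but a lot faster. A small sketch comparing them:

```r
m <- matrix(1:6, nrow = 2)

rowMeans(m)  # 3 4
colMeans(m)  # 1.5 3.5 5.5
all.equal(apply(m, 1, mean), rowMeans(m))
```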
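apply() is a very flexible function and accepts your own function as FUN. More generally, R functions are first-class values: they can be passed as arguments to other functions and returned from them (functions that do this are called higher-order functions). A function created inside another function keeps access to the variables of the environment it was created in; that combination of function and environment is called a closure.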
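A minimal sketch of both ideas. make.power() is a hypothetical helper written for illustration: it returns a new function, and each returned function remembers its own p from the environment where it was created. The last line passes an anonymous function as FUN to apply().

```r
# hypothetical helper: returns a closure that remembers p
make.power <- function(p) {
  function(x) x^p
}
square <- make.power(2)
cube <- make.power(3)
square(4)  # 16
cube(2)    # 8

# your own (anonymous) FUN: range of each column, max minus min
m <- matrix(1:6, nrow = 2)
apply(m, 2, function(col) max(col) - min(col))  # 1 1 1
```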
Individual columns of a data frame can be extracted with the $ operator.
d.f$poison
## [1] 2 1 0 1 2 1 1 1 1 0 1 1 1 0 3 4 1 1 2 0
# you can still use numeric subscripts
d.f$poison == d.f[, 1]
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE
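# all.equal() is safer: it compares whole objects, allows a small tolerance for numbers, and returns a single TRUE when they match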
all.equal(d.f$poison, d.f[, 1])
## [1] TRUE
subset(d.f, poison > 0, select = category)
## category
## 1 F
## 2 M
## 4 F
## 5 M
## 6 M
## 7 F
## 8 F
## 9 F
## 11 M
## 12 F
## 13 F
## 15 M
## 16 M
## 17 F
## 18 M
## 19 M
subset(d.f, poison > 0, select = c(poison, category))
## poison category
## 1 2 F
## 2 1 M
## 4 1 F
## 5 2 M
## 6 1 M
## 7 1 F
## 8 1 F
## 9 1 F
## 11 1 M
## 12 1 F
## 13 1 F
## 15 3 M
## 16 4 M
## 17 1 F
## 18 1 M
## 19 2 M
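subset() is convenient interactively, but its help page recommends the standard [ operator inside functions and scripts, because its non-standard evaluation can have unexpected consequences. A sketch showing the bracket equivalent of the first subset() call; set.seed() is added only so the comparison is reproducible.

```r
set.seed(1)  # reproducibility for this sketch only
d.f <- data.frame(poison = rpois(20, 1),
                  category = sample(c("M", "F"), 20, replace = TRUE))

a <- subset(d.f, poison > 0, select = category)
# same rows and column with [; drop = FALSE keeps a data frame
b <- d.f[d.f$poison > 0, "category", drop = FALSE]
all.equal(a, b)
```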
Your Turn
Look at the transform() function (?transform) and the mtcars dataset (?mtcars).
plyr is a useful package for performing computations on grouped data. Its successor, dplyr, is worth exploring, but here I demonstrate simple use of plyr.
library("plyr")
mt.new <- transform(mtcars, hp.weight = hp/wt)
ddply(mt.new, .(cyl), summarise,
      mean = mean(hp.weight),
      sd = sd(hp.weight))
## cyl mean sd
## 1 4 37.93 13.78
## 2 6 39.93 10.85
## 3 8 53.86 17.08
qplot(factor(cyl), hp.weight, geom = "boxplot", data = mt.new) + theme_bw()
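For comparison, a base-R sketch of the same grouped summary without extra packages: tapply() splits hp.weight by cyl and applies a function to each group. The result should match the ddply() output above (mean roughly 37.93, 39.93, 53.86 for 4, 6, and 8 cylinders).

```r
mt.new <- transform(mtcars, hp.weight = hp / wt)

# split hp.weight by cyl, then apply mean() and sd() to each group
means <- tapply(mt.new$hp.weight, mt.new$cyl, mean)
sds   <- tapply(mt.new$hp.weight, mt.new$cyl, sd)
data.frame(cyl = as.numeric(names(means)),
           mean = as.vector(means),
           sd = as.vector(sds))
```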
Time series (and other) processing often requires reshaping data; in Excel this is called pivoting. Here we take a look at the reshape2 package. First look at the melt() function (?melt) in reshape2 and see if you can figure out what it does.
library("reshape2")
str(economics)
## 'data.frame': 478 obs. of 6 variables:
## $ date : Date, format: "1967-06-30" "1967-07-31" ...
## $ pce : num 508 511 517 513 518 ...
## $ pop : int 198712 198911 199113 199311 199498 199657 199808 199920 200056 200208 ...
## $ psavert : num 9.8 9.8 9 9.8 9.7 9.4 9 9.5 8.9 9.6 ...
## $ uempmed : num 4.5 4.7 4.6 4.9 4.7 4.8 5.1 4.5 4.1 4.6 ...
## $ unemploy: int 2944 2945 2958 3143 3066 3018 2878 3001 2877 2709 ...
head(economics)
## date pce pop psavert uempmed unemploy
## 1 1967-06-30 507.8 198712 9.8 4.5 2944
## 2 1967-07-31 510.9 198911 9.8 4.7 2945
## 3 1967-08-31 516.7 199113 9.0 4.6 2958
## 4 1967-09-30 513.3 199311 9.8 4.9 3143
## 5 1967-10-31 518.5 199498 9.7 4.7 3066
## 6 1967-11-30 526.2 199657 9.4 4.8 3018
econ <- melt(economics, id = "date")
str(econ)
## 'data.frame': 2390 obs. of 3 variables:
## $ date : Date, format: "1967-06-30" "1967-07-31" ...
## $ variable: Factor w/ 5 levels "pce","pop","psavert",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 508 511 517 513 518 ...
head(econ)
## date variable value
## 1 1967-06-30 pce 507.8
## 2 1967-07-31 pce 510.9
## 3 1967-08-31 pce 516.7
## 4 1967-09-30 pce 513.3
## 5 1967-10-31 pce 518.5
## 6 1967-11-30 pce 526.2
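To see what melt() does on something small enough to read in full, here is a sketch on a toy two-column data frame (the data are made up for illustration). It also shows dcast(), reshape2's inverse operation, which pivots the long form back to wide. The id argument is spelled out as id.vars here; the call above relies on R's partial argument matching.

```r
library(reshape2)

wide <- data.frame(date = as.Date(c("2000-01-31", "2000-02-29")),
                   a = c(1, 2),
                   b = c(10, 20))

# wide -> long: one row per (date, variable) pair
long <- melt(wide, id.vars = "date")
long

# long -> wide again: dates as rows, variables as columns
dcast(long, date ~ variable)
```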
qplot(date, value, colour = variable, data = econ)
qplot(date, value, colour = variable, facets = . ~ variable, data = econ)
qplot(date, value, colour = variable, geom = "line", data = econ) +
  facet_wrap(~ variable, scales = "free_y")
Suppose I want to plot all the lines on the same chart so I can compare their patterns. How would I do that? The variables are measured on very different scales, so take a look at the scale() function. Also note that some.data.frame[, -1] returns all but the first column.
scaled <- scale(economics[, -1])
econ <- data.frame(date = economics$date, scaled)
econ <- melt(econ, id = "date")
qplot(date, value, colour = variable, geom = "line", data = econ)
# rescale a vector linearly onto [0, 1] (min-max scaling)
rang <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng)
}
scaled <- apply(economics[, -1], 2, rang)
econ <- data.frame(date = economics$date, scaled)
econ <- melt(econ, id = "date")
qplot(date, value, colour = variable, geom = "line", data = econ)
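The two rescalings used above, side by side on a tiny vector: scale() produces z-scores (subtract the mean, divide by the standard deviation), which are unbounded, while the min-max approach of rang() maps values onto [0, 1] but is sensitive to a single extreme value.

```r
x <- c(2, 4, 6, 8)

# z-scores: scale() returns a one-column matrix, so flatten it
z <- as.vector(scale(x))
all.equal(z, (x - mean(x)) / sd(x))

# min-max rescaling onto [0, 1], as rang() does
mm <- (x - min(x)) / (max(x) - min(x))
mm
```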
# ?lm
mpg.lm <- lm(hwy ~ ., data = subset(mpg, select = c(-manufacturer, -model)))
str(mpg.lm)
## List of 13
## $ coefficients : Named num [1:26] -126.7859 -0.1884 0.0699 -0.1082 -0.225 ...
## ..- attr(*, "names")= chr [1:26] "(Intercept)" "displ" "year" "cyl" ...
## $ residuals : Named num [1:234] 2.064 -0.442 2.147 1.106 1.371 ...
## ..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
## $ effects : Named num [1:234] -358.57 -69.63 -10.61 -11.78 -1.59 ...
## ..- attr(*, "names")= chr [1:234] "(Intercept)" "displ" "year" "cyl" ...
## $ rank : int 26
## $ fitted.values: Named num [1:234] 26.9 29.4 28.9 28.9 24.6 ...
## ..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
## $ assign : int [1:26] 0 1 2 3 4 4 4 4 4 4 ...
## $ qr :List of 5
## ..$ qr : num [1:234, 1:26] -15.2971 0.0654 0.0654 0.0654 0.0654 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:234] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:26] "(Intercept)" "displ" "year" "cyl" ...
## .. ..- attr(*, "assign")= int [1:26] 0 1 2 3 4 4 4 4 4 4 ...
## .. ..- attr(*, "contrasts")=List of 4
## .. .. ..$ trans: chr "contr.treatment"
## .. .. ..$ drv : chr "contr.treatment"
## .. .. ..$ fl : chr "contr.treatment"
## .. .. ..$ class: chr "contr.treatment"
## ..$ qraux: num [1:26] 1.07 1.08 1.08 1.02 1.01 ...
## ..$ pivot: int [1:26] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ tol : num 1e-07
## ..$ rank : int 26
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 208
## $ contrasts :List of 4
## ..$ trans: chr "contr.treatment"
## ..$ drv : chr "contr.treatment"
## ..$ fl : chr "contr.treatment"
## ..$ class: chr "contr.treatment"
## $ xlevels :List of 4
## ..$ trans: chr [1:10] "auto(av)" "auto(l3)" "auto(l4)" "auto(l5)" ...
## ..$ drv : chr [1:3] "4" "f" "r"
## ..$ fl : chr [1:5] "c" "d" "e" "p" ...
## ..$ class: chr [1:7] "2seater" "compact" "midsize" "minivan" ...
## $ call : language lm(formula = hwy ~ ., data = subset(mpg, select = c(-manufacturer, -model)))
## $ terms :Classes 'terms', 'formula' length 3 hwy ~ displ + year + cyl + trans + drv + cty + fl + class
## .. ..- attr(*, "variables")= language list(hwy, displ, year, cyl, trans, drv, cty, fl, class)
## .. ..- attr(*, "factors")= int [1:9, 1:8] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:9] "hwy" "displ" "year" "cyl" ...
## .. .. .. ..$ : chr [1:8] "displ" "year" "cyl" "trans" ...
## .. ..- attr(*, "term.labels")= chr [1:8] "displ" "year" "cyl" "trans" ...
## .. ..- attr(*, "order")= int [1:8] 1 1 1 1 1 1 1 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(hwy, displ, year, cyl, trans, drv, cty, fl, class)
## .. ..- attr(*, "dataClasses")= Named chr [1:9] "numeric" "numeric" "numeric" "numeric" ...
## .. .. ..- attr(*, "names")= chr [1:9] "hwy" "displ" "year" "cyl" ...
## $ model :'data.frame': 234 obs. of 9 variables:
## ..$ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## ..$ displ: num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## ..$ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## ..$ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## ..$ trans: Factor w/ 10 levels "auto(av)","auto(l3)",..: 4 9 10 1 4 9 1 9 4 10 ...
## ..$ drv : Factor w/ 3 levels "4","f","r": 2 2 2 2 2 2 2 1 1 1 ...
## ..$ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## ..$ fl : Factor w/ 5 levels "c","d","e","p",..: 4 4 4 4 4 4 4 4 4 4 ...
## ..$ class: Factor w/ 7 levels "2seater","compact",..: 2 2 2 2 2 2 2 2 2 2 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 hwy ~ displ + year + cyl + trans + drv + cty + fl + class
## .. .. ..- attr(*, "variables")= language list(hwy, displ, year, cyl, trans, drv, cty, fl, class)
## .. .. ..- attr(*, "factors")= int [1:9, 1:8] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:9] "hwy" "displ" "year" "cyl" ...
## .. .. .. .. ..$ : chr [1:8] "displ" "year" "cyl" "trans" ...
## .. .. ..- attr(*, "term.labels")= chr [1:8] "displ" "year" "cyl" "trans" ...
## .. .. ..- attr(*, "order")= int [1:8] 1 1 1 1 1 1 1 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(hwy, displ, year, cyl, trans, drv, cty, fl, class)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:9] "numeric" "numeric" "numeric" "numeric" ...
## .. .. .. ..- attr(*, "names")= chr [1:9] "hwy" "displ" "year" "cyl" ...
## - attr(*, "class")= chr "lm"
summary(mpg.lm)
##
## Call:
## lm(formula = hwy ~ ., data = subset(mpg, select = c(-manufacturer,
## -model)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.434 -0.566 -0.072 0.603 2.909
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -126.7859 45.0283 -2.82 0.00534 **
## displ -0.1884 0.2152 -0.88 0.38239
## year 0.0699 0.0226 3.09 0.00227 **
## cyl -0.1082 0.1411 -0.77 0.44416
## transauto(l3) -0.2250 0.9830 -0.23 0.81916
## transauto(l4) 1.1150 0.5634 1.98 0.04912 *
## transauto(l5) 1.4867 0.5579 2.66 0.00831 **
## transauto(l6) 1.8330 0.7171 2.56 0.01129 *
## transauto(s4) 0.0183 0.8233 0.02 0.98229
## transauto(s5) 1.9106 0.8150 2.34 0.02001 *
## transauto(s6) 1.0889 0.5754 1.89 0.05982 .
## transmanual(m5) 1.1397 0.5614 2.03 0.04362 *
## transmanual(m6) 0.9097 0.5696 1.60 0.11177
## drvf 0.9644 0.2990 3.23 0.00146 **
## drvr 1.1779 0.3432 3.43 0.00072 ***
## cty 0.9512 0.0477 19.94 < 2e-16 ***
## fld -1.6453 1.2522 -1.31 0.19034
## fle -5.2199 1.2236 -4.27 3.0e-05 ***
## flp -3.3776 1.1583 -2.92 0.00393 **
## flr -3.6397 1.1356 -3.21 0.00156 **
## classcompact -1.5004 0.7026 -2.14 0.03391 *
## classmidsize -1.1462 0.6982 -1.64 0.10217
## classminivan -3.0181 0.8040 -3.75 0.00023 ***
## classpickup -4.6279 0.7203 -6.42 8.8e-10 ***
## classsubcompact -2.1500 0.6943 -3.10 0.00223 **
## classsuv -4.3160 0.6824 -6.32 1.5e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.09 on 208 degrees of freedom
## Multiple R-squared: 0.97, Adjusted R-squared: 0.967
## F-statistic: 270 on 25 and 208 DF, p-value: <2e-16
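Since an lm object is a list (as str() shows), its parts can be pulled out directly, but accessor functions are usually clearer. A sketch: coef(summary(...)) returns the printed coefficient table as a numeric matrix, which makes filtering easy, and confint() gives confidence intervals (95% by default). The p < 0.001 threshold is an arbitrary choice for illustration.

```r
library(ggplot2)  # for the mpg data

mpg.lm <- lm(hwy ~ ., data = subset(mpg, select = c(-manufacturer, -model)))

# the coefficient table from summary() as a numeric matrix
coefs <- coef(summary(mpg.lm))
colnames(coefs)

# keep only terms with p < 0.001
coefs[coefs[, "Pr(>|t|)"] < 0.001, c("Estimate", "Pr(>|t|)")]

# confidence intervals for the first few coefficients
head(confint(mpg.lm))
```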
op <- par(mfrow = c(2, 2))
plot(mpg.lm)
## Warning: not plotting observations with leverage one:
## 107
## Warning: not plotting observations with leverage one:
## 107
par(op)
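The warnings say observation 107 has leverage one: the model fits it exactly, so plot() cannot show it in the residuals-versus-leverage panel. A sketch for finding such observations with hatvalues() and inspecting their factor values. A factor level that occurs only once is a common cause (a rare fuel type in fl is one candidate worth checking with table()), but this sketch does not assert which column is responsible here.

```r
library(ggplot2)  # for the mpg data

mpg.lm <- lm(hwy ~ ., data = subset(mpg, select = c(-manufacturer, -model)))

# observations whose hat value (leverage) is numerically 1
h <- hatvalues(mpg.lm)
high <- which(h > 1 - 1e-6)
high

# inspect their factor values and how often each fuel type occurs
mpg[high, c("trans", "drv", "fl", "class")]
table(mpg$fl)
```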