Are we able to predict the miles per gallon for a car based on attributes such as cylinders or horsepower?

In the US, there are more than 200 million passenger vehicles on the road. At the time of writing, the average price of regular-grade gas is hovering around $2.80 per gallon, and premium grade is averaging around $3.00. With a push for cleaner energy and hybrid technologies, an automaker might want to optimize certain components of a car to maximize the distance traveled on a single gallon of fuel. This concept could be attractive to customers who are environmentally aware and want to save money on gas. This brings us to the question stated above: can we identify the relationship between miles per gallon and car components?

The data was collected from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition. The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes. (Quinlan, 1993)

Each case in our dataset represents the specific attributes belonging to a type of vehicle. There are 398 observations in our dataset with 9 variables. This data is observational because it records attributes and metrics of existing cars rather than the results of a controlled experiment. Each row represents a different car model. Since the data is observational, it cannot be used to determine causation.

The variables are broken down as follows:

Response variable:
- mpg (continuous)

Explanatory variables:
- cylinders (discrete)
- displacement (continuous)
- horsepower (continuous)
- weight (continuous)
- acceleration (continuous)
- model.year (discrete)
- origin (discrete)
- car.name (categorical)

Data Source:
description: https://archive.ics.uci.edu/ml/datasets/auto+mpg
raw data: https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

Data Preparation: Data preparation involves cleaning the data and turning it into a proper data frame for the scope of this study. Our data set is relatively clean. The main transformation needed is to map the arbitrary column names to the proper variable definitions. It is also necessary to search for and fix any errors such as missing entries or strange characters.

auto <- read.table(url("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"), header =FALSE)
head(auto)
##   V1 V2  V3    V4   V5   V6 V7 V8                        V9
## 1 18  8 307 130.0 3504 12.0 70  1 chevrolet chevelle malibu
## 2 15  8 350 165.0 3693 11.5 70  1         buick skylark 320
## 3 18  8 318 150.0 3436 11.0 70  1        plymouth satellite
## 4 16  8 304 150.0 3433 12.0 70  1             amc rebel sst
## 5 17  8 302 140.0 3449 10.5 70  1               ford torino
## 6 15  8 429 198.0 4341 10.0 70  1          ford galaxie 500

The data has no headers. We will need to do some manipulation in order to assign headers. Using the provided documentation, we can rename all the variables with the correct name. There are 9 variables, hence 9 columns we should rename.

names(auto) <- c("mpg", "cylinders", 
                 "displacement", 
                 " horsepower", 
                 "weight", 
                 "acceleration", 
                 "model year",
                 "origin", 
                 "car name")

Let's check that the names changed correctly.

names(auto)
## [1] "mpg"          "cylinders"    "displacement" " horsepower" 
## [5] "weight"       "acceleration" "model year"   "origin"      
## [9] "car name"
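
As an alternative to renaming after the fact, the names can be supplied while the file is read. The following is only a sketch, not the approach used in this report. It assumes the whitespace-separated layout shown above and uses a tiny inline sample in place of the download. The names in `col_names` are our own choice of syntactically valid labels, and the third sample row is a hypothetical record that only illustrates a `?` entry.

```r
# Column names chosen to be valid R names (an assumption, not taken from the file)
col_names <- c("mpg", "cylinders", "displacement", "horsepower", "weight",
               "acceleration", "model.year", "origin", "car.name")

# Tiny inline sample standing in for the downloaded file;
# the third row is hypothetical and only illustrates a "?" entry
sample_text <- '
18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"
15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"
25.0 4 98.00 ?     2046. 19.0 71 1 "example car"
'

sample_auto <- read.table(text = sample_text, header = FALSE,
                          col.names = col_names,
                          na.strings = "?",        # treat "?" as a missing value
                          stringsAsFactors = FALSE)

# horsepower is numeric from the start, with NA where the file had "?"
str(sample_auto)
```

Reading the file this way would avoid both the later renaming step and the factor-to-numeric conversion, at the cost of hard-coding the missing-value marker.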

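The next step is to convert this into a proper data frame that we can use in our downstream analysis. Note that data.frame() also makes the column names syntactically valid: spaces become dots, and the leading space in " horsepower" turns that name into X.horsepower.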

auto.df<-data.frame(auto)
head(auto.df)
##   mpg cylinders displacement X.horsepower weight acceleration model.year
## 1  18         8          307        130.0   3504         12.0         70
## 2  15         8          350        165.0   3693         11.5         70
## 3  18         8          318        150.0   3436         11.0         70
## 4  16         8          304        150.0   3433         12.0         70
## 5  17         8          302        140.0   3449         10.5         70
## 6  15         8          429        198.0   4341         10.0         70
##   origin                  car.name
## 1      1 chevrolet chevelle malibu
## 2      1         buick skylark 320
## 3      1        plymouth satellite
## 4      1             amc rebel sst
## 5      1               ford torino
## 6      1          ford galaxie 500
names(auto.df)
## [1] "mpg"          "cylinders"    "displacement" "X.horsepower"
## [5] "weight"       "acceleration" "model.year"   "origin"      
## [9] "car.name"

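From the looks of our data, we need to do some additional transformations. Horsepower was read in as a factor when it should be numeric, because the raw file marks missing horsepower values with a "?" character. Converting the column to numeric turns those entries into NA, which is what triggers the coercion warning below.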

auto.df$X.horsepower <- as.numeric(as.character(auto.df$X.horsepower))
## Warning: NAs introduced by coercion
head(auto.df)
##   mpg cylinders displacement X.horsepower weight acceleration model.year
## 1  18         8          307          130   3504         12.0         70
## 2  15         8          350          165   3693         11.5         70
## 3  18         8          318          150   3436         11.0         70
## 4  16         8          304          150   3433         12.0         70
## 5  17         8          302          140   3449         10.5         70
## 6  15         8          429          198   4341         10.0         70
##   origin                  car.name
## 1      1 chevrolet chevelle malibu
## 2      1         buick skylark 320
## 3      1        plymouth satellite
## 4      1             amc rebel sst
## 5      1               ford torino
## 6      1          ford galaxie 500

We should check our data frame for missing values.

colSums(is.na(auto.df)|auto.df == '')
##          mpg    cylinders displacement X.horsepower       weight 
##            0            0            0            6            0 
## acceleration   model.year       origin     car.name 
##            0            0            0            0
colSums(is.na(auto.df)|auto.df == '?')
##          mpg    cylinders displacement X.horsepower       weight 
##            0            0            0            6            0 
## acceleration   model.year       origin     car.name 
##            0            0            0            0
colSums(is.na(auto.df)|auto.df == 'NA')
##          mpg    cylinders displacement X.horsepower       weight 
##            0            0            0            6            0 
## acceleration   model.year       origin     car.name 
##            0            0            0            0

We can see that horsepower is missing 6 entries. We have the option of removing those entries or using na.rm = TRUE to proceed with calculations while ignoring the missing values. If we build a model, the lm function drops rows with missing values by default; however, the missing entries could still affect our EDA and correlation analysis later on. We will remove the rows with the missing data.

auto.df<-na.omit(auto.df)
nrow(auto.df)
## [1] 392
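
To make the two options concrete, here is a minimal sketch on a made-up three-row data frame (the `toy` object and its values are illustrative only, not rows from the dataset). It contrasts ignoring missing values per calculation with dropping the incomplete rows once.

```r
# Made-up data: one missing horsepower value
toy <- data.frame(mpg = c(18, 25, 21),
                  horsepower = c(130, NA, 95))

# Option 1: keep all rows and ignore NAs in each calculation
mean(toy$horsepower)                # NA, because one value is missing
mean(toy$horsepower, na.rm = TRUE)  # 112.5, computed from the two known values

# Option 2: drop incomplete rows once, as done in this report
toy_clean <- na.omit(toy)
nrow(toy_clean)                     # 2 rows remain

# complete.cases() shows which rows would be kept
complete.cases(toy)
```

Dropping the rows once keeps every later summary, plot, and model on the same set of observations, which is the reason for preferring it here.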

The data has been validated.

Exploratory Data Analysis: In this step, we take a look at the summary statistics and distributions of our variables. If there are any outliers, they will be identified through the use of normal Q-Q plots and histograms.

summary(auto.df)
##       mpg          cylinders      displacement    X.horsepower  
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0  
##                                                                 
##      weight      acceleration     model.year        origin     
##  Min.   :1613   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
##  1st Qu.:2225   1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000  
##  Median :2804   Median :15.50   Median :76.00   Median :1.000  
##  Mean   :2978   Mean   :15.54   Mean   :75.98   Mean   :1.577  
##  3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000  
##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000  
##                                                                
##                car.name  
##  amc matador       :  5  
##  ford pinto        :  5  
##  toyota corolla    :  5  
##  amc gremlin       :  4  
##  amc hornet        :  4  
##  chevrolet chevette:  4  
##  (Other)           :365

The basic summary provides a list of descriptive statistics for each of the variables. It should be noted that car.name is a categorical variable, hence the summary lists the most frequent car names with their counts. The model year is a date-like column and cannot be quantified the same way as weight or acceleration. Before modeling the data, model year, origin, and car name should be converted to factors.
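
The factor conversion mentioned above could look like the following sketch. It runs on a small made-up data frame (`toy`) that borrows the column names used in this report. The values are illustrative only.

```r
# Made-up rows using the same column names as the report
toy <- data.frame(model.year = c(70, 70, 82),
                  origin     = c(1, 3, 2),
                  weight     = c(3504, 2130, 2720))

# Treat the coded columns as categories rather than quantities
toy$model.year <- factor(toy$model.year)
toy$origin     <- factor(toy$origin)

# summary() now reports counts per level instead of means and quartiles
summary(toy)
levels(toy$origin)
```

After the conversion, summary() no longer reports a mean for origin, which was not a meaningful number in the first place.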

What is the distribution of cars by model year?

table(auto.df$model.year)
## 
## 70 71 72 73 74 75 76 77 78 79 80 81 82 
## 29 27 28 40 26 30 34 28 36 29 27 28 30
library(ggplot2)
bar_plt <- ggplot(auto.df, aes(x = model.year)) 
bar_plt <- bar_plt + geom_bar()
bar_plt

summary(bar_plt)
## data: mpg, cylinders, displacement, X.horsepower, weight,
##   acceleration, model.year, origin, car.name [392x9]
## mapping:  x = model.year
## faceting: <ggproto object: Class FacetNull, Facet>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map: function
##     map_data: function
##     params: list
##     render_back: function
##     render_front: function
##     render_panels: function
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train: function
##     train_positions: function
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet>
## -----------------------------------
## geom_bar: width = NULL, na.rm = FALSE
## stat_count: width = NULL, na.rm = FALSE
## position_stack

There are more cars from the year 1973 than any other year within our dataset.

The task at hand is to construct a predictive model of mpg. Let's look at the distribution of our response variable.

x <- auto.df$mpg 
h<-hist(x, breaks=10, col="red", xlab="Miles Per Gallon", 
    main="Histogram with Normal Curve") 
xfit<-seq(min(x),max(x),length=40) 
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x)) 
yfit <- yfit*diff(h$mids[1:2])*length(x) 
lines(xfit, yfit, col="blue", lwd=2)

There is a right skew in the distribution of miles per gallon.
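
The visual impression of skew can be backed by a number. The sketch below defines a small helper, `skewness()`, which is our own illustrative function (not part of base R) computing the moment-based sample skewness. It is shown on two made-up vectors. Applying it to auto.df$mpg would be the natural next step.

```r
# Moment-based sample skewness: positive values indicate a right skew
skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

symmetric_toy    <- c(1, 2, 3, 4, 5)
right_skewed_toy <- c(1, 2, 3, 4, 20)

skewness(symmetric_toy)     # 0 for a perfectly symmetric sample
skewness(right_skewed_toy)  # positive for a right-skewed sample
```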

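The ggqqplot function draws a normal Q-Q plot, which compares the quantiles of the response variable against the quantiles of a normal distribution. Points that fall close to the reference line indicate an approximately normal distribution.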

library(ggpubr)
## Loading required package: magrittr
ggqqplot(auto.df$mpg)
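
How closely the points follow the reference line can also be summarized as a correlation between the theoretical and sample quantiles. The sketch below uses base R's qqnorm() with simulated data. The helper name `qq_correlation` and the simulated vectors are assumptions made for illustration.

```r
set.seed(1)
normal_toy <- rnorm(200)  # simulated, roughly normal
skewed_toy <- rexp(200)   # simulated, right-skewed

# Correlation between theoretical normal quantiles and sample quantiles
qq_correlation <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}

r_normal <- qq_correlation(normal_toy)
r_skewed <- qq_correlation(skewed_toy)

# Values closer to 1 mean the Q-Q points lie closer to a straight line
c(normal = r_normal, skewed = r_skewed)
```

The same helper could be called on auto.df$mpg to put a number next to the plot.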

What about the distributions of the other variables?

library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
auto.df %>% gather() %>% head()
## Warning: attributes are not identical across measure variables;
## they will be dropped
##   key value
## 1 mpg    18
## 2 mpg    15
## 3 mpg    18
## 4 mpg    16
## 5 mpg    17
## 6 mpg    15
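# Plot the Auto MPG data (not the built-in mtcars); keep numeric columns only
ggplot(gather(auto.df[sapply(auto.df, is.numeric)]), aes(value)) + 
    geom_histogram(bins = 10) + 
    facet_wrap(~key, scales = 'free_x')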

The data contains some outliers, but we do not yet know how they will affect our model.
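
One common way to list candidate outliers is the 1.5 x IQR rule that box plots use. The sketch below applies base R's boxplot.stats() to a short made-up vector. The rule is a convention we are assuming here, not a criterion stated in this report.

```r
# Made-up values with one unusually large entry
mpg_toy <- c(15, 15, 16, 17, 18, 18, 46.6)

# boxplot.stats() returns the points beyond 1.5 times the hinge spread in $out
outliers <- boxplot.stats(mpg_toy)$out
outliers

# Positions of the flagged values in the original vector
which(mpg_toy %in% outliers)
```

Running the same call on each numeric column of auto.df would give a per-variable list of points to inspect before modeling.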

We are going to create an alternate data set without the car name, year, and origin variables. This data frame will be used if these variables turn out to have high p-values in the model. Our hypothesis is that these multi-level variables are not significant. If they are, they will need to be recoded into dummy variables.

auto.df2 <- subset(auto.df, select = c(mpg, cylinders, displacement, X.horsepower, weight,acceleration))
names(auto.df2)
## [1] "mpg"          "cylinders"    "displacement" "X.horsepower"
## [5] "weight"       "acceleration"
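
For reference, recoding a multi-level variable into dummy variables can be sketched with model.matrix(), which is also what lm() does internally for factors. The `toy` data frame below is made up for illustration.

```r
# Made-up origin codes stored as a factor
toy <- data.frame(origin = factor(c(1, 2, 3, 1)))

# One indicator column per non-reference level; level 1 is the baseline
dummies <- model.matrix(~ origin, data = toy)
colnames(dummies)  # "(Intercept)" "origin2" "origin3"
dummies
```

A factor with k levels yields k - 1 indicator columns, so a variable with many levels adds a correspondingly large number of coefficients to the model.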

Inference: Given that the scope of our research question implies linear regression, we can use hypothesis testing to test the significance of each slope. We can also use p-values to measure whether a predictor is statistically significant.
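
As a sketch of how those p-values can be read programmatically, the example below fits a model to simulated data and pulls the coefficient table out of summary(). The data, the variable names, and the 0.05 cutoff are all assumptions made for illustration.

```r
set.seed(42)

# Simulated data: mpg depends on weight, while noise is unrelated
toy <- data.frame(weight = runif(100, 1600, 5100))
toy$mpg   <- 46 - 0.0075 * toy$weight + rnorm(100, sd = 3)
toy$noise <- rnorm(100)

fit <- lm(mpg ~ weight + noise, data = toy)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- summary(fit)$coefficients
coef_table[, "Pr(>|t|)"]

# Terms whose slope test rejects H0: slope = 0 at the assumed 5% level
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
significant
```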

We need to check the following conditions before running our regression model:
- Observations are independent
- Residuals follow a normal and unimodal distribution
- Variance is constant
- Minimum sample size
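
The residual-based conditions can only be inspected once a model has been fitted. A minimal sketch of the usual checks follows, using simulated data and base R graphics. The variable names are placeholders, not objects from this report.

```r
set.seed(7)

# Simulated predictor and response with constant error variance
pred <- runif(150, 0, 10)
resp <- 2 + 3 * pred + rnorm(150)

fit <- lm(resp ~ pred)
res <- residuals(fit)

# Constant variance: residuals should scatter evenly around zero
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: Q-Q plot and histogram of the residuals
qqnorm(res)
qqline(res)
hist(res, breaks = 10, main = "Residuals", xlab = "Residual")

# Shapiro-Wilk test as a numeric complement to the plots
shapiro.test(res)
```

Independence and sample size are judged from how the data was collected rather than from these plots.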

Let's build a linear regression model that includes every predictor.

names(auto.df)
## [1] "mpg"          "cylinders"    "displacement" "X.horsepower"
## [5] "weight"       "acceleration" "model.year"   "origin"      
## [9] "car.name"
mod <- lm(mpg ~ cylinders+displacement+X.horsepower+weight+acceleration+factor(model.year)+factor(origin)+factor(car.name), data=auto.df)
summary(mod)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + X.horsepower + 
##     weight + acceleration + factor(model.year) + factor(origin) + 
##     factor(car.name), data = auto.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.438  0.000  0.000  0.000  4.438 
## 
## Coefficients: (2 not defined because of singularities)
##                                                       Estimate Std. Error
## (Intercept)                                          46.610073   7.068042
## cylinders                                            -0.580020   0.625230
## displacement                                          0.003575   0.015665
## X.horsepower                                         -0.068892   0.029038
## weight                                               -0.003098   0.001295
## acceleration                                         -0.447821   0.169743
## factor(model.year)71                                 -1.098130   1.125081
## factor(model.year)72                                 -2.068422   1.308683
## factor(model.year)73                                 -1.438144   1.143206
## factor(model.year)74                                 -0.438919   1.196102
## factor(model.year)75                                 -0.544869   1.255016
## factor(model.year)76                                  0.859332   1.250710
## factor(model.year)77                                  2.627774   1.647629
## factor(model.year)78                                  2.353744   1.529217
## factor(model.year)79                                  4.411243   1.548543
## factor(model.year)80                                  5.733644   1.522829
## factor(model.year)81                                  3.656306   1.648425
## factor(model.year)82                                  5.936569   1.833643
## factor(origin)2                                       0.004684   4.275572
## factor(origin)3                                       1.288448   4.000627
## factor(car.name)amc ambassador dpl                    0.458257   3.234406
## factor(car.name)amc ambassador sst                    2.870538   3.297935
## factor(car.name)amc concord                          -2.412820   3.242182
## factor(car.name)amc concord d/l                      -2.713430   3.444838
## factor(car.name)amc concord dl 6                     -3.705663   3.522730
## factor(car.name)amc gremlin                          -1.829932   2.899962
## factor(car.name)amc hornet                           -1.281917   2.848157
## factor(car.name)amc hornet sportabout (sw)           -2.154815   3.448999
## factor(car.name)amc matador                          -0.969004   2.599101
## factor(car.name)amc matador (sw)                      1.922215   2.877792
## factor(car.name)amc pacer                            -0.654219   3.488473
## factor(car.name)amc pacer d/l                        -3.004424   3.462125
## factor(car.name)amc rebel sst                        -0.714356   3.303082
## factor(car.name)amc spirit dl                        -1.233996   4.099426
## factor(car.name)audi 100 ls                          -0.455856   3.497243
## factor(car.name)audi 100ls                           -1.140140   2.991750
## factor(car.name)audi 4000                             3.152045   3.341216
## factor(car.name)audi 5000                            -3.253714   3.612119
## factor(car.name)audi 5000s (diesel)                   9.185062   3.711111
## factor(car.name)audi fox                              4.774998   3.362683
## factor(car.name)bmw 2002                              1.575782   3.400972
## factor(car.name)bmw 320i                             -4.490515   3.476468
## factor(car.name)buick century                         1.347576   3.061228
## factor(car.name)buick century 350                     1.795690   3.081135
## factor(car.name)buick century limited                -0.668198   3.699729
## factor(car.name)buick century luxus (sw)              1.601505   3.390666
## factor(car.name)buick century special                -0.929732   3.500536
## factor(car.name)buick electra 225 custom              5.605478   3.348329
## factor(car.name)buick estate wagon (sw)               0.033411   2.892797
## factor(car.name)buick lesabre custom                  2.517374   3.503332
## factor(car.name)buick opel isuzu deluxe               1.500279   4.220332
## factor(car.name)buick regal sport coupe (turbo)      -0.569644   3.597101
## factor(car.name)buick skyhawk                         1.298721   3.411399
## factor(car.name)buick skylark                        -0.631992   3.312454
## factor(car.name)buick skylark 320                    -0.263920   3.264576
## factor(car.name)buick skylark limited                 0.795482   3.967537
## factor(car.name)cadillac eldorado                     3.852565   3.482836
## factor(car.name)cadillac seville                      3.807026   3.237425
## factor(car.name)capri ii                              0.328314   3.962426
## factor(car.name)chevroelt chevelle malibu             0.111644   3.450577
## factor(car.name)chevrolet bel air                     2.336748   3.283504
## factor(car.name)chevrolet camaro                     -0.680284   4.038331
## factor(car.name)chevrolet caprice classic            -0.910233   2.675588
## factor(car.name)chevrolet cavalier                    0.282606   4.108740
## factor(car.name)chevrolet cavalier 2-door             4.915548   4.128611
## factor(car.name)chevrolet cavalier wagon             -1.056791   4.119795
## factor(car.name)chevrolet chevelle concours (sw)     -0.078793   3.466029
## factor(car.name)chevrolet chevelle malibu            -0.832776   3.041773
## factor(car.name)chevrolet chevelle malibu classic     0.367151   2.933783
## factor(car.name)chevrolet chevette                    1.681470   3.583181
## factor(car.name)chevrolet citation                   -0.419580   3.181375
## factor(car.name)chevrolet concours                   -3.324846   3.570030
## factor(car.name)chevrolet impala                      1.921867   2.695285
## factor(car.name)chevrolet malibu                     -0.924037   2.851420
## factor(car.name)chevrolet malibu classic (sw)        -1.639315   3.344897
## factor(car.name)chevrolet monte carlo                -2.161049   3.581815
## factor(car.name)chevrolet monte carlo landau         -0.166686   2.868715
## factor(car.name)chevrolet monte carlo s               1.673179   3.141051
## factor(car.name)chevrolet monza 2+2                   1.240000   3.403149
## factor(car.name)chevrolet nova                       -0.960732   2.879547
## factor(car.name)chevrolet nova custom                -1.481007   3.342403
## factor(car.name)chevrolet vega                       -0.082894   3.346658
## factor(car.name)chevrolet vega (sw)                  -0.764036   3.833681
## factor(car.name)chevrolet vega 2300                   4.462554   3.840431
## factor(car.name)chevrolet woody                      -0.265662   4.007361
## factor(car.name)chevy c10                            -3.155760   3.235302
## factor(car.name)chevy c20                             0.984218   3.733992
## factor(car.name)chevy s-10                            3.110915   4.079620
## factor(car.name)chrysler cordoba                      1.423136   3.398044
## factor(car.name)chrysler lebaron medallion           -3.944983   4.096102
## factor(car.name)chrysler lebaron salon               -5.967104   3.649350
## factor(car.name)chrysler lebaron town @ country (sw) -0.807395   3.336554
## factor(car.name)chrysler new yorker brougham          5.301060   3.244683
## factor(car.name)chrysler newport royal                4.054169   3.453370
## factor(car.name)datsun 1200                           8.073365   3.531351
## factor(car.name)datsun 200-sx                        -3.652343   3.447231
## factor(car.name)datsun 200sx                          4.857532   3.479252
## factor(car.name)datsun 210                            4.818350   3.144742
## factor(car.name)datsun 210 mpg                        6.745247   3.648269
## factor(car.name)datsun 280-zx                         5.160845   3.661593
## factor(car.name)datsun 310                            3.657176   3.565789
## factor(car.name)datsun 310 gx                         4.210247   3.662740
## factor(car.name)datsun 510                           -0.767180   3.466187
## factor(car.name)datsun 510 (sw)                       5.181998   3.081299
## factor(car.name)datsun 510 hatchback                  5.857917   3.434793
## factor(car.name)datsun 610                           -1.291833   3.332696
## factor(car.name)datsun 710                            3.100028   2.967600
## factor(car.name)datsun 810                           -3.671885   3.666013
## factor(car.name)datsun 810 maxima                    -0.873133   3.736495
## factor(car.name)datsun b-210                          3.858377   3.429683
## factor(car.name)datsun b210                           4.743133   3.437529
## factor(car.name)datsun b210 gx                       10.728306   3.577558
## factor(car.name)datsun f-10 hatchback                 3.360969   3.655933
## factor(car.name)datsun pl510                          0.778065   2.914699
## factor(car.name)dodge aries se                       -0.935174   4.102018
## factor(car.name)dodge aries wagon (sw)               -1.801079   3.966690
## factor(car.name)dodge aspen                          -3.020720   3.097876
## factor(car.name)dodge aspen 6                        -2.325022   3.462595
## factor(car.name)dodge aspen se                        1.332044   3.461709
## factor(car.name)dodge challenger se                  -1.111896   3.297987
## factor(car.name)dodge charger 2.2                     4.241200   4.230802
## factor(car.name)dodge colt                            0.866623   3.519722
## factor(car.name)dodge colt (sw)                       4.360399   4.051379
## factor(car.name)dodge colt hardtop                    2.140110   3.994070
## factor(car.name)dodge colt hatchback custom           4.540681   4.223860
## factor(car.name)dodge colt m/m                        5.498209   4.234797
## factor(car.name)dodge coronet brougham                1.169137   3.229949
## factor(car.name)dodge coronet custom                  0.963298   3.082123
## factor(car.name)dodge coronet custom (sw)             1.518419   3.315765
## factor(car.name)dodge d100                           -2.730599   3.181728
## factor(car.name)dodge d200                            1.980662   3.701672
## factor(car.name)dodge dart custom                    -0.879413   3.098135
## factor(car.name)dodge diplomat                        1.065860   3.347083
## factor(car.name)dodge magnum xe                       0.458522   3.372059
## factor(car.name)dodge monaco (sw)                     2.659064   3.548207
## factor(car.name)dodge monaco brougham                -1.285180   3.402414
## factor(car.name)dodge omni                            2.449294   4.114215
## factor(car.name)dodge rampage                        -0.618087   4.310288
## factor(car.name)dodge st. regis                      -1.346163   3.320131
## factor(car.name)fiat 124 sport coupe                  2.951145   3.371201
## factor(car.name)fiat 124 tc                           0.123599   3.319000
## factor(car.name)fiat 124b                             4.615014   3.362846
## factor(car.name)fiat 128                              1.126490   2.972799
## factor(car.name)fiat 131                              2.962392   3.353650
## factor(car.name)fiat strada custom                    6.203596   3.049850
## factor(car.name)fiat x1.9                             4.838327   3.326251
## factor(car.name)ford country                          2.338733   3.333462
## factor(car.name)ford country squire (sw)             -0.452600   2.974270
## factor(car.name)ford escort 2h                        2.724005   4.035268
## factor(car.name)ford escort 4w                        4.171038   4.124940
## factor(car.name)ford f108                            -3.247156   3.247384
## factor(car.name)ford f250                             2.120663   3.646426
## factor(car.name)ford fairmont                        -1.065384   3.846187
## factor(car.name)ford fairmont (auto)                 -3.882331   3.612800
## factor(car.name)ford fairmont (man)                  -0.659274   3.961234
## factor(car.name)ford fairmont 4                      -4.139283   3.968609
## factor(car.name)ford fairmont futura                 -4.169526   4.083734
## factor(car.name)ford fiesta                           5.677447   4.225095
## factor(car.name)ford futura                          -2.783320   3.392267
## factor(car.name)ford galaxie 500                      2.129003   2.790833
## factor(car.name)ford gran torino                      0.883333   2.643025
## factor(car.name)ford gran torino (sw)                 2.348892   3.073388
## factor(car.name)ford granada                         -1.971723   3.647983
## factor(car.name)ford granada ghia                    -1.033679   3.602789
## factor(car.name)ford granada gl                      -4.101758   3.595501
## factor(car.name)ford granada l                       -4.814812   3.629134
## factor(car.name)ford ltd                              1.711927   2.827027
## factor(car.name)ford ltd landau                      -3.433657   3.329578
## factor(car.name)ford maverick                        -1.625163   2.992871
## factor(car.name)ford mustang                         -2.645692   3.612786
## factor(car.name)ford mustang gl                      -2.173470   4.110140
## factor(car.name)ford mustang ii                      -5.426889   3.349188
## factor(car.name)ford mustang ii 2+2                  -0.176860   4.069293
## factor(car.name)ford pinto                           -0.691044   3.274359
## factor(car.name)ford pinto (sw)                      -0.148639   3.948794
## factor(car.name)ford pinto runabout                  -1.448262   3.933336
## factor(car.name)ford ranger                          -0.751886   4.083777
## factor(car.name)ford thunderbird                      0.334735   3.445723
## factor(car.name)ford torino                          -1.018288   3.375371
## factor(car.name)ford torino 500                      -0.692924   3.601058
## factor(car.name)hi 1200d                              2.182903   4.028367
## factor(car.name)honda accord                          1.482983   3.111864
## factor(car.name)honda accord cvcc                     2.247785   3.671804
## factor(car.name)honda accord lx                      -0.050239   3.534697
## factor(car.name)honda civic                           2.120830   2.889048
## factor(car.name)honda civic (auto)                   -2.106599   3.671790
## factor(car.name)honda civic 1300                      2.371248   3.654023
## factor(car.name)honda civic 1500 gl                   9.489215   3.629310
## factor(car.name)honda civic cvcc                      4.793059   3.235954
## factor(car.name)honda prelude                         2.544394   3.520752
## factor(car.name)maxda glc deluxe                      1.205888   3.521088
## factor(car.name)maxda rx3                            -8.144967   3.451589
## factor(car.name)mazda 626                             1.914702   3.062483
## factor(car.name)mazda glc                            14.010810   3.555376
## factor(car.name)mazda glc 4                           2.538858   3.577233
## factor(car.name)mazda glc custom                     -2.171359   3.677767
## factor(car.name)mazda glc custom l                    4.267715   3.670108
## factor(car.name)mazda glc deluxe                      3.008224   3.648355
## factor(car.name)mazda rx-4                           -5.522497   3.630137
## factor(car.name)mazda rx-7 gs                        -8.458701   3.545944
## factor(car.name)mazda rx2 coupe                      -5.394293   3.156458
## factor(car.name)mercedes benz 300d                    1.861017   3.671310
## factor(car.name)mercedes-benz 240d                    3.895868   3.804217
## factor(car.name)mercedes-benz 280s                   -0.515301   4.044002
## factor(car.name)mercury capri 2000                   -1.556693   3.933465
## factor(car.name)mercury capri v6                      0.052754   3.605405
## factor(car.name)mercury cougar brougham              -1.743801   3.504497
## factor(car.name)mercury grand marquis                -3.465885   3.376675
## factor(car.name)mercury lynx l                        4.575664   4.205518
## factor(car.name)mercury marquis                       3.172446   3.542515
## factor(car.name)mercury marquis brougham              4.065370   3.292699
## factor(car.name)mercury monarch                      -3.482720   3.672829
## factor(car.name)mercury monarch ghia                  1.163902   3.341757
## factor(car.name)mercury zephyr                       -2.554020   3.599498
## factor(car.name)mercury zephyr 6                     -5.187615   3.540574
## factor(car.name)nissan stanza xe                      3.303134   3.617371
## factor(car.name)oldsmobile cutlass ciera (diesel)    10.805451   3.788684
## factor(car.name)oldsmobile cutlass ls                 7.004061   3.836171
## factor(car.name)oldsmobile cutlass salon brougham     1.457991   3.126229
## factor(car.name)oldsmobile cutlass supreme            0.136603   3.652771
## factor(car.name)oldsmobile delta 88 royale            1.719332   3.467945
## factor(car.name)oldsmobile omega                     -2.106148   3.054222
## factor(car.name)oldsmobile omega brougham             0.703847   3.668086
## factor(car.name)oldsmobile starfire sx               -0.801864   3.902680
## factor(car.name)oldsmobile vista cruiser              2.152272   3.187821
## factor(car.name)opel 1900                             1.445993   2.936982
## factor(car.name)opel manta                            0.155672   2.920044
## factor(car.name)peugeot 304                           6.500826   3.544895
## factor(car.name)peugeot 504                           1.840662   3.046543
## factor(car.name)peugeot 504 (sw)                      2.299226   3.846923
## factor(car.name)peugeot 505s turbo diesel             4.297769   3.894477
## factor(car.name)peugeot 604sl                        -3.069394   4.022138
## factor(car.name)plymouth 'cuda 340                   -3.400215   3.322588
## factor(car.name)plymouth arrow gs                    -1.174123   4.140818
## factor(car.name)plymouth champ                        8.307983   4.175045
## factor(car.name)plymouth cricket                      2.541785   3.955480
## factor(car.name)plymouth custom suburb                3.131677   3.219542
## factor(car.name)plymouth duster                       1.291572   2.961011
## factor(car.name)plymouth fury                         1.389064   3.495274
## factor(car.name)plymouth fury gran sedan              2.283942   3.173934
## factor(car.name)plymouth fury iii                     2.139602   2.738068
## factor(car.name)plymouth grand fury                   4.199201   3.292967
## factor(car.name)plymouth horizon                      2.672234   4.254009
## factor(car.name)plymouth horizon 4                    4.252692   4.147774
## factor(car.name)plymouth horizon miser                4.904061   4.301034
## factor(car.name)plymouth horizon tc3                  3.578638   4.170031
## factor(car.name)plymouth reliant                     -0.087267   3.733031
## factor(car.name)plymouth sapporo                     -0.785708   3.880336
## factor(car.name)plymouth satellite                    0.797062   3.303457
## factor(car.name)plymouth satellite custom            -2.007981   3.457659
## factor(car.name)plymouth satellite custom (sw)        2.194657   3.374910
## factor(car.name)plymouth satellite sebring            0.319651   3.405947
## factor(car.name)plymouth valiant                      0.400953   2.988431
## factor(car.name)plymouth valiant custom              -0.568369   3.465722
## factor(car.name)plymouth volare                      -0.570898   3.524133
## factor(car.name)plymouth volare custom               -1.501452   3.632066
## factor(car.name)plymouth volare premier v8           -2.515757   3.186169
## factor(car.name)pontiac astro                         0.442144   3.884952
## factor(car.name)pontiac catalina                      4.079782   2.801472
## factor(car.name)pontiac catalina brougham             2.732790   3.402951
## factor(car.name)pontiac firebird                     -0.152092   3.490196
## factor(car.name)pontiac grand prix                    7.390006   3.270642
## factor(car.name)pontiac grand prix lj                 0.416345   3.392305
## factor(car.name)pontiac j2000 se hatchback            1.460406   4.167200
## factor(car.name)pontiac lemans v6                    -1.995651   3.456827
## factor(car.name)pontiac phoenix                       1.627794   3.639036
## factor(car.name)pontiac phoenix lj                   -0.326978   3.529031
## factor(car.name)pontiac safari (sw)                   4.050835   3.668934
## factor(car.name)pontiac sunbird coupe                -1.241984   4.044661
## factor(car.name)pontiac ventura sj                   -0.258740   3.406411
## factor(car.name)renault 12 (sw)                       3.025966   3.543116
## factor(car.name)renault 12tl                          0.875973   3.310852
## factor(car.name)renault 5 gtl                         6.773823   3.544359
## factor(car.name)saab 99e                              2.072410   3.576139
## factor(car.name)saab 99gle                           -1.869269   3.619208
## factor(car.name)saab 99le                             2.929333   3.061658
## factor(car.name)subaru                                1.465911   2.994744
## factor(car.name)subaru dl                             0.509323   3.193193
## factor(car.name)toyota carina                        -2.856086   3.406564
## factor(car.name)toyota celica gt                      1.064173   3.547685
## factor(car.name)toyota celica gt liftback            -6.347777   3.434481
## factor(car.name)toyota corolla                        1.352975   2.759800
## factor(car.name)toyota corolla 1200                   5.460644   3.137401
## factor(car.name)toyota corolla 1600 (sw)              3.100129   3.089850
## factor(car.name)toyota corolla liftback              -2.219221   3.590546
## factor(car.name)toyota corolla tercel                 5.118772   3.631206
## factor(car.name)toyota corona                         0.235029   2.708155
## factor(car.name)toyota corona hardtop                 0.628759   3.056478
## factor(car.name)toyota corona liftback               -0.451486   3.399771
## factor(car.name)toyota corona mark ii                -1.372378   3.312237
## factor(car.name)toyota cressida                      -0.657676   3.655870
## factor(car.name)toyota mark ii                       -1.884674   3.114890
## factor(car.name)toyota starlet                        6.583383   3.665668
## factor(car.name)toyota tercel                         6.516185   3.584358
## factor(car.name)toyouta corona mark ii (sw)                 NA         NA
## factor(car.name)triumph tr7 coupe                     5.104628   3.350756
## factor(car.name)vokswagen rabbit                     -3.708086   3.347441
## factor(car.name)volkswagen 1131 deluxe sedan         -0.607621   3.701184
## factor(car.name)volkswagen 411 (sw)                   0.416326   3.626280
## factor(car.name)volkswagen dasher                     0.432618   2.763535
## factor(car.name)volkswagen jetta                      2.914897   3.420386
## factor(car.name)volkswagen model 111                  1.780164   3.525165
## factor(car.name)volkswagen rabbit                     1.014133   2.880713
## factor(car.name)volkswagen rabbit custom             -0.392514   3.411232
## factor(car.name)volkswagen rabbit custom diesel      15.213942   3.657730
## factor(car.name)volkswagen rabbit l                   3.476692   3.456240
## factor(car.name)volkswagen scirocco                   2.261897   3.369848
## factor(car.name)volkswagen super beetle               1.410685   3.527149
## factor(car.name)volkswagen type 3                     1.653389   3.837305
## factor(car.name)volvo 144ea                          -0.747478   3.581731
## factor(car.name)volvo 145e (sw)                      -1.363662   3.749380
## factor(car.name)volvo 244dl                           0.185476   3.442409
## factor(car.name)volvo 245                            -1.802896   3.591857
## factor(car.name)volvo 264gl                          -4.642147   3.776667
## factor(car.name)volvo diesel                          7.192836   4.050601
## factor(car.name)vw dasher (diesel)                   14.203488   3.706085
## factor(car.name)vw pickup                            14.619086   3.846633
## factor(car.name)vw rabbit                             5.143980   2.871948
## factor(car.name)vw rabbit c (diesel)                 13.433388   3.587318
## factor(car.name)vw rabbit custom                            NA         NA
##                                                      t value Pr(>|t|)    
## (Intercept)                                            6.594 5.55e-09 ***
## cylinders                                             -0.928 0.356584    
## displacement                                           0.228 0.820098    
## X.horsepower                                          -2.372 0.020269 *  
## weight                                                -2.393 0.019251 *  
## acceleration                                          -2.638 0.010154 *  
## factor(model.year)71                                  -0.976 0.332222    
## factor(model.year)72                                  -1.581 0.118249    
## factor(model.year)73                                  -1.258 0.212350    
## factor(model.year)74                                  -0.367 0.714698    
## factor(model.year)75                                  -0.434 0.665441    
## factor(model.year)76                                   0.687 0.494183    
## factor(model.year)77                                   1.595 0.115001    
## factor(model.year)78                                   1.539 0.128025    
## factor(model.year)79                                   2.849 0.005682 ** 
## factor(model.year)80                                   3.765 0.000331 ***
## factor(model.year)81                                   2.218 0.029619 *  
## factor(model.year)82                                   3.238 0.001805 ** 
## factor(origin)2                                        0.001 0.999129    
## factor(origin)3                                        0.322 0.748314    
## factor(car.name)amc ambassador dpl                     0.142 0.887716    
## factor(car.name)amc ambassador sst                     0.870 0.386894    
## factor(car.name)amc concord                           -0.744 0.459114    
## factor(car.name)amc concord d/l                       -0.788 0.433400    
## factor(car.name)amc concord dl 6                      -1.052 0.296255    
## factor(car.name)amc gremlin                           -0.631 0.529972    
## factor(car.name)amc hornet                            -0.450 0.653964    
## factor(car.name)amc hornet sportabout (sw)            -0.625 0.534047    
## factor(car.name)amc matador                           -0.373 0.710346    
## factor(car.name)amc matador (sw)                       0.668 0.506245    
## factor(car.name)amc pacer                             -0.188 0.851753    
## factor(car.name)amc pacer d/l                         -0.868 0.388311    
## factor(car.name)amc rebel sst                         -0.216 0.829373    
## factor(car.name)amc spirit dl                         -0.301 0.764246    
## factor(car.name)audi 100 ls                           -0.130 0.896645    
## factor(car.name)audi 100ls                            -0.381 0.704225    
## factor(car.name)audi 4000                              0.943 0.348555    
## factor(car.name)audi 5000                             -0.901 0.370629    
## factor(car.name)audi 5000s (diesel)                    2.475 0.015613 *  
## factor(car.name)audi fox                               1.420 0.159808    
## factor(car.name)bmw 2002                               0.463 0.644486    
## factor(car.name)bmw 320i                              -1.292 0.200485    
## factor(car.name)buick century                          0.440 0.661070    
## factor(car.name)buick century 350                      0.583 0.561799    
## factor(car.name)buick century limited                 -0.181 0.857170    
## factor(car.name)buick century luxus (sw)               0.472 0.638083    
## factor(car.name)buick century special                 -0.266 0.791288    
## factor(car.name)buick electra 225 custom               1.674 0.098330 .  
## factor(car.name)buick estate wagon (sw)                0.012 0.990816    
## factor(car.name)buick lesabre custom                   0.719 0.474671    
## factor(car.name)buick opel isuzu deluxe                0.355 0.723235    
## factor(car.name)buick regal sport coupe (turbo)       -0.158 0.874603    
## factor(car.name)buick skyhawk                          0.381 0.704516    
## factor(car.name)buick skylark                         -0.191 0.849211    
## factor(car.name)buick skylark 320                     -0.081 0.935785    
## factor(car.name)buick skylark limited                  0.200 0.841641    
## factor(car.name)cadillac eldorado                      1.106 0.272242    
## factor(car.name)cadillac seville                       1.176 0.243387    
## factor(car.name)capri ii                               0.083 0.934189    
## factor(car.name)chevroelt chevelle malibu              0.032 0.974276    
## factor(car.name)chevrolet bel air                      0.712 0.478911    
## factor(car.name)chevrolet camaro                      -0.168 0.866684    
## factor(car.name)chevrolet caprice classic             -0.340 0.734670    
## factor(car.name)chevrolet cavalier                     0.069 0.945349    
## factor(car.name)chevrolet cavalier 2-door              1.191 0.237614    
## factor(car.name)chevrolet cavalier wagon              -0.257 0.798265    
## factor(car.name)chevrolet chevelle concours (sw)      -0.023 0.981924    
## factor(car.name)chevrolet chevelle malibu             -0.274 0.785017    
## factor(car.name)chevrolet chevelle malibu classic      0.125 0.900747    
## factor(car.name)chevrolet chevette                     0.469 0.640258    
## factor(car.name)chevrolet citation                    -0.132 0.895432    
## factor(car.name)chevrolet concours                    -0.931 0.354716    
## factor(car.name)chevrolet impala                       0.713 0.478059    
## factor(car.name)chevrolet malibu                      -0.324 0.746805    
## factor(car.name)chevrolet malibu classic (sw)         -0.490 0.625517    
## factor(car.name)chevrolet monte carlo                 -0.603 0.548128    
## factor(car.name)chevrolet monte carlo landau          -0.058 0.953822    
## factor(car.name)chevrolet monte carlo s                0.533 0.595850    
## factor(car.name)chevrolet monza 2+2                    0.364 0.716622    
## factor(car.name)chevrolet nova                        -0.334 0.739595    
## factor(car.name)chevrolet nova custom                 -0.443 0.658989    
## factor(car.name)chevrolet vega                        -0.025 0.980306    
## factor(car.name)chevrolet vega (sw)                   -0.199 0.842578    
## factor(car.name)chevrolet vega 2300                    1.162 0.248972    
## factor(car.name)chevrolet woody                       -0.066 0.947323    
## factor(car.name)chevy c10                             -0.975 0.332533    
## factor(car.name)chevy c20                              0.264 0.792834    
## factor(car.name)chevy s-10                             0.763 0.448156    
## factor(car.name)chrysler cordoba                       0.419 0.676568    
## factor(car.name)chrysler lebaron medallion            -0.963 0.338631    
## factor(car.name)chrysler lebaron salon                -1.635 0.106272    
## factor(car.name)chrysler lebaron town @ country (sw)  -0.242 0.809461    
## factor(car.name)chrysler new yorker brougham           1.634 0.106555    
## factor(car.name)chrysler newport royal                 1.174 0.244170    
## factor(car.name)datsun 1200                            2.286 0.025106 *  
## factor(car.name)datsun 200-sx                         -1.060 0.292818    
## factor(car.name)datsun 200sx                           1.396 0.166846    
## factor(car.name)datsun 210                             1.532 0.129739    
## factor(car.name)datsun 210 mpg                         1.849 0.068468 .  
## factor(car.name)datsun 280-zx                          1.409 0.162890    
## factor(car.name)datsun 310                             1.026 0.308408    
## factor(car.name)datsun 310 gx                          1.149 0.254060    
## factor(car.name)datsun 510                            -0.221 0.825443    
## factor(car.name)datsun 510 (sw)                        1.682 0.096832 .  
## factor(car.name)datsun 510 hatchback                   1.705 0.092302 .  
## factor(car.name)datsun 610                            -0.388 0.699407    
## factor(car.name)datsun 710                             1.045 0.299597    
## factor(car.name)datsun 810                            -1.002 0.319800    
## factor(car.name)datsun 810 maxima                     -0.234 0.815881    
## factor(car.name)datsun b-210                           1.125 0.264227    
## factor(car.name)datsun b210                            1.380 0.171800    
## factor(car.name)datsun b210 gx                         2.999 0.003690 ** 
## factor(car.name)datsun f-10 hatchback                  0.919 0.360916    
## factor(car.name)datsun pl510                           0.267 0.790254    
## factor(car.name)dodge aries se                        -0.228 0.820291    
## factor(car.name)dodge aries wagon (sw)                -0.454 0.651121    
## factor(car.name)dodge aspen                           -0.975 0.332690    
## factor(car.name)dodge aspen 6                         -0.671 0.504013    
## factor(car.name)dodge aspen se                         0.385 0.701494    
## factor(car.name)dodge challenger se                   -0.337 0.736963    
## factor(car.name)dodge charger 2.2                      1.002 0.319389    
## factor(car.name)dodge colt                             0.246 0.806194    
## factor(car.name)dodge colt (sw)                        1.076 0.285301    
## factor(car.name)dodge colt hardtop                     0.536 0.593688    
## factor(car.name)dodge colt hatchback custom            1.075 0.285864    
## factor(car.name)dodge colt m/m                         1.298 0.198202    
## factor(car.name)dodge coronet brougham                 0.362 0.718408    
## factor(car.name)dodge coronet custom                   0.313 0.755506    
## factor(car.name)dodge coronet custom (sw)              0.458 0.648338    
## factor(car.name)dodge d100                            -0.858 0.393546    
## factor(car.name)dodge d200                             0.535 0.594204    
## factor(car.name)dodge dart custom                     -0.284 0.777316    
## factor(car.name)dodge diplomat                         0.318 0.751045    
## factor(car.name)dodge magnum xe                        0.136 0.892209    
## factor(car.name)dodge monaco (sw)                      0.749 0.455986    
## factor(car.name)dodge monaco brougham                 -0.378 0.706715    
## factor(car.name)dodge omni                             0.595 0.553443    
## factor(car.name)dodge rampage                         -0.143 0.886366    
## factor(car.name)dodge st. regis                       -0.405 0.686313    
## factor(car.name)fiat 124 sport coupe                   0.875 0.384190    
## factor(car.name)fiat 124 tc                            0.037 0.970394    
## factor(car.name)fiat 124b                              1.372 0.174099    
## factor(car.name)fiat 128                               0.379 0.705823    
## factor(car.name)fiat 131                               0.883 0.379917    
## factor(car.name)fiat strada custom                     2.034 0.045530 *  
## factor(car.name)fiat x1.9                              1.455 0.150012    
## factor(car.name)ford country                           0.702 0.485134    
## factor(car.name)ford country squire (sw)              -0.152 0.879466    
## factor(car.name)ford escort 2h                         0.675 0.501748    
## factor(car.name)ford escort 4w                         1.011 0.315227    
## factor(car.name)ford f108                             -1.000 0.320603    
## factor(car.name)ford f250                              0.582 0.562622    
## factor(car.name)ford fairmont                         -0.277 0.782554    
## factor(car.name)ford fairmont (auto)                  -1.075 0.286043    
## factor(car.name)ford fairmont (man)                   -0.166 0.868271    
## factor(car.name)ford fairmont 4                       -1.043 0.300341    
## factor(car.name)ford fairmont futura                  -1.021 0.310577    
## factor(car.name)ford fiesta                            1.344 0.183137    
## factor(car.name)ford futura                           -0.820 0.414572    
## factor(car.name)ford galaxie 500                       0.763 0.447974    
## factor(car.name)ford gran torino                       0.334 0.739164    
## factor(car.name)ford gran torino (sw)                  0.764 0.447138    
## factor(car.name)ford granada                          -0.540 0.590478    
## factor(car.name)ford granada ghia                     -0.287 0.774982    
## factor(car.name)ford granada gl                       -1.141 0.257631    
## factor(car.name)ford granada l                        -1.327 0.188684    
## factor(car.name)ford ltd                               0.606 0.546662    
## factor(car.name)ford ltd landau                       -1.031 0.305778    
## factor(car.name)ford maverick                         -0.543 0.588754    
## factor(car.name)ford mustang                          -0.732 0.466290    
## factor(car.name)ford mustang gl                       -0.529 0.598522    
## factor(car.name)ford mustang ii                       -1.620 0.109409    
## factor(car.name)ford mustang ii 2+2                   -0.043 0.965450    
## factor(car.name)ford pinto                            -0.211 0.833431    
## factor(car.name)ford pinto (sw)                       -0.038 0.970075    
## factor(car.name)ford pinto runabout                   -0.368 0.713774    
## factor(car.name)ford ranger                           -0.184 0.854427    
## factor(car.name)ford thunderbird                       0.097 0.922874    
## factor(car.name)ford torino                           -0.302 0.763741    
## factor(car.name)ford torino 500                       -0.192 0.847938    
## factor(car.name)hi 1200d                               0.542 0.589527    
## factor(car.name)honda accord                           0.477 0.635081    
## factor(car.name)honda accord cvcc                      0.612 0.542299    
## factor(car.name)honda accord lx                       -0.014 0.988698    
## factor(car.name)honda civic                            0.734 0.465212    
## factor(car.name)honda civic (auto)                    -0.574 0.567893    
## factor(car.name)honda civic 1300                       0.649 0.518385    
## factor(car.name)honda civic 1500 gl                    2.615 0.010819 *  
## factor(car.name)honda civic cvcc                       1.481 0.142802    
## factor(car.name)honda prelude                          0.723 0.472151    
## factor(car.name)maxda glc deluxe                       0.342 0.732964    
## factor(car.name)maxda rx3                             -2.360 0.020925 *  
## factor(car.name)mazda 626                              0.625 0.533755    
## factor(car.name)mazda glc                              3.941 0.000182 ***
## factor(car.name)mazda glc 4                            0.710 0.480104    
## factor(car.name)mazda glc custom                      -0.590 0.556720    
## factor(car.name)mazda glc custom l                     1.163 0.248634    
## factor(car.name)mazda glc deluxe                       0.825 0.412281    
## factor(car.name)mazda rx-4                            -1.521 0.132448    
## factor(car.name)mazda rx-7 gs                         -2.385 0.019618 *  
## factor(car.name)mazda rx2 coupe                       -1.709 0.091648 .  
## factor(car.name)mercedes benz 300d                     0.507 0.613726    
## factor(car.name)mercedes-benz 240d                     1.024 0.309128    
## factor(car.name)mercedes-benz 280s                    -0.127 0.898951    
## factor(car.name)mercury capri 2000                    -0.396 0.693424    
## factor(car.name)mercury capri v6                       0.015 0.988365    
## factor(car.name)mercury cougar brougham               -0.498 0.620249    
## factor(car.name)mercury grand marquis                 -1.026 0.308037    
## factor(car.name)mercury lynx l                         1.088 0.280120    
## factor(car.name)mercury marquis                        0.896 0.373405    
## factor(car.name)mercury marquis brougham               1.235 0.220863    
## factor(car.name)mercury monarch                       -0.948 0.346094    
## factor(car.name)mercury monarch ghia                   0.348 0.728611    
## factor(car.name)mercury zephyr                        -0.710 0.480214    
## factor(car.name)mercury zephyr 6                      -1.465 0.147105    
## factor(car.name)nissan stanza xe                       0.913 0.364139    
## factor(car.name)oldsmobile cutlass ciera (diesel)      2.852 0.005628 ** 
## factor(car.name)oldsmobile cutlass ls                  1.826 0.071916 .  
## factor(car.name)oldsmobile cutlass salon brougham      0.466 0.642318    
## factor(car.name)oldsmobile cutlass supreme             0.037 0.970269    
## factor(car.name)oldsmobile delta 88 royale             0.496 0.621520    
## factor(car.name)oldsmobile omega                      -0.690 0.492611    
## factor(car.name)oldsmobile omega brougham              0.192 0.848359    
## factor(car.name)oldsmobile starfire sx                -0.205 0.837773    
## factor(car.name)oldsmobile vista cruiser               0.675 0.501682    
## factor(car.name)opel 1900                              0.492 0.623937    
## factor(car.name)opel manta                             0.053 0.957627    
## factor(car.name)peugeot 304                            1.834 0.070696 .  
## factor(car.name)peugeot 504                            0.604 0.547571    
## factor(car.name)peugeot 504 (sw)                       0.598 0.551879    
## factor(car.name)peugeot 505s turbo diesel              1.104 0.273362    
## factor(car.name)peugeot 604sl                         -0.763 0.447815    
## factor(car.name)plymouth 'cuda 340                    -1.023 0.309470    
## factor(car.name)plymouth arrow gs                     -0.284 0.777548    
## factor(car.name)plymouth champ                         1.990 0.050295 .  
## factor(car.name)plymouth cricket                       0.643 0.522471    
## factor(car.name)plymouth custom suburb                 0.973 0.333867    
## factor(car.name)plymouth duster                        0.436 0.663967    
## factor(car.name)plymouth fury                          0.397 0.692208    
## factor(car.name)plymouth fury gran sedan               0.720 0.474042    
## factor(car.name)plymouth fury iii                      0.781 0.437044    
## factor(car.name)plymouth grand fury                    1.275 0.206227    
## factor(car.name)plymouth horizon                       0.628 0.531828    
## factor(car.name)plymouth horizon 4                     1.025 0.308564    
## factor(car.name)plymouth horizon miser                 1.140 0.257878    
## factor(car.name)plymouth horizon tc3                   0.858 0.393564    
## factor(car.name)plymouth reliant                      -0.023 0.981412    
## factor(car.name)plymouth sapporo                      -0.202 0.840094    
## factor(car.name)plymouth satellite                     0.241 0.810004    
## factor(car.name)plymouth satellite custom             -0.581 0.563184    
## factor(car.name)plymouth satellite custom (sw)         0.650 0.517521    
## factor(car.name)plymouth satellite sebring             0.094 0.925481    
## factor(car.name)plymouth valiant                       0.134 0.893634    
## factor(car.name)plymouth valiant custom               -0.164 0.870180    
## factor(car.name)plymouth volare                       -0.162 0.871750    
## factor(car.name)plymouth volare custom                -0.413 0.680518    
## factor(car.name)plymouth volare premier v8            -0.790 0.432292    
## factor(car.name)pontiac astro                          0.114 0.909697    
## factor(car.name)pontiac catalina                       1.456 0.149540    
## factor(car.name)pontiac catalina brougham              0.803 0.424509    
## factor(car.name)pontiac firebird                      -0.044 0.965359    
## factor(car.name)pontiac grand prix                     2.259 0.026797 *  
## factor(car.name)pontiac grand prix lj                  0.123 0.902652    
## factor(car.name)pontiac j2000 se hatchback             0.350 0.726995    
## factor(car.name)pontiac lemans v6                     -0.577 0.565484    
## factor(car.name)pontiac phoenix                        0.447 0.655954    
## factor(car.name)pontiac phoenix lj                    -0.093 0.926429    
## factor(car.name)pontiac safari (sw)                    1.104 0.273131    
## factor(car.name)pontiac sunbird coupe                 -0.307 0.759654    
## factor(car.name)pontiac ventura sj                    -0.076 0.939658    
## factor(car.name)renault 12 (sw)                        0.854 0.395839    
## factor(car.name)renault 12tl                           0.265 0.792071    
## factor(car.name)renault 5 gtl                          1.911 0.059858 .  
## factor(car.name)saab 99e                               0.580 0.564005    
## factor(car.name)saab 99gle                            -0.516 0.607055    
## factor(car.name)saab 99le                              0.957 0.341794    
## factor(car.name)subaru                                 0.489 0.625939    
## factor(car.name)subaru dl                              0.160 0.873707    
## factor(car.name)toyota carina                         -0.838 0.404502    
## factor(car.name)toyota celica gt                       0.300 0.765047    
## factor(car.name)toyota celica gt liftback             -1.848 0.068562 .  
## factor(car.name)toyota corolla                         0.490 0.625412    
## factor(car.name)toyota corolla 1200                    1.740 0.085928 .  
## factor(car.name)toyota corolla 1600 (sw)               1.003 0.318973    
## factor(car.name)toyota corolla liftback               -0.618 0.538425    
## factor(car.name)toyota corolla tercel                  1.410 0.162829    
## factor(car.name)toyota corona                          0.087 0.931076    
## factor(car.name)toyota corona hardtop                  0.206 0.837580    
## factor(car.name)toyota corona liftback                -0.133 0.894713    
## factor(car.name)toyota corona mark ii                 -0.414 0.679827    
## factor(car.name)toyota cressida                       -0.180 0.857726    
## factor(car.name)toyota mark ii                        -0.605 0.546995    
## factor(car.name)toyota starlet                         1.796 0.076583 .  
## factor(car.name)toyota tercel                          1.818 0.073119 .  
## factor(car.name)toyouta corona mark ii (sw)               NA       NA    
## factor(car.name)triumph tr7 coupe                      1.523 0.131915    
## factor(car.name)vokswagen rabbit                      -1.108 0.271563    
## factor(car.name)volkswagen 1131 deluxe sedan          -0.164 0.870045    
## factor(car.name)volkswagen 411 (sw)                    0.115 0.908908    
## factor(car.name)volkswagen dasher                      0.157 0.876030    
## factor(car.name)volkswagen jetta                       0.852 0.396846    
## factor(car.name)volkswagen model 111                   0.505 0.615068    
## factor(car.name)volkswagen rabbit                      0.352 0.725807    
## factor(car.name)volkswagen rabbit custom              -0.115 0.908705    
## factor(car.name)volkswagen rabbit custom diesel        4.159 8.50e-05 ***
## factor(car.name)volkswagen rabbit l                    1.006 0.317733    
## factor(car.name)volkswagen scirocco                    0.671 0.504173    
## factor(car.name)volkswagen super beetle                0.400 0.690345    
## factor(car.name)volkswagen type 3                      0.431 0.667814    
## factor(car.name)volvo 144ea                           -0.209 0.835262    
## factor(car.name)volvo 145e (sw)                       -0.364 0.717117    
## factor(car.name)volvo 244dl                            0.054 0.957176    
## factor(car.name)volvo 245                             -0.502 0.617200    
## factor(car.name)volvo 264gl                           -1.229 0.222904    
## factor(car.name)volvo diesel                           1.776 0.079886 .  
## factor(car.name)vw dasher (diesel)                     3.832 0.000264 ***
## factor(car.name)vw pickup                              3.800 0.000294 ***
## factor(car.name)vw rabbit                              1.791 0.077364 .  
## factor(car.name)vw rabbit c (diesel)                   3.745 0.000355 ***
## factor(car.name)vw rabbit custom                          NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.149 on 74 degrees of freedom
## Multiple R-squared:  0.9857, Adjusted R-squared:  0.9242 
## F-statistic: 16.04 on 317 and 74 DF,  p-value: < 2.2e-16
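A note on the fit above: it estimates 317 slopes from 392 rows and leaves only 74 residual degrees of freedom, so the 0.9857 R-squared largely reflects the many car.name levels rather than predictive signal. The two NA rows are how lm reports aliased terms. A likely cause, assuming each car name maps to a single origin, is that the origin dummies are exact sums of car.name dummies. The sketch below reproduces that effect on a small made-up data frame (`toy` and all of its values are hypothetical, not taken from the Auto MPG data).

```r
# Hypothetical toy data: each name belongs to exactly one origin,
# mirroring the assumed relationship between car.name and origin.
toy <- data.frame(
  mpg    = c(18, 19, 30, 31, 25, 26, 24, 33),
  weight = c(3500, 3400, 2100, 2200, 2600, 2500, 2700, 2000),
  origin = c(1, 1, 3, 3, 2, 2, 2, 3),
  name   = c("a", "a", "b", "b", "c", "c", "d", "e")
)

fit <- lm(mpg ~ weight + factor(origin) + factor(name), data = toy)

# Coefficients lm could not estimate because their columns are
# linear combinations of earlier columns in the design matrix.
na_coefs <- names(coef(fit))[is.na(coef(fit))]
print(na_coefs)

# Residual degrees of freedom = rows - estimated coefficients.
print(df.residual(fit))
```

Calling `alias(fit)` on such a model lists which columns depend on which, which can help confirm the cause on the real data.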
# Refit without car.name, keeping model.year and origin as factors.
mod <- lm(mpg ~ cylinders + displacement + X.horsepower + weight + acceleration +
            factor(model.year) + factor(origin), data = auto.df)
summary(mod)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + X.horsepower + 
##     weight + acceleration + factor(model.year) + factor(origin), 
##     data = auto.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.4288 -1.9194 -0.0287  1.7899 11.8399 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          37.0199278  2.1417790  17.285  < 2e-16 ***
## cylinders            -0.1924289  0.3040482  -0.633 0.527195    
## displacement          0.0172507  0.0072044   2.394 0.017139 *  
## X.horsepower         -0.0240199  0.0136328  -1.762 0.078904 .  
## weight               -0.0061203  0.0006494  -9.424  < 2e-16 ***
## acceleration          0.0543880  0.0919344   0.592 0.554480    
## factor(model.year)71  1.0461869  0.8730131   1.198 0.231538    
## factor(model.year)72  0.0330325  0.8531036   0.039 0.969134    
## factor(model.year)73 -0.5322929  0.7718143  -0.690 0.490835    
## factor(model.year)74  1.6545531  0.9129699   1.812 0.070750 .  
## factor(model.year)75  0.9415172  0.8953739   1.052 0.293695    
## factor(model.year)76  1.7486166  0.8573480   2.040 0.042100 *  
## factor(model.year)77  3.2399161  0.8759807   3.699 0.000249 ***
## factor(model.year)78  3.0821303  0.8333179   3.699 0.000249 ***
## factor(model.year)79  5.3812526  0.8791655   6.121 2.36e-09 ***
## factor(model.year)80  9.5116004  0.9339482  10.184  < 2e-16 ***
## factor(model.year)81  6.9070845  0.9223997   7.488 5.12e-13 ***
## factor(model.year)82  8.6173419  0.9031369   9.542  < 2e-16 ***
## factor(origin)2       2.5075851  0.5316558   4.717 3.40e-06 ***
## factor(origin)3       2.5002584  0.5225230   4.785 2.47e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.05 on 372 degrees of freedom
## Multiple R-squared:  0.8547, Adjusted R-squared:  0.8473 
## F-statistic: 115.2 on 19 and 372 DF,  p-value: < 2.2e-16
mod <- lm(mpg ~ cylinders+displacement+X.horsepower+weight+acceleration+factor(origin), data=auto.df)
summary(mod)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + X.horsepower + 
##     weight + acceleration + factor(origin), data = auto.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.6303  -2.8009  -0.2871   2.0945  14.8931 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     44.7687911  2.6398457  16.959  < 2e-16 ***
## cylinders       -0.5661876  0.4042069  -1.401 0.162100    
## displacement     0.0114270  0.0095737   1.194 0.233376    
## X.horsepower    -0.0613339  0.0168679  -3.636 0.000314 ***
## weight          -0.0048119  0.0008089  -5.948  6.1e-09 ***
## acceleration    -0.0319841  0.1232529  -0.259 0.795389    
## factor(origin)2  1.1255451  0.7015566   1.604 0.109458    
## factor(origin)3  2.9325397  0.6955675   4.216  3.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.161 on 384 degrees of freedom
## Multiple R-squared:  0.7209, Adjusted R-squared:  0.7158 
## F-statistic: 141.7 on 7 and 384 DF,  p-value: < 2.2e-16

Based on these three initial models, I think it is safe to say we can remove car name. Car name is directly tied to its own attributes. For example, a Ford F-150 has its own horsepower and weight while a Chevy Camaro has its own unique horsepower and weight. Using car name also produces a note stating that two coefficients were not defined because of singularities. This implies that some predictors were perfectly collinear, which is not ideal for modeling. With 317 terms fit to 392 observations, that model also leaves only 74 residual degrees of freedom, so its high R-squared mostly reflects overfitting.

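We can also remove year. Model year is not a physical attribute of the car: a truck from 1970 could consume more fuel than a sedan from 1980, just as a truck from 1980 could consume more fuel than a sedan from 1970. This is a modeling choice rather than something the output forces on us. Several later model years (1977 to 1982) have significant coefficients, and dropping year lowers the adjusted R-squared from 0.8473 to 0.7158.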

By inspection, a good portion of the factors for each of these variables have high p values.
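One way to check whether dropping a factor is supported by the data is a partial F-test between the nested models. The sketch below is only an illustration: it uses the built-in `mtcars` data as a stand-in (an assumption, since `auto.df` is not reproduced here), with `factor(gear)` playing the role that `factor(model.year)` plays in this report.

```r
# Stand-in data: mtcars ships with R and is used only for illustration.
# With the report's data, `full` would include factor(model.year)
# and `reduced` would not.
full    <- lm(mpg ~ cyl + disp + hp + wt + qsec + factor(gear), data = mtcars)
reduced <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Partial F-test for the dropped factor: a small p-value says the factor
# explains variation that the remaining predictors do not.
cmp <- anova(reduced, full)
print(cmp)

# Adjusted R-squared of each fit, side by side.
adj <- c(reduced = summary(reduced)$adj.r.squared,
         full    = summary(full)$adj.r.squared)
print(adj)
```

Reading the F-test together with the change in adjusted R-squared gives a fuller picture than scanning individual dummy-variable p-values.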

We will use the reduced dataset from this point on.

mod <- lm(mpg ~ ., data=auto.df2)
summary(mod)
## 
## Call:
## lm(formula = mpg ~ ., data = auto.df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5816  -2.8618  -0.3404   2.2438  16.3416 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.626e+01  2.669e+00  17.331   <2e-16 ***
## cylinders    -3.979e-01  4.105e-01  -0.969   0.3330    
## displacement -8.313e-05  9.072e-03  -0.009   0.9927    
## X.horsepower -4.526e-02  1.666e-02  -2.716   0.0069 ** 
## weight       -5.187e-03  8.167e-04  -6.351    6e-10 ***
## acceleration -2.910e-02  1.258e-01  -0.231   0.8171    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.247 on 386 degrees of freedom
## Multiple R-squared:  0.7077, Adjusted R-squared:  0.7039 
## F-statistic: 186.9 on 5 and 386 DF,  p-value: < 2.2e-16

Check the normality assumption for residuals

hist(mod$residuals);

qqnorm(mod$residuals)
qqline(mod$residuals)

The QQ-plot and histogram indicate that residuals follow a close to normal distribution.
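A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test to the residuals of a stand-in model fit on the built-in `mtcars` data (an assumption for illustration; in this report the residuals would come from `mod`).

```r
# Stand-in model on built-in data; replace `fit` with the report's `mod`.
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Shapiro-Wilk test: the null hypothesis is that the residuals are
# normally distributed, so a large p-value is consistent with normality.
sw <- shapiro.test(residuals(fit))
print(sw)
```

With several hundred observations the test can flag small departures that hardly matter in practice, so it is best read alongside the QQ-plot rather than instead of it.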

Constant Variance Check (Heteroscedasticity): We can use the Breusch-Pagan test for heteroscedasticity. Ho: the variance is constant. Ha: the variance is not constant.

library(olsrr)
## 
## Attaching package: 'olsrr'
## The following object is masked _by_ '.GlobalEnv':
## 
##     auto
## The following object is masked from 'package:datasets':
## 
##     rivers
ols_test_breusch_pagan(mod)
## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##              Data               
##  -------------------------------
##  Response : mpg 
##  Variables: fitted values of mpg 
## 
##          Test Summary           
##  -------------------------------
##  DF            =    1 
##  Chi2          =    39.77868 
##  Prob > Chi2   =    2.844324e-10
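To make the statistic less of a black box, the classic (non-studentized) Breusch-Pagan score test can be computed by hand. This is a sketch on the built-in `mtcars` data, and it assumes that `olsrr` uses this classic form with the fitted values as the only auxiliary regressor, which matches the single degree of freedom in the output above but is not verified here.

```r
# Stand-in model on built-in data; replace `fit` with the report's `mod`.
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

e2 <- residuals(fit)^2
n  <- length(e2)

# Scale squared residuals by the ML variance estimate (RSS / n).
g  <- e2 / (sum(e2) / n)

# Auxiliary regression of the scaled squared residuals on the fitted values.
fv  <- fitted(fit)
aux <- lm(g ~ fv)

# Score statistic: half the explained sum of squares of the auxiliary fit.
bp_stat <- sum((fitted(aux) - mean(g))^2) / 2
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(chi2 = bp_stat, p_value = bp_p)
```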

The p-value is lower than our alpha of 0.05, therefore we reject the null hypothesis. According to this test, the variance is not constant. We can check through a visualization.

plot(fitted(mod), residuals(mod), xlab="fitted", ylab="residuals")
abline(h=0)

Square root of the absolute residuals

plot(fitted(mod), sqrt(abs(residuals(mod))), xlab="fitted", ylab=expression(sqrt(hat(epsilon))))

The visuals reveal a parabolic pattern within the residuals. Non-linearity is implied.

We have discovered that the constant variance condition for regression has failed. The best course of action is to rebuild the model with a transformation. We can use the Box-Cox procedure to determine the type of power transformation to apply to our response variable.

require(MASS)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:olsrr':
## 
##     cement
boxcox(mod, plotit=T, lambda=seq(-1.0, 1.0, by=0.5))

According to the plot, the log-likelihood reaches its maximum at a value of roughly -440. We estimate lambda from where the center dotted line meets the x-axis, and the outer dotted lines mark a 95% confidence interval. The interval is roughly (-0.56, -0.3), which contains a lambda of -0.5, therefore we can apply a reciprocal square root transformation to our response variable.
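Reading lambda off the plot can be replaced by extracting it numerically. The sketch below uses the built-in `mtcars` data as a stand-in model (an assumption for illustration); the grid of lambda values and the variable names are my own choices.

```r
library(MASS)

# Stand-in model on built-in data; replace `fit` with the report's `mod`.
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Profile log-likelihood over a fine grid of lambda values, without plotting.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Lambda that maximizes the profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum.
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])

c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2])
```

In practice a convenient value inside the interval (such as -0.5, 0, or 0.5) is usually preferred over the exact maximizer, which is the choice made here.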

The transformation should be applied to the response variable as follows: \[ { mpg }^{ -0.5 } \]

Afit = lm((mpg ^ -0.5) ~ ., data = auto.df2)
summary(Afit)
## 
## Call:
## lm(formula = (mpg^-0.5) ~ ., data = auto.df2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.049957 -0.009969  0.000017  0.010575  0.052091 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.690e-02  1.011e-02   8.592  < 2e-16 ***
## cylinders     3.144e-03  1.555e-03   2.022   0.0439 *  
## displacement -1.642e-06  3.437e-05  -0.048   0.9619    
## X.horsepower  3.575e-04  6.313e-05   5.663 2.91e-08 ***
## weight        2.033e-05  3.094e-06   6.572 1.61e-10 ***
## acceleration  8.910e-04  4.765e-04   1.870   0.0623 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01609 on 386 degrees of freedom
## Multiple R-squared:  0.8139, Adjusted R-squared:  0.8115 
## F-statistic: 337.6 on 5 and 386 DF,  p-value: < 2.2e-16
plot(fitted(Afit), resid(Afit), col = "dodgerblue",
     pch = 20, cex = 1.5, xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2, col = "darkorange", lwd = 2)

The variance looks much more constant in this model.

There are a few things that are different in the new model compared to the older model. The p-value for cylinders drops from 0.3330 to 0.0439, which is now under the 0.05 threshold, and the p-value for acceleration drops from 0.8171 to 0.0623, which is close to but still above it. Horsepower also becomes more significant. Displacement is still not statistically significant: its p-value remains very high (0.9927 before the transformation, 0.9619 after).

Before analyzing the best predictors for mpg, we are going to check for outliers and influential points. These are data points that can distort the predictive power of our model. We compute the jackknife (externally studentized) residuals and compare the largest one, in absolute value, to a Bonferroni-corrected critical value.

jack<-rstudent(Afit)
jack[which.max(abs(jack))]
##      112 
## 3.308493
qt(.05/(50*2), 398)
## [1] -3.315139
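The critical value depends on the sample size and the residual degrees of freedom of the fitted model, so it can be safer to derive both from the model object instead of typing them in. The sketch below does this for a stand-in model on the built-in `mtcars` data (an assumption for illustration; in this report the model would be `Afit`).

```r
# Stand-in model on built-in data; replace `fit` with the report's `Afit`.
fit <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

n <- nobs(fit)          # number of observations used in the fit
p <- length(coef(fit))  # number of estimated coefficients, incl. intercept

# Externally studentized (jackknife) residuals follow a t distribution
# with n - p - 1 degrees of freedom under the model.
jack <- rstudent(fit)

# Two-sided Bonferroni cutoff at an overall level of 0.05 across n tests.
crit <- qt(0.05 / (2 * n), df = n - p - 1)

largest <- jack[which.max(abs(jack))]
c(largest = unname(largest), cutoff = abs(crit))
abs(largest) > abs(crit)  # TRUE would flag the point as an outlier
```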

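The absolute value of the Bonferroni critical value is 3.315, which is slightly greater than the largest jackknife residual of 3.308, so observation 112 falls just short of being flagged as an outlier. Note that the call above uses 50 and 398, whereas this model was fit to 392 observations with 386 residual degrees of freedom; the correct cutoff would be larger in magnitude, which does not change the conclusion. We can confirm the result with a formal outlier test.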

library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
outlierTest(Afit)
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##     rstudent unadjusted p-value Bonferonni p
## 112 3.308493          0.0010263      0.40233

The Bonferroni-adjusted p-value of 0.402 is well above 0.05, so observation 112 is not a significant outlier.

The next task is to pick the predictors for mpg. Before doing so, we want to check how strongly the predictors are related to one another. The variance inflation factor (VIF) allows us to measure their collinearity.

vif(Afit)
##    cylinders displacement X.horsepower       weight acceleration 
##    10.630870    19.535061     8.916017    10.430271     2.609487
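The VIF for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all of the other predictors. The sketch below computes this by hand on the built-in `mtcars` data (a stand-in used only for illustration; the column names are not those of this report).

```r
# Stand-in predictors from built-in data, used only for illustration.
X <- mtcars[, c("cyl", "disp", "hp", "wt", "qsec")]

vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  # Regress predictor v on all remaining predictors.
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

Note that the response never enters the calculation: VIF describes how the predictors relate to each other, not how they relate to mpg.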

The VIF numbers suggest that cylinders, displacement, horsepower, and weight are highly collinear with the other predictors, with values near or above 10. Acceleration shows only moderate collinearity. With this in mind, we want to select the best predictors for mpg.

We introduce a function that fits every possible subset of the predictors and provides a table of adjusted R-squared values and Mallows' Cp values.

k<-ols_step_all_possible(Afit)
k
## # A tibble: 31 x 6
##    Index     N Predictors         `R-Square` `Adj. R-Square` `Mallow's Cp`
##  * <int> <int> <chr>                   <dbl>           <dbl>         <dbl>
##  1     1     1 weight                  0.783           0.783         61.6 
##  2     2     1 displacement            0.748           0.747        135.  
##  3     3     1 X.horsepower            0.716           0.716        200.  
##  4     4     1 cylinders               0.703           0.702        229.  
##  5     5     1 acceleration            0.206           0.204       1258.  
##  6     6     2 X.horsepower weig…      0.809           0.808          9.39
##  7     7     2 displacement weig…      0.795           0.794         39.0 
##  8     8     2 cylinders weight        0.793           0.792         43.1 
##  9     9     2 weight accelerati…      0.792           0.791         45.3 
## 10    10     2 displacement X.ho…      0.773           0.772         83.9 
## # ... with 21 more rows
plot(k)
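For readers who want to see what happens behind the table above, an all-subsets search can be sketched in base R. The example uses the built-in `mtcars` data as a stand-in (an assumption for illustration) and computes Mallows' Cp as RSS / sigma^2 - n + 2p, where sigma^2 is the residual variance of the full model and p counts the coefficients including the intercept.

```r
# Stand-in data and predictor names, used only for illustration.
preds <- c("cyl", "disp", "hp", "wt", "qsec")
n     <- nrow(mtcars)

# Residual variance of the full model, used as the Cp reference.
full   <- lm(reformulate(preds, response = "mpg"), data = mtcars)
sigma2 <- summary(full)$sigma^2

# Every non-empty subset of the predictors: 2^5 - 1 = 31 models.
subsets <- unlist(
  lapply(seq_along(preds), function(k) combn(preds, k, simplify = FALSE)),
  recursive = FALSE
)

res <- do.call(rbind, lapply(subsets, function(s) {
  f   <- lm(reformulate(s, response = "mpg"), data = mtcars)
  rss <- sum(residuals(f)^2)
  data.frame(
    predictors = paste(s, collapse = " "),
    n_pred     = length(s),
    adj_r2     = summary(f)$adj.r.squared,
    cp         = rss / sigma2 - n + 2 * (length(s) + 1)
  )
}))

# Models with the lowest Cp first.
head(res[order(res$cp), ], 5)
```

By construction the full model always has Cp equal to its number of coefficients, which is why the five-predictor row in a best-subsets summary shows a Cp of exactly 6.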

To avoid going through the list manually, there is a function that selects the best subset of predictors of each size and reports adjusted R-squared, Mallows' Cp, and other criteria.

h<-ols_step_best_subset(Afit)
h
##                        Best Subsets Regression                        
## ----------------------------------------------------------------------
## Model Index    Predictors
## ----------------------------------------------------------------------
##      1         weight                                                  
##      2         X.horsepower weight                                     
##      3         cylinders X.horsepower weight                           
##      4         cylinders X.horsepower weight acceleration              
##      5         cylinders displacement X.horsepower weight acceleration 
## ----------------------------------------------------------------------
## 
##                                                      Subsets Regression Summary                                                     
## ------------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                              
## Model    R-Square    R-Square    R-Square     C(p)         AIC           SBIC          SBC        MSEP      FPE      HSP       APC  
## ------------------------------------------------------------------------------------------------------------------------------------
##   1        0.7832      0.7827      0.7811    61.6056    -2065.3226    -3178.3157    -2053.4088    3e-04    3e-04    0.0000    0.2190 
##   2        0.8094      0.8084      0.8058     9.3866    -2113.6972    -3226.1966    -2097.8122    3e-04    3e-04    0.0000    0.1936 
##   3        0.8122      0.8107      0.8075     5.5573    -2117.5196    -3229.9173    -2097.6633    3e-04    3e-04    0.0000    0.1917 
##   4        0.8139      0.8120      0.8076     4.0023    -2119.1134    -3231.4063    -2095.2858    3e-04    3e-04    0.0000    0.1909 
##   5        0.8139      0.8115      0.8059     6.0000    -2117.1157    -3229.3775    -2089.3169    3e-04    3e-04    0.0000    0.1919 
## ------------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria
plot(h)

The ideal number of predictors based on our charts is four: cylinders, X.horsepower, weight, and acceleration. This subset minimizes AIC and Mallows' Cp and has the highest adjusted R-squared. Displacement is not included in our set of predictors. This is not surprising since it had a high p-value both before and after the transformation.

final_fit<-lm((mpg ^ -0.5)~cylinders+X.horsepower+weight+acceleration, data=auto.df2)
summary(final_fit)
## 
## Call:
## lm(formula = (mpg^-0.5) ~ cylinders + X.horsepower + weight + 
##     acceleration, data = auto.df2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.050058 -0.009969  0.000009  0.010574  0.052110 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.709e-02  9.276e-03   9.389  < 2e-16 ***
## cylinders    3.094e-03  1.149e-03   2.693  0.00738 ** 
## X.horsepower 3.567e-04  6.085e-05   5.862 9.79e-09 ***
## weight       2.027e-05  2.781e-06   7.288 1.79e-12 ***
## acceleration 8.934e-04  4.732e-04   1.888  0.05978 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01607 on 387 degrees of freedom
## Multiple R-squared:  0.8139, Adjusted R-squared:  0.812 
## F-statistic: 423.1 on 4 and 387 DF,  p-value: < 2.2e-16

Acceleration is not statistically significant: its p-value (0.0598) is higher than 0.05. Removing acceleration only marginally changes the adjusted R-squared (0.8120 to 0.8107) and Mallows' Cp (4.00 to 5.56). We can remove acceleration to have the simplest effective model that predicts mpg. With three or four predictors, more than 80% of the variability is still explained by the model.

We should also consider the fact that the response was power transformed. Any transformation can make the interpretation difficult. For this reason alone, we are encouraged to pick the simplest model with the fewest predictors.

final_fit<-lm((mpg ^ -0.5)~cylinders+X.horsepower+weight, data=auto.df2)
summary(final_fit)
## 
## Call:
## lm(formula = (mpg^-0.5) ~ cylinders + X.horsepower + weight, 
##     data = auto.df2)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.050204 -0.010163  0.000064  0.009642  0.050426 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.037e-01  3.029e-03  34.215  < 2e-16 ***
## cylinders    2.740e-03  1.137e-03   2.410   0.0164 *  
## X.horsepower 2.775e-04  4.422e-05   6.275 9.33e-10 ***
## weight       2.280e-05  2.445e-06   9.326  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01612 on 388 degrees of freedom
## Multiple R-squared:  0.8122, Adjusted R-squared:  0.8107 
## F-statistic: 559.2 on 3 and 388 DF,  p-value: < 2.2e-16

Since this regression model is based on a transformed response variable, we will need to back-transform in order to see unit change effects.

\[ {mpg }^{ -0.5 } =0.1037+0.002740(cylinders)+0.0002775(horsepower)+0.00002280(weight) \]

After more algebra operations, we end up with the final equation \[ { mpg }=\frac { 1 }{ { (0.1037+0.00274(cylinders)+0.0002775(horsepower)+0.0000228(weight)) }^{ 2 } } \]
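The back-transformed equation can be wrapped in a small function to see the effect of a change in one predictor. The coefficients below are copied from the three-predictor model summary in this report; the function name and the example car (4 cylinders, 100 horsepower, 2500 lb) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the three-predictor model summary in this report.
# `predict_mpg` is a hypothetical helper name.
predict_mpg <- function(cylinders, horsepower, weight) {
  eta <- 0.1037 + 0.002740 * cylinders + 0.0002775 * horsepower +
    0.00002280 * weight
  eta^(-2)  # invert the mpg^(-0.5) transformation
}

# Hypothetical car, and the same car with 100 lb of extra weight.
base    <- predict_mpg(cylinders = 4, horsepower = 100, weight = 2500)
heavier <- predict_mpg(cylinders = 4, horsepower = 100, weight = 2600)

c(base = base, heavier = heavier, change = heavier - base)
```

For this example car the model gives about 25.1 mpg, and the extra 100 lb lowers the prediction by a bit more than half a mile per gallon. Because the relationship is non-linear, the size of that change depends on the starting values of the predictors.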

By using a power transformation to account for non-constant variance, we ended up with a model that is difficult to interpret. Without computational software, it is difficult to see how a unit change in any of our predictors can have an effect on mpg. We do know that this is the best performing model and satisfies the conditions for regression.

Another type of model we should consider is a non-linear one.

Bonus: This portion might be outside the scope of DATA 606, however we include it to show that we have only seen a small glimpse of regression analysis.

Perhaps an alternative can be implemented in this case. The power transformation, although good, is incredibly difficult to interpret when applied to a linear model. A generalized linear model (GLM) can produce a good model without transforming the response directly. In our case, we provide a power link function with the exponent we found from Box-Cox.

glm_mod<-glm(mpg~cylinders+X.horsepower+weight, data=auto.df2,family=gaussian(link=power(-1/2)))
summary(glm_mod)
## 
## Call:
## glm(formula = mpg ~ cylinders + X.horsepower + weight, family = gaussian(link = power(-1/2)), 
##     data = auto.df2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -11.1821   -2.2946   -0.2032    2.0872   15.1809  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.182e+00  3.338e-02 125.293  < 2e-16 ***
## cylinders    -1.450e-02  1.267e-02  -1.145    0.253    
## X.horsepower -3.414e-03  5.742e-04  -5.946 6.14e-09 ***
## weight       -2.155e-04  2.817e-05  -7.652 1.58e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 15.8047)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  6132.2  on 388  degrees of freedom
## AIC: 2200.5
## 
## Number of Fisher Scoring iterations: 5
library(boot)
## 
## Attaching package: 'boot'
## The following object is masked from 'package:car':
## 
##     logit
glm.diag.plots(glm_mod, glmdiag=glm.diag(glm_mod), 
     subset=NULL, iden=FALSE, labels=NULL, ret=FALSE)

To gauge the goodness of fit, we compute the proportion of deviance explained:

1 - (6132.2/23819.0)
## [1] 0.7425501
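The same ratio can be taken from the fitted object rather than typed in by hand, which avoids transcription errors. The sketch below fits a Gaussian GLM with the same power link to the built-in `mtcars` data (a stand-in used only for illustration; the predictors are not those of this report).

```r
# Stand-in GLM on built-in data with the same link as in this report.
g <- glm(mpg ~ cyl + hp + wt, data = mtcars,
         family = gaussian(link = power(-1/2)))

# Proportion of deviance explained: 1 - residual deviance / null deviance.
pseudo_r2 <- 1 - deviance(g) / g$null.deviance
pseudo_r2
```

For a Gaussian family the deviance is a residual sum of squares on the mpg scale, so this quantity is not directly comparable to the R-squared of the transformed linear model, which is measured on the mpg^(-0.5) scale.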

The model fit using GLM has different coefficients, and cylinders is no longer significant (p = 0.253). The coefficients will not be the same because the GLM is fit by maximum likelihood on the original mpg scale with a link function, rather than by least squares on the transformed response.

Conclusion: It is safe to say that the best predictors for mpg are cylinders, horsepower, and weight. The resulting model, although meeting the regression requirements, is incredibly difficult to interpret without the use of computing software. This model could benefit from the use of GLM techniques; however, these are outside the scope of this course. I would encourage the use of a Cullen and Frey plot to identify a neighborhood of distributions closely associated with the response variable. I would speculate that a gamma regression model can be of good use here.
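As a rough illustration of the gamma regression idea mentioned above, the sketch below fits a gamma GLM with a log link to the built-in `mtcars` data. Both the data and the choice of a log link are assumptions made for illustration, not results from this study.

```r
# Stand-in data; the log link is an assumed choice, not a tested one.
gam_mod <- glm(mpg ~ cyl + hp + wt, data = mtcars,
               family = Gamma(link = "log"))
summary(gam_mod)

# With a log link, exponentiated coefficients act as multiplicative
# effects on expected mpg for a one-unit increase in each predictor.
mult <- exp(coef(gam_mod))
round(mult, 4)
```

A multiplicative reading of this kind is often easier to communicate than the reciprocal square root scale, which speaks to the interpretability concern raised in the conclusion.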

References: http://astrostatistics.psu.edu/su07/R/html/boot/html/glm.diag.plots.html

https://stats.stackexchange.com/questions/313793/whats-a-power-link-used-for-in-generalized-linear-models-glms

https://www.researchgate.net/post/How_to_back_translate_regression_cofficients_of_log_and_square-route_transformed_ouctome_and_independent_variables

https://www.isixsigma.com/tools-templates/normality/making-data-normal-using-box-cox-power-transformation/

https://daviddalpiaz.github.io/appliedstats/transformations.html

http://www.statisticshowto.com/box-cox-transformation/