Project 2: Multiple Linear Regression

Dirt Bike

Jane Braun

RPI

March 19th - Final Draft - Version 2.0

1. Data

Data Selection

The data was selected from the 100+ datasets, and is part of Carnegie Melon’s data library. The dataset is called “Dirt Bike Data”, and is a summary of all off-road dirt bikes available in the United States. The data provides information about the bike, including engine size, price, frame size, etc.

Reading in and Cleaning Data

x <- read.csv("~/Spring2015/Applied Regression/dirtbike.csv")
x <- x[c("msrp", "displacement", "bore", "stroke", "wheelbase", "seat.height", "fuel.cap", "total.weight")]
x <- x[c("msrp", "bore", "stroke", "wheelbase", "fuel.cap", "total.weight")]
x$bore <- as.numeric(x$bore)
x$msrp <- as.numeric(x$msrp)

head(x)

##   msrp bore stroke wheelbase fuel.cap total.weight
## 1 2999   48     48      49.1     1.35          144
## 2 3099   48     48      49.1     1.35          145
## 3 2249   53     45      49.2     1.50          165
## 4 2999   64     49      52.2     2.20          216
## 5 3499   66     66      54.1     2.20          238
## 6 5899   78     52      58.2     2.00          204

Data Organization

Oringinally in this dataset, there are 104 observations with 30 different columns each. Unfortunately, some of the data was missing from a large portion of these variables. Therefore, the data was cleaned down to include only variables with complete data. This ended up being 6 variables.

All patient days and in hundreds, and all numeric values are in hundreds of dollars.

summary(x)

##       msrp            bore            stroke       wheelbase    
##  Min.   : 1079   Min.   : 39.00   Min.   :38.0   Min.   :33.70  
##  1st Qu.: 3099   1st Qu.: 53.75   1st Qu.:49.0   1st Qu.:49.75  
##  Median : 5248   Median : 70.00   Median :58.0   Median :57.00  
##  Mean   : 4928   Mean   : 70.84   Mean   :57.6   Mean   :53.18  
##  3rd Qu.: 6398   3rd Qu.: 91.25   3rd Qu.:63.0   3rd Qu.:58.25  
##  Max.   :19500   Max.   :100.00   Max.   :83.0   Max.   :58.90  
##                                                  NA's   :1      
##     fuel.cap      total.weight  
##  Min.   :0.000   Min.   : 82.0  
##  1st Qu.:1.350   1st Qu.:147.9  
##  Median :2.100   Median :218.5  
##  Mean   :1.964   Mean   :201.7  
##  3rd Qu.:2.400   3rd Qu.:241.5  
##  Max.   :6.100   Max.   :337.0  
##  NA's   :1       NA's   :1

str(x)

## 'data.frame':    104 obs. of  6 variables:
##  $ msrp        : num  2999 3099 2249 2999 3499 ...
##  $ bore        : num  48 48 53 64 66 78 78 96 96 39 ...
##  $ stroke      : num  48 48 45 49 66 52 52 62 62 41 ...
##  $ wheelbase   : num  49.1 49.1 49.2 52.2 54.1 58.2 58.6 58.7 58.2 36 ...
##  $ fuel.cap    : num  1.35 1.35 1.5 2.2 2.2 2 2.2 1.9 2.27 0.8 ...
##  $ total.weight: num  144 145 165 216 238 ...

Continuous variables

All of the variables that were included in this analysis are continuous variables.

Response variables

The response variable that I am interested in is the MSRP, or the Manufacturer’s Suggested Retail Price, and how variation in some other factors may be able to predict the variation in this price. This variable is measured in US dollars.

Initial Plot and Outlier Analysis

The histogram below shows the values of MSRP for the dirt bikes. Histograms are useful for displaying the variable of interest, as well as to show outliers, skewness, and kurtosis. From the histogram below, it is quite obvious that there is an outlier on the right side of the range. If we ignore this outlier, the kurtosis also seems large, because there does not seem to be a very normally distributed peak.

hist(x$msrp, breaks = 15)

The boxplot below shows the values of MSRP for the dataset. As seen below, most of the data appears to be centered around $5,000. However, there also appears to be a single outlier, which has an MSRP much greater than all the others. This outlier will be analyzed further.

bplot <- boxplot(x$msrp, main = "MSRP of Dirt Bikes", ylab = "Dollars ($)")

bplot

## $stats
##        [,1]
## [1,] 1079.0
## [2,] 3099.0
## [3,] 5248.5
## [4,] 6398.0
## [5,] 8998.0
## 
## $n
## [1] 104
## 
## $conf
##         [,1]
## [1,] 4737.38
## [2,] 5759.62
## 
## $out
## [1] 19500
## 
## $group
## [1] 1
## 
## $names
## [1] ""

The boxplot function allows you to view the outliers of the dataset that you have just plotted.

One value has been identified as an outlier:

bplot$out

## [1] 19500

This data point has an MSRP value of $19,500.

While removing outliers is a matter of personal discretion, it is better to have a good model of the majority of the points, rather than an “okay” model of all of the points. For this reason, I will remove this outlier for the remainder of the analysis.

x <-x[x$msrp!=19500,]
bplot2 <- boxplot(x$msrp)

bplot2$out

## numeric(0)

As seen above, the outlier has been removed, and the boxplot function no longer returns a value for an outlier.

2. Multiple Linear Model

Null Hypothesis

My null hypothesis is that the variation in MSRP of the off-road dirt bikes cannot be explained by anything other than randomization.

Scattergram of Variables

plot(x)

The plot above shows each of the variables included in this dataset, and their relationship to each of the other variables. Since we are primarily interested in MSRP as a response variable, the top row of plots are of interest.

As seen in the plot, some of the variable have stronger relationships than others. “Bore” appears to have a fairly strong, positive, linear relationship, whereas the relationship between “total weight” and “msrp” is slightly less clear, though probably slightly positive.

Methods of Modeling

There will be three different methods used in the following analysis in order to model the dependent variable. Hierarchical, step-wise, and entry-wise models will be used. The summary of each model will be plotted, but the complete analysis and interpretation will occur last, with the step-wise model.

1. Hierarchical Linear Model

Hierarchical models are built using theory in order to determine which variables should be able to explain the dependent variable.

From my limited knowledge of dirt bikes, I would assume that there are two main deciding factors that would determine how much I would pay (MSRP). Since dirt bikes are used for sport, my primary concern would be the power of the bike, which bore is a measurement of.

x <- x[c("msrp", "bore", "wheelbase", "fuel.cap")]

Linear Model with Bore Alone

attach(x)
x <- x[order(x$msrp),]
model1 <- lm(x$msrp ~ x$bore)

summary(model1)

## 
## Call:
## lm(formula = x$msrp ~ x$bore)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2244.57  -906.79   -27.89   763.42  1994.64 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1109.988    392.573  -2.827  0.00566 ** 
## x$bore         83.536      5.355  15.600  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1077 on 101 degrees of freedom
## Multiple R-squared:  0.7067, Adjusted R-squared:  0.7038 
## F-statistic: 243.4 on 1 and 101 DF,  p-value: < 2.2e-16

Linear Model with Bore and Wheelbase

My second concern would be the size of the bike, since different sized bikes have different performance capabilities. This is measured via the wheelbase variable.

attach(x)

## The following objects are masked from x (pos = 3):
## 
##     bore, fuel.cap, msrp, wheelbase

model2 <- lm(x$msrp ~ x$bore + x$wheelbase)

summary(model2)

## 
## Call:
## lm(formula = x$msrp ~ x$bore + x$wheelbase)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1916.26  -792.94   -12.21   690.55  2175.73 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5587.318    807.084  -6.923 4.47e-10 ***
## x$bore         47.049      7.532   6.247 1.06e-08 ***
## x$wheelbase   132.839     21.720   6.116 1.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 925 on 99 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7876, Adjusted R-squared:  0.7833 
## F-statistic: 183.6 on 2 and 99 DF,  p-value: < 2.2e-16

Linear Model with Bore, Wheelbase, and Fuel Capacity

Finally, the variable I would be least concerned with would be fuel capacity. Because this is a sport bike, it’s not necessary to hold large amounts of fuel. You will likely be close enough to the “home base” so that you don’t have to worry about storing a lot of fuel.

attach(x)

## The following objects are masked from x (pos = 3):
## 
##     bore, fuel.cap, msrp, wheelbase
## 
## The following objects are masked from x (pos = 4):
## 
##     bore, fuel.cap, msrp, wheelbase

model3 <- lm(x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)

summary(model3)

## 
## Call:
## lm(formula = x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1536.5  -598.6  -146.6   572.5  2032.8 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7167.352    774.357  -9.256 4.99e-15 ***
## x$bore         53.976      6.798   7.940 3.40e-12 ***
## x$wheelbase   179.021     21.114   8.479 2.39e-13 ***
## x$fuel.cap   -695.461    130.998  -5.309 6.88e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 819.3 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.8351, Adjusted R-squared:   0.83 
## F-statistic: 165.4 on 3 and 98 DF,  p-value: < 2.2e-16

2. Entry-wise Linear Model

In order to build an entry-wise linear model, you simply use all of the variables the first time. Instead of building the model starting with 1 IV, then 2 IV, then 3 IV, the entry-wise starts with all of them at once.

model4 <- lm(x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)

summary(model4)

## 
## Call:
## lm(formula = x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1536.5  -598.6  -146.6   572.5  2032.8 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7167.352    774.357  -9.256 4.99e-15 ***
## x$bore         53.976      6.798   7.940 3.40e-12 ***
## x$wheelbase   179.021     21.114   8.479 2.39e-13 ***
## x$fuel.cap   -695.461    130.998  -5.309 6.88e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 819.3 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.8351, Adjusted R-squared:   0.83 
## F-statistic: 165.4 on 3 and 98 DF,  p-value: < 2.2e-16

3. Step-wise Linear Model

The model will be built step-wise based on what independent variables appear to be most correlated with the dependent variable, MSRP.

The following correlation matrix shows the correlation between each of the variables and themselves. To narrow down the size of the matrix, only the variables of interest were displayed. As seen below, each variable is obviously perfectly correlated with itself - with a correlation of 1.

x <- x[c("msrp", "bore", "wheelbase", "fuel.cap")]
cor(x, use="complete")

##                msrp      bore wheelbase  fuel.cap
## msrp      1.0000000 0.8410597 0.8389980 0.4726100
## bore      0.8410597 1.0000000 0.7918625 0.6368591
## wheelbase 0.8389980 0.7918625 1.0000000 0.6982879
## fuel.cap  0.4726100 0.6368591 0.6982879 1.0000000

Another characteristic to look out for is suppression. As seen in the correlation matrix above, there is a high correlation between some of the independent variables:
bore - wheelbase = 0.79
bore - fuel capacity = 0.64
fuel capacity - wheelbase = 0.70
Because of this high correlation, it is possible that there may be suppression when looking at the coefficients of the model. It will cause some coefficients to be inflated.

This is something that we will have to keep an eye on.

Linear Model with 1 Independent Variable

Based on the correlation matrix, Bore and MSRP are most highly correlated, at .841. We will begin building the model with bore as the primary independent variable.

Bore is a measure of an internal combustion engine in a motorcycle. The value of a bike’s bore can affect the speed and power. The larger the bore, the more powerful the bike. Bore is measured in inches and is the diameter of the chambers in an engine cylinder.

This linear model analyzes if the variation in MSRP can be expained by the variation in bore.

We use the “lm” function to calculate the linear model.

“MSRP ~ bore” reads as: “MSRP is explained by bore”

Model

attach(x)

## The following objects are masked from x (pos = 3):
## 
##     bore, fuel.cap, msrp, wheelbase
## 
## The following objects are masked from x (pos = 4):
## 
##     bore, fuel.cap, msrp, wheelbase
## 
## The following objects are masked from x (pos = 5):
## 
##     bore, fuel.cap, msrp, wheelbase

x <- x[order(x$msrp),]
model5 <- lm(x$msrp ~ x$bore)

summary(model5)

## 
## Call:
## lm(formula = x$msrp ~ x$bore)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2244.57  -906.79   -27.89   763.42  1994.64 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1109.988    392.573  -2.827  0.00566 ** 
## x$bore         83.536      5.355  15.600  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1077 on 101 degrees of freedom
## Multiple R-squared:  0.7067, Adjusted R-squared:  0.7038 
## F-statistic: 243.4 on 1 and 101 DF,  p-value: < 2.2e-16

Interpretation

For this simple linear model, the estimate of the y-intercept is -1,109, and the slope of the model is 83.536. This means that for a bore of value 0, the MSRP is $-1,109. While this is not plausible, it indicates that a motorcycle cannot have a bore of 0, and that bore must have a fairly significant value for the bike to be worth anything. With a slope of 83.536, this says that for each additional unit of bore (in inches), the MSRP increases by $83.536.

The R^2 value for the model is 0.7067. Generally, a larger R^2 value indicates a stronger relationship between the independent and dependent variables, where an R^2 of 1 would indicate a perfect correlation, and a 0 would indicate zero correlation. With this R^2 value of 0.7067, we can say that variation in the independent variable, bore size, can explain 70.67% of the variation in the dependent variable, MSRP. This is a fairly high R^2 value, and therefore a moderate percent of the variation in the dependent variable can be explained by the independent variable. This may, however, increase as we include more variables.

Another indicator of the strength of the model is the p-value, which in this case is 2 x 10^-16. This is a very small value, and it is the probability that the F-statistic (in this case 243.4) is due to randomization alone - showing that this event is very unlikely. If we analyze using an alpha of 0.05, the p-value is less than 0.05. Therefore, we may reject our null hypothesis, which states that the variation in MSRP cannot be explained by anything other than randomization. Instead, we suggest an alternative - that variation in bore size may help to explain the variation in MSRP.

Scatterplot with Regression

par(mfrow=c(1,1))
plot(msrp ~ bore, data = x, pch=21, cex=1, bg='ivory4', main="MSRP vs. Bore with Regression Line", xlab = "Bore (in inches)", ylab = "MSRP (in dollars)")
abline(model5, lwd=2.5, col='dodgerblue4')

Linear Model with 2 Independent Variables

To continue building this model, the next most highly correlated independent variable based on the correlation matrix is the wheelbase.

Wheelbase is the distance between the front and rear wheel of a motorcycle, and can affect the function of the bike (sport/trick bikes versus leisure bikes). Wheelbase is measured in inches.

Model

model6 <- lm(x$msrp ~ x$bore + x$wheelbase)

summary(model6)

## 
## Call:
## lm(formula = x$msrp ~ x$bore + x$wheelbase)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1916.26  -792.94   -12.21   690.55  2175.73 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5587.318    807.084  -6.923 4.47e-10 ***
## x$bore         47.049      7.532   6.247 1.06e-08 ***
## x$wheelbase   132.839     21.720   6.116 1.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 925 on 99 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7876, Adjusted R-squared:  0.7833 
## F-statistic: 183.6 on 2 and 99 DF,  p-value: < 2.2e-16

Interpretation

For this multiple linear model, the estimate of the y-intercept is -5,587, and the slope of bore is 47.049, and the slope of wheelbase is 132.839. This means that for a bore size and wheelbase size, both of value 0, the MSRP is $-5,587. While this is not plausible, it indicates that a motorcycle cannot have a bore of 0 and wheelbase size of 0. With a slope of bore of 47.049, this says that for each additional unit of bore (in inches), the MSRP increases by $47.049, and for every additional inch in wheelbase, the MSRP increases by $132.839.

The adjusted R^2 value for the model is 0.7833. This is an increase from our model with bore as the only independent variable, meaning that it was appropriate to add wheelbase size to the model. With this R^2 value of 0.7833, we can say that variation in the independent variables, bore size and wheelbase size, can explain 78.33% of the variation in the dependent variable, MSRP. This is a fairly high R^2 value, and therefore a moderate percent of the variation in the dependent variable can be explained by the independent variables.

Another indicator of the strength of the model are the p-values.In this case, the p-value for bore is 1.06 x 10^-08 and the p-value for wheelbase is 1.92 x 10^-08. These are both very small values, and they indicate the probabilities that the F-statistic (in this case 183.6) is due to randomization alone - showing that this event is very unlikely. If we analyze using an alpha of 0.05, the p-values are less than 0.05. Therefore, we may reject our null hypothesis, which states that the variation in MSRP cannot be explained by anything other than randomization. Instead, we suggest an alternative - that variation in bore size and wheelbase size may help to explain the variation in MSRP.

Scatterplot with Regression

# install.packages("scatterplot3d") # commented out to avoid errors
library(scatterplot3d)

## Warning: package 'scatterplot3d' was built under R version 3.1.3

mdl6 <- scatterplot3d(x$bore, x$wheelbase, x$msrp, pch=21, main = "MSRP vs. Bore and Wheelbase with Regression Plane", bg = 'ivory4', xlab = "Bore (in inches)", ylab = "Wheelbase (in inches)", zlab = "MSRP (in dollars $)", axis=TRUE)
mdl6$plane3d(model6, col = 'dodgerblue4')

Linear Model with 3 Independent Variables

The next independent variable that I was most interested in was fuel capacity of the dirt bikes.

Fuel capacity is the amount of fuel that the motorcycle can hold. This value is measured in gallons.

Model

model7 <- lm(x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)

summary(model7)

## 
## Call:
## lm(formula = x$msrp ~ x$bore + x$wheelbase + x$fuel.cap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1536.5  -598.6  -146.6   572.5  2032.8 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7167.352    774.357  -9.256 4.99e-15 ***
## x$bore         53.976      6.798   7.940 3.40e-12 ***
## x$wheelbase   179.021     21.114   8.479 2.39e-13 ***
## x$fuel.cap   -695.461    130.998  -5.309 6.88e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 819.3 on 98 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.8351, Adjusted R-squared:   0.83 
## F-statistic: 165.4 on 3 and 98 DF,  p-value: < 2.2e-16

Interpretation

For this final multiple linear model, the estimate of the y-intercept is -7,167, and the slope of bore is 53.98, the slope of wheelbase is 179.02, and the slope of the fuel capacity is -695.46. This means that for a bore size, wheelbase size, and fuel capacity all of value 0, the MSRP is $-7,167. While this is not plausible, it indicates that a motorcycle cannot have a bore, wheelbase size, or fuel capacity of 0. With a slope of bore of 53.98, this says that for each additional unit of bore (in inches), the MSRP increases by $53.98, for every additional inch in wheelbase, the MSRP increases by $179.02, and for every additional gallon of fuel capacity, the MSRP decreases by $695.46.

The adjusted R^2 value for the model is 0.83, which is an increase from the previous 2 IV model. This signifies that it was appropriate to add fuel capacity to the model. With this R^2 value of 0.83, we can say that variation in the independent variables, bore size, wheelbase size, and fuel capacity, can explain 83% of the variation in the dependent variable, MSRP. This is a very high R^2 value, and therefore a large percent of the variation in the dependent variable can be explained by the independent variables.

Another indicator of the strength of the model are the p-values.In this case, the p-values for bore, wheelbase size and fuel capacity are all very small numbers - less than 1 each. These p-values indicate the probabilities that the F-statistic (in this case 165.4) is due to randomization alone - showing that this event is very unlikely. If we analyze using an alpha of 0.05, the p-values are less than 0.05. Therefore, we may reject our null hypothesis, which states that the variation in MSRP cannot be explained by anything other than randomization. Instead, we suggest an alternative - that variation in bore size, wheelbase size, and fuel capacity may help to explain the variation in MSRP.

Scattergram of Final Model

plot(x, pch=21, cex=1, bg='ivory4', main="MSRP versus Bore, Wheelbase, and Fuel Capacity")

3. Checking Assumptions

There are multiple assumptions that are required for our model to be valid. In the following section, we will check various plots and tests to verify these assumptions.

Histogram

Histograms are good for initial verification of the skewness and kurtosis of a dataset.

hist(x$msrp, breaks = 15, main = "Histogram of MSRP", xlab = "MSRP (in dollars)")

Based on the histogram above, the kurtosis seems to be very large - there is no distinct peak. While the data may be slightly skew left, it is not a very significant skewness.

Boxplot

Boxplots are another useful tool for examining skewness and outliers. Since outliers were already removed in the outlier analysis near the beginning, they will not appear in this analysis.

boxplot(x$msrp, main="Boxplot of MSRP")

As stated before with the histogram, there appears to be a slight skew. There is a slightly longer tail in the lower range of the data.

Fitted Values versus Residuals

par(mfrow=c(1,1))
model.res <- resid(model7)
plot(fitted(model7), model.res, pch=21, cex=1, bg='ivory4',main="Plot of Fitted Values vs. Residuals  (Not Sandardized)", xlab = "Fitted Values of Model", ylab = "Residuals")
abline(0,0, lwd=2.5, col='dodgerblue4')

This plot should appear relatively scattered and random across the dynamic range. Although there are more points in the greater part of the range than the lesser part, this appears to be a fairly good fit.

The second plot, below, shows the standardized residuals versus the fitted values. Similarly, there are more points in the second half of the dynamic range.

par(mfrow=c(1,1))
model.st.res <- rstandard(model7)
plot(fitted(model7), model.st.res, pch=21, cex=1, bg='ivory4',main="Standardized Plot of Fitted Values vs. Residuals", xlab = "Fitted Values of Model", ylab = "Residuals")
abline(0,0, lwd=2.5, col='dodgerblue4')

Q-Q Plot

If the residuals are normally distributed, the Q-Q plot will appear relatively linear, and will not have any distinct curvature.

The Q-Q plot below seems relatively linear. The extremeties do seem to diverge a bit, but for the most part, the points maintain that linearity.

par(mfrow = c(1,1))
qqnorm(residuals(model7))
qqline(residuals(model7))

4. 95% Confidence Intervals

Confidence intervals are used to say that in 95% of samples, the true value of the parameter will lie between certian values. As seen below, there are confidence intervals for the intercepts, the coefficient for bore, the coefficient for wheelbase, and the coefficient for fuel capacity.

confint(model7)

##                  2.5 %      97.5 %
## (Intercept) -8704.0379 -5630.66677
## x$bore         40.4863    67.46637
## x$wheelbase   137.1213   220.92015
## x$fuel.cap   -955.4220  -435.49979

4. Data Source

http://lib.stat.cmu.edu/datasets/

http://www.ehow.com/about_6460291_bore-stroke-motorcycle_.html