Discussion 12

IS 605 FUNDAMENTALS OF COMPUTATIONAL MATHEMATICS

Multiple Linear Regression

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

I created my own cars data set and fit a multiple linear regression model to it. The data has the following variables: Mileage, Type, Cylinder, Liter, Doors, Leather, and Price. Using all the other variables, we build a model to predict the price of the car.

  1. Quadratic term - Cylinder squared.
  2. Dichotomous term - Type (Sedan or Convertible).
  3. Dichotomous vs. quantitative interaction term - Type x Cylinder.
#cars data
cars <- read.csv('https://raw.githubusercontent.com/Riteshlohiya/Data605_Discussion12/master/cars.csv')
summary(cars)
##      Price          Mileage               Type       Cylinder    
##  Min.   :22245   Min.   :  583   Convertible:50   Min.   :4.000  
##  1st Qu.:29338   1st Qu.:14050   Sedan      :76   1st Qu.:4.000  
##  Median :33370   Median :21237                    Median :4.000  
##  Mean   :35667   Mean   :20257                    Mean   :5.619  
##  3rd Qu.:38275   3rd Qu.:25776                    3rd Qu.:8.000  
##  Max.   :70755   Max.   :50387                    Max.   :8.000  
##      Liter           Doors          Leather      
##  Min.   :2.000   Min.   :2.000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.0000  
##  Median :2.300   Median :4.000   Median :1.0000  
##  Mean   :3.211   Mean   :3.206   Mean   :0.7698  
##  3rd Qu.:4.600   3rd Qu.:4.000   3rd Qu.:1.0000  
##  Max.   :6.000   Max.   :4.000   Max.   :1.0000
# Encoding the categorical variable
cars$Type <- ifelse(cars$Type == 'Sedan', 0, 1) # Sedan = 0 and Convertible = 1
cars
##        Price Mileage Type Cylinder Liter Doors Leather
## 1   37510.25   21593    0        8   4.6     4       1
## 2   37215.17   22211    0        8   4.6     4       1
## 3   36332.89   25153    0        8   4.6     4       1
## 4   36245.16   26250    0        8   4.6     4       1
## 5   32954.14   36074    0        8   4.6     4       1
## 6   32537.19   41829    0        8   4.6     4       1
## 7   35715.77    6447    0        8   4.6     4       1
## 8   35651.68   10555    0        8   4.6     4       1
## 9   35129.34   11975    0        8   4.6     4       1
## 10  35165.76   13449    0        8   4.6     4       1
## 11  32501.25   17508    0        8   4.6     4       1
## 12  33220.03   18661    0        8   4.6     4       1
## 13  32509.48   20910    0        8   4.6     4       1
## 14  31132.21   23124    0        8   4.6     4       1
## 15  31181.72   26222    0        8   4.6     4       1
## 16  31059.18   27544    0        8   4.6     4       1
## 17  42741.52    2846    0        6   3.6     4       1
## 18  40966.61    7476    0        6   3.6     4       1
## 19  38795.38   13973    0        6   3.6     4       1
## 20  38297.46   16754    0        6   3.6     4       1
## 21  37192.90   19100    0        6   3.6     4       1
## 22  36210.12   21778    0        6   3.6     4       1
## 23  36633.63   22042    0        6   3.6     4       1
## 24  35895.50   23056    0        6   3.6     4       1
## 25  34974.38   25796    0        6   3.6     4       1
## 26  32038.34   35326    0        6   3.6     4       1
## 27  48310.33     788    0        8   4.6     4       1
## 28  48365.98    2616    0        8   4.6     4       1
## 29  45061.95   13829    0        8   4.6     4       1
## 30  44205.88   15104    0        8   4.6     4       1
## 31  42377.96   18581    0        8   4.6     4       1
## 32  41671.58   20575    0        8   4.6     4       1
## 33  41516.43   23861    0        8   4.6     4       1
## 34  41053.48   25717    0        8   4.6     4       1
## 35  38208.50   31303    0        8   4.6     4       1
## 36  39072.39   31587    0        8   4.6     4       1
## 37  70755.47     583    1        8   4.6     2       1
## 38  68566.19    6420    1        8   4.6     2       1
## 39  69133.73    7892    1        8   4.6     2       1
## 40  66374.31   12021    1        8   4.6     2       1
## 41  65281.48   15600    1        8   4.6     2       1
## 42  63913.12   18200    1        8   4.6     2       1
## 43  60567.55   23193    1        8   4.6     2       1
## 44  57154.44   29260    1        8   4.6     2       1
## 45  55639.09   31805    1        8   4.6     2       1
## 46  52001.99   42691    1        8   4.6     2       1
## 47  46732.61    3625    1        8   6.0     2       1
## 48  47065.21    5239    1        8   6.0     2       1
## 49  44749.69   12115    1        8   6.0     2       1
## 50  42773.03   14546    1        8   6.0     2       1
## 51  41371.38   20000    1        8   6.0     2       1
## 52  39547.59   23826    1        8   6.0     2       1
## 53  39691.73   25169    1        8   6.0     2       1
## 54  38824.87   25960    1        8   6.0     2       1
## 55  36970.90   30502    1        8   6.0     2       1
## 56  37288.94   32039    1        8   6.0     2       1
## 57  35622.14   10340    1        4   2.0     2       0
## 58  34819.30   12251    1        4   2.0     2       0
## 59  34355.00   17711    1        4   2.0     2       0
## 60  32737.08   19112    1        4   2.0     2       1
## 61  33540.54   20925    1        4   2.0     2       1
## 62  31970.54   21208    1        4   2.0     2       0
## 63  33287.41   21661    1        4   2.0     2       1
## 64  32075.98   23553    1        4   2.0     2       0
## 65  31969.07   24559    1        4   2.0     2       0
## 66  27666.23   35157    1        4   2.0     2       0
## 67  29246.24    3907    0        4   2.0     4       1
## 68  26337.83   16068    0        4   2.0     4       0
## 69  26775.03   16688    0        4   2.0     4       0
## 70  25299.97   19569    0        4   2.0     4       0
## 71  24896.60   21266    0        4   2.0     4       0
## 72  25996.81   21433    0        4   2.0     4       1
## 73  24801.62   26345    0        4   2.0     4       1
## 74  24063.01   27674    0        4   2.0     4       1
## 75  23249.84   27686    0        4   2.0     4       0
## 76  22244.88   50387    0        4   2.0     4       1
## 77  37088.56    3828    1        4   2.0     2       1
## 78  33381.82   17381    1        4   2.0     2       1
## 79  33358.77   17590    1        4   2.0     2       1
## 80  33586.91   18930    1        4   2.0     2       1
## 81  30731.94   22479    1        4   2.0     2       0
## 82  30315.17   23635    1        4   2.0     2       0
## 83  30166.85   25049    1        4   2.0     2       0
## 84  30251.02   27558    1        4   2.0     2       1
## 85  29142.71   31655    1        4   2.0     2       1
## 86  29612.15   32477    1        4   2.0     2       1
## 87  26841.08   10003    0        4   2.0     4       0
## 88  27825.95   10014    0        4   2.0     4       1
## 89  27284.75   14281    0        4   2.0     4       1
## 90  27060.14   17319    0        4   2.0     4       1
## 91  25618.28   20208    0        4   2.0     4       1
## 92  25790.51   21160    0        4   2.0     4       1
## 93  25148.38   22272    0        4   2.0     4       1
## 94  24852.50   22814    0        4   2.0     4       1
## 95  24173.53   27015    0        4   2.0     4       0
## 96  23733.40   27600    0        4   2.0     4       0
## 97  38324.81   12090    1        4   2.0     2       1
## 98  38167.17   13162    1        4   2.0     2       1
## 99  37383.50   16088    1        4   2.0     2       1
## 100 36338.75   18195    1        4   2.0     2       0
## 101 35580.33   21167    1        4   2.0     2       0
## 102 35304.49   21293    1        4   2.0     2       1
## 103 34393.00   24031    1        4   2.0     2       1
## 104 33984.43   25420    1        4   2.0     2       0
## 105 33248.34   27051    1        4   2.0     2       1
## 106 28777.96   48991    1        4   2.0     2       1
## 107 32197.34    3867    0        4   2.0     4       0
## 108 32053.10    5144    0        4   2.0     4       0
## 109 30274.71   10800    0        4   2.0     4       0
## 110 30353.59   11273    0        4   2.0     4       0
## 111 30122.43   14568    0        4   2.0     4       1
## 112 26789.83   22189    0        4   2.0     4       0
## 113 28291.76   22328    0        4   2.0     4       0
## 114 27109.41   22598    0        4   2.0     4       1
## 115 27256.49   26400    0        4   2.0     4       0
## 116 25267.37   34175    0        4   2.0     4       0
## 117 35033.22    1676    0        4   2.3     4       1
## 118 32746.13    7924    0        4   2.3     4       1
## 119 33183.33    9795    0        4   2.3     4       1
## 120 31002.73   15087    0        4   2.3     4       1
## 121 30075.99   22052    0        4   2.3     4       1
## 122 29844.20   23143    0        4   2.3     4       1
## 123 28432.82   25247    0        4   2.3     4       1
## 124 28054.98   26276    0        4   2.3     4       1
## 125 28502.96   28598    0        4   2.3     4       1
## 126 24912.08   38717    0        4   2.3     4       1
#Distribution
hist(cars$Price, main = "Histogram of price of the cars")

hist(cars$Mileage, main = "Histogram of Mileage of the cars")

# Scatterplot matrix of all pairs of variables
pairs(cars)
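
pairs() draws scatterplots rather than numbers. As a rough sketch, the numeric correlations can be computed with cor(). The example below uses R's built-in mtcars data as a stand-in (an assumption, chosen so that it runs without downloading cars.csv), and the column selection is only illustrative.

```r
# Stand-in data: built-in mtcars, not the cars.csv file used above
num_vars <- mtcars[, c("mpg", "wt", "cyl", "disp")]

# Pearson correlation matrix, rounded for readability
cor_mat <- round(cor(num_vars), 2)
print(cor_mat)
```

With the cars data, the equivalent call would be cor(cars), since every column is numeric once Type has been encoded as 0/1.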

Build multiple regression model

# Quadratic variable
cars$q <- cars$Cylinder^2
# Dichotomous vs. quantitative interaction
cars$dq <- cars$Type * cars$Cylinder

# Fitting the multiple regression model
ml <- lm(Price ~ Mileage + Type + Cylinder + Liter + Doors + Leather + q + dq, data = cars)
summary(ml)
## 
## Call:
## lm(formula = Price ~ Mileage + Type + Cylinder + Liter + Doors + 
##     Leather + q + dq, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6042.6 -2068.0   -60.9  1875.2  5785.8 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.387e+04  8.883e+03  -4.939 2.62e-06 ***
## Mileage     -3.002e-01  2.870e-02 -10.460  < 2e-16 ***
## Type        -1.326e+04  1.919e+03  -6.908 2.66e-10 ***
## Cylinder     3.373e+04  3.418e+03   9.869  < 2e-16 ***
## Liter       -1.372e+04  9.491e+02 -14.453  < 2e-16 ***
## Doors               NA         NA      NA       NA    
## Leather      2.122e+03  7.464e+02   2.843  0.00526 ** 
## q           -1.896e+03  2.697e+02  -7.027 1.46e-10 ***
## dq           4.638e+03  3.467e+02  13.377  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3059 on 118 degrees of freedom
## Multiple R-squared:  0.9118, Adjusted R-squared:  0.9065 
## F-statistic: 174.2 on 7 and 118 DF,  p-value: < 2.2e-16
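
The quadratic and interaction columns were built by hand before fitting. R's formula syntax can express the same terms directly with I() and the : operator. The sketch below checks that the two spellings give the same fit. It uses the built-in mtcars data as a stand-in (an assumption, not the cars data), with wt playing the role of the quantitative variable and am the 0/1 indicator.

```r
# Stand-in data: built-in mtcars (wt is quantitative, am is a 0/1 indicator)
d <- mtcars

# Approach used above: precompute the extra columns by hand
d$q  <- d$wt^2        # quadratic term
d$dq <- d$am * d$wt   # dichotomous x quantitative interaction
fit_manual <- lm(mpg ~ wt + am + q + dq, data = d)

# Same model written directly in the formula
fit_formula <- lm(mpg ~ wt + am + I(wt^2) + am:wt, data = mtcars)

# The two fits should agree up to numerical tolerance
print(all.equal(unname(coef(fit_manual)), unname(coef(fit_formula))))
```

Writing the terms in the formula keeps the data frame free of derived columns, which also makes predict() on new data less error-prone.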

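In the summary, the Doors coefficient is NA ("1 not defined because of singularities"). In this data every sedan has 4 doors and every convertible has 2, so Doors is perfectly collinear with Type and adds no information. I am therefore removing Doors from the model; the remaining estimates do not change.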
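
An NA coefficient in lm() output usually means the column is an exact linear combination of other predictors. The sketch below reproduces that situation on a small toy data frame. The values are hypothetical and are not taken from cars.csv; only the relationship between Type and Doors mirrors the cars data.

```r
# Hypothetical toy data (not cars.csv): Doors is an exact linear function of Type
toy <- data.frame(
  Price   = c(30000, 31000, 29000, 45000, 47000, 44000),
  Mileage = c(20000, 15000, 25000, 10000, 5000, 12000),
  Type    = c(0, 0, 0, 1, 1, 1)
)
toy$Doors <- 4 - 2 * toy$Type   # sedans (0) get 4 doors, convertibles (1) get 2

# The cross-tabulation has only two non-empty cells, one per Type
print(table(Type = toy$Type, Doors = toy$Doors))

fit <- lm(Price ~ Mileage + Type + Doors, data = toy)
print(coef(fit))   # the Doors estimate is NA

# alias() reports which terms are linear combinations of the others
print(alias(fit))
```

On the cars data, table(cars$Type, cars$Doors) and alias(ml) would be the corresponding checks.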

# Refitting the model without Doors
ml2 <- lm(Price ~ Mileage + Type + Cylinder + Liter + Leather + q + dq, data = cars)
summary(ml2)
## 
## Call:
## lm(formula = Price ~ Mileage + Type + Cylinder + Liter + Leather + 
##     q + dq, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6042.6 -2068.0   -60.9  1875.2  5785.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.387e+04  8.883e+03  -4.939 2.62e-06 ***
## Mileage     -3.002e-01  2.870e-02 -10.460  < 2e-16 ***
## Type        -1.326e+04  1.919e+03  -6.908 2.66e-10 ***
## Cylinder     3.373e+04  3.418e+03   9.869  < 2e-16 ***
## Liter       -1.372e+04  9.491e+02 -14.453  < 2e-16 ***
## Leather      2.122e+03  7.464e+02   2.843  0.00526 ** 
## q           -1.896e+03  2.697e+02  -7.027 1.46e-10 ***
## dq           4.638e+03  3.467e+02  13.377  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3059 on 118 degrees of freedom
## Multiple R-squared:  0.9118, Adjusted R-squared:  0.9065 
## F-statistic: 174.2 on 7 and 118 DF,  p-value: < 2.2e-16
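
Because Cylinder enters the model through three terms (Cylinder, q, and dq), its coefficient cannot be read on its own. Holding Mileage, Liter, and Leather fixed, the slope with respect to Cylinder is b_cyl + 2 * b_q * Cylinder + b_dq * Type, and the convertible-minus-sedan difference is b_type + b_dq * Cylinder. The sketch below evaluates both using the rounded estimates printed in the summary above. The helper names cyl_slope and type_gap are hypothetical, introduced only for this illustration.

```r
# Rounded estimates copied from the summary(ml2) output
b_type <- -1.326e4   # Type (1 = Convertible)
b_cyl  <-  3.373e4   # Cylinder
b_q    <- -1.896e3   # q  = Cylinder^2
b_dq   <-  4.638e3   # dq = Type * Cylinder

# Instantaneous slope of predicted Price with respect to Cylinder
cyl_slope <- function(cylinder, type) b_cyl + 2 * b_q * cylinder + b_dq * type

# Predicted Price difference, convertible minus sedan, at a given Cylinder
type_gap <- function(cylinder) b_type + b_dq * cylinder

print(cyl_slope(c(4, 8), type = 0))   # 18562  3394
print(type_gap(c(4, 8)))              # 5292 23844
```

The negative q coefficient means the Cylinder slope shrinks as Cylinder grows, and the positive dq coefficient means the convertible premium widens with more cylinders. The remaining terms read directly from the table: each additional mile is associated with a predicted price about 0.30 lower, and leather with a predicted price about 2,122 higher, other terms held fixed. Two caveats apply: Cylinder takes only the values 4, 6, and 8 in this data, and Liter moves closely with Cylinder, so "holding Liter fixed" is a somewhat artificial comparison.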

Residuals

#Histogram
hist(ml2$residuals, main = "Regression Residuals")

# Residuals plotted against observation index
plot(ml2$residuals, ylab='Residuals')
abline(h=0)

# Q-Q plot
qqnorm(ml2$residuals)
qqline(ml2$residuals)
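
The residual plot above uses the observation index on the x-axis. Two common complements are a residuals-versus-fitted plot, which makes curvature or non-constant variance easier to spot, and a Shapiro-Wilk test as a numeric companion to the Q-Q plot. The sketch below shows both on a stand-in model fitted to the built-in mtcars data (an assumption; it is not the ml2 fit).

```r
# Stand-in model on built-in mtcars (not the ml2 fit above)
fit <- lm(mpg ~ wt + am + I(wt^2) + am:wt, data = mtcars)

# Residuals vs fitted values: look for curvature or a funnel shape
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Shapiro-Wilk test of the residuals; a small p-value suggests non-normality
sw <- shapiro.test(resid(fit))
print(sw)
```

For the model in this post, the same calls would take ml2 in place of fit.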

Conclusion:

The R-squared value is 0.9118 (adjusted 0.9065), which is good: the model explains about 91% of the variability in Price. In the residual plot, the residuals are scattered around zero with mostly constant variability and no obvious pattern. The Q-Q plot also looks good, with some departures from the line at the tails. I think this multiple linear model (ml2) is appropriate.