Akula-DATA605-Week12-Discussion

library(datasets)
library(knitr)
cars.data <- cars

One-factor (simple linear) regression is not commutative: regressing y on x does not give the same model as regressing x on y. To demonstrate this, I will use the cars dataset.

Since both columns, speed and dist, are continuous variables, a regression model can be built in two different ways.

Case 1: Estimating the stopping distance given the speed of the car. Speed as predictor, distance as output.

Case 2: Estimating the speed given the distance the car takes to stop. Distance as predictor, speed as output.

Linear Model for Case 1

case1.lm <- lm(cars.data$dist ~ cars.data$speed)
case1.lm

## 
## Call:
## lm(formula = cars.data$dist ~ cars.data$speed)
## 
## Coefficients:
##     (Intercept)  cars.data$speed  
##         -17.579            3.932

summary(case1.lm)

## 
## Call:
## lm(formula = cars.data$dist ~ cars.data$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -17.5791     6.7584  -2.601   0.0123 *  
## cars.data$speed   3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

The regression model for Case 1 is \(distance = -17.5791 + 3.9324 \times speed\)

Linear Model for Case 2

case2.lm <- lm(cars.data$speed ~ cars.data$dist)
case2.lm

## 
## Call:
## lm(formula = cars.data$speed ~ cars.data$dist)
## 
## Coefficients:
##    (Intercept)  cars.data$dist  
##         8.2839          0.1656

summary(case2.lm)

## 
## Call:
## lm(formula = cars.data$speed ~ cars.data$dist)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5293 -2.1550  0.3615  2.4377  6.4179 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     8.28391    0.87438   9.474 1.44e-12 ***
## cars.data$dist  0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

The regression model for Case 2 is \(speed = 8.2839 + 0.1656 \times distance\)
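If regression were commutative, solving the Case 1 equation for speed would reproduce the Case 2 equation, so the Case 2 slope would be 1 / 3.9324, roughly 0.2543, rather than 0.1656. The sketch below checks this numerically. It refits both models with the `data =` form of `lm()`, which is assumed to be equivalent to `case1.lm` and `case2.lm`; the object names are illustrative and not part of the original analysis.

```r
# Refit both models with the formula interface
# (assumed equivalent to case1.lm and case2.lm above).
fit_dist_on_speed <- lm(dist ~ speed, data = cars)
fit_speed_on_dist <- lm(speed ~ dist, data = cars)

slope_case1 <- coef(fit_dist_on_speed)[["speed"]]  # feet per mph
slope_case2 <- coef(fit_speed_on_dist)[["dist"]]   # mph per foot

# Slope obtained by algebraically solving the Case 1 equation for speed.
inverted_slope <- 1 / slope_case1

comparison <- c(
  fitted_case2_slope   = slope_case2,
  inverted_case1_slope = inverted_slope,
  slope_product        = slope_case1 * slope_case2,
  r_squared            = summary(fit_dist_on_speed)$r.squared
)
print(round(comparison, 4))
```

The fitted Case 2 slope and the inverted Case 1 slope differ, while the product of the two fitted slopes matches the \(R^2\) reported in both summaries.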

# Speeds from 10 to 30 mph, fed into the Case 1 equation
case1.speed <- seq(10, 30, by = 1)
case1.distance <- -17.5791 + (3.9324 * case1.speed)
case1.df <- data.frame(case1.speed,case1.distance)
colnames(case1.df) <- c('Speed(mph)', 'Distance(feet)')
kable(case1.df, align='l', caption = "Regression model estimates - Speed As Predictor - Distance As Output")

Regression model estimates - Speed As Predictor - Distance As Output
Speed(mph)	Distance(feet)
10	21.7449
11	25.6773
12	29.6097
13	33.5421
14	37.4745
15	41.4069
16	45.3393
17	49.2717
18	53.2041
19	57.1365
20	61.0689
21	65.0013
22	68.9337
23	72.8661
24	76.7985
25	80.7309
26	84.6633
27	88.5957
28	92.5281
29	96.4605
30	100.3929

# Reuse the Case 1 predicted distances as input to the Case 2 equation
case2.distance <- case1.distance
case2.speed <- 8.2839 + (0.1656 * case2.distance)
case2.df <- data.frame(case2.distance,case2.speed)
colnames(case2.df) <- c('Distance(feet)', 'Speed(mph)')
kable(case2.df, align='l', caption = "Regression model estimates - Distance As Predictor - Speed As Output")

Regression model estimates - Distance As Predictor - Speed As Output
Distance(feet)	Speed(mph)
21.7449	11.88486
25.6773	12.53606
29.6097	13.18727
33.5421	13.83847
37.4745	14.48968
41.4069	15.14088
45.3393	15.79209
49.2717	16.44329
53.2041	17.09450
57.1365	17.74570
61.0689	18.39691
65.0013	19.04812
68.9337	19.69932
72.8661	20.35053
76.7985	21.00173
80.7309	21.65294
84.6633	22.30414
88.5957	22.95535
92.5281	23.60655
96.4605	24.25776
100.3929	24.90896
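The two tables above amount to a round trip: speed to predicted distance, then predicted distance back to speed. As a minimal sketch, the same round trip can be done with `predict()` instead of hand-typed coefficients. The models are refitted with `data = cars` because `predict()` matches the columns of `newdata` to the variable names in the formula; the names `fit1`, `fit2` and `round_trip` are illustrative assumptions, not objects from the original analysis.

```r
fit1 <- lm(dist ~ speed, data = cars)  # Case 1: distance from speed
fit2 <- lm(speed ~ dist, data = cars)  # Case 2: speed from distance

speed_in <- seq(10, 30, by = 1)

# Step 1: speed -> predicted distance (Case 1).
dist_hat <- predict(fit1, newdata = data.frame(speed = speed_in))

# Step 2: predicted distance -> predicted speed (Case 2).
speed_back <- predict(fit2, newdata = data.frame(dist = dist_hat))

# A gap of zero everywhere would mean the two models are inverses.
round_trip <- data.frame(
  speed_in   = speed_in,
  dist_hat   = dist_hat,
  speed_back = speed_back,
  gap        = speed_back - speed_in
)
print(round_trip, row.names = FALSE)
```

Because `predict()` uses the unrounded coefficients, the values can differ from the tables above in the later decimal places. The gap is positive at low speeds and negative at high speeds, so the round trip pulls every speed toward the middle of the data.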

plot(case1.distance,case1.speed,type="l",col="red", xlab = 'Distance', ylab = 'Speed')
lines(case2.distance,case2.speed,col="green")

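The red line plots the Case 1 model (speed as predictor, distance as output). The green line plots the Case 2 model (distance as predictor, speed as output). For the first case, I used speeds from 10 mph to 30 mph to predict distance. For the second case, I used the distances predicted in the first case as the predictor of speed. The speeds produced by the second model do not match the speeds that went into the first: for example, 10 mph gives 21.7449 feet, which maps back to 11.88 mph, and 30 mph gives 100.3929 feet, which maps back to 24.91 mph.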

Had one-factor regression been commutative, the second model would have been the algebraic inverse of the first, and the two lines would have coincided.
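To see the two lines against the observations, the sketch below draws both models over a scatter plot of the raw data. Unlike the plot above, it puts speed on the horizontal axis, and it rewrites the Case 2 equation as a line in that plane. The refitted objects and variable names are illustrative assumptions.

```r
fit1 <- lm(dist ~ speed, data = cars)  # Case 1: distance from speed
fit2 <- lm(speed ~ dist, data = cars)  # Case 2: speed from distance

# Case 2 is speed = a2 + b2 * dist. Solved for dist, it becomes
# dist = -a2 / b2 + (1 / b2) * speed, a line in the speed-distance plane.
a2 <- coef(fit2)[["(Intercept)"]]
b2 <- coef(fit2)[["dist"]]
case2_intercept <- -a2 / b2
case2_slope     <- 1 / b2

plot(cars$speed, cars$dist, xlab = "Speed (mph)", ylab = "Distance (feet)")
abline(fit1, col = "red")                                    # Case 1
abline(a = case2_intercept, b = case2_slope, col = "green")  # Case 2

# Both least-squares lines pass through the point of means.
points(mean(cars$speed), mean(cars$dist), pch = 19)
```

The two lines cross at the point of means and diverge everywhere else, which is why the round trip reproduces the input speed only near the average speed.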

Conclusion

Speed as predictor with distance as output and distance as predictor with speed as output are not the same. They generate two different models, which shows that one-factor regression is not commutative.
Both models produce the same \(R^2\) value: each explains 65.11% of the variation in its output variable.
One has to be careful when building a model, even a one-factor regression: the choice of predictor and output should follow from the quantity to be estimated.
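The shared \(R^2\) and the differing slopes follow from the standard least-squares slope formula. In the sketch below, \(r\) is the sample correlation between speed and dist, and \(s_{speed}\), \(s_{dist}\) are their sample standard deviations; these symbols are introduced here for illustration and do not appear in the output above.

\[
\hat{\beta}_{dist \mid speed} = r \, \frac{s_{dist}}{s_{speed}}, \qquad
\hat{\beta}_{speed \mid dist} = r \, \frac{s_{speed}}{s_{dist}}, \qquad
\hat{\beta}_{dist \mid speed} \cdot \hat{\beta}_{speed \mid dist} = r^2 = R^2
\]

The two models would be algebraic inverses only if the product of the slopes were 1, that is, only if \(r^2 = 1\) and every point lay exactly on a line. Here the product is \(3.9324 \times 0.16557 \approx 0.6511\), which matches the \(R^2\) in both summaries and is well below 1.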


Pavan Akula

November 13, 2017
