library(datasets)
library(knitr)
cars.data <- cars
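One-factor (simple linear) regression is not commutative: the model that predicts y from x is not the inverse of the model that predicts x from y. To demonstrate this, I will use the cars dataset.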
Since both columns, speed and dist, are continuous variables, a regression model can be built in two different ways.
Case 1: Estimating distance given the speed of the car
Speed As Predictor. Distance As Output.
Case 2: Estimating speed given the distance the car takes to stop
Distance As Predictor. Speed As Output.
case1.lm <- lm(cars.data$dist ~ cars.data$speed)
case1.lm
##
## Call:
## lm(formula = cars.data$dist ~ cars.data$speed)
##
## Coefficients:
##     (Intercept)  cars.data$speed
##         -17.579            3.932
summary(case1.lm)
##
## Call:
## lm(formula = cars.data$dist ~ cars.data$speed)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -29.069  -9.525  -2.272   9.215  43.201
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -17.5791     6.7584  -2.601   0.0123 *
## cars.data$speed   3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
The regression model for Case 1 is \(\text{distance} = -17.5791 + 3.9324 \times \text{speed}\)
case2.lm <- lm(cars.data$speed ~ cars.data$dist)
case2.lm
##
## Call:
## lm(formula = cars.data$speed ~ cars.data$dist)
##
## Coefficients:
##    (Intercept)  cars.data$dist
##         8.2839          0.1656
summary(case2.lm)
##
## Call:
## lm(formula = cars.data$speed ~ cars.data$dist)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.5293 -2.1550  0.3615  2.4377  6.4179
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     8.28391    0.87438   9.474 1.44e-12 ***
## cars.data$dist  0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
The regression model for Case 2 is \(\text{speed} = 8.2839 + 0.1656 \times \text{distance}\)
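If regression were commutative, solving the Case 1 equation for speed would reproduce the Case 2 equation. A quick check using the rounded Case 1 coefficients reported above suggests otherwise:

```latex
\[
\text{speed} = \frac{\text{distance} + 17.5791}{3.9324}
\approx 4.4703 + 0.2543 \times \text{distance}
\]
```

Both the intercept (4.4703 versus 8.2839) and the slope (0.2543 versus 0.1656) differ from the fitted Case 2 model.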
# Speeds (mph) at which to estimate stopping distance
case1.speed <- seq(10, 30, by = 1)
# Case 1 equation, using the rounded coefficients from summary(case1.lm)
case1.distance <- -17.5791 + (3.9324 * case1.speed)
case1.df <- data.frame(case1.speed, case1.distance)
colnames(case1.df) <- c('Speed(mph)', 'Distance(feet)')
kable(case1.df, align = 'l', caption = "Regression model estimates - Speed As Predictor - Distance As Output")
| Speed(mph) | Distance(feet) |
|---|---|
| 10 | 21.7449 |
| 11 | 25.6773 |
| 12 | 29.6097 |
| 13 | 33.5421 |
| 14 | 37.4745 |
| 15 | 41.4069 |
| 16 | 45.3393 |
| 17 | 49.2717 |
| 18 | 53.2041 |
| 19 | 57.1365 |
| 20 | 61.0689 |
| 21 | 65.0013 |
| 22 | 68.9337 |
| 23 | 72.8661 |
| 24 | 76.7985 |
| 25 | 80.7309 |
| 26 | 84.6633 |
| 27 | 88.5957 |
| 28 | 92.5281 |
| 29 | 96.4605 |
| 30 | 100.3929 |
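The table above is built from hand-copied, rounded coefficients. As a hedged alternative, the same estimates can be obtained with predict(). This is a sketch only: it refits the model with the data argument, because a formula written as cars.data$dist ~ cars.data$speed cannot be matched against new data. The names fit1, new.speeds, pred.dist and hand.dist are illustrative and not part of the original analysis.

```r
library(datasets)

# Refit Case 1 with the data= argument so predict() can match the column name
# (fit1 and new.speeds are illustrative names, not from the original)
fit1 <- lm(dist ~ speed, data = cars)
new.speeds <- data.frame(speed = seq(10, 30, by = 1))

# Predicted distances at full coefficient precision
pred.dist <- predict(fit1, newdata = new.speeds)

# Same quantities computed from the rounded coefficients used in the text
hand.dist <- -17.5791 + 3.9324 * new.speeds$speed

# Largest discrepancy between the two approaches
max(abs(pred.dist - hand.dist))
```

The two approaches should agree up to the rounding of the coefficients, so the hand-built table remains a fair summary of the fitted model.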
# Feed the Case 1 distance estimates back in as the predictor
case2.distance <- case1.distance
# Case 2 equation, using the rounded coefficients from case2.lm
case2.speed <- 8.2839 + (0.1656 * case2.distance)
case2.df <- data.frame(case2.distance, case2.speed)
colnames(case2.df) <- c('Distance(feet)', 'Speed(mph)')
kable(case2.df, align = 'l', caption = "Regression model estimates - Distance As Predictor - Speed As Output")
| Distance(feet) | Speed(mph) |
|---|---|
| 21.7449 | 11.88486 |
| 25.6773 | 12.53606 |
| 29.6097 | 13.18727 |
| 33.5421 | 13.83847 |
| 37.4745 | 14.48968 |
| 41.4069 | 15.14088 |
| 45.3393 | 15.79209 |
| 49.2717 | 16.44329 |
| 53.2041 | 17.09450 |
| 57.1365 | 17.74570 |
| 61.0689 | 18.39691 |
| 65.0013 | 19.04812 |
| 68.9337 | 19.69932 |
| 72.8661 | 20.35053 |
| 76.7985 | 21.00173 |
| 80.7309 | 21.65294 |
| 84.6633 | 22.30414 |
| 88.5957 | 22.95535 |
| 92.5281 | 23.60655 |
| 96.4605 | 24.25776 |
| 100.3929 | 24.90896 |
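The round trip shown in the two tables (speed to distance, then distance back to speed) can also be run directly against the fitted models. The following is a minimal sketch under the assumption that both models are refit with the data argument. All names here (fit.ds, fit.sd, start.speed, mid.dist, end.speed, round.trip) are illustrative and not from the original.

```r
library(datasets)

# Illustrative names throughout; none of these objects appear in the original
fit.ds <- lm(dist ~ speed, data = cars)   # Case 1: distance from speed
fit.sd <- lm(speed ~ dist, data = cars)   # Case 2: speed from distance

start.speed <- seq(10, 30, by = 5)

# Step 1: speed -> predicted distance
mid.dist <- predict(fit.ds, newdata = data.frame(speed = start.speed))

# Step 2: predicted distance -> predicted speed
end.speed <- predict(fit.sd, newdata = data.frame(dist = mid.dist))

# gap is zero only where the round trip returns the starting speed
round.trip <- data.frame(start.speed, mid.dist, end.speed,
                         gap = end.speed - start.speed)
round.trip
```

A nonzero gap column indicates that the second model does not undo the first. The gap changes sign across the range because both least-squares lines pass through the sample means of speed and distance.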
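# Case 1 (red): speed as predictor, distance as output, drawn with distance on the x-axis
plot(case1.distance, case1.speed, type = "l", col = "red", xlab = 'Distance (feet)', ylab = 'Speed (mph)')
# Case 2 (green): distance as predictor, speed as output
lines(case2.distance, case2.speed, col = "green")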
The red line plots the Case 1 model (speed as predictor, distance as output). The green line plots the Case 2 model (distance as predictor, speed as output). For Case 1, I used speeds from 10 mph to 30 mph to predict distance. For Case 2, I used the distances predicted in Case 1 as the predictor to estimate speed. The speeds recovered in Case 2 do not match the speeds that went into Case 1: 10 mph gives 21.7449 feet, which maps back to 11.88486 mph, and 30 mph gives 100.3929 feet, which maps back to 24.90896 mph.
Had the two models been commutative, the two lines would have coincided.
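The mismatch has a simple source. In simple linear regression the slope of y on x is r * s_y / s_x and the slope of x on y is r * s_x / s_y, so the product of the two slopes is r squared rather than 1. The two fitted lines are exact inverses only when the correlation is perfect. The sketch below checks this on the cars data. The names b.ds, b.sd and r are illustrative and not from the original.

```r
library(datasets)

# Illustrative names; none of these objects appear in the original
b.ds <- coef(lm(dist ~ speed, data = cars))[["speed"]]  # slope of Case 1
b.sd <- coef(lm(speed ~ dist, data = cars))[["dist"]]   # slope of Case 2
r <- cor(cars$speed, cars$dist)                          # Pearson correlation

# Product of the two slopes next to the squared correlation
c(slope.product = b.ds * b.sd, r.squared = r^2)

# Slope Case 2 would need in order to be the exact inverse of Case 1
1 / b.ds
```

The squared correlation here is the Multiple R-squared of 0.6511 reported by both summaries, which is why the two models share that value even though their coefficients differ.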