In this R activity, you’ll graphically add an outlying point to a scatterplot and see its impact on the least-squares fit. You’ll then explore three resistant fitting procedures for fitting a line to data with outlying points.
library(TeachingDemos)
x = c(2,2,4,5,6,7,8,9,10)
y = c(7,8,6,7,4,6,4,6,3)
# Interactive demo: click on the plot to add a point and watch the fitted line update
put.points.demo(x, y)
Record the equation of the least-squares line below:
y = 8.25 + (-0.44)x
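The equation read off the interactive demo can be double-checked without the demo. The following is a minimal sketch using base R's `lm()`; the object names `ls_fit` and `ls_coef` are chosen here for illustration and are not part of the activity.

```r
# Original data from the activity
x <- c(2, 2, 4, 5, 6, 7, 8, 9, 10)
y <- c(7, 8, 6, 7, 4, 6, 4, 6, 3)

# Ordinary least-squares fit of y on x
ls_fit <- lm(y ~ x)

# Intercept and slope rounded to two decimals: 8.25 and -0.44
ls_coef <- round(coef(ls_fit), 2)
print(ls_coef)
```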
x1=c(2,2,2,4,5,6,7,8,9,10)
y1=c(2,7,8,6,7,4,6,4,6,3)
Point you added: (2,2)
New least-squares fit: y = 6.43 + (-0.20)x
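Refitting the augmented data with `lm()` is one way to double-check the recorded equation. The sketch below also prints the correlation coefficient, because the correlation and the slope are easy to confuse when both are read from a display. The names `ls_fit1`, `ls_coef1`, and `r1` are illustrative only.

```r
# Data after adding the point (2, 2)
x1 <- c(2, 2, 2, 4, 5, 6, 7, 8, 9, 10)
y1 <- c(2, 7, 8, 6, 7, 4, 6, 4, 6, 3)

# Least-squares fit on the augmented data
ls_fit1 <- lm(y1 ~ x1)
ls_coef1 <- round(coef(ls_fit1), 2)  # intercept 6.43, slope -0.20
print(ls_coef1)

# Correlation between x1 and y1, a different quantity from the slope
r1 <- round(cor(x1, y1), 2)          # -0.31
print(r1)
```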
library(LearnEDAfunctions)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: ggplot2
df <- data.frame(x, y)
rline(y ~ x, df, iter=10)
## $a
## [1] 5.5
##
## $b
## [1] -0.5
##
## $xC
## [1] 6
##
## $half.slope.ratio
## [1] 2.666667
##
## $residual
## [1] -0.5 0.5 -0.5 1.0 -1.5 1.0 -0.5 2.0 -0.5
##
## $spoints.x
## [1] 2 6 9
##
## $spoints.y
## [1] 7 6 4
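The summary points and the half-slope ratio in this output can be reproduced by hand. The sketch below assumes that, for these nine x-sorted observations, the three groups are simply the lower, middle, and upper thirds; how `rline()` splits other sample sizes is not shown here. It computes only the starting slope from the outer summary points, not the iterated slope that `rline()` reports after `iter = 10`.

```r
x <- c(2, 2, 4, 5, 6, 7, 8, 9, 10)
y <- c(7, 8, 6, 7, 4, 6, 4, 6, 3)

# Assumption: equal thirds of the data ordered by x (n = 9 gives groups of 3)
ord <- order(x)
group <- rep(1:3, each = 3)

# Summary points: median x and median y within each group
sx <- unname(tapply(x[ord], group, median))  # 2 6 9
sy <- unname(tapply(y[ord], group, median))  # 7 6 4

# Slopes of the left and right halves, and their ratio
left_slope <- (sy[2] - sy[1]) / (sx[2] - sx[1])   # -0.25
right_slope <- (sy[3] - sy[2]) / (sx[3] - sx[2])  # about -0.667
half_slope_ratio <- right_slope / left_slope      # about 2.667

# Starting slope from the two outer summary points (before any iteration)
start_slope <- (sy[3] - sy[1]) / (sx[3] - sx[1])  # -3/7, about -0.429

print(c(half_slope_ratio = half_slope_ratio, start_slope = start_slope))
```

A half-slope ratio well away from 1 suggests the relationship is not quite straight, since the two halves of the data have different slopes.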
df1 <- data.frame(x1, y1)
rline(y1 ~ x1, df1, iter=10)
## $a
## [1] 5.5
##
## $b
## [1] -0.5
##
## $xC
## [1] 5.5
##
## $half.slope.ratio
## [1] 2
##
## $residual
## [1] -5.25 -0.25 0.75 -0.25 1.25 -1.25 1.25 -0.25 2.25 -0.25
##
## $spoints.x
## [1] 2.0 5.5 9.0
##
## $spoints.y
## [1] 7 6 4
Original (x, y) data:
Least-squares fit: y = 8.25 + (-0.44)x
Equation of the resistant line: y = 5.5 + (-0.5)(x - 6), or equivalently y = 8.5 + (-0.5)x
New data with additional point:
Least-squares fit: y = 6.43 + (-0.20)x
Equation of the resistant line: y = 5.5 + (-0.5)(x - 5.5), or equivalently y = 8.25 + (-0.5)x
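Moving from the centered form y = a + b(x - xC) reported by `rline()` to slope-intercept form only requires expanding the product, which gives an intercept of a - b * xC. A small sketch of that arithmetic, using a hypothetical helper named `centered_to_intercept` that is not part of any package:

```r
# Hypothetical helper: y = a + b * (x - xC) expands to y = (a - b * xC) + b * x
centered_to_intercept <- function(a, b, xC) {
  a - b * xC
}

# Values of a, b, and xC taken from the two rline() outputs above
int_orig <- centered_to_intercept(a = 5.5, b = -0.5, xC = 6)    # 8.5
int_new <- centered_to_intercept(a = 5.5, b = -0.5, xC = 5.5)   # 8.25

print(c(original = int_orig, new = int_new))
```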
Compare the least-squares and resistant fits for the original data and for the new data. Have you demonstrated in this example that the resistant fit is indeed resistant to outlying points?
Yes. Adding the single point (2, 2) changed the least-squares line substantially: both its intercept and its slope moved. The resistant line barely changed: its slope stayed at -0.5 and its intercept moved only from 8.5 to 8.25. In this example, then, the resistant fit is indeed resistant to the outlying point.
Demonstrate the differences between the two fits for the new data by plotting the data and both lines on the same graph using contrasting colors.
I have plotted the least-squares fit in red and the resistant fit in blue.
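# Scatterplot of the new data with both fitted lines
ggplot(df1, aes(x1, y1)) +
  geom_point() +
  # Least-squares fit (red): y = 6.43 - 0.205x
  geom_abline(slope = -0.205, intercept = 6.43, color = "red") +
  # Resistant fit (blue): y = 8.25 - 0.5x
  geom_abline(slope = -0.5, intercept = 8.25, color = "blue")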
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:LearnEDAfunctions':
##
## farms
## The following object is masked from 'package:dplyr':
##
## select
lqs(y1 ~ x1, data = df1)
## Call:
## lqs.formula(formula = y1 ~ x1, data = df1)
##
## Coefficients:
## (Intercept) x1
## 7.633 -0.200
##
## Scale estimates 1.098 1.628
rlm(y1 ~ x1, data = df1)
## Call:
## rlm(formula = y1 ~ x1, data = df1)
## Converged in 6 iterations
##
## Coefficients:
## (Intercept) x1
## 7.1004119 -0.2912616
##
## Degrees of freedom: 10 total; 8 residual
## Scale estimate: 1.88
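To see the fits for the new data side by side, the coefficients can be collected into one table. This is a sketch that assumes the MASS package is installed; the `rline` row is typed in from the resistant-line output earlier rather than recomputed, and the names `fits` and `coef_table` are illustrative.

```r
library(MASS)  # provides lqs() and rlm()

x1 <- c(2, 2, 2, 4, 5, 6, 7, 8, 9, 10)
y1 <- c(2, 7, 8, 6, 7, 4, 6, 4, 6, 3)
df1 <- data.frame(x1, y1)

# lqs() can rely on random resampling; fixing the seed keeps runs reproducible
set.seed(1)

fits <- list(
  least_squares = lm(y1 ~ x1, data = df1),
  lqs = lqs(y1 ~ x1, data = df1),
  rlm = rlm(y1 ~ x1, data = df1)
)

# One row per method, columns for intercept and slope
coef_table <- t(sapply(fits, coef))
colnames(coef_table) <- c("intercept", "slope")

# Resistant line in slope-intercept form, copied from the rline() results above
coef_table <- rbind(coef_table, rline = c(8.25, -0.5))

print(round(coef_table, 3))
```

Reading down the slope column shows how much each method's estimate of the trend differs once the point (2, 2) is included.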