In this R activity, you’ll graphically add a weird point and see the impact of this point on the least-squares fit. You’ll explore the use of three resistant fitting procedures in fitting a line to data with outlying points.
Install the package TeachingDemos into R.
Define the following (x, y) data and run the function put.points.demo on this data:
library(LearnEDAfunctions)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: ggplot2
x=c(2,2,4,5,6,7,8,9,10)
y=c(7,8,6,7,4,6,4,6,3)
plot(x,y)
Record the equation of the least-squares line below: \(Y=-0.44X+8.25\) (found using put.points.demo)
x1=c(2,2,4,5,6,7,8,9,10,12)
y1=c(7,8,6,7,4,6,4,6,3,18)
plot(x1,y1)
Point you added: \((12,18)\)
New least-squares fit: \(Y=0.38X+4.41\) (found using put.points.demo)
Original (x, y) data: Equation of the resistant line: \(Y=-0.43X+5.53\)
old_data<-data.frame(x,y)
old_rline<-rline(y~x, old_data)
print(old_rline)
## $a
## [1] 5.52381
##
## $b
## [1] -0.4285714
##
## $xC
## [1] 6
##
## $half.slope.ratio
## [1] 2.666667
##
## $residual
## [1] -0.2380952 0.7619048 -0.3809524 1.0476190 -1.5238095 0.9047619 -0.6666667
## [8] 1.7619048 -0.8095238
##
## $spoints.x
## [1] 2 6 9
##
## $spoints.y
## [1] 7 6 4
New data with additional point: Equation of the resistant line: \(Y=-0.125X+5.96\)
new_data<-data.frame(x1,y1)
new_rline<-rline(y1~x1, new_data)
print(new_rline)
## $a
## [1] 5.958333
##
## $b
## [1] -0.125
##
## $xC
## [1] 6.5
##
## $half.slope.ratio
## [1] -0.6428571
##
## $residual
## [1] 0.4791667 1.4791667 -0.2708333 0.8541667 -2.0208333 0.1041667
## [7] -1.7708333 0.3541667 -2.5208333 12.7291667
##
## $spoints.x
## [1] 2.0 6.5 10.0
##
## $spoints.y
## [1] 7 5 6
Compare the least-squares and resistant fits for the original data and for the new data. Have you demonstrated in this example that the resistant fit is indeed resistant to outlying points? Adding one outlier made the slope of the least-squares line go from negative to positive, which is a very drastic change. In comparison, the resistant line equation hardly changed when this outlier was added to the data. This demonstrates that the resistant fit is more resistant to outlying points.
Demonstrate the differences between the two fits for the new data by plotting the data and both lines on the same graph using contrasting colors. In the plot below, the blue line is the resistant line, and the red line is the least-squares line.
ggplot(new_data, aes(x1, y1)) +
geom_point() +
geom_abline(slope=-0.125, intercept=5.96, col="blue") +
geom_abline(slope=0.38, intercept=4.41, col="red")
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:LearnEDAfunctions':
##
## farms
## The following object is masked from 'package:dplyr':
##
## select
fit_rlm<-rlm(y1~x1)
fit_lqs<-lqs(y1~x1)
summary(fit_rlm)
##
## Call: rlm(formula = y1 ~ x1)
## Residuals:
## Min 1Q Median 3Q Max
## -1.8623 -1.1078 0.2561 0.8966 13.6644
##
## Coefficients:
## Value Std. Error t value
## (Intercept) 7.3889 1.1584 6.3784
## x1 -0.2544 0.1602 -1.5885
##
## Residual standard error: 1.498 on 8 degrees of freedom
summary(fit_lqs)
## Length Class Mode
## crit 1 -none- numeric
## sing 1 -none- character
## coefficients 2 -none- numeric
## bestone 2 -none- numeric
## fitted.values 10 -none- numeric
## residuals 10 -none- numeric
## scale 2 -none- numeric
## terms 3 terms call
## call 2 -none- call
## xlevels 0 -none- list
## model 2 data.frame list