\(~\)
data <- read.table("./Diretorio/exer.txt", header = TRUE)
data <- data.frame(Advertisement = data$Advert., Sales = data$Sales)
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")
I expect the regression line to run through the lower half of the diagram, with the single observation near 50 sales standing out as an outlier.
\(~\)
Regression <- lm(Sales~Advertisement, data = data)
summary(Regression)$coefficients
##                Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)   29.626893   4.881527  6.0691852 9.784098e-06
## Advertisement -0.324575   0.458911 -0.7072722 4.884540e-01
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")
abline(Regression)
We fail to reject the null hypothesis that B = 0: the t-value of b (-0.71) lies inside the interval from -2 to 2, and the p-value (0.49) is well above 0.05. This means that B is not significantly different from 0.
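As a rough check, the t-value and p-value for the slope can be reproduced by hand from the printed estimate and standard error. The sketch below assumes 18 residual degrees of freedom (20 observations minus 2 estimated coefficients); the variable names are illustrative only.

```r
# Values copied from the slope row of the coefficient table above.
estimate  <- -0.324575
std_error <- 0.458911
df_resid  <- 20 - 2  # assumption: 20 observations, 2 estimated coefficients

t_value  <- estimate / std_error                  # test statistic for H0: slope = 0
p_value  <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value
critical <- qt(0.975, df = df_resid)              # exact 5% cutoff behind the "rule of 2"

# TRUE would mean rejecting H0 at the 5% level.
abs(t_value) > critical
```

The "rule of 2" is an approximation of `critical`, which is slightly above 2 for samples of this size.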
\(~\)
sort(unname(residuals(Regression)))
## [1] -4.67944359 -4.35486862 -4.03029366 -3.05656878 -3.03029366 -2.70571870
## [7] -2.67944359 -1.73199382 -1.70571870 -1.70571870 -1.05656878 -0.05656878
## [13] 0.26800618 0.59258114 0.59258114 0.94343122 1.26800618 2.24173107
## [19] 2.56630603 22.32055641
hist(Regression$residuals, xlab = "Residuals", main = "Residuals Histogram", col = "brown4")
Most residuals lie within 5 sales units of the regression line (from about -4.7 to 2.6), while a single outlier lies about 22.3 units above it.
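One way to make "outlier" less subjective is to compare each residual with the residual standard error. The sketch below reuses the residuals printed above; the cutoff of 3 standard errors is an assumption chosen for illustration, not a rule taken from the exercise.

```r
# Residuals copied from the sorted output above.
res <- c(-4.67944359, -4.35486862, -4.03029366, -3.05656878, -3.03029366,
         -2.70571870, -2.67944359, -1.73199382, -1.70571870, -1.70571870,
         -1.05656878, -0.05656878,  0.26800618,  0.59258114,  0.59258114,
          0.94343122,  1.26800618,  2.24173107,  2.56630603, 22.32055641)

# Residual standard error of a simple regression: sqrt(RSS / (n - 2)).
sigma_hat <- sqrt(sum(res^2) / (length(res) - 2))

# Assumption: flag residuals larger than 3 residual standard errors.
flagged <- res[abs(res) > 3 * sigma_hat]
flagged
```

Because these values are sorted, their positions do not match row numbers in the data; on the fitted model, `which.max(abs(residuals(Regression)))` would return the row of the largest residual.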
\(~\)
I would delete the observation that produces the outlying residual, since it corresponds to an unusual and irregular event.
\(~\)
data <- data[-12,]
Regression <- lm(Sales~Advertisement, data = data)
summary(Regression)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     21.125 0.95484809 22.123938 5.715772e-14
## Advertisement    0.375 0.08819642  4.251873 5.379453e-04
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")
abline(Regression)
In this case, we reject the null hypothesis that B = 0: the t-value of b (4.25) is above 2, and the p-value (about 0.0005) is well below 0.05. This means that B is significantly different from 0.
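The same conclusion can be read from a 95% confidence interval for the slope: if the interval excludes 0, the slope is significant at the 5% level. This is a minimal sketch built from the printed estimate and standard error, assuming 17 residual degrees of freedom (19 remaining observations minus 2 coefficients).

```r
# Values copied from the slope row of the second coefficient table.
estimate  <- 0.375
std_error <- 0.08819642
df_resid  <- 19 - 2  # assumption: 19 observations remain after dropping one row

# Half-width of the 95% interval: t quantile times the standard error.
margin <- qt(0.975, df = df_resid) * std_error
ci <- c(lower = estimate - margin, upper = estimate + margin)
ci
```

On the fitted model itself, `confint(Regression, "Advertisement")` should give the same interval up to rounding.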
\(~\)
It’s evident that removing the irregular outlier made the relationship between the two variables clearer and more reliable: the slope changed from a non-significant -0.32 to a significant 0.375. I’ve learned from these results that it’s important to analyse the given data critically in order to avoid misleading results caused by random, unrelated circumstances.
\(~\) \(~\) \(~\)