\(~\)

(a) Make the scatter diagram with sales on the vertical axis and advertising on the horizontal axis. What do you expect to find if you would fit a regression line to these data?
data <- read.table("./Diretorio/exer.txt", header = TRUE)
data <- data.frame(Advertisement = data$Advert., Sales = data$Sales)
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")

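I expect most observations to cluster in the lower half of the diagram, with the single observation at about 50 sales standing out as an outlier. Because least squares is sensitive to such points, a regression line fitted to all the data will probably be pulled toward that observation and may not describe the relationship in the remaining weeks well.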

\(~\)

(b) Estimate the coefficients a and b in the simple regression model with sales as dependent variable and advertising as explanatory factor. Also compute the standard error and t-value of b. Is b significantly different from 0?
Regression <- lm(Sales~Advertisement, data = data)
summary(Regression)$coefficients
##                Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)   29.626893   4.881527  6.0691852 9.784098e-06
## Advertisement -0.324575   0.458911 -0.7072722 4.884540e-01
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")
abline(Regression)

The estimates are a = 29.627 and b = -0.325, with a standard error of 0.459 and a t-value of -0.707 for b (p-value 0.488). Since the t-value lies between -2 and 2, we fail to reject the null hypothesis that b = 0 at the 5% level. This means that b is not significantly different from 0.
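As a check on the reported t-value, it can be reproduced by hand as the estimate divided by its standard error, using the numbers from the coefficient table above:

```latex
\[
t_b = \frac{b}{\mathrm{SE}(b)} = \frac{-0.324575}{0.458911} \approx -0.707
\]
```

With 20 observations and two estimated coefficients there are 18 degrees of freedom, for which the two-sided 5% critical value is about 2.10, so the rule of thumb |t| > 2 leads to the same conclusion here.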

\(~\)

(c) Compute the residuals and draw a histogram of these residuals. What conclusion do you draw from this histogram?

sort(unname(residuals(Regression)))
##  [1] -4.67944359 -4.35486862 -4.03029366 -3.05656878 -3.03029366 -2.70571870
##  [7] -2.67944359 -1.73199382 -1.70571870 -1.70571870 -1.05656878 -0.05656878
## [13]  0.26800618  0.59258114  0.59258114  0.94343122  1.26800618  2.24173107
## [19]  2.56630603 22.32055641
hist(Regression$residuals, xlab = "Residuals", main = "Residuals Histogram", col = "brown4")

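All but one of the residuals lie within about 5 units of zero (between -4.68 and 2.57), while a single residual of 22.32 falls in the 20 to 25 bin, far to the right of the rest. The histogram is therefore strongly right-skewed and points to one outlying observation, not to normally distributed disturbances.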

\(~\)

(d) Apparently, the regression result of part (b) is not satisfactory. Once you realize that the large residual corresponds to the week with opening hours during the evening, how would you proceed to get a more satisfactory regression model?

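I would remove the observation behind the outlying residual (the week with evening opening hours) and re-estimate the model on the remaining weeks, since that week reflects an unusual and irregular event and not the usual relationship between advertising and sales.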
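One way to locate that observation without reading it off the plot is to look for the largest absolute residual. The following is a minimal sketch, assuming `Regression` and `data` are still the objects created in parts (a) and (b); the names `res` and `outlier_row` are introduced here only for illustration.

```r
# Assumes `Regression` and `data` come from parts (a) and (b) above.
res <- residuals(Regression)

# Position of the observation with the largest absolute residual
outlier_row <- which.max(abs(res))

# Inspect that week before deciding whether to drop it
data[outlier_row, ]
```

The row this returns should be checked against what is known about the special week before it is removed from the sample.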

\(~\)

(e) Delete this special week from the sample and use the remaining 19 weeks to estimate the coefficients a and b in the simple regression model with sales as dependent variable and advertising as explanatory factor. Also compute the standard error and t-value of b. Is b significantly different from 0?

# Drop row 12, the week with evening opening hours
data <- data[-12,]
Regression <- lm(Sales~Advertisement, data = data)
summary(Regression)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     21.125 0.95484809 22.123938 5.715772e-14
## Advertisement    0.375 0.08819642  4.251873 5.379453e-04
plot(data, main = "Scatter Diagram", pch = 19, col = "brown4")
abline(Regression)

In this case the estimates are a = 21.125 and b = 0.375, with a standard error of 0.088 and a t-value of 4.252 for b (p-value 0.0005). Since the t-value is well above 2, we reject the null hypothesis that b = 0 at the 5% level. This means that b is significantly different from 0.
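The rule of thumb |t| > 2 can also be replaced by the exact critical value of the t-distribution. The following is a minimal sketch, assuming `Regression` is the model refitted on the 19 remaining weeks; `coefs`, `t_b` and `t_crit` are names introduced here for illustration.

```r
# Assumes `Regression` is the model estimated in this part (19 weeks).
coefs <- summary(Regression)$coefficients

# t-value of b: estimate divided by its standard error
t_b <- coefs["Advertisement", "Estimate"] / coefs["Advertisement", "Std. Error"]

# Two-sided 5% critical value with n - 2 residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(Regression))

# TRUE means b is significantly different from 0 at the 5% level
abs(t_b) > t_crit
```

With 17 residual degrees of freedom the critical value is about 2.11, so the t-value of 4.25 reported above clears it comfortably.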

\(~\)

(f) Discuss the differences between your findings in parts (b) and (e). Describe in words what you have learned from these results.

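Removing the irregular observation makes the relationship between the two variables much clearer and more reliable: the slope changes from a negative, insignificant estimate (b = -0.325, t = -0.71) to a positive, significant one (b = 0.375, t = 4.25), and its standard error falls from 0.459 to 0.088. I’ve learned from these results that it’s important to examine the data critically in order to avoid misleading results caused by unusual, unrelated circumstances.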