Make a Scatter Plot of the Data

Figure 1 is a scatter plot of Days versus Index.

Figure 1: Scatter Plot

What are the Least Squares estimates of the parameters of a simple linear regression?

The least squares estimates of \(\beta_0\) and \(\beta_1\) are \(\hat\beta_0 = -192.984\) and \(\hat\beta_1 = 15.296\). Substituting these estimates into the simple linear regression model gives the fitted equation Days = -192.984 + 15.296*Index.
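The least squares estimates have a closed form: \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The R sketch below checks this formula against `lm()`. The original `FA4_data` is read from an Excel file that is not reproduced here, so the sketch uses a small simulated data set (`sim_data`, hypothetical) of the same size. Its estimates will therefore not match the ones above.

```r
# Closed-form least squares estimates for simple linear regression, checked against lm().
# sim_data is HYPOTHETICAL simulated data, not the FA4_data set analysed in this report.
set.seed(1)
n <- 16  # same sample size as FA4_data (14 residual DF + 2 parameters)
sim_data <- data.frame(Index = runif(n, min = 0, max = 10))
sim_data$Days <- 10 + 5 * sim_data$Index + rnorm(n, sd = 3)

ls_estimates <- function(x, y) {
  Sxy <- sum((x - mean(x)) * (y - mean(y)))  # corrected cross-product
  Sxx <- sum((x - mean(x))^2)                # corrected sum of squares of x
  b1 <- Sxy / Sxx                            # slope estimate
  b0 <- mean(y) - b1 * mean(x)               # intercept estimate
  c(b0 = b0, b1 = b1)
}

manual <- ls_estimates(sim_data$Index, sim_data$Days)
fit <- lm(Days ~ Index, data = sim_data)

print(manual)
print(coef(fit))
```

The two printed vectors agree up to floating-point error. `lm()` reaches the same answer through a QR decomposition instead of the explicit formula.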

Plot the Least Squares Line on the Scatter Plot

Figure 2 overlays the least squares line, Days = -192.984 + 15.296*Index, on the scatter plot of the data.

Figure 2: Linear Fit Graph

Are the Linear Model assumptions valid?

Normal Probability Plot

Figure 3 is the normal probability (Q-Q) plot of the standardized residuals from our linear model.

Because most of the standardized residuals fall close to the reference line, we concluded that the normality assumption is reasonable for this model.
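The standardized residuals in this plot are \(r_i = e_i / \left(s\sqrt{1 - h_{ii}}\right)\). Here \(e_i\) is the raw residual, \(s\) is the residual standard error, and \(h_{ii}\) is the leverage. The minimal R sketch below computes them by hand and draws the same kind of normal probability plot. `sim_data` is hypothetical simulated data used only for illustration.

```r
# Standardized residuals computed by hand, then a normal probability plot.
# sim_data is HYPOTHETICAL simulated data, not FA4_data.
set.seed(2)
sim_data <- data.frame(Index = runif(16, min = 0, max = 10))
sim_data$Days <- 3 + 2 * sim_data$Index + rnorm(16)
fit <- lm(Days ~ Index, data = sim_data)

e <- residuals(fit)          # raw residuals e_i
h <- hatvalues(fit)          # leverages h_ii
s <- summary(fit)$sigma      # residual standard error
r_std <- e / (s * sqrt(1 - h))

# Normal probability plot: points near the line support the normality assumption
qqnorm(r_std, ylab = "Standardized residuals")
qqline(r_std)
```

`r_std` matches `rstandard(fit)`, which is the quantity `plot(model, which = 2)` puts on its vertical axis.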

Figure 3: Normal Probability Plot

Constant Variance

Figure 4 plots the residuals against the fitted values. The points show a random scatter with no apparent pattern, so we concluded that the constant variance assumption is reasonable for our linear model.
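Least squares residuals from a model with an intercept always sum to zero and are uncorrelated with the fitted values. A residuals-versus-fitted plot is therefore centred on the zero line by construction. Only the spread or shape of the cloud, such as a funnel or a curve, can reveal non-constant variance or a missing term. The base-R sketch below illustrates this on hypothetical simulated data (`sim_data`). It is an alternative to the ggplot2 version in the Code section, not the report's own code.

```r
# Residuals vs fitted values, with checks of the two built-in least squares properties.
# sim_data is HYPOTHETICAL simulated data, not FA4_data.
set.seed(3)
sim_data <- data.frame(Index = runif(16, min = 0, max = 10))
sim_data$Days <- 3 + 2 * sim_data$Index + rnorm(16)
fit <- lm(Days ~ Index, data = sim_data)

e <- residuals(fit)
yhat <- fitted(fit)

plot(yhat, e, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero

# Both quantities are zero up to floating-point error
print(c(sum_of_residuals = sum(e), cor_with_fitted = cor(e, yhat)))
```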

Figure 4: Constant Variance Plot

Is the Regression Model Significant?

## 
## Call:
## lm(formula = Days ~ Index, data = FA4_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \neq 0\)

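From the summary table, the test statistic for the slope is \(t_0 = 1.624\) with a p-value of 0.127, and \(R^2 = 0.1585\). The overall F-test gives the same p-value (0.1267), because \(F_0 = t_0^2\) in simple linear regression. This p-value is larger than any conventional significance level (for example \(\alpha = 0.05\)). We therefore fail to reject the null hypothesis that \(\beta_1 = 0\) and conclude that the regression model is not significant.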
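The slope test statistic is \(t_0 = \hat\beta_1 / \operatorname{se}(\hat\beta_1)\), compared against a t distribution with \(n - 2 = 14\) degrees of freedom. As a consistency check, the R sketch below recomputes \(t_0\), its two-sided p-value, and \(F_0 = t_0^2\) from the rounded estimate and standard error printed in the summary. The 5% significance level used for the critical value is an assumption, since the analysis does not state one.

```r
# Recompute the slope t-test from the rounded values printed by summary(FA4_model).
b1_hat <- 15.296   # estimated slope
se_b1  <- 9.421    # standard error of the slope
df_res <- 14       # residual degrees of freedom, n - 2 with n = 16

t0 <- b1_hat / se_b1                                         # test statistic under H0: beta_1 = 0
p_value <- 2 * pt(abs(t0), df = df_res, lower.tail = FALSE)  # two-sided p-value
F0 <- t0^2                                                   # overall F-statistic equals t0^2 here

alpha <- 0.05                             # ASSUMED significance level
t_crit <- qt(1 - alpha / 2, df = df_res)  # two-sided critical value
reject_H0 <- abs(t0) > t_crit

print(round(c(t0 = t0, p_value = p_value, F0 = F0, t_crit = t_crit), 4))
print(reject_H0)
```

Up to rounding, `t0`, `p_value`, and `F0` match the t value, p-value, and F-statistic in the summary output. `reject_H0` is `FALSE` because \(|t_0|\) is below the critical value of about 2.145.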

Since \(R^2 = 0.1585\) is low, the fitted line explains only about 16% of the variation in Days. The model therefore does not describe the relationship between the two variables well.
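\(R^2 = 1 - SS_E/SS_T\) is the fraction of the total variation in Days explained by the fitted line. The sums of squares are not printed above. In simple linear regression, however, the F-statistic satisfies \(F_0 = R^2(n-2)/(1-R^2)\), so \(R^2 = F_0/(F_0 + n - 2)\). The short R sketch below uses this identity with the reported values as a consistency check. It is an illustration, not part of the original analysis.

```r
# Recover R^2 and adjusted R^2 from the F-statistic reported by summary(FA4_model).
F0 <- 2.636       # F-statistic on 1 and 14 DF
n <- 16           # sample size (14 residual DF + 2 estimated parameters)
df_res <- n - 2

R2 <- F0 / (F0 + df_res)                   # inverts F0 = R^2 (n - 2) / (1 - R^2)
adj_R2 <- 1 - (1 - R2) * (n - 1) / df_res  # adjusted R^2 for one predictor

print(round(c(R2 = R2, adj_R2 = adj_R2), 4))
```

Both values agree with the Multiple R-squared (0.1585) and Adjusted R-squared (0.09835) in the summary, up to rounding of the F-statistic.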

Code

library(tidyverse)
library(readxl)
library(ggpmisc)

# Read the data; file.choose() opens an interactive file picker
FA4_data <- read_excel(file.choose())

# Fit the simple linear regression of Days on Index
FA4_model <- lm(Days ~ Index, data = FA4_data)

# Figure 1: scatter plot of Days vs Index
Scatter_plot <- ggplot(data = FA4_data, aes(Index, Days)) +
  geom_point(colour = "blue3") +
  ggtitle("Regression Analysis: Days vs Index") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
Scatter_plot

# Figure 2: scatter plot with the least squares line and its confidence band
Scatter_plot + geom_smooth(method = "lm", formula = y ~ x, colour = "red4", fill = "bisque")

# Estimated coefficients, rounded to 2 decimal places
Regression_coefficients <- round(coef(FA4_model), 2)

# Store residuals and fitted values for the diagnostic plots
FA4_data$Residuals <- residuals(FA4_model)
FA4_data$Fitted_values <- fitted(FA4_model)

## Figure 3: normal Q-Q plot of the standardized residuals
plot(FA4_model, which = 2)

## Figure 4: constant variance check (residuals vs fitted values)
ggplot(data = FA4_data, aes(x = Fitted_values, y = Residuals)) +
  geom_point(colour = "blue3") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

## Independence: the residuals look like they are decreasing over time
# number_of_rows <- data.frame(seq(1, nrow(FA4_data), 1))
# New_data <- cbind.data.frame(number_of_rows, FA4_model$residuals)
# colnames(New_data) <- c("Time Order", "Residuals")
# ggplot(data = New_data, aes(`Time Order`, Residuals)) + geom_point(colour = "blue3") +
#   theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
#   geom_hline(yintercept = 0)

# Regression summary used for the significance test
summary(FA4_model)