One factor linear regression model

Read the data

df <- read.csv("int92.csv")

Our variables of interest

Performance (perf) is the dependent variable and clock is the independent variable.

plot(df[,"clock"],df[,"perf"], main="Int92",
    xlab="Clock", ylab="Performance")

From the scatter plot we can see that there is a roughly linear, positive relationship between clock and performance.

Fit the model

fit <- lm(perf~clock, data = df)
summary(fit)

## 
## Call:
## lm(formula = perf ~ clock, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -112.677  -34.603    0.681   24.328  158.241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.02525   12.24693   1.309    0.195    
## clock        0.80239    0.07982  10.053 1.32e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 51.5 on 76 degrees of freedom
## Multiple R-squared:  0.5708, Adjusted R-squared:  0.5651 
## F-statistic: 101.1 on 1 and 76 DF,  p-value: 1.32e-15

The model's R-squared value is 0.5708, so clock explains about 57% of the variability in performance.
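As a quick consistency check, the adjusted R-squared and the F-statistic can be recovered from the R-squared and the residual degrees of freedom printed by summary(fit). This is a minimal sketch that uses only numbers copied from the output; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the summary(fit) output
r_squared    <- 0.5708
df_residual  <- 76
n_predictors <- 1
n_obs        <- df_residual + n_predictors + 1   # 78 observations used in the fit

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / df_residual   # about 0.565

# F-statistic for the overall regression
f_statistic <- (r_squared / n_predictors) / ((1 - r_squared) / df_residual)   # about 101.1

# With a single predictor, the correlation between clock and perf is
# sqrt(R-squared), with the sign of the slope (positive here)
correlation <- sqrt(r_squared)

round(c(adj_r_squared = adj_r_squared,
        f_statistic   = f_statistic,
        correlation   = correlation), 4)
```

Both recovered values agree with the printed adjusted R-squared (0.5651) and F-statistic (101.1) up to rounding.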

The p-value for the slope parameter clock is 1.32e-15, which is less than 0.05, so the slope is statistically significant at the 5% level of significance.
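The t value and p-value for the slope can be reproduced by hand from the estimate and its standard error in the coefficient table. The sketch below also adds a 95% confidence interval for the slope, which is an extra step not shown in the original output; the variable names are illustrative.

```r
# Values copied from the coefficient table in summary(fit)
estimate    <- 0.80239
std_error   <- 0.07982
df_residual <- 76

# t statistic for H0: slope = 0
t_value <- estimate / std_error   # about 10.05

# Two-sided p-value from the t distribution with 76 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_residual)

# 95% confidence interval for the slope (an addition, not in the original)
t_crit   <- qt(0.975, df = df_residual)
conf_int <- estimate + c(-1, 1) * t_crit * std_error

t_value
p_value
conf_int
```

If the fitted object is available, confint(fit) should give the same interval up to rounding of the inputs.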

Fitted regression model : \(\widehat{Performance}_{i} = 16.02525 + 0.80239 \times clock_{i}\)
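To illustrate how the fitted line is used, the sketch below plugs clock values into the estimated equation, with the coefficients copied from the summary(fit) output. The helper predict_perf and the example clock values are hypothetical and are not taken from int92.csv.

```r
# Coefficients copied from the summary(fit) output
intercept <- 16.02525
slope     <- 0.80239

# Hypothetical helper: predicted performance for a given clock value
predict_perf <- function(clock) {
  intercept + slope * clock
}

# Hypothetical clock values chosen only for illustration
example_clock <- c(100, 200)
predict_perf(example_clock)   # 96.26425 176.50325
```

With the model object at hand, predict(fit, newdata = data.frame(clock = example_clock)) would be the more usual way to obtain the same predictions.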

Residual Analysis

par(mfrow = c(2,2))
plot(fit)

The residuals vs fitted plot shows a pattern in the residuals, which suggests that the assumption of equal variance is violated. The normal Q-Q plot also indicates that the normality assumption for the residuals is not met.
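The visual impression from the diagnostic plots can be complemented with formal tests. The sketch below is one possible approach, not part of the original analysis: a hypothetical helper check_residuals that runs a Shapiro-Wilk test for normality and a hand-computed, studentized Breusch-Pagan-style test for constant variance. It is demonstrated on synthetic data, not on int92.csv, so the printed values say nothing about the model above.

```r
# Hypothetical helper: formal checks for a one-factor lm fit
check_residuals <- function(fit) {
  res <- residuals(fit)

  # Shapiro-Wilk test; H0: the residuals are normally distributed
  shapiro_p <- shapiro.test(res)$p.value

  # Breusch-Pagan-style test (studentized form), computed by hand:
  # regress the squared residuals on the fitted values; under constant
  # variance, n * R^2 is approximately chi-squared with 1 df
  sq_res   <- res^2
  fit_vals <- fitted(fit)
  aux      <- lm(sq_res ~ fit_vals)
  bp_stat  <- length(res) * summary(aux)$r.squared
  bp_p     <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

  list(shapiro_p = shapiro_p, bp_stat = bp_stat, bp_p = bp_p)
}

# Synthetic data for illustration only: the error spread grows with x
set.seed(1)
sim <- data.frame(x = runif(100, 50, 300))
sim$y <- 16 + 0.8 * sim$x + rnorm(100, sd = 0.2 * sim$x)

sim_fit <- lm(y ~ x, data = sim)
checks  <- check_residuals(sim_fit)
checks
```

Calling check_residuals(fit) on the model fitted to int92.csv would apply the same checks to the residuals discussed here; small p-values would support the conclusions drawn from the plots.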

DATA 605 Exercise Regression text Chapters 1-3

Yohannes Deboch

04/10/2019
