Read the data
df <- read.csv("int92.csv")
Our variables of interest
Performance (perf) is the dependent variable and clock is the independent variable.
plot(df[, "clock"], df[, "perf"], main = "Int92",
     xlab = "Clock", ylab = "Performance")
The scatter plot suggests a roughly linear, positive relationship between clock and performance.
Fit the model
fit <- lm(perf~clock, data = df)
summary(fit)
##
## Call:
## lm(formula = perf ~ clock, data = df)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -112.677  -34.603    0.681   24.328  158.241
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.02525   12.24693   1.309    0.195
## clock        0.80239    0.07982  10.053 1.32e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51.5 on 76 degrees of freedom
## Multiple R-squared: 0.5708, Adjusted R-squared: 0.5651
## F-statistic: 101.1 on 1 and 76 DF, p-value: 1.32e-15
The model's R-squared is 0.5708, so clock explains about 57% of the variability in performance.
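As a quick sanity check, the R-squared and adjusted R-squared can be approximately recovered from the F-statistic and degrees of freedom printed in the summary output. This is only a sketch: the variable names are illustrative, and because the printed F-statistic is rounded, the results agree with the summary only to about three decimals.

```r
# Values copied from the summary(fit) output above
f_stat <- 101.1   # F-statistic (rounded in the printout)
df_num <- 1       # numerator degrees of freedom (one predictor)
df_den <- 76      # residual degrees of freedom

# For a model with an intercept: R^2 = F * df_num / (F * df_num + df_den)
r_squared <- (f_stat * df_num) / (f_stat * df_num + df_den)

# Adjusted R^2 penalises for the number of predictors
n <- df_den + df_num + 1   # number of observations implied by the df
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_den

print(round(r_squared, 3))
print(round(adj_r_squared, 3))
```

Both values should land close to the 0.5708 and 0.5651 reported by `summary(fit)`.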
The p-value for the slope on clock is 1.32e-15, which is below 0.05, so the slope is statistically significant at the 5% level.
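The test on the slope can be reproduced by hand from the estimate, its standard error, and the residual degrees of freedom in the summary output. The sketch below uses illustrative variable names and also adds a 95% confidence interval for the slope, which the original output does not print; with the fitted object available, `confint(fit)` would be the usual way to get it.

```r
# Values copied from the "clock" row of the coefficient table above
estimate  <- 0.80239
std_error <- 0.07982
df_resid  <- 76

# t statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval for the slope
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

print(t_value)
print(p_value)
print(ci)
```

The interval comes out at roughly 0.64 to 0.96 and excludes zero, which is consistent with the very small p-value.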
Fitted regression model: \(\widehat{Performance}_{i} = 16.02525 + 0.80239 \times clock_{i}\)
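To make the fitted line concrete, the coefficients from the summary output can be wrapped in a small function. `predict_perf` is a hypothetical helper and the clock values 100 and 200 are arbitrary illustrations, not values taken from the data; with the fitted object available, `predict(fit, newdata = ...)` would do the same job.

```r
# Hypothetical helper: evaluates the fitted line using the
# coefficients printed by summary(fit)
predict_perf <- function(clock) {
  intercept <- 16.02525
  slope     <- 0.80239
  intercept + slope * clock
}

# Arbitrary example clock values
preds <- predict_perf(c(100, 200))
print(preds)

# The difference equals slope * 100: each extra unit of clock
# adds about 0.8 units of predicted performance
print(diff(preds))
```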
Residual Analysis
par(mfrow = c(2,2))
plot(fit)
In the residuals vs fitted plot the residuals show a pattern, so the assumption of equal variance appears to be violated. The normal Q-Q plot also suggests that the residuals are not normally distributed.
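The visual impressions above can be backed up with formal tests. The sketch below is not part of the original analysis: it runs on simulated data (made up here so the block is self-contained, with error spread that grows with clock) and applies the Shapiro-Wilk test for normality plus a hand-computed studentized (Koenker) Breusch-Pagan test for non-constant variance. On the real data one would use `residuals(fit)` and `df$clock` instead.

```r
set.seed(1)

# Simulated data, NOT the int92 data: spread increases with clock
n     <- 78
clock <- runif(n, 50, 300)
perf  <- 16 + 0.8 * clock + rnorm(n, sd = 0.2 * clock)

sim_fit <- lm(perf ~ clock)
res     <- residuals(sim_fit)

# Shapiro-Wilk test; H0: residuals are normally distributed
sw <- shapiro.test(res)

# Studentized Breusch-Pagan test; H0: constant error variance.
# Regress squared residuals on the predictor; n * R^2 is
# approximately chi-squared with 1 degree of freedom under H0.
aux     <- lm(I(res^2) ~ clock)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(sw$p.value)
print(c(statistic = bp_stat, p_value = bp_p))
```

Small p-values from either test would support the conclusions drawn from the diagnostic plots.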