Problem: fit a simple linear regression relating two county-level variables: the percentage of individuals in the county with at least a high-school diploma (column dip) and the crime rate per 100,000 residents (column rate). The crime rate is the response and the diploma percentage is the predictor.
  1. Scatter plot:

  1. The estimated regression line:
## 
## Call:
## lm(formula = crime$rate ~ crime$dip)
## 
## Coefficients:
## (Intercept)    crime$dip  
##     20517.6       -170.6
  1. The slope is negative: a one-percentage-point increase in the high-school graduation rate is associated with an estimated average decrease of 170.6 in the crime rate per 100,000 residents. This describes an association in the data, not a proven causal effect.
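
To make the interpretation of the slope concrete, here is a small arithmetic sketch using the coefficients reported by lm() above. The diploma percentages 80 and 81 are hypothetical values picked for illustration, and the sketch assumes they fall within the range of dip observed in the data.

```r
# Coefficients copied from the lm() output above
b0 <- 20517.6   # intercept
b1 <- -170.6    # slope for dip

# Hypothetical diploma percentages, chosen only for illustration
pred_80 <- b0 + b1 * 80
pred_81 <- b0 + b1 * 81

pred_80            # predicted crime rate per 100,000 at dip = 80
pred_81 - pred_80  # change for a one-point increase; equals the slope
```

The difference between the two predictions is the slope itself, which is why the slope can be read as the change in predicted crime rate per one-point change in dip.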

  2. Outliers, given as (dip, rate): (78, 14016) and (77, 2105)
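
The report does not say how the outliers were picked, so the following is only one possible approach, not necessarily the one used here: flag observations whose standardized residual exceeds 2 in absolute value. The cutoff of 2 is a common rule of thumb and an assumption, and the data frame toy is made up for illustration, with one point deliberately shifted off the line.

```r
# Made-up data: rate is exactly linear in dip except for one shifted point
toy <- data.frame(dip = seq(60, 96, by = 4))
toy$rate <- 20000 - 170 * toy$dip
toy$rate[5] <- toy$rate[5] + 6000   # plant one unusually high crime rate

toy.model <- lm(rate ~ dip, data = toy)

# Standardized residuals: each residual divided by its estimated standard deviation
std.res <- rstandard(toy.model)

# Flag observations beyond the (assumed) cutoff of 2
flagged <- which(abs(std.res) > 2)
toy[flagged, ]
```

With the real data, the same two calls (rstandard and which) could be applied to the fitted model to see whether the two points listed above are flagged.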

  3. QQ plot of the residuals:

As can be seen from the QQ plot, the residuals do not appear to be normally distributed.
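
A visual reading of a QQ plot can be supplemented with a formal test. The sketch below applies the Shapiro-Wilk test (shapiro.test in base R) to residuals from simulated data with deliberately skewed errors; the data and variable names are assumptions for illustration and are not the crime data.

```r
set.seed(1)

# Simulated predictor and skewed, mean-zero errors (illustration only)
n <- 200
x <- runif(n, 60, 95)
errors <- rexp(n) - 1
y <- 5 + 2 * x + errors

sim.model <- lm(y ~ x)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(sim.model))
sw$p.value   # a small p-value is evidence against normality
```

On the real model, the analogous call would be shapiro.test(the.model$residuals).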

  1. Plot of the errors vs. the fitted values:

According to the plot, the variance of the errors does not appear to be constant.
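
The visual impression of non-constant variance can also be checked numerically. The sketch below is one possible approach, not something taken from this report: a studentized Breusch-Pagan-style check done by hand in base R, regressing the squared residuals on the predictor and comparing n times the R-squared of that auxiliary regression to a chi-squared distribution with 1 degree of freedom. The data are simulated so that the error spread grows with x.

```r
set.seed(1)

# Simulated data whose error standard deviation grows with x (illustration only)
n <- 200
x <- seq(1, 10, length.out = n)
y <- 3 + 2 * x + rnorm(n, sd = x)

het.model <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the predictor
res2 <- residuals(het.model)^2
aux <- lm(res2 ~ x)

# Test statistic n * R^2, compared to chi-squared with 1 degree of freedom
stat <- n * summary(aux)$r.squared
p.value <- pchisq(stat, df = 1, lower.tail = FALSE)
p.value   # a small p-value is evidence against constant variance
```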

  1. 95% confidence interval for the slope:
##                  2.5 %      97.5 %
## (Intercept) 13997.3245 27037.87538
## crime$dip    -253.2798   -87.87061

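I am 95% confident that a one-percentage-point increase in the high-school graduation rate is associated with an average decrease in the crime rate of between 87.9 and 253.3 per 100,000 residents. The interval excludes zero, which supports a negative linear association, but it is wide (the difference between the bounds is about 165), so the size of the slope is estimated imprecisely. Because the QQ plot and the plot of errors vs. fitted values suggest non-normal errors with non-constant variance, this interval should be read with caution.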
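
For reference, the interval reported by confint() for the slope is the estimate plus or minus a t critical value (with n - 2 degrees of freedom) times the standard error. The sketch below reproduces that calculation by hand on simulated data; the data and the names dip and rate in the simulation are assumptions for illustration only.

```r
set.seed(1)

# Simulated data loosely shaped like the problem (illustration only)
dip <- runif(50, 60, 95)
rate <- 20000 - 170 * dip + rnorm(50, sd = 1500)
fit <- lm(rate ~ dip)

# Slope estimate and its standard error from the coefficient table
est <- coef(summary(fit))["dip", "Estimate"]
se <- coef(summary(fit))["dip", "Std. Error"]

# t critical value for a 95% interval with n - 2 degrees of freedom
tcrit <- qt(0.975, df = df.residual(fit))

manual <- c(est - tcrit * se, est + tcrit * se)
builtin <- confint(fit, "dip", level = 0.95)

manual
builtin   # same two bounds as the manual calculation
```

The width of the interval is 2 * tcrit * se, so a wide interval reflects a large standard error of the slope rather than, by itself, a weak relationship.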

Appendix Code

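# Load the county-level data; expects columns dip and rate
crime = read.table("~/Desktop/crime.csv", header = TRUE, sep = ",")
#1.a: scatter plot with the fitted regression line
plot(crime$dip, crime$rate, xlab = "diploma %", ylab = "crime rate", main = "Diploma vs Crime rate", pch = 20, col = "darkblue")
the.model = lm(crime$rate ~ crime$dip)
abline(the.model, lwd = 2)
#1.b: estimated coefficients
the.model
#1.d

#1.e: normal QQ plot of the residuals with reference line
qqnorm(the.model$residuals, col = "darkblue")
qqline(the.model$residuals)
#1.f: residuals vs. fitted values
plot(the.model$fitted.values, the.model$residuals, xlab = "Fitted Values", ylab = "errors", main = "Constancy of variance of the errors", col = "darkblue")

#1.g: 95% confidence intervals for the coefficients
confint(the.model, level = 0.95)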