Data 605 Week 11 Discussion

Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?

This analysis looks at the diamonds dataset and measures the relationship between the carat and price attributes. The diamonds dataset contains the prices and attributes of nearly 54,000 diamonds (53,940 observations).

library(ggplot2)  # provides the diamonds dataset
plot(diamonds$carat, diamonds$price, main="Diamonds",
     xlab="Carat", ylab="Price")

diamonds.lm <- lm(price ~ carat, data=diamonds)

summary(diamonds.lm)

## 
## Call:
## lm(formula = price ~ carat, data = diamonds)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18585.3   -804.8    -18.9    537.4  12731.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2256.36      13.06  -172.8   <2e-16 ***
## carat        7756.43      14.07   551.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1549 on 53938 degrees of freedom
## Multiple R-squared:  0.8493, Adjusted R-squared:  0.8493 
## F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16

plot(fitted(diamonds.lm),resid(diamonds.lm))

qqnorm(resid(diamonds.lm))
qqline(resid(diamonds.lm))

par(mfrow=c(2,2))
plot(diamonds.lm)
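
The diagnostic plots can be complemented with a numeric check. The sketch below assumes the ggplot2 package is installed (it ships the diamonds data), refits the same model, and compares the residual spread across ten roughly equal-sized groups of fitted values. The group count and the names `fit`, `bins`, and `spread` are illustrative choices, not part of the original analysis.

```r
library(ggplot2)  # assumed installed; provides the diamonds dataset

# Same model as in the discussion
diamonds.lm <- lm(price ~ carat, data = diamonds)

# Split observations into 10 groups of roughly equal size by fitted value
fit <- fitted(diamonds.lm)
bins <- cut(rank(fit, ties.method = "first"), breaks = 10, labels = FALSE)

# Standard deviation of the residuals within each group
spread <- tapply(resid(diamonds.lm), bins, sd)
print(round(spread, 1))
```

Roughly constant values across the groups would be consistent with the constant-variance assumption of the linear model, while a clear trend would suggest the residual spread depends on the fitted value.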

7756.43/14.07

## [1] 551.2743

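The slope estimate for carat divided by its standard error is 551.2743, which is the t statistic for the slope (the summary shows 551.4, presumably because it works from unrounded values). Such a large ratio tells us that the slope is estimated with very little variability relative to its size. Because the p-value is less than 0.05 and the R-squared is 0.8493, we can conclude that there is a strong linear relationship between a diamond’s carat and its price. For this pair of attributes the linear model was appropriate.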
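
To check the ratio against the model output, the sketch below pulls the carat estimate and its standard error from the coefficient table and compares their quotient with the t value that `summary()` reports. It assumes ggplot2 is installed for the diamonds data; `coefs` and `ratio` are illustrative names.

```r
library(ggplot2)  # assumed installed; provides the diamonds dataset

# Same model as in the discussion
diamonds.lm <- lm(price ~ carat, data = diamonds)

# Coefficient table: one row per term, with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(diamonds.lm))

# Carat estimate divided by its standard error
ratio <- coefs["carat", "Estimate"] / coefs["carat", "Std. Error"]

# Compare the quotient with the t value stored in the table
print(ratio)
print(all.equal(ratio, coefs["carat", "t value"]))
```

Working from the stored values avoids the small rounding difference that appears when the printed estimate and standard error are divided by hand.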

2022-04-01