Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages ------------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## Warning: package 'tibble' was built under R version 3.5.3
## Warning: package 'tidyr' was built under R version 3.5.3
## Warning: package 'readr' was built under R version 3.5.3
## Warning: package 'purrr' was built under R version 3.5.3
## Warning: package 'dplyr' was built under R version 3.5.3
## Warning: package 'forcats' was built under R version 3.5.3
## -- Conflicts --------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Read the male-cat data set (columns X, Sex, Bwt, Hwt) from GitHub.
cat <- read.csv("https://raw.githubusercontent.com/Zchen116/assignments/master/catsM.csv", header = TRUE, sep = ",")
head(cat)
## X Sex Bwt Hwt
## 1 1 M 2.0 6.5
## 2 2 M 2.0 6.5
## 3 3 M 2.1 10.1
## 4 4 M 2.2 7.2
## 5 5 M 2.2 7.6
## 6 6 M 2.2 7.9
summary(cat)
## X Sex Bwt Hwt
## Min. : 1 M:97 Min. :2.0 Min. : 6.50
## 1st Qu.:25 1st Qu.:2.5 1st Qu.: 9.40
## Median :49 Median :2.9 Median :11.40
## Mean :49 Mean :2.9 Mean :11.32
## 3rd Qu.:73 3rd Qu.:3.2 3rd Qu.:12.80
## Max. :97 Max. :3.9 Max. :20.50
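# Drop the row-index column X and keep Sex, Bwt, Hwt.
cats <- cat[2:4]
# Note: these columns appear to match MASS::cats, where Bwt is body weight (kg)
# and Hwt is heart weight (g), not height. The name "height" is kept so the
# output below still matches.
colnames(cats) <- c("sex", "weight", "height")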
head(cats)
## sex weight height
## 1 M 2.0 6.5
## 2 M 2.0 6.5
## 3 M 2.1 10.1
## 4 M 2.2 7.2
## 5 M 2.2 7.6
## 6 M 2.2 7.9
Visualization
cats_lm <- lm(cats$height ~ cats$weight)
plot(cats$weight, cats$height, main = "Weight Data for Domestic Cats", xlab = "Body weight (kg)", ylab = "Heart weight (g)")
abline(cats_lm)
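Since `tidyverse` is already loaded, the same scatter plot with a fitted line can also be drawn with `ggplot2`. The following is a minimal sketch, not part of the original analysis. It assumes the CSV corresponds to the male rows of `MASS::cats` (same `Bwt` and `Hwt` columns), so that the example runs without a network connection; `cats_m` and `p` are illustrative names.

```r
library(ggplot2)

# Assumption: the CSV used above matches the male rows of MASS::cats.
cats_m <- subset(MASS::cats, Sex == "M")

p <- ggplot(cats_m, aes(x = Bwt, y = Hwt)) +
  geom_point() +
  # method = "lm" overlays the least-squares line; se = TRUE adds a confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Weight Data for Domestic Cats", x = "Bwt", y = "Hwt")

print(p)
```

The confidence band gives a quick visual sense of how uncertain the fitted line is at the extremes of the predictor range.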

summary(cats_lm)
##
## Call:
## lm(formula = cats$height ~ cats$weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7728 -1.0478 -0.2976 0.9835 4.8646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1841 0.9983 -1.186 0.239
## cats$weight 4.3127 0.3399 12.688 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.557 on 95 degrees of freedom
## Multiple R-squared: 0.6289, Adjusted R-squared: 0.625
## F-statistic: 161 on 1 and 95 DF, p-value: < 2.2e-16
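To illustrate how the coefficient table above can be used, the sketch below fits the same kind of model through the formula interface with a `data` argument and predicts the response for one hypothetical cat. It assumes the CSV corresponds to the male rows of `MASS::cats`; the names `cats_m`, `fit`, `new_cat` and the input value 3.0 are illustrative, not from the original analysis.

```r
# Assumption: the CSV used above matches the male rows of MASS::cats.
cats_m <- subset(MASS::cats, Sex == "M")

# Using `data =` keeps coefficient names short and lets predict() accept new data.
fit <- lm(Hwt ~ Bwt, data = cats_m)

coef(fit)                   # intercept and slope
confint(fit, level = 0.95)  # 95% confidence intervals for both coefficients

# Prediction for a hypothetical cat with Bwt = 3.0 (illustrative value).
new_cat <- data.frame(Bwt = 3.0)
pred <- predict(fit, newdata = new_cat, interval = "prediction")
pred
```

With the estimates reported in the summary above, the point prediction works out to -1.1841 + 4.3127 × 3.0 ≈ 11.75. A model written as `lm(cats$height ~ cats$weight)` fits the same line, but `predict()` cannot match the `newdata` columns to its terms, which is one reason to prefer the `data =` form.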
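# Residuals vs. fitted values: look for a patternless band around zero.
plot(fitted(cats_lm), resid(cats_lm), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)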

qqnorm(resid(cats_lm))
qqline(resid(cats_lm))

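According to the Q-Q plot, the linear model was appropriate: the residuals appear to fall close to the reference line, so they are approximately normally distributed. The model summary is consistent with this, showing a highly significant slope (4.31, p < 2e-16) and an R-squared of 0.63, which means the predictor explains about 63% of the variance in the response.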
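Reading diagnostic plots is subjective, so it can help to back them with formal checks. The sketch below is one possible way to do that and was not part of the original analysis. It assumes the CSV corresponds to the male rows of `MASS::cats`. It runs a Shapiro-Wilk test on the residuals, then a hand-rolled studentized (Koenker) Breusch-Pagan check for constant variance, used here as a base-R stand-in for a packaged test.

```r
# Assumption: the CSV used above matches the male rows of MASS::cats.
cats_m <- subset(MASS::cats, Sex == "M")
fit <- lm(Hwt ~ Bwt, data = cats_m)
res <- resid(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)
sw$p.value

# Studentized Breusch-Pagan check (hand-rolled): regress the squared residuals
# on the predictor; n * R^2 is approximately chi-squared with 1 df under H0.
cats_m$res_sq <- res^2
aux <- lm(res_sq ~ Bwt, data = cats_m)
bp_stat <- nrow(cats_m) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)

# Built-in diagnostic panels: residuals vs fitted (1), Q-Q (2), scale-location (3).
op <- par(mfrow = c(1, 3))
plot(fit, which = 1:3)
par(op)
```

Large p-values in both tests would be consistent with the normality and constant-variance assumptions. Small ones would suggest trying a transformation or a different model. The residual range in the summary above (-3.77 to 4.86) hints at mild right skew, so the check is worth running.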