New York Air Quality Measurements
Variable   Type      Description
Ozone      numeric   Ozone (ppb)
Solar.R    numeric   Solar radiation (Langleys)
Wind       numeric   Wind (mph)
Temp       numeric   Temperature (degrees F)
Month      numeric   Month (1–12)
Day        numeric   Day of month (1–31)
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
summary(airquality)
## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
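The NA's row in the summary shows that Ozone and Solar.R have missing values. As a minimal sketch of how to count them directly, and to see how many rows are usable for a model on Ozone, Solar.R, Wind and Temp, something like the following could be run. The names `na_counts`, `model_vars` and `n_complete` are illustrative and not part of the original analysis.

```r
# airquality ships with base R (package 'datasets'), so no file is read here.
# Count missing values per column.
na_counts <- colSums(is.na(airquality))
na_counts

# Rows with no missing value in any of the variables used for modelling.
# lm() drops incomplete rows by default (na.action = na.omit).
model_vars <- c("Ozone", "Solar.R", "Wind", "Temp")  # illustrative name
n_complete <- sum(complete.cases(airquality[, model_vars]))
n_complete
```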
pairs(airquality)
Goal: predict Ozone (ppb) with a multiple linear regression that includes one quadratic term, Temp^2.
# create quadratic term
temp_squared <- (airquality$Temp)^2
# create model
airqualitylm <- lm(airquality$Ozone ~ temp_squared + airquality$Solar.R + airquality$Wind, data = airquality)
summary(airqualitylm)
##
## Call:
## lm(formula = airquality$Ozone ~ temp_squared + airquality$Solar.R +
## airquality$Wind, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.830 -13.799 -3.252 10.089 96.996
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.718665 14.186985 -0.403 0.6877
## temp_squared 0.011208 0.001621 6.916 3.53e-10 ***
## airquality$Solar.R 0.059356 0.022733 2.611 0.0103 *
## airquality$Wind -3.218499 0.644853 -4.991 2.34e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.81 on 107 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.6195, Adjusted R-squared: 0.6089
## F-statistic: 58.08 on 3 and 107 DF, p-value: < 2.2e-16
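The call above mixes `airquality$...` references with `data = airquality`, which works but makes `predict()` on new data awkward. A possible alternative, sketched here under the assumption that only the notation should change, writes the quadratic term inline with `I()`. The names `fit_idiomatic`, `fit_original` and `new_day` are hypothetical, and the values in `new_day` are made up for illustration.

```r
# Same model, with columns resolved through `data` and the square written inline.
fit_idiomatic <- lm(Ozone ~ I(Temp^2) + Solar.R + Wind, data = airquality)

# Original style, kept only for comparison.
temp_squared <- airquality$Temp^2
fit_original <- lm(airquality$Ozone ~ temp_squared + airquality$Solar.R + airquality$Wind)

# The coefficient estimates should agree; only their labels differ.
all.equal(unname(coef(fit_idiomatic)), unname(coef(fit_original)))

# With the idiomatic fit, predict() accepts a data frame of new observations.
new_day <- data.frame(Temp = 80, Solar.R = 200, Wind = 10)  # hypothetical values
predict(fit_idiomatic, newdata = new_day)
```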
The fitted model is: \(Ozone = -5.718665 + 0.011208 \cdot Temp^{2} + 0.059356 \cdot Solar.R - 3.218499 \cdot Wind\)
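To make the equation concrete, here is a small worked example that plugs the first row of `airquality` (Temp 67, Solar.R 190, Wind 7.4, observed Ozone 41) into the printed coefficients. The names `b`, `day1` and `pred_day1` are illustrative, and the coefficients are the rounded values from the summary output, so the result is approximate.

```r
# Rounded coefficients copied from the summary output above.
b <- c(intercept = -5.718665, temp_sq = 0.011208, solar = 0.059356, wind = -3.218499)

# First observation in the data set (May 1).
day1 <- airquality[1, ]

# Evaluate the fitted equation by hand.
pred_day1 <- b[["intercept"]] +
  b[["temp_sq"]] * day1$Temp^2 +
  b[["solar"]]   * day1$Solar.R +
  b[["wind"]]    * day1$Wind
pred_day1               # roughly 32.05 ppb

# Residual for this day: observed minus predicted.
day1$Ozone - pred_day1  # roughly 8.95 ppb
```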
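The model is a statistically significant predictor of Ozone (F-statistic p-value < 2.2e-16), and each predictor (Temp^2, Solar.R, Wind) has a p-value below 0.05. The multiple R-squared is 0.6195, so the model explains about 62% of the variance in Ozone (adjusted R-squared: 0.6089, about 61%).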
# residuals vs. fitted values
plot(airqualitylm$fitted.values, airqualitylm$residuals,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)
# qqplot
qqnorm(airqualitylm$residuals)
qqline(airqualitylm$residuals)
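The points on the Q-Q plot do not follow the reference line closely: they deviate at the lower and upper quantiles, which suggests the residuals are not normally distributed. This is consistent with the asymmetric residual summary (minimum -39.830, maximum 96.996). The residuals also do not appear randomly scattered around zero in the residuals-versus-fitted plot.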
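The visual reading of the Q-Q plot could be complemented with a formal normality test. The sketch below uses the Shapiro-Wilk test from base R, which is an addition and not part of the original analysis. It refits the same model under the hypothetical name `fit` so that the block runs on its own; a small p-value would point to a departure from normality.

```r
# Refit the same model (Ozone on Temp^2, Solar.R and Wind) so this block is self-contained.
fit <- lm(Ozone ~ I(Temp^2) + Solar.R + Wind, data = airquality)

# Residuals from the complete cases used in the fit.
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed.
sw <- shapiro.test(res)
sw
```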