This is a hypothesis test on the built-in ‘airquality’ data in R, carried out to determine whether Ozone and temperature are correlated.
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
Correlation is a measure of the strength and direction of the dependency between two random variables.
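To make the definition concrete, here is a minimal sketch that computes Pearson's correlation from its formula (covariance scaled by the two standard deviations) and compares it with the built-in `cor()`. The variable names `ok`, `x`, `y`, `r_manual` and `r_builtin` are made up for this illustration, and only the rows where both Ozone and Temp are present are used.

```r
# Keep only the rows where both Ozone and Temp are observed
ok <- complete.cases(airquality$Ozone, airquality$Temp)
x <- airquality$Ozone[ok]
y <- airquality$Temp[ok]

# Pearson's r: sum of cross-products of deviations, divided by the
# square root of the product of the two sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in version (the default method is "pearson")
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)
```
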
airqualitydata <- na.omit(as.data.frame(airquality))
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
plot(airquality$Ozone, airquality$Temp)
cor.test(airquality$Ozone, airquality$Temp)
##
## Pearson's product-moment correlation
##
## data: airquality$Ozone and airquality$Temp
## t = 10.418, df = 114, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5913340 0.7812111
## sample estimates:
## cor
## 0.6983603
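The t statistic in this output can be recovered from the correlation estimate and the degrees of freedom. The sketch below plugs in the two numbers printed above; `r`, `df`, `t_stat` and `p_value` are names chosen for this example only.

```r
r  <- 0.6983603   # sample estimate reported by cor.test()
df <- 114         # degrees of freedom reported by cor.test() (n - 2)

# Test statistic for H0: true correlation equals 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

t_stat
p_value
```

The result should agree with the `t = 10.418` line up to rounding, and the p-value falls below the `2.2e-16` threshold that R prints.
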
Below are three widely used methods for measuring the correlation between two variables: Pearson's correlation coefficient, Spearman's rank correlation, and Kendall's tau.
cor(airquality$Ozone, airquality$Temp, method = "spearman")
## [1] NA
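The `NA` appears because Ozone contains missing values and `cor()` does not drop them by default. As a sketch of one alternative to cleaning the data frame first, the `use = "complete.obs"` argument tells `cor()` to drop incomplete pairs itself. Note that this keeps every row where Ozone and Temp are both present, while `na.omit()` on the whole data frame also drops rows where only Solar.R is missing, so the two approaches work on slightly different samples. The names `n_pairs`, `n_rows` and `rho_pairs` are made up for this example.

```r
# Rows where both Ozone and Temp are present (Solar.R is ignored here)
n_pairs <- sum(complete.cases(airquality$Ozone, airquality$Temp))

# Rows that are complete in every column, as kept by na.omit()
n_rows <- nrow(na.omit(airquality))

c(pairs = n_pairs, complete_rows = n_rows)

# Spearman correlation on the raw data, dropping incomplete pairs
rho_pairs <- cor(airquality$Ozone, airquality$Temp,
                 method = "spearman", use = "complete.obs")
rho_pairs
```

The difference in sample size is also why `cor.test()` on the raw data reports 114 degrees of freedom (116 pairs minus 2).
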
cor(airqualitydata$Ozone, airqualitydata$Temp, method = "spearman")
## [1] 0.7729319
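Spearman's correlation is Pearson's correlation applied to the ranks of the data, which is why it picks up monotonic relationships that are not strictly linear. A small sketch to check this on the cleaned data, where `d`, `rho_ranks` and `rho_builtin` are illustrative names:

```r
# Same cleaning step as used for airqualitydata
d <- na.omit(airquality)

# Pearson correlation of the ranks (ties get average ranks by default)
rho_ranks <- cor(rank(d$Ozone), rank(d$Temp))

# Built-in Spearman correlation
rho_builtin <- cor(d$Ozone, d$Temp, method = "spearman")

all.equal(rho_ranks, rho_builtin)
```
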
plot(airqualitydata$Temp, airqualitydata$Ozone, col="red", pch =19)
An R-squared of 0.0 means the model has no predictive value; an R-squared of 1.0 means the model predicts perfectly. There is a positive correlation between Ozone and Temperature.
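For a regression with a single predictor, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares, and it equals the squared Pearson correlation. The following sketch checks both on the cleaned data; `d`, `fit`, `ss_res`, `ss_tot`, `r2_manual` and `r2_from_cor` are names introduced only for this illustration.

```r
d <- na.omit(airquality)

# Simple linear regression of Ozone on Temp
fit <- lm(Ozone ~ Temp, data = d)

# R-squared from its definition: 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((d$Ozone - mean(d$Ozone))^2)
r2_manual <- 1 - ss_res / ss_tot

# With one predictor, R-squared equals the squared Pearson correlation
r2_from_cor <- cor(d$Ozone, d$Temp)^2

c(manual = r2_manual, from_cor = r2_from_cor,
  from_summary = summary(fit)$r.squared)
```
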
library(graphics)
pairs(airqualitydata, panel = panel.smooth, main = "airquality data")
cor(airqualitydata$Ozone, airqualitydata$Temp, method = "kendall")
## [1] 0.5861471
cor(airqualitydata$Ozone, airqualitydata$Temp, method = "pearson")
## [1] 0.6985414
plot(airquality$Wind, airquality$Ozone, col="red", pch =19)
Observed and predicted: a linear model of Ozone on Temp is fitted to assess the goodness of fit.
mod1 <- lm(airqualitydata$Ozone ~ airqualitydata$Temp, data=airqualitydata)
summary(mod1)
##
## Call:
## lm(formula = airqualitydata$Ozone ~ airqualitydata$Temp, data = airqualitydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.922 -17.459 -0.874 10.444 118.078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -147.6461 18.7553 -7.872 2.76e-12 ***
## airqualitydata$Temp 2.4391 0.2393 10.192 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.92 on 109 degrees of freedom
## Multiple R-squared: 0.488, Adjusted R-squared: 0.4833
## F-statistic: 103.9 on 1 and 109 DF, p-value: < 2.2e-16
plot(mod1)
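To get predicted Ozone values at new temperatures, it helps to write the formula with bare column names. When the formula contains `airqualitydata$Temp`, `predict()` cannot substitute the `newdata` values and falls back to the original rows. The sketch below refits the same model under the name `fit` (an illustrative name, as are `d`, `new_temps` and `pred`) and predicts at three arbitrarily chosen temperatures.

```r
d <- na.omit(airquality)

# Same model as mod1, written with column names so predict() can use newdata
fit <- lm(Ozone ~ Temp, data = d)

# Temperatures to predict at (chosen only for illustration)
new_temps <- data.frame(Temp = c(70, 80, 90))

# Point predictions with 95% confidence intervals for the mean response
pred <- predict(fit, newdata = new_temps, interval = "confidence")
cbind(new_temps, pred)
```

The coefficients of `fit` should match the estimates in the summary above, with the slope labelled `Temp` instead of `airqualitydata$Temp`.
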
ggplot(airqualitydata, aes(x = Ozone, y = Temp)) +
  xlab("Ozone") +
  ylab("Temp") +
  geom_point() +
  geom_line() +
  ggtitle("Relationship between 'Ozone' and 'Temp'") +
  stat_smooth(method = "loess", formula = y ~ x, size = 1, col = "red")