This is a hypothesis test on the ‘airquality’ data set in R, carried out to determine whether Ozone and temperature are correlated.

str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

Correlation is a measure that describes the dependency between two random variables.
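To make the definition concrete, here is a minimal sketch that computes Pearson's correlation from its definition (covariance divided by the product of the standard deviations) and compares it with the built-in `cor()`. The names `aq`, `r_manual` and `r_builtin` are illustrative only and are not part of the original analysis.

```r
# Keep only rows where both Ozone and Temp are observed
aq <- airquality[complete.cases(airquality[, c("Ozone", "Temp")]), ]

# Pearson's r: covariance scaled by the product of the standard deviations
r_manual <- cov(aq$Ozone, aq$Temp) / (sd(aq$Ozone) * sd(aq$Temp))

# Built-in equivalent, for comparison
r_builtin <- cor(aq$Ozone, aq$Temp)

# TRUE if the two values agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```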

airqualitydata <- na.omit(as.data.frame(airquality))
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
plot(airquality$Ozone, airquality$Temp)

cor.test(airquality$Ozone, airquality$Temp)
## 
##  Pearson's product-moment correlation
## 
## data:  airquality$Ozone and airquality$Temp
## t = 10.418, df = 114, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5913340 0.7812111
## sample estimates:
##       cor 
## 0.6983603
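The t statistic reported above can be recovered from the correlation itself, since for Pearson's test t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The following is a rough sketch of that calculation; the variable names are illustrative, and it assumes the test uses every row where both Ozone and Temp are present.

```r
# Complete Ozone/Temp pairs, which is what cor.test() uses
aq <- airquality[complete.cases(airquality[, c("Ozone", "Temp")]), ]
n  <- nrow(aq)
r  <- cor(aq$Ozone, aq$Temp)

# t statistic for H0: true correlation is 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Compare with the statistic returned by cor.test()
ct <- cor.test(aq$Ozone, aq$Temp)
all.equal(t_stat, unname(ct$statistic))
```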

Below are three widely used methods for measuring correlation: the Pearson correlation coefficient, Spearman’s rank correlation, and Kendall’s tau.
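As a quick illustration of how these methods relate, Spearman's coefficient is simply Pearson's coefficient applied to the ranks of the data. The sketch below checks this on the complete rows of the data set; `aq`, `rho_spearman` and `rho_ranks` are illustrative names.

```r
# Rows with no missing values in any column
aq <- na.omit(airquality)

# Spearman's rho via the built-in method
rho_spearman <- cor(aq$Ozone, aq$Temp, method = "spearman")

# The same quantity as Pearson's r on the ranks (ties get average ranks)
rho_ranks <- cor(rank(aq$Ozone), rank(aq$Temp), method = "pearson")

all.equal(rho_spearman, rho_ranks)
```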

cor(airquality$Ozone, airquality$Temp, method = "spearman")
## [1] NA
cor(airqualitydata$Ozone, airqualitydata$Temp, method = "spearman")
## [1] 0.7729319
plot(airqualitydata$Temp, airqualitydata$Ozone, col="red", pch =19)
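The `NA` result on the raw data comes from the missing Ozone values: by default `cor()` returns `NA` if either vector contains one. As a sketch of an alternative to `na.omit()`, the `use` argument drops incomplete pairs inside `cor()` itself. Note that this keeps more rows than `na.omit()` on the whole data frame, because rows missing only Solar.R are retained, so the two estimates can differ slightly. Variable names here are illustrative.

```r
# Default behaviour: any NA in either vector makes the result NA
rho_default <- cor(airquality$Ozone, airquality$Temp, method = "spearman")

# Drop incomplete Ozone/Temp pairs inside cor() itself
rho_pairs <- cor(airquality$Ozone, airquality$Temp,
                 method = "spearman", use = "complete.obs")

# na.omit() on the full data frame also drops rows with missing Solar.R
rho_rows <- with(na.omit(airquality), cor(Ozone, Temp, method = "spearman"))

# Number of observations behind each estimate
n_used <- c(pairs_used = sum(complete.cases(airquality[, c("Ozone", "Temp")])),
            rows_used  = nrow(na.omit(airquality)))
n_used
```

The same difference in sample size would explain why `cor.test()` on the raw data and `cor()` on the cleaned data report slightly different Pearson coefficients.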

An R-squared of 0.0 means the model has no predictive value, and an R-squared of 1.0 means the model predicts perfectly. There is a positive correlation between Ozone and Temperature.
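For a simple linear regression with one predictor, R-squared equals the square of the Pearson correlation, which ties the two ideas together. The following sketch computes R-squared three ways on the complete rows; the object names (`aq`, `fit`, `r2_manual`, and so on) are illustrative assumptions rather than part of the original analysis.

```r
# Rows with no missing values in any column
aq  <- na.omit(airquality)
fit <- lm(Ozone ~ Temp, data = aq)

# R-squared from its definition: 1 - (residual SS / total SS)
ss_res    <- sum(residuals(fit)^2)
ss_tot    <- sum((aq$Ozone - mean(aq$Ozone))^2)
r2_manual <- 1 - ss_res / ss_tot

# R-squared as reported by summary(), and as the squared Pearson correlation
r2_summary  <- summary(fit)$r.squared
r2_from_cor <- cor(aq$Ozone, aq$Temp)^2

c(manual = r2_manual, summary = r2_summary, squared_cor = r2_from_cor)
```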

library(graphics)
pairs(airqualitydata, panel = panel.smooth, main = "airquality data")

cor(airqualitydata$Ozone, airqualitydata$Temp, method = "kendall")
## [1] 0.5861471
cor(airqualitydata$Ozone, airqualitydata$Temp, method = "pearson")
## [1] 0.6985414
plot(airquality$Wind, airquality$Ozone, col="red", pch =19)

Observed and predicted: the goodness of fit of the model to the data.

mod1 <- lm(Ozone ~ Temp, data = airqualitydata)
summary(mod1)
## 
## Call:
## lm(formula = Ozone ~ Temp, data = airqualitydata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.922 -17.459  -0.874  10.444 118.078 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -147.6461    18.7553  -7.872 2.76e-12 ***
## Temp           2.4391     0.2393  10.192  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.92 on 109 degrees of freedom
## Multiple R-squared:  0.488,  Adjusted R-squared:  0.4833 
## F-statistic: 103.9 on 1 and 109 DF,  p-value: < 2.2e-16
plot(mod1)
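To look at observed against predicted values directly, one option is to plot the fitted values from the model against the measured Ozone readings. The sketch below refits the same Ozone-on-Temp model under illustrative names (`aq`, `fit`, `pred`, `rmse`); points near the dashed 45-degree line are well predicted.

```r
# Rows with no missing values in any column
aq  <- na.omit(airquality)
fit <- lm(Ozone ~ Temp, data = aq)

# Predicted Ozone for each observed row
pred <- fitted(fit)

# Observed vs. predicted, with the y = x reference line
plot(pred, aq$Ozone,
     xlab = "Predicted Ozone", ylab = "Observed Ozone",
     col = "red", pch = 19)
abline(a = 0, b = 1, lty = 2)

# Root mean squared error as a single-number summary of the fit
rmse <- sqrt(mean((aq$Ozone - pred)^2))
rmse
```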

ggplot(airqualitydata, aes(x = Ozone, y = Temp)) +
  xlab("Ozone") + 
  ylab("Temp") +
  geom_point() +
  geom_line() +
  ggtitle("Relationship between 'Ozone' and 'Temp'") +
  stat_smooth(method = "loess", formula = y ~ x, size = 1, col = "red")