To work with the New York airquality data more effectively, I first wanted to get all of the data into R and then create the new variables discussed in the unit, so that the data would be more meaningful for any queries I wanted to run. First I loaded the data set into R. Because the data set is too long to print in full, I show only its first six rows and its last six rows, and then summarize the data briefly.
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
tail(airquality)
##     Ozone Solar.R Wind Temp Month Day
## 148    14      20 16.6   63     9  25
## 149    30     193  6.9   70     9  26
## 150    NA     145 13.2   77     9  27
## 151    14     191 14.3   75     9  28
## 152    18     131  8.0   76     9  29
## 153    20     223 11.5   68     9  30
summary(airquality)
##      Ozone           Solar.R           Wind             Temp
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
##  NA's   :37       NA's   :7
##      Month            Day
##  Min.   :5.000   Min.   : 1.0
##  1st Qu.:6.000   1st Qu.: 8.0
##  Median :7.000   Median :16.0
##  Mean   :6.993   Mean   :15.8
##  3rd Qu.:8.000   3rd Qu.:23.0
##  Max.   :9.000   Max.   :31.0
##
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
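The summary above reports 37 missing Ozone values and 7 missing Solar.R values, while Wind and Temp show no NA's row. As a quick cross-check, the missing values can be counted per column. This is only a sketch; the variable names `na_counts` and `complete_rows` are my own and not part of the original analysis.

```r
# airquality ships with base R (datasets package), so nothing needs to be installed.
# Count the missing values in each column.
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Count the rows that have no missing value in any column.
complete_rows <- sum(complete.cases(airquality))
print(complete_rows)
```

Since Wind and Temp have no missing values, the plots and regression lines that follow use all 153 observations.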
Then I made a scatter plot of temperature against wind speed, with the points colored by month.
library(ggplot2)
airquality$Month<-factor(airquality$Month)
qplot(Wind,Temp,data=airquality,color=Month)
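Then I added a smoothed trend line to the scatter plot (ggplot2 used loess here), first with one panel per month and then with all months overlaid on a single panel.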
library(ggplot2)
airquality$Month<-factor(airquality$Month)
qplot(Wind,Temp,data=airquality,color=Month,geom = c("point","smooth"),facets = .~Month)
## `geom_smooth()` using method = 'loess'
ggplot(airquality,aes(Wind,Temp))+
geom_point(aes(color=factor(Month)))+
geom_smooth(se=FALSE,aes(color=factor(Month)))
## `geom_smooth()` using method = 'loess'
Then I fit a separate linear regression line for each month.
library(ggplot2)
airquality$Month<-factor(airquality$Month)
ggplot(airquality,aes(Wind,Temp))+
geom_point(aes(color=factor(Month)))+
geom_smooth(method="lm",se=FALSE,aes(color=factor(Month)))
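The per-month lines drawn above can also be inspected numerically. The following is a minimal sketch, assuming that an ordinary least-squares fit of Temp on Wind within each month is what the plotted lines represent (which is what `geom_smooth(method="lm")` computes for each group). The names `by_month` and `slopes` are illustrative only.

```r
# Split the data frame into one piece per month (works whether Month is
# stored as an integer or as a factor).
by_month <- split(airquality, airquality$Month)

# Fit Temp ~ Wind within each month and keep only the Wind slope.
slopes <- sapply(by_month, function(d) coef(lm(Temp ~ Wind, data = d))[["Wind"]])

# One slope per month, named by month number.
print(round(slopes, 2))
```

A negative slope for a month means that, within that month, higher wind speeds tended to come with lower temperatures.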
Finally, I fit a single regression line to the data from all months combined.
library(ggplot2)
airquality$Month<-factor(airquality$Month)
ggplot(airquality,aes(Wind,Temp))+
geom_point(aes(color=factor(Month)))+
geom_smooth(method="lm",se=FALSE,aes(group=1)) # group=1 pools all months into one fit, so no per-month color is mapped here
Based on this graph of the aggregate data, temperature appears to be negatively related to wind speed. In other words, temperature tends to decrease as wind speed increases.
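To put a number on this visual impression, the pooled line can be fit directly with `lm()` and the correlation computed with `cor()`. This is a sketch under the assumption that a simple linear model of Temp on Wind is an adequate summary; `fit`, `wind_slope`, and `r` are names I introduce for illustration.

```r
# Fit one linear model to all months combined, matching the pooled line in the plot.
fit <- lm(Temp ~ Wind, data = airquality)

# The Wind coefficient is the estimated change in Temp per unit increase in Wind.
wind_slope <- coef(fit)[["Wind"]]
print(wind_slope)

# Pearson correlation between the two variables (neither has missing values).
r <- cor(airquality$Wind, airquality$Temp)
print(r)

# Full regression table, including standard errors and R-squared.
print(summary(fit))
```

A negative `wind_slope` and a negative `r` agree with the downward-sloping line in the plot, though they describe an association in these data rather than a causal effect.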