First, we load the required packages: ggplot2 for plotting and datasets, which provides the airquality data:
library(ggplot2)
library(datasets)
Next, we draw frequency histograms for four of the numerical variables: mean ozone (ppb), solar radiation (Langleys), average wind speed (mph), and maximum daily temperature (degrees F):
#Ozone is skewed to the right: most days are low, with a long tail of high readings.
ggplot(airquality, aes(x = Ozone)) +
geom_histogram(binwidth = 10) + ylab("number of observations")
#Solar radiation is skewed to the left.
ggplot(airquality, aes(x = Solar.R)) +
geom_histogram(binwidth = 10) + ylab("number of observations")
#Wind speed is close to symmetric, with only a slight right skew; a binwidth of 2 mph
#gives enough bins to see this over its 1.7 to 20.7 mph range.
ggplot(airquality, aes(x = Wind)) +
geom_histogram(binwidth = 2) + ylab("number of observations")
#Temperature is skewed to the left.
ggplot(airquality, aes(x = Temp)) +
geom_histogram(binwidth = 5) + ylab("number of observations")
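The skew claims in the histogram comments can be cross-checked numerically. This sketch uses a rough heuristic (an assumption, not a formal test): a mean above the median suggests right skew, a mean below it suggests left skew. The names `skew_check` and `direction` are hypothetical, chosen only for this example.

```r
library(datasets)

# Variables plotted in the histograms
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# For each variable, compute mean and median while ignoring missing values.
# sapply() returns a 2 x 4 matrix with rows "mean" and "median".
skew_check <- sapply(airquality[vars], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})

# Heuristic direction of skew: mean > median -> "right", otherwise "left"
direction <- ifelse(skew_check["mean", ] > skew_check["median", ], "right", "left")

print(round(skew_check, 2))
print(direction)
```

The heuristic labels Wind as "right", consistent with the slight right skew noted in its comment.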
Next, we draw box-and-whisker plots for solar radiation and maximum daily temperature:
#The line inside the box is the median, 205.0 Langleys (the mean is 185.9 Langleys).
boxplot(airquality$Solar.R)
#The median is 79 degrees F (the mean is 77.88 degrees F).
boxplot(airquality$Temp)
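Because `boxplot()` draws the median rather than the mean, it can help to print the numbers behind the box and to mark the mean separately. This sketch uses base R's `boxplot.stats()`, whose `$stats` element holds the lower whisker, lower hinge, median, upper hinge and upper whisker; marking the mean with a cross (`pch = 4`) is an illustrative choice, not part of the original analysis.

```r
library(datasets)

solar <- airquality$Solar.R

# Five values drawn by boxplot(): lower whisker, lower hinge, median,
# upper hinge, upper whisker (missing values are dropped internally)
solar_stats <- boxplot.stats(solar)$stats
print(solar_stats)

# Redraw the box plot and overlay the mean as a cross at x = 1
boxplot(solar, ylab = "Solar radiation (Langleys)")
points(1, mean(solar, na.rm = TRUE), pch = 4, cex = 1.5)
```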
The summary for the data is as follows:
summary(airquality)
##      Ozone           Solar.R           Wind             Temp
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
##  NA's   :37       NA's   :7
##      Month            Day
##  Min.   :5.000   Min.   : 1.0
##  1st Qu.:6.000   1st Qu.: 8.0
##  Median :7.000   Median :16.0
##  Mean   :6.993   Mean   :15.8
##  3rd Qu.:8.000   3rd Qu.:23.0
##  Max.   :9.000   Max.   :31.0
##
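The `NA's` rows in the summary matter for the rest of the analysis: they explain the `NA` correlations and the "Removed 7 rows" and "Removed 37 rows" warnings from ggplot2. A quick base-R way to count missing values per column and the number of fully observed rows (`na_counts` and `n_complete` are names chosen for this sketch):

```r
library(datasets)

# Number of missing values in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Number of rows with no missing values in any column
n_complete <- sum(complete.cases(airquality))
print(n_complete)
```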
We also compute the Pearson correlation coefficient between maximum daily temperature and solar radiation, and between average wind speed and mean ozone:
#NA is returned because Solar.R has 7 missing values; it does not mean there is no correlation.
cor(airquality$Temp, airquality$Solar.R)
## [1] NA
#NA again, this time because Ozone has 37 missing values.
cor(airquality$Wind, airquality$Ozone)
## [1] NA
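`cor()` returns `NA` whenever either input contains a missing value. Its `use = "complete.obs"` argument drops incomplete pairs first, which yields a usable coefficient. The sketch below recomputes both correlations that way; the resulting values are not reproduced here, so run the code to see them. `cor.test()` is included as an optional extra and removes incomplete pairs on its own.

```r
library(datasets)

# Drop rows where either value is missing before computing Pearson's r
r_temp_solar <- cor(airquality$Temp, airquality$Solar.R, use = "complete.obs")
r_wind_ozone <- cor(airquality$Wind, airquality$Ozone, use = "complete.obs")

print(c(temp_solar = r_temp_solar, wind_ozone = r_wind_ozone))

# Optional: a significance test on the same pair (incomplete pairs removed)
print(cor.test(airquality$Wind, airquality$Ozone))
```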
Finally, we draw scatterplots, starting with solar radiation against maximum daily temperature with a least-squares line of best fit:
ggplot(data = airquality, aes(x = Solar.R, y = Temp)) +
geom_point(shape = 1) +
geom_smooth(method = lm)
## Warning: Removed 7 rows containing missing values (stat_smooth).
## Warning: Removed 7 rows containing missing values (geom_point).
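The straight line drawn by `geom_smooth(method = lm)` is an ordinary least-squares fit of y on x. Fitting the same model directly with `lm()` exposes its intercept, slope and R² (`fit_temp` is a hypothetical name for this sketch). `lm()` drops rows with missing values by default, matching the 7 rows ggplot2 removes.

```r
library(datasets)

# Same model as the fitted line in the plot: Temp regressed on Solar.R.
# Rows where Solar.R is NA are dropped (default na.action = na.omit).
fit_temp <- lm(Temp ~ Solar.R, data = airquality)

print(coef(fit_temp))               # intercept and slope
print(summary(fit_temp)$r.squared)  # share of variance explained
print(nobs(fit_temp))               # number of rows actually used
```

For a simple regression like this one, R² equals the square of the Pearson correlation computed on the same complete pairs.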
Next, a scatterplot of mean ozone against average wind speed, with a least-squares line of best fit:
ggplot(data = airquality, aes(x = Ozone, y = Wind)) +
geom_point(shape = 1) +
geom_smooth(method = lm)
## Warning: Removed 37 rows containing missing values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).
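This plot puts Ozone on the x-axis, so its fitted line corresponds to `lm(Wind ~ Ozone)`. The sketch below fits that model and uses it to predict wind speed at a few ozone levels; `fit_wind` and `new_ozone`, and the ozone values 20, 60 and 100 ppb, are illustrative choices.

```r
library(datasets)

# Matches the plot's axes: Wind (y) regressed on Ozone (x).
# The 37 rows with missing Ozone are dropped automatically.
fit_wind <- lm(Wind ~ Ozone, data = airquality)

print(coef(fit_wind))
print(nobs(fit_wind))

# Use the regression line to predict wind speed at a few ozone levels
new_ozone <- data.frame(Ozone = c(20, 60, 100))
print(predict(fit_wind, newdata = new_ozone))
```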
Scatterplot of solar radiation against maximum daily temperature, with a LOESS curve of best fit:
ggplot(data = airquality, aes(x = Solar.R, y = Temp)) +
geom_point(shape = 1) +
geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
## Warning: Removed 7 rows containing missing values (stat_smooth).
## Warning: Removed 7 rows containing missing values (geom_point).
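With fewer than 1,000 points, `geom_smooth()` falls back to LOESS, as the message says. ggplot2's default LOESS span is 0.75, the same default as `stats::loess()`, so a rough base-R equivalent of the curve is sketched below. The 50-point evaluation grid and the names `lo_temp` and `grid` are arbitrary choices for illustration.

```r
library(datasets)

# LOESS of Temp on Solar.R with span 0.75 and local quadratic fits
# (degree = 2), the defaults of stats::loess(). NA rows are dropped.
lo_temp <- loess(Temp ~ Solar.R, data = airquality, span = 0.75)

# Evaluate the smooth on an evenly spaced grid across the observed range
solar_range <- range(airquality$Solar.R, na.rm = TRUE)
grid <- data.frame(Solar.R = seq(solar_range[1], solar_range[2], length.out = 50))
grid$Temp_hat <- predict(lo_temp, newdata = grid)

# Base-graphics version of the scatterplot with the LOESS curve on top
plot(Temp ~ Solar.R, data = airquality, pch = 1)
lines(grid$Solar.R, grid$Temp_hat)
```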
Scatterplot of mean ozone against average wind speed, with a LOESS curve of best fit:
ggplot(data = airquality, aes(x = Ozone, y = Wind)) +
geom_point(shape = 1) +
geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
## Warning: Removed 37 rows containing missing values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).
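The smoothing message and the missing-value warnings can be avoided by naming the method and formula explicitly and by telling each layer to drop missing values quietly with `na.rm = TRUE`. The sketch below does that; the smaller `span = 0.5`, which makes the curve follow the data more closely, is an illustrative choice rather than the original author's setting. `layer_data()` is used only to inspect the computed curve.

```r
library(ggplot2)
library(datasets)

# Same plot, but with the smoother specified explicitly.
# na.rm = TRUE removes the 37 incomplete rows without printing warnings.
p <- ggplot(data = airquality, aes(x = Ozone, y = Wind)) +
  geom_point(shape = 1, na.rm = TRUE) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5, na.rm = TRUE)

print(p)

# layer_data() returns the computed values for a layer; layer 2 is the smooth
smooth_df <- layer_data(p, 2)
print(head(smooth_df[, c("x", "y")]))
```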
#http://rpubs.com/catlin/DPBWeek6
#http://www.cookbook-r.com/Graphs/Scatterplots_(ggplot2)/