Look at the basic structure and find summary statistics of the data.
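library(dplyr)    # provides %>%, mutate() and recode_factor() used below
library(ggplot2)  # provides qplot() and ggplot() used below
data("airquality")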
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
summary(airquality)
##      Ozone           Solar.R           Wind             Temp
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
##  NA's   :37       NA's   :7
##      Month            Day
##  Min.   :5.000   Min.   : 1.0
##  1st Qu.:6.000   1st Qu.: 8.0
##  Median :7.000   Median :16.0
##  Mean   :6.993   Mean   :15.8
##  3rd Qu.:8.000   3rd Qu.:23.0
##  Max.   :9.000   Max.   :31.0
##
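The summary reports missing values only for Ozone and Solar.R. As a minimal sketch, the missing values can also be counted directly with base R; the names `na_counts` and `complete_rows` are introduced here for illustration only.

```r
data("airquality")

# Number of missing values in each column
na_counts <- colSums(is.na(airquality))
na_counts

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(airquality))
complete_rows
```

The counts for Ozone and Solar.R should agree with the `NA's` row of the summary above.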
The variable Month is stored as an integer. Convert it to a factor and relabel its levels with the corresponding month names.
airquality <- airquality %>%
  mutate(Month = recode_factor(Month, "5" = "May", "6" = "June", "7" = "July",
                               "8" = "August", "9" = "September"))
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : Factor w/ 5 levels "May","June","July",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
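The same conversion can be done without dplyr. The sketch below is an alternative to the approach above, not a replacement for it: it uses base R's `factor()` with the built-in `month.name` vector, and works on a copy named `aq` (a name chosen here for illustration) so the original data frame is left untouched.

```r
data("airquality")

# Work on a copy so the original airquality data frame is not modified
aq <- airquality

# Map the integer codes 5..9 to "May".."September", keeping calendar order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

levels(aq$Month)
table(aq$Month)
```

Setting `levels` explicitly keeps the months in calendar order rather than alphabetical order, which matters for the legends and axes in the plots that follow.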
Create a histogram of the variable Temp, stacked by Month, using qplot. Note that qplot is deprecated as of ggplot2 3.4.0, so recent versions emit a deprecation warning.
qplot(data = airquality, Temp, geom = "histogram", bins = 10, fill = Month)
Create the same histogram of Temp, stacked by Month, using ggplot. Label the axes and add a centered title.
ggplot(airquality) +
  geom_histogram(aes(Temp, fill = Month), color = "black", binwidth = 5, alpha = 0.8) +
  labs(x = "Temperature", y = "Frequency",
       title = "Histogram of Temperature") +
  theme(plot.title = element_text(hjust = 0.5))  # hjust = 0.5 centers the title
Create side-by-side boxplots of the variable Temp categorized by Month.
ggplot(airquality) +
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5))
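To read the boxes numerically, the quartiles that define each box can be computed per month. This is a minimal sketch in base R; `temp_quartiles` is a name introduced here for illustration. It uses `quantile()`, since the hinges drawn by `geom_boxplot()` are the 25th and 75th percentiles.

```r
data("airquality")

# Quartiles of Temp within each month (one list element per month)
by_month <- tapply(airquality$Temp, airquality$Month, quantile,
                   probs = c(0.25, 0.5, 0.75))

# Stack the per-month results into a matrix: one row per month
temp_quartiles <- do.call(rbind, by_month)
temp_quartiles
```

Each row corresponds to one box: the middle column is the median line and the outer columns are the box edges. The rows are named by the month value, so they appear as 5 to 9 here, or as month names if Month has already been converted to a factor.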
Create side-by-side boxplots of the variable Temp categorized by Month in grey scale. In the boxplot for May, the median line is hard to see: the first level receives the darkest grey, which is close to the color of the line itself. The default grey-scale fill may therefore be a poor choice for boxplots.
ggplot(airquality) +
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_grey()  # defaults: start = 0.2 (dark) to end = 0.8 (light)
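One possible remedy, sketched here as a suggestion rather than as part of the original analysis, is to restrict the grey range to lighter shades with the `start` and `end` arguments of `scale_fill_grey()`. The values 0.5 and 0.9 are illustrative choices, and `aq` and `p_grey` are names introduced for this example.

```r
library(ggplot2)
data("airquality")

# Copy of the data with Month as a labeled factor, so the block runs on its own
aq <- airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

p_grey <- ggplot(aq) +
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5)) +
  # Start at a mid grey instead of the default dark grey (start = 0.2)
  scale_fill_grey(start = 0.5, end = 0.9)

p_grey
```

With a lighter starting shade, the black median line should stand out in every box, including May.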
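Create a scatterplot of Ozone against Wind, colored by Month, and overlay a fitted linear regression line with its confidence band, which is based on the standard error of the fit.
ggplot(airquality, aes(Wind, Ozone)) +
  geom_point(aes(color = Month)) +       # rows with missing Ozone are dropped with a warning
  geom_smooth(method = "lm") +           # linear fit; the shaded band is shown by default
  labs(title = "Scatterplot of Wind vs Ozone") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_viridis_d(alpha = 0.8)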
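The line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of Ozone on Wind. As a minimal sketch, the same model can be fitted directly with `lm()` to inspect its coefficients and confidence limits; `fit` and `new_wind` are names introduced here, and the wind speeds 5, 10 and 15 are arbitrary illustrative values.

```r
data("airquality")

# Simple linear regression; rows with missing Ozone are dropped automatically
fit <- lm(Ozone ~ Wind, data = airquality)

coef(fit)   # intercept and slope of the fitted line
nobs(fit)   # number of observations actually used in the fit

# Fitted values with 95% confidence limits at a few wind speeds
new_wind <- data.frame(Wind = c(5, 10, 15))
predict(fit, newdata = new_wind, interval = "confidence")
```

The `lwr` and `upr` columns returned by `predict()` correspond to the lower and upper edges of the shaded band in the plot at those wind speeds.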