Look at the summary statistics of the dataset and note that Month now holds character strings (month names) instead of numbers.
str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : chr "May" "May" "May" "May" ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
summary(airquality)
     Ozone           Solar.R           Wind             Temp
 Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00
 1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00
 Median : 31.50   Median :205.0   Median : 9.700   Median :79.00
 Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88
 3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00
 Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00
 NA's   :37       NA's   :7
    Month                Day
 Length:153         Min.   : 1.0
 Class :character   1st Qu.: 8.0
 Mode  :character   Median :16.0
                    Mean   :15.8
                    3rd Qu.:23.0
                    Max.   :31.0
head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 May 1
2 36 118 8.0 72 May 2
3 12 149 12.6 74 May 3
4 18 313 11.5 62 May 4
5 NA NA 14.3 56 May 5
6 28 NA 14.9 66 May 6
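Month is a categorical variable. In R, categorical variables are stored as factors, and the distinct categories of a factor are called its levels.
Reorder the levels of Month so that they follow calendar order instead of defaulting to alphabetical order.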
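The reordering step can be sketched as follows. This is a minimal example, not necessarily the exact code used for this tutorial. It assumes that the numeric months 5 to 9 were recoded to the full English names "May" through "September"; only "May" is visible in the output above, so the other labels are an assumption. The names aq and month_names are illustrative.

```r
# Work on a copy of the built-in dataset so the example is self-contained.
aq <- datasets::airquality

# Assumption: months 5-9 were recoded to these full English names.
month_names <- c("May", "June", "July", "August", "September")
aq$Month <- month_names[aq$Month - 4]   # 5 -> "May", ..., 9 -> "September"

# Without explicit levels, factor() sorts the labels alphabetically.
levels(factor(aq$Month))

# Supplying the levels in calendar order fixes the order used by plots and tables.
aq$Month <- factor(aq$Month, levels = month_names)
levels(aq$Month)

# Counts per month now appear in calendar order.
table(aq$Month)
```

Passing levels explicitly to factor() only changes the order of the categories; the data values themselves are untouched.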
Brief essay: Plot 5 uses ggplot to create boxplots categorized by month, from May to September. Month is mapped to the y-axis and to the fill color, while Solar Radiation is on the x-axis. The highest Solar Radiation value appears to occur in May. I set fill = Month so that each boxplot gets a different color, which makes the months easier to tell apart.
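A plot along the lines described in the essay could be produced roughly as follows. This is a hedged sketch rather than the original Plot 5 code: it assumes the ggplot2 package is installed and that Month uses the full month names, and the title, axis labels, and object names (aq, month_names, p5) are illustrative.

```r
library(ggplot2)

# Rebuild the data so the example is self-contained.
aq <- datasets::airquality
month_names <- c("May", "June", "July", "August", "September")  # assumed labels
aq$Month <- factor(month_names[aq$Month - 4], levels = month_names)

# Solar radiation on x, Month on y, one fill color per month.
p5 <- ggplot(aq, aes(x = Solar.R, y = Month, fill = Month)) +
  geom_boxplot(na.rm = TRUE) +   # silently drop the missing Solar.R values
  labs(
    title = "Solar radiation by month",
    x = "Solar radiation (Solar.R)",
    y = "Month"
  )
print(p5)

# Per-month maxima, to check claims read off the plot.
aggregate(Solar.R ~ Month, data = aq, FUN = max)
```

The aggregate() call at the end gives the maximum per month, which is a quick way to confirm which month holds the largest observed value rather than relying on the plot alone.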