Structure and Summary of the airquality Data

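Inspect the basic structure of the data and compute summary statistics for each variable.

library(dplyr)    # %>%, mutate(), recode_factor()
library(ggplot2)  # qplot(), ggplot()
data("airquality")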
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
summary(airquality)
##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA's   :37       NA's   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
## 
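
The NA's row of the summary shows that only Ozone and Solar.R contain missing values. As a small sketch in base R, the same counts can be read off directly, together with the number of rows that have no missing values at all. The names na_counts and complete_rows are chosen here for illustration only.

```r
# Work on the copy shipped with base R's datasets package
aq <- datasets::airquality

# Number of missing values in each column
na_counts <- colSums(is.na(aq))
na_counts

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(aq))
complete_rows
```

Knowing which columns carry NAs is useful later, because plots involving Ozone silently drop the incomplete rows (ggplot2 reports this with a warning).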

Conversion of the Class of Month

The variable Month is stored as an integer (5 through 9). Convert it to a factor and replace each number with the corresponding month name.

airquality <- airquality %>%
  mutate(Month = recode_factor(Month,
                               "5" = "May", "6" = "June", "7" = "July",
                               "8" = "August", "9" = "September"))
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : Factor w/ 5 levels "May","June","July",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
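
For comparison, here is a minimal sketch of the same conversion using only base R, assuming the integer codes 5 to 9 map to May through September as above. It relies on factor() with explicit levels and labels and on the built-in month.name vector; the copy aq is introduced here so the data frame used in the rest of the document is left untouched.

```r
# Fresh copy in which Month is still an integer
aq <- datasets::airquality

# Map 5..9 to "May".."September", keeping calendar order as the level order
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

levels(aq$Month)
table(aq$Month)  # observations per month
```

Setting the levels explicitly keeps the months in calendar order in every later plot, rather than in alphabetical order.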

Plot 1: Stacked Histogram - qplot

Create a histogram of Temp, with the bars stacked by Month, using qplot.

qplot(Temp, data = airquality, geom = "histogram", bins = 10, fill = Month)
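
Note that qplot() is deprecated as of ggplot2 3.4.0, so the call above may emit a deprecation warning. A rough equivalent written with ggplot() is sketched below; it rebuilds the Month factor on a local copy (aq, a name used here for illustration) so that the block runs on its own.

```r
library(ggplot2)

# Local copy with Month as an ordered-by-calendar factor
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# Same plot as the qplot() call: 10 bins, bars stacked by Month
p_hist <- ggplot(aq, aes(x = Temp, fill = Month)) +
  geom_histogram(bins = 10)
p_hist
```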

Plot 2: Stacked Histogram - ggplot

Create the same stacked histogram of Temp by Month using ggplot, this time with labeled axes and a centered title.

ggplot(airquality) +
  geom_histogram(aes(Temp, fill = Month), color = "black", binwidth = 5, alpha = 0.8) +
  labs(x = "Temperature", y = "Frequency",
       title = "Histogram of Temperature") +
  theme(plot.title = element_text(hjust = 0.5))

Plot 3: Side-by-side Boxplots

Create side-by-side boxplots of the variable Temp categorized by Month.

ggplot(airquality) + 
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5))
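
The boxplots can be cross-checked numerically. The sketch below computes the minimum, quartiles, and maximum of Temp for each month with quantile(). It is an illustration only: the whiskers drawn by geom_boxplot stop at the most extreme point within 1.5 times the interquartile range of the box, so they need not coincide with the minimum and maximum shown here.

```r
# Local copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# One column per month; rows are min, Q1, median, Q3, max of Temp
temp_quartiles <- sapply(split(aq$Temp, aq$Month), quantile,
                         probs = c(0, 0.25, 0.5, 0.75, 1))
temp_quartiles
```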

Plot 4: Side-by-side Boxplots in grey scale

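Create the same side-by-side boxplots of Temp by Month in grey scale. In the boxplot for May, the median line seems to be missing. Most likely it is still drawn, but the darkest shade of the default scale_fill_grey palette is about the same dark grey as the boxplot outline, so the line blends into the fill. The default grey scale is therefore a poor choice for filled boxplots.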

ggplot(airquality) +
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_grey()
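
One possible workaround, sketched below, is to start the grey palette at a lighter shade so that the dark median line stays visible in every box. scale_fill_grey() accepts start and end arguments between 0 (black) and 1 (white); the values 0.5 and 1 are an arbitrary choice for illustration, not a recommendation from the original analysis.

```r
library(ggplot2)

# Local copy with Month as a factor in calendar order
aq <- datasets::airquality
aq$Month <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

p_grey <- ggplot(aq) +
  geom_boxplot(aes(Temp, Month, fill = Month)) +
  labs(x = "Temperature", y = "Month",
       title = "Boxplots of Temp vs Month") +
  theme(plot.title = element_text(hjust = 0.5)) +
  # Lighter palette: mid grey for May up to white for September
  scale_fill_grey(start = 0.5, end = 1)
p_grey
```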

Plot 5: Scatterplot

Create a scatterplot of Ozone against Wind, color the points by Month, and add a fitted linear regression line with its standard-error band.

ggplot(airquality, aes(Wind, Ozone)) +
  geom_point(aes(color = Month)) +
  geom_smooth(method = "lm") +
  labs(title = "Scatterplot of Wind vs Ozone") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_viridis_d(alpha = 0.8)
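
The line drawn by geom_smooth(method = "lm") corresponds to an ordinary least-squares fit of Ozone on Wind. As a sketch, the same model can be fitted directly with lm() to inspect its coefficients; rows with a missing Ozone value are dropped by lm() by default, much as they are dropped from the plot.

```r
# Simple linear regression of Ozone on Wind
fit <- lm(Ozone ~ Wind, data = datasets::airquality)

coef(fit)  # intercept and slope of the fitted line
nobs(fit)  # number of rows actually used in the fit
```

A negative slope here matches the downward trend in the scatterplot: higher wind speeds tend to go with lower ozone readings.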

The End