Objective:

You will use R to analyze the built-in airquality dataset, applying descriptive statistics techniques to explore environmental data. The assignment covers measures of central tendency, spread, histograms, boxplots, scatterplots, correlations, and summary tables, aligning with the Week 6 agenda on Descriptive Statistics.

Dataset

Source: Built-in R dataset airquality.

Description: Contains 153 observations of daily air quality measurements in New York from May to September 1973.

Variables (selected for this assignment):

Notes

-The airquality dataset has missing values in Ozone and Solar.R. The code uses na.rm = TRUE or use = "complete.obs" to handle them.

-If you encounter errors, check that tidyverse and corrplot are installed and loaded.

-Feel free to modify plot aesthetics (e.g., colors, binwidth) to enhance clarity.
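The missing-value handling mentioned in the notes can be checked directly. The following is a minimal sketch in base R; the object names (na_counts, mean_default, mean_clean) are illustrative and not part of the assignment template. It counts the NA values per column and shows what happens when na.rm is left at its default of FALSE.

```r
data("airquality")

# Count missing values in each column
na_counts <- colSums(is.na(airquality))
na_counts

# With the default na.rm = FALSE, a single NA makes the result NA
mean_default <- mean(airquality$Ozone)
mean_default

# With na.rm = TRUE, the NAs are dropped before averaging
mean_clean <- mean(airquality$Ozone, na.rm = TRUE)
mean_clean
```

Only Ozone and Solar.R should show non-zero counts, which is why the other columns give the same result with or without na.rm = TRUE.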

Instructions:

Complete the following tasks using R to analyze the airquality dataset. Submit your RPubs link; the published report should include code, outputs (tables and plots), and written interpretations for each task. Ensure you load the dataset using data(airquality) and install/load the tidyverse and corrplot packages.

#Load your dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
data("airquality")

airquality
##     Ozone Solar.R Wind Temp Month Day
## 1      41     190  7.4   67     5   1
## 2      36     118  8.0   72     5   2
## 3      12     149 12.6   74     5   3
## 4      18     313 11.5   62     5   4
## 5      NA      NA 14.3   56     5   5
## 6      28      NA 14.9   66     5   6
## 7      23     299  8.6   65     5   7
## 8      19      99 13.8   59     5   8
## 9       8      19 20.1   61     5   9
## 10     NA     194  8.6   69     5  10
## 11      7      NA  6.9   74     5  11
## 12     16     256  9.7   69     5  12
## 13     11     290  9.2   66     5  13
## 14     14     274 10.9   68     5  14
## 15     18      65 13.2   58     5  15
## 16     14     334 11.5   64     5  16
## 17     34     307 12.0   66     5  17
## 18      6      78 18.4   57     5  18
## 19     30     322 11.5   68     5  19
## 20     11      44  9.7   62     5  20
## 21      1       8  9.7   59     5  21
## 22     11     320 16.6   73     5  22
## 23      4      25  9.7   61     5  23
## 24     32      92 12.0   61     5  24
## 25     NA      66 16.6   57     5  25
## 26     NA     266 14.9   58     5  26
## 27     NA      NA  8.0   57     5  27
## 28     23      13 12.0   67     5  28
## 29     45     252 14.9   81     5  29
## 30    115     223  5.7   79     5  30
## 31     37     279  7.4   76     5  31
## 32     NA     286  8.6   78     6   1
## 33     NA     287  9.7   74     6   2
## 34     NA     242 16.1   67     6   3
## 35     NA     186  9.2   84     6   4
## 36     NA     220  8.6   85     6   5
## 37     NA     264 14.3   79     6   6
## 38     29     127  9.7   82     6   7
## 39     NA     273  6.9   87     6   8
## 40     71     291 13.8   90     6   9
## 41     39     323 11.5   87     6  10
## 42     NA     259 10.9   93     6  11
## 43     NA     250  9.2   92     6  12
## 44     23     148  8.0   82     6  13
## 45     NA     332 13.8   80     6  14
## 46     NA     322 11.5   79     6  15
## 47     21     191 14.9   77     6  16
## 48     37     284 20.7   72     6  17
## 49     20      37  9.2   65     6  18
## 50     12     120 11.5   73     6  19
## 51     13     137 10.3   76     6  20
## 52     NA     150  6.3   77     6  21
## 53     NA      59  1.7   76     6  22
## 54     NA      91  4.6   76     6  23
## 55     NA     250  6.3   76     6  24
## 56     NA     135  8.0   75     6  25
## 57     NA     127  8.0   78     6  26
## 58     NA      47 10.3   73     6  27
## 59     NA      98 11.5   80     6  28
## 60     NA      31 14.9   77     6  29
## 61     NA     138  8.0   83     6  30
## 62    135     269  4.1   84     7   1
## 63     49     248  9.2   85     7   2
## 64     32     236  9.2   81     7   3
## 65     NA     101 10.9   84     7   4
## 66     64     175  4.6   83     7   5
## 67     40     314 10.9   83     7   6
## 68     77     276  5.1   88     7   7
## 69     97     267  6.3   92     7   8
## 70     97     272  5.7   92     7   9
## 71     85     175  7.4   89     7  10
## 72     NA     139  8.6   82     7  11
## 73     10     264 14.3   73     7  12
## 74     27     175 14.9   81     7  13
## 75     NA     291 14.9   91     7  14
## 76      7      48 14.3   80     7  15
## 77     48     260  6.9   81     7  16
## 78     35     274 10.3   82     7  17
## 79     61     285  6.3   84     7  18
## 80     79     187  5.1   87     7  19
## 81     63     220 11.5   85     7  20
## 82     16       7  6.9   74     7  21
## 83     NA     258  9.7   81     7  22
## 84     NA     295 11.5   82     7  23
## 85     80     294  8.6   86     7  24
## 86    108     223  8.0   85     7  25
## 87     20      81  8.6   82     7  26
## 88     52      82 12.0   86     7  27
## 89     82     213  7.4   88     7  28
## 90     50     275  7.4   86     7  29
## 91     64     253  7.4   83     7  30
## 92     59     254  9.2   81     7  31
## 93     39      83  6.9   81     8   1
## 94      9      24 13.8   81     8   2
## 95     16      77  7.4   82     8   3
## 96     78      NA  6.9   86     8   4
## 97     35      NA  7.4   85     8   5
## 98     66      NA  4.6   87     8   6
## 99    122     255  4.0   89     8   7
## 100    89     229 10.3   90     8   8
## 101   110     207  8.0   90     8   9
## 102    NA     222  8.6   92     8  10
## 103    NA     137 11.5   86     8  11
## 104    44     192 11.5   86     8  12
## 105    28     273 11.5   82     8  13
## 106    65     157  9.7   80     8  14
## 107    NA      64 11.5   79     8  15
## 108    22      71 10.3   77     8  16
## 109    59      51  6.3   79     8  17
## 110    23     115  7.4   76     8  18
## 111    31     244 10.9   78     8  19
## 112    44     190 10.3   78     8  20
## 113    21     259 15.5   77     8  21
## 114     9      36 14.3   72     8  22
## 115    NA     255 12.6   75     8  23
## 116    45     212  9.7   79     8  24
## 117   168     238  3.4   81     8  25
## 118    73     215  8.0   86     8  26
## 119    NA     153  5.7   88     8  27
## 120    76     203  9.7   97     8  28
## 121   118     225  2.3   94     8  29
## 122    84     237  6.3   96     8  30
## 123    85     188  6.3   94     8  31
## 124    96     167  6.9   91     9   1
## 125    78     197  5.1   92     9   2
## 126    73     183  2.8   93     9   3
## 127    91     189  4.6   93     9   4
## 128    47      95  7.4   87     9   5
## 129    32      92 15.5   84     9   6
## 130    20     252 10.9   80     9   7
## 131    23     220 10.3   78     9   8
## 132    21     230 10.9   75     9   9
## 133    24     259  9.7   73     9  10
## 134    44     236 14.9   81     9  11
## 135    21     259 15.5   76     9  12
## 136    28     238  6.3   77     9  13
## 137     9      24 10.9   71     9  14
## 138    13     112 11.5   71     9  15
## 139    46     237  6.9   78     9  16
## 140    18     224 13.8   67     9  17
## 141    13      27 10.3   76     9  18
## 142    24     238 10.3   68     9  19
## 143    16     201  8.0   82     9  20
## 144    13     238 12.6   64     9  21
## 145    23      14  9.2   71     9  22
## 146    36     139 10.3   81     9  23
## 147     7      49 10.3   69     9  24
## 148    14      20 16.6   63     9  25
## 149    30     193  6.9   70     9  26
## 150    NA     145 13.2   77     9  27
## 151    14     191 14.3   75     9  28
## 152    18     131  8.0   76     9  29
## 153    20     223 11.5   68     9  30

Tasks and Questions

Task 1: Measures of Central Tendency and Spread

Using functions you learned this week, compute mean, median, standard deviation, min, and max separately for Ozone, Temp, and Wind.

# Ozone: central tendency and spread (na.rm = TRUE drops the missing values)
mean(airquality$Ozone, na.rm = TRUE)
## [1] 42.12931
median(airquality$Ozone, na.rm = TRUE)
## [1] 31.5
sd(airquality$Ozone, na.rm = TRUE)
## [1] 32.98788
min(airquality$Ozone, na.rm = TRUE)
## [1] 1
max(airquality$Ozone, na.rm = TRUE)
## [1] 168
# Temp: central tendency and spread
mean(airquality$Temp, na.rm = TRUE)
## [1] 77.88235
median(airquality$Temp, na.rm = TRUE)
## [1] 79
sd(airquality$Temp, na.rm = TRUE)
## [1] 9.46527
min(airquality$Temp, na.rm = TRUE)
## [1] 56
max(airquality$Temp, na.rm = TRUE)
## [1] 97
# Wind: central tendency and spread
mean(airquality$Wind, na.rm = TRUE)
## [1] 9.957516
median(airquality$Wind, na.rm = TRUE)
## [1] 9.7
sd(airquality$Wind, na.rm = TRUE)
## [1] 3.523001
min(airquality$Wind, na.rm = TRUE)
## [1] 1.7
max(airquality$Wind, na.rm = TRUE)
## [1] 20.7

Question: Compare the mean and median for each variable. Are they similar or different, and what does this suggest about the distribution (e.g., skewness)? What does the standard deviation indicate about variability?

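For Ozone, the mean (42.1 ppb) clearly exceeds the median (31.5 ppb), which indicates a right-skewed distribution; its standard deviation (33.0 ppb) is also the largest of the three, so ozone is the most variable. Temp has a very similar mean (77.9 °F) and median (79 °F), which suggests a roughly symmetric distribution, with moderate variability (standard deviation 9.5 °F). Wind also has a very similar mean (9.96 mph) and median (9.7 mph), which again suggests a roughly symmetric distribution, with the smallest standard deviation (3.5 mph).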
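The comparison in this answer can be put into numbers. The sketch below is one possible check, not part of the assignment template: it computes the gap between mean and median as a rough skewness signal, and the coefficient of variation (standard deviation divided by the mean) as a unit-free measure of spread. The names vars and spread_check are illustrative.

```r
data("airquality")
vars <- c("Ozone", "Temp", "Wind")

# For each variable: mean minus median (positive suggests right skew)
# and the coefficient of variation (sd relative to the mean)
spread_check <- sapply(airquality[vars], function(x) {
  m <- mean(x, na.rm = TRUE)
  c(mean_minus_median = m - median(x, na.rm = TRUE),
    cv = sd(x, na.rm = TRUE) / m)
})

round(spread_check, 2)
```

Raw standard deviations are measured in different units (ppb, °F, mph), so ranking variables by them is only a rough guide. The coefficient of variation removes the units, but for temperature it depends on the scale's arbitrary zero point, so it should be read with caution there.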

Task 2: Histogram

Generate the histogram for Ozone.

#Your code goes here
# Histogram for Ozone concentration

ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, fill = "#1f77b4", color = "black", na.rm = TRUE) +
  labs(title = "Histogram of Ozone Concentration",
       x = "Ozone (ppb)", y = "Count") +
  theme_minimal()

Question: Describe the shape of the ozone distribution (e.g., normal, skewed, unimodal). Are there any outliers or unusual features?

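The ozone distribution is right-skewed: most days fall at low concentrations (the median is 31.5 ppb), and a long tail extends toward higher values, with the maximum of 168 ppb standing out as an outlier.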
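Whether a value counts as an outlier depends on the rule used. A minimal sketch, assuming the common 1.5 × IQR rule (the same convention a default boxplot uses for its whiskers), is shown here; the names ozone, upper_fence, and high_outliers are illustrative.

```r
data("airquality")

# Keep only the observed ozone values
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Upper fence: third quartile plus 1.5 times the interquartile range
q <- quantile(ozone, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * IQR(ozone)

# Observations above the fence are flagged as high outliers
high_outliers <- sort(ozone[ozone > upper_fence])
upper_fence
high_outliers
```

Under this rule the 168 ppb reading lies well above the fence, which supports calling it an outlier rather than just the end of the tail.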

Task 3: Boxplot

Create a boxplot of ozone levels (Ozone) by month, with months displayed as names (May, June, July, August, September) instead of numbers (5–9). Recode the Month variable into a new column called month_name with month names using case_when from week 4. Generate a boxplot of Ozone by month_name.

# Your code here
airquality <- airquality |>
  mutate(month_name = case_when(
    Month == 5 ~ "May",
    Month == 6 ~ "June",
    Month == 7 ~ "July",
    Month == 8 ~ "August",
    Month == 9 ~ "September"
  ))

airquality$month_name <- factor(airquality$month_name,
                                levels = c("May","June","July","August","September"))

ggplot(airquality, aes(x = month_name, y = Ozone, fill = month_name)) +
  geom_boxplot(na.rm = TRUE) +
  scale_fill_manual(values = c("May"="#1f77b4",
                               "June"="#ff7f0e",
                               "July"="#2ca02c",
                               "August"="#d62728",
                               "September"="#9467bd"),
                    guide = "none") +
  labs(title = "Boxplot of Ozone by Month",
       x = "Month", y = "Ozone (ppb)") +
  theme_minimal()
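The task asks for case_when, which is what the code above uses. For comparison only, here is a shorter base R sketch of the same recoding, built on R's built-in month.name constant; month_name_alt is an illustrative name. It produces an ordered set of factor levels in one step, so the months plot in calendar order rather than alphabetically.

```r
data("airquality")

# month.name is a built-in character vector: "January", ..., "December"
month_name_alt <- factor(airquality$Month,
                         levels = 5:9,
                         labels = month.name[5:9])

levels(month_name_alt)
table(month_name_alt)
```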

Question: How do ozone levels vary across months? Which month has the highest median ozone? Are there outliers in any month, and what might they indicate?

Ozone levels are higher and more spread out in July and August, and lower in May, June, and September. July has the highest median ozone. There are outliers in May, June, August, and September. These outliers are individual days with unusually high ozone; they may reflect short spells of hot weather, since hotter temperatures lead to more ground-level ozone.
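The medians read off the boxplot can be confirmed numerically. This is a small base R sketch, with illustrative names (month_name_chk, ozone_medians), that computes the median ozone per month and picks out the largest.

```r
data("airquality")

# Month labels in calendar order
month_name_chk <- factor(airquality$Month,
                         levels = 5:9,
                         labels = month.name[5:9])

# Median ozone per month, ignoring missing values
ozone_medians <- tapply(airquality$Ozone, month_name_chk, median, na.rm = TRUE)
ozone_medians

# Month with the highest median
names(which.max(ozone_medians))
```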

Task 4: Scatterplot

Produce the scatterplot of Temp vs. Ozone, colored by Month.

# Your code goes here

airquality$month_name <- factor(airquality$month_name,
                                levels = c("May","June","July","August","September"))

ggplot(airquality, aes(x = Temp, y = Ozone, color = month_name)) +
  geom_point(alpha = 0.7, na.rm = TRUE) +
  scale_color_manual(values = c("May"="#1f77b4",
                                "June"="#ff7f0e",
                                "July"="#2ca02c",
                                "August"="#d62728",
                                "September"="#9467bd")) +
  labs(title = "Scatterplot of Temperature vs. Ozone",
       x = "Temperature (F)", y = "Ozone (ppb)", color = "Month") +
  theme_minimal()

Question: Is there a visible relationship between temperature and ozone levels? Do certain months cluster together (e.g., higher ozone in warmer months)? Describe any patterns.

There is a clear positive relationship between temperature and ozone: higher temperatures are associated with higher ozone levels. Points from the warmer months, July and August, cluster at the top right, showing elevated ozone during the hottest periods.
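One way to back up the visual impression is to fit a straight line to the same two variables. The sketch below assumes a simple linear model is an acceptable first approximation, which the scatterplot may not fully support if the trend is curved; fit and temp_slope are illustrative names.

```r
data("airquality")

# Simple linear regression of ozone on temperature;
# lm() drops rows with missing Ozone by default
fit <- lm(Ozone ~ Temp, data = airquality)

# The Temp coefficient is the estimated change in ozone (ppb)
# per one-degree increase in temperature (°F)
temp_slope <- coef(fit)[["Temp"]]
temp_slope

# Number of days actually used in the fit
nobs(fit)
```

A positive slope agrees with the upward pattern in the plot. Adding geom_smooth(method = "lm") to the ggplot call would draw the same line on the scatterplot.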

Task 5: Correlation Matrix

Compute and visualize the correlation matrix for Ozone, Temp, and Wind.

# Your code goes here

# Compute correlation matrix
cor_matrix <- cor(
  airquality[, c("Ozone", "Temp", "Wind")],
  use = "complete.obs"
)

cor_matrix
##            Ozone       Temp       Wind
## Ozone  1.0000000  0.6983603 -0.6015465
## Temp   0.6983603  1.0000000 -0.5110750
## Wind  -0.6015465 -0.5110750  1.0000000
# Visualize correlation matrix
corrplot(cor_matrix,
         method = "color",
         type = "upper",
         order = "hclust",
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         title = "Correlation Matrix: Ozone, Temperature, and Wind",
         mar = c(0, 0, 2, 0))  # leave room at the top so the title is not clipped

Question: Identify the strongest and weakest correlations. For example, is ozone more strongly correlated with temperature or wind speed? Explain what the correlation values suggest about relationships between variables.

The strongest correlation is between Ozone and Temp (r = 0.70): ozone levels tend to rise on warmer days. The weakest is between Temp and Wind (r = -0.51). Ozone is therefore more strongly correlated with temperature than with wind speed, although Ozone and Wind (r = -0.60) still show a moderately strong negative relationship, meaning that higher wind speeds are associated with lower ozone concentrations.
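The correlations above were computed with use = "complete.obs", which keeps only the rows where all three variables are present. The sketch below, with illustrative names, compares that choice with use = "pairwise.complete.obs", which uses every row available for each pair separately.

```r
data("airquality")
vars3 <- airquality[, c("Ozone", "Temp", "Wind")]

# Listwise deletion: a row is dropped if any of the three values is missing
cor_complete <- cor(vars3, use = "complete.obs")

# Pairwise deletion: each pair uses all rows where both values are present
cor_pairwise <- cor(vars3, use = "pairwise.complete.obs")

# Rows that survive listwise deletion
n_complete <- sum(complete.cases(vars3))
n_complete

# Difference between the two matrices
round(cor_pairwise - cor_complete, 3)
```

Because only Ozone has missing values among these three columns, the correlations involving Ozone are the same either way. The Temp and Wind correlation differs, since the pairwise version uses all 153 days instead of only the days with an ozone reading.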

Task 6: Summary Table

Generate the summary table grouped by Month. It should include the count of observations, mean ozone, mean temperature, and mean wind speed per month.

# your code goes here
summary_table <- airquality %>%
  group_by(month_name) %>%
  summarise(
    count     = n(),
    avg_ozone = round(mean(Ozone, na.rm = TRUE), 1),
    avg_temp  = round(mean(Temp,  na.rm = TRUE), 1),
    avg_wind  = round(mean(Wind,  na.rm = TRUE), 1),
    .groups   = "drop"
  )

summary_table
## # A tibble: 5 × 5
##   month_name count avg_ozone avg_temp avg_wind
##   <fct>      <int>     <dbl>    <dbl>    <dbl>
## 1 May           31      23.6     65.5     11.6
## 2 June          30      29.4     79.1     10.3
## 3 July          31      59.1     83.9      8.9
## 4 August        31      60       84        8.8
## 5 September     30      31.4     76.9     10.2

Question: Which month has the highest average ozone level? How do temperature and wind speed vary across months? What environmental factors might explain these differences?

August has the highest average ozone level (60.0 ppb), narrowly ahead of July (59.1 ppb). Temperature and wind speed move in opposite directions across months: temperature peaks in July and August (about 84 °F), when average wind speed is lowest (about 8.9 mph), and the pattern reverses in May, the coolest and windiest month. This matches seasonal expectations: midsummer air is hotter and calmer, which favors ozone buildup.
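The count column in the summary table is the number of days per month, not the number of days with an ozone reading. The sketch below, using dplyr with illustrative column names (days, ozone_days, ozone_missing), shows how many ozone values each monthly average is actually based on.

```r
library(dplyr)
data("airquality")

# Days per month versus days with a non-missing ozone measurement
ozone_coverage <- airquality |>
  group_by(Month) |>
  summarise(
    days          = n(),
    ozone_days    = sum(!is.na(Ozone)),
    ozone_missing = sum(is.na(Ozone)),
    .groups       = "drop"
  )

ozone_coverage
```

June's average ozone rests on far fewer measurements than the other months, so it is a less reliable basis for comparison than its 30-day count suggests.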

Submission Requirements

Publish your report to RPubs and submit the link on Blackboard.