Objective:

You will use R to analyze the built-in airquality dataset, applying descriptive statistics techniques to explore environmental data. The assignment covers measures of central tendency, spread, histograms, boxplots, scatterplots, correlations, and summary tables, aligning with the Week 6 agenda on Descriptive Statistics.

Dataset

Source: Built-in R dataset airquality.

Description: Contains 153 observations of daily air quality measurements in New York from May to September 1973.

Variables (selected for this assignment):

Notes

-The airquality dataset has missing values in Ozone and Solar.R. The code uses na.rm = TRUE or use = "complete.obs" to handle them.

-If you encounter errors, check that tidyverse and corrplot are installed and loaded.

-Feel free to modify plot aesthetics (e.g., colors, binwidth) to enhance clarity.
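
The note on missing values can be illustrated with a short sketch. It uses only base R and the built-in dataset; the variable names (mean_with_na, na_counts, and so on) are chosen here for illustration and are not part of the assignment.

```r
# Built-in dataset; no extra packages are needed for this sketch.
data("airquality")

# Without na.rm = TRUE, a single NA makes the whole result NA.
mean_with_na <- mean(airquality$Ozone)
mean_without_na <- mean(airquality$Ozone, na.rm = TRUE)

# Count the missing values in each column.
na_counts <- colSums(is.na(airquality))

# use = "complete.obs" drops every row that has an NA in either variable
# before the correlation is computed.
r_ozone_temp <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")

print(mean_with_na)
print(mean_without_na)
print(na_counts)
print(r_ozone_temp)
```

Checking na_counts first shows how many rows each later calculation silently drops.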

Instructions:

Complete the following tasks using R to analyze the airquality dataset. Submit your RPubs link that includes code, outputs (tables and plots), and written interpretations for each task. Ensure you load the dataset using data(airquality) and install/load the tidyverse and corrplot packages.

#Load your dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
data("airquality")

Tasks and Questions

Task 1: Measures of Central Tendency and Spread

Using functions you learned this week, compute mean, median, standard deviation, min, and max separately for Ozone, Temp, and Wind.

#Your code for Ozone goes here

# Ozone

mean(airquality$Ozone, na.rm = TRUE)
## [1] 42.12931
median(airquality$Ozone, na.rm = TRUE)
## [1] 31.5
sd(airquality$Ozone, na.rm = TRUE)
## [1] 32.98788
min(airquality$Ozone, na.rm = TRUE)
## [1] 1
max(airquality$Ozone, na.rm = TRUE)
## [1] 168
#Your code for Temp goes here

# Temp

mean(airquality$Temp, na.rm = TRUE)
## [1] 77.88235
median(airquality$Temp, na.rm = TRUE)
## [1] 79
sd(airquality$Temp, na.rm = TRUE)
## [1] 9.46527
min(airquality$Temp, na.rm = TRUE)
## [1] 56
max(airquality$Temp, na.rm = TRUE)
## [1] 97
#Your code for Wind goes here

# Wind

mean(airquality$Wind, na.rm = TRUE)
## [1] 9.957516
median(airquality$Wind, na.rm = TRUE)
## [1] 9.7
sd(airquality$Wind, na.rm = TRUE)
## [1] 3.523001
min(airquality$Wind, na.rm = TRUE)
## [1] 1.7
max(airquality$Wind, na.rm = TRUE)
## [1] 20.7

Question: Compare the mean and median for each variable. Are they similar or different, and what does this suggest about the distribution (e.g., skewness)? What does the standard deviation indicate about variability?

For Ozone, the mean (42.13) is well above the median (31.5), which indicates a right-skewed distribution. The standard deviation (32.99) is large relative to the mean, showing high variability in ozone levels. For Temp, the mean (77.88) and median (79) are close, with the median slightly higher, which suggests a mild left skew. The standard deviation (9.47) shows moderate variability in temperature. For Wind, the mean (9.96) and median (9.7) are similar, indicating a roughly symmetric distribution. The standard deviation (3.52) is the smallest of the three, so wind speed varies less in absolute terms than temperature or ozone.
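
The comparison above can also be produced in one pass instead of fifteen separate calls. The following is a minimal sketch in base R; describe() is a hypothetical helper written for this example, not a function from the course or from any package.

```r
data("airquality")

# The three variables used in Task 1.
vars <- airquality[, c("Ozone", "Temp", "Wind")]

# Hypothetical helper: the five statistics from Task 1 for one numeric vector.
describe <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}

# sapply() returns a matrix: one row per statistic, one column per variable.
stats <- sapply(vars, describe)

# A positive difference hints at right skew, a negative one at left skew.
mean_minus_median <- stats["mean", ] - stats["median", ]

print(round(stats, 2))
print(round(mean_minus_median, 2))
```

The sign of mean_minus_median is only a rough indicator of skew; the histogram in Task 2 is the better check.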

Task 2: Histogram

Generate the histogram for Ozone.

#Your code goes here
library(ggplot2)

# binwidth (not "bindwdith") sets the width of each bar in ozone units
p1 <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, fill = "purple", color = "black") +
  labs(title = "Histogram of Ozone Levels",
       x = "Ozone",
       y = "Frequency")
p1
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_bin()`).

Question: Describe the shape of the ozone distribution (e.g., normal, skewed, unimodal). Are there any outliers or unusual features?

The distribution of Ozone levels in the airquality dataset is right skewed. A few values at the far right of the histogram, around 150 and above (the maximum is 168), represent days with unusually high ozone concentrations. These extreme values pull the mean upward and contribute to the right-skewed shape.
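
Reading outliers off a histogram is subjective, so a numeric check can help. The sketch below applies the common 1.5 × IQR boxplot rule through base R's boxplot.stats(); that rule is an assumption made for this example, not a criterion stated in the assignment, and a different rule would flag different days.

```r
data("airquality")

# Keep only the observed ozone values.
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# boxplot.stats() returns, in $out, the points lying beyond the whiskers
# (more than 1.5 times the hinge spread away from the box).
flagged <- boxplot.stats(ozone)$out

print(sort(flagged))
```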

Task 3: Boxplot

Create a boxplot of ozone levels (Ozone) by month, with months displayed as names (May, June, July, August, September) instead of numbers (5–9). Recode the Month variable into a new column called month_name with month names using case_when from week 4. Generate a boxplot of Ozone by month_name.

# Your code here

aq <- airquality |>
  mutate(month_name = factor(case_when(
    Month == 5 ~ "May",
    Month == 6 ~ "June",
    Month == 7 ~ "July",
    Month == 8 ~ "August",
    Month == 9 ~ "September"),
    levels = c("May", "June", "July", "August", "September")))

aq
##     Ozone Solar.R Wind Temp Month Day month_name
## 1      41     190  7.4   67     5   1        May
## 2      36     118  8.0   72     5   2        May
## 3      12     149 12.6   74     5   3        May
## 4      18     313 11.5   62     5   4        May
## 5      NA      NA 14.3   56     5   5        May
## 6      28      NA 14.9   66     5   6        May
## 7      23     299  8.6   65     5   7        May
## 8      19      99 13.8   59     5   8        May
## 9       8      19 20.1   61     5   9        May
## 10     NA     194  8.6   69     5  10        May
## 11      7      NA  6.9   74     5  11        May
## 12     16     256  9.7   69     5  12        May
## 13     11     290  9.2   66     5  13        May
## 14     14     274 10.9   68     5  14        May
## 15     18      65 13.2   58     5  15        May
## 16     14     334 11.5   64     5  16        May
## 17     34     307 12.0   66     5  17        May
## 18      6      78 18.4   57     5  18        May
## 19     30     322 11.5   68     5  19        May
## 20     11      44  9.7   62     5  20        May
## 21      1       8  9.7   59     5  21        May
## 22     11     320 16.6   73     5  22        May
## 23      4      25  9.7   61     5  23        May
## 24     32      92 12.0   61     5  24        May
## 25     NA      66 16.6   57     5  25        May
## 26     NA     266 14.9   58     5  26        May
## 27     NA      NA  8.0   57     5  27        May
## 28     23      13 12.0   67     5  28        May
## 29     45     252 14.9   81     5  29        May
## 30    115     223  5.7   79     5  30        May
## 31     37     279  7.4   76     5  31        May
## 32     NA     286  8.6   78     6   1       June
## 33     NA     287  9.7   74     6   2       June
## 34     NA     242 16.1   67     6   3       June
## 35     NA     186  9.2   84     6   4       June
## 36     NA     220  8.6   85     6   5       June
## 37     NA     264 14.3   79     6   6       June
## 38     29     127  9.7   82     6   7       June
## 39     NA     273  6.9   87     6   8       June
## 40     71     291 13.8   90     6   9       June
## 41     39     323 11.5   87     6  10       June
## 42     NA     259 10.9   93     6  11       June
## 43     NA     250  9.2   92     6  12       June
## 44     23     148  8.0   82     6  13       June
## 45     NA     332 13.8   80     6  14       June
## 46     NA     322 11.5   79     6  15       June
## 47     21     191 14.9   77     6  16       June
## 48     37     284 20.7   72     6  17       June
## 49     20      37  9.2   65     6  18       June
## 50     12     120 11.5   73     6  19       June
## 51     13     137 10.3   76     6  20       June
## 52     NA     150  6.3   77     6  21       June
## 53     NA      59  1.7   76     6  22       June
## 54     NA      91  4.6   76     6  23       June
## 55     NA     250  6.3   76     6  24       June
## 56     NA     135  8.0   75     6  25       June
## 57     NA     127  8.0   78     6  26       June
## 58     NA      47 10.3   73     6  27       June
## 59     NA      98 11.5   80     6  28       June
## 60     NA      31 14.9   77     6  29       June
## 61     NA     138  8.0   83     6  30       June
## 62    135     269  4.1   84     7   1       July
## 63     49     248  9.2   85     7   2       July
## 64     32     236  9.2   81     7   3       July
## 65     NA     101 10.9   84     7   4       July
## 66     64     175  4.6   83     7   5       July
## 67     40     314 10.9   83     7   6       July
## 68     77     276  5.1   88     7   7       July
## 69     97     267  6.3   92     7   8       July
## 70     97     272  5.7   92     7   9       July
## 71     85     175  7.4   89     7  10       July
## 72     NA     139  8.6   82     7  11       July
## 73     10     264 14.3   73     7  12       July
## 74     27     175 14.9   81     7  13       July
## 75     NA     291 14.9   91     7  14       July
## 76      7      48 14.3   80     7  15       July
## 77     48     260  6.9   81     7  16       July
## 78     35     274 10.3   82     7  17       July
## 79     61     285  6.3   84     7  18       July
## 80     79     187  5.1   87     7  19       July
## 81     63     220 11.5   85     7  20       July
## 82     16       7  6.9   74     7  21       July
## 83     NA     258  9.7   81     7  22       July
## 84     NA     295 11.5   82     7  23       July
## 85     80     294  8.6   86     7  24       July
## 86    108     223  8.0   85     7  25       July
## 87     20      81  8.6   82     7  26       July
## 88     52      82 12.0   86     7  27       July
## 89     82     213  7.4   88     7  28       July
## 90     50     275  7.4   86     7  29       July
## 91     64     253  7.4   83     7  30       July
## 92     59     254  9.2   81     7  31       July
## 93     39      83  6.9   81     8   1     August
## 94      9      24 13.8   81     8   2     August
## 95     16      77  7.4   82     8   3     August
## 96     78      NA  6.9   86     8   4     August
## 97     35      NA  7.4   85     8   5     August
## 98     66      NA  4.6   87     8   6     August
## 99    122     255  4.0   89     8   7     August
## 100    89     229 10.3   90     8   8     August
## 101   110     207  8.0   90     8   9     August
## 102    NA     222  8.6   92     8  10     August
## 103    NA     137 11.5   86     8  11     August
## 104    44     192 11.5   86     8  12     August
## 105    28     273 11.5   82     8  13     August
## 106    65     157  9.7   80     8  14     August
## 107    NA      64 11.5   79     8  15     August
## 108    22      71 10.3   77     8  16     August
## 109    59      51  6.3   79     8  17     August
## 110    23     115  7.4   76     8  18     August
## 111    31     244 10.9   78     8  19     August
## 112    44     190 10.3   78     8  20     August
## 113    21     259 15.5   77     8  21     August
## 114     9      36 14.3   72     8  22     August
## 115    NA     255 12.6   75     8  23     August
## 116    45     212  9.7   79     8  24     August
## 117   168     238  3.4   81     8  25     August
## 118    73     215  8.0   86     8  26     August
## 119    NA     153  5.7   88     8  27     August
## 120    76     203  9.7   97     8  28     August
## 121   118     225  2.3   94     8  29     August
## 122    84     237  6.3   96     8  30     August
## 123    85     188  6.3   94     8  31     August
## 124    96     167  6.9   91     9   1  September
## 125    78     197  5.1   92     9   2  September
## 126    73     183  2.8   93     9   3  September
## 127    91     189  4.6   93     9   4  September
## 128    47      95  7.4   87     9   5  September
## 129    32      92 15.5   84     9   6  September
## 130    20     252 10.9   80     9   7  September
## 131    23     220 10.3   78     9   8  September
## 132    21     230 10.9   75     9   9  September
## 133    24     259  9.7   73     9  10  September
## 134    44     236 14.9   81     9  11  September
## 135    21     259 15.5   76     9  12  September
## 136    28     238  6.3   77     9  13  September
## 137     9      24 10.9   71     9  14  September
## 138    13     112 11.5   71     9  15  September
## 139    46     237  6.9   78     9  16  September
## 140    18     224 13.8   67     9  17  September
## 141    13      27 10.3   76     9  18  September
## 142    24     238 10.3   68     9  19  September
## 143    16     201  8.0   82     9  20  September
## 144    13     238 12.6   64     9  21  September
## 145    23      14  9.2   71     9  22  September
## 146    36     139 10.3   81     9  23  September
## 147     7      49 10.3   69     9  24  September
## 148    14      20 16.6   63     9  25  September
## 149    30     193  6.9   70     9  26  September
## 150    NA     145 13.2   77     9  27  September
## 151    14     191 14.3   75     9  28  September
## 152    18     131  8.0   76     9  29  September
## 153    20     223 11.5   68     9  30  September
ggplot(aq, aes(x = month_name, y = Ozone)) +
  geom_boxplot(fill = "red", color = "black", na.rm = TRUE) +
  labs(title = "Ozone Levels by Month",
       x = "Month",
       y = "Ozone") +
  theme_minimal()

Question: How do ozone levels vary across months? Which month has the highest median ozone? Are there outliers in any month, and what might they indicate?

From May to July, ozone concentrations increase, and July has the highest median ozone. After July, ozone levels decrease in August and September.
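
The monthly medians that the boxplot draws as center lines can be computed directly to confirm the reading. This is a minimal base-R sketch; median_by_month and highest are names chosen for this example.

```r
data("airquality")

# Median ozone per month; na.rm = TRUE is passed through to median().
median_by_month <- tapply(airquality$Ozone, airquality$Month,
                          median, na.rm = TRUE)

# Replace the numeric month labels (5-9) with names from the built-in
# month.name constant.
names(median_by_month) <- month.name[as.integer(names(median_by_month))]

# Month with the largest median.
highest <- names(which.max(median_by_month))

print(median_by_month)
print(highest)
```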

Task 4: Scatterplot

Produce the scatterplot of Temp vs. Ozone, colored by Month.

# Your code goes here

ggplot(aq, aes(x = Temp, y = Ozone, color = month_name)) +
  geom_point(size = 3, alpha = 0.7, na.rm = TRUE) +
  # the legend title must use the lowercase aesthetic name "color"
  labs(title = "Temperature vs Ozone",
       x = "Temperature (F)",
       y = "Ozone",
       color = "Month") +
  theme_minimal()

Question: Is there a visible relationship between temperature and ozone levels? Do certain months cluster together (e.g., higher ozone in warmer months)? Describe any patterns.

The relationship is positive: as temperature increases, ozone levels increase, so warmer days tend to have higher ozone concentrations. May and June observations appear at lower temperatures and lower ozone levels. July and August observations appear at higher temperatures and higher ozone levels. September is slightly cooler, and ozone levels start to decrease.
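
One way to put a number on the direction of this relationship is the slope of a simple straight-line fit. The sketch below uses lm() from base R; fitting a line goes slightly beyond what the task asks for and is shown only as an optional check, with fit and slope as names chosen for this example.

```r
data("airquality")

# Straight-line fit of Ozone on Temp. Rows with a missing Ozone value are
# dropped automatically by lm()'s default NA handling.
fit <- lm(Ozone ~ Temp, data = airquality)

# Estimated change in ozone per one-degree (F) increase in temperature.
slope <- coef(fit)[["Temp"]]

print(slope)
```

A positive slope agrees with the upward pattern in the scatterplot, although a straight line is only a rough summary if the true relationship is curved.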

Task 5: Correlation Matrix

Compute and visualize the correlation matrix for Ozone, Temp, and Wind.

# Your code goes here

aq_clean <- airquality |>
  select(Ozone, Temp, Wind) |>
  na.omit()
aq_clean
##     Ozone Temp Wind
## 1      41   67  7.4
## 2      36   72  8.0
## 3      12   74 12.6
## 4      18   62 11.5
## 6      28   66 14.9
## 7      23   65  8.6
## 8      19   59 13.8
## 9       8   61 20.1
## 11      7   74  6.9
## 12     16   69  9.7
## 13     11   66  9.2
## 14     14   68 10.9
## 15     18   58 13.2
## 16     14   64 11.5
## 17     34   66 12.0
## 18      6   57 18.4
## 19     30   68 11.5
## 20     11   62  9.7
## 21      1   59  9.7
## 22     11   73 16.6
## 23      4   61  9.7
## 24     32   61 12.0
## 28     23   67 12.0
## 29     45   81 14.9
## 30    115   79  5.7
## 31     37   76  7.4
## 38     29   82  9.7
## 40     71   90 13.8
## 41     39   87 11.5
## 44     23   82  8.0
## 47     21   77 14.9
## 48     37   72 20.7
## 49     20   65  9.2
## 50     12   73 11.5
## 51     13   76 10.3
## 62    135   84  4.1
## 63     49   85  9.2
## 64     32   81  9.2
## 66     64   83  4.6
## 67     40   83 10.9
## 68     77   88  5.1
## 69     97   92  6.3
## 70     97   92  5.7
## 71     85   89  7.4
## 73     10   73 14.3
## 74     27   81 14.9
## 76      7   80 14.3
## 77     48   81  6.9
## 78     35   82 10.3
## 79     61   84  6.3
## 80     79   87  5.1
## 81     63   85 11.5
## 82     16   74  6.9
## 85     80   86  8.6
## 86    108   85  8.0
## 87     20   82  8.6
## 88     52   86 12.0
## 89     82   88  7.4
## 90     50   86  7.4
## 91     64   83  7.4
## 92     59   81  9.2
## 93     39   81  6.9
## 94      9   81 13.8
## 95     16   82  7.4
## 96     78   86  6.9
## 97     35   85  7.4
## 98     66   87  4.6
## 99    122   89  4.0
## 100    89   90 10.3
## 101   110   90  8.0
## 104    44   86 11.5
## 105    28   82 11.5
## 106    65   80  9.7
## 108    22   77 10.3
## 109    59   79  6.3
## 110    23   76  7.4
## 111    31   78 10.9
## 112    44   78 10.3
## 113    21   77 15.5
## 114     9   72 14.3
## 116    45   79  9.7
## 117   168   81  3.4
## 118    73   86  8.0
## 120    76   97  9.7
## 121   118   94  2.3
## 122    84   96  6.3
## 123    85   94  6.3
## 124    96   91  6.9
## 125    78   92  5.1
## 126    73   93  2.8
## 127    91   93  4.6
## 128    47   87  7.4
## 129    32   84 15.5
## 130    20   80 10.9
## 131    23   78 10.3
## 132    21   75 10.9
## 133    24   73  9.7
## 134    44   81 14.9
## 135    21   76 15.5
## 136    28   77  6.3
## 137     9   71 10.9
## 138    13   71 11.5
## 139    46   78  6.9
## 140    18   67 13.8
## 141    13   76 10.3
## 142    24   68 10.3
## 143    16   82  8.0
## 144    13   64 12.6
## 145    23   71  9.2
## 146    36   81 10.3
## 147     7   69 10.3
## 148    14   63 16.6
## 149    30   70  6.9
## 151    14   75 14.3
## 152    18   76  8.0
## 153    20   68 11.5
cor_matrix <- cor(aq_clean)
cor_matrix
##            Ozone       Temp       Wind
## Ozone  1.0000000  0.6983603 -0.6015465
## Temp   0.6983603  1.0000000 -0.5110750
## Wind  -0.6015465 -0.5110750  1.0000000
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", 
         tl.srt = 45)

Question: Identify the strongest and weakest correlations. For example, is ozone more strongly correlated with temperature or wind speed? Explain what the correlation values suggest about relationships between variables.

Ozone is more strongly correlated with Temp (r = 0.698) than with Wind (r = -0.602), so the Ozone and Temp pair is the strongest correlation of the three. Higher temperatures tend to come with higher ozone levels, while windier days tend to have lower ozone levels. The weakest correlation is between Temp and Wind (r = -0.511), a moderate negative relationship.
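
The strongest and weakest pairs can also be ranked programmatically rather than read off the matrix. The following base-R sketch uses use = "complete.obs" in place of na.omit(); for these three columns the two approaches drop the same rows. The name pair_table is chosen for this example.

```r
data("airquality")

vars <- airquality[, c("Ozone", "Temp", "Wind")]

# Correlation matrix using only rows with no missing values.
cor_matrix <- cor(vars, use = "complete.obs")

# Row and column positions of the upper triangle (each pair listed once).
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)

# One row per variable pair, with its correlation.
pair_table <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[upper.tri(cor_matrix)]
)

# Sort by absolute strength, strongest first.
pair_table <- pair_table[order(-abs(pair_table$r)), ]

print(pair_table)
```

Ranking by the absolute value matters here because two of the three correlations are negative.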

Task 6: Summary Table

Generate the summary table grouped by Month. It should include the count, mean ozone, mean temperature, and mean wind speed for each month.

# your code goes here

summary_table <- airquality |>
  group_by(Month) |>
  summarize(
    count = n(),
    avg_Ozone = mean(Ozone, na.rm = TRUE),
    avg_Temp = mean(Temp, na.rm = TRUE),
    av_Wind = mean(Wind, na.rm = TRUE)
  )
summary_table
## # A tibble: 5 × 5
##   Month count avg_Ozone avg_Temp av_Wind
##   <int> <int>     <dbl>    <dbl>   <dbl>
## 1     5    31      23.6     65.5   11.6 
## 2     6    30      29.4     79.1   10.3 
## 3     7    31      59.1     83.9    8.94
## 4     8    31      60.0     84.0    8.79
## 5     9    30      31.4     76.9   10.2

Question: Which month has the highest average ozone level? How do temperature and wind speed vary across months? What environmental factors might explain these differences?

According to the summary table, August has the highest average ozone level (about 59.96), just ahead of July (59.1). Average temperature rises from May (65.5) to August (84.0) and then drops in September (76.9). Average wind speed is highest in May (11.6), lowest in July and August (8.94 and 8.79), and picks up again in September (10.2). Hotter summer temperatures are associated with higher ozone concentrations, while higher wind speeds on cooler days help disperse ozone.
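
Because n() counts every row, including days with a missing Ozone value, the count column does not show how many days each ozone average is based on. The sketch below adds that number; it assumes dplyr is installed, and n_ozone and summary_with_n are names introduced for this example.

```r
library(dplyr)

data("airquality")

summary_with_n <- airquality |>
  group_by(Month) |>
  summarize(
    count     = n(),                    # all days in the month
    n_ozone   = sum(!is.na(Ozone)),     # days with an ozone reading
    avg_Ozone = mean(Ozone, na.rm = TRUE)
  )

print(summary_with_n)
```

A month with few ozone readings has a less reliable average, which is worth keeping in mind when comparing months.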

Submission Requirements

Publish it to RPubs and submit your link on Blackboard.