Introduction

In this presentation, we will explore how ozone levels are related to:

  • Temperature
  • Wind speed
  • Solar radiation
  • Month

About the Dataset

The airquality dataset ships with R's built-in datasets package, so it is available in every R session without installing anything. It contains 153 daily air quality measurements taken in New York from May to September 1973.

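The variables we will use are:

  • Ozone: mean ozone concentration (ppb)
  • Solar.R: solar radiation (Langleys)
  • Wind: average wind speed (mph)
  • Temp: maximum daily temperature (°F)
  • Month: month of the year (5 = May through 9 = September)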

Data Preparation

# Load in packages
library(ggplot2)
library(plotly)
library(dplyr)
library(scales)
# Load in data
data(airquality)

# Show the first 5 rows of the dataset
head(airquality, 5)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5

Data Cleaning

Before analyzing the data, we check each column for missing values and then remove any rows that are incomplete.

# Check missing values before cleaning
print(colSums(is.na(airquality)))
##   Ozone Solar.R    Wind    Temp   Month     Day 
##      37       7       0       0       0       0
# Remove rows with missing values
air <- na.omit(airquality)
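A quick check that the cleaning step behaved as expected, comparing row counts before and after na.omit(). This is a minimal sketch that recreates air so it runs on its own; complete.cases() is used only as a cross-check:

```r
# Recreate the cleaned data: drop every row that has any NA
air <- na.omit(airquality)

# Row counts before and after cleaning
nrow(airquality)   # 153
nrow(air)          # 111

# complete.cases() flags the same rows that na.omit() keeps
sum(complete.cases(airquality))

# Confirm that no missing values remain in any column
colSums(is.na(air))
```

Because two rows are missing both Ozone and Solar.R, dropping incomplete rows removes 37 + 7 - 2 = 42 rows, leaving 111.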

Research Question

The main question for this project is:

How are ozone levels related to temperature, wind speed, solar radiation, and month?

To explore this, we will use:

  • Scatter plots with linear regression lines
  • Correlation coefficients
  • An interactive Plotly graph

Linear Regression

A simple linear regression line helps show the overall trend in each scatter plot.

\[Ozone = \beta_0 + \beta_1X + \epsilon\]

Here, \(X\) can represent temperature, wind speed, or solar radiation.

  • \(\beta_0\) is the intercept, the predicted ozone level when \(X = 0\)
  • \(\beta_1\) is the slope, the change in predicted ozone per one-unit increase in \(X\)
  • \(\epsilon\) is the random error term
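The model above can be fitted in R with lm(). A minimal sketch, assuming the cleaned air data from the Data Cleaning step (recreated here so the block is self-contained); the object names fit_temp, fit_wind, and fit_solar are illustrative:

```r
# Cleaned data, as in the Data Cleaning step
air <- na.omit(airquality)

# One simple linear regression per predictor: Ozone = beta_0 + beta_1 * X
fit_temp  <- lm(Ozone ~ Temp, data = air)
fit_wind  <- lm(Ozone ~ Wind, data = air)
fit_solar <- lm(Ozone ~ Solar.R, data = air)

# coef() returns the estimated intercept (beta_0) and slope (beta_1)
coef(fit_temp)
coef(fit_wind)
coef(fit_solar)

# In simple regression the slope equals r * sd(y) / sd(x),
# so this should reproduce the Temp slope from coef(fit_temp)
with(air, cor(Ozone, Temp) * sd(Ozone) / sd(Temp))
```

The last line uses the identity \(\hat\beta_1 = r \, s_y / s_x\), which ties each regression line to the correlation between ozone and that predictor.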

Ozone vs Temperature

Ozone vs Wind Speed

Ozone vs Solar Radiation
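The plot images for these three slides are not reproduced here. A sketch of how a scatter plot with a fitted regression line could be drawn with ggplot2, assuming the cleaned air data; plot_ozone_vs() is a hypothetical helper, not the original slide code:

```r
library(ggplot2)

# Cleaned data, as in the Data Cleaning step
air <- na.omit(airquality)

# Hypothetical helper: Ozone against one predictor, with an
# ordinary least-squares line and its 95% confidence band
plot_ozone_vs <- function(data, var, x_label) {
  ggplot(data, aes(x = .data[[var]], y = Ozone)) +
    geom_point(alpha = 0.7) +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
    labs(x = x_label, y = "Ozone (ppb)")
}

p_temp  <- plot_ozone_vs(air, "Temp", "Temperature (°F)")
p_wind  <- plot_ozone_vs(air, "Wind", "Wind Speed (mph)")
p_solar <- plot_ozone_vs(air, "Solar.R", "Solar Radiation (Langleys)")

print(p_temp)
```

Passing the column name as a string and using the .data pronoun lets one function draw all three plots instead of repeating the ggplot() call.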

Correlation

The Pearson correlation coefficient \(r\) measures the strength and direction of the linear relationship between two numerical variables.

\[r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\]

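\(r\) always lies between -1 and 1. Values close to 1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near 0 indicate little or no linear relationship.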
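To make the formula concrete, a sketch that computes \(r\) term by term and compares it with R's built-in cor(). The helper pearson_r() is a hypothetical name introduced for illustration:

```r
# Cleaned data, as in the Data Cleaning step
air <- na.omit(airquality)

# Pearson's r written out term by term, following the formula above
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_r(air$Temp, air$Ozone)
r_builtin <- cor(air$Temp, air$Ozone)

all.equal(r_manual, r_builtin)  # TRUE
```

Both values should match the temperature correlation of 0.6985414 from the correlation analysis.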

Correlation Analysis

# Calculate correlations between ozone and other variables
air %>%
  summarize(
    Temp_Correlation = cor(Ozone, Temp),
    Wind_Correlation = cor(Ozone, Wind),
    Solar_Correlation = cor(Ozone, Solar.R)
  )
##   Temp_Correlation Wind_Correlation Solar_Correlation
## 1        0.6985414       -0.6124966         0.3483417
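For simple linear regression, the model's \(R^2\) equals the squared correlation \(r^2\), which links these correlations back to the regression lines. A minimal sketch checking this on the cleaned data:

```r
# Cleaned data, as in the Data Cleaning step
air <- na.omit(airquality)
predictors <- c("Temp", "Wind", "Solar.R")

# Squared correlation of each predictor with Ozone (3 x 1 matrix)
r <- cor(air[, predictors], air$Ozone)
r_squared <- r[, 1]^2

# R-squared from the matching simple linear regression Ozone ~ X
lm_r2 <- sapply(predictors, function(v) {
  summary(lm(reformulate(v, response = "Ozone"), data = air))$r.squared
})

# Side-by-side comparison; the two columns should agree
round(cbind(r_squared = r_squared, lm_r2 = lm_r2), 3)
```

With \(r = 0.699\), temperature alone accounts for roughly 49% of the variance in ozone in this sample, compared with roughly 38% for wind and 12% for solar radiation.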

Ozone Levels Across Temperature, Wind, Solar Radiation, and Month

Code for the Interactive Plotly Graph

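# Convert the numeric month (5-9) into labeled factor levels for the legend
air$MonthName <- factor(air$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

# 3D scatter: temperature (x), wind (y), ozone (z), colored by month
plot_ly(
  air,
  x = ~Temp,
  y = ~Wind,
  z = ~Ozone,
  color = ~MonthName,
  # Hover text listing every variable for the point
  text = ~paste("Month:", MonthName,
                "<br>Temp:", Temp,
                "<br>Wind:", Wind,
                "<br>Ozone:", Ozone,
                "<br>Solar Radiation:", Solar.R),
  type = "scatter3d",
  mode = "markers",
  # Marker size encodes solar radiation, rescaled to the range 3-15
  marker = list(
    size = ~rescale(Solar.R, to = c(3, 15))
  )
) %>%
  # Axis titles with units, plus a legend title
  layout(
    scene = list(
      xaxis = list(title = "Temperature (°F)"),
      yaxis = list(title = "Wind (mph)"),
      zaxis = list(title = "Ozone (ppb)")
    ),
    legend = list(title = list(text = "Month"))
  )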
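The interactive plot shows month as color. A minimal sketch that puts numbers behind the seasonal pattern by averaging the cleaned data within each month with dplyr (the summary column names are illustrative):

```r
library(dplyr)

# Cleaned data, as in the Data Cleaning step
air <- na.omit(airquality)

# Number of complete days and average conditions in each month
monthly <- air %>%
  group_by(Month) %>%
  summarize(
    n = n(),
    mean_ozone = mean(Ozone),
    mean_temp = mean(Temp),
    mean_wind = mean(Wind)
  )

monthly
```

Comparing mean_ozone with mean_temp and mean_wind across months shows whether the seasonal change in ozone tracks the seasonal change in temperature and wind. Note that months differ in how many complete days remain after cleaning, so the n column is worth checking before comparing averages.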

Conclusion

Based on the plots and correlation values:

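  • Temperature has a strong positive correlation with ozone (\(r = 0.699\)): hotter days tend to have higher ozone levels.
  • Wind speed has a moderately strong negative correlation with ozone (\(r = -0.612\)): windier days tend to have lower ozone levels.
  • Solar radiation has a weaker positive correlation with ozone (\(r = 0.348\)).
  • Coloring the points by month in the interactive plot helps reveal seasonal patterns in ozone levels.

Overall, ozone levels are associated with temperature, wind speed, solar radiation, and month, with temperature and wind speed showing the strongest relationships.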