In this presentation, we will explore how ozone levels are related to:
- Temperature
- Wind speed
- Solar radiation
- Month
The airquality dataset ships with base R, so it is available in every R installation. It contains daily air quality measurements from New York, recorded from May to September 1973.
Some of the variables that we will use are:
- Ozone
- Solar.R
- Wind
- Temp
- Month

# Load in packages
library(ggplot2)
library(plotly)
library(dplyr)
library(scales)

# Load in data
data(airquality)

# Show the first 5 rows of the dataset
head(airquality, 5)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
Before analyzing the data, we need to check for missing values and handle them.
# Check missing values before cleaning
print(colSums(is.na(airquality)))
##   Ozone Solar.R    Wind    Temp   Month     Day
##      37       7       0       0       0       0

# Remove rows with missing values
air <- na.omit(airquality)
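To see how much data the cleaning step drops, a quick check can compare row counts before and after `na.omit()`. This is a minimal sketch; the names `rows_before`, `rows_after`, and `rows_dropped` are illustrative and not part of the original analysis.

```r
# Load the built-in dataset and drop incomplete rows, as above
data(airquality)
air <- na.omit(airquality)

# Compare row counts before and after cleaning
rows_before  <- nrow(airquality)
rows_after   <- nrow(air)
rows_dropped <- rows_before - rows_after

cat("Rows before:", rows_before, "\n")
cat("Rows after: ", rows_after, "\n")
cat("Dropped:    ", rows_dropped, "\n")

# Confirm that no missing values remain in the cleaned data
stopifnot(!anyNA(air))
```

The number of dropped rows can be smaller than the sum of the per-column missing counts, because a single row (such as row 5 in the preview above) can be missing both Ozone and Solar.R.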
The main question for this project is:
How are ozone levels related to temperature, wind speed, solar radiation, and month?
To explore this, we will use scatterplots with regression lines, correlation coefficients, and an interactive 3D plot.
A simple linear regression line can help show the overall trend in each scatterplot.
\[Ozone = \beta_0 + \beta_1X + \epsilon\]
Here, \(X\) can represent temperature, wind speed, or solar radiation.
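As a minimal sketch of this model with \(X\) as temperature, the following fits the line with `lm()` and overlays it on a scatterplot using `geom_smooth(method = "lm")`. The object names `fit` and `p` are illustrative, and the same pattern is assumed to carry over to `Wind` and `Solar.R` by swapping the predictor.

```r
library(ggplot2)

data(airquality)
air <- na.omit(airquality)

# Fit Ozone = beta_0 + beta_1 * Temp + error by ordinary least squares
fit <- lm(Ozone ~ Temp, data = air)
coef(fit)  # estimated intercept (beta_0) and slope (beta_1)

# Scatterplot with the fitted regression line overlaid
p <- ggplot(air, aes(x = Temp, y = Ozone)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Temperature (°F)", y = "Ozone (ppb)")
print(p)
```

The sign of the slope gives the direction of the trend: a positive slope means ozone tends to rise with the predictor, a negative slope means it tends to fall.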
Correlation helps measure the strength and direction of the relationship between two numerical variables.
\[r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\]
Values close to 1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near 0 indicate little or no linear relationship.
# Calculate correlations between ozone and other variables
air %>%
  summarize(
    Temp_Correlation = cor(Ozone, Temp),
    Wind_Correlation = cor(Ozone, Wind),
    Solar_Correlation = cor(Ozone, Solar.R)
  )
##   Temp_Correlation Wind_Correlation Solar_Correlation
## 1        0.6985414       -0.6124966         0.3483417
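The correlation formula given earlier can also be applied term by term. The sketch below (the names `x`, `y`, `numerator`, `denominator`, and `r_manual` are illustrative) computes \(r\) for ozone and temperature by hand and compares the result with `cor()`.

```r
data(airquality)
air <- na.omit(airquality)

x <- air$Temp
y <- air$Ozone

# Numerator: sum of products of deviations from the means
numerator <- sum((x - mean(x)) * (y - mean(y)))

# Denominator: square root of the product of the sums of squared deviations
denominator <- sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual <- numerator / denominator

# The hand-computed value should agree with the built-in function
all.equal(r_manual, cor(x, y))
```

Because the numerator is symmetric in \(x\) and \(y\), swapping the two variables leaves \(r\) unchanged.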
air$MonthName <- factor(air$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)
plot_ly(
  air,
  x = ~Temp,
  y = ~Wind,
  z = ~Ozone,
  color = ~MonthName,
  text = ~paste("Month:", MonthName,
                "<br>Temp:", Temp,
                "<br>Wind:", Wind,
                "<br>Ozone:", Ozone,
                "<br>Solar Radiation:", Solar.R),
  type = "scatter3d",
  mode = "markers",
  marker = list(
    size = ~rescale(Solar.R, to = c(3, 15))
  )
) %>%
  layout(
    scene = list(
      xaxis = list(title = "Temperature (°F)"),
      yaxis = list(title = "Wind (mph)"),
      zaxis = list(title = "Ozone (ppb)")
    ),
    legend = list(title = list(text = "Month"))
  )
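The 3D plot encodes month only as marker color. One possible way to look at month directly is to summarize ozone by month with dplyr. This is a sketch and not part of the original analysis; `monthly` and its column names are illustrative.

```r
library(dplyr)

data(airquality)
air <- na.omit(airquality)
air$MonthName <- factor(air$Month,
  levels = 5:9,
  labels = c("May", "June", "July", "August", "September")
)

# Number of complete days, mean ozone, and median ozone for each month
monthly <- air %>%
  group_by(MonthName) %>%
  summarize(
    Days = n(),
    Mean_Ozone = mean(Ozone),
    Median_Ozone = median(Ozone)
  )
print(monthly)
```

The `Days` column is worth checking alongside the averages, since months with many removed rows are summarized from fewer observations.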
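Based on the plots and correlation values:
- Ozone tends to increase with temperature (r ≈ 0.70).
- Ozone tends to decrease as wind speed increases (r ≈ -0.61).
- Ozone has a weaker positive relationship with solar radiation (r ≈ 0.35).
Overall, ozone levels are associated with temperature, wind speed, and solar radiation, with temperature showing the strongest linear relationship, and they also appear to vary by month.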