The atmos data set resides in the
nasaweather package of the R programming language.
It contains a collection of atmospheric variables measured between 1995
and 2000 on a grid of 576 coordinates in the western hemisphere. The
data set comes from the 2006 ASA Data
Expo.
Some of the variables in the atmos data set are:
temp - The mean monthly air temperature near the surface of the Earth (measured in kelvins (K))
pressure - The mean monthly air pressure at the surface of the Earth (measured in millibars (mb))
ozone - The mean monthly abundance of atmospheric ozone (measured in Dobson units (DU))
You can convert the temperature unit from Kelvin to Celsius with the formula
\[ celsius = kelvin - 273.15 \]
And you can convert the result to Fahrenheit with the formula
\[ fahrenheit = celsius \times \frac{9}{5} + 32 \]
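The two conversions above can be sketched as small R helpers. The function names `kelvin_to_celsius()` and `celsius_to_fahrenheit()` and the sample temperatures are illustrative assumptions, not part of the nasaweather package.

```r
# Hypothetical helpers that implement the two formulas above.
kelvin_to_celsius <- function(kelvin) kelvin - 273.15
celsius_to_fahrenheit <- function(celsius) celsius * 9 / 5 + 32

# Made-up sample temperatures in kelvins: the freezing point of water and a warm day.
kelvin <- c(273.15, 300)

celsius <- kelvin_to_celsius(kelvin)
fahrenheit <- celsius_to_fahrenheit(celsius)

print(data.frame(kelvin, celsius, fahrenheit))
```

Both helpers are vectorized, so they could be applied directly to a whole column such as `atmos$temp`.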
To analyze this data, we will use the following R packages:
library(nasaweather)
library(tidyverse)
For the remainder of the report, we will look only at data from the year 2000. We aggregate our data by location, using the R code below.
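# The year to keep; the text below refers to this object.
year <- 2000

means <- atmos %>%
  # .env$year is the year object defined above. A bare `year == year`
  # would compare the year column with itself and keep every row.
  filter(year == .env$year) %>%
  group_by(long, lat) %>%
  # Average each variable over the months of the year, per grid point.
  summarize(temp = mean(temp, na.rm = TRUE),
            pressure = mean(pressure, na.rm = TRUE),
            ozone = mean(ozone, na.rm = TRUE),
            cloudlow = mean(cloudlow, na.rm = TRUE),
            cloudmid = mean(cloudmid, na.rm = TRUE),
            cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()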
where the year object equals 2000.
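As a rough illustration of the same filter-then-average step, here is a base R sketch on a toy data frame. The data frame `toy`, its values, and the variable name `target_year` are made up for this example; only the column names mirror atmos.

```r
# Toy stand-in for atmos: two grid points with readings from two years.
toy <- data.frame(
  long = c(-110, -110, -110, -80, -80, -80),
  lat  = c(30, 30, 30, -5, -5, -5),
  year = c(1999, 2000, 2000, 2000, 2000, 1999),
  temp = c(290, 292, 294, 300, NA, 299)
)

# Hypothetical name, kept distinct from the `year` column on purpose.
target_year <- 2000

# Step 1: keep only the rows for the chosen year.
one_year <- toy[toy$year == target_year, ]

# Step 2: average temp within each (long, lat) cell; na.rm = TRUE is
# passed on to mean() so missing readings are skipped.
toy_means <- aggregate(one_year["temp"],
                       by = one_year[c("long", "lat")],
                       FUN = mean, na.rm = TRUE)

print(toy_means)
```

The result has one row per grid point, which is the shape the `means` table has for the real data.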
Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the two variables shows a strong but unusual relationship.
ggplot(data = means, aes(x = temp, y = ozone)) +
geom_point()
We suspect that group-level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:
means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"
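The four assignments above depend on their order, because each later line overwrites part of what an earlier line set. A minimal sketch of the same rules wrapped in a function, applied to four made-up coordinates, may make this easier to check. The function name `classify_locale()` and the data frame `pts` are assumptions for illustration.

```r
# Hypothetical wrapper around the same four rules, in the same order.
classify_locale <- function(long, lat) {
  locale <- rep("north america", length(long))       # default label
  locale[lat < 10] <- "south pacific"                # everything south of 10 degrees
  locale[long > -80 & lat < 10] <- "south america"   # overwrite the eastern part of the south
  locale[long > -80 & lat > 10] <- "north atlantic"  # overwrite the eastern part of the north
  locale
}

# One made-up point per region.
pts <- data.frame(long = c(-110, -110, -60, -60),
                  lat  = c(30, -10, -10, 30))
pts$locale <- classify_locale(pts$long, pts$lat)

print(pts)
```

Because the comparisons are strict, a point exactly at latitude 10 would keep the default label under these rules.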
We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second-order linear model (temperature, locale, and their interaction) of the form
\[ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon \]
This yields the following coefficients and relationships.
lm(ozone ~ temp + locale + temp:locale, data = means)
##
## Call:
## lm(formula = ozone ~ temp + locale + temp:locale, data = means)
##
## Coefficients:
## (Intercept) temp
## 1336.508 -3.559
## localenorth atlantic localesouth america
## 548.248 -1061.452
## localesouth pacific temp:localenorth atlantic
## -549.906 -1.827
## temp:localesouth america temp:localesouth pacific
## 3.496 1.785
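To read these coefficients as one fitted line per region, the locale terms can be added to the baseline intercept and slope. The sketch below does that arithmetic with the printed values. It assumes "north america" is the baseline level, which follows from R ordering the locale labels alphabetically and from that label having no coefficient of its own in the output. The object names are illustrative.

```r
# Coefficients copied from the printed model output.
base_intercept <- 1336.508
base_slope     <- -3.559

# Shifts relative to the baseline locale (north america has none).
intercept_shift <- c("north america"  = 0,
                     "north atlantic" = 548.248,
                     "south america"  = -1061.452,
                     "south pacific"  = -549.906)
slope_shift     <- c(0, -1.827, 3.496, 1.785)

# Each region's line is ozone = intercept + slope * temp.
per_locale <- data.frame(
  locale    = names(intercept_shift),
  intercept = base_intercept + intercept_shift,
  slope     = base_slope + slope_shift,
  row.names = NULL
)

print(per_locale)
```

Under this reading, the north atlantic slope is about -5.39 DU per kelvin, while the south america slope is close to zero (about -0.06).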
ggplot(means, aes(temp, ozone, color = locale)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ locale)
## `geom_smooth()` using formula = 'y ~ x'
An ANOVA test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-values that compare each fuller model to its reduced model are statistically significant).
mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)
anova(mod, mod2, mod3)
## Analysis of Variance Table
##
## Model 1: ozone ~ temp
## Model 2: ozone ~ temp + locale
## Model 3: ozone ~ temp + locale + temp:locale
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 574 99335
## 2 571 41425 3 57911 706.17 < 2.2e-16 ***
## 3 568 15527 3 25898 315.81 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
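The F statistics in this table can be approximately reproduced by hand from the residual sums of squares and degrees of freedom. The sketch below uses the rounded values printed above, so its results agree with the table only to about one decimal place. The object names are illustrative.

```r
# Values copied from the printed ANOVA table.
rss    <- c(99335, 41425, 15527)   # residual sum of squares, models 1 to 3
res_df <- c(574, 571, 568)         # residual degrees of freedom

# Sum of squares and degrees of freedom explained by each added set of terms.
ss_added <- -diff(rss)             # 57910 and 25898
df_added <- -diff(res_df)          # 3 and 3

# anova() scales by the residual mean square of the largest model in the list.
ms_resid <- rss[3] / res_df[3]

# F statistic for each comparison, with its upper-tail p-value.
f_stats  <- (ss_added / df_added) / ms_resid
p_values <- pf(f_stats, df_added, res_df[3], lower.tail = FALSE)

print(round(f_stats, 1))
```

The first sum of squares comes out as 57910 rather than the printed 57911 because the table shows rounded RSS values.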
airquality[1:10, ]
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
## 7 23 299 8.6 65 5 7
## 8 19 99 13.8 59 5 8
## 9 8 19 20.1 61 5 9
## 10 NA 194 8.6 69 5 10