Data

The atmos data set resides in the nasaweather package of the R programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the 2006 ASA Data Expo.

Some of the variables in the atmos data set are:

You can convert the temperature unit from Kelvin to Celsius with the formula

celsius = kevin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

`

Cleaning

To analyze this data, we will use the following R packages:

library(nasaweather)
library(tidyverse)
year <- 1995

For the remainder of the report, we will look only at data from the year `<!—code chunk 3:1995

means <- atmos %>%
  filter(year == year) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
            pressure = mean(pressure, na.rm = TRUE),
            ozone = mean(ozone, na.rm = TRUE),
            cloudlow = mean(cloudlow, na.rm = TRUE),
            cloudmid = mean(cloudmid, na.rm = TRUE),
            cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
            ungroup()

where the year object equals 1995

Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the variables shows a strong, but unusual relationship.

ggplot(data = means, aes(x = temp, y = ozone)) +
  geom_point()

We suspect that group level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"

Model

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second order linear model of the form

\[ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon\]

This yields the following coefficients and relationships.

lm(ozone ~ temp + locale + temp:locale, data = means)
## 
## Call:
## lm(formula = ozone ~ temp + locale + temp:locale, data = means)
## 
## Coefficients:
##               (Intercept)                       temp  
##                  1336.508                     -3.559  
##      localenorth atlantic        localesouth america  
##                   548.248                  -1061.452  
##       localesouth pacific  temp:localenorth atlantic  
##                  -549.906                     -1.827  
##  temp:localesouth america   temp:localesouth pacific  
##                     3.496                      1.785
ggplot(means, aes(temp, ozone, color = locale)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ locale)
## `geom_smooth()` using formula = 'y ~ x'

Diagnostics

An anova test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-value that compares the full model to the reduced models is statistically significant).

mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)

anova(mod, mod2, mod3)
## Analysis of Variance Table
## 
## Model 1: ozone ~ temp
## Model 2: ozone ~ temp + locale
## Model 3: ozone ~ temp + locale + temp:locale
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    574 99335                                  
## 2    571 41425  3     57911 706.17 < 2.2e-16 ***
## 3    568 15527  3     25898 315.81 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1