About the Challenge Expo

The Annual Data Challenge Expo is jointly sponsored by three American Statistical Association (ASA) Sections – Statistical Computing, Statistical Graphics, and Government Statistics.

Data

The ‘atmos’ data set resides in the nasaweather package of the R programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the ASA Data Expo.

Some of the variables in the atmos data set are:

  • temp - The mean monthly air temperature near the surface of the Earth, in kelvin (K)
  • pressure - The mean monthly air pressure at the surface of the Earth, in millibars (mb)
  • ozone - The mean monthly abundance of atmospheric ozone, in Dobson units (DU)

You can convert the temperature unit from Kelvin to Celsius with the formula

\[ celsius = kelvin - 273.15 \]

And you can convert the result to Fahrenheit with the formula

\[ fahrenheit = celsius \times \frac{9}{5} + 32 \]
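The two formulas can be chained in a few lines of base R. This is a minimal sketch: the function names `kelvin_to_celsius`, `celsius_to_fahrenheit`, and `kelvin_to_fahrenheit` are hypothetical helpers introduced here for illustration and are not part of the nasaweather package.

```r
# Hypothetical helpers implementing the two conversion formulas.
kelvin_to_celsius <- function(kelvin) kelvin - 273.15
celsius_to_fahrenheit <- function(celsius) celsius * 9 / 5 + 32

# Chain the conversions: Kelvin -> Celsius -> Fahrenheit.
kelvin_to_fahrenheit <- function(kelvin) {
  celsius_to_fahrenheit(kelvin_to_celsius(kelvin))
}

# 300 K is 26.85 degrees Celsius, or about 80.33 degrees Fahrenheit.
kelvin_to_fahrenheit(300)
```

Because the functions are vectorized, they could also be applied to a whole column, for example `kelvin_to_celsius(atmos$temp)`.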

library(nasaweather) 
library(tidyverse)

For the remainder of the report, we look only at data from the year 1995. We aggregate the data by location, using the R code below.

# Average each variable within every (long, lat) grid cell for 1995.
means <- atmos %>%
  filter(year == 1995) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
            pressure = mean(pressure, na.rm = TRUE),
            ozone = mean(ozone, na.rm = TRUE),
            cloudlow = mean(cloudlow, na.rm = TRUE),
            cloudmid = mean(cloudmid, na.rm = TRUE),
            cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
## `summarise()` has grouped output by 'long'. You can override using the
## `.groups` argument.

Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the two variables shows a strong but unusual relationship.

ggplot(data = means, aes(x = temp, y = ozone)) + geom_point()

We suspect that group-level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"
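To carry the test through, one option is to compute the temperature-ozone correlation separately within each region. The sketch below uses base R only and assumes a data frame with numeric `temp` and `ozone` columns and a `locale` column like the one just created; `cor_by_locale` is a hypothetical helper name, not something from the original analysis.

```r
# Hypothetical helper: Pearson correlation of temp and ozone within each locale.
# Assumes `df` has numeric columns `temp` and `ozone` and a `locale` column.
cor_by_locale <- function(df) {
  # Split the rows into one data frame per locale.
  groups <- split(df, df$locale)
  # Compute one correlation per group, ignoring rows with missing values.
  vapply(
    groups,
    function(g) cor(g$temp, g$ozone, use = "complete.obs"),
    numeric(1)
  )
}
```

Calling `cor_by_locale(means)` would return a named vector with one coefficient per region. Mapping `color = locale` in the scatterplot's `aes()` is another way to inspect the same split visually.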

Conclusions

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region.