NOAA Observation Station Analysis

Mid-Century baseline years compared to 2016



Project Synopsis

According to a report published on January 18, 2017 by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA):

…(the) Earth’s 2016 surface temperatures were the warmest since modern record keeping began in 1880.

Globally-averaged temperatures in 2016 were 1.78 degrees Fahrenheit (0.99 degrees Celsius) warmer than the mid-20th century mean. This makes 2016 the third year in a row to set a new record for global average surface temperatures.

Source: https://www.nasa.gov/press-release/nasa-noaa-data-show-2016-warmest-year-on-record-globally


The 2016 Weather Data Exploratory Analysis project was started to review the raw data from NOAA and identify areas of uncertainty and their potential impact on reaching a greater than 95% scientific certainty.

This is Part 3 of the 2016 Weather Data Exploratory Analysis.


This project is designed to analyze NOAA worldwide temperature observation stations in terms of:

Libraries Required

library(dplyr)          # Data manipulation
library(ggplot2)        # Graphics language for complex plots
library(knitr)          # Dynamic report generation
library(leaflet)        # Interactive mapping

Stations Data

The Stations Data to be used is the worldwide master list created in Part 1 of 2016 Weather Data Exploratory Analysis:


Read Station Files

A brief view of the data will follow each processing step for clarification purposes.

Read Master Station Data

master_stations_ww <- readRDS("~/Temp Stations/master_stations_ww.rds")
kable(head(master_stations_ww, 5))
ID Latitude Longitude Elevation Country State Location FirstYear LastYear
ACW00011604 17.1167 -61.7833 33.14 ANTIGUA AND BARBUDA NA ST JOHNS COOLIDGE FLD 1949 1949
ACW00011647 17.1333 -61.7833 62.99 ANTIGUA AND BARBUDA NA ST JOHNS 1961 1961
AE000041196 25.3330 55.5170 111.55 UNITED ARAB EMIRATES NA SHARJAH INTER. AIRP 1944 2017
AEM00041194 25.2550 55.3640 34.12 UNITED ARAB EMIRATES NA DUBAI INTL 1983 2017
AEM00041217 24.4330 54.6510 87.93 UNITED ARAB EMIRATES NA ABU DHABI INTL 1983 2017

Data Exploration

The data exploration will consist of the following:


The first step is to gather statistics for:

Gather Station Counts

sta_1951_ct <- nrow(master_stations_ww %>%
                    filter(FirstYear <= 1951,
                           LastYear  >= 1951))

sta_1951_1980_ct <- nrow(master_stations_ww %>%
                         filter(FirstYear <= 1951,
                                LastYear  >= 1980))

sta_1951_2016_ct <- nrow(master_stations_ww %>%
                         filter(FirstYear <= 1951,
                                LastYear  >= 2016))

sta_2016_ct <- nrow(master_stations_ww %>%
                    filter(FirstYear <= 2016,
                           LastYear  >= 2016))

sta_2016_pct <- round(sta_1951_2016_ct / sta_2016_ct * 100, 2)

Report Station Counts

##      Stations in 1951:    9,506
## Stations in 1951-1980:    6,817
## Stations in 1951-2016:    4,322
##      Stations in 2016:   14,119
## 
## % of 2016 stations from Baseline:   30.61%

Less than 31% of observation stations active in 2016 have complete historical records from the Mid-Century baseline.


The next step is to gather statistics for the countries with:

Gather Country Counts

ctry_1951_ct <- nrow(master_stations_ww %>%
                     filter(FirstYear <= 1951,
                            LastYear  >= 1951) %>%
                     group_by(Country) %>%
                     slice(1L) %>%
                     ungroup())

ctry_1951_1980_ct <- nrow(master_stations_ww %>%
                          filter(FirstYear <= 1951,
                                 LastYear  >= 1980) %>%
                          group_by(Country) %>%
                          slice(1L) %>%
                          ungroup())

ctry_1951_2016_ct <- nrow(master_stations_ww %>%
                          filter(FirstYear <= 1951,
                                 LastYear  >= 2016) %>%
                          group_by(Country) %>%
                          slice(1L) %>%
                          ungroup())

ctry_2016_ct <- nrow(master_stations_ww %>%
                     filter(FirstYear <= 2016,
                            LastYear  >= 2016) %>%
                     group_by(Country) %>%
                     slice(1L) %>%
                     ungroup())

ctry_2016_pct <- round(ctry_1951_2016_ct / ctry_2016_ct * 100, 2)

Report Country Counts

##      Countries in 1951:   134
## Countries in 1951-1980:   123
## Countries in 1951-2016:   109
##      Countries in 2016:   189
## 
## % of 2016 countries from Baseline:   57.67%

Less than 58% of countries active in 2016 have complete historical records from the Mid-Century baseline.


Data Visualization

The first set of visualizations will chart the historical statistics for stations and countries within the NOAA GHCN. A yearly summary table will be created from master_stations_ww which will be used as input for the charts.

Create Temporary Variables for Summary

min_fy <- min(master_stations_ww$FirstYear)
max_fy <- max(master_stations_ww$FirstYear)

table_idx <- 1

Year <- numeric()
Sta_Count <- numeric()
Cty_Count <- numeric()

Gather Summary Statistics by Year

for(i in min_fy:max_fy) {
  
  Year[table_idx] <- i
  
  Sta_Count[table_idx] <- nrow(master_stations_ww %>%
                               filter(FirstYear <= i,
                                      LastYear  >= i))
  
  Cty_Count[table_idx] <- nrow(master_stations_ww %>%
                               filter(FirstYear <= i,
                                      LastYear  >= i) %>%
                               group_by(Country) %>%
                               slice(1L) %>%
                               ungroup())
  
  table_idx <- table_idx + 1
}

Create Summary Data Frame

summary_table <- data.frame(Year, Sta_Count, Cty_Count)

Create Station Chart

sta_chart <- ggplot() +
                geom_point(data=summary_table,
                           aes(x = Year,
                               y = Sta_Count),
                           colour = "steelblue",
                           size = 1,
                           alpha = 1) +
        
                geom_point(data=summary_table,
                           aes(x = 1880,
                               y = 188),
                           colour = "red",
                           size = 2,
                           alpha = 1) +  
        
                geom_point(data=summary_table,
                           aes(x = 1951,
                               y = 9506),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 1980,
                               y = 14629),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 2016,
                               y = 14119),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 2017,
                               y = 11023),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                labs(title="NOAA: GHCN Stations by Year",
                     x = "Year",
                     y = "Temperature Stations Active") +
        
                scale_x_continuous(breaks = c(1760, 1775, 1790, 1805, 1820, 1835, 1850,
                                              1865, 1880, 1895, 1910, 1925, 1940, 1955,
                                              1970, 1985, 2000, 2015)) +
        
                scale_y_continuous(breaks = c(0, 1000, 2000, 3000, 4000, 5000, 6000,
                                              7000, 8000, 9000, 10000, 11000, 12000,
                                              13000, 14000, 15000, 16000, 17000, 18000,
                                              19000, 20000)) +
        
                geom_vline(xintercept = 1951, colour = "darkolivegreen", linetype=2) +
                geom_vline(xintercept = 1980, colour = "darkolivegreen", linetype=2) +
                geom_vline(xintercept = 1880, colour = "grey40", linetype=2) +
        
                annotate("text", x = 1965.5, y = 0,
                         label = "Mid-Century Baseline",
                         size=2.25,
                         colour="darkolivegreen",
                         fontface="bold",
                         hjust=.5) +
        
                annotate("text", x = 1879, y = 688,
                         label = "Number of Stations in 1880: 188",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 1949, y = 9520,
                         label = "Number of Stations in 1951: 9,506",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 1979, y = 15298,
                         label = "Number of Stations in 1980: 14,629",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 2020, y = 13588,
                         label = "Number of Stations in 2016: 14,119",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 2020, y = 10492,
                         label = "Number of Stations in 2017: 11,023",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                theme_bw()

Create Country Chart

cty_chart <- ggplot() +
        
                geom_point(data=summary_table,
                           aes(x = Year,
                               y = Cty_Count),
                           colour = "steelblue",
                           size = 1,
                           alpha = 1) +
        
                geom_point(data=summary_table,
                           aes(x = 1880,
                               y = 17),
                           colour = "red",
                           size = 2,
                           alpha = 1) +  
        
                geom_point(data=summary_table,
                           aes(x = 1951,
                               y = 134),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 1980,
                               y = 201),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 2016,
                               y = 189),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                geom_point(data=summary_table,
                           aes(x = 2017,
                               y = 178),
                           colour = "red",
                           size = 2,
                           alpha = 1) + 
        
                labs(title="NOAA: GHCN Countries by Year",
                     x = "Year",
                     y = "Countries Active") +
        
                scale_x_continuous(breaks = c(1760, 1775, 1790, 1805, 1820, 1835, 1850,
                                              1865, 1880, 1895, 1910, 1925, 1940, 1955,
                                              1970, 1985, 2000, 2015)) +
        
                scale_y_continuous(breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90,
                                              100, 110, 120, 130, 140, 150, 160, 170,
                                              180, 190, 200, 210, 220)) +
        
                geom_vline(xintercept = 1951, colour = "darkolivegreen", linetype=2) +
                geom_vline(xintercept = 1980, colour = "darkolivegreen", linetype=2) +
                geom_vline(xintercept = 1880, colour = "grey40", linetype=2) +
        
                annotate("text", x = 1965.5, y = 0,
                         label = "Mid-Century Baseline",
                         size=2.25,
                         colour="darkolivegreen",
                         fontface="bold",
                         hjust=.5) +
        
                annotate("text", x = 1879, y = 24,
                         label = "Number of Countries in 1880: 17",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 1949, y = 134.5,
                         label = "Number of Countries in 1951: 134",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 1979, y = 208,
                         label = "Number of Countries in 1980: 201",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 2020, y = 185,
                         label = "Number of Countries in 2016: 189",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                annotate("text", x = 2020, y = 173,
                         label = "Number of Countries in 2017: 178",
                         size=2.5,
                         colour="red",
                         fontface="bold",
                         hjust=1) +
        
                theme_bw()



The next data visualizations will be in the form of the temperature zone map used in Part 2 of the station analysis.

The comparison maps will be:


The temperature zone map will be created within a function which will be called with the appropriate subset of data from master_stations_ww.

Temperature Zone Map Function

TZ_Map_F <- function(temp_frame) {
        
        trop_sta <- temp_frame %>%
                    filter(Latitude  > -30.0001,
                           Latitude  <  30.0001)

        temp_sta <- temp_frame %>%
                    filter((Latitude  >  30 & Latitude  <  60.0001) |
                           (Latitude  < -30 & Latitude  > -60.0001))

        polr_sta <- temp_frame %>%
                    filter((Latitude  >  60 &  Latitude  <  90.0001) |
                           (Latitude  < -60 &  Latitude  > -90.0001))
        
        TZ_Map <- leaflet() %>%
                  addTiles() %>% 
                  addProviderTiles("Stamen.TerrainBackground") %>%
                  setView(lng  = 0,
                          lat  = 0,
                          zoom = 2) %>% 
                  addCircleMarkers(data = trop_sta,
                                   lng = ~ Longitude,
                                   lat = ~ Latitude,
                                   radius = 2,
                                   color = "peru",
                                   stroke = FALSE,
                                   fillOpacity = .7) %>%
                  addCircleMarkers(data = temp_sta,
                                   lng = ~ Longitude,
                                   lat = ~ Latitude,
                                   radius = 2,
                                   color = "darkgreen",
                                   stroke = FALSE,
                                   fillOpacity = .5) %>%
                  addCircleMarkers(data = polr_sta,
                                   lng = ~ Longitude,
                                   lat = ~ Latitude,
                                   radius = 2,
                                   color = "steelblue",
                                   stroke = FALSE,
                                   fillOpacity = .8)
        
        return(TZ_Map)
}

1951 Stations Map


1980 Stations Map


1951-1980 Stations Map (Baseline)


1951-2016 Stations Map


2016 Stations Map


Exploring Starting Years

The starting years for observation stations is of interest since a small percentage of stations active in 2016 have historical information in the Baseline years.

First, we can determine the earliest and latest starting years to establish the limits for the charts function.

Get Min and Max Starting Years

min(master_stations_ww$FirstYear)
## [1] 1763
max(master_stations_ww$FirstYear)
## [1] 2017

Create Charts Function

q_chart <- function(temp_chart) {
        
        temp_hist <- qplot(temp_chart$FirstYear,
                           geom="histogram",
                           binwidth = .5,
                           main = "Histogram for NOAA Stations - Starting Year",
                           xlab = "Starting Year",
                           ylab = "# Observation Stations",
                           fill=I("burlywood4")) +
                     theme_bw() +
                     scale_x_continuous(breaks = c(1775, 1800, 1825, 1850, 1875,
                                                   1900, 1925, 1950, 1975, 2000),
                                        limits = c(1760,2020)) +
                     scale_y_continuous(breaks = c(100, 200, 300, 400, 500, 600,
                                                   700, 800, 900, 1000, 1100, 1200),
                                        limits = c(0,1200))
        
        return(temp_hist)
}

The first review is the overall starting years for worldwide stations to get an idea as to when temperature collection for NOAA began within the Global Historical Climatology Network (NOAA was officially formed in 1970).

NOAA Worldwide Stations Chart (All Years)

Generate Percentages of Starting Year for NOAA Stations

quantile(master_stations_ww$FirstYear, prob = seq(0, 1, length = 11), type = 5)
##   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
## 1763 1900 1920 1945 1953 1960 1968 1978 1990 2001 2017

Determine Average Number of Years of Service per Station

master_stations_ww %>%
        mutate(Yrs_ser = LastYear - FirstYear) %>%
        summarise(Lgth_Ser = round(mean(Yrs_ser),2))
##   Lgth_Ser
## 1    36.03

NOAA Worldwide Stations Chart (2016)

Generate Percentages of Starting Year for 2016 Stations

fy_2016 <- master_stations_ww %>%
           filter(FirstYear <= 2016,
                  LastYear  >= 2016) %>%
           select(FirstYear)

quantile(fy_2016$FirstYear, prob = seq(0, 1, length = 11), type = 5)
##   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
## 1824 1905 1940 1951 1960 1973 1986 1994 1999 2005 2016

Next, we can review the stations from 1951.

1951 Stations Chart


Key Findings

Most of the stations active in 2016 have their initial year of activity prior to 2000. This is because NOAA does not always change the station ID when replacing old, faulty, or out-of-date stations. When the new station is assigned the same ID as the previous station, the historical record transfers with it.

From a scientific record keeping perspective, transferring historical data from one measurement device to another conflicts with the “Test Reliability” for observational analysis. Reliability is synonymous with the consistency of a test, survey, observation, or other measuring device.

The issue of “Test Reliability” is raised again when investigating the mid-century baseline upon which the 2016 data was compared against in the NOAA and NASA co-authored report.

If we assume that the physical stations deployed during the mid-century baseline were still active in 2016, only 30% of the 2016 stations would have direct and consistent data observational comparisons (“Test Reliability”).

When this information is combined with the geographic maps generated from the Latitude and Longitude of the stations, it is clear that a true historical worldwide Baseline does not exist. Observation stations are disproportionately located across land masses, Hemispheres, temperature zones, and there is a lack of reliability or consistency in historical record keeping.


Next Steps

Using the master observation lists created in Part 1, a data exploration and analysis was performed on the Mid-Century Baseline years (1951-1980) compared to 2016.

In Part 4, a review of the methodology that NOAA uses to aggregate data across geographic sectors will be performed.




sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] leaflet_1.1.0 knitr_1.15.1  ggplot2_2.2.1 dplyr_0.5.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10     magrittr_1.5     munsell_0.4.3    xtable_1.8-2    
##  [5] colorspace_1.3-2 R6_2.2.0         highr_0.6        stringr_1.2.0   
##  [9] plyr_1.8.4       tools_3.4.0      grid_3.4.0       gtable_0.2.0    
## [13] DBI_0.6-1        htmltools_0.3.6  crosstalk_1.0.0  yaml_2.1.14     
## [17] lazyeval_0.2.0   assertthat_0.2.0 rprojroot_1.2    digest_0.6.12   
## [21] tibble_1.3.0     shiny_1.0.3      htmlwidgets_0.8  mime_0.5        
## [25] evaluate_0.10    rmarkdown_1.5    stringi_1.1.5    compiler_3.4.0  
## [29] scales_0.4.1     backports_1.0.5  jsonlite_1.4     httpuv_1.3.3