According to a report published on January 18, 2017 by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA):
…(the) Earth’s 2016 surface temperatures were the warmest since modern record keeping began in 1880.
Globally-averaged temperatures in 2016 were 1.78 degrees Fahrenheit (0.99 degrees Celsius) warmer than the mid-20th century mean. This makes 2016 the third year in a row to set a new record for global average surface temperatures.
Source: https://www.nasa.gov/press-release/nasa-noaa-data-show-2016-warmest-year-on-record-globally
The 2016 Weather Data Exploratory Analysis project was started to review the raw data from NOAA and identify areas of uncertainty and their potential impact on reaching a greater than 95% scientific certainty.
This is Part 2 of the 2016 Weather Data Exploratory Analysis.
This project is designed to analyze NOAA worldwide temperature observation stations in terms of:
Libraries Required
library(dplyr) # Data manipulation
library(ggplot2) # Graphics language for complex plots
library(knitr) # Dynamic report generation
library(leaflet) # Interactive mapping
The Stations Data to be used is the worldwide master list created in Part 1 of 2016 Weather Data Exploratory Analysis:
master_stations_ww.rdsThe second file to be used is a world land mass file created from The World Bank:
world_land_mass.csvSource: http://data.worldbank.org/indicator/AG.LND.TOTL.K2
A brief view of the data will follow each processing step for clarification purposes.
Read Master Station Data
master_stations_ww <- readRDS("~/Temp Stations/master_stations_ww.rds")
kable(head(master_stations_ww, 5))
| ID | Latitude | Longitude | Elevation | Country | State | Location | FirstYear | LastYear |
|---|---|---|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | 33.14 | ANTIGUA AND BARBUDA | NA | ST JOHNS COOLIDGE FLD | 1949 | 1949 |
| ACW00011647 | 17.1333 | -61.7833 | 62.99 | ANTIGUA AND BARBUDA | NA | ST JOHNS | 1961 | 1961 |
| AE000041196 | 25.3330 | 55.5170 | 111.55 | UNITED ARAB EMIRATES | NA | SHARJAH INTER. AIRP | 1944 | 2017 |
| AEM00041194 | 25.2550 | 55.3640 | 34.12 | UNITED ARAB EMIRATES | NA | DUBAI INTL | 1983 | 2017 |
| AEM00041217 | 24.4330 | 54.6510 | 87.93 | UNITED ARAB EMIRATES | NA | ABU DHABI INTL | 1983 | 2017 |
Read World Land Mass File
world_land_mass <- read.csv("~/Temp Stations/world_land_mass.csv", stringsAsFactors = FALSE)
kable(head(world_land_mass, 5))
| Country | Region | Area |
|---|---|---|
| AFGHANISTAN | SOUTHERN ASIA | 251827 |
| AKROTIRI | SOUTHERN EUROPE | 47 |
| ALBANIA | SOUTHERN EUROPE | 11100 |
| ALGERIA | NORTHERN AFRICA | 919595 |
| AMERICAN SAMOA [UNITED STATES] | POLYNESIA | 77 |
The data exploration will consist of the following:
The first step will be to create a subset of the master list containing stations that are defined by NOAA as being active during 2016.
Create Station List for 2016
master_stations_2016 <- master_stations_ww %>%
filter(FirstYear <= 2016,
LastYear >= 2016)
kable(head(master_stations_2016, 5))
| ID | Latitude | Longitude | Elevation | Country | State | Location | FirstYear | LastYear |
|---|---|---|---|---|---|---|---|---|
| AE000041196 | 25.3330 | 55.517 | 111.55 | UNITED ARAB EMIRATES | NA | SHARJAH INTER. AIRP | 1944 | 2017 |
| AEM00041194 | 25.2550 | 55.364 | 34.12 | UNITED ARAB EMIRATES | NA | DUBAI INTL | 1983 | 2017 |
| AEM00041217 | 24.4330 | 54.651 | 87.93 | UNITED ARAB EMIRATES | NA | ABU DHABI INTL | 1983 | 2017 |
| AEM00041218 | 24.2620 | 55.609 | 869.09 | UNITED ARAB EMIRATES | NA | AL AIN INTL | 1994 | 2017 |
| AG000060390 | 36.7167 | 3.250 | 78.74 | ALGERIA | NA | ALGER-DAR EL BEIDA | 1940 | 2016 |
The distribution analysis will begin with a review of the stations across the Northern and Southern Hemispheres.
Create Hemisphere Statistics
ww_ct <- nrow(master_stations_2016)
nh_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 0))
sh_ct <- nrow(master_stations_2016 %>%
filter(Latitude < 0))
nh_pt <- round(nh_ct/ww_ct * 100, 2)
sh_pt <- round(sh_ct/ww_ct * 100, 2)
Report Statistics by Hemisphere
## Worldwide Count: 14,119
##
## Northern Hemisphere: 12,939 91.64%
## Southern Hemisphere: 1,180 8.36%
Land areas are distributed predominantly in the Northern Hemisphere (68%) relative to the Southern Hemisphere (32%) as divided by the equator. However, a significantly larger percentage of observation stations in 2016 were in the Northern Hemisphere. This fact can be important when reviewing the temperature observations later on and whether data is recorded consistently throughout the year.
The next analysis will look at distribution by geographic Quadrants.
Earth Quadrants
Create Quadrant Statistics
nw_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 0,
Longitude < 0))
sw_ct <- nrow(master_stations_2016 %>%
filter(Latitude < 0,
Longitude < 0))
ne_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 0,
Longitude > 0))
se_ct <- nrow(master_stations_2016 %>%
filter(Latitude < 0,
Longitude > 0))
nw_pt <- round(nw_ct/ww_ct * 100, 2)
sw_pt <- round(sw_ct/ww_ct * 100, 2)
ne_pt <- round(ne_ct/ww_ct * 100, 2)
se_pt <- round(se_ct/ww_ct * 100, 2)
Report Statistics by Quadrant
## Worldwide Count: 14,119
##
## Northwest Quadrant Count: 10,019 70.96%
## Northeast Quadrant Count: 2,919 20.67%
## Southeast Quadrant Count: 947 6.71%
## Southwest Quadrant Count: 231 1.64%
The distribution of worldwide temperature observation stations is heavily skewed, not only to the Northern Hemisphere, but predominantly to the Northwest Quadrant. This issue will be further illustrated in the later visualizations.
Another examination of station distribution is to look at the number of stations per country.
Summarize by Country
master_country <- master_stations_2016 %>%
group_by(Country) %>%
summarise(Ctry_Cnt = n()) %>%
arrange(desc(Ctry_Cnt))
## [1] "Number of Countries in 2016 Master List: 189"
There are a total of 193 Countries recognized by the United Nations, 2 countries that are not recognized, 61 dependent territories, and 6 disputed areas.
Source: http://www.travelindependent.info/countries-howmany.htm
Review summarized data
kable(head(master_country,10))
| Country | Ctry_Cnt |
|---|---|
| UNITED STATES | 8394 |
| CANADA | 1173 |
| AUSTRALIA | 663 |
| RUSSIA | 648 |
| SWEDEN | 267 |
| CHINA | 202 |
| FINLAND | 172 |
| JAPAN | 144 |
| INDIA | 132 |
| NORWAY | 116 |
kable(tail(master_country,10))
| Country | Ctry_Cnt |
|---|---|
| REUNION [FRANCE] | 1 |
| RWANDA | 1 |
| SAINT PIERRE AND MIQUELON [FRANCE] | 1 |
| SEYCHELLES | 1 |
| SINGAPORE | 1 |
| SWAZILAND | 1 |
| TRINIDAD AND TOBAGO | 1 |
| TROMELIN ISLAND [FRANCE] | 1 |
| WAKE ISLAND [UNITED STATES] | 1 |
| ZAMBIA | 1 |
The number of stations, by country, deployed around the world is also heavily skewed. Let’s take a look at the distribution statistics.
Summary of station density
quantile(master_country$Ctry_Cnt)
## 0% 25% 50% 75% 100%
## 1 2 6 15 8394
quantile(master_country$Ctry_Cnt, prob = seq(0, 1, length = 11), type = 5)
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
## 1.0 1.0 1.0 3.0 4.0 6.0 9.9 13.0 19.7 53.6
## 100%
## 8394.0
To get a better picture of the discrepancy between observation stations per country, we’ll chart the top 20 countries in terms of observation station count.
Observation chart by country (top 20)
p <- ggplot(head(master_country, 20),
aes(x = reorder(factor(Country), Ctry_Cnt),
y = Ctry_Cnt,
fill = "burlywood4")) +
labs(title="2016 - Observation Stations by Country (Top 20)",
x = "",
y = "# Observation Stations") +
geom_bar(stat = "identity", width = 0.4) +
theme_bw() +
coord_flip() +
scale_fill_manual(values = c('burlywood4'), guide = FALSE) +
theme(axis.text = element_text(size = 8),
axis.text.y = element_text(size = 8))
p
Actual Counts per Country (Top 20)
kable(head(master_country, 20), caption = "master_country")
| Country | Ctry_Cnt |
|---|---|
| UNITED STATES | 8394 |
| CANADA | 1173 |
| AUSTRALIA | 663 |
| RUSSIA | 648 |
| SWEDEN | 267 |
| CHINA | 202 |
| FINLAND | 172 |
| JAPAN | 144 |
| INDIA | 132 |
| NORWAY | 116 |
| INDONESIA | 103 |
| SPAIN | 101 |
| UKRAINE | 98 |
| GERMANY | 77 |
| MEXICO | 75 |
| ARGENTINA | 64 |
| KAZAKHSTAN | 62 |
| ANTARCTICA | 59 |
| ALGERIA | 54 |
| SOUTH AFRICA | 53 |
Since there is a very large discrepancy between the Hemispheres, Quadrants, and countries it makes sense to look at the distribution of stations across the temperature zones of the earth in order to determine if there is a balanced recording of observations across latitudes. It is generally recognized that there are three distinct temperature zones for each of the Northern and Southern Hemispheres:
Create Temperature Zones Statistics
n_trop_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 0,
Latitude < 30.0001))
n_trop_pt <- round(n_trop_ct / ww_ct * 100, 2)
n_temp_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 30,
Latitude < 60.0001))
n_temp_pt <- round(n_temp_ct / ww_ct * 100, 2)
n_polr_ct <- nrow(master_stations_2016 %>%
filter(Latitude > 60,
Latitude < 90.0001))
n_polr_pt <- round(n_polr_ct / ww_ct * 100, 2)
s_trop_ct <- nrow(master_stations_2016 %>%
filter(Latitude < 0,
Latitude > -30.0001))
s_trop_pt <- round(s_trop_ct / ww_ct * 100, 2)
s_temp_ct <- nrow(master_stations_2016 %>%
filter(Latitude < -30,
Latitude > -60.0001))
s_temp_pt <- round(s_temp_ct / ww_ct * 100, 2)
s_polr_ct <- nrow(master_stations_2016 %>%
filter(Latitude < -60,
Latitude > -90.0001))
s_polr_pt <- round(s_polr_ct / ww_ct * 100, 2)
trop_cnt <- n_trop_ct + s_trop_ct
trop_pct <- round(trop_cnt / ww_ct * 100, 2)
temp_cnt <- n_temp_ct + s_temp_ct
temp_pct <- round(temp_cnt / ww_ct * 100, 2)
polr_cnt <- n_polr_ct + s_polr_ct
polr_pct <- round(polr_cnt / ww_ct * 100, 2)
Report Temperature Zones Statistics
## Northern Tropical Count: 1,177 8.34%
## Northern Temperate Count: 10,803 76.51%
## Northern Polar Count: 959 6.79%
##
## Southern Tropical Count: 582 4.12%
## Southern Temperate Count: 539 3.82%
## Southern Polar Count: 59 0.42%
##
## Total Tropical Count: 1,759 12.46%
## Total Temperate Count: 11,342 80.33%
## Total Polar Count: 1,018 7.21%
Once again, there is a vast discrepancy between temperature zones.
We can see the dramatic difference through a map which charts all worldwide observation stations color coded by their temperature zone.
Create Temperature Zone Tables
trop_sta <- master_stations_2016 %>%
filter(Latitude > -30.0001,
Latitude < 30.0001)
temp_sta <- master_stations_2016 %>%
filter((Latitude > 30 & Latitude < 60.0001) |
(Latitude < -30 & Latitude > -60.0001))
polr_sta <- master_stations_2016 %>%
filter((Latitude > 60 & Latitude < 90.0001) |
(Latitude < -60 & Latitude > -90.0001))
Create Temperature Zone Map
TZ_Map <- leaflet() %>%
addTiles() %>%
addProviderTiles("Stamen.TerrainBackground") %>%
setView(lng = 0,
lat = 0,
zoom = 2) %>%
addCircleMarkers(data = trop_sta,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 2,
color = "peru",
stroke = FALSE,
fillOpacity = .7) %>%
addCircleMarkers(data = temp_sta,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 2,
color = "darkgreen",
stroke = FALSE,
fillOpacity = .5) %>%
addCircleMarkers(data = polr_sta,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 2,
color = "steelblue",
stroke = FALSE,
fillOpacity = .8)
Temperature Zone Map for Observation Stations - 2016
The map illustrated the vast amount of surface area missing from the observation collection. This will be critical when exploring the methods that NOAA uses to aggregate and smooth data for their analysis and reporting.
Included in world_land_mass is a region classification. This information will be joined to the master stations list to create summary totals and maps by sub-region to explore distributions at that level.
master_regions <- left_join(master_stations_2016, world_land_mass, by = "Country")
Summarize by Region
master_reg_sum <- master_regions %>%
group_by(Region) %>%
summarise(Reg_Cnt = n()) %>%
arrange(desc(Reg_Cnt))
Create Chart by Region
p <- ggplot(master_reg_sum,
aes(x = reorder(factor(Region), Reg_Cnt),
y = Reg_Cnt,
fill = "burlywood4")) +
labs(title="2016 - Observation Stations by Region",
x = "",
y = "# Observation Stations") +
geom_bar(stat = "identity", width = 0.4) +
theme_bw() +
coord_flip() +
scale_fill_manual(values = c('burlywood4'), guide = FALSE) +
scale_y_continuous(breaks = c(500, 1500, 2500, 3500, 4500, 5500, 6500, 7500, 8500, 9500),
labels = c( "500", "1,500", "2,500", "3,500", "4,500",
"5,500", "6,500", "7,500", "8,500", "9,500")) +
theme(axis.text = element_text(size = 8),
axis.text.y = element_text(size = 8))
p
North America has over 9,500 observation stations while no other region has at least 1,000. In fact, all but four regions have counts below 500.
Regional maps will help illustrate the disparity in data gathering distribution.
Create Region Map Function
reg_map <- function(temp_reg, vw_zoom) {
vw_long <- mean(temp_reg$Longitude)
vw_lat <- mean(temp_reg$Latitude)
RegMap <- leaflet() %>%
addTiles() %>%
addProviderTiles("Stamen.TerrainBackground") %>%
setView(lng = vw_long,
lat = vw_lat,
zoom = vw_zoom) %>%
addCircleMarkers(data = temp_reg,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 3,
color = "red",
stroke = FALSE,
fillOpacity = .7) %>%
addCircleMarkers(data = temp_reg,
lng = ~ Longitude,
lat = ~ Latitude,
radius = 1.5,
color = "black",
stroke = FALSE,
fillOpacity = .6)
return(RegMap)
}
North America Region Map
Central America and Caribbean Map
South America Region Map
European Region Map
African Region Map
Russia and Independent States Region Map
Asian Region Map
Oceana Region Map
Antarctic Region Map
It’s clear from the maps and the statistics that sizeable segments of the planet are not being reported on by NOAA. In particular, Greenland has a land mass over 836,000 square miles but contains only 7 observation stations which all lie along it’s coast. These locations are typically ice-free areas with small populations. This raises the issue of density of observation stations. That is, what is the average number of square miles per observation station in countries and regions around the world.
The world_land_mass.csv contains the square miles of land mass for each country in the world. The master_country data will be joined with the world_land_mass data to provide information on land mass by country and the number of square miles per observation station.
The summarization will have total country area in thousands of square miles and square miles per station in hundreds of miles in order to make the resulting chart more readable. The variable Sq_Mile_Sta is set to a negative number strictly for creating a dual axis chart.
Summarize by Region for Land Mass
master_cty_lm <- left_join(master_country, world_land_mass, by = "Country") %>%
mutate(Sq_Mile_Sta = round(Area / Ctry_Cnt, 0)) %>%
arrange(desc(Area)) %>%
transform(Sq_Mile_Sta = Sq_Mile_Sta / 100 * -1,
Area = Area / 1000)
Create Chart by Land Mass
p <- ggplot(head(master_cty_lm, 30),
aes(x = reorder(factor(Country), Area),
y = Area,
fill = "burlywood4")) +
labs(title="2016 - Observation Stations by Land Mass (Top 30)",
x = "",
y = "") +
geom_linerange(head(master_cty_lm, 30),
mapping=aes(x = reorder(factor(Country), Area),
ymin = 0,
ymax = Area),
colour = "darkolivegreen4", alpha=.8, size=3) +
geom_linerange(head(master_cty_lm, 30),
mapping=aes(x = reorder(factor(Country), Area),
ymin = 0,
ymax = Sq_Mile_Sta),
colour = "steelblue", alpha=.8, size=3) +
scale_y_continuous(breaks = c(-2500, -1250, 0,
1250, 2500, 3750, 5000, 6250),
labels = c("2,500 Sq Miles", "1,250 Sq Miles",
"0",
"1,250 Sq Miles", "2,500 Sq Miles",
"3,750 Sq Miles", "5,000 Sq Miles",
"6,250 Sq Miles")) +
theme_bw() +
coord_flip() +
theme(axis.text = element_text(size = 8),
axis.text.y = element_text(size = 8)) +
geom_hline(yintercept = 0, colour = "grey40", linetype=1) +
annotate("text",
x = 1.2,
y = -2500,
label = "Square Miles per Station (in 100s)",
size=2.5,
colour="steelblue",
fontface="bold",
hjust=0) +
annotate("text",
x = 1.2,
y = 6250,
label = "Square Miles of Land Mass (in 1000s)",
size=2.5,
colour="darkolivegreen4",
fontface="bold",
hjust=1)
p
The United States, Canada, Australia, and Russia all indicate a very high level of density regarding observation stations, especially given their large land mass. However, if you refer back to the distribution maps previously created, only the United States would be considered fully covered. All of the other top land mass countries display very large sections with very few stations. This means that density within the countries is extremely high in certain areas or cities and very sparse in outlying areas.
A similar visualization can be created for the number of square miles per observation station to identify the least densely covered countries.
Create Chart by Square Miles per Station
master_cty_lm <- master_cty_lm %>%
arrange(Sq_Mile_Sta)
p <- ggplot(head(master_cty_lm, 30),
aes(x = reorder(factor(Country), desc(Sq_Mile_Sta)),
y = Sq_Mile_Sta,
fill = "burlywood4")) +
labs(title="2016 - Square Miles per Observation Station (Top 30)",
x = "",
y = "") +
geom_linerange(head(master_cty_lm, 30),
mapping=aes(x = reorder(factor(Country), desc(Sq_Mile_Sta)),
ymin = 0,
ymax = Area),
colour = "darkolivegreen4", alpha=.8, size=3) +
geom_linerange(head(master_cty_lm, 30),
mapping=aes(x = reorder(factor(Country), desc(Sq_Mile_Sta)),
ymin = 0,
ymax = Sq_Mile_Sta),
colour = "steelblue", alpha=.8, size=3) +
scale_y_continuous(breaks = c(-2500, -1250, 0,
1250, 2500, 3750, 5000, 6250),
labels = c("2,500 Sq Miles", "1,250 Sq Miles",
"0",
"1,250 Sq Miles", "2,500 Sq Miles",
"3,750 Sq Miles", "5,000 Sq Miles",
"6,250 Sq Miles")) +
theme_bw() +
coord_flip() +
theme(axis.text = element_text(size = 8),
axis.text.y = element_text(size = 8)) +
geom_hline(yintercept = 0, colour = "grey40", linetype=1) +
annotate("text",
x = 1.2,
y = -2500,
label = "Square Miles per Station (in 100s)",
size=2.5,
colour="steelblue",
fontface="bold",
hjust=0) +
annotate("text",
x = 1.2,
y = 6250,
label = "Square Miles of Land Mass (in 1000s)",
size=2.5,
colour="darkolivegreen4",
fontface="bold",
hjust=1)
p
For the countries identified in the visualization, their respective average temperatures will consist of fewer total observations and ignore potential temperature observations due to a lack of stations. For areas like Greenland and Antarctica, subject of many climate related theories, the actual observation capabilities are limited to a relatively small number of coastal stations.
Using the master observation lists created in the previous project, a data exploration and analysis was performed to review station densities and distributions around the world.
The focus was on Hemispheric, Quadrant, region and country diversity along with land mass comparisons for density. It was found that the vast majority of world is underreported by current temperature observation stations, especially when compared to the United States, and large segments of the planet are completely barren and absent from the temperature record.
In Part 3, the focus will be on the stations present during the Mid-Century baseline years (1951-1980) compared to 2016.
In Part 4, a review of the methodology that NOAA uses to aggregate data across geographic sectors will be performed.
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] leaflet_1.1.0 knitr_1.15.1 ggplot2_2.2.1 dplyr_0.5.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 magrittr_1.5 munsell_0.4.3 xtable_1.8-2
## [5] colorspace_1.3-2 R6_2.2.0 highr_0.6 stringr_1.2.0
## [9] plyr_1.8.4 tools_3.4.0 grid_3.4.0 gtable_0.2.0
## [13] DBI_0.6-1 htmltools_0.3.6 crosstalk_1.0.0 yaml_2.1.14
## [17] lazyeval_0.2.0 assertthat_0.2.0 rprojroot_1.2 digest_0.6.12
## [21] tibble_1.3.0 shiny_1.0.3 htmlwidgets_0.8 mime_0.5
## [25] evaluate_0.10 rmarkdown_1.5 labeling_0.3 stringi_1.1.5
## [29] compiler_3.4.0 scales_0.4.1 backports_1.0.5 jsonlite_1.4
## [33] httpuv_1.3.3