According to a report published on January 18, 2017 by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA):
…(the) Earth’s 2016 surface temperatures were the warmest since modern record keeping began in 1880.
Globally-averaged temperatures in 2016 were 1.78 degrees Fahrenheit (0.99 degrees Celsius) warmer than the mid-20th century mean. This makes 2016 the third year in a row to set a new record for global average surface temperatures.
Source: https://www.nasa.gov/press-release/nasa-noaa-data-show-2016-warmest-year-on-record-globally
The 2016 Weather Data Exploratory Analysis project was started to review the raw data from NOAA and identify areas of uncertainty and their potential impact on reaching a greater than 95% scientific certainty.
This is Part 1 of the 2016 Weather Data Exploratory Analysis.
The data will be coming from the NOAA FTP site at the following links:
The specific files we will be using are:
The files are in a text format with specific layouts which are defined in the readme.txt file located at the following:
The specific instructions for each file will be documented in Appendix A: Station Files and will come from the readme.txt file to make it clear how and why the file is being processed in a particular manner.
The master lists will consist of observation stations that collect temperature data, their respective geographic coordinates, country, state and location information, and years of operation. Three master lists will be created for use in future exploratory analysis and saved in .Rds format:
master_stations_ww.rdsmaster_stations_us50.rdsmaster_stations_us48.rdsLibraries Required
library(dplyr) # Data manipulation
library(knitr) # Dynamic report generation
library(readr) # Reading tabular data
library(stringi) # String processing
All of the station files are Fixed Width Format text files so the read_fwf function from the readr package will be used.
Within each code chunk the file will be read using the beginning and ending positions of the variables defined, column names will be assigned, and any preliminary processing will be performed. This includes converting data to uppercase or selecting only the variables needed for the master lists. In some cases, needed variables exist in multiple files (Latitude, Longitude). Redundant variables will be removed from the latter data tables as part of the preliminary processing.
A brief view of the data will follow each processing step for clarification purposes.
Read Inventory File
inputFile <- "~/RAW Stations/ghcnd-inventory.txt"
ghcnd_inventory <- read_fwf(inputFile, fwf_positions(c(1,13,22,32,37,42),
c(11,20,30,35,40,45)))
colnames(ghcnd_inventory) <- c("ID",
"Latitude",
"Longitude",
"Element",
"FirstYear",
"LastYear")
kable(head(ghcnd_inventory, 5))
| ID | Latitude | Longitude | Element | FirstYear | LastYear |
|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | TMAX | 1949 | 1949 |
| ACW00011604 | 17.1167 | -61.7833 | TMIN | 1949 | 1949 |
| ACW00011604 | 17.1167 | -61.7833 | PRCP | 1949 | 1949 |
| ACW00011604 | 17.1167 | -61.7833 | SNOW | 1949 | 1949 |
| ACW00011604 | 17.1167 | -61.7833 | SNWD | 1949 | 1949 |
Read Country File
inputFile <- "~/RAW Stations/ghcnd-countries.txt"
ghcnd_countries <- read_fwf(inputFile, fwf_positions(c(1,4),
c(2,50)))
colnames(ghcnd_countries) <- c("CCode",
"Country")
ghcnd_countries <- ghcnd_countries %>%
transform(Country = toupper(Country)) ## Convert Country to uppercase
kable(head(ghcnd_countries, 5))
| CCode | Country |
|---|---|
| AC | ANTIGUA AND BARBUDA |
| AE | UNITED ARAB EMIRATES |
| AF | AFGHANISTAN |
| AG | ALGERIA |
| AJ | AZERBAIJAN |
Read Stations File
inputFile <- "~/RAW Stations/ghcnd-stations.txt"
ghcnd_stations <- read_fwf(inputFile, fwf_positions(c( 1, 13, 22, 32, 39, 42, 73, 77, 81),
c(11, 20, 30, 37, 40, 71, 75, 79, 85)))
colnames(ghcnd_stations) <- c("ID",
"Latitude",
"Longitude",
"Elevation",
"SCode",
"Location",
"GSNFlag",
"HCN-CRNFlag",
"WMOid")
ghcnd_stations <- ghcnd_stations %>%
transform(Location = toupper(Location)) %>% ## Convert Location to uppercase
select(ID, Elevation, SCode, Location) ## Select only the variables needed
kable(head(ghcnd_stations, 5))
| ID | Elevation | SCode | Location |
|---|---|---|---|
| ACW00011604 | 10.1 | NA | ST JOHNS COOLIDGE FLD |
| ACW00011647 | 19.2 | NA | ST JOHNS |
| AE000041196 | 34.0 | NA | SHARJAH INTER. AIRP |
| AEM00041194 | 10.4 | NA | DUBAI INTL |
| AEM00041217 | 26.8 | NA | ABU DHABI INTL |
Read States File
inputFile <- "~/RAW Stations/ghcnd-states.txt"
ghcnd_states <- read_fwf(inputFile, fwf_positions(c( 1, 4),
c( 2, 50)))
colnames(ghcnd_states) <- c("SCode",
"State")
ghcnd_states <- ghcnd_states %>%
transform(State = toupper(State)) ## Convert State to uppercase
kable(head(ghcnd_states, 5))
| SCode | State |
|---|---|
| AB | ALBERTA |
| AK | ALASKA |
| AL | ALABAMA |
| AR | ARKANSAS |
| AS | AMERICAN SAMOA |
The project is focused on worldwide temperature data. Observation stations in the NOAA inventory record many different observation types, Elements, but this project is concerned with the TAVG, TMAX, and TMIN observations.
Set Element Vector
temp_elements <- c("TAVG",
"TMAX",
"TMIN")
The inventory file is structured to have multiple entries for each observation station if they record more than one type of observation. The master list will contain a single entry for each station that meets the criteria of recording any of the values in the temp_elements vector.
The dplyr package will be used to filter, group, and reduce the entries to a single observation. Upon completion, the Element variable will be removed from the table since it will no longer be used. The final instruction will create a two character country code using the first two characters of the station ID. This variable will be used to join the data from the countries data.
Process Inventory Data
ghcnd_inventory <- ghcnd_inventory %>%
filter(Element %in% temp_elements) %>% ## Filter for temperature values
group_by(ID) %>% ## Group data by ID
slice(1L) %>% ## Reduce entries to a single observation
ungroup() %>%
select(-Element) %>% ## Remove the Element variable
mutate(CCode = substr(ID,1,2)) ## Create country code from ID
kable(head(ghcnd_inventory, 5))
| ID | Latitude | Longitude | FirstYear | LastYear | CCode |
|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | 1949 | 1949 | AC |
| ACW00011647 | 17.1333 | -61.7833 | 1961 | 1961 | AC |
| AE000041196 | 25.3330 | 55.5170 | 1944 | 2017 | AE |
| AEM00041194 | 25.2550 | 55.3640 | 1983 | 2017 | AE |
| AEM00041217 | 24.4330 | 54.6510 | 1983 | 2017 | AE |
The countries data will be joined with the inventory data by the variable CCode using dplyr. Once complete, the CCode can be removed and ghcnd_countries can be removed from the working environment.
Join Countries Data to Inventory Data
ghcnd_inventory <- left_join(ghcnd_inventory, ghcnd_countries, by = "CCode") %>%
select(-CCode)
rm(ghcnd_countries)
kable(head(ghcnd_inventory, 5))
| ID | Latitude | Longitude | FirstYear | LastYear | Country |
|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | 1949 | 1949 | ANTIGUA AND BARBUDA |
| ACW00011647 | 17.1333 | -61.7833 | 1961 | 1961 | ANTIGUA AND BARBUDA |
| AE000041196 | 25.3330 | 55.5170 | 1944 | 2017 | UNITED ARAB EMIRATES |
| AEM00041194 | 25.2550 | 55.3640 | 1983 | 2017 | UNITED ARAB EMIRATES |
| AEM00041217 | 24.4330 | 54.6510 | 1983 | 2017 | UNITED ARAB EMIRATES |
The stations data can be joined to the inventory data using the variable ID. Once complete, ghcnd_stations can be removed from the working environment.
Join Stations Data to Inventory Data
ghcnd_inventory <- left_join(ghcnd_inventory, ghcnd_stations, by = "ID")
rm(ghcnd_stations)
kable(head(ghcnd_inventory, 5))
| ID | Latitude | Longitude | FirstYear | LastYear | Country | Elevation | SCode | Location |
|---|---|---|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | 1949 | 1949 | ANTIGUA AND BARBUDA | 10.1 | NA | ST JOHNS COOLIDGE FLD |
| ACW00011647 | 17.1333 | -61.7833 | 1961 | 1961 | ANTIGUA AND BARBUDA | 19.2 | NA | ST JOHNS |
| AE000041196 | 25.3330 | 55.5170 | 1944 | 2017 | UNITED ARAB EMIRATES | 34.0 | NA | SHARJAH INTER. AIRP |
| AEM00041194 | 25.2550 | 55.3640 | 1983 | 2017 | UNITED ARAB EMIRATES | 10.4 | NA | DUBAI INTL |
| AEM00041217 | 24.4330 | 54.6510 | 1983 | 2017 | UNITED ARAB EMIRATES | 26.8 | NA | ABU DHABI INTL |
The final step in the station processing is to join the states data with the inventory data using the variable SCode. Once compete, the table will be reorganized to make viewing cleaner and ghcnd_states can be removed from the working environment.
Join States Data to Inventory Data
ghcnd_inventory <- left_join(ghcnd_inventory, ghcnd_states, by = "SCode") %>%
select(ID, Latitude, Longitude, Elevation, Country,
State, Location, FirstYear, LastYear)
rm(ghcnd_states)
kable(head(ghcnd_inventory, 5))
| ID | Latitude | Longitude | Elevation | Country | State | Location | FirstYear | LastYear |
|---|---|---|---|---|---|---|---|---|
| ACW00011604 | 17.1167 | -61.7833 | 10.1 | ANTIGUA AND BARBUDA | NA | ST JOHNS COOLIDGE FLD | 1949 | 1949 |
| ACW00011647 | 17.1333 | -61.7833 | 19.2 | ANTIGUA AND BARBUDA | NA | ST JOHNS | 1961 | 1961 |
| AE000041196 | 25.3330 | 55.5170 | 34.0 | UNITED ARAB EMIRATES | NA | SHARJAH INTER. AIRP | 1944 | 2017 |
| AEM00041194 | 25.2550 | 55.3640 | 10.4 | UNITED ARAB EMIRATES | NA | DUBAI INTL | 1983 | 2017 |
| AEM00041217 | 24.4330 | 54.6510 | 26.8 | UNITED ARAB EMIRATES | NA | ABU DHABI INTL | 1983 | 2017 |
The recorded elevation is set to -999.9 for stations with missing values.
Check For Missing Elevations
nrow(ghcnd_inventory %>%
filter(Elevation == -999.9))
## [1] 447
There are 447 stations with missing Elevation recordings. These readings will be converted to NA (Not Available). If the value recorded is valid it will be converted from meters to feet. The conversion factor is 1 meter is equal to 3.28084 feet.
Convert Elevations
ghcnd_inventory <- ghcnd_inventory %>%
transform(Elevation = ifelse(Elevation == -999.9,
NA,
round(Elevation * 3.28084, 2)))
summary(ghcnd_inventory$Elevation)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -1148.3 290.0 899.9 1776.2 2378.6 16512.5 447
The maximum and minimum elevation levels are within the known ranges of the highest and lowest points on Earth.
The worlwide master list is complete and can now be saved.
Save Master Worldwide Station List
saveRDS(ghcnd_inventory, "~/Temp Stations/master_stations_ww.rds")
To create the master lists for the 50 United States and 48 Contiguous Unites States the ghcnd_inventory file will need to be filtered. The first check to make is the number of states that are recorded as part of the U.s.
Check U.S. States
temp_us <- ghcnd_inventory %>%
filter(Country == "UNITED STATES")
table(temp_us$State)
##
## ALABAMA ALASKA
## 245 850
## ARIZONA ARKANSAS
## 554 223
## CALIFORNIA COLORADO
## 1252 633
## CONNECTICUT DELAWARE
## 80 24
## DISTRICT OF COLUMBIA FLORIDA
## 1 368
## GEORGIA HAWAII
## 330 308
## IDAHO ILLINOIS
## 536 313
## INDIANA IOWA
## 260 252
## KANSAS KENTUCKY
## 362 290
## LOUISIANA MAINE
## 295 205
## MARYLAND MASSACHUSETTS
## 227 233
## MICHIGAN MINNESOTA
## 514 380
## MISSISSIPPI MISSOURI
## 216 360
## MONTANA NEBRASKA
## 768 359
## NEVADA NEW HAMPSHIRE
## 444 159
## NEW JERSEY NEW MEXICO
## 116 542
## NEW YORK NORTH CAROLINA
## 548 383
## NORTH DAKOTA OHIO
## 245 340
## OKLAHOMA OREGON
## 348 759
## PACIFIC ISLANDS PENNSYLVANIA
## 1 493
## RHODE ISLAND SOUTH CAROLINA
## 36 190
## SOUTH DAKOTA TENNESSEE
## 304 282
## TEXAS U.S. MINOR OUTLYING ISLANDS
## 961 1
## UTAH VERMONT
## 633 122
## VIRGINIA WASHINGTON
## 332 566
## WEST VIRGINIA WISCONSIN
## 273 357
## WYOMING
## 468
rm(temp_us)
There are three “states” that can be removed from the data for the purposes of the future uses of the U.s. master lists.
Create U.S. 50 Master List
rem_states <- c("DISTRICT OF COLUMBIA", "PACIFIC ISLANDS", "U.S. MINOR OUTLYING ISLANDS")
ghcnd_inventory <- ghcnd_inventory %>%
filter(Country == "UNITED STATES") %>%
filter(!State %in% rem_states)
Save U.S. 50 Master List
saveRDS(ghcnd_inventory, "~/Temp Stations/master_stations_us50.rds")
The master list for the 48 Contiguous United States requires Alaska and Hawaii being removed.
Create U.S. 48 Master List
rem_states <- c("ALASKA", "HAWAII")
ghcnd_inventory <- ghcnd_inventory %>%
filter(!State %in% rem_states)
Save U.S. 48 Master List
saveRDS(ghcnd_inventory, "~/Temp Stations/master_stations_us48.rds")
Clean Working Environment
rm(ghcnd_inventory)
rm(inputFile, rem_states, temp_elements)
Using the master observation lists created in this project, a data exploration and analysis will be performed to review station densities and distributions around the world.
In Part 2, the focus will be on Hemispheric, Quadrant, and country diversity along with land mass comparisons for density.
In Part 3, the focus will be on the stations present during the Mid-Century baseline years (1951-1980) compared to 2016.
In Part 4, a review of the methodology that NOAA uses to aggregate data across geographic sectors will be performed.
FORMAT OF “ghcnd-countries.txt”
| Variable | Columns | Type |
|---|---|---|
| CODE | 1-2 | Character |
| NAME | 4-50 | Character |
These variables have the following definitions:
FORMAT OF “ghcnd-inventory.txt”
| Variable | Columns | Type |
|---|---|---|
| ID | 1-11 | Character |
| LATITUDE | 13-20 | Real |
| LONGITUDE | 22-30 | Real |
| ELEMENT | 32-35 | Character |
| FIRSTYEAR | 37-40 | Integer |
| LASTYEAR | 42-45 | Integer |
These variables have the following definitions:
FORMAT OF “ghcnd-stations.txt”
| Variable | Columns | Type |
|---|---|---|
| ID | 1-11 | Character |
| LATITUDE | 13-20 | Real |
| LONGITUDE | 22-30 | Real |
| ELEVATION | 32-37 | Real |
| STATE | 39-40 | Character |
| NAME | 42-71 | Character |
| GSN FLAG | 73-75 | Character |
| HCN/CRN FLAG | 77-79 | Character |
| WMO ID | 81-85 | Character |
FORMAT OF “ghcnd-states.txt”
| Variable | Columns | Type |
|---|---|---|
| CODE | 1-2 | Character |
| NAME | 4-50 | Character |
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stringi_1.1.5 readr_1.1.0 knitr_1.15.1 dplyr_0.5.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 digest_0.6.12 rprojroot_1.2 assertthat_0.2.0
## [5] R6_2.2.0 DBI_0.6-1 backports_1.0.5 magrittr_1.5
## [9] evaluate_0.10 highr_0.6 lazyeval_0.2.0 rmarkdown_1.5
## [13] tools_3.4.0 stringr_1.2.0 hms_0.3 yaml_2.1.14
## [17] compiler_3.4.0 htmltools_0.3.6 tibble_1.3.0