This data set contains the monthly, seasonal, and annual mean temperatures for the Northern Hemisphere from 1880 to the present, recorded as deviations from the corresponding 1951-1980 means. The data show that global temperature continued to rise rapidly in the 21st century, with new record highs reached in every decade. The data are available from NASA’s Goddard Institute for Space Studies.
When comparing seasonal temperatures, it is convenient to use “meteorological seasons” based on temperature and defined as groupings of whole months. Thus, Dec-Jan-Feb (DJF) is the Northern Hemisphere meteorological winter, Mar-Apr-May (MAM) is N.H. meteorological spring, Jun-Jul-Aug (JJA) is N.H. meteorological summer and Sep-Oct-Nov (SON) is N.H. meteorological autumn. String these four seasons together and you have the meteorological year that begins on Dec. 1 and ends on Nov. 30 (D.N).
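The month groupings described above can be sketched as a small lookup. The helper names `met_season` and `met_year` are hypothetical and are not part of the original analysis. Labelling a meteorological year by the calendar year in which it ends is an assumption, inferred from the fact that the 1880 row has no D.N value (it would need December 1879).

```r
# Map a calendar month (1-12) to its N.H. meteorological season.
met_season <- function(month) {
  seasons <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
               "JJA", "JJA", "SON", "SON", "SON", "DJF")
  seasons[month]
}

# ASSUMPTION: a meteorological year is labelled by the year in which it ends,
# so December is counted with the following calendar year.
met_year <- function(year, month) {
  ifelse(month == 12, year + 1, year)
}

print(met_season(c(12, 1, 4, 7, 10)))
print(met_year(2014, c(11, 12)))
```

Under this convention, December 2014 is counted with meteorological year 2015, while November 2014 closes meteorological year 2014.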
This analysis uses two plots: the data for each meteorological season are drawn together on one line plot, and the data for the meteorological year are drawn on an area plot. These visualizations support the original claim: global temperature continued to rise rapidly in the 21st century, with new record highs reached in every decade. They also show that the Northern Hemisphere meteorological winter is the most variable season, and that there was a large spike in deviation in the mid-1940s.
View interactive plotly graphs: Deviation by Meteorological Season, Deviation by Meteorological Year
Load Required Packages:
library("ggplot2")
library("knitr")
library("devtools")
Set working directory and read CSV:
setwd("/Users/brianbartling/Documents/Visualization/Programming Assignment 1 Data New")
data <- read.csv("ExcelFormattedGISTEMPDataCSV.csv")
Initial Analysis:
head(data)
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J.D D.N DJF MAM
## 1 1880 -29 -19 -17 -27 -13 -28 -22 -6 -16 -15 -18 -20 -19 *** **** -19
## 2 1881 -8 -13 2 -2 -3 -27 -5 -1 -8 -18 -25 -14 -10 -11 -13 -1
## 3 1882 10 10 2 -19 -17 -24 -9 5 0 -21 -20 -24 -9 -8 2 -11
## 4 1883 -32 -41 -17 -23 -24 -11 -7 -12 -18 -11 -19 -17 -19 -20 -32 -22
## 5 1884 -17 -11 -33 -35 -31 -37 -33 -25 -22 -22 -30 -28 -27 -26 -15 -33
## 6 1885 -64 -29 -23 -44 -41 -50 -28 -27 -19 -19 -22 -5 -31 -33 -41 -36
## JJA SON
## 1 -19 -16
## 2 -11 -17
## 3 -9 -14
## 4 -10 -16
## 5 -32 -25
## 6 -35 -20
tail(data)
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J.D D.N
## 131 2010 73 79 92 87 75 64 61 65 61 71 79 49 71 73
## 132 2011 50 51 64 66 53 59 74 73 56 67 56 53 60 60
## 133 2012 45 49 57 69 76 62 56 64 75 79 74 52 63 63
## 134 2013 67 57 65 54 61 65 59 66 77 70 81 67 66 64
## 135 2014 74 50 77 78 86 66 58 82 90 86 68 79 75 74
## 136 2015 82 88 90 74 76 80 **** **** **** **** **** **** **** ***
## DJF MAM JJA SON
## 131 72 85 63 71
## 132 50 61 69 60
## 133 49 67 61 76
## 134 58 60 63 76
## 135 64 81 69 81
## 136 83 80 **** ****
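Comparing the first and last rows shows an increase in mean global temperature over the period 1880-2015: the deviations are mostly negative in the 1880-1885 rows and uniformly positive in the 2010-2015 rows.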
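As a rough check on that comparison, the sketch below averages the D.N values printed by head(data) and tail(data). It is only an illustration on ten rows, not a trend estimate. The conversion to degrees assumes the table is in units of 0.01 degrees Celsius, which the text above does not state.

```r
# D.N (meteorological year) deviations copied from the head() and tail() output.
# 1880 and 2015 are left out because their D.N value is "***" (missing).
early <- c(-11, -8, -20, -26, -33)   # 1881-1885
late  <- c(73, 60, 63, 64, 74)       # 2010-2014

# Difference between the two five-year means, in the table's raw units.
change <- mean(late) - mean(early)

# ASSUMPTION: raw units are 0.01 degrees Celsius.
change_celsius <- change / 100

print(c(early_mean = mean(early), late_mean = mean(late),
        change_celsius = change_celsius))
```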
Keep only the meteorological year and season columns:
# Drop columns 2-14 (Jan-Dec and J.D), leaving Year, D.N, DJF, MAM, JJA, SON
data <- subset(data, select=-(2:14))
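Dropping columns by position depends on the column order of the CSV. A hedged alternative is to select the wanted columns by name. The sketch below uses a hypothetical one-row data frame `toy` with the same header as the head(data) output, and checks that both approaches keep the same columns.

```r
# Toy one-row frame with the same column names as the CSV header shown above.
cols <- c("Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug",
          "Sep", "Oct", "Nov", "Dec", "J.D", "D.N", "DJF", "MAM", "JJA", "SON")
toy <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols),
                            dimnames = list(NULL, cols)))

# Positional selection, as in the analysis.
by_position <- subset(toy, select = -(2:14))

# Selection by name, which does not depend on column order.
by_name <- toy[, c("Year", "D.N", "DJF", "MAM", "JJA", "SON")]

print(names(by_position))
print(identical(names(by_position), names(by_name)))
```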
Reshape wide -> long & clean data:
longdata <- reshape(data, idvar = "Year",
                    varying = list(c("DJF", "MAM", "JJA", "SON")),
                    v.names = "Deviation", times = c("DJF", "MAM", "JJA", "SON"),
                    timevar = "Season", direction = "long")
## Warning in `[<-.factor`(`*tmp*`, ri, value = c(-19L, -1L, -11L, -22L,
## -33L, : invalid factor level, NA generated
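# Replace the "***" / "****" placeholders with NA, then convert the factor
# to numeric via character so the labels, not the level codes, are used
longdata$Deviation[longdata$Deviation=="***"] <- NA
longdata$Deviation[longdata$Deviation=="****"] <- NA
longdata$Deviation <- as.numeric(as.character(longdata$Deviation))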
# Same reshape for the single D.N (meteorological year) column
longdata.DN <- reshape(data, idvar = "Year", varying = "D.N", v.names = "Deviation",
                       times = "Year", timevar = "M.Year", direction = "long")
# Drop the four season columns (2-5), leaving Year, M.Year, Deviation
longdata.DN <- subset(longdata.DN, select=-(2:5))
longdata.DN$Deviation[longdata.DN$Deviation=="***"] <- NA
longdata.DN$Deviation <- as.numeric(as.character(longdata.DN$Deviation))
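The "invalid factor level" warning from reshape() arises because the columns containing "***" or "****" are read as factors. One possible alternative, sketched here on three rows copied from the head() and tail() output rather than on the real file, is to declare those placeholders as missing values at read time with the `na.strings` argument of `read.csv`. The variable names `csv_text`, `clean`, and `long` are illustrative only.

```r
# Three rows copied from the output above, reduced to the columns kept
# after the subset step. "***" and "****" mark missing values.
csv_text <- "Year,D.N,DJF,MAM,JJA,SON
1880,***,****,-19,-19,-16
1881,-11,-13,-1,-11,-17
2015,***,83,80,****,****"

# Treat both placeholders as NA so every column is read as numeric.
clean <- read.csv(text = csv_text, na.strings = c("***", "****"))

# The same wide -> long reshape now runs on numeric columns.
long <- reshape(clean, idvar = "Year",
                varying = list(c("DJF", "MAM", "JJA", "SON")),
                v.names = "Deviation", times = c("DJF", "MAM", "JJA", "SON"),
                timevar = "Season", direction = "long")

str(long)
```

With this approach the manual NA replacement and the `as.numeric(as.character(...))` conversion would no longer be needed. The real file would be read with the same `na.strings` argument in place of `text = csv_text`.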
Plot the seasons by year:
p1 <- ggplot(data=longdata, aes(x=Year, y=Deviation, colour=Season)) + geom_line() +
  scale_x_continuous(breaks = seq(1880, 2015, 20)) +
  ggtitle("Deviation by Meteorological Season") +
  geom_smooth(colour='blue')
p1
## Warning: Removed 57 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_path).
p2 <- ggplot(data=longdata.DN, aes(x=Year, y=Deviation, colour = M.Year)) + geom_area() +
  scale_x_continuous(breaks = seq(1880, 2015, 20)) +
  ggtitle("Deviation by Meteorological Year") +
  scale_y_continuous(breaks = seq(-50, 75, 50))
p2
## Warning: Removed 2 rows containing missing values (position_stack).
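The seasonal plot suggests that winter (DJF) varies the most. One way to put a number on that impression is the standard deviation of the deviations within each season. The sketch below uses only the 1881-1885 rows printed by head(data), which are far too few to confirm the claim. The same `tapply` call could be run on the full `longdata` frame, where the long-term trend would also inflate each season's spread.

```r
# Season values for 1881-1885, copied from the head(data) output.
sample_long <- data.frame(
  Year      = rep(1881:1885, times = 4),
  Season    = rep(c("DJF", "MAM", "JJA", "SON"), each = 5),
  Deviation = c(-13,   2, -32, -15, -41,    # DJF
                 -1, -11, -22, -33, -36,    # MAM
                -11,  -9, -10, -32, -35,    # JJA
                -17, -14, -16, -25, -20)    # SON
)

# Standard deviation of the deviations within each season.
season_sd <- tapply(sample_long$Deviation, sample_long$Season, sd, na.rm = TRUE)
print(round(season_sd, 1))

# Season with the largest spread in this small sample.
most_variable <- names(which.max(season_sd))
print(most_variable)
```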