Introduction

This data set contains monthly, seasonal, and annual mean temperature deviations for the Northern Hemisphere from 1880 to the present. Each value is the deviation from the corresponding 1951-1980 mean. The data show that global temperature continued to rise rapidly in the 21st century, with new record highs reached in every decade. The data are available from NASA’s Goddard Institute for Space Studies.
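
To make the idea of a deviation from the 1951-1980 mean concrete, here is a minimal sketch. The temperatures are invented toy values and the names `temps`, `baseline`, and `Jan_dev` are illustrative; this shows only the definition, not how GISS actually builds its tables.

```r
# Toy January temperatures in degrees Celsius, invented for illustration.
temps <- data.frame(
  Year = c(1951, 1965, 1980, 2014),
  Jan  = c(-1.0, -0.5, 0.0, 0.4)
)

# Baseline: the mean of the same month over the 1951-1980 base period.
in_base  <- temps$Year >= 1951 & temps$Year <= 1980
baseline <- mean(temps$Jan[in_base])

# Deviation = observed value minus the base-period mean.
temps$Jan_dev <- temps$Jan - baseline
temps
```

A positive deviation means the month was warmer than its 1951-1980 average, and a negative one means it was cooler.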

When comparing seasonal temperatures, it is convenient to use “meteorological seasons” based on temperature and defined as groupings of whole months. Thus, Dec-Jan-Feb (DJF) is the Northern Hemisphere meteorological winter, Mar-Apr-May (MAM) is N.H. meteorological spring, Jun-Jul-Aug (JJA) is N.H. meteorological summer and Sep-Oct-Nov (SON) is N.H. meteorological autumn. String these four seasons together and you have the meteorological year that begins on Dec. 1 and ends on Nov. 30 (D.N).
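
The grouping of months into seasons can be sketched in a few lines of R. The helper names `met_season` and `met_year` are hypothetical and not part of the original analysis. The rule that December belongs to the following meteorological year matches the data, where the 1880 D.N and DJF entries are missing because December 1879 is not available.

```r
# Map a calendar month (1-12) to its meteorological season label.
met_season <- function(month) {
  labels <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
              "JJA", "JJA", "SON", "SON", "SON", "DJF")
  labels[month]
}

# December counts toward the following meteorological year (Dec 1 to Nov 30).
met_year <- function(year, month) {
  ifelse(month == 12, year + 1, year)
}

met_season(c(12, 1, 4, 7, 10))
met_year(c(2014, 2015), c(12, 1))
```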

This analysis uses two plots: a line plot that draws each meteorological season as a separate line, and an area plot of the meteorological-year values. These visualizations support the original claim: global temperature continued to rise rapidly in the 21st century, with new record highs reached in every decade. They also show that the Northern Hemisphere meteorological winter (DJF) is the most variable season, and that there was a large spike in deviation in the mid-1940s.

View interactive plotly graphs: Deviation by Meteorological Season; Deviation by Meteorological Year

Exploratory Data Analysis and Cleaning

Load Required Packages:

library("ggplot2")
library("knitr")
library("devtools")

Set working directory and read CSV:

setwd("/Users/brianbartling/Documents/Visualization/Programming Assignment 1 Data New")
data <- read.csv("ExcelFormattedGISTEMPDataCSV.csv")
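
The file marks missing values with `***` or `****`, so any column containing them is read as text (a factor, under the `stringsAsFactors = TRUE` default of R versions before 4.0). One alternative, sketched here on a two-row excerpt inlined as text, is to declare those placeholders as missing at import time through the `na.strings` argument of `read.csv`. The names `csv_text` and `demo` are illustrative.

```r
# A two-row excerpt in the same layout as the file, inlined so the
# example runs without the CSV on disk.
csv_text <- "Year,D.N,DJF,MAM,JJA,SON
1880,***,****,-19,-19,-16
1881,-11,-13,-1,-11,-17"

# Treat the asterisk placeholders as missing values on import.
demo <- read.csv(text = csv_text, na.strings = c("***", "****"))
str(demo)
```

With this approach every value column arrives as a number, and the later replace-and-convert steps would no longer be needed.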

Initial Analysis:

head(data)
##   Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J.D D.N  DJF MAM
## 1 1880 -29 -19 -17 -27 -13 -28 -22  -6 -16 -15 -18 -20 -19 *** **** -19
## 2 1881  -8 -13   2  -2  -3 -27  -5  -1  -8 -18 -25 -14 -10 -11  -13  -1
## 3 1882  10  10   2 -19 -17 -24  -9   5   0 -21 -20 -24  -9  -8    2 -11
## 4 1883 -32 -41 -17 -23 -24 -11  -7 -12 -18 -11 -19 -17 -19 -20  -32 -22
## 5 1884 -17 -11 -33 -35 -31 -37 -33 -25 -22 -22 -30 -28 -27 -26  -15 -33
## 6 1885 -64 -29 -23 -44 -41 -50 -28 -27 -19 -19 -22  -5 -31 -33  -41 -36
##   JJA SON
## 1 -19 -16
## 2 -11 -17
## 3  -9 -14
## 4 -10 -16
## 5 -32 -25
## 6 -35 -20
tail(data)
##     Year Jan Feb Mar Apr May Jun  Jul  Aug  Sep  Oct  Nov  Dec  J.D D.N
## 131 2010  73  79  92  87  75  64   61   65   61   71   79   49   71  73
## 132 2011  50  51  64  66  53  59   74   73   56   67   56   53   60  60
## 133 2012  45  49  57  69  76  62   56   64   75   79   74   52   63  63
## 134 2013  67  57  65  54  61  65   59   66   77   70   81   67   66  64
## 135 2014  74  50  77  78  86  66   58   82   90   86   68   79   75  74
## 136 2015  82  88  90  74  76  80 **** **** **** **** **** **** **** ***
##     DJF MAM  JJA  SON
## 131  72  85   63   71
## 132  50  61   69   60
## 133  49  67   61   76
## 134  58  60   63   76
## 135  64  81   69   81
## 136  83  80 **** ****

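The first rows (1880-1885) are almost all negative, while the last rows (2010-2015) are all positive, ranging from roughly 45 to 92. This suggests that mean temperature deviations increased over the period 1880-2015. Entries of *** or **** mark missing values, such as the months of 2015 that were not yet available.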
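
The increase can also be checked numerically rather than by eye. The sketch below copies the annual (J.D) values from the head() and tail() output and compares the two period means. The variable names are illustrative, and the conversion to degrees assumes the table is in hundredths of a degree Celsius, which this document does not state.

```r
# Annual (J.D) values copied from the head() and tail() output.
early <- c(-19, -10, -9, -19, -27, -31)   # 1880-1885
late  <- c(71, 60, 63, 66, 75)            # 2010-2014 (2015 is incomplete)

# Difference between the two period means, in the table's own units.
diff_units <- mean(late) - mean(early)
diff_units          # about 86.2

# Assumption: the table is in hundredths of a degree Celsius.
diff_units / 100
```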

Keep only the Year, meteorological-year (D.N), and seasonal columns:

# Drop columns 2-14: Jan through Dec, plus the calendar-year mean J.D
data <- subset(data, select=-(2:14))

Reshape wide -> long & clean data:

# Stack the four seasonal columns into one Deviation column, labelled by Season
longdata <- reshape(data, idvar = "Year", 
                  varying = list(c("DJF", "MAM", "JJA", "SON")), 
                  v.names = "Deviation", times = c("DJF", "MAM", "JJA", "SON"), 
                  timevar="Season", direction = "long")
## Warning in `[<-.factor`(`*tmp*`, ri, value = c(-19L, -1L, -11L, -22L,
## -33L, : invalid factor level, NA generated
# Replace the asterisk placeholders with NA
longdata$Deviation[longdata$Deviation=="***"] <- NA 
longdata$Deviation[longdata$Deviation=="****"] <- NA
# Convert via character so the factor labels, not the internal codes, become numbers
longdata$Deviation <- as.numeric(as.character(longdata$Deviation))
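
The "invalid factor level" warning appears to arise because DJF, which contains `****`, was read as a factor while the other seasonal columns are integers. When reshape() stacks them into one factor column, any value that is not already a DJF level becomes NA. That would explain why the plot later drops 57 rows even though only three seasonal entries are asterisks. A minimal sketch of one way to avoid this, on a three-row excerpt, is to convert each column to numeric before reshaping. The names `wide`, `seasons`, and `long` are illustrative.

```r
# Excerpt of the first rows, with the seasonal columns held as text,
# as happens when a column contains "***" or "****".
wide <- data.frame(
  Year = c(1880, 1881, 1882),
  DJF  = c("****", "-13", "2"),
  MAM  = c("-19", "-1", "-11"),
  JJA  = c("-19", "-11", "-9"),
  SON  = c("-16", "-17", "-14"),
  stringsAsFactors = FALSE
)

seasons <- c("DJF", "MAM", "JJA", "SON")

# Convert each seasonal column to numeric first, mapping the asterisk
# placeholders to NA explicitly so as.numeric() does not warn.
wide[seasons] <- lapply(wide[seasons], function(x) {
  x <- as.character(x)
  x[x %in% c("***", "****")] <- NA
  as.numeric(x)
})

# Reshape wide -> long with every value column already numeric.
long <- reshape(wide, idvar = "Year",
                varying = list(seasons), v.names = "Deviation",
                times = seasons, timevar = "Season", direction = "long")

# Count missing values per season; only 1880 DJF should be missing here.
tapply(is.na(long$Deviation), long$Season, sum)
```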


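# Build a long-format frame for the meteorological year (D.N) only.
# times = "Year" stores the literal string "Year" in M.Year, which later labels the plot legend.
longdata.DN <- reshape(data, idvar = "Year", varying = "D.N", v.names = "Deviation",
                       times = "Year", timevar="M.Year", direction = "long")

# Drop the four seasonal columns, leaving Year, M.Year, and Deviation
longdata.DN <- subset(longdata.DN, select=-(2:5))
# "***" marks a missing D.N value; convert the rest from factor labels to numbers
longdata.DN$Deviation[longdata.DN$Deviation=="***"] <- NA 
longdata.DN$Deviation <- as.numeric(as.character(longdata.DN$Deviation))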
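
Because the meteorological year has only one value column, the same two-column frame can be built without reshape(). The sketch below assumes D.N has already been converted to numeric, with NA where the file has `***`. The names `wide` and `dn` are illustrative, and the sketch omits the constant M.Year column that the code above keeps for the legend.

```r
# Excerpt with the D.N column already numeric (NA where the file has "***").
wide <- data.frame(
  Year = c(1880, 1881, 1882, 1883),
  D.N  = c(NA, -11, -8, -20)
)

# Two-column frame for the meteorological-year plot.
dn <- data.frame(Year = wide$Year, Deviation = wide$D.N)

# Drop incomplete years up front instead of letting the plot discard them.
dn <- dn[!is.na(dn$Deviation), ]
dn
```

Removing the missing rows before plotting keeps the plotting step free of "removed rows" warnings and makes it explicit which years are excluded.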

Plot the seasons by year:

p1 <- ggplot(data=longdata, aes(x=Year, y=Deviation, colour=Season)) + geom_line() + 
        scale_x_continuous(breaks = seq(1880, 2015, 20)) + 
        ggtitle("Deviation by Meteorological Season") + 
        geom_smooth(colour='blue')
p1
## Warning: Removed 57 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_path).

[Figure: Deviation by Meteorological Season]

Plot the meteorological year:

p2 <- ggplot(data=longdata.DN, aes(x=Year, y=Deviation, colour = M.Year)) + geom_area() + 
        scale_x_continuous(breaks = seq(1880, 2015, 20)) + 
        ggtitle("Deviation by Meteorological Year") + 
        scale_y_continuous(breaks = seq(-50, 75, 50))
p2
## Warning: Removed 2 rows containing missing values (position_stack).

[Figure: Deviation by Meteorological Year]