library(data.table)
## Warning: package 'data.table' was built under R version 3.3.2
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.3.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(abind)
library(chron)
## Warning: package 'chron' was built under R version 3.3.2
library(xts)
## Warning: package 'xts' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## The following objects are masked from 'package:data.table':
## 
##     first, last
historical_climate_Phoenix <- read.csv("KPHX_Phoenix/Phoenix_1976-2017_alldata.csv")
View(head(historical_climate_Phoenix))
historical_climate_Phoenix <- historical_climate_Phoenix[,2:36]
View(head(historical_climate_Phoenix))
KPHX <- historical_climate_Phoenix[, c(2:8, 10:11, 13:14, 29, 31, 33)]
colnames(KPHX) <- c("USAF", "WBAN", "Year", "Month", "Day", "Hour", "Minute", "Latitude (ang. degrees)", "Longitude (ang. degrees)", "Elevation (m)", "Call Letters", "Air Temperature (C)", "Dew Point Temperature (C)", "SLP (mb)")
KPHX <- unite(KPHX, "Date", "Day", "Month", "Year", sep = "/", remove = TRUE)
View(KPHX)
KPHX$Minute <- formatC(KPHX$Minute, width = 2, format = "d", flag = "0")
KPHX$Hour <- formatC(KPHX$Hour, width = 2, format = "d", flag  = "0")
KPHX <- unite(KPHX, "Time", "Hour", "Minute", sep = ":", remove = FALSE)
View(head(KPHX))
class(KPHX$Date)
## [1] "character"
KPHX$Date <- as.Date(KPHX$Date, "%d/%m/%Y")
class(KPHX$Date)
## [1] "Date"
KPHX <- unite(KPHX, "DateTime", "Date", "Time", sep = " ", remove = FALSE)
View(head(KPHX))

Hmmm. It can’t be 89 Celsius in Phoenix in 1976… right? Right. There is more formatting that needs to be done. Now to convert! We also need to convert the lat/lon values too. Conversions are based on the USAF WBAN metadata (ncdc.noaa.gov).

KPHX$`Latitude (ang. degrees)` <- 0.001*(KPHX$`Latitude (ang. degrees)`)
KPHX$`Longitude (ang. degrees)` <- 0.001*(KPHX$`Longitude (ang. degrees)`)
KPHX$`Air Temperature (C)` <- 0.1*(KPHX$`Air Temperature (C)`)
KPHX$`Dew Point Temperature (C)` <- 0.1*(KPHX$`Dew Point Temperature (C)`)
KPHX$`SLP (mb)` <- 0.1*(KPHX$`SLP (mb)`)
class(KPHX)
## [1] "data.frame"
range(KPHX$`Air Temperature (C)`)
## [1]  -3.3 999.9

Uh ok, what is 999.9 C??? Commonly used as an error signal. So, we need to remove these so that when we plot or do any statistics, we aren’t being skewed by these unreasonably high values, because really, the data points are simply not available due to errors, not that they are actually this high.

KPHX$`Air Temperature (C)`[KPHX$`Air Temperature (C)` == "999.9"] <- NA
KPHX$`Dew Point Temperature (C)`[KPHX$`Dew Point Temperature (C)` == "999.9"] <- NA
KPHX$`SLP (mb)`[KPHX$`SLP (mb)` == "9999.9"] <- NA
KPHX$`Air Temperature (C)` <- as.numeric(KPHX$`Air Temperature (C)`)
KPHX$`Dew Point Temperature (C)` <- as.numeric(KPHX$`Dew Point Temperature (C)`)
KPHX$`SLP (mb)` <- as.numeric(KPHX$`SLP (mb)`)

Now they have “NA” in place of the 999s and the data are numeric objects. If we plot a histogram of the data, we will see that the NA has removed the error data points from consideration, but maintains the symmetry needed for the object to function as a data frame.

hist(KPHX$`Air Temperature (C)`)

OK, our data is clean! Let’s save this data so that next time we use it (if we were to exit out), we could get back into the data and not have to start over with the whole process again.

write.csv(KPHX, file = "KPHX_Phoenix/Phoenix_1976-2017.csv")

Plots and Statistics

Moving on to plotting

Now, KPHX is hourly data. It would be nice if we could look at this from a different perspective.

daily.max.min <- read.csv("KPHX_Phoenix/dailymaxmin_1933-2017.csv")

Check the data to see how it is formatted.

View(head(daily.max.min))

So this is in day/month/year formatting, just like how we formatted the KPHX data! While it is technically a form of stats, it is also a plot, let’s compare the two histograms of our cleaned data and the new one we just brought in.

par(mfrow = c(1,2))
hist(daily.max.min$TMAX)
hist(KPHX$`Air Temperature (C)`)

dev.off()
## null device 
##           1

Kinda similar.

Let’s plot!

plot(daily.max.min$DATE, daily.max.min$TMAX, xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")), 
    main = "Maximum Daily Temperature at Sky Harbor International Airport")

But the dates seem kind of random… can we change that?

class(daily.max.min$DATE)
## [1] "factor"
daily.max.min$DATE <- as.Date(daily.max.min$DATE, "%d/%m/%Y")
class(daily.max.min$DATE)
## [1] "Date"
plot(daily.max.min$DATE, daily.max.min$TMAX, xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")), 
     main = "Maximum Daily Temperature at Sky Harbor International Airport")

Now the x axis is more consistent in its labels. You can manually create labels, but it is a pain and I would not suggest it unless necessary. okay, but this is kind of dull, plus the points get confusing in the middle.

plot(daily.max.min$DATE, daily.max.min$TMAX, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Maximum Temperature at Sky Harbor International Airport", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")

Let’s throw in some analysis, because it’s nice to look at, but what does it tell us? How do we glean data from it? A simple linear regression should be able to tell us if Sky Harbor has seen increasing temperatures over the past 84 years.

phx.warming.daytime.lm <- lm(daily.max.min$TMAX~daily.max.min$DATE)
plot(daily.max.min$DATE, daily.max.min$TMAX, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Maximum Temperature at Sky Harbor International Airport", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.daytime.lm, col = "dark red")

But that is only 1-2 degrees C. I thought we would have a much larger temperature increase… We do! BUT, urbanization retains more heat at night.

phx.warming.night.lm <- lm(daily.max.min$TMIN~daily.max.min$DATE)
plot(daily.max.min$DATE, daily.max.min$TMIN, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Minimum Temperature at Sky Harbor International Airport", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.night.lm, col = "dark red")

Whoa, big increase in the slope. Let’s look at them side by side, like with the histograms.

par(mfrow = c(1,2))
plot(daily.max.min$DATE, daily.max.min$TMAX, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Max Temp at Sky Harbor", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.daytime.lm, col = "dark red")
plot(daily.max.min$DATE, daily.max.min$TMIN, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Min Temp at Sky Harbor", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.night.lm, col = "dark red")

dev.off()
## null device 
##           1

We can visually see the difference, how do we quantify that?

phx.warming.daytime.lm
## 
## Call:
## lm(formula = daily.max.min$TMAX ~ daily.max.min$DATE)
## 
## Coefficients:
##        (Intercept)  daily.max.min$DATE  
##          2.994e+01           5.976e-05
phx.warming.night.lm
## 
## Call:
## lm(formula = daily.max.min$TMIN ~ daily.max.min$DATE)
## 
## Coefficients:
##        (Intercept)  daily.max.min$DATE  
##          1.472e+01           2.513e-04
slopes <- c(phx.warming.daytime.lm$coefficients[2], phx.warming.night.lm$coefficients[2])
View(slopes)
slopes[2]-slopes[1]
## daily.max.min$DATE 
##       0.0001915092
(slopes[2]-slopes[1])*365*80 
## daily.max.min$DATE 
##           5.592069

That is the temperature change over time between the two slopes, so the nights have warmed about 5.5 degrees MORE than the days.

Okay, so what about the data we cleaned. Let’s look at the data against something besides time.

plot(KPHX$`Dew Point Temperature (C)`, KPHX$`Air Temperature (C)`, ylab = expression(paste("Air Temperature [",degree,"C]"),
    xlab = expression(paste("Dew Point Temperature [",degree,"C]"))))

Back to the time data

plot(KPHX$Date, KPHX$`Air Temperature (C)`, ylab = expression(paste("Air Temperature [",degree,"C]")), xlab = "Date", 
    main = "Hourly Air Temperature: 1976-2017")

Eeesh, ugly. Hard to see much in there. Let’s use statistics to break this up and analyse.

median.airtemp <- aggregate(`Air Temperature (C)`~Date, data = KPHX, FUN = median)
colnames(median.airtemp) <- c("Date", "Median Air Temperature")
View(median.airtemp)

So aggregate changes the format of your dates automatically. Thankfully, not into Excel mdy format.

max.airtemp <- aggregate(`Air Temperature (C)`~Date, data = KPHX, FUN = max)
colnames(max.airtemp) <- c("Date", "Maximum Air Temperature")
min.airtemp <- aggregate(`Air Temperature (C)`~Date, data = KPHX, FUN = min)
colnames(min.airtemp) <- c("Date", "Minimum Air Temperature")

It would be nice to see them all together. But we don’t need the date in there three times.

KPHX.airtemp.stats <- merge.data.frame(median.airtemp, max.airtemp, by = "Date")

Okay, that is max and median, but what about min? We can do it the same way, or we can simply append another column onto the existing dataframe.

KPHX.airtemp.stats$`Minimum Air Temperature` <- min.airtemp$`Minimum Air Temperature`
View(KPHX.airtemp.stats)

So now we can look at linear regression for each!

median.lm <- lm(KPHX.airtemp.stats$`Median Air Temperature`~KPHX.airtemp.stats$Date)
max.lm <- lm(KPHX.airtemp.stats$`Maximum Air Temperature`~KPHX.airtemp.stats$Date)
min.lm <- lm(KPHX.airtemp.stats$`Minimum Air Temperature`~KPHX.airtemp.stats$Date)
plot(KPHX.airtemp.stats$Date, KPHX.airtemp.stats$`Median Air Temperature`, col = "red", ylab = expression(paste("Air Temperature [",degree,"C]")),
     xlab = "Date", type = "l", main = "Median Air Temperature: 1976-2017", col.lab = "navy", col.main = "navy")
abline(median.lm, col = "brown")
abline(max.lm, col = "dark red")
abline(min.lm, col = "dark blue")

Okay, now we have seen how to export tables, but what about graphs and images? First, decide the type of file you want: let’s do .png since it is flexible.

png(filename = "png_files/daily.max.1933-2017.png", width = 1080, height = 720)
plot(daily.max.min$DATE, daily.max.min$TMAX, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Maximum Temperature at Sky Harbor International Airport", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.daytime.lm, col = "dark red")
dev.off()
## quartz_off_screen 
##                 2
png(filename = "png_files/daily.min.1933-2017.png", width = 1080, height = 720)
plot(daily.max.min$DATE, daily.max.min$TMIN, col = "orange", xlab = "Time", ylab = expression(paste("Air Temperature [", degree, "C]")),
     main = "Minimum Temperature at Sky Harbor International Airport", col.lab = "navy", col.main = "navy", pch = 21, cex = 2.5, type = "l")
abline(phx.warming.night.lm, col = "dark red")
dev.off()
## quartz_off_screen 
##                 2

Extra stuff

ANOVA, t-test, PCA, jack-knife, bootstrap are all at your disposal anova() t.test() prcomp() boot(), library(boot) jackknife(), library(bootstrap)

Anytime you are unsure how to use a function, ?function in the console will take you to its help page. Also, Google is a wealth of resources. SOMEONE has had the same general question before and they often share their thoughts online since R is open-sourced.

For spatial analysis

Spatial(): library(sp), library(rgdal) and many more.

FTP can be done through the terminal in R Studio. Flip over there real quick and I can show you how download climate data though any ftp access data source would work the same.

cd “~/Google Drive/Presentations/R_Workshop/Data” cd “KPHX_Phoenix” ftp open ftp.ncdc.noaa.gov anonymous your email address cd pub/data/noaa/1980 KPHX: 722780-23183 cd ../2017 get 722780-23183-2017.gz bye gunzip 722780-23183-2017.gz

After this you will need to read in the data using fixed width filing systems read.fwf() for example. Information about the formatting should be in the metadata for the data repository.