Preparing the Data

#set the working directory to the folder that holds FiveCityDataSet.csv
setwd("/Users/katyhaller/Library/Mobile Documents/com~apple~CloudDocs/R Directory/EPH 727/EPH 727/Session_7_Lab_2")

library(tidyverse) #also loads readr
#strings to read as missing; in NOAA precipitation columns "T" marks a trace amount
na_values <- c("missing", "999.99", "", "-9999", "*", "T")
fivecities <- read_csv("FiveCityDataSet.csv",
    col_types = cols(DATE = col_datetime(format = "%m/%d/%Y %H:%M"),
                     HOURLYPrecip = col_number()),
    na = na_values)

mcity <- fivecities
#define columns you want to extract
colset <- c("STATION","STATION_NAME","LATITUDE","LONGITUDE","DATE","HOURLYVISIBILITY","HOURLYDRYBULBTEMPF","HOURLYWETBULBTEMPF","HOURLYRelativeHumidity","HOURLYPrecip")
lab2_data <- mcity[colset]

#checking out lab2_data...
#dim(lab2_data) 863617 x 10
#any(is.na(lab2_data[1:10])) #True
#summary(lab2_data) #looks good -- the variables have the correct variable types
#list(lab2_data$DATE[1:5])
#names(lab2_data)

Question A

Describe bases for defining time in the aggregated data sets, e.g. by day or by month or by year (2 points).

In this lab, the hourly observations are aggregated by day (calendar date) and by month. Daily data provide finer resolution, so unusual days are perhaps more obvious when each day is its own data point; hourly data would show within-day extremes even more clearly. Monthly data make it easier to see broader, seasonal trends.
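A minimal base-R sketch of how the three time bases can be derived from a single timestamp column. The three timestamps are hypothetical; in the lab the same calls would apply to lab2_data$DATE. A month key alone pools every January across 2009-14 (a seasonal view), whereas a year-month key keeps each January separate (a trend view).

```r
# Hypothetical hourly timestamps (not taken from FiveCityDataSet.csv)
obs_time <- as.POSIXct(c("2009-01-31 23:53", "2009-02-01 00:53", "2010-02-15 12:53"),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

day_key   <- as.Date(obs_time, tz = "UTC")          # calendar date (daily basis)
month_key <- as.integer(format(obs_time, "%m"))     # 1-12, pooled across years
year_key  <- as.integer(format(obs_time, "%Y"))     # calendar year (yearly basis)
ym_key    <- format(obs_time, "%Y-%m")              # a specific month in a specific year

print(data.frame(obs_time, day_key, month_key, year_key, ym_key))
```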

Daily Data

#DATE was already parsed as a date-time by read_csv(), so as.Date() can take the calendar date directly
lab2_data$new_date <- as.Date(lab2_data$DATE)
#daily mean coordinates and dry-bulb temperature for each station; na.rm = TRUE is passed on to mean()
agg_met_data <- aggregate(lab2_data[c("LATITUDE","LONGITUDE","HOURLYDRYBULBTEMPF")],
                          by = list(lab2_data$STATION_NAME, lab2_data$new_date),
                          FUN = mean, na.rm = TRUE)
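A small sketch of the daily aggregation step on hypothetical hourly readings for two stations. It illustrates one point about aggregate(): arguments listed after FUN are handed to mean(), so missing hours are skipped only when na.rm = TRUE is supplied. An argument such as na.action is simply passed along to mean(), which ignores it.

```r
# Hypothetical hourly readings: station A is missing one hour on 2009-01-01
hourly <- data.frame(
  station = c("A", "A", "A", "B", "B"),
  date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02",
                      "2009-01-01", "2009-01-01")),
  temp_f  = c(50, NA, 60, 70, 74)
)

# na.rm = TRUE is forwarded to mean(), so the NA hour is dropped
daily_ok <- aggregate(hourly["temp_f"],
                      by = list(station = hourly$station, date = hourly$date),
                      FUN = mean, na.rm = TRUE)

# Without na.rm, one missing hour makes that whole day's mean NA
daily_na <- aggregate(hourly["temp_f"],
                      by = list(station = hourly$station, date = hourly$date),
                      FUN = mean)

print(daily_ok)
print(daily_na)
```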


#list(agg_met_data[1:5,5])
#mean(agg_met_data[,5], na.rm=T) #68.5512 = mean temp_dry_f
#names(agg_met_data)
#names(agg_met_data);list(agg_met_data[1:10,])
names(agg_met_data) <- c("station","r_date","lat","long","temp_dry_f")
#list(agg_met_data[1:10,])
attach(agg_met_data)

Plotting Temperature by Day/Month

Daily Aggregates

library(lattice)
temp <- subset(agg_met_data, station == "HOMESTEAD AFB FL US")
#single-station series, so no grouping or legend is needed
xyplot(temp_dry_f ~ r_date, data = temp,
  type = "l",
  main = "Daily Temperature at Homestead AFB, FL, 2009-14",
  xlab = "Year",
  ylab = "Temperature (°F)")

xyplot(temp_dry_f ~ r_date, data = agg_met_data,
       groups = station,
       auto.key = list(columns = 3),
       type = c("l", "p"),
       main = "Daily Temperature of 12 US Airports, 2009-14",
       xlab = "Year",
       ylab = "Temperature (°F)")

Question B

Based on the skills you have acquired, plot monthly trend of ambient air temperature in the US across all stations for which data are provided or you used the data that you acquired on your own (2 points).

Monthly Aggregates

#extracting month and year as integers from r_date
daily_met <- agg_met_data
daily_met$month <- as.integer(format(daily_met$r_date, "%m"))
daily_met$year <- as.integer(format(daily_met$r_date, "%Y"))
#running month index: January 2009 = 1, December 2014 = 72
daily_met$c_month <- ((daily_met$year - 2009) * 12) + daily_met$month

#mean by station and calendar month, pooling 2009-14; na.rm = TRUE is passed on to mean()
data_mnt <- aggregate(daily_met[c("lat","long","temp_dry_f")],
                      by = list(daily_met$station, daily_met$month),
                      FUN = mean, na.rm = TRUE)
names(data_mnt) <- c("st","month","lat","long","temp_dry_f")
#dim(data_mnt) #792 x 5
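The c_month formula numbers months consecutively, with January 2009 as 1, so the same calendar month in different years gets a different index. The sketch below uses hypothetical values to check that arithmetic and to contrast grouping by month (as data_mnt does, which pools years into a seasonal profile) with grouping by c_month (which keeps a month-by-month series across 2009-14).

```r
# Check the running-month arithmetic on a few year/month pairs
year  <- c(2009L, 2009L, 2010L, 2014L)
month <- c(1L, 12L, 3L, 12L)
c_month <- (year - 2009L) * 12L + month
print(c_month)

# Hypothetical January means for two different years
toy <- data.frame(year = c(2009L, 2010L), month = c(1L, 1L), temp_f = c(40, 46))
toy$c_month <- (toy$year - 2009L) * 12L + toy$month

by_month   <- aggregate(temp_f ~ month, data = toy, FUN = mean)    # both Januaries pooled
by_c_month <- aggregate(temp_f ~ c_month, data = toy, FUN = mean)  # each January kept separate
print(by_month)
print(by_c_month)
```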

data_mnt_sort <- data_mnt[order(data_mnt$st, data_mnt$month),]
monthnona <- na.omit(data_mnt_sort) #dataset without missings

library(hrbrthemes)  
monthnona %>%
  ggplot( aes(x=month, y=temp_dry_f, group=st, color=st)) +
    geom_line() +
    ggtitle("Average Monthly Temperature by Airport") +
    theme_ipsum() +
    scale_x_continuous(name="Month", breaks = seq(1,12,2)) +
    scale_y_continuous(name="Temperature °F") +
    geom_point(aes(color=st))

attach(daily_met)

Question C

Based on the graph, identify which of the years have witnessed mildest temperature between 2009 and 2014 and which of the selected stations experienced highest seasonal variations in air temperature and its associated implications for changes in environmental conditions and health impacts (3 points).

Among the stations plotted, Los Angeles appears to have the mildest temperature changes over the year (2009-14): its average monthly temperature is lowest around March (~53 degrees F) and highest around August (~65 degrees F), a range of roughly 12 degrees F. Dallas appears to have the greatest seasonal variation, from a low around January (~43 degrees F) to a high around August (~85 degrees F), a range of roughly 42 degrees F. Because summers in Dallas are routinely much hotter than the rest of the year, its infrastructure is likely better equipped to deal with extreme heat days. Los Angeles, on the other hand, is likely not well prepared for either extreme heat or extreme cold. We see the effects of this when heat becomes overwhelming in LA: power systems are strained and rolling blackouts occur. This could significantly affect human health if people are not able to reach “cool shelters,” and excess mortality may be expected on extreme weather days.
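The comparison above can be made numeric by taking, for each station, the warmest monthly mean minus the coolest. The sketch uses only the approximate values read off the plot in the text, not the data set itself; applied to the lab data, the same call would split monthnona$temp_dry_f by monthnona$st.

```r
# Approximate monthly means quoted in the text (illustrative, not the real data)
monthly <- data.frame(
  st         = c("Los Angeles", "Los Angeles", "Dallas", "Dallas"),
  month      = c(3L, 8L, 1L, 8L),
  temp_dry_f = c(53, 65, 43, 85)
)

# Seasonal amplitude per station: warmest minus coolest monthly mean
amplitude <- vapply(split(monthly$temp_dry_f, monthly$st),
                    function(x) max(x) - min(x), numeric(1))
amplitude <- sort(amplitude, decreasing = TRUE)  # largest seasonal swing first
print(amplitude)
```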

Question D

Explain, whether annual variations in the ambient air temperature is higher than the variation in the ambient temperature across different stations in the US. Based on this comparison explain which of them should be given higher weight in examining the impact of temperature on human health (3 points).

Across the whole US, annual variation in temperature can look quite extreme. Among these 12 areas, we see the lowest average monthly temperature in New York (~22 degrees F, January) and the highest in Dallas (~85 degrees F, August). Such a wide spread may erroneously lead one to conclude that infrastructure across the US is built to endure both extreme heat and extreme cold. In reality, colder places (New York) are built for cold winters and mild summers, while warmer places (Denver) are prepared for high heat year round but rarely drop below 65 degrees F from month to month. Places with less variation in temperature are ill-equipped to manage extreme weather events outside their norm, and this lack of resilience likely puts human health at risk during record high and low temperatures. As such, the variation that occurs over the year within each individual location should be given more weight than the overall variation across the US.
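One hedged way to put numbers on this comparison is to contrast the spread of monthly means within each station with the spread of the station-level averages. The three stations and their values below are hypothetical and chosen only to show the calculation; the lab's monthnona data frame (columns st and temp_dry_f) could be substituted.

```r
# Hypothetical monthly means (degrees F) for three stations over four months
monthly <- data.frame(
  st   = rep(c("S1", "S2", "S3"), each = 4),
  temp = c(30, 50, 70, 50,   # strong seasonal cycle
           60, 65, 70, 65,   # mild seasonal cycle
           40, 55, 70, 55)   # moderate seasonal cycle
)

by_station <- split(monthly$temp, monthly$st)

# Within-station (seasonal) spread: SD of each station's monthly means
within_sd <- vapply(by_station, sd, numeric(1))

# Between-station spread: SD of the station averages
between_sd <- sd(vapply(by_station, mean, numeric(1)))

print(within_sd)
print(c(mean_within_sd = mean(within_sd), between_sd = between_sd))
```

If the average within-station SD exceeds the between-station SD, seasonal swings at each location dominate the geographic differences, which is the situation the argument above leans on.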

(Yearly Data)

#aggregating data by station and year; na.rm = TRUE is passed on to mean()
data_yy <- aggregate(daily_met[c("lat","long","temp_dry_f")],
                     by = list(daily_met$station, daily_met$year),
                     FUN = mean, na.rm = TRUE)
#dim(data_yy) #72 x 5
names(data_yy) <- c("st","yyyy","lat","long","temp_dry_f")