Question 1

Use data for any city in the US except for Miami for at least four years, and explore temporal lags and monthly lags of the selected meteorological conditions: temperature, relative humidity and atmospheric sea level pressure. Based on this exploration, identify the important daily and monthly lags of these selected meteorological conditions. Include your graphs, results and description of these lags and their relevance for health effect research (4 points).

#loading the packages used throughout this lab
library(data.table) # shift()
library(dplyr)      # filter(), mutate(), %>%
library(lubridate)  # month()

#importing the Pittsburgh data
lab5_tm <- read.table(file='pittsburgh.csv', sep=',', header=T)
#summary(lab5_tm) #date is split into 3 columns (yyyy, mm, day) that need to be combined.
#also, the data from NOAA has hourly measurements, daily summaries, and monthly summaries.
#the daily and monthly summaries are missing RH data -- so i am going to drop those and aggregate the hourly data by day.

#subsetting only hourly measurements
lab5_tm <- lab5_tm %>% filter(type == "FM-15")
#creating one date-coded variable and then dropping the three separate date columns
lab5_tm$date <- as.Date(with(lab5_tm, paste(yyyy, mm, day, sep="-")), "%Y-%m-%d")
lab5_tm <- subset(lab5_tm, select = -c(yyyy, mm, day))

Data were downloaded from NOAA’s Local Climatological Data source for the Pittsburgh International Airport station, covering January 1, 2015 through December 31, 2020. The data have 85% coverage for this time frame and are recorded on an hourly basis. For the purpose of this assignment, the data for relative humidity, sea level pressure, and temperature need to be aggregated by date.
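To make the aggregation step concrete, here is a minimal sketch of collapsing hourly records to daily means with the formula interface of aggregate(). The toy data frame and its values are made up for illustration; only the column names mirror the ones used in this lab.

```r
# Toy hourly records (hypothetical values, not the Pittsburgh data)
hourly <- data.frame(
  station = "PIT",
  date    = as.Date(rep(c("2015-01-01", "2015-01-02"), each = 3)),
  temp    = c(30, 32, 34, 40, 42, 44),
  rh      = c(60, 65, 70, 50, 55, 60),
  slp     = c(30.1, 30.2, 30.3, 29.9, 30.0, 30.1)
)

# One row per station-date, with each variable averaged over the hours
daily <- aggregate(cbind(temp, rh, slp) ~ station + date,
                   data = hourly, FUN = mean)
print(daily)
```

The formula interface keeps the original column names, so no renaming is needed afterwards.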

lab5_tm$temp <- as.numeric(lab5_tm$temp)
lab5_tm$rh <- as.numeric(lab5_tm$rh)
lab5_tm$slp <- as.numeric(lab5_tm$slp)

lab5_tm <- na.omit(lab5_tm)

#averaging the hourly values within each station-date
agg_lab5_tm <- aggregate(cbind(temp, rh, slp) ~ station + date, data=lab5_tm, FUN=mean)

names(agg_lab5_tm) <- c("station","date","temp","rh","slp")
#summary(agg_lab5_tm) #no weird values!

#renaming the datasets
pittsburgh <- lab5_tm
lab5_tm <- agg_lab5_tm
lab5_tm <- mutate(lab5_tm, month=month(date))
#averaging the daily values by calendar month (pooled across years)
aggm_lab5_tm <- aggregate(cbind(temp, rh, slp) ~ station + month, data=lab5_tm, FUN=mean)
names(aggm_lab5_tm) <- c("station","month","temp_m","rh_m","slp_m")
lab5_tm <- merge(lab5_tm, aggm_lab5_tm, by="month")
lab5_tm <- subset(lab5_tm, select = -c(station.y))
names(lab5_tm) <- c("month","station","date","temp","rh","slp","temp_m","rh_m","slp_m")
#merge() sorts rows by month, so restore chronological order before computing daily lags
lab5_tm <- lab5_tm[order(lab5_tm$date), ]

Now that the data is aggregated by date, I can examine temporal lags and monthly lags of the selected meteorological conditions: temperature, relative humidity and atmospheric sea level pressure.

Temperature

#creating single-day lag
lab5_tm$temp_lag1 <- shift(lab5_tm$temp, n=1, fill=NA, type="lag")

#use only complete pairs of observations (drops the NA created by the lag)
#cor(lab5_tm$temp_lag1, lab5_tm$temp, use='complete')

#cor.test(lab5_tm$temp_lag1, lab5_tm$temp)

#autocorrelation up to a 30-day lag
acf(lab5_tm$temp, lag.max = 30)

#autocorrelation up to a 7-day lag
acf(lab5_tm$temp, lag.max=7)

#acf(lab5_tm$temp, lag.max=7, plot=F)

The correlation of temperature with its one-day lag is approximately 0.928, which means the daily temperature is highly correlated with the temperature of the day before. The 95% confidence interval for this correlation is (0.9217368, 0.9334563), and the correlation is statistically different from 0 (t=115.9, df=2171, p-value < 2.2e-16).

Autocorrelations of Temperature by Days of Lag

Lag (days)        0      1      2      3      4      5      6      7
Autocorrelation   1      0.86   0.795  0.766  0.749  0.737  0.738  0.738

Temperature is highly autocorrelated within a month: at lags of one to thirty days, the autocorrelation is at least 0.6. This means that if we are examining the impact of heat or cold exposure on a human health outcome, the exposure is unlikely to be isolated to one day, because the temperature on a given day is strongly correlated with the temperature on each of the following 30 days.
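The link between a lagged Pearson correlation and the value acf() reports at lag 1 can be checked on simulated data. This is only a sketch: the series below is an artificial AR(1) process standing in for a persistent daily series, not the Pittsburgh temperatures, and the coefficient 0.9 is an arbitrary assumption.

```r
set.seed(1)
# Simulated persistent daily series (AR(1) with coefficient 0.9)
x <- as.numeric(arima.sim(model = list(ar = 0.9), n = 2000))

# Lag-1 correlation: pair each day with the previous day
lag1   <- c(NA, head(x, -1))
r_lag1 <- cor(x, lag1, use = "complete.obs")

# The same quantity as estimated by acf(); element 1 of $acf is lag 0
a        <- acf(x, lag.max = 30, plot = FALSE)
acf_lag1 <- a$acf[2]

print(c(pearson = r_lag1, acf = acf_lag1))
```

The two estimates are close but not identical, because acf() uses the overall mean and variance of the full series rather than those of the paired subsets.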

#creating single-month lag
aggm_lab5_tm$temp_mlag1 <- shift(aggm_lab5_tm$temp_m, n=1, fill=NA, type="lag")

#use only complete pairs of observations (drops the NA created by the lag)
#cor(aggm_lab5_tm$temp_mlag1, aggm_lab5_tm$temp_m, use='complete') #0.8451656

#cor.test(aggm_lab5_tm$temp_mlag1, aggm_lab5_tm$temp_m)

#autocorrelation up to a 12-month lag
acf(aggm_lab5_tm$temp_m, lag.max = 12)

# acf(aggm_lab5_tm$temp_m, lag.max=12, plot=F)

The correlation of monthly mean temperature with its one-month lag is 0.8451656, which means the mean temperature of a month is strongly correlated with that of the month before. The 95% confidence interval for this correlation is (0.4975404, 0.9588896), and the correlation is statistically different from 0 (t=4.7436, df=9, p-value = 0.001054).
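As a consistency check, the test statistic and the correlation are linked by the standard identity for a Pearson correlation test, where r is the correlation, t the test statistic, and df = n - 2 the degrees of freedom. Plugging in the monthly values reported above:

```latex
r = \frac{t}{\sqrt{t^{2} + df}}
  = \frac{4.7436}{\sqrt{4.7436^{2} + 9}}
  = \frac{4.7436}{\sqrt{31.5017}}
  \approx 0.845
```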

Autocorrelations of Temperature by Months of Lag

Lag (months)      0      1      2      3       4       5       6       7       8       9     10     11
Autocorrelation   1.000  0.721  0.305  -0.121  -0.415  -0.543  -0.498  -0.318  -0.084  0.12  0.193  0.141

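Temperature is less autocorrelated between months. Within a year, the autocorrelation falls from 0.721 at a one-month lag to -0.543 at a five-month lag, then climbs back toward zero and turns weakly positive at lags of 9 to 11 months. This seasonal pattern makes sense for Pittsburgh, which experiences hot summers and cold winters.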

If we are examining a health outcome, exposure to heat or cold is likely to matter over a span of days, but perhaps not months.

All three measures - daily lag

#create a daily time series of the three measures (as.ts() ignores start and frequency, so ts() is used)
#frequency=1 keeps the lags in units of days
lab5_ts <- ts(lab5_tm[, c("temp","rh","slp")], frequency = 1)

plot(lab5_ts)

lab5_ts <- na.omit(lab5_ts)
acf(lab5_ts)

#focus on the partial autocorrelation of temperature alone
lab5_ts <- ts(lab5_tm$temp, frequency = 1)
#acf(lab5_ts, plot=F)
pacf(lab5_tm$temp, lag.max=12)

#pacf(lab5_tm$temp, lag.max=12, plot=F)

Temperature seems to be unique in this way: there is no significant autocorrelation in the relative humidity or sea level pressure lags. The only cross-correlation of interest is the lagged relationship between temperature and sea level pressure. Warm air is less dense than cold air, so warm air masses tend to be associated with lower surface pressure and cold air masses with higher pressure. An inverse relationship between the two series is therefore physically plausible, and this finding makes sense.
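A lagged relationship between two series can be examined directly with ccf(). The sketch below uses two made-up series in which y responds negatively to x two steps later; the series, the two-step delay, and the coefficients are all assumptions for illustration, not estimates from the lab data.

```r
set.seed(2)
n <- 1000
# x: a persistent driver series; y: negative response to x two steps earlier
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- -0.8 * c(rep(NA, 2), head(x, -2)) + rnorm(n, sd = 0.5)

# Drop the leading rows where the lagged value is undefined
ok <- complete.cases(x, y)
cc <- ccf(x[ok], y[ok], lag.max = 10, plot = FALSE)

# Lag with the strongest (absolute) cross-correlation.
# In ccf(x, y), lag k estimates the correlation between x[t + k] and y[t],
# so a negative lag means x leads y.
best <- cc$lag[which.max(abs(cc$acf))]
print(best)
```

Reading the sign of the lag correctly matters for exposure research, because it determines which variable is treated as preceding the other.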

Question 2

Use the us_met_2009_2013.csv (provided as a part of this exercise) file, explore spatial autocorrelation in the selected meteorological conditions in the continental US [two libraries needed for this part of the assignment are: gstat and spdep]. Based on this exploration, identify the distance (geographic) threshold within which spatial autocorrelation exists (4 points). NOTE: Your distance threshold will be in decimal degrees.

Temperature

library(readr)  # read_csv()
library(gstat)  # variogram()

lab5_sp <- read_csv("~/Library/Mobile Documents/com~apple~CloudDocs/R Directory/EPH 727/EPH 727/Session_11_lab_5/us_met_2009_2013.csv")
#summary(lab5_sp)
lab5_sp <- na.omit(lab5_sp)

tmpvar <- variogram(data=lab5_sp, temp~1, loc=~va_lng+va_lat)
plot(tmpvar, col="red", pch="O", bg="brown", cex=1.25, 
     main="Temperature Semivariance Curve",
     sub="autocorrelation threshold=18.33°",
     xlab="distance in degrees",
     ylab="semivariance") 

Relative Humidity

rhvar <- variogram(data=lab5_sp, rh~1, loc=~va_lng+va_lat)
plot(rhvar, col="red", pch="O", bg="brown", cex=1.25, 
     main="Relative Humidity Semivariance Curve",
     sub="autocorrelation threshold=3.4149854°",
     xlab="distance in degrees",
     ylab="semivariance") 

Sea Level Pressure

slpvar <- variogram(data=lab5_sp, slp~1, loc=~va_lng+va_lat)
plot(slpvar, col="red", pch="O", bg="brown", cex=1.25, 
     main="Sea Level Pressure Semivariance Curve",
     sub="autocorrelation threshold=2.0758915°",
     xlab="distance in degrees",
     ylab="semivariance") 

Condition            Autocorrelation threshold (decimal degrees)
Temperature          18.33
Relative Humidity    3.41
Sea Level Pressure   2.08
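To show what the semivariance curves measure, here is a base R sketch of an empirical semivariogram: half the mean squared difference between pairs of stations, averaged within distance bins. The coordinates and the smooth field are simulated assumptions, and the binning is deliberately simpler than what gstat's variogram() does.

```r
set.seed(3)
n <- 300
# Made-up station coordinates (decimal degrees) and a smooth field plus noise
lng <- runif(n, -100, -80)
lat <- runif(n, 30, 45)
z   <- sin(lng / 3) + cos(lat / 3) + rnorm(n, sd = 0.2)

# Pairwise separation distances and half squared differences
d    <- as.matrix(dist(cbind(lng, lat)))
g    <- 0.5 * outer(z, z, "-")^2
keep <- upper.tri(d)   # count each pair once

# Mean semivariance within 1-degree distance bins, up to 10 degrees
bins  <- cut(d[keep], breaks = 0:10)
gamma <- tapply(g[keep], bins, mean)
print(round(gamma, 3))
```

The distance at which the binned semivariance stops rising and levels off is the threshold beyond which observations are no longer spatially autocorrelated.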

Question 3

Based on the lecture, class discussion, reading from the previous week and this laboratory exercise, provide a comparison of spatial and temporal lags of meteorological and environmental conditions in health research and whether temporal lags are more important than spatial lags and vice-versa. Use appropriate examples and/or graphic to support your answers (2 points).

I think that temperature lag is more significant than relative humidity lag or sea level pressure lag. Records somewhat far apart in space and time can be used to gain insight into missing temperature data because temperature is highly autocorrelated across both space and time. The temporal correlation is quite strong, with a correlation of 0.845 in mean temperature from one month to the next. For relative humidity and pressure, measurements must be much closer in space and time to a missing record (or event of interest) to provide any meaningful context, because neither is highly correlated across space or time.