Question 1

Explore daily meteorological and clinical/emergency visits [veteran affair clinic facilities] data of 24 cities provided to you. Download/acquire population size data (by yourself) of these 24 cities, collocate these data with the asthma data and compute daily VA clinic visit rates for these cities. Develop exploratory statistical analyses of citywide asthma exacerbation clinic visits, include tables and graphics along with the description of asthma execration related clinic visits among the US veterans (4 points).

Importing and Merging Data

setwd("/Users/katyhaller/Library/Mobile Documents/com~apple~CloudDocs/R Directory/EPH 727/EPH 727/Session_12_Lab_6")
city_info <- read.table("city_info.csv", sep=",", header=T)
city_info[city_info == "Suptropical"] <- "Subtropical"
#info from 24 cities containing zone (region of US), city name, lat, long, and city ID
met_data <- read.table("f_mcity_daily_met_est.csv", sep=",", header=T)
#meteorological data for the 24 cities
met_data$v_d <- as.Date(met_data$rdate, format="%b/%d/%y")
# summary(met_data)

asthma<- read.table(file="asthma_va_test.csv", sep=",", header=T)
asthma$visit_day <- as.Date(asthma$visitdatetime,format="%Y-%m-%d %H:%M:%S")
#explore these data
table<-table(asthma$city_id, asthma$visit_day)
asthma_daily <- as.data.frame(table)
names(asthma_daily) <- c("city_id","date","cases")
asthma_daily$date <- as.Date(asthma_daily$date, "%Y-%m-%d")


#data from 1/1/2009 - 12/31/2014 in 24 cities
int_met_data <- merge(city_info, met_data, by.x="city_id", by.y="city_id")
int_met_data <- merge(city_info, met_data, by.x=c("city_id", "year"), by.y=c("city_id", "yyyy"))
all_data <- merge(int_met_data, asthma_daily, by.x=c("city_id","v_d"), by.y=c("city_id", "date"))

Population Data from the Census Bureau: 2009 | 2010-2014 * 2010-2014 data for: Jefferson City | Augusta

Daily Case Rates

#daily total cases
ggplot(data = all_data, aes(v_d,cases)) +
  geom_bar(stat = "identity", fill = "darkorchid4") +
  xlab("Date") + ylab("Asthma Cases (count)") +
  ggtitle("Daily Asthma Cases Count at VA Hospitals 2009-2014 across 24 Cities") +
  theme_bw()

#aggregating cases and popuation means by city
all_data$population <- as.numeric(all_data$population)
agg_all <- aggregate(cbind(all_data$cases, all_data$population), by=list(all_data$city), FUN=mean, data=all_data, na.action=na.omit)
names(agg_all) <-c("City","mean_daily", "pop")

#creating rate variable for aggregate
agg_all <- agg_all %>%
  mutate(
    rate = mean_daily / pop * 100000
  )
#ordering it nicely for the bar plot
agg_all <- agg_all %>% arrange(desc(rate))

#bar graph of daily rate per 100,000 people over the 5 years
par(mar=c(11,4,4,4))
barplot(height=agg_all$rate, names=agg_all$City, col="#69b3a2", las=2, main = "Daily Asthma Exacerbation Case Rates 
        per 100,000 people at VA Hospitals, 2009-2014")

Exploratory Analysis

all_data <- all_data %>%
  mutate(
    rate = cases / population * 100000
  )

# Compute the analysis of variance
res.aov <- aov(rate ~ zone, data = all_data)
# Summary of the analysis
#summary(res.aov)

zones<-group_by(all_data, zone) %>%
  summarise(
    count = n(),
    mean = mean(rate, na.rm = TRUE),
    sd = sd(rate, na.rm = TRUE)
  )
## `summarise()` ungrouping output (override with `.groups` argument)
p<-ggboxplot(all_data, x = "zone", y = "rate", 
          color = "zone",
          ylab = "Case Rate", xlab = "Zone",
          title="Case Rates per Capita by Zone")
p+ rotate_x_text()

Daily Asthma Exacerbation Case Rates per Capita by Zone

zone count mean sd
Desert SW 4308 0.142 0.447
Lower MW 4308 0.157 0.486
NE 4308 0.201 0.924
Pacific NW 4308 0.067 0.160
Pacific SW 4308 0.040 0.078
SE 4308 0.044 0.111
Subtropical 2872 0.068 0.144
Suptropical 1436 0.093 0.228
Upper Midwest 4308 0.078 0.150

At least one zone has a different mean daily asthma exacerbation cases per capita rate.

ANOVA Table

Df Sum Sq Mean Sq F value Pr(>F)
zone 8 105 13.08 75.45 <2e-16
Residuals 34455 5973 0.173

Question 2

Collocate asthma and meteorological data by week and examine association between asthma and meteorological conditions, as well as the associations between clinic visits (due to asthma exacerbation) and time lagged meteorological conditions. Explain whether there are seasonal and citywide variations in these associations. Include your results in the form of tables and graphics and their descriptions as parts of your assignment (4 points)

Aggregating by Week

all_data$week<-week(all_data$v_d)

#aggregating cases and popuation means by city
agg_week <- aggregate(cbind(
  all_data$cases, all_data$population, all_data$temp, all_data$slp, all_data$rh), by=list(all_data$city, all_data$week, all_data$zone), FUN=mean, data=all_data, na.action=na.omit)
names(agg_week) <-c("City","Week", "Zone","Rate","pop", "temp", "slp", "rh")

#ordering it nicely for the bar plot
agg_week <- agg_week %>% arrange(desc(Rate))
ggplot(data = agg_week, aes(Week,Rate)) +
  geom_bar(stat = "identity", fill = "darkorchid4") +
  xlab("Week") + ylab("Asthma Cases (count)") +
  ggtitle("Weekly Asthma Case Rate/100,00 People at VA Hospitals 2009-2014") +
  theme_bw()

agg_se <- filter(agg_week, Zone == "SE")
agg_ne <- filter(agg_week, Zone == "NE")
agg_pacnw <- filter(agg_week, Zone == "Pacific NW")

ggplot(data = agg_pacnw, aes(Week,Rate)) +
  geom_bar(stat = "identity", fill = "darkorchid4") +
  xlab("Week") + ylab("Asthma Cases (count)") +
  ggtitle("Weekly Asthma Case Rate/100,00 People at VA Hospitals in PNW") +
  theme_bw()

lab5_ts <- as.ts(agg_pacnw[,c(4,6,7,8)],frequency = 1)

plot(lab5_ts)

lab5_ts <- na.omit(lab5_ts)
acf(lab5_ts)

#just focus on correlation on temperature alone
#acf(lab5_ts, plot=F)
pacf(agg_pacnw$rh, lag=12)

#pacf(agg_pacnw$temp, lag=12, plot=F)

It seems that there is a positive relationship between asthma case rate per 100,000 people and weekly temperature & relative humidity (and a negative one between SLP and rate). The greatest relationship seems to be between rate and relative humidity, with the most significant impact up to 2 weeks after.

Question 3

Write a critique on the methodology used for health and meteorological data collocation, focusing on exposure uncertainty (2 points).

Because of mismatch, misalignment, and missingness when collocating health and meteorological data, there is often significant exposure uncertainty. For example, PM10 is often only monitored at specific EPA sites that are sparsely located around the US. When looking to assess the impact of PM10 on respiratory outcomes, one often must assume that the measurement for PM10 at one location is representative of the PM10 exposure for everyone in a whole community (like a city or county). However, we know that the exposure to air pollutants can very greatly over close geographic spaces. As a result, we often find differing relationships between PM10 exposure and respiratory outcomes, both in significance and extent, because we fill in missingness by aggregating from one or few measurement sites. This exposure uncertainty represents a challenge for researchers who must find a way to work with the data that is available while not making erroneous assumptions due to the data’s missingness.