Lab 2-Precipitation Analysis

# read in daily precip data from Pittsburgh from file.
# check your working directory and place the data there
setwd("C:/Users/ach150/OneDrive - University of Pittsburgh/lab_fall2024")
# to see documentation about what a function does and how to use it
# type a questions mark and the name of the function(), and look at the help 
# window in lower right hand corner

# and can load using
p <- read_csv("daily.csv")
## Rows: 7661 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): STATION, NAME, DATE, PRCP_ATTRIBUTES, SNOW_ATTRIBUTES, SNWD_ATTRIBUTES
## dbl (6): LATITUDE, LONGITUDE, ELEVATION, PRCP, SNOW, SNWD
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# OR insert the full filepath to wherever the daily.csv is located
# convert Date to date, R thought it was a "character"
p$DATE <- as.Date(p$DATE, format="%m/%d/%Y")


# Another method to do this would be using the mutate function and "pipes"
# A pipe is %>% and is a way of chaining functions together.
# For example, 
# new_data <- function3(function2(function1(data)))
# is the SAME as
# new_data <- data %>%
#   function1() %>%
#   function2() %>%
#   function3()

p <- p %>%
  mutate(DATE =as.Date(p$DATE, format="%m/%d/%Y"))

# First make a plot of all the data using base R
plot(p$DATE, p$PRCP, type="l")

# FINISH THIS CODE: now in ggplot
ggplot(data = p) +
  geom_line(aes(x= DATE, y= PRCP))

# with plotly, you can make an interactive graph. Once you make this one, play around with the 
# interactive graph in the plotting window below this R chunk.
ggplotly(ggplot(p) +
           geom_line(aes(x=DATE, y= PRCP)))
# Lets add a month, year, and day of year (DOY) colums using functions from the 
# lubridate package to make future calculations easier. We sometimes have to tell R
# which package a function comes from by adding package::function when there are 
# multiple R functions with the same name.

# FINISH THIS CODE: You will have to find the function that estimates DOY or year day
# perhaps on the internet
p <- p %>%
  mutate(year = lubridate::year(DATE),
         month = lubridate::month(DATE),
         doy = yday(DATE))
 
 
# Calculate the annual maximum, mean precipitation, and annual total precipitation
# the group_by function makes this easy by making "groups" out of a variable/column
# and then applying the same function to all using summarise function
# FINISH THIS CODE: What column do you have to group_by to get ANNUAL summary stats
p_annual <- p %>%
  group_by(year) %>%
  summarise(annual_mean = mean(PRCP), 
            annual_total = sum(PRCP),
            annual_max = max(PRCP))

# inspect the p_annual dataframe by clicking on the object in the environment
# does it look like what you expected? Notice the NAs, data is messy and looks like
# not every where had complete dataset. We could remove that by adding na.rm= TRUE
# to mean(PRCP, na.rm=TRUE), but we do not want incomplete datasets here

# first lets drop rows with pesky NA data
p_annual <- p_annual %>%
  filter(!is.na(annual_max))

QUESTION 1:

Why would have incomplete daily rainfall data be a problem for calculating statistics such total and max precip? And why is the mean not a good statistic here?

Answer:

If you have missing data, you cannot calculate the totals accurately. The mean is not a good statistic since there are zeroes and missing data.

# ANALYSIS 1: Calculate the return period and exceedence probability of the annual
# maximum rainfall from this dataset. Sort the data from highest to lowest
# value using the arrange() function.
# Then add columns of rank, return period (Tr), and probability using the mutate()function
# and equations from class. 
# FINISH THIS CODE in style of tidyverse OR re-code the same 
# analysis using whatever way of coding you prefer.

p_annual <- p_annual %>%
  arrange(desc(annual_max)) %>%
  mutate(rank = 1:nrow(p_annual)) %>%
  mutate(Tr = (17+1)/rank) %>%
  mutate(prob = 1/Tr)

# FINISH THIS CODE: MAKE FIGURE 1, plot Precipitation magnitude vs return period
ggplot(p_annual) +
  geom_line(aes(x = annual_max, y = Tr)      ) 

Question 2:

What is the rainiest month on average. Is it what you expected, why or why not?

ANSWER:

June, that is not what I expected, I would’ve thought it to be a little earlier like April or May as the Spring often seems very rainy.

#ANALYSIS 2: Calculating monthly climate normals. Using a similar approach above
# group by year and month...group_by(year, month)... to calculate the sum or total
# for each month in each year. THEN group_by month to calculate the mean monthly
# climate normal across all years.
# FINISH THIS CODE in style of tidyverse OR do the same 
# analysis using whatever way of coding you prefer.

p_month <- p %>%
  # lets first filter to only years with full dataset from the p_annual summary
  filter(year %in% p_annual$year) %>%
  group_by(year,month) %>%
  summarise(month_total = sum(PRCP)) %>%
  ungroup(year,month) %>%
  group_by(month) %>%
  summarise(month_mean = mean(month_total))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# FINISH THIS CODE: MAKE FIGURE 2, PLOT the mean monthly precipitation vs. month. Submit in report
ggplot(p_month) +
  geom_line(aes(x = month, y = month_mean))

QUESTION 3:

Is there a significant trend? If so, which way was it trending? Hypothesize about the trend or lack of trend and why or why not this data is appropriate for this statistical test.

ANSWER:

There is not a significant trend, most likely because there is not a large enough dataset.

# ANALYSIS 3: Trends in precip. Fit a linear regression to the annual
# precipitation calculated in p_annual. 

# FINISH THIS code to fit a linear regression using the lm() function. 
# look at the arguments by typing ?lm() in the console
model <- lm(annual_total ~ year - 1, data = p_annual)

# Run this to extract the summary of the fitted model
summary(model)
## 
## Call:
## lm(formula = annual_total ~ year - 1, data = p_annual)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.529 -4.753 -1.133  1.249 23.700 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)    
## year 0.0213134  0.0009571   22.27 1.81e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.931 on 16 degrees of freedom
## Multiple R-squared:  0.9687, Adjusted R-squared:  0.9668 
## F-statistic: 495.9 on 1 and 16 DF,  p-value: 1.813e-13
# FINISH THIS CODE: MAKE FIGURE 3, plot the annual total precip vs year.
ggplot(p_annual) +
  geom_line(aes(x = year, y = annual_total)) 

QUESTION 4:

By looking at this plot, what 2 years have the biggest storms? And provide 2 potential causes of big rains occurring at the time of year they occurred?

ANSWER:

2004 and 2000. The big rains could’ve occured at these times because of thunderstorms causing convection as well as hurricane season.

# ANALYSIS 4: Lets have fun making more plots to compare inter-annual variation in 
# precip

# Run this code to add another dimension in color to a plot
ggplot(p) +
  geom_line(aes(x=doy, y=PRCP, color=year)) +
  scale_color_viridis_c()

# hmmm...its hard to see differences by year. Lets try plotting it in different plot windows, run this code.
ggplot(p) +
  geom_line(aes(x=doy, y=PRCP)) +
  facet_wrap(~year)

# Lets try plotting cumulative precip so we can better see inter-annual differences. Run this code
p <- p %>%
  group_by(year) %>%
  mutate(cumulative = cumsum(PRCP)) %>%
  ungroup()

ggplot(p) +
  geom_line(aes(x=doy, y=cumulative, group=year, color=year)) +
  scale_color_viridis_c() +
  theme_bw()
## Warning: Removed 696 rows containing missing values or values outside the scale range
## (`geom_line()`).

QUESTION 5:

What season or time of year has the greatest inter-annual variability in precipitation in Pittsburgh?

ANSWER:

the cumulative precipitation diverges starting late summer with lots of interannual variability from late summer through the end of the year.

QUESTION 6:

from 1 to 10 (10 highest, 1 lowest), how fun is plotting and interactive plotting in R?

  1. It can be satisfying and it is interesting but I am still getting used to the coding.