The thing I heard most about Seattle when I first moved here last year was that it rains A LOT. I was not worried, since I am from Taipei, a city where it also rains quite often. After living in Seattle for a year, I feel that it actually rained more in Taipei than in Seattle. To prove that I am right, I decided to use my well-earned degree in Business Analytics to verify my guess. To do this, I need rainfall data for both Seattle and Taipei. The Seattle precipitation data can be found on the National Oceanic and Atmospheric Administration (NOAA) website, and the Taipei precipitation data was scraped from the Central Weather Bureau of Taiwan website. I will use daily precipitation data from 2008 to 2019 to test which city gets more rain: Seattle or Taipei?

Load packages

library(tidyverse)
library(lubridate)
library(reshape2)

Read in data

Taipei <- read_csv("Taipei_Rainfall.csv")
Seattle <- read_csv("Seattle_Rainfall.csv")

Clean Taipei

First, a “T” in Precipitation means that there was rain, but less than 0.1 mm. I will treat it as no rain and set it to 0. The first column is a redundant index, so I drop it. Finally, I keep the date range from 2008-01-01 to 2019-12-31.

Taipei_final <- Taipei[,2:3]
# Change all "T"s into 0s
Taipei_final$Precipitation <- gsub("T", "0", Taipei_final$Precipitation)
# Make the Precipitation column numeric
Taipei_final$Precipitation <- as.numeric(Taipei_final$Precipitation)    
# Keep data from 2008-01-01 onward
Taipei_final <- Taipei_final[Taipei_final$Date >= ymd("2008-01-01"),]   
# Change column names
names(Taipei_final) <- c("Date", "Taipei_PRCP")                             
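
One caveat about the trace-value replacement above: gsub replaces the letter T wherever it appears in a value, which is fine as long as a bare "T" is the only non-numeric entry in the column. A minimal sketch of a stricter variant that replaces only entries that are exactly "T", shown on a made-up vector (the values are hypothetical, not taken from the Taipei file):

```r
# Hypothetical raw values as they might appear in the Precipitation column
raw <- c("0.5", "T", "12.0", "0")

# Replace only entries that are exactly "T" (trace), then convert to numeric
cleaned <- as.numeric(ifelse(raw == "T", "0", raw))
cleaned
```

Any other unexpected text in the column would then turn into NA with a warning during the conversion, instead of being silently altered.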

Clean Seattle

Keep only DATE and PRCP.

Seattle_Final <- Seattle[, 6:7]                                            # Keep DATE and PRCP
Seattle_Final <- Seattle_Final[Seattle_Final$DATE <= ymd("2019-12-31"),]   # Keep data through 2019-12-31
names(Seattle_Final) <- c("Date", "Seattle_PRCP")                          # Change column names

Combine both rainfall data frames

df <- left_join(Taipei_final, Seattle_Final, by = "Date")
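
Because a left join keeps every row of the Taipei table, any date that is missing from the Seattle table ends up as NA in Seattle_PRCP, and a later sum() over that column would return NA. Below is a small sanity check for this, sketched with base R's merge (all.x = TRUE behaves like a left join) on toy data frames whose values are made up:

```r
# Toy data: Seattle is missing 2008-01-02 on purpose
taipei_toy <- data.frame(
  Date = as.Date(c("2008-01-01", "2008-01-02", "2008-01-03")),
  Taipei_PRCP = c(0, 3.5, 12)
)
seattle_toy <- data.frame(
  Date = as.Date(c("2008-01-01", "2008-01-03")),
  Seattle_PRCP = c(1.2, 0)
)

# all.x = TRUE keeps every Taipei row, like left_join()
joined <- merge(taipei_toy, seattle_toy, by = "Date", all.x = TRUE)

# Count the dates that found no Seattle match
n_missing <- sum(is.na(joined$Seattle_PRCP))
n_missing
```

On the real data, the same idea is colSums(is.na(df)), which should be all zeros before moving on.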

Now that we have clean data, we can move on to see which city receives more rain.
There are two ways to look at this:
1) Which city has more rainy days?
2) Which city has more precipitation on average?

I define a rainy day as a day with more than 0 mm of precipitation recorded.

Number of rainy days

Taipei_per <- sum(df$Taipei_PRCP != 0) / nrow(df)    # Percentage of rainy days in Taipei
Seattle_per <- sum(df$Seattle_PRCP != 0) / nrow(df)  # Percentage of rainy days in Seattle
print(paste0("Average rainy days in Taipei in a year is: ",round(Taipei_per * 365), " days."))
## [1] "Average rainy days in Taipei in a year is: 164 days."
print(paste0("Average rainy days in Seattle in a year is: ",round(Seattle_per * 365), " days."))
## [1] "Average rainy days in Seattle in a year is: 160 days."

Over these 12 years, the share of rainy days is pretty similar in Seattle and Taipei.
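
The yearly day counts are simply the rainy-day shares scaled to a 365-day year. As a quick check of that arithmetic, this sketch plugs in the two shares that the t-test output in the next section reports as sample means (0.4485512 for Taipei, 0.4378280 for Seattle):

```r
# Rainy-day shares, copied from the t-test sample means
taipei_share  <- 0.4485512
seattle_share <- 0.4378280

# Scale each share to a 365-day year and round to whole days
taipei_days  <- round(taipei_share * 365)
seattle_days <- round(seattle_share * 365)
c(Taipei = taipei_days, Seattle = seattle_days)
```

The gap between the two cities works out to about four rainy days a year.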

T test

We can also use a t-test to see whether the share of rainy days differs significantly between the two cities.

df_t <- df
df_t$Taipei_PRCP <- ifelse(df_t$Taipei_PRCP == 0, 0, 1)
df_t$Seattle_PRCP <- ifelse(df_t$Seattle_PRCP == 0, 0, 1)
t.test(df_t$Taipei_PRCP, df_t$Seattle_PRCP)
## 
##  Welch Two Sample t-test
## 
## data:  df_t$Taipei_PRCP and df_t$Seattle_PRCP
## t = 1.0105, df = 8763.9, p-value = 0.3123
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.01007899  0.03152549
## sample estimates:
## mean of x mean of y 
## 0.4485512 0.4378280

It turns out that the share of rainy days in Seattle and Taipei is not significantly different, given that the p-value (0.31) is greater than the commonly used 0.05 threshold. Now I want to check the total annual precipitation of the two cities.
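
Since the rainy/not-rainy indicator is binary, a two-sample test of proportions is a natural cross-check on the t-test. The sketch below is not part of the original analysis. It assumes the joined data covers every day from 2008-01-01 to 2019-12-31, and it reconstructs the rainy-day counts from the sample means printed above rather than from the raw data:

```r
# Number of days in the study period (assumes no missing dates)
n_days <- as.numeric(as.Date("2019-12-31") - as.Date("2008-01-01")) + 1

# Rainy-day counts reconstructed from the reported shares (an assumption)
rainy <- round(c(Taipei = 0.4485512, Seattle = 0.4378280) * n_days)

# Two-sample test for equality of proportions
res <- prop.test(x = rainy, n = c(n_days, n_days))
res$p.value
```

Both series are observed on the same dates, so a paired test such as mcnemar.test would be another option, but it needs the day-by-day 2x2 table of the two indicators, which is not reproduced here.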

Total precipitation by year

# Total precipitation per city for each year
df_by_Year <- df %>%
  group_by(Year = year(Date)) %>%
  summarise(Taipei = sum(Taipei_PRCP),
            Seattle = sum(Seattle_PRCP))
df_by_Year$Year <- factor(df_by_Year$Year)
head(df_by_Year)
## # A tibble: 6 x 3
##   Year  Taipei Seattle
##   <fct>  <dbl>   <dbl>
## 1 2008   2969.    782.
## 2 2009   1669.    977 
## 3 2010   2278.   1195.
## 4 2011   1759.    925.
## 5 2012   2910.   1226 
## 6 2013   2541.    828
df_by_Year_melt <- melt(df_by_Year, id.vars = "Year")
names(df_by_Year_melt) <- c("Year", "City", "Rainfall")
ggplot(data = df_by_Year_melt, aes(x = Year, y = Rainfall, fill = City))+ 
    geom_bar(stat = "identity", position = "dodge") +
    ggtitle("Total Amount of Rainfall by Year") +
    ylab("Precipitation (mm)") +
    theme(axis.text.x = element_text(angle = 45, size = 6, vjust = -0.001))


From this plot we can clearly see that over the past 12 years, Taipei received far more rain in total than Seattle. Even though the numbers of rainy days are pretty close, the total precipitation in Taipei is much larger. Next, I want to check the average precipitation by month.
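
To put a number on "far more", here is a small sketch that computes the Taipei-to-Seattle ratio for the six years printed by head() above. It uses the rounded totals as displayed, so the ratios are approximate, and it covers only 2008 to 2013, not the full period:

```r
# Yearly totals in mm, copied from the head() output (rounded as printed)
yearly <- data.frame(
  Year    = 2008:2013,
  Taipei  = c(2969, 1669, 2278, 1759, 2910, 2541),
  Seattle = c(782, 977, 1195, 925, 1226, 828)
)

# How many times more rain Taipei received than Seattle in each year
yearly$Ratio <- round(yearly$Taipei / yearly$Seattle, 1)
yearly
```

In these six years Taipei received roughly 1.7 to 3.8 times Seattle's annual total.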

Average precipitation by Month

# Step 1: total precipitation for each month of each year
# Step 2: average those monthly totals across the 12 years
df_by_Month <- df %>%
  group_by(Month = month(Date), Year = year(Date)) %>%
  summarise(Taipei = sum(Taipei_PRCP),
            Seattle = sum(Seattle_PRCP)) %>%
  group_by(Month) %>%
  summarise(Taipei = mean(Taipei),
            Seattle = mean(Seattle))
df_by_Month$Month <- factor(df_by_Month$Month, labels = month.abb)
head(df_by_Month)
## # A tibble: 6 x 3
##   Month Taipei Seattle
##   <fct>  <dbl>   <dbl>
## 1 Jan     97.5   133. 
## 2 Feb    127.    101. 
## 3 Mar    138.    124. 
## 4 Apr    135.     90.3
## 5 May    246.     48.0
## 6 Jun    369.     33.5
df_by_Month_melt <- melt(df_by_Month, id.vars = "Month")
names(df_by_Month_melt) <- c("Month", "City", "Rainfall")
ggplot(data = df_by_Month_melt, aes(x = Month, y = Rainfall, fill = City))+ 
    geom_bar(stat = "identity", position = "dodge") +
    ggtitle("Average Amount of Rainfall by Month") +
    ylab("Precipitation (mm)")


When we look at the average monthly precipitation, we can see that the rain is distributed very differently over the year. In Seattle, it rains more in the winter than in the summer. In Taipei, on the other hand, it rains a lot more in the summer than in the winter. The high volume of rain Taipei receives in the summer is due largely to the typhoons that strike Taiwan nearly every summer. In this 12-year period, the Central Weather Bureau issued typhoon warnings for a total of 59 typhoons, not counting those that passed without a warning. That is 4.9 typhoons per year on average (59 / 12)!

According to the Köppen climate classification, Taipei is in the humid subtropical climate zone, where it rains a lot in the summer. In contrast, Seattle is in the oceanic climate zone, where it rains more often in the winter.
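
The monthly averages above come from a two-step aggregation: first sum the daily values within each month of each year, then average those monthly totals across years. A minimal base R sketch of the same idea on made-up data (the dates and values are hypothetical, and aggregate stands in for the dplyr pipeline):

```r
# Toy daily data: two years, January and February only (made-up values)
toy <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-02", "2018-02-01",
                   "2019-01-01", "2019-01-02", "2019-02-01")),
  PRCP = c(10, 20, 5, 30, 0, 15)
)
toy$Year  <- as.integer(format(toy$Date, "%Y"))
toy$Month <- as.integer(format(toy$Date, "%m"))

# Step 1: total rainfall for each month of each year
monthly_totals <- aggregate(PRCP ~ Year + Month, data = toy, FUN = sum)

# Step 2: average the monthly totals across years
monthly_mean <- aggregate(PRCP ~ Month, data = monthly_totals, FUN = mean)
monthly_mean
```

Taking the mean of the daily values directly would give the average rainfall per day, not the average rainfall per month, which is why the sum comes first.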

Rainfall by day

# Days on which Taipei recorded at least 200 mm, plus a y position for each label
df_200 <- df %>%
  filter(Taipei_PRCP >= 200) %>%
  mutate(label_y = Taipei_PRCP + ifelse(row_number() == 6, 8, 15))  # the 6th label gets a smaller offset (8 instead of 15)
df_melt <- melt(df, id.vars = "Date")
names(df_melt) <- c("Date", "City", "PRCP")
ggplot(data = df_melt, aes(x = Date, y = PRCP, color = City)) +
  geom_point(alpha = 0.6) +
  # A single text layer labels every 200 mm day with its date
  geom_text(data = df_200, aes(x = Date, y = label_y, label = format(Date)),
            inherit.aes = FALSE, size = 3)


We can see from this plot that daily rainfall in Taipei varies enormously. There are 7 days that received at least 200 mm of rain in a single day! (200 mm in 24 hours is the Central Weather Bureau's definition of heavy rain.) I tried to figure out what happened on those days; the table below lists the cause of each.

Date        Cause
2008-09-13  Typhoon Sinlaku
2008-09-14  Typhoon Sinlaku
2012-06-12  Typhoon Guchol
2013-08-21  Typhoon Trami
2014-05-21  Plum Rain season
2015-08-08  Typhoon Soudelor
2015-09-28  Typhoon Dujuan

Conclusion

After digging into the data, it is safe to say that I was right: it did rain more in Taipei than in Seattle, if we are talking about total precipitation. On the other hand, there is no significant difference in the number of rainy days between Seattle and Taipei. Regardless, one thing we can all agree on is that there are way too many rainy days in both Seattle and Taipei, and we should definitely enjoy the clear sky when we have the chance.