One thing I heard the most about Seattle when I first moved here last year is that it rains A LOT. I was not worried, since I am from Taipei, a city where it also rains quite often. After living in Seattle for a year, I feel that it actually rained more in Taipei than in Seattle. To prove that I am right, I decided to use my well-earned degree in Business Analytics to verify my guess. To do this, I need rainfall data for both Seattle and Taipei. The Seattle precipitation data can be found on the National Oceanic and Atmospheric Administration (NOAA) website, and the Taipei precipitation data was scraped from the website of the Central Weather Bureau of Taiwan. I will use daily precipitation data from 2008 to 2019 to test which city gets more rain: Seattle or Taipei?
library(tidyverse)
library(lubridate)
library(reshape2)
Taipei <- read_csv("Taipei_Rainfall.csv")
Seattle <- read_csv("Seattle_Rainfall.csv")
First, a “T” in the Precipitation column stands for a trace amount: there was rain, but less than 0.1 mm. I will treat it as no rain and set it to 0. The first column is a redundant index, so I drop it. Last, I restrict the data to the date range 2008-01-01 through 2019-12-31.
Taipei_final <- Taipei[,2:3]
# Change all "T"s into 0s
Taipei_final$Precipitation <- gsub("T", "0", Taipei_final$Precipitation)
# Make the Precipitation column numeric
Taipei_final$Precipitation <- as.numeric(Taipei_final$Precipitation)
# Keep data from 2008-01-01 onward
Taipei_final <- Taipei_final[Taipei_final$Date >= ymd("2008-01-01"),]
# Change column names
names(Taipei_final) <- c("Date", "Taipei_PRCP")
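The trace-value step above can be illustrated on a small vector. This is only a sketch: the readings below are made up, and it uses an exact match on `"T"` instead of `gsub()`, which would also rewrite a `T` appearing inside any other string. Under the assumption that the raw column holds either numbers or a bare `"T"`, both approaches give the same result.

```r
# Toy precipitation readings as they might appear in the raw file (made-up values)
raw_prcp <- c("0.0", "T", "12.5", "T", "3.0")

# Replace only the cells that are exactly "T"; every other string is left untouched
clean_prcp <- ifelse(raw_prcp == "T", "0", raw_prcp)

# Convert to numeric; anything that still fails to parse becomes NA (with a warning)
clean_prcp <- as.numeric(clean_prcp)

print(clean_prcp)

# Count the values that failed to parse, as a quick sanity check
n_failed <- sum(is.na(clean_prcp))
print(n_failed)
```

Checking the number of `NA` values after `as.numeric()` is a cheap way to catch codes other than `"T"` that the raw data might contain.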
For the Seattle data, keep only the DATE and PRCP columns.
Seattle_Final <- Seattle[, 6:7] # Keep DATE and PRCP
Seattle_Final <- Seattle_Final[Seattle_Final$DATE <= ymd("2019-12-31"),] # Keep data through 2019-12-31
names(Seattle_Final) <- c("Date", "Seattle_PRCP") # Change column names
df <- left_join(Taipei_final, Seattle_Final, by = "Date")
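Because `left_join()` keeps every Taipei row, any date missing from the Seattle table shows up as `NA` in `Seattle_PRCP`, and an `NA` would propagate through the sums used later. A minimal sketch of a sanity check, using two made-up toy tables (`taipei_toy` and `seattle_toy` are hypothetical names, not part of the analysis):

```r
library(dplyr)

# Toy stand-ins for the two cleaned tables (values are made up)
taipei_toy <- data.frame(
  Date = as.Date(c("2008-01-01", "2008-01-02", "2008-01-03")),
  Taipei_PRCP = c(0, 4.5, 12)
)
seattle_toy <- data.frame(
  Date = as.Date(c("2008-01-01", "2008-01-03")),  # 2008-01-02 is deliberately missing
  Seattle_PRCP = c(2.3, 0)
)

joined_toy <- left_join(taipei_toy, seattle_toy, by = "Date")

# Rows with no Seattle match carry NA in Seattle_PRCP
n_missing <- sum(is.na(joined_toy$Seattle_PRCP))
print(n_missing)

# Number of calendar days in the full study period, for comparison with nrow(df)
expected_days <- length(seq(as.Date("2008-01-01"), as.Date("2019-12-31"), by = "day"))
print(expected_days)
```

On the real data, the same two checks would be `sum(is.na(df$Seattle_PRCP))` and a comparison of `nrow(df)` with `expected_days`.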
Now that we have clean data, we can move on to see which city receives more rain. There are two ways to look at this:

1. Which city has more rainy days?
2. Which city has higher average precipitation?

I define a rainy day as a day with more than 0 mm of precipitation recorded.
Taipei_per <- sum(df$Taipei_PRCP != 0) / nrow(df) # Percentage of rainy days in Taipei
Seattle_per <- sum(df$Seattle_PRCP != 0) / nrow(df) # Percentage of rainy days in Seattle
print(paste0("Average rainy days in Taipei in a year is: ",round(Taipei_per * 365), " days."))
## [1] "Average rainy days in Taipei in a year is: 164 days."
print(paste0("Average rainy days in Seattle in a year is: ",round(Seattle_per * 365), " days."))
## [1] "Average rainy days in Seattle in a year is: 160 days."
Over the past 12 years, the percentage of rainy days has been very similar in Seattle and Taipei.
We can also use a t-test to see whether the percentages of rainy days are significantly different.
df_t <- df
df_t$Taipei_PRCP <- ifelse(df_t$Taipei_PRCP == 0, 0, 1)
df_t$Seattle_PRCP <- ifelse(df_t$Seattle_PRCP == 0, 0, 1)
t.test(df_t$Taipei_PRCP, df_t$Seattle_PRCP)
##
## Welch Two Sample t-test
##
## data: df_t$Taipei_PRCP and df_t$Seattle_PRCP
## t = 1.0105, df = 8763.9, p-value = 0.3123
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.01007899 0.03152549
## sample estimates:
## mean of x mean of y
## 0.4485512 0.4378280
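A t-test on 0/1 indicators is essentially a comparison of two proportions, so a two-proportion test is a natural cross-check. The sketch below is not part of the original analysis: it back-calculates the rainy-day counts from the sample means reported above, assuming `df` has 4383 rows (one per day from 2008-01-01 to 2019-12-31). Like the t-test above, it treats the two cities as independent samples, even though the observations share the same dates.

```r
# Assumed number of days in df (2008-01-01 through 2019-12-31)
n_days <- 4383

# Rainy-day counts recovered from the sample means printed by t.test()
taipei_rainy  <- round(0.4485512 * n_days)
seattle_rainy <- round(0.4378280 * n_days)

# Two-sample test for equality of proportions (continuity correction is on by default)
res <- prop.test(x = c(taipei_rainy, seattle_rainy), n = c(n_days, n_days))

print(res$estimate)  # the two sample proportions
print(res$p.value)   # p-value for the difference in proportions
```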
It turns out that the percentages of rainy days in Seattle and Taipei are not significantly different, given that the p-value is greater than the commonly used 0.05 threshold. Next, I want to check the total annual precipitation of the two cities.
# Group data by year
df_by_Year <- df %>%
  group_by(year(Date)) %>%
  summarise(Taipei = sum(Taipei_PRCP),
            Seattle = sum(Seattle_PRCP))
names(df_by_Year) <- c("Year", "Taipei", "Seattle")
df_by_Year$Year <- factor(df_by_Year$Year)
head(df_by_Year)
## # A tibble: 6 x 3
## Year Taipei Seattle
## <fct> <dbl> <dbl>
## 1 2008 2969. 782.
## 2 2009 1669. 977
## 3 2010 2278. 1195.
## 4 2011 1759. 925.
## 5 2012 2910. 1226
## 6 2013 2541. 828
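The next step reshapes this table to long format with `reshape2::melt()` so that ggplot2 can map the city to the fill colour. As an alternative sketch, `tidyr::pivot_longer()` (part of the tidyverse already loaded) does the same reshaping. The data frame below retypes the six rows printed above, so the values are rounded, and `yearly` and `Ratio` are names introduced only for this illustration.

```r
library(tidyr)

# First six rows of df_by_Year as printed above (rounded values)
yearly <- data.frame(
  Year    = factor(2008:2013),
  Taipei  = c(2969, 1669, 2278, 1759, 2910, 2541),
  Seattle = c(782, 977, 1195, 925, 1226, 828)
)

# Wide -> long: one row per (Year, City) pair
yearly_long <- pivot_longer(
  yearly,
  cols = c(Taipei, Seattle),
  names_to = "City",
  values_to = "Rainfall"
)
print(yearly_long)

# How many times more rain Taipei received than Seattle in each year
yearly$Ratio <- round(yearly$Taipei / yearly$Seattle, 2)
print(yearly[, c("Year", "Ratio")])
```

One difference worth knowing: `pivot_longer()` returns `City` as a character column, while `melt()` returns a factor, so the legend order in a plot may differ unless the column is converted with `factor()`.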
df_by_Year_melt <- melt(df_by_Year, id.vars = "Year")
names(df_by_Year_melt) <- c("Year", "City", "Rainfall")
ggplot(data = df_by_Year_melt, aes(x = Year, y = Rainfall, fill = City)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Total Amount of Rainfall by Year") +
  ylab("Precipitation (mm)") +
  theme(axis.text.x = element_text(angle = 45, size = 6, vjust = -0.001))
From this plot we can clearly see that in each of the past 12 years, Taipei received far more total rainfall than Seattle. Even though the numbers of rainy days are close, the total precipitation in Taipei is much larger. Next, I want to check the average precipitation by month.
# Group data by month
df_by_Month <- df %>%
  group_by(month(Date), year(Date)) %>%
  summarise(Taipei = sum(Taipei_PRCP),
            Seattle = sum(Seattle_PRCP))
names(df_by_Month) <- c("Month", "Year", "Taipei", "Seattle")
df_by_Month <- df_by_Month %>%
  group_by(Month) %>%
  summarise(Taipei = mean(Taipei),
            Seattle = mean(Seattle))
df_by_Month$Month <- factor(df_by_Month$Month, labels = month.abb)
head(df_by_Month)
## # A tibble: 6 x 3
## Month Taipei Seattle
## <fct> <dbl> <dbl>
## 1 Jan 97.5 133.
## 2 Feb 127. 101.
## 3 Mar 138. 124.
## 4 Apr 135. 90.3
## 5 May 246. 48.0
## 6 Jun 369. 33.5
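The monthly figures above come from a two-step aggregation: first the total for each month of each year, then the mean of those totals across years. Averaging the daily values directly would instead give the mean rainfall per day, which is a different quantity. A minimal sketch on made-up numbers (the `toy` data frame and its values are hypothetical, and base `format()` stands in for the lubridate helpers):

```r
library(dplyr)

# Made-up daily readings covering two months in two years
toy <- data.frame(
  Date = as.Date(c("2008-01-05", "2008-01-20", "2009-01-10",
                   "2008-02-03", "2009-02-14", "2009-02-15")),
  PRCP = c(10, 30, 20, 5, 15, 45)
)

monthly_avg <- toy %>%
  mutate(Year  = as.integer(format(Date, "%Y")),
         Month = as.integer(format(Date, "%m"))) %>%
  group_by(Month, Year) %>%
  summarise(Total = sum(PRCP), .groups = "drop") %>%   # step 1: total per month and year
  group_by(Month) %>%
  summarise(Average = mean(Total), .groups = "drop")   # step 2: mean of the monthly totals

print(monthly_avg)

# For contrast: the plain mean of the daily January values
daily_mean_jan <- mean(toy$PRCP[format(toy$Date, "%m") == "01"])
print(daily_mean_jan)
```

Here January has totals of 40 and 20 in the two years, so its average monthly total is 30, while the mean of the three daily January readings is 20.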
df_by_Month_melt <- melt(df_by_Month, id.vars = "Month")
names(df_by_Month_melt) <- c("Month", "City", "Rainfall")
ggplot(data = df_by_Month_melt, aes(x = Month, y = Rainfall, fill = City)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Average Amount of Rainfall by Month") +
  ylab("Precipitation (mm)")
When we look at the average monthly precipitation, we can see that the two cities distribute their rainfall very differently across the year. In Seattle, it rains more in the winter than in the summer. In Taipei, on the other hand, it rains a lot more in the summer than in the winter. The high volume of rain Taipei receives in the summer is due to the typhoons that strike Taiwan nearly every summer. In this 12-year period, the Central Weather Bureau issued typhoon warnings for a total of 59 typhoons, not counting the typhoons that did not trigger a warning. That is 4.9 typhoons per year on average!
According to the Köppen climate classification, Taipei is in the humid subtropical climate zone, where it rains a lot in the summer. In contrast, Seattle is in the oceanic climate zone, where it rains more often in the winter.
df_200 <- df %>%
  filter(Taipei_PRCP >= 200)
df_melt <- melt(df, id.vars = "Date")
names(df_melt) <- c("Date", "City", "PRCP")
ggplot(data = df_melt, aes(x = Date, y = PRCP, color = City)) +
  geom_point(alpha = 0.6) +
  geom_text(x = df_200$Date[1], y = df_200$Taipei_PRCP[1] + 15, label = df_200$Date[1], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[2], y = df_200$Taipei_PRCP[2] + 15, label = df_200$Date[2], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[3], y = df_200$Taipei_PRCP[3] + 15, label = df_200$Date[3], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[4], y = df_200$Taipei_PRCP[4] + 15, label = df_200$Date[4], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[5], y = df_200$Taipei_PRCP[5] + 15, label = df_200$Date[5], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[6], y = df_200$Taipei_PRCP[6] + 8, label = df_200$Date[6], inherit.aes = FALSE, size = 3) +
  geom_text(x = df_200$Date[7], y = df_200$Taipei_PRCP[7] + 15, label = df_200$Date[7], inherit.aes = FALSE, size = 3)
We can see from this plot that the variation in daily rainfall in Taipei is huge. There are 7 days on which more than 200 mm of rain fell in a single day! (200 mm in 24 hours is the threshold the Central Weather Bureau uses for its "extremely heavy rain" category.) I tried to figure out what happened on those days, and the table below shows the cause of each one.
| Date | Cause |
|---|---|
| 2008-09-13 | Typhoon Sinlaku |
| 2008-09-14 | Typhoon Sinlaku |
| 2012-06-12 | Typhoon Guchol |
| 2013-08-21 | Typhoon Trami |
| 2014-05-21 | Plum Rain season |
| 2015-08-08 | Typhoon Soudelor |
| 2015-09-28 | Typhoon Dujuan |
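The seven date labels in the scatter plot were added with one `geom_text()` call per point. As a sketch of a more compact alternative, a single `geom_text()` layer can draw all of them from a small data frame. The dates below are the ones from the table, but the rainfall values are placeholders rather than the measured amounts, and `heavy` is a name introduced only for this illustration.

```r
library(ggplot2)

# Dates from the table above; PRCP values are placeholders, NOT the measured rainfall
heavy <- data.frame(
  Date = as.Date(c("2008-09-13", "2008-09-14", "2012-06-12", "2013-08-21",
                   "2014-05-21", "2015-08-08", "2015-09-28")),
  PRCP = c(210, 220, 230, 240, 250, 260, 270)
)

p <- ggplot(heavy, aes(x = Date, y = PRCP)) +
  geom_point(alpha = 0.6) +
  # One layer labels every row; nudge_y lifts each label 15 units above its point
  geom_text(aes(label = format(Date)), nudge_y = 15, size = 3)

print(p)
```

In the real plot, the same layer could be added as `geom_text(data = df_200, aes(x = Date, y = Taipei_PRCP, label = format(Date)), nudge_y = 15, size = 3, inherit.aes = FALSE)`, at the cost of losing the smaller offset the original uses for the sixth label.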
After digging into the data, I think it is safe to say that I was right: it did rain more in Taipei than in Seattle, if we are talking about total precipitation. On the other hand, there is no significant difference in the number of rainy days between Seattle and Taipei. Regardless, one thing we can all agree on is that there are way too many rainy days in both Seattle and Taipei, and we should definitely enjoy the clear sky when we have the chance.