The weather in the Pacific Northwest is notoriusly rainy and gloomy, but the summers are gloriously full of perfect weather, or at least that’s what we believe. How many days are actually perfect?

To answer this question, we must first think about what a perfect day means. The temperature should be warm, but not hot and the sun should be shining most if not all of the day. So we can define a perfect day as being between a certain range of temperatures, say between 60 degrees F and 80 degrees F, and little to no precipitation. When defining a perfect day, the precipitation can’t be assumed to be zero because we will often have a small sprinkle over night, but then beautiful blue-skyed days.

We will start our data collection at NOAA and the EPA.

The NOAA website has available a weather dataset measured at Seattle-Tacoma Airport (SeaTac) spanning all the way back to 1948 containing 50 variables including both max and min temperature and precipitation data and is available for download here: Seattle Daily

The EPA has a pre-generated air quality dataset which contains the air quality index by day for every county in the US and spanning from 01.01.1980 to 11.27.2018: AQI

Data

Let’s start by loading the datasets and taking a look at their structure.

Load the Seattle weather data:

#read in seattle weather
SeattleWeather <- readr::read_csv("SeattleWeather_1948_2017.csv")
#take a look at the data
kable(head(SeattleWeather))
DATE PRCP TMAX TMIN RAIN
1948-01-01 0.47 51 42 TRUE
1948-01-02 0.59 45 36 TRUE
1948-01-03 0.42 45 35 TRUE
1948-01-04 0.31 45 34 TRUE
1948-01-05 0.17 45 32 TRUE
1948-01-06 0.44 48 39 TRUE
kable(tail(SeattleWeather))
DATE PRCP TMAX TMIN RAIN
2017-12-09 0 44 29 FALSE
2017-12-10 0 49 34 FALSE
2017-12-11 0 49 29 FALSE
2017-12-12 0 46 32 FALSE
2017-12-13 0 48 34 FALSE
2017-12-14 0 50 36 FALSE
#look at the structure of the data
str(SeattleWeather)
## Classes 'tbl_df', 'tbl' and 'data.frame':    25551 obs. of  5 variables:
##  $ DATE: Date, format: "1948-01-01" "1948-01-02" ...
##  $ PRCP: num  0.47 0.59 0.42 0.31 0.17 0.44 0.41 0.04 0.12 0.74 ...
##  $ TMAX: int  51 45 45 45 45 48 50 48 50 43 ...
##  $ TMIN: int  42 36 35 34 32 39 40 35 31 34 ...
##  $ RAIN: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 5
##   .. ..$ DATE:List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_date" "collector"
##   .. ..$ PRCP: list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ TMAX: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ TMIN: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ RAIN: list()
##   .. .. ..- attr(*, "class")= chr  "collector_logical" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

Seattle weather is a dataframe with 25551 observations of 5 variables which are DATE, PRCP, TMAX, TMIN, RAIN. Each observation represents a single day’s meausurements.

TMAX and TMIN are integers representing the maximum and minimum temperatures recorded for that day. RAIN is a logical vector which is true if there is 0.01 inches of rain and false if there is 0.0 inches precipitation whereas PRCP is a numeric representing the amount of daily precipitation measured in inches. DATE is already a date factor so it won’t need to be transformed.

Next we must consider air quality as the recent wildfires on the west coast have greatly impacted everyone’s ability to safely go outside and a good air quality rating should be factored into our definition of a perfect day.

We load each year of AQI data from 1980 to present, filtering for only Seattle, King County, Washington, and then create a single data frame for all the years of data.

#load the code from the EPA and create a table from all the years
#after data was read in and stored in a .csv, code was removed to allow for faster knitting
#data_list <- list()
#for(year in 1980:2018) {
#    temp <- tempfile()
#    base_file_name <- paste0("daily_aqi_by_county_",year)
#    url_name <- paste0("https://aqs.epa.gov/aqsweb/airdata/",base_file_name,".zip")
#    csv_name <- paste0(base_file_name,".csv")
#    download.file(url_name,temp)
#    data <- read_csv(unz(temp, csv_name)) %>%
#        filter(`State Name` == "Washington" & `county Name` == "King")
#    data_list[[year]] <- data
#    unlink(temp)
#}
#AQIdata <- do.call("rbind", data_list)
#load the .csv created from reading in the downloaded AQI year data
AQIdata <- readr::read_csv("AQI_KING.csv")
#take a look at the data
kable(head(AQIdata))
State Name county Name State Code County Code Date AQI Category Defining Parameter Defining Site Number of Sites Reporting
Washington King 53 033 1980-01-01 32 Good CO 53-033-0076 11
Washington King 53 033 1980-01-02 72 Moderate CO 53-033-0077 14
Washington King 53 033 1980-01-03 82 Moderate CO 53-033-0062 14
Washington King 53 033 1980-01-04 128 Unhealthy for Sensitive Groups CO 53-033-0077 14
Washington King 53 033 1980-01-05 89 Moderate CO 53-033-0077 14
Washington King 53 033 1980-01-06 61 Moderate CO 53-033-1003 14
kable(tail(AQIdata))
State Name county Name State Code County Code Date AQI Category Defining Parameter Defining Site Number of Sites Reporting
Washington King 53 033 2018-07-27 64 Moderate Ozone 53-033-0023 10
Washington King 53 033 2018-07-28 67 Moderate Ozone 53-033-0023 10
Washington King 53 033 2018-07-29 147 Unhealthy for Sensitive Groups Ozone 53-033-0023 10
Washington King 53 033 2018-07-30 136 Unhealthy for Sensitive Groups Ozone 53-033-0017 10
Washington King 53 033 2018-07-31 44 Good Ozone 53-033-0017 10
Washington King 53 033 2018-08-01 5 Good CO 53-033-0030 2
#look at the structure of the data
str(AQIdata)
## Classes 'tbl_df', 'tbl' and 'data.frame':    14090 obs. of  10 variables:
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ Date                     : Date, format: "1980-01-01" "1980-01-02" ...
##  $ AQI                      : int  32 72 82 128 89 61 103 51 94 63 ...
##  $ Category                 : chr  "Good" "Moderate" "Moderate" "Unhealthy for Sensitive Groups" ...
##  $ Defining Parameter       : chr  "CO" "CO" "CO" "CO" ...
##  $ Defining Site            : chr  "53-033-0076" "53-033-0077" "53-033-0062" "53-033-0077" ...
##  $ Number of Sites Reporting: int  11 14 14 14 14 14 13 14 14 10 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 10
##   .. ..$ State Name               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ county Name              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ State Code               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ County Code              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Date                     :List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_date" "collector"
##   .. ..$ AQI                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Category                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Defining Parameter       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Defining Site            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Number of Sites Reporting: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

The AQI or air quality index data set consists of 14090 observations of 10 variables including State Name, county Name, State Code, County Code, Date, AQI, Category, Defining Parameter, Defining Site, Number of Sites Reporting. Each observation represents a single day’s meausurements. The Date and Category variables are of interest in determining which days had acceptable air quality in Seattle. Once again the Date was read in as a date factor and category is a character.

So now we have 2 dataframes with temperature, precipitation and air quality included. The next step is to combine these dataframes into one table.

Since our AQI dataset only contains data from 1980, we will only have concurrent dates beginning from 01.01.1980 and going to 12.14.2017 which is when the Seattle Weather Data Ends.

### inner join on the dates available
AllWeather <- inner_join(SeattleWeather, AQIdata, by = c("DATE" = "Date"))

#look at how the new table is structured
str(AllWeather)
## Classes 'tbl_df', 'tbl' and 'data.frame':    13860 obs. of  14 variables:
##  $ DATE                     : Date, format: "1980-01-01" "1980-01-02" ...
##  $ PRCP                     : num  0.22 0.19 0 0.13 0.12 0 0.19 0.37 0.15 0 ...
##  $ TMAX                     : int  52 46 46 40 39 35 30 29 34 38 ...
##  $ TMIN                     : int  44 37 33 34 30 26 24 26 26 28 ...
##  $ RAIN                     : logi  TRUE TRUE FALSE TRUE TRUE FALSE ...
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ AQI                      : int  32 72 82 128 89 61 103 51 94 63 ...
##  $ Category                 : chr  "Good" "Moderate" "Moderate" "Unhealthy for Sensitive Groups" ...
##  $ Defining Parameter       : chr  "CO" "CO" "CO" "CO" ...
##  $ Defining Site            : chr  "53-033-0076" "53-033-0077" "53-033-0062" "53-033-0077" ...
##  $ Number of Sites Reporting: int  11 14 14 14 14 14 13 14 14 10 ...
#look at the sumamry statistics of air quality
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(AQI))
variable type stat level value formatted
AQI integer missing .all 0.00000 0
AQI integer complete .all 13860.00000 13860
AQI integer n .all 13860.00000 13860
AQI integer mean .all 56.53550 56.54
AQI integer sd .all 26.53875 26.54
AQI integer p0 .all 7.00000 7
AQI integer p25 .all 39.00000 39
AQI integer p50 .all 49.00000 49
AQI integer p75 .all 66.00000 66
AQI integer p100 .all 220.00000 220
AQI integer hist .all NA ▂▇▂▁▁▁▁▁
#look at the sumamry statistics of max temperature
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(TMAX))
variable type stat level value formatted
TMAX integer missing .all 0.00000 0
TMAX integer complete .all 13860.00000 13860
TMAX integer n .all 13860.00000 13860
TMAX integer mean .all 60.12345 60.12
TMAX integer sd .all 12.67805 12.68
TMAX integer p0 .all 18.00000 18
TMAX integer p25 .all 50.00000 50
TMAX integer p50 .all 59.00000 59
TMAX integer p75 .all 69.00000 69
TMAX integer p100 .all 103.00000 103
TMAX integer hist .all NA ▁▁▅▇▆▃▂▁
#look at the sumamry statistics of min temperature
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(TMIN))
variable type stat level value formatted
TMIN integer missing .all 0.000000 0
TMIN integer complete .all 13860.000000 13860
TMIN integer n .all 13860.000000 13860
TMIN integer mean .all 45.243362 45.24
TMIN integer sd .all 8.771334 8.77
TMIN integer p0 .all 7.000000 7
TMIN integer p25 .all 39.000000 39
TMIN integer p50 .all 45.000000 45
TMIN integer p75 .all 52.000000 52
TMIN integer p100 .all 71.000000 71
TMIN integer hist .all NA ▁▁▂▆▇▇▃▁
#look at the sumamry statistics of precipitation
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(PRCP))
variable type stat level value formatted
PRCP numeric missing .all 3.000000e+00 3
PRCP numeric complete .all 1.385700e+04 13857
PRCP numeric n .all 1.386000e+04 13860
PRCP numeric mean .all 1.051519e-01 0.11
PRCP numeric sd .all 2.436655e-01 0.24
PRCP numeric p0 .all 0.000000e+00 0
PRCP numeric p25 .all 0.000000e+00 0
PRCP numeric p50 .all 0.000000e+00 0
PRCP numeric p75 .all 1.000000e-01 0.1
PRCP numeric p100 .all 5.020000e+00 5.02
PRCP numeric hist .all NA ▇▁▁▁▁▁▁▁
#how many NA's are in the new table
sum(is.na(AllWeather))
## [1] 6

From the numberic summary statistics, we can see that AQI, TMAX, TMIN, and PCRP have little or no missing data, reasonable mins and maxs, and the histograms show that the data is normally distributed.

Visualizing the Data

Now that we have our two datasets joined together, let’s take a look at the max and min temp ranges for each year with air quality.

#plot tmin and tmax against date - color green for good days and red for bad
AllWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, ymin=TMIN, ymax=TMAX, colour=AQI)) +
    geom_linerange(size=0.5) +
    scale_colour_gradient2(low = "green", mid="yellow", high = "red", midpoint = 100) +
    # 125 is the midpoint of the "Unhealthy for Sensitive Groups" level
    # 100 is the transition point from "Moderate" to "Unhealthy for Sensitive Groups"
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Temp (F, min to max)") +
    ggtitle("Maximum and Minimum Temperature with Air Quality from 1980 to 2017") +
    theme_minimal()

From the plots we can see that air quality tends to be an issue in the summer time. The temperature range is pretty consistent throughout the year.

And now the precipitation by day for each year.

#plot precipation by year
AllWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, y=PRCP)) +
    geom_bar(stat="identity") +
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Precipitation (inches)") +
    coord_cartesian(ylim = c(0, 1)) +
    scale_y_continuous(breaks=c(0,0.5,1)) +
    ggtitle("Precipitation from 1980 to 2017") +
    theme_classic()

There are definite gaps in the precipitation each year which makes sense as the summers tend to be very dry. It is interesting to see how the dry weather gaps do not occur consistently at the same time of year.

Perfect Weather

So, now that we have visualized the weather and air quality in Seattle, we can begin to define what perfect weather is. We can select a “perfect” temperature range, let’s say only the days with TMIN and TMAX between 60 and 80 degrees F, and also days with no rain. We can assume that given this temperature range, the precipitation that falls will be rain rather than snow. One approach to selecting would be to filter the dataset for RAIN being FALSE, but this does not allow for any precipitation at all. Instead,let’s consider the PCRP variable which is inches of rain measured in a day. We can filter for precipitation less than a certain amount of rain in inches to account for the common occurence of rain over night and this approach will allow for sensitivity testing of edge perfect days. The data may be sensitive close to the ranges that we select, so we will want to examine the data near our range to see if there is a great difference in perfect days.

Sensitivity Testing for AQI and Temperature Range

#let's first look at the number of days in the 60-80 degree range
SeattleWeather_TempRange6080 <- AllWeather %>%
    filter(TMAX <= 80 & TMIN >= 60)
count(SeattleWeather_TempRange6080)
## # A tibble: 1 x 1
##       n
##   <int>
## 1   123
#now let's widen the range to  55-85 degree range
SeattleWeather_TempRange5585 <- AllWeather %>%
    filter(TMAX <= 85 & TMIN >= 55)
count(SeattleWeather_TempRange5585)
## # A tibble: 1 x 1
##       n
##   <int>
## 1  1935
Temp Range (F) Days within Temp Range Percentage
60 - 80 123 0.89%
55 - 85 1935 13.96%

There’s a vast difference in the percentage of days between the temp range of 60 and 80 which is 0.89%, and between 55 and 85 which is 13.96%. We will go with the later because some people might be a little less sensitive to the temperature range and we can consider more possible perfect days.

Filtering for precipitation range:

#total number of days in the data set
nrow(AllWeather)
## [1] 13860
###how many days with rain less than 0.01
SeattleWeather_noRain0.01 <- AllWeather %>%
    filter(PRCP <= 0.01)
count(SeattleWeather_noRain0.01)
## # A tibble: 1 x 1
##       n
##   <int>
## 1  8553
###how many days with rain less than 0.01
SeattleWeather_noRain0.03 <- AllWeather %>%
    filter(PRCP <= 0.03)
count(SeattleWeather_noRain0.03)
## # A tibble: 1 x 1
##       n
##   <int>
## 1  9175
###how many days with rain less than 0.05
SeattleWeather_noRain0.05 <- AllWeather %>%
    filter(PRCP <= 0.05)
count(SeattleWeather_noRain0.05)
## # A tibble: 1 x 1
##       n
##   <int>
## 1  9652
###or days with rain less than 0.1
SeattleWeather_noRain0.1 <- AllWeather %>%
    filter(PRCP <= 0.1)
count(SeattleWeather_noRain0.1)
## # A tibble: 1 x 1
##       n
##   <int>
## 1 10495
Rain Amount(In) Days with Rain Percentage
0.01 8553 61.71%
0.03 9175 66.2%
0.05 9652 69.64%
0.10 10495 75.72%

The percentage of days with less than or equal to 0.01 inches of precipitation is 61.71%, 0.05 inches of precipitation is 69.64% while 0.1 is 75.72%. There is much less sensitivity to days within the precipitation range, so we will select 0.05 inches.

Next, we need to choose only the dates where the air quality is listed as Moderate or Good. We can use the Date and Category columns to filter for the moderate to good air quality days.

# Create a new feature on AllWeather if the day is "perfect" or not
AllWeather <- AllWeather %>%
    mutate(PerfectDay = (TMAX <= 85 & TMIN >= 55 & PRCP <= 0.05 & AQI <= 100))
#filter the complete weather data set 
PerfectWeather <- AllWeather %>% filter(PerfectDay)

str(PerfectWeather)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1534 obs. of  15 variables:
##  $ DATE                     : Date, format: "1980-07-10" "1980-07-14" ...
##  $ PRCP                     : num  0 0.03 0 0.01 0 0 0 0 0 0 ...
##  $ TMAX                     : int  65 68 74 68 68 78 85 76 69 78 ...
##  $ TMIN                     : int  55 55 55 56 57 60 60 59 57 56 ...
##  $ RAIN                     : logi  FALSE TRUE FALSE TRUE FALSE FALSE ...
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ AQI                      : int  56 64 81 64 94 94 75 38 69 56 ...
##  $ Category                 : chr  "Moderate" "Moderate" "Moderate" "Moderate" ...
##  $ Defining Parameter       : chr  "SO2" "CO" "SO2" "CO" ...
##  $ Defining Site            : chr  "53-033-0057" "53-033-0062" "53-033-0057" "53-033-0051" ...
##  $ Number of Sites Reporting: int  11 9 10 9 12 12 10 11 11 11 ...
##  $ PerfectDay               : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

Now we have one table containing 1534 days where the weather falls in our defined temperature range with little to no rain and moderate or good air quality.

Let’s take a look at the distribution of the perfect weather days.

#plot tmin and tmax against date - color green for good days and red for bad
PerfectWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, ymin=TMIN, ymax=TMAX)) +
    geom_linerange(size=0.5) +
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Temp (F, min to max)") + 
    ggtitle("Distribution of Perfect Weather Days from 1980 to 2017") +
    theme_minimal()

We see right away that 1980 and 1981 have fewer perfect days. Everyone in the Pacific Northwest recalls the events of May 18, 1980 when Mt. St. Helens erupted causing extremely poor air quality and possibly affecting the weather of those summers.

Statistical Analysis- Hypothesis Test

Our null hypothesis is that the number of perfect days per year is not changing over time. We can also write this out in an equation form:

H0 = there is no difference in the number of perfect days over time. H1 = there is a difference in the number of perfect days over time.

We can use a linear model to evaluate whether there is a linear relationship between the number of perfect days per as a function of time. The equation for a linear model is \(y = {\beta}_0 + {\beta}_1*x\).

#summarise the number of perfect days by year
PerfectWeatherYears <- PerfectWeather %>%
    group_by( year=year(DATE) ) %>%
    count()
#linear regression model
weather_model <- lm(n ~ year, PerfectWeatherYears)
summary(weather_model)
## 
## Call:
## lm(formula = n ~ year, data = PerfectWeatherYears)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5040  -7.0110   0.5756   8.6573  20.2064 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1547.2353   314.8649  -4.914 1.95e-05 ***
## year            0.7944     0.1575   5.042 1.32e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.65 on 36 degrees of freedom
## Multiple R-squared:  0.4139, Adjusted R-squared:  0.3976 
## F-statistic: 25.42 on 1 and 36 DF,  p-value: 1.32e-05
#plot the number of perfect days as a function of time
plot(PerfectWeatherYears$n ~ PerfectWeatherYears$year, xlab = "Year", ylab = "Perfect Days / Year")
abline(weather_model)

#summary plot the model
plot(weather_model)

#histogram of the residuals
hist(weather_model$residuals)

Conditions for Regression:

So, we can see from the model coefficients that the number of perfect days is increasing by 0.79 days per year. The p-value for the model 1.310^{-5} or very close to zero and we can therefore reject our null hypothesis that the number of perfect weather days is not changing over time. The weather in the Pacific Northwest does appear to be changing over time with the number of days which fall into our definition of perfect weather increasing at 0.79 days per year. Essentially, the PNW’s reputation as a gloomy and dreary place to live is false as the weather is trending better every year.

Weekends

Finally, let’s compare the data over the range of years to find how many of these perfect days coincide with weekends or when festivals and events are happening in the area?

How many perfect days fall on weekends? It’s a common myth here in the Seattle area that you know it’s the weekend because it starts raining. Is it true that it rains more on the weekends?

#filter perfect weather dates for weekend dates
temp_filterweekend <- PerfectWeather %>% filter( is.weekend(DATE) )
#number of weekend days that are pefect and the with proportion for perfect weather dates
nrow(temp_filterweekend)
## [1] 428
nrow(temp_filterweekend)/nrow(PerfectWeather)
## [1] 0.2790091

The number of perfect weather days which fall on weekends is 428 which is 27.9% of the perfect weather days. Obviously, the old saying that you know it’s the weekend because it starts raining is false as well.

Events

We need data on events in the area which we can obtain from a local events website:

Seattle Special Events Data

#read in the url and get data for each month
baseurl <- 'https://www.events12.com/seattle/'
months <- c('january','february','march','april','may','june',
            'july','august','september','october','november','december')
dates <- c()
#get the dates for events from each month
for( month in months ) {
    url <- paste0(baseurl,month,'/')
    events_data <- read_html(url)
    datestext <- events_data %>% html_nodes(".date") %>% html_text()
    dates <- c(dates, datestext)
}
#create a dataframe of all the dates
ds <- c()
for( datetxt in dates ) {
    date <- AsDate(datetxt, on.parse.failure="warn")
    ds <- c(ds,as_date(date))
}
#create a dataframe and transform the dates to the same format as PerfectWeather
dates_df = as.data.frame(ds)
dates_df %<>% transmute(date = as_date(ds)) %>% distinct() %>%
    filter(!is.na(date)) %>%
    filter(year(date) > 2017 & year(date) < 2020) %>%
    mutate(month = month(date), day = day(date), year = 2017 ) %>%
    mutate(date = mdy( paste0(month,"-",day,"-",year) ) ) 

#how many event dates are there
nrow(dates_df)
## [1] 127
#create an events table of events dates that fall on perfect days 
Seattle_events <- inner_join(PerfectWeather %>% filter(DATE >= "2017-01-01")
                             , dates_df, by = c("DATE" = "date"))
#number of events on perfect days
nrow(Seattle_events)
## [1] 22
kable(head(Seattle_events$DATE))
x
2017-06-01
2017-07-13
2017-07-14
2017-07-18
2017-07-20
2017-07-21
kable(tail(Seattle_events$DATE))
x
2017-08-18
2017-08-19
2017-08-30
2017-08-31
2017-09-08
2017-09-09

This data set consists of 127 observations of 4 different variables including date, month, day, year. The dates were reformatted in order to join the events dates with our weather data. Obtaining data for prior events proved to be difficult and so we are using the 2017 perfect weather dates to comare to the 2018 event dates. We find the 22 events dates coincide with perfect weather dates and that these dates occur between June 1 and September 30. This is what we would expect as most perfect days occur at or near summer.

Conclusion

Though there a many days in Seattle which are gloomy and dreary, the number of perfect weather days is increasing every year. These perfect weather days tend to occur more in the summer time when the rain dries up; also, thus far the air quality issues of summer wildfires have not greatly impacted the number of perfect summer days. The weekends also do not tend to be more rainy than weekdays in the summer time. So everyone who doubts that Seattle is a beautiful place to live because of the rainy weather should know that with every coming year there are more perfect days on average. This analysis may partially explain, along with economic factors not analyzed here, why the Seattle-metro area is a relocation destination as well as a summer time outdoor activity mecca.