Seattle’s Perfect Summer: Fact or Fiction?

The weather in the Pacific Northwest is notoriusly rainy and gloomy, but the summers are gloriously full of perfect weather, or at least that’s what we believe. How many days are actually perfect?

To answer this question, we must first think about what a perfect day means. The temperature should be warm, but not hot and the sun should be shining most if not all of the day. So we can define a perfect day as being between a certain range of temperatures, say between 60 degrees F and 80 degrees F, and little to no precipitation. When defining a perfect day, the precipitation can’t be assumed to be zero because we will often have a small sprinkle over night, but then beautiful blue-skyed days.

We will start our data collection at NOAA and the EPA.

The NOAA website has available a weather dataset measured at Seattle-Tacoma Airport (SeaTac) spanning all the way back to 1948 containing 50 variables including both max and min temperature and precipitation data and is available for download here: Seattle Daily

The EPA has a pre-generated air quality dataset which contains the air quality index by day for every county in the US and spanning from 01.01.1980 to 11.27.2018: AQI

Data

Let’s start by loading the datasets and taking a look at their structure.

Load the Seattle weather data:

#read in seattle weather
SeattleWeather <- readr::read_csv("SeattleWeather_1948_2017.csv")
#take a look at the data
kable(head(SeattleWeather))

DATE	PRCP	TMAX	TMIN	RAIN
1948-01-01	0.47	51	42	TRUE
1948-01-02	0.59	45	36	TRUE
1948-01-03	0.42	45	35	TRUE
1948-01-04	0.31	45	34	TRUE
1948-01-05	0.17	45	32	TRUE
1948-01-06	0.44	48	39	TRUE

kable(tail(SeattleWeather))

DATE	TMAX	TMIN	RAIN
2017-12-09	44	29	FALSE
2017-12-10	49	34	FALSE
2017-12-11	49	29	FALSE
2017-12-12	46	32	FALSE
2017-12-13	48	34	FALSE
2017-12-14	50	36	FALSE

#look at the structure of the data
str(SeattleWeather)

## Classes 'tbl_df', 'tbl' and 'data.frame':    25551 obs. of  5 variables:
##  $ DATE: Date, format: "1948-01-01" "1948-01-02" ...
##  $ PRCP: num  0.47 0.59 0.42 0.31 0.17 0.44 0.41 0.04 0.12 0.74 ...
##  $ TMAX: int  51 45 45 45 45 48 50 48 50 43 ...
##  $ TMIN: int  42 36 35 34 32 39 40 35 31 34 ...
##  $ RAIN: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 5
##   .. ..$ DATE:List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_date" "collector"
##   .. ..$ PRCP: list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ TMAX: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ TMIN: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ RAIN: list()
##   .. .. ..- attr(*, "class")= chr  "collector_logical" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

Seattle weather is a dataframe with 25551 observations of 5 variables which are DATE, PRCP, TMAX, TMIN, RAIN. Each observation represents a single day’s meausurements.

TMAX and TMIN are integers representing the maximum and minimum temperatures recorded for that day. RAIN is a logical vector which is true if there is 0.01 inches of rain and false if there is 0.0 inches precipitation whereas PRCP is a numeric representing the amount of daily precipitation measured in inches. DATE is already a date factor so it won’t need to be transformed.

Next we must consider air quality as the recent wildfires on the west coast have greatly impacted everyone’s ability to safely go outside and a good air quality rating should be factored into our definition of a perfect day.

We load each year of AQI data from 1980 to present, filtering for only Seattle, King County, Washington, and then create a single data frame for all the years of data.

#load the code from the EPA and create a table from all the years
#after data was read in and stored in a .csv, code was removed to allow for faster knitting
#data_list <- list()
#for(year in 1980:2018) {
#    temp <- tempfile()
#    base_file_name <- paste0("daily_aqi_by_county_",year)
#    url_name <- paste0("https://aqs.epa.gov/aqsweb/airdata/",base_file_name,".zip")
#    csv_name <- paste0(base_file_name,".csv")
#    download.file(url_name,temp)
#    data <- read_csv(unz(temp, csv_name)) %>%
#        filter(`State Name` == "Washington" & `county Name` == "King")
#    data_list[[year]] <- data
#    unlink(temp)
#}
#AQIdata <- do.call("rbind", data_list)
#load the .csv created from reading in the downloaded AQI year data
AQIdata <- readr::read_csv("AQI_KING.csv")
#take a look at the data
kable(head(AQIdata))

State Name	county Name	State Code	County Code	Date	AQI	Category	Defining Parameter	Defining Site	Number of Sites Reporting
Washington	King	53	033	1980-01-01	32	Good	CO	53-033-0076	11
Washington	King	53	033	1980-01-02	72	Moderate	CO	53-033-0077	14
Washington	King	53	033	1980-01-03	82	Moderate	CO	53-033-0062	14
Washington	King	53	033	1980-01-04	128	Unhealthy for Sensitive Groups	CO	53-033-0077	14
Washington	King	53	033	1980-01-05	89	Moderate	CO	53-033-0077	14
Washington	King	53	033	1980-01-06	61	Moderate	CO	53-033-1003	14

kable(tail(AQIdata))

State Name	county Name	State Code	County Code	Date	AQI	Category	Defining Parameter	Defining Site	Number of Sites Reporting
Washington	King	53	033	2018-07-27	64	Moderate	Ozone	53-033-0023	10
Washington	King	53	033	2018-07-28	67	Moderate	Ozone	53-033-0023	10
Washington	King	53	033	2018-07-29	147	Unhealthy for Sensitive Groups	Ozone	53-033-0023	10
Washington	King	53	033	2018-07-30	136	Unhealthy for Sensitive Groups	Ozone	53-033-0017	10
Washington	King	53	033	2018-07-31	44	Good	Ozone	53-033-0017	10
Washington	King	53	033	2018-08-01	5	Good	CO	53-033-0030	2

#look at the structure of the data
str(AQIdata)

## Classes 'tbl_df', 'tbl' and 'data.frame':    14090 obs. of  10 variables:
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ Date                     : Date, format: "1980-01-01" "1980-01-02" ...
##  $ AQI                      : int  32 72 82 128 89 61 103 51 94 63 ...
##  $ Category                 : chr  "Good" "Moderate" "Moderate" "Unhealthy for Sensitive Groups" ...
##  $ Defining Parameter       : chr  "CO" "CO" "CO" "CO" ...
##  $ Defining Site            : chr  "53-033-0076" "53-033-0077" "53-033-0062" "53-033-0077" ...
##  $ Number of Sites Reporting: int  11 14 14 14 14 14 13 14 14 10 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 10
##   .. ..$ State Name               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ county Name              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ State Code               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ County Code              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Date                     :List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_date" "collector"
##   .. ..$ AQI                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Category                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Defining Parameter       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Defining Site            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Number of Sites Reporting: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

The AQI or air quality index data set consists of 14090 observations of 10 variables including State Name, county Name, State Code, County Code, Date, AQI, Category, Defining Parameter, Defining Site, Number of Sites Reporting. Each observation represents a single day’s meausurements. The Date and Category variables are of interest in determining which days had acceptable air quality in Seattle. Once again the Date was read in as a date factor and category is a character.

So now we have 2 dataframes with temperature, precipitation and air quality included. The next step is to combine these dataframes into one table.

Since our AQI dataset only contains data from 1980, we will only have concurrent dates beginning from 01.01.1980 and going to 12.14.2017 which is when the Seattle Weather Data Ends.

### inner join on the dates available
AllWeather <- inner_join(SeattleWeather, AQIdata, by = c("DATE" = "Date"))

#look at how the new table is structured
str(AllWeather)

## Classes 'tbl_df', 'tbl' and 'data.frame':    13860 obs. of  14 variables:
##  $ DATE                     : Date, format: "1980-01-01" "1980-01-02" ...
##  $ PRCP                     : num  0.22 0.19 0 0.13 0.12 0 0.19 0.37 0.15 0 ...
##  $ TMAX                     : int  52 46 46 40 39 35 30 29 34 38 ...
##  $ TMIN                     : int  44 37 33 34 30 26 24 26 26 28 ...
##  $ RAIN                     : logi  TRUE TRUE FALSE TRUE TRUE FALSE ...
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ AQI                      : int  32 72 82 128 89 61 103 51 94 63 ...
##  $ Category                 : chr  "Good" "Moderate" "Moderate" "Unhealthy for Sensitive Groups" ...
##  $ Defining Parameter       : chr  "CO" "CO" "CO" "CO" ...
##  $ Defining Site            : chr  "53-033-0076" "53-033-0077" "53-033-0062" "53-033-0077" ...
##  $ Number of Sites Reporting: int  11 14 14 14 14 14 13 14 14 10 ...

#look at the sumamry statistics of air quality
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(AQI))

variable	type	stat	level	value	formatted
AQI	integer	missing	.all	0.00000	0
AQI	integer	complete	.all	13860.00000	13860
AQI	integer	n	.all	13860.00000	13860
AQI	integer	mean	.all	56.53550	56.54
AQI	integer	sd	.all	26.53875	26.54
AQI	integer	p0	.all	7.00000	7
AQI	integer	p25	.all	39.00000	39
AQI	integer	p50	.all	49.00000	49
AQI	integer	p75	.all	66.00000	66
AQI	integer	p100	.all	220.00000	220
AQI	integer	hist	.all	NA	▂▇▂▁▁▁▁▁

#look at the sumamry statistics of max temperature
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(TMAX))

variable	type	stat	level	value	formatted
TMAX	integer	missing	.all	0.00000	0
TMAX	integer	complete	.all	13860.00000	13860
TMAX	integer	n	.all	13860.00000	13860
TMAX	integer	mean	.all	60.12345	60.12
TMAX	integer	sd	.all	12.67805	12.68
TMAX	integer	p0	.all	18.00000	18
TMAX	integer	p25	.all	50.00000	50
TMAX	integer	p50	.all	59.00000	59
TMAX	integer	p75	.all	69.00000	69
TMAX	integer	p100	.all	103.00000	103
TMAX	integer	hist	.all	NA	▁▁▅▇▆▃▂▁

#look at the sumamry statistics of min temperature
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(TMIN))

variable	type	stat	level	value	formatted
TMIN	integer	missing	.all	0.000000	0
TMIN	integer	complete	.all	13860.000000	13860
TMIN	integer	n	.all	13860.000000	13860
TMIN	integer	mean	.all	45.243362	45.24
TMIN	integer	sd	.all	8.771334	8.77
TMIN	integer	p0	.all	7.000000	7
TMIN	integer	p25	.all	39.000000	39
TMIN	integer	p50	.all	45.000000	45
TMIN	integer	p75	.all	52.000000	52
TMIN	integer	p100	.all	71.000000	71
TMIN	integer	hist	.all	NA	▁▁▂▆▇▇▃▁

#look at the sumamry statistics of precipitation
kable(AllWeather %>%
    select_if(is.numeric) %>%
    skimr::skim(PRCP))

variable	type	stat	level	value	formatted
PRCP	numeric	missing	.all	3.000000e+00	3
PRCP	numeric	complete	.all	1.385700e+04	13857
PRCP	numeric	n	.all	1.386000e+04	13860
PRCP	numeric	mean	.all	1.051519e-01	0.11
PRCP	numeric	sd	.all	2.436655e-01	0.24
PRCP	numeric	p0	.all	0.000000e+00	0
PRCP	numeric	p25	.all	0.000000e+00	0
PRCP	numeric	p50	.all	0.000000e+00	0
PRCP	numeric	p75	.all	1.000000e-01	0.1
PRCP	numeric	p100	.all	5.020000e+00	5.02
PRCP	numeric	hist	.all	NA	▇▁▁▁▁▁▁▁

#how many NA's are in the new table
sum(is.na(AllWeather))

## [1] 6

From the numberic summary statistics, we can see that AQI, TMAX, TMIN, and PCRP have little or no missing data, reasonable mins and maxs, and the histograms show that the data is normally distributed.

Visualizing the Data

Now that we have our two datasets joined together, let’s take a look at the max and min temp ranges for each year with air quality.

#plot tmin and tmax against date - color green for good days and red for bad
AllWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, ymin=TMIN, ymax=TMAX, colour=AQI)) +
    geom_linerange(size=0.5) +
    scale_colour_gradient2(low = "green", mid="yellow", high = "red", midpoint = 100) +
    # 125 is the midpoint of the "Unhealthy for Sensitive Groups" level
    # 100 is the transition point from "Moderate" to "Unhealthy for Sensitive Groups"
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Temp (F, min to max)") +
    ggtitle("Maximum and Minimum Temperature with Air Quality from 1980 to 2017") +
    theme_minimal()

From the plots we can see that air quality tends to be an issue in the summer time. The temperature range is pretty consistent throughout the year.

And now the precipitation by day for each year.

#plot precipation by year
AllWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, y=PRCP)) +
    geom_bar(stat="identity") +
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Precipitation (inches)") +
    coord_cartesian(ylim = c(0, 1)) +
    scale_y_continuous(breaks=c(0,0.5,1)) +
    ggtitle("Precipitation from 1980 to 2017") +
    theme_classic()

There are definite gaps in the precipitation each year which makes sense as the summers tend to be very dry. It is interesting to see how the dry weather gaps do not occur consistently at the same time of year.

Perfect Weather

So, now that we have visualized the weather and air quality in Seattle, we can begin to define what perfect weather is. We can select a “perfect” temperature range, let’s say only the days with TMIN and TMAX between 60 and 80 degrees F, and also days with no rain. We can assume that given this temperature range, the precipitation that falls will be rain rather than snow. One approach to selecting would be to filter the dataset for RAIN being FALSE, but this does not allow for any precipitation at all. Instead,let’s consider the PCRP variable which is inches of rain measured in a day. We can filter for precipitation less than a certain amount of rain in inches to account for the common occurence of rain over night and this approach will allow for sensitivity testing of edge perfect days. The data may be sensitive close to the ranges that we select, so we will want to examine the data near our range to see if there is a great difference in perfect days.

Sensitivity Testing for AQI and Temperature Range

#let's first look at the number of days in the 60-80 degree range
SeattleWeather_TempRange6080 <- AllWeather %>%
    filter(TMAX <= 80 & TMIN >= 60)
count(SeattleWeather_TempRange6080)

## # A tibble: 1 x 1
##       n
##   <int>
## 1   123

#now let's widen the range to  55-85 degree range
SeattleWeather_TempRange5585 <- AllWeather %>%
    filter(TMAX <= 85 & TMIN >= 55)
count(SeattleWeather_TempRange5585)

## # A tibble: 1 x 1
##       n
##   <int>
## 1  1935

Temp Range (F)	Days within Temp Range	Percentage
60 - 80	123	0.89%
55 - 85	1935	13.96%

There’s a vast difference in the percentage of days between the temp range of 60 and 80 which is 0.89%, and between 55 and 85 which is 13.96%. We will go with the later because some people might be a little less sensitive to the temperature range and we can consider more possible perfect days.

Filtering for precipitation range:

#total number of days in the data set
nrow(AllWeather)

## [1] 13860

###how many days with rain less than 0.01
SeattleWeather_noRain0.01 <- AllWeather %>%
    filter(PRCP <= 0.01)
count(SeattleWeather_noRain0.01)

## # A tibble: 1 x 1
##       n
##   <int>
## 1  8553

###how many days with rain less than 0.01
SeattleWeather_noRain0.03 <- AllWeather %>%
    filter(PRCP <= 0.03)
count(SeattleWeather_noRain0.03)

## # A tibble: 1 x 1
##       n
##   <int>
## 1  9175

###how many days with rain less than 0.05
SeattleWeather_noRain0.05 <- AllWeather %>%
    filter(PRCP <= 0.05)
count(SeattleWeather_noRain0.05)

## # A tibble: 1 x 1
##       n
##   <int>
## 1  9652

###or days with rain less than 0.1
SeattleWeather_noRain0.1 <- AllWeather %>%
    filter(PRCP <= 0.1)
count(SeattleWeather_noRain0.1)

## # A tibble: 1 x 1
##       n
##   <int>
## 1 10495

Rain Amount(In)	Days with Rain	Percentage
0.01	8553	61.71%
0.03	9175	66.2%
0.05	9652	69.64%
0.10	10495	75.72%

The percentage of days with less than or equal to 0.01 inches of precipitation is 61.71%, 0.05 inches of precipitation is 69.64% while 0.1 is 75.72%. There is much less sensitivity to days within the precipitation range, so we will select 0.05 inches.

Next, we need to choose only the dates where the air quality is listed as Moderate or Good. We can use the Date and Category columns to filter for the moderate to good air quality days.

# Create a new feature on AllWeather if the day is "perfect" or not
AllWeather <- AllWeather %>%
    mutate(PerfectDay = (TMAX <= 85 & TMIN >= 55 & PRCP <= 0.05 & AQI <= 100))
#filter the complete weather data set 
PerfectWeather <- AllWeather %>% filter(PerfectDay)

str(PerfectWeather)

## Classes 'tbl_df', 'tbl' and 'data.frame':    1534 obs. of  15 variables:
##  $ DATE                     : Date, format: "1980-07-10" "1980-07-14" ...
##  $ PRCP                     : num  0 0.03 0 0.01 0 0 0 0 0 0 ...
##  $ TMAX                     : int  65 68 74 68 68 78 85 76 69 78 ...
##  $ TMIN                     : int  55 55 55 56 57 60 60 59 57 56 ...
##  $ RAIN                     : logi  FALSE TRUE FALSE TRUE FALSE FALSE ...
##  $ State Name               : chr  "Washington" "Washington" "Washington" "Washington" ...
##  $ county Name              : chr  "King" "King" "King" "King" ...
##  $ State Code               : int  53 53 53 53 53 53 53 53 53 53 ...
##  $ County Code              : chr  "033" "033" "033" "033" ...
##  $ AQI                      : int  56 64 81 64 94 94 75 38 69 56 ...
##  $ Category                 : chr  "Moderate" "Moderate" "Moderate" "Moderate" ...
##  $ Defining Parameter       : chr  "SO2" "CO" "SO2" "CO" ...
##  $ Defining Site            : chr  "53-033-0057" "53-033-0062" "53-033-0057" "53-033-0051" ...
##  $ Number of Sites Reporting: int  11 9 10 9 12 12 10 11 11 11 ...
##  $ PerfectDay               : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

Now we have one table containing 1534 days where the weather falls in our defined temperature range with little to no rain and moderate or good air quality.

Let’s take a look at the distribution of the perfect weather days.

#plot tmin and tmax against date - color green for good days and red for bad
PerfectWeather %>% 
    mutate(year = year(DATE), day_of_year = yday(DATE) ) %>%
    ggplot(aes(x=day_of_year, ymin=TMIN, ymax=TMAX)) +
    geom_linerange(size=0.5) +
    facet_grid( year ~ .) +
    xlab("Day of Year") + ylab("Temp (F, min to max)") + 
    ggtitle("Distribution of Perfect Weather Days from 1980 to 2017") +
    theme_minimal()

We see right away that 1980 and 1981 have fewer perfect days. Everyone in the Pacific Northwest recalls the events of May 18, 1980 when Mt. St. Helens erupted causing extremely poor air quality and possibly affecting the weather of those summers.

Statistical Analysis- Hypothesis Test

Our null hypothesis is that the number of perfect days per year is not changing over time. We can also write this out in an equation form:

H0 = there is no difference in the number of perfect days over time. H1 = there is a difference in the number of perfect days over time.

We can use a linear model to evaluate whether there is a linear relationship between the number of perfect days per as a function of time. The equation for a linear model is \(y = {\beta}_0 + {\beta}_1*x\).

#summarise the number of perfect days by year
PerfectWeatherYears <- PerfectWeather %>%
    group_by( year=year(DATE) ) %>%
    count()
#linear regression model
weather_model <- lm(n ~ year, PerfectWeatherYears)
summary(weather_model)

## 
## Call:
## lm(formula = n ~ year, data = PerfectWeatherYears)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5040  -7.0110   0.5756   8.6573  20.2064 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1547.2353   314.8649  -4.914 1.95e-05 ***
## year            0.7944     0.1575   5.042 1.32e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.65 on 36 degrees of freedom
## Multiple R-squared:  0.4139, Adjusted R-squared:  0.3976 
## F-statistic: 25.42 on 1 and 36 DF,  p-value: 1.32e-05

#plot the number of perfect days as a function of time
plot(PerfectWeatherYears$n ~ PerfectWeatherYears$year, xlab = "Year", ylab = "Perfect Days / Year")
abline(weather_model)

#summary plot the model
plot(weather_model)

#histogram of the residuals
hist(weather_model$residuals)

Conditions for Regression:

The residuals of the model are nearly normal: we can see from the histogram of the residuals and Normal QQ plot shows that that they are nearly normally distributed.
The variability of the residuals is nearly constant: the Residuals vs. Fitted plot shows that the variability is constant.
The residuals are independent: the Residuals vs. Fitted plot also shows that the residuals are independent.
Each variable is linearly related to the outcome: the Perfect Days per year plotted against Year shows that they are linearly related.

So, we can see from the model coefficients that the number of perfect days is increasing by 0.79 days per year. The p-value for the model 1.310^{-5} or very close to zero and we can therefore reject our null hypothesis that the number of perfect weather days is not changing over time. The weather in the Pacific Northwest does appear to be changing over time with the number of days which fall into our definition of perfect weather increasing at 0.79 days per year. Essentially, the PNW’s reputation as a gloomy and dreary place to live is false as the weather is trending better every year.

Weekends

Finally, let’s compare the data over the range of years to find how many of these perfect days coincide with weekends or when festivals and events are happening in the area?

How many perfect days fall on weekends? It’s a common myth here in the Seattle area that you know it’s the weekend because it starts raining. Is it true that it rains more on the weekends?

#filter perfect weather dates for weekend dates
temp_filterweekend <- PerfectWeather %>% filter( is.weekend(DATE) )
#number of weekend days that are pefect and the with proportion for perfect weather dates
nrow(temp_filterweekend)

## [1] 428

nrow(temp_filterweekend)/nrow(PerfectWeather)

## [1] 0.2790091

The number of perfect weather days which fall on weekends is 428 which is 27.9% of the perfect weather days. Obviously, the old saying that you know it’s the weekend because it starts raining is false as well.

Events

We need data on events in the area which we can obtain from a local events website:

Seattle Special Events Data

#read in the url and get data for each month
baseurl <- 'https://www.events12.com/seattle/'
months <- c('january','february','march','april','may','june',
            'july','august','september','october','november','december')
dates <- c()
#get the dates for events from each month
for( month in months ) {
    url <- paste0(baseurl,month,'/')
    events_data <- read_html(url)
    datestext <- events_data %>% html_nodes(".date") %>% html_text()
    dates <- c(dates, datestext)
}
#create a dataframe of all the dates
ds <- c()
for( datetxt in dates ) {
    date <- AsDate(datetxt, on.parse.failure="warn")
    ds <- c(ds,as_date(date))
}
#create a dataframe and transform the dates to the same format as PerfectWeather
dates_df = as.data.frame(ds)
dates_df %<>% transmute(date = as_date(ds)) %>% distinct() %>%
    filter(!is.na(date)) %>%
    filter(year(date) > 2017 & year(date) < 2020) %>%
    mutate(month = month(date), day = day(date), year = 2017 ) %>%
    mutate(date = mdy( paste0(month,"-",day,"-",year) ) ) 

#how many event dates are there
nrow(dates_df)

## [1] 127

#create an events table of events dates that fall on perfect days 
Seattle_events <- inner_join(PerfectWeather %>% filter(DATE >= "2017-01-01")
                             , dates_df, by = c("DATE" = "date"))
#number of events on perfect days
nrow(Seattle_events)

## [1] 22

kable(head(Seattle_events$DATE))

x
2017-06-01
2017-07-13
2017-07-14
2017-07-18
2017-07-20
2017-07-21

kable(tail(Seattle_events$DATE))

x
2017-08-18
2017-08-19
2017-08-30
2017-08-31
2017-09-08
2017-09-09

This data set consists of 127 observations of 4 different variables including date, month, day, year. The dates were reformatted in order to join the events dates with our weather data. Obtaining data for prior events proved to be difficult and so we are using the 2017 perfect weather dates to comare to the 2018 event dates. We find the 22 events dates coincide with perfect weather dates and that these dates occur between June 1 and September 30. This is what we would expect as most perfect days occur at or near summer.

Conclusion

Though there a many days in Seattle which are gloomy and dreary, the number of perfect weather days is increasing every year. These perfect weather days tend to occur more in the summer time when the rain dries up; also, thus far the air quality issues of summer wildfires have not greatly impacted the number of perfect summer days. The weekends also do not tend to be more rainy than weekdays in the summer time. So everyone who doubts that Seattle is a beautiful place to live because of the rainy weather should know that with every coming year there are more perfect days on average. This analysis may partially explain, along with economic factors not analyzed here, why the Seattle-metro area is a relocation destination as well as a summer time outdoor activity mecca.