Gillian McGovern - DATA 607 Assignment 1

Overview

In a world where almost any information is available online, are people still getting their weather from The Weather Channel, a company that literally has “Channel” in its title, through its TV channel, or through TV at all? The FiveThirtyEight article Where People Go To Check The Weather takes a deep dive into how people checked the weather in 2015.

Link: https://fivethirtyeight.com/features/weather-forecast-news-app-habits/

Loading the Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading the Data

Let’s load the data into a data frame and take a look at the first few rows.

weather_check_original <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/fcf572c4fba05a42f1f34f415bd5e6dc389efd68/weather-check/weather-check.csv"))
head(weather_check_original)

##   RespondentID Do.you.typically.check.a.daily.weather.report.
## 1   3887201482                                            Yes
## 2   3887159451                                            Yes
## 3   3887152228                                            Yes
## 4   3887145426                                            Yes
## 5   3887021873                                            Yes
## 6   3886937140                                            Yes
##                 How.do.you.typically.check.the.weather.
## 1                 The default weather app on your phone
## 2                 The default weather app on your phone
## 3                 The default weather app on your phone
## 4                 The default weather app on your phone
## 5 A specific website or app (please provide the answer)
## 6 A specific website or app (please provide the answer)
##   A.specific.website.or.app..please.provide.the.answer.
## 1                                                     -
## 2                                                     -
## 3                                                     -
## 4                                                     -
## 5                                            Iphone app
## 6                                       AccuWeather App
##   If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device.
## 1                                                                                                                                  Very likely
## 2                                                                                                                                  Very likely
## 3                                                                                                                                  Very likely
## 4                                                                                                                              Somewhat likely
## 5                                                                                                                                  Very likely
## 6                                                                                                                              Somewhat likely
##       Age What.is.your.gender.
## 1 30 - 44                 Male
## 2 18 - 29                 Male
## 3 30 - 44                 Male
## 4 30 - 44                 Male
## 5 30 - 44                 Male
## 6 18 - 29                 Male
##   How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.
## 1                                                              $50,000 to $74,999
## 2                                                            Prefer not to answer
## 3                                                            $100,000 to $124,999
## 4                                                            Prefer not to answer
## 5                                                            $150,000 to $174,999
## 6                                                            $100,000 to $124,999
##            US.Region
## 1     South Atlantic
## 2                  -
## 3    Middle Atlantic
## 4                  -
## 5    Middle Atlantic
## 6 West South Central
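The dotted column names in the output above come from read.csv, which by default passes each header through make.names and replaces characters that are not valid in an R name (spaces, parentheses, question marks) with a dot. A minimal sketch of that behavior follows; the wording of the first question, including its trailing question mark, is reconstructed from the dotted name and should be treated as an assumption.

```r
# Question wording: the first string is reconstructed from the dotted column
# name (assumption); the second appears verbatim as a response value above.
questions <- c(
  "Do you typically check a daily weather report?",
  "A specific website or app (please provide the answer)"
)

# make.names() swaps every character that is invalid in an R name for "."
dotted <- make.names(questions)
print(dotted)
```

Passing check.names = FALSE to read.csv would keep the original headers instead, at the cost of needing backticks to refer to the columns.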

We now have a data frame, but the column names are long and wordy, and they contain . in place of spaces and punctuation. Let’s rename the columns to make them clearer and simpler.

names(weather_check_original) = c("respondent_id","checks_weather_report_daily","weather_report_source","specific_website_or_app","likelihood_to_use_smartwatch","age","gender","total_household_income_from_previous_year","us_region")
head(weather_check_original, 10)

##    respondent_id checks_weather_report_daily
## 1     3887201482                         Yes
## 2     3887159451                         Yes
## 3     3887152228                         Yes
## 4     3887145426                         Yes
## 5     3887021873                         Yes
## 6     3886937140                         Yes
## 7     3886923931                         Yes
## 8     3886913587                         Yes
## 9     3886889048                         Yes
## 10    3886848806                         Yes
##                                    weather_report_source
## 1                  The default weather app on your phone
## 2                  The default weather app on your phone
## 3                  The default weather app on your phone
## 4                  The default weather app on your phone
## 5  A specific website or app (please provide the answer)
## 6  A specific website or app (please provide the answer)
## 7                                    The Weather Channel
## 8                                                      -
## 9                                    The Weather Channel
## 10                 The default weather app on your phone
##    specific_website_or_app likelihood_to_use_smartwatch     age gender
## 1                        -                  Very likely 30 - 44   Male
## 2                        -                  Very likely 18 - 29   Male
## 3                        -                  Very likely 30 - 44   Male
## 4                        -              Somewhat likely 30 - 44   Male
## 5               Iphone app                  Very likely 30 - 44   Male
## 6          AccuWeather App              Somewhat likely 18 - 29   Male
## 7                        -                Very unlikely 30 - 44   Male
## 8                        -                            -       -      -
## 9                        -                  Very likely 30 - 44   Male
## 10                       -                  Very likely 30 - 44   Male
##    total_household_income_from_previous_year          us_region
## 1                         $50,000 to $74,999     South Atlantic
## 2                       Prefer not to answer                  -
## 3                       $100,000 to $124,999    Middle Atlantic
## 4                       Prefer not to answer                  -
## 5                       $150,000 to $174,999    Middle Atlantic
## 6                       $100,000 to $124,999 West South Central
## 7                         $25,000 to $49,999 West South Central
## 8                                          -                  -
## 9                       Prefer not to answer            Pacific
## 10                      $150,000 to $174,999 West North Central
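Row 8 above shows that the survey export uses - for unanswered questions. This report keeps those values as their own category, but one alternative is to read them as missing values at load time. A small sketch, using an inline toy CSV invented for illustration rather than the real file:

```r
# Toy CSV invented for illustration; "-" marks an unanswered question,
# mirroring the convention visible in the survey output.
csv_text <- "respondent_id,gender,us_region
1,Male,South Atlantic
2,-,-
3,Female,Pacific"

# na.strings tells read.csv which field values to treat as NA
toy <- read.csv(text = csv_text, na.strings = "-")

# Count the missing values per column
print(colSums(is.na(toy)))
```

With this approach table() drops the missing answers by default, so any percentages would be out of respondents who answered, and a hand-written label vector that starts with "-" would need that entry removed.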

Exploratory Analysis

Let’s first validate and recreate some of the article’s findings.

The article first shows where people go to check the weather. Let’s create a pie chart to visually show this data via a function (as we’ll probably be making a few pie charts in this lab, and we don’t need to repeat a bunch of code).

# Count how many respondents picked each weather source
weather_report_source <- table(weather_check_original$weather_report_source)
# Shorter display labels; these must stay in the same sorted order as the table's categories
weather_report_source_labels <- c("-", "A specific website or app", "Internet search", "Local TV News", "Newsletter", "Newspaper", "Radio weather", "The default weather app on your phone", "The Weather Channel")
# Draw a pie chart whose slice labels include each category's rounded percentage
create_pct_pie_chart <- function(counts, labels, title) {
    percentage <- round(counts/sum(counts)*100)
    pie(percentage, labels = paste(labels, percentage,"%", sep=" "), main = title) # include the % value in the label
}
weather_report_source_graph_title <- "Where People Visit to Check the Weather"
create_pct_pie_chart(weather_report_source, weather_report_source_labels, weather_report_source_graph_title)

As you can see, this matches the chart listed in the article. As the article mentions, The Weather Channel does surprisingly better than one might expect with a value of 15%!

Now let’s focus on daily checkers vs non-daily checkers. For daily checkers:

daily_checkers <- subset(weather_check_original, weather_check_original$checks_weather_report_daily == "Yes")
weather_report_source_daily_checkers <- table(daily_checkers$weather_report_source)
weather_report_source_daily_checkers_graph_title = "Where Daily Checkers Visit to Check the Weather"
create_pct_pie_chart(weather_report_source_daily_checkers, weather_report_source_labels, weather_report_source_daily_checkers_graph_title)

Let’s do the same for non-daily checkers:

non_daily_checkers <- subset(weather_check_original, weather_check_original$checks_weather_report_daily == "No")
weather_report_source_non_daily_checkers <- table(non_daily_checkers$weather_report_source)
weather_report_source_non_daily_checkers_graph_title = "Where Non-Daily Checkers Visit to Check the Weather"
create_pct_pie_chart(weather_report_source_non_daily_checkers, weather_report_source_labels, weather_report_source_non_daily_checkers_graph_title)
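The charts above all reuse one hand-written label vector, which only lines up if every subset contains all nine categories in the same order. A hedged alternative is to build the labels from the table’s own names, so a category that is missing from a subset cannot shift the labels. The helper name make_pct_labels and the toy responses below are hypothetical, not part of the original lab.

```r
# Hypothetical helper (not from the original report): builds "name pct %"
# labels from the table itself so labels always match the counts.
make_pct_labels <- function(counts) {
  pct <- round(100 * counts / sum(counts))
  paste(names(counts), pct, "%")
}

# Toy responses invented for illustration; one subset with only three sources
toy_source <- c("Radio weather", "Local TV News", "Local TV News", "Internet search")
toy_table <- table(toy_source)

toy_labels <- make_pct_labels(toy_table)
print(toy_labels)
```

The resulting vector can be passed straight to pie(toy_table, labels = toy_labels). The trade-off is that the long category name for a specific website or app is no longer shortened.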

Now let’s find some trends that the article did not look into. One category the article mentions that could be interesting is “A specific website or app”. For this category, respondents could type in whatever they liked, so the inputs vary a lot. Converting all values to lower case first merges responses that differ only in capitalization, which gives slightly more accurate counts. Since there could be many distinct responses, let’s look only at the top 30.

weather_specific_website_or_app <- table(tolower(weather_check_original$specific_website_or_app)) %>% 
        as.data.frame() %>% 
        arrange(desc(Freq))
weather_specific_website_or_app_filtered <- subset(weather_specific_website_or_app, weather_specific_website_or_app$Var1 != "-") # skip blank responses
head(weather_specific_website_or_app_filtered, 30)

##                                   Var1 Freq
## 2                          accuweather   17
## 3                          weather.com   13
## 4                           weatherbug   10
## 5                          weather bug    8
## 6              the weather channel app    7
## 7                  weather underground    6
## 8                  weather channel app    5
## 9                  the weather channel    4
## 10                     weather channel    4
## 11                         intellicast    3
## 12                  iphone weather app    3
## 13                              iphone    2
## 14                          iphone app    2
## 15       national weather service site    2
## 16 the weather channel app on my phone    2
## 17                         weather app    2
## 18                     weather bug app    2
## 19                       yahoo weather    2
## 20                weather underground     1
## 21                           1 weather    1
## 22                            1weather    1
## 23                     accuweather app    1
## 24  accuweather or weather underground    1
## 25                                 aol    1
## 26                       app on iphone    1
## 27                        apple weater    1
## 28                   apple weather app    1
## 29                 apple-provided site    1
## 30      basic weather app on my iphone    1
## 31                                bing    1

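Among the top 30 responses (blank responses excluded), The Weather Channel (as a company, not the actual TV channel) is not even the top input; AccuWeather is. The Weather Channel does account for 2 of the top 5 entries, though: weather.com and the weather channel app. Keep in mind that these counts treat every spelling as a separate response. For example, weatherbug (10) and weather bug (8) are listed separately, as are several Weather Channel variants, so the ranking of services could change once the spellings are merged.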
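Lower-casing alone leaves spacing differences in place: the output above lists weatherbug and weather bug separately, and weather underground appears a second time with a trailing space. A minimal sketch of one extra normalization step, removing all whitespace before counting, is shown below. The function name normalize_response and the toy responses are hypothetical.

```r
# Toy free-text answers invented for illustration
toy_responses <- c("WeatherBug", "weather bug", "Weather Bug ", "AccuWeather", "accuweather")

# Hypothetical helper: lower-case, then drop every whitespace character
normalize_response <- function(x) gsub("[[:space:]]+", "", tolower(x))

# Count the normalized answers, most frequent first
counts <- sort(table(normalize_response(toy_responses)), decreasing = TRUE)
print(counts)
```

This only merges variants that differ in case or spacing. Grouping answers such as the weather channel app and weather.com under one service would still need a manual lookup table.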

Another potentially interesting, and more relevant, category would be the specific environment used, such as desktop, mobile, or CTV (connected TV), with each device broken down by OS. That would let us get even more specific about where people check the weather. Do iOS users use the app more than Android users? What could this tell us about app design?

Now let’s take a look at gender. Let’s first see if there’s a gender more likely to check the weather daily:

daily_checkers_gender <- table(daily_checkers$gender)
daily_checkers_gender_labels <- c("-", "Female", "Male")
daily_checkers_gender_graph_title = "Daily Checkers Broken Down By Gender"
create_pct_pie_chart(daily_checkers_gender, daily_checkers_gender_labels, daily_checkers_gender_graph_title)

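As you can see, females make up a slightly larger share of daily checkers than males. Note that this chart shows the gender split among daily checkers, so it also reflects how many respondents of each gender took the survey. Comparing the share of each gender that checks daily would be a more direct test of which gender is more likely to check.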
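A pie chart of daily checkers by gender answers “what share of daily checkers is female?”, which is not quite the same as “what share of females checks daily?”. The sketch below computes both on a toy data frame. The rows are invented for illustration and are not survey results; only the column names follow the renamed data frame.

```r
# Toy rows invented for illustration: 6 females (4 daily), 4 males (3 daily)
toy <- data.frame(
  gender = c(rep("Female", 6), rep("Male", 4)),
  checks_weather_report_daily = c("Yes", "Yes", "Yes", "Yes", "No", "No",
                                  "Yes", "Yes", "Yes", "No")
)

# Share of daily checkers belonging to each gender (what the pie chart shows)
is_daily <- toy$checks_weather_report_daily == "Yes"
share_of_daily <- prop.table(table(toy$gender[is_daily]))

# Share of each gender that checks daily (row proportions of the cross-tab)
daily_rate <- prop.table(table(toy$gender, toy$checks_weather_report_daily), margin = 1)

print(round(share_of_daily, 2))
print(round(daily_rate, 2))
```

In this toy sample females are the majority of daily checkers simply because there are more of them, while males have the higher daily-check rate, which is why the second table is the safer basis for a “more likely” claim.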

Now let’s see if there’s a difference in the weather source for gender. Let’s try a barplot this time:

females <- subset(weather_check_original, weather_check_original$gender == "Female")
weather_report_source_females <- table(females$weather_report_source)
weather_report_source_females_ratios <- weather_report_source_females/sum(weather_report_source_females)
barplot(weather_report_source_females_ratios, las=2, ylim=c(0,0.25), main = "Where Females Visit to Check the Weather")

Now let’s check Males:

males <- subset(weather_check_original, weather_check_original$gender == "Male")
weather_report_source_males <- table(males$weather_report_source)
weather_report_source_males_ratios <- weather_report_source_males/sum(weather_report_source_males)
barplot(weather_report_source_males_ratios, legend=TRUE, args.legend = list(x = 9, y = -1.9, horiz=T), las=2, ylim=c(0,0.25), main = "Where Males Visit to Check the Weather")

These graphs are pretty similar, which is not too surprising. What is interesting is that males report using the radio more than the newspaper, whereas females report using the newspaper more than the radio.
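Comparing two separate bar plots by eye is error-prone, so one option is a single cross-tabulation with one column per gender. The sketch below uses toy rows invented for illustration (not survey results) and converts counts to within-gender proportions so the two columns are directly comparable.

```r
# Toy rows invented for illustration; column names follow the renamed data frame
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Female",
             "Male", "Male", "Male", "Male"),
  weather_report_source = c("Newspaper", "Newspaper", "Radio weather", "Local TV News",
                            "Radio weather", "Radio weather", "Newspaper", "Local TV News")
)

# Rows are sources, columns are genders; margin = 2 makes each column sum to 1
source_by_gender <- prop.table(table(toy$weather_report_source, toy$gender), margin = 2)
print(round(source_by_gender, 2))
```

Passing t(source_by_gender) to barplot with beside = TRUE and legend.text = TRUE would draw the two genders side by side for each source on one chart.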

Conclusions

Some conclusions shown in this lab report:

People mainly check the weather via the default app on their phone
Non-daily checkers are more likely to just do an internet search
Daily checkers are more likely to use a specific website or app or The Weather Channel
For a specific app or website, AccuWeather is the most common single response
Females make up a slightly larger share of daily checkers than males
Males are more likely to use the radio than a newspaper to check the weather, while the opposite holds for females

To further verify the other trends the article mentions, I would continue filtering the data, and create more graphs.

Other Trends I would look into (not mentioned in the article):

Plot how household income relates to weather source
Check if daily checkers are more likely to use a smartwatch
I would also add more fields next time such as specific device type (environment) used and what time of day the weather is checked
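As a starting point for the smartwatch question, a cross-tabulation of daily checking against smartwatch likelihood could look like the sketch below. The rows are toy values invented for illustration, and the ordering of the answer options is an assumption: “Somewhat unlikely” does not appear in the output shown in this report.

```r
# Toy rows invented for illustration; not survey results
toy <- data.frame(
  checks_weather_report_daily = c("Yes", "Yes", "Yes", "No", "No"),
  likelihood_to_use_smartwatch = c("Very likely", "Somewhat likely", "Very unlikely",
                                   "Very unlikely", "Somewhat likely")
)

# Assumed answer options, ordered from least to most likely
likelihood_levels <- c("Very unlikely", "Somewhat unlikely", "Somewhat likely", "Very likely")
toy$likelihood_to_use_smartwatch <- factor(toy$likelihood_to_use_smartwatch,
                                           levels = likelihood_levels)

# Row proportions: within daily and non-daily checkers, how answers are spread
smartwatch_by_daily <- prop.table(
  table(toy$checks_weather_report_daily, toy$likelihood_to_use_smartwatch),
  margin = 1
)
print(round(smartwatch_by_daily, 2))
```

Using an ordered set of factor levels keeps the columns in a meaningful order and keeps empty answer options visible in the table.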