In a world where any information is available online, are people actually
getting their weather information from The Weather Channel, a company that
literally has “Channel” in its title, through its TV channel, or through
TV at all?
The FiveThirtyEight article “Where People Go To Check The Weather” takes a
deep dive into how people were checking the weather in 2015.
Link: https://fivethirtyeight.com/features/weather-forecast-news-app-habits/
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Let’s load data into a dataframe and take a look at the data.
weather_check_original <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/fcf572c4fba05a42f1f34f415bd5e6dc389efd68/weather-check/weather-check.csv"))
head(weather_check_original)
## RespondentID Do.you.typically.check.a.daily.weather.report.
## 1 3887201482 Yes
## 2 3887159451 Yes
## 3 3887152228 Yes
## 4 3887145426 Yes
## 5 3887021873 Yes
## 6 3886937140 Yes
## How.do.you.typically.check.the.weather.
## 1 The default weather app on your phone
## 2 The default weather app on your phone
## 3 The default weather app on your phone
## 4 The default weather app on your phone
## 5 A specific website or app (please provide the answer)
## 6 A specific website or app (please provide the answer)
## A.specific.website.or.app..please.provide.the.answer.
## 1 -
## 2 -
## 3 -
## 4 -
## 5 Iphone app
## 6 AccuWeather App
## If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device.
## 1 Very likely
## 2 Very likely
## 3 Very likely
## 4 Somewhat likely
## 5 Very likely
## 6 Somewhat likely
## Age What.is.your.gender.
## 1 30 - 44 Male
## 2 18 - 29 Male
## 3 30 - 44 Male
## 4 30 - 44 Male
## 5 30 - 44 Male
## 6 18 - 29 Male
## How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.
## 1 $50,000 to $74,999
## 2 Prefer not to answer
## 3 $100,000 to $124,999
## 4 Prefer not to answer
## 5 $150,000 to $174,999
## 6 $100,000 to $124,999
## US.Region
## 1 South Atlantic
## 2 -
## 3 Middle Atlantic
## 4 -
## 5 Middle Atlantic
## 6 West South Central
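The output above shows "-" wherever a respondent left a question blank. One option, sketched here under the assumption that "-" always means a missing answer, is to let read.csv convert it to NA through its na.strings argument. The inline CSV text and its rows are made up for illustration so the snippet runs without a network connection.

```r
# Toy CSV standing in for the survey file (hypothetical rows, not real responses)
csv_text <- "respondent_id,gender,us_region
1,Male,Pacific
2,-,-
3,Female,-"

# na.strings tells read.csv which raw strings should be read as NA
toy <- read.csv(text = csv_text, na.strings = "-")

# Count the missing answers in each column
colSums(is.na(toy))
```

With NA in place, table() drops missing answers by default, so the "-" slice would disappear from the charts unless useNA = "ifany" is passed.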
We now have a data frame, but the column names are long and wordy, and
they contain periods where the original survey questions had spaces and
punctuation. So let’s rename the columns to make them clearer and simpler.
names(weather_check_original) = c("respondent_id","checks_weather_report_daily","weather_report_source","specific_website_or_app","likelihood_to_use_smartwatch","age","gender","total_household_income_from_previous_year","us_region")
head(weather_check_original, 10)
## respondent_id checks_weather_report_daily
## 1 3887201482 Yes
## 2 3887159451 Yes
## 3 3887152228 Yes
## 4 3887145426 Yes
## 5 3887021873 Yes
## 6 3886937140 Yes
## 7 3886923931 Yes
## 8 3886913587 Yes
## 9 3886889048 Yes
## 10 3886848806 Yes
## weather_report_source
## 1 The default weather app on your phone
## 2 The default weather app on your phone
## 3 The default weather app on your phone
## 4 The default weather app on your phone
## 5 A specific website or app (please provide the answer)
## 6 A specific website or app (please provide the answer)
## 7 The Weather Channel
## 8 -
## 9 The Weather Channel
## 10 The default weather app on your phone
## specific_website_or_app likelihood_to_use_smartwatch age gender
## 1 - Very likely 30 - 44 Male
## 2 - Very likely 18 - 29 Male
## 3 - Very likely 30 - 44 Male
## 4 - Somewhat likely 30 - 44 Male
## 5 Iphone app Very likely 30 - 44 Male
## 6 AccuWeather App Somewhat likely 18 - 29 Male
## 7 - Very unlikely 30 - 44 Male
## 8 - - - -
## 9 - Very likely 30 - 44 Male
## 10 - Very likely 30 - 44 Male
## total_household_income_from_previous_year us_region
## 1 $50,000 to $74,999 South Atlantic
## 2 Prefer not to answer -
## 3 $100,000 to $124,999 Middle Atlantic
## 4 Prefer not to answer -
## 5 $150,000 to $174,999 Middle Atlantic
## 6 $100,000 to $124,999 West South Central
## 7 $25,000 to $49,999 West South Central
## 8 - -
## 9 Prefer not to answer Pacific
## 10 $150,000 to $174,999 West North Central
Let’s first validate and recreate some of the article’s findings.
The article first shows where people go to check the weather. Let’s recreate that as a pie chart, wrapped in a function, since we’ll be making several pie charts in this lab and don’t want to repeat the code.
weather_report_source <- table(weather_check_original$weather_report_source)
weather_report_source_labels <- c("-", "A specific website or app", "Internet search", "Local TV News", "Newsletter", "Newspaper", "Radio weather", "The default weather app on your phone", "The Weather Channel")
create_pct_pie_chart <- function(table, labels, title) {
  percentage <- round(table/sum(table)*100)
  pie(percentage, labels = paste(labels, percentage,"%", sep=" "), main = title) # include the % value in the label
}
weather_report_source_graph_title = "Where People Visit to Check the Weather"
create_pct_pie_chart(weather_report_source, weather_report_source_labels, weather_report_source_graph_title)
As you can see, this matches the chart in the article. As the article mentions, The Weather Channel does better than one might expect, at 15%!
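The hard-coded label vector only lines up with the slices if every category appears in the data, in the same sorted order that table() uses. A minimal sketch of a more defensive variant, assuming the raw category names are acceptable as labels, derives the labels from the table itself. The names pct_labels and create_pct_pie_chart_auto are hypothetical helpers, not part of the original lab.

```r
# Build "name pct%" labels directly from a table of counts
pct_labels <- function(counts) {
  pct <- round(as.vector(counts) / sum(counts) * 100)
  paste0(names(counts), " ", pct, "%")
}

# Same kind of pie chart, but the labels always match the slices
create_pct_pie_chart_auto <- function(counts, title) {
  pie(as.vector(counts), labels = pct_labels(counts), main = title)
}

# Toy example with made-up answers
toy_counts <- table(c("Radio weather", "Newspaper", "Newspaper", "Radio weather"))
pct_labels(toy_counts)
```

This matters most for subsets, where a category with no responses would be absent from the table and shift every later label by one.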
Now let’s focus on daily checkers vs non-daily checkers. For daily checkers:
daily_checkers <- subset(weather_check_original, weather_check_original$checks_weather_report_daily == "Yes")
weather_report_source_daily_checkers <- table(daily_checkers$weather_report_source)
weather_report_source_daily_checkers_graph_title = "Where Daily Checkers Visit to Check the Weather"
create_pct_pie_chart(weather_report_source_daily_checkers, weather_report_source_labels, weather_report_source_daily_checkers_graph_title)
Let’s do the same for non-daily checkers:
non_daily_checkers <- subset(weather_check_original, weather_check_original$checks_weather_report_daily == "No")
weather_report_source_non_daily_checkers <- table(non_daily_checkers$weather_report_source)
weather_report_source_non_daily_checkers_graph_title = "Where Non-Daily Checkers Visit to Check the Weather"
create_pct_pie_chart(weather_report_source_non_daily_checkers, weather_report_source_labels, weather_report_source_non_daily_checkers_graph_title)
Now let’s find some trends that the article did not look into. One category the article mentions that could be interesting is a “specific website or app”. For this category, respondents could type whatever they liked, so we should convert all values to lower case first to reduce the variability of inputs. This will give us slightly more accurate results. Since there could be many different answers, let’s look at just the top 30.
weather_specific_website_or_app <- table(tolower(weather_check_original$specific_website_or_app)) %>%
as.data.frame() %>%
arrange(desc(Freq))
weather_specific_website_or_app_filtered <- subset(weather_specific_website_or_app, weather_specific_website_or_app$Var1 != "-") # skip blank responses
head(weather_specific_website_or_app_filtered, 30)
## Var1 Freq
## 2 accuweather 17
## 3 weather.com 13
## 4 weatherbug 10
## 5 weather bug 8
## 6 the weather channel app 7
## 7 weather underground 6
## 8 weather channel app 5
## 9 the weather channel 4
## 10 weather channel 4
## 11 intellicast 3
## 12 iphone weather app 3
## 13 iphone 2
## 14 iphone app 2
## 15 national weather service site 2
## 16 the weather channel app on my phone 2
## 17 weather app 2
## 18 weather bug app 2
## 19 yahoo weather 2
## 20 weather underground 1
## 21 1 weather 1
## 22 1weather 1
## 23 accuweather app 1
## 24 accuweather or weather underground 1
## 25 aol 1
## 26 app on iphone 1
## 27 apple weater 1
## 28 apple weather app 1
## 29 apple-provided site 1
## 30 basic weather app on my iphone 1
## 31 bing 1
Just looking at the top 30 responses (not counting blank responses), what’s interesting is that The Weather Channel (as a company, not the actual TV channel) is not even the top input. The top input is AccuWeather. The Weather Channel does account for 2 of the top 5 entries, though (weather.com and the weather channel app).
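The table above counts near-duplicates such as "weatherbug" and "weather bug" separately. A rough sketch of one way to merge them, assuming that spacing and punctuation never distinguish two different services, is to reduce each answer to a key of lower-case letters and digits before counting. The helper name response_key and the sample answers are hypothetical.

```r
# Collapse an answer to lower-case letters and digits only
response_key <- function(x) {
  gsub("[^a-z0-9]", "", tolower(x))
}

# Made-up answers that differ only in case, spacing, or trailing blanks
toy_answers <- c("WeatherBug", "weather bug", "1 Weather", "1weather",
                 "weather underground ")

# Count answers by key, most frequent first
sort(table(response_key(toy_answers)), decreasing = TRUE)
```

This does not merge different phrasings, such as "the weather channel app" and "weather channel"; those would still need a manual mapping.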
Another potentially interesting and more relevant category would be the specific environment, such as desktop, mobile, or CTV, with each device broken down by OS. That way we could get even more specific about where people are checking the weather. Do iOS users use the app more than Android users? What could this tell us about app design?
Now let’s take a look at gender. Let’s first see if there’s a gender more likely to check the weather daily:
daily_checkers_gender <- table(daily_checkers$gender)
daily_checkers_gender_labels <- c("-", "Female", "Male")
daily_checkers_gender_graph_title = "Daily Checkers Broken Down By Gender"
create_pct_pie_chart(daily_checkers_gender, daily_checkers_gender_labels, daily_checkers_gender_graph_title)
As you can see, females make up only a slightly larger share of daily checkers than males.
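The pie chart shows the gender split among daily checkers, which also reflects how many men and women took the survey. A sketch of how one might instead compare the rate of daily checking within each gender uses prop.table with margin = 1, so that each row sums to 1. The toy data frame below is made up for illustration; only the column names follow the lab.

```r
# Made-up responses: 3 women (2 daily checkers) and 4 men (3 daily checkers)
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  checks_weather_report_daily = c("Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")
)

# Rows are genders, columns are Yes/No
counts <- table(toy$gender, toy$checks_weather_report_daily)

# margin = 1 turns each row into proportions within that gender
rates <- prop.table(counts, margin = 1)
rates
```

In this toy sample, men are the majority of daily checkers simply because there are more of them, which is the kind of effect the within-gender rate separates out.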
Now let’s see if there’s a difference in the weather source for gender. Let’s try a barplot this time:
females <- subset(weather_check_original, weather_check_original$gender == "Female")
weather_report_source_females <- table(females$weather_report_source)
weather_report_source_females_ratios <- weather_report_source_females/sum(weather_report_source_females)
barplot(weather_report_source_females_ratios, las=2, ylim=c(0,0.25), main = "Where Females Visit to Check the Weather")
Now let’s check Males:
males <- subset(weather_check_original, weather_check_original$gender == "Male")
weather_report_source_males <- table(males$weather_report_source)
weather_report_source_males_ratios <- weather_report_source_males/sum(weather_report_source_males)
barplot(weather_report_source_males_ratios, legend.text = TRUE, args.legend = list(x = 9, y = -1.9, horiz = TRUE), las = 2, ylim = c(0, 0.25), main = "Where Males Visit to Check the Weather")
These graphs are pretty similar, which is not too surprising. What is interesting is that males are more likely to use the radio than the newspaper, whereas females are more likely to check the newspaper than the radio.
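Comparing two separate bar charts by eye is awkward. One possible alternative, sketched here with made-up data, is a single grouped barplot built from a two-way table, so that the female and male bars for each source sit side by side. Only the column names follow the lab; the rows are hypothetical.

```r
# Made-up responses for illustration
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male", "Male"),
  weather_report_source = c("Newspaper", "Radio weather", "Radio weather",
                            "Radio weather", "Newspaper", "Local TV News")
)

# Share of each source within each gender (each row sums to 1)
shares <- prop.table(table(toy$gender, toy$weather_report_source), margin = 1)

# beside = TRUE draws one bar per gender within each source group;
# legend.text = TRUE labels the bars with the row names (the genders)
barplot(shares, beside = TRUE, legend.text = TRUE, las = 2,
        main = "Weather Source by Gender (toy data)")
```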
Some conclusions shown in this lab report:
To further verify the other trends the article mentions, I would continue filtering the data and creating more graphs.
Other Trends I would look into (not mentioned in the article):