This project replicates a study by Walt Hickey (linked in the references below). His project investigates how people check the weather: do they use the default app on their phone, another app, or some other source? The dataset is also linked in the references.
library(dplyr)
library(ggplot2)
library(ggthemes)
data <- read.csv("weather-check.csv")
glimpse(data)
## Observations: 928
## Variables: 9
## $ RespondentID <dbl> ...
## $ Do.you.typically.check.a.daily.weather.report. <fct> ...
## $ How.do.you.typically.check.the.weather. <fct> ...
## $ A.specific.website.or.app..please.provide.the.answer. <fct> ...
## $ If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device. <fct> ...
## $ Age <fct> ...
## $ What.is.your.gender. <fct> ...
## $ How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year. <fct> ...
## $ US.Region <fct> ...
Using glimpse(), we get an overview of the dataset, shown above: it has 928 observations and 9 variables. Next, we check for missing values:
anyNA(data)
## [1] FALSE
As shown above, the dataset has no missing (NA) values, so we can move on to the next step, exploratory data analysis.
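Note that anyNA() only detects values that R stores as NA. The code later in this analysis filters out responses recorded as "-", which suggests that unanswered questions are stored as that placeholder rather than as NA. Here is a minimal sketch of how such placeholders could be counted per column, using a small made-up data frame (the toy object and its values are hypothetical, not taken from the survey):

```r
# Hypothetical toy data; "-" stands in for an unanswered question.
toy <- data.frame(
  Age    = c("18 - 29", "-", "45 - 59", "60+"),
  Gender = c("Female", "Male", "-", "-"),
  stringsAsFactors = FALSE
)

# anyNA() reports FALSE because "-" is an ordinary string, not NA.
anyNA(toy)

# Count the "-" placeholders in each column.
placeholder_count <- colSums(toy == "-")
placeholder_count
```

On the survey data, the equivalent call would presumably be colSums(data == "-").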
To explore how people check the weather, we visualize the sources they use:
# proportion of respondents per source
count <- data %>%
  group_by(How.do.you.typically.check.the.weather.) %>%
  summarise(n = n()) %>%
  filter(How.do.you.typically.check.the.weather. != "-") %>%
  mutate(prop = round(prop.table(n), 2))
# visualize
count %>%
  ggplot(aes(x = reorder(How.do.you.typically.check.the.weather., n), y = prop)) +
  geom_bar(stat = 'identity', width = 0.5) +
  geom_text(aes(label = prop), stat = 'identity', hjust = -0.1, size = 3.5) +
  coord_flip() +
  xlab("Method to Check Weather") +
  ylab("Proportion") +
  ggtitle("How Do People Typically Check the Weather") +
  theme_bw() +
  theme(plot.title = element_text(size = 16),
        axis.title = element_text(size = 12, face = "bold"))
As the figure shows, about 23% of respondents use the default weather app on their phone. The next most common source is the local news.
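The proportions in the figure come from prop.table(), which divides each count by the sum of all counts. Because the "-" rows are filtered out first, the denominator is the number of respondents who named a source, which is not necessarily all 928. A small sketch with made-up counts (not the survey values) shows the arithmetic:

```r
# Hypothetical counts for three sources; not the survey values.
n <- c(app = 60, tv = 30, radio = 10)

# prop.table() on a vector divides each element by the total.
prop <- round(prop.table(n), 2)
prop

# The same calculation written out by hand.
manual <- round(n / sum(n), 2)
manual
```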
Next, we visualize which age groups and genders are more likely to check the weather.
# count responses by age group
age <- data %>%
  group_by(Do.you.typically.check.a.daily.weather.report., Age) %>%
  summarise(n = n()) %>%
  filter(Age != "-")
# visualize
age %>%
  ggplot(aes(x = reorder(Age, n), y = n)) +
  geom_bar(stat = 'identity', width = 0.5, fill = "red") +
  geom_text(aes(label = n), stat = 'identity', hjust = 0.01, size = 3) +
  coord_flip() +
  facet_wrap(~Do.you.typically.check.a.daily.weather.report.) +
  xlab("Age Group") +
  ylab("Total") +
  ggtitle("Which Age Group Checks the Weather the Most") +
  theme_bw() +
  theme(plot.title = element_text(size = 16),
        axis.title = element_text(size = 12, face = "bold"))
As you can see, people are generally more likely to check the weather as their age goes up. Why does this occur? Let's check whether there is any difference by gender.
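One caveat: the figure shows raw counts, so a longer bar for an age group can also mean that more people of that age answered the survey. Comparing the share of "Yes" answers within each age group avoids this. Below is a minimal sketch with dplyr, assuming a made-up data frame toy with hypothetical columns age and checks (these are not the survey's column names or values):

```r
library(dplyr)

# Hypothetical responses; not the survey data.
toy <- data.frame(
  age    = c("18 - 29", "18 - 29", "18 - 29", "18 - 29", "60+", "60+"),
  checks = c("Yes", "No", "No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Count answers per age group, then divide by the group total
# so that groups of different sizes can be compared.
age_share <- toy %>%
  count(age, checks) %>%
  group_by(age) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

age_share
```

In this toy example the older group has fewer respondents but a higher share of "Yes" answers, which a plot of counts alone would not show.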
# count responses by gender
gender <- data %>%
  group_by(Do.you.typically.check.a.daily.weather.report., What.is.your.gender.) %>%
  summarise(n = n()) %>%
  filter(What.is.your.gender. != "-")
# visualize
gender %>%
  ggplot(aes(x = reorder(Do.you.typically.check.a.daily.weather.report., n), y = n)) +
  geom_bar(stat = 'identity', width = 0.5, fill = "red") +
  geom_text(aes(label = n), stat = 'identity', hjust = 0.01, size = 3) +
  coord_flip() +
  facet_wrap(~What.is.your.gender.) +
  xlab("Do you check weather") +
  ylab("Total") +
  ggtitle("Which Gender Checks the Weather the Most") +
  theme_bw() +
  theme(plot.title = element_text(size = 16),
        axis.title = element_text(size = 12, face = "bold"))
As shown, female respondents are more likely to check the weather than male respondents. Lastly, we look at how the sources used differ between male and female respondents.
# count responses by source and gender
gender_group <- data %>%
  group_by(How.do.you.typically.check.the.weather., What.is.your.gender.) %>%
  summarise(n = n()) %>%
  filter(How.do.you.typically.check.the.weather. != "-") %>%
  filter(What.is.your.gender. != "-")
# visualize
gender_group %>%
  ggplot(aes(x = reorder(How.do.you.typically.check.the.weather., n), y = n)) +
  geom_bar(stat = 'identity', width = 0.5, fill = "red") +
  geom_text(aes(label = n), stat = 'identity', hjust = 0.1, size = 2.5) +
  coord_flip() +
  facet_wrap(~What.is.your.gender.) +
  xlab("Method to Check Weather") +
  ylab("Total") +
  ggtitle("Difference between Male and Female on Checking the Weather") +
  theme_bw() +
  theme(plot.title = element_text(size = 15),
        axis.title = element_text(size = 12, face = "bold"))
Unsurprisingly, both male and female respondents mostly use the default app to check the weather. Interestingly, though, male respondents are more likely to use the radio and female respondents the newspaper, although the margin is small.
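Since the two panels show counts, small differences between them may partly reflect how many men and women took the survey. Here is a sketch of a within-gender comparison using base R's table() and prop.table() with margin = 2 (column proportions). The toy vectors are made up and do not come from the survey:

```r
# Hypothetical answers; not the survey data.
method <- c("App", "App", "Radio", "App", "Newspaper", "App", "Radio", "Radio")
gender <- c("Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male")

# Cross-tabulate, then divide each column by its total so that
# every gender's shares sum to 1.
counts <- table(method, gender)
shares <- prop.table(counts, margin = 2)
round(shares, 2)
```

Each column of shares can then be read as the distribution of sources within one gender, independent of how many respondents that gender has.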
References (data source)
https://github.com/fivethirtyeight/data/tree/master/weather-check
https://fivethirtyeight.com/features/weather-forecast-news-app-habits/