This data comes from FiveThirtyEight's “Where People Go To Check The Weather”. The source is a SurveyMonkey Audience poll commissioned by FiveThirtyEight and conducted from April 6 to April 10, 2015.
theLink <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/weather-check/weather-check.csv"
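# load data into a data frame
# stringsAsFactors = TRUE keeps text columns as factors (no longer the default since R 4.0),
# which the factor counts in the summary below rely on
df_weathercheck <- read.csv(file = theLink, header = TRUE, sep = ",", stringsAsFactors = TRUE)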
# display column names
colnames(df_weathercheck)
## [1] "RespondentID"
## [2] "Do.you.typically.check.a.daily.weather.report."
## [3] "How.do.you.typically.check.the.weather."
## [4] "A.specific.website.or.app..please.provide.the.answer."
## [5] "If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device."
## [6] "Age"
## [7] "What.is.your.gender."
## [8] "How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year."
## [9] "US.Region"
# display header rows
head(df_weathercheck)
## RespondentID Do.you.typically.check.a.daily.weather.report.
## 1 3887201482 Yes
## 2 3887159451 Yes
## 3 3887152228 Yes
## 4 3887145426 Yes
## 5 3887021873 Yes
## 6 3886937140 Yes
## How.do.you.typically.check.the.weather.
## 1 The default weather app on your phone
## 2 The default weather app on your phone
## 3 The default weather app on your phone
## 4 The default weather app on your phone
## 5 A specific website or app (please provide the answer)
## 6 A specific website or app (please provide the answer)
## A.specific.website.or.app..please.provide.the.answer.
## 1 -
## 2 -
## 3 -
## 4 -
## 5 Iphone app
## 6 AccuWeather App
## If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device.
## 1 Very likely
## 2 Very likely
## 3 Very likely
## 4 Somewhat likely
## 5 Very likely
## 6 Somewhat likely
## Age What.is.your.gender.
## 1 30 - 44 Male
## 2 18 - 29 Male
## 3 30 - 44 Male
## 4 30 - 44 Male
## 5 30 - 44 Male
## 6 18 - 29 Male
## How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.
## 1 $50,000 to $74,999
## 2 Prefer not to answer
## 3 $100,000 to $124,999
## 4 Prefer not to answer
## 5 $150,000 to $174,999
## 6 $100,000 to $124,999
## US.Region
## 1 South Atlantic
## 2 -
## 3 Middle Atlantic
## 4 -
## 5 Middle Atlantic
## 6 West South Central
# total num of rows
nrow(df_weathercheck)
## [1] 928
# total num of columns
ncol(df_weathercheck)
## [1] 9
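# replace the long survey-question column names with short ones
# (columns 1 "RespondentID" and 6 "Age" are already short and stay as they are)
colnames(df_weathercheck)[2] <- "weather_chk"
colnames(df_weathercheck)[3] <- "weather_chk_src"
colnames(df_weathercheck)[4] <- "specific_src"
# note: column 5 holds the smartwatch likelihood answer, not a checking frequency
colnames(df_weathercheck)[5] <- "weather_chk_freq"
colnames(df_weathercheck)[7] <- "gender"
colnames(df_weathercheck)[8] <- "household_income"
colnames(df_weathercheck)[9] <- "us_region"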
head(df_weathercheck)
## RespondentID weather_chk
## 1 3887201482 Yes
## 2 3887159451 Yes
## 3 3887152228 Yes
## 4 3887145426 Yes
## 5 3887021873 Yes
## 6 3886937140 Yes
## weather_chk_src specific_src
## 1 The default weather app on your phone -
## 2 The default weather app on your phone -
## 3 The default weather app on your phone -
## 4 The default weather app on your phone -
## 5 A specific website or app (please provide the answer) Iphone app
## 6 A specific website or app (please provide the answer) AccuWeather App
## weather_chk_freq Age gender household_income us_region
## 1 Very likely 30 - 44 Male $50,000 to $74,999 South Atlantic
## 2 Very likely 18 - 29 Male Prefer not to answer -
## 3 Very likely 30 - 44 Male $100,000 to $124,999 Middle Atlantic
## 4 Somewhat likely 30 - 44 Male Prefer not to answer -
## 5 Very likely 30 - 44 Male $150,000 to $174,999 Middle Atlantic
## 6 Somewhat likely 18 - 29 Male $100,000 to $124,999 West South Central
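Renaming by position works here, but it silently breaks if the column order of the CSV ever changes. A minimal sketch of renaming by name instead, assuming a small toy data frame (`df_toy`) and a hypothetical lookup vector (`new_names`) that are not part of the original analysis:

```r
# Toy data frame standing in for the survey data; it reuses two of the
# original long column names for illustration only.
df_toy <- data.frame(
  RespondentID = c(1, 2),
  What.is.your.gender. = c("Male", "Female"),
  US.Region = c("Pacific", "-")
)

# Hypothetical lookup: names are the old column names, values the new ones.
new_names <- c(
  "What.is.your.gender." = "gender",
  "US.Region" = "us_region"
)

# Find where each old name sits and overwrite only those positions.
idx <- match(names(new_names), colnames(df_toy))
colnames(df_toy)[idx] <- unname(new_names)

colnames(df_toy)
```

The same lookup could be extended to all seven renamed columns; columns that are not listed keep their original names.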
# display summary
summary(df_weathercheck[,-1])
## weather_chk weather_chk_src
## No :182 The default weather app on your phone :213
## Yes:746 Local TV News :189
## A specific website or app (please provide the answer):175
## The Weather Channel :139
## Internet search :130
## Newspaper : 32
## (Other) : 50
## specific_src weather_chk_freq Age
## - :753 - : 11 - : 12
## Accuweather : 10 Somewhat likely :274 18 - 29:176
## weather.com : 7 Somewhat unlikely: 73 30 - 44:204
## Weather.com : 6 Very likely :362 45 - 59:278
## accuweather : 5 Very unlikely :208 60+ :258
## The Weather Channel app: 5
## (Other) :142
## gender household_income us_region
## - : 12 Prefer not to answer:169 Pacific :185
## Female:527 $25,000 to $49,999 :132 South Atlantic :154
## Male :389 $50,000 to $74,999 :111 East North Central:141
## $100,000 to $124,999:104 Middle Atlantic :104
## $75,000 to $99,999 :104 West South Central: 94
## $10,000 to $24,999 : 81 Mountain : 72
## (Other) :227 (Other) :178
# keep columns 2 to 7 (drop RespondentID, household_income and us_region)
df_weathercheck_subset <- subset(df_weathercheck, select = c(2,3,4,5,6,7))
head(df_weathercheck_subset)
## weather_chk weather_chk_src
## 1 Yes The default weather app on your phone
## 2 Yes The default weather app on your phone
## 3 Yes The default weather app on your phone
## 4 Yes The default weather app on your phone
## 5 Yes A specific website or app (please provide the answer)
## 6 Yes A specific website or app (please provide the answer)
## specific_src weather_chk_freq Age gender
## 1 - Very likely 30 - 44 Male
## 2 - Very likely 18 - 29 Male
## 3 - Very likely 30 - 44 Male
## 4 - Somewhat likely 30 - 44 Male
## 5 Iphone app Very likely 30 - 44 Male
## 6 AccuWeather App Somewhat likely 18 - 29 Male
# smartwatch likelihood answers on the x axis, one bar per age group within each answer
barplot(table(df_weathercheck_subset$Age, df_weathercheck_subset$weather_chk_freq), beside = TRUE, legend = TRUE)
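# load dplyr, which provides %>%, group_by() and filter()
library(dplyr)
# pipeline: keep two columns and drop rows where both answers are missing ("-")
df_weathercheck_summary <-
  subset(df_weathercheck, select = c('weather_chk_freq', 'gender')) %>%
  group_by(gender) %>%
  filter(!(weather_chk_freq == '-' & gender == '-'))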
head(df_weathercheck_summary)
## # A tibble: 6 x 2
## # Groups: gender [1]
## weather_chk_freq gender
## <fct> <fct>
## 1 Very likely Male
## 2 Very likely Male
## 3 Very likely Male
## 4 Somewhat likely Male
## 5 Very likely Male
## 6 Somewhat likely Male
# gender on the x axis, one bar per smartwatch likelihood answer within each gender
barplot(table(df_weathercheck_summary$weather_chk_freq, df_weathercheck_summary$gender), beside = TRUE, legend = TRUE)
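The filter above removes only rows where both answers are "-", so a row with a single missing answer still ends up in the plot. One alternative, shown here as a sketch rather than as what the analysis did, is to have `read.csv` turn "-" into `NA` through its `na.strings` argument and then keep complete rows only. The inline CSV (`csv_text`) and the names `df_toy` and `df_complete` are made up for illustration:

```r
# Small inline CSV standing in for the two survey columns.
csv_text <- "weather_chk_freq,gender
Very likely,Male
-,-
Somewhat likely,-
Very unlikely,Female"

# Read "-" as NA so missing answers are not treated as a category.
df_toy <- read.csv(text = csv_text, na.strings = "-", stringsAsFactors = TRUE)

# Keep only rows where both answers are present, then drop unused factor levels.
df_complete <- droplevels(df_toy[complete.cases(df_toy), ])

# Cross-tabulation ready for barplot(); it has no "-" row or column.
table(df_complete$weather_chk_freq, df_complete$gender)
```

Whether a partially answered row should be dropped depends on the question being asked; this version is stricter than the original filter.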
The dataset has 928 rows and 9 columns. I replaced the overly long column names with shorter ones, took a six-column subset of the data, and generated a couple of plots. From the summary I conclude the following:

* More females (527) than males (389) participated.
* Most participants (746 of 928) typically check a daily weather report.
* The most common way to check the weather is the default weather app on the phone (213 of 928 participants), although this is not a majority.
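To put these conclusions on a common scale, the counts from the `summary()` output can be converted to percentages of all respondents. This is a minimal sketch that copies the counts by hand; the variable names are illustrative, and on the real data `prop.table(table(...))` on the corresponding column would give the same shares:

```r
# Counts copied from the summary() output above.
n_total <- 928
gender_counts <- c(Female = 527, Male = 389, "-" = 12)
check_counts <- c(Yes = 746, No = 182)
default_app_count <- 213

# Share of all respondents, in percent, rounded to one decimal place.
gender_pct <- round(100 * gender_counts / n_total, 1)
check_pct <- round(100 * check_counts / n_total, 1)
default_app_pct <- round(100 * default_app_count / n_total, 1)

gender_pct
check_pct
default_app_pct
```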