Introdution

For this week’s assignment, I selected a dataset from fivethirtyeight to create a programming sample “vignette” which will demonstrates how to use one or more of the TidyVerse packages.

The dataset I chose was about weather checking from the article: Where People Go To Check The Weather by Walt Hickey: https://fivethirtyeight.com/features/weather-forecast-news-app-habits/

Package: readr

This is a tidyverse package that reads in csv. files into R. In this step, I am using readr to pull in the csv. raw file from GitHub.

weather <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/weather-check/weather-check.csv")

glimpse(weather)
## Rows: 928
## Columns: 9
## $ RespondentID                                                                                                                                 <dbl> …
## $ Do.you.typically.check.a.daily.weather.report.                                                                                               <chr> …
## $ How.do.you.typically.check.the.weather.                                                                                                      <chr> …
## $ A.specific.website.or.app..please.provide.the.answer.                                                                                        <chr> …
## $ If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device. <chr> …
## $ Age                                                                                                                                          <chr> …
## $ What.is.your.gender.                                                                                                                         <chr> …
## $ How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.                                                              <chr> …
## $ US.Region                                                                                                                                    <chr> …

This dataset has 928 observations and 9 variables. Each observation is a respondent.

Package: dplyr

Dplyr has many functions, but I will be using it to rename the columns to something shorter and easier to refer to later on and then select to create a ggplot.

weather2 <- rename(weather, respondent_id = RespondentID,
                check_weather = Do.you.typically.check.a.daily.weather.report.,
                how_check_weather = How.do.you.typically.check.the.weather.,
                weather_app_web = A.specific.website.or.app..please.provide.the.answer.,
                likert_check_weather = If.you.had.a.smartwatch..like.the.soon.to.be.released.Apple.Watch...how.likely.or.unlikely.would.you.be.to.check.the.weather.on.that.device.,
       gender = What.is.your.gender.,
       household_income = How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.,
       us_region = US.Region                   
       )

weather2[weather2 == "-"] <- NA
weather3 <- weather2 %>%
  select(respondent_id, how_check_weather)

Package: tidy

Tidy also has many functions but I will be using it to drop NAs.

weather4 <- weather3 %>% drop_na()

Package: dplyr

Here we will go back to dplyr to use group_by and summarise to create a count of the variable: how people check the weather.

weather4$how_check_weather[weather4$how_check_weather == "A specific website or app (please provide the answer)"] <- "Other"

weather5 <- weather4 %>%
   group_by(how_check_weather) %>%
  summarise(count = n())

Package: ggplot2

Ggplot2 has many functions for graphing different types graphs (e.g. histograms, box plots) with titles, labels, colors, and keys.

ggplot(data=weather5, aes(x=how_check_weather, y=count)) +
  geom_bar(stat="identity", fill="darkblue", position = "dodge")+
  ggtitle("How Individuals Typically Check The Weather") +
   ylab("Frequency") + xlab("Method of checking weather") +
       theme(axis.text.x = element_text(angle=20, hjust=1))

Conclusion

Tidyverse is a unique collection of packages useful for cleaning, tidying, manipulation, analysis, and visualization in a more efficient way - shown in my example above using a dataset about checking the weather.

Extended Code by Leslie Tavarez

Nakesha did a great job loading the data, cleaning it, summarizing/visualizing how indiviuals typically check the weather.

First, I will analyze gender vs. weather checking methods.

weather_summary <- weather2 %>%
  group_by(gender, how_check_weather) %>%
  summarise(count = n(), .groups = "drop") %>%
  drop_na(gender, how_check_weather)  # Exclude missing data


ggplot(weather_summary, aes(x = how_check_weather, y = count, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("How People Check the Weather by Gender") +
  xlab("Method of Checking the Weather") +
  ylab("Number of Responses") +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

library(plotly) #this will make the chart interactive. 
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
p <- ggplot(weather_summary, aes(x = how_check_weather, y = count, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Interactive: How People Check the Weather by Gender") +
  xlab("Method of Checking the Weather") +
  ylab("Number of Responses") +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

ggplotly(p)

The stacked bar chart above illustrates the different weather-checking methods by gender. Both men and women predominantly use weather apps on their cellphones, reflecting convenience and ease of access. However, it is interesting to note that a significant number of respondents from both genders still rely on the local tv news as a weather source. I believe that people may still use local tv news for weather and related insights like traffic patterns.

From the previous bar graph, is seems like there are more female respondents than male. In the next step, I will show weather-checking frequency by gender.

# Summarize data by gender
weather_gender_summary <- weather2 %>%
  group_by(gender) %>%
  summarise(count = n(), .groups = 'drop')

# View the summarized data
print(weather_gender_summary)
## # A tibble: 3 × 2
##   gender count
##   <chr>  <int>
## 1 Female   527
## 2 Male     389
## 3 <NA>      12
# Bar chart to show weather-checking frequency by gender
ggplot(data = weather_gender_summary, aes(x = gender, y = count, fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Weather-Checking Frequency by Gender",
    x = "Gender",
    y = "Number of Respondents"
  ) +
  theme_minimal()

As suspected, there are more female respondents than male respondents. This discrepancy could stem from various factors, such as a higher likelihood of survey participation by women, behavioral patterns that align with more frequent weather monitoring, or cultural and lifestyle factors that shape how individuals plan their daily routines.