Sarah Akhtar: Classwork 9

Part 1: Bike Sharing Exploration

[COMPLETED] (CW) Download the bike sharing data
[COMPLETED] (CW) Create a new R Markdown file and load the data in
(CW) Load the tidyverse library

library(tidyverse) # Loading tidyverse library

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

bike <- read_csv("bikesharing.csv") # Loading the data in

## Rows: 731 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): season, month, weekday, weather
## dbl  (7): year, temperature_F, casual, registered, count, humidity, windspeed
## lgl  (2): holiday, workingday
## date (2): date, date_noyear
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

(CW) Create at least one of each of the following types of plots

geom_bar() Graphs

# Use fill (not color) so each bar segment is shaded by month, not just outlined
ggplot(bike, aes(weather, fill = month)) +
  geom_bar() +
  facet_wrap(~year) +
  labs(
    title = "Bar Chart of Weather Patterns for 2011 and 2012",
    x = "Weather Category",
    y = "Count"
  ) +
  theme_linedraw()

# Use fill (not color) so each bar segment is shaded by weekday, not just outlined
ggplot(bike, aes(holiday, fill = weekday)) +
  geom_bar() +
  facet_wrap(~year) +
  labs(
    title = "Bar Chart of Number of Holidays for 2011 and 2012",
    x = "Holiday",
    y = "Count"
  ) +
  theme_linedraw()

geom_histogram() Graph

# Plot `casual` to match the title; `count` appears to be the daily total of all riders
ggplot(bike, aes(casual, fill = month)) +
  geom_histogram() +
  labs(
    title = "Distribution of Number of Casual Riders on a Day",
    x = "Number of Casual Riders on a Day",
    y = "Number of Days"
  ) +
  theme_linedraw()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Shade the bars by month; the shape is shown as-is, with no assumption of normality
ggplot(bike, aes(registered, fill = month)) +
  geom_histogram() +
  labs(
    title = "Distribution of Number of Registered Riders on a Day",
    x = "Number of Registered Riders on a Day",
    y = "Number of Days"
  ) +
  theme_linedraw()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

geom_point() Graph

# Put `casual` on the x-axis to match the labels; `count` appears to be the daily total
ggplot(bike, aes(casual, registered, color = season, size = temperature_F)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Casual Riders vs. Registered Riders in 2011 and 2012",
    x = "Casual Riders per Day",
    y = "Registered Riders per Day"
  ) +
  theme_linedraw()

ggplot(bike, aes(humidity, windspeed, color = season, size = count)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Humidity and Windspeed During Various Seasons",
    x = "Humidity",
    y = "Windspeed"
  ) +
  theme_linedraw()

(CW) Try to find “interesting” relationships between variables

From the charts above, we can see that ridership is highest on days in the fall, spring, and summer. Ridership also tends to be higher when humidity is higher, but once wind speed reaches 30 or more miles per hour, far fewer riders are out. One caution: `count` appears to be the daily total of casual and registered riders, so the two groups should be compared using the `casual` and `registered` columns directly rather than `count`.
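One way to check observations like these numerically is to summarise the data by group. The sketch below uses a small made-up tibble (`toy_bike`, with invented values, not the real data) that borrows the column names from `bike`; it also assumes that `count` is the sum of `casual` and `registered`.

```r
library(dplyr)

# Hypothetical toy data with the same column names as `bike`;
# the values are invented for illustration only.
toy_bike <- tibble(
  season     = c("winter", "spring", "summer", "fall", "fall", "winter"),
  windspeed  = c(32, 12, 8, 10, 15, 20),
  casual     = c(50, 600, 900, 800, 700, 100),
  registered = c(1200, 3000, 4000, 4200, 3900, 1500)
) %>%
  mutate(count = casual + registered) # assumption: count is the daily total

# Average casual and registered riders in each season
season_summary <- toy_bike %>%
  group_by(season) %>%
  summarise(
    mean_casual     = mean(casual),
    mean_registered = mean(registered),
    .groups = "drop"
  )

# Average total ridership on windy days (30+ mph) versus all other days
wind_summary <- toy_bike %>%
  mutate(windy = windspeed >= 30) %>%
  group_by(windy) %>%
  summarise(mean_count = mean(count), .groups = "drop")

print(season_summary)
print(wind_summary)
```

Swapping `toy_bike` for `bike` would apply the same summaries to the real data, provided the columns behave as assumed here.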

Part 2: Bikesharing Practice 1

(CW) Use mutate() to create a new column (perhaps count/temperature_F) and plot the new column vs date using geom_point()

bike %>%
  # `bike` is already piped in, so it is not passed to mutate() a second time
  mutate(casual_weather = count / temperature_F) %>%
  ggplot(aes(date, casual_weather)) +
  geom_point() +
  labs(
    title = "Riders per Degree Fahrenheit by Date in 2011 and 2012",
    x = "Date",
    y = "Riders per Degree Fahrenheit"
  )

(CW) Add labels to a plot using labs()

ggplot(bike, aes(humidity, registered, color = season)) +
  geom_point() +
  labs(
    title = "Humidity vs. Registered Bikers in 2011/2012",
    x = "Humidity",
    y = "Registered Bikers"
  )

(CW) Create a bar plot of weather, determining fill by weekday (remember to use position = “dodge”)

ggplot(bike, aes(weather, fill=weekday)) +
  geom_bar(position="dodge") +
  labs(title = "Weather vs. Weekday")

(CW) Create a bar plot of weekday, determining fill by weather (remember to use position = “dodge”)

ggplot(bike, aes(weekday, fill=weather)) +
  geom_bar(position="dodge") +
  labs(title = "Weekday vs. Weather")

(CW) Which of the plots from 5 and 6 is more informative?

The first plot (weather on the x-axis, filled by weekday) is more informative for seeing the general trend across the weather categories. However, if we want a clear distinction between individual days, the second plot (weekday on the x-axis, filled by weather) conveys the difference better. Which chart is more informative depends on the context and on what information needs to be conveyed.
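Both dodged bar plots are drawn from the same table of day counts; only the grouping differs. A minimal sketch of that idea, using a made-up `toy_days` tibble (the weekday and weather values here are hypothetical, not taken from the real data):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per day
toy_days <- tibble(
  weekday = c("Mon", "Mon", "Tue", "Tue", "Tue", "Wed"),
  weather = c("clear", "mist", "clear", "clear", "rain", "mist")
)

# The table of counts that both bar plots are built from
day_counts <- toy_days %>%
  count(weekday, weather)

# Same counts, grouped by weather on the x-axis...
p_by_weather <- ggplot(day_counts, aes(weather, n, fill = weekday)) +
  geom_col(position = "dodge")

# ...or grouped by weekday on the x-axis
p_by_weekday <- ggplot(day_counts, aes(weekday, n, fill = weather)) +
  geom_col(position = "dodge")

print(day_counts)
```

Looking at the count table directly can help decide which grouping makes the comparison of interest easier to read.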

Filter your observations to only days for which the weather is clear, and then plot date vs your new column.

bike %>%
  filter(weather == "clear") %>%
  mutate(casual_weather = count / temperature_F) %>%
  ggplot(aes(date, casual_weather, color = workingday)) +
  geom_point() +
  geom_smooth(method="lm")

## `geom_smooth()` using formula = 'y ~ x'

If you haven’t already, try using the pipe (%>%) for the exercises in this section.

bike %>%
  ggplot(aes(month, casual, fill = month)) +
  geom_violin()

Part 3: Bikesharing Practice 2

(CW) Create a plot of windspeed by season. Try using geom_freqpoly(), geom_density(), and geom_histogram(). Which visualization do you prefer?

Answer: I prefer the histogram because it shows the differences between the seasons most clearly. The density plot also gives a good overview of the shape of the data. The frequency polygon (geom_freqpoly()) is difficult to read because its lines overlap, so it is my least preferred.
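The three geoms are easier to compare when the histogram and frequency polygon share an explicit `binwidth`. The sketch below assumes a made-up data frame (`toy_wind`, simulated values, not the real data) and a bin width of 2, chosen only for illustration.

```r
library(ggplot2)

# Hypothetical simulated data standing in for `bike`
set.seed(1)
toy_wind <- data.frame(
  season    = rep(c("fall", "spring"), each = 50),
  windspeed = c(rnorm(50, mean = 11, sd = 4), rnorm(50, mean = 14, sd = 5))
)

# Overlaid (not stacked) histograms with a fixed bin width
p_hist <- ggplot(toy_wind, aes(windspeed, fill = season)) +
  geom_histogram(binwidth = 2, position = "identity", alpha = 0.5)

# Frequency polygon using the same bin width as the histogram
p_freq <- ggplot(toy_wind, aes(windspeed, color = season)) +
  geom_freqpoly(binwidth = 2)

# Density curves do not use bins
p_dens <- ggplot(toy_wind, aes(windspeed, color = season)) +
  geom_density()
```

By default `geom_histogram()` stacks the groups, so `position = "identity"` with some transparency is one option when the goal is to compare the shapes of the seasons directly.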

ggplot(bike, aes(windspeed, color=season)) +
  geom_freqpoly()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(bike, aes(windspeed, color=season)) +
  geom_density()

ggplot(bike, aes(windspeed, fill=season)) +
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

(CW) Try filtering these plots to just fall and spring.

# Keep only the fall and spring observations
fall_spring_bikes <- bike %>%
  filter(season %in% c("fall", "spring"))

ggplot(fall_spring_bikes, aes(windspeed, color=season)) +
  geom_freqpoly()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(fall_spring_bikes, aes(windspeed, color=season)) +
  geom_density()

ggplot(fall_spring_bikes, aes(windspeed, fill=season)) +
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

(CW) Create a box plot of humidity by month. Choose a variable to set fill to.

bike %>% 
  ggplot(aes(humidity, month, fill=season)) +
  geom_boxplot()

bike %>% 
  ggplot(aes(humidity, month, fill=workingday)) +
  geom_boxplot()