library(tidyverse) # Loading tidyverse library
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
bike <- read_csv("bikesharing.csv") # Load the bike-sharing data
## Rows: 731 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): season, month, weekday, weather
## dbl (7): year, temperature_F, casual, registered, count, humidity, windspeed
## lgl (2): holiday, workingday
## date (2): date, date_noyear
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(bike, aes(weather, fill = month)) + # fill colors the whole bar; color would only outline it
geom_bar() +
facet_wrap(~year) +
labs(
title = "Bar Chart of Weather Patterns for 2011 and 2012",
x = "Weather Category",
y = "Count"
) +
theme_linedraw()
ggplot(bike, aes(holiday, fill = weekday)) + # fill colors the whole bar; color would only outline it
geom_bar() +
facet_wrap(~year) +
labs(
title = "Bar Chart of Number of Holidays for 2011 and 2012",
x = "Holiday",
y = "Count"
) +
theme_linedraw()
ggplot(bike, aes(casual, fill = month)) + # casual riders only; count also includes registered riders
geom_histogram() +
labs(
title = "Distribution of Number of Casual Riders on a Day",
x = "Number of Casual Riders on a Day",
y = "Count"
) +
theme_linedraw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(bike, aes(registered, fill = month)) + # fill colors the whole bar; color would only outline it
geom_histogram() +
labs(
title = "Distribution of Number of Registered Riders on a Day",
x = "Number of Registered Riders on a Day",
y = "Count"
) +
theme_linedraw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(bike, aes(casual, registered, color = season, size = temperature_F)) + # x is casual riders, matching the axis label
geom_point(alpha = 0.7) +
labs(
title = "Count of Casual Riders vs. Registered Riders in 2011 and 2012",
x = "Count of Casual Riders",
y = "Count of Registered Riders"
) +
theme_linedraw()
ggplot(bike, aes(humidity, windspeed, color = season, size = count)) +
geom_point(alpha = 0.7) +
labs(
title = "Humidity and Windspeed During Various Seasons",
x = "Humidity",
y = "Windspeed"
) +
theme_linedraw()
From the charts above, casual bikers appear most often on fall, spring, and summer days. Casual ridership also tends to be higher when humidity is higher. However, once the wind speed reaches 30 miles per hour or more, far fewer casual bikers are present. Among all bikers, there also tend to be more casual bikers than registered bikers.
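The visual impressions above could be cross-checked with a numeric summary. The following is only a minimal sketch: `bike_demo` is a hypothetical stand-in tibble whose values are invented for illustration and are not taken from `bikesharing.csv`, and the 30 mph cutoff simply mirrors the threshold mentioned in the text.

```r
library(dplyr)

# Hypothetical stand-in for the real data; every value here is invented.
bike_demo <- tibble(
  season    = c("winter", "winter", "spring", "spring",
                "summer", "summer", "fall", "fall"),
  windspeed = c(32, 12, 10, 15, 8, 11, 9, 31),
  casual    = c(100, 140, 600, 700, 900, 1100, 800, 760)
)

# Mean number of casual riders per day in each season, busiest season first.
season_summary <- bike_demo %>%
  group_by(season) %>%
  summarise(mean_casual = mean(casual), days = n()) %>%
  arrange(desc(mean_casual))

# Mean number of casual riders on windy days (30+ mph) versus calmer days.
wind_summary <- bike_demo %>%
  mutate(windy = windspeed >= 30) %>%
  group_by(windy) %>%
  summarise(mean_casual = mean(casual))

print(season_summary)
print(wind_summary)
```

Swapping `bike_demo` for `bike` would apply the same summaries to the real data, assuming it has columns with these names.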
bike %>%
mutate(casual_weather = casual / temperature_F) %>% # casual riders per degree Fahrenheit
ggplot(aes(date, casual_weather)) +
geom_point() +
labs(
title = "Date vs. Casual Bike Riders per Degree Fahrenheit in 2011 and 2012",
x = "Date",
y = "Casual Bike Riders per Degree Fahrenheit"
)
ggplot(bike, aes(humidity, registered, color = season)) +
geom_point() +
labs(
title = "Humidity vs. Registered Bikers in 2011/2012",
x = "Humidity",
y = "Registered Bikers"
)
ggplot(bike, aes(weather, fill=weekday)) +
geom_bar(position="dodge") +
labs(title = "Weather vs. Weekday")
ggplot(bike, aes(weekday, fill=weather)) +
geom_bar(position="dodge") +
labs(title = "Weekday vs. Weather")
The plot from question 3 is more informative about the general trend across the weather categories. However, if we are looking for a clear distinction between individual days, the second chart from question 4 conveys the difference better. Which chart is more informative depends on the context and on the information that needs to be conveyed.
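Both dodged bar charts draw the same weather-by-weekday tally; only the variable placed on the x-axis changes. A rough sketch of that underlying table, using a hypothetical tibble `days_demo` with invented rows (the category labels are assumptions, not values confirmed from the data):

```r
library(dplyr)
library(tidyr)

# Hypothetical rows, one per day; labels and values are invented.
days_demo <- tibble(
  weather = c("clear", "clear", "mist", "clear", "mist", "rain"),
  weekday = c("Mon", "Tue", "Mon", "Mon", "Tue", "Tue")
)

# Long form: one row per weather/weekday combination, as geom_bar() counts it.
tally <- days_demo %>%
  count(weather, weekday)

# Wide form: weather in rows, weekdays in columns, missing combinations set to 0.
wide <- tally %>%
  pivot_wider(names_from = weekday, values_from = n, values_fill = 0)

print(tally)
print(wide)
```

Reading the wide table across a row corresponds to the chart grouped by weather, while reading down a column corresponds to the chart grouped by weekday.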
bike %>%
filter(weather == "clear") %>%
mutate(casual_weather = casual / temperature_F) %>% # casual riders per degree Fahrenheit
ggplot(aes(date, casual_weather, color = workingday)) +
geom_point() +
geom_smooth(method="lm")
## `geom_smooth()` using formula = 'y ~ x'
bike %>%
ggplot(aes(month, casual, fill = month)) +
geom_violin()
Answer: I prefer the histogram because it shows the difference between the seasons most clearly. The density plot also gives a good overview of the shape of the data. The freqpoly plot is hard to read because its lines overlap, which is why it is my lowest preference.
ggplot(bike, aes(windspeed, color=season)) +
geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(bike, aes(windspeed, color=season)) +
geom_density()
ggplot(bike, aes(windspeed, fill=season)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
fall_spring_bikes <- bike %>%
filter(season %in% c("fall", "spring")) # keep only fall and spring days
ggplot(fall_spring_bikes, aes(windspeed, color=season)) +
geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(fall_spring_bikes, aes(windspeed, color=season)) +
geom_density()
ggplot(fall_spring_bikes, aes(windspeed, fill=season)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
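One way to reduce the overlap in these season comparisons, and to avoid the `stat_bin()` message, is to set an explicit bin width and give each season its own panel. This is a sketch under assumptions: `wind_demo` is a hypothetical data frame with invented values, and the bin width of 5 is an arbitrary choice for illustration rather than a recommended setting.

```r
library(ggplot2)

# Hypothetical windspeed readings; every value here is invented.
wind_demo <- data.frame(
  season    = rep(c("fall", "spring"), each = 6),
  windspeed = c(5, 8, 10, 12, 15, 22, 7, 11, 14, 18, 21, 27)
)

# One histogram panel per season, with bins of width 5 starting at 0.
p <- ggplot(wind_demo, aes(windspeed, fill = season)) +
  geom_histogram(binwidth = 5, boundary = 0, show.legend = FALSE) +
  facet_wrap(~season, ncol = 1) +
  labs(x = "Windspeed", y = "Number of Days")

print(p)
```

Stacking the panels in one column keeps the x-axis shared, so the two distributions can be compared by eye without any bars or lines covering each other.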
bike %>%
ggplot(aes(humidity, month, fill=season)) +
geom_boxplot()
bike %>%
ggplot(aes(humidity, month, fill=workingday)) +
geom_boxplot()