library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
bike <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike.csv')
summary(bike)
## Date Rented.Bike.Count Hour Temperature
## Length:8760 Min. : 0.0 Min. : 0.00 Min. :-17.80
## Class :character 1st Qu.: 191.0 1st Qu.: 5.75 1st Qu.: 3.50
## Mode :character Median : 504.5 Median :11.50 Median : 13.70
## Mean : 704.6 Mean :11.50 Mean : 12.88
## 3rd Qu.:1065.2 3rd Qu.:17.25 3rd Qu.: 22.50
## Max. :3556.0 Max. :23.00 Max. : 39.40
## Humidity Wind.speed Visibility Dew.point.temperature
## Min. : 0.00 Min. :0.000 Min. : 27 Min. :-30.600
## 1st Qu.:42.00 1st Qu.:0.900 1st Qu.: 940 1st Qu.: -4.700
## Median :57.00 Median :1.500 Median :1698 Median : 5.100
## Mean :58.23 Mean :1.725 Mean :1437 Mean : 4.074
## 3rd Qu.:74.00 3rd Qu.:2.300 3rd Qu.:2000 3rd Qu.: 14.800
## Max. :98.00 Max. :7.400 Max. :2000 Max. : 27.200
## Solar.Radiation Rainfall Snowfall Seasons
## Min. :0.0000 Min. : 0.0000 Min. :0.00000 Length:8760
## 1st Qu.:0.0000 1st Qu.: 0.0000 1st Qu.:0.00000 Class :character
## Median :0.0100 Median : 0.0000 Median :0.00000 Mode :character
## Mean :0.5691 Mean : 0.1487 Mean :0.07507
## 3rd Qu.:0.9300 3rd Qu.: 0.0000 3rd Qu.:0.00000
## Max. :3.5200 Max. :35.0000 Max. :8.80000
## Holiday Functioning.Day
## Length:8760 Length:8760
## Class :character Class :character
## Mode :character Mode :character
##
##
##
The columns that are unclear until the documentation is consulted are:
The column that remains unclear even after consulting the documentation is “Visibility”. The documentation does not explain what visibility means or state its unit of measurement, so the recorded values are hard to interpret.
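One way to probe what the recorded values represent is to check how often the column sits at its largest value. In the summary above, the third quartile and the maximum of Visibility are both 2000, so at least roughly a quarter of the readings equal the maximum, which hints that the values may be capped. The following is a minimal sketch, assuming the `bike` data frame loaded above; `share_at_max` is a hypothetical helper introduced here for illustration, not part of any package.

```r
# Hypothetical helper: fraction of non-missing values equal to the observed maximum.
share_at_max <- function(x) {
  x <- x[!is.na(x)]
  mean(x == max(x))
}

# Run against the data only if `bike` has been loaded as above.
if (exists("bike")) {
  share_at_max(bike$Visibility)
}
```

A large share at the maximum would be consistent with an upper limit on the measurement, although it would not settle the question of units.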
# Create a summary table to calculate the average visibility for each season
season_summary <- bike %>%
  group_by(Seasons) %>%
  summarize(AvgVisibility = mean(Visibility))
# Create the bar chart (geom_col() is equivalent to geom_bar(stat = "identity"))
ggplot(season_summary, aes(x = Seasons, y = AvgVisibility, fill = Seasons)) +
  geom_col() +
  labs(title = "Visibility vs. Seasons",
       x = "Seasons",
       y = "Average Visibility") +
  theme_minimal()
# Create a summary table to calculate the average rented bike count for each season
result <- aggregate(bike$Rented.Bike.Count,by=list(bike$Seasons), mean)
result
## Group.1 x
## 1 Autumn 819.5980
## 2 Spring 730.0312
## 3 Summer 1034.0734
## 4 Winter 225.5412
# Create the bar chart
barplot(result$x, names.arg = result$Group.1,
        xlab = "Season", ylab = "Average Rented Bike Count",
        col = rainbow(6), border = "black",
        main = "Season vs Avg Rented Bike Count")
From the graphs above, we can conclude that visibility does not appear to affect the rented bike count. In the first graph, we averaged visibility within each season and plotted it against the seasons.
On average, visibility is high in both autumn and summer, yet this is not reflected in the rented bike count, which is clearly highest only in summer (about 1034 on average, against about 820 in autumn).
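Seasonal averages compare only four points, so a more direct check is to relate the individual visibility readings to the corresponding rented bike counts. The sketch below is one possible way to do this, assuming the `bike` data frame and the column names shown in the summary above. `visibility_count_cor` is a hypothetical helper, and Spearman's rank correlation is an assumption chosen here because the counts are skewed and visibility appears to be capped at its maximum.

```r
# Hypothetical helper: rank correlation between visibility and rented bike count,
# computed over all rows and separately within each season.
visibility_count_cor <- function(df) {
  overall <- cor(df$Visibility, df$Rented.Bike.Count,
                 method = "spearman", use = "complete.obs")
  by_season <- sapply(split(df, df$Seasons), function(d) {
    cor(d$Visibility, d$Rented.Bike.Count,
        method = "spearman", use = "complete.obs")
  })
  list(overall = overall, by_season = by_season)
}

# Run against the data only if `bike` has been loaded as above.
if (exists("bike")) {
  visibility_count_cor(bike)
}
```

Coefficients near zero would support the conclusion above, while clearly positive values would suggest that visibility matters once the comparison is made row by row rather than by seasonal average.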