library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
bike <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike.csv')
summary(bike)
##      Date           Rented.Bike.Count      Hour        Temperature    
##  Length:8760        Min.   :   0.0    Min.   : 0.00   Min.   :-17.80  
##  Class :character   1st Qu.: 191.0    1st Qu.: 5.75   1st Qu.:  3.50  
##  Mode  :character   Median : 504.5    Median :11.50   Median : 13.70  
##                     Mean   : 704.6    Mean   :11.50   Mean   : 12.88  
##                     3rd Qu.:1065.2    3rd Qu.:17.25   3rd Qu.: 22.50  
##                     Max.   :3556.0    Max.   :23.00   Max.   : 39.40  
##     Humidity       Wind.speed      Visibility   Dew.point.temperature
##  Min.   : 0.00   Min.   :0.000   Min.   :  27   Min.   :-30.600      
##  1st Qu.:42.00   1st Qu.:0.900   1st Qu.: 940   1st Qu.: -4.700      
##  Median :57.00   Median :1.500   Median :1698   Median :  5.100      
##  Mean   :58.23   Mean   :1.725   Mean   :1437   Mean   :  4.074      
##  3rd Qu.:74.00   3rd Qu.:2.300   3rd Qu.:2000   3rd Qu.: 14.800      
##  Max.   :98.00   Max.   :7.400   Max.   :2000   Max.   : 27.200      
##  Solar.Radiation     Rainfall          Snowfall         Seasons         
##  Min.   :0.0000   Min.   : 0.0000   Min.   :0.00000   Length:8760       
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.00000   Class :character  
##  Median :0.0100   Median : 0.0000   Median :0.00000   Mode  :character  
##  Mean   :0.5691   Mean   : 0.1487   Mean   :0.07507                     
##  3rd Qu.:0.9300   3rd Qu.: 0.0000   3rd Qu.:0.00000                     
##  Max.   :3.5200   Max.   :35.0000   Max.   :8.80000                     
##    Holiday          Functioning.Day   
##  Length:8760        Length:8760       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Question 1

The following columns are unclear until the documentation is consulted:

  1. Rented Bike Count: The name does not say whether the count covers an entire day or a single hour. The documentation states that each record is the number of bikes rented in that hour.
  2. Temperature & Dew point temperature: The names show that the values are temperatures, but the data set does not define the unit of measurement. The documentation reveals that both are measured in degrees Celsius.
  3. Snowfall & Rainfall: The names show that the data concern snowfall and rainfall, respectively, but the unit of measurement is not clear. The documentation shows that the two use different units: Snowfall is measured in cm and Rainfall in mm.
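
One way to keep these units visible while working with the data is to put them into the column names. The following is only a sketch: the helper `add_units` and the new names in `unit_names` are hypothetical choices made for illustration, not part of the data set or its documentation.

```r
# Hypothetical mapping from original column names to names that carry the
# units described above (hourly counts, degrees Celsius, mm, cm).
unit_names <- c(
  Rented.Bike.Count     = "Rented.Bike.Count.per.hour",
  Temperature           = "Temperature.C",
  Dew.point.temperature = "Dew.point.temperature.C",
  Rainfall              = "Rainfall.mm",
  Snowfall              = "Snowfall.cm"
)

# Rename only the columns that appear in the mapping; leave the rest untouched.
add_units <- function(df, mapping = unit_names) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

# Usage, assuming `bike` has been loaded as above:
if (exists("bike")) {
  bike_units <- add_units(bike)
  print(names(bike_units))
}
```

The original `bike` data frame is left unchanged, so the later code that uses the original column names still runs.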

Question 2

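The column that is still unclear, even after consulting the documentation, is “Visibility”. The documentation does not explain what visibility actually means, and the unit of measurement is not clear. The recorded values are hard to interpret as well: they range from 27 to 2000, and the third quartile equals the maximum, which suggests the values may be capped at 2000.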

Question 3

# Create a summary table to calculate the average visibility for each season
season_summary <- bike %>%
  group_by(Seasons) %>%
  summarize(AvgVisibility = mean(Visibility))

# Create the bar chart
ggplot(season_summary, aes(x = Seasons, y = AvgVisibility, fill = Seasons)) +
  geom_col() +
  labs(title = "Visibility vs. Seasons",
       x = "Seasons",
       y = "Average Visibility") +
  theme_minimal()

# Create a summary table of the average rented bike count for each season
result <- aggregate(bike$Rented.Bike.Count, by = list(bike$Seasons), mean)
result
##   Group.1         x
## 1  Autumn  819.5980
## 2  Spring  730.0312
## 3  Summer 1034.0734
## 4  Winter  225.5412
# Create the bar chart
barplot(result$x, names.arg = result$Group.1,
        xlab = "Season", ylab = "Average Rented Bike Count",
        col = rainbow(6), border = "black",
        main = "Season vs Avg Rented Bike Count")

From the above graphs, we can conclude that, at the seasonal level, visibility does not appear to affect the rented bike count. In the first graph, we calculated the average visibility and plotted it against the seasons; the second graph does the same for the average rented bike count.

On average, visibility is high in autumn and summer. The rented bike count does not follow the same pattern: it is highest in summer (about 1034 bikes per hour) and clearly lower in autumn (about 820), even though both seasons have high visibility.
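
Seasonal averages can hide an hour-by-hour relationship, so the same question can also be checked directly on the hourly records. The following is a minimal sketch, assuming the `bike` data frame loaded above; the helper `visibility_cor_by_season` is a hypothetical name introduced for illustration, and no result is shown because the code has not been run on the data here.

```r
# Hypothetical helper: Pearson correlation between hourly visibility and
# hourly rentals, computed separately within each season.
# Assumes columns Seasons, Visibility and Rented.Bike.Count.
visibility_cor_by_season <- function(df) {
  sapply(split(df, df$Seasons), function(d) {
    cor(d$Visibility, d$Rented.Bike.Count)
  })
}

# Usage, assuming `bike` has been loaded as above:
if (exists("bike")) {
  # Overall correlation across all hourly records
  print(cor(bike$Visibility, bike$Rented.Bike.Count))
  # Correlation within each season
  print(visibility_cor_by_season(bike))
}
```

Correlations close to zero would support the conclusion drawn from the bar charts, while clearly positive or negative values would suggest a relationship that the seasonal averages hide.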