About Data Analysis Report

This R Markdown report documents the data analysis for a project on forecasting daily bike rental demand with time series models in R. It covers data exploration, summary statistics, and building the time series models. The final report was completed on Mon Jul 15 11:12:04 2024.

Data Description:

This dataset contains the daily count of rental bikes for the years 2011 and 2012 in the Capital Bikeshare system, together with the corresponding weather and seasonal information.

Data Source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Relevant Paper:

Fanaee-T, Hadi, and Gama, Joao. "Event labeling combining ensemble detectors and background knowledge." Progress in Artificial Intelligence (2013): pp. 1-15. Springer Berlin Heidelberg.

# Install, load and explore the data

Key Packages for Time Series Analysis

  • tidyverse: For data manipulation and wrangling.
  • forecast: For time series forecasting.
  • ggplot2: For creating visualizations of time series data (loaded as part of tidyverse).
  • plotly: For interactive plots.
  • tseries: For time series analysis and computational finance; used here for adf.test.
## Install required libraries
install.packages(c("tidyverse", "plotly", "tseries", "forecast"))
## Warning in install.packages(c("tidyverse", "plotly", "tseries", "forecast")):
## installation of package 'forecast' had non-zero exit status
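
The warning above shows that installing packages from inside the report can fail partway through. A minimal sketch of one alternative, assuming a hypothetical helper named missing_packages() that is not part of the original analysis: check which packages are absent first, then install only those, ideally in an interactive session rather than on every knit.

```r
# Hypothetical helper (not from the original report):
# return the packages in `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "plotly", "tseries", "forecast")
to_install <- missing_packages(required)
print(to_install)

# Install only what is missing; left commented out so that knitting
# the report never triggers an installation.
# if (length(to_install) > 0) install.packages(to_install)
```

If a package still fails to install, the console output printed before the "non-zero exit status" warning usually names the cause.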

Load packages

## Import required packages
library(tidyverse)
library(plotly)
library(tseries)
library(forecast)

Describe and explore the data

## Data cleaning

# reading csv file
bike_day_data = read_csv("day.csv")

# Check for missing values in the data frame.
# anyNA() returns TRUE if any value is missing; cat() prints the result with a label.

cat("Are there any missing data in the dataframe?: ", anyNA(bike_day_data), "\n")
## Are there any missing data in the dataframe?:  FALSE
## checking summary of data
#summary(bike_day_data)
# Convert dteday to Date type
bike_day_data$dteday= as.Date(bike_day_data$dteday)

# Count the number of daily records (days) in each season.
# Note: n() counts rows, not rentals, so this does not measure demand.
season_counts = bike_day_data %>%
  group_by(season) %>%
  summarise(count = n())

# Convert season codes to season names for better visualization
bike_day_data$season = factor(bike_day_data$season,
                              levels = c(1, 2, 3, 4),
                              labels = c("Spring", "Summer", "Fall", "Winter"))

# Convert weather situation codes to descriptive labels
bike_day_data$weathersit = factor(bike_day_data$weathersit,
                                  levels = c(1, 2, 3),
                                  labels = c("Clear", "Cloudy", "Light_Rain"))

Plot interactive chart of daily bike rentals

# season is already a labelled factor, so it can be mapped to color directly.
# With a color mapping, plot_ly takes its palette from the colors argument.
plot_ly(bike_day_data, x = ~dteday, y = ~cnt, color = ~season,
        colors = c("blue", "orange", "green", "red"),  # Customize colors if needed
        type = 'scatter', mode = 'lines') %>%
  layout(
    title = "Daily Bike Rental Counts",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Bike Rented")
  )

The chart shows a wide range of daily rental demand, with a maximum of 8714 rentals in a single day. Spikes of this size may reflect specific events, weather conditions, or other external factors.

Plot bike rentals by season

ggplot(bike_day_data, aes(x = dteday, y = cnt, colour = season)) +
  geom_line() +
  labs(title = "Daily Bike Rental Counts by Season",
       x = "Date",
       y = "Bike Rented") +
  facet_wrap(~ season) +
  theme_minimal()

At first glance, the chart above does not show a clear-cut impact of season on bike rentals. The number of records is roughly balanced across seasons, but that only reflects the calendar: each season covers a similar number of days. Whether rental demand varies with the time of year has to be judged from the rental counts (cnt) themselves, for example by comparing mean or total rentals per season.
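
The season_counts table computed earlier counts rows (days) per season, not rentals. A minimal sketch of a demand comparison follows, using a small made-up data frame in place of bike_day_data; the name toy and all values in it are illustrative assumptions, not figures from day.csv.

```r
library(dplyr)

# Toy data standing in for bike_day_data; the numbers are made up for illustration.
toy <- tibble(
  season = factor(
    c("Spring", "Spring", "Summer", "Summer", "Fall", "Fall", "Winter", "Winter"),
    levels = c("Spring", "Summer", "Fall", "Winter")
  ),
  cnt = c(1000, 2000, 4000, 5000, 5500, 6500, 3000, 4000)
)

# days counts records per season; mean_cnt and total_cnt describe rental demand.
season_summary <- toy %>%
  group_by(season) %>%
  summarise(days = n(), mean_cnt = mean(cnt), total_cnt = sum(cnt))

print(season_summary)
```

In this toy example every season has the same number of days while the mean rentals differ, which is why a balanced record count alone says nothing about seasonal demand.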

Plot interactive chart of bike rentals by weather condition

# With a color mapping, plot_ly takes its palette from the colors argument.
plot_ly(bike_day_data, x = ~dteday, y = ~cnt, color = ~weathersit,
        colors = c("green", "purple", "orange"),  # Customize colors if needed
        type = 'scatter', mode = 'lines') %>%
  layout(
    title = "Daily Bike Rental Counts by Weather Situation",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Bike Rented")
  )

This chart shows how rental behavior varies with weather conditions. Clear weather is the most frequent condition and appears to coincide with higher rental counts, while Cloudy and Light Rain days may correlate with fewer rentals.
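
How often a weather category occurs and how many bikes are rented under it are separate questions. One way to check the second, not used in the original analysis, is a Kruskal-Wallis rank-sum test from base R. The sketch below runs it on simulated counts; the data frame toy and its group sizes and means are assumptions made up for illustration.

```r
set.seed(42)

# Simulated daily counts for three weather categories (made-up values, not from day.csv).
toy <- data.frame(
  weathersit = factor(
    rep(c("Clear", "Cloudy", "Light_Rain"), times = c(60, 30, 10)),
    levels = c("Clear", "Cloudy", "Light_Rain")
  ),
  cnt = c(rpois(60, 4800), rpois(30, 4000), rpois(10, 1800))
)

# Frequency of each category versus its typical demand level.
print(table(toy$weathersit))
print(tapply(toy$cnt, toy$weathersit, median))

# Kruskal-Wallis test: do the rental distributions differ across categories?
kw <- kruskal.test(cnt ~ weathersit, data = toy)
print(kw$p.value)
```

On the real data the result should be read with caution: the test assumes independent observations, and daily rental counts are autocorrelated.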

Forecast time series data using ARIMA models

# Extract the start year from the earliest date
start_year = as.numeric(format(min(bike_day_data$dteday), "%Y"))

# Convert to a time series object; frequency = 365 treats one year as the seasonal period
bike_ts = ts(bike_day_data$cnt, start = c(start_year, 1), frequency = 365)

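# Build the time series plot (stored in p; print p to display it)
p = autoplot(bike_ts) +
  labs(title = "Daily Bike Rental Counts",
       x = "Date",
       y = "Bike Rented") +
  theme_minimal()

# Decompose the time series into trend, seasonal, and random components
decomposed = decompose(bike_ts)
# Build the decomposition plot (stored in decom; print decom to display it)
decom = autoplot(decomposed) +
  theme_minimal()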

# Check stationarity with the Augmented Dickey-Fuller test (null hypothesis: unit root)
adf_test = adf.test(bike_ts)
#print(adf_test)

# Difference the time series to achieve stationarity, because the test above indicated that the data is not stationary
diff_bike_ts = diff(bike_ts)

# Check stationarity of the differenced series
adf_test_diff = adf.test(diff_bike_ts)
## Warning in adf.test(diff_bike_ts): p-value smaller than printed p-value
#print(adf_test_diff)
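
The null hypothesis of adf.test() is a unit root, so a small p-value is evidence of stationarity. A minimal sketch of how the two checks above can be read follows, using a simulated random walk instead of the bike data; the names random_walk, adf_level, and adf_diff are illustrative assumptions.

```r
library(tseries)

set.seed(1)

# Simulated random walk: non-stationary by construction (not the bike data).
random_walk <- cumsum(rnorm(500))

# Test the level series and its first difference.
# suppressWarnings() hides the "p-value smaller/greater than printed p-value" notices.
adf_level <- suppressWarnings(adf.test(random_walk))
adf_diff  <- suppressWarnings(adf.test(diff(random_walk)))

# Null hypothesis: unit root. A small p-value is evidence of stationarity.
cat("p-value, level series:      ", adf_level$p.value, "\n")
cat("p-value, differenced series:", adf_diff$p.value, "\n")
```

The warning "p-value smaller than printed p-value" means the test statistic lies beyond the range of the table adf.test() interpolates from, so the reported 0.01 is an upper bound on the true p-value.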


# Fit ARIMA model on the original series (auto.arima selects the differencing order itself)
fit = auto.arima(bike_ts)
#summary(fit)

# Forecast future values
forecasts = forecast(fit, h = 30)  # Forecast the next 30 days
autoplot(forecasts) +
  labs(title = "Bike Rental Forecast For Next 30 Days",
       x = "Date",
       y = "Bike Rented") +
  theme_minimal()

# Plot the forecast together with the original time series data
forecast_with_original = autoplot(bike_ts) +
  autolayer(forecasts, series = "Forecast") +
  labs(title = "Bike Rental Forecast For Next 30 Days",
       x = "Date",
       y = "Bike Rented") +
  theme_minimal()

# Display the plot
forecast_with_original
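
The forecast above is not compared against held-out data. A minimal sketch of such a check follows: hold out the last 30 observations, fit on the rest, and compare. It uses a simulated series with a weekly pattern rather than the bike data, and the names y, train, test, fit_train, and fc are illustrative assumptions.

```r
library(forecast)

set.seed(123)

# Simulated daily series with a weekly pattern (a stand-in, not the bike data).
n <- 200
y <- ts(100 + 10 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 3), frequency = 7)

# Hold out the last h observations as a test set.
h <- 30
train <- window(y, end = time(y)[n - h])
test  <- window(y, start = time(y)[n - h + 1])

# Fit on the training part only and forecast the held-out horizon.
fit_train <- auto.arima(train)
fc <- forecast(fit_train, h = h)

# Compare forecasts with the held-out values (RMSE, MAE, MAPE, ...).
print(accuracy(fc, test))

# Ljung-Box test on the residuals: a large p-value suggests no leftover autocorrelation.
print(Box.test(residuals(fit_train), lag = 14, type = "Ljung-Box"))
```

Applied to bike_ts, the same split would show how far a 30-day forecast can be trusted before forecasting beyond the end of the data.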

Findings, Conclusions and Forecast

The exploratory analysis suggests that weather conditions influence bike rental behavior. Days with clear weather tend to show higher rental counts, indicating a preference for favorable outdoor conditions.

Looking ahead, the 30-day ARIMA forecast projects demand from past rental counts alone and does not use weather as an input. The expectation that demand will follow seasonal patterns, with potential spikes during periods of clear weather, therefore rests on the exploratory charts rather than on the model itself.