Bike-sharing systems are a new generation of traditional bike rentals
that automate the entire process, from membership to rental and return.
Users can rent a bike at one location and return it at another. These
systems have attracted growing interest because of their role in
reducing traffic congestion, addressing environmental concerns, and
promoting public health.
Beyond these real-world applications, the data generated by
bike-sharing systems has characteristics that make it a compelling
subject for research. Records such as travel duration, departure and
arrival positions, and the total number of rented bikes turn these
systems into a virtual sensor network that can provide insight into
urban mobility. Consequently, monitoring this data is expected to reveal
significant events in the city.
Capital Bikeshare, with more than 4,300 bikes stationed at 500
locations across 7 jurisdictions, is a prominent example of such a
system. Its network gives residents and visitors a convenient,
enjoyable, and affordable way to get from point A to point B, whether
for daily commutes, errands, appointments, or social outings. In this
sense, Capital Bikeshare reflects how accessible transportation is
reshaping city living.
# import libraries
library(dplyr)      # select() and the %>% pipe
library(data.table) # copy()
library(tidyr)
library(lubridate)  # ymd()
library(GGally)
1️⃣ Selecting Columns
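# copy the dataset so the original `bike` data stays untouched
# (assumes `bike` has already been loaded)
prepro <- copy(bike)
# convert the date column to the Date type
prepro$dteday <- ymd(prepro$dteday)
# label each row by its date so observations can be identified later
rownames(prepro) <- prepro$dteday
# keep only the numeric weather columns used for PCA
prepro <- prepro %>% select(temp, atemp, hum, windspeed)
# check for missing values
colSums(is.na(prepro))
#> temp atemp hum windspeed
#> 0 0 0 0
# summary statistics of the selected columns
summary(prepro)
#> temp atemp hum windspeed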
#> Min. :0.05913 Min. :0.07907 Min. :0.0000 Min. :0.02239
#> 1st Qu.:0.33708 1st Qu.:0.33784 1st Qu.:0.5200 1st Qu.:0.13495
#> Median :0.49833 Median :0.48673 Median :0.6267 Median :0.18097
#> Mean :0.49538 Mean :0.47435 Mean :0.6279 Mean :0.19049
#> 3rd Qu.:0.65542 3rd Qu.:0.60860 3rd Qu.:0.7302 3rd Qu.:0.23321
#> Max. :0.86167 Max. :0.84090 Max. :0.9725 Max. :0.50746
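The summary already suggests that windspeed has a long right tail: its maximum (0.50746) lies far above its third quartile (0.23321). As a rough check, the base-R sketch below compares that maximum with Tukey's 1.5 × IQR upper fence and expresses it in standard deviations from the mean. It uses only the rounded values printed above, plus the windspeed variance (0.006005920) from the covariance matrix computed in the next step, so the results are approximate; the variable names (`ws_q1`, `ws_max`, etc.) are illustrative.

```r
# Rounded values copied from the summary(prepro) output above
ws_q1   <- 0.13495      # 1st quartile of windspeed
ws_mean <- 0.19049      # mean of windspeed
ws_q3   <- 0.23321      # 3rd quartile of windspeed
ws_max  <- 0.50746      # maximum of windspeed
ws_var  <- 0.006005920  # windspeed variance (diagonal of the covariance matrix)

# Tukey's rule: values above Q3 + 1.5 * IQR are flagged as potential outliers
upper_fence <- ws_q3 + 1.5 * (ws_q3 - ws_q1)

# How many standard deviations the maximum lies above the mean
z_max <- (ws_max - ws_mean) / sqrt(ws_var)

print(c(upper_fence = upper_fence, z_max = z_max))
ws_max > upper_fence
```

Both checks flag the maximum windspeed as unusually large. With the full data loaded, `bike$dteday[which.max(bike$windspeed)]` would show which date holds that maximum.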
2️⃣ Checking Variances
# covariance matrix of the four variables
var(prepro)
#> temp atemp hum windspeed
#> temp 0.033507667 0.029582662 0.003310151 -0.002240605
#> atemp 0.029582662 0.026556346 0.003249181 -0.002319254
#> hum 0.003310151 0.003249181 0.020286047 -0.002742811
#> windspeed -0.002240605 -0.002319254 -0.002742811 0.006005920
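# PCA on centred but unscaled data (prcomp defaults)
pca <- prcomp(prepro)
# loadings (eigenvectors) of each variable on each PC
pca$rotation
#> PC1 PC2 PC3 PC4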
#> temp 0.73994694 0.10390141 -0.05872877866 -0.661992414
#> atemp 0.65904681 0.07475057 0.00009875324 0.748378005
#> hum 0.11830922 -0.97858880 -0.16830230039 -0.006420051
#> windspeed -0.06433315 0.16118561 -0.98398437817 0.040684000
# variance captured by each PC (eigenvalues) = squared standard deviations
pca$sdev^2
#> [1] 0.0605800482 0.0201381752 0.0054032881 0.0002344684
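Because `prcomp()` centres the data and, by default, does not scale it, the squared standard deviations above should equal the eigenvalues of the covariance matrix printed earlier. The base-R sketch below re-derives them with `eigen()`, assuming that matrix is the sample covariance of the same four columns (its trace, about 0.08636, matches the sum of `pca$sdev^2`); the names `vars`, `S` and `e` are illustrative.

```r
# Covariance matrix copied from the output above (rounded to 9 decimals)
vars <- c("temp", "atemp", "hum", "windspeed")
S <- matrix(c(
   0.033507667,  0.029582662,  0.003310151, -0.002240605,
   0.029582662,  0.026556346,  0.003249181, -0.002319254,
   0.003310151,  0.003249181,  0.020286047, -0.002742811,
  -0.002240605, -0.002319254, -0.002742811,  0.006005920
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Eigen-decomposition of the symmetric covariance matrix;
# eigenvalues come back in decreasing order, like prcomp's components
e <- eigen(S, symmetric = TRUE)
e$values        # compare with pca$sdev^2
sum(diag(S))    # total variance = trace of S = sum of the eigenvalues
# the eigenvectors match pca$rotation up to the sign of each column
e$vectors[, 1]
```

The sign ambiguity is inherent to PCA: an eigenvector and its negation describe the same component.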
# proportion of variance explained by each PC
summary(pca)
#> Importance of components:
#> PC1 PC2 PC3 PC4
#> Standard deviation 0.2461 0.1419 0.07351 0.01531
#> Proportion of Variance 0.7015 0.2332 0.06257 0.00272
#> Cumulative Proportion 0.7015 0.9347 0.99728 1.00000
If we want to retain more than 90% of the information (variance) in the data, we can keep only PC1 and PC2, which together explain 93.47% of the total variance.
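The 90% rule can also be applied programmatically: the cumulative proportion of variance is the running sum of the eigenvalues divided by their total, and we keep the smallest number of PCs whose cumulative proportion reaches the threshold. A minimal sketch using the eigenvalues printed above (the 0.90 threshold and the variable names are illustrative choices):

```r
# Eigenvalues (pca$sdev^2) copied from the output above
eig <- c(0.0605800482, 0.0201381752, 0.0054032881, 0.0002344684)

prop     <- eig / sum(eig)   # proportion of variance explained by each PC
cum_prop <- cumsum(prop)     # cumulative proportion, as in summary(pca)

threshold <- 0.90                     # keep at least 90% of the total variance
k <- which(cum_prop >= threshold)[1]  # smallest number of PCs reaching it

round(cum_prop, 4)
k
```

With the full `pca` object, the reduced data would then be the first `k` columns of the scores, `pca$x[, 1:k]`.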
> The observation on 2011-03-10 has an extreme value in the windspeed variable.
> Humidity (hum) contributes most to PC2.
> Meanwhile, temp and atemp contribute most to PC1.
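The statements about PC1 and PC2 can be read off the loadings matrix. Each column of `pca$rotation` is a unit vector, so the squared loadings in a column sum to 1 and show each variable's share of that component. The sketch below copies the printed loadings (values only; `rot` and `top_var` are illustrative names) and picks the variable with the largest absolute loading on each PC.

```r
# Loadings copied from the pca$rotation output above
rot <- matrix(c(
   0.73994694,  0.10390141, -0.05872877866, -0.661992414,
   0.65904681,  0.07475057,  0.00009875324,  0.748378005,
   0.11830922, -0.97858880, -0.16830230039, -0.006420051,
  -0.06433315,  0.16118561, -0.98398437817,  0.040684000
), nrow = 4, byrow = TRUE,
dimnames = list(c("temp", "atemp", "hum", "windspeed"),
                c("PC1", "PC2", "PC3", "PC4")))

# Variable with the largest absolute loading on each PC (signs are arbitrary)
top_var <- rownames(rot)[apply(abs(rot), 2, which.max)]
names(top_var) <- colnames(rot)
top_var

# Squared loadings: each variable's share of each PC (columns sum to 1)
round(rot^2, 3)
```

For PC1, temp and atemp together account for about 98% of the squared loadings, while hum alone accounts for about 96% of PC2.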