| | |
|---|---|
| Name | trips |
| Number of rows | 365069 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 6 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| from_station_name | 0 | 1.00 | 10 | 43 | 0 | 594 | 0 |
| to_station_name | 0 | 1.00 | 10 | 43 | 0 | 600 | 0 |
| usertype | 0 | 1.00 | 8 | 10 | 0 | 2 | 0 |
| gender | 19711 | 0.95 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| trip_id | 0 | 1.00 | 21960871.66 | 127175.00 | 21742443 | 21848765 | 21961829 | 22071823 | 22178528 |
| bikeid | 0 | 1.00 | 3429.48 | 1923.32 | 1 | 1777 | 3489 | 5157 | 6471 |
| tripduration | 0 | 1.00 | 1016.34 | 27913.51 | 61 | 326 | 524 | 866 | 10628400 |
| from_station_id | 0 | 1.00 | 198.09 | 153.49 | 2 | 76 | 170 | 287 | 665 |
| to_station_id | 0 | 1.00 | 198.58 | 154.47 | 2 | 76 | 168 | 287 | 665 |
| birthyear | 18023 | 0.95 | 1981.67 | 11.25 | 1900 | 1975 | 1985 | 1990 | 2003 |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| start_time | 0 | 1 | 2019-01-01 00:04:37 | 2019-03-31 23:53:48 | 2019-02-25 07:52:56 | 343022 |
| end_time | 0 | 1 | 2019-01-01 00:11:07 | 2019-06-17 16:04:35 | 2019-02-25 08:03:50 | 338367 |
| Usertype | Count |
|---|---|
| Customer | 5934 |
| Subscriber | 339423 |
Subscribers account for far more trips than Customers: 339,423 versus 5,934, or roughly 98% of the trips counted here.
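The size of this imbalance can be worked out directly from the two counts in the table above. The snippet below is a small sketch of that arithmetic; the variable names are illustrative and are not taken from the original analysis.

```r
# Trip counts per user type, copied from the table above.
usertype_counts <- c(Customer = 5934, Subscriber = 339423)

# Share of all counted trips contributed by each user type.
usertype_share <- usertype_counts / sum(usertype_counts)

print(round(usertype_share, 3))
```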
| Usertype | Mean Duration |
|---|---|
| Customer | 37.0 mins |
| Subscriber | 13.9 mins |
Although Subscribers far outnumber Customers, Customers spend much more time on a bike per trip: 37.0 minutes on average versus 13.9 minutes for Subscribers, roughly 2.7 times as long.
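A per-group mean like the one in this table could be computed along the following lines. This is only a sketch in base R: the rows in `trips` are made up for illustration, `tripduration` is assumed to be in seconds (as the skim summary suggests), and the column name `mean_duration_mins` is hypothetical.

```r
# Toy stand-in for `trips`; the rows below are made up for illustration only.
# `tripduration` is assumed to be in seconds.
trips <- data.frame(
  usertype     = c("Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"),
  tripduration = c(1800, 2400, 600, 900, 720)
)

# Mean trip duration per user type, converted from seconds to minutes.
mean_duration <- aggregate(
  tripduration ~ usertype,
  data = trips,
  FUN  = function(seconds) mean(seconds) / 60
)
names(mean_duration)[2] <- "mean_duration_mins"

print(mean_duration)
```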
| gender | Count |
|---|---|
| Female | 66918 |
| Male | 278439 |
A disproportionate share of trips are taken by men: 278,439 versus 66,918 by women, or about 81% of trips with a recorded gender. This suggests that we should focus more on marketing the bikes to women.
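The skim summary reports 19,711 missing `gender` values, so these counts cover only trips where gender was recorded. A minimal sketch of counting with and without the missing values follows; the `gender` vector is made up for illustration.

```r
# Toy stand-in for the `gender` column; values are made up for illustration.
gender <- c("Male", "Female", NA, "Male", "Male", NA, "Female")

# Count each gender and keep missing values visible as their own category.
gender_counts <- table(gender, useNA = "ifany")
print(gender_counts)

# Shares among trips with a recorded gender (table() drops NA by default).
gender_share <- prop.table(table(gender))
print(round(gender_share, 2))
```

Keeping the `NA` category in the first table makes it easier to see how many trips are excluded from the shares.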
| category | count |
|---|---|
| unique bikes | 4755 |
| unique start stations | 587 |
| unique end stations | 592 |
| gender | mean_duration |
|---|---|
| Female | 15 |
| Male | 14 |
These figures show that although men take more trips than women, women spend slightly more time on the bike per trip (a mean duration of 15 versus 14), which suggests that female riders could bring in more revenue.
This brings us to the first recommendation of this case study.
Women spend more time on the bike per trip even though they take far fewer rides than men. Cyclistic should focus on bringing more women to the service.
The general trend of age versus duration is that time on the bike decreases as age increases. We should therefore focus on younger people when adding subscribers.
However, bike usage increases significantly for people aged 75 or older. This could be because of the health benefits of riding a bike.
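An age-versus-duration comparison of this kind needs an age column, which the data does not contain directly. The sketch below assumes age is derived as 2019 minus `birthyear` (the reference year is an assumption) and uses made-up rows.

```r
# Toy stand-in for `trips`; rows are made up for illustration only.
trips <- data.frame(
  birthyear    = c(1990, 1990, 1975, 1944, NA),
  tripduration = c(600, 900, 480, 1200, 700)   # seconds
)

# Assumption: age is measured relative to 2019, the year of the trips.
reference_year <- 2019
trips$age <- reference_year - trips$birthyear

# Mean duration in minutes for each age. Rows with a missing birth year are
# dropped by the default na.action of the formula interface of aggregate().
duration_by_age <- aggregate(
  tripduration ~ age,
  data = trips,
  FUN  = function(seconds) mean(seconds) / 60
)

print(duration_by_age)
```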
Running the code again without the outliers:
From this graph, we can see that the general trend of age versus duration is slightly decreasing for both Customers and Subscribers.
The mean duration for Subscribers is consistently lower than that for Customers. Two points stand out.
Riders aged 16-20 have an unusually high mean duration, as shaded in the graph.
Riders aged 55-70 also have an unusually high mean duration, as shaded in the graph.
This shows that the youngest and the older riders spend considerably more time on the bike, so our business strategy should focus on increasing Subscribers in these age groups.
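The source does not state how outliers were defined. As one possible approach, the sketch below drops implausible ages and very long trips using two hypothetical cut-offs, `max_age` and `max_duration_secs`; the toy rows reuse the extreme values visible in the skim summary (a birth year of 1900 and a trip of 10,628,400 seconds).

```r
# Toy stand-in for `trips`; rows are made up for illustration only.
trips <- data.frame(
  age          = c(25, 34, 119, 61, 47),
  tripduration = c(540, 10628400, 700, 1500, 820)   # seconds
)

# Hypothetical cut-offs; the source does not say which ones were used.
max_age           <- 100            # drops implausible ages such as 119
max_duration_secs <- 24 * 60 * 60   # drops trips longer than one day

trips_clean <- subset(
  trips,
  age <= max_age & tripduration <= max_duration_secs
)

# Number of rows removed by the two filters.
print(nrow(trips) - nrow(trips_clean))
```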
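To compare age groups such as 16-20 and 55-70 numerically, ages can be binned into bands before averaging. The following is a sketch with made-up ages and durations; the band boundaries mirror the groups named above, and the remaining bands are assumptions.

```r
# Made-up ages and trip durations (minutes) for illustration only.
ages          <- c(17, 20, 21, 35, 54, 55, 70, 71)
duration_mins <- c(20, 18, 12, 11, 10, 16, 17, 9)

# Bin ages into bands; intervals are closed on the right, e.g. (15, 20].
age_band <- cut(
  ages,
  breaks = c(15, 20, 54, 70, Inf),
  labels = c("16-20", "21-54", "55-70", "71+"),
  right  = TRUE
)

# Mean duration per age band.
mean_by_band <- tapply(duration_mins, age_band, mean)
print(mean_by_band)
```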
This brings us to the second recommendation of this case study.
Riders aged 16-20 and 55-70 spend more time on the bike than other age groups. Marketing strategies should be devised to attract customers in these age groups.
Next, we check the duration by user type.
Overall, Customers spend more time on the bike on average than Subscribers. This suggests that Subscribers mainly use the bikes for short routes while Customers use them for longer routes. This makes sense if we assume that a Customer's ticket costs the same regardless of the distance traveled from station to station, whereas a Subscriber can be charged less for shorter routes and more for longer ones.
It could also be that the people who ride longer routes are tourists who, because the bike service is not available where they live, see no value in buying a subscription.
Create an incentive program within the subscription to attract people who use the bikes for longer routes.
| | |
|---|---|
| Name | all_trips |
| Number of rows | 3879822 |
| Number of columns | 10 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 3 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| trip_id | 0 | 1 | 8 | 16 | 0 | 3879822 | 0 |
| bikeid | 0 | 1 | 1 | 11 | 0 | 6004 | 0 |
| from_station_name | 0 | 1 | 5 | 43 | 0 | 643 | 0 |
| to_station_name | 1 | 1 | 5 | 43 | 0 | 644 | 0 |
| usertype | 0 | 1 | 8 | 10 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| tripduration | 0 | 1 | 1333.90 | 28354.71 | -9.2 | 328 | 645 | 1225 | 9056633 |
| from_station_id | 0 | 1 | 202.90 | 157.14 | 1.0 | 77 | 174 | 291 | 675 |
| to_station_id | 1 | 1 | 203.76 | 157.19 | 1.0 | 77 | 174 | 291 | 675 |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| start_time | 0 | 1 | 2019-04-01 00:02:22 | 2020-03-31 23:51:34 | 2019-08-14 17:43:38 | 3362333 |
| end_time | 0 | 1 | 2019-04-01 00:09:48 | 2020-05-19 20:10:34 | 2019-08-14 18:02:04 | 3299507 |
## [1] "trip_id" "start_time" "end_time"
## [4] "bikeid" "tripduration" "from_station_id"
## [7] "from_station_name" "to_station_id" "to_station_name"
## [10] "usertype"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.20 5.50 10.80 22.23 20.40 150943.90
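The `summary()` output above appears to be in minutes, whereas the skim table reports `tripduration` in seconds. A quick consistency check, using values copied from the skim table, is sketched below; the variable names are illustrative.

```r
# Values copied from the skim output for all_trips (seconds).
mean_secs <- 1333.90
max_secs  <- 9056633

# Convert to minutes for comparison with the summary() output.
mean_mins <- mean_secs / 60
max_mins  <- max_secs / 60

print(round(c(mean = mean_mins, max = max_mins), 2))
```

Both values agree with the `Mean` and `Max.` columns of the summary after rounding.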
## # A tibble: 1 × 1
## `min(tripduration)`
## <dbl>
## 1 -9.2
| min(tripduration) |
|---|
| 3.016667 |
| statistic | Value |
|---|---|
| Mean | 22.40365 |
| Median | 10.80000 |
| Standard Deviation | 474.39737 |
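The minimum changes from a negative value to 3.016667 between the outputs above, which suggests that invalid trips were filtered out before these statistics were computed. The exact rule is not shown in the source, so the sketch below only assumes that negative durations are removed; the durations are made up.

```r
# Toy durations in minutes, made up for illustration, with one negative value.
duration_mins <- c(-0.2, 3.5, 10.8, 12.0, 45.0)

# Assumption: negative durations are data errors and are removed.
valid <- duration_mins[duration_mins >= 0]

# Summary statistics laid out like the table above.
duration_stats <- data.frame(
  statistic = c("Mean", "Median", "Standard Deviation"),
  value     = c(mean(valid), median(valid), sd(valid))
)

print(duration_stats)
```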
Compared to part 1, this data set contains significantly more bike trips (3,879,822 rows versus 365,069).
The user type and duration data are in line with the data presented in part 1: Customers use the bikes for longer routes while Subscribers use them for shorter routes.
The temporal trend shows that bike usage increases in summer.
Find which days have the most bike use.
The weekday analysis shows that people use bikes more on weekdays than on weekends. This suggests that a significant number of people use the bikes to commute to work.
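Trips per weekday can be counted from `start_time` as in the sketch below. It uses made-up timestamps and derives the weekday from the `wday` component of `POSIXlt`, so the result does not depend on the system locale; the time zone is set to UTC as an assumption.

```r
# Made-up start times for illustration; the time zone is an assumption.
start_time <- as.POSIXct(
  c("2019-04-01 08:15:00", "2019-04-02 17:30:00",
    "2019-04-06 11:00:00", "2019-04-08 07:45:00"),
  tz = "UTC"
)

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(
  day_names[as.POSIXlt(start_time)$wday + 1],
  levels = day_names
)

# Number of trips starting on each day of the week.
trips_per_weekday <- table(weekday)
print(trips_per_weekday)
```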
Monthwise analysis of trips
## [1] "Apr 2019" "May 2019" "Jun 2019" "Jul 2019" "Aug 2019" "Sep 2019"
## [7] "Oct 2019" "Nov 2019" "Dec 2019" "Jan 2020" "Feb 2020" "Mar 2020"
This graph reinforces the fact that bike usage is at its peak during the
summer months.
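Month labels such as "Apr 2019" sort alphabetically by default, which would scramble a month-by-month plot. One way to keep them in chronological order is sketched below with made-up timestamps; `month_year` is a hypothetical column name.

```r
# Made-up start times for illustration; the time zone is an assumption.
start_time <- as.POSIXct(
  c("2020-01-15 09:00:00", "2019-04-03 08:00:00",
    "2019-07-20 18:00:00", "2019-07-04 12:00:00"),
  tz = "UTC"
)
lt <- as.POSIXlt(start_time)

# Build labels such as "Apr 2019" with the locale-independent month.abb constant.
month_label <- paste(month.abb[lt$mon + 1], lt$year + 1900)

# Order the factor levels chronologically rather than alphabetically.
month_order <- unique(month_label[order(start_time)])
month_year  <- factor(month_label, levels = month_order)

print(table(month_year))
```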
The hourly analysis shows that bike usage peaks from 7:00 am to 9:00 am and from 4:00 pm to 6:00 pm. This further supports the idea that people use the bikes to commute to work.
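An hourly breakdown of this kind can be sketched as follows. The timestamps are made up, and the commute windows (start hours 7, 8, 16 and 17) are an assumption that simply mirrors the peaks described above.

```r
# Made-up start times for illustration; the time zone is an assumption.
start_time <- as.POSIXct(
  c("2019-04-01 07:10:00", "2019-04-01 08:45:00", "2019-04-01 12:30:00",
    "2019-04-01 17:05:00", "2019-04-01 17:50:00"),
  tz = "UTC"
)

# Hour of day (0-23) in which each trip starts.
start_hour <- as.POSIXlt(start_time)$hour

# Trips per hour, keeping hours with no trips as zero counts.
trips_per_hour <- table(factor(start_hour, levels = 0:23))

# Assumed commute windows: 7:00-9:00 am and 4:00-6:00 pm.
is_commute    <- start_hour %in% c(7, 8, 16, 17)
commute_share <- mean(is_commute)

print(commute_share)
```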
Create packages for people who commute to work.
Summer is the most popular season for riding bikes.
Combining Seasons and month
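The source does not show how months were mapped to seasons. The sketch below assumes Northern Hemisphere meteorological seasons (for example, June to August as summer) and uses made-up timestamps.

```r
# Assumption: meteorological seasons for the Northern Hemisphere,
# indexed by month number (1 = January, ..., 12 = December).
season_of_month <- c(
  "Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
  "Summer", "Summer", "Fall",   "Fall",   "Fall",   "Winter"
)

# Made-up start times for illustration; the time zone is an assumption.
start_time <- as.POSIXct(
  c("2019-07-04 12:00:00", "2019-12-25 10:00:00",
    "2019-04-15 09:00:00", "2019-08-30 18:00:00"),
  tz = "UTC"
)

# Look up the season for each trip from its month number.
month_number <- as.POSIXlt(start_time)$mon + 1
season <- factor(
  season_of_month[month_number],
  levels = c("Spring", "Summer", "Fall", "Winter")
)

print(table(season))
```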
Based on the analysis in this case study, the following recommendations are made.
Women spend more time on the bike per trip even though they take far fewer rides than men. Cyclistic should focus on bringing more women to the service.
Riders aged 16-20 and 55-70 spend more time on the bike than other age groups. Marketing strategies should be devised to attract customers in these age groups.
Create an incentive program within the subscription to attract people who use the bikes for longer routes.
Create packages for people who commute to work.