Introduction

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

More About Cyclistic

Characters and team

  • Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

  • Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

  • Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.

  • Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Recent Growth

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes,and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Cyclistic’s Goal

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask Smart Questions

Guiding Questions

  1. How do annual members and casual riders use Cyclistic bikes differently?

  2. Why would casual riders buy Cyclistic annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become members?

Assignment

Moreno has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently

Deliverables

I will produce a report with the following deliverables:

  1. A clear statement of the business task

  2. A description of all data sources used

  3. Documentation of any cleaning or manipulation of data

  4. A summary of your analysis

  5. Supporting visualizations and key findings

  6. Your top three recommendations based on your analysis

Prepare

Steps

  1. Download the data for the last 12 months…

  2. Extract the data from zipped Folders.

  3. Import the data into R.

# downloading, extracting and importing 12 data sets...
# Loads relavent packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

# Loading the last 12 months and makes a new column with the month

July22 <- read.csv("202207-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-7-1")

Aug22 <- read.csv("202208-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-8-1")

Sep22 <- read.csv("202209-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-9-1")

Oct22 <- read.csv("202210-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-10-1")

Nov22 <- read.csv("202211-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-11-1")

Dec22 <- read.csv("202212-divvy-tripdata.csv") %>% 
  mutate(month_of = "2022-12-1")

Jan23 <- read.csv("202301-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-1-1")

Feb23 <- read.csv("202302-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-2-1")

Mar23 <- read.csv("202303-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-3-1")

Apr23 <- read.csv("202304-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-4-1")

May23 <- read.csv("202305-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-5-1")

June23 <- read.csv("202206-divvy-tripdata.csv") %>% 
  mutate(month_of = "2023-6-1")

str(c(July22, Aug22, Sep22, Oct22, Nov22, Dec22, Jan23, Feb23, Mar23, Apr23, May23, June23))
## List of 168
##  $ ride_id           : chr [1:823488] "954144C2F67B1932" "292E027607D218B6" "57765852588AD6E0" "B5B6BE44314590E6" ...
##  $ rideable_type     : chr [1:823488] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr [1:823488] "2022-07-05 08:12:47" "2022-07-26 12:53:38" "2022-07-03 13:58:49" "2022-07-31 17:44:21" ...
##  $ ended_at          : chr [1:823488] "2022-07-05 08:24:32" "2022-07-26 12:55:31" "2022-07-03 14:06:32" "2022-07-31 18:42:50" ...
##  $ start_station_name: chr [1:823488] "Ashland Ave & Blackhawk St" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" ...
##  $ start_station_id  : chr [1:823488] "13224" "15541" "15541" "15541" ...
##  $ end_station_name  : chr [1:823488] "Kingsbury St & Kinzie St" "Michigan Ave & 8th St" "Michigan Ave & 8th St" "Woodlawn Ave & 55th St" ...
##  $ end_station_id    : chr [1:823488] "KA1503000043" "623" "623" "TA1307000164" ...
##  $ start_lat         : num [1:823488] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:823488] -87.7 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num [1:823488] 41.9 41.9 41.9 41.8 41.9 ...
##  $ end_lng           : num [1:823488] -87.6 -87.6 -87.6 -87.6 -87.7 ...
##  $ member_casual     : chr [1:823488] "member" "casual" "casual" "casual" ...
##  $ month_of          : chr [1:823488] "2022-7-1" "2022-7-1" "2022-7-1" "2022-7-1" ...
##  $ ride_id           : chr [1:785932] "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
##  $ rideable_type     : chr [1:785932] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr [1:785932] "2022-08-07 21:34:15" "2022-08-08 14:39:21" "2022-08-08 15:29:50" "2022-08-08 02:43:50" ...
##  $ ended_at          : chr [1:785932] "2022-08-07 21:41:46" "2022-08-08 14:53:23" "2022-08-08 15:40:34" "2022-08-08 02:58:53" ...
##  $ start_station_name: chr [1:785932] "" "" "" "" ...
##  $ start_station_id  : chr [1:785932] "" "" "" "" ...
##  $ end_station_name  : chr [1:785932] "" "" "" "" ...
##  $ end_station_id    : chr [1:785932] "" "" "" "" ...
##  $ start_lat         : num [1:785932] 41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num [1:785932] -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num [1:785932] 41.9 41.9 42 42 41.8 ...
##  $ end_lng           : num [1:785932] -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr [1:785932] "casual" "casual" "casual" "casual" ...
##  $ month_of          : chr [1:785932] "2022-8-1" "2022-8-1" "2022-8-1" "2022-8-1" ...
##  $ ride_id           : chr [1:701339] "5156990AC19CA285" "E12D4A16BF51C274" "A02B53CD7DB72DD7" "C82E05FEE872DF11" ...
##  $ rideable_type     : chr [1:701339] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr [1:701339] "2022-09-01 08:36:22" "2022-09-01 17:11:29" "2022-09-01 17:15:50" "2022-09-01 09:00:28" ...
##  $ ended_at          : chr [1:701339] "2022-09-01 08:39:05" "2022-09-01 17:14:45" "2022-09-01 17:16:12" "2022-09-01 09:10:32" ...
##  $ start_station_name: chr [1:701339] "" "" "" "" ...
##  $ start_station_id  : chr [1:701339] "" "" "" "" ...
##  $ end_station_name  : chr [1:701339] "California Ave & Milwaukee Ave" "" "" "" ...
##  $ end_station_id    : chr [1:701339] "13084" "" "" "" ...
##  $ start_lat         : num [1:701339] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:701339] -87.7 -87.6 -87.6 -87.7 -87.7 ...
##  $ end_lat           : num [1:701339] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:701339] -87.7 -87.6 -87.6 -87.7 -87.7 ...
##  $ member_casual     : chr [1:701339] "casual" "casual" "casual" "casual" ...
##  $ month_of          : chr [1:701339] "2022-9-1" "2022-9-1" "2022-9-1" "2022-9-1" ...
##  $ ride_id           : chr [1:558685] "A50255C1E17942AB" "DB692A70BD2DD4E3" "3C02727AAF60F873" "47E653FDC2D99236" ...
##  $ rideable_type     : chr [1:558685] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr [1:558685] "2022-10-14 17:13:30" "2022-10-01 16:29:26" "2022-10-19 18:55:40" "2022-10-31 07:52:36" ...
##  $ ended_at          : chr [1:558685] "2022-10-14 17:19:39" "2022-10-01 16:49:06" "2022-10-19 19:03:30" "2022-10-31 07:58:49" ...
##  $ start_station_name: chr [1:558685] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
##  $ start_station_id  : chr [1:558685] "13290" "13288" "655" "KA1504000133" ...
##  $ end_station_name  : chr [1:558685] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
##  $ end_station_id    : chr [1:558685] "KA1504000079" "13089" "TA1307000140" "620" ...
##  $ start_lat         : num [1:558685] 41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num [1:558685] -87.7 -87.7 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:558685] 41.9 41.9 42 41.9 41.9 ...
##  $ end_lng           : num [1:558685] -87.6 -87.7 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:558685] "member" "casual" "member" "member" ...
##  $ month_of          : chr [1:558685] "2022-10-1" "2022-10-1" "2022-10-1" "2022-10-1" ...
##  $ ride_id           : chr [1:337735] "BCC66FC6FAB27CC7" "772AB67E902C180F" "585EAD07FDEC0152" "91C4E7ED3C262FF9" ...
##  $ rideable_type     : chr [1:337735] "electric_bike" "classic_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr [1:337735] "2022-11-10 06:21:55" "2022-11-04 07:31:55" "2022-11-21 17:20:29" "2022-11-25 17:29:34" ...
##  $ ended_at          : chr [1:337735] "2022-11-10 06:31:27" "2022-11-04 07:46:25" "2022-11-21 17:34:36" "2022-11-25 17:45:15" ...
##  $ start_station_name: chr [1:337735] "Canal St & Adams St" "Canal St & Adams St" "Indiana Ave & Roosevelt Rd" "Indiana Ave & Roosevelt Rd" ...
##  $ start_station_id  : chr [1:337735] "13011" "13011" "SL-005" "SL-005" ...
##  $ end_station_name  : chr [1:337735] "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" ...
##  $ end_station_id    : chr [1:337735] "13016" "13016" "13016" "13016" ...
##  $ start_lat         : num [1:337735] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:337735] -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num [1:337735] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:337735] -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ member_casual     : chr [1:337735] "member" "member" "member" "member" ...
##  $ month_of          : chr [1:337735] "2022-11-1" "2022-11-1" "2022-11-1" "2022-11-1" ...
##  $ ride_id           : chr [1:181806] "65DBD2F447EC51C2" "0C201AA7EA0EA1AD" "E0B148CCB358A49D" "54C5775D2B7C9188" ...
##  $ rideable_type     : chr [1:181806] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr [1:181806] "2022-12-05 10:47:18" "2022-12-18 06:42:33" "2022-12-13 08:47:45" "2022-12-13 18:50:47" ...
##  $ ended_at          : chr [1:181806] "2022-12-05 10:56:34" "2022-12-18 07:08:44" "2022-12-13 08:59:51" "2022-12-13 19:19:48" ...
##  $ start_station_name: chr [1:181806] "Clifton Ave & Armitage Ave" "Broadway & Belmont Ave" "Sangamon St & Lake St" "Shields Ave & 31st St" ...
##  $ start_station_id  : chr [1:181806] "TA1307000163" "13277" "TA1306000015" "KA1503000038" ...
##  $ end_station_name  : chr [1:181806] "Sedgwick St & Webster Ave" "Sedgwick St & Webster Ave" "St. Clair St & Erie St" "Damen Ave & Madison St" ...
##  $ end_station_id    : chr [1:181806] "13191" "13191" "13016" "13134" ...
##  $ start_lat         : num [1:181806] 41.9 41.9 41.9 41.8 41.9 ...
##  $ start_lng         : num [1:181806] -87.7 -87.6 -87.7 -87.6 -87.7 ...
##  $ end_lat           : num [1:181806] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:181806] -87.6 -87.6 -87.6 -87.7 -87.7 ...
##  $ member_casual     : chr [1:181806] "member" "casual" "member" "member" ...
##  $ month_of          : chr [1:181806] "2022-12-1" "2022-12-1" "2022-12-1" "2022-12-1" ...
##  $ ride_id           : chr [1:190301] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:190301] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr [1:190301] "2023-01-21 20:05:42" "2023-01-10 15:37:36" "2023-01-02 07:51:57" "2023-01-22 10:52:58" ...
##  $ ended_at          : chr [1:190301] "2023-01-21 20:16:33" "2023-01-10 15:46:05" "2023-01-02 08:05:11" "2023-01-22 11:01:44" ...
##  $ start_station_name: chr [1:190301] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:190301] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:190301] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:190301] "202480.0" "TA1308000002" "599" "TA1308000002" ...
##  $ start_lat         : num [1:190301] 41.9 41.8 42 41.8 41.8 ...
##  $ start_lng         : num [1:190301] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:190301] 41.9 41.8 42 41.8 41.8 ...
##  $ end_lng           : num [1:190301] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:190301] "member" "member" "casual" "member" ...
##  $ month_of          : chr [1:190301] "2023-1-1" "2023-1-1" "2023-1-1" "2023-1-1" ...
##  $ ride_id           : chr [1:190445] "CBCD0D7777F0E45F" "F3EC5FCE5FF39DE9" "E54C1F27FA9354FF" "3D561E04F739CC45" ...
##   [list output truncated]
# joining all tables into groups

joined_Table <- July22 %>% 
  full_join(Aug22) %>% 
  full_join(Sep22) %>% 
  full_join(Oct22) %>% 
  full_join(Nov22) %>% 
  full_join(Dec22) %>% 
  full_join(Jan23) %>% 
  full_join(Feb23) %>% 
  full_join(Mar23) %>% 
  full_join(Apr23) %>% 
  full_join(May23) %>% 
  full_join(June23)
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
Gpdf <- group_by(joined_Table, ride_id)

Notes

  1. This data has been licensed for public use, any and all PII is to be removed.

  2. This data is quite reliable, this is a lengthy data set, but it could be more original.

  3. Unfortunately this data is a couple years old.

Process

Steps

  1. Check data for errors.

  2. Choose your tools.

  3. Transform the data so you can work with it effectively

  4. Document the cleaning process.

# checking or data  errors

str(Gpdf)
## gropd_df [5,829,030 × 14] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5829030] "954144C2F67B1932" "292E027607D218B6" "57765852588AD6E0" "B5B6BE44314590E6" ...
##  $ rideable_type     : chr [1:5829030] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr [1:5829030] "2022-07-05 08:12:47" "2022-07-26 12:53:38" "2022-07-03 13:58:49" "2022-07-31 17:44:21" ...
##  $ ended_at          : chr [1:5829030] "2022-07-05 08:24:32" "2022-07-26 12:55:31" "2022-07-03 14:06:32" "2022-07-31 18:42:50" ...
##  $ start_station_name: chr [1:5829030] "Ashland Ave & Blackhawk St" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" ...
##  $ start_station_id  : chr [1:5829030] "13224" "15541" "15541" "15541" ...
##  $ end_station_name  : chr [1:5829030] "Kingsbury St & Kinzie St" "Michigan Ave & 8th St" "Michigan Ave & 8th St" "Woodlawn Ave & 55th St" ...
##  $ end_station_id    : chr [1:5829030] "KA1503000043" "623" "623" "TA1307000164" ...
##  $ start_lat         : num [1:5829030] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:5829030] -87.7 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num [1:5829030] 41.9 41.9 41.9 41.8 41.9 ...
##  $ end_lng           : num [1:5829030] -87.6 -87.6 -87.6 -87.6 -87.7 ...
##  $ member_casual     : chr [1:5829030] "member" "casual" "casual" "casual" ...
##  $ month_of          : chr [1:5829030] "2022-7-1" "2022-7-1" "2022-7-1" "2022-7-1" ...
##  - attr(*, "groups")= tibble [5,829,030 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ ride_id: chr [1:5829030] "00000179CF2C4FB5" "0000038F578A7278" "0000047373295F85" "000004C3185FDDE9" ...
##   ..$ .rows  : list<int> [1:5829030] 
##   .. ..$ : int 198295
##   .. ..$ : int 3220650
##   .. ..$ : int 123129
##   .. ..$ : int 411241
##   .. ..$ : int 817026
##   .. ..$ : int 4318802
##   .. ..$ : int 4647802
##   .. ..$ : int 5678983
##   .. ..$ : int 868468
##   .. ..$ : int 2008896
##   .. ..$ : int 5250720
##   .. ..$ : int 2530900
##   .. ..$ : int 2182141
##   .. ..$ : int 2031471
##   .. ..$ : int 2891352
##   .. ..$ : int 668916
##   .. ..$ : int 1848063
##   .. ..$ : int 5802143
##   .. ..$ : int 5722675
##   .. ..$ : int 1231405
##   .. ..$ : int 1732832
##   .. ..$ : int 3399487
##   .. ..$ : int 1862317
##   .. ..$ : int 3827817
##   .. ..$ : int 4361065
##   .. ..$ : int 4283729
##   .. ..$ : int 3201348
##   .. ..$ : int 5216764
##   .. ..$ : int 1606843
##   .. ..$ : int 3873943
##   .. ..$ : int 5651268
##   .. ..$ : int 1800450
##   .. ..$ : int 5719231
##   .. ..$ : int 3869144
##   .. ..$ : int 134244
##   .. ..$ : int 5367862
##   .. ..$ : int 199381
##   .. ..$ : int 3985032
##   .. ..$ : int 3538638
##   .. ..$ : int 4480460
##   .. ..$ : int 4806987
##   .. ..$ : int 2146109
##   .. ..$ : int 3792746
##   .. ..$ : int 2338144
##   .. ..$ : int 764534
##   .. ..$ : int 2973334
##   .. ..$ : int 3065507
##   .. ..$ : int 5459366
##   .. ..$ : int 2641924
##   .. ..$ : int 1932427
##   .. ..$ : int 5654267
##   .. ..$ : int 32160
##   .. ..$ : int 3856104
##   .. ..$ : int 272622
##   .. ..$ : int 4269110
##   .. ..$ : int 3395065
##   .. ..$ : int 5174023
##   .. ..$ : int 4488584
##   .. ..$ : int 2270782
##   .. ..$ : int 2006316
##   .. ..$ : int 3672780
##   .. ..$ : int 5139015
##   .. ..$ : int 2230362
##   .. ..$ : int 5370163
##   .. ..$ : int 4813389
##   .. ..$ : int 4485059
##   .. ..$ : int 5696136
##   .. ..$ : int 1019741
##   .. ..$ : int 2165643
##   .. ..$ : int 4904937
##   .. ..$ : int 1184329
##   .. ..$ : int 2999157
##   .. ..$ : int 2275918
##   .. ..$ : int 5204228
##   .. ..$ : int 3616756
##   .. ..$ : int 1351841
##   .. ..$ : int 726404
##   .. ..$ : int 2806506
##   .. ..$ : int 2607128
##   .. ..$ : int 1777403
##   .. ..$ : int 4984902
##   .. ..$ : int 5300067
##   .. ..$ : int 5045031
##   .. ..$ : int 4281036
##   .. ..$ : int 1720367
##   .. ..$ : int 1170080
##   .. ..$ : int 5444591
##   .. ..$ : int 3199678
##   .. ..$ : int 5088301
##   .. ..$ : int 2313441
##   .. ..$ : int 3464335
##   .. ..$ : int 2017472
##   .. ..$ : int 4029694
##   .. ..$ : int 4841786
##   .. ..$ : int 4794065
##   .. ..$ : int 836783
##   .. ..$ : int 2921866
##   .. ..$ : int 2674658
##   .. ..$ : int 1718414
##   .. .. [list output truncated]
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
# appears started_at, ended_at, month_of are wrong data types... Fetching more information on other columns.

Selected_Joined_Table <- select(Gpdf, start_station_name, start_station_id, end_station_name, end_station_id)
## Adding missing grouping variables: `ride_id`
str(unique(Selected_Joined_Table))
## gropd_df [5,829,030 × 5] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5829030] "954144C2F67B1932" "292E027607D218B6" "57765852588AD6E0" "B5B6BE44314590E6" ...
##  $ start_station_name: chr [1:5829030] "Ashland Ave & Blackhawk St" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" ...
##  $ start_station_id  : chr [1:5829030] "13224" "15541" "15541" "15541" ...
##  $ end_station_name  : chr [1:5829030] "Kingsbury St & Kinzie St" "Michigan Ave & 8th St" "Michigan Ave & 8th St" "Woodlawn Ave & 55th St" ...
##  $ end_station_id    : chr [1:5829030] "KA1503000043" "623" "623" "TA1307000164" ...
##  - attr(*, "groups")= tibble [5,829,030 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ ride_id: chr [1:5829030] "00000179CF2C4FB5" "0000038F578A7278" "0000047373295F85" "000004C3185FDDE9" ...
##   ..$ .rows  : list<int> [1:5829030] 
##   .. ..$ : int 198295
##   .. ..$ : int 3220650
##   .. ..$ : int 123129
##   .. ..$ : int 411241
##   .. ..$ : int 817026
##   .. ..$ : int 4318802
##   .. ..$ : int 4647802
##   .. ..$ : int 5678983
##   .. ..$ : int 868468
##   .. ..$ : int 2008896
##   .. ..$ : int 5250720
##   .. ..$ : int 2530900
##   .. ..$ : int 2182141
##   .. ..$ : int 2031471
##   .. ..$ : int 2891352
##   .. ..$ : int 668916
##   .. ..$ : int 1848063
##   .. ..$ : int 5802143
##   .. ..$ : int 5722675
##   .. ..$ : int 1231405
##   .. ..$ : int 1732832
##   .. ..$ : int 3399487
##   .. ..$ : int 1862317
##   .. ..$ : int 3827817
##   .. ..$ : int 4361065
##   .. ..$ : int 4283729
##   .. ..$ : int 3201348
##   .. ..$ : int 5216764
##   .. ..$ : int 1606843
##   .. ..$ : int 3873943
##   .. ..$ : int 5651268
##   .. ..$ : int 1800450
##   .. ..$ : int 5719231
##   .. ..$ : int 3869144
##   .. ..$ : int 134244
##   .. ..$ : int 5367862
##   .. ..$ : int 199381
##   .. ..$ : int 3985032
##   .. ..$ : int 3538638
##   .. ..$ : int 4480460
##   .. ..$ : int 4806987
##   .. ..$ : int 2146109
##   .. ..$ : int 3792746
##   .. ..$ : int 2338144
##   .. ..$ : int 764534
##   .. ..$ : int 2973334
##   .. ..$ : int 3065507
##   .. ..$ : int 5459366
##   .. ..$ : int 2641924
##   .. ..$ : int 1932427
##   .. ..$ : int 5654267
##   .. ..$ : int 32160
##   .. ..$ : int 3856104
##   .. ..$ : int 272622
##   .. ..$ : int 4269110
##   .. ..$ : int 3395065
##   .. ..$ : int 5174023
##   .. ..$ : int 4488584
##   .. ..$ : int 2270782
##   .. ..$ : int 2006316
##   .. ..$ : int 3672780
##   .. ..$ : int 5139015
##   .. ..$ : int 2230362
##   .. ..$ : int 5370163
##   .. ..$ : int 4813389
##   .. ..$ : int 4485059
##   .. ..$ : int 5696136
##   .. ..$ : int 1019741
##   .. ..$ : int 2165643
##   .. ..$ : int 4904937
##   .. ..$ : int 1184329
##   .. ..$ : int 2999157
##   .. ..$ : int 2275918
##   .. ..$ : int 5204228
##   .. ..$ : int 3616756
##   .. ..$ : int 1351841
##   .. ..$ : int 726404
##   .. ..$ : int 2806506
##   .. ..$ : int 2607128
##   .. ..$ : int 1777403
##   .. ..$ : int 4984902
##   .. ..$ : int 5300067
##   .. ..$ : int 5045031
##   .. ..$ : int 4281036
##   .. ..$ : int 1720367
##   .. ..$ : int 1170080
##   .. ..$ : int 5444591
##   .. ..$ : int 3199678
##   .. ..$ : int 5088301
##   .. ..$ : int 2313441
##   .. ..$ : int 3464335
##   .. ..$ : int 2017472
##   .. ..$ : int 4029694
##   .. ..$ : int 4841786
##   .. ..$ : int 4794065
##   .. ..$ : int 836783
##   .. ..$ : int 2921866
##   .. ..$ : int 2674658
##   .. ..$ : int 1718414
##   .. .. [list output truncated]
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
# end_station_id, start_station_id type chr, when type should be either num or int
# received a warning, it would be better to leave as chr to not remove lots of values.

#  -cleaning data-

joined_Table <- na.omit(distinct(joined_Table))

mut_Joined <- joined_Table %>% 
  mutate(started_at = as.Date(started_at), ended_at = as.Date(ended_at),
         month_of = as.Date(month_of))

# filtering/sampling the data because the dataset is to large, tried with a function but
# it wasn't loading.

July22 <- filter(joined_Table, month_of == "2022-7-1")
July22 <- July22[sample(nrow(July22), 100), ]

Aug22 <- filter(joined_Table, month_of == "2022-8-1")
Aug22 <- Aug22[sample(nrow(Aug22), 100), ]

Sep22 <- filter(joined_Table, month_of == "2022-9-1")
Sep22 <- Sep22[sample(nrow(Sep22), 100), ]

Oct22 <- filter(joined_Table, month_of == "2022-10-1")
Oct22 <- Oct22[sample(nrow(Oct22), 100), ]

Nov22 <- filter(joined_Table, month_of == "2022-11-1")
Nov22 <- Nov22[sample(nrow(Nov22), 100), ]

Dec22 <- filter(joined_Table, month_of == "2022-12-1")
Dec22 <- Dec22[sample(nrow(Dec22), 100), ]

Jan23 <- filter(joined_Table, month_of == "2023-1-1")
Jan23 <- Jan23[sample(nrow(Jan23), 100), ]

Feb23 <- filter(joined_Table, month_of == "2023-2-1")
Feb23 <- Feb23[sample(nrow(Feb23), 100), ]

Mar23 <- filter(joined_Table, month_of == "2023-3-1")
Mar23 <- Mar23[sample(nrow(Mar23), 100), ]

Apr23 <- filter(joined_Table, month_of == "2023-4-1")
Apr23 <- Apr23[sample(nrow(Apr23), 100), ]

May23 <- filter(joined_Table, month_of == "2023-5-1")
May23 <- May23[sample(nrow(May23), 100), ]

June23 <- filter(joined_Table, month_of == "2023-6-1")
June23 <- June23[sample(nrow(June23), 100), ]

Graph_TB <- joined_Table3 <- July22 %>% 
  full_join(Aug22) %>% 
  full_join(Sep22) %>% 
  full_join(Oct22) %>% 
  full_join(Nov22) %>% 
  full_join(Dec22) %>% 
  full_join(Jan23) %>% 
  full_join(Feb23) %>% 
  full_join(Mar23) %>% 
  full_join(Apr23) %>% 
  full_join(May23) %>% 
  full_join(June23)
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
## Joining with `by = join_by(ride_id, rideable_type, started_at, ended_at,
## start_station_name, start_station_id, end_station_name, end_station_id,
## start_lat, start_lng, end_lat, end_lng, member_casual, month_of)`
unique(joined_Table3$rideable_type)
## [1] "electric_bike" "classic_bike"  "docked_bike"
unique(joined_Table3$member_casual)
## [1] "member" "casual"
# Convert member_casual and month_of to numeric type using if-else

for (i in 1:nrow(joined_Table3)) 
{
if(class(joined_Table3$member_casual) == "character")
  
{
joined_Table3$member_casual  <- as.integer(ifelse((joined_Table3$member_casual) == "member", 1, 0))
}

if(class(joined_Table3$rideable_type) == "character")
{
joined_Table3$rideable_type <- as.integer(ifelse((joined_Table3$rideable_type) == "classic_bike", 1, 
                                     ifelse((joined_Table3$rideable_type) == "electric_bike", 2, 
                                            ifelse((joined_Table3$rideable_type) == "docked_bike", 3, NA))))
}
  
if(class(joined_Table3$month_of) == "character")
{
  joined_Table3$month_of <- as.integer(ifelse((joined_Table3$month_of) == "2022-7-1", 1,
                                   ifelse((joined_Table3$month_of) == "2022-8-1", 2,
                                          ifelse((joined_Table3$month_of) == "2022-9-1", 3,
                                                 ifelse((joined_Table3$month_of) == "2022-10-1", 4,
                                                        ifelse((joined_Table3$month_of) == "2022-11-1", 5,
                                                               ifelse((joined_Table3$month_of) == "2022-12-1", 6,
                                                                      ifelse((joined_Table3$month_of) == "2023-1-1", 7,
                                                                             ifelse((joined_Table3$month_of) == "2023-2-1", 8,
                                                                                    ifelse((joined_Table3$month_of) == "2023-3-1", 9,
                                                                                           ifelse((joined_Table3$month_of) == "2023-4-1", 10,
                                                                                                  ifelse((joined_Table3$month_of) == "2023-5-1", 11,
                                                                                                         ifelse((joined_Table3$month_of) == "2023-6-1", 12, NA)))))))))))))
}

}

# testing the code chunk

unique(joined_Table3$rideable_type)
## [1] 2 1 3
unique(joined_Table3$member_casual)
## [1] 1 0
unique(joined_Table3$month_of)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12

Notes

  1. I checked my data for errors, I checked my cleaning methods and they were good, along side my filters.

  2. I chose R already to start my RMD file, I did this because the data I am working with is to large to import into spreadsheets.

  3. My data has already been filtered and sorted to be worked with.

Analyze

Steps

  1. Aggregate your data so it’s useful and accessible.

  2. Organize and format your data.

  3. Perform calculations.

  4. Identify trends and relationships.

# Run ANOVA on cleaned and formatted data

my_anova1 <- aov(rideable_type ~ member_casual, data = joined_Table3)
my_anova2 <- aov(rideable_type ~ month_of, data = joined_Table3)
my_anova3 <- aov(member_casual ~ month_of, data = joined_Table3)

# Print summary of ANOVA to detirmine any possible relationships
summary(my_anova1)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## member_casual    1   17.6  17.588    61.3 1.07e-14 ***
## Residuals     1198  343.7   0.287                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(my_anova2)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## month_of       1    1.1  1.1457   3.811 0.0512 .
## Residuals   1198  360.2  0.3007                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(my_anova3)
##               Df Sum Sq Mean Sq F value  Pr(>F)   
## month_of       1   1.57  1.5734   7.008 0.00822 **
## Residuals   1198 268.97  0.2245                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# creating graphs

graph1 <- ggplot(Graph_TB, aes(x = member_casual))+ 
    geom_bar(fill ="steelblue")+
  labs(title = "Membership Status")
  
graph2 <- ggplot(Graph_TB, aes(x = rideable_type))+ 
    geom_bar(fill ="darkolivegreen4")+
   labs(title = "Popular Bikes")
   

graph3 <- ggplot(Graph_TB, aes(x = rideable_type, fill = member_casual)) +
  geom_bar(stat = "count", position = "stack") +
  labs(title = "Who uses what?", x = "Rideable Type", y = "Count")

Notes

There’s a clear relationship to members/casual users and bike choice. It’s also seen that most people are already members, and casual users dock their bikes.

Share

Findings

In the following graph, we can see that there is a majority of members already…

In the next graph we have, it’s shown that the most popular bikes are electric.

In this last graph, it’s shown that members choose to use classic bikes and electric bikes more then docking their own.

Recommendations

Make docking fees higher

This may encourage customers to get a membership instead.

Launch campaigns

Show off the newest high tech electric bikes, and pedal bikes.

Utilize promotional campaigns.

Lowering the cost of membership can allow the people who wont buy at a higher price to finally get in.

End Notes

Thanks for reading, this is my second case study as a beginner in R, and a beginner certified Data Analyst.

Each case study I strive to grow with R, and Google.