INTRODUCTION

This capstone project is my first project in my Google Data Analytics Professional Certificate course. In this case study I will be analyzing a public dataset for a fictional company provided by the course. I will be using R programming language for this analysis because of its easy statistical analysis tools and data visualizations.

The following data analysis steps will be followed:

Ask,
Prepare,
Process,
Analyze,
Share,
Act.

The case study roadmap below will be followed on each step:

Code, when needed.
Key tasks.
Deliverables.  

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations. ASK

Three questions will guide the future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Lily Moreno (the director of marketing and my manager) has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Key tasks

Identify the business task

The main objective is to design marketing strategies aimed at converting casual riders to annual members by understanding how they differ.

Consider key stakeholders

Director of Marketing (Lily Moreno), Marketing Analytics team, Executive team.

Deliverable

A clear statement of the business task

To find the differences between the casual riders and annual members.

PREPARE

I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. Datasets are available here link.

Key tasks

Download data and store it appropriately.

Data has been downloaded and copies have been stored securely on my computer and here on Kaggle.
  1. Identify how it’s organized.

    The data is in CSV (comma-separated values) format, and there are a total of 13 columns.

  2. Sort and filter the data.

    For this analysis, I will be using data for the last 12 months (Feb 2021 - Jan 2022) because it is more current.

  3. Determine the credibility of the data.

Deliverable A description of all data sources used Main source of data provided by the Cylistic company.

Install and load necessary packages

install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
install.packages("readr")
library(tidyverse)
library(lubridate)
library(ggplot2)
library(readr)

select directory for load file.

setwd("D:/Project/Project 1/Original")

#===================== # STEP 1: COLLECT DATA #=====================

# Upload Divvy datasets (csv files) here
y_2013 <- read.csv("Divvy_Trips_2013.csv")
q1_2_2014 <- read_csv("Divvy_Trips_2014_Q1Q2.csv")
q3_7_2014 <- read_csv("Divvy_Trips_2014-Q3-07.csv")
q3_8_9_2014 <- read_csv("Divvy_Trips_2014-Q3-07.csv")
q4_2014 <- read_csv("Divvy_Trips_2014-Q4.csv")
q1_2015 <- read_csv("Divvy_Trips_2014-Q4.csv")
q2_2015 <- read_csv("Divvy_Trips_2015-Q2.csv")
q3_7_2015 <- read_csv("Divvy_Trips_2015_07.csv")
q3_8_2015 <- read_csv("Divvy_Trips_2015_08.csv")
q3_9_2015 <- read_csv("Divvy_Trips_2015_09.csv")
q4_2015 <- read_csv("Divvy_Trips_2015_Q4.csv")
q1_2016 <- read_csv("Divvy_Trips_2016_Q1.csv")
q2_4_2016 <- read_csv("Divvy_Trips_2016_04.csv")
q2_5_2016 <- read_csv("Divvy_Trips_2016_05.csv")
q2_6_2016 <- read_csv("Divvy_Trips_2016_06.csv")
q3_2016 <- read_csv("Divvy_Trips_2016_Q3.csv")
q4_2016 <- read_csv("Divvy_Trips_2016_Q4.csv")
q1_2017 <- read_csv("Divvy_Trips_2017_Q1.csv")
q2_2017 <- read_csv("Divvy_Trips_2017_Q2.csv")
q3_2017 <- read_csv("Divvy_Trips_2017_Q3.csv")
q4_2017 <- read_csv("Divvy_Trips_2017_Q4.csv")
q1_2018 <- read_csv("Divvy_Trips_2018_Q1.csv")
q2_2018 <- read_csv("Divvy_Trips_2018_Q2.csv")
q3_2018 <- read_csv("Divvy_Trips_2018_Q3.csv")
q4_2018 <- read_csv("Divvy_Trips_2018_Q4.csv")
q1_2019 <- read_csv("Divvy_Trips_2019_Q1.csv")
q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
q3_2019 <- read_csv("Divvy_Trips_2019_Q3.csv")
q4_2019 <- read_csv("Divvy_Trips_2019_Q4.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")
q2_4_2020 <- read_csv("202004-divvy-tripdata.csv")
q2_5_2020 <- read_csv("202005-divvy-tripdata.csv")
q2_6_2020 <- read_csv("202006-divvy-tripdata.csv")
q3_7_2020 <- read_csv("202301_divvy_tripdata.csv")
q3_8_2020 <- read_csv("202008_divvy_tripdata.csv")
q3_9_2020 <- read_csv("202009_divvy_tripdata.csv")
q4_10_2020 <- read_csv("202010-divvy-tripdata.csv")
q4_11_2020 <- read_csv("202011-divvy-tripdata.csv")
q4_12_2020 <- read_csv("202012-divvy-tripdata.csv")
q1_1_2021 <- read_csv("202101-divvy-tripdata.csv")
q1_2_2021 <- read_csv("202102-divvy-tripdata.csv")
q1_3_2021 <- read_csv("202103-divvy-tripdata.csv")
q2_4_2021 <- read_csv("202104-divvy-tripdata.csv")
q2_5_2021 <- read_csv("202105-divvy-tripdata.csv")
q2_6_2021 <- read_csv("202106-divvy-tripdata.csv")
q3_7_2021 <- read_csv("202107-divvy-tripdata.csv")
q3_8_2021 <- read_csv("202108-divvy-tripdata.csv")
q3_9_2021 <- read.csv("202109-divvy-tripdata.csv")
q4_10_2021 <- read.csv("202110-divvy-tripdata.csv")
q4_11_2021 <- read.csv("202111-divvy-tripdata.csv")
q4_12_2021 <- read.csv("202112-divvy-tripdata.csv")
q1_1_2022 <- read_csv("202201-divvy-tripdata.csv")
q1_2_2022 <- read_csv("202202-divvy-tripdata.csv")
q1_3_2022 <- read_csv("202203-divvy-tripdata.csv")
q2_4_2022 <- read_csv("202204_divvy_tripdata.csv")
q2_5_2022 <- read_csv("202205_divvy_tripdata.csv")
q2_6_2022 <- read_csv("202206_divvy_tripdata.csv")
q3_7_2022 <- read_csv("202207_divvy_tripdata.csv")
q3_8_2022 <- read_csv("202208_divvy_tripdata.csv")
q3_9_2022 <- read.csv("202209_divvy_tripdata.csv")
q4_10_2022 <- read.csv("202210-divvy-tripdata.csv")
q4_11_2022 <- read.csv("202211-divvy-tripdata.csv")
q4_12_2022 <- read.csv("202212-divvy-tripdata.csv")
q1_1_2023 <- read.csv("202301_divvy_tripdata.csv")
q1_2_2023 <- read.csv("202302_divvy_tripdata.csv")
q1_3_2023 <- read.csv("202303_divvy_tripdata.csv")

I ensured that the data sets have the same number of columns and same column names with some extra coumn names which remove further to remove unwanted colmn. Before going forward to merge them, I also checked for inconsistencies.

Merge monthly data set in to queterly for every years for easy refrence. Some data types not match so, also change data type for started_at and ended_at colmn for find out ride_duration for further analysis.

q3_2014 <- bind_rows(q3_7_2014,q3_8_9_2014)
q3_2015 <- bind_rows(q3_7_2015,q3_8_2015,q3_9_2015)
q2_2016 <- bind_rows(q2_4_2016,q2_5_2016,q2_6_2016)
q2_4_2020 <-  mutate(q2_5_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_4_2020 <-  mutate(q2_5_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_5_2020 <-  mutate(q2_5_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_5_2020 <-  mutate(q2_5_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_6_2020 <-  mutate(q2_6_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_6_2020 <-  mutate(q2_6_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2020 <- bind_rows(q2_4_2020,q2_5_2020,q2_6_2020)
q3_7_2020 <-  mutate(q3_7_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_7_2020 <-  mutate(q3_7_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, start_station_id = as.character(start_station_id))
q3_8_2020 <-  mutate(q3_8_2020, end_station_id = as.character(end_station_id))
q3_9_2020 <-  mutate(q3_9_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2020 <-  mutate(q3_9_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2020 <-  mutate(q3_9_2020, start_station_id = as.character(start_station_id))
q3_9_2020 <-  mutate(q3_9_2020, end_station_id = as.character(end_station_id))
q3_2020 <- bind_rows(q3_7_2020,q3_8_2020,q3_9_2020)
q4_10_2020 <-  mutate(q4_10_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_10_2020 <-  mutate(q4_10_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_11_2020 <-  mutate(q4_11_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_11_2020 <-  mutate(q4_11_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_12_2020 <-  mutate(q4_12_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_12_2020 <-  mutate(q4_12_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_10_2020 <- mutate(q4_10_2020, start_station_id = as.character(start_station_id))
q4_11_2020 <- mutate(q4_11_2020, start_station_id = as.character(start_station_id))
q4_12_2020 <- mutate(q4_12_2020, start_station_id = as.character(start_station_id))
q4_10_2020 <- mutate(q4_10_2020, end_station_id = as.character(end_station_id))
q4_11_2020 <- mutate(q4_11_2020, end_station_id = as.character(end_station_id))
q4_12_2020 <- mutate(q4_12_2020, end_station_id = as.character(end_station_id))
q4_2020 <- bind_rows(q4_10_2020,q4_11_2020,q4_12_2020)
q1_2021 <- bind_rows(q1_1_2021,q1_2_2021,q1_3_2021)
q2_2021 <- bind_rows(q2_4_2021,q2_5_2021,q2_6_2021)
q3_9_2021 <-  mutate(q3_9_2021, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2021 <-  mutate(q3_9_2021, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2021 <- bind_rows(q3_7_2021,q3_8_2021,q3_9_2021)
q4_2021 <- bind_rows(q4_10_2021,q4_11_2021,q4_12_2021)
q1_2022 <- bind_rows(q1_1_2022,q1_2_2022,q1_3_2022)
q2_2022 <- bind_rows(q2_4_2022,q2_5_2022,q2_6_2022)
q3_7_2022 <-  mutate(q3_7_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_7_2022 <-  mutate(q3_7_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2022 <-  mutate(q3_8_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2022 <-  mutate(q3_8_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2022 <-  mutate(q3_9_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2022 <-  mutate(q3_9_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2022 <- bind_rows(q3_7_2022,q3_8_2022,q3_9_2022)
q4_2022 <- bind_rows(q4_10_2022,q4_11_2022,q4_12_2022)
q1_2023 <- bind_rows(q1_1_2023,q1_2_2023,q1_3_2023)

#==================================================== # STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE #==================================================== Compare column names each of the files While the names don’t have to be in the same order, they DO need to match perfectly before we can use a command to join them into one file

colnames(y_2013)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthday"         
colnames(q1_2_2014)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q3_2014)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2014)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q1_2015)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q1_2016)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q1_2017)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q1_2018)
 [1] "01 - Rental Details Rental ID"                    "01 - Rental Details Local Start Time"            
 [3] "01 - Rental Details Local End Time"               "01 - Rental Details Bike ID"                     
 [5] "01 - Rental Details Duration In Seconds Uncapped" "03 - Rental Start Station ID"                    
 [7] "03 - Rental Start Station Name"                   "02 - Rental End Station ID"                      
 [9] "02 - Rental End Station Name"                     "User Type"                                       
[11] "Member Gender"                                    "05 - Member Details Member Birthday Year"        
colnames(q1_2019)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q2_2015)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q2_2016)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q2_2017)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q2_2018)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q2_2019)
 [1] "01 - Rental Details Rental ID"                    "01 - Rental Details Local Start Time"            
 [3] "01 - Rental Details Local End Time"               "01 - Rental Details Bike ID"                     
 [5] "01 - Rental Details Duration In Seconds Uncapped" "03 - Rental Start Station ID"                    
 [7] "03 - Rental Start Station Name"                   "02 - Rental End Station ID"                      
 [9] "02 - Rental End Station Name"                     "User Type"                                       
[11] "Member Gender"                                    "05 - Member Details Member Birthday Year"        
colnames(q3_2015)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q3_2016)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q3_2017)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q3_2018)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q3_2019)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2015)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2016)
 [1] "trip_id"           "starttime"         "stoptime"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2017)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2018)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q4_2019)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"            "tripduration"     
 [6] "from_station_id"   "from_station_name" "to_station_id"     "to_station_name"   "usertype"         
[11] "gender"            "birthyear"        
colnames(q1_2020)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q1_2022)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q1_2023)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q2_2020)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q2_2022)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q3_2020)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q3_2022)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q4_2020)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q4_2022)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q1_2021)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q2_2021)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q3_2021)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     
colnames(q4_2021)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"           "start_station_name"
 [6] "start_station_id"   "end_station_name"   "end_station_id"     "start_lat"          "start_lng"         
[11] "end_lat"            "end_lng"            "member_casual"     

Rename columns to make them consistent with q1_2020 to q1_2023 (as this will be the supposed going-forward table design for Divvy)

#Rename columns  to make them consistent with q1_2020 to q1_2023 (as this will be the supposed going-forward table design for Divvy)
(y_2013 <- rename(y_2013
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2_2014 <- rename(q1_2_2014
                  ,ride_id = trip_id
                  ,rideable_type = bikeid 
                  ,started_at = starttime  
                  ,ended_at = stoptime  
                  ,start_station_name = from_station_name 
                  ,start_station_id = from_station_id 
                  ,end_station_name = to_station_name 
                  ,end_station_id = to_station_id 
                  ,member_casual = usertype))

(q3_2014 <- rename(q3_2014
                     ,ride_id = trip_id
                     ,rideable_type = bikeid 
                     ,started_at = starttime  
                     ,ended_at = stoptime  
                     ,start_station_name = from_station_name 
                     ,start_station_id = from_station_id 
                     ,end_station_name = to_station_name 
                     ,end_station_id = to_station_id 
                     ,member_casual = usertype))
(q4_2014 <- rename(q4_2014
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2015 <- rename(q1_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))
(q1_2016 <- rename(q1_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2017 <- rename(q1_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2018 <- rename(q1_2018
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

(q1_2019 <- rename(q1_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2015 <- rename(q2_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2016 <- rename(q2_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2017 <- rename(q2_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2018 <- rename(q2_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2019 <- rename(q2_2019
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

(q3_2015 <- rename(q3_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2016 <- rename(q3_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2017 <- rename(q3_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2018 <- rename(q3_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2015 <- rename(q4_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2016 <- rename(q4_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2017 <- rename(q4_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2018 <- rename(q4_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2019 <- rename(q4_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2019 <- rename(q3_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))
#Change Datatype of colmn, So they stack correctly
q1_2017 <- mutate(q1_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2017 <- mutate(q1_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2018 <- mutate(q1_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2018 <- mutate(q1_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2019 <- mutate(q1_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2019 <- mutate(q1_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2020 <- mutate(q1_2020,ride_id = as.character(ride_id))
q1_2017 <- mutate(q1_2017,ride_id = as.character(ride_id))
q1_2016 <- mutate(q1_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2016 <- mutate(q1_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2016 <- mutate(q1_2016,ride_id = as.character(ride_id))
q1_2015 <- mutate(q1_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2015 <- mutate(q1_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2015 <- mutate(q1_2015,ride_id = as.character(ride_id))
q1_2_2014 <- mutate(q1_2_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ride_id = as.character(ride_id))
y_2013 <- mutate(y_2013,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,ride_id = as.character(ride_id))
q1_2018 <- mutate(q1_2018,ride_id = as.character(ride_id))
q1_2019 <- mutate(q1_2019,ride_id = as.character(ride_id))
q1_2020 <- mutate(q1_2020,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2020 <- mutate(q1_2020,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,rideable_type = as.character(rideable_type))
y_2013 <- mutate(y_2013,rideable_type = as.character(rideable_type))
q1_2015 <- mutate(q1_2015,rideable_type = as.character(rideable_type))
q1_2016 <- mutate(q1_2016,rideable_type = as.character(rideable_type))
q1_2017 <- mutate(q1_2017,rideable_type = as.character(rideable_type))
q1_2018 <- mutate(q1_2018,rideable_type = as.character(rideable_type))
q1_2019 <- mutate(q1_2019,rideable_type = as.character(rideable_type))
q1_2022 <- mutate(q1_2022,rideable_type = as.character(rideable_type))
q1_2022 <- mutate(q1_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2022 <- mutate(q1_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ride_id = as.character(ride_id))
q2_2015 <- mutate(q2_2015,ride_id = as.character(ride_id))
q2_2015 <- mutate(q2_2015,rideable_type = as.character(rideable_type))
q2_2015 <- mutate(q2_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2015 <- mutate(q2_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2016 <- mutate(q2_2016,ride_id = as.character(ride_id))
q2_2016 <- mutate(q2_2016,rideable_type = as.character(rideable_type))
q2_2016 <- mutate(q2_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2016 <- mutate(q2_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2017 <- mutate(q2_2017,ride_id = as.character(ride_id))
q2_2017 <- mutate(q2_2017,rideable_type = as.character(rideable_type))
q2_2017 <- mutate(q2_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2017 <- mutate(q2_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2018 <- mutate(q2_2018,ride_id = as.character(ride_id))
q2_2018 <- mutate(q2_2018,rideable_type = as.character(rideable_type))
q2_2018 <- mutate(q2_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2018 <- mutate(q2_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ride_id = as.character(ride_id))
q2_2019 <- mutate(q2_2019,rideable_type = as.character(rideable_type))
q2_2019 <- mutate(q2_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2022 <- mutate(q2_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2022 <- mutate(q2_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,ride_id = as.character(ride_id))
q3_2014 <- mutate(q3_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,rideable_type = as.character(rideable_type))
q3_2015 <- mutate(q3_2015,ride_id = as.character(ride_id))
q3_2015 <- mutate(q3_2015,rideable_type = as.character(rideable_type))
q3_2015 <- mutate(q3_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2015 <- mutate(q3_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2016 <- mutate(q3_2016,ride_id = as.character(ride_id))
q3_2016 <- mutate(q3_2016,rideable_type = as.character(rideable_type))
q3_2016 <- mutate(q3_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2016 <- mutate(q3_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2017 <- mutate(q3_2017,ride_id = as.character(ride_id))
q3_2017 <- mutate(q3_2017,rideable_type = as.character(rideable_type))
q3_2017 <- mutate(q3_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2017 <- mutate(q3_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2018 <- mutate(q3_2018,ride_id = as.character(ride_id))
q3_2018 <- mutate(q3_2018,rideable_type = as.character(rideable_type))
q3_2018 <- mutate(q3_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2018 <- mutate(q3_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2019 <- mutate(q3_2019,ride_id = as.character(ride_id))
q3_2019 <- mutate(q3_2019,rideable_type = as.character(rideable_type))
q3_2019 <- mutate(q3_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2019 <- mutate(q3_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2014 <- mutate(q4_2014,ride_id = as.character(ride_id))
q4_2014 <- mutate(q4_2014,rideable_type = as.character(rideable_type))
q4_2014 <- mutate(q4_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2014 <- mutate(q4_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2015 <- mutate(q4_2015,ride_id = as.character(ride_id))
q4_2015 <- mutate(q4_2015,rideable_type = as.character(rideable_type))
q4_2015 <- mutate(q4_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2015 <- mutate(q4_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2016 <- mutate(q4_2016,ride_id = as.character(ride_id))
q4_2016 <- mutate(q4_2016,rideable_type = as.character(rideable_type))
q4_2016 <- mutate(q4_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2016 <- mutate(q4_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2017 <- mutate(q4_2017,ride_id = as.character(ride_id))
q4_2017 <- mutate(q4_2017,rideable_type = as.character(rideable_type))
q4_2017 <- mutate(q4_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2017 <- mutate(q4_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2018 <- mutate(q4_2018,ride_id = as.character(ride_id))
q4_2018 <- mutate(q4_2018,rideable_type = as.character(rideable_type))
q4_2018 <- mutate(q4_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2018 <- mutate(q4_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2019 <- mutate(q4_2019,ride_id = as.character(ride_id))
q4_2019 <- mutate(q4_2019,rideable_type = as.character(rideable_type))
q4_2019 <- mutate(q4_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2019 <- mutate(q4_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2023 <- mutate(q1_2023,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2023 <- mutate(q1_2023,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2021 <- mutate(q4_2021,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2021 <- mutate(q4_2021,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2022 <- mutate(q4_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2022 <- mutate(q4_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,end_station_id = as.character(end_station_id))
y_2013 <- mutate(y_2013,start_station_id = as.character(start_station_id))
q1_2_2014 <- mutate(q1_2_2014,start_station_id = as.character(start_station_id))
q1_2_2014 <- mutate(q1_2_2014,end_station_id = as.character(end_station_id))
q3_2014 <- mutate(q3_2014,start_station_id = as.character(start_station_id))
q3_2014 <- mutate(q3_2014,end_station_id = as.character(end_station_id))
q4_2014 <- mutate(q4_2014,start_station_id = as.character(start_station_id))
q4_2014 <- mutate(q4_2014,end_station_id = as.character(end_station_id))
q1_2015 <- mutate(q1_2015,start_station_id = as.character(start_station_id))
q1_2015 <- mutate(q1_2015,end_station_id = as.character(end_station_id))
q2_2015 <- mutate(q2_2015,start_station_id = as.character(start_station_id))
q2_2015 <- mutate(q2_2015,end_station_id = as.character(end_station_id))
q3_2015 <- mutate(q3_2015,start_station_id = as.character(start_station_id))
q3_2015 <- mutate(q3_2015,end_station_id = as.character(end_station_id))
q4_2015 <- mutate(q4_2015,start_station_id = as.character(start_station_id))
q4_2015 <- mutate(q4_2015,end_station_id = as.character(end_station_id))
q1_2016 <- mutate(q1_2016,start_station_id = as.character(start_station_id))
q1_2016 <- mutate(q1_2016,end_station_id = as.character(end_station_id))
q2_2016 <- mutate(q2_2016,start_station_id = as.character(start_station_id))
q2_2016 <- mutate(q2_2016,end_station_id = as.character(end_station_id))
q3_2016 <- mutate(q3_2016,start_station_id = as.character(start_station_id))
q3_2016 <- mutate(q3_2016,end_station_id = as.character(end_station_id))
q4_2016 <- mutate(q4_2016,start_station_id = as.character(start_station_id))
q4_2016 <- mutate(q4_2016,end_station_id = as.character(end_station_id))
q1_2016 <- mutate(q1_2016,start_station_id = as.character(start_station_id))
q1_2017 <- mutate(q1_2017,start_station_id = as.character(start_station_id))
q1_2017 <- mutate(q1_2017,end_station_id = as.character(end_station_id))
q2_2017 <- mutate(q2_2017,start_station_id = as.character(start_station_id))
q2_2017 <- mutate(q2_2017,end_station_id = as.character(end_station_id))
q3_2017 <- mutate(q3_2017,start_station_id = as.character(start_station_id))
q3_2017 <- mutate(q3_2017,end_station_id = as.character(end_station_id))
q4_2017 <- mutate(q4_2017,start_station_id = as.character(start_station_id))
q4_2017 <- mutate(q4_2017,end_station_id = as.character(end_station_id))
q1_2018 <- mutate(q1_2018,start_station_id = as.character(start_station_id))
q1_2018 <- mutate(q1_2018,end_station_id = as.character(end_station_id))
q2_2018 <- mutate(q2_2018,start_station_id = as.character(start_station_id))
q2_2018 <- mutate(q2_2018,end_station_id = as.character(end_station_id))
q3_2018 <- mutate(q3_2018,start_station_id = as.character(start_station_id))
q3_2018 <- mutate(q3_2018,end_station_id = as.character(end_station_id))
q4_2018 <- mutate(q4_2018,start_station_id = as.character(start_station_id))
q4_2018 <- mutate(q4_2018,end_station_id = as.character(end_station_id))
q1_2019 <- mutate(q1_2019,start_station_id = as.character(start_station_id))
q1_2019 <- mutate(q1_2019,end_station_id = as.character(end_station_id))
q2_2019 <- mutate(q2_2019,start_station_id = as.character(start_station_id))
q2_2019 <- mutate(q2_2019,end_station_id = as.character(end_station_id))
q3_2019 <- mutate(q3_2019,start_station_id = as.character(start_station_id))
q3_2019 <- mutate(q3_2019,end_station_id = as.character(end_station_id))
q4_2019 <- mutate(q4_2019,start_station_id = as.character(start_station_id))
q4_2019 <- mutate(q4_2019,end_station_id = as.character(end_station_id))
q1_2020 <- mutate(q1_2020,start_station_id = as.character(start_station_id))
q1_2020 <- mutate(q1_2020,end_station_id = as.character(end_station_id))
q2_2020 <- mutate(q2_2020,start_station_id = as.character(start_station_id))
q2_2020 <- mutate(q2_2020,end_station_id = as.character(end_station_id))
q3_2020 <- mutate(q3_2020,start_station_id = as.character(start_station_id))
q3_2020 <- mutate(q3_2020,end_station_id = as.character(end_station_id))
q4_2020 <- mutate(q4_2020,start_station_id = as.character(start_station_id))
q4_2020 <- mutate(q4_2020,end_station_id = as.character(end_station_id))
# Remove unwanted colmme

y_2013 <- y_2013 %>%  
  select(-c(birthday, gender, "tripduration"))

q1_2_2014 <- q1_2_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2014 <- q3_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2014 <- q4_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2015 <- q1_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2015 <- q2_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2015 <- q3_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2015 <- q4_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2016 <- q1_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2016 <- q2_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2016 <- q3_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2016 <- q4_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2017 <- q1_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2017 <- q2_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2017 <- q3_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2017 <- q4_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2018 <- q1_2018 %>%  
  select(-c("01 - Rental Details Duration In Seconds Uncapped", "05 - Member Details Member Birthday Year", "Member Gender"))
q2_2018 <- q2_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))
q3_2018 <- q3_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))
q4_2018 <- q4_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2019 <- q1_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))
q2_2019 <- q2_2019 %>%  
  select(-c("01 - Rental Details Duration In Seconds Uncapped", "05 - Member Details Member Birthday Year", "Member Gender"))
q3_2019 <- q3_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))
q4_2019 <- q4_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2020 <- q1_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2020 <- q2_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2020 <- q3_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2020 <- q4_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2021 <- q1_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2021 <- q2_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2021 <- q3_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2021 <- q4_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2022 <- q1_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2022 <- q2_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2022 <- q3_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2022 <- q4_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2023 <- q1_2023 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

Make Big Dataset

#### Stack individual quarter's data frames into one big data frame
all_trips <- bind_rows(y_2013,q1_2_2014,q1_2015,q1_2016,q1_2017,q1_2018,q1_2019,q1_2020,q1_2021,q1_2022,q1_2023,q2_2015,q2_2016,q2_2017,q2_2018,q2_2019,q2_2020,q2_2021,q2_2022,q3_2014,q3_2015,q3_2016,q3_2017,q3_2018,q3_2019,q3_2020,q3_2021,q3_2022,q4_2014,q4_2015,q4_2016,q4_2017,q4_2018,q4_2019,q4_2020,q4_2021,q4_2022)

#====================================================== # STEP 3: CLEAN UP AND ADD DATA TO PREPARE FOR ANALYSIS #======================================================

#PROCESS

Cleaning and preparing data for analysis

Key tasks

Check the data for errors.
Choose your tools.
Transform the data so you can work with it effectively.
Document the cleaning process.

Deliverable

Following code chunks will be used for this phase.

# Inspect the new table that has been created
colnames(all_trips)  #List of column names
[1] "ride_id"            "started_at"         "ended_at"           "rideable_type"      "start_station_id"  
[6] "start_station_name" "end_station_id"     "end_station_name"   "member_casual"     
nrow(all_trips)  #How many rows are in data frame?
[1] 36386876
dim(all_trips)  #Dimensions of the data frame?
[1] 36386876        9
head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)
str(all_trips)  #See list of columns and data types (numeric, character, etc)
'data.frame':   36386876 obs. of  9 variables:
 $ ride_id           : chr  "4118" "4275" "4291" "4316" ...
 $ started_at        : POSIXct, format: NA NA NA NA ...
 $ ended_at          : POSIXct, format: NA NA NA NA ...
 $ rideable_type     : chr  "480" "77" "77" "77" ...
 $ start_station_id  : chr  "85" "32" "32" "19" ...
 $ start_station_name: chr  "Michigan Ave & Oak St" "Racine Ave & Congress Pkwy" "Racine Ave & Congress Pkwy" "Loomis St & Taylor St" ...
 $ end_station_id    : chr  "28" "32" "19" "19" ...
 $ end_station_name  : chr  "Larrabee St & Menomonee St" "Racine Ave & Congress Pkwy" "Loomis St & Taylor St" "Loomis St & Taylor St" ...
 $ member_casual     : chr  "Customer" "Customer" "Customer" "Customer" ...
summary(all_trips)  #Statistical summary of data. Mainly for numerics
   ride_id            started_at                        ended_at                      rideable_type     
 Length:36386876    Min.   :2018-01-01 00:12:00.00   Min.   :2018-01-01 00:17:23.00   Length:36386876   
 Class :character   1st Qu.:2019-08-02 03:53:23.75   1st Qu.:2019-08-02 05:40:07.50   Class :character  
 Mode  :character   Median :2021-05-05 10:03:34.00   Median :2021-05-05 10:24:38.00   Mode  :character  
                    Mean   :2020-11-29 12:10:14.64   Mean   :2020-11-29 12:31:58.40                     
                    3rd Qu.:2022-04-28 16:27:38.50   3rd Qu.:2022-04-28 16:39:33.25                     
                    Max.   :2023-03-31 23:59:28.00   Max.   :2023-04-03 11:41:11.00                     
                    NA's   :13767584                 NA's   :13767584                                   
 start_station_id   start_station_name end_station_id     end_station_name   member_casual     
 Length:36386876    Length:36386876    Length:36386876    Length:36386876    Length:36386876   
 Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                                               
                                                                                               
                                                                                               
                                                                                               

Remove bias which are null values

is.na(all_trips)
all_trips <- na.omit(all_trips)

There are a few problems we will need to fix:

(1) In the “member_casual” column, there are two names for members (“member” and “Subscriber”) and two names for casual riders (“Customer” and “casual”). We will need to consolidate that from four to two labels.

(2) The data can only be aggregated at the ride-level, which is too granular. We will want to add some additional columns of data – such as day, month, year – that provide additional opportunities to aggregate the data.

(3) We will want to add a calculated field for length of ride since the 2020Q1 data did not have the “tripduration” column. We will add “ride_length” to the entire dataframe for consistency.

(4) There are some rides where tripduration shows up as negative, including several hundred rides where Divvy took bikes out of circulation for Quality Control reasons. We will want to delete these rides.

In the “member_casual” column, replace “Subscriber” with “member” and “Customer” with “casual”

Before 2020, Divvy used different labels for these two types of riders … we will want to make our dataframe consistent with their current nomenclature

N.B.: “Level” is a special property of a column that is retained even if a subset does not contain any values from a specific level

Begin by seeing how many observations fall under each usertype

table(all_trips$member_casual)

    casual   Customer     member Subscriber 
   5451378    1557793    8150672    5863293 
# Reassign to the desired values
all_trips <-  all_trips %>% 
  mutate(member_casual = recode(member_casual
                                ,"Subscriber" = "member"
                                ,"Customer" = "casual"))
# Check to make sure the proper number of observations were reassigned
table(all_trips$member_casual)

  casual   member 
 7009171 14013965 

Add columns that list the date, month, day, and year of each ride

This will allow us to aggregate ride data for each month, day, or year … before completing these operations we could only aggregate at the ride level

Add a “ride_length” calculation to all_trips (in seconds)

https://stat.ethz.ch/R-manual/R-devel/library/base/html/difftime.html

all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)

Inspect the structure of the columns

str(all_trips)
'data.frame':   21023136 obs. of  15 variables:
 $ ride_id           : chr  "17536702" "17536703" "17536704" "17536705" ...
 $ started_at        : POSIXct, format: "2018-01-01 00:12:00" "2018-01-01 00:41:35" "2018-01-01 00:44:46" "2018-01-01 00:53:10" ...
 $ ended_at          : POSIXct, format: "2018-01-01 00:17:23" "2018-01-01 00:47:52" "2018-01-01 01:33:10" "2018-01-01 01:05:37" ...
 $ rideable_type     : chr  "3304" "5367" "4599" "2302" ...
 $ start_station_id  : chr  "69" "253" "98" "125" ...
 $ start_station_name: chr  "Damen Ave & Pierce Ave" "Winthrop Ave & Lawrence Ave" "LaSalle St & Washington St" "Rush St & Hubbard St" ...
 $ end_station_id    : chr  "159" "325" "509" "364" ...
 $ end_station_name  : chr  "Claremont Ave & Hirsch St" "Clark St & Winnemac Ave (Temp)" "Troy St & North Ave" "Larrabee St & Oak St" ...
 $ member_casual     : chr  "member" "member" "member" "member" ...
 $ date              : Date, format: "2018-01-01" "2018-01-01" "2018-01-01" "2018-01-01" ...
 $ month             : chr  "01" "01" "01" "01" ...
 $ day               : chr  "01" "01" "01" "01" ...
 $ year              : chr  "2018" "2018" "2018" "2018" ...
 $ day_of_week       : chr  "Monday" "Monday" "Monday" "Monday" ...
 $ ride_length       : 'difftime' num  323 377 2904 747 ...
  ..- attr(*, "units")= chr "secs"
 - attr(*, "na.action")= 'omit' Named int [1:15363740] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "names")= chr [1:15363740] "1" "2" "3" "4" ...

Convert “ride_length” from Factor to numeric so we can run calculations on the data

is.factor(all_trips$ride_length)
[1] FALSE
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
[1] TRUE

Remove “bad” data

The dataframe includes a few hundred entries when bikes were taken out of docks and checked for quality by Divvy or ride_length was negative

We will create a new version of the dataframe (v2) since data is being removed

https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2

all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]

#===================================== # STEP 4: CONDUCT DESCRIPTIVE ANALYSIS #=====================================

# Descriptive analysis on ride_length (all figures in seconds)
mean(all_trips_v2$ride_length) #straight average (total ride length / rides)
[1] 1312.669
median(all_trips_v2$ride_length) #midpoint number in the ascending array of ride lengths
[1] 685
max(all_trips_v2$ride_length) #longest ride
[1] 14340041
min(all_trips_v2$ride_length) #shortest ride
[1] 0
# You can condense the four lines above to one line using summary() on the specific attribute
summary(all_trips_v2$ride_length)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
       0      390      685     1313     1250 14340041 
# Compare members and casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
# See the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
# Notice that the days of the week are out of order. Let's fix that.
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
# Now, let's run the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
# analyze ridership data by type and weekday
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  #creates weekday field using wday()
  group_by(member_casual, weekday) %>%  #groups by usertype and weekday
  summarise(number_of_rides = n()                           #calculates the number of rides and average duration 
            ,average_duration = mean(ride_length)) %>%      # calculates the average duration
  arrange(member_casual, weekday)                               # sorts
`summarise()` has grouped output by 'member_casual'. You can override using the `.groups` argument.

#Now Times to Visualize the data

# Let's visualize the number of rides by rider type
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")
`summarise()` has grouped output by 'member_casual'. You can override using the `.groups` argument.

# Let's create a visualization for average duration
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
`summarise()` has grouped output by 'member_casual'. You can override using the `.groups` argument.

#================================================= # STEP 5: EXPORT SUMMARY FILE FOR FURTHER ANALYSIS #=================================================

# Create a csv file that we will visualize in Excel, Tableau, or my presentation software
# N.B.: This file location is for a Mac. If you are working on a PC, change the file location accordingly (most likely "C:\Users\YOUR_USERNAME\Desktop\...") to export the data. You can read more here: https://datatofish.com/export-dataframe-to-csv-in-r/

counts <- aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
write.csv(counts, file = 'D:/Project/Project 1/final/avg_ride_length.csv')

#ACT

This phase will be carried out by the executive team, Director of Marketing (Lily Moreno) and the Marketing Analytics team based on my analysis. Conclusion

Casual riders go farther in terms of distance.
Casual riders travel for a longer time period.
Members ride less on the weekend compared to casual riders.

Deliverable

Your top three recommendations based on your analysis

Have a slash sale or promo for casual riders so they can acquire more bikes and indulge them in the benefits of being a member.
Encourage member riders to ride on weekends by giving them various coupons or extending their membership by a period of time.
Host fun biking competitions with prizes at intervals for members on the weekends. Since there are lot of casual riders on weekends,this will also attract them to get a membership.
---
title: "Divvy Cyclistic Notebook"
output:
  html_notebook: default
  word_document: default
  pdf_document: default
  html_document:
    df_print: paged
---
INTRODUCTION

This capstone project is my first project in my Google Data Analytics Professional Certificate course. In this case study I will be analyzing a public dataset for a fictional company provided by the course. I will be using R programming language for this analysis because of its easy statistical analysis tools and data visualizations.

The following data analysis steps will be followed:

    Ask,
    Prepare,
    Process,
    Analyze,
    Share,
    Act.
    
The case study roadmap below will be followed on each step:

    Code, when needed.
    Key tasks.
    Deliverables.  
    
Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
ASK

    Three questions will guide the future marketing program:

    How do annual members and casual riders use Cyclistic bikes differently?
    Why would casual riders buy Cyclistic annual memberships?
    How can Cyclistic use digital media to influence casual riders to become members?

Lily Moreno (the director of marketing and my manager) has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Key tasks

    Identify the business task

    The main objective is to design marketing strategies aimed at converting casual riders to annual members by understanding how they differ.

    Consider key stakeholders

    Director of Marketing (Lily Moreno), Marketing Analytics team, Executive team.

Deliverable

    A clear statement of the business task

    To find the differences between the casual riders and annual members.

PREPARE

    I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. Datasets are available here link.

Key tasks

    Download data and store it appropriately.

    Data has been downloaded and copies have been stored securely on my computer and here on Kaggle.

2. Identify how it’s organized.

    The data is in CSV (comma-separated values) format, and there are a total of 13 columns.

3. Sort and filter the data.

    For this analysis, I will be using data for the last 12 months (Feb 2021 - Jan 2022) because it is more current.

4. Determine the credibility of the data.

Deliverable
  A description of all data sources used
  Main source of data provided by the Cylistic company.

Install and load necessary packages

```{r}
install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
install.packages("readr")
```

```{r}
library(tidyverse)
library(lubridate)
library(ggplot2)
library(readr)
```

select directory for load file.
```{r}
setwd("D:/Project/Project 1/Original")
```
#=====================
# STEP 1: COLLECT DATA
#=====================
```{r}
# Upload Divvy datasets (csv files) here
y_2013 <- read.csv("Divvy_Trips_2013.csv")
q1_2_2014 <- read_csv("Divvy_Trips_2014_Q1Q2.csv")
q3_7_2014 <- read_csv("Divvy_Trips_2014-Q3-07.csv")
q3_8_9_2014 <- read_csv("Divvy_Trips_2014-Q3-07.csv")
q4_2014 <- read_csv("Divvy_Trips_2014-Q4.csv")
q1_2015 <- read_csv("Divvy_Trips_2014-Q4.csv")
q2_2015 <- read_csv("Divvy_Trips_2015-Q2.csv")
q3_7_2015 <- read_csv("Divvy_Trips_2015_07.csv")
q3_8_2015 <- read_csv("Divvy_Trips_2015_08.csv")
q3_9_2015 <- read_csv("Divvy_Trips_2015_09.csv")
q4_2015 <- read_csv("Divvy_Trips_2015_Q4.csv")
q1_2016 <- read_csv("Divvy_Trips_2016_Q1.csv")
q2_4_2016 <- read_csv("Divvy_Trips_2016_04.csv")
q2_5_2016 <- read_csv("Divvy_Trips_2016_05.csv")
q2_6_2016 <- read_csv("Divvy_Trips_2016_06.csv")
q3_2016 <- read_csv("Divvy_Trips_2016_Q3.csv")
q4_2016 <- read_csv("Divvy_Trips_2016_Q4.csv")
q1_2017 <- read_csv("Divvy_Trips_2017_Q1.csv")
q2_2017 <- read_csv("Divvy_Trips_2017_Q2.csv")
q3_2017 <- read_csv("Divvy_Trips_2017_Q3.csv")
q4_2017 <- read_csv("Divvy_Trips_2017_Q4.csv")
q1_2018 <- read_csv("Divvy_Trips_2018_Q1.csv")
q2_2018 <- read_csv("Divvy_Trips_2018_Q2.csv")
q3_2018 <- read_csv("Divvy_Trips_2018_Q3.csv")
q4_2018 <- read_csv("Divvy_Trips_2018_Q4.csv")
q1_2019 <- read_csv("Divvy_Trips_2019_Q1.csv")
q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
q3_2019 <- read_csv("Divvy_Trips_2019_Q3.csv")
q4_2019 <- read_csv("Divvy_Trips_2019_Q4.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")
q2_4_2020 <- read_csv("202004-divvy-tripdata.csv")
q2_5_2020 <- read_csv("202005-divvy-tripdata.csv")
q2_6_2020 <- read_csv("202006-divvy-tripdata.csv")
q3_7_2020 <- read_csv("202301_divvy_tripdata.csv")
q3_8_2020 <- read_csv("202008_divvy_tripdata.csv")
q3_9_2020 <- read_csv("202009_divvy_tripdata.csv")
q4_10_2020 <- read_csv("202010-divvy-tripdata.csv")
q4_11_2020 <- read_csv("202011-divvy-tripdata.csv")
q4_12_2020 <- read_csv("202012-divvy-tripdata.csv")
q1_1_2021 <- read_csv("202101-divvy-tripdata.csv")
q1_2_2021 <- read_csv("202102-divvy-tripdata.csv")
q1_3_2021 <- read_csv("202103-divvy-tripdata.csv")
q2_4_2021 <- read_csv("202104-divvy-tripdata.csv")
q2_5_2021 <- read_csv("202105-divvy-tripdata.csv")
q2_6_2021 <- read_csv("202106-divvy-tripdata.csv")
q3_7_2021 <- read_csv("202107-divvy-tripdata.csv")
q3_8_2021 <- read_csv("202108-divvy-tripdata.csv")
q3_9_2021 <- read.csv("202109-divvy-tripdata.csv")
q4_10_2021 <- read.csv("202110-divvy-tripdata.csv")
q4_11_2021 <- read.csv("202111-divvy-tripdata.csv")
q4_12_2021 <- read.csv("202112-divvy-tripdata.csv")
q1_1_2022 <- read_csv("202201-divvy-tripdata.csv")
q1_2_2022 <- read_csv("202202-divvy-tripdata.csv")
q1_3_2022 <- read_csv("202203-divvy-tripdata.csv")
q2_4_2022 <- read_csv("202204_divvy_tripdata.csv")
q2_5_2022 <- read_csv("202205_divvy_tripdata.csv")
q2_6_2022 <- read_csv("202206_divvy_tripdata.csv")
q3_7_2022 <- read_csv("202207_divvy_tripdata.csv")
q3_8_2022 <- read_csv("202208_divvy_tripdata.csv")
q3_9_2022 <- read.csv("202209_divvy_tripdata.csv")
q4_10_2022 <- read.csv("202210-divvy-tripdata.csv")
q4_11_2022 <- read.csv("202211-divvy-tripdata.csv")
q4_12_2022 <- read.csv("202212-divvy-tripdata.csv")
q1_1_2023 <- read.csv("202301_divvy_tripdata.csv")
q1_2_2023 <- read.csv("202302_divvy_tripdata.csv")
q1_3_2023 <- read.csv("202303_divvy_tripdata.csv")
```
I ensured that the data sets have the same number of columns and same column names with some extra coumn names which remove further to remove unwanted colmn. Before going forward to merge them, I also checked for inconsistencies.

Merge monthly data set in to queterly for every years for easy refrence. Some data types not match so, also change data type for started_at and ended_at colmn for find out ride_duration for further analysis.

```{r}
q3_2014 <- bind_rows(q3_7_2014,q3_8_9_2014)
q3_2015 <- bind_rows(q3_7_2015,q3_8_2015,q3_9_2015)
q2_2016 <- bind_rows(q2_4_2016,q2_5_2016,q2_6_2016)
q2_4_2020 <-  mutate(q2_5_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_4_2020 <-  mutate(q2_5_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_5_2020 <-  mutate(q2_5_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_5_2020 <-  mutate(q2_5_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_6_2020 <-  mutate(q2_6_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_6_2020 <-  mutate(q2_6_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2020 <- bind_rows(q2_4_2020,q2_5_2020,q2_6_2020)
q3_7_2020 <-  mutate(q3_7_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_7_2020 <-  mutate(q3_7_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2020 <-  mutate(q3_8_2020, start_station_id = as.character(start_station_id))
q3_8_2020 <-  mutate(q3_8_2020, end_station_id = as.character(end_station_id))
q3_9_2020 <-  mutate(q3_9_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2020 <-  mutate(q3_9_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2020 <-  mutate(q3_9_2020, start_station_id = as.character(start_station_id))
q3_9_2020 <-  mutate(q3_9_2020, end_station_id = as.character(end_station_id))
q3_2020 <- bind_rows(q3_7_2020,q3_8_2020,q3_9_2020)
q4_10_2020 <-  mutate(q4_10_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_10_2020 <-  mutate(q4_10_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_11_2020 <-  mutate(q4_11_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_11_2020 <-  mutate(q4_11_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_12_2020 <-  mutate(q4_12_2020, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_12_2020 <-  mutate(q4_12_2020, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_10_2020 <- mutate(q4_10_2020, start_station_id = as.character(start_station_id))
q4_11_2020 <- mutate(q4_11_2020, start_station_id = as.character(start_station_id))
q4_12_2020 <- mutate(q4_12_2020, start_station_id = as.character(start_station_id))
q4_10_2020 <- mutate(q4_10_2020, end_station_id = as.character(end_station_id))
q4_11_2020 <- mutate(q4_11_2020, end_station_id = as.character(end_station_id))
q4_12_2020 <- mutate(q4_12_2020, end_station_id = as.character(end_station_id))
q4_2020 <- bind_rows(q4_10_2020,q4_11_2020,q4_12_2020)
q1_2021 <- bind_rows(q1_1_2021,q1_2_2021,q1_3_2021)
q2_2021 <- bind_rows(q2_4_2021,q2_5_2021,q2_6_2021)
q3_9_2021 <-  mutate(q3_9_2021, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2021 <-  mutate(q3_9_2021, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2021 <- bind_rows(q3_7_2021,q3_8_2021,q3_9_2021)
q4_2021 <- bind_rows(q4_10_2021,q4_11_2021,q4_12_2021)
q1_2022 <- bind_rows(q1_1_2022,q1_2_2022,q1_3_2022)
q2_2022 <- bind_rows(q2_4_2022,q2_5_2022,q2_6_2022)
q3_7_2022 <-  mutate(q3_7_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_7_2022 <-  mutate(q3_7_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2022 <-  mutate(q3_8_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_8_2022 <-  mutate(q3_8_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2022 <-  mutate(q3_9_2022, started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_9_2022 <-  mutate(q3_9_2022, ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2022 <- bind_rows(q3_7_2022,q3_8_2022,q3_9_2022)
q4_2022 <- bind_rows(q4_10_2022,q4_11_2022,q4_12_2022)
q1_2023 <- bind_rows(q1_1_2023,q1_2_2023,q1_3_2023)
```

#====================================================
# STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE
#====================================================
Compare column names each of the files
 While the names don't have to be in the same order, they DO need to match perfectly before we can use a command to join them into one file

```{r}
colnames(y_2013)
colnames(q1_2_2014)
colnames(q3_2014)
colnames(q4_2014)
colnames(q1_2015)
colnames(q1_2016)
colnames(q1_2017)
colnames(q1_2018)
colnames(q1_2019)
colnames(q2_2015)
colnames(q2_2016)
colnames(q2_2017)
colnames(q2_2018)
colnames(q2_2019)
colnames(q3_2015)
colnames(q3_2016)
colnames(q3_2017)
colnames(q3_2018)
colnames(q3_2019)
colnames(q4_2015)
colnames(q4_2016)
colnames(q4_2017)
colnames(q4_2018)
colnames(q4_2019)
colnames(q1_2020)
colnames(q1_2022)
colnames(q1_2023)
colnames(q2_2020)
colnames(q2_2022)
colnames(q3_2020)
colnames(q3_2022)
colnames(q4_2020)
colnames(q4_2022)
colnames(q1_2021)
colnames(q2_2021)
colnames(q3_2021)
colnames(q4_2021)
```
Rename columns  to make them consistent with q1_2020 to q1_2023 (as this will be the supposed going-forward table design for Divvy)

```{r}
#Rename columns  to make them consistent with q1_2020 to q1_2023 (as this will be the supposed going-forward table design for Divvy)
(y_2013 <- rename(y_2013
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2_2014 <- rename(q1_2_2014
                  ,ride_id = trip_id
                  ,rideable_type = bikeid 
                  ,started_at = starttime  
                  ,ended_at = stoptime  
                  ,start_station_name = from_station_name 
                  ,start_station_id = from_station_id 
                  ,end_station_name = to_station_name 
                  ,end_station_id = to_station_id 
                  ,member_casual = usertype))

(q3_2014 <- rename(q3_2014
                     ,ride_id = trip_id
                     ,rideable_type = bikeid 
                     ,started_at = starttime  
                     ,ended_at = stoptime  
                     ,start_station_name = from_station_name 
                     ,start_station_id = from_station_id 
                     ,end_station_name = to_station_name 
                     ,end_station_id = to_station_id 
                     ,member_casual = usertype))
(q4_2014 <- rename(q4_2014
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2015 <- rename(q1_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))
(q1_2016 <- rename(q1_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2017 <- rename(q1_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q1_2018 <- rename(q1_2018
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

(q1_2019 <- rename(q1_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2015 <- rename(q2_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2016 <- rename(q2_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2017 <- rename(q2_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2018 <- rename(q2_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2019 <- rename(q2_2019
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

(q3_2015 <- rename(q3_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2016 <- rename(q3_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2017 <- rename(q3_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2018 <- rename(q3_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2015 <- rename(q4_2015
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2016 <- rename(q4_2016
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = starttime  
                   ,ended_at = stoptime  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2017 <- rename(q4_2017
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2018 <- rename(q4_2018
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q4_2019 <- rename(q4_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2019 <- rename(q3_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))
```

```{r}
#Change Datatype of colmn, So they stack correctly
q1_2017 <- mutate(q1_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2017 <- mutate(q1_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2018 <- mutate(q1_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2018 <- mutate(q1_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2019 <- mutate(q1_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2019 <- mutate(q1_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2020 <- mutate(q1_2020,ride_id = as.character(ride_id))
q1_2017 <- mutate(q1_2017,ride_id = as.character(ride_id))
q1_2016 <- mutate(q1_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2016 <- mutate(q1_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2016 <- mutate(q1_2016,ride_id = as.character(ride_id))
q1_2015 <- mutate(q1_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2015 <- mutate(q1_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2015 <- mutate(q1_2015,ride_id = as.character(ride_id))
q1_2_2014 <- mutate(q1_2_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ride_id = as.character(ride_id))
y_2013 <- mutate(y_2013,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,ride_id = as.character(ride_id))
q1_2018 <- mutate(q1_2018,ride_id = as.character(ride_id))
q1_2019 <- mutate(q1_2019,ride_id = as.character(ride_id))
q1_2020 <- mutate(q1_2020,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2020 <- mutate(q1_2020,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,rideable_type = as.character(rideable_type))
y_2013 <- mutate(y_2013,rideable_type = as.character(rideable_type))
q1_2015 <- mutate(q1_2015,rideable_type = as.character(rideable_type))
q1_2016 <- mutate(q1_2016,rideable_type = as.character(rideable_type))
q1_2017 <- mutate(q1_2017,rideable_type = as.character(rideable_type))
q1_2018 <- mutate(q1_2018,rideable_type = as.character(rideable_type))
q1_2019 <- mutate(q1_2019,rideable_type = as.character(rideable_type))
q1_2022 <- mutate(q1_2022,rideable_type = as.character(rideable_type))
q1_2022 <- mutate(q1_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2022 <- mutate(q1_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2_2014 <- mutate(q1_2_2014,ride_id = as.character(ride_id))
q2_2015 <- mutate(q2_2015,ride_id = as.character(ride_id))
q2_2015 <- mutate(q2_2015,rideable_type = as.character(rideable_type))
q2_2015 <- mutate(q2_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2015 <- mutate(q2_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2016 <- mutate(q2_2016,ride_id = as.character(ride_id))
q2_2016 <- mutate(q2_2016,rideable_type = as.character(rideable_type))
q2_2016 <- mutate(q2_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2016 <- mutate(q2_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2017 <- mutate(q2_2017,ride_id = as.character(ride_id))
q2_2017 <- mutate(q2_2017,rideable_type = as.character(rideable_type))
q2_2017 <- mutate(q2_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2017 <- mutate(q2_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2018 <- mutate(q2_2018,ride_id = as.character(ride_id))
q2_2018 <- mutate(q2_2018,rideable_type = as.character(rideable_type))
q2_2018 <- mutate(q2_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2018 <- mutate(q2_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ride_id = as.character(ride_id))
q2_2019 <- mutate(q2_2019,rideable_type = as.character(rideable_type))
q2_2019 <- mutate(q2_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2019 <- mutate(q2_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2022 <- mutate(q2_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q2_2022 <- mutate(q2_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,ride_id = as.character(ride_id))
q3_2014 <- mutate(q3_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2014 <- mutate(q3_2014,rideable_type = as.character(rideable_type))
q3_2015 <- mutate(q3_2015,ride_id = as.character(ride_id))
q3_2015 <- mutate(q3_2015,rideable_type = as.character(rideable_type))
q3_2015 <- mutate(q3_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2015 <- mutate(q3_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2016 <- mutate(q3_2016,ride_id = as.character(ride_id))
q3_2016 <- mutate(q3_2016,rideable_type = as.character(rideable_type))
q3_2016 <- mutate(q3_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2016 <- mutate(q3_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2017 <- mutate(q3_2017,ride_id = as.character(ride_id))
q3_2017 <- mutate(q3_2017,rideable_type = as.character(rideable_type))
q3_2017 <- mutate(q3_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2017 <- mutate(q3_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2018 <- mutate(q3_2018,ride_id = as.character(ride_id))
q3_2018 <- mutate(q3_2018,rideable_type = as.character(rideable_type))
q3_2018 <- mutate(q3_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2018 <- mutate(q3_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2019 <- mutate(q3_2019,ride_id = as.character(ride_id))
q3_2019 <- mutate(q3_2019,rideable_type = as.character(rideable_type))
q3_2019 <- mutate(q3_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q3_2019 <- mutate(q3_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2014 <- mutate(q4_2014,ride_id = as.character(ride_id))
q4_2014 <- mutate(q4_2014,rideable_type = as.character(rideable_type))
q4_2014 <- mutate(q4_2014,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2014 <- mutate(q4_2014,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2015 <- mutate(q4_2015,ride_id = as.character(ride_id))
q4_2015 <- mutate(q4_2015,rideable_type = as.character(rideable_type))
q4_2015 <- mutate(q4_2015,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2015 <- mutate(q4_2015,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2016 <- mutate(q4_2016,ride_id = as.character(ride_id))
q4_2016 <- mutate(q4_2016,rideable_type = as.character(rideable_type))
q4_2016 <- mutate(q4_2016,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2016 <- mutate(q4_2016,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2017 <- mutate(q4_2017,ride_id = as.character(ride_id))
q4_2017 <- mutate(q4_2017,rideable_type = as.character(rideable_type))
q4_2017 <- mutate(q4_2017,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2017 <- mutate(q4_2017,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2018 <- mutate(q4_2018,ride_id = as.character(ride_id))
q4_2018 <- mutate(q4_2018,rideable_type = as.character(rideable_type))
q4_2018 <- mutate(q4_2018,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2018 <- mutate(q4_2018,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2019 <- mutate(q4_2019,ride_id = as.character(ride_id))
q4_2019 <- mutate(q4_2019,rideable_type = as.character(rideable_type))
q4_2019 <- mutate(q4_2019,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2019 <- mutate(q4_2019,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2023 <- mutate(q1_2023,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q1_2023 <- mutate(q1_2023,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2021 <- mutate(q4_2021,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2021 <- mutate(q4_2021,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2022 <- mutate(q4_2022,started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
q4_2022 <- mutate(q4_2022,ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC"))
y_2013 <- mutate(y_2013,end_station_id = as.character(end_station_id))
y_2013 <- mutate(y_2013,start_station_id = as.character(start_station_id))
q1_2_2014 <- mutate(q1_2_2014,start_station_id = as.character(start_station_id))
q1_2_2014 <- mutate(q1_2_2014,end_station_id = as.character(end_station_id))
q3_2014 <- mutate(q3_2014,start_station_id = as.character(start_station_id))
q3_2014 <- mutate(q3_2014,end_station_id = as.character(end_station_id))
q4_2014 <- mutate(q4_2014,start_station_id = as.character(start_station_id))
q4_2014 <- mutate(q4_2014,end_station_id = as.character(end_station_id))
q1_2015 <- mutate(q1_2015,start_station_id = as.character(start_station_id))
q1_2015 <- mutate(q1_2015,end_station_id = as.character(end_station_id))
q2_2015 <- mutate(q2_2015,start_station_id = as.character(start_station_id))
q2_2015 <- mutate(q2_2015,end_station_id = as.character(end_station_id))
q3_2015 <- mutate(q3_2015,start_station_id = as.character(start_station_id))
q3_2015 <- mutate(q3_2015,end_station_id = as.character(end_station_id))
q4_2015 <- mutate(q4_2015,start_station_id = as.character(start_station_id))
q4_2015 <- mutate(q4_2015,end_station_id = as.character(end_station_id))
q1_2016 <- mutate(q1_2016,start_station_id = as.character(start_station_id))
q1_2016 <- mutate(q1_2016,end_station_id = as.character(end_station_id))
q2_2016 <- mutate(q2_2016,start_station_id = as.character(start_station_id))
q2_2016 <- mutate(q2_2016,end_station_id = as.character(end_station_id))
q3_2016 <- mutate(q3_2016,start_station_id = as.character(start_station_id))
q3_2016 <- mutate(q3_2016,end_station_id = as.character(end_station_id))
q4_2016 <- mutate(q4_2016,start_station_id = as.character(start_station_id))
q4_2016 <- mutate(q4_2016,end_station_id = as.character(end_station_id))
q1_2016 <- mutate(q1_2016,start_station_id = as.character(start_station_id))
q1_2017 <- mutate(q1_2017,start_station_id = as.character(start_station_id))
q1_2017 <- mutate(q1_2017,end_station_id = as.character(end_station_id))
q2_2017 <- mutate(q2_2017,start_station_id = as.character(start_station_id))
q2_2017 <- mutate(q2_2017,end_station_id = as.character(end_station_id))
q3_2017 <- mutate(q3_2017,start_station_id = as.character(start_station_id))
q3_2017 <- mutate(q3_2017,end_station_id = as.character(end_station_id))
q4_2017 <- mutate(q4_2017,start_station_id = as.character(start_station_id))
q4_2017 <- mutate(q4_2017,end_station_id = as.character(end_station_id))
q1_2018 <- mutate(q1_2018,start_station_id = as.character(start_station_id))
q1_2018 <- mutate(q1_2018,end_station_id = as.character(end_station_id))
q2_2018 <- mutate(q2_2018,start_station_id = as.character(start_station_id))
q2_2018 <- mutate(q2_2018,end_station_id = as.character(end_station_id))
q3_2018 <- mutate(q3_2018,start_station_id = as.character(start_station_id))
q3_2018 <- mutate(q3_2018,end_station_id = as.character(end_station_id))
q4_2018 <- mutate(q4_2018,start_station_id = as.character(start_station_id))
q4_2018 <- mutate(q4_2018,end_station_id = as.character(end_station_id))
q1_2019 <- mutate(q1_2019,start_station_id = as.character(start_station_id))
q1_2019 <- mutate(q1_2019,end_station_id = as.character(end_station_id))
q2_2019 <- mutate(q2_2019,start_station_id = as.character(start_station_id))
q2_2019 <- mutate(q2_2019,end_station_id = as.character(end_station_id))
q3_2019 <- mutate(q3_2019,start_station_id = as.character(start_station_id))
q3_2019 <- mutate(q3_2019,end_station_id = as.character(end_station_id))
q4_2019 <- mutate(q4_2019,start_station_id = as.character(start_station_id))
q4_2019 <- mutate(q4_2019,end_station_id = as.character(end_station_id))
q1_2020 <- mutate(q1_2020,start_station_id = as.character(start_station_id))
q1_2020 <- mutate(q1_2020,end_station_id = as.character(end_station_id))
q2_2020 <- mutate(q2_2020,start_station_id = as.character(start_station_id))
q2_2020 <- mutate(q2_2020,end_station_id = as.character(end_station_id))
q3_2020 <- mutate(q3_2020,start_station_id = as.character(start_station_id))
q3_2020 <- mutate(q3_2020,end_station_id = as.character(end_station_id))
q4_2020 <- mutate(q4_2020,start_station_id = as.character(start_station_id))
q4_2020 <- mutate(q4_2020,end_station_id = as.character(end_station_id))
```
 
```{r}
# Remove unwanted colmme

y_2013 <- y_2013 %>%  
  select(-c(birthday, gender, "tripduration"))

q1_2_2014 <- q1_2_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2014 <- q3_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2014 <- q4_2014 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2015 <- q1_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2015 <- q2_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2015 <- q3_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2015 <- q4_2015 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2016 <- q1_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2016 <- q2_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2016 <- q3_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2016 <- q4_2016 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2017 <- q1_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q2_2017 <- q2_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q3_2017 <- q3_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q4_2017 <- q4_2017 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2018 <- q1_2018 %>%  
  select(-c("01 - Rental Details Duration In Seconds Uncapped", "05 - Member Details Member Birthday Year", "Member Gender"))
q2_2018 <- q2_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))
q3_2018 <- q3_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))
q4_2018 <- q4_2018 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2019 <- q1_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))
q2_2019 <- q2_2019 %>%  
  select(-c("01 - Rental Details Duration In Seconds Uncapped", "05 - Member Details Member Birthday Year", "Member Gender"))
q3_2019 <- q3_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))
q4_2019 <- q4_2019 %>%  
  select(-c(birthyear, gender, "tripduration"))

q1_2020 <- q1_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2020 <- q2_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2020 <- q3_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2020 <- q4_2020 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2021 <- q1_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2021 <- q2_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2021 <- q3_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2021 <- q4_2021 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2022 <- q1_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q2_2022 <- q2_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q3_2022 <- q3_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
q4_2022 <- q4_2022 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))

q1_2023 <- q1_2023 %>%  
  select(-c(start_lat, start_lng, end_lat, end_lng))
```
 
Make Big Dataset

```{r}
#### Stack individual quarter's data frames into one big data frame
all_trips <- bind_rows(y_2013,q1_2_2014,q1_2015,q1_2016,q1_2017,q1_2018,q1_2019,q1_2020,q1_2021,q1_2022,q1_2023,q2_2015,q2_2016,q2_2017,q2_2018,q2_2019,q2_2020,q2_2021,q2_2022,q3_2014,q3_2015,q3_2016,q3_2017,q3_2018,q3_2019,q3_2020,q3_2021,q3_2022,q4_2014,q4_2015,q4_2016,q4_2017,q4_2018,q4_2019,q4_2020,q4_2021,q4_2022)
```


#======================================================
# STEP 3: CLEAN UP AND ADD DATA TO PREPARE FOR ANALYSIS
#======================================================

#PROCESS

Cleaning and preparing data for analysis

Key tasks

    Check the data for errors.
    Choose your tools.
    Transform the data so you can work with it effectively.
    Document the cleaning process.

 Deliverable
 
 Following code chunks will be used for this phase.
 
```{r}
# Inspect the new table that has been created
colnames(all_trips)  #List of column names
nrow(all_trips)  #How many rows are in data frame?
dim(all_trips)  #Dimensions of the data frame?
head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)
str(all_trips)  #See list of columns and data types (numeric, character, etc)
summary(all_trips)  #Statistical summary of data. Mainly for numerics

```
 
# Remove bias which are null values

```{r}
is.na(all_trips)
all_trips <- na.omit(all_trips)
```

# There are a few problems we will need to fix:
# (1) In the "member_casual" column, there are two names for members ("member" and "Subscriber") and two names for casual riders ("Customer" and "casual"). We will need to consolidate that from four to two labels.
# (2) The data can only be aggregated at the ride-level, which is too granular. We will want to add some additional columns of data -- such as day, month, year -- that provide additional opportunities to aggregate the data.
# (3) We will want to add a calculated field for length of ride since the 2020Q1 data did not have the "tripduration" column. We will add "ride_length" to the entire dataframe for consistency.
# (4) There are some rides where tripduration shows up as negative, including several hundred rides where Divvy took bikes out of circulation for Quality Control reasons. We will want to delete these rides.

# In the "member_casual" column, replace "Subscriber" with "member" and "Customer" with "casual"
# Before 2020, Divvy used different labels for these two types of riders ... we will want to make our dataframe consistent with their current nomenclature
# N.B.: "Level" is a special property of a column that is retained even if a subset does not contain any values from a specific level
# Begin by seeing how many observations fall under each usertype

```{r}
table(all_trips$member_casual)
```

```{r}
# Reassign to the desired values
all_trips <-  all_trips %>% 
  mutate(member_casual = recode(member_casual
                                ,"Subscriber" = "member"
                                ,"Customer" = "casual"))
```

```{r}
# Check to make sure the proper number of observations were reassigned
table(all_trips$member_casual)
```
# Add columns that list the date, month, day, and year of each ride
# This will allow us to aggregate ride data for each month, day, or year ... before completing these operations we could only aggregate at the ride level
# https://www.statmethods.net/input/dates.html more on date formats in R found at that link

```{r}
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
```

# Add a "ride_length" calculation to all_trips (in seconds)
# https://stat.ethz.ch/R-manual/R-devel/library/base/html/difftime.html

```{r}
all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
```

# Inspect the structure of the columns

```{r}
str(all_trips)
```

# Convert "ride_length" from Factor to numeric so we can run calculations on the data

```{r}
is.factor(all_trips$ride_length)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
```

# Remove "bad" data
# The dataframe includes a few hundred entries when bikes were taken out of docks and checked for quality by Divvy or ride_length was negative
# We will create a new version of the dataframe (v2) since data is being removed
# https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2

```{r}
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
```

#=====================================
# STEP 4: CONDUCT DESCRIPTIVE ANALYSIS
#=====================================

```{r}
# Descriptive analysis on ride_length (all figures in seconds)
mean(all_trips_v2$ride_length) #straight average (total ride length / rides)
median(all_trips_v2$ride_length) #midpoint number in the ascending array of ride lengths
max(all_trips_v2$ride_length) #longest ride
min(all_trips_v2$ride_length) #shortest ride
```
```{r}
# You can condense the four lines above to one line using summary() on the specific attribute
summary(all_trips_v2$ride_length)

```
```{r}
# Compare members and casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
```
```{r}
# See the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
```
```{r}
# Notice that the days of the week are out of order. Let's fix that.
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
```

```{r}
# Now, let's run the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
```

```{r}
# analyze ridership data by type and weekday
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  #creates weekday field using wday()
  group_by(member_casual, weekday) %>%  #groups by usertype and weekday
  summarise(number_of_rides = n()							#calculates the number of rides and average duration 
            ,average_duration = mean(ride_length)) %>% 		# calculates the average duration
  arrange(member_casual, weekday)								# sorts
```

#Now Times to Visualize the data

```{r}
# Let's visualize the number of rides by rider type
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")
```

```{r}
# Let's create a visualization for average duration
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
```
#=================================================
# STEP 5: EXPORT SUMMARY FILE FOR FURTHER ANALYSIS
#=================================================

```{r}
# Create a csv file that we will visualize in Excel, Tableau, or my presentation software
# N.B.: This file location is for a Mac. If you are working on a PC, change the file location accordingly (most likely "C:\Users\YOUR_USERNAME\Desktop\...") to export the data. You can read more here: https://datatofish.com/export-dataframe-to-csv-in-r/

counts <- aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
write.csv(counts, file = 'D:/Project/Project 1/final/avg_ride_length.csv')
```

#ACT

This phase will be carried out by the executive team, Director of Marketing (Lily Moreno) and the Marketing Analytics team based on my analysis.
Conclusion

    Casual riders go farther in terms of distance.
    Casual riders travel for a longer time period.
    Members ride less on the weekend compared to casual riders.

Deliverable

    Your top three recommendations based on your analysis

    Have a slash sale or promo for casual riders so they can acquire more bikes and indulge them in the benefits of being a member.
    Encourage member riders to ride on weekends by giving them various coupons or extending their membership by a period of time.
    Host fun biking competitions with prizes at intervals for members on the weekends. Since there are lot of casual riders on weekends,this will also attract them to get a membership.

