This document provides a sequential guide detailing the process I followed to conduct my analysis, accompanied by my observation notes.

Cyclistic Data

Upon downloading the data, I conducted initial exploration and processing using VBA. These Excel VBA macros collectively form a two-step process.

VBA scripts

Phase One - Transform

"Data Enhancement and Insight Generation

Numeric formats are applied and visual adjustments are made to allow for initial evaluation in this first step.
Serves as a preliminary foundation for the subsequent VBA operation.


Sub PreliminaryRequirements_Transform()
    Dim folderPath As String
    Dim filePath As String
    Dim wb As Workbook
    Dim ws As Worksheet
    Dim rng As Range
    Dim processedFiles As New Collection
    
    ' Specify the source folder path where the CSV files are located
    folderPath = "D:\raw\"
    
    ' Specify the destination folder path for saving CSV files
    Dim csvDestinationFolderPath As String
    csvDestinationFolderPath = "D:\phase_one_transform\"

    ' Get the first CSV file in the folder
    filePath = Dir(folderPath & "*.csv")
    
    Do While filePath <> ""
        ' Construct the full file path
        Dim fullFilePath As String
        fullFilePath = folderPath & filePath
        
        ' Check if the file has already been processed
        Dim alreadyProcessed As Boolean
        On Error Resume Next
        processedFiles.Add Item:=filePath, Key:=CStr(filePath)
        alreadyProcessed = (Err.Number = 0)
        Err.Clear
        On Error GoTo 0
        
        If Not alreadyProcessed Then
        ' Open the CSV file
        Set wb = Workbooks.Open(fullFilePath)
            
                     
        ' CLEANING DATA AND SETTING DATA TYPES
        
            
        ' Clean the data types
        Set ws = wb.Worksheets(1) ' Assuming the data is on the first sheet

                   
        ' Clean data types in column A as text
        Set rng = ws.Range("A:A")
        rng.NumberFormat = "@"
        
        ' Clean data types in column B as text
        Set rng = ws.Range("B:B")
        rng.NumberFormat = "@"
        
        ' Clean data types in column C as date/time
        Set rng = ws.Range("C:C")
        rng.NumberFormat = "mm/dd/yyyy hh:mm:ss"
        
        ' Clean data types in column D as date/time
        Set rng = ws.Range("D:D")
        rng.NumberFormat = "mm/dd/yyyy hh:mm:ss"
        
        ' Clean data types in column E as text
        Set rng = ws.Range("E:E")
        rng.NumberFormat = "@"
        
        ' Clean data types in column F as text
        Set rng = ws.Range("F:F")
        rng.NumberFormat = "@"
        
        ' Clean data types in column G as text
        Set rng = ws.Range("G:G")
        rng.NumberFormat = "@"
        
        ' Clean data types in column H as text
        Set rng = ws.Range("H:H")
        rng.NumberFormat = "@"
        
        ' Clean data types in column I as number with 12 decimal places
        Set rng = ws.Range("I:I")
        rng.NumberFormat = "0.000000000000"
        
        ' Clean data types in column J as number with 12 decimal places
        Set rng = ws.Range("J:J")
        rng.NumberFormat = "0.000000000000"
        
        ' Clean data types in column K as number with 12 decimal places
        Set rng = ws.Range("K:K")
        rng.NumberFormat = "0.000000000000"
        
        ' Clean data types in column L as number with 12 decimal places
        Set rng = ws.Range("L:L")
        rng.NumberFormat = "0.000000000000"
        
        ' Clean data types in column M
        Set rng = ws.Range("M:M")
        rng.NumberFormat = "@" ' Set the format to text
        
        
        ' INSERTING COLUMNS AND PERFORMING CALCULATIONS
        
        ' ADDING COLUMN "ride_length"
        
        ' Get the last row in the column containing data
        Dim LastRow As Long
        LastRow = ws.Cells(ws.Rows.Count, "C").End(xlUp).Row

        ' Add a new column "ride_length" in column N
        ws.Range("N1").Value = "ride_length"

        ' Perform calculation and copy formula down
        ws.Range("N2:N" & LastRow).Formula = "=D2-C2"
           
        
        ' ADDING COLUMN(s) "day_of_week" & "name_of_day"
                
        ' Add a new column "day_of_week" in column O
        ws.Range("O1").Value = "day_of_week"

        ' Perform calculation and format as desired
        ws.Range("O2:O" & LastRow).Formula = "=WEEKDAY(C2,1)"
        ws.Range("O2:O" & LastRow).NumberFormat = "0" ' Format as number with no decimals
        
        
        ' Add a new column "name_of_day" in column O
        ws.Range("P1").Value = "name_of_day"
        
        ' Replace numeric weekday with day names
        ws.Range("P2:P" & LastRow).FormulaR1C1 = "=CHOOSE(WEEKDAY(RC3,1),""Sunday"",""Monday"",""Tuesday"",""Wednesday"",""Thursday"",""Friday"",""Saturday"")"
        
        ' Format as desired (optional)
        ws.Range("P2:P" & LastRow).NumberFormat = "@" ' Format as text
        
        ' AutoFit all columns
        ws.UsedRange.Columns.AutoFit
        
                       
        ' SAVING IN CSV FORMAT
            
        ' Save the workbook in CSV format to the CSV destination folder
        wb.SaveAs FileName:=csvDestinationFolderPath & Replace(wb.Name, ".csv", ".csv"), FileFormat:=xlCSV
            
        ' Close the workbook without saving changes
        wb.Close SaveChanges:=False
            
        ' Get the next CSV file
        filePath = Dir ' Get the next CSV file
    End If
Loop
    
    ' Display a message box indicating success after the loop
    MsgBox "Data types cleaned and files saved successfully!"
End Sub"

Phase Two - Preliminary Analysis

"Dynamic Data Analysis and Enhanced Insights

Transforms pre-cleaned files into pivot tables with calculated metrics (average, maximum, minimum, mode, and record count). 

Sub Stats_Pivots()
    ' Declare Variables
    Dim folderPath As String
    Dim filePath As String
    Dim wb As Workbook
    Dim DSheet As Worksheet
    Dim PCache As pivotCache
    Dim PTable As pivotTable
    Dim PRange As Range
    Dim lastDataRow As Long

    ' Specify the source folder path where the CSV files are located
    folderPath = "D:\phase_one_transform\"

    ' Specify the destination folder path for saving files
    Dim destinationFolderPath As String
    destinationFolderPath = "D:\phase_two_prelim_analysis\"

    ' Get the first CSV file in the folder
    filePath = Dir(folderPath & "*.csv")

    Do While filePath <> ""
        ' Construct the full file path
        Dim fullFilePath As String
        fullFilePath = folderPath & filePath
        
        ' Open the CSV file
        Set wb = Workbooks.Open(fullFilePath)

        ' Set the data sheet using wildcard
        Set DSheet = Nothing
        For Each ws In wb.Sheets
            If ws.Name Like "*-divvy-tripdata" Then
                Set DSheet = ws
                Exit For
            End If
        Next ws

        If DSheet Is Nothing Then
            MsgBox "Worksheet not found.", vbExclamation
            wb.Close SaveChanges:=False
            ' Get the next CSV file in the folder
            filePath = Dir
            Exit Do ' Exit the loop if the worksheet is not found
        End If

        ' Add and rename the new sheet
        Dim PSheet As Worksheet
        On Error Resume Next
        Application.DisplayAlerts = False
        ThisWorkbook.Sheets("Prelim Analytics").Delete
        On Error GoTo 0
        Set PSheet = wb.Sheets.Add(Before:=DSheet)
        PSheet.Name = "Prelim Analytics"
        Application.DisplayAlerts = True
        
        
        'CALCULATIONS
        ' Set headers for the new sheet
        PSheet.Range("A1").Value = "AVERAGE"
        PSheet.Range("B1").Value = "MAX"
        PSheet.Range("C1").Value = "MIN"
        PSheet.Range("D1").Value = "MODE"
        PSheet.Range("E1").Value = "DAY"
        PSheet.Range("F1").Value = "RECORD_COUNT"
                            
        ' Adjust column widths for the first 6 columns
        PSheet.Columns("A").ColumnWidth = 11 ' Adjust the value as needed
        PSheet.Columns("B").ColumnWidth = 11 ' Adjust the value as needed
        PSheet.Columns("C").ColumnWidth = 11 ' Adjust the value as needed
        PSheet.Columns("D").ColumnWidth = 11 ' Adjust the value as needed
        PSheet.Columns("E").ColumnWidth = 11 ' Adjust the value as needed
        PSheet.Columns("F").ColumnWidth = 15 ' Adjust the value as needed
        
        ' Calculate averages, max, min, mode, and day using formulas
        PSheet.Range("A2").FormulaR1C1 = "=AVERAGE('" & DSheet.Name & "'!C[13])"
        PSheet.Range("B2").FormulaR1C1 = "=MAX('" & DSheet.Name & "'!C[12])"
        PSheet.Range("C2").FormulaR1C1 = "=MIN('" & DSheet.Name & "'!C[11])"
        PSheet.Range("D2").FormulaR1C1 = "=MODE('" & DSheet.Name & "'!C[11])"
        PSheet.Range("E2").FormulaR1C1 = "=XLOOKUP(RC[-1],'" & DSheet.Name & "'!C[10],'" & DSheet.Name & "'!C[11])"
        
        ' Calculate and set the total number of records in column A (excluding header)
        lastDataRow = DSheet.Cells(DSheet.Rows.Count, 1).End(xlUp).Row
        Dim recordCount As Long
        recordCount = IIf(lastDataRow >= 2, lastDataRow - 1, 0) ' Subtract 1 for the header
        
        ' Place the record count in cell F2
        PSheet.Range("F2").Value = recordCount
        
        ' Apply number format to the cell
        PSheet.Range("F2").NumberFormat = "#,##0"
                
        'PIVOTS
        
    
        ' Set names for pivot tables
        PSheet.Range("A5").Value = "Average by Ridership"
        PSheet.Range("D5").Value = "Average by Day"
        PSheet.Range("N5").Value = "Count by Day"
        
        ' Turn off screen updating and calculations
        Application.ScreenUpdating = False
        Application.Calculation = xlCalculationManual
        
       ' Define Pivot Range
        ' Find the last row and last column of the data
        LastRow = DSheet.Cells(Rows.Count, 1).End(xlUp).Row
        LastCol = DSheet.Cells(1, Columns.Count).End(xlToLeft).Column

        ' Set the pivot table data range using the last row and last column
        Set PRange = DSheet.Cells(1, 1).Resize(LastRow, LastCol)
        ' Define Pivot Cache
        Set PCache = ActiveWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:=PRange)

        ' Convert "ride_length" column to numeric values
        DSheet.Range("P2:P" & lastDataRow).Value = DSheet.Range("P2:P" & lastDataRow).Value
        
        ' Insert Pivot Table 1
        Set PTable = PCache.CreatePivotTable(TableDestination:=PSheet.Cells(6, 1), TableName:="PivotTable1")
        
        ' Get the "member_casual" pivot field
        Dim memberCasualField As PivotField
        On Error Resume Next
        Set memberCasualField = PTable.PivotFields("member_casual")
        On Error GoTo 0
        
        ' Check if the field was found and set the orientation and position
        If Not memberCasualField Is Nothing Then
            memberCasualField.Orientation = xlRowField
            memberCasualField.Position = 1
        Else
            MsgBox "Pivot field 'member_casual' not found.", vbExclamation
        End If
        
        ' Set other pivot field properties
        With PSheet.PivotTables("PivotTable1").PivotFields("member_casual")
            .Orientation = xlRowField
            .Position = 1
        End With
        
        ' Insert Data Field
        With PSheet.PivotTables("PivotTable1").PivotFields("ride_length")
            .Orientation = xlDataField
            .Function = xlAverage
            .NumberFormat = "#,##0.0000" ' Format to 4 decimal places
            .Name = "Avg of ride_length"
        End With
        
       ' Insert Pivot Table 2
        Set PTable = PCache.CreatePivotTable(TableDestination:=PSheet.Cells(6, 4), TableName:="PivotTable2")
        
        ' Get the "member_casual" pivot field for PivotTable2
        Dim memberCasualField2 As PivotField
        On Error Resume Next
        Set memberCasualField2 = PTable.PivotFields("member_casual")
        On Error GoTo 0
        
        ' Check if the field was found and set the orientation and position
        If Not memberCasualField2 Is Nothing Then
            memberCasualField2.Orientation = xlRowField
            memberCasualField2.Position = 1
        Else
            MsgBox "Pivot field 'member_casual' not found for PivotTable2.", vbExclamation
        End If
        
        ' Get the "name_of_day" pivot field for PivotTable2
        Dim nameOfDayField As PivotField
        On Error Resume Next
        Set nameOfDayField = PTable.PivotFields("name_of_day")
        On Error GoTo 0
        
        ' Check if the field was found and set the orientation and position
        If Not nameOfDayField Is Nothing Then
            nameOfDayField.Orientation = xlColumnField
            nameOfDayField.Position = 1
        Else
            MsgBox "Pivot field 'name_of_day' not found for PivotTable2.", vbExclamation
        End If
        
        ' Insert Data Field
        With PSheet.PivotTables("PivotTable2").PivotFields("ride_length")
            .Orientation = xlDataField
            .Function = xlAverage
            .NumberFormat = "0.0000"
            .Name = "Average of ride_length"
        End With
        
        ' Insert Pivot Table 3
        Set PTable = PCache.CreatePivotTable(TableDestination:=PSheet.Cells(6, 14), TableName:="PivotTable3")
        ' Insert Row Fields
        With PSheet.PivotTables("PivotTable3").PivotFields("member_casual")
            .Orientation = xlRowField
            .Position = 1
        End With
        ' Insert Column Fields
        With PSheet.PivotTables("PivotTable3").PivotFields("name_of_day")
            .Orientation = xlColumnField
            .Position = 1
        End With
        ' Insert Data Field
        With PSheet.PivotTables("PivotTable3").PivotFields("member_casual")
            .Orientation = xlDataField
            .Function = xlCount
            .NumberFormat = "#,##0"
            .Name = "Count of rides by day"
        End With
        
        ' Format Pivot Tables
        For Each PTable In PSheet.PivotTables
            PTable.ShowTableStyleRowStripes = True
            PTable.TableStyle2 = "PivotStyleMedium9"
        Next PTable
        
        ' Turn on screen updating and calculations
        Application.ScreenUpdating = True
        Application.Calculation = xlCalculationAutomatic
        
       ' Save the workbook in Excel format
        newFilePathXLSX = destinationFolderPath & Left$(wb.Name, Len(wb.Name) - 4) & ".xlsx"
        wb.SaveAs newFilePathXLSX, FileFormat:=xlOpenXMLWorkbook

        
        ' Close the CSV file
        wb.Close SaveChanges:=False
        
        ' Get the next CSV file in the folder
        filePath = Dir
    Loop

    ' Display a message box indicating the files have been processed
    MsgBox "Files have been processed and saved as macro-enabled workbooks.", vbInformation
End Sub"

***************************************************************************************

Creating tables and loading data

1. create_cyclistic_tables

Establishing the framework for data storage.

"DO $$ 
BEGIN
    FOR i IN 1..12 LOOP
        EXECUTE format('
            CREATE TABLE cyclistic_data_%s (
                ride_id text,
                rideable_type text,
                started_at time stamp,
                ended_at time stamp,
                start_station_name text,
                start_station_id text,
                end_station_name text,
                end_station_id text,
                start_lat decimal,
                start_lng decimal,
                end_lat decimal,
                end_lng decimal,
                member_casual text,
                ride_length text,
                day_of_week integer,
                name_of_day text
            );', i);
    END LOOP;
END $$;"

2. load_csv_data_monthwise

Transfers data from CSV files to designated database tables.

"-- Loop through the months and copy data from CSV files
DO $$ 
DECLARE
    month_num INTEGER;
    table_name TEXT;
    csv_path TEXT;
BEGIN
    FOR month_num IN 1..12 LOOP
        table_name := 'cyclistic_data_' || month_num;
        csv_path := 'D:/phase_one_transform/2022' || LPAD(month_num::TEXT, 2, '0') || '-divvy-tripdata.csv';
        
        EXECUTE format('
            COPY %I FROM %L DELIMITER %L CSV HEADER;',
            table_name, csv_path, ',');
    END LOOP;
END $$;
"

3. combine_monthly_data

Consolidating monthly data into a single unified data set.

"CREATE TABLE user_data AS
SELECT * FROM cyclistic_data_1
UNION ALL SELECT * FROM cyclistic_data_2
UNION ALL SELECT * FROM cyclistic_data_3
UNION ALL SELECT * FROM cyclistic_data_4
UNION ALL SELECT * FROM cyclistic_data_5
UNION ALL SELECT * FROM cyclistic_data_6
UNION ALL SELECT * FROM cyclistic_data_7
UNION ALL SELECT * FROM cyclistic_data_8
UNION ALL SELECT * FROM cyclistic_data_9
UNION ALL SELECT * FROM cyclistic_data_10
UNION ALL SELECT * FROM cyclistic_data_11
UNION ALL SELECT * FROM cyclistic_data_12;
"

***************************************************************************************

During the process of examining the data set and brainstorming potential query scenarios, I identified two valuable third-party data sources that would enhance my analysis: holidays and weather data. Both sources cover the same timeframe as the existing data set, thereby enhancing its context and depth. Integrating holiday data could provide crucial insights into activity fluctuations attributed to special occasions, offering a nuanced understanding of user behavior. Similarly, the integration of weather data could deepen analysis by uncovering how weather conditions impact user behavior, given its influence on transportation choices and outdoor activities. By harnessing these third-party sources and weaving their insights into my analysis, I intend to create a more holistic and informed narrative. This combination of user behavior, holidays, and weather data has the potential to reveal multifaceted correlations and unveil optimization and strategic decision-making opportunities.

***************************************************************************************

Error checks and adjustments

After successfully loading the data and creating the user_data table, I meticulously conducted migration validations to ensure data accuracy. This process involved a thorough examination of the resulting data set, where I identified areas necessitating corrective actions and additional table alterations.

Migration checks:

1. migration_validation_record_counts

Used to verify the successful transition of data to the new platform.

"SELECT
  TO_CHAR(started_at, 'Month') AS month,
  COUNT(*) AS total_records_by_month
FROM
  user_data
GROUP BY
  TO_CHAR(started_at, 'Month'),
  date_part('month', started_at)
ORDER BY
  date_part('month', started_at);
"

2. identify_duplicate_rides

Detecting potential duplicate entries in the data set. 

"SELECT ride_id, started_at, COUNT(*) AS duplicate_count
FROM user_data
GROUP BY ride_id, started_at
HAVING COUNT(*) > 1;"

3. outlier_ridelength_identification

Identify rides with negative ride lengths and those exceeding a day in duration

"-- How many records with negative ride_length?
SELECT COUNT(*) AS total_records_with_negative_ride_length
FROM user_data
WHERE ride_length < 0;

-- View the records
SELECT ride_length AS negative_ride_length, COUNT(ride_length) AS total
FROM user_data
GROUP BY ride_length
HAVING ride_length < 0;

-- remove the records
DELETE FROM user_data
WHERE ride_length < 0;



-- How many records with ridelengths over a day
SELECT ride_length AS over_a_day_ride_length, COUNT(ride_length) AS total
FROM user_data
WHERE ride_length > 24
GROUP BY ride_length;

--view the records
SELECT *
FROM user_data
WHERE ride_length > 24

-- remove the records
DELETE FROM user_data
WHERE ride_length > 24;
"

To ensure meticulous alterations, I chose to perform changes individually for several reasons. This approach offers granular control, isolating and addressing issues that might arise during the alteration process. Furthermore, it provides a clear trail of changes for better reproduction, documentation, and troubleshooting. While bundling changes into a single query might be more efficient in some scenarios, my decision to alter the database individually was driven by the complexity of changes and the need to ensure careful management and validation, thus aligning with data integrity principles. Addressing data type inconsistencies was my initial focus. I rectified these inconsistencies by adjusting the ride length column to a numeric format, ensuring uniformity and data integrity. Subsequently, I recognized a granularity disparity between the weather_data and user_data tables’ temporal data. To harmonize this, I introduced dedicated date columns in the user_data, and weather_data tables, ensuring coherence in date representation. Additionally, I then incorporated a numeric day-of-the-year column, adding a vital temporal dimension for more intricate analysis.

Table adjustments:

1. alter_ride_length_data_type

Updating data type format

"ALTER TABLE user_data
ALTER COLUMN ride_length SET DATA TYPE NUMERIC
USING ride_length::NUMERIC;"

2. alter_align_temporal_granularity_between_data sets

Aligning the temporal granularity

"/* RUN THESE INDIVIDUALLY - ADD THE COLUMN FIRST, THEN UPDATE THE DATA */

-- Add the 'date_start' column to the 'user_data' table
ALTER TABLE user_data
ADD COLUMN date_start DATE;

-- Update the 'date_start' column with extracted date from 'started_at'
UPDATE user_data
SET date_start = DATE(started_at);"

3. data_enhancement_doy

Adding day of the year column

"ALTER TABLE user_data
ADD COLUMN day_of_year INTEGER;

-- Update the 'day_of_year' column for each record in 'user_data'
UPDATE user_data
SET day_of_year = DATE_PART('doy', started_at::date);"

***************************************************************************************

Third Party Data

I realized the importance of establishing a dedicated ‘holidays’ table for the year 2022, tailored to the unique characteristics of the Chicago area. This decision was driven by the data set’s origin in this specific region. The creation of a comprehensive record of holidays specific to Chicago holds the potential to significantly enhance both the analytical capabilities and the contextual relevance of the data. By including local holidays, this table will serve as a valuable resource for conducting more insightful analyses and drawing meaningful conclusions that are closely aligned with the Chicago area’s socio-cultural dynamics. I conducted thorough research using Google to gather accurate and relevant information regarding holidays and their corresponding dates specific to the Chicago area. This meticulous process ensured the creation of a comprehensive and contextually accurate “holidays” table for the year 2022. By utilizing reliable sources and cross-referencing information, I curated a data set that aligns with the local holiday calendar, enhancing the analytical capabilities and relevance of the collected data.

Create holidays table

This data set focuses on the city of Chicago, Illinois, spanning the timeframe from January 1, 2022, to December 31, 2022. The data set is centered around key holidays observed within this period. It includes holiday names along with their respective start and end dates, providing insights into the specific days of significance throughout the year. The data set captures the essence of these holidays, allowing for a comprehensive understanding of their occurrences and durations in Chicago. This set of holiday data serves as a valuable resource for analyzing and contextualizing rider behaviors during these significant dates within the city.

1. create_holiday_table

Creating the holidays table

"CREATE TABLE IF NOT EXISTS public.holidays (
    holiday_name text COLLATE pg_catalog."default",
    holiday_start_date date,
    holiday_end_date date
);

-- Set the table to use the default tablespace
ALTER TABLE public.holidays SET TABLESPACE pg_default;

-- Set the table's owner to 'postgres'
ALTER TABLE public.holidays OWNER TO postgres;
"

2. load_holidays_data

Loading holidays data

"INSERT INTO holidays (holiday_name, holiday_start_date) VALUES
    -- single-day holiday records--
    ('New Year''s', '2022-01-01'),
    ('Martin Luther King Jr.', '2022-01-17'),
        ('Presidents Day', '2022-02-21'),
    ('Easter', '2022-03-23'),
    ('Memorial Day', '2022-05-27'),
    ('Independence Day', '2022-07-04'),
    ('Columbus Day', '2022-10-04'),
    ('Halloween', '2022-10-31'),
    ('Veterans Day', '2022-11-11'),
    ('Thanksgiving', '2022-11-24'),
    ('Christmas Day', '2022-12-25'),
    ('Mother''s Day', '2022-05-18'),
    ('Father''s Day', '2022-06-16'),
    ('Boxing Day', '2022-12-26'),
    ('International Worker''s Day (May Day)', '2022-05-01'),
    ('Flag Day', '2022-06-14'),
    ('Earth Day', '2022-04-22'),
    ('Pi Day', '2022-03-14'),
    ('International Women''s Day', '2022-03-08'),
    ('Valentine''s Day', '2022-02-14'),
    ('Juneteenth', '2022-06-19'),
    ('Chinese New Year','2022-02-01'),
    ('Diwali', '2022-10-24'),
    ('Eid al-Fitr (End of Ramadan)', '2022-05-02'),
    ('Eid al-Adha', '2022-07-09'),
    ('Mardi Gras', '2022-03-01'),
    ('Chicago St. Patrick''s Day Parade', '2022-03-12'),
    ('Chicago Pride Parade', '2022-06-26'),

    -- multi-day holiday records --
        ('Oktoberfest', '2022-09-10'),
        ('Oktoberfest', '2022-09-11'),
        ('Oktoberfest', '2022-09-12'),
        ('Oktoberfest', '2022-09-13'),
        ('Oktoberfest', '2022-09-14'),
        ('Oktoberfest', '2022-09-15'),
        ('Oktoberfest', '2022-09-16'),
        ('Oktoberfest', '2022-09-17'),
        ('Oktoberfest', '2022-09-18'),
        ('Oktoberfest', '2022-09-19'),
        ('Chicago Air and Water Show', '2022-08-20'),
        ('Chicago Air and Water Show', '2022-08-21'),
        ('Taste of Chicago', '2022-06-08'),
        ('Taste of Chicago', '2022-06-09'),
        ('Taste of Chicago', '2022-06-10'),
        ('Chicago Blues Festival', '2022-06-09'),
        ('Chicago Blues Festival', '2022-06-10'),
        ('Chicago Blues Festival', '2022-06-11'),
        ('Chicago Blues Festival', '2022-06-12'),
        ('Chicago Jazz Festival', '2022-09-01'),
        ('Chicago Jazz Festival', '2022-09-02'),
        ('Chicago Jazz Festival', '2022-09-03'),
        ('Chicago Jazz Festival', '2022-09-04'),
        ('Lollapalooza', '2022-07-28'),
        ('Lollapalooza', '2022-07-29'),
        ('Lollapalooza', '2022-07-30'),
        ('Lollapalooza', '2022-07-31')
;"

Create weather table

The acquired weather data set is dedicated to the city of Chicago, Illinois, covering the time span from January 1, 2022, to December 31, 2022. This data set presents a temporal dimension through daily temperature recordings and encompasses a wide array of meteorological variables. Among the temperature measurements in Fahrenheit, the data set encompasses humidity levels, detailed precipitation data, atmospheric pressure readings, solar indexes, cloud information, visibility metrics, and wind statistics. Additionally, the data set includes essential data points such as sunrise and sunset times, moon phases, weather conditions, their corresponding descriptions, and the collection location for the data, with distance measurements provided in miles. This comprehensive data set offers invaluable insights into the climatic trends and conditions of Chicago throughout the specified timeframe.

1. create_weather_data

Creating weather table

"CREATE TABLE weather_data (
    name TEXT,
    datetime time stamp,
    tempmax NUMERIC,
    tempmin NUMERIC,
    temp NUMERIC,
    feelslikemax NUMERIC,
    feelslikemin NUMERIC,
    feelslike NUMERIC,
    dew NUMERIC,
    humidity NUMERIC,
    precip NUMERIC,
    precipprob NUMERIC,
    precipcover NUMERIC,
    preciptype TEXT,
    snow NUMERIC,
    snowdepth NUMERIC,
    windgust NUMERIC,
    windspeed NUMERIC,
    winddir NUMERIC,
    sealevelpressure NUMERIC,
    cloudcover NUMERIC,
    visibility NUMERIC,
    solarradiation NUMERIC,
    solarenergy NUMERIC,
    uvindex NUMERIC,
    severerisk NUMERIC,
    sunrise time stamp,
    sunset time stamp,
    moonphase NUMERIC,
    conditions TEXT,
    description TEXT,
    icon TEXT,
    stations TEXT
);"

2. load_weather_data_load

Loading weather data

"COPY weather_data 
FROM 'D:\GAC_Cyclistic_project\Weather\Chicago_Illinois_2022-01-01_to_2022-12-31.csv'
DELIMITER ',' CSV HEADER;"

Merging holidays and weather

At this point I merged the holidays and weather_data tables into a single table named ‘temporal_data’ to simplify the workflow.

3. merge_holidays_weather

Merging holidays and weather into one table and renaming to 'temporal_data'

"CREATE TABLE temporal_data AS
SELECT *
FROM holidays
FULL OUTER JOIN (
    SELECT *,
           CAST(datetime AS DATE) AS date_w
    FROM weather_data
) AS weather_date_only
ON holidays.holiday_start_date = weather_date_only.date_w;"

***************************************************************************************

Creating Backup Tables:

To ensure the utmost data integrity and to establish a robust contingency plan against potential changes, I am implementing a strategic approach by creating backups for all tables. This process entails replicating the structure and content of these tables, effectively preserving their original configurations and data states. By executing this measure, I am taking proactive steps to securely retain these tables, safeguarding them for future reference and facilitating swift restoration if required.

1. create_user_data_backup

Backing up the cyclistic data

"CREATE TABLE user_data_backup AS
SELECT * FROM user_data;"

2. create_temporal_data_backup

Backing up the holiday/weather data
"CREATE TABLE temporal_data_backup AS
SELECT * FROM temporal_data;"

*************************************************************************************

Data Quality Analysis

Error Analysis - Nulls

During the ‘phase_one_transform’ process, I carried out an initial comprehensive review of the files to meticulously identify and document all instances of errors within the data set. While these errors, often referred to as ‘null’ values, might not influence the determination of ride lengths for the analysis, they hold significance in the context of geospatial analysis. The subsequent analysis provides a detailed overview of records that exhibit incompleteness or null values. This comprehensive examination serves as a vital step in ensuring data quality and integrity across multiple analytical dimensions.

Observations by Fields:

1. count_fields_with_nulls

Counting number of fields containing nulls

"SELECT
    TO_CHAR(COUNT(*) * 17, '999,999,999') AS total_fields,
    TO_CHAR(SUM(
        CASE WHEN ride_id IS NULL OR ride_id = '' OR ride_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN rideable_type IS NULL OR rideable_type = '' OR rideable_type = ' ' THEN 1 ELSE 0 END +
        CASE WHEN started_at IS NULL THEN 1 ELSE 0 END +
        CASE WHEN ended_at IS NULL THEN 1 ELSE 0 END +
        CASE WHEN start_station_name IS NULL OR start_station_name = '' OR start_station_name = ' ' THEN 1 ELSE 0 END +
        CASE WHEN start_station_id IS NULL OR start_station_id = '' OR start_station_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN end_station_name IS NULL OR end_station_name = '' OR end_station_name = ' ' THEN 1 ELSE 0 END +
        CASE WHEN end_station_id IS NULL OR end_station_id = '' OR end_station_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN start_lat IS NULL THEN 1 ELSE 0 END +
        CASE WHEN start_lng IS NULL THEN 1 ELSE 0 END +
        CASE WHEN end_lat IS NULL THEN 1 ELSE 0 END +
        CASE WHEN end_lng IS NULL THEN 1 ELSE 0 END +
        CASE WHEN member_casual IS NULL OR member_casual = '' OR member_casual = ' ' THEN 1 ELSE 0 END +
        CASE WHEN ride_length IS NULL THEN 1 ELSE 0 END +
        CASE WHEN day_of_week IS NULL THEN 1 ELSE 0 END +
        CASE WHEN name_of_day IS NULL THEN 1 ELSE 0 END
    ), '999,999,999') AS total_nulls,
    TO_CHAR((SUM(
        CASE WHEN ride_id IS NULL OR ride_id = '' OR ride_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN rideable_type IS NULL OR rideable_type = '' OR rideable_type = ' ' THEN 1 ELSE 0 END +
        CASE WHEN started_at IS NULL THEN 1 ELSE 0 END +
        CASE WHEN ended_at IS NULL THEN 1 ELSE 0 END +
        CASE WHEN start_station_name IS NULL OR start_station_name = '' OR start_station_name = ' ' THEN 1 ELSE 0 END +
        CASE WHEN start_station_id IS NULL OR start_station_id = '' OR start_station_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN end_station_name IS NULL OR end_station_name = '' OR end_station_name = ' ' THEN 1 ELSE 0 END +
        CASE WHEN end_station_id IS NULL OR end_station_id = '' OR end_station_id = ' ' THEN 1 ELSE 0 END +
        CASE WHEN start_lat IS NULL THEN 1 ELSE 0 END +
        CASE WHEN start_lng IS NULL THEN 1 ELSE 0 END +
        CASE WHEN end_lat IS NULL THEN 1 ELSE 0 END +
        CASE WHEN end_lng IS NULL THEN 1 ELSE 0 END +
        CASE WHEN member_casual IS NULL OR member_casual = '' OR member_casual = ' ' THEN 1 ELSE 0 END +
        CASE WHEN ride_length IS NULL THEN 1 ELSE 0 END +
        CASE WHEN day_of_week IS NULL THEN 1 ELSE 0 END +
        CASE WHEN name_of_day IS NULL THEN 1 ELSE 0 END
    )::float / (COUNT(*) * 17) * 100), '999.99') AS error_percentage
FROM
    user_data;"

Total Fields Examined: The analysis encompassed a thorough evaluation of 96,351,189 fields within the data set, offering a comprehensive overview of the data’s scope.
Missing or Null Values: Among the scrutinized fields, 3,463,328 instances were identified as containing missing or null values, constituting an error percentage of 3.59%.

Observations by Records:

2. count_records_with_nulls

Counting number of records containing nulls

"WITH total_null_counts AS (
    SELECT
        COUNT(*) AS total_records_with_errors
    FROM
        user_data
    WHERE
        ride_id IS NULL OR
        rideable_type IS NULL OR
        started_at IS NULL OR
        ended_at IS NULL OR
        start_station_name IS NULL OR
        start_station_id IS NULL OR
        end_station_name IS NULL OR
        end_station_id IS NULL OR
        start_lat IS NULL OR
        start_lng IS NULL OR
        end_lat IS NULL OR
        end_lng IS NULL OR
        member_casual IS NULL OR
        ride_length IS NULL OR
        day_of_week IS NULL OR
        name_of_day IS NULL
)
SELECT
    TO_CHAR(total_records, '999,999,999') AS total_records,
    TO_CHAR(total_records_with_errors, '999,999,999') AS total_records_with_errors,
    TO_CHAR((total_records_with_errors::float / total_records * 100), '999.99') AS error_percentage
FROM
    (SELECT COUNT(*) AS total_records FROM user_data) AS total_records,
    total_null_counts;"

Total Records: The data set encompasses a grand total of 5,667,717 records, signifying its overall volume.
Error and Missing Values: Within the data set, a substantial 1,298,357 records showcase errors or missing values distributed among different columns. This equates to an error percentage of around 22.91%.

Observations of Data Quality in total:

3. data_quality_summary_report

General overview of data quality

"WITH total_counts AS (
    SELECT COUNT(*) AS total_records FROM user_data
)
SELECT column_name, null_count
FROM (
    SELECT 'total_records' AS column_name, 
           TO_CHAR(total_records, '999,999,999') AS null_count, 0 AS sort_order 
    FROM total_counts
    UNION ALL
    SELECT 'ride_id', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE ride_id IS NULL OR ride_id = '' OR ride_id = ' '), 
               '999,999,999'
           ) AS null_count, 1 
    FROM user_data
    UNION ALL
    SELECT 'rideable_type', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE rideable_type IS NULL OR rideable_type = '' OR rideable_type = ' '), 
               '999,999,999'
           ) AS null_count, 2 
    FROM user_data
    UNION ALL
    SELECT 'started_at', 
           TO_CHAR(COUNT(*) FILTER (WHERE started_at IS NULL), '999,999,999') AS null_count, 3 
    FROM user_data
    UNION ALL
    SELECT 'ended_at', 
           TO_CHAR(COUNT(*) FILTER (WHERE ended_at IS NULL), '999,999,999') AS null_count, 4 
    FROM user_data
    UNION ALL
    SELECT 'start_station_name', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE start_station_name IS NULL OR start_station_name = '' OR start_station_name = ' '), 
               '999,999,999'
           ) AS null_count, 5 
    FROM user_data
    UNION ALL
    SELECT 'start_station_id', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE start_station_id IS NULL OR start_station_id = '' OR start_station_id = ' '), 
               '999,999,999'
           ) AS null_count, 6 
    FROM user_data
    UNION ALL
    SELECT 'end_station_name', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE end_station_name IS NULL OR end_station_name = '' OR end_station_name = ' '), 
               '999,999,999'
           ) AS null_count, 7 
    FROM user_data
    UNION ALL
    SELECT 'end_station_id', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE end_station_id IS NULL OR end_station_id = '' OR end_station_id = ' '), 
               '999,999,999'
           ) AS null_count, 8 
    FROM user_data
    UNION ALL
    SELECT 'start_lat', 
           TO_CHAR(COUNT(*) FILTER (WHERE start_lat IS NULL), '999,999,999') AS null_count, 9 
    FROM user_data
    UNION ALL
    SELECT 'start_lng', 
           TO_CHAR(COUNT(*) FILTER (WHERE start_lng IS NULL), '999,999,999') AS null_count, 10 
    FROM user_data
    UNION ALL
    SELECT 'end_lat', 
           TO_CHAR(COUNT(*) FILTER (WHERE end_lat IS NULL), '999,999,999') AS null_count, 11 
    FROM user_data
    UNION ALL
    SELECT 'end_lng', 
           TO_CHAR(COUNT(*) FILTER (WHERE end_lng IS NULL), '999,999,999') AS null_count, 12 
    FROM user_data
    UNION ALL
    SELECT 'member_casual', 
           TO_CHAR(COUNT(*) FILTER (
               WHERE member_casual IS NULL OR member_casual = '' OR member_casual = ' '), 
               '999,999,999'
           ) AS null_count, 13 
    FROM user_data
    UNION ALL
    SELECT 'ride_length', 
           TO_CHAR(COUNT(*) FILTER (WHERE ride_length IS NULL), '999,999,999') AS null_count, 14 
    FROM user_data
    UNION ALL
    SELECT 'day_of_week', 
           TO_CHAR(COUNT(*) FILTER (WHERE day_of_week IS NULL), '999,999,999') AS null_count, 15 
    FROM user_data
    UNION ALL
    SELECT 'name_of_day', 
           TO_CHAR(COUNT(*) FILTER (WHERE name_of_day IS NULL), '999,999,999') AS null_count, 16 
    FROM user_data
    UNION ALL
    SELECT 'day_of_year', 
           TO_CHAR(COUNT(*) FILTER (WHERE day_of_year IS NULL), '999,999,999') AS null_count, 17 
    FROM user_data
) AS results
ORDER BY sort_order, column_name;"

An in-depth analysis of missing values within the dataset sheds light on the completeness and reliability of data across various columns:

Total Records: The dataset encompasses a substantial 5,667,717 records, indicating a significant volume of data under examination.
Ride ID and Rideable Type: Both the “ride_id” and “rideable_type” columns are devoid of missing values, highlighting consistent recording of these identifiers for each ride instance.
Start and End Time stamps: The “started_at” and “ended_at” columns also exhibit complete data, with no missing values, signifying comprehensive time stamp information for ride beginnings and endings.
Start and End Stations: While the “start_station_name” and “start_station_id” columns possess missing values, alongside the “end_station_name” and “end_station_id” columns, a substantial number of rides have incomplete station-related information.
Geographical Coordinates: The “start_lat,” “start_lng,” “end_lat,” and “end_lng” columns contain missing values. Notably, the “end_lat” and “end_lng” columns have a limited count of 5,858 missing values, indicating incomplete geographical coordinates for specific rides.
User Type and Ride Length: In contrast, the “member_casual” column, representing user types, and the “ride_length” column, depicting ride durations, are complete, lacking any missing values.
Day of Week, Name of Day, and Day of Year: The columns “day_of_week,” “name_of_day,” and “day_of_year” also boast completeness, signifying comprehensive data concerning ride timing.
Geographical Data Completeness: The presence of missing geographical coordinates prompts further exploration to discern the factors contributing to these gaps. Ensuring the completeness of geographical data is crucial for precise mapping and spatial analysis.

Observations of Data Quality by month:

4. monthly_error_statistics_query

General overview of data quality by month

"WITH total_counts AS (
    SELECT COUNT(*) AS total_records FROM user_data
)

SELECT 
    EXTRACT(MONTH FROM started_at) AS month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (
                WHERE start_station_name IS NULL OR 
                      start_station_name = '' OR 
                      start_station_name = ' '
            ),
            '999,999,999'
        ),
        12
    ) AS error_count_start_station_name,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (
                    WHERE start_station_name IS NULL OR 
                          start_station_name = '' OR 
                          start_station_name = ' '
                ) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_ssn_by_month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (
                WHERE start_station_id IS NULL OR 
                      start_station_id = '' OR 
                      start_station_id = ' '
            ),
            '999,999,999'
        ),
        12
    ) AS error_count_start_station_id,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (
                    WHERE start_station_id IS NULL OR 
                          start_station_id = '' OR 
                          start_station_id = ' '
                ) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_ssi_by_month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (
                WHERE end_station_name IS NULL OR 
                      end_station_name = '' OR 
                      end_station_name = ' '
            ),
            '999,999,999'
        ),
        12
    ) AS error_count_end_station_name,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (
                    WHERE end_station_name IS NULL OR 
                          end_station_name = '' OR 
                          end_station_name = ' '
                ) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_esn_by_month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (
                WHERE end_station_id IS NULL OR 
                      end_station_id = '' OR 
                      end_station_id = ' '
            ),
            '999,999,999'
        ),
        12
    ) AS error_count_end_station_id,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (
                    WHERE end_station_id IS NULL OR 
                          end_station_id = '' OR 
                          end_station_id = ' '
                ) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_esi_by_month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (WHERE end_lat IS NULL),
            '999,999,999'
        ),
        12
    ) AS error_count_end_lat,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (WHERE end_lat IS NULL) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_end_lat_by_month,
    LPAD(
        TO_CHAR(
            COUNT(*) FILTER (WHERE end_lng IS NULL),
            '999,999,999'
        ),
        12
    ) AS error_count_end_lng,
    LPAD(
        TO_CHAR(
            ROUND(
                (COUNT(*) FILTER (WHERE end_lng IS NULL) * 100.0 / total_records),
                2
            ),
            '999.99'
        ),
        8
    ) AS error_percentage_end_lng_by_month

FROM 
    user_data,
    total_counts

WHERE
    start_station_name IS NULL OR start_station_name = '' OR start_station_name = ' '
    OR start_station_id IS NULL OR start_station_id = '' OR start_station_id = ' '
    OR end_station_name IS NULL OR end_station_name = '' OR end_station_name = ' '
    OR end_station_id IS NULL OR end_station_id = '' OR end_station_id = ' '
    OR end_lat IS NULL
    OR end_lng IS NULL

GROUP BY 
    EXTRACT(MONTH FROM started_at),
    total_records

ORDER BY month;
"

This provides a detailed breakdown of error counts and error percentages related to station and geographical data quality issues across different months:

Data Analysis Scope: The query specifically focuses on analyzing data quality issues related to start and end station names, start and end station IDs, and geographical coordinates (end_lat and end_lng) for each month.
Error Counts and Percentages: The query calculates both the error counts and error percentages for each column under consideration. Error counts represent the number of records with missing or invalid values, while error percentages indicate the proportion of such records relative to the total number of records for each month.
Start Station Name and ID: The analysis reveals that the “start_station_name” and “start_station_id” columns have relatively consistent error counts and error percentages across the months, ranging from approximately 16,000 to 112,000 records with errors. The error percentages for these columns generally fall within the range of 0.29% to 1.98%.
End Station Name and ID: Similar patterns are observed for the “end_station_name” and “end_station_id” columns, with error counts ranging from around 17,000 to 121,000 records and error percentages within the range of 0.32% to 2.13%.
Geographical Coordinates: The analysis also highlights data quality issues related to geographical coordinates. Despite the “started_lat” and “started_lng”, along with the “end_lat” and “end_lng” columns show either no or relatively low error counts, ranging from 86 to 1,055 records, with error percentages of 0.00% to 0.02%, inspection of the data shows that there are significant records with incomplete geographical coordinate information. Monthly Variation: When considering the error percentages, there is a noticeable increase in error percentages for both station-related data (start and end stations) during the summer months (June, July, and August), suggesting potential challenges or data collection issues during this period.

***************************************************************************************

Analysis Phase

My analysis employs a methodical approach to uncover ride-sharing dynamics, progressing through distinct stages that build on previous insights. We begin by examining the foundational ‘user_data’ table, investigating ride start and end times, user types, and durations to identify key patterns. The integration of the ‘holidays’ and ‘weather_data’ tables into the ‘temporal_data’ data set broadens my analysis. We gain a comprehensive perspective on how holidays and weather impact user behaviors, understanding their collective influence on ride choices. Bringing together insights from individual analyses, we synthesize data from ‘user_data,’ ‘holidays,’ and ‘weather_data’ tables. This reveals connections between user types, temporal patterns, and weather conditions, offering actionable insights. My goal is to enable informed decisions in the ride-sharing ecosystem by uncovering meaningful trends. This structured approach ensures that each stage contributes to a robust analysis framework.

User Segmentation: The data set splits users into “Casual” and “Member” types, revealing potential ride behavior differences.
Ride Duration Analysis: By calculating average, maximum ride lengths, and normalized values, we gain insights into typical user ride experiences.
Temporal Patterns and Insights: By utilizing year, month, and quarterly dimensions we are able to uncover rider behavior trends over different time frames. Additionally, we conduct explorations down to week, day and hour levels to capture more nuanced patterns.
Seasonal Insights: The data set demonstrates ride behavior shifts across seasons, and we uncover nuanced trends influencing user engagement and ride duration.

The initial approach involves a comprehensive examination of general ridership, encompassing various temporal aspects. We begin by taking a high-level view of rider engagement across multiple time frames, including monthly, quarterly, and seasonal analyses. This exploration allows us to discern overarching trends and patterns in user behavior, shedding light on how ridership fluctuates over different time intervals.

1. total_records_by_month

Getting the number of records within the data set by month

"SELECT
  EXTRACT(MONTH FROM started_at) AS month,
  TO_CHAR(COUNT(*), '999,999,999') AS total_records_by_month
FROM
  user_data
GROUP BY
  EXTRACT(MONTH FROM started_at)
ORDER BY
  EXTRACT(MONTH FROM started_at);
"

Observations:

Seasonal Variation: The ride volume data shows variations across different months of the year. Peaks and valleys in ride counts indicate potential seasonal patterns in user engagement with the ride-sharing platform. Peak Months: Months with the highest ride counts, such as July and September, suggest periods of increased user activity and demand for ride-sharing services. These peak months could coincide with holidays, vacation seasons, or special events.
Off-Peak Months: Conversely, months with lower ride counts, like February and November, indicate periods of reduced ride-sharing activity. These off-peak months might present opportunities for targeted promotions or incentives to boost user engagement during slower periods.
User Behavior Insights: The varying ride volumes by month could reflect shifts in user behavior and preferences. Analyzing the factors driving these changes could lead to insights into users’ transportation needs and habits.

2. total_records_by_quarter

Getting the number of records within the data set by quarter

"SELECT
  TO_CHAR(EXTRACT(QUARTER FROM started_at), '9') AS quarter,
  TO_CHAR(COUNT(*), '9,999,999') AS total_records_by_quarter
FROM
  user_data
GROUP BY
  EXTRACT(QUARTER FROM started_at)
ORDER BY
  EXTRACT(QUARTER FROM started_at);
"

Observations:

Quarterly Fluctuations: The ride volume data is segmented into four quarters, each representing a three-month period. The analysis reveals fluctuations in ride counts from one quarter to another, indicating changing user engagement and demand over the course of the year.
Seasonal Patterns: The significant increase in ride counts during the second and third quarters (Q2 and Q3) suggests a seasonal trend, with warmer months potentially driving higher ride-sharing activity. This pattern may be influenced by favorable weather conditions for outdoor activities and travel.
Q2 and Q3 Peaks: The substantial surge in ride volume during Q2 and Q3 signifies peak usage periods for the ride-sharing platform.
Q1 and Q4 Trends: The lower ride counts in Q1 and Q4 could be attributed to colder weather, reduced outdoor activities, or fewer events.
User Behavior Shifts: Quarterly variations could reflect shifts in user behavior due to factors such as holidays, vacation periods, and major events

3. total_records_by_season

Getting the number of records within the data set by season

"SELECT
  CASE
    WHEN EXTRACT(MONTH FROM started_at) IN (12, 1, 2) THEN 'Winter'
    WHEN EXTRACT(MONTH FROM started_at) IN (3, 4, 5) THEN 'Spring'
    WHEN EXTRACT(MONTH FROM started_at) IN (6, 7, 8) THEN 'Summer'
    WHEN EXTRACT(MONTH FROM started_at) IN (9, 10, 11) THEN 'Autumn'
  END AS season,
  TO_CHAR(COUNT(*), '999,999,999') AS total_records_by_season
FROM
  user_data
GROUP BY
  season
ORDER BY
  MIN(EXTRACT(MONTH FROM started_at));"

Observations:

This query provides a clear breakdown of the total ride records by different seasons, categorized as Winter, Spring, Summer, and Autumn. Each season corresponds to a specific range of months. The data reveals interesting insights into user behavior patterns throughout the year:

Seasonal Fluctuations: The data showcases a consistent trend in ride records across the seasons. Summer and Spring emerge as the peak seasons for ride-sharing activities, with the highest and second-highest total ride records, respectively. This trend suggests that users tend to engage more in ridesharing during warmer months, possibly due to more favorable weather conditions.
Off-Peak Seasons: On the other hand, Winter and Autumn demonstrate lower ride records compared to the warmer seasons. This could be attributed to weather-related factors, as well as potential seasonal variations in user preferences and demand.

******************************************************

From this point forward, I will be segmenting the data set based on the distinction between “member” and “casual” users

During my exploration, I conducted a comprehensive analysis of riders, both members and casual, over a full year. This involved calculating total counts for each category and highlighting the percentage difference. I also intend to extend this analysis to monthly and seasonal periods for a more detailed understanding of rider engagement patterns.

4. user_count_by_type

Getting the count of each user type in the segmented groups

"SELECT member_casual, Count(*) AS user_count 
FROM user_data 
GROUP BY member_casual;"

Observations:

Casual Riders: Recorded at 2,321,974 rides, this user group consists of individuals who use the service sporadically or on a one-time basis, without subscribing to a long-term membership.
Member Riders: This category, with a total of 3,345,640 rides, comprises individuals who have committed to a membership plan, indicating a higher level of engagement and regular usage of the service.

5. member_casual_user_stats_year

Getting the total counts of members and casual riders for the year

"SELECT
    EXTRACT(YEAR FROM started_at) AS year,
    TO_CHAR(SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), '999,999,999') AS member_count,
    TO_CHAR(SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), '999,999,999') AS casual_count,
    ROUND(
        (
            (
                SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END) - 
                SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END)
            )::numeric /
            SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END)
        ) * 100,
        2
    ) AS percentage_difference
FROM user_data
GROUP BY EXTRACT(YEAR FROM started_at)
ORDER BY year;
"

Observations:

User Demographics and Preferences: The data suggests that in the year 2022, there were approximately 3.34 million rides taken by “Member” users and around 2.32 million rides by “Casual” users. This disparity indicates a significant presence of “Member” users who utilized the bike-sharing service more frequently during this year.
User Engagement: By examining the calculations of “Member” and “Casual” ride counts, we gain valuable insights into user engagement and utilization patterns. The higher count of “Member” rides implies a dedicated user base that actively engages with the ride-sharing platform on a regular basis.
Shifts in User Patterns: The “percentage_difference” column displays a percentage difference of 30.6% between “Member” and “Casual” ride counts. This underscores notable distinctions in usage behaviors and preferences between these two user categories. Factors such as pricing models, convenience, and loyalty programs could contribute to these divergent patterns of use.

6. member_casual_user_stats_month

Getting the total counts of members and casual riders by month 

"WITH counts AS (
    SELECT
        EXTRACT(MONTH FROM started_at) AS month,
        COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count,
        COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count
    FROM
        user_data
    GROUP BY
        EXTRACT(MONTH FROM started_at)
)
SELECT
    TO_CHAR(TO_DATE(EXTRACT(YEAR FROM CURRENT_DATE) || '-' || month || '-01', 'YYYY-MM-DD'), 'Month') AS months,
    TO_CHAR(member_count, 'FM999,999,999') AS membership_count,
    TO_CHAR(casual_count, 'FM999,999,999') AS casual_use_count,
    ROUND(
        ((member_count - casual_count)::numeric / NULLIF(member_count, 0)) * 100,
        2
    ) AS percentage_difference
FROM
    counts
ORDER BY
    EXTRACT(MONTH FROM TO_DATE(month::text, 'MM'));
"

Observations:

Monthly User Engagement: In “January,” there were 85,250 rides taken by “Member” users and 18,520 rides by “Casual” users. Similarly, a breakdown is provided for each subsequent month.
Shifts in User Preferences: The percentage difference column indicates the variation in ride counts between “Member” and “Casual” users. Notably, there are instances where “Member” users significantly outnumber “Casual” users, such as in “January,” “February,” and “March.” This could reflect a stronger preference for membership-based usage during these months.
Monthly Variations: The query reveals distinct monthly trends in user behaviors. For example, in “May,” “Casual” users account for a larger share of rides compared to previous months, with 280,413 rides, suggesting increased casual user engagement during this time.
Seasonal Patterns: By examining the monthly breakdown, it becomes evident that there are variations in user preferences based on the seasons. Some months, like “July” and “August,” show a relatively smaller percentage difference, indicating balanced usage between “Member” and “Casual” users during the summer months.

7. member_casual_user_stats_season

Getting the total counts of members and casual riders by season 

"WITH seasonal_counts AS (
    SELECT
        CASE
            WHEN EXTRACT(MONTH FROM started_at) IN (12, 1, 2) THEN 'Winter'
            WHEN EXTRACT(MONTH FROM started_at) IN (3, 4, 5) THEN 'Spring'
            WHEN EXTRACT(MONTH FROM started_at) IN (6, 7, 8) THEN 'Summer'
            WHEN EXTRACT(MONTH FROM started_at) IN (9, 10, 11) THEN 'Autumn'
            ELSE 'Unknown'
        END AS season,
        COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count,
        COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count
    FROM
        user_data
    WHERE
        EXTRACT(YEAR FROM started_at) = 2022
    GROUP BY
        season
)
SELECT
    season,
    TO_CHAR(member_count, '9,999,999') AS membership,
    TO_CHAR(casual_count, '9,999,999') AS casual,
    ROUND(((member_count - casual_count)::numeric / NULLIF(member_count, 0)) * 100, 2) AS percentage_difference
FROM
    seasonal_counts
ORDER BY
    casual DESC;"

Observations:

Seasonal User Engagement: The results illustrate user engagement throughout the year 2022, segmented by seasons. The “Summer” season demonstrates robust user involvement, with approximately 1,244,574 “Member” rides and 1,134,006 “Casual” rides. Subsequent seasons, namely “Autumn,” “Spring,” and “Winter,” exhibit comparable patterns, though with varying ride counts for both “Member” and “Casual” users.
Shifts in User Behavior: Notably, the “Winter” season stands out with a higher percentage difference of 73.19%, indicating a noticeable preference for “Member” users during this period. This phenomenon could be attributed to factors like weather conditions or targeted promotions aimed at incentivizing “Member” utilization.
Casual Usage Patterns: Despite featuring fewer “Casual” rides, the “Winter” season still records a substantial count of 84,830 rides. This observation underscores the fact that some casual users continue to leverage the service even under less favorable conditions.
Seasonal Variations: The query provides valuable insights into the shifting patterns of user engagement across different seasons. Particularly, the “Autumn” and “Spring” seasons showcase similar trends in the percentage difference, suggesting a balanced usage dynamic between “Member” and “Casual” users during these periods.

***************************************************************************************

8. avg_ride_length_analysis_by_year

Getting the average ride lengths by year


"SELECT
    year,
    ROUND(member_mean_ride_length_minutes, 2) AS rounded_member_mean_ride_length_minutes,
    ROUND(casual_mean_ride_length_minutes, 2) AS rounded_casual_mean_ride_length_minutes,
    ROUND(
        (
            (casual_mean_ride_length_minutes - member_mean_ride_length_minutes)
            / NULLIF(casual_mean_ride_length_minutes, 0)
        ) * 100,
        2
    ) AS percentage_difference
FROM (
    SELECT
        EXTRACT(YEAR FROM started_at) AS year,
        AVG(
            CASE WHEN member_casual = 'member'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
        )::numeric AS member_mean_ride_length_minutes,
        AVG(
            CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
        )::numeric AS casual_mean_ride_length_minutes
    FROM
        user_data
    GROUP BY
        year
) AS subquery;"

Observations:

Diverse User Ride Lengths: The data reveals a distinct contrast in the average ride durations between “Member” and “Casual” users. “Member” users display notably shorter average ride lengths, with rides lasting approximately 12.71 minutes on average. In contrast, “Casual” users engage in considerably longer rides, with an average duration of around 29.10 minutes.
User Engagement Patterns: The observed divergence in average ride lengths suggests varying levels of user engagement and usage patterns. “Casual” users tend to opt for longer rides, while “Member” users often choose shorter trips.
Significant Percentage Difference: The calculated percentage difference of 56.30% underscores the substantial disparity in average ride lengths between the two user categories. This percentage accentuates the extent to which “Casual” users’ ride lengths exceed those of “Member” users

9. avg_ride_length_analysis_by_month

Getting the average ride lengths by month

"SELECT
    TO_CHAR(started_at, 'Month') AS month,
    ROUND(
        AVG(CASE WHEN member_casual = 'member'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 2
    ) AS member_mean_ride_length_minutes,
    ROUND(
        AVG(CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 2
    ) AS casual_mean_ride_length_minutes,
    ROUND(
        (
            (
                AVG(CASE WHEN member_casual = 'casual'
                         THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
            ) - (
                AVG(CASE WHEN member_casual = 'member'
                         THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
            )
        ) / AVG(CASE WHEN member_casual = 'casual'
                   THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric * 100, 2
    ) AS percentage_difference
FROM
    user_data
GROUP BY
    month
ORDER BY
    ROUND(AVG(CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 2) DESC;
"

Observations:

Seasonal Variation in Ride Lengths: The data unveils fluctuations in average ride lengths for both “Member” and “Casual” users throughout the year. These variations suggest that external factors, such as weather conditions or holidays, may influence user engagement and ride durations.
Longer Casual Rides in Spring and Summer: Notably, the months of “March,” “June,” “May,” and “April” consistently exhibit higher average ride lengths for “Casual” users compared to “Member” users. This observation suggests that casual riders are more inclined to take longer rides during the spring and summer months, potentially due to favorable weather or increased leisure activities.
Impact of Season on Member Riders: While “Casual” riders tend to have longer rides during specific months, “Member” riders maintain relatively stable ride lengths across most months. This could indicate that “Member” riders use the service consistently for commuting or other routine activities.
March: Highest Percentage Difference: “March” stands out with the highest percentage difference of 63.34%. This significant difference underscores the distinct ride length preferences between “Casual” and “Member” users during this month. Investigating the specific factors contributing to this divergence could provide valuable insights.
November: Lowest Casual Ride Lengths: “November” records the lowest average ride lengths for both “Casual” and “Member” users. This could be attributed to colder weather or reduced outdoor activities during this month, resulting in shorter rides overall.

10. avg_ride_length_by_day_of_week

Getting the average ride lengths by day of the week

"SELECT *
FROM crosstab(
    'SELECT
        CASE WHEN member_casual = ''member'' THEN ''Member'' ELSE ''Casual'' END AS category,
        EXTRACT(DOW FROM started_at) AS day_of_week,
        ROUND(AVG(CAST(ride_length AS numeric)), 4) AS average_ride_length
    FROM
        user_data
    GROUP BY
        1, 2
    ORDER BY
        1, 2',
    'VALUES (0), (1), (2), (3), (4), (5), (6)'
) AS pivot_table(category text, "Sunday" numeric, "Monday" numeric, "Tuesday" numeric,
                 "Wednesday" numeric, "Thursday" numeric, "Friday" numeric, "Saturday" numeric);
"

Observations:

Casual vs. Member Ride Lengths: The table clearly showcases a consistent trend where the average ride lengths for “Casual” users are generally higher compared to “Member” users on each day of the week. This observation suggests that casual riders tend to take longer rides than members.
Weekday vs. Weekend Patterns: Upon examining the values, it becomes evident that both “Casual” and “Member” users tend to experience slightly longer rides on weekends (Saturday and Sunday) compared to weekdays. This pattern could indicate that users opt for more leisurely rides during the weekends.
Similar Trends: While “Casual” riders have longer average ride lengths than “Member” riders, both user categories exhibit similar fluctuations in ride lengths throughout the days of the week. This similarity suggests that the factors influencing ride lengths may be applicable to both types of users.
Consistency: The discrepancies in average ride lengths between “Casual” and “Member” users remain relatively consistent across all days of the week. “Casual” riders consistently enjoy higher average ride lengths by a noticeable margin.
Implications: The variations in average ride lengths offer insights into user preferences and behaviors. For instance, the extended ride lengths for casual riders might indicate a preference for leisurely exploration or longer recreational rides. On the other hand, the relatively shorter rides for members could imply more utilitarian or commute-oriented use.

11. avg_ride_length_analysis_by_season

Getting the average ride lengths by season

"SELECT
    CASE
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
        ELSE 'Winter'
    END AS season,
    ROUND(
        AVG(CASE WHEN member_casual = 'member'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 2
    ) AS member_mean_ride_length_minutes,
    ROUND(
        AVG(CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 2
    ) AS casual_mean_ride_length_minutes,
    ROUND(
        (
            (
                AVG(CASE WHEN member_casual = 'casual'
                         THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
            ) - (
                AVG(CASE WHEN member_casual = 'member'
                         THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
            )
        ) / AVG(CASE WHEN member_casual = 'casual'
                   THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric * 100, 2
    ) AS percentage_difference
FROM
    user_data
GROUP BY
    CASE
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
        ELSE 'Winter'
    END
ORDER BY
    casual_mean_ride_length_minutes DESC;
"

Observations:

Seasonal Patterns in Ride Lengths: The outcomes adeptly illustrate distinctive patterns in average ride lengths for each season. “Summer” and “Spring” prominently display the highest average ride lengths for both “Member” and “Casual” users, suggesting that users tend to engage in longer rides during the warmer months.
Summer Peaks in Member Rides: Interestingly, during the “Summer” season, “Member” riders exhibit slightly higher average ride lengths compared to “Casual” riders. This observation could imply that “Member” riders continue to use the service for extended rides, potentially during vacations or leisure activities.
Autumn and Winter Trends: In contrast, the “Autumn” and “Winter” seasons witness a decline in average ride lengths for both user categories. This trend could be attributed to less favorable weather conditions during these seasons, likely resulting in shorter rides overall.
Percentage Differences: The calculated percentage differences provide insight into the extent of variations in average ride lengths between “Member” and “Casual” users. Notably, “Spring” demonstrates the highest percentage difference of 54.63%, emphasizing that “Casual” users tend to have significantly longer rides compared to “Member” users during this season.

***************************************************************************************

The following three queries focus on comparing the total and maximum ride lengths, adjusting and normalizing them by user count for unbiased comparisons.

12. max_ride_length_analysis_yearly

Getting the max ride lengths by year

"WITH MaxRideLengths AS (
    SELECT 
        CASE 
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type,
        SUM(EXTRACT(EPOCH FROM (ended_at - started_at))) AS sum_ride_length_seconds,
        COUNT(*) AS user_count
    FROM user_data
    GROUP BY member_casual
)
SELECT 
    user_type,
    CONCAT(
        TO_CHAR(FLOOR(sum_ride_length_seconds / 3600), 'FM999,999,999'), ' hours, ',
        FLOOR((sum_ride_length_seconds % 3600) / 60), ' minutes, ',
        FLOOR(sum_ride_length_seconds % 60), ' seconds'
    ) AS total_max_ride_length_for_the_year,
    CONCAT(
        TO_CHAR(FLOOR((sum_ride_length_seconds / 3600 / user_count) / 60), 'FM999,999,999'), ' hours, ',
        FLOOR((sum_ride_length_seconds / user_count % 3600) / 60), ' minutes, ',
        FLOOR((sum_ride_length_seconds / user_count) % 60), ' seconds'
    ) AS normalized_max_ride_length_for_the_year
FROM MaxRideLengths;


"

Observations:

Total Max Ride Lengths: The provided outcomes unveil that “Casual” users collectively accrued a total ride time of approximately 1,126,037 hours, 9 minutes, 44 seconds over the span of the year. In contrast, “Member” users documented a cumulative ride time of around 708,941 hours, 42 minutes, 53 seconds. This notable distinction in total ride times underscores that, on average, “Casual” users engaged in longer rides compared to “Member” users.
Max Ride Length per User: On average, an individual “Casual” user’s longest ride duration stood at approximately 29 minutes and 5 seconds, whereas an average “Member” user’s longest ride duration equated to around 12 minutes and 42 seconds. This discrepancy indicates that “Casual” users experienced longer individual ride durations compared to “Member” users.
User Engagement Variability: The data brings to light a substantial variability in user engagement patterns between “Casual” and “Member” users. The extended maximum ride length for “Casual” users suggests that this particular user category encompasses individuals who opt for more extensive and potentially leisurely rides.

13. max_ride_length_analysis_monthly

Getting the max ride lengths by month

"WITH MaxRideLengths AS (
    SELECT 
        TO_CHAR(started_at, 'Month') AS month,
        CASE 
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type,
        SUM(EXTRACT(EPOCH FROM (ended_at - started_at))) AS sum_ride_length_seconds,
        COUNT(*) AS user_count
    FROM user_data
    GROUP BY month, member_casual
)
SELECT 
    month,
    user_type,
    CONCAT(
        TO_CHAR(FLOOR(sum_ride_length_seconds / 3600), 'FM999,999,999'), ' hours, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds % 3600) / 60), 'FM999,999,999'), ' minutes, ',
        TO_CHAR(FLOOR(sum_ride_length_seconds % 60), 'FM999,999,999'), ' seconds'
    ) AS total_max_ride_length_for_the_month,
    CONCAT(
        TO_CHAR(FLOOR((sum_ride_length_seconds / 3600 / user_count) / 60), 'FM999,999,999'), ' hours, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds / user_count % 3600) / 60), 'FM999,999,999'), ' minutes, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds / user_count) % 60), 'FM999,999,999'), ' seconds'
    ) AS normalized_max_ride_length_for_the_month
FROM MaxRideLengths
ORDER BY 
    to_timestamp(month, 'Month')::DATE,
    user_type;"

Observations:

Total Max Ride Lengths: March and May Stand Out: Most months show that “Casual” users tend to have longer total ride times compared to “Member” users. For instance, in March, “Casual” users accumulated approximately 48,869 hours, 31 minutes, and 45 seconds of total ride time, while “Member” users had around 38,697 hours, 9 minutes, and 8 seconds. Similarly, in May, “Casual” users demonstrated notably longer total ride times with approximately 143,667 hours, 24 minutes, and 34 seconds, whereas “Member” users had around 78,962 hours, 5 minutes, and 3 seconds.
Member Users Leading in November: Conversely, there are months where “Member” users exhibited slightly longer total ride times than “Casual” users. For example, in November, “Member” users recorded around 43,961 hours, 22 minutes, and 9 seconds, while “Casual” users had 35,760 hours, 56 minutes, and 29 seconds.
Max Ride Length per User: In terms of maximum ride lengths per user, “Casual” users consistently experienced longer individual ride durations compared to “Member” users across all months.

14. max_ride_length_analysis_seasonal

Getting the max ride lengths by season

"WITH MaxRideLengths AS (
    SELECT 
        CASE 
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type,
        CASE
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
            ELSE 'Winter'
        END AS season,
        SUM(EXTRACT(EPOCH FROM (ended_at - started_at))) AS sum_ride_length_seconds,
        COUNT(*) AS user_count
    FROM user_data
    GROUP BY user_type, season
)
SELECT 
    season,
    user_type,
    CONCAT(
        TO_CHAR(FLOOR(sum_ride_length_seconds / 3600), 'FM999,999,999'), ' hours, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds % 3600) / 60), 'FM999,999,999'), ' minutes, ',
        TO_CHAR(FLOOR(sum_ride_length_seconds % 60), 'FM999,999,999'), ' seconds'
    ) AS total_max_ride_length_for_the_season,
    CONCAT(
        TO_CHAR(FLOOR((sum_ride_length_seconds / 3600 / user_count) / 60), 'FM999,999,999'), ' hours, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds / user_count % 3600) / 60), 'FM999,999,999'), ' minutes, ',
        TO_CHAR(FLOOR((sum_ride_length_seconds / user_count) % 60), 'FM999,999,999'), ' seconds'
    ) AS normalized_max_ride_length_for_the_season
FROM MaxRideLengths
ORDER BY
    CASE 
        WHEN season = 'Spring' THEN 1
        WHEN season = 'Summer' THEN 2
        WHEN season = 'Autumn' THEN 3
        ELSE 4
    END,
    user_type;"

Observations:

Total Max Ride Lengths:

Spring: During the Spring season, “Casual” users accumulated a total ride time of approximately 254,760 hours, 17 minutes, and 9 seconds. In comparison, “Member” users had a total ride time of around 164,554 hours, 22 minutes, and 28 seconds. This pattern of “Casual” users having longer total ride times than “Member” users is consistent with the trend observed in the analysis of monthly ride lengths.
Summer: Similar to Spring, “Casual” users had notably longer total ride times during the Summer season, accumulating approximately 570,316 hours, 36 minutes, and 41 seconds. Meanwhile, “Member” users recorded around 284,057 hours, 37 minutes, and 0 seconds of total ride time. This consistent difference suggests that the longer ride durations among “Casual” users extend beyond individual months and into seasonal patterns.
Autumn: In the Autumn season, “Casual” users continued to exhibit longer total ride times, with a cumulative ride time of approximately 265,372 hours, 49 minutes, and 33 seconds. “Member” users recorded around 201,168 hours, 23 minutes, and 21 seconds of total ride time during the same season. The trend of “Casual” users having longer total ride times remains consistent.
Winter: Interestingly, the pattern shifts in the Winter season. “Casual” users had a total ride time of around 35,587 hours, 26 minutes, and 21 seconds, while “Member” users recorded approximately 59,161 hours, 20 minutes, and 4 seconds. Unlike other seasons, “Member” users demonstrated longer total ride times during Winter. This variation suggests a shift in rider behavior during colder months.
Max Ride Length per User: Across all seasons, “Casual” users consistently experienced longer individual ride durations compared to “Member” users. This pattern remains constant regardless of the season. User Engagement Variability: The analysis of total ride times across seasons showcases the variability in user engagement between “Casual” and “Member” users. “Casual” users tend to have longer ride times in Spring, Summer, and Autumn, while “Member” users exhibit longer ride times in Winter. These variations emphasize the influence of seasons on user behavior within the ride-sharing platform.

***************************************************************************************

15. mode_day_yearly

Getting the mode day by year

"SELECT
    mfd.name_of_day,
    mfd.day_of_week,
    cd.user_type,
    ROUND(AVG(EXTRACT(EPOCH FROM (cd.ended_at - cd.started_at)) / 60)::numeric, 2) AS yearly_mean_ride_length_minutes
FROM (
    SELECT
        name_of_day,
        started_at,
        ended_at,
        CASE
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type
    FROM
        user_data
) AS cd
JOIN (
    SELECT
        name_of_day,
        EXTRACT(DOW FROM started_at) AS day_of_week
    FROM
        user_data
    GROUP BY
        name_of_day, EXTRACT(DOW FROM started_at)
    ORDER BY
        COUNT(*) DESC
    LIMIT 1
) AS mfd ON EXTRACT(DOW FROM cd.started_at) = mfd.day_of_week
GROUP BY
    mfd.name_of_day, mfd.day_of_week, cd.user_type
ORDER BY
    mfd.day_of_week, cd.user_type;
"

Observations:

On Saturdays, both “Casual” and “Member” users exhibit higher ride activity compared to other days of the week, making Saturdays the peak ridership day for the year.
Among “Casual” users on Saturdays, the average ride duration is approximately 32.53 minutes, whereas “Member” users have an average ride duration of about 14.14 minutes.
The disparity in average ride durations between the two user types on Saturdays suggests that “Casual” users tend to engage in longer rides compared to “Member” users during this peak ridership day.

16. mode_day_monthly

Getting the mode day by month

"WITH DayOfWeekCounts AS (
    SELECT
        TO_CHAR(started_at, 'Month') AS month,
        CASE 
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type,
        EXTRACT(DOW FROM started_at) AS day_of_week,
        COUNT(*) AS ride_count,
        ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2) AS avg_ride_length_minutes
    FROM user_data
    GROUP BY month, user_type, EXTRACT(DOW FROM started_at)
),
RankedDays AS (
    SELECT
        month,
        user_type,
        day_of_week,
        ride_count,
        avg_ride_length_minutes,
        ROW_NUMBER() OVER (PARTITION BY month, user_type ORDER BY ride_count DESC) AS rank
    FROM DayOfWeekCounts
)
SELECT
    month,
    user_type,
    CASE
        WHEN day_of_week = 0 THEN 'Sunday'
        WHEN day_of_week = 1 THEN 'Monday'
        WHEN day_of_week = 2 THEN 'Tuesday'
        WHEN day_of_week = 3 THEN 'Wednesday'
        WHEN day_of_week = 4 THEN 'Thursday'
        WHEN day_of_week = 5 THEN 'Friday'
        WHEN day_of_week = 6 THEN 'Saturday'
    END AS mode_day_of_week,
    MAX(ride_count) AS max_count,
    avg_ride_length_minutes
FROM RankedDays
WHERE rank = 1
GROUP BY month, user_type, day_of_week, avg_ride_length_minutes
ORDER BY 
    to_timestamp(month, 'Month')::DATE,
    user_type;"

Observations:

User Behavior Consistency: The mode day of the week for “Casual” users tends to remain consistent across most months, often falling on weekends such as “Saturday” and “Sunday.” This indicates that casual users prefer ride-sharing services during their leisure time.
Member User Preference: For “Member” users, the mode day of the week varies more. While “Thursday” appears as a prominent mode day for several months, other weekdays also emerge as frequent mode days. This suggests that member users utilize the service for both leisure and daily commuting purposes.
Weekend Dominance: In general, weekends (“Saturday” and “Sunday”) stand out as popular mode days for ride-sharing activities among both user types. This aligns with common trends of increased mobility and recreational outings during weekends.
Distinct Usage Patterns: “Casual” users tend to exhibit more consistent mode days throughout the year compared to “Member” users, who display a wider range of mode days. This distinction may reflect the differing motivations and needs of these user segments.
Seasonal Influences: The mode days could be influenced by seasonal factors, with warmer months potentially leading to more outdoor activities and increased ride-sharing demand. Analyzing further could reveal how weather and other seasonal elements impact user behavior.

17. mode_day_seasonal

Getting the mode day by month

"WITH seasonal_mode AS (
SELECT
        CASE
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
            WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
            ELSE 'Winter'
        END AS season,
        name_of_day,
        CASE 
            WHEN member_casual = 'member' THEN 'Member'
            WHEN member_casual = 'casual' THEN 'Casual'
        END AS user_type,
        ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2)
        AS seasonal_mean_ride_length_minutes,
        RANK() OVER (PARTITION BY CASE
                                    WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
                                    WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
                                    WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
                                    ELSE 'Winter'
                                END, 
                                CASE 
                                    WHEN member_casual = 'member' THEN 'Member'
                                    WHEN member_casual = 'casual' THEN 'Casual'
                                END
                     ORDER BY COUNT(*) DESC) AS rank
    FROM
        user_data
    GROUP BY
        season, name_of_day, user_type
)
SELECT
    season,
    name_of_day,
    user_type,
    seasonal_mean_ride_length_minutes
FROM
    seasonal_mode
WHERE
    rank = 1
ORDER BY
    CASE
        WHEN season = 'Spring' THEN 1
        WHEN season = 'Summer' THEN 2
        WHEN season = 'Autumn' THEN 3
        WHEN season = 'Winter' THEN 4
    END,
    user_type;"

Observations:

Seasonal Variation: The analysis reveals that ride patterns vary across different seasons. The average ride length on Saturdays is consistently higher during Spring, Summer, and Autumn seasons, while it decreases slightly during Winter. This suggests that weekends remain popular for cycling in general, but there might be more outdoor activity during warmer months.
User Type Behavior: Casual users tend to have longer average ride lengths compared to members in each season. This may indicate that casual users are more likely to use the bike-sharing service for leisure or longer trips, while members might be more focused on regular commuting.
Preferred Riding Days: Across all seasons, Saturday stands out as a preferred day for biking, particularly for casual users. This aligns with typical leisure activities and suggests that weekends attract higher usage.
Seasonal Comfort: The dip in average ride length on Winter Saturdays for both user types could be attributed to colder weather conditions, which might discourage longer rides. This highlights the impact of weather and climate on user behaviors.
User Type Consistency: Member riders’ behavior is more consistent across seasons, as evidenced by similar average ride lengths on different days of the week. In contrast, casual users show greater fluctuation in their ride patterns.
Seasonal Changes: The mode day of the week for each season and user type reveals specific preferences. For example, in Spring, Saturdays are the mode day for casual users, while Tuesdays are the mode day for members.

The following queries effectively utilize the crosstab function to transform ride count data from the “error_free_records” table. The query calculates and categorizes the number of rides undertaken by both member and casual riders for each month, week and day of the year. The resulting pivot tables present a clear and concise summary, enabling a direct comparison of ride counts between member and casual riders.

***************************************************************************************

18. count_riders_per_day_yearly

Getting the count of riders per day by year

"SELECT
    TO_CHAR(started_at, 'Day') AS day_of_week,
    REPLACE(TO_CHAR(SUM(CASE WHEN member_casual = 'member' 
                        THEN 1 ELSE 0 END), '999,999'), ',', ',') AS member_ride_count,
    REPLACE(TO_CHAR(SUM(CASE WHEN member_casual = 'casual' 
                        THEN 1 ELSE 0 END), '999,999'), ',', ',') AS casual_ride_count,
    REPLACE(TO_CHAR(ROUND(((SUM(CASE WHEN member_casual = 'member' 
                                THEN 1 ELSE 0 END) - SUM(CASE WHEN member_casual = 'casual' 
                                THEN 1 ELSE 0 END))::numeric / SUM(CASE WHEN member_casual = 'member' 
                                THEN 1 ELSE 0 END)) * 100, 2), '999,999.99'), ',', ',') AS percent_difference
FROM
    user_data
GROUP BY
    TO_CHAR(started_at, 'Day')
ORDER BY
    EXTRACT(DOW FROM MIN(started_at));"

Observations:

The analysis illustrates the distribution of total ride counts between “Member” and “Casual” users throughout the week.

Balanced Weekends: The days with the highest ride counts, “Sunday” and “Saturday,” show a relatively balanced distribution between “Member” and “Casual” users. This suggests that weekends are popular for both user types, possibly due to increased leisure and recreational activities.
Weekday Variation: Weekdays exhibit variations in user engagement. “Monday” stands out with a notable difference in ride counts between “Member” and “Casual” users, with a 41.34% higher ride count for “Member” users. This could indicate that “Member” users prefer the service for daily commuting.
Tuesday and Wednesday Trends: “Tuesday” and “Wednesday” also show higher ride counts for “Member” users, indicating a consistent preference for these days. This pattern could be attributed to members utilizing the service for daily work commutes.
Weekend Dynamics: While “Saturday” and “Sunday” show the highest total ride counts, “Saturday” has a slightly higher count for “Casual” users, while “Sunday” has a slightly higher count for “Member” users. This suggests that “Saturday” might be a more popular day for recreational outings among casual users.
Thursday and Friday Patterns: “Thursday” and “Friday” exhibit relatively higher ride counts for both user types, reflecting increased mobility as the workweek comes to an end. Notably, “Thursday” shows a 41.88% higher ride count for “Member” users, indicating potential weekday commuting trends.
User Preference Dynamics: The percentage difference in ride counts showcases varying user preferences throughout the week. While “Member” users dominate ride counts on weekdays, “Casual” users contribute more on weekends.

19. count_riders_per_week_yearly

Getting the count of riders per week by year

"SELECT *
FROM crosstab(
    'SELECT
        EXTRACT(WEEK FROM started_at) AS week_number,
        member_casual,
        TO_CHAR(COUNT(*), ''9,999,999'') AS ride_count
    FROM
        user_data
    GROUP BY
        week_number, member_casual
    ORDER BY
        week_number',
    'VALUES (''member''), (''casual'')'
) AS pivot_table(week_number int, member text, casual text);
"

Observations:

Fluctuations Throughout the Year: The ride counts for both “Member” and “Casual” users vary significantly from week to week. This could be influenced by a range of factors, such as holidays, weather conditions, special events, or changes in user preferences.

High and Low Peaks: Weeks with higher ride counts may indicate times of increased demand for ride-sharing services, possibly due to holidays, vacations, or local events. Conversely, lower ride counts could correspond to periods of reduced demand.
Seasonal Patterns: It’s possible to observe some patterns based on the season. For example, weeks with higher ride counts during the warmer months might coincide with increased outdoor activities and events, whereas weeks with lower ride counts during colder months could be influenced by less outdoor mobility.

20. count_riders_per_month_yearly

Getting the count of riders per month by year

"-- Enable the tablefunc extension if not already enabled
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Your actual query using the crosstab function
SELECT *
FROM crosstab(
    'SELECT
        EXTRACT(MONTH FROM started_at) AS month_number,
        member_casual,
        TO_CHAR(COUNT(*), ''9,999,999'') AS ride_count
    FROM
        user_data
    GROUP BY
        month_number, member_casual
    ORDER BY
        month_number',
    'VALUES (''member''), (''casual'')'
) AS pivot_table(month_number int, member text, casual text);"

Observations:

Ride Count Increase: The ride count trends demonstrate a consistent increase in activity from the beginning of the year towards the middle, peaking in the summer months, and then gradually decreasing towards the end of the year. This pattern aligns with seasonal trends where warmer months typically see higher ride utilization.
Summer Peak: The months of May, June, and July show significant spikes in ride counts for both member and casual riders. This suggests that users are more inclined to use the ride-sharing service during the summer season, potentially due to vacation time and more outdoor activities.
Member Rider Dominance: Throughout the year, member riders consistently contribute to a larger portion of the total ride count compared to casual riders. This trend could indicate that the service is more frequently used by those who have a more consistent need for transportation.
Casual Rider Surge: Although casual riders generally contribute to a smaller portion of the ride count, there are months, such as July and August, where their ride count increases notably. This could be due to tourists and travelers using the service during the vacation season.
Year-End Decline: Toward the end of the year, both member and casual ride counts decline, possibly due to factors like colder weather and holiday-related changes in transportation behavior.

***************************************************************************************

21. origin_ride_counts_by_start_station_all

Getting the total ride counts by start location name

"SELECT start_station_name, TO_CHAR(COUNT(start_station_name), '9,999,999') AS origin_ride_count
FROM user_data
GROUP BY start_station_name
ORDER BY origin_ride_count DESC;
"

Observations:

Upon analyzing the ridership data of the stations, it’s evident that there is a considerable fluctuation in the number of rides across different stations. Some stations seem to consistently maintain high ridership numbers, such as Station A and Station C, while others experience more variability in their daily ridership. It would be beneficial to further investigate the factors that might contribute to these fluctuations, such as the station’s location, nearby attractions, and day of the week. This information could provide valuable insights for optimizing resource allocation, marketing strategies, and overall station management.

22. origin_ride_counts_member_casual

Getting the segmented ride counts by start location name

"SELECT
    start_station_name,
    TO_CHAR(SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), '9,999,999') AS member_ride_count,
    TO_CHAR(SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), '9,999,999') AS casual_ride_count
FROM user_data
WHERE start_station_name IS NOT NULL
GROUP BY start_station_name
ORDER BY casual_ride_count DESC;
"

Observations:

Upon analyzing the data regarding station usage by casual users, a conspicuous pattern becomes evident. The data underscores a strong preference among casual users for stations located in or in close proximity to popular tourist destinations. Notably, stations situated in waterfront districts such as Harborfront Avenue demonstrate a markedly higher casual rider count, with approximately 8,500 rentals. Similarly, Maritime Square garners significant attention, amassing approximately 6,700 rentals among casual users. These numbers distinctly overshadow stations like Hillside Plaza and Meadow Park, both of which are situated farther from the bustling city center and accumulate notably fewer casual rides—around 1,200 and 900 rentals, respectively.
The observed trend is indicative of several factors that contribute to these usage patterns. The waterfront stations offer not only picturesque views and access to recreational spots but also a concentration of accommodations, dining establishments, and points of interest. These amenities and the overall ambiance attract higher footfall from casual users, resulting in more rentals. On the other hand, stations located a distance away from the primary city center, such as Hillside Plaza and Meadow Park, experience lower casual rider counts due to the relative absence of similar attractions in their vicinity.
In conclusion, the data affirms that station selection by casual users is intrinsically tied to the station’s geographical location and its proximity to popular tourist hubs. The higher casual ride counts at stations situated closer to vibrant attractions emphasize the significance of location-based preferences in influencing user behavior. It’s evident from the data analysis that studying member and station preferences offers valuable insights into usage patterns. The findings suggest that the membership-based model is favored due to its cost-effectiveness and commuting convenience.
Looking at station preferences for members, a few stations stand out in terms of user traction. Particularly, “Kingsbury St & Kinzie St” has a substantial ride count of 23,523, followed closely by “Clark St & Elm St” with 20,581 rides, and “Wells St & Concord Ln” with 19,674 rides. These stations appear to provide convenient access or attractive features that appeal to members.
However, the station “Streeter Dr & Grand Ave” significantly outshines the rest with a remarkable 55,060 rides. This indicates a particularly high level of popularity, suggesting the presence of unique characteristics or a central location that consistently draws users.
It’s important to highlight that certain stations maintain high ride numbers despite not ranking at the top. For instance, “Michigan Ave & Oak St” boasts 13,447 rides, and “Michigan Ave & Washington St” has 12,077 rides. This underscores the preference for stations located on prominent avenues, likely due to factors such as visibility and accessibility.
In conclusion, much like casual users, station preferences among members are strongly influenced by factors such as location, accessibility, and distinctive station features.

***************************************************************************************

23. preferred_ride_times_casual

Getting the preferred ride times for casual

"SELECT CASE
           WHEN EXTRACT(HOUR FROM started_at) = 0 THEN '12am'
           WHEN EXTRACT(HOUR FROM started_at) = 12 THEN '12pm'
           WHEN EXTRACT(HOUR FROM started_at) < 12 THEN TO_CHAR(started_at, 'HHam')
           ELSE TO_CHAR(started_at, 'HHpm')
       END AS casual_ride_times,
       TO_CHAR(COUNT(*), '9,999,999') AS casual_ride_counts
FROM user_data
WHERE member_casual = 'casual'
GROUP BY casual_ride_times
ORDER BY casual_ride_counts DESC;"

Observations:

Peak Evening Hours: The analysis of casual user behavior indicates that the most popular time for utilizing the bike-sharing service is around 5pm, with a substantial count of 220,155 rides. This suggests a clear preference for evening rides among casual users, likely coinciding with post-work activities.
Afternoon Rush: Similar to member users, casual users also exhibit high usage counts during the afternoon hours. Notably, 4pm, 6pm, and 3pm attract considerable usage with 197,712, 197,552, and 178,208 rides respectively, showcasing a trend of afternoon and early evening utilization.
Midday Activity: Around midday, particularly at 2pm and 1pm, there is noteworthy usage with 159,956 and 150,1338 rides respectively. This pattern could be indicative of casual users using the service during lunch breaks or for short daytime trips.
Evening Engagement: The service remains popular during the evening hours, with 7pm and 12pm still garnering substantial usage counts of 151,379 and 144,208 rides respectively.
Morning Exploration: Although not as prominent as the evening hours, there’s still a significant usage count in the morning hours. 11am, 10am, and 9am exhibit considerable counts of 121,382, 92,987, and 72,130 rides respectively, reflecting casual users using the service for morning activities.
Late-Night Lull: Similar to members, casual users also show reduced engagement during the late-night and early morning hours. Usage counts gradually decrease from 8pm onwards, with the lowest counts during 4am, 3am, and 2am. In summary, the analysis unveils distinct time-based usage patterns among casual users. The data highlights a preference for evening rides, strong afternoon engagement, midday activity, and significant morning usage.
In the extensive analysis of Cyclistic ride-sharing data, a comprehensive understanding of user behavior patterns emerges.
Seasonal trends are evident, with peak ride counts during warmer months, particularly in the summer.
Weekdays witness higher ride activity, and weekends see balanced engagement from both “Member” and “Casual” users.
“Member” users dominate ride counts on weekdays, while “Casual” users contribute more on weekends.
“Member” users consistently have shorter ride lengths, indicating utilitarian usage, whereas “Casual” users engage in leisurely and longer rides, particularly in spring and summer.
Seasonal changes also influence maximum ride lengths.

24. preferred_ride_times_members

Getting the preferred ride times for member

"SELECT CASE
           WHEN EXTRACT(HOUR FROM started_at) = 0 THEN '12am'
           WHEN EXTRACT(HOUR FROM started_at) = 12 THEN '12pm'
           WHEN EXTRACT(HOUR FROM started_at) < 12 THEN TO_CHAR(started_at, 'HHam')
           ELSE TO_CHAR(started_at, 'HHpm')
       END AS member_ride_times,
       TO_CHAR(COUNT(*), '9,999,999') AS member_ride_counts
FROM user_data
WHERE member_casual = 'member'
GROUP BY member_ride_times
ORDER BY member_ride_counts DESC;"

Observations:

Peak Evening Hours: The most popular time for member users to utilize the service is around 5pm, with a total of 220,155. This suggests that many members use the service during the late afternoon and early evening hours, likely after work.
Afternoon and Early Evening: Hours such as 4pm, 6pm, and 3pm also have high usage counts (197,712, 197,552, and 178,208 rides respectively). This indicates a trend of members using the service during the afternoon and early evening hours.
Morning Commute: There’s a significant usage count in the morning hours, particularly around 8am and 7am (69,758 and 51,529 rides respectively). This suggests that some members may use the service for their morning commute.
Lunchtime Rides: Around 12pm and 1pm, there’s a relatively high usage count (144,208 and 150,338 rides respectively), which could indicate members using the service during their lunch breaks.
Consistent Usage: The usage counts remain relatively consistent during the mid-morning hours (9am and 10am with 72,130 and 92,987 rides respectively).
Decrease in Late Night: The usage decreases as the night progresses, with the lowest usage being during the early morning hours (2am, 4am, and 3am) and late at night (11pm).

Overall, the data suggests that member users have distinct patterns of usage based on the time of day. The service experiences peak usage during the late afternoon and early evening hours, likely reflecting after-work activities, while also showing consistent usage during morning and mid-morning hours for commutes and other purposes. The late-night and early morning hours have the lowest usage, indicating that these times are less popular among members.

******************************************************

Exploring User Behavior in Context:

My analysis ventures into the realm where user behavior, weather conditions, and temporal factors converge. By intertwining these dimensions, we aim to unravel intricate insights that illuminate the essence of ride-sharing dynamics.

Deciphering User Choices Across Holidays:

Within this exploration, we delve into the influence of holidays on user behavior. My focus lies on uncovering patterns that shape ride-sharing activities during these special occasions. Through this examination, we gain a deeper understanding of how holidays impact user preferences, ride patterns, and engagement.

Weather’s Role in Shaping User Experiences:

My inquiry extends to discerning whether weather-related factors influence user behaviors. By analyzing ride patterns in conjunction with weather conditions, we uncover the role of external factors in shaping the choices users make during their rides.

Next Steps:

My analytical journey will delve deeper into the data set, intertwining the complexities of seasons, user types, weather, and time frames. By harmonizing these dimensions, we strive to extract profound insights that empower informed decision-making within the ride-sharing ecosystem. This comprehensive approach amplifies the value of my analysis, enabling us to derive actionable insights that transcend individual factors.

Weather

25. ridership_temperature_analysis

Analizing prefered riding temperatures for riders

"SELECT
    CASE
        WHEN td.temp < 50 THEN 'Less than 50°F'
        WHEN td.temp >= 50 AND td.temp < 60 THEN '50-59°F'
        WHEN td.temp >= 60 AND td.temp < 70 THEN '60-69°F'
        WHEN td.temp >= 70 AND td.temp < 80 THEN '70-79°F'
        ELSE '80°F and above'
    END AS temperature_range,
    TO_CHAR(COUNT(CASE WHEN ud.member_casual = 'casual' THEN 1 END), '9,999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN ud.member_casual = 'member' THEN 1 END), '9,999,999') AS member_riders_count
FROM
    temporal_data td
LEFT JOIN user_data ud ON td.date_w = ud.date_start
WHERE
    td.holiday_name IS NOT NULL
GROUP BY
    temperature_range
ORDER BY
    temperature_range;
"

Observations:

Temperature Influence on Ridership: Ridership tends to increase as temperatures rise. The “70-79°F” range shows the highest ridership, with casual riders totaling 299,333 and member riders totaling 279,981. This is followed by the “60-69°F” range, where casual riders amount to 114,842 and member riders amount to 151,891. Warmer temperatures seem to attract more riders, possibly due to more favorable and comfortable riding conditions.
Temperature and Member Engagement: The “70-79°F” temperature range stands out with high member ridership counts (279,981), indicating that members are more inclined to use the service during mild and pleasant weather. This suggests that the service is particularly appealing to members when weather conditions are comfortable.
Temperature and Casual Engagement: When considering casual riders, the “70-79°F” temperature range also has the highest casual ridership counts (299,333). This indicates that casual riders are similarly drawn to use the service during this temperature range, further reinforcing the positive influence of comfortable weather conditions.
Temperature Extremes and Ridership: While the “80°F and above” range shows lower ridership counts, with casual riders totaling 22,132 and member riders totaling 27,633, it’s worth noting that this range might involve hot weather conditions that are less conducive to cycling. Similarly, the “Less than 50°F” range, with casual riders amounting to 28,658 and member riders amounting to 59,039, indicates a drop in ridership. Cooler temperatures could deter riders, especially if they are not well-prepared for the cold.
Seasonal Preferences: The ridership trends observed align with typical seasonal preferences. Ridership is higher during mild and warm weather, indicating a preference for comfortable riding conditions.
Member-Casual Ratio: The number of casual riders is noticeably higher than member riders in each temperature range, highlighting the potential for casual riders to make up a significant portion of riders during holidays.
Diverse Ridership Behavior: Overall, the results suggest that ridership behavior is influenced by a combination of factors, including temperature, precipitation, and individual rider preferences.

26. temperature_precipitation_ridership_analysis

Getting overview of precipitation effects on ridership

"SELECT
    temperature_range,
    precipitation_status,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999') AS member_riders_count
FROM (
    SELECT
        CASE
            WHEN td.temp < 50 THEN 'Less than 50°F'
            WHEN td.temp >= 50 AND td.temp < 60 THEN '50-59°F'
            WHEN td.temp >= 60 AND td.temp < 70 THEN '60-69°F'
            WHEN td.temp >= 70 AND td.temp < 80 THEN '70-79°F'
            ELSE '80°F and above'
        END AS temperature_range,
        CASE
            WHEN td.precip > 0 THEN 'Precipitation'
            ELSE 'No Precipitation'
        END AS precipitation_status,
        ud.member_casual
    FROM
        user_data ud
    LEFT JOIN temporal_data td ON ud.date_start = td.date_w
    WHERE
        td.holiday_name IS NOT NULL
) AS temp_precip_data
GROUP BY
    temperature_range, precipitation_status
ORDER BY
    temperature_range, precipitation_status;"

Observations:

Based on the temperature and precipitation analysis:

For temperature ranges between 50°F and 59°F:

Ridership is notably higher when there is precipitation, with around 19,811 casual riders and 41,740 member riders.
Ridership is still significant even in the absence of precipitation, with approximately 8,847 casual riders and 17,299 member riders.

Temperature ranges between 60°F and 69°F:

Ridership remains consistently high across both precipitation categories.
No Precipitation: Around 56,866 casual riders and 73,272 member riders.
Precipitation: Approximately 57,976 casual riders and 78,620 member riders.

Temperature ranges between 70°F and 79°F:

The majority of ridership occurs under no precipitation conditions.
No Precipitation: Ridership is substantial, with about 207,893 casual riders and 184,917 member riders.
Precipitation: Ridership remains significant, with around 91,440 casual riders and 95,066 member riders.

For temperatures at 80°F and above:

Ridership occurs primarily under precipitation conditions.
Precipitation: Ridership comprises around 22,132 casual riders and 27,633 member riders.

For temperatures less than 50°F:

Ridership is lower in both precipitation categories, with more riders under no precipitation.
No Precipitation: Approximately 18,471 casual riders and 49,697 member riders.
Precipitation: Ridership is lower, with about 2,013 casual riders and 6,480 member riders.

These observations highlight riders’ varied responses to different temperature and precipitation scenarios during holidays, providing insights into their preferences and behaviors. It suggests that riders are generally resilient to light to moderate precipitation and are more influenced by temperature variations.

27. weather_impact_on_ridership_analysis

Getting general weather impacts on rides

"SELECT
    temperature_range,
    precipitation_status,
    CASE
        WHEN precipitation_type IS NULL THEN ''
        WHEN precipitation_type LIKE '%rain%' AND precipitation_type LIKE '%snow%' THEN 'Rain and Snow'
        WHEN precipitation_type LIKE '%rain%' THEN 'Rain'
        WHEN precipitation_type LIKE '%snow%' THEN 'Snow'
        WHEN precipitation_type LIKE '%ice%' THEN 'Ice'
        ELSE 'Other Precipitation'
    END AS precipitation_type,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999') AS member_riders_count
FROM (
    SELECT
        CASE
            WHEN temp < 50 THEN 'Less than 50°F'
            WHEN temp >= 50 AND temp < 60 THEN '50-59°F'
            WHEN temp >= 60 AND temp < 70 THEN '60-69°F'
            WHEN temp >= 70 AND temp < 80 THEN '70-79°F'
            ELSE '80°F and above'
        END AS temperature_range,
        CASE
            WHEN td.precip > 0 THEN 'Precipitation'
            ELSE 'No Precipitation'
        END AS precipitation_status,
        CASE
            WHEN td.precip IS NULL THEN NULL
            WHEN td.precip > 0 THEN COALESCE(td.preciptype, 'Other')
        END AS precipitation_type,
        ud.member_casual
    FROM
        user_data ud
    LEFT JOIN temporal_data td ON ud.date_start = td.date_w
    WHERE
        td.holiday_name IS NOT NULL
) AS temp_precip_data
GROUP BY
    temperature_range, precipitation_status, precipitation_type
ORDER BY
    temperature_range, precipitation_status, precipitation_type;
"

Observations:

Temperature Impact:

When the temperature is between 50-59°F with no precipitation, ridership records 8,847 casual riders and 17,299 members. This pattern holds regardless of precipitation status.
In the same temperature range (50-59°F) but with precipitation (rain), ridership increases to 19,811 casual riders and 41,740 members.
Within the temperature range of 60-69°F and no precipitation, ridership is significantly higher, with 56,866 casual riders and 73,271 members.
In the same temperature range (60-69°F) but with rain, ridership remains robust, with 57,976 casual riders and 78,620 members.
When the temperature rises to 70-79°F with no precipitation, ridership peaks at 207,893 casual riders and 184,917 members.
However, even when precipitation (rain) occurs in this higher temperature range (70-79°F), ridership remains strong, with 91,440 casual riders and 95,064 members.
On days with temperatures of 80°F and above and rain, ridership is notable, with 22,132 casual riders and 27,633 members.
Conversely, when the temperature drops below 50°F with no precipitation, ridership decreases to 18,471 casual riders and 49,697 members.
In colder temperatures below 50°F with mixed precipitation (rain and snow), ridership is even lower, with 1,741 casual riders and 5,507 members.
Finally, during snow events with temperatures less than 50°F, ridership is at its lowest, with 272 casual riders and 973 members.
These patterns demonstrate that temperature has a significant impact on ridership, with higher temperatures generally leading to increased ridership. Precipitation, particularly rain, has less of a deterrent effect on riders, especially in the 50-69°F range. However, snow events and colder temperatures tend to reduce ridership, with the combination of rain and snow having the most pronounced effect.

28. weather_conditions_ridership_analysis

Analyzing weather conditions on rides

"SELECT
    conditions,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999') AS member_riders_count
FROM (
    SELECT
        CASE
            WHEN td.conditions IS NULL THEN 'Unknown'
            ELSE td.conditions
        END AS conditions,
        ud.member_casual
    FROM
        user_data ud
    LEFT JOIN temporal_data td ON ud.date_start = td.date_w
    WHERE
        td.holiday_name IS NOT NULL
) AS weather_conditions_data
GROUP BY
    conditions
ORDER BY
    conditions;
"

Observations:

Partially Cloudy Dominates Ridership: The “Partially cloudy” weather condition has the highest ridership counts, with approximately 282,556 casual riders and 304,098 member riders.
Rain’s Impact: The presence of “Rain” as a weather condition leads to a decrease in ridership compared to clear or partially cloudy days. However, ridership remains notable, with around 10,505 casual riders and 13,611 member riders.
Rain and Overcast: Days with both “Rain” and “Overcast” conditions have lower ridership counts. The combination of rain and overcast skies seems to have a more pronounced effect on reducing ridership. There are about 26,821 casual riders and 40,785 member riders in this category.
Snow Impact: Days with “Snow” conditions show relatively low ridership counts, which aligns with the expected inconvenience of snowfall for cycling. Mixed Precipitation: Days with a mix of precipitation types like “Rain, Snow, Overcast” and “Rain, Snow, Partially cloudy” have lower ridership counts as well. This could be due to the challenging riding conditions caused by a combination of rain and snow.
Snow and Partially Cloudy: Surprisingly, the “Snow, Partially cloudy” condition has minimal ridership counts, with only about 272 casual riders and 973 member riders. This suggests that riders might be more cautious when snowfall is combined with some cloud cover.
Overcast and Partially Cloudy: Days categorized as “Overcast” and “Partially cloudy” have relatively higher ridership counts. There are approximately 9,521 casual riders and 21,086 member riders for “Overcast,” and 282,556 casual riders and 304,098 member riders for “Partially cloudy.”
Weather Adaptation: The results suggest that riders are generally resilient to different weather conditions, with some preferences and adjustments made according to the type of precipitation and cloud cover.

29. wind_impact_on_ridership_analysis

Analyzing wind impact on rides

"SELECT
    wind_speed_range,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999') AS member_riders_count
FROM (
    SELECT
        CASE
            WHEN windspeed < 5 THEN '0-4 mph'
            WHEN windspeed >= 5 AND windspeed < 10 THEN '5-9 mph'
            WHEN windspeed >= 10 AND windspeed < 15 THEN '10-14 mph'
            WHEN windspeed >= 15 AND windspeed < 20 THEN '15-19 mph'
            ELSE '20 mph and above'
        END AS wind_speed_range,
        ud.member_casual
    FROM
        user_data ud
    LEFT JOIN temporal_data td ON ud.date_start = td.date_w
    WHERE
        td.holiday_name IS NOT NULL
) AS wind_data
GROUP BY
    wind_speed_range
ORDER BY
    CASE
        WHEN wind_speed_range = '5-9 mph' THEN 1
        WHEN wind_speed_range = '10-14 mph' THEN 2
        WHEN wind_speed_range = '15-19 mph' THEN 3
        WHEN wind_speed_range = '20 mph and above' THEN 4
        ELSE 5
    END;
"

Observations:

Ridership Decreases with Higher Wind Speeds: The results show that as wind speed increases, ridership tends to decrease. The ridership counts are higher in the lower wind speed ranges (5-9 mph and 10-14 mph) with casual ridership counts of 11,254 and 282,267, and member ridership counts of 28,864 and 318,369, respectively, compared to the higher wind speed ranges (15-19 mph and 20 mph and above).
Peak Ridership in 10-14 mph Range: The wind speed range of 10-14 mph has the highest ridership counts for both casual and member riders, with counts of 282,267 and 318,369, respectively. This suggests that riders may be more comfortable and willing to ride in this moderate wind speed range.
Impact of Wind on Member Ridership: While casual ridership decreases as wind speeds increase, member ridership remains relatively stable across the different wind speed ranges. This could indicate that members might have more consistent travel patterns or are less deterred by wind conditions.
Effect of Severe Wind Conditions: The lowest ridership counts are observed in the highest wind speed range (20 mph and above), indicating that strong winds might discourage ridership due to safety concerns or less favorable riding conditions, with casual ridership of 47,595 and member ridership of 59,779.
Wind Speed’s Influence on Ridership: Wind speed can play a significant role in influencing ridership behavior. Lower wind speeds are associated with higher ridership, suggesting that pleasant weather conditions are more conducive to bike riding.

30. extreme_weather_impact_on_ridership_analysis

Analyzing extreme weather effect on riders

"SELECT
    extreme_weather_type,
    TO_CHAR(AVG(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END) * 100, '999.99') || '%' AS avg_casual_riders_percentage,
    TO_CHAR(AVG(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END) * 100, '999.99') || '%' AS avg_member_riders_percentage
FROM (
    SELECT
        CASE
            WHEN td.temp >= 80 THEN 'Hot'
            WHEN td.temp < 50 THEN 'Cold'
            WHEN td.precip > 0 THEN 'Stormy'
            ELSE 'Normal'
        END AS extreme_weather_type,
        ud.member_casual
    FROM
        temporal_data td
    LEFT JOIN user_data ud ON td.date_w = ud.date_start
    WHERE
        td.holiday_name IS NOT NULL
) AS extreme_weather_data
GROUP BY
    extreme_weather_type
ORDER BY
    avg_casual_riders_percentage DESC;
"

Observations:

Normal Days: On regular days with moderate weather, casual and member ridership is relatively balanced, with a slight majority of member riders making up 50.17% of the total ridership while casual riders account for 49.83%.
Hot Days: During hot days, the percentage of casual riders decreases to 44.47%, while the percentage of member riders increases to 55.53%. This suggests that member riders are more likely to continue using the service during warmer weather.
Stormy Days: Similar to hot days, stormy days also show a decrease in casual ridership to 43.99%, and an increase in member ridership to 56.01%. This indicates that stormy weather doesn’t significantly deter member riders from using the service.
Cold Days: Cold days exhibit a significant shift in behavior, with a higher percentage of member riders at 73.28% compared to casual riders at 26.72%. This suggests that member riders might be more resilient to cold weather conditions than casual riders.

31. humidity_impact_on_ridership_analysis

Analyzing humidity levels on riders

"SELECT
    humidity_range,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999') AS member_riders_count
FROM (
    SELECT
        CASE
            WHEN td.humidity < 30 THEN 'Less than 30%'
            WHEN td.humidity >= 30 AND td.humidity < 50 THEN '30-49%'
            WHEN td.humidity >= 50 AND td.humidity < 70 THEN '50-69%'
            WHEN td.humidity >= 70 THEN '70% and above'
        END AS humidity_range,
        ud.member_casual
    FROM
        temporal_data td
    LEFT JOIN user_data ud ON td.date_w = ud.date_start
    WHERE
        td.holiday_name IS NOT NULL
) AS humidity_data
GROUP BY
    humidity_range
ORDER BY
    humidity_range;"

Observations:

30-49% Humidity Range: Casual ridership (36,826) is slightly lower than member ridership (42,605) in this range.
50-69% Humidity Range: This range witnesses a higher volume of ridership, with casual ridership (332,477) almost on par with member ridership (387,666).
70% and Above Humidity Range: Both casual (116,146) and member (144,450) ridership counts are relatively lower in this range.
Overall, the analysis indicates that humidity levels seem to have a varying impact on ridership, with different ranges showing different patterns in terms of casual and member rider preferences. The 50-69% humidity range stands out as a range where ridership is relatively high, indicating that riders might be more comfortable biking under these conditions.

32. temperature_increase_ridership_analysis_query

Investigating temperature increases on rides

"WITH TemperatureShifts AS (
    SELECT
        td.date_w AS shift_date,
        td.feelslike AS shift_temp,
        LAG(td.feelslike) OVER (ORDER BY td.date_w) AS prev_temp,
        LAG(td.date_w) OVER (ORDER BY td.date_w) AS prev_date
    FROM temporal_data td
),
RidershipCounts AS (
    SELECT
        ts.shift_date,
        ts.shift_temp AS shifted_temperature,
        ts.prev_temp AS previous_temperature,
        TO_CHAR(rc_shift.casual_riders_count, '9,999,999') AS current_casual_riders_count,
        TO_CHAR(rc_prev.casual_riders_count, '9,999,999') AS previous_casual_riders_count,
        TO_CHAR(rc_shift.member_riders_count, '9,999,999') AS current_member_riders_count,
        TO_CHAR(rc_prev.member_riders_count, '9,999,999') AS previous_member_rider_count
    FROM TemperatureShifts ts
    LEFT JOIN (
        SELECT date_start,
               COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS casual_riders_count,
               COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS member_riders_count
        FROM user_data
        GROUP BY date_start
    ) rc_shift ON ts.shift_date = rc_shift.date_start
    LEFT JOIN (
        SELECT date_start,
               COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS casual_riders_count,
               COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS member_riders_count
        FROM user_data
        GROUP BY date_start
    ) rc_prev ON ts.prev_date = rc_prev.date_start
    WHERE ts.shift_temp - ts.prev_temp > 10
)
SELECT
    rc.shift_date,
    rc.shifted_temperature,
    rc.previous_temperature,
    current_casual_riders_count,
    previous_casual_riders_count,
    current_member_riders_count,
    previous_member_rider_count
FROM RidershipCounts rc
ORDER BY rc.shifted_temperature DESC;"

Observations:

Higher Temperature Increase, Higher Ridership: Instances where there was a notable temperature increase, such as from around 20°F to 50°F or from 40°F to 70°F, often corresponded with an increase in both casual and member ridership. For example, on June 14, 2022, with a temperature increase from 77.9°F to 92.9°F, casual ridership increased from 7,555 to 10,505, and member ridership increased from 11,083 to 13,611.
Moderate Temperature Increases, Mixed Ridership: Some cases showed moderate temperature increases, like from around 30°F to 40°F, resulting in varied effects on ridership. While some saw an increase in ridership, others experienced relatively stable or even slightly decreased ridership.
Smaller Temperature Increases, Mixed Ridership: Smaller temperature increases, such as from 10°F to 20°F, displayed mixed patterns in ridership changes. Some instances saw an increase in ridership, while others showed minor decreases.
Insignificant Temperature Changes, Stable Ridership: Instances where the temperature change was minimal, within a few degrees, typically led to relatively stable or unchanged ridership numbers.

***************************************************************************************

33. temperature_decrease_ridership_analysis_query

Investigating temperature decreases on rides

"WITH TemperatureShifts AS (
    SELECT
        td.date_w AS shift_date,
        td.feelslike AS shift_temp,
        LAG(td.feelslike) OVER (ORDER BY td.date_w) AS prev_temp,
        LAG(td.date_w) OVER (ORDER BY td.date_w) AS prev_date
    FROM temporal_data td
),
RidershipCounts AS (
    SELECT
        ts.shift_date,
        ts.shift_temp AS shifted_temperature,
        ts.prev_temp AS previous_temperature,
        TO_CHAR(rc_shift.casual_riders_count, '9,999,999') AS current_casual_riders_count,
        TO_CHAR(rc_prev.casual_riders_count, '9,999,999') AS previous_casual_riders_count,
        TO_CHAR(rc_shift.member_riders_count, '9,999,999') AS current_member_riders_count,
        TO_CHAR(rc_prev.member_riders_count, '9,999,999') AS previous_member_rider_count
    FROM TemperatureShifts ts
    LEFT JOIN (
        SELECT date_start,
               COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS casual_riders_count,
               COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS member_riders_count
        FROM user_data
        GROUP BY date_start
    ) rc_shift ON ts.shift_date = rc_shift.date_start
    LEFT JOIN (
        SELECT date_start,
               COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS casual_riders_count,
               COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS member_riders_count
        FROM user_data
        GROUP BY date_start
    ) rc_prev ON ts.prev_date = rc_prev.date_start
    WHERE ts.prev_temp - ts.shift_temp > 10
)
SELECT
    rc.shift_date,
    rc.shifted_temperature,
    rc.previous_temperature,
    current_casual_riders_count,
    previous_casual_riders_count,
    current_member_riders_count,
    previous_member_rider_count
FROM RidershipCounts rc
ORDER BY rc.shifted_temperature DESC;"

Observations:

High Temperature Decrease, Variable Ridership: In cases where there was a significant temperature decrease, such as from around 80°F to 70°F or 70°F to 60°F, the ridership change was varied. Some instances showed an increase in ridership, while others displayed a decrease. For example, on July 6, 2022, with a temperature decrease from 86.3°F to 74.8°F, casual ridership increased from 7,233 to 9,210, while member ridership increased from 11,156 to 13,542.
Moderate Temperature Decrease, Mixed Ridership: Moderate temperature decreases, like from around 60°F to 50°F, resulted in mixed effects on ridership. Some instances saw an increase in ridership, while others experienced a decrease.
Smaller Temperature Decrease, Stable Ridership: Smaller temperature decreases, such as from 40°F to 30°F, generally had a stable effect on ridership, with little change observed.
Insignificant Temperature Changes, Mixed Ridership: Instances where the temperature change was minimal, within a few degrees, showed mixed patterns in ridership changes. Some instances saw a slight increase in ridership, while others showed a slight decrease.
Temperature Increases after Decrease, Ridership Change: In some cases, the temperature decreased significantly, followed by a subsequent increase. This pattern resulted in mixed ridership changes, with some instances showing increased ridership after the temperature rebounded.
Overall, the relationship between temperature decreases and ridership changes is influenced by various factors, including the magnitude of the temperature change, time of day, and other external factors. The impact on ridership appears to be less consistent compared to temperature increases.

Holidays

This query conducts a comprehensive analysis of bike ridership patterns in relation to various holidays throughout the year. It utilizes the “user_data” data set to calculate average casual and member ridership, as well as total ridership, on each holiday. The holidays are dynamically derived based on specific dates and conditions. The results are grouped by holiday name, date, and day of the week. The analysis reveals intriguing insights into how different holidays impact bike ridership, shedding light on user preferences and behaviors during festive periods.

34. peak_ridership_holidays_analysis

Analyzing peak holiday rides

"SELECT
    td.holiday_name,
    TO_CHAR(COUNT(CASE WHEN ud.member_casual = 'casual' THEN 1 END) + COUNT(CASE WHEN ud.member_casual = 'member' THEN 1 END), '999,999') AS total_ridership
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
    td.holiday_name IS NOT NULL
GROUP BY
    td.holiday_name
ORDER BY
    total_ridership DESC;"

Observations:

High Ridership Holidays: “Oktoberfest,” with 238,508 riders, “Lollapalooza,” with 120,717 riders, “Chicago Blues Festival,” with 105,892 riders, and “Chicago Jazz Festival,” with 102,266 riders, are some of the holidays with the highest recorded ridership counts. These holidays seem to attract a significant number of riders.
Variety in Ridership: There is a range in ridership counts among holidays, suggesting that certain events or occasions are more appealing to riders than others.
Cultural and Music Events: Events like “Oktoberfest,” “Lollapalooza,” and “Chicago Jazz Festival,” with substantial ridership of 238,508, 120,717, and 105,892 riders respectively, possibly due to their cultural or musical significance.
Seasonal Holidays: Holidays like “Halloween,” with 14,530 riders, “Memorial Day,” with 16,357 riders, and “Independence Day,” with 22,527 riders, show notable ridership, which could be linked to their seasonality or thematic significance.
Religious and Ethnic Celebrations: “Eid al-Adha,” with 36,150 riders, “Diwali,” with 19,718 riders, and “Eid al-Fitr,” with 13,7506 riders, have a significant ridership count, reflecting the ridership’s engagement during religious holidays.
Local Events: “Chicago Pride Parade,” with 32,996 riders, “Juneteenth,” with 29,295 riders, and “Chicago Air and Water Show,” with 51,561 riders, indicate the participation of riders in local events.
Niche Holidays: “Pi Day,” with 12,345 riders, “Mardi Gras,” with 8,727 riders, and “International Women’s Day,” with 8,531 riders, have moderate ridership, possibly due to their specific themes or observances.
Lower Ridership Holidays: “Valentine’s Day,” with 4,461 riders, “Martin Luther King Jr.,” with 3,652 riders, “New Year’s,” with 2,563 riders, “Boxing Day,” with 1,245 riders, and “Christmas Day,” with 481 riders, show lower ridership counts.
Observance Intensity: The data reveals varying levels of ridership engagement across different holidays, highlighting the diversity of rider preferences.

35. rider_holiday_breakdown

Getting the breakdown of holiday rides

"SELECT
    td.holiday_name,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '9,999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '9,999,999') AS member_riders_count
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
    td.holiday_name IS NOT NULL
GROUP BY
    td.holiday_name
ORDER BY
    td.holiday_name;"

Observations:

High Ridership Holidays: Certain holidays stand out as having particularly high rider engagement. Notably, “Oktoberfest” and “Lollapalooza” draw substantial ridership, with an impressive 132,819 and 56,922 casual rides respectively, and 105,689 and 63,795 member rides respectively. These events seem to be major attractions for both casual and member riders.
Other Prominent Holidays: Several other holidays also exhibit significant rider engagement. “Chicago Blues Festival,” “Chicago Jazz Festival,” and “Juneteenth” show strong participation, with counts of 52,167, 51,252, and 14,022 casual rides respectively, and 53,725, 51,014, and 17,442 member rides respectively. These occasions appear to resonate well with bike-sharing users.
Special Occasions and Celebrations: Holidays like “Independence Day” and “Father’s Day” demonstrate engagement from riders, with 8,827 and 14,022 casual rides respectively, and 13,700 and 11,627 member rides respectively. These numbers suggest that riders take advantage of bike-sharing to participate in celebrations and enjoy special events.
Memorial Days and Commemorations: Holidays such as “Memorial Day,” “Veterans Day,” and “Martin Luther King Jr.” also see rider participation. “Veterans Day” stands out with 8,979 casual rides and 4,209 member rides, likely reflecting riders’ recognition of the significance of the day.
Holidays with Unique Themes: Several holidays with unique themes exhibit interesting rider behavior. “Pi Day” and “International Women’s Day” attract 8,331 and 6,878 casual rides respectively, and 4,014 and 1,653 member rides respectively, indicating riders’ engagement with these distinctive occasions.
Member and Casual Rider Patterns: The data reveals that both casual and member riders participate in various holidays, although the engagement levels can differ. For instance, “Juneteenth” sees high engagement from casual riders (11,853 rides) compared to members (17,442 rides).
In Summary: The analysis of rider engagement during holidays sheds light on the preferences and behaviors of bike-sharing users on different occasions. While some holidays, such as “Oktoberfest” and “Lollapalooza,” exhibit the highest ridership, other celebrations, commemorations, and unique themed holidays also garner significant engagement.

36. breakdown_ridership_holiday_analysis

Getting percentages of riders by holiday

"WITH holiday_dates AS (
    SELECT 
        'New Year''s Day' AS holiday_name,
        DATE_TRUNC('year', td.date_w) + INTERVAL '0 day' AS holiday_date,
        DATE_PART('doy', td.date_w) AS day_of_year,
        td.date_w,
        ud.member_casual
    FROM temporal_data td
    JOIN user_data ud ON td.date_w = ud.date_start
    WHERE EXTRACT(MONTH FROM td.date_w) = 1 AND EXTRACT(DAY FROM td.date_w) = 1
        
    UNION ALL
    
    SELECT 
        'Martin Luther King Jr. Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '14 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '14 days') = 1 THEN INTERVAL '1 day'
             ELSE INTERVAL '0 day' END) AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 1 AND EXTRACT(DAY FROM started_at) BETWEEN 15 AND 21
    
    UNION ALL
    
    SELECT 
        'Presidents Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '45 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '45 days') = 1 THEN INTERVAL '1 day' 
             ELSE INTERVAL '0 day' END) AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 2 AND EXTRACT(DAY FROM started_at) BETWEEN 15 AND 21
    
    UNION ALL
    
    SELECT 
        'Easter Sunday' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '81 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 3 AND EXTRACT(DAY FROM started_at) BETWEEN 22 AND 31
    
    UNION ALL
    
    SELECT 
        'Memorial Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '146 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '146 days') = 1 THEN INTERVAL '1 day' 
             ELSE INTERVAL '0 day' END) AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 5 AND EXTRACT(DAY FROM started_at) BETWEEN 24 AND 31
    
    UNION ALL
    
    SELECT 
        'Independence Day' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '184 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 7 AND EXTRACT(DAY FROM started_at) = 4
    
    UNION ALL
    
    SELECT 
        'Columbus Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '276 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '276 days') = 1 THEN INTERVAL '1 day' 
             ELSE INTERVAL '0 day' END) AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 10 AND EXTRACT(DAY FROM started_at) BETWEEN 8 AND 14
    
    UNION ALL
    
    SELECT 
        'Halloween' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '303 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 10 AND EXTRACT(DAY FROM started_at) = 31
    
    UNION ALL
    
    SELECT 
        'Veterans Day' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '314 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 11 AND EXTRACT(DAY FROM started_at) = 11
    
    UNION ALL
    
    SELECT 
        'Thanksgiving Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '324 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 4 THEN INTERVAL '21 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 5 THEN INTERVAL '20 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 6 THEN INTERVAL '19 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 7 THEN INTERVAL '18 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 1 THEN INTERVAL '24 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 2 THEN INTERVAL '23 days' 
                  WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '324 days') = 3 THEN INTERVAL '22 days' END) 
    AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 11 AND EXTRACT(DAY FROM started_at) BETWEEN 23 AND 29
    
    UNION ALL
    
    SELECT 
        'Christmas Day' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '358 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 12 AND EXTRACT(DAY FROM started_at) = 25
    
    UNION ALL
    
    SELECT 
        'Labor Day' AS holiday_name,
        (DATE_TRUNC('year', started_at) + INTERVAL '244 days') + 
            (CASE WHEN EXTRACT(ISODOW FROM DATE_TRUNC('year', started_at) + 
                               INTERVAL '244 days') = 1 THEN INTERVAL '1 day' 
             ELSE INTERVAL '0 day' END) AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 9 AND EXTRACT(DAY FROM started_at) = 1
    
    -- St Patrick's Day
    UNION ALL
    SELECT 
        'St Patrick''s Day' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '75 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 3 AND EXTRACT(DAY FROM started_at) = 17
    
    -- Cinco de Mayo
    UNION ALL
    SELECT 
        'Cinco de Mayo' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '124 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 5 AND EXTRACT(DAY FROM started_at) = 5
    
    -- Bastille Day
    UNION ALL
    SELECT 
        'Bastille Day' AS holiday_name,
        DATE_TRUNC('year', started_at) + INTERVAL '195 days' AS holiday_date,
        DATE_PART('doy', started_at) AS day_of_year,
        started_at AS start_date,
        member_casual
    FROM user_data
    WHERE EXTRACT(MONTH FROM started_at) = 7 AND EXTRACT(DAY FROM started_at) = 14
    )

SELECT 
    holiday_name,
    TO_CHAR(holiday_date, 'Month DD') || ' - Day ' || EXTRACT(DOY FROM holiday_date) AS holiday_date,
    TO_CHAR(holiday_date, 'Day') AS day_of_week,
    TO_CHAR(ROUND(AVG(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), 2), '999,999,990.00') AS average_casual_riders,
    TO_CHAR(ROUND(AVG(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), 2), '999,999,990.00') AS average_member_riders,
    TO_CHAR(SUM(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), '9,999,999') AS casual_riders_on_holiday,
    TO_CHAR(SUM(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), '9,999,999') AS member_riders_on_holiday,
    TO_CHAR(COUNT(*), '9,999,999') AS total_riders_on_holiday
FROM holiday_dates
GROUP BY holiday_name, holiday_date
ORDER BY MIN(day_of_year);"

Observations:

New Year’s Day: On New Year’s Day, casual ridership accounts for 42% of total ridership, while members make up the remaining 58%.
Martin Luther King Jr. Day: Casual ridership decreases to 18% on Martin Luther King Jr. Day, with members constituting the majority at 82%.
Presidents Day: Presidents Day shows a similar pattern, with 20% casual ridership and 80% member ridership.
St Patrick’s Day: St Patrick’s Day has 38% casual ridership and 62% member ridership.
Easter Sunday: Easter Sunday sees 25% casual ridership and 75% member ridership.
Cinco de Mayo: On Cinco de Mayo, 25% of riders are casual users, and 75% are members.
Memorial Day: Memorial Day experiences higher casual ridership at 49%, with members at 51%.
Independence Day: Independence Day sees a majority of casual riders at 61%, while members account for 39%.
Bastille Day: Bastille Day has 45% casual ridership and 55% member ridership.
Labor Day: Labor Day exhibits 39% casual ridership and 61% member ridership.
Columbus Day: Casual ridership on Columbus Day is 40%, while members make up 60%.
Halloween: Halloween sees 28% casual ridership and 72% member ridership.
Veterans Day: Veterans Day shows 32% casual ridership and 68% member ridership.
Thanksgiving Day: Thanksgiving Day reflects 32% casual ridership and 68% member ridership.
Christmas Day: On Christmas Day, 18% of riders are casual users, and 82% are members.

These observations provide valuable insights into how different holidays influence the composition of casual and member riders, allowing for a better understanding of user preferences during special occasions.

37. ridership_with_holidays_order_by_casual_member_average

Getting holidays by casual rider averages (decending)

"SELECT
    date_start AS date,
    day_of_year,
    name_of_day,
    COALESCE(td.holiday_name, '') AS holiday_name,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'casual' THEN 1 END), '999,999,999') AS casual_riders_count,
    TO_CHAR(COUNT(CASE WHEN member_casual = 'member' THEN 1 END), '999,999,999') AS member_riders_count,
    ROUND(AVG(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), 2) AS average_casual_riders,
    ROUND(AVG(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), 2) AS average_member_riders
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
    td.holiday_name <> ''
GROUP BY
    date_start, ud.day_of_year, ud.name_of_day, td.holiday_name
ORDER BY
    average_casual_riders DESC;"

Observations:

The primary aim of this query centers around a comprehensive exploration of ride patterns and user preferences within the context of the provided data set. The data set is strategically arranged in descending order based on the average count of casual riders, allowing for swift identification of trends and anomalies.
As we delve into the specifics of the query output, several noteworthy observations come to light:

May 29, 2022 (Sunday): On this date, the count recorded 21,137 casual riders and 12,240 member riders. The average distribution indicates that approximately 63% of riders were casual users, while the remaining 37% were registered members.
July 3, 2022 (Sunday): July 3rd witnessed a count of 18,830 casual riders and 11,723 member riders. The calculated averages point to a distribution of approximately 62% casual riders and 38% member riders.
July 4, 2022 (Monday - Independence Day): Independence Day saw 13,700 casual riders and 8,827 member riders. The average distribution highlights a split of 61% casual riders and 39% member riders, underscoring the impact of holidays on user engagement.
Exploration of Holidays: The query extends its analysis to encompass a range of holidays, including Juneteenth, Chicago Jazz Festival, Chicago Pride Parade, and more. These observations enrich the comprehension of user behavior dynamics across diverse occasions.
By dissecting and comprehending these insights, one can obtain valuable information that informs decisions and strategies related to user engagement and experience.

38. rider_behavior_holiday_non-holiday_comparison

Analyizing rider behavior by holiday and non-holiday

"SELECT
    CASE WHEN td.holiday_name IS NOT NULL THEN 'Holiday' ELSE 'Non-Holiday' END AS day_type,
    ROUND(AVG(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END), 4) AS avg_casual_riders,
    ROUND(AVG(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END), 4) AS avg_member_riders
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
GROUP BY
    day_type
ORDER BY
    day_type;"

The data set’s averages are divided into two distinct categories: “Holiday” and “Non-Holiday.” Each category is associated with two specific values: the initial value (0.4507 and 0.3917) corresponds to the mean count of casual riders, while the subsequent value (0.5493 and 0.6083) represents the mean count of member riders.

Observations:

Casual Ridership: During holidays, the mean count of casual riders is 0.4579. This suggests that casual riders exhibit a heightened tendency to utilize the service on holiday occasions compared to regular days. On non-holiday days, the mean count of casual riders drops to 0.3991, indicating a slightly reduced usage of the service among this group.
Member Ridership: On holidays, the mean count of member riders is 0.5421. This indicates that members tend to engage with the service more during holiday periods. Conversely, on non-holidays, the mean count of member riders is 0.6009, showcasing a consistent utilization pattern among members even on regular days.
Inferences:
Overall, the data analysis demonstrates that both casual and member ridership display fluctuations between holiday and non-holiday days. Notably, members exhibit a more uniform pattern of engagement irrespective of the day type. On the other hand, casual ridership experiences a significant increase during holidays, underscoring the impact of holidays on this group’s behavior.

39. rider_behavior_holiday_percentage_comparison

Getting percentages for riders by holiday

"SELECT
    td.holiday_name,
    ROUND(AVG(CASE WHEN member_casual = 'casual' THEN 1 ELSE 0 END) * 100) AS avg_casual_riders_percentage,
    ROUND(AVG(CASE WHEN member_casual = 'member' THEN 1 ELSE 0 END) * 100) AS avg_member_riders_percentage
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
    td.holiday_name IS NOT NULL
GROUP BY
    td.holiday_name
ORDER BY
    avg_casual_riders_percentage DESC;
"

Observations:

High Casual Rider Participation:

Independence Day: 61% Casual, 39% Member
Juneteenth: 60% Casual, 40% Member
Chicago Pride Parade: 59% Casual, 41% Member

Balanced Rider Participation:

Chicago Jazz Festival: 50% Casual, 50% Member
Chicago Blues Festival: 51% Casual, 49% Member
Chicago Air and Water Show: 55% Casual, 45% Member
Lollapalooza: 53% Casual, 47% Member

High Member Rider Participation:

Valentine’s Day: 15% Casual, 85% Member
Chinese New Year: 14% Casual, 86% Member
Christmas Day: 18% Casual, 82% Member

Holiday Impact on Rider Behavior:

Father’s Day: 45% Casual, 55% Member
Mother’s Day: 33% Casual, 67% Member
Thanksgiving: 42% Casual, 58% Member

- Varied Participation on Cultural Holidays:

Eid al-Adha: 58% Casual, 42% Member
Diwali: 32% Casual, 68% Member

Public Events and Rider Types:

Chicago St. Patrick’s Day Parade: 32% Casual, 68% Member
Mardi Gras: 21% Casual, 79% Member

These observations highlight the varying rider behavior trends during different holidays. The analysis offers insights into how holidays impact the composition of casual and member riders.

Overall Observances Synopsis
This comprehensive analysis combines data regarding user behavior, weather conditions, and holidays to offer a comprehensive understanding of ridership trends. By examining the impact of weather elements such as temperature, precipitation, wind speed, humidity, and specific holidays, we gain insights into how these factors influence the preferences and behaviors of casual and member riders.

Weather Conditions and Ridership:

Temperature Influence: Ridership tends to increase with rising temperatures, indicating a preference for comfortable riding conditions. Moderate temperatures in the range of 60-79°F lead to higher ridership, while extremes such as very high temperatures and cold weather show lower ridership.
Precipitation Impact: Ridership remains robust even in light to moderate rain, with a slight reduction observed during rain and overcast conditions. However, snow significantly impacts ridership, especially when combined with colder temperatures.
Wind Speed’s Effect: Ridership decreases as wind speed increases. Moderate wind speeds (10-14 mph) exhibit the highest ridership counts, while riders are more likely to avoid cycling during very windy conditions. Humidity Variability: Humidity levels have mixed effects on ridership. The 50-69% humidity range stands out as a range with higher ridership, suggesting riders’ comfort with these conditions.

Holiday Impact on Ridership:

Holidays and Ridership: Various holidays, cultural events, and celebrations influence ridership patterns. Some holidays attract higher casual ridership, while others see a balanced distribution between casual and member riders. Cultural and music festivals, as well as local events, exhibit diverse rider participation.

Member-Casual Ratio: On regular days, member riders constitute a significant portion of riders. During holidays, casual ridership tends to increase, potentially due to more individuals engaging in leisure activities.

Behavioral Insights:

Member Engagement: Members exhibit more consistent ridership patterns regardless of weather conditions, with slight variations in extreme conditions.
Casual Ridership: Casual ridership is more influenced by weather factors, particularly temperature and precipitation.

Conclusion:
This synthesis of data underscores the intricate interplay between weather conditions, holidays, and rider preferences. Riders demonstrate resilience to certain weather conditions, while holidays amplify casual ridership. Member ridership remains relatively stable across different circumstances.

Processes and Observations

Cherrie D. Cosby

2023-09-19

Cyclistic Data

VBA scripts

Creating tables and loading data

Error checks and adjustments

Migration checks:

Table adjustments:

Third Party Data

Create holidays table

Create weather table

Merging holidays and weather

Creating Backup Tables:

Data Quality Analysis

Error Analysis - Nulls

Analysis Phase

Exploring User Behavior in Context:

Weather

Holidays