Getting to Know Our Data:

[Preface: This data set is updated frequently, the version used in this analysis might not match currently available data depending on when this is viewed.]

“The customer is always right” is a phrase oft used to highlight the importance of customer satisfaction when doing business. Seeing as how the saying is not “the business is always right,” moments where a company, brand, or otherwise fails to deliver to customer expectations are bound to happen which result in consumer complaints against the offending party. Complaints come is all shapes and sizes from overdue repayment for services, qualms with the quality of goods/ services, to harassment of customers, breach of contract, or other serious offenses.

Any good business should ask themselves: ‘What can be gleaned from these complaints? Are there certain things that we can prevent from happening or reduce the damages to ensure best customer-business relations?’ That is what we are trying to find out with this analysis right now. Let’s see what we can take away from these complaints to better understand our consumer population.

Here’s a sample of the 24,295 rows of data with 16 columns:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import matplotlib.ticker as ticker
import seaborn as sns
import matplotlib.cm as cm

filename = "Consumer_Services_Mediated_Complaints.csv"
path = "C:/Users/neddi/Desktop/DS 736 Data Visualization/Python Programming Directory/" 
df_copy = pd.read_csv(path + filename)

output_head = df_copy.head(11)

print(output_head.to_html())
Industry Complaint Type Mediation Start Date Mediation Close Date Complaint Result Satisfaction Restitution Business Building Business Street Building Address Unit Business City Business State Business Zip Complainant Zip Longitude Latitude
0 Home Improvement Contractor - 100 Breach of Contract - B03 8/27/2014 10/17/2014 No Business Response - NVR No 0 199 LINCOLN AVE NaN BRONX NY 10454 29455 -73.927766 40.810134
1 Garage - 049 Damaged Goods - D01 8/23/2016 10/13/2016 Resolved and Consumer Satisfied - SPF Yes 0 1365 YORK AVE NaN NEW YORK NY 10021 7470 -73.953430 40.767308
2 Tow Truck Company - 124 Misrepresentation - M01 4/14/2015 5/13/2015 Cash Amount - AMT Yes 68 2716 203RD ST NaN BAYSIDE NY 11360 11360 -73.849438 40.784499
3 Home Improvement Contractor - 100 Quality of Work - Q01 10/5/2018 10/23/2018 Goods Repaired - GRS Yes 0 13042 VAN WYCK EXPY NaN S OZONE PARK NY 11420 11369 -73.801724 40.670898
4 Home Improvement Contractor - 100 Quality of Work - Q01 7/20/2018 9/14/2018 Advised to Sue - ATS No 0 435 GOWER ST NaN STATEN ISLAND NY 10314 11782 NaN NaN
5 Home Improvement Contractor - 100 Misrepresentation - M01 2/23/2018 3/9/2018 Unable to Locate Business - ULV NaN 0 58 50TH ST 4 BROOKLYN NY 11232 NaN -73.967893 40.757895
6 Furniture Sales - 242 Wrong Goods - W01 7/27/2015 9/16/2015 Advised to Sue - ATS No 0 8928 165TH ST NaN JAMAICA NY 11432 11432 NaN NaN
7 Pawnbroker - 080 Misrepresentation - M01 3/27/2015 6/5/2015 Referred to Hearing - RTH No 0 3885 13TH ST NaN LONG IS CITY NY 11101 11106 NaN NaN
8 Tow Truck Company - 124 Surcharge/Overcharge - S02 8/29/2019 9/12/2019 Cash Amount - AMT Yes 109 11 BRICK CT NaN STATEN ISLAND NY 10309 10306 NaN NaN
9 Misc Non-Food Retail - 817 Exchange Goods/Contract Cancelled - E01 8/12/2022 9/9/2022 Cash Amount - AMT Yes 700 6217 6TH AVE NaN BROOKLYN NY 11220 11223 -74.015392 40.636767
10 Furniture Sales - 242 Defective Goods - D02 5/24/2016 7/25/2016 Advised to Sue - ATS No 0 125 SOUTH ST NaN PASSAIC NJ 7055 11373 -73.876560 40.736007

Source of Data from NYC Open Data: https://data.cityofnewyork.us/Business/Consumer-Services-Mediated-Complaints/nre2-6m2s/about_data.

For additional information about column contents: file:///C:/Users/neddi/Downloads/DCA_Consumer_Services_Mediated_Complaints_Data_Dictionary.pdf

Analysis Through Plotting:

Plot 1: Complaints Issued by Month (Top 15 Types and ‘Other’ Category):

There were 72 unique complaint types to sift through in this data set ranging from contract breach, issues with customer service, to theft; there’s a lot going on here. So who are our ‘big players’ in terms of the month by month breakdown of complaints collected? In order to get a good idea the Top 15 types (by volume) were isolated and then the remaining 57 were condensed into a category called ‘Other’ since many of them consisted of a single instance such as ‘Lemon’ or ‘False Advertising’.

First and foremost, it would appear that over the months the number of a given complaint does not vary all that much, on the contrary it would seem that the number of complaints received for a given type is fairly static for a given month. What is also readily apparent is that September and November have a noticeable (although not drastic) dip in the overall number of complaints accrued. Why might this be? Well our answer might lie in the other months. March through August (with the exception being May) have the highest number of complaints in aggregate people are going out more since the weather is nicer, summer is starting so people are having things done to their homes for renovations to better experience the warmth. Being out of the house would increase the likelihood that a person is interacting with a given business thus increasing the risk that they would have an unpleasant experience with a business.

September, October, November, and December are cooler (on average); school is picking back up so there is less downtime to go shopping. Looking again at the plot we see that the exceptions are October and December which both have rather large ‘shopping holidays’ those being Halloween and Christmas which would require additional interaction with businesses.

month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], format='%m/%d/%Y', errors='coerce')
stacked_df = df_copy[['Mediation Start Date', 'Complaint Type']];

stacked_df['MonthComplaint'] = stacked_df['Mediation Start Date'].dt.strftime('%b')

stacked_df['Complaint Type'] = stacked_df['Complaint Type'].apply(lambda x: x if x in complaint_keep else 'Other')

stacked_df = stacked_df.groupby(['MonthComplaint', 'Complaint Type']).size().reset_index(name = 'TotalComplaintsMade')

stacked_df['MonthComplaint'] = pd.Categorical(stacked_df['MonthComplaint'], categories=month_order, ordered=True)

stacked_df = stacked_df.sort_values('MonthComplaint')

stack_pivot = stacked_df.pivot(index = 'MonthComplaint', columns = 'Complaint Type', values = 'TotalComplaintsMade');

stacked_figure = plt.figure(figsize = (25,16))

axis = stacked_figure.add_subplot(1,1,1)

#Generating unique colors achieved through Google Gemini
num_colors = len(stack_pivot.columns)
colors = cm.tab20(np.linspace(0, 1, num_colors))

stack_pivot.plot(kind='bar', stacked=True, ax=axis, color=colors)

plt.ylabel('Total Complaints Issued', fontsize=18, labelpad=10)
plt.title('Total Complaints Issued by Month and Complaint Type\nStacked Bar Chart',fontsize=20)
plt.xticks(rotation=0, horizontalalignment='center', fontsize=15);
plt.yticks(rotation=0, fontsize=14);
axis.set_xlabel('Month', fontsize=18)

axis.legend(loc='center left', bbox_to_anchor=(1, 0.25), fontsize=10)

Plot 2: Line Plot of Complaints by Year and Month:

[Note: There was a period from 7/21/23 to 11/xx/23 where data was not being retained. As a result there is a period from the dates listed above that are not at all filled and then go back up as if nothing happened. This is not program error, rather it is due to no new complaints being tabulated during this window.]

Aside from the little data hiccup outlined above everything else is in order so let’s take a look. 2014 had the highest number of months where it ranked number 1 in complaints received with 7 out of the 12 months having 2014 for its maximum value. Another unique trait of 2014 is that unlike the other years on record, there was an increase in the number of complaints from January to February. Looking at the graph, the normal trend is for there to be a dip from January to February then to jump up in March.

Again we have further proof of the declines in the number of complaints received in September and November forming a ‘W’ shape between August to December. September and November form the ‘valleys’ and August, October, and December are the ‘peaks.’

The year 2015 features the most erratic pattern with its drastic spike from March to April peaking at ~350 complaints in April and June then plunging to around 250 in September.

4 of the years, those being 2014, 2015, 2022, and 2023 have their peaks in April which suggests that business interactions are at a high point which we know to be the case from the above plot which clearly shows that April features the largest overall number of complaints.

df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], errors='coerce')
df_copy['Year'] = df_copy['Mediation Start Date'].dt.year
df_copy['Month'] = df_copy['Mediation Start Date'].dt.month

group_complaint = df_copy.groupby(['Year', 'Month']).size().reset_index(name='ComplaintCount')

change_group = group_complaint.pivot(index='Month', columns='Year', values='ComplaintCount')

plt.figure(figsize=(20, 16))
change_group.plot(marker='*', linestyle='-', ax=plt.gca()) 

plt.title('Complaints by Year and Month')
plt.xlabel('Month')
plt.ylabel('Complaint Count')
plt.xticks(range(1, 13),month_order); 
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Plot 3: Aggregate Restitution by Complaint Type:

Complaints that make their way to the courts often involve some kind of restitution payment to the consumer as a means of mending the transgression that took place with a given business. It is a fair question to ask which of these complaint types yielded the most in restitution payments given the spread on this data set. Do they fall around the same amount or is one of them ahead by a sizable margin when compared to the others? For the sake of this exploration there were two (2) criteria that needed to be met in order to be included:

  1. The sum of the restitution payments MUST be greater than $0
  2. The number of instances for a given complaint type must be greater than 50 units.

Once the weeding-out process had been completed we were left with 23 complaint types to look at, a far cry from the 72 that we started with.

Out of the 23 types, we have 4 that have broken $1 million dollars in aggregate restitution payments those being:

  1. Billing Disputes - $1.73 million
  2. Misrepresentation - $1.51 million
  3. Breach of Contract - $1.15 million
  4. Damaged Goods - $1.01 million

Goods and Contract Cancellation is just shy with $980,170. Then there is a rather sharp dip between 5th and 6th place with a difference of $271,387. This trend continues to present itself as the differences continue to expand going all the way down.

Does the count match the amount paid in restitution? Not entirely… Shown below the plot is a table containing the counts of the complaint types where the discrepancies become apparent.

  1. Misrepresentation has the most number of complaints with a difference of ~700 between it and the 2nd highest competitor. Despite this sizable lead, it is the second highest for aggregate restitution amounts.

  2. Breach of Contract is the 9th largest group with 1,253 complaints and yet it is the 3rd hights in terms of aggregate restitution amounts.

As we move down the plot towards the mean, the order adheres to the order in which they appear with respect to the number of complaints in each category.

df_copy['Restitution'] = pd.to_numeric(df_copy['Restitution'], errors='coerce')
payment_df = df_copy.groupby(['Complaint Type']).agg({'Complaint Type': ['count'], 'Restitution': ['sum']}).reset_index()
payment_df.columns = ['ComplaintType', 'ComplaintCount', 'SumRestitutionPayments']

filtered_payment_df = payment_df[ (payment_df['ComplaintCount'] >= 50) & (payment_df['SumRestitutionPayments'] > 0) ]

sorted_filter = filtered_payment_df.sort_values(by = 'SumRestitutionPayments')

mean_val = round(np.mean(sorted_filter['SumRestitutionPayments']))
median_val = round(np.median(sorted_filter['SumRestitutionPayments']))

restitution_bar = plt.figure(figsize = (20,16))
axis = restitution_bar.add_subplot(1,1,1)
bar_fill = axis.barh(sorted_filter.ComplaintType, sorted_filter.SumRestitutionPayments)

axis.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: "${:,.0f}".format(x)))

plt.axvline(mean_val, color='r', linestyle='--', label=f'Mean: {mean_val}')
axis.text(mean_val + 10000, 5, f'Mean: {mean_val}', rotation = 0, fontsize = 10)

plt.axvline(median_val, color='g', linestyle='-', label=f'Median: {median_val}')
axis.text(median_val + 10000, 5, f'Median: {median_val}', rotation = 0, fontsize = 10)

plt.xlabel('Amount Paid in Restitution')
plt.ylabel('Complaint Type')
plt.title('Amount Paid in Restitution by Complaint Type', size = 20)
plt.annotate("Complaint Types were removed if the total number of lodged complaints was less than 50 or if the amount paid in restitution was = 0. This means we are down from 72 types to 23 which are shown above.", (0,0), (-80, -5))

for bar in bar_fill:
    width = bar.get_width()
    label_y = bar.get_y() + bar.get_height() / 2  
    axis.text(width + 450, label_y, s=f'${width:,.0f}', va='center', fontsize = 8) #-> Formatting and sizing attained using Google Gemini

restitution_bar = plt.plot(kind = 'bar')

sorted_table = filtered_payment_df.sort_values(by = 'ComplaintCount', ascending = False)
sorted_table = sorted_table.reset_index(drop = True)

print(sorted_table.to_html())
ComplaintType ComplaintCount SumRestitutionPayments
0 Misrepresentation - M01 3217 1514102.0
1 Exchange Goods/Contract Cancelled - E01 2554 980170.0
2 Billing Dispute - B02 2399 1730066.0
3 Damaged Goods - D01 2360 1019603.0
4 Surcharge/Overcharge - S02 2320 379861.0
5 Non-Delivery of Goods - N01 1676 353994.0
6 Quality of Work - Q01 1591 708783.0
7 Non-Delivery of Service - N02 1501 343983.0
8 Breach of Contract - B03 1253 1151547.0
9 Defective Goods - D02 737 220671.0
10 Refund Policy - R01 516 164253.0
11 Advertising/Misleading - A02 376 61862.0
12 Advertising/General - A01 355 282620.0
13 Outstanding Judgment - J01 223 124790.0
14 Breach of Warranty - B04 160 59761.0
15 Damage / Estimate 156 46141.0
16 Wrong Goods - W01 142 34025.0
17 Harassment - H01 137 43332.0
18 Lost Property - L02 132 20047.0
19 Other - Z01 71 37176.0
20 Unlicensed 68 7811.0
21 Illegal/Unfair Booting 61 7338.0
22 Closed without Notice 51 1460.0

Plot 4: Box and Whisker Plot for Restitution Payments by Complaint Type:

Based on Plot 3, it is clear that certain complaint types yield higher restitution amounts than others. Some are bound to have a higher ‘ceiling’ for what a customer may be able to receive in restitution. In asking this question the importance of scale comes into play for graphing the dollar amounts. Both the non-scaled plot as well as the log-transformed will be shown to illustrate the main point of variation in payment amounts with corresponding dollar amounts. The non-transformed plot will be first to acclimate the viewer to the scale, most of the analysis will come from the log-transformed plot.

Taking the outlines off the table for a second, we can see that ‘Quality of Work- Q01’ has the highest maximum payment value of ~ $10,000 or so dollars. The orientation of the box plot also suggests that the payouts tend towards this higher value since the median is closer to the maximum value.

Which of these complaint types has the least amount of variation for restitution payouts (excluding outliers for the moment…)? Two candidates are present: ‘Damage/Estimate’ and ‘Breach of Warranty-B04’ both have the narrowest ranges for acceptable points which suggests the variation in payouts falls within a uniform range with little deviation.

Going back to the initial plot we can see that there are 3 complaint types that have unusually large ranges (including outliers) those being: ‘Misrepresentation - M01’, ‘Damaged Goods- D01’, and ‘Breach of Contract - B03.’ Both Misrepresentation and Breach of Contract have instances where a customer was awarded over $100,000 in settlements. Damaged Goods did not hit that same level but exceeded all other extraneous payouts from other categories.

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]

filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution']) 
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0]

filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16))

plt.title('Restitution by Complaint Type (Top 15 Types)')
plt.ylabel('Complaint Type')
plt.xticks(ha='right'); 

plt.show()

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]

filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution']) 
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0];
filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16));
plt.title('Restitution by Complaint Type (Top 15 Types) With Log Transformation')
plt.ylabel('Complaint Type')
plt.xscale('log')
plt.xticks(ha='right'); 
plt.show()

Plot 5: Satisfaction of Customers:

At the heart of consumer complaints is a sense of dissatisfaction with the product or business to such an extent that they feel compelled to pursue come kind of action to achieve an ends that meets their expectations. This end can be monetary payment, closure, further legal action, or some other goal; so how many people when all is said and done are pleased with the outcome of their complaint through DCWP?

Looking at this plot we can see that 50% of the recorded cases end with satisfaction for the clients, while only 33% are dissatisfied with the conclusion of their respective issue. Those who did say they were pleased with the process and ultimate outcome of this ordeal most likely sought closure as their primary goal for the proceedings while those who stated they were unhappy might be from a position where they thought they would have gotten a larger payout than they ended up with.

remove_creditcard = df_copy[df_copy['Satisfaction'] != 'Credit Card Refund and/or Contract Cancelled - CRC']
satisfaction_counts = remove_creditcard['Satisfaction'].value_counts(dropna=False)

#Using the dictionary to condense the data frame was an idea from Google Gemini (execution of that idea was my own work)
satisfaction_df = pd.DataFrame({'Satisfaction': satisfaction_counts})

satisfaction_labels = ['Satisfied', 'Unsatisfied', 'Not Applicable/Failed Customer Response (NaN)']
plt.figure(figsize=(20,16))
plt.pie(satisfaction_df['Satisfaction'], labels=satisfaction_labels, autopct='%1.1f%%');

hole = plt.Circle((0, 0), 0.3, fc='white')  
fig = plt.gcf()
fig.gca().add_artist(hole)

total = satisfaction_df['Satisfaction'].sum()
plt.text(0, 0, f'The total sample size:\n{int(total):,} people.', ha='center', va='center', fontsize=10)  

labels = [f'{rating}: {count:,.0f}' for rating, count in satisfaction_counts.items()]
plt.title('Were Clients Pleased With the Restituion Received?\n(Non-Answers or Failed Responses are Denoted as NaN)')
plt.legend(labels, loc = 'best', title = 'Number of Clients:')

plt.show()

Conclusion:

Complaints are an unavoidable part of running a business and can be a serious obstruction to effective operations so it is important to understand what customers are looking at when engaging with a business/ company; especially when restitution payments from the offending party are on the table. Taking appropriate steps to mitigate customer-business friction is critical to maximizing satisfaction as well as profits for a business. Do what you can in order to foster a strong customer relationship in order to grasp entirely what they are looking for in this transaction to not create space for issues to arise.