[Preface: This data set is updated frequently, the version used in this analysis might not match currently available data depending on when this is viewed.]
“The customer is always right” is a phrase oft used to highlight the importance of customer satisfaction when doing business. Seeing as how the saying is not “the business is always right,” moments where a company, brand, or otherwise fails to deliver to customer expectations are bound to happen which result in consumer complaints against the offending party. Complaints come is all shapes and sizes from overdue repayment for services, qualms with the quality of goods/ services, to harassment of customers, breach of contract, or other serious offenses.
Any good business should ask themselves: ‘What can be gleaned from these complaints? Are there certain things that we can prevent from happening or reduce the damages to ensure best customer-business relations?’ That is what we are trying to find out with this analysis right now. Let’s see what we can take away from these complaints to better understand our consumer population.
Here’s a sample of the 24,295 rows of data with 16 columns:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import matplotlib.ticker as ticker
import seaborn as sns
import matplotlib.cm as cm
filename = "Consumer_Services_Mediated_Complaints.csv"
path = "C:/Users/neddi/Desktop/DS 736 Data Visualization/Python Programming Directory/"
df_copy = pd.read_csv(path + filename)
output_head = df_copy.head(11)
print(output_head.to_html())
| Industry | Complaint Type | Mediation Start Date | Mediation Close Date | Complaint Result | Satisfaction | Restitution | Business Building | Business Street | Building Address Unit | Business City | Business State | Business Zip | Complainant Zip | Longitude | Latitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Home Improvement Contractor - 100 | Breach of Contract - B03 | 8/27/2014 | 10/17/2014 | No Business Response - NVR | No | 0 | 199 | LINCOLN AVE | NaN | BRONX | NY | 10454 | 29455 | -73.927766 | 40.810134 |
| 1 | Garage - 049 | Damaged Goods - D01 | 8/23/2016 | 10/13/2016 | Resolved and Consumer Satisfied - SPF | Yes | 0 | 1365 | YORK AVE | NaN | NEW YORK | NY | 10021 | 7470 | -73.953430 | 40.767308 |
| 2 | Tow Truck Company - 124 | Misrepresentation - M01 | 4/14/2015 | 5/13/2015 | Cash Amount - AMT | Yes | 68 | 2716 | 203RD ST | NaN | BAYSIDE | NY | 11360 | 11360 | -73.849438 | 40.784499 |
| 3 | Home Improvement Contractor - 100 | Quality of Work - Q01 | 10/5/2018 | 10/23/2018 | Goods Repaired - GRS | Yes | 0 | 13042 | VAN WYCK EXPY | NaN | S OZONE PARK | NY | 11420 | 11369 | -73.801724 | 40.670898 |
| 4 | Home Improvement Contractor - 100 | Quality of Work - Q01 | 7/20/2018 | 9/14/2018 | Advised to Sue - ATS | No | 0 | 435 | GOWER ST | NaN | STATEN ISLAND | NY | 10314 | 11782 | NaN | NaN |
| 5 | Home Improvement Contractor - 100 | Misrepresentation - M01 | 2/23/2018 | 3/9/2018 | Unable to Locate Business - ULV | NaN | 0 | 58 | 50TH ST | 4 | BROOKLYN | NY | 11232 | NaN | -73.967893 | 40.757895 |
| 6 | Furniture Sales - 242 | Wrong Goods - W01 | 7/27/2015 | 9/16/2015 | Advised to Sue - ATS | No | 0 | 8928 | 165TH ST | NaN | JAMAICA | NY | 11432 | 11432 | NaN | NaN |
| 7 | Pawnbroker - 080 | Misrepresentation - M01 | 3/27/2015 | 6/5/2015 | Referred to Hearing - RTH | No | 0 | 3885 | 13TH ST | NaN | LONG IS CITY | NY | 11101 | 11106 | NaN | NaN |
| 8 | Tow Truck Company - 124 | Surcharge/Overcharge - S02 | 8/29/2019 | 9/12/2019 | Cash Amount - AMT | Yes | 109 | 11 | BRICK CT | NaN | STATEN ISLAND | NY | 10309 | 10306 | NaN | NaN |
| 9 | Misc Non-Food Retail - 817 | Exchange Goods/Contract Cancelled - E01 | 8/12/2022 | 9/9/2022 | Cash Amount - AMT | Yes | 700 | 6217 | 6TH AVE | NaN | BROOKLYN | NY | 11220 | 11223 | -74.015392 | 40.636767 |
| 10 | Furniture Sales - 242 | Defective Goods - D02 | 5/24/2016 | 7/25/2016 | Advised to Sue - ATS | No | 0 | 125 | SOUTH ST | NaN | PASSAIC | NJ | 7055 | 11373 | -73.876560 | 40.736007 |
Source of Data from NYC Open Data: https://data.cityofnewyork.us/Business/Consumer-Services-Mediated-Complaints/nre2-6m2s/about_data.
For additional information about column contents: file:///C:/Users/neddi/Downloads/DCA_Consumer_Services_Mediated_Complaints_Data_Dictionary.pdf
There were 72 unique complaint types to sift through in this data set ranging from contract breach, issues with customer service, to theft; there’s a lot going on here. So who are our ‘big players’ in terms of the month by month breakdown of complaints collected? In order to get a good idea the Top 15 types (by volume) were isolated and then the remaining 57 were condensed into a category called ‘Other’ since many of them consisted of a single instance such as ‘Lemon’ or ‘False Advertising’.
First and foremost, it would appear that over the months the number of a given complaint does not vary all that much, on the contrary it would seem that the number of complaints received for a given type is fairly static for a given month. What is also readily apparent is that September and November have a noticeable (although not drastic) dip in the overall number of complaints accrued. Why might this be? Well our answer might lie in the other months. March through August (with the exception being May) have the highest number of complaints in aggregate people are going out more since the weather is nicer, summer is starting so people are having things done to their homes for renovations to better experience the warmth. Being out of the house would increase the likelihood that a person is interacting with a given business thus increasing the risk that they would have an unpleasant experience with a business.
September, October, November, and December are cooler (on average); school is picking back up so there is less downtime to go shopping. Looking again at the plot we see that the exceptions are October and December which both have rather large ‘shopping holidays’ those being Halloween and Christmas which would require additional interaction with businesses.
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']
df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], format='%m/%d/%Y', errors='coerce')
stacked_df = df_copy[['Mediation Start Date', 'Complaint Type']];
stacked_df['MonthComplaint'] = stacked_df['Mediation Start Date'].dt.strftime('%b')
stacked_df['Complaint Type'] = stacked_df['Complaint Type'].apply(lambda x: x if x in complaint_keep else 'Other')
stacked_df = stacked_df.groupby(['MonthComplaint', 'Complaint Type']).size().reset_index(name = 'TotalComplaintsMade')
stacked_df['MonthComplaint'] = pd.Categorical(stacked_df['MonthComplaint'], categories=month_order, ordered=True)
stacked_df = stacked_df.sort_values('MonthComplaint')
stack_pivot = stacked_df.pivot(index = 'MonthComplaint', columns = 'Complaint Type', values = 'TotalComplaintsMade');
stacked_figure = plt.figure(figsize = (25,16))
axis = stacked_figure.add_subplot(1,1,1)
#Generating unique colors achieved through Google Gemini
num_colors = len(stack_pivot.columns)
colors = cm.tab20(np.linspace(0, 1, num_colors))
stack_pivot.plot(kind='bar', stacked=True, ax=axis, color=colors)
plt.ylabel('Total Complaints Issued', fontsize=18, labelpad=10)
plt.title('Total Complaints Issued by Month and Complaint Type\nStacked Bar Chart',fontsize=20)
plt.xticks(rotation=0, horizontalalignment='center', fontsize=15);
plt.yticks(rotation=0, fontsize=14);
axis.set_xlabel('Month', fontsize=18)
axis.legend(loc='center left', bbox_to_anchor=(1, 0.25), fontsize=10)
[Note: There was a period from 7/21/23 to 11/xx/23 where data was not being retained. As a result there is a period from the dates listed above that are not at all filled and then go back up as if nothing happened. This is not program error, rather it is due to no new complaints being tabulated during this window.]
Aside from the little data hiccup outlined above everything else is in order so let’s take a look. 2014 had the highest number of months where it ranked number 1 in complaints received with 7 out of the 12 months having 2014 for its maximum value. Another unique trait of 2014 is that unlike the other years on record, there was an increase in the number of complaints from January to February. Looking at the graph, the normal trend is for there to be a dip from January to February then to jump up in March.
Again we have further proof of the declines in the number of complaints received in September and November forming a ‘W’ shape between August to December. September and November form the ‘valleys’ and August, October, and December are the ‘peaks.’
The year 2015 features the most erratic pattern with its drastic spike from March to April peaking at ~350 complaints in April and June then plunging to around 250 in September.
4 of the years, those being 2014, 2015, 2022, and 2023 have their peaks in April which suggests that business interactions are at a high point which we know to be the case from the above plot which clearly shows that April features the largest overall number of complaints.
df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], errors='coerce')
df_copy['Year'] = df_copy['Mediation Start Date'].dt.year
df_copy['Month'] = df_copy['Mediation Start Date'].dt.month
group_complaint = df_copy.groupby(['Year', 'Month']).size().reset_index(name='ComplaintCount')
change_group = group_complaint.pivot(index='Month', columns='Year', values='ComplaintCount')
plt.figure(figsize=(20, 16))
change_group.plot(marker='*', linestyle='-', ax=plt.gca())
plt.title('Complaints by Year and Month')
plt.xlabel('Month')
plt.ylabel('Complaint Count')
plt.xticks(range(1, 13),month_order);
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
Complaints that make their way to the courts often involve some kind of restitution payment to the consumer as a means of mending the transgression that took place with a given business. It is a fair question to ask which of these complaint types yielded the most in restitution payments given the spread on this data set. Do they fall around the same amount or is one of them ahead by a sizable margin when compared to the others? For the sake of this exploration there were two (2) criteria that needed to be met in order to be included:
Once the weeding-out process had been completed we were left with 23 complaint types to look at, a far cry from the 72 that we started with.
Out of the 23 types, we have 4 that have broken $1 million dollars in aggregate restitution payments those being:
Goods and Contract Cancellation is just shy with $980,170. Then there is a rather sharp dip between 5th and 6th place with a difference of $271,387. This trend continues to present itself as the differences continue to expand going all the way down.
Does the count match the amount paid in restitution? Not entirely… Shown below the plot is a table containing the counts of the complaint types where the discrepancies become apparent.
Misrepresentation has the most number of complaints with a difference of ~700 between it and the 2nd highest competitor. Despite this sizable lead, it is the second highest for aggregate restitution amounts.
Breach of Contract is the 9th largest group with 1,253 complaints and yet it is the 3rd hights in terms of aggregate restitution amounts.
As we move down the plot towards the mean, the order adheres to the order in which they appear with respect to the number of complaints in each category.
df_copy['Restitution'] = pd.to_numeric(df_copy['Restitution'], errors='coerce')
payment_df = df_copy.groupby(['Complaint Type']).agg({'Complaint Type': ['count'], 'Restitution': ['sum']}).reset_index()
payment_df.columns = ['ComplaintType', 'ComplaintCount', 'SumRestitutionPayments']
filtered_payment_df = payment_df[ (payment_df['ComplaintCount'] >= 50) & (payment_df['SumRestitutionPayments'] > 0) ]
sorted_filter = filtered_payment_df.sort_values(by = 'SumRestitutionPayments')
mean_val = round(np.mean(sorted_filter['SumRestitutionPayments']))
median_val = round(np.median(sorted_filter['SumRestitutionPayments']))
restitution_bar = plt.figure(figsize = (20,16))
axis = restitution_bar.add_subplot(1,1,1)
bar_fill = axis.barh(sorted_filter.ComplaintType, sorted_filter.SumRestitutionPayments)
axis.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: "${:,.0f}".format(x)))
plt.axvline(mean_val, color='r', linestyle='--', label=f'Mean: {mean_val}')
axis.text(mean_val + 10000, 5, f'Mean: {mean_val}', rotation = 0, fontsize = 10)
plt.axvline(median_val, color='g', linestyle='-', label=f'Median: {median_val}')
axis.text(median_val + 10000, 5, f'Median: {median_val}', rotation = 0, fontsize = 10)
plt.xlabel('Amount Paid in Restitution')
plt.ylabel('Complaint Type')
plt.title('Amount Paid in Restitution by Complaint Type', size = 20)
plt.annotate("Complaint Types were removed if the total number of lodged complaints was less than 50 or if the amount paid in restitution was = 0. This means we are down from 72 types to 23 which are shown above.", (0,0), (-80, -5))
for bar in bar_fill:
width = bar.get_width()
label_y = bar.get_y() + bar.get_height() / 2
axis.text(width + 450, label_y, s=f'${width:,.0f}', va='center', fontsize = 8) #-> Formatting and sizing attained using Google Gemini
restitution_bar = plt.plot(kind = 'bar')
sorted_table = filtered_payment_df.sort_values(by = 'ComplaintCount', ascending = False)
sorted_table = sorted_table.reset_index(drop = True)
print(sorted_table.to_html())
| ComplaintType | ComplaintCount | SumRestitutionPayments | |
|---|---|---|---|
| 0 | Misrepresentation - M01 | 3217 | 1514102.0 |
| 1 | Exchange Goods/Contract Cancelled - E01 | 2554 | 980170.0 |
| 2 | Billing Dispute - B02 | 2399 | 1730066.0 |
| 3 | Damaged Goods - D01 | 2360 | 1019603.0 |
| 4 | Surcharge/Overcharge - S02 | 2320 | 379861.0 |
| 5 | Non-Delivery of Goods - N01 | 1676 | 353994.0 |
| 6 | Quality of Work - Q01 | 1591 | 708783.0 |
| 7 | Non-Delivery of Service - N02 | 1501 | 343983.0 |
| 8 | Breach of Contract - B03 | 1253 | 1151547.0 |
| 9 | Defective Goods - D02 | 737 | 220671.0 |
| 10 | Refund Policy - R01 | 516 | 164253.0 |
| 11 | Advertising/Misleading - A02 | 376 | 61862.0 |
| 12 | Advertising/General - A01 | 355 | 282620.0 |
| 13 | Outstanding Judgment - J01 | 223 | 124790.0 |
| 14 | Breach of Warranty - B04 | 160 | 59761.0 |
| 15 | Damage / Estimate | 156 | 46141.0 |
| 16 | Wrong Goods - W01 | 142 | 34025.0 |
| 17 | Harassment - H01 | 137 | 43332.0 |
| 18 | Lost Property - L02 | 132 | 20047.0 |
| 19 | Other - Z01 | 71 | 37176.0 |
| 20 | Unlicensed | 68 | 7811.0 |
| 21 | Illegal/Unfair Booting | 61 | 7338.0 |
| 22 | Closed without Notice | 51 | 1460.0 |
Based on Plot 3, it is clear that certain complaint types yield higher restitution amounts than others. Some are bound to have a higher ‘ceiling’ for what a customer may be able to receive in restitution. In asking this question the importance of scale comes into play for graphing the dollar amounts. Both the non-scaled plot as well as the log-transformed will be shown to illustrate the main point of variation in payment amounts with corresponding dollar amounts. The non-transformed plot will be first to acclimate the viewer to the scale, most of the analysis will come from the log-transformed plot.
Taking the outlines off the table for a second, we can see that ‘Quality of Work- Q01’ has the highest maximum payment value of ~ $10,000 or so dollars. The orientation of the box plot also suggests that the payouts tend towards this higher value since the median is closer to the maximum value.
Which of these complaint types has the least amount of variation for restitution payouts (excluding outliers for the moment…)? Two candidates are present: ‘Damage/Estimate’ and ‘Breach of Warranty-B04’ both have the narrowest ranges for acceptable points which suggests the variation in payouts falls within a uniform range with little deviation.
Going back to the initial plot we can see that there are 3 complaint types that have unusually large ranges (including outliers) those being: ‘Misrepresentation - M01’, ‘Damaged Goods- D01’, and ‘Breach of Contract - B03.’ Both Misrepresentation and Breach of Contract have instances where a customer was awarded over $100,000 in settlements. Damaged Goods did not hit that same level but exceeded all other extraneous payouts from other categories.
complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']
filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]
filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution'])
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0]
filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16))
plt.title('Restitution by Complaint Type (Top 15 Types)')
plt.ylabel('Complaint Type')
plt.xticks(ha='right');
plt.show()
complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']
filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]
filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution'])
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0];
filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16));
plt.title('Restitution by Complaint Type (Top 15 Types) With Log Transformation')
plt.ylabel('Complaint Type')
plt.xscale('log')
plt.xticks(ha='right');
plt.show()
At the heart of consumer complaints is a sense of dissatisfaction with the product or business to such an extent that they feel compelled to pursue come kind of action to achieve an ends that meets their expectations. This end can be monetary payment, closure, further legal action, or some other goal; so how many people when all is said and done are pleased with the outcome of their complaint through DCWP?
Looking at this plot we can see that 50% of the recorded cases end with satisfaction for the clients, while only 33% are dissatisfied with the conclusion of their respective issue. Those who did say they were pleased with the process and ultimate outcome of this ordeal most likely sought closure as their primary goal for the proceedings while those who stated they were unhappy might be from a position where they thought they would have gotten a larger payout than they ended up with.
remove_creditcard = df_copy[df_copy['Satisfaction'] != 'Credit Card Refund and/or Contract Cancelled - CRC']
satisfaction_counts = remove_creditcard['Satisfaction'].value_counts(dropna=False)
#Using the dictionary to condense the data frame was an idea from Google Gemini (execution of that idea was my own work)
satisfaction_df = pd.DataFrame({'Satisfaction': satisfaction_counts})
satisfaction_labels = ['Satisfied', 'Unsatisfied', 'Not Applicable/Failed Customer Response (NaN)']
plt.figure(figsize=(20,16))
plt.pie(satisfaction_df['Satisfaction'], labels=satisfaction_labels, autopct='%1.1f%%');
hole = plt.Circle((0, 0), 0.3, fc='white')
fig = plt.gcf()
fig.gca().add_artist(hole)
total = satisfaction_df['Satisfaction'].sum()
plt.text(0, 0, f'The total sample size:\n{int(total):,} people.', ha='center', va='center', fontsize=10)
labels = [f'{rating}: {count:,.0f}' for rating, count in satisfaction_counts.items()]
plt.title('Were Clients Pleased With the Restituion Received?\n(Non-Answers or Failed Responses are Denoted as NaN)')
plt.legend(labels, loc = 'best', title = 'Number of Clients:')
plt.show()
Complaints are an unavoidable part of running a business and can be a serious obstruction to effective operations so it is important to understand what customers are looking at when engaging with a business/ company; especially when restitution payments from the offending party are on the table. Taking appropriate steps to mitigate customer-business friction is critical to maximizing satisfaction as well as profits for a business. Do what you can in order to foster a strong customer relationship in order to grasp entirely what they are looking for in this transaction to not create space for issues to arise.