New York Open Data: Consumer Mediated Complaints (DCWP)

Getting to Know Our Data:

[Preface: This data set is updated frequently, the version used in this analysis might not match currently available data depending on when this is viewed.]

“The customer is always right” is a phrase oft used to highlight the importance of customer satisfaction when doing business. Seeing as how the saying is not “the business is always right,” moments where a company, brand, or otherwise fails to deliver to customer expectations are bound to happen which result in consumer complaints against the offending party. Complaints come is all shapes and sizes from overdue repayment for services, qualms with the quality of goods/ services, to harassment of customers, breach of contract, or other serious offenses.

Any good business should ask themselves: ‘What can be gleaned from these complaints? Are there certain things that we can prevent from happening or reduce the damages to ensure best customer-business relations?’ That is what we are trying to find out with this analysis right now. Let’s see what we can take away from these complaints to better understand our consumer population.

Here’s a sample of the 24,295 rows of data with 16 columns:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import matplotlib.ticker as ticker
import seaborn as sns
import matplotlib.cm as cm

filename = "Consumer_Services_Mediated_Complaints.csv"
path = "C:/Users/neddi/Desktop/DS 736 Data Visualization/Python Programming Directory/" 
df_copy = pd.read_csv(path + filename)

output_head = df_copy.head(11)

print(output_head.to_html())

	Industry	Complaint Type	Mediation Start Date	Mediation Close Date	Complaint Result	Satisfaction	Restitution	Business Building	Business Street	Building Address Unit	Business City	Business State	Business Zip	Complainant Zip	Longitude	Latitude
0	Home Improvement Contractor - 100	Breach of Contract - B03	8/27/2014	10/17/2014	No Business Response - NVR	No	0	199	LINCOLN AVE	NaN	BRONX	NY	10454	29455	-73.927766	40.810134
1	Garage - 049	Damaged Goods - D01	8/23/2016	10/13/2016	Resolved and Consumer Satisfied - SPF	Yes	0	1365	YORK AVE	NaN	NEW YORK	NY	10021	7470	-73.953430	40.767308
2	Tow Truck Company - 124	Misrepresentation - M01	4/14/2015	5/13/2015	Cash Amount - AMT	Yes	68	2716	203RD ST	NaN	BAYSIDE	NY	11360	11360	-73.849438	40.784499
3	Home Improvement Contractor - 100	Quality of Work - Q01	10/5/2018	10/23/2018	Goods Repaired - GRS	Yes	0	13042	VAN WYCK EXPY	NaN	S OZONE PARK	NY	11420	11369	-73.801724	40.670898
4	Home Improvement Contractor - 100	Quality of Work - Q01	7/20/2018	9/14/2018	Advised to Sue - ATS	No	0	435	GOWER ST	NaN	STATEN ISLAND	NY	10314	11782	NaN	NaN
5	Home Improvement Contractor - 100	Misrepresentation - M01	2/23/2018	3/9/2018	Unable to Locate Business - ULV	NaN	0	58	50TH ST	4	BROOKLYN	NY	11232	NaN	-73.967893	40.757895
6	Furniture Sales - 242	Wrong Goods - W01	7/27/2015	9/16/2015	Advised to Sue - ATS	No	0	8928	165TH ST	NaN	JAMAICA	NY	11432	11432	NaN	NaN
7	Pawnbroker - 080	Misrepresentation - M01	3/27/2015	6/5/2015	Referred to Hearing - RTH	No	0	3885	13TH ST	NaN	LONG IS CITY	NY	11101	11106	NaN	NaN
8	Tow Truck Company - 124	Surcharge/Overcharge - S02	8/29/2019	9/12/2019	Cash Amount - AMT	Yes	109	11	BRICK CT	NaN	STATEN ISLAND	NY	10309	10306	NaN	NaN
9	Misc Non-Food Retail - 817	Exchange Goods/Contract Cancelled - E01	8/12/2022	9/9/2022	Cash Amount - AMT	Yes	700	6217	6TH AVE	NaN	BROOKLYN	NY	11220	11223	-74.015392	40.636767
10	Furniture Sales - 242	Defective Goods - D02	5/24/2016	7/25/2016	Advised to Sue - ATS	No	0	125	SOUTH ST	NaN	PASSAIC	NJ	7055	11373	-73.876560	40.736007

Source of Data from NYC Open Data: https://data.cityofnewyork.us/Business/Consumer-Services-Mediated-Complaints/nre2-6m2s/about_data.

For additional information about column contents: file:///C:/Users/neddi/Downloads/DCA_Consumer_Services_Mediated_Complaints_Data_Dictionary.pdf

Analysis Through Plotting:

Plot 1: Complaints Issued by Month (Top 15 Types and ‘Other’ Category):

There were 72 unique complaint types to sift through in this data set ranging from contract breach, issues with customer service, to theft; there’s a lot going on here. So who are our ‘big players’ in terms of the month by month breakdown of complaints collected? In order to get a good idea the Top 15 types (by volume) were isolated and then the remaining 57 were condensed into a category called ‘Other’ since many of them consisted of a single instance such as ‘Lemon’ or ‘False Advertising’.

First and foremost, it would appear that over the months the number of a given complaint does not vary all that much, on the contrary it would seem that the number of complaints received for a given type is fairly static for a given month. What is also readily apparent is that September and November have a noticeable (although not drastic) dip in the overall number of complaints accrued. Why might this be? Well our answer might lie in the other months. March through August (with the exception being May) have the highest number of complaints in aggregate people are going out more since the weather is nicer, summer is starting so people are having things done to their homes for renovations to better experience the warmth. Being out of the house would increase the likelihood that a person is interacting with a given business thus increasing the risk that they would have an unpleasant experience with a business.

September, October, November, and December are cooler (on average); school is picking back up so there is less downtime to go shopping. Looking again at the plot we see that the exceptions are October and December which both have rather large ‘shopping holidays’ those being Halloween and Christmas which would require additional interaction with businesses.

month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], format='%m/%d/%Y', errors='coerce')
stacked_df = df_copy[['Mediation Start Date', 'Complaint Type']];

stacked_df['MonthComplaint'] = stacked_df['Mediation Start Date'].dt.strftime('%b')

stacked_df['Complaint Type'] = stacked_df['Complaint Type'].apply(lambda x: x if x in complaint_keep else 'Other')

stacked_df = stacked_df.groupby(['MonthComplaint', 'Complaint Type']).size().reset_index(name = 'TotalComplaintsMade')

stacked_df['MonthComplaint'] = pd.Categorical(stacked_df['MonthComplaint'], categories=month_order, ordered=True)

stacked_df = stacked_df.sort_values('MonthComplaint')

stack_pivot = stacked_df.pivot(index = 'MonthComplaint', columns = 'Complaint Type', values = 'TotalComplaintsMade');

stacked_figure = plt.figure(figsize = (25,16))

axis = stacked_figure.add_subplot(1,1,1)

#Generating unique colors achieved through Google Gemini
num_colors = len(stack_pivot.columns)
colors = cm.tab20(np.linspace(0, 1, num_colors))

stack_pivot.plot(kind='bar', stacked=True, ax=axis, color=colors)

plt.ylabel('Total Complaints Issued', fontsize=18, labelpad=10)
plt.title('Total Complaints Issued by Month and Complaint Type\nStacked Bar Chart',fontsize=20)
plt.xticks(rotation=0, horizontalalignment='center', fontsize=15);
plt.yticks(rotation=0, fontsize=14);
axis.set_xlabel('Month', fontsize=18)

axis.legend(loc='center left', bbox_to_anchor=(1, 0.25), fontsize=10)

Plot 2: Line Plot of Complaints by Year and Month:

[Note: There was a period from 7/21/23 to 11/xx/23 where data was not being retained. As a result there is a period from the dates listed above that are not at all filled and then go back up as if nothing happened. This is not program error, rather it is due to no new complaints being tabulated during this window.]

Aside from the little data hiccup outlined above everything else is in order so let’s take a look. 2014 had the highest number of months where it ranked number 1 in complaints received with 7 out of the 12 months having 2014 for its maximum value. Another unique trait of 2014 is that unlike the other years on record, there was an increase in the number of complaints from January to February. Looking at the graph, the normal trend is for there to be a dip from January to February then to jump up in March.

Again we have further proof of the declines in the number of complaints received in September and November forming a ‘W’ shape between August to December. September and November form the ‘valleys’ and August, October, and December are the ‘peaks.’

The year 2015 features the most erratic pattern with its drastic spike from March to April peaking at ~350 complaints in April and June then plunging to around 250 in September.

4 of the years, those being 2014, 2015, 2022, and 2023 have their peaks in April which suggests that business interactions are at a high point which we know to be the case from the above plot which clearly shows that April features the largest overall number of complaints.

df_copy['Mediation Start Date'] = pd.to_datetime(df_copy['Mediation Start Date'], errors='coerce')
df_copy['Year'] = df_copy['Mediation Start Date'].dt.year
df_copy['Month'] = df_copy['Mediation Start Date'].dt.month

group_complaint = df_copy.groupby(['Year', 'Month']).size().reset_index(name='ComplaintCount')

change_group = group_complaint.pivot(index='Month', columns='Year', values='ComplaintCount')

plt.figure(figsize=(20, 16))
change_group.plot(marker='*', linestyle='-', ax=plt.gca()) 

plt.title('Complaints by Year and Month')
plt.xlabel('Month')
plt.ylabel('Complaint Count')
plt.xticks(range(1, 13),month_order); 
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Plot 3: Aggregate Restitution by Complaint Type:

Complaints that make their way to the courts often involve some kind of restitution payment to the consumer as a means of mending the transgression that took place with a given business. It is a fair question to ask which of these complaint types yielded the most in restitution payments given the spread on this data set. Do they fall around the same amount or is one of them ahead by a sizable margin when compared to the others? For the sake of this exploration there were two (2) criteria that needed to be met in order to be included:

The sum of the restitution payments MUST be greater than $0
The number of instances for a given complaint type must be greater than 50 units.

Once the weeding-out process had been completed we were left with 23 complaint types to look at, a far cry from the 72 that we started with.

Out of the 23 types, we have 4 that have broken $1 million dollars in aggregate restitution payments those being:

Billing Disputes - $1.73 million
Misrepresentation - $1.51 million
Breach of Contract - $1.15 million
Damaged Goods - $1.01 million

Goods and Contract Cancellation is just shy with $980,170. Then there is a rather sharp dip between 5th and 6th place with a difference of $271,387. This trend continues to present itself as the differences continue to expand going all the way down.

Does the count match the amount paid in restitution? Not entirely… Shown below the plot is a table containing the counts of the complaint types where the discrepancies become apparent.

Misrepresentation has the most number of complaints with a difference of ~700 between it and the 2nd highest competitor. Despite this sizable lead, it is the second highest for aggregate restitution amounts.
Breach of Contract is the 9th largest group with 1,253 complaints and yet it is the 3rd hights in terms of aggregate restitution amounts.

As we move down the plot towards the mean, the order adheres to the order in which they appear with respect to the number of complaints in each category.

df_copy['Restitution'] = pd.to_numeric(df_copy['Restitution'], errors='coerce')
payment_df = df_copy.groupby(['Complaint Type']).agg({'Complaint Type': ['count'], 'Restitution': ['sum']}).reset_index()
payment_df.columns = ['ComplaintType', 'ComplaintCount', 'SumRestitutionPayments']

filtered_payment_df = payment_df[ (payment_df['ComplaintCount'] >= 50) & (payment_df['SumRestitutionPayments'] > 0) ]

sorted_filter = filtered_payment_df.sort_values(by = 'SumRestitutionPayments')

mean_val = round(np.mean(sorted_filter['SumRestitutionPayments']))
median_val = round(np.median(sorted_filter['SumRestitutionPayments']))

restitution_bar = plt.figure(figsize = (20,16))
axis = restitution_bar.add_subplot(1,1,1)
bar_fill = axis.barh(sorted_filter.ComplaintType, sorted_filter.SumRestitutionPayments)

axis.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: "${:,.0f}".format(x)))

plt.axvline(mean_val, color='r', linestyle='--', label=f'Mean: {mean_val}')
axis.text(mean_val + 10000, 5, f'Mean: {mean_val}', rotation = 0, fontsize = 10)

plt.axvline(median_val, color='g', linestyle='-', label=f'Median: {median_val}')
axis.text(median_val + 10000, 5, f'Median: {median_val}', rotation = 0, fontsize = 10)

plt.xlabel('Amount Paid in Restitution')
plt.ylabel('Complaint Type')
plt.title('Amount Paid in Restitution by Complaint Type', size = 20)
plt.annotate("Complaint Types were removed if the total number of lodged complaints was less than 50 or if the amount paid in restitution was = 0. This means we are down from 72 types to 23 which are shown above.", (0,0), (-80, -5))

for bar in bar_fill:
    width = bar.get_width()
    label_y = bar.get_y() + bar.get_height() / 2  
    axis.text(width + 450, label_y, s=f'${width:,.0f}', va='center', fontsize = 8) #-> Formatting and sizing attained using Google Gemini

restitution_bar = plt.plot(kind = 'bar')

sorted_table = filtered_payment_df.sort_values(by = 'ComplaintCount', ascending = False)
sorted_table = sorted_table.reset_index(drop = True)

print(sorted_table.to_html())

	ComplaintType	ComplaintCount	SumRestitutionPayments
0	Misrepresentation - M01	3217	1514102.0
1	Exchange Goods/Contract Cancelled - E01	2554	980170.0
2	Billing Dispute - B02	2399	1730066.0
3	Damaged Goods - D01	2360	1019603.0
4	Surcharge/Overcharge - S02	2320	379861.0
5	Non-Delivery of Goods - N01	1676	353994.0
6	Quality of Work - Q01	1591	708783.0
7	Non-Delivery of Service - N02	1501	343983.0
8	Breach of Contract - B03	1253	1151547.0
9	Defective Goods - D02	737	220671.0
10	Refund Policy - R01	516	164253.0
11	Advertising/Misleading - A02	376	61862.0
12	Advertising/General - A01	355	282620.0
13	Outstanding Judgment - J01	223	124790.0
14	Breach of Warranty - B04	160	59761.0
15	Damage / Estimate	156	46141.0
16	Wrong Goods - W01	142	34025.0
17	Harassment - H01	137	43332.0
18	Lost Property - L02	132	20047.0
19	Other - Z01	71	37176.0
20	Unlicensed	68	7811.0
21	Illegal/Unfair Booting	61	7338.0
22	Closed without Notice	51	1460.0

Plot 4: Box and Whisker Plot for Restitution Payments by Complaint Type:

Based on Plot 3, it is clear that certain complaint types yield higher restitution amounts than others. Some are bound to have a higher ‘ceiling’ for what a customer may be able to receive in restitution. In asking this question the importance of scale comes into play for graphing the dollar amounts. Both the non-scaled plot as well as the log-transformed will be shown to illustrate the main point of variation in payment amounts with corresponding dollar amounts. The non-transformed plot will be first to acclimate the viewer to the scale, most of the analysis will come from the log-transformed plot.

Taking the outlines off the table for a second, we can see that ‘Quality of Work- Q01’ has the highest maximum payment value of ~ $10,000 or so dollars. The orientation of the box plot also suggests that the payouts tend towards this higher value since the median is closer to the maximum value.

Which of these complaint types has the least amount of variation for restitution payouts (excluding outliers for the moment…)? Two candidates are present: ‘Damage/Estimate’ and ‘Breach of Warranty-B04’ both have the narrowest ranges for acceptable points which suggests the variation in payouts falls within a uniform range with little deviation.

Going back to the initial plot we can see that there are 3 complaint types that have unusually large ranges (including outliers) those being: ‘Misrepresentation - M01’, ‘Damaged Goods- D01’, and ‘Breach of Contract - B03.’ Both Misrepresentation and Breach of Contract have instances where a customer was awarded over $100,000 in settlements. Damaged Goods did not hit that same level but exceeded all other extraneous payouts from other categories.

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]

filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution']) 
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0]

filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16))

plt.title('Restitution by Complaint Type (Top 15 Types)')
plt.ylabel('Complaint Type')
plt.xticks(ha='right'); 

plt.show()

complaint_keep = ['Damage / Estimate', 'Breach of Warranty - B04','Advertising/Misleading - A02', 'Outstanding Judgment - J01',
       'Refund Policy - R01', 'Defective Goods - D02','Advertising/General - A01', 'Non-Delivery of Service - N02',
                  'Non-Delivery of Goods - N01', 'Surcharge/Overcharge - S02','Quality of Work - Q01', 'Exchange Goods/Contract Cancelled - E01',
       'Damaged Goods - D01', 'Breach of Contract - B03','Misrepresentation - M01']

filtered_box_df = df_copy[df_copy['Complaint Type'].isin(complaint_keep)]

filtered_box_df['Restitution'] = pd.to_numeric(filtered_box_df['Restitution'], errors='coerce');
filtered_box_df = filtered_box_df.dropna(subset=['Restitution']) 
filtered_box_df = filtered_box_df[filtered_box_df['Restitution'] != 0];
filtered_box_df.boxplot(by='Complaint Type',column = 'Restitution', vert = False, figsize=(20,16));
plt.title('Restitution by Complaint Type (Top 15 Types) With Log Transformation')
plt.ylabel('Complaint Type')
plt.xscale('log')
plt.xticks(ha='right'); 
plt.show()

Plot 5: Satisfaction of Customers:

At the heart of consumer complaints is a sense of dissatisfaction with the product or business to such an extent that they feel compelled to pursue come kind of action to achieve an ends that meets their expectations. This end can be monetary payment, closure, further legal action, or some other goal; so how many people when all is said and done are pleased with the outcome of their complaint through DCWP?

Looking at this plot we can see that 50% of the recorded cases end with satisfaction for the clients, while only 33% are dissatisfied with the conclusion of their respective issue. Those who did say they were pleased with the process and ultimate outcome of this ordeal most likely sought closure as their primary goal for the proceedings while those who stated they were unhappy might be from a position where they thought they would have gotten a larger payout than they ended up with.

remove_creditcard = df_copy[df_copy['Satisfaction'] != 'Credit Card Refund and/or Contract Cancelled - CRC']
satisfaction_counts = remove_creditcard['Satisfaction'].value_counts(dropna=False)

#Using the dictionary to condense the data frame was an idea from Google Gemini (execution of that idea was my own work)
satisfaction_df = pd.DataFrame({'Satisfaction': satisfaction_counts})

satisfaction_labels = ['Satisfied', 'Unsatisfied', 'Not Applicable/Failed Customer Response (NaN)']
plt.figure(figsize=(20,16))
plt.pie(satisfaction_df['Satisfaction'], labels=satisfaction_labels, autopct='%1.1f%%');

hole = plt.Circle((0, 0), 0.3, fc='white')  
fig = plt.gcf()
fig.gca().add_artist(hole)

total = satisfaction_df['Satisfaction'].sum()
plt.text(0, 0, f'The total sample size:\n{int(total):,} people.', ha='center', va='center', fontsize=10)  

labels = [f'{rating}: {count:,.0f}' for rating, count in satisfaction_counts.items()]
plt.title('Were Clients Pleased With the Restituion Received?\n(Non-Answers or Failed Responses are Denoted as NaN)')
plt.legend(labels, loc = 'best', title = 'Number of Clients:')

plt.show()

Conclusion:

Complaints are an unavoidable part of running a business and can be a serious obstruction to effective operations so it is important to understand what customers are looking at when engaging with a business/ company; especially when restitution payments from the offending party are on the table. Taking appropriate steps to mitigate customer-business friction is critical to maximizing satisfaction as well as profits for a business. Do what you can in order to foster a strong customer relationship in order to grasp entirely what they are looking for in this transaction to not create space for issues to arise.