RESEARCH INTENTION

My aim with this analysis is to estimate customer satisfaction with purchases of different categories of clothing from an online shopping platform. I wanted to take techniques learnt during the course and apply them to unstructured text. This is informative because it can shed light on how well online shopping works compared with shopping in person. Customer reviews are now a crucial factor in attracting customers: few things are more persuasive than people just like us reviewing products that are on our wishlist.

THE DATASET

I chose a dataset available on Kaggle containing reviews, written by women, of clothing articles purchased online. The dataset includes 23,486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the following variables:

I will mainly be working with the review text in the text pre-processing part, but before that I need to understand what exactly my data consists of.

EXPLORATORY DATA ANALYSIS

As part of the EDA, I wanted to see what kinds of information my dataset can reveal without any text pre-processing. Many of the columns can reveal insights into customers' buying patterns, their likes and dislikes, the website's strengths and weaknesses, and any areas with room for improvement.

1. Rating by Department

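I calculated the average rating for each department. Among the named departments, Bottoms received the highest average rating (4.29), narrowly ahead of Intimate (4.28) and Jackets (4.26). The unlabeled first row, with an average of 5.0, appears to correspond to the 14 reviews that have no department recorded. The result for Bottoms is perhaps surprising, as one might expect that buying bottoms online without trying them on would turn out to be a disaster, but that is not the case in our data.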

Rating by Department
Department    Average rating    Number of reviews
(blank)       5.000000          14
Bottoms       4.288760          3799
Dresses       4.150815          6319
Intimate      4.280115          1735
Jackets       4.264535          1032
Tops          4.172239          10468
Trend         3.815126          119
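The table above is the result of a simple group-by: for each department, the mean rating and the number of reviews. The sketch below is a minimal, standard-library illustration run on a handful of hypothetical toy records (not rows from the dataset); `rating_by_department` is a hypothetical helper name, and missing department labels are grouped under "(blank)" to mirror the first row of the table.

```python
from collections import defaultdict

# Hypothetical toy records (department, rating); the real data would come from
# the Kaggle CSV. A department of None stands in for a missing label.
reviews = [
    ("Tops", 4), ("Tops", 5), ("Bottoms", 5), ("Bottoms", 4),
    ("Dresses", 3), ("Dresses", 5), (None, 5),
]

def rating_by_department(rows):
    """Return {department: (average_rating, review_count)}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for dept, rating in rows:
        key = dept if dept is not None else "(blank)"  # keep missing labels as a group
        totals[key] += rating
        counts[key] += 1
    return {d: (totals[d] / counts[d], counts[d]) for d in counts}

# Print one line per department, in alphabetical order.
for dept, (avg, n) in sorted(rating_by_department(reviews).items()):
    print(f"{dept:<10} {avg:.6f} {n}")
```

With pandas, and assuming the CSV uses column names such as `Department Name` and `Rating`, the equivalent would be roughly `df.groupby("Department Name", dropna=False)["Rating"].agg(["mean", "count"])`, where `dropna=False` keeps reviews with no department as their own group.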

2. Reviews by Department

Next, I checked the percentage of reviews left by customers for each department. As the bar chart below shows, Tops received the most reviews. This does not tell us whether those reviews were positive or negative; all we can see is that about 44.6% of all reviews (10,468 of 23,486) were for the Tops department.
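The percentages behind the bar chart follow directly from the review counts in the "Rating by Department" table. A minimal sketch in plain Python (`review_share` is a hypothetical helper name) that computes each department's share of all reviews:

```python
# Review counts per department, copied from the "Rating by Department" table.
counts = {
    "(blank)": 14, "Bottoms": 3799, "Dresses": 6319, "Intimate": 1735,
    "Jackets": 1032, "Tops": 10468, "Trend": 119,
}

def review_share(counts):
    """Return each department's share of all reviews, in percent."""
    total = sum(counts.values())
    return {dept: 100 * n / total for dept, n in counts.items()}

# Print departments from most to fewest reviews.
for dept, pct in sorted(review_share(counts).items(), key=lambda kv: -kv[1]):
    print(f"{dept:<9} {pct:5.1f}%")
```

The counts sum to 23,486, the number of rows in the dataset, so the shares add up to 100%.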

3. Age by Department

I also wanted to see how the reviews in each department were distributed across age groups. These age groups were chosen arbitrarily. For every age group, Tops, Dresses and Bottoms were the three departments that received the most reviews.
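The age-group breakdown amounts to binning each reviewer's age and counting reviews per (age group, department) pair. The original age groups are not specified, so the bin edges, labels and toy records below are hypothetical; this is only a sketch of the technique.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical age-group boundaries; the groups used in the analysis above are
# not specified, so these edges are only an illustration.
EDGES = [30, 40, 50, 60]
LABELS = ["<30", "30-39", "40-49", "50-59", "60+"]

def age_group(age):
    """Map an age to one of LABELS using the bin edges above."""
    return LABELS[bisect_right(EDGES, age)]

def reviews_by_age_and_department(rows):
    """Count reviews per (age group, department) pair."""
    return Counter((age_group(age), dept) for age, dept in rows)

# Hypothetical toy records: (age, department).
rows = [(25, "Tops"), (29, "Dresses"), (34, "Tops"), (41, "Bottoms"), (67, "Tops")]
table = reviews_by_age_and_department(rows)
for (group, dept), n in sorted(table.items()):
    print(group, dept, n)
```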

4. Top 10 words used in Reviews

I looked at the 10 most frequently used words in the review column. These include dress, love, size and top. I will use them later to examine negative and positive words and, after cleaning the text further, to create word clouds.
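Counting the most frequent words can be sketched with `collections.Counter`. This minimal version assumes a simple regular-expression tokenizer and uses hypothetical example sentences rather than reviews from the dataset. It does not remove stop words, so function words such as "the" can rank highly.

```python
import re
from collections import Counter

def top_words(texts, n=10):
    """Return the n most frequent lowercase word tokens across texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

# Hypothetical example reviews, not taken from the dataset.
sample = ["Love this dress!", "The dress runs small, size up.", "Love the top."]
print(top_words(sample, n=3))
```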

TEXT PRE-PROCESSING

In the following section I clean the data for further analysis. First, I filtered the data to keep only the reviews marked as not recommended. I then lowercased all the words, tokenized the text and removed stop words such as "I", "you" and "and", which occur very often without adding any value to the review itself. The result is the 10 most frequent words in these negative reviews.
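The pre-processing steps described above (keep non-recommended reviews, lowercase, tokenize, remove stop words, count) can be sketched as follows. This is an illustration under stated assumptions, not the exact code used here: the reviews are hypothetical (text, flag) pairs with a 1/0 recommendation flag, the tokenizer is a simple regular expression, and the stop-word list is a small hypothetical one (libraries such as NLTK ship fuller lists).

```python
import re
from collections import Counter

# A small hypothetical stop-word list, for illustration only.
STOP_WORDS = {"i", "you", "and", "the", "a", "it", "is", "was", "this",
              "to", "in", "of", "for", "but", "so", "too"}

def negative_top_words(reviews, n=10):
    """Top n words in reviews whose recommendation flag is 0.

    `reviews` is a list of (text, recommended) pairs, recommended being 1 or 0.
    """
    counts = Counter()
    for text, recommended in reviews:
        if recommended != 0:
            continue  # keep only non-recommended reviews
        tokens = re.findall(r"[a-z']+", text.lower())  # lowercase, then tokenize
        counts.update(t for t in tokens if t not in STOP_WORDS)  # drop stop words
    return counts.most_common(n)

# Hypothetical toy reviews, not taken from the dataset.
reviews = [
    ("I love this top", 1),
    ("The fabric is thin and the fit is off", 0),
    ("Returned it, the fit was too small", 0),
]
print(negative_top_words(reviews, n=3))
```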

A few points stand out. In the negative reviews, customers mostly talk about dresses and tops. They also mention size and fit as issues, which is understandable given the nature of the business: it is an online shop, and customers buy the articles without trying them on. Fabric, material and color are other concerns customers raise. Again, this is a natural drawback of online shopping, as what actually arrives may differ considerably from what is advertised and displayed in pictures. "Return" is another frequently used word, perhaps by customers who were dissatisfied with their purchases and were looking for ways to return the article.

BIGRAMS

I used bigrams (pairs of consecutive words) to see more detail in the reviews. A single word on its own often reveals little, and in customer reviews this extra context is especially helpful. For example, if the word "super" is taken on its own, I cannot infer anything from it: is it used in a positive or a negative review? However, if I see "super" paired with "cute", or "super" alongside "uncomfortable", I can immediately categorize the review as positive or negative. Below I have shown this and added more depth by breaking the bigrams down by rating from 1 to 5. For dresses rated 1, the bigram "poor fit" appears in the reviews, while for dresses rated 5, "fits perfectly" appears.
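Extracting bigrams and grouping them by rating can be sketched by pairing each token with the next one. The toy reviews and helper names below are hypothetical, and this minimal version keeps stop words so that pairs such as "fits perfectly" stay intact; it is an illustration of the technique rather than the exact procedure used above.

```python
import re
from collections import Counter, defaultdict

def bigrams(text):
    """Return consecutive word pairs from a lowercased text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))

def top_bigrams_by_rating(reviews, n=5):
    """Map each rating to its n most frequent bigrams."""
    per_rating = defaultdict(Counter)
    for text, rating in reviews:
        per_rating[rating].update(bigrams(text))
    return {r: c.most_common(n) for r, c in sorted(per_rating.items())}

# Hypothetical toy reviews: (text, rating).
reviews = [
    ("Poor fit and poor fit again", 1),
    ("Super uncomfortable fabric", 1),
    ("Fits perfectly, super cute", 5),
    ("Super cute and fits perfectly", 5),
]
for rating, pairs in top_bigrams_by_rating(reviews, n=2).items():
    print(rating, pairs)
```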

WORDCLOUD

Word clouds are an intuitive way to visualize words. The words are drawn together in a single cloud, with more frequent words shown larger, so one can immediately see which terms dominate and why.

I created two word clouds: one for articles rated 1 and one for articles rated 5. For those rated 1, I can see phrases such as "extremely disappointed" and "bad quality", while for those rated 5, phrases such as "runs true" and "super soft" appear. The word cloud for the low-rated articles also contains fewer words than the one for the high-rated articles. This is expected from our EDA above, since more people recommended the website than did not. Those who recommended the website left positive reviews, and because there were more positive than negative reviews, the word cloud for articles rated 5 is bigger.
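A word cloud is drawn from a word-frequency table, so the core step is counting words separately for the reviews rated 1 and those rated 5. The sketch below computes those tables in plain Python on hypothetical toy reviews; `frequencies_for_rating` is a hypothetical helper. The resulting dictionaries could then be rendered with a word-cloud package, for example the third-party `wordcloud` package's `WordCloud().generate_from_frequencies(...)`.

```python
import re
from collections import Counter

def frequencies_for_rating(reviews, rating, stop_words=frozenset()):
    """Word frequencies for the reviews with the given rating."""
    counts = Counter()
    for text, r in reviews:
        if r == rating:
            tokens = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in tokens if w not in stop_words)
    return dict(counts)

# Hypothetical toy reviews: (text, rating).
reviews = [
    ("Extremely disappointed, bad quality", 1),
    ("Super soft and runs true to size", 5),
    ("So soft, love it", 5),
]
low = frequencies_for_rating(reviews, 1)
high = frequencies_for_rating(reviews, 5, stop_words={"and", "to", "so", "it"})
# Number of distinct words available to each cloud.
print(len(low), len(high))
```

A group with more reviews generally yields more distinct words, which is consistent with the larger cloud for articles rated 5.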

CONCLUSION

From the analysis above, I can tell that the overall customer reviews were positive. With this analysis I have only scratched the surface of unstructured text and the insights it can give a business, and there is much more work that can be done in this regard. The analysis is also limited by the number of reviews available: the more data we have, the more detailed an understanding the analysis can give. With e-commerce gaining pace as a preferred mode of shopping, especially during and after the COVID-19 pandemic, customer reviews serve as important information for businesses as well as for other potential buyers. With an analysis like this, a business can see where there is room for improvement, which of its products are doing well and what impression customers have of them.