EXPLORATION OF CUSTOMER BEHAVIOUR ON E-COMMERCE PLATFORM

Domain: eCommerce

  • Valli a/p Suppramaniam (17053409)
  • Ushani Kumari (s2003535)
  • Rahmezana Yaakub (s2003839)
  • Mohd Anas Ahmad (s2001089)

EXPLANATION ON THE DATASET

The name of the dataset is Brazilian E-Commerce Public Dataset by Olist. The dataset was generated by Olist Store based on orders from 2016 to 2018 from multiple marketplaces in Brazil and it was shared on Kaggle.

It contains 9 files and 52 columns. The files are:

  1. olist_customers_dataset.csv
  2. olist_geolocation_dataset.csv
  3. olist_order_items_dataset.csv
  4. olist_order_payments_dataset.csv
  5. olist_order_reviews_dataset.csv
  6. olist_orders_dataset.csv
  7. olist_products_dataset.csv
  8. olist_sellers_dataset.csv 
  9. product_category_name_translation.csv

The key features in the dataset are order status, price, payment amount, delivery charges, and review score.

The key questions are segregated into two (2) main categories, which are:

1. Customer’s Perspective - to facilitate the product selection process by providing the following information:

  • For a selected product, to list the items by product ID, price and review score.
  • Which are the top selling product categories?

2. Portal Admin’s Perspective - to provide key insights on the performance of the portal, which include:

  • Revenue distribution by product category.
  • Revenue distribution by sales.
  • The Gross Merchandise Value (GMV) trend over time.
  • Information on key measures i.e. Gross Merchandise Value, Average Order Value and Past 30 Days Active Customers (A30 Users).

DATA CLEANSING & MERGING

The data is divided into multiple files and the following schema was used to prepare the final dataset:

Data schema figure place holder

Dataset Preparation includes the following:

  • Add new columns with column name Category_score_review to categorize customer satisfaction.
  • Merging all dataset into 1 view.
  • Empty columns were replaced with ‘na’ values.
  • Remove duplicate data from the dataset.

Treatment for Missing Values

  • The missing values are identified and removed. Removal was deemed as an appropriate treatment as the treatment using other values may distort the data.
  • Furthermore, to evaluate other type of treatments will require subject matter view, which is not available.

New variable creation

  • In order to derive more insights from the data, new variable called Gross Merchandise Value (GMV) was derived. The GMV variable is plotted against time to analyse the overall sales performance over time.

KEY FINDINGS

a) The overall GMV value is showing an increasing trend over time as shown in the below graph:

Data schema figure place holder

The above chart also allows for further exploration on average order value and active users over the past 30 days.

b) The top 10 products are:

Data schema figure place holder

c) The states that are doing well are:

Data schema figure place holder

INSIGHTS

How the above insights can be utilised to be actionable items:

  1. Obviously, the usage and GMV has increased tremendously over time. As such, the portal owner could identify the merchants contributing towards the GMV and reward them by introducing incentive schemes so that they will continue to sell in the same platform.
  2. For sales record with bad review comments, to identify the major factor and improve the particular issue so that overall customer satisfaction can be improved further.
  3. Introduce discount schemes to further boost the sales of top selling products.

Find the app here: https://ushani.shinyapps.io/eCommerce_Performance
Find the codes here: https://github.com/UshaniE/Analysis-of-eCommerce

References:
https://www.kaggle.com/olistbr/brazilian-ecommerce