In general, there are three things that buyers consider in buying goods, namely in terms of the seller, the specifications of the goods, and other buyer reviews of the goods.
Considerations for product specifications include product categories, product prices, product brands, product variations, price discounts, number of likes, comments, and views, product ratings, product stock, number of products sold, availability on flash sale, lowest price guarantee feature, deal features bundles, wholesale features, and free shipping features.
Considerations for buyer reviews include product quality, product accommodations, packaging conditions, delivery speed, buyers’ desire to order again, product defects, whether the product was delivered or not, and how similar the product is to the picture. These buyer reviews will affect the product rating.
Store considerations include the number of items, number of followers, response time, response rate, store location, number of buyers who rate the store, store ratings, and store labels (verified shop/official shop).
One of the focuses of e-commerce is ensuring the availability of products to meet customer needs. For this reason, e-commerce does not only pay attention to customer loyalty but also pays attention to the loyalty of shops/sellers by increasing buyer confidence in the store and increasing the number of transactions.
One of the programs used to increase the number of product transactions in stores is flash sales. Usually, the target of the flash sale program is a store that does not yet have a label or a store that has just joined e-commerce. The products offered in the flash sale program should also be products that are in high demand by customers so that the chances of buying them are greater.
The author will analyze three things that are considered by buyers in e-commerce transactions. This includes making machine learning regarding store classification for shopee verified and official shops. In addition, the program offers such as flash sales can be used to collaborate with stores in the program according to the items available at the store and items that are in high demand. The results of the analysis will be displayed on the dashboard which will be used as a consideration in compiling a program or determining the right target store to be included in a program run by e-commerce.
The target user of this project is e-commerce with the following benefits:
In addition to e-commerce, this project can also be implemented in the FMCG industry or other industries that have stores or branch offices. This project can also be used by shops joining e-commerce and buyers in choosing products.
The output of this project is a dashboard that displays:
The growing popularity of Korean inspired fashion has taken over e-retailing in the Philippines. Entrepreneurs are catching up to this growing trend by selling their Korean fashion inspired products on different online platforms.
The data is secondary data taken by the author from one of the contributors to Kaggle, namely Jaye Cabrera. The dataset contains:
library(dplyr)
library(lubridate)
library(ggplot2)product <- read.csv("data_input/2021June-July_product_data.csv", stringsAsFactors = T)
head(product)Data Glossary
glimpse(product)#> Rows: 1,583
#> Columns: 33
#> $ pk_product <dbl> 202106137919342720, 202106137140015488,~
#> $ date_collected <fct> 2021-06-13, 2021-06-13, 2021-06-13, 202~
#> $ product_itemid <dbl> 7919342724, 7140015494, 4843903203, 114~
#> $ product_shopid <int> 1083822, 130651949, 55748694, 396605392~
#> $ product_category <fct> Short Sleeves, Short Sleeves, Short Sle~
#> $ product_name <fct> "NKD Korean Fashion Erich Basic Daily P~
#> $ product_price <dbl> 300, NA, 200, 391, 300, 150, 150, 603, ~
#> $ product_price_min <int> 135, 149, 75, 294, 55, 63, 58, 423, 58,~
#> $ product_price_max <int> 138, 149, 75, 294, 55, 63, 58, 423, 64,~
#> $ product_discount <int> 55, 0, 63, 25, 82, 58, 61, 30, 61, 51, ~
#> $ product_brand <fct> No Brand, No Brand, No Brand, No Brand,~
#> $ product_like_count <int> 4504, 293, 8184, 13, 5999, 518, 24, 4, ~
#> $ product_comment_count <int> 3696, 158, 15912, 0, 11825, 1544, 83, 0~
#> $ product_views <int> 12815, 2189, 19575, 317, 27205, 1029, 1~
#> $ prod_rate_star_0 <int> 3447, 148, 13890, 0, 10138, 1422, 80, 0~
#> $ prod_rate_star_1 <int> 146, 3, 984, 0, 778, 59, 2, 0, 1293, 49~
#> $ prod_rate_star_2 <int> 70, 3, 606, 0, 459, 34, 0, 0, 726, 26, ~
#> $ prod_rate_star_3 <int> 9, 2, 178, 0, 200, 12, 0, 0, 185, 6, 0,~
#> $ prod_rate_star_4 <int> 24, 0, 251, 0, 249, 17, 0, 0, 309, 9, 0~
#> $ prod_rate_star_5 <int> 3696, 156, 15909, 0, 11824, 1544, 82, 0~
#> $ product_total_rating <dbl> 4.89, 4.90, 4.77, 0.00, 4.72, 4.85, 4.9~
#> $ stock <int> 134411, 84, 7721, 10799, 3358, 49297, 8~
#> $ units_sold <int> 10154, 336, 68171, 0, 35672, 4333, 189,~
#> $ status <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ shop_location <fct> "Taytay, Rizal", "San Nicolas, Metro Ma~
#> $ shop_is_on_flash_sale <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ shop_is_preferred_plus_seller <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ feature_lowest_price_guarantee <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, ~
#> $ feature_can_use_bundle_deal <int> 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, ~
#> $ feature_can_use_cod <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ feature_can_use_wholesale <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ feature_show_free_shipping <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ product_variation_count <int> 20, 9, 14, 3, 10, 7, 9, 3, 14, 9, 12, 3~
We need to adjust data type of several variable: - pk_product, product_itemid, product_shopid, product_name as character - product_price as integer
product <- product %>%
mutate_at(c("pk_product", "product_itemid", "product_shopid", "product_name"), as.character) %>%
mutate_at("product_price", as.integer)colSums(is.na(product))#> pk_product date_collected
#> 0 0
#> product_itemid product_shopid
#> 0 0
#> product_category product_name
#> 0 0
#> product_price product_price_min
#> 198 0
#> product_price_max product_discount
#> 0 0
#> product_brand product_like_count
#> 0 0
#> product_comment_count product_views
#> 0 0
#> prod_rate_star_0 prod_rate_star_1
#> 0 0
#> prod_rate_star_2 prod_rate_star_3
#> 0 0
#> prod_rate_star_4 prod_rate_star_5
#> 0 0
#> product_total_rating stock
#> 0 0
#> units_sold status
#> 0 0
#> shop_location shop_is_on_flash_sale
#> 0 0
#> shop_is_preferred_plus_seller feature_lowest_price_guarantee
#> 0 0
#> feature_can_use_bundle_deal feature_can_use_cod
#> 0 0
#> feature_can_use_wholesale feature_show_free_shipping
#> 0 0
#> product_variation_count
#> 0
Several obervations in product_price variables have NA value.
product %>%
filter(is.na(product_price))Products that have NA value are products with no discount. So, we can fill NA value with product price minimum.
product$product_price[is.na(product$product_price)] = product$product_price_min[is.na(product$product_price)]
anyNA(product)#> [1] FALSE
product <- product %>%
select(-c(pk_product, date_collected, status))
head(product)shop <- read.csv("data_input/2021June-July_shop_data.csv", stringsAsFactors = T)
head(shop)Data Glossary
glimpse(shop)#> Rows: 746
#> Columns: 18
#> $ pk_shop <dbl> 20210613254711736, 20210613314198144, 2021061310377~
#> $ date_collected <fct> 2021-06-13, 2021-06-13, 2021-06-13, 2021-06-13, 202~
#> $ shopid <int> 254711738, 314198143, 10377223, 21407329, 11187885,~
#> $ name <fct> "hello.may.ph", "love.women.shop", "R&O RedOrange",~
#> $ join_month <fct> April, September, September, April, October, April,~
#> $ join_day <int> 27, 25, 27, 30, 18, 26, 17, 12, 22, 30, 14, 26, 17,~
#> $ join_year <int> 2020, 2020, 2016, 2017, 2016, 2021, 2020, 2019, 202~
#> $ item_count <int> 446, 263, 209, 165, 467, 115, 219, 1444, 286, 509, ~
#> $ follower_count <int> 17487, 8574, 660417, 18787, 67371, 41, 30480, 72704~
#> $ response_time <fct> 10:13:55, 10:43:05, 08:32:07, 09:10:38, 11:40:~
#> $ response_rate <int> 96, 73, 90, 94, 89, 29, 92, 95, 76, 85, 77, 10, 61,~
#> $ shop_location <fct> "", "San Nicolas,Metro Manila", "Pasay City,Metro M~
#> $ rating_bad <int> 181, 78, 4647, 44, 680, 0, 87, 354, 679, 4524, 21, ~
#> $ rating_good <int> 17510, 2656, 1071829, 18959, 39252, 2, 16460, 22350~
#> $ rating_normal <int> 729, 198, 18073, 215, 2383, 0, 408, 962, 2027, 1007~
#> $ rating_star <dbl> 4.81, 4.68, 4.91, 4.94, 4.73, 5.00, 4.87, 4.80, 4.7~
#> $ is_shopee_verified <int> 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, ~
#> $ is_official_shop <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
We need to adjust data type of several variables: - pk_shop, shopid, name as character - response_time as time
shop <- shop %>%
mutate_at(c("pk_shop", "shopid", "name"), as.character) %>%
mutate(response_time = hms(response_time))colSums(is.na(shop))#> pk_shop date_collected shopid name
#> 0 0 0 0
#> join_month join_day join_year item_count
#> 0 0 0 0
#> follower_count response_time response_rate shop_location
#> 0 0 0 0
#> rating_bad rating_good rating_normal rating_star
#> 0 0 0 6
#> is_shopee_verified is_official_shop
#> 0 0
There ara several observations that has NA value in rating_star variable.
shop %>%
filter(is.na(rating_star))These data have NA value because the shop has no transaction yet. So, we can fill the NA value with 0.
shop$rating_star[is.na(shop$rating_star)] = 0
anyNA(shop)#> [1] FALSE
shop <- shop %>%
select(-c(pk_shop, date_collected))
head(product)Target variables:
Predictor variables:
product %>%
group_by(product_category) %>%
summarise(total_product = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = total_product, y = reorder(product_category, total_product))) +
geom_col(aes(fill = total_product), color = "darkred") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Product",) +
geom_text(aes(label = total_product), nudge_x = -12) +
labs( title = "Number of Products with a Rating of 4.5 and Above in Each Category",
x = "Number of Product",
y = NULL
) Based on the chart above, most of the products have a rating of 4.5 and above. The crop top category has the most products with a rating of 4.5 and above.
The following are the most in-demand product categories with a rating of 4.5 and above at each store location.
product %>%
group_by(shop_location, product_category) %>%
summarise(total_product = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = total_product, y = reorder(shop_location, total_product)))+
geom_col(aes(fill = product_category), position = "fill")+
labs(x = "Category",
y = NULL,
fill = NULL)+
scale_fill_brewer(palette = "Set2")+
theme_minimal()+
theme(legend.position = "top") From the graph, it can be seen that each location has different product demands.
product %>%
group_by(product_category) %>%
summarise(likes = round(mean(product_like_count))) %>%
ggplot(aes(x = product_category, y = likes))+
geom_col(aes(fill = likes), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Likes Count",) +
geom_text(aes(label = likes), nudge_y = -80) +
labs( title = "Most in-demand Product Categories",
subtitle = "Based on Likes Count",
x = NULL,
y = "Likes Count"
)product %>%
group_by(product_category) %>%
summarise(views = round(mean(product_views))) %>%
ggplot(aes(x = product_category, y = views))+
geom_col(aes(fill = views), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Views",) +
geom_text(aes(label = views), nudge_y = -1000) +
labs( title = "Most in-demand Product Categories",
subtitle = "Based on Views",
x = NULL,
y = "Views"
) The crop top is the most popular product category both in terms of the number of likes and the number of views.
product %>%
mutate(feature_lowest_price_guarantee = as.factor(feature_lowest_price_guarantee)) %>%
group_by(feature_lowest_price_guarantee) %>%
summarise(products = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = feature_lowest_price_guarantee, y = products))+
geom_col(aes(fill = products), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = products), nudge_y = -80) +
labs( title = "Availability Feature Lowest Price Guarantee",
subtitle = "on Products with Rating 4.5 and Above",
x = NULL,
y = "Number of Products"
)product %>%
mutate(feature_can_use_bundle_deal = as.factor(feature_can_use_bundle_deal)) %>%
group_by(feature_can_use_bundle_deal) %>%
summarise(products = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = feature_can_use_bundle_deal, y = products))+
geom_col(aes(fill = products), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = products), nudge_y = -80) +
labs( title = "Availability Feature Use Bundle Deal",
subtitle = "on Products with Rating 4.5 and Above",
x = NULL,
y = "Number of Products"
)product %>%
mutate(feature_can_use_cod = as.factor(feature_can_use_cod)) %>%
group_by(feature_can_use_cod) %>%
summarise(products = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = feature_can_use_cod, y = products))+
geom_col(aes(fill = products), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = products), nudge_y = -80) +
labs( title = "Availability Feature Use Cash on Delivery",
subtitle = "on Products with Rating 4.5 and Above",
x = NULL,
y = "Number of Products"
)product %>%
mutate(feature_can_use_wholesale = as.factor(feature_can_use_wholesale)) %>%
group_by(feature_can_use_wholesale) %>%
summarise(products = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = feature_can_use_wholesale, y = products))+
geom_col(aes(fill = products), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = products), nudge_y = -80) +
labs( title = "Availability Feature Use Wholesale",
subtitle = "on Products with Rating 4.5 and Above",
x = NULL,
y = "Number of Products"
)product %>%
mutate(feature_show_free_shipping = as.factor(feature_show_free_shipping)) %>%
group_by(feature_show_free_shipping) %>%
summarise(products = sum(table(product_total_rating[product_total_rating >= 4.5]))) %>%
ggplot(aes(x = feature_show_free_shipping, y = products))+
geom_col(aes(fill = products), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = products), nudge_y = -80) +
labs( title = "Availability Feature Show Free Shipping",
subtitle = "on Products with Rating 4.5 and Above",
x = NULL,
y = "Number of Products"
) Based on 5 graphs of feature availability on products that have a rating of 4.5 and above, it can be concluded that feature availability does not affect product ratings.
Here are the shops that have not yet been rated or have a 0 rating.
shop %>%
filter(rating_star == 0)These shops can be used as recommendations for collaboration in several programs to increase shop ratings, for example participating in flash sales programs.
shop %>%
mutate(is_shopee_verified = as.factor(is_shopee_verified)) %>%
group_by(is_shopee_verified) %>%
summarise(shops = sum(table(shopid))) %>%
ggplot(aes(x = is_shopee_verified, y = shops))+
geom_col(aes(fill = shops), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = shops), nudge_y = -12) +
labs( title = "Shopee Verified",
x = NULL,
y = "Number of Shops"
) Most of the shops are not shopee verified.
shop %>%
mutate(is_official_shop = as.factor(is_official_shop)) %>%
group_by(is_official_shop) %>%
summarise(shops = sum(table(shopid))) %>%
ggplot(aes(x = is_official_shop, y = shops))+
geom_col(aes(fill = shops), position = "dodge") +
scale_fill_gradient(low = "white", high = "salmon", name = "Number of Products") +
geom_text(aes(label = shops), nudge_y = -12) +
labs( title = "Official Shop",
x = NULL,
y = "Number of Shops"
) Almost all shops are not official shops.
Shops that have been verified by Shopee and official stores can increase buyer confidence in transactions and increase the chances of transactions with buyers. Buyers who have transacted will rate the store and product they bought. This is also a consideration for buyers to transact again and can attract other buyers to transact with stores that have good ratings and good products.