Nowadays, C2C model in the business world is getting more and more popular. By providing a platform, suppliers and the customers make transactions easily. However, there are many unknown factors in the E-commerce world. Using its data will be a great way to help the E-commerce store grow.
In this study, I’ll use the data from a French E-commerce website and a map data to answer the following two questions:
Are the users active enough in that website?
How do those active users distribute in the world?
For more questions, please visit thses links:
https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store
https://data.world/sgpohlj87/countrylatlong/workspace/file?filename=country_lat_long.xlsx
pacman::p_load(tidyverse)
pacman::p_load(readr)
pacman::p_load(maps)
pacman::p_load(sf)
pacman::p_load(downloader)
dat <- read_csv("6M-0K-99K.users.dataset.public.csv")
map_dat <- readxl::read_excel("country_lat_long.xlsx")
dat1 <- dat %>%
mutate(Activation = case_when(
daysSinceLastLogin <= 30 ~ "30 days active users",
daysSinceLastLogin %in% c(30:90) ~ "90 days active users",
daysSinceLastLogin >= 90 ~ "Not active users"
))
dat1 %>%
ggplot(aes(x = daysSinceLastLogin, fill = Activation)) +
geom_bar() +
geom_vline(xintercept = 30,
size = .5,
linetype = "dashed",
color = "gray") +
annotate(x = 30,
y = -2,
label = "30 days",
geom = "text",
color = "gray40",
vjust = 2) +
geom_vline(xintercept = 90,
size = .5,
linetype = "dashed",
color = "gray") +
annotate(x = 90,
y = -2,
label = "90 days",
geom = "text",
color = "gray40",
vjust = 2) +
scale_fill_manual(values = c("blue", "gray40", "red")) +
theme_bw() +
theme(panel.background = element_blank(),
panel.grid = element_blank()) +
labs(title = "Number of Active Users Distribution", x = "Days Since Last Login", y = "Number of Users")
# Create percentages for each group
dat2 <- dat1 %>%
mutate(total = n()) %>%
group_by(Activation) %>%
summarise(Activation, count = n(), total, percentage = count/total*100) %>%
unique() %>%
mutate(percentage = paste0(round(percentage, 2), "%"))
knitr::kable(dat2, escape=TRUE, digits=4)
Activation | count | total | percentage |
---|---|---|---|
30 days active users | 3999 | 98913 | 4.04% |
90 days active users | 2987 | 98913 | 3.02% |
Not active users | 91927 | 98913 | 92.94% |
I seperated the users into three categories: 30 days active users, 90 days active users, and not active users. From the graph above, we can see that there were only a few of users staying active within 30 days and 90 days. Most of the users were not active. The table can help us take a close look at the actual numbers. There were only 4.04% of the users using the app at least once, and about 3.02% of the users using the appat least once. However, 92.94% of the users logined their account 90 days or even later. It indicates that the users were not active with the app. Getting users’ attentions and helping them stay active will be a good focus.
dat3 <- dat1 %>%
filter(daysSinceLastLogin <= 90) %>%
select(Activation, gender, productsBought) %>%
group_by(Activation, gender) %>%
summarise(total = sum(productsBought))
ggplot(data = dat3, aes(x = gender, y = total, fill = Activation, label = total)) +
geom_bar(stat = "identity") +
geom_text(aes(label = total), position = position_stack(vjust = 0.7)) +
theme_bw() +
theme(panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
labs(title = "The Number of Products Bought by Active Users", x = "Gender", y = "Total Number of Products Bought")
From the graph above, we can see that along those active users, the number of products bought for female users were about three times of male users’. So products for female users will be high demanded in this platform. Also, we can see that the amount of products 30 days active users bought were about four times of 90 days actives users’. From this, we can say that keeping users to stay active in the platform will increase the amount of transactions, which will be a great focus for the company to the user inferface more friendly to female users to achieve this goal.
For this question, I’ll consider 30 days and 90 days active users as active users.
# Wrangle data
# Create a function to get the total number of users for each group
get_total_number <- function(df, country) {
summary <- df %>%
group_by(country) %>%
summarise(country = country, countryCode = countryCode, total = n()) %>%
unique() %>%
ungroup()
return(summary)
}
dat4 <- dat1 %>%
filter(Activation %in% c("30 days active users", "90 days active users" )) %>%
get_total_number(., country) %>%
mutate(Code = toupper(.$countryCode))
dat5 <- dat4 %>%
left_join(map_dat, by = c("Code" = "countrycode")) %>%
rename(Country_in_Frech = country.x,
Country_in_English = country.y)
map <- map_data("world")
# Create Visualization
ggplot() +
geom_polygon(data = map, aes(x=long, y=lat, group=group)) +
geom_point(data = dat5, aes(x = longitude, y = latitude, size = total), color = "skyblue") +
theme_bw() +
scale_size_continuous(name = "Total Active Users")
# prepare data for the table
dat6 <- dat5 %>%
mutate(overall_total = sum(total),
percent = round(total/overall_total, 5)*100) %>%
mutate(percentage = paste0(percent, "%")) %>%
arrange(desc(percent)) %>%
select(Country_in_Frech, Country_in_English, total, overall_total, percentage)
knitr::kable(dat6, escape=TRUE, digits=4)
Country_in_Frech | Country_in_English | total | overall_total | percentage |
---|---|---|---|---|
France | France | 1724 | 6986 | 24.678% |
Royaume-Uni | United Kingdom | 837 | 6986 | 11.981% |
Etats-Unis | United States | 766 | 6986 | 10.965% |
Italie | Italy | 746 | 6986 | 10.678% |
Allemagne | Germany | 561 | 6986 | 8.03% |
Espagne | Spain | 354 | 6986 | 5.067% |
Danemark | Denmark | 269 | 6986 | 3.851% |
Suède | Sweden | 234 | 6986 | 3.35% |
Australie | Australia | 215 | 6986 | 3.078% |
Pays-Bas | Netherlands | 180 | 6986 | 2.577% |
Belgique | Belgium | 135 | 6986 | 1.932% |
Finlande | Finland | 99 | 6986 | 1.417% |
Canada | Canada | 85 | 6986 | 1.217% |
Suisse | Switzerland | 85 | 6986 | 1.217% |
Hong Kong | Hong Kong | 70 | 6986 | 1.002% |
Autriche | Austria | 67 | 6986 | 0.959% |
Roumanie | Romania | 50 | 6986 | 0.716% |
Grèce | Greece | 42 | 6986 | 0.601% |
Irlande | Ireland | 41 | 6986 | 0.587% |
Portugal | Portugal | 41 | 6986 | 0.587% |
Pologne | Poland | 35 | 6986 | 0.501% |
Singapour | Singapore | 25 | 6986 | 0.358% |
Russie | Russia | 21 | 6986 | 0.301% |
Luxembourg | Luxembourg | 18 | 6986 | 0.258% |
Norvège | Norway | 18 | 6986 | 0.258% |
Chine | China | 17 | 6986 | 0.243% |
Chypre | Cyprus | 15 | 6986 | 0.215% |
Bulgarie | Bulgaria | 14 | 6986 | 0.2% |
Japon | Japan | 13 | 6986 | 0.186% |
Croatie | Croatia | 12 | 6986 | 0.172% |
Taiwan | Taiwan | 10 | 6986 | 0.143% |
Afrique du Sud | South Africa | 8 | 6986 | 0.115% |
Malaisie | Malaysia | 8 | 6986 | 0.115% |
République tchèque | Czech Republic | 8 | 6986 | 0.115% |
Corée du Sud | South Korea | 7 | 6986 | 0.1% |
Hongrie | Hungary | 7 | 6986 | 0.1% |
Lettonie | Latvia | 7 | 6986 | 0.1% |
Maroc | Morocco | 7 | 6986 | 0.1% |
Slovaquie | Slovakia | 7 | 6986 | 0.1% |
Émirats arabes unis | United Arab Emirates | 6 | 6986 | 0.086% |
Tunisie | Tunisia | 6 | 6986 | 0.086% |
Ukraine | Ukraine | 6 | 6986 | 0.086% |
Brésil | Brazil | 5 | 6986 | 0.072% |
Israel | Israel | 5 | 6986 | 0.072% |
Nouvelle Zélande | New Zealand | 5 | 6986 | 0.072% |
Turquie | Turkey | 5 | 6986 | 0.072% |
Arabie Saoudite | Saudi Arabia | 4 | 6986 | 0.057% |
Islande | Iceland | 4 | 6986 | 0.057% |
Malte | Malta | 4 | 6986 | 0.057% |
Monaco | Monaco | 4 | 6986 | 0.057% |
Arménie | Armenia | 3 | 6986 | 0.043% |
Bahreïn | Bahrain | 3 | 6986 | 0.043% |
Cambodge | Cambodia | 3 | 6986 | 0.043% |
Egypte | Egypt | 3 | 6986 | 0.043% |
Estonie | Estonia | 3 | 6986 | 0.043% |
Mexique | Mexico | 3 | 6986 | 0.043% |
Puerto Rico | Puerto Rico | 3 | 6986 | 0.043% |
Serbie | Serbia | 3 | 6986 | 0.043% |
Algérie | Algeria | 2 | 6986 | 0.029% |
Argentine | Argentina | 2 | 6986 | 0.029% |
Colombie | Colombia | 2 | 6986 | 0.029% |
Géorgie | Georgia | 2 | 6986 | 0.029% |
Guadeloupe | Guadeloupe | 2 | 6986 | 0.029% |
Guyane | Guyana | 2 | 6986 | 0.029% |
Inde | India | 2 | 6986 | 0.029% |
Indonésie | Indonesia | 2 | 6986 | 0.029% |
Jordanie | Jordan | 2 | 6986 | 0.029% |
Kazakhstan | Kazakhstan | 2 | 6986 | 0.029% |
Lituanie | Lithuania | 2 | 6986 | 0.029% |
Madagascar | Madagascar | 2 | 6986 | 0.029% |
Martinique | Martinique | 2 | 6986 | 0.029% |
Philippines | Philippines | 2 | 6986 | 0.029% |
Réunion | Réunion | 2 | 6986 | 0.029% |
Slovénie | Slovenia | 2 | 6986 | 0.029% |
Thailande | Thailand | 2 | 6986 | 0.029% |
Andorre | Andorra | 1 | 6986 | 0.014% |
Anguilla | Anguilla | 1 | 6986 | 0.014% |
Azerbaïdjan | Azerbaijan | 1 | 6986 | 0.014% |
Biélorussie | Belarus | 1 | 6986 | 0.014% |
Brunei Darussalam | Brunei | 1 | 6986 | 0.014% |
Cameroun | Cameroon | 1 | 6986 | 0.014% |
Costa Rica | Costa Rica | 1 | 6986 | 0.014% |
Gabon | Gabon | 1 | 6986 | 0.014% |
Kenya | Kenya | 1 | 6986 | 0.014% |
Kiribati | Kiribati | 1 | 6986 | 0.014% |
Kowait | Kuwait | 1 | 6986 | 0.014% |
Liban | Lebanon | 1 | 6986 | 0.014% |
Macau | Macau | 1 | 6986 | 0.014% |
Panama | Panama | 1 | 6986 | 0.014% |
Paraguay | Paraguay | 1 | 6986 | 0.014% |
Viet Nam | Vietnam | 1 | 6986 | 0.014% |
From the graph above, we can see that most of the active users for this service are from Europe. There are some active users in other continents, but they didn’t have a lot compared to the European continent. It indicates that there is a hugh potential to grow the number of active users in the Asian, American, and Australian continent.
There are not many active users in the website or the app. There are rooms to improve in keeping the users active. Also, most of the users are female. Designing the user interface more friendly toward female will be a good focus to help the users stay active. Besides that, there are rooms to improve in advertising this website in other continents since there are not many active users in other continents. So, advertising this website in other continents will be a good focus as well.