Background

Nowadays, C2C model in the business world is getting more and more popular. By providing a platform, suppliers and the customers make transactions easily. However, there are many unknown factors in the E-commerce world. Using its data will be a great way to help the E-commerce store grow.

In this study, I’ll use the data from a French E-commerce website and a map data to answer the following two questions:

  1. Are the users active enough in that website?

  2. How do those active users distribute in the world?

For more questions, please visit thses links:

https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store

https://data.world/sgpohlj87/countrylatlong/workspace/file?filename=country_lat_long.xlsx

pacman::p_load(tidyverse)
pacman::p_load(readr)
pacman::p_load(maps)
pacman::p_load(sf)
pacman::p_load(downloader)

dat <- read_csv("6M-0K-99K.users.dataset.public.csv")
map_dat <- readxl::read_excel("country_lat_long.xlsx")

Question 1

dat1 <- dat %>% 
  mutate(Activation = case_when(
    daysSinceLastLogin <= 30 ~ "30 days active users",
    daysSinceLastLogin %in% c(30:90) ~ "90 days active users",
    daysSinceLastLogin >= 90 ~ "Not active users"
  ))


dat1 %>% 
  ggplot(aes(x = daysSinceLastLogin, fill = Activation)) + 
    geom_bar() + 
    geom_vline(xintercept = 30,
               size = .5,
               linetype = "dashed",
               color = "gray") + 
    annotate(x = 30,
             y = -2,
             label = "30 days",
             geom = "text",
             color = "gray40",
             vjust = 2) + 
    geom_vline(xintercept = 90,
               size = .5,
               linetype = "dashed",
               color = "gray") + 
    annotate(x = 90,
             y = -2,
             label = "90 days",
             geom = "text",
             color = "gray40",
             vjust = 2) +
    scale_fill_manual(values = c("blue", "gray40", "red")) +
    theme_bw() + 
    theme(panel.background = element_blank(),
          panel.grid = element_blank()) +
  labs(title = "Number of Active Users Distribution", x = "Days Since Last Login", y = "Number of Users")

# Create percentages for each group
dat2 <- dat1 %>% 
  mutate(total = n()) %>% 
  group_by(Activation) %>% 
  summarise(Activation, count = n(), total, percentage = count/total*100) %>% 
  unique() %>% 
  mutate(percentage = paste0(round(percentage, 2), "%"))
 

knitr::kable(dat2, escape=TRUE, digits=4)
Activation count total percentage
30 days active users 3999 98913 4.04%
90 days active users 2987 98913 3.02%
Not active users 91927 98913 92.94%

I seperated the users into three categories: 30 days active users, 90 days active users, and not active users. From the graph above, we can see that there were only a few of users staying active within 30 days and 90 days. Most of the users were not active. The table can help us take a close look at the actual numbers. There were only 4.04% of the users using the app at least once, and about 3.02% of the users using the appat least once. However, 92.94% of the users logined their account 90 days or even later. It indicates that the users were not active with the app. Getting users’ attentions and helping them stay active will be a good focus.

dat3 <- dat1 %>% 
  filter(daysSinceLastLogin <= 90) %>% 
  select(Activation, gender, productsBought) %>% 
  group_by(Activation, gender) %>% 
  summarise(total = sum(productsBought))


ggplot(data = dat3, aes(x = gender, y = total, fill = Activation, label = total)) +
  geom_bar(stat = "identity") + 
  geom_text(aes(label = total), position = position_stack(vjust = 0.7)) +
  theme_bw() +
  theme(panel.grid = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  labs(title = "The Number of Products Bought by Active Users", x = "Gender", y = "Total Number of Products Bought")

From the graph above, we can see that along those active users, the number of products bought for female users were about three times of male users’. So products for female users will be high demanded in this platform. Also, we can see that the amount of products 30 days active users bought were about four times of 90 days actives users’. From this, we can say that keeping users to stay active in the platform will increase the amount of transactions, which will be a great focus for the company to the user inferface more friendly to female users to achieve this goal.

Question 2

For this question, I’ll consider 30 days and 90 days active users as active users.

# Wrangle data
# Create a function to get the total number of users for each group
get_total_number <- function(df, country) {
  summary <- df %>% 
    group_by(country) %>% 
    summarise(country = country, countryCode = countryCode, total = n()) %>% 
    unique() %>% 
    ungroup()
  
  return(summary)
}

dat4 <- dat1 %>% 
  filter(Activation %in% c("30 days active users", "90 days active users" )) %>% 
  get_total_number(., country) %>% 
  mutate(Code = toupper(.$countryCode))

dat5 <- dat4 %>% 
  left_join(map_dat, by = c("Code" = "countrycode")) %>% 
  rename(Country_in_Frech = country.x,
         Country_in_English = country.y)

map <- map_data("world")

# Create Visualization
ggplot() +
  geom_polygon(data = map, aes(x=long, y=lat, group=group)) +
  geom_point(data = dat5, aes(x = longitude, y = latitude, size = total), color = "skyblue") + 
  theme_bw() +
  scale_size_continuous(name = "Total Active Users")

# prepare data for the table
dat6 <- dat5 %>% 
  mutate(overall_total = sum(total),
         percent = round(total/overall_total, 5)*100) %>% 
  mutate(percentage = paste0(percent, "%")) %>%
  arrange(desc(percent)) %>% 
  select(Country_in_Frech, Country_in_English, total, overall_total, percentage) 

knitr::kable(dat6, escape=TRUE, digits=4)
Country_in_Frech Country_in_English total overall_total percentage
France France 1724 6986 24.678%
Royaume-Uni United Kingdom 837 6986 11.981%
Etats-Unis United States 766 6986 10.965%
Italie Italy 746 6986 10.678%
Allemagne Germany 561 6986 8.03%
Espagne Spain 354 6986 5.067%
Danemark Denmark 269 6986 3.851%
Suède Sweden 234 6986 3.35%
Australie Australia 215 6986 3.078%
Pays-Bas Netherlands 180 6986 2.577%
Belgique Belgium 135 6986 1.932%
Finlande Finland 99 6986 1.417%
Canada Canada 85 6986 1.217%
Suisse Switzerland 85 6986 1.217%
Hong Kong Hong Kong 70 6986 1.002%
Autriche Austria 67 6986 0.959%
Roumanie Romania 50 6986 0.716%
Grèce Greece 42 6986 0.601%
Irlande Ireland 41 6986 0.587%
Portugal Portugal 41 6986 0.587%
Pologne Poland 35 6986 0.501%
Singapour Singapore 25 6986 0.358%
Russie Russia 21 6986 0.301%
Luxembourg Luxembourg 18 6986 0.258%
Norvège Norway 18 6986 0.258%
Chine China 17 6986 0.243%
Chypre Cyprus 15 6986 0.215%
Bulgarie Bulgaria 14 6986 0.2%
Japon Japan 13 6986 0.186%
Croatie Croatia 12 6986 0.172%
Taiwan Taiwan 10 6986 0.143%
Afrique du Sud South Africa 8 6986 0.115%
Malaisie Malaysia 8 6986 0.115%
République tchèque Czech Republic 8 6986 0.115%
Corée du Sud South Korea 7 6986 0.1%
Hongrie Hungary 7 6986 0.1%
Lettonie Latvia 7 6986 0.1%
Maroc Morocco 7 6986 0.1%
Slovaquie Slovakia 7 6986 0.1%
Émirats arabes unis United Arab Emirates 6 6986 0.086%
Tunisie Tunisia 6 6986 0.086%
Ukraine Ukraine 6 6986 0.086%
Brésil Brazil 5 6986 0.072%
Israel Israel 5 6986 0.072%
Nouvelle Zélande New Zealand 5 6986 0.072%
Turquie Turkey 5 6986 0.072%
Arabie Saoudite Saudi Arabia 4 6986 0.057%
Islande Iceland 4 6986 0.057%
Malte Malta 4 6986 0.057%
Monaco Monaco 4 6986 0.057%
Arménie Armenia 3 6986 0.043%
Bahreïn Bahrain 3 6986 0.043%
Cambodge Cambodia 3 6986 0.043%
Egypte Egypt 3 6986 0.043%
Estonie Estonia 3 6986 0.043%
Mexique Mexico 3 6986 0.043%
Puerto Rico Puerto Rico 3 6986 0.043%
Serbie Serbia 3 6986 0.043%
Algérie Algeria 2 6986 0.029%
Argentine Argentina 2 6986 0.029%
Colombie Colombia 2 6986 0.029%
Géorgie Georgia 2 6986 0.029%
Guadeloupe Guadeloupe 2 6986 0.029%
Guyane Guyana 2 6986 0.029%
Inde India 2 6986 0.029%
Indonésie Indonesia 2 6986 0.029%
Jordanie Jordan 2 6986 0.029%
Kazakhstan Kazakhstan 2 6986 0.029%
Lituanie Lithuania 2 6986 0.029%
Madagascar Madagascar 2 6986 0.029%
Martinique Martinique 2 6986 0.029%
Philippines Philippines 2 6986 0.029%
Réunion Réunion 2 6986 0.029%
Slovénie Slovenia 2 6986 0.029%
Thailande Thailand 2 6986 0.029%
Andorre Andorra 1 6986 0.014%
Anguilla Anguilla 1 6986 0.014%
Azerbaïdjan Azerbaijan 1 6986 0.014%
Biélorussie Belarus 1 6986 0.014%
Brunei Darussalam Brunei 1 6986 0.014%
Cameroun Cameroon 1 6986 0.014%
Costa Rica Costa Rica 1 6986 0.014%
Gabon Gabon 1 6986 0.014%
Kenya Kenya 1 6986 0.014%
Kiribati Kiribati 1 6986 0.014%
Kowait Kuwait 1 6986 0.014%
Liban Lebanon 1 6986 0.014%
Macau Macau 1 6986 0.014%
Panama Panama 1 6986 0.014%
Paraguay Paraguay 1 6986 0.014%
Viet Nam Vietnam 1 6986 0.014%

From the graph above, we can see that most of the active users for this service are from Europe. There are some active users in other continents, but they didn’t have a lot compared to the European continent. It indicates that there is a hugh potential to grow the number of active users in the Asian, American, and Australian continent.

Conclusion

There are not many active users in the website or the app. There are rooms to improve in keeping the users active. Also, most of the users are female. Designing the user interface more friendly toward female will be a good focus to help the users stay active. Besides that, there are rooms to improve in advertising this website in other continents since there are not many active users in other continents. So, advertising this website in other continents will be a good focus as well.