Dans ce tutoriel, nous allons voir comment travailler avec R studio et son Notebook Rmarkdown, plus spécifiquement pour effectuer des analyses du monde réél, avec le dataset que vous trouverez dans le site de Tableau nommé “Global Superstore Orders” ou encore dans le lien suivant https://data.world/tableauhelp/superstore-data-sets. Vous aurez dans certains cas à traiter le dataset pour avoir de bons résultats car le dataset a des données à emputer surtout si vous voulez faire du Machine Learning avec.
Bref, Ici Nous allons utiliser un haut niveau de Tableau appelé Level of Detail(LOD) ce qui nous permettra de pousser de manière fine nos analyses et effectuer des visualisations très complexes grâce à des aggregations. Rappelons que Tableau nous aide facilement à créer des graphes grâce à des Drag_&Drop, mais dans le monde réel, parfois les Drag&_Drop ne suffisent pas. C’est la raison pour laquelle nous allons comparer la visualisation entre R et Tableau avec quelque fois l’équivalent SQL, afin que si vous avez déjà un usage de R que vous puissiez comprendre comment fonctionne Tableau et vice-versa en performant facilement le “LOD” que je trouve super passionnant par sa complexité et surtout par sa précision.(Tout est une question de compréhension certains le trouverons très facile après 😉!!!!).
Je vous conseil d’avoir Tableau et R dans votre pc pour pouvoir en même temps effectuer des testes, c’est super formateur et on se fait très vite une idée sur nos préférences selon les problématiques, ce qui vous aidera très fortement dans vos analyses et présentations en entreprise.
# Manipulation des dataframes
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts -------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Traiter les dates
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.6.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# ggplot
library(ggplot2)
# Visualisation interactive
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggplotlyExtra)
## Warning: package 'ggplotlyExtra' was built under R version 3.6.3
# Manipulation des tables
library(data.table)
## Warning: package 'data.table' was built under R version 3.6.3
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
# Manipulation des tables avec SQLDF
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Warning: package 'RSQLite' was built under R version 3.6.3
superstore <- fread("sup_store.csv")
# Visualisation des 6 premières lignes
head(superstore)
## Row_ID Order_ID Order_Date Ship_Date Ship_Mode Customer_ID
## 1: 32298 CA-2012-124891 31/07/2012 31/07/2012 Same Day RH-19495
## 2: 40155 CA-2014-135909 14/10/2014 21/10/2014 Standard Class JW-15220
## 3: 40936 CA-2012-116638 28/01/2012 31/01/2012 Second Class JH-15985
## 4: 34577 CA-2011-102988 05/04/2011 09/04/2011 Second Class GM-14695
## 5: 36178 CA-2014-143567 03/11/2014 06/11/2014 Second Class TB-21175
## 6: 31784 CA-2011-154627 29/10/2011 31/10/2011 First Class SA-20830
## Customer_Name Segment City State Country
## 1: Rick Hansen Consumer New York City New York United States
## 2: Jane Waco Corporate Sacramento California United States
## 3: Joseph Holt Consumer Concord North Carolina United States
## 4: Greg Maxwell Corporate Alexandria Virginia United States
## 5: Thomas Boland Corporate Henderson Kentucky United States
## 6: Sue Ann Reed Consumer Chicago Illinois United States
## Postal_Code Market Region Product_ID Category Sub-Category
## 1: 10024 US East TEC-AC-10003033 Technology Accessories
## 2: 95823 US West OFF-BI-10003527 Office Supplies Binders
## 3: 28027 US South FUR-TA-10000198 Furniture Tables
## 4: 22304 US South OFF-SU-10002881 Office Supplies Supplies
## 5: 42420 US South TEC-AC-10004145 Technology Accessories
## 6: 60610 US Central TEC-PH-10001363 Technology Phones
## Product_Name
## 1: Plantronics CS510 - Over-the-Head monaural Wireless Headset System
## 2: Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind
## 3: Chromcraft Bull-Nose Wood Oval Conference Tables & Bases
## 4: Martin Yale Chadless Opener Electric Letter Opener
## 5: Logitech diNovo Edge Keyboard
## 6: Apple iPhone 5S
## Sales Quantity Discount Profit Shipping_Cost Order_Priority
## 1: 2309.650 7 0 762.1845 933.57 Critical
## 2: 5083.960 5 0,2 1906.4850 867.69 Low
## 3: 4297.644 13 0,4 -1862.3124 865.74 Critical
## 4: 4164.050 5 0 83.2810 846.54 High
## 5: 2249.910 9 0 517.4793 780.70 Critical
## 6: 2735.952 6 0,2 341.9940 752.51 High
str(superstore)
## Classes 'data.table' and 'data.frame': 9994 obs. of 24 variables:
## $ Row_ID : int 32298 40155 40936 34577 36178 31784 37311 32735 32543 36423 ...
## $ Order_ID : chr "CA-2012-124891" "CA-2014-135909" "CA-2012-116638" "CA-2011-102988" ...
## $ Order_Date : chr "31/07/2012" "14/10/2014" "28/01/2012" "05/04/2011" ...
## $ Ship_Date : chr "31/07/2012" "21/10/2014" "31/01/2012" "09/04/2011" ...
## $ Ship_Mode : chr "Same Day" "Standard Class" "Second Class" "Second Class" ...
## $ Customer_ID : chr "RH-19495" "JW-15220" "JH-15985" "GM-14695" ...
## $ Customer_Name : chr "Rick Hansen" "Jane Waco" "Joseph Holt" "Greg Maxwell" ...
## $ Segment : chr "Consumer" "Corporate" "Consumer" "Corporate" ...
## $ City : chr "New York City" "Sacramento" "Concord" "Alexandria" ...
## $ State : chr "New York" "California" "North Carolina" "Virginia" ...
## $ Country : chr "United States" "United States" "United States" "United States" ...
## $ Postal_Code : int 10024 95823 28027 22304 42420 60610 90008 79109 93727 10009 ...
## $ Market : chr "US" "US" "US" "US" ...
## $ Region : chr "East" "West" "South" "South" ...
## $ Product_ID : chr "TEC-AC-10003033" "OFF-BI-10003527" "FUR-TA-10000198" "OFF-SU-10002881" ...
## $ Category : chr "Technology" "Office Supplies" "Furniture" "Office Supplies" ...
## $ Sub-Category : chr "Accessories" "Binders" "Tables" "Supplies" ...
## $ Product_Name : chr "Plantronics CS510 - Over-the-Head monaural Wireless Headset System" "Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind" "Chromcraft Bull-Nose Wood Oval Conference Tables & Bases" "Martin Yale Chadless Opener Electric Letter Opener" ...
## $ Sales : num 2310 5084 4298 4164 2250 ...
## $ Quantity : int 7 5 13 5 9 6 8 5 12 4 ...
## $ Discount : chr "0" "0,2" "0,4" "0" ...
## $ Profit : num 762.2 1906.5 -1862.3 83.3 517.5 ...
## $ Shipping_Cost : num 934 868 866 847 781 ...
## $ Order_Priority: chr "Critical" "Low" "Critical" "High" ...
## - attr(*, ".internal.selfref")=<externalptr>
Nous voyons que “Order Date” et “Ship Date” ne sont pas au format Date Remettons les dates au bon format mdy car on a pas l’habitude avec ce type de format où l’année vient en premier
# Convertion de la date de commande
superstore$`Order_Date` <- dmy(superstore$`Order_Date`)
# Convertion de la date d'envoie
superstore$`Ship_Date` <- dmy(superstore$`Ship_Date`)
# Visualisation de la structure de notre dataframe
str(superstore)
## Classes 'data.table' and 'data.frame': 9994 obs. of 24 variables:
## $ Row_ID : int 32298 40155 40936 34577 36178 31784 37311 32735 32543 36423 ...
## $ Order_ID : chr "CA-2012-124891" "CA-2014-135909" "CA-2012-116638" "CA-2011-102988" ...
## $ Order_Date : Date, format: "2012-07-31" "2014-10-14" ...
## $ Ship_Date : Date, format: "2012-07-31" "2014-10-21" ...
## $ Ship_Mode : chr "Same Day" "Standard Class" "Second Class" "Second Class" ...
## $ Customer_ID : chr "RH-19495" "JW-15220" "JH-15985" "GM-14695" ...
## $ Customer_Name : chr "Rick Hansen" "Jane Waco" "Joseph Holt" "Greg Maxwell" ...
## $ Segment : chr "Consumer" "Corporate" "Consumer" "Corporate" ...
## $ City : chr "New York City" "Sacramento" "Concord" "Alexandria" ...
## $ State : chr "New York" "California" "North Carolina" "Virginia" ...
## $ Country : chr "United States" "United States" "United States" "United States" ...
## $ Postal_Code : int 10024 95823 28027 22304 42420 60610 90008 79109 93727 10009 ...
## $ Market : chr "US" "US" "US" "US" ...
## $ Region : chr "East" "West" "South" "South" ...
## $ Product_ID : chr "TEC-AC-10003033" "OFF-BI-10003527" "FUR-TA-10000198" "OFF-SU-10002881" ...
## $ Category : chr "Technology" "Office Supplies" "Furniture" "Office Supplies" ...
## $ Sub-Category : chr "Accessories" "Binders" "Tables" "Supplies" ...
## $ Product_Name : chr "Plantronics CS510 - Over-the-Head monaural Wireless Headset System" "Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind" "Chromcraft Bull-Nose Wood Oval Conference Tables & Bases" "Martin Yale Chadless Opener Electric Letter Opener" ...
## $ Sales : num 2310 5084 4298 4164 2250 ...
## $ Quantity : int 7 5 13 5 9 6 8 5 12 4 ...
## $ Discount : chr "0" "0,2" "0,4" "0" ...
## $ Profit : num 762.2 1906.5 -1862.3 83.3 517.5 ...
## $ Shipping_Cost : num 934 868 866 847 781 ...
## $ Order_Priority: chr "Critical" "Low" "Critical" "High" ...
## - attr(*, ".internal.selfref")=<externalptr>
superstore$Profit = as.numeric(superstore$Profit)
superstore$Sales = as.numeric(superstore$Sales)
str(superstore)
## Classes 'data.table' and 'data.frame': 9994 obs. of 24 variables:
## $ Row_ID : int 32298 40155 40936 34577 36178 31784 37311 32735 32543 36423 ...
## $ Order_ID : chr "CA-2012-124891" "CA-2014-135909" "CA-2012-116638" "CA-2011-102988" ...
## $ Order_Date : Date, format: "2012-07-31" "2014-10-14" ...
## $ Ship_Date : Date, format: "2012-07-31" "2014-10-21" ...
## $ Ship_Mode : chr "Same Day" "Standard Class" "Second Class" "Second Class" ...
## $ Customer_ID : chr "RH-19495" "JW-15220" "JH-15985" "GM-14695" ...
## $ Customer_Name : chr "Rick Hansen" "Jane Waco" "Joseph Holt" "Greg Maxwell" ...
## $ Segment : chr "Consumer" "Corporate" "Consumer" "Corporate" ...
## $ City : chr "New York City" "Sacramento" "Concord" "Alexandria" ...
## $ State : chr "New York" "California" "North Carolina" "Virginia" ...
## $ Country : chr "United States" "United States" "United States" "United States" ...
## $ Postal_Code : int 10024 95823 28027 22304 42420 60610 90008 79109 93727 10009 ...
## $ Market : chr "US" "US" "US" "US" ...
## $ Region : chr "East" "West" "South" "South" ...
## $ Product_ID : chr "TEC-AC-10003033" "OFF-BI-10003527" "FUR-TA-10000198" "OFF-SU-10002881" ...
## $ Category : chr "Technology" "Office Supplies" "Furniture" "Office Supplies" ...
## $ Sub-Category : chr "Accessories" "Binders" "Tables" "Supplies" ...
## $ Product_Name : chr "Plantronics CS510 - Over-the-Head monaural Wireless Headset System" "Fellowes PB500 Electric Punch Plastic Comb Binding Machine with Manual Bind" "Chromcraft Bull-Nose Wood Oval Conference Tables & Bases" "Martin Yale Chadless Opener Electric Letter Opener" ...
## $ Sales : num 2310 5084 4298 4164 2250 ...
## $ Quantity : int 7 5 13 5 9 6 8 5 12 4 ...
## $ Discount : chr "0" "0,2" "0,4" "0" ...
## $ Profit : num 762.2 1906.5 -1862.3 83.3 517.5 ...
## $ Shipping_Cost : num 934 868 866 847 781 ...
## $ Order_Priority: chr "Critical" "Low" "Critical" "High" ...
## - attr(*, ".internal.selfref")=<externalptr>
# set `plotly = TRUE`
requete_1 <- superstore %>% group_by(Order_Date) %>%
summarise(Profit_journalier = round(sum(Profit),1)) %>%
mutate(Year = lubridate::year(Order_Date), Month = lubridate::month(Order_Date,label=TRUE,abbr=TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
requete_1 <- requete_1 %>% group_by(Year, Month) %>%
summarise(total_ventes_mensuel = sum(Profit_journalier)) %>%
group_by(total_ventes_mensuel ,Year, Month)
## `summarise()` regrouping output by 'Year' (override with `.groups` argument)
# Graphe
requete_1_graphe <- requete_1 %>% ggplot(aes(Month, y = total_ventes_mensuel)) + geom_bar(stat = 'identity', aes(color = factor(Year))) +
facet_wrap(~ Year, ncol = 4) +
ggtitle("Profit Mensuel Pour Chaque Année")+
scale_colour_discrete(name = "Year") +
theme(axis.text.x = element_text(angle = 90), strip.background = element_rect(fill="orange"), plot.title = element_text(size = 15, hjust = 0.5, color = "red", face ="bold.italic"))
requete_1_graphe %>% ggplotly(session = "knitr")
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Solution de cette requête sur tableau software
supposons: si le profit est supérieur à 200 dollars, alors la journée est très rentable si le profit est inférieur à 0 dollars, alors la journée n’est pas rentable sinon est rentable entre 0 et 200 dollars
requete_2 <- superstore %>% group_by(Order_Date) %>%
summarise(Profit_journalier = sum(Profit)) %>%
mutate(Rentabilite = ifelse(Profit_journalier > 200, "Très Rentable",
ifelse(Profit_journalier < 0, "Non Rentable" , "Rentable")))
## `summarise()` ungrouping output (override with `.groups` argument)
requete_2 <- requete_2 %>% mutate(Year = lubridate::year(Order_Date), Month = lubridate::month(Order_Date,label=TRUE,abbr=TRUE)) %>% group_by(Rentabilite, Year, Month) %>% summarise(Count = n())
## `summarise()` regrouping output by 'Rentabilite', 'Year' (override with `.groups` argument)
# Graphe
requete_2_graphe <- requete_2 %>% ggplot(aes(Month, Count)) +
facet_grid(Rentabilite ~ Year) +
geom_bar(stat = "identity", aes(fill = Rentabilite)) +
ggtitle("Nombre de Jour Rentable dans le Mois") +
theme(axis.text.x = element_text(angle = 60, color = "red"), axis.text.y = element_text(color = "red"), plot.title = element_text(size = 15, hjust = 0.5, color = "red", face ="bold.italic"))
requete_2_graphe
Solution de cette requête sur tableau software
R
# Clients ayant commandé en 2014
client_en_2014 <- superstore %>%
filter(lubridate::year(Order_Date) == 2014)
# Clients 2013 ayant commandé en 2014
client_en_2013_dans_2014 <- superstore %>%
filter(lubridate::year(Order_Date) == 2013) %>%
filter(`Customer_ID` %in% unique(client_en_2014$`Customer_ID`))
# Clients ayant commandé en 2013 et pas en 2014
client_2013_et_pas_en_2014 <- superstore %>% filter(lubridate::year(`Order_Date`) == 2013) %>%
filter(!(`Customer_ID` %in% unique(client_en_2014$`Customer_ID`)))
# calcul du pourcentage
requete_3 <- length(unique(client_en_2013_dans_2014$`Customer_ID`)) / (length(unique(client_en_2013_dans_2014$`Customer_ID`)) + length(unique(client_2013_et_pas_en_2014$`Customer_ID`))) * 100
# Print
#message(paste0(round(requete_3, 2), "% des clients ont commandé en 2013 et également en 2014"))
message(paste0("Le pourcentage de client ayant commandé en 2013 et également en 2014 est de "), round(requete_3, 2), "%")
## Le pourcentage de client ayant commandé en 2013 et également en 2014 est de 87.44%
Solution de cette requête sur tableau software
SQL
req_3_sql <- sqldf("select count(distinct(Order_ID)) as nb_diff_order from superstore group by Customer_ID having nb_diff_order > 10") %>% nrow()
req_3_sql
## [1] 49
Résultat:On obtient bien 49 lignes.
R
req_4_R <- superstore %>% distinct(Order_ID,Customer_Name) %>%
group_by(Customer_Name) %>%
summarise(nb_diff_order = n()) %>%
filter(nb_diff_order > 10)
## `summarise()` ungrouping output (override with `.groups` argument)
client_ayant_plusde_10_commandes <- nrow(req_4_R)
client_ayant_plusde_10_commandes
## [1] 49
Résultat:On obtient également 49 lignes.
Solution de cette requête sur tableau software https://public.tableau.com/views/datasetcomplet1/Plusde10commandes?:language=fr&:display_count=y&publish=yes&:origin=viz_share_link
Sachant que une seule commande peut inclure plusieurs Catégories de produits. Chaque article est sur et sera sur une ligne de donnée différentes mais partage un même ID de commande.
# Récupérons notre Catégorie "Technology"
technologie = superstore %>% filter(Category =="Technology")
# Récupérons l'ID unique
Liste_des_IDS_technology = unique(technologie$Order_ID)
# récupération eds commandes qui possèdent comme catégorie Technologie
commandes_contenant_technologie = superstore %>% filter(Order_ID %in% Liste_des_IDS_technology)
# Requête
requete_42_R = sum(commandes_contenant_technologie$Sales)/(length(Liste_des_IDS_technology))
message(paste0("La valeur moyenne des ventes pour les commandes qui incluent la technologie comme Catégorie est : ", round(requete_42_R, 2)))
## La valeur moyenne des ventes pour les commandes qui incluent la technologie comme Catégorie est : 822
Solution de cette requête sur tableau software https://public.tableau.com/views/ValeurMoyennedesVentespourlesCommandesincluantlaTechnologieDatasetsuperstore/ValeurMoyennedesVentespourlesCommandesincluantlaTechnologie?:language=fr&:display_count=y&:origin=viz_share_link
R
requete_5 <- superstore %>% filter(lubridate::year(Order_Date) == 2013, lubridate::month(Order_Date) == 3) %>%
group_by(Order_Date) %>%
summarise(somme_profit = sum(Profit)) %>%
mutate(Profit_positif = ifelse(somme_profit > 0, 'Oui','Non')) %>%
group_by(Profit_positif) %>% summarise(count = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
requete_5
## # A tibble: 2 x 2
## Profit_positif count
## <chr> <int>
## 1 Non 8
## 2 Oui 20
# Graphe
requete_5_graphe <- requete_5 %>% ggplot(aes(Profit_positif, count, fill = Profit_positif)) +
geom_bar(stat = "identity") +
geom_text(aes(label = count, hjust = 0, vjust = 0.5)) +
ggtitle("Le nombre de jour en Mars 2013 ayant des Profits positifs et négatifs") +
xlab("Profit Positif?") +
ylab("Nombre de jour") +
scale_fill_manual(values = c("Non" = "red", "Oui" = "darkgreen")) +
theme(plot.title = element_text(size = 15, hjust = 0.5, color = "red")) +
theme_bw()
requete_5_graphe
Solution de cette requête sur tableau software
SQL
req_6_sql <- sqldf("select count(distinct(Order_ID)) as nombre_de_commande from superstore group by Customer_ID")
req_6_sql <- sqldf("select count(*), nombre_de_commande from req_6_sql group by nombre_de_commande")
req_6_sql
## count(*) nombre_de_commande
## 1 12 1
## 2 34 2
## 3 53 3
## 4 96 4
## 5 134 5
## 6 107 6
## 7 116 7
## 8 82 8
## 9 71 9
## 10 39 10
## 11 23 11
## 12 18 12
## 13 7 13
## 14 1 17
R
requete_6_R <- superstore %>% distinct(Customer_ID,Order_ID) %>%
group_by(Customer_ID) %>%
summarise(nombre_de_commande = n()) %>%
group_by(nombre_de_commande) %>%
summarise(nombre_de_client= n())
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
requete_6_R
## # A tibble: 14 x 2
## nombre_de_commande nombre_de_client
## <int> <int>
## 1 1 12
## 2 2 34
## 3 3 53
## 4 4 96
## 5 5 134
## 6 6 107
## 7 7 116
## 8 8 82
## 9 9 71
## 10 10 39
## 11 11 23
## 12 12 18
## 13 13 7
## 14 17 1
# Graphe
requete_6_graphe <- requete_6_R %>% ggplot(aes(nombre_de_commande, nombre_de_client)) +
geom_bar(stat = "identity", fill = "#0066FF") +
geom_text(aes(label = nombre_de_client, hjust = 0.5, color = "white")) +
ggtitle("Combien de Client ont fait 1, 2, 3, N Commandes?") +
xlab("Nombre de commande par client") +
ylab("Nombre de client") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 15, hjust = 0.5, color = "darkred"))
requete_6_graphe
Solution de cette requête sur tableau software
SQL
req_7_sql <- sqldf("select round((sum(Sales) * 100)/ (select sum(Sales) from superstore),2) as somme_sales,State from superstore group by State order by somme_sales desc limit 5")
req_7_sql
## somme_sales State
## 1 19.92 California
## 2 13.53 New York
## 3 7.41 Texas
## 4 6.04 Washington
## 5 5.07 Pennsylvania
R
# Total des ventes dans le dataset
Total_des_ventes <- sum(superstore$Sales)
message(paste0("Le total des ventes dans le dataset est de: "),Total_des_ventes, " dollars")
## Le total des ventes dans le dataset est de: 2297200.8603 dollars
# Pourcentage des ventes par Etat (top 5)
req_7_R <- superstore %>% select(State, Sales) %>%
group_by(State) %>%
summarise(pourcentage_sales = round(sum(Sales) * 100 / Total_des_ventes,2)) %>%
arrange(desc(pourcentage_sales)) %>%
head(5)
## `summarise()` ungrouping output (override with `.groups` argument)
req_7_R
## # A tibble: 5 x 2
## State pourcentage_sales
## <chr> <dbl>
## 1 California 19.9
## 2 New York 13.5
## 3 Texas 7.41
## 4 Washington 6.04
## 5 Pennsylvania 5.07
Solution de cette requête sur tableau software
req_8_R <- superstore %>% inner_join(superstore %>%
group_by(Customer_ID) %>%
summarise(annee_subscription = lubridate::year(min(Order_Date))), by = "Customer_ID") %>%
mutate(annee_subscription = as.factor(annee_subscription)) %>%
group_by(annee_commande = lubridate::year(Order_Date), annee_subscription) %>%
summarise(Total_Sales = round(sum(Sales), 0))
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'annee_commande' (override with `.groups` argument)
# Graphe
req_8_R <- req_8_R %>% ggplot(aes(x = annee_commande, y = Total_Sales, fill = annee_subscription )) +
geom_bar(stat = "identity",color = 'skyblue') +
geom_text(aes(label = Total_Sales), color = "white", hjust = 0.5) +
scale_fill_manual(values = c("2011" = "#0066FF", "2012" = "darkorange","2013" = "red", "2014" = "#00FF33")) +
xlab("Order Date") + ylab("Sales") + ggtitle("Revenue Annuel de chaque Cohorte de Client (Analyse Cohorte)") +
theme(plot.title = element_text(size = 15, hjust = 0.5, color = "red")) +
theme_bw()
# Afficher le Graphe
req_8_R
Solution de cette requête sur tableau software
On peut voir par exemple que en 2014, Les clients qui ont rejoint en 2011 ont généré 533$ de revenue, et ceux de 2012, 131$, ..
# Les le Top 5 des vendeurs par Etat
top_5_Etat_par_vente <- superstore %>% group_by(State) %>%
summarise(somme_vente = sum(Sales)) %>%
arrange(desc(somme_vente)) %>%
head(5) %>% .$State
## `summarise()` ungrouping output (override with `.groups` argument)
# Requête
req_9_R = superstore %>% inner_join(superstore %>% group_by(Customer_ID) %>%
summarise(premiere_date_commande = min(Order_Date)), by = "Customer_ID") %>%
mutate(Existant = ifelse(Order_Date == premiere_date_commande, "Oui", "NON")) %>%
filter(Existant == "Oui") %>%
filter(State %in% top_5_Etat_par_vente) %>%
distinct(Customer_ID, .keep_all = TRUE) %>%
group_by(State,Order_Date) %>%
summarise(nouveau_clients = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'State' (override with `.groups` argument)
# Créons un nouveau DataFrame
new_dataframe = data_frame()
## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
for(etat in top_5_Etat_par_vente ){
variable = req_9_R %>% filter(State == etat)
variable$nouveau_clients = cumsum(variable$nouveau_clients)
new_dataframe = bind_rows(new_dataframe, variable)
}
# Affichons le graphe
graphe <- new_dataframe %>% ggplot(aes(Order_Date, nouveau_clients, color = State)) + geom_line() +
ggtitle("Trend: Nombre de Clients Acquis par Jour et par État pour les cinq principaux États en termes de Ventes") +
theme(plot.title = element_text(size = 10, hjust = 0.5, color = "red"))
graphe
Solution de cette requête sur tableau software