Carrefour Kenya Sales Increment Strategies.

1. Research Question

Carrefour Kenya is currently undertaking a project that will inform its marketing department of the most relevant marketing strategies for achieving the highest sales (total price including tax). The project analyses the dataset provided by Carrefour and draws insights on how to achieve the highest sales.

2. Metric of Success

Identifying the set of variables that are important to, and most strongly influence, the sales (Total) variable.

3. Understanding the context.

Carrefour is an international chain of retail supermarkets. It was set up in Kenya in 2016 and has performed well over the years. This project aims to draw insights from existing and current trends to develop marketing strategies that will enable the marketing team to achieve higher sales.

4. Recording the Experimental Design

  1. Data Loading
  2. Data Cleaning and preprocessing
  3. Exploratory Data Analysis
  4. Implementation of solution.
  5. Recommendations and Conclusions.

5. Data Relevance.

The provided data is relevant for this study since it has been sourced from the Carrefour database and reflects current transactions.

Data Preview

Loading the libraries

library(modelr)
library(broom)
## 
## Attaching package: 'broom'
## The following object is masked from 'package:modelr':
## 
##     bootstrap
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(rpart)
library(ggplot2)
library(Amelia)
## Loading required package: Rcpp
## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.0, built: 2021-05-26)
## ## Copyright (C) 2005-2022 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.6     v purrr   0.3.4
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x data.table::between() masks dplyr::between()
## x broom::bootstrap()    masks modelr::bootstrap()
## x dplyr::filter()       masks stats::filter()
## x data.table::first()   masks dplyr::first()
## x dplyr::lag()          masks stats::lag()
## x data.table::last()    masks dplyr::last()
## x purrr::lift()         masks caret::lift()
## x purrr::transpose()    masks data.table::transpose()

Loading the data

carrefour <- fread('http://bit.ly/CarreFourDataset')
carrefour
##        Invoice ID Branch Customer type Gender           Product line Unit price
##    1: 750-67-8428      A        Member Female      Health and beauty      74.69
##    2: 226-31-3081      C        Normal Female Electronic accessories      15.28
##    3: 631-41-3108      A        Normal   Male     Home and lifestyle      46.33
##    4: 123-19-1176      A        Member   Male      Health and beauty      58.22
##    5: 373-73-7910      A        Normal   Male      Sports and travel      86.31
##   ---                                                                          
##  996: 233-67-5758      C        Normal   Male      Health and beauty      40.35
##  997: 303-96-2227      B        Normal Female     Home and lifestyle      97.38
##  998: 727-02-1313      A        Member   Male     Food and beverages      31.84
##  999: 347-56-2442      A        Normal   Male     Home and lifestyle      65.82
## 1000: 849-09-3807      A        Member Female    Fashion accessories      88.34
##       Quantity     Tax      Date  Time     Payment   cogs
##    1:        7 26.1415  1/5/2019 13:08     Ewallet 522.83
##    2:        5  3.8200  3/8/2019 10:29        Cash  76.40
##    3:        7 16.2155  3/3/2019 13:23 Credit card 324.31
##    4:        8 23.2880 1/27/2019 20:33     Ewallet 465.76
##    5:        7 30.2085  2/8/2019 10:37     Ewallet 604.17
##   ---                                                    
##  996:        1  2.0175 1/29/2019 13:46     Ewallet  40.35
##  997:       10 48.6900  3/2/2019 17:16     Ewallet 973.80
##  998:        1  1.5920  2/9/2019 13:22        Cash  31.84
##  999:        1  3.2910 2/22/2019 15:33        Cash  65.82
## 1000:        7 30.9190 2/18/2019 13:28        Cash 618.38
##       gross margin percentage gross income Rating     Total
##    1:                4.761905      26.1415    9.1  548.9715
##    2:                4.761905       3.8200    9.6   80.2200
##    3:                4.761905      16.2155    7.4  340.5255
##    4:                4.761905      23.2880    8.4  489.0480
##    5:                4.761905      30.2085    5.3  634.3785
##   ---                                                      
##  996:                4.761905       2.0175    6.2   42.3675
##  997:                4.761905      48.6900    4.4 1022.4900
##  998:                4.761905       1.5920    7.7   33.4320
##  999:                4.761905       3.2910    4.1   69.1110
## 1000:                4.761905      30.9190    6.6  649.2990

Checking the first 6 entries

head(carrefour, 6)
##     Invoice ID Branch Customer type Gender           Product line Unit price
## 1: 750-67-8428      A        Member Female      Health and beauty      74.69
## 2: 226-31-3081      C        Normal Female Electronic accessories      15.28
## 3: 631-41-3108      A        Normal   Male     Home and lifestyle      46.33
## 4: 123-19-1176      A        Member   Male      Health and beauty      58.22
## 5: 373-73-7910      A        Normal   Male      Sports and travel      86.31
## 6: 699-14-3026      C        Normal   Male Electronic accessories      85.39
##    Quantity     Tax      Date  Time     Payment   cogs gross margin percentage
## 1:        7 26.1415  1/5/2019 13:08     Ewallet 522.83                4.761905
## 2:        5  3.8200  3/8/2019 10:29        Cash  76.40                4.761905
## 3:        7 16.2155  3/3/2019 13:23 Credit card 324.31                4.761905
## 4:        8 23.2880 1/27/2019 20:33     Ewallet 465.76                4.761905
## 5:        7 30.2085  2/8/2019 10:37     Ewallet 604.17                4.761905
## 6:        7 29.8865 3/25/2019 18:30     Ewallet 597.73                4.761905
##    gross income Rating    Total
## 1:      26.1415    9.1 548.9715
## 2:       3.8200    9.6  80.2200
## 3:      16.2155    7.4 340.5255
## 4:      23.2880    8.4 489.0480
## 5:      30.2085    5.3 634.3785
## 6:      29.8865    4.1 627.6165

Checking the last 6 entries

tail(carrefour, 6)
##     Invoice ID Branch Customer type Gender           Product line Unit price
## 1: 652-49-6720      C        Member Female Electronic accessories      60.95
## 2: 233-67-5758      C        Normal   Male      Health and beauty      40.35
## 3: 303-96-2227      B        Normal Female     Home and lifestyle      97.38
## 4: 727-02-1313      A        Member   Male     Food and beverages      31.84
## 5: 347-56-2442      A        Normal   Male     Home and lifestyle      65.82
## 6: 849-09-3807      A        Member Female    Fashion accessories      88.34
##    Quantity     Tax      Date  Time Payment   cogs gross margin percentage
## 1:        1  3.0475 2/18/2019 11:40 Ewallet  60.95                4.761905
## 2:        1  2.0175 1/29/2019 13:46 Ewallet  40.35                4.761905
## 3:       10 48.6900  3/2/2019 17:16 Ewallet 973.80                4.761905
## 4:        1  1.5920  2/9/2019 13:22    Cash  31.84                4.761905
## 5:        1  3.2910 2/22/2019 15:33    Cash  65.82                4.761905
## 6:        7 30.9190 2/18/2019 13:28    Cash 618.38                4.761905
##    gross income Rating     Total
## 1:       3.0475    5.9   63.9975
## 2:       2.0175    6.2   42.3675
## 3:      48.6900    4.4 1022.4900
## 4:       1.5920    7.7   33.4320
## 5:       3.2910    4.1   69.1110
## 6:      30.9190    6.6  649.2990

Checking the data types

str(carrefour)
## Classes 'data.table' and 'data.frame':   1000 obs. of  16 variables:
##  $ Invoice ID             : chr  "750-67-8428" "226-31-3081" "631-41-3108" "123-19-1176" ...
##  $ Branch                 : chr  "A" "C" "A" "A" ...
##  $ Customer type          : chr  "Member" "Normal" "Normal" "Member" ...
##  $ Gender                 : chr  "Female" "Female" "Male" "Male" ...
##  $ Product line           : chr  "Health and beauty" "Electronic accessories" "Home and lifestyle" "Health and beauty" ...
##  $ Unit price             : num  74.7 15.3 46.3 58.2 86.3 ...
##  $ Quantity               : int  7 5 7 8 7 7 6 10 2 3 ...
##  $ Tax                    : num  26.14 3.82 16.22 23.29 30.21 ...
##  $ Date                   : chr  "1/5/2019" "3/8/2019" "3/3/2019" "1/27/2019" ...
##  $ Time                   : chr  "13:08" "10:29" "13:23" "20:33" ...
##  $ Payment                : chr  "Ewallet" "Cash" "Credit card" "Ewallet" ...
##  $ cogs                   : num  522.8 76.4 324.3 465.8 604.2 ...
##  $ gross margin percentage: num  4.76 4.76 4.76 4.76 4.76 ...
##  $ gross income           : num  26.14 3.82 16.22 23.29 30.21 ...
##  $ Rating                 : num  9.1 9.6 7.4 8.4 5.3 4.1 5.8 8 7.2 5.9 ...
##  $ Total                  : num  549 80.2 340.5 489 634.4 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Checking the summary statistics of the dataset

summary(carrefour)
##   Invoice ID           Branch          Customer type         Gender         
##  Length:1000        Length:1000        Length:1000        Length:1000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Product line         Unit price       Quantity          Tax         
##  Length:1000        Min.   :10.08   Min.   : 1.00   Min.   : 0.5085  
##  Class :character   1st Qu.:32.88   1st Qu.: 3.00   1st Qu.: 5.9249  
##  Mode  :character   Median :55.23   Median : 5.00   Median :12.0880  
##                     Mean   :55.67   Mean   : 5.51   Mean   :15.3794  
##                     3rd Qu.:77.94   3rd Qu.: 8.00   3rd Qu.:22.4453  
##                     Max.   :99.96   Max.   :10.00   Max.   :49.6500  
##      Date               Time             Payment               cogs       
##  Length:1000        Length:1000        Length:1000        Min.   : 10.17  
##  Class :character   Class :character   Class :character   1st Qu.:118.50  
##  Mode  :character   Mode  :character   Mode  :character   Median :241.76  
##                                                           Mean   :307.59  
##                                                           3rd Qu.:448.90  
##                                                           Max.   :993.00  
##  gross margin percentage  gross income         Rating           Total        
##  Min.   :4.762           Min.   : 0.5085   Min.   : 4.000   Min.   :  10.68  
##  1st Qu.:4.762           1st Qu.: 5.9249   1st Qu.: 5.500   1st Qu.: 124.42  
##  Median :4.762           Median :12.0880   Median : 7.000   Median : 253.85  
##  Mean   :4.762           Mean   :15.3794   Mean   : 6.973   Mean   : 322.97  
##  3rd Qu.:4.762           3rd Qu.:22.4453   3rd Qu.: 8.500   3rd Qu.: 471.35  
##  Max.   :4.762           Max.   :49.6500   Max.   :10.000   Max.   :1042.65

Checking the size/shape of the data frame

dim(carrefour)
## [1] 1000   16

Data Preprocessing.

i. Completeness

This is achieved by checking for missing values and imputing any that are found, so that correct predictions can be made.

# is.null() only tests whether the object itself is NULL; it does not look for missing values
is.null(carrefour)
## [1] FALSE
Total number of missing (NA) values in the dataset
total_null <- sum(is.na(carrefour))
total_null
## [1] 0
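A single total does not show which column has gaps. A per-column count is a small extension, and since empty strings are not `NA` in R, a blank stored as `""` would not be counted by `is.na()` at all. The sketch below uses a hypothetical three-row table in which one `NA` and one empty string are injected purely for illustration; they are not values from the Carrefour data.

```r
# Hypothetical three-row table: the NA and the empty string are injected
# purely for illustration and are NOT values from the Carrefour dataset
toy <- data.frame(
  Branch = c("A", "C", ""),
  Rating = c(9.1, NA, 7.4),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))

# Number of empty strings in each column (is.na() does not count these);
# comparing a data frame with "" gives a logical matrix, NA where the cell is NA
blank_per_column <- colSums(toy == "", na.rm = TRUE)

print(na_per_column)
print(blank_per_column)
```

The same two calls can be applied to `carrefour` to get a per-column view of both kinds of gap.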

ii. Consistency.

Consistency is achieved by removing any duplicated rows.

duplicated_rows <- carrefour[duplicated(carrefour), ]
duplicated_rows
## Empty data.table (0 rows and 16 cols): Invoice ID,Branch,Customer type,Gender,Product line,Unit price...
anyDuplicated(carrefour)
## [1] 0
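`duplicated()` on the whole table only flags rows that match in every column. Because `Invoice ID` should identify a single transaction, checking that column on its own is a stricter test. The sketch uses two invoices from the preview plus a hypothetical third row that reuses the first invoice number with a slightly different Total; that row is invented for illustration.

```r
# Two real invoices from the preview plus one hypothetical repeat of the first
# invoice with a different Total (invented for illustration)
invoices <- data.frame(
  Invoice_ID = c("750-67-8428", "226-31-3081", "750-67-8428"),
  Total      = c(548.9715, 80.2200, 549.0000),
  stringsAsFactors = FALSE
)

# Rows identical in every column: the repeat is missed because Total differs
full_row_dups <- sum(duplicated(invoices))

# Repeated invoice numbers: the repeat is caught
id_dups <- sum(duplicated(invoices$Invoice_ID))

cat("Exact duplicate rows:", full_row_dups, "\n")
cat("Repeated invoice IDs:", id_dups, "\n")
```

On the real table (before renaming) the equivalent check would be `sum(duplicated(carrefour[["Invoice ID"]]))`.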

iii. Relevance.

Relevance is achieved by ensuring that all the features provided for the analysis are relevant to the objective; in this case, all the provided features are.

iv. Accuracy.

Checking that all entries are correct.
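One way to check accuracy is to confirm that the derived columns agree with each other. The relations tested below are inferred from the printed rows rather than documented by Carrefour, so treat them as assumptions: cogs equals Unit price times Quantity, Tax is 5% of cogs, Total is cogs plus Tax, and gross income equals Tax. The sketch uses the first six rows shown in the preview; a small tolerance absorbs floating-point error.

```r
# First six rows from the preview above (numeric columns only)
rows <- data.frame(
  Unit_price   = c(74.69, 15.28, 46.33, 58.22, 86.31, 85.39),
  Quantity     = c(7, 5, 7, 8, 7, 7),
  Tax          = c(26.1415, 3.8200, 16.2155, 23.2880, 30.2085, 29.8865),
  cogs         = c(522.83, 76.40, 324.31, 465.76, 604.17, 597.73),
  gross_income = c(26.1415, 3.8200, 16.2155, 23.2880, 30.2085, 29.8865),
  Total        = c(548.9715, 80.2200, 340.5255, 489.0480, 634.3785, 627.6165)
)

tol <- 1e-6  # tolerance for floating-point comparison

# Assumed relations (inferred from the data, not documented by the source)
checks <- c(
  cogs_is_price_times_qty = all(abs(rows$cogs - rows$Unit_price * rows$Quantity) < tol),
  tax_is_5pct_of_cogs     = all(abs(rows$Tax - 0.05 * rows$cogs) < tol),
  total_is_cogs_plus_tax  = all(abs(rows$Total - (rows$cogs + rows$Tax)) < tol),
  income_equals_tax       = all(abs(rows$gross_income - rows$Tax) < tol)
)
print(checks)
```

Applied to the full table after the renaming step, any `FALSE` would point to rows worth inspecting.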

Outliers

We can visualize outliers in the dataset using boxplots.

colnames(carrefour)
##  [1] "Invoice ID"              "Branch"                 
##  [3] "Customer type"           "Gender"                 
##  [5] "Product line"            "Unit price"             
##  [7] "Quantity"                "Tax"                    
##  [9] "Date"                    "Time"                   
## [11] "Payment"                 "cogs"                   
## [13] "gross margin percentage" "gross income"           
## [15] "Rating"                  "Total"
Renaming column names
# Rename the columns whose names contain spaces
names(carrefour)[names(carrefour) == "Invoice ID"] <- "Invoice_ID"
names(carrefour)[names(carrefour) == "Customer type"] <- "Customer_type"
names(carrefour)[names(carrefour) == "Product line"] <- "Product_line"
names(carrefour)[names(carrefour) == "gross margin percentage"] <- "gross_margin_percentage"
names(carrefour)[names(carrefour) == "Unit price"] <- "Unit_price"
names(carrefour)[names(carrefour) == "gross income"] <- "gross_income"
colnames(carrefour)
##  [1] "Invoice_ID"              "Branch"                 
##  [3] "Customer_type"           "Gender"                 
##  [5] "Product_line"            "Unit_price"             
##  [7] "Quantity"                "Tax"                    
##  [9] "Date"                    "Time"                   
## [11] "Payment"                 "cogs"                   
## [13] "gross_margin_percentage" "gross_income"           
## [15] "Rating"                  "Total"
str(carrefour)
## Classes 'data.table' and 'data.frame':   1000 obs. of  16 variables:
##  $ Invoice_ID             : chr  "750-67-8428" "226-31-3081" "631-41-3108" "123-19-1176" ...
##  $ Branch                 : chr  "A" "C" "A" "A" ...
##  $ Customer_type          : chr  "Member" "Normal" "Normal" "Member" ...
##  $ Gender                 : chr  "Female" "Female" "Male" "Male" ...
##  $ Product_line           : chr  "Health and beauty" "Electronic accessories" "Home and lifestyle" "Health and beauty" ...
##  $ Unit_price             : num  74.7 15.3 46.3 58.2 86.3 ...
##  $ Quantity               : int  7 5 7 8 7 7 6 10 2 3 ...
##  $ Tax                    : num  26.14 3.82 16.22 23.29 30.21 ...
##  $ Date                   : chr  "1/5/2019" "3/8/2019" "3/3/2019" "1/27/2019" ...
##  $ Time                   : chr  "13:08" "10:29" "13:23" "20:33" ...
##  $ Payment                : chr  "Ewallet" "Cash" "Credit card" "Ewallet" ...
##  $ cogs                   : num  522.8 76.4 324.3 465.8 604.2 ...
##  $ gross_margin_percentage: num  4.76 4.76 4.76 4.76 4.76 ...
##  $ gross_income           : num  26.14 3.82 16.22 23.29 30.21 ...
##  $ Rating                 : num  9.1 9.6 7.4 8.4 5.3 4.1 5.8 8 7.2 5.9 ...
##  $ Total                  : num  549 80.2 340.5 489 634.4 ...
##  - attr(*, ".internal.selfref")=<externalptr>
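The six renaming statements above replace the spaces in column names with underscores. Assuming that is the only change wanted, one `gsub()` call over all names gives the same result; the sketch applies it to the original column names copied from the first `colnames()` output.

```r
# Original column names, copied from the colnames() output above
old_names <- c("Invoice ID", "Branch", "Customer type", "Gender",
               "Product line", "Unit price", "Quantity", "Tax",
               "Date", "Time", "Payment", "cogs",
               "gross margin percentage", "gross income", "Rating", "Total")

# Replace every space with an underscore; fixed = TRUE treats " " literally
new_names <- gsub(" ", "_", old_names, fixed = TRUE)
print(new_names)
```

On the table itself this becomes `names(carrefour) <- gsub(" ", "_", names(carrefour), fixed = TRUE)`, or `data.table::setnames(carrefour, names(carrefour), gsub(" ", "_", names(carrefour), fixed = TRUE))` to rename by reference.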
a. Unit price
a <- carrefour$Unit_price
boxplot(a)

b. Quantity

quantity <- carrefour$Quantity
boxplot(quantity)

c. cogs
cogs <- carrefour$cogs
boxplot(cogs)

d. Gross margin percentage
b <- carrefour$gross_margin_percentage
boxplot(b)

e. Gross income
gross_income <- carrefour$gross_income
boxplot(gross_income)

f. Rating
rating <- carrefour$Rating 
boxplot(rating)

g. Total
total <- carrefour$Total 
boxplot(total)

The Total and gross income variables have outliers.

To see the number of outliers

Gross Income

a <- carrefour$gross_income
boxplot.stats(a)$out
## [1] 47.790 49.490 49.650 47.720 48.605 49.260 48.750 48.685 48.690

There are 9 outlier entries.

Total

a <- carrefour$Total
boxplot.stats(a)$out
## [1] 1003.590 1039.290 1042.650 1002.120 1020.705 1034.460 1023.750 1022.385
## [9] 1022.490

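There are 9 outlier entries. They appear to be the same 9 transactions as the gross income outliers: each Total outlier is 21 times the matching gross income outlier (for example, 47.790 times 21 is 1003.590).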
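Rather than calling `boxplot.stats()` column by column, the count can be taken for every numeric column at once. `boxplot.stats()` flags values lying more than `coef` (default 1.5) times the box length beyond the box, which is the rule `boxplot()` uses to draw the points. The helper name `count_outliers()` is hypothetical, and the toy data frame is illustrative only.

```r
# Count boxplot-rule outliers in every numeric column of a data frame
count_outliers <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  vapply(num_cols,
         function(col) length(boxplot.stats(df[[col]])$out),
         integer(1))
}

# Illustrative data: column a has one extreme value, b has none,
# and the character column is skipped
toy <- data.frame(a = c(1:10, 100), b = 1:11, label = letters[1:11])
print(count_outliers(toy))
```

Calling `count_outliers(carrefour)` would report the counts for `Total` and `gross_income` together with every other numeric column in one named vector.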

Exploratory Data Analysis.

Univariate Analysis.

  1. Unit Price
mean(carrefour$Unit_price, trim = 0, na.rm=FALSE)
## [1] 55.67213
median(carrefour$Unit_price,na.rm=FALSE)
## [1] 55.23
range(carrefour$Unit_price,na.rm=FALSE, finite=FALSE)
## [1] 10.08 99.96
quantile(carrefour$Unit_price, probs=seq(0, 1,0.25), na.rm=FALSE, names=TRUE, type=7)
##     0%    25%    50%    75%   100% 
## 10.080 32.875 55.230 77.935 99.960
var(carrefour$Unit_price)
## [1] 701.9653
sd(carrefour$Unit_price,na.rm=FALSE)
## [1] 26.49463

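Mode

# Returns the most frequent value of v; if several values tie,
# the one that appears first in v is returned
getmode <- function(v){
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}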
unit_price_mode <- getmode(carrefour$Unit_price)
unit_price_mode
## [1] 83.77
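`getmode()` returns a single value even when several values are equally frequent, and for a near-continuous variable such as unit price the mode may rest on very few repeats. A hypothetical variant, `get_all_modes()`, returns every tied value so that ties are visible:

```r
# Original helper: returns only the first of any tied most-frequent values
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical variant: returns all values that share the highest count
get_all_modes <- function(v) {
  uniqv  <- unique(v)
  counts <- tabulate(match(v, uniqv))
  uniqv[counts == max(counts)]
}

v <- c(2, 3, 3, 2, 5)   # 2 and 3 each appear twice
print(getmode(v))        # only the first tied value
print(get_all_modes(v))  # both tied values
```

Running `sum(carrefour$Unit_price == unit_price_mode)` shows how many transactions the reported mode of 83.77 actually rests on.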

Visualizing Unit Price.

d<-hist(carrefour$Unit_price, breaks=10, col="red", xlab="Unit price",main="Unit price")

plot(d)

The most frequent unit prices (the tallest bars) fall in the 20-30, 70-80 and 90-100 ranges.

head(carrefour)
##     Invoice_ID Branch Customer_type Gender           Product_line Unit_price
## 1: 750-67-8428      A        Member Female      Health and beauty      74.69
## 2: 226-31-3081      C        Normal Female Electronic accessories      15.28
## 3: 631-41-3108      A        Normal   Male     Home and lifestyle      46.33
## 4: 123-19-1176      A        Member   Male      Health and beauty      58.22
## 5: 373-73-7910      A        Normal   Male      Sports and travel      86.31
## 6: 699-14-3026      C        Normal   Male Electronic accessories      85.39
##    Quantity     Tax      Date  Time     Payment   cogs gross_margin_percentage
## 1:        7 26.1415  1/5/2019 13:08     Ewallet 522.83                4.761905
## 2:        5  3.8200  3/8/2019 10:29        Cash  76.40                4.761905
## 3:        7 16.2155  3/3/2019 13:23 Credit card 324.31                4.761905
## 4:        8 23.2880 1/27/2019 20:33     Ewallet 465.76                4.761905
## 5:        7 30.2085  2/8/2019 10:37     Ewallet 604.17                4.761905
## 6:        7 29.8865 3/25/2019 18:30     Ewallet 597.73                4.761905
##    gross_income Rating    Total
## 1:      26.1415    9.1 548.9715
## 2:       3.8200    9.6  80.2200
## 3:      16.2155    7.4 340.5255
## 4:      23.2880    8.4 489.0480
## 5:      30.2085    5.3 634.3785
## 6:      29.8865    4.1 627.6165
  2. Quantity
hist(carrefour$Quantity, breaks=12, col="skyblue",xlab="Quantity", main='Quantity of Products')

The tallest bar is the first one, covering purchases of 1 or 2 items.
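`hist()` bins are right-closed by default (`right = TRUE`) and the first bin also includes its lower edge (`include.lowest = TRUE`). If `hist()` settles on integer breaks 1, 2, ..., 10 for `breaks = 12`, as is likely for data ranging from 1 to 10, the first bar counts purchases of 1 and 2 items together. Counting each quantity separately with `table()` avoids this; the vector below is hypothetical.

```r
# Hypothetical quantities, for illustration only
quantity <- c(1, 2, 2, 3, 1, 10)

# Histogram counts with integer breaks: the first bin is [1, 2], so it
# holds both the 1s and the 2s
hist_counts <- hist(quantity, breaks = 1:10, plot = FALSE)$counts

# One count per distinct quantity
bar_counts <- table(quantity)

print(hist_counts)
print(bar_counts)
```

With the real data, `barplot(table(carrefour$Quantity))` draws one bar per quantity.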

  3. Tax.
# density() does not take plotting arguments; pass xlab and main to plot() instead
d <- density(carrefour$Tax)
plot(d, xlab = "Tax", main = "Density of Tax")

  4. cogs
cogs <- hist(carrefour$cogs, xlab="cogs")

plot(cogs)

The frequency is highest in the lowest cogs bin, near zero, and decreases as cogs increases, so the distribution is right-skewed.
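The summary statistics point the same way: the mean of cogs (307.59) is above its median (241.76), as expected for a right-skewed variable. Base R has no skewness function, so the sketch defines a hypothetical `skewness()` using the moment formula; packages such as e1071 and moments provide their own versions, which may use slightly different formulas.

```r
# Moment-based sample skewness: average cubed deviation divided by the
# cubed (population) standard deviation; positive values indicate right skew
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

print(skewness(c(1, 2, 3)))      # symmetric data
print(skewness(c(1, 1, 1, 10)))  # long right tail
```

`skewness(carrefour$cogs)` would put a single number on the shape seen in the histogram.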

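  5. Gross margin percentage
hist(carrefour$gross_margin_percentage, xlab = "Gross margin percentage", main = "Gross margin percentage")

The gross margin percentage is constant: every transaction has a value of 4.761905 (its minimum and maximum in the summary are both 4.762), so it cannot help explain differences in sales.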
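The constant is consistent with a short derivation, assuming (as the printed rows suggest, but the source does not state) that gross income equals Tax, Tax is 5% of cogs, Total is cogs plus Tax, and the margin is gross income as a percentage of Total:

```latex
\text{gross margin percentage}
  = \frac{\text{gross income}}{\text{Total}} \times 100
  = \frac{0.05\,\text{cogs}}{1.05\,\text{cogs}} \times 100
  = \frac{100}{21} \approx 4.761905
```

For the first row, 26.1415 / 548.9715 times 100 gives the same 4.761905.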


Bivariate Analysis.

Covariance.

Covariance measures the degree to which two variables vary together: it is positive when they tend to increase together and negative when one tends to increase as the other decreases.

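# Numeric columns 6, 7, 8, 12, 14 and 16: Unit_price, Quantity, Tax, cogs, gross_income and Total
# (gross_margin_percentage, column 13, is constant and is left out)
carrefour_cov <- carrefour[,c(6,7,8,12,14,16)]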
cov(carrefour_cov)
##                Unit_price    Quantity        Tax       cogs gross_income
## Unit_price    701.9653313   0.8347785  196.66834  3933.3668    196.66834
## Quantity        0.8347785   8.5464464   24.14957   482.9914     24.14957
## Tax           196.6683401  24.1495704  137.09659  2741.9319    137.09659
## cogs         3933.3668019 482.9914076 2741.93188 54838.6377   2741.93188
## gross_income  196.6683401  24.1495704  137.09659  2741.9319    137.09659
## Total        4130.0351420 507.1409780 2879.02848 57580.5695   2879.02848
##                  Total
## Unit_price    4130.035
## Quantity       507.141
## Tax           2879.028
## cogs         57580.570
## gross_income  2879.028
## Total        60459.598
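The sample covariance reported by `cov()` divides by n - 1: cov(x, y) = sum((x - mean(x)) * (y - mean(y))) / (n - 1). Because covariance depends on the units of each variable (compare the large cogs entries with the small Quantity ones), dividing it by both standard deviations gives the unit-free correlation. The sketch checks both against R's built-ins using the first six Unit_price and Quantity values from the preview.

```r
# First six rows from the preview: Unit_price and Quantity
x <- c(74.69, 15.28, 46.33, 58.22, 86.31, 85.39)
y <- c(7, 5, 7, 8, 7, 7)
n <- length(x)

# Sample covariance: sum of cross-products of deviations, divided by n - 1
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Pearson correlation: covariance rescaled by both standard deviations
manual_cor <- manual_cov / (sd(x) * sd(y))

print(c(manual = manual_cov, builtin = cov(x, y)))
print(c(manual = manual_cor, builtin = cor(x, y)))
```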

Correlation.

carrefour.cor <- cor(carrefour_cov, method=c('spearman'))
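Spearman's method, used here, is Pearson's correlation computed on the ranks of the values, so it measures monotonic rather than strictly linear association. A quick check with illustrative vectors:

```r
# Illustrative vectors with a monotonic but non-linear relation
x <- c(1, 2, 3, 4, 5)
y <- x^3

spearman <- cor(x, y, method = "spearman")
pearson_on_ranks <- cor(rank(x), rank(y))

print(c(spearman = spearman, pearson_on_ranks = pearson_on_ranks,
        pearson = cor(x, y)))
```

Spearman reaches 1 because y always increases with x, while Pearson stays below 1 because the relation is not a straight line.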

Visualizing the correlation matrix

#install.packages('corrplot')
library(corrplot)
## corrplot 0.92 loaded
corrplot(carrefour.cor)

cogs, gross income, Tax and Total are highly correlated with each other.
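These four columns are more than highly correlated. If, as the printed rows suggest, Tax and gross income are 5% of cogs and Total is cogs plus Tax, each is an exact linear function of cogs and their correlations equal 1. A check with the first six rows from the preview:

```r
# cogs and Total from the first six rows of the preview
cogs  <- c(522.83, 76.40, 324.31, 465.76, 604.17, 597.73)
total <- c(548.9715, 80.2200, 340.5255, 489.0480, 634.3785, 627.6165)

print(total / cogs)                          # ratio of Total to cogs per row
print(cor(cogs, total))                      # Pearson
print(cor(cogs, total, method = "spearman")) # rank-based
```

Keeping any one of these columns therefore carries essentially all of the information the four share.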

IMPLEMENTATION.

  1. Feature Selection.
carrefour_1 <- carrefour
head(carrefour_1)
##     Invoice_ID Branch Customer_type Gender           Product_line Unit_price
## 1: 750-67-8428      A        Member Female      Health and beauty      74.69
## 2: 226-31-3081      C        Normal Female Electronic accessories      15.28
## 3: 631-41-3108      A        Normal   Male     Home and lifestyle      46.33
## 4: 123-19-1176      A        Member   Male      Health and beauty      58.22
## 5: 373-73-7910      A        Normal   Male      Sports and travel      86.31
## 6: 699-14-3026      C        Normal   Male Electronic accessories      85.39
##    Quantity     Tax      Date  Time     Payment   cogs gross_margin_percentage
## 1:        7 26.1415  1/5/2019 13:08     Ewallet 522.83                4.761905
## 2:        5  3.8200  3/8/2019 10:29        Cash  76.40                4.761905
## 3:        7 16.2155  3/3/2019 13:23 Credit card 324.31                4.761905
## 4:        8 23.2880 1/27/2019 20:33     Ewallet 465.76                4.761905
## 5:        7 30.2085  2/8/2019 10:37     Ewallet 604.17                4.761905
## 6:        7 29.8865 3/25/2019 18:30     Ewallet 597.73                4.761905
##    gross_income Rating    Total
## 1:      26.1415    9.1 548.9715
## 2:       3.8200    9.6  80.2200
## 3:      16.2155    7.4 340.5255
## 4:      23.2880    8.4 489.0480
## 5:      30.2085    5.3 634.3785
## 6:      29.8865    4.1 627.6165

1. Filter Method.

The filter method applies a statistical metric to assign a score to each feature and then ranks the features by that score. Here the metric is correlation: features that are highly correlated with other features are considered redundant and are removed.
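As a generic illustration of the score-then-rank idea, the hypothetical `score_features()` below scores each candidate by its absolute Pearson correlation with a target column and sorts the scores. It is not the `caret::findCorrelation()` procedure used next, which examines correlations among the predictors themselves rather than with a target.

```r
# Score each feature by |correlation| with the target, highest first
score_features <- function(df, target) {
  features <- setdiff(names(df), target)
  scores <- vapply(features,
                   function(f) abs(cor(df[[f]], df[[target]])),
                   numeric(1))
  sort(scores, decreasing = TRUE)
}

# Illustrative data: y is a linear function of x1; x2 alternates and is
# only weakly related to y
toy <- data.frame(
  y  = c(3, 6, 9, 12, 15, 18),
  x1 = 1:6,
  x2 = c(1, -1, 1, -1, 1, -1)
)
scores <- score_features(toy, "y")
print(scores)
```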

carrefour_2 <- carrefour_1[,c(6,7,8,12,14,15,16)] 
head(carrefour_2)
##    Unit_price Quantity     Tax   cogs gross_income Rating    Total
## 1:      74.69        7 26.1415 522.83      26.1415    9.1 548.9715
## 2:      15.28        5  3.8200  76.40       3.8200    9.6  80.2200
## 3:      46.33        7 16.2155 324.31      16.2155    7.4 340.5255
## 4:      58.22        8 23.2880 465.76      23.2880    8.4 489.0480
## 5:      86.31        7 30.2085 604.17      30.2085    5.3 634.3785
## 6:      85.39        7 29.8865 597.73      29.8865    4.1 627.6165

Libraries

library(caret)
library(corrplot)

Calculating the correlation matrix

correlationMatrix <- cor(carrefour_2)
# Find attributes that are highly correlated
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.75)
head(highlyCorrelated)
## [1] 4 7 3
head(carrefour_2)
##    Unit_price Quantity     Tax   cogs gross_income Rating    Total
## 1:      74.69        7 26.1415 522.83      26.1415    9.1 548.9715
## 2:      15.28        5  3.8200  76.40       3.8200    9.6  80.2200
## 3:      46.33        7 16.2155 324.31      16.2155    7.4 340.5255
## 4:      58.22        8 23.2880 465.76      23.2880    8.4 489.0480
## 5:      86.31        7 30.2085 604.17      30.2085    5.3 634.3785
## 6:      85.39        7 29.8865 597.73      29.8865    4.1 627.6165

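The variables flagged as highly correlated are at indices 4, 7 and 3 of carrefour_2: cogs, Total and Tax. findCorrelation() returns the indices of the columns it recommends removing to reduce pairwise correlations.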

We remove these variables and then compare the correlation matrices before and after removal graphically.

# Removing Redundant Features 
carrefour_22 <-carrefour_2[,-c(3,4,7)]
head(carrefour_22)
##    Unit_price Quantity gross_income Rating
## 1:      74.69        7      26.1415    9.1
## 2:      15.28        5       3.8200    9.6
## 3:      46.33        7      16.2155    7.4
## 4:      58.22        8      23.2880    8.4
## 5:      86.31        7      30.2085    5.3
## 6:      85.39        7      29.8865    4.1
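# Correlation matrix of the reduced feature set
correlationMatrix_reduced <- cor(carrefour_22)
# Performing our graphical comparison: all features (left) vs. reduced set (right)
par(mfrow = c(1, 2))
corrplot(correlationMatrix, order = "hclust")
corrplot(correlationMatrix_reduced, order = "hclust")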
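Hard-coding `-c(3,4,7)` when building `carrefour_22` works here, but it silently breaks if the column order or the cutoff changes. Dropping the columns by the indices `findCorrelation()` returned keeps the two steps in sync. The sketch uses a plain one-row data frame with the same column layout as `carrefour_2` and the index vector printed above (4, 7, 3); on the data.table itself, `carrefour_2[, -highlyCorrelated, with = FALSE]` should be equivalent.

```r
# Column layout of carrefour_2, first row only (values from the preview)
carrefour_2_df <- data.frame(
  Unit_price = 74.69, Quantity = 7, Tax = 26.1415, cogs = 522.83,
  gross_income = 26.1415, Rating = 9.1, Total = 548.9715
)

# Indices returned by findCorrelation() above
highlyCorrelated <- c(4, 7, 3)

# Drop those columns by index instead of hard-coding them
reduced <- carrefour_2_df[, -highlyCorrelated, drop = FALSE]
print(names(reduced))
```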

Conclusion.

Important features for this research are unit price, quantity, gross income and rating.