Description

This project uses customer behaviour data to predict whether a customer will buy a product. The dataset used for the modelling is available HERE

The project is structured as follows:

  1. Data Understanding

  2. Data Visualization / EDA

  3. Data Preprocessing

  4. Modeling

  5. Performance Evaluations

1. Data Understanding

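# Clear the workspace so the analysis starts from a clean state
rm(list = ls())

library(ggplot2)

# install.packages("readr")

library(readr)
# Load the dataset; read.csv() is base R and returns a plain data.frame
customerbehaviour <- read.csv("Customer_Behaviour.csv")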

str(customerbehaviour)
## 'data.frame':    400 obs. of  5 variables:
##  $ User.ID        : int  15624510 15810944 15668575 15603246 15804002 15728773 15598044 15694829 15600575 15727311 ...
##  $ Gender         : chr  "Male" "Male" "Female" "Female" ...
##  $ Age            : int  19 35 26 27 19 27 27 32 25 35 ...
##  $ EstimatedSalary: int  19000 20000 43000 57000 76000 58000 84000 150000 33000 65000 ...
##  $ Purchased      : int  0 0 0 0 0 0 0 1 0 0 ...

The dataset contains 400 observations of 5 variables. The User.ID column is only an identifier and can be ignored. The Purchased column (0, 1) is recoded to the labels (“Not Buy”, “Buy”). The target variable is Purchased.
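The report does not show the recoding step itself, so the following is only a minimal sketch of one way to do it. It assumes a small toy data frame, `cb_sample` (a hypothetical name), built from the first ten values printed by str() above. Gender is left out because str() prints only four of its values.

```r
# Toy data: first ten values of four columns, copied from the str() output.
# `cb_sample` is a hypothetical name used only for this illustration.
cb_sample <- data.frame(
  User.ID = c(15624510, 15810944, 15668575, 15603246, 15804002,
              15728773, 15598044, 15694829, 15600575, 15727311),
  Age = c(19, 35, 26, 27, 19, 27, 27, 32, 25, 35),
  EstimatedSalary = c(19000, 20000, 43000, 57000, 76000,
                      58000, 84000, 150000, 33000, 65000),
  Purchased = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
)

# Drop the identifier column, which is not used as a predictor.
cb_sample$User.ID <- NULL

# Recode the 0/1 target as a labelled factor: 0 -> "Not Buy", 1 -> "Buy".
cb_sample$Purchased <- factor(cb_sample$Purchased,
                              levels = c(0, 1),
                              labels = c("Not Buy", "Buy"))

# Count the observations in each class.
table(cb_sample$Purchased)
```

The same two steps could be applied to customerbehaviour directly. Storing the target as a factor also lets ggplot2 treat it as a discrete grouping variable in the plots that follow.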

2. Data Visualization / EDA

2.1 Univariate Data Analysis

Plot the distribution of Age, filled by Purchased, as a histogram

ggplot(customerbehaviour,
       aes(x = Age,
           fill = Purchased)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
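The message above appears because no bin width was given, so ggplot2 falls back to 30 bins. A hedged sketch of setting the width explicitly follows. The 5-year `binwidth` and the white bar outline are assumptions chosen for illustration, and `ages` is a hypothetical toy data frame built from the first ten Age and Purchased values shown by str().

```r
library(ggplot2)

# Toy data: first ten Age and Purchased values from the str() output.
# `ages` is a hypothetical name used only for this illustration.
ages <- data.frame(
  Age = c(19, 35, 26, 27, 19, 27, 27, 32, 25, 35),
  Purchased = factor(c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
                     levels = c(0, 1),
                     labels = c("Not Buy", "Buy"))
)

# An explicit binwidth (here 5 years, an assumed value) replaces the
# default of 30 bins and silences the stat_bin() message.
p <- ggplot(ages, aes(x = Age, fill = Purchased)) +
  geom_histogram(binwidth = 5, colour = "white")

p
```

A suitable width depends on the range of Age in the full dataset, so it is worth trying a few values before settling on one.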