This project uses customer behaviour data to predict whether a customer will buy a product. The dataset used for the modelling is available HERE
The project is structured as follows:
Data Understanding
Data Visualizations / EDA
Data Preprocessing
Modeling
Performance Evaluations
rm(list = ls())
library(ggplot2)
# install.packages("readr")
library(readr)
customerbehaviour <- read.csv("Customer_Behaviour.csv")
str(customerbehaviour)
## 'data.frame': 400 obs. of 5 variables:
## $ User.ID : int 15624510 15810944 15668575 15603246 15804002 15728773 15598044 15694829 15600575 15727311 ...
## $ Gender : chr "Male" "Male" "Female" "Female" ...
## $ Age : int 19 35 26 27 19 27 27 32 25 35 ...
## $ EstimatedSalary: int 19000 20000 43000 57000 76000 58000 84000 150000 33000 65000 ...
## $ Purchased : int 0 0 0 0 0 0 0 1 0 0 ...
The User.ID column is only an identifier and can be ignored. The Purchased column (0, 1) is recoded to the labels (“Not Buy”, “Buy”). The target variable is Purchased.
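The recoding described above is not shown in the original code, so the following is only a minimal sketch of one way to do it. The data frame `demo` is a hypothetical stand-in built from the first four rows visible in the `str()` output; on the real data the same two steps would be applied to `customerbehaviour`.

```r
# Hypothetical stand-in for customerbehaviour: first four rows from the str() output
demo <- data.frame(
  User.ID = c(15624510L, 15810944L, 15668575L, 15603246L),
  Gender = c("Male", "Male", "Female", "Female"),
  Age = c(19L, 35L, 26L, 27L),
  EstimatedSalary = c(19000L, 20000L, 43000L, 57000L),
  Purchased = c(0L, 0L, 0L, 0L)
)

# Drop the identifier column, which is not used for modelling
demo$User.ID <- NULL

# Recode the 0/1 target into a labelled factor
demo$Purchased <- factor(demo$Purchased,
                         levels = c(0, 1),
                         labels = c("Not Buy", "Buy"))

str(demo)
```

Storing Purchased as a factor also lets ggplot2 treat it as a discrete variable, so `fill = Purchased` produces one colour per class rather than a continuous scale.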
Plot the distribution of Age, coloured by Purchased, as a histogram
ggplot(customerbehaviour, aes(x = Age, fill = Purchased)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
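The message above appears because no bin width was given, so ggplot2 fell back to 30 bins. A minimal sketch of setting it explicitly follows. The value `binwidth = 5` (five-year age bins) is an assumed, illustrative choice rather than one taken from the original analysis, and `demo` is a hypothetical stand-in built from the first ten Age and Purchased values in the `str()` output.

```r
library(ggplot2)

# Hypothetical stand-in: first ten Age / Purchased values from the str() output
demo <- data.frame(
  Age = c(19, 35, 26, 27, 19, 27, 27, 32, 25, 35),
  Purchased = factor(c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
                     levels = c(0, 1),
                     labels = c("Not Buy", "Buy"))
)

# binwidth = 5 is an assumed value; adjust it to the spread of the real data
p <- ggplot(demo, aes(x = Age, fill = Purchased)) +
  geom_histogram(binwidth = 5)

print(p)
```

With an explicit `binwidth`, the `stat_bin()` message is no longer printed, and each bar covers a fixed age range that is easier to interpret.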