---
title: "STIS3023 GROUP PROJECT - MALL CUSTOMER ANALYSIS"
author: "WONG CHEE KIN & MUHAMMAD DANIEL FIRDAUS BIN MAZILAN"
date: '2022-07-20'
output:
  pdf_document: default
  html_document:
    theme: cerulean
---

1.0 PROJECT INFORMATION

1.1 Project Title

MALL CUSTOMER ANALYSIS

1.2 Group Members

The members involved in this project are Wong Chee Kin (279592) and Muhammad Daniel Firdaus Bin Mazilan (275552).

1.3 Data Sources

The mall customer data were retrieved from Kaggle, an open data platform, where they are published as the following dataset:

MallCustomerdatasets

This dataset can be downloaded at:

https://www.kaggle.com/datasets/mltuts/mall-customer-datasets/download?datasetVersionNumber=1

2.0 PROJECT REQUIREMENTS

2.1 Introduction to the project

In this project, we carry out a mall customer analysis in RStudio. Customer analysis helps a mall identify its most valuable customers. We first explore the data on which our segmentation model is built, then present a descriptive analysis of the data, and finally apply the K-means algorithm with several numbers of clusters.

2.2 Problems to Be Solved

Customer segmentation is one of the most important applications of unsupervised learning. It is the process of dividing a customer base into groups of individuals who are similar in ways relevant to marketing, such as age, gender, hobbies, and consumption purposes. Using clustering techniques, companies can identify several customer segments and so target potential user bases. In this machine learning project, we use the most basic algorithm for clustering unlabeled data, K-means clustering.
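To make the K-means idea concrete, here is a minimal sketch of Lloyd's algorithm in base R, the same variant requested later with `algorithm = "Lloyd"`. The helper name `lloyd_kmeans` and the toy data are hypothetical and for illustration only; it assumes no cluster ever becomes empty, and in practice the built-in `kmeans()` should be used. Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of the points assigned to it.

```r
# Hypothetical helper: bare-bones Lloyd's K-means, for illustration only.
# x: numeric matrix (rows = observations); centers: initial k x p matrix.
# Assumes no cluster becomes empty during the iterations.
lloyd_kmeans <- function(x, centers, max_iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to each centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")
    # Update step: each centroid moves to the mean of its assigned points
    new_centers <- do.call(rbind, lapply(seq_len(k), function(j) {
      colMeans(x[cluster == j, , drop = FALSE])
    }))
    converged <- all(abs(new_centers - centers) < 1e-9)
    centers <- new_centers
    # Stop once the centroids no longer move
    if (converged) break
  }
  list(cluster = cluster, centers = centers)
}

# Toy data: two obvious groups of three points each
x <- rbind(c(1, 1), c(1.5, 2), c(1, 1.5),
           c(8, 8), c(9, 8.5), c(8.5, 9))
init <- x[c(1, 4), ]
res <- lloyd_kmeans(x, init)
print(res$cluster)
print(res$centers)
```

Starting from the same initial centroids, `kmeans(x, centers = init, algorithm = "Lloyd")` should assign the points to the same two groups.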

Companies deploy customer segmentation because each customer has different requirements, so marketing efforts must be specific and tailored to the needs of each segment. Segmentation addresses this by giving companies a better understanding of their target customers. In addition, to maximize profits, companies can use the collected data to learn the preferences of their target customers and to discover the needs of valuable market segments. In this way, they can reduce investment risk and develop marketing strategies more effectively.

2.3 Data analysis specifications

In this project, the mall customer analysis depends on several key differentiating factors that divide customers into target groups. Behavioral patterns, demographics, economic conditions, and geography all influence how a company approaches each market segment.

Application Development Platform

• Web-based

• R Programming, RStudio, R Shiny

3.0 DATA PREPARATION

Preparing the data for analysis involves several tasks: data acquisition, data cleaning, data transformation, and preparing the final data for analysis.

3.1 Data Acquisition

Data acquisition is the process of importing data into an R session so that it can be viewed, stored, and analyzed. There are many ways to load data into R, depending on the format of the data. In this project, the data come as a CSV file, a spreadsheet-like table that can also be opened in Excel, so we read it with read.csv().

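```{r}
# Machine-specific project folder: change this path to your own location
setwd("C:/Users/cheehoe/Documents/Sem 4/PROGRAMMING FOR DATA SCIENCE/project")
getwd()
# read.csv() is part of base R, so no extra package (such as readr) is needed
cs_data <- read.csv("customer-segmentation-dataset/Mall_Customers.csv")
# View() opens the RStudio data viewer; it is skipped when knitting the report
if (interactive()) View(cs_data)
head(cs_data)
```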
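read.csv() renames column headers that are not valid R names, which is why the later code refers to columns such as `Annual.Income..k..` and `Spending.Score..1.100.`. The sketch below assumes the raw CSV headers are `Annual Income (k$)` and `Spending Score (1-100)` (inferred from those names, not checked against the file) and reproduces the conversion with `make.names()`, which `read.csv()` applies by default (`check.names = TRUE`).

```r
# Assumed raw headers of Mall_Customers.csv (inferred from the column names used later)
raw_headers <- c("CustomerID", "Gender", "Age",
                 "Annual Income (k$)", "Spending Score (1-100)")
# make.names() replaces every character that is not valid in an R name with "."
r_names <- make.names(raw_headers, unique = TRUE)
print(r_names)
```

Passing `check.names = FALSE` to `read.csv()` would keep the original headers, but the analysis code below would then need backtick-quoted names such as `` cs_data$`Annual Income (k$)` ``.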

3.2 Data Cleaning

Data cleaning is the process of converting dirty data (for example, missing, duplicated, or inconsistent values) into reliable data that can be analyzed. It improves data quality and overall productivity.

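```{r}
# Column types and a preview of the values
str(cs_data)
# Descriptive statistics; an "NA's" entry appears only if a column has missing values
summary(cs_data)
```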

We use the output of summary() to check whether the data contain dirty values. The results show that no column contains missing values (NA).
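Because `summary()` reports missing values only indirectly, a more explicit check is to count them per column and to count duplicated rows. A minimal sketch, using a small hypothetical data frame `toy` in place of `cs_data` (one `Age` value is deliberately missing):

```r
# Hypothetical toy data standing in for cs_data; one Age value is missing
toy <- data.frame(
  Gender = c("Male", "Female", "Female"),
  Age = c(19, NA, 35),
  Annual.Income..k.. = c(15, 16, 17)
)
# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
# Number of fully duplicated rows, another common kind of dirty data
n_dup <- sum(duplicated(toy))
print(n_dup)
```

On the real data, the same checks are `colSums(is.na(cs_data))` and `sum(duplicated(cs_data))`.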

4.0 DATA ANALYSIS

The data analysis is carried out according to the needs of the project to understand the relationship between Gender, Age, Annual Income and Spending Score.

Here are some examples of data analysis that have been performed:

4.1 Gender Comparison

```{r}
# Count customers by gender
a <- table(cs_data$Gender)
barplot(a, main = "Gender Comparison", ylab = "Count", xlab = "Gender",
        col = rainbow(2), legend.text = rownames(a))
```

4.2 Ratio of Female and Male

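```{r}
library(plotrix)  # provides pie3D(); plyr and dplyr are not needed here
# Percentage share of each gender, using the counts `a` from section 4.1
piec <- round(a / sum(a) * 100)
lbs <- paste0(names(a), " ", piec, "%")
pie3D(a, labels = lbs, main = "Ratio of Female and Male")
```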
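The pie-chart percentages come from dividing each count by the total. Base R's `prop.table()` performs that division directly, and rounding to one decimal place lowers the chance that the labels visibly fail to sum to 100%. A small sketch on a hypothetical gender vector standing in for `cs_data$Gender`:

```r
# Hypothetical gender vector standing in for cs_data$Gender
gender <- c("Female", "Male", "Female", "Female")
counts <- table(gender)
# prop.table() divides each count by the total, so the shares sum to 1
share <- round(prop.table(counts) * 100, 1)
labels <- paste0(names(share), " ", share, "%")
print(labels)
```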

4.3 Age Class

```{r}
hist(cs_data$Age, col = "blue", main = "Histogram to Show Count of Age Class",
     xlab = "Age Class", ylab = "Frequency", labels = TRUE)
```
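`hist()` chooses the age classes automatically. If explicit class boundaries are wanted, `cut()` produces the corresponding counts; the 10-year breaks and the sample ages below are hypothetical choices for illustration.

```r
# Hypothetical ages standing in for cs_data$Age
ages <- c(19, 25, 31, 45, 67)
# 10-year classes; like hist(), cut() uses right-closed intervals such as (20,30] by default
age_class <- cut(ages, breaks = seq(10, 70, by = 10))
class_counts <- table(age_class)
print(class_counts)
```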

4.4 Annual Income

```{r}
summary(cs_data$Annual.Income..k..)
hist(cs_data$Annual.Income..k.., col = "#660033", main = "Annual Income",
     xlab = "Annual Income Class", ylab = "Frequency", labels = TRUE)
```

4.5 Density Plot for Annual Income

```{r}
plot(density(cs_data$Annual.Income..k..), col = "yellow",
     main = "Density Plot for Annual Income",
     xlab = "Annual Income Class", ylab = "Density")
```

4.6 Spending Score

```{r}
hist(cs_data$Spending.Score..1.100., main = "Spending Score",
     xlab = "Spending Score Class", ylab = "Frequency",
     col = "#6600cc", labels = TRUE)
```

4.7 K-means

```{r}
library(purrr)
set.seed(123)
# Total intra-cluster (within-cluster) sum of squares for k clusters,
# using Age, Annual Income and Spending Score (columns 3 to 5)
iss <- function(k) {
  kmeans(cs_data[, 3:5], k, iter.max = 100, nstart = 100,
         algorithm = "Lloyd")$tot.withinss
}
k.values <- 1:10
iss_values <- map_dbl(k.values, iss)
# Elbow plot: the curve flattens once extra clusters stop reducing the total much
plot(k.values, iss_values, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total intra-cluster sum of squares")
```
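The quantity plotted for each K, `tot.withinss`, is the sum over all clusters of the squared Euclidean distances from each point to the centroid of its own cluster. The sketch below checks this by recomputing it by hand from a `kmeans()` result on hypothetical toy data; with the real data, `cs_data[, 3:5]` would take the place of `x`.

```r
set.seed(123)
# Hypothetical toy data: two well-separated groups in two dimensions
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
fit <- kmeans(x, centers = 2, nstart = 10, algorithm = "Lloyd")
# Row i of own_centroid is the centroid of the cluster that point i belongs to
own_centroid <- fit$centers[fit$cluster, ]
manual_wss <- sum((x - own_centroid)^2)
print(c(kmeans = fit$tot.withinss, manual = manual_wss))
```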

5.0 DATA VISUALIZATION

The visualizations of the analyzed data are available through the MallCA app, which is built with R, RStudio and Shiny and can be accessed from web browsers and mobile phones at MallCA.

6.0 PROJECT REPORT AND PRESENTATION

This project report was generated with R Markdown and can be accessed Here.

7.0 CONCLUSION

This project carries out a mall customer analysis, which helps a mall identify its most valuable customers. We explored the data on which segmentation models can be built in this machine learning project and computed the total within-cluster sum of squares for K-means with 1 to 10 clusters. The main conclusions of this project are:

• The project uses the open-source software R, RStudio and the Shiny data analysis platform.

• A user-friendly data platform with analytical functions and a graphical user interface (GUI).

• Development can be carried out on various operating systems and completed in a short time.

• The system is accessible 24/7 and from multiple devices (web, mobile).