
class: title-slide

.row[
.col-7[
.title[
# K-Means Clustering
]
.subtitle[
## K-Means Clustering
]
.author[
### Laxmikant Soni [blog](https://laxmikants.github.io) [GitHub](https://github.com/laxmiaknts) [Twitter](https://twitter.com/laxmikantsoni09)
]

.affiliation[
]

]

.col-5[

.logo[

<img src="figures/rmarkdown.png" width="480" />
]

]
]

---

# K-Means Clustering

.pull-top[

## Introduction to K-Means Clustering

* **Definition**: K-means clustering is an unsupervised machine learning algorithm that partitions a dataset into clusters, assigning each data point to the cluster whose centroid is nearest.

* **Objective**: To group similar data points into k clusters, where k is chosen before the algorithm runs.
]
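To make the definition concrete, here is a minimal NumPy sketch of the usual assign-and-update loop (often called Lloyd's algorithm). The function name `kmeans_sketch`, its parameters, and the toy data are made up for illustration; this is a teaching sketch under those assumptions, not how scikit-learn implements K-means.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Toy assign-and-update loop; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (made-up numbers)
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = kmeans_sketch(X_toy, k=2)
print(labels)
print(centroids)
```

The first three points should share one label and the last three the other; which group is called 0 and which 1 depends on the random starting points.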

# K-Means Clustering

.pull-top[

## Key Concepts

* **Centroid**: The center of a cluster, calculated as the mean position of all data points within the cluster.

* **Clusters**: Groups where points in the same cluster are more similar to each other than to points in other clusters.
]
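As a small worked example of the centroid definition, the sketch below averages three customers feature by feature. The three rows are treated as one cluster purely for illustration, and the variable names are assumptions.

```python
import numpy as np

# Three customers treated as one cluster for illustration:
# (age, annual income in k$, spending score)
cluster_points = np.array([
    [21, 15, 81],
    [23, 16, 77],
    [22, 17, 76],
])

# The centroid is the per-feature mean of the cluster's points
centroid = cluster_points.mean(axis=0)
print(centroid)  # [22. 16. 78.]
```

The centroid need not coincide with any actual customer; it is simply the average position of the group.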

---

# K-Means Clustering

.pull-top[

## Additional Concepts

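* **Inertia**: A measure of cluster compactness, computed as the sum of squared distances from each point to its assigned centroid; lower inertia indicates tighter, more cohesive clusters.

* **K Value**: The number of clusters. It is often chosen with the elbow method: plot inertia against k and pick the point where adding more clusters stops reducing inertia substantially.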
]
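One way to apply the elbow method is to fit K-means for several values of k and compare the resulting inertia. The sketch below does this with scikit-learn on made-up 2-D points; `X_demo` and the range of k are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D data with three visible groups
X_demo = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
    [8.0, 8.0], [8.5, 9.0], [9.0, 8.2],
    [1.0, 9.0], [1.5, 8.5], [0.8, 9.5],
])

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_demo)
    # inertia_ is the sum of squared distances to the nearest centroid
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

Inertia always falls as k grows, so the lowest value is not the goal; the "elbow" is the k after which the drop becomes small.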

# K-Means Clustering

.pull-top[

## Example: Mall Customer Segmentation

* **Dataset**: Each data point represents a customer, with features such as age, income, and spending score.
* **Goal**: Segment customers based on shopping behaviors for targeted marketing. The segments might look like this (illustrative, not computed from the data):
  - **Cluster 1**: High-income, high-spending frequent shoppers.
  - **Cluster 2**: Younger customers with low to moderate spending scores, potentially price-sensitive.
  - **Cluster 3**: Mid-age customers with steady income and spending patterns.
]

---

# K-Means Clustering

.pull-top[

## Mall Customer Segmentation Dataset

| CustomerID | Age | Annual_Income (k$) | Spending_Score (1-100) |
|------------|-----|---------------------|-------------------------|
| 1          | 19  | 15                 | 39                      |
| 2          | 21  | 15                 | 81                      |
| 3          | 20  | 16                 | 6                       |
| 4          | 23  | 16                 | 77                      |
| 5          | 31  | 17                 | 40                      |
| 6          | 22  | 17                 | 76                      |
| 7          | 35  | 18                 | 6                       |
| 8          | 23  | 18                 | 94                      |
| 9          | 64  | 19                 | 3                       |
| 10         | 30  | 19                 | 72                      |
]

---

# Features and Target Explanation

.pull-top[

## Features

1. **CustomerID**: A unique identifier for each customer. Not used for clustering but helps to identify individual records.
2. **Age**: The age of the customer. Useful for understanding the age demographics of each customer segment.
3. **Annual_Income (k$)**: The customer’s annual income in thousands of dollars. This feature helps to differentiate high-income vs. low-income customer groups.
4. **Spending_Score (1-100)**: A score assigned based on customer behavior and purchasing data, ranging from 1 (low) to 100 (high). Higher scores indicate a tendency to spend more, which can help distinguish high and low spenders.
]

---

# Features and Target Explanation

.pull-top[

## Target (Cluster Label)

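* **Cluster Label**: The cluster number assigned to each customer after running the K-means algorithm. Because K-means is unsupervised, this label is an output of the algorithm rather than a target it learns from, and the numbers themselves (0, 1, 2) carry no ordering.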
  - This label represents the group or segment to which the customer belongs, based on similarities in age, income, and spending score.
  - For instance, clusters may represent high-income frequent spenders, budget-conscious shoppers, or younger, moderate spenders.
]
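A cluster number only becomes meaningful once the cluster is profiled. The sketch below averages each feature per cluster with pandas; the rows and the `Cluster` column are copied from the clustered output shown in the Python Implementation slides, while `df_demo` and `profile` are names chosen here for illustration.

```python
import pandas as pd

# Sample rows with the cluster labels from the clustered output in this deck
df_demo = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    'Annual_Income': [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
    'Cluster': [2, 0, 2, 0, 2, 0, 2, 0, 1, 0],
})

# Average feature values per cluster give a quick profile of each segment
profile = df_demo.groupby('Cluster').mean()
# Number of customers in each cluster
profile['Count'] = df_demo.groupby('Cluster').size()
print(profile)
```

With these labels, cluster 0 averages a spending score of 80 at a mean age of 23.8, cluster 2 averages 22.75, and cluster 1 contains a single 64-year-old customer with a score of 3.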

---

# Python Implementation

.pull-top[

## Importing Libraries

```python
# Import necessary libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
```

]

---

# Python Implementation

.pull-top[

## Setting up Dataset

```python
# Sample dataset: Replace this with your actual dataset
data = {
    'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Age': [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    'Annual_Income': [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72]
}
df = pd.DataFrame(data)
```

]

---

# Python Implementation

.pull-top[

## Features for clustering

```python
# Select the features used for clustering (CustomerID is only an identifier)
X = df[['Age', 'Annual_Income', 'Spending_Score']]
```

]

---

# Python Implementation

.pull-top[

## Data Standardization

```python
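# Standardize each feature to zero mean and unit variance so that no
# single feature dominates the Euclidean distances used by K-means
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)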
```
]
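To show what the standardization step computes, the sketch below compares `StandardScaler` with the same calculation done by hand in NumPy. It uses only the age and income columns of the sample data; `X_demo`, `X_std`, and `X_manual` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and income columns from the sample dataset
X_demo = np.array([
    [19, 15], [21, 15], [20, 16], [23, 16], [31, 17],
    [22, 17], [35, 18], [23, 18], [64, 19], [30, 19],
], dtype=float)

X_std = StandardScaler().fit_transform(X_demo)

# Same result by hand: subtract the column mean, then divide by the
# population standard deviation (ddof=0, NumPy's default)
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_std, X_manual))  # True
```

After scaling, every column has mean 0 and standard deviation 1, so age (which spans 19 to 64 here) no longer outweighs income (which spans 15 to 19) in the distance calculation.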

---

# Python Implementation

.pull-top[

## Apply K-means clustering

```python
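# Apply K-means clustering with 3 clusters
# (n_init=10 is set explicitly because scikit-learn 1.4 changed the default)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)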
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Display the resulting clusters
print("Clustered Data:")
```

```
## Clustered Data:
```

```python
print(df)
```

```
##    CustomerID  Age  Annual_Income  Spending_Score  Cluster
## 0           1   19             15              39        2
## 1           2   21             15              81        0
## 2           3   20             16               6        2
## 3           4   23             16              77        0
## 4           5   31             17              40        2
## 5           6   22             17              76        0
## 6           7   35             18               6        2
## 7           8   23             18              94        0
## 8           9   64             19               3        1
## 9          10   30             19              72        0
```
]
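The fitted centroids live in standardized space, which makes them hard to read. A possible follow-up, sketched below, maps them back to the original units with the scaler's `inverse_transform`. The block rebuilds the sample data so it runs on its own; the `_demo` names are illustrative, and the cluster numbering may differ between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['Age', 'Annual_Income', 'Spending_Score']
df_demo = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    'Annual_Income': [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

scaler_demo = StandardScaler()
X_demo = scaler_demo.fit_transform(df_demo[features])
kmeans_demo = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_demo)

# cluster_centers_ are in standardized units; convert them back to
# years, k$, and spending score for interpretation
centers = pd.DataFrame(
    scaler_demo.inverse_transform(kmeans_demo.cluster_centers_),
    columns=features,
)
print(centers.round(1))
```

Each row of `centers` is the average customer of one cluster, expressed in the same units as the dataset table.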

---

# Python Implementation

.pull-top[

## Classify New Customer

```python
# Classify a new customer
new_customer = pd.DataFrame({
    'Age': [25],
    'Annual_Income': [20],
    'Spending_Score': [60]
})

# Standardize the new customer's data using the previous scaler
new_customer_scaled = scaler.transform(new_customer)

# Predict the cluster for the new customer
new_customer_cluster = kmeans.predict(new_customer_scaled)

print(f"New customer belongs to Cluster: {new_customer_cluster[0]}")
```

```
## New customer belongs to Cluster: 0
```
]
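Under the hood, predicting a cluster amounts to finding the nearest centroid. The sketch below checks this by computing the Euclidean distances by hand and comparing the result with `predict`. It refits the model on the sample data so the block is self-contained; the `_demo` names and `new_point` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['Age', 'Annual_Income', 'Spending_Score']
df_demo = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    'Annual_Income': [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    'Spending_Score': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

scaler_demo = StandardScaler().fit(df_demo[features])
kmeans_demo = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_demo.fit(scaler_demo.transform(df_demo[features]))

new_point = pd.DataFrame({'Age': [25], 'Annual_Income': [20], 'Spending_Score': [60]})
new_scaled = scaler_demo.transform(new_point)

# Euclidean distance from the new point to each centroid (standardized space)
distances = np.linalg.norm(kmeans_demo.cluster_centers_ - new_scaled, axis=1)
nearest = int(distances.argmin())

print(distances.round(3))
print(nearest == kmeans_demo.predict(new_scaled)[0])  # True
```

The new customer must be transformed with the scaler fitted on the original data; fitting a fresh scaler on a single row would erase the information the distances depend on.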

---

class: inverse, center, middle
# Thanks