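# Task 1: Metric MDS using NBA player statistics
library(ggplot2) # needed for the plot below
nba <- read.csv("C:/Users/nicho/OneDrive/Desktop/MSDA/MK 6460 - Marketing Research & Analytics/Week 5 - Principal Component Analysis & Multidemensional Scaling/Week5_task_data/nba.csv")
# Use the first column as row labels, then drop it from the numeric data
rownames(nba) <- nba[, 1]
nba_data <- nba[, -1]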
# Standardize the data
nba_scaled <- scale(nba_data)
# Calculate distance and apply MDS
nba_dist <- dist(nba_scaled)
nba_mds <- cmdscale(nba_dist, k = 2)
nba_mds_df <- as.data.frame(nba_mds)
nba_mds_df$Player <- rownames(nba)
ggplot(nba_mds_df, aes(V1, V2, label = Player)) +
  geom_point(color = "blue", size = 2) +
  geom_text(hjust = 0.5, vjust = -1, size = 3) +
  labs(title = "NBA Players Metric MDS Map", x = "Dimension 1", y = "Dimension 2")
In this task, we applied metric (classical) multidimensional scaling to
standardized statistics of 50 NBA players. The cmdscale() function
places the players in two dimensions so that distances on the map
approximate the Euclidean distances between their standardized stat
profiles. Players positioned closely on the map share similar stat
profiles, suggesting comparable playing styles or roles. Those on
opposite ends likely differ substantially in performance metrics such
as scoring, defense, or assists. The axes themselves have no built-in
meaning; only the relative positions of the players are interpretable.
This visualization can help teams identify clusters of similar players
or unique outliers worth scouting or game-planning around.
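How much of the distance structure a two-dimensional map retains can be checked with the goodness-of-fit values that cmdscale() returns when eig = TRUE. The sketch below is illustrative only: it uses the built-in mtcars data as a stand-in because nba.csv is not reproduced here, and the names demo_scaled, demo_fit, and eigenvalues are hypothetical.

```r
# Stand-in data: mtcars replaces nba.csv, which is not available here
demo_scaled <- scale(mtcars)

# eig = TRUE makes cmdscale() return a list (points, eig, GOF, ...)
demo_fit <- cmdscale(dist(demo_scaled), k = 2, eig = TRUE)

# GOF holds two goodness-of-fit values for the k retained dimensions
demo_fit$GOF

# The first GOF value by hand: eigenvalue mass in the first two
# dimensions divided by the total absolute eigenvalue mass
eigenvalues <- demo_fit$eig
manual_gof <- sum(eigenvalues[1:2]) / sum(abs(eigenvalues))
manual_gof
```

A value close to 1 suggests the two plotted dimensions capture most of the variation in the distances; a low value suggests the map should be read with more caution.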
# Task 2: Non-metric MDS using consumer data
library(cluster) # daisy()
library(MASS) # isoMDS()
adult <- read.csv("C:/Users/nicho/OneDrive/Desktop/MSDA/MK 6460 - Marketing Research & Analytics/Week 5 - Principal Component Analysis & Multidemensional Scaling/Week5_task_data/Adult2.csv")
# Drop every row whose feature values (all columns except the ID in column 1)
# occur more than once, including the first occurrence; isoMDS() stops with
# an error when two distinct rows have zero dissimilarity
adult_data <- adult[, -1]
dupe_flags <- duplicated(adult_data) | duplicated(adult_data, fromLast = TRUE)
adult_clean <- adult[!dupe_flags, ]
# Prepare cleaned data for Gower distance
adult_data <- adult_clean[, -1]
adult_data[] <- lapply(adult_data, as.factor)
# Compute Gower distance
adult_dist <- daisy(adult_data, metric = "gower")
# Run isoMDS
adult_mds <- isoMDS(adult_dist, k = 2)
## initial value 31.273406
## iter 5 value 24.486394
## final value 24.393229
## converged
# Create dataframe for plotting
adult_mds_df <- as.data.frame(adult_mds$points)
adult_mds_df$Consumer <- adult_clean[, 1]
# Plot
library(ggplot2)
ggplot(adult_mds_df, aes(V1, V2, label = Consumer)) +
  geom_point(color = "darkgreen", size = 2) +
  geom_text(hjust = 0.5, vjust = -1, size = 3) +
  labs(title = "Consumer Segmentation (Non-Metric MDS)",
       x = "Dimension 1", y = "Dimension 2")
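For this task, we explored the structure of a consumer dataset using
non-metric MDS via isoMDS() and Gower distance. Because every variable
is converted to a factor and treated as categorical, Euclidean distance
is not appropriate, so we used daisy() with the Gower metric. For
all-factor data this reduces to the share of variables on which two
consumers differ. isoMDS() then tries to preserve only the rank order
of these dissimilarities rather than their exact values. The final
stress of about 24.4 (isoMDS() reports stress in percent) is high by
Kruskal's usual rule of thumb, so the two-dimensional map is best read
as a rough guide. With that caveat, the map shows how consumers group
based on shared characteristics like income level, occupation, and
household demographics. Such a visualization is useful for segmenting a
market and targeting promotions toward specific consumer archetypes.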
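To make the distance step concrete, the sketch below compares daisy() with a hand calculation on a tiny made-up data frame. The toy data, its column names, and the variable names are hypothetical and are not taken from Adult2.csv; the point is only to illustrate that Gower distance on all-factor data is the proportion of variables on which two rows disagree.

```r
library(cluster) # daisy()

# Hypothetical toy data: three categorical variables for four consumers
toy <- data.frame(
  income     = factor(c("low", "low", "high", "high")),
  occupation = factor(c("sales", "tech", "tech", "sales")),
  owns_home  = factor(c("no", "no", "yes", "no"))
)

# Gower dissimilarities as a full square matrix
gower <- as.matrix(daisy(toy, metric = "gower"))

# Hand calculation: share of variables on which two rows differ
toy_chr <- sapply(toy, as.character) # character matrix, one row per consumer
n <- nrow(toy_chr)
manual <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    manual[i, j] <- mean(toy_chr[i, ] != toy_chr[j, ])
  }
}

# TRUE when the two calculations agree
all.equal(gower, manual, check.attributes = FALSE)
```

Consumers 1 and 2 differ only in occupation, so their dissimilarity is 1/3, while consumers 1 and 3 differ on all three variables and sit at the maximum of 1.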
# Task 3: Metric MDS using USArrests dataset
# Load dataset
data("USArrests")
states_data <- USArrests
states_scaled <- scale(states_data)
# Compute distance matrix and apply MDS
states_dist <- dist(states_scaled)
states_mds <- cmdscale(states_dist, k = 2)
# Prepare plot dataframe
states_mds_df <- as.data.frame(states_mds)
states_mds_df$State <- rownames(USArrests)
# Plot MDS map
library(ggplot2)
ggplot(states_mds_df, aes(V1, V2, label = State)) +
  geom_point(color = "firebrick", size = 2) +
  geom_text(hjust = 0.5, vjust = -1, size = 3) +
  labs(title = "MDS Map of U.S. States Based on Arrest Statistics",
       x = "Dimension 1", y = "Dimension 2")
In Task 3, I selected the built-in USArrests dataset, which contains
1973 arrest rates per 100,000 residents for murder, assault, and rape,
along with the percentage of the population living in urban areas, for
all 50 U.S. states. After standardizing the four numeric variables,
metric MDS was applied to visualize the pairwise similarities between
states. The resulting spatial map shows how states group based on their
arrest-rate profiles and degree of urbanization. States like Florida,
California, and Nevada appear isolated, indicating distinct patterns in
arrest rates or urban concentration. This type of analysis can be
useful for criminology research, public safety planning, or geographic
segmentation in policy development or social marketing.
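Because the distances in this task are Euclidean, classical MDS should reproduce the first two principal component scores of the same standardized data, up to a sign flip on each axis. The sketch below checks this on USArrests; the names mds_points, pca_fit, cor_dim1, and cor_dim2 are illustrative and do not appear in the analysis above.

```r
# Classical MDS on Euclidean distances vs. PCA on the same scaled data
usa_scaled <- scale(USArrests)

mds_points <- cmdscale(dist(usa_scaled), k = 2)
pca_fit <- prcomp(usa_scaled) # data are already centered and scaled

# Absolute correlation per axis; the sign of each axis is arbitrary
cor_dim1 <- abs(cor(mds_points[, 1], pca_fit$x[, 1]))
cor_dim2 <- abs(cor(mds_points[, 2], pca_fit$x[, 2]))
round(c(cor_dim1, cor_dim2), 6)

# Largest coordinate difference after ignoring sign
max_diff <- max(abs(abs(mds_points) - abs(pca_fit$x[, 1:2])))
max_diff
```

If both correlations are essentially 1, the MDS dimensions can be interpreted through the PCA loadings in pca_fit$rotation, which indicate which variables drive each axis.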