In response to changes in the European small car market and the success of the Renault Twingo, Ford decided to launch a second small car, the Ford Ka. In the 1990s, Ford was the third-largest car manufacturer in Europe, and the Ford Fiesta – Ford’s existing small car – was its best-selling model.
The Ford Ka had already been developed, the production capacity determined, and the launch set for October 1996 in France. Ford Ka France retained the highly reputable Goldfarb Market Research to conduct a market research study to develop a marketing strategy for the launch of the Ford Ka. Goldfarb used both secondary and primary research to inform its marketing strategy recommendations. The secondary research drew on publicly available car sales data broken down by demographics, along with other relevant information.
Background and Secondary Research
To inform their thinking, the market research firm relied on data and other information that Ford had collected previously or that were publicly available.
The prevailing belief was that the size of a car was strongly correlated with production costs and thus price. Small cars sold to younger, lower-income buyers, while large cars sold mostly to older, wealthier buyers and families. In previous marketing efforts, Ford had mainly relied on a breakdown of small car buyers in France by household income, age, and life stage.
During the 1980s and 1990s, a series of environmental and demographic changes significantly affected the French car market. Increased road congestion and the problem of parking in large cities made small cars increasingly attractive to customers. A further advantage of small cars was their low fuel consumption. In France, fuel prices remained high throughout the 1990s due to tax accounting for more than 85% of the retail price of fuel. Needless to say, environmentally conscious people also appreciated the reduced toxic emissions associated with smaller cars.
There were also important demographic changes that made small cars a more attractive option to the buying public in the mid-1990s. Exhibit 1 shows the changes in household demographics for France. As Exhibit 1 indicates, the percentage of couples with children has been decreasing, increasing the feasibility of small cars as a primary source of family transportation. In addition, the rise in the number of working women in France led to an increase in the number of women car buyers. This, too, strengthened demand for small cars, as women tend to have a relatively higher preference for small cars. These changes brought new buyers to the small car market and raised questions about the underlying rationale for the traditional, demographics-based segmentation of the market by car size. In other words, traditional product categories based on car size, engine output, and price were no longer appropriate by themselves.
By 1996, car manufacturers were confronted with an increasingly fragmented small car market, reflecting an increase in the variety of consumer needs and manufacturers’ positioning strategies. For many customers considering the purchase of a small car, price was no longer the most important factor. Older customers who valued the advantages of smaller cars expected features similar to larger models due to (in many cases) their previous experience with large cars. Positioning research conducted for Ford in January 1996 showed that future small car buyers “…will be looking for more space/functionality, power steering, greater comfort (air conditioning) and greater performance (power).”
Given the challenge from the Renault Twingo and the importance of the small car market, Ford needed to respond quickly. Ford management therefore decided to develop the Ka using the same chassis as the Fiesta. While this saved time and cut the development costs to $250 million (instead of the usual $1 billion or more to develop a new car), it restricted Ford’s ability to build the Ka on the basis of technical innovation and market input. Instead, the idea was to use innovative styling, features, and maneuverability as the basis for marketing the Ka. Ford had also already allocated a production capacity of 250,000 units per year, which corresponded to the sales objectives for Europe. Exhibit 2 provides detailed information about the Ford Ka.
Primary Research
The market research to identify customers for the Ka was conducted in two stages. First, following Ford’s standard approach, reactions and attitudes of potential customers towards the Ka were elicited in focus groups drawn from three potential target groups:
- First-time buyers
- Single working professionals
- Multi-car households
Thirty focus group sessions were conducted. Focus group participants were recruited based on demographic and lifestyle information from Ford’s existing buyer surveys. In addition, more structured personal interviews with potential customers were conducted to better understand the competitors in the small car market and customer perceptions of the Ka relative to its potential rivals. Through these interviews, a series of customer characteristics and general attitudes were collected.
Question # | Statement |
---|---|
1 | I want a car that is trendy. |
2 | I am fashion conscious. |
3 | I do not have the time to worry about car maintenance. |
4 | Basic transportation is all I need. |
5 | Small cars are not prestigious. |
6 | Today’s cars last much longer than yesterday’s. |
7 | My car must function with total reliability. |
8 | I want a car that is easy to handle. |
9 | I am looking for a car which delivers a smooth ride. |
10 | When I buy a car, dependability is most important to me. |
11 | Today’s cars are more efficient than yesterday’s. |
12 | I want a car that is fuel economic. |
13 | I love to drive. |
14 | The car I buy must be able to handle long motorway journeys. |
15 | I want the most equipment I can get for my money. |
16 | I want a vehicle that is environmentally friendly. |
17 | I want a car that is nippy and zippy. |
18 | I prefer buying my next car from the same car manufacturer. |
19 | I wish there were stricter exhaust regulations. |
20 | One should not spend beyond one’s means. |
21 | Good aerodynamics help fuel economy. |
22 | Small cars are much safer nowadays. |
23 | Buying a car on a lower interest rate does not interest me. |
24 | I want a car that drives well on country roads. |
25 | I consider myself an authority on cars. |
26 | Small cars are for kids. |
27 | Small cars are for women. |
28 | Domestic made is best made. |
29 | A car is a fashion accessory to me. |
30 | Having a masculine car is important to me. |
31 | I want a comfortable car. |
32 | City driving is my main concern. |
33 | Fuel economy comes at the expense of performance. |
34 | I want a practical car. |
35 | I have always been fascinated by cars which have a cult following. |
36 | I like to believe that the car I drive will one day become a cult car. |
37 | I prefer cars with high performance. |
38 | I do not believe that a Swatch branded car will be successful. |
39 | Small cars take up less room in today’s traffic. |
40 | I prefer small cars. |
41 | In today’s world it is anti-social to drive big cars. |
42 | Many manufacturers do not really care about their customers’ needs. |
43 | I would rather deal with a manufacturer’s rep than a salesperson. |
44 | I want to buy a car that makes a statement about me. |
45 | A car is an extension of oneself. |
46 | I always want the latest style and design in a vehicle. |
47 | When it comes to cars my heart rules my head. |
48 | My car must have a very individual interior. |
49 | Nowadays smart cars are mainly foreign brands. |
50 | People ought to buy domestic products for the good of the country. |
51 | I want a car equipped with the latest features and technology. |
52 | I have a relationship with my car. |
53 | Quality and reliability of products are my main concerns. |
54 | Image is not important to me in a car. |
55 | Cars all look the same these days. |
56 | Most environmentally friendly products do not perform as well as those they replaced. |
57 | I want a car that has character. |
58 | For me a car is a symbol of freedom and independence. |
59 | I am interested in car maintenance. |
60 | When buying a car I only consider a national make. |
61 | The government should implement policies that favor public transportation. |
62 | The government is right to tax large cars more heavily than small cars. |
Data
The dataset contains responses from 250 potential small car buyers to different tasks and questions. First, it contains each respondent’s preference ranking of the Ford Ka within a set of 10 small cars, along with six demographic variables describing these respondents: age, gender, marital status, number of children in the household, first-time car purchase, and household income. Second, it contains the responses of the same 250 respondents to a set of 62 attitudinal questions about cars and driving.
# clean workspace
rm(list = ls())
# Set the seed for reproducibility
set.seed(123)
# Prepare needed libraries
packages <- c("ggplot2", # Visualizations
"dplyr", # Data manipulation
"tidyr", # Data formatting
"stargazer", # LaTex tables
"knitr", # markdown output formatting
"factoextra", # clustering methods
"readxl", # read in excel files
"nnet", # mutinomial logit
"Hmisc", #
"psych" # psycholoical research
)
for (i in 1:length(packages)) {
if (!packages[i] %in% rownames(installed.packages())) {
install.packages(packages[i]
, repos = "http://cran.rstudio.com/"
, dependencies = TRUE
)
}
library(packages[i], character.only = TRUE)
}
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
# Set working directory for chunk
setwd("~/Documents/MarketResearch")
# load data from excel file and then make them into a data frame
demographic <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Demographic Data",
             range = "A7:J257")
)
mds <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "MDS Data",
             range = "A12:M1892")
)
## New names:
## • `` -> `...3`
pys_data <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Psychographic Data",
             range = "A7:BK257")
)
questions <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Psychographic questionnaire",
             range = "A7:B69")
)
df <- merge(pys_data, demographic, by = "Respondent.Number", all.x = TRUE)
Demographic Data | |
---|---|
Preference Group | |
1 | Ka Chooser (top 3) |
2 | Ka Non-Chooser (bottom 3) |
3 | Middle (middle 4) |
Gender | |
1 | Male |
2 | Female |
Marital Status | |
1 | Married |
2 | Living Together |
3 | Single |
Number of Children | |
Number of children living in household | |
1st Time Purchase | |
1 | yes |
2 | no |
Age Category | Age |
1 | <25 |
2 | 25 - 29 |
3 | 30 - 34 |
4 | 35 - 39 |
5 | 40 - 44 |
6 | >44 |
Children Category | Number of Children in Household |
0 | 0 |
1 | 1 |
2 | >1 |
Income Category | Annual Household Income |
1 | <100K |
2 | 100K - 150K |
3 | 150K - 200K |
4 | 200K - 250K |
5 | 250K - 300K |
6 | >300K |
# make dependent variable factor
df$Preference.Group <- factor(df$Preference.Group, levels = c(1, 2, 3), labels = c("Ka Chooser", "Ka Non-Chooser", "Middle"))
# summary stats for our data sets
stargazer(df[2:62], type = "text", title = "Survey Questions")
##
## Survey Questions
## ====================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------
## Q1 250 5.100 1.537 1 7
## Q2 250 4.060 1.966 1 7
## Q3 250 4.444 1.289 1 7
## Q4 250 4.236 1.629 1 7
## Q5 250 3.848 1.835 1 7
## Q6 250 3.992 1.041 1 7
## Q7 250 3.880 0.966 2 6
## Q8 250 3.916 0.996 1 7
## Q9 250 3.904 1.093 1 7
## Q10 250 3.916 1.036 1 7
## Q11 250 3.984 1.068 2 7
## Q12 250 4.072 1.135 1 7
## Q13 250 3.988 1.146 1 6
## Q14 250 4.132 2.089 1 7
## Q15 250 4.972 1.407 2 7
## Q16 250 4.512 1.345 1 7
## Q17 250 4.444 1.880 1 7
## Q18 250 4.532 1.354 1 7
## Q19 250 4.688 1.314 1 7
## Q20 250 3.832 1.938 1 7
## Q21 250 4.912 1.389 2 7
## Q22 250 4.992 1.365 1 7
## Q23 250 4.120 1.976 1 7
## Q24 250 2.376 1.272 1 6
## Q25 250 3.148 1.347 1 7
## Q26 250 3.012 1.393 1 7
## Q27 250 3.460 1.323 1 7
## Q28 250 3.120 1.398 1 7
## Q29 250 3.448 1.344 1 7
## Q30 250 3.344 1.290 1 6
## Q31 250 4.056 2.072 1 7
## Q32 250 4.604 1.311 2 7
## Q33 250 4.564 1.279 1 7
## Q34 250 4.496 1.284 1 7
## Q35 250 4.584 1.297 1 7
## Q36 250 4.452 1.261 2 7
## Q37 250 4.836 1.476 1 7
## Q38 250 4.616 1.294 2 7
## Q39 250 3.444 1.376 1 7
## Q40 250 3.368 1.261 1 7
## Q41 250 3.912 2.098 1 7
## Q42 250 3.148 1.399 1 7
## Q43 250 3.392 1.313 1 7
## Q44 250 4.260 1.903 1 7
## Q45 250 4.744 1.456 1 7
## Q46 250 4.752 1.476 1 7
## Q47 250 4.768 1.568 1 7
## Q48 250 4.776 1.533 1 7
## Q49 250 4.776 1.504 2 7
## Q50 250 4.812 1.478 1 7
## Q51 250 3.308 1.455 1 7
## Q52 250 3.532 1.824 1 7
## Q53 250 3.616 1.829 1 7
## Q54 250 3.160 1.469 1 7
## Q55 250 3.136 1.493 1 6
## Q56 250 3.148 1.512 1 7
## Q57 250 4.316 1.347 1 7
## Q58 250 4.384 1.325 2 7
## Q59 250 4.320 1.265 1 7
## Q60 250 3.772 1.377 1 7
## Q61 250 3.680 1.293 1 7
## ------------------------------------
stargazer(df[64:72], type = "text", title = "Demographic Data")
##
## Demographic Data
## ==============================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------
## Gender 250 1.480 0.501 1 2
## Age 250 36.364 9.107 20 58
## Marital.Status 250 1.872 0.935 1 3
## Number.of.Children 250 0.728 1.036 0 4
## X1st.Time.Purchase 250 1.852 0.356 1 2
## Age.Category 250 3.768 1.619 1 6
## Children.Category 250 0.624 0.818 0 2
## Income.Category 250 3.680 1.571 1 6
## ----------------------------------------------
The chi-squared tests on the cross tabulations assess whether the selected variables are associated with the car preference groups. For most of the variables we fail to reject the null hypothesis that the variables are independent, i.e., no association. Based on the chi-squared test results for our crosstabs, First Time Purchase stands out as a potentially useful demographic variable for differentiating between the preference groups, while the other variables do not show significant associations at the .05 level. Age category and gender show significance at the .10 level, but that does not give us quite enough confidence.
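The crosstab chi-squared tests referenced above are not shown in this section; a minimal sketch of how one such test could be run (using the column names defined earlier, here for First Time Purchase) is:
# cross-tabulate first-time purchase against preference group and test for association
ct <- table(df$X1st.Time.Purchase, df$Preference.Group)
ct
chisq.test(ct)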
# Fit multinomial logistic regression model
model <- multinom(Preference.Group ~ Gender + Marital.Status + Number.of.Children + X1st.Time.Purchase + Age.Category + Income.Category, data = df)
## # weights: 24 (14 variable)
## initial value 274.653072
## iter 10 value 252.853559
## final value 251.798781
## converged
# Summary of the model
stargazer(model,type = "text")
##
## ===============================================
## Dependent variable:
## ----------------------------
## Ka Non-Chooser Middle
## (1) (2)
## -----------------------------------------------
## Gender -0.104 -0.880**
## (0.310) (0.347)
##
## Marital.Status 0.240 0.264
## (0.164) (0.178)
##
## Number.of.Children -0.064 -0.030
## (0.147) (0.164)
##
## X1st.Time.Purchase -0.073 -1.065**
## (0.493) (0.450)
##
## Age.Category 0.088 -0.190*
## (0.099) (0.105)
##
## Income.Category -0.128 -0.064
## (0.099) (0.105)
##
## Constant -0.455 3.035**
## (1.215) (1.184)
##
## -----------------------------------------------
## Akaike Inf. Crit. 531.598 531.598
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The multinomial model allows us to understand the joint effects of the covariates and to take advantage of the correlation in our data set. The multinomial logit is an extension of the logit model that allows us to compare more than two categories. In the context of this business question, we can use it to find relationships between the outcome categories and the independent variables.
Interestingly, the crosstab chi-squared tests and the regression point to basically the same variables in terms of importance: the regression identifies the same variables as influential. Women still appear less likely to buy the Ford Ka than men, and as individuals get older the odds that they buy a Ford Ka decrease.
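Since multinom() reports coefficients as log-odds relative to the baseline category (Ka Chooser), a minimal sketch for converting the fitted model above into odds ratios and approximate Wald p-values:
# odds ratios relative to the "Ka Chooser" baseline
exp(coef(model))
# approximate two-sided p-values from Wald z-statistics
z <- summary(model)$coefficients / summary(model)$standard.errors
2 * (1 - pnorm(abs(z)))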
Factor analysis can be a powerful tool for both customer segmentation and interpreting survey questions. It helps simplify complex data, uncover hidden patterns, and provide insight into customer preferences. This tool is especially helpful in our case because of the high number of questions: we hope to better understand the unobserved beliefs and attitudes of our sample, which will help us interpret the results of the cluster analysis later on.
# Provide an example of the survey data
head(df[2:5])
## Q1 Q2 Q3 Q4
## 1 6 2 4 3
## 2 7 7 7 5
## 3 5 4 6 5
## 4 4 2 5 4
## 5 5 5 7 6
## 6 6 6 4 4
Our survey data are Likert-scale data, which can often be treated as interval data under the assumption that the intervals between response options are equal. This assumption is generally considered reasonable for Likert scales with five or more response options, such as ours.
# Pearson correlation used to create a square correlation matrix
corr_matrix <- as.data.frame(cor(df[2:62]))
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy tests whether our data are suitable for factor analysis.
KMO Value | Interpretation |
---|---|
> 0.9 | Marvelous |
> 0.80 | Meritorious |
> 0.70 | Middling |
> 0.60 | Mediocre |
> 0.50 | Miserable |
< 0.50 | Unacceptable |
# Kaiser, Meyer, Olkin Measure of Sampling Adequacy
KMO(corr_matrix)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = corr_matrix)
## Overall MSA = 0.95
## MSA for each item =
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
## 0.95 0.97 0.93 0.93 0.95 0.35 0.51 0.57 0.53 0.54 0.35 0.47 0.39 0.96 0.95 0.92
## Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 0.95 0.91 0.94 0.97 0.96 0.96 0.97 0.96 0.94 0.94 0.92 0.94 0.93 0.92 0.96 0.91
## Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48
## 0.95 0.95 0.94 0.93 0.94 0.95 0.92 0.91 0.96 0.91 0.92 0.95 0.97 0.95 0.97 0.96
## Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60 Q61
## 0.97 0.97 0.97 0.98 0.97 0.97 0.96 0.97 0.90 0.85 0.88 0.91 0.92
# get eigenvalues of the correlation matrix
ev <- eigen(corr_matrix, symmetric = TRUE)
ev$values
## [1] 16.53133961 10.76348062 5.52320090 1.46448014 1.27596377 1.21277417
## [7] 1.16913747 1.12298138 1.01034021 0.97680604 0.89987797 0.84614765
## [13] 0.79386927 0.76649120 0.73966062 0.72385450 0.69953134 0.64355283
## [19] 0.63649088 0.62564452 0.60636969 0.58976371 0.56902964 0.54626872
## [25] 0.52157922 0.51461163 0.49515295 0.46637663 0.45699510 0.44306563
## [31] 0.41499714 0.40119046 0.39764526 0.37740356 0.36108326 0.34455595
## [37] 0.32981949 0.32343878 0.31899475 0.30304474 0.29428602 0.27931649
## [43] 0.26084867 0.25932726 0.23781229 0.23303562 0.22466281 0.21078973
## [49] 0.20458061 0.19155644 0.18296289 0.16257573 0.15271931 0.14826550
## [55] 0.13985984 0.12503473 0.12238596 0.10675513 0.09262940 0.08801333
## [61] 0.07557084
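As a quick supplementary check on how many factors might be retained (the analysis below keeps 3, based on the large share of total variation captured by the first three eigenvalues), a minimal sketch using the eigenvalues computed above:
# Kaiser criterion: count eigenvalues greater than 1
sum(ev$values > 1)
# cumulative proportion of total variance explained by successive factors
head(cumsum(ev$values) / sum(ev$values), 10)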
# plot eigenvalues in a scree plot (pc = FALSE because we are doing factor analysis)
scree(corr_matrix, pc = FALSE)
Nfacs <- 3 # number of factors, chosen from the eigenvalues and total variation explained
fit <- factanal(df[2:63],
                factors = Nfacs,
                rotation = "varimax")
print(fit,
      digits = 2,
      cutoff = 0.3,
      sort = TRUE)
##
## Call:
## factanal(x = df[2:63], factors = Nfacs, rotation = "varimax")
##
## Uniquenesses:
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
## 0.35 0.16 0.53 0.30 0.27 0.99 0.97 0.98 0.98 0.99 0.99 0.99 0.99 0.12 0.39 0.53
## Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 0.17 0.52 0.58 0.20 0.40 0.41 0.18 0.31 0.37 0.44 0.50 0.41 0.57 0.51 0.12 0.57
## Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48
## 0.58 0.55 0.52 0.49 0.39 0.56 0.55 0.64 0.12 0.41 0.55 0.17 0.38 0.34 0.38 0.38
## Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60 Q61 Q62
## 0.42 0.40 0.34 0.21 0.22 0.39 0.34 0.38 0.62 0.58 0.59 0.57 0.51 0.65
##
## Loadings:
## Factor1 Factor2 Factor3
## Q1 0.64 -0.31 0.38
## Q2 0.91
## Q14 -0.88
## Q15 -0.61 -0.40
## Q18 -0.53 -0.30 -0.33
## Q19 -0.52
## Q20 -0.89
## Q21 -0.62 -0.37
## Q22 -0.56 -0.44
## Q23 0.90
## Q25 0.55 0.51
## Q26 0.57 0.40
## Q28 0.57 0.46
## Q31 -0.60 0.53 0.50
## Q41 0.63 -0.50 -0.47
## Q44 0.65 -0.63
## Q45 0.73
## Q46 0.76 -0.30
## Q47 0.75
## Q48 0.72
## Q49 0.70
## Q50 0.71
## Q51 -0.76
## Q52 -0.70 0.54
## Q53 -0.70 0.54
## Q54 -0.76
## Q55 -0.79
## Q56 -0.72
## Q3 0.67
## Q4 0.63 -0.55
## Q5 0.35 0.75
## Q17 -0.36 -0.81
## Q24 -0.31 0.62 0.45
## Q32 0.64
## Q33 0.64
## Q34 0.66
## Q35 0.69
## Q36 0.70
## Q38 0.66
## Q39 -0.66
## Q40 -0.60
## Q43 -0.66
## Q37 0.51 0.55
## Q42 -0.49 -0.57
## Q57 0.58
## Q58 0.64
## Q59 0.60
## Q60 -0.65
## Q61 -0.65
## Q62 -0.56
## Q6
## Q7
## Q8
## Q9
## Q10
## Q11
## Q12
## Q13
## Q16 -0.50 -0.33 -0.34
## Q27 0.46 0.41 0.35
## Q29 0.44 0.37 0.30
## Q30 0.50 0.32 0.37
##
## Factor1 Factor2 Factor3
## SS loadings 15.35 10.43 6.18
## Proportion Var 0.25 0.17 0.10
## Cumulative Var 0.25 0.42 0.52
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 1697.61 on 1708 degrees of freedom.
## The p-value is 0.566
# create factor loadings data frame
load <- as.data.frame(fit$loadings[,1:3])
# Plot first two Factor loadings
ggplot(load, aes(x = Factor1, y = Factor2, label = rownames(load))) +
geom_text(size = 3, hjust = 0, vjust = 0) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
labs(x = "Factor 1", y = "Factor 2", title = "Factor Loadings Plot") +
theme_update()
# Set the number of clusters (k)
k1 <- 3
# Run k-means clustering
kmeans_question_result <- kmeans(load[,1:2], centers = k1)
# View the clustering results
print(kmeans_question_result)
## K-means clustering with 3 clusters of sizes 18, 28, 16
##
## Cluster means:
## Factor1 Factor2
## 1 0.1232949 0.5419066
## 2 0.3165536 -0.2018250
## 3 -0.6558390 -0.1017303
##
## Clustering vector:
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
## 2 2 1 1 1 2 2 1 2 2 2 2 2 3 3 3 3 3 3 3
## Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32 Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40
## 3 3 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 2
## Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48 Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60
## 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 2 2 2
## Q61 Q62
## 2 2
##
## Within cluster sum of squares by cluster:
## [1] 2.136509 5.037992 2.052705
## (between_SS / total_SS = 64.1 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Create a vector of total within-cluster sum of squares
wss_values <- numeric(10)
for(k in 1:10) {
kmeans_temp <- kmeans(load[,1:2], centers = k)
wss_values[k] <- kmeans_temp$tot.withinss
}
# Plot the scree plot
plot(1:10, wss_values, type = "b", pch = 19, frame = FALSE,
xlab = "Number of Clusters (k)", ylab = "Total Within Sum of Squares",
main = "Scree Plot for K-Means Clustering")
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_question_result,
data = load[,1:2],
geom = "point",
stand = FALSE,
main = "Kmeans Cluster Plot of Questions"
)
We have 62 questions in total, so it is helpful to break the questions down into the underlying attitudes that each question is trying to measure. Each of the 3 clusters captures an unobserved attitude in our sample: we used factor analysis to find the structure of correlations between the questions and then clustered the questions to make interpretation easier. We have labeled each cluster based on the content of the questions it contains.
# make a data frame with the question descriptions and attach each question's cluster label
# (questions_df is assumed to be the questionnaire sheet loaded above, with one row per statement)
questions_df <- questions
names(questions_df) <- c("Question_Number", "Statement")
questions_df$cluster <- kmeans_question_result$cluster
# Iterate over unique cluster labels
unique_clusters <- unique(questions_df$cluster)
for (cluster_label in unique_clusters) {
cat("Cluster", cluster_label, " :\n")
# Filter DataFrame for current cluster
cluster_df <- subset(questions_df, cluster == cluster_label)
print(cluster_df)
}
## Cluster 2 :
## Question_Number
## 1 1
## 2 2
## 6 6
## 7 7
## 9 9
## 10 10
## 11 11
## 12 12
## 13 13
## 23 23
## 39 39
## 40 40
## 41 41
## 42 42
## 43 43
## 44 44
## 45 45
## 46 46
## 47 47
## 48 48
## 49 49
## 50 50
## 57 57
## 58 58
## 59 59
## 60 60
## 61 61
## 62 62
## Statement
## 1 I want a car that is trendy.
## 2 I am fashion conscious.
## 6 Today's cars last much longer than yesterday's.
## 7 My car must function with total reliability.
## 9 I am looking for a car which delivers a smooth ride.
## 10 When I buy a car, dependability is most important to me.
## 11 Today's cars are more efficient than yesterday's.
## 12 I want a car that is fuel economic.
## 13 I love to drive.
## 23 Buying a car on a lower interest rate does not interest me.
## 39 Small cars take up less room in today's traffic.
## 40 I prefer small cars.
## 41 In today's world it is anti-social to drive big cars.
## 42 Many manufacturers do not really care about their customers needs.
## 43 I would rather deal with a manufacturer's rep than a salesperson.
## 44 I want to buy a car that makes a statement about me.
## 45 A car is an extension of oneself.
## 46 I always want the latest style and design in a vehicle.
## 47 When it comes to cars my heart rules my head.
## 48 My car must have a very individual interior.
## 49 Nowadays smart cars are mainly foreign brands.
## 50 People ought to buy domestic products for the good of the country.
## 57 I want a car that has character.
## 58 For me a car is a symbol of freedom and independence.
## 59 I am interested in car maintenance.
## 60 When buying a car I only consider a national make.
## 61 The government should implement policies that favor public transportation.
## 62 The government is right to tax large cars more heavily than small cars.
## cluster
## 1 2
## 2 2
## 6 2
## 7 2
## 9 2
## 10 2
## 11 2
## 12 2
## 13 2
## 23 2
## 39 2
## 40 2
## 41 2
## 42 2
## 43 2
## 44 2
## 45 2
## 46 2
## 47 2
## 48 2
## 49 2
## 50 2
## 57 2
## 58 2
## 59 2
## 60 2
## 61 2
## 62 2
## Cluster 1 :
## Question_Number
## 3 3
## 4 4
## 5 5
## 8 8
## 24 24
## 25 25
## 26 26
## 27 27
## 28 28
## 29 29
## 30 30
## 32 32
## 33 33
## 34 34
## 35 35
## 36 36
## 37 37
## 38 38
## Statement
## 3 I do not have the time to worry about car maintenance.
## 4 Basic transportation is all I need.
## 5 Small cars are not prestigious.
## 8 I want a car that is easy to handle.
## 24 I want a car that drives well on country roads.
## 25 I consider myself an authority on cars.
## 26 Small cars are for kids.
## 27 Small cars are for women.
## 28 Domestic made is best made.
## 29 A car is a fashion accessory to me.
## 30 Having a masculine car is important to me.
## 32 City driving is my main concern.
## 33 Fuel economy comes at the expense of performance.
## 34 I want a practical car.
## 35 I have always been fascinated by cars which have a cult following.
## 36 I like to believe that the car I drive will one day become a cult car.
## 37 I prefer cars with high performance.
## 38 I do not believe that a Swatch branded car will be successful.
## cluster
## 3 1
## 4 1
## 5 1
## 8 1
## 24 1
## 25 1
## 26 1
## 27 1
## 28 1
## 29 1
## 30 1
## 32 1
## 33 1
## 34 1
## 35 1
## 36 1
## 37 1
## 38 1
## Cluster 3 :
## Question_Number
## 14 14
## 15 15
## 16 16
## 17 17
## 18 18
## 19 19
## 20 20
## 21 21
## 22 22
## 31 31
## 51 51
## 52 52
## 53 53
## 54 54
## 55 55
## 56 56
## Statement
## 14 The car I buy must be able to handle long motorway journeys.
## 15 I want the most equipment I can get for my money.
## 16 I want a vehicle that is environmentally friendly.
## 17 I want a car that is nippy and zippy.
## 18 I prefer buying my next car from the same car manufacturer.
## 19 I wish there were stricter exhaust regulations.
## 20 One should not spend beyond ones means.
## 21 Good aerodynamics help fuel economy.
## 22 Small cars are much safer nowadays.
## 31 I want a comfortable car.
## 51 I want a car equipped with the latest features and technology.
## 52 I have a relationship with my car.
## 53 Quality and reliability of products are my main concerns.
## 54 Image is not important to me in a car.
## 55 Cars all look the same these days.
## 56 Most environmentally friendly products do not perform as well as those they replaced
## cluster
## 14 3
## 15 3
## 16 3
## 17 3
## 18 3
## 19 3
## 20 3
## 21 3
## 22 3
## 31 3
## 51 3
## 52 3
## 53 3
## 54 3
## 55 3
## 56 3
With 3 clusters we see that there may be three unobserved attitudes that our survey is trying to measure. These unobserved attitudes could be useful for two things: marketing, and understanding the customer segmentation we carry out later.
The first cluster printed above (cluster 2) seems to measure whether people feel that the car they buy is a reflection of their self-image. Questions in this group seek to understand the way people relate to their cars.
The second cluster printed (cluster 1) seeks to understand individuals' transportation needs. These questions get at what customers expect of their cars and are less about how the cars make them feel.
The third cluster (cluster 3) is a bit of a mess. This could be due to survey design or to the assumptions made in the factor analysis. There are few recognizable patterns in the questions here, so we will call this cluster the Misc category.
# z-score normalization
norm_df <- scale(pys_data[2:63])
# Set the number of clusters (k)
k <- 3
# Run k-means clustering
kmeans_result <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result)
## K-means clustering with 3 clusters of sizes 78, 107, 65
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9189956 1.2478298 -0.4339693 -0.1291026 0.08983196 -0.01694160
## 2 -0.2413421 -0.8197727 -0.3879569 -0.5692846 -0.78308955 0.02563333
## 3 -0.7055084 -0.1479238 1.1593999 1.0920532 1.18128752 -0.02186649
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.15075278 -0.1601580588 0.15814888 0.13058472 0.09902469 -0.06344246
## 2 -0.05959071 -0.0001125489 -0.04041168 -0.03619340 -0.01127253 0.01067256
## 3 -0.08280787 0.1923749433 -0.12325482 -0.09712177 -0.10027330 0.05856227
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 -0.06784445 -1.25377987 -0.7456005 -0.4473847 -0.2021133 -0.4972018
## 2 -0.04661848 0.95682895 0.8901652 0.5921163 0.9372114 0.5942913
## 3 0.15815453 -0.07055181 -0.5706283 -0.4378529 -1.3002583 -0.3816528
## Q19 Q20 Q21 Q22 Q23 Q24 Q25
## 1 -0.5335093 -1.1968542 -0.7676171 -0.6609645 1.1978664 -0.83001425 0.6041397
## 2 0.6074743 0.8005130 0.8710725 0.8753576 -0.7984887 -0.02375309 -0.9010950
## 3 -0.3597850 0.1184574 -0.5127788 -0.6478159 -0.1230045 1.03511834 0.7583735
## Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 1 0.7002076 0.3499128 0.6571490 0.3534616 0.4389293 -1.2025333 -0.4021197
## 2 -0.8541464 -0.5806311 -0.8749025 -0.5558236 -0.5781547 0.2977006 -0.3752446
## 3 0.5658073 0.5359129 0.6516453 0.4908172 0.4250165 0.9529789 1.1002539
## Q33 Q34 Q35 Q36 Q37 Q38 Q39
## 1 -0.3306456 -0.4262422 -0.3712513 -0.4296230 -0.68811960 -0.4164967 0.4318999
## 2 -0.4262760 -0.3717443 -0.4287214 -0.4177455 -0.01552838 -0.3676184 0.3835858
## 3 1.0984905 1.1234389 1.1512429 1.2032210 0.85130562 1.1049525 -1.1497212
## Q40 Q41 Q42 Q43 Q44 Q45 Q46
## 1 0.3587600 1.2335194 0.56308163 0.3360024 1.1835836 1.1622906 1.1927935
## 2 0.3750989 -0.3366881 0.07456546 0.4486609 -0.1415085 -0.5046888 -0.4777561
## 3 -1.0479825 -0.9259828 -0.79844418 -1.1417678 -1.1873555 -0.5639533 -0.6448922
## Q47 Q48 Q49 Q50 Q51 Q52 Q53
## 1 1.1533235 1.1661246 1.1290136 1.1331727 -1.1983904 -1.1492991 -1.1357711
## 2 -0.6147888 -0.4816946 -0.4785828 -0.4797348 0.5847159 0.6972978 0.7055274
## 3 -0.3719512 -0.6064062 -0.5669954 -0.5700900 0.4755361 0.2312995 0.2015186
## Q54 Q55 Q56 Q57 Q58 Q59 Q60
## 1 -1.1559146 -1.1984553 -1.1659789 -0.2345310 -0.2317600 -0.1820337 0.26802998
## 2 0.5780356 0.5785220 0.5262789 0.3758655 0.3026818 0.3676517 -0.25525667
## 3 0.4355620 0.4858101 0.5328387 -0.3372953 -0.2201488 -0.3867709 0.09855577
## Q61 Q62
## 1 0.4754763 0.3113705
## 2 -0.4463421 -0.3658248
## 3 0.1641761 0.2285592
##
## Clustering vector:
## [1] 2 1 3 2 3 1 1 1 2 2 1 3 2 2 1 2 1 2 1 2 3 3 1 3 3 2 3 2 2 2 3 1 2 2 3 3 3
## [38] 3 1 2 1 2 3 2 3 3 2 2 3 2 2 2 3 3 2 2 2 3 1 3 2 2 1 2 2 3 1 1 2 1 1 2 2 3
## [75] 1 2 2 3 2 2 1 3 1 1 3 1 2 1 2 3 2 2 3 2 3 1 2 2 1 2 3 3 3 1 2 1 2 3 1 1 3
## [112] 2 2 1 1 3 2 1 1 1 1 2 1 3 2 3 2 1 2 3 3 2 1 2 2 1 3 2 3 2 2 1 2 2 2 3 2 1
## [149] 1 1 1 2 1 1 1 2 1 1 2 2 2 3 1 1 1 1 1 3 1 2 1 2 1 1 3 1 2 2 3 3 2 2 3 2 3
## [186] 1 3 3 2 2 3 3 1 1 3 3 2 2 2 2 3 2 3 2 2 3 2 2 2 1 2 1 2 2 2 3 1 2 2 1 3 1
## [223] 3 1 1 1 1 2 2 2 2 2 2 2 3 2 2 1 1 2 2 1 3 2 1 1 3 2 3 1
##
## Within cluster sum of squares by cluster:
## [1] 2367.564 4517.195 1940.985
## (between_SS / total_SS = 42.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Create a vector of total within-cluster sum of squares
wss_values <- numeric(10)
for(k in 1:10) {
kmeans_temp <- kmeans(norm_df, centers = k)
wss_values[k] <- kmeans_temp$tot.withinss
}
# Plot the scree plot
plot(1:10, wss_values, type = "b", pch = 19, frame = FALSE,
xlab = "Number of Clusters (k)", ylab = "Total Within Sum of Squares",
main = "Scree Plot for K-Means Clustering")
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Set the number of clusters (k)
k <- 4
# Run k-means clustering
kmeans_result4 <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result4)
## K-means clustering with 4 clusters of sizes 43, 65, 35, 107
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9182198 1.2117301 -0.4707469 -0.1020211 0.03214404 0.07468416
## 2 -0.7055084 -0.1479238 1.1593999 1.0920532 1.18128752 -0.02186649
## 3 0.9199487 1.2921808 -0.3887855 -0.1623742 0.16070570 -0.12951039
## 4 -0.2413421 -0.8197727 -0.3879569 -0.5692846 -0.78308955 0.02563333
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.05199564 0.4343785863 0.25794903 0.26066983 -0.35524519 0.24393378
## 2 -0.08280787 0.1923749433 -0.12325482 -0.09712177 -0.10027330 0.05856227
## 3 0.27208299 -0.8905887941 0.03553727 -0.02923412 0.65712768 -0.44107612
## 4 -0.05959071 -0.0001125489 -0.04041168 -0.03619340 -0.01127253 0.01067256
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 0.17282911 -1.22095365 -0.6909217 -0.5708548 -0.1001188 -0.4789201
## 2 0.15815453 -0.07055181 -0.5706283 -0.4378529 -1.3002583 -0.3816528
## 3 -0.36352910 -1.29410922 -0.8127774 -0.2956928 -0.3274207 -0.5196622
## 4 -0.04661848 0.95682895 0.8901652 0.5921163 0.9372114 0.5942913
## Q19 Q20 Q21 Q22 Q23 Q24 Q25
## 1 -0.4175266 -1.1734665 -0.8578025 -0.7096714 1.2102381 -0.78945260 0.5808917
## 2 -0.3597850 0.1184574 -0.5127788 -0.6478159 -0.1230045 1.03511834 0.7583735
## 3 -0.6760023 -1.2255876 -0.6568178 -0.6011247 1.1826669 -0.87984714 0.6327015
## 4 0.6074743 0.8005130 0.8710725 0.8753576 -0.7984887 -0.02375309 -0.9010950
## Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 1 0.8263017 0.2323109 0.7294659 0.4106931 0.5445465 -1.1717427 -0.3898365
## 2 0.5658073 0.5359129 0.6516453 0.4908172 0.4250165 0.9529789 1.1002539
## 3 0.5452921 0.4943950 0.5683026 0.2831487 0.3091710 -1.2403617 -0.4172105
## 4 -0.8541464 -0.5806311 -0.8749025 -0.5558236 -0.5781547 0.2977006 -0.3752446
## Q33 Q34 Q35 Q36 Q37 Q38 Q39
## 1 -0.3863492 -0.1689528 -0.1634086 -0.6719795 -0.67680564 -0.2423483 0.4715412
## 2 1.0984905 1.1234389 1.1512429 1.2032210 0.85130562 1.1049525 -1.1497212
## 3 -0.2622097 -0.7423406 -0.6266010 -0.1318708 -0.70201960 -0.6304505 0.3831978
## 4 -0.4262760 -0.3717443 -0.4287214 -0.4177455 -0.01552838 -0.3676184 0.3835858
## Q40 Q41 Q42 Q43 Q44 Q45
## 1 0.3904349 1.2390616 0.59227332 -0.01515587 1.1340846 1.1663871
## 2 -1.0479825 -0.9259828 -0.79844418 -1.14176779 -1.1873555 -0.5639533
## 3 0.3198452 1.2267104 0.52721756 0.76742538 1.2443966 1.1572578
## 4 0.3750989 -0.3366881 0.07456546 0.44866094 -0.1415085 -0.5046888
## Q46 Q47 Q48 Q49 Q50 Q51 Q52
## 1 1.1762319 1.1561749 1.1470693 1.1228693 1.1654422 -1.1865069 -1.1715320
## 2 -0.6448922 -0.3719512 -0.6064062 -0.5669954 -0.5700900 0.4755361 0.2312995
## 3 1.2131405 1.1498203 1.1895354 1.1365622 1.0935274 -1.2129900 -1.1219843
## 4 -0.4777561 -0.6147888 -0.4816946 -0.4785828 -0.4797348 0.5847159 0.6972978
## Q53 Q54 Q55 Q56 Q57 Q58 Q59
## 1 -1.2012957 -1.1693068 -1.2122303 -1.1588818 -0.2172708 -0.07919395 -0.2162115
## 2 0.2015186 0.4355620 0.4858101 0.5328387 -0.3372953 -0.22014882 -0.3867709
## 3 -1.0552693 -1.1394614 -1.1815317 -1.1746983 -0.2557363 -0.41919839 -0.1400437
## 4 0.7055274 0.5780356 0.5785220 0.5262789 0.3758655 0.30268184 0.3676517
## Q60 Q61 Q62
## 1 0.33451169 0.6251088 0.2879853
## 2 0.09855577 0.1641761 0.2285592
## 3 0.18635246 0.2916421 0.3401010
## 4 -0.25525667 -0.4463421 -0.3658248
##
## Clustering vector:
## [1] 4 1 2 4 2 3 3 3 4 4 1 2 4 4 1 4 1 4 3 4 2 2 1 2 2 4 2 4 4 4 2 1 4 4 2 2 2
## [38] 2 1 4 3 4 2 4 2 2 4 4 2 4 4 4 2 2 4 4 4 2 1 2 4 4 3 4 4 2 3 3 4 1 3 4 4 2
## [75] 3 4 4 2 4 4 1 2 1 1 2 1 4 1 4 2 4 4 2 4 2 1 4 4 3 4 2 2 2 1 4 1 4 2 1 1 2
## [112] 4 4 3 1 2 4 1 1 1 1 4 3 2 4 2 4 1 4 2 2 4 1 4 4 1 2 4 2 4 4 1 4 4 4 2 4 1
## [149] 3 3 1 4 1 3 1 4 3 3 4 4 4 2 3 3 3 3 3 2 3 4 3 4 1 1 2 1 4 4 2 2 4 4 2 4 2
## [186] 3 2 2 4 4 2 2 1 3 2 2 4 4 4 4 2 4 2 4 4 2 4 4 4 3 4 1 4 4 4 2 1 4 4 1 2 3
## [223] 2 3 3 3 3 4 4 4 4 4 4 4 2 4 4 3 1 4 4 1 2 4 1 3 2 4 2 1
##
## Within cluster sum of squares by cluster:
## [1] 1251.5890 1940.9853 995.0902 4517.1950
## (between_SS / total_SS = 43.6 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result4, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Set the number of clusters (k)
k <- 5
# Run k-means clustering
kmeans_result5 <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result5)
## K-means clustering with 5 clusters of sizes 30, 24, 24, 97, 75
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9973854 1.1904343 -0.3703141 -0.18575417 -0.007993759 0.07170632
## 2 0.8835534 1.2837019 -0.6677033 -0.06812350 0.128263496 0.00768282
## 3 0.8564505 1.2837019 -0.2798044 -0.11926727 0.173682581 -0.15237593
## 4 -0.1723407 -0.1826193 0.6072977 0.17784112 0.807676821 0.05718552
## 5 -0.7328615 -1.0615554 -0.3341102 -0.09574114 -1.138020596 -0.05634068
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.91778719 0.01739506 -0.33899766 0.43501840 -0.07866268 -0.004699441
## 2 -0.47959556 0.54426464 0.27833172 -0.16023607 0.56125197 0.266986999
## 3 -0.17769188 -1.08652215 0.65939923 0.04086342 -0.14109338 -0.467300677
## 4 -0.16390836 0.10499111 -0.13848544 -0.11794298 -0.06225048 -0.018022599
## 5 0.05520524 0.03077587 0.01463299 0.01673148 -0.02247505 0.089289381
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 -0.10588388 -1.2439605 -0.619839211 -0.5541409 -0.05887694 -0.4668939
## 2 -0.28041774 -1.2200260 -0.927863284 -0.1638136 -0.28055218 -0.4545813
## 3 0.19227814 -1.2998079 -0.720539389 -0.5975106 -0.30271971 -0.5777073
## 4 0.06445158 0.3217401 -0.009409268 -0.4573095 -0.51593929 -0.4234824
## 5 -0.01279915 0.8878138 0.787593860 1.0567337 0.87747927 1.0647938
## Q19 Q20 Q21 Q22 Q23 Q24
## 1 -0.7267532 -1.2550772 -0.75284384 -0.67786994 1.2212217 -0.7937198
## 2 -0.1431176 -1.1604648 -0.80685847 -0.72670784 1.1832694 -0.8526983
## 3 -0.6823461 -1.1604648 -0.74684222 -0.57408942 1.1832694 -0.8526983
## 4 -0.3432437 0.1452226 0.01882902 -0.05455746 -0.1076751 1.0176571
## 5 0.9987781 1.0569071 0.77396956 0.75796409 -1.1065213 -0.4529550
## Q25 Q26 Q27 Q28 Q29 Q30
## 1 0.2861514 0.66154452 0.1813501 0.41498355 0.4850941 0.3276268
## 2 0.9111792 1.18809950 0.4710066 1.01718670 0.4106931 0.8314693
## 3 0.6945854 0.26064471 0.4395222 0.59981818 0.1316896 0.1855174
## 4 0.1350780 0.01359074 0.5404670 0.09116951 0.4797249 0.4845199
## 5 -0.8030061 -0.74579332 -1.0629133 -0.80134755 -0.9880443 -1.0831321
## Q31 Q32 Q33 Q34 Q35 Q36
## 1 -1.18521001 -0.5116677 -0.0239726 -0.04880697 -0.45034192 -0.5434951
## 2 -1.19325297 -0.4608061 -0.5386019 -0.35385050 -0.09690577 -0.3254097
## 3 -1.23346775 -0.2064981 -0.5060304 -0.97042785 -0.54673359 -0.3914962
## 4 1.03763089 0.5695346 0.6067723 0.62538006 0.67853381 0.6553270
## 5 -0.09136798 -0.3183936 -0.4408875 -0.36553302 -0.49146903 -0.4007483
## Q37 Q38 Q39 Q40 Q41 Q42
## 1 -0.6568446 -0.1668840 0.5492640 0.3424971 1.28118245 0.3706722
## 2 -0.9335415 -0.2827756 0.2525936 0.5671287 1.19380021 0.6386711
## 3 -0.4817914 -0.8622338 0.4645011 0.1707200 1.21365981 0.7280040
## 4 0.9773729 0.6630754 -0.6146976 -0.6023450 -1.00959158 -0.9309488
## 5 -0.5484246 -0.4244209 0.3458329 0.4059225 0.02287826 0.6184223
## Q43 Q44 Q45 Q46 Q47 Q48 Q49
## 1 0.05684632 1.2118732 1.1376300 1.1389483 1.1467842 1.1677968 1.1460589
## 2 0.30428010 1.1330664 1.1777035 1.2687848 1.2105417 1.1514929 1.0906616
## 3 0.71666973 1.1987387 1.1777035 1.1841088 1.1042793 1.1786661 1.1460589
## 4 -0.56530279 -0.7757187 -0.5323558 -0.5094106 -0.4699385 -0.5733070 -0.5295660
## 5 0.38168246 -0.2276640 -0.5202688 -0.5816675 -0.5916693 -0.4712925 -0.4892688
## Q50 Q51 Q52 Q53 Q54 Q55 Q56
## 1 1.0516520 -1.2424411 -1.1689817 -1.1385746 -1.1751101 -1.1400811 -1.1337658
## 2 1.1418452 -1.1279093 -1.1598434 -1.1112401 -1.1580959 -1.2349391 -1.1447861
## 3 1.2264012 -1.2138081 -1.1141514 -1.1567976 -1.1297390 -1.2349391 -1.2274382
## 4 -0.5213814 0.5534649 0.6692461 0.6495362 0.4453846 0.4404631 0.5701731
## 5 -0.5041797 0.5305113 0.3297128 0.3411350 0.6261204 0.6767279 0.4751942
## Q57 Q58 Q59 Q60 Q61 Q62
## 1 -0.1850519 -0.3904468 -0.4110961 0.2382323 0.5567713 0.4316441
## 2 -0.2963799 -0.1011338 0.2740641 0.3169168 0.5696595 0.2201795
## 3 -0.2345310 -0.1640279 -0.3518034 0.2563903 0.2796745 0.2522196
## 4 0.3010676 0.3948870 0.2849310 -0.4783516 -0.4700348 -0.3740486
## 5 -0.1454686 -0.2696900 -0.1791957 0.3399168 0.1134164 0.1599441
##
## Clustering vector:
## [1] 5 1 4 5 4 3 2 3 4 5 1 4 4 5 2 5 3 5 3 4 4 4 2 4 4 4 4 5 5 5 4 1 5 5 4 4 4
## [38] 4 2 5 2 5 4 5 4 4 5 5 4 5 5 4 4 4 4 4 5 4 2 4 5 5 3 5 4 4 1 3 5 3 1 5 4 4
## [75] 1 5 5 4 5 5 3 4 2 2 4 1 5 1 4 4 5 4 4 5 4 2 5 5 2 4 4 4 4 1 4 1 5 4 2 1 4
## [112] 4 5 3 1 4 5 1 1 1 2 5 3 4 4 4 5 2 5 4 4 4 3 4 5 2 4 5 4 5 5 1 5 5 4 4 4 1
## [149] 2 1 2 5 3 3 2 4 1 3 5 5 5 4 3 3 1 1 3 4 3 4 1 4 3 1 4 1 5 5 4 4 4 5 4 4 4
## [186] 1 4 4 5 5 4 4 2 2 4 4 4 5 5 5 4 5 4 4 5 4 5 5 4 3 5 1 4 5 5 4 1 5 5 2 4 2
## [223] 4 3 2 3 3 4 5 5 5 5 4 5 4 5 5 1 2 4 5 2 4 5 1 3 4 4 4 1
##
## Within cluster sum of squares by cluster:
## [1] 846.1442 678.1318 645.8130 4675.0457 2328.8392
## (between_SS / total_SS = 40.6 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result5, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Add cluster labels to the original dataframe
pys_data$Cluster <- kmeans_result$cluster
# View the updated dataframe
str(pys_data)
## 'data.frame': 250 obs. of 64 variables:
## $ Respondent.Number: num 1 2 3 4 5 6 7 8 9 10 ...
## $ Q1 : num 6 7 5 4 5 6 7 6 6 6 ...
## $ Q2 : num 2 7 4 2 5 6 7 7 4 2 ...
## $ Q3 : num 4 7 6 5 7 4 3 3 6 4 ...
## $ Q4 : num 3 5 5 4 6 4 3 4 1 5 ...
## $ Q5 : num 1 4 7 2 7 5 5 5 3 1 ...
## $ Q6 : num 5 4 5 4 3 3 4 3 4 5 ...
## $ Q7 : num 5 5 3 5 4 5 4 4 4 3 ...
## $ Q8 : num 3 4 5 4 5 2 4 3 4 4 ...
## $ Q9 : num 4 5 4 3 4 5 4 5 3 5 ...
## $ Q10 : num 4 5 5 4 2 3 3 3 4 3 ...
## $ Q11 : num 4 4 5 4 5 4 7 4 4 4 ...
## $ Q12 : num 5 4 5 4 4 4 4 4 3 5 ...
## $ Q13 : num 4 4 6 6 4 5 5 5 6 4 ...
## $ Q14 : num 7 2 3 5 5 1 1 1 7 5 ...
## $ Q15 : num 6 3 3 6 4 2 3 4 7 7 ...
## $ Q16 : num 7 4 4 7 3 4 5 4 5 6 ...
## $ Q17 : num 6 4 2 6 2 4 3 5 7 6 ...
## $ Q18 : num 5 3 4 5 5 5 3 4 5 6 ...
## $ Q19 : num 5 4 3 5 4 4 4 4 2 6 ...
## $ Q20 : num 6 2 4 5 5 1 2 2 5 7 ...
## $ Q21 : num 7 4 2 6 5 3 2 4 6 7 ...
## $ Q22 : num 5 4 3 6 5 5 4 5 7 7 ...
## $ Q23 : num 2 7 3 1 3 7 7 7 3 2 ...
## $ Q24 : num 1 1 4 1 4 1 1 1 5 1 ...
## $ Q25 : num 2 4 2 1 5 5 5 4 1 3 ...
## $ Q26 : num 3 3 4 2 2 3 3 4 2 3 ...
## $ Q27 : num 1 4 5 2 2 5 4 5 4 3 ...
## $ Q28 : num 1 5 3 2 3 4 6 5 2 2 ...
## $ Q29 : num 2 7 3 3 1 5 4 5 5 1 ...
## $ Q30 : num 2 4 4 1 4 3 5 5 4 3 ...
## $ Q31 : num 4 1 7 2 6 2 1 1 7 3 ...
## $ Q32 : num 4 5 5 5 7 6 3 5 5 2 ...
## $ Q33 : num 5 5 7 4 7 3 5 4 5 3 ...
## $ Q34 : num 4 5 5 5 7 4 2 3 4 6 ...
## $ Q35 : num 3 3 7 4 5 4 3 4 5 4 ...
## $ Q36 : num 4 3 6 4 7 6 4 5 3 4 ...
## $ Q37 : num 4 4 5 3 7 4 3 5 7 4 ...
## $ Q38 : num 3 7 7 3 7 2 5 4 4 5 ...
## $ Q39 : num 5 4 3 6 2 3 3 6 5 4 ...
## $ Q40 : num 3 3 2 2 1 3 2 4 5 4 ...
## $ Q41 : num 5 7 1 4 3 6 7 7 1 4 ...
## $ Q42 : num 5 6 1 5 2 4 5 4 2 3 ...
## $ Q43 : num 4 4 1 4 2 3 5 4 3 4 ...
## $ Q44 : num 3 7 1 2 3 7 6 7 3 3 ...
## $ Q45 : num 4 6 4 4 4 6 7 7 5 4 ...
## $ Q46 : num 4 6 3 5 5 6 7 7 5 4 ...
## $ Q47 : num 4 7 4 4 5 7 6 6 2 4 ...
## $ Q48 : num 5 6 4 3 2 7 6 7 5 4 ...
## $ Q49 : num 4 6 3 3 5 6 6 6 3 5 ...
## $ Q50 : num 4 7 2 4 3 7 6 6 4 4 ...
## $ Q51 : num 5 1 4 5 4 2 1 1 3 6 ...
## $ Q52 : num 4 1 4 2 4 2 2 1 7 5 ...
## $ Q53 : num 2 1 3 3 6 1 2 1 6 5 ...
## $ Q54 : num 4 1 5 5 4 2 1 2 4 3 ...
## $ Q55 : num 5 1 6 4 5 1 1 1 3 5 ...
## $ Q56 : num 4 1 3 4 5 2 1 1 4 5 ...
## $ Q57 : num 5 5 4 4 4 5 4 5 6 3 ...
## $ Q58 : num 3 4 4 2 5 4 5 5 7 4 ...
## $ Q59 : num 4 3 5 5 4 4 4 4 6 4 ...
## $ Q60 : num 4 5 3 5 3 4 3 6 2 3 ...
## $ Q61 : num 4 4 4 5 4 4 5 4 2 3 ...
## $ Q62 : num 2 5 4 3 5 4 4 4 2 4 ...
## $ Cluster : int 2 1 3 2 3 1 1 1 2 2 ...
str(questions)
## chr [1:62] "I want a car that is trendy." "I am fashion conscious." ...
# Splitting the merged data frame into separate data sets based on 'Cluster'
cluster_list <- split(pys_data, pys_data$Cluster)
# Calculating cluster averages for each question
cluster_avg <- pys_data %>%
group_by(Cluster) %>%
summarise(across(starts_with("Q"), mean, na.rm = TRUE))
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `across(starts_with("Q"), mean, na.rm = TRUE)`.
## ℹ In group 1: `Cluster = 1`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
# Viewing the cluster averages
print(cluster_avg)
## # A tibble: 3 × 63
## Cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.51 6.51 3.88 4.03 4.01 3.97 4.03 3.76 4.08 4.05 4.09
## 2 2 4.73 2.45 3.94 3.31 2.41 4.02 3.82 3.92 3.86 3.88 3.97
## 3 3 4.02 3.77 5.94 6.02 6.02 3.97 3.8 4.11 3.77 3.82 3.88
## # ℹ 51 more variables: Q12 <dbl>, Q13 <dbl>, Q14 <dbl>, Q15 <dbl>, Q16 <dbl>,
## # Q17 <dbl>, Q18 <dbl>, Q19 <dbl>, Q20 <dbl>, Q21 <dbl>, Q22 <dbl>,
## # Q23 <dbl>, Q24 <dbl>, Q25 <dbl>, Q26 <dbl>, Q27 <dbl>, Q28 <dbl>,
## # Q29 <dbl>, Q30 <dbl>, Q31 <dbl>, Q32 <dbl>, Q33 <dbl>, Q34 <dbl>,
## # Q35 <dbl>, Q36 <dbl>, Q37 <dbl>, Q38 <dbl>, Q39 <dbl>, Q40 <dbl>,
## # Q41 <dbl>, Q42 <dbl>, Q43 <dbl>, Q44 <dbl>, Q45 <dbl>, Q46 <dbl>,
## # Q47 <dbl>, Q48 <dbl>, Q49 <dbl>, Q50 <dbl>, Q51 <dbl>, Q52 <dbl>, …
# Reshape data for plotting
cluster_avg_long <- cluster_avg %>%
pivot_longer(cols = starts_with("Q"), names_to = "Question", values_to = "Average")
# Plot heatmap with rotated y-axis labels
ggplot(cluster_avg_long, aes(x = as.factor(Cluster), y = Question, fill = Average)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "steelblue") +
labs(x = "Cluster", y = "Question", fill = "Average") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(angle = 0, hjust = 0.5, vjust = 0.5), # Rotate y-axis labels
plot.margin = unit(c(2, 10, 2, 2), "mm")) # Adjust plot margin for y-axis labels
# Create a crosstab of cluster and preference group
df$cluster <- kmeans_result$cluster
table(df$cluster, df$Preference.Group)
##
## Ka Chooser Ka Non-Chooser Middle
## 1 34 13 31
## 2 53 27 27
## 3 29 32 4
# create a data frame with the results of the crosstab
cluster.prefrence.data <- data.frame(
Cluster = c(1, 2, 3),
Ka_Chooser = c(53, 34, 29),
Ka_Non_Chooser = c(27, 13, 32),
Middle = c(27, 31, 4)
)
# Calculate the total count for each cluster
cluster.prefrence.data $total <- rowSums(cluster.prefrence.data[, c("Ka_Chooser", "Ka_Non_Chooser", "Middle")])
# Calculate proportions for each category within each cluster
cluster.prefrence.data$Ka_Chooser_Proportion <- cluster.prefrence.data$Ka_Chooser / cluster.prefrence.data$total
cluster.prefrence.data$Ka_Non_Chooser_Proportion <- cluster.prefrence.data$Ka_Non_Chooser / cluster.prefrence.data$total
cluster.prefrence.data$Middle_Proportion <- cluster.prefrence.data$Middle / cluster.prefrence.data$total
# Print the updated data frame with proportions
print(cluster.prefrence.data )
## Cluster Ka_Chooser Ka_Non_Chooser Middle total Ka_Chooser_Proportion
## 1 1 53 27 27 107 0.4953271
## 2 2 34 13 31 78 0.4358974
## 3 3 29 32 4 65 0.4461538
## Ka_Non_Chooser_Proportion Middle_Proportion
## 1 0.2523364 0.25233645
## 2 0.1666667 0.39743590
## 3 0.4923077 0.06153846
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Ka_Chooser)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Ka Buyers by Cluster",
x = "Cluster",
y = "Count")
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Ka_Non_Chooser)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Non-Ka Buyers by Cluster",
x = "Cluster",
y = "Count")
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Middle)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Middle Buyers by Cluster",
x = "Cluster",
y = "Count")
Value Buyers (Cluster 1) |
---|
Agree with questions (Higher Average Scores): |
The car I buy must be able to handle long motorway journeys (Q14) |
I want the most equipment I can get for my money (Q15) |
I want a car that is nippy and zippy (Q17) |
Good aerodynamics help fuel economy (Q21) |
Small cars are much safer nowadays (Q22) |
Disagree with questions (Lower Average Scores): |
Small cars are not prestigious (Q5) |
I consider myself an authority on cars (Q25) |
Small cars are for kids (Q26) |
Small cars are for women (Q27) |
Domestic made is best made (Q28) |
Having a masculine car is important to me (Q30) |
I want a comfortable car (Q31) |
The government should implement policies that favor public transportation (Q61) |
The government is right to tax large cars more heavily than small cars (Q62) |
Cluster 1 Profile |
Cluster 1’s attitudes suggest a practical approach to car buying: they tend to focus on functionality, getting good features for the money, and performance. They are open to smaller cars but still value certain aspects of larger, more prestigious vehicles. Ford may be doing well with this group because it is a foreign automaker. |
Car Enthusiasts (Cluster 2) |
---|
Agree with questions (Higher Average Scores): |
I want a car that is trendy (Q1) |
I am fashion conscious (Q2) |
My car must function with total reliability (Q7) |
I want a car that is fuel economic (Q12) |
I love to drive (Q13) |
I want to buy a car that makes a statement about me (Q44) |
Disagree with questions (Lower Average Scores): |
The car I buy must be able to handle long motorway journeys (Q14) |
One should not spend beyond one’s means (Q20) |
Buying a car on a lower interest rate does not interest me (Q23) |
I want a car that drives well on country roads (Q24) |
I want a comfortable car (Q31) |
I want a car equipped with the latest features and technology (Q51) |
I have a relationship with my car (Q52) |
Cluster 2 Profile |
Cluster 2 consists of car enthusiasts who view their vehicles as expressions of their personal style and identity. They prioritize trendy, fashion-conscious cars that make a statement about them. Fuel efficiency and driving enjoyment are important to them, while comfort and having the latest features and technology matter less (see the disagree list above). Surprisingly, this group chooses the Ka at a high proportion. |
Bare-Bones car buyers (Cluster 3) |
---|
Agree with questions (Higher Average Scores): |
I do not have the time to worry about car maintenance (Q3) |
Basic transportation is all I need (Q4) |
I want a car that is easy to handle (Q8) |
Today’s cars are more efficient than yesterday’s (Q11) |
I want a car that is fuel economic (Q12) |
Disagree with questions (Lower Average Scores): |
I want a car that is nippy and zippy (Q17) |
I prefer buying my next car from the same car manufacturer (Q18) |
I wish there were stricter exhaust regulations (Q19) |
I want a car that drives well on country roads (Q24) |
Small cars are for women (Q27) |
I want to buy a car that makes a statement about me (Q44) |
Cluster 3 Profile |
Cluster 3 consists of “bare-bones” car buyers who prioritize fuel economy, ease of handling, and basic transportation needs. They are not loyal to any one brand. These people don’t care too much about what they drive; they just want to get from point A to point B. I would consider them prospective customers for the Ka, but currently they are not buying it at a high rate. This information could be used to advertise the Ka more effectively. |
Based on the data collection process and the results of the analysis, I think the best way to segment customers in this situation is k-means clustering on the attitudinal data. Because of the changing demographics of the small car market in France, the older secondary data used in this analysis is no longer as informative as it once was. This can be seen in the low levels of significance in the chi-squared cross tabulations and in the multinomial regression: only two variables, “Gender” and “First Time Purchase”, are significant at the .05 level. Meanwhile, the three clusters created by the k-means algorithm did a good job of segmenting the sample into three groups. These groups are defined by their car-buying attitudes, and they give useful insight into the ways Ford can market the Ka. The groups are the “bare-bones” buyers, the “value” buyers, and the car “enthusiasts”. All three groups choose the Ka at similar rates, but because we have attitude data we can market to each of these groups more effectively.
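As a complementary check on the claim that the attitudinal clusters are informative, a minimal sketch of a chi-squared test on the cluster-by-preference crosstab built earlier:
# test whether psychographic cluster membership is associated with Ka preference group
chisq.test(table(df$cluster, df$Preference.Group))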
The small car market experienced demographic shifts during the 1980s and 1990s. Small cars became increasingly popular for several reasons, and the end result was that more people were willing to buy small cars. To me, this means that attitudes about small cars themselves may not be that informative, because individuals may buy these cars even if they do not really want them. It also suggests that Ford’s sample may be too narrow, because there could be plenty of potential buyers who are not covered by it. For example, the data were collected from three groups that should cover much of the population, but because of changes in the car market these groups may not capture everyone Ford needs to understand. I also think Ford should have collected more demographic data, such as race and rural versus urban residence, because these categories could tell us a lot about potential buyers. The Ford sample has no information about where in France individuals are from; this also limits the analysis, because there could be cultural differences among regions that would have been useful to know about. Overall, I think Ford’s approach is fairly sound.