In response to changes in the European small car market and the success of the Renault Twingo, Ford decided to launch a second small car, the Ford Ka. In the 1990s, Ford was the third-largest car manufacturer in Europe, and the Ford Fiesta – Ford’s existing small car – was its best-selling model.
The Ford Ka had already been developed, the production capacity determined, and the launch set for October 1996 in France. Ford Ka France retained the highly reputable Goldfarb Market Research to conduct a market research study to develop a marketing strategy for the launch of the Ford Ka. Goldfarb used both secondary and primary research to inform its marketing strategy recommendations. The secondary research drew on publicly available car sales data broken down by demographics, along with other relevant information.
Background and Secondary Research
To inform their thinking, the market research firm relied on data and other information that Ford had collected previously or that were publicly available.
The prevailing belief was that the size of a car was strongly correlated with production costs and thus price. Small cars sold to younger, lower-income buyers, while large cars sold mostly to older, wealthier buyers and families. In previous marketing efforts, Ford had mainly relied on a breakdown of small car buyers in France by household income, age, and life stage.
During the 1980s and 1990s, a series of environmental and demographic changes significantly affected the French car market. Increased road congestion and the problem of parking in large cities made small cars increasingly attractive to customers. A further advantage of small cars was their low fuel consumption. In France, fuel prices remained high throughout the 1990s due to tax accounting for more than 85% of the retail price of fuel. Needless to say, environmentally conscious people also appreciated the reduced toxic emissions associated with smaller cars.
There were also important demographic changes that made small cars a more attractive option to the buying public in the mid-1990s. Exhibit 1 shows the changes in household demographics for France. As Exhibit 1 indicates, the percentage of couples with children has been decreasing, increasing the feasibility of small cars as a primary source of family transportation. In addition, the rise in the number of working women in France led to an increase in the number of women car buyers. This, too, strengthened demand for small cars, as women tend to have a relatively higher preference for small cars. These changes brought new buyers to the small car market and raised questions about the underlying rationale for the traditional, demographics-based segmentation of the market by car size. In other words, traditional product categories based on car size, engine output, and price were no longer appropriate by themselves.
By 1996, car manufacturers were confronted with an increasingly fragmented small car market, reflecting an increase in the variety of consumer needs and manufacturers’ positioning strategies. For many customers considering the purchase of a small car, price was no longer the most important factor. Older customers who valued the advantages of smaller cars expected features similar to larger models due to (in many cases) their previous experience with large cars. Positioning research conducted for Ford in January 1996 showed that future small car buyers “…will be looking for more space/functionality, power steering, greater comfort (air conditioning) and greater performance (power).”
Given the challenge from the Renault Twingo and the importance of the small car market, Ford needed to respond quickly. Ford management therefore decided to develop the Ka using the same chassis as the Fiesta. While this saved time and cut the development costs to $250 million (instead of the usual $1 billion or more to develop a new car), it restricted Ford’s ability to build the Ka on the basis of technical innovation and market input. Instead, the idea was to use innovative styling, features, and maneuverability as the basis for marketing the Ka. Ford had also already allocated a production capacity of 250,000 units per year, which corresponded to the sales objectives for Europe. Exhibit 2 provides detailed information about the Ford Ka.
Primary Research
The market research to identify customers for the Ka was conducted in two stages. First, following Ford’s standard approach, reactions and attitudes of potential customers towards the Ka were elicited in focus groups drawn from three potential target groups:
- First-time buyers
- Single working professionals
- Multi-car households
Thirty focus group sessions were conducted. Focus group participants were recruited based on demographic and lifestyle information from Ford’s existing buyer surveys. In addition, more structured personal interviews with potential customers were conducted to better understand the competitors in the small car market and customer perceptions of the Ka relative to its potential rivals. Through these interviews, a series of customer characteristics and general attitudes were collected.
Question # | Statement |
---|---|
1 | I want a car that is trendy. |
2 | I am fashion conscious. |
3 | I do not have the time to worry about car maintenance. |
4 | Basic transportation is all I need. |
5 | Small cars are not prestigious. |
6 | Today’s cars last much longer than yesterday’s. |
7 | My car must function with total reliability. |
8 | I want a car that is easy to handle. |
9 | I am looking for a car which delivers a smooth ride. |
10 | When I buy a car, dependability is most important to me. |
11 | Today’s cars are more efficient than yesterday’s. |
12 | I want a car that is fuel economic. |
13 | I love to drive. |
14 | The car I buy must be able to handle long motorway journeys. |
15 | I want the most equipment I can get for my money. |
16 | I want a vehicle that is environmentally friendly. |
17 | I want a car that is nippy and zippy. |
18 | I prefer buying my next car from the same car manufacturer. |
19 | I wish there were stricter exhaust regulations. |
20 | One should not spend beyond one’s means. |
21 | Good aerodynamics help fuel economy. |
22 | Small cars are much safer nowadays. |
23 | Buying a car on a lower interest rate does not interest me. |
24 | I want a car that drives well on country roads. |
25 | I consider myself an authority on cars. |
26 | Small cars are for kids. |
27 | Small cars are for women. |
28 | Domestic made is best made. |
29 | A car is a fashion accessory to me. |
30 | Having a masculine car is important to me. |
31 | I want a comfortable car. |
32 | City driving is my main concern. |
33 | Fuel economy comes at the expense of performance. |
34 | I want a practical car. |
35 | I have always been fascinated by cars which have a cult following. |
36 | I like to believe that the car I drive will one day become a cult car. |
37 | I prefer cars with high performance. |
38 | I do not believe that a Swatch branded car will be successful. |
39 | Small cars take up less room in today’s traffic. |
40 | I prefer small cars. |
41 | In today’s world it is anti-social to drive big cars. |
42 | Many manufacturers do not really care about their customers’ needs. |
43 | I would rather deal with a manufacturer’s rep than a salesperson. |
44 | I want to buy a car that makes a statement about me. |
45 | A car is an extension of oneself. |
46 | I always want the latest style and design in a vehicle. |
47 | When it comes to cars my heart rules my head. |
48 | My car must have a very individual interior. |
49 | Nowadays smart cars are mainly foreign brands. |
50 | People ought to buy domestic products for the good of the country. |
51 | I want a car equipped with the latest features and technology. |
52 | I have a relationship with my car. |
53 | Quality and reliability of products are my main concerns. |
54 | Image is not important to me in a car. |
55 | Cars all look the same these days. |
56 | Most environmentally friendly products do not perform as well as those they replaced. |
57 | I want a car that has character. |
58 | For me a car is a symbol of freedom and independence. |
59 | I am interested in car maintenance. |
60 | When buying a car I only consider a national make. |
61 | The government should implement policies that favor public transportation. |
62 | The government is right to tax large cars more heavily than small cars. |
Data
The dataset contains responses from 250 potential small car buyers to different tasks and questions. First, it contains each respondent’s preference ranking of the Ford Ka within a set of 10 small cars, along with six demographic variables describing these respondents: age, gender, marital status, number of children in the household, first-time car purchase, and household income. Second, it contains the responses of the same 250 respondents to a set of 62 attitudinal questions about cars and driving.
# clean workspace
rm(list = ls())
# Set the seed for reproducibility
set.seed(123)
# Prepare needed libraries
packages <- c("ggplot2", # Visualizations
"dplyr", # Data manipulation
"tidyr", # Data formatting
"stargazer", # LaTex tables
"knitr", # markdown output formatting
"factoextra", # clustering methods
"readxl", # read in excel files
"nnet", # mutinomial logit
"Hmisc", #
"psych" # psycholoical research
)
for (i in 1:length(packages)) {
if (!packages[i] %in% rownames(installed.packages())) {
install.packages(packages[i]
, repos = "http://cran.rstudio.com/"
, dependencies = TRUE
)
}
library(packages[i], character.only = TRUE)
}
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
# Set working directory for chunk
setwd("~/Documents/MarketResearch")
# load data from excel file and then make them into a data frame
demographic <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Demographic Data",
             range = "A7:J257")
)
mds <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "MDS Data",
             range = "A12:M1892")
)
## New names:
## • `` -> `...3`
pys_data <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Psychographic Data",
             range = "A7:BK257")
)
questions <- data.frame(
  read_excel("Ford_Ka.xls",
             sheet = "Psychographic questionnaire",
             range = "A7:B69")
)
df <- merge(pys_data, demographic, by = "Respondent.Number", all.x = TRUE)
Demographic Data | |
---|---|
Preference Group | |
1 | Ka Chooser (top 3) |
2 | Ka Non-Chooser (bottom 3) |
3 | Middle (middle 4) |
Gender | |
1 | Male |
2 | Female |
Marital Status | |
1 | Married |
2 | Living Together |
3 | Single |
Number of Children | |
Number of children living in household | |
1st Time Purchase | |
1 | yes |
2 | no |
Age Category | Age |
1 | <25 |
2 | 25 - 29 |
3 | 30 - 34 |
4 | 35 - 39 |
5 | 40 - 44 |
6 | >44 |
Children Category | Number of Children in Household |
0 | 0 |
1 | 1 |
2 | >1 |
Income Category | Annual Household Income |
1 | <100K |
2 | 100K - 150K |
3 | 150K - 200K |
4 | 200K - 250K |
5 | 250K - 300K |
6 | >300K |
# make dependent variable factor
df$Preference.Group <- factor(df$Preference.Group, levels = c(1, 2, 3), labels = c("Ka Chooser", "Ka Non-Chooser", "Middle"))
# summary stats for our data sets
stargazer(df[2:62], type = "text", title = "Survey Questions")
##
## Survey Questions
## ====================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------
## Q1 250 5.100 1.537 1 7
## Q2 250 4.060 1.966 1 7
## Q3 250 4.444 1.289 1 7
## Q4 250 4.236 1.629 1 7
## Q5 250 3.848 1.835 1 7
## Q6 250 3.992 1.041 1 7
## Q7 250 3.880 0.966 2 6
## Q8 250 3.916 0.996 1 7
## Q9 250 3.904 1.093 1 7
## Q10 250 3.916 1.036 1 7
## Q11 250 3.984 1.068 2 7
## Q12 250 4.072 1.135 1 7
## Q13 250 3.988 1.146 1 6
## Q14 250 4.132 2.089 1 7
## Q15 250 4.972 1.407 2 7
## Q16 250 4.512 1.345 1 7
## Q17 250 4.444 1.880 1 7
## Q18 250 4.532 1.354 1 7
## Q19 250 4.688 1.314 1 7
## Q20 250 3.832 1.938 1 7
## Q21 250 4.912 1.389 2 7
## Q22 250 4.992 1.365 1 7
## Q23 250 4.120 1.976 1 7
## Q24 250 2.376 1.272 1 6
## Q25 250 3.148 1.347 1 7
## Q26 250 3.012 1.393 1 7
## Q27 250 3.460 1.323 1 7
## Q28 250 3.120 1.398 1 7
## Q29 250 3.448 1.344 1 7
## Q30 250 3.344 1.290 1 6
## Q31 250 4.056 2.072 1 7
## Q32 250 4.604 1.311 2 7
## Q33 250 4.564 1.279 1 7
## Q34 250 4.496 1.284 1 7
## Q35 250 4.584 1.297 1 7
## Q36 250 4.452 1.261 2 7
## Q37 250 4.836 1.476 1 7
## Q38 250 4.616 1.294 2 7
## Q39 250 3.444 1.376 1 7
## Q40 250 3.368 1.261 1 7
## Q41 250 3.912 2.098 1 7
## Q42 250 3.148 1.399 1 7
## Q43 250 3.392 1.313 1 7
## Q44 250 4.260 1.903 1 7
## Q45 250 4.744 1.456 1 7
## Q46 250 4.752 1.476 1 7
## Q47 250 4.768 1.568 1 7
## Q48 250 4.776 1.533 1 7
## Q49 250 4.776 1.504 2 7
## Q50 250 4.812 1.478 1 7
## Q51 250 3.308 1.455 1 7
## Q52 250 3.532 1.824 1 7
## Q53 250 3.616 1.829 1 7
## Q54 250 3.160 1.469 1 7
## Q55 250 3.136 1.493 1 6
## Q56 250 3.148 1.512 1 7
## Q57 250 4.316 1.347 1 7
## Q58 250 4.384 1.325 2 7
## Q59 250 4.320 1.265 1 7
## Q60 250 3.772 1.377 1 7
## Q61 250 3.680 1.293 1 7
## ------------------------------------
stargazer(df[64:72], type = "text", title = "Demographic Data")
##
## Demographic Data
## ==============================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------
## Gender 250 1.480 0.501 1 2
## Age 250 36.364 9.107 20 58
## Marital.Status 250 1.872 0.935 1 3
## Number.of.Children 250 0.728 1.036 0 4
## X1st.Time.Purchase 250 1.852 0.356 1 2
## Age.Category 250 3.768 1.619 1 6
## Children.Category 250 0.624 0.818 0 2
## Income.Category 250 3.680 1.571 1 6
## ----------------------------------------------
The chi-squared tests on the cross tabulations assess whether the selected variables are associated with the car preference groups. For most of the variables we fail to reject the null hypothesis that the variables are independent, i.e., no association. Based on the chi-squared test results for our crosstabs, First Time Purchase stands out as a potentially useful demographic variable for differentiating between the preference groups, while the other variables do not show significant associations at the .05 level. Age category and gender show significance at the .10 level, but that does not give us quite enough confidence.
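The crosstab chi-squared tests referenced above are not shown in this section; a minimal sketch of how one such test could be run (using the column names defined earlier, here for First Time Purchase) is:
# cross-tabulate first-time purchase against preference group and test for association
ct <- table(df$X1st.Time.Purchase, df$Preference.Group)
ct
chisq.test(ct)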
# Fit multinomial logistic regression model
model <- multinom(Preference.Group ~ Gender + Marital.Status + Number.of.Children + X1st.Time.Purchase + Age.Category + Income.Category, data = df)
## # weights: 24 (14 variable)
## initial value 274.653072
## iter 10 value 252.853559
## final value 251.798781
## converged
# Summary of the model
stargazer(model,type = "text")
##
## ===============================================
## Dependent variable:
## ----------------------------
## Ka Non-Chooser Middle
## (1) (2)
## -----------------------------------------------
## Gender -0.104 -0.880**
## (0.310) (0.347)
##
## Marital.Status 0.240 0.264
## (0.164) (0.178)
##
## Number.of.Children -0.064 -0.030
## (0.147) (0.164)
##
## X1st.Time.Purchase -0.073 -1.065**
## (0.493) (0.450)
##
## Age.Category 0.088 -0.190*
## (0.099) (0.105)
##
## Income.Category -0.128 -0.064
## (0.099) (0.105)
##
## Constant -0.455 3.035**
## (1.215) (1.184)
##
## -----------------------------------------------
## Akaike Inf. Crit. 531.598 531.598
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The multinomial model allows us to understand the joint effects of the covariates and to take advantage of the correlation in our data set. The multinomial logit is an extension of the logit model that allows us to compare more than two categories. In the context of this business question, we can use it to find relationships between the outcome categories and the independent variables.
Interestingly, the crosstab chi-squared tests and the regression point to basically the same variables in terms of importance: the regression identifies the same variables as influential. Women still appear less likely to buy the Ford Ka than men, and as individuals get older the odds that they buy a Ford Ka decrease.
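Since multinom() reports coefficients as log-odds relative to the baseline category (Ka Chooser), a minimal sketch for converting the fitted model above into odds ratios and approximate Wald p-values:
# odds ratios relative to the "Ka Chooser" baseline
exp(coef(model))
# approximate two-sided p-values from Wald z-statistics
z <- summary(model)$coefficients / summary(model)$standard.errors
2 * (1 - pnorm(abs(z)))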
Factor analysis can be a powerful tool for both customer segmentation and interpreting survey questions. It helps simplify complex data, uncover hidden patterns, and provide insight into customer preferences. This tool is especially helpful in our case because of the high number of questions: we hope to better understand the unobserved beliefs and attitudes of our sample, which will help us interpret the results of the cluster analysis later on.
# Provide an example of the survey data
head(df[2:5])
## Q1 Q2 Q3 Q4
## 1 6 2 4 3
## 2 7 7 7 5
## 3 5 4 6 5
## 4 4 2 5 4
## 5 5 5 7 6
## 6 6 6 4 4
Our survey data are Likert-scale data, which can often be treated as interval data under the assumption that the intervals between response options are equal. This assumption is generally considered reasonable for Likert scales with five or more response options, such as ours.
# Pearson correlation used to create a square correlation matrix
corr_matrix <- as.data.frame(cor(df[2:62]))
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy tests whether our data are suitable for factor analysis.
KMO Value | Interpretation |
---|---|
> 0.9 | Marvelous |
> 0.80 | Meritorious |
> 0.70 | Middling |
> 0.60 | Mediocre |
> 0.50 | Miserable |
< 0.50 | Unacceptable |
# Kaiser, Meyer, Olkin Measure of Sampling Adequacy
KMO(corr_matrix)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = corr_matrix)
## Overall MSA = 0.95
## MSA for each item =
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
## 0.95 0.97 0.93 0.93 0.95 0.35 0.51 0.57 0.53 0.54 0.35 0.47 0.39 0.96 0.95 0.92
## Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 0.95 0.91 0.94 0.97 0.96 0.96 0.97 0.96 0.94 0.94 0.92 0.94 0.93 0.92 0.96 0.91
## Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48
## 0.95 0.95 0.94 0.93 0.94 0.95 0.92 0.91 0.96 0.91 0.92 0.95 0.97 0.95 0.97 0.96
## Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60 Q61
## 0.97 0.97 0.97 0.98 0.97 0.97 0.96 0.97 0.90 0.85 0.88 0.91 0.92
# get eigenvalues of the correlation matrix
ev <- eigen(corr_matrix, symmetric = TRUE)
ev$values
## [1] 16.53133961 10.76348062 5.52320090 1.46448014 1.27596377 1.21277417
## [7] 1.16913747 1.12298138 1.01034021 0.97680604 0.89987797 0.84614765
## [13] 0.79386927 0.76649120 0.73966062 0.72385450 0.69953134 0.64355283
## [19] 0.63649088 0.62564452 0.60636969 0.58976371 0.56902964 0.54626872
## [25] 0.52157922 0.51461163 0.49515295 0.46637663 0.45699510 0.44306563
## [31] 0.41499714 0.40119046 0.39764526 0.37740356 0.36108326 0.34455595
## [37] 0.32981949 0.32343878 0.31899475 0.30304474 0.29428602 0.27931649
## [43] 0.26084867 0.25932726 0.23781229 0.23303562 0.22466281 0.21078973
## [49] 0.20458061 0.19155644 0.18296289 0.16257573 0.15271931 0.14826550
## [55] 0.13985984 0.12503473 0.12238596 0.10675513 0.09262940 0.08801333
## [61] 0.07557084
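As a quick supplementary check on how many factors might be retained (the analysis below keeps 3, based on the large share of total variation captured by the first three eigenvalues), a minimal sketch using the eigenvalues computed above:
# Kaiser criterion: count eigenvalues greater than 1
sum(ev$values > 1)
# cumulative proportion of total variance explained by successive factors
head(cumsum(ev$values) / sum(ev$values), 10)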
# plot eigenvalues in a scree plot (pc = FALSE because we are doing factor analysis)
scree(corr_matrix, pc = FALSE)
Nfacs <- 3 # number of factors, chosen from the eigenvalues and total variation explained
fit <- factanal(df[2:63],
                factors = Nfacs,
                rotation = "varimax")
print(fit,
      digits = 2,
      cutoff = 0.3,
      sort = TRUE)
##
## Call:
## factanal(x = df[2:63], factors = Nfacs, rotation = "varimax")
##
## Uniquenesses:
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
## 0.35 0.16 0.53 0.30 0.27 0.99 0.97 0.98 0.98 0.99 0.99 0.99 0.99 0.12 0.39 0.53
## Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 0.17 0.52 0.58 0.20 0.40 0.41 0.18 0.31 0.37 0.44 0.50 0.41 0.57 0.51 0.12 0.57
## Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48
## 0.58 0.55 0.52 0.49 0.39 0.56 0.55 0.64 0.12 0.41 0.55 0.17 0.38 0.34 0.38 0.38
## Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60 Q61 Q62
## 0.42 0.40 0.34 0.21 0.22 0.39 0.34 0.38 0.62 0.58 0.59 0.57 0.51 0.65
##
## Loadings:
## Factor1 Factor2 Factor3
## Q1 0.64 -0.31 0.38
## Q2 0.91
## Q14 -0.88
## Q15 -0.61 -0.40
## Q18 -0.53 -0.30 -0.33
## Q19 -0.52
## Q20 -0.89
## Q21 -0.62 -0.37
## Q22 -0.56 -0.44
## Q23 0.90
## Q25 0.55 0.51
## Q26 0.57 0.40
## Q28 0.57 0.46
## Q31 -0.60 0.53 0.50
## Q41 0.63 -0.50 -0.47
## Q44 0.65 -0.63
## Q45 0.73
## Q46 0.76 -0.30
## Q47 0.75
## Q48 0.72
## Q49 0.70
## Q50 0.71
## Q51 -0.76
## Q52 -0.70 0.54
## Q53 -0.70 0.54
## Q54 -0.76
## Q55 -0.79
## Q56 -0.72
## Q3 0.67
## Q4 0.63 -0.55
## Q5 0.35 0.75
## Q17 -0.36 -0.81
## Q24 -0.31 0.62 0.45
## Q32 0.64
## Q33 0.64
## Q34 0.66
## Q35 0.69
## Q36 0.70
## Q38 0.66
## Q39 -0.66
## Q40 -0.60
## Q43 -0.66
## Q37 0.51 0.55
## Q42 -0.49 -0.57
## Q57 0.58
## Q58 0.64
## Q59 0.60
## Q60 -0.65
## Q61 -0.65
## Q62 -0.56
## Q6
## Q7
## Q8
## Q9
## Q10
## Q11
## Q12
## Q13
## Q16 -0.50 -0.33 -0.34
## Q27 0.46 0.41 0.35
## Q29 0.44 0.37 0.30
## Q30 0.50 0.32 0.37
##
## Factor1 Factor2 Factor3
## SS loadings 15.35 10.43 6.18
## Proportion Var 0.25 0.17 0.10
## Cumulative Var 0.25 0.42 0.52
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 1697.61 on 1708 degrees of freedom.
## The p-value is 0.566
# create factor loadings data frame
load <- as.data.frame(fit$loadings[,1:3])
# Plot first two Factor loadings
ggplot(load, aes(x = Factor1, y = Factor2, label = rownames(load))) +
geom_text(size = 3, hjust = 0, vjust = 0) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
labs(x = "Factor 1", y = "Factor 2", title = "Factor Loadings Plot") +
theme_update()
# Set the number of clusters (k)
k1 <- 3
# Run k-means clustering
kmeans_question_result <- kmeans(load[,1:2], centers = k1)
# View the clustering results
print(kmeans_question_result)
## K-means clustering with 3 clusters of sizes 18, 28, 16
##
## Cluster means:
## Factor1 Factor2
## 1 0.1232949 0.5419066
## 2 0.3165536 -0.2018250
## 3 -0.6558390 -0.1017303
##
## Clustering vector:
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
## 2 2 1 1 1 2 2 1 2 2 2 2 2 3 3 3 3 3 3 3
## Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32 Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40
## 3 3 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 2
## Q41 Q42 Q43 Q44 Q45 Q46 Q47 Q48 Q49 Q50 Q51 Q52 Q53 Q54 Q55 Q56 Q57 Q58 Q59 Q60
## 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 2 2 2
## Q61 Q62
## 2 2
##
## Within cluster sum of squares by cluster:
## [1] 2.136509 5.037992 2.052705
## (between_SS / total_SS = 64.1 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Create a vector of total within-cluster sum of squares
wss_values <- numeric(10)
for(k in 1:10) {
kmeans_temp <- kmeans(load[,1:2], centers = k)
wss_values[k] <- kmeans_temp$tot.withinss
}
# Plot the scree plot
plot(1:10, wss_values, type = "b", pch = 19, frame = FALSE,
xlab = "Number of Clusters (k)", ylab = "Total Within Sum of Squares",
main = "Scree Plot for K-Means Clustering")
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_question_result,
data = load[,1:2],
geom = "point",
stand = FALSE,
main = "Kmeans Cluster Plot of Questions"
)
We have 62 questions in total, so it is helpful to break the questions down into the underlying attitudes that each question is trying to measure. Each of the 3 clusters captures an unobserved attitude in our sample: we used factor analysis to find the structure of correlations between the questions and then clustered the questions to make interpretation easier. We have labeled each cluster based on the content of the questions it contains.
# make a data frame with the question descriptions and attach each question's cluster label
# (questions_df is assumed to be the questionnaire sheet loaded above, with one row per statement)
questions_df <- questions
names(questions_df) <- c("Question_Number", "Statement")
questions_df$cluster <- kmeans_question_result$cluster
# Iterate over unique cluster labels
unique_clusters <- unique(questions_df$cluster)
for (cluster_label in unique_clusters) {
cat("Cluster", cluster_label, " :\n")
# Filter DataFrame for current cluster
cluster_df <- subset(questions_df, cluster == cluster_label)
print(cluster_df)
}
## Cluster 2 :
## Question_Number
## 1 1
## 2 2
## 6 6
## 7 7
## 9 9
## 10 10
## 11 11
## 12 12
## 13 13
## 23 23
## 39 39
## 40 40
## 41 41
## 42 42
## 43 43
## 44 44
## 45 45
## 46 46
## 47 47
## 48 48
## 49 49
## 50 50
## 57 57
## 58 58
## 59 59
## 60 60
## 61 61
## 62 62
## Statement
## 1 I want a car that is trendy.
## 2 I am fashion conscious.
## 6 Today's cars last much longer than yesterday's.
## 7 My car must function with total reliability.
## 9 I am looking for a car which delivers a smooth ride.
## 10 When I buy a car, dependability is most important to me.
## 11 Today's cars are more efficient than yesterday's.
## 12 I want a car that is fuel economic.
## 13 I love to drive.
## 23 Buying a car on a lower interest rate does not interest me.
## 39 Small cars take up less room in today's traffic.
## 40 I prefer small cars.
## 41 In today's world it is anti-social to drive big cars.
## 42 Many manufacturers do not really care about their customers needs.
## 43 I would rather deal with a manufacturer's rep than a salesperson.
## 44 I want to buy a car that makes a statement about me.
## 45 A car is an extension of oneself.
## 46 I always want the latest style and design in a vehicle.
## 47 When it comes to cars my heart rules my head.
## 48 My car must have a very individual interior.
## 49 Nowadays smart cars are mainly foreign brands.
## 50 People ought to buy domestic products for the good of the country.
## 57 I want a car that has character.
## 58 For me a car is a symbol of freedom and independence.
## 59 I am interested in car maintenance.
## 60 When buying a car I only consider a national make.
## 61 The government should implement policies that favor public transportation.
## 62 The government is right to tax large cars more heavily than small cars.
## cluster
## 1 2
## 2 2
## 6 2
## 7 2
## 9 2
## 10 2
## 11 2
## 12 2
## 13 2
## 23 2
## 39 2
## 40 2
## 41 2
## 42 2
## 43 2
## 44 2
## 45 2
## 46 2
## 47 2
## 48 2
## 49 2
## 50 2
## 57 2
## 58 2
## 59 2
## 60 2
## 61 2
## 62 2
## Cluster 1 :
## Question_Number
## 3 3
## 4 4
## 5 5
## 8 8
## 24 24
## 25 25
## 26 26
## 27 27
## 28 28
## 29 29
## 30 30
## 32 32
## 33 33
## 34 34
## 35 35
## 36 36
## 37 37
## 38 38
## Statement
## 3 I do not have the time to worry about car maintenance.
## 4 Basic transportation is all I need.
## 5 Small cars are not prestigious.
## 8 I want a car that is easy to handle.
## 24 I want a car that drives well on country roads.
## 25 I consider myself an authority on cars.
## 26 Small cars are for kids.
## 27 Small cars are for women.
## 28 Domestic made is best made.
## 29 A car is a fashion accessory to me.
## 30 Having a masculine car is important to me.
## 32 City driving is my main concern.
## 33 Fuel economy comes at the expense of performance.
## 34 I want a practical car.
## 35 I have always been fascinated by cars which have a cult following.
## 36 I like to believe that the car I drive will one day become a cult car.
## 37 I prefer cars with high performance.
## 38 I do not believe that a Swatch branded car will be successful.
## cluster
## 3 1
## 4 1
## 5 1
## 8 1
## 24 1
## 25 1
## 26 1
## 27 1
## 28 1
## 29 1
## 30 1
## 32 1
## 33 1
## 34 1
## 35 1
## 36 1
## 37 1
## 38 1
## Cluster 3 :
## Question_Number
## 14 14
## 15 15
## 16 16
## 17 17
## 18 18
## 19 19
## 20 20
## 21 21
## 22 22
## 31 31
## 51 51
## 52 52
## 53 53
## 54 54
## 55 55
## 56 56
## Statement
## 14 The car I buy must be able to handle long motorway journeys.
## 15 I want the most equipment I can get for my money.
## 16 I want a vehicle that is environmentally friendly.
## 17 I want a car that is nippy and zippy.
## 18 I prefer buying my next car from the same car manufacturer.
## 19 I wish there were stricter exhaust regulations.
## 20 One should not spend beyond ones means.
## 21 Good aerodynamics help fuel economy.
## 22 Small cars are much safer nowadays.
## 31 I want a comfortable car.
## 51 I want a car equipped with the latest features and technology.
## 52 I have a relationship with my car.
## 53 Quality and reliability of products are my main concerns.
## 54 Image is not important to me in a car.
## 55 Cars all look the same these days.
## 56 Most environmentally friendly products do not perform as well as those they replaced
## cluster
## 14 3
## 15 3
## 16 3
## 17 3
## 18 3
## 19 3
## 20 3
## 21 3
## 22 3
## 31 3
## 51 3
## 52 3
## 53 3
## 54 3
## 55 3
## 56 3
With 3 clusters we see that there may be three unobserved attitudes that our survey is trying to measure. These unobserved attitudes could be useful for two things: marketing, and understanding the customer segmentation we carry out later.
The first cluster printed above (cluster 2) seems to measure whether people feel that the car they buy is a reflection of their self-image. Questions in this group seek to understand the way people relate to their cars.
The second cluster printed (cluster 1) seeks to understand individuals' transportation needs. These questions get at what customers expect of their cars and are less about how the cars make them feel.
The third cluster (cluster 3) is a bit of a mess. This could be due to survey design or to the assumptions made in the factor analysis. There are few recognizable patterns in the questions here, so we will call this cluster the Misc category.
# z-score normalization
norm_df <- scale(pys_data[2:63])
# Set the number of clusters (k)
k <- 3
# Run k-means clustering
kmeans_result <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result)
## K-means clustering with 3 clusters of sizes 78, 107, 65
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9189956 1.2478298 -0.4339693 -0.1291026 0.08983196 -0.01694160
## 2 -0.2413421 -0.8197727 -0.3879569 -0.5692846 -0.78308955 0.02563333
## 3 -0.7055084 -0.1479238 1.1593999 1.0920532 1.18128752 -0.02186649
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.15075278 -0.1601580588 0.15814888 0.13058472 0.09902469 -0.06344246
## 2 -0.05959071 -0.0001125489 -0.04041168 -0.03619340 -0.01127253 0.01067256
## 3 -0.08280787 0.1923749433 -0.12325482 -0.09712177 -0.10027330 0.05856227
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 -0.06784445 -1.25377987 -0.7456005 -0.4473847 -0.2021133 -0.4972018
## 2 -0.04661848 0.95682895 0.8901652 0.5921163 0.9372114 0.5942913
## 3 0.15815453 -0.07055181 -0.5706283 -0.4378529 -1.3002583 -0.3816528
## Q19 Q20 Q21 Q22 Q23 Q24 Q25
## 1 -0.5335093 -1.1968542 -0.7676171 -0.6609645 1.1978664 -0.83001425 0.6041397
## 2 0.6074743 0.8005130 0.8710725 0.8753576 -0.7984887 -0.02375309 -0.9010950
## 3 -0.3597850 0.1184574 -0.5127788 -0.6478159 -0.1230045 1.03511834 0.7583735
## Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 1 0.7002076 0.3499128 0.6571490 0.3534616 0.4389293 -1.2025333 -0.4021197
## 2 -0.8541464 -0.5806311 -0.8749025 -0.5558236 -0.5781547 0.2977006 -0.3752446
## 3 0.5658073 0.5359129 0.6516453 0.4908172 0.4250165 0.9529789 1.1002539
## Q33 Q34 Q35 Q36 Q37 Q38 Q39
## 1 -0.3306456 -0.4262422 -0.3712513 -0.4296230 -0.68811960 -0.4164967 0.4318999
## 2 -0.4262760 -0.3717443 -0.4287214 -0.4177455 -0.01552838 -0.3676184 0.3835858
## 3 1.0984905 1.1234389 1.1512429 1.2032210 0.85130562 1.1049525 -1.1497212
## Q40 Q41 Q42 Q43 Q44 Q45 Q46
## 1 0.3587600 1.2335194 0.56308163 0.3360024 1.1835836 1.1622906 1.1927935
## 2 0.3750989 -0.3366881 0.07456546 0.4486609 -0.1415085 -0.5046888 -0.4777561
## 3 -1.0479825 -0.9259828 -0.79844418 -1.1417678 -1.1873555 -0.5639533 -0.6448922
## Q47 Q48 Q49 Q50 Q51 Q52 Q53
## 1 1.1533235 1.1661246 1.1290136 1.1331727 -1.1983904 -1.1492991 -1.1357711
## 2 -0.6147888 -0.4816946 -0.4785828 -0.4797348 0.5847159 0.6972978 0.7055274
## 3 -0.3719512 -0.6064062 -0.5669954 -0.5700900 0.4755361 0.2312995 0.2015186
## Q54 Q55 Q56 Q57 Q58 Q59 Q60
## 1 -1.1559146 -1.1984553 -1.1659789 -0.2345310 -0.2317600 -0.1820337 0.26802998
## 2 0.5780356 0.5785220 0.5262789 0.3758655 0.3026818 0.3676517 -0.25525667
## 3 0.4355620 0.4858101 0.5328387 -0.3372953 -0.2201488 -0.3867709 0.09855577
## Q61 Q62
## 1 0.4754763 0.3113705
## 2 -0.4463421 -0.3658248
## 3 0.1641761 0.2285592
##
## Clustering vector:
## [1] 2 1 3 2 3 1 1 1 2 2 1 3 2 2 1 2 1 2 1 2 3 3 1 3 3 2 3 2 2 2 3 1 2 2 3 3 3
## [38] 3 1 2 1 2 3 2 3 3 2 2 3 2 2 2 3 3 2 2 2 3 1 3 2 2 1 2 2 3 1 1 2 1 1 2 2 3
## [75] 1 2 2 3 2 2 1 3 1 1 3 1 2 1 2 3 2 2 3 2 3 1 2 2 1 2 3 3 3 1 2 1 2 3 1 1 3
## [112] 2 2 1 1 3 2 1 1 1 1 2 1 3 2 3 2 1 2 3 3 2 1 2 2 1 3 2 3 2 2 1 2 2 2 3 2 1
## [149] 1 1 1 2 1 1 1 2 1 1 2 2 2 3 1 1 1 1 1 3 1 2 1 2 1 1 3 1 2 2 3 3 2 2 3 2 3
## [186] 1 3 3 2 2 3 3 1 1 3 3 2 2 2 2 3 2 3 2 2 3 2 2 2 1 2 1 2 2 2 3 1 2 2 1 3 1
## [223] 3 1 1 1 1 2 2 2 2 2 2 2 3 2 2 1 1 2 2 1 3 2 1 1 3 2 3 1
##
## Within cluster sum of squares by cluster:
## [1] 2367.564 4517.195 1940.985
## (between_SS / total_SS = 42.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Create a vector of total within-cluster sum of squares
wss_values <- numeric(10)
for(k in 1:10) {
kmeans_temp <- kmeans(norm_df, centers = k)
wss_values[k] <- kmeans_temp$tot.withinss
}
# Plot the scree plot
plot(1:10, wss_values, type = "b", pch = 19, frame = FALSE,
xlab = "Number of Clusters (k)", ylab = "Total Within Sum of Squares",
main = "Scree Plot for K-Means Clustering")
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Set the number of clusters (k)
k <- 4
# Run k-means clustering
kmeans_result4 <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result4)
## K-means clustering with 4 clusters of sizes 43, 65, 35, 107
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9182198 1.2117301 -0.4707469 -0.1020211 0.03214404 0.07468416
## 2 -0.7055084 -0.1479238 1.1593999 1.0920532 1.18128752 -0.02186649
## 3 0.9199487 1.2921808 -0.3887855 -0.1623742 0.16070570 -0.12951039
## 4 -0.2413421 -0.8197727 -0.3879569 -0.5692846 -0.78308955 0.02563333
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.05199564 0.4343785863 0.25794903 0.26066983 -0.35524519 0.24393378
## 2 -0.08280787 0.1923749433 -0.12325482 -0.09712177 -0.10027330 0.05856227
## 3 0.27208299 -0.8905887941 0.03553727 -0.02923412 0.65712768 -0.44107612
## 4 -0.05959071 -0.0001125489 -0.04041168 -0.03619340 -0.01127253 0.01067256
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 0.17282911 -1.22095365 -0.6909217 -0.5708548 -0.1001188 -0.4789201
## 2 0.15815453 -0.07055181 -0.5706283 -0.4378529 -1.3002583 -0.3816528
## 3 -0.36352910 -1.29410922 -0.8127774 -0.2956928 -0.3274207 -0.5196622
## 4 -0.04661848 0.95682895 0.8901652 0.5921163 0.9372114 0.5942913
## Q19 Q20 Q21 Q22 Q23 Q24 Q25
## 1 -0.4175266 -1.1734665 -0.8578025 -0.7096714 1.2102381 -0.78945260 0.5808917
## 2 -0.3597850 0.1184574 -0.5127788 -0.6478159 -0.1230045 1.03511834 0.7583735
## 3 -0.6760023 -1.2255876 -0.6568178 -0.6011247 1.1826669 -0.87984714 0.6327015
## 4 0.6074743 0.8005130 0.8710725 0.8753576 -0.7984887 -0.02375309 -0.9010950
## Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 1 0.8263017 0.2323109 0.7294659 0.4106931 0.5445465 -1.1717427 -0.3898365
## 2 0.5658073 0.5359129 0.6516453 0.4908172 0.4250165 0.9529789 1.1002539
## 3 0.5452921 0.4943950 0.5683026 0.2831487 0.3091710 -1.2403617 -0.4172105
## 4 -0.8541464 -0.5806311 -0.8749025 -0.5558236 -0.5781547 0.2977006 -0.3752446
## Q33 Q34 Q35 Q36 Q37 Q38 Q39
## 1 -0.3863492 -0.1689528 -0.1634086 -0.6719795 -0.67680564 -0.2423483 0.4715412
## 2 1.0984905 1.1234389 1.1512429 1.2032210 0.85130562 1.1049525 -1.1497212
## 3 -0.2622097 -0.7423406 -0.6266010 -0.1318708 -0.70201960 -0.6304505 0.3831978
## 4 -0.4262760 -0.3717443 -0.4287214 -0.4177455 -0.01552838 -0.3676184 0.3835858
## Q40 Q41 Q42 Q43 Q44 Q45
## 1 0.3904349 1.2390616 0.59227332 -0.01515587 1.1340846 1.1663871
## 2 -1.0479825 -0.9259828 -0.79844418 -1.14176779 -1.1873555 -0.5639533
## 3 0.3198452 1.2267104 0.52721756 0.76742538 1.2443966 1.1572578
## 4 0.3750989 -0.3366881 0.07456546 0.44866094 -0.1415085 -0.5046888
## Q46 Q47 Q48 Q49 Q50 Q51 Q52
## 1 1.1762319 1.1561749 1.1470693 1.1228693 1.1654422 -1.1865069 -1.1715320
## 2 -0.6448922 -0.3719512 -0.6064062 -0.5669954 -0.5700900 0.4755361 0.2312995
## 3 1.2131405 1.1498203 1.1895354 1.1365622 1.0935274 -1.2129900 -1.1219843
## 4 -0.4777561 -0.6147888 -0.4816946 -0.4785828 -0.4797348 0.5847159 0.6972978
## Q53 Q54 Q55 Q56 Q57 Q58 Q59
## 1 -1.2012957 -1.1693068 -1.2122303 -1.1588818 -0.2172708 -0.07919395 -0.2162115
## 2 0.2015186 0.4355620 0.4858101 0.5328387 -0.3372953 -0.22014882 -0.3867709
## 3 -1.0552693 -1.1394614 -1.1815317 -1.1746983 -0.2557363 -0.41919839 -0.1400437
## 4 0.7055274 0.5780356 0.5785220 0.5262789 0.3758655 0.30268184 0.3676517
## Q60 Q61 Q62
## 1 0.33451169 0.6251088 0.2879853
## 2 0.09855577 0.1641761 0.2285592
## 3 0.18635246 0.2916421 0.3401010
## 4 -0.25525667 -0.4463421 -0.3658248
##
## Clustering vector:
## [1] 4 1 2 4 2 3 3 3 4 4 1 2 4 4 1 4 1 4 3 4 2 2 1 2 2 4 2 4 4 4 2 1 4 4 2 2 2
## [38] 2 1 4 3 4 2 4 2 2 4 4 2 4 4 4 2 2 4 4 4 2 1 2 4 4 3 4 4 2 3 3 4 1 3 4 4 2
## [75] 3 4 4 2 4 4 1 2 1 1 2 1 4 1 4 2 4 4 2 4 2 1 4 4 3 4 2 2 2 1 4 1 4 2 1 1 2
## [112] 4 4 3 1 2 4 1 1 1 1 4 3 2 4 2 4 1 4 2 2 4 1 4 4 1 2 4 2 4 4 1 4 4 4 2 4 1
## [149] 3 3 1 4 1 3 1 4 3 3 4 4 4 2 3 3 3 3 3 2 3 4 3 4 1 1 2 1 4 4 2 2 4 4 2 4 2
## [186] 3 2 2 4 4 2 2 1 3 2 2 4 4 4 4 2 4 2 4 4 2 4 4 4 3 4 1 4 4 4 2 1 4 4 1 2 3
## [223] 2 3 3 3 3 4 4 4 4 4 4 4 2 4 4 3 1 4 4 1 2 4 1 3 2 4 2 1
##
## Within cluster sum of squares by cluster:
## [1] 1251.5890 1940.9853 995.0902 4517.1950
## (between_SS / total_SS = 43.6 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result4, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Set the number of clusters (k)
k <- 5
# Run k-means clustering
kmeans_result5 <- kmeans(norm_df, centers = k)
# View the clustering results
print(kmeans_result5)
## K-means clustering with 5 clusters of sizes 30, 24, 24, 97, 75
##
## Cluster means:
## Q1 Q2 Q3 Q4 Q5 Q6
## 1 0.9973854 1.1904343 -0.3703141 -0.18575417 -0.007993759 0.07170632
## 2 0.8835534 1.2837019 -0.6677033 -0.06812350 0.128263496 0.00768282
## 3 0.8564505 1.2837019 -0.2798044 -0.11926727 0.173682581 -0.15237593
## 4 -0.1723407 -0.1826193 0.6072977 0.17784112 0.807676821 0.05718552
## 5 -0.7328615 -1.0615554 -0.3341102 -0.09574114 -1.138020596 -0.05634068
## Q7 Q8 Q9 Q10 Q11 Q12
## 1 0.91778719 0.01739506 -0.33899766 0.43501840 -0.07866268 -0.004699441
## 2 -0.47959556 0.54426464 0.27833172 -0.16023607 0.56125197 0.266986999
## 3 -0.17769188 -1.08652215 0.65939923 0.04086342 -0.14109338 -0.467300677
## 4 -0.16390836 0.10499111 -0.13848544 -0.11794298 -0.06225048 -0.018022599
## 5 0.05520524 0.03077587 0.01463299 0.01673148 -0.02247505 0.089289381
## Q13 Q14 Q15 Q16 Q17 Q18
## 1 -0.10588388 -1.2439605 -0.619839211 -0.5541409 -0.05887694 -0.4668939
## 2 -0.28041774 -1.2200260 -0.927863284 -0.1638136 -0.28055218 -0.4545813
## 3 0.19227814 -1.2998079 -0.720539389 -0.5975106 -0.30271971 -0.5777073
## 4 0.06445158 0.3217401 -0.009409268 -0.4573095 -0.51593929 -0.4234824
## 5 -0.01279915 0.8878138 0.787593860 1.0567337 0.87747927 1.0647938
## Q19 Q20 Q21 Q22 Q23 Q24
## 1 -0.7267532 -1.2550772 -0.75284384 -0.67786994 1.2212217 -0.7937198
## 2 -0.1431176 -1.1604648 -0.80685847 -0.72670784 1.1832694 -0.8526983
## 3 -0.6823461 -1.1604648 -0.74684222 -0.57408942 1.1832694 -0.8526983
## 4 -0.3432437 0.1452226 0.01882902 -0.05455746 -0.1076751 1.0176571
## 5 0.9987781 1.0569071 0.77396956 0.75796409 -1.1065213 -0.4529550
## Q25 Q26 Q27 Q28 Q29 Q30
## 1 0.2861514 0.66154452 0.1813501 0.41498355 0.4850941 0.3276268
## 2 0.9111792 1.18809950 0.4710066 1.01718670 0.4106931 0.8314693
## 3 0.6945854 0.26064471 0.4395222 0.59981818 0.1316896 0.1855174
## 4 0.1350780 0.01359074 0.5404670 0.09116951 0.4797249 0.4845199
## 5 -0.8030061 -0.74579332 -1.0629133 -0.80134755 -0.9880443 -1.0831321
## Q31 Q32 Q33 Q34 Q35 Q36
## 1 -1.18521001 -0.5116677 -0.0239726 -0.04880697 -0.45034192 -0.5434951
## 2 -1.19325297 -0.4608061 -0.5386019 -0.35385050 -0.09690577 -0.3254097
## 3 -1.23346775 -0.2064981 -0.5060304 -0.97042785 -0.54673359 -0.3914962
## 4 1.03763089 0.5695346 0.6067723 0.62538006 0.67853381 0.6553270
## 5 -0.09136798 -0.3183936 -0.4408875 -0.36553302 -0.49146903 -0.4007483
## Q37 Q38 Q39 Q40 Q41 Q42
## 1 -0.6568446 -0.1668840 0.5492640 0.3424971 1.28118245 0.3706722
## 2 -0.9335415 -0.2827756 0.2525936 0.5671287 1.19380021 0.6386711
## 3 -0.4817914 -0.8622338 0.4645011 0.1707200 1.21365981 0.7280040
## 4 0.9773729 0.6630754 -0.6146976 -0.6023450 -1.00959158 -0.9309488
## 5 -0.5484246 -0.4244209 0.3458329 0.4059225 0.02287826 0.6184223
## Q43 Q44 Q45 Q46 Q47 Q48 Q49
## 1 0.05684632 1.2118732 1.1376300 1.1389483 1.1467842 1.1677968 1.1460589
## 2 0.30428010 1.1330664 1.1777035 1.2687848 1.2105417 1.1514929 1.0906616
## 3 0.71666973 1.1987387 1.1777035 1.1841088 1.1042793 1.1786661 1.1460589
## 4 -0.56530279 -0.7757187 -0.5323558 -0.5094106 -0.4699385 -0.5733070 -0.5295660
## 5 0.38168246 -0.2276640 -0.5202688 -0.5816675 -0.5916693 -0.4712925 -0.4892688
## Q50 Q51 Q52 Q53 Q54 Q55 Q56
## 1 1.0516520 -1.2424411 -1.1689817 -1.1385746 -1.1751101 -1.1400811 -1.1337658
## 2 1.1418452 -1.1279093 -1.1598434 -1.1112401 -1.1580959 -1.2349391 -1.1447861
## 3 1.2264012 -1.2138081 -1.1141514 -1.1567976 -1.1297390 -1.2349391 -1.2274382
## 4 -0.5213814 0.5534649 0.6692461 0.6495362 0.4453846 0.4404631 0.5701731
## 5 -0.5041797 0.5305113 0.3297128 0.3411350 0.6261204 0.6767279 0.4751942
## Q57 Q58 Q59 Q60 Q61 Q62
## 1 -0.1850519 -0.3904468 -0.4110961 0.2382323 0.5567713 0.4316441
## 2 -0.2963799 -0.1011338 0.2740641 0.3169168 0.5696595 0.2201795
## 3 -0.2345310 -0.1640279 -0.3518034 0.2563903 0.2796745 0.2522196
## 4 0.3010676 0.3948870 0.2849310 -0.4783516 -0.4700348 -0.3740486
## 5 -0.1454686 -0.2696900 -0.1791957 0.3399168 0.1134164 0.1599441
##
## Clustering vector:
## [1] 5 1 4 5 4 3 2 3 4 5 1 4 4 5 2 5 3 5 3 4 4 4 2 4 4 4 4 5 5 5 4 1 5 5 4 4 4
## [38] 4 2 5 2 5 4 5 4 4 5 5 4 5 5 4 4 4 4 4 5 4 2 4 5 5 3 5 4 4 1 3 5 3 1 5 4 4
## [75] 1 5 5 4 5 5 3 4 2 2 4 1 5 1 4 4 5 4 4 5 4 2 5 5 2 4 4 4 4 1 4 1 5 4 2 1 4
## [112] 4 5 3 1 4 5 1 1 1 2 5 3 4 4 4 5 2 5 4 4 4 3 4 5 2 4 5 4 5 5 1 5 5 4 4 4 1
## [149] 2 1 2 5 3 3 2 4 1 3 5 5 5 4 3 3 1 1 3 4 3 4 1 4 3 1 4 1 5 5 4 4 4 5 4 4 4
## [186] 1 4 4 5 5 4 4 2 2 4 4 4 5 5 5 4 5 4 4 5 4 5 5 4 3 5 1 4 5 5 4 1 5 5 2 4 2
## [223] 4 3 2 3 3 4 5 5 5 5 4 5 4 5 5 1 2 4 5 2 4 5 1 3 4 4 4 1
##
## Within cluster sum of squares by cluster:
## [1] 846.1442 678.1318 645.8130 4675.0457 2328.8392
## (between_SS / total_SS = 40.6 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Visualize the clustering results with factoextra
fviz_cluster(kmeans_result5, data = norm_df, geom = "point", stand = FALSE, main = "Kmeans Cluster Plot")
# Add cluster labels to the original dataframe
pys_data$Cluster <- kmeans_result$cluster
# View the updated dataframe
str(pys_data)
## 'data.frame': 250 obs. of 64 variables:
## $ Respondent.Number: num 1 2 3 4 5 6 7 8 9 10 ...
## $ Q1 : num 6 7 5 4 5 6 7 6 6 6 ...
## $ Q2 : num 2 7 4 2 5 6 7 7 4 2 ...
## $ Q3 : num 4 7 6 5 7 4 3 3 6 4 ...
## $ Q4 : num 3 5 5 4 6 4 3 4 1 5 ...
## $ Q5 : num 1 4 7 2 7 5 5 5 3 1 ...
## $ Q6 : num 5 4 5 4 3 3 4 3 4 5 ...
## $ Q7 : num 5 5 3 5 4 5 4 4 4 3 ...
## $ Q8 : num 3 4 5 4 5 2 4 3 4 4 ...
## $ Q9 : num 4 5 4 3 4 5 4 5 3 5 ...
## $ Q10 : num 4 5 5 4 2 3 3 3 4 3 ...
## $ Q11 : num 4 4 5 4 5 4 7 4 4 4 ...
## $ Q12 : num 5 4 5 4 4 4 4 4 3 5 ...
## $ Q13 : num 4 4 6 6 4 5 5 5 6 4 ...
## $ Q14 : num 7 2 3 5 5 1 1 1 7 5 ...
## $ Q15 : num 6 3 3 6 4 2 3 4 7 7 ...
## $ Q16 : num 7 4 4 7 3 4 5 4 5 6 ...
## $ Q17 : num 6 4 2 6 2 4 3 5 7 6 ...
## $ Q18 : num 5 3 4 5 5 5 3 4 5 6 ...
## $ Q19 : num 5 4 3 5 4 4 4 4 2 6 ...
## $ Q20 : num 6 2 4 5 5 1 2 2 5 7 ...
## $ Q21 : num 7 4 2 6 5 3 2 4 6 7 ...
## $ Q22 : num 5 4 3 6 5 5 4 5 7 7 ...
## $ Q23 : num 2 7 3 1 3 7 7 7 3 2 ...
## $ Q24 : num 1 1 4 1 4 1 1 1 5 1 ...
## $ Q25 : num 2 4 2 1 5 5 5 4 1 3 ...
## $ Q26 : num 3 3 4 2 2 3 3 4 2 3 ...
## $ Q27 : num 1 4 5 2 2 5 4 5 4 3 ...
## $ Q28 : num 1 5 3 2 3 4 6 5 2 2 ...
## $ Q29 : num 2 7 3 3 1 5 4 5 5 1 ...
## $ Q30 : num 2 4 4 1 4 3 5 5 4 3 ...
## $ Q31 : num 4 1 7 2 6 2 1 1 7 3 ...
## $ Q32 : num 4 5 5 5 7 6 3 5 5 2 ...
## $ Q33 : num 5 5 7 4 7 3 5 4 5 3 ...
## $ Q34 : num 4 5 5 5 7 4 2 3 4 6 ...
## $ Q35 : num 3 3 7 4 5 4 3 4 5 4 ...
## $ Q36 : num 4 3 6 4 7 6 4 5 3 4 ...
## $ Q37 : num 4 4 5 3 7 4 3 5 7 4 ...
## $ Q38 : num 3 7 7 3 7 2 5 4 4 5 ...
## $ Q39 : num 5 4 3 6 2 3 3 6 5 4 ...
## $ Q40 : num 3 3 2 2 1 3 2 4 5 4 ...
## $ Q41 : num 5 7 1 4 3 6 7 7 1 4 ...
## $ Q42 : num 5 6 1 5 2 4 5 4 2 3 ...
## $ Q43 : num 4 4 1 4 2 3 5 4 3 4 ...
## $ Q44 : num 3 7 1 2 3 7 6 7 3 3 ...
## $ Q45 : num 4 6 4 4 4 6 7 7 5 4 ...
## $ Q46 : num 4 6 3 5 5 6 7 7 5 4 ...
## $ Q47 : num 4 7 4 4 5 7 6 6 2 4 ...
## $ Q48 : num 5 6 4 3 2 7 6 7 5 4 ...
## $ Q49 : num 4 6 3 3 5 6 6 6 3 5 ...
## $ Q50 : num 4 7 2 4 3 7 6 6 4 4 ...
## $ Q51 : num 5 1 4 5 4 2 1 1 3 6 ...
## $ Q52 : num 4 1 4 2 4 2 2 1 7 5 ...
## $ Q53 : num 2 1 3 3 6 1 2 1 6 5 ...
## $ Q54 : num 4 1 5 5 4 2 1 2 4 3 ...
## $ Q55 : num 5 1 6 4 5 1 1 1 3 5 ...
## $ Q56 : num 4 1 3 4 5 2 1 1 4 5 ...
## $ Q57 : num 5 5 4 4 4 5 4 5 6 3 ...
## $ Q58 : num 3 4 4 2 5 4 5 5 7 4 ...
## $ Q59 : num 4 3 5 5 4 4 4 4 6 4 ...
## $ Q60 : num 4 5 3 5 3 4 3 6 2 3 ...
## $ Q61 : num 4 4 4 5 4 4 5 4 2 3 ...
## $ Q62 : num 2 5 4 3 5 4 4 4 2 4 ...
## $ Cluster : int 2 1 3 2 3 1 1 1 2 2 ...
str(questions)
## chr [1:62] "I want a car that is trendy." "I am fashion conscious." ...
# Splitting the merged data frame into separate data sets based on 'Cluster'
cluster_list <- split(pys_data, pys_data$Cluster)
# Calculating cluster averages for each question
cluster_avg <- pys_data %>%
group_by(Cluster) %>%
summarise(across(starts_with("Q"), mean, na.rm = TRUE))
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `across(starts_with("Q"), mean, na.rm = TRUE)`.
## ℹ In group 1: `Cluster = 1`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
# Viewing the cluster averages
print(cluster_avg)
## # A tibble: 3 × 63
## Cluster Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.51 6.51 3.88 4.03 4.01 3.97 4.03 3.76 4.08 4.05 4.09
## 2 2 4.73 2.45 3.94 3.31 2.41 4.02 3.82 3.92 3.86 3.88 3.97
## 3 3 4.02 3.77 5.94 6.02 6.02 3.97 3.8 4.11 3.77 3.82 3.88
## # ℹ 51 more variables: Q12 <dbl>, Q13 <dbl>, Q14 <dbl>, Q15 <dbl>, Q16 <dbl>,
## # Q17 <dbl>, Q18 <dbl>, Q19 <dbl>, Q20 <dbl>, Q21 <dbl>, Q22 <dbl>,
## # Q23 <dbl>, Q24 <dbl>, Q25 <dbl>, Q26 <dbl>, Q27 <dbl>, Q28 <dbl>,
## # Q29 <dbl>, Q30 <dbl>, Q31 <dbl>, Q32 <dbl>, Q33 <dbl>, Q34 <dbl>,
## # Q35 <dbl>, Q36 <dbl>, Q37 <dbl>, Q38 <dbl>, Q39 <dbl>, Q40 <dbl>,
## # Q41 <dbl>, Q42 <dbl>, Q43 <dbl>, Q44 <dbl>, Q45 <dbl>, Q46 <dbl>,
## # Q47 <dbl>, Q48 <dbl>, Q49 <dbl>, Q50 <dbl>, Q51 <dbl>, Q52 <dbl>, …
# Reshape data for plotting
cluster_avg_long <- cluster_avg %>%
pivot_longer(cols = starts_with("Q"), names_to = "Question", values_to = "Average")
# Plot heatmap with rotated y-axis labels
ggplot(cluster_avg_long, aes(x = as.factor(Cluster), y = Question, fill = Average)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "steelblue") +
labs(x = "Cluster", y = "Question", fill = "Average") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(angle = 0, hjust = 0.5, vjust = 0.5), # Rotate y-axis labels
plot.margin = unit(c(2, 10, 2, 2), "mm")) # Adjust plot margin for y-axis labels
# Create a crosstab of cluster and preference group
df$cluster <- kmeans_result$cluster
table(df$cluster, df$Preference.Group)
##
## Ka Chooser Ka Non-Chooser Middle
## 1 34 13 31
## 2 53 27 27
## 3 29 32 4
# create a data frame with the results of the crosstab
cluster.prefrence.data <- data.frame(
Cluster = c(1, 2, 3),
Ka_Chooser = c(53, 34, 29),
Ka_Non_Chooser = c(27, 13, 32),
Middle = c(27, 31, 4)
)
# Calculate the total count for each cluster
cluster.prefrence.data $total <- rowSums(cluster.prefrence.data[, c("Ka_Chooser", "Ka_Non_Chooser", "Middle")])
# Calculate proportions for each category within each cluster
cluster.prefrence.data$Ka_Chooser_Proportion <- cluster.prefrence.data$Ka_Chooser / cluster.prefrence.data$total
cluster.prefrence.data$Ka_Non_Chooser_Proportion <- cluster.prefrence.data$Ka_Non_Chooser / cluster.prefrence.data$total
cluster.prefrence.data$Middle_Proportion <- cluster.prefrence.data$Middle / cluster.prefrence.data$total
# Print the updated data frame with proportions
print(cluster.prefrence.data )
## Cluster Ka_Chooser Ka_Non_Chooser Middle total Ka_Chooser_Proportion
## 1 1 53 27 27 107 0.4953271
## 2 2 34 13 31 78 0.4358974
## 3 3 29 32 4 65 0.4461538
## Ka_Non_Chooser_Proportion Middle_Proportion
## 1 0.2523364 0.25233645
## 2 0.1666667 0.39743590
## 3 0.4923077 0.06153846
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Ka_Chooser)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Ka Buyers by Cluster",
x = "Cluster",
y = "Count")
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Ka_Non_Chooser)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Non-Ka Buyers by Cluster",
x = "Cluster",
y = "Count")
# Create a bar chart
ggplot(cluster.prefrence.data, aes(x = Cluster, y = Middle)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Middle Buyers by Cluster",
x = "Cluster",
y = "Count")
Value Buyers (Cluster 1) |
---|
Agree with questions (Higher Average Scores): |
The car I buy must be able to handle long motorway journeys (Q14) |
I want the most equipment I can get for my money (Q15) |
I want a car that is nippy and zippy (Q17) |
Good aerodynamics help fuel economy (Q21) |
Small cars are much safer nowadays (Q22) |
Disagree with questions (Lower Average Scores): |
Small cars are not prestigious (Q5) |
I consider myself an authority on cars (Q25) |
Small cars are for kids (Q26) |
Small cars are for women (Q27) |
Domestic made is best made (Q28) |
Having a masculine car is important to me (Q30) |
I want a comfortable car (Q31) |
The government should implement policies that favor public transportation (Q61) |
The government is right to tax large cars more heavily than small cars (Q62) |
Cluster 1 Profile |
Cluster 1’s attitudes suggest a practical approach to car buying: they tend to focus on functionality, getting good features for the money, and performance. They are open to smaller cars but still value certain aspects of larger, more prestigious vehicles. Ford may be doing well with this group because it is a foreign automaker. |
Car Enthusiasts (Cluster 2) |
---|
Agree with questions (Higher Average Scores): |
I want a car that is trendy (Q1) |
I am fashion conscious (Q2) |
My car must function with total reliability (Q7) |
I want a car that is fuel economic (Q12) |
I love to drive (Q13) |
I want to buy a car that makes a statement about me (Q44) |
Disagree with questions (Lower Average Scores): |
The car I buy must be able to handle long motorway journeys (Q14) |
One should not spend beyond one’s means (Q20) |
Buying a car on a lower interest rate does not interest me (Q23) |
I want a car that drives well on country roads (Q24) |
I want a comfortable car (Q31) |
I want a car equipped with the latest features and technology (Q51) |
I have a relationship with my car (Q52) |
Cluster 2 Profile |
Cluster 2 consists of car enthusiasts who view their vehicles as expressions of their personal style and identity. They prioritize trendy, fashion-conscious cars that make a statement about them. Fuel efficiency and driving enjoyment are important to them, while comfort and having the latest features and technology matter less (see the disagree list above). Surprisingly, this group chooses the Ka at a high proportion. |
Bare-Bones car buyers (Cluster 3) |
---|
Agree with questions (Higher Average Scores): |
I do not have the time to worry about car maintenance (Q3) |
Basic transportation is all I need (Q4) |
I want a car that is easy to handle (Q8) |
Today’s cars are more efficient than yesterday’s (Q11) |
I want a car that is fuel economic (Q12) |
Disagree with questions (Lower Average Scores): |
I want a car that is nippy and zippy (Q17) |
I prefer buying my next car from the same car manufacturer (Q18) |
I wish there were stricter exhaust regulations (Q19) |
I want a car that drives well on country roads (Q24) |
Small cars are for women (Q27) |
I want to buy a car that makes a statement about me (Q44) |
Cluster 3 Profile |
Cluster 3 consists of “bare-bones” car buyers who prioritize fuel economy, ease of handling, and basic transportation needs. They are not loyal to any one brand. These people don’t care too much about what they drive; they just want to get from point A to point B. I would consider them prospective customers for the Ka, but currently they are not buying it at a high rate. This information could be used to advertise the Ka more effectively. |
Based on the data collection process and the results of the analysis, I think the best way to segment customers in this situation is k-means clustering on the attitudinal data. Because of the changing demographics of the small car market in France, the older secondary data used in this analysis is no longer as informative as it once was. This can be seen in the low levels of significance in the chi-squared cross tabulations and in the multinomial regression: only two variables, “Gender” and “First Time Purchase”, are significant at the .05 level. Meanwhile, the three clusters created by the k-means algorithm did a good job of segmenting the sample into three groups. These groups are defined by their car-buying attitudes, and they give useful insight into the ways Ford can market the Ka. The groups are the “bare-bones” buyers, the “value” buyers, and the car “enthusiasts”. All three groups choose the Ka at similar rates, but because we have attitude data we can market to each of these groups more effectively.
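As a complementary check on the claim that the attitudinal clusters are informative, a minimal sketch of a chi-squared test on the cluster-by-preference crosstab built earlier:
# test whether psychographic cluster membership is associated with Ka preference group
chisq.test(table(df$cluster, df$Preference.Group))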
The small car market experienced demographic shifts during the 1980s and 1990s. Small cars became increasingly popular for several reasons, and the end result was that more people were willing to buy small cars. To me, this means that attitudes about small cars themselves may not be that informative, because individuals may buy these cars even if they do not really want them. It also suggests that Ford’s sample may be too narrow, because there could be plenty of potential buyers who are not covered by it. For example, the data were collected from three groups that should cover much of the population, but because of changes in the car market these groups may not capture everyone Ford needs to understand. I also think Ford should have collected more demographic data, such as race and rural versus urban residence, because these categories could tell us a lot about potential buyers. The Ford sample has no information about where in France individuals are from; this also limits the analysis, because there could be cultural differences among regions that would have been useful to know about. Overall, I think Ford’s approach is fairly sound.