The study uses an existing dataset from a survey conducted in 2015 with 1000 respondents. The analysis is based mainly on psychographic attitude, interest and opinion (AIO) questions that measure respondents' views on environmental matters. Because there are many rating questions, the unobserved characteristics behind them are identified with two unsupervised machine learning methods: principal component analysis (PCA) and k-means clustering.
After identifying the green consumer segments, we build a classification model that can in future predict a consumer's segment from the underlying demographic variables. Three classifiers (multinomial logistic regression, a support vector machine and a random forest) are evaluated on their performance and one is selected.
Cluster analysis revealed five green consumer profiles:
Champions: People dedicated to environmental matters. They are emotionally invested in the environment and concerned about pollution.
Hunters: Information seekers on environmental matters who, if called upon, would contribute to environmental conservation.
Remoaners: They feel guilty about environmental pollution but are not willing to participate in conservation.
Unwary: Uninformed on environmental matters; to them, their source of livelihood is what matters, even if it leads to destruction of the environment.
Uncommitted: Not committed to environmental matters at all; they have other pressing issues.
library(haven)
library(dplyr)
library(factoextra)
library(psych)
library(expss)
library(kableExtra)
library(nnet)
library(caret)
library(e1071)
library(randomForest)
data <- read_sav("E:/PARS Folder/Technical stuff/Environment consumer/data2.sav")
attach(data)
data$gender=factor(data$gender,labels=c('Male','Female'), levels=c(1,2))
data$location=factor(data$location,labels=c('Nairobi','Central','Coast',
'Eastern','North Eastern','Nyanza','Rift Valley',
'Western'),levels=c(1,2,3,4,5,6,7,8))
data$age=factor(data$age,labels=c('Below 18 years','18-24','25-34','35-44','45-54',
'55-65','>65'), levels=c(1,2,3,4,5,6,7))
data$marital=factor(data$marital,labels=c('Single','Married','Divorced/ Separated',
'Widowed/ Widower','Other(specify)'),levels=c(1,2,3,4,5))
data$sec=factor(data$sec,labels=c('A','B','C1','C2','D','E'),levels=c(1,2,3,4,5,6))
data$education=factor(data$education,labels=c('No formal education','Some primary','Completed primary education','Some secondary','Completed secondary','University / Polytechnic incomplete','University / Polytechnic complete','Post-university incomplete','Post-university complete'), levels=c(1,2,3,4,5,6,7,8,9))
data$renewable=factor(data$renewable,labels=c('yes','No'), levels=c(1,0))
data$aware_govt_initiative=factor(data$aware_govt_initiative,labels=c('yes','No'), levels=c(1,0))
data$envt_information=factor(data$envt_information,labels=c('yes','No'), levels=c(1,0))
data$non_renewable=factor(data$non_renewable,labels=c('yes','No'), levels=c(1,0))
We build two indices: one measuring awareness of environmental matters, based on the awareness rating questions, and one measuring participation in environmental issues. Each index is the score on the first principal component of its items.
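# Standardize the awareness items (columns 10:13) and the participation items (columns 15:16)
data.scale_aware=scale(data[10:13])
data.scale_participate=scale(data[15:16])
# The awareness index must be fitted on the awareness items; fitting both indices on
# data.scale_participate yields two identical columns, as seen in the tables printed below.
awareness_index<-principal(data.scale_aware, rotate="varimax", nfactors=1,covar=T, scores=TRUE)
participation_index<-principal(data.scale_participate, rotate="varimax", nfactors=1,covar=T, scores=TRUE)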
awareness_index=awareness_index$scores
participation_index=participation_index$scores
data2<-cbind(data,awareness_index,participation_index)
names(data2)[42]<-"awareness_index"
names(data2)[43]<-"participation_index"
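As a rough illustration of the idea, the sketch below builds a one-component index with base R's `prcomp()` on simulated ratings. The data, the item names and the sign convention are assumptions made for illustration; the analysis above uses `psych::principal()` on the survey columns.

```r
# Illustrative only: simulated 1-5 ratings for four hypothetical items
set.seed(1)
latent <- rnorm(200)                      # unobserved trait (assumed)
items <- sapply(1:4, function(i) {
  raw <- latent + rnorm(200, sd = 0.7)    # each item = trait + noise
  as.numeric(cut(raw, breaks = 5, labels = FALSE))  # bin into a 1-5 scale
})
colnames(items) <- paste0("item_", 1:4)

# PCA on standardized items; the first component's scores act as the index
pca <- prcomp(items, center = TRUE, scale. = TRUE)
index <- pca$x[, 1]

# The sign of a principal component is arbitrary, so orient the index
# so that higher values correspond to higher ratings
if (cor(index, rowMeans(items)) < 0) index <- -index

# Share of total variance captured by the first component
var_explained <- pca$sdev[1]^2 / sum(pca$sdev^2)
```

Checking `var_explained` is a quick way to judge how much of the items' information a single index retains.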
pca.plot<-prcomp(data[18:41],scale=T)
fviz_eig(pca.plot)
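The scree plot drawn by `fviz_eig()` shows how much variance each component explains. The sketch below computes the same quantities in base R on simulated data. The two-factor structure and the eigenvalue-above-1 (Kaiser) rule are assumptions for illustration, not necessarily the criterion used to settle on five components here.

```r
# Illustrative only: six simulated items driven by two hypothetical factors
set.seed(2)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(
  sapply(1:3, function(i) f1 + rnorm(n, sd = 0.5)),
  sapply(1:3, function(i) f2 + rnorm(n, sd = 0.5))
)

pca <- prcomp(X, scale. = TRUE)
eig <- pca$sdev^2               # eigenvalues of the correlation matrix
prop_var <- eig / sum(eig)      # proportion of variance per component
n_keep <- sum(eig > 1)          # Kaiser rule: keep eigenvalues above 1
```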
data.scale=scale(data[18:41])
prn<-principal(data.scale, rotate="varimax", nfactors=5,covar=T, scores=TRUE)
output<-prn$loadings
newd=read.csv("output.csv")
newd %>%
kable("html") %>%
kable_styling(font_size=12) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| Rating.Question | Champs | Hunters | Remoaners | Unwary | Uncommitted |
|---|---|---|---|---|---|
| I play a role in protecting the environment in Kenya | 0.582 | . | 0.133 | . | -0.116 |
| I want to protect the planet | 0.602 | 0.212 | . | . | . |
| I love the environment and animals | 0.664 | 0.227 | 0.131 | . | . |
| I feel guilty when I carry items packed with plastic materials | 0.152 | . | 0.678 | . | . |
| I am African and I care about my community | 0.714 | . | 0.182 | . | . |
| I want to make a difference and leave a mark | 0.682 | 0.138 | . | . | . |
| In my daily life I try to find ways to conserve water or power | 0.673 | . | 0.159 | . | . |
| If I come across information about environment, I will tend to look at it | 0.411 | 0.548 | . | . | . |
| I would like to join and actively participate in an environmentalist group. | 0.398 | 0.567 | 0.117 | . | . |
| I’ld donate some money to an environmental organization. | 0.153 | 0.65 | 0.188 | 0.123 | . |
| I’ld certainly devote some of it to working for environmental causes | 0.342 | 0.55 | 0.235 | . | . |
| I am not the kind of person who makes efforts to conserve natural resources. | -0.17 | . | . | . | 0.712 |
| I feel angry when I see waste been dumped in open sites | 0.523 | 0.153 | 0.24 | . | . |
| I feel angry when people cut forest trees for farming | 0.219 | 0.148 | 0.666 | . | -0.11 |
| I’m proud of the government efforts in environmental conservation | 0.181 | 0.24 | 0.204 | . | 0.372 |
| I feel guilty when I throw plastic materials in the street | 0.244 | 0.154 | 0.71 | . | . |
| I’m sad when I hear news about pollution of our rivers by industries | 0.55 | . | 0.371 | . | . |
| I would feel proud after volunteering to an environmental activity | 0.503 | 0.291 | 0.28 | . | . |
| I have more pressing issues to worry about other than environment | -0.114 | . | -0.107 | . | 0.723 |
| I will give thanks and cherish “Mother Nature” | 0.592 | 0.183 | 0.208 | . | -0.133 |
| There is nothing we can do about climate change as it is already too late | -0.255 | 0.206 | . | 0.69 | . |
| The problems of the environment are not as bad as most people think | -0.112 | 0.178 | . | 0.755 | . |
| It is right for humans to use nature as a resource for economic purposes. | 0.243 | -0.259 | . | 0.636 | . |
| Protecting peoples’ source of livelihood is more important | . | -0.419 | . | 0.395 | 0.315 |
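
One simple way to read a loading table like this is to assign each question to the component on which it loads most strongly in absolute value. The sketch below does this for five rows copied from the table, with shortened, hypothetical row labels and the suppressed "." entries treated as 0 (an assumption).

```r
# Five rows from the loading table; "." entries are entered as 0
loadings <- matrix(c(
   0.602, 0.212,  0,     0,     0,      # I want to protect the planet
   0.153, 0.650,  0.188, 0.123, 0,      # donate money to an environmental organization
   0.244, 0.154,  0.710, 0,     0,      # guilty when throwing plastic in the street
  -0.112, 0.178,  0,     0.755, 0,      # problems not as bad as most people think
  -0.114, 0,     -0.107, 0,     0.723   # more pressing issues than environment
), nrow = 5, byrow = TRUE,
dimnames = list(
  c("protect_planet", "donate_money", "guilty_plastic", "not_that_bad", "pressing_issues"),
  c("Champs", "Hunters", "Remoaners", "Unwary", "Uncommitted")
))

# Component with the largest absolute loading for each question
dominant <- colnames(loadings)[apply(abs(loadings), 1, which.max)]
names(dominant) <- rownames(loadings)
dominant
```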
y=prn$scores %>%
data.frame()
data2$Champs<-y$RC1
data2$Hunters<-y$RC3
data2$Remoaners<-y$RC5
data2$Unwary<-y$RC2
data2$Uncommitted<-y$RC4
data2<-data2[-c(18:41)]
head(data2) %>%
kable("html") %>%
kable_styling(font_size=12) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| SbjNum | location | age | gender | marital | renewable | aware_govt_initiative | envt_information | sec | aware_one | aware_two | aware_three | aware_four | education | participate_one | participate_two | non_renewable | awareness_index | participation_index | Champs | Hunters | Remoaners | Unwary | Uncommitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11999193 | Central | 25-34 | Male | Single | yes | No | No | A | 0 | 5 | 5 | 5 | University / Polytechnic incomplete | 2 | 2 | No | 1.4224644 | 1.4224644 | 1.2181461 | 0.7810937 | -1.4490291 | 1.4479224 | -1.1009969 |
| 11999305 | Central | 35-44 | Male | Single | yes | No | No | C2 | 0 | 5 | 3 | 5 | University / Polytechnic incomplete | 2 | 1 | No | 0.0771632 | 0.0771632 | 1.5745056 | 0.3204658 | -3.2564672 | 0.1197839 | 0.1084916 |
| 11999984 | Central | 25-34 | Male | Single | No | yes | No | C1 | 0 | 1 | 5 | 5 | University / Polytechnic incomplete | 2 | 1 | yes | 0.0771632 | 0.0771632 | 0.5531007 | 0.0361897 | 0.5727102 | -0.3309916 | -0.5028206 |
| 12000457 | Central | 55-65 | Male | Divorced/ Separated | No | yes | No | C1 | 1 | 3 | 1 | 5 | Some primary | 2 | 1 | No | 0.0771632 | 0.0771632 | -2.8555465 | 2.7320131 | 0.5336392 | -1.8903485 | -0.2272935 |
| 12001126 | Central | 25-34 | Male | Single | yes | No | No | C1 | 0 | 5 | 5 | 5 | Post-university complete | 2 | 2 | No | 1.4224644 | 1.4224644 | -1.5132486 | 0.6531634 | -0.6242752 | -0.3480355 | -0.2504428 |
| 12001937 | Central | 45-54 | Male | Married | yes | yes | No | C1 | 0 | 5 | 5 | 5 | Completed secondary | 2 | 1 | No | 0.0771632 | 0.0771632 | 0.1529545 | 1.6651352 | 0.5015768 | -0.0147494 | -1.3145899 |
cluster_data = data2[20:24]
set.seed(1234)
kmeans = kmeans(x = cluster_data,iter.max=1000, centers = 5)
kmeans$centers
## Champs Hunters Remoaners Unwary Uncommitted
## 1 0.4405515 -1.2089632 0.05739103 -0.08682849 -0.4912145
## 2 0.3231998 0.1983640 0.34432095 -0.62873858 1.1063522
## 3 0.3623228 0.6689641 -0.08868706 -0.40004118 -0.8257980
## 4 -1.5917301 -0.1109957 -0.36407797 0.14819016 0.1125356
## 5 0.4721589 0.5187240 0.07329285 1.65298345 0.4204302
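One way to read these centers when naming clusters is to look at which component score is highest and which is lowest in each cluster. The sketch below re-enters the centers printed above. It is a heuristic reading aid, not necessarily how the labels assigned next were chosen.

```r
# Cluster centers copied from the kmeans output above
centers <- matrix(c(
   0.4405515, -1.2089632,  0.05739103, -0.08682849, -0.4912145,
   0.3231998,  0.1983640,  0.34432095, -0.62873858,  1.1063522,
   0.3623228,  0.6689641, -0.08868706, -0.40004118, -0.8257980,
  -1.5917301, -0.1109957, -0.36407797,  0.14819016,  0.1125356,
   0.4721589,  0.5187240,  0.07329285,  1.65298345,  0.4204302
), nrow = 5, byrow = TRUE,
dimnames = list(1:5, c("Champs", "Hunters", "Remoaners", "Unwary", "Uncommitted")))

# Component with the highest and the lowest mean score in each cluster
highest <- colnames(centers)[apply(centers, 1, which.max)]
lowest  <- colnames(centers)[apply(centers, 1, which.min)]
data.frame(cluster = 1:5, highest, lowest)
```

For example, cluster 4 stands out mainly for its very low Champs score rather than for any high score.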
data2$Member<-kmeans$cluster
data2$Member=factor(data2$Member,labels=c('Champs','Reproachables','Hunters','Unpassionate','Unwary'), levels=c(1,2,3,4,5))
data2<-data2[-c(20:24)]
head(data2) %>%
kable("html") %>%
kable_styling(font_size=12) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| SbjNum | location | age | gender | marital | renewable | aware_govt_initiative | envt_information | sec | aware_one | aware_two | aware_three | aware_four | education | participate_one | participate_two | non_renewable | awareness_index | participation_index | Member |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11999193 | Central | 25-34 | Male | Single | yes | No | No | A | 0 | 5 | 5 | 5 | University / Polytechnic incomplete | 2 | 2 | No | 1.4224644 | 1.4224644 | Unwary |
| 11999305 | Central | 35-44 | Male | Single | yes | No | No | C2 | 0 | 5 | 3 | 5 | University / Polytechnic incomplete | 2 | 1 | No | 0.0771632 | 0.0771632 | Hunters |
| 11999984 | Central | 25-34 | Male | Single | No | yes | No | C1 | 0 | 1 | 5 | 5 | University / Polytechnic incomplete | 2 | 1 | yes | 0.0771632 | 0.0771632 | Hunters |
| 12000457 | Central | 55-65 | Male | Divorced/ Separated | No | yes | No | C1 | 1 | 3 | 1 | 5 | Some primary | 2 | 1 | No | 0.0771632 | 0.0771632 | Unpassionate |
| 12001126 | Central | 25-34 | Male | Single | yes | No | No | C1 | 0 | 5 | 5 | 5 | Post-university complete | 2 | 2 | No | 1.4224644 | 1.4224644 | Unpassionate |
| 12001937 | Central | 45-54 | Male | Married | yes | yes | No | C1 | 0 | 5 | 5 | 5 | Completed secondary | 2 | 1 | No | 0.0771632 | 0.0771632 | Hunters |
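Before splitting the data into training and test sets and building the models, we add the anticipated non-linear and interaction terms: the squares of the two indices and the product of the `sec` and `location` codes.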
interaction_1<-data2$awareness_index^2
interaction_2<-as.numeric(data2$sec)*as.numeric(data2$location)
interaction_3<-data2$participation_index^2
data2<-cbind(data2,interaction_1,interaction_2,interaction_3)
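Note that `as.numeric()` on a factor returns the integer position of each level, so the product above multiplies level codes. A minimal sketch with three made-up respondents (the values are hypothetical; the level labels follow the recoding earlier in the document):

```r
# Three hypothetical respondents
sec <- factor(c(1, 3, 6), levels = 1:6,
              labels = c("A", "B", "C1", "C2", "D", "E"))
location <- factor(c(2, 1, 8), levels = 1:8,
                   labels = c("Nairobi", "Central", "Coast", "Eastern",
                              "North Eastern", "Nyanza", "Rift Valley", "Western"))

sec_code <- as.numeric(sec)            # position of each level: A = 1, ..., E = 6
location_code <- as.numeric(location)  # position of each level: Nairobi = 1, ..., Western = 8
sec_x_location <- sec_code * location_code
sec_x_location
```

Because the result depends on the order in which the levels were declared, such a product is only meaningful to the extent that the codes can be treated as quantities.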
We start by splitting the data into training and test sets.
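# Drop incomplete rows, then drop the respondent ID (column 1); Member becomes column 19
data<-na.omit(data2)
set.seed(1235)
data<-data[2:23]
# Shuffle the rows and use the first half for training
data<- data[sample(nrow(data)),]
split <- floor(nrow(data)/2)
data_train <- data[1:split,]
# The upper bound nrow(data)-1 leaves the last shuffled row out of the test set
data_test <- data[(split+1):(nrow(data)-1),]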
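The same kind of 50/50 split can be wrapped in a small helper that keeps every row in exactly one of the two sets. This is a sketch on a toy data frame; `split_half()` and `toy` are hypothetical names, not part of the analysis.

```r
# Hypothetical helper: random 50/50 split that uses every row exactly once
split_half <- function(df, seed) {
  set.seed(seed)
  idx <- sample(nrow(df))                 # random permutation of row numbers
  n_train <- floor(nrow(df) / 2)
  train_rows <- idx[seq_len(n_train)]
  test_rows <- setdiff(idx, train_rows)   # all remaining rows
  list(train = df[train_rows, , drop = FALSE],
       test  = df[test_rows, , drop = FALSE])
}

toy <- data.frame(id = 1:11, x = (1:11)^2)
parts <- split_half(toy, seed = 1235)
nrow(parts$train)
nrow(parts$test)
```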
mlogit=multinom(Member~.,data=data_train,maxit=1000)
## # weights: 240 (188 variable)
## initial value 730.684812
## iter 10 value 651.636407
## iter 20 value 560.137281
## iter 30 value 535.205359
## iter 40 value 526.486360
## iter 50 value 522.040933
## iter 60 value 520.218390
## iter 70 value 519.861618
## iter 80 value 519.783404
## iter 90 value 519.673790
## iter 100 value 519.641983
## iter 110 value 519.631458
## final value 519.627484
## converged
predictedML <- predict(mlogit,data_test,na.action =na.pass, type="probs")
predicted_classML <- predict(mlogit,data_test)
confusionMatrix(as.factor(predicted_classML),as.factor(data_test$Member))
## Confusion Matrix and Statistics
##
## Reference
## Prediction Champs Reproachables Hunters Unpassionate Unwary
## Champs 25 25 30 16 6
## Reproachables 11 17 17 7 5
## Hunters 28 28 42 18 19
## Unpassionate 15 18 11 51 5
## Unwary 14 15 18 1 11
##
## Overall Statistics
##
## Accuracy : 0.3223
## 95% CI : (0.2794, 0.3675)
## No Information Rate : 0.2605
## P-Value [Acc > NIR] : 0.001951
##
## Kappa : 0.141
##
## Mcnemar's Test P-Value : 0.004229
##
## Statistics by Class:
##
## Class: Champs Class: Reproachables Class: Hunters
## Sensitivity 0.26882 0.16505 0.35593
## Specificity 0.78611 0.88571 0.72239
## Pos Pred Value 0.24510 0.29825 0.31111
## Neg Pred Value 0.80627 0.78283 0.76101
## Prevalence 0.20530 0.22737 0.26049
## Detection Rate 0.05519 0.03753 0.09272
## Detection Prevalence 0.22517 0.12583 0.29801
## Balanced Accuracy 0.52746 0.52538 0.53916
## Class: Unpassionate Class: Unwary
## Sensitivity 0.5484 0.23913
## Specificity 0.8639 0.88206
## Pos Pred Value 0.5100 0.18644
## Neg Pred Value 0.8810 0.91117
## Prevalence 0.2053 0.10155
## Detection Rate 0.1126 0.02428
## Detection Prevalence 0.2208 0.13024
## Balanced Accuracy 0.7061 0.56060
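To make the summary statistics easier to interpret, the sketch below recomputes a few of them in base R from the confusion matrix printed above. The variable names are illustrative; `caret::confusionMatrix()` reports the same quantities.

```r
# Counts copied from the multinomial logit confusion matrix above
classes <- c("Champs", "Reproachables", "Hunters", "Unpassionate", "Unwary")
cm <- matrix(c(
  25, 25, 30, 16,  6,
  11, 17, 17,  7,  5,
  28, 28, 42, 18, 19,
  15, 18, 11, 51,  5,
  14, 15, 18,  1, 11
), nrow = 5, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n                      # share of correct predictions
nir <- max(colSums(cm)) / n                        # accuracy of always guessing the largest class
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)
sensitivity <- diag(cm) / colSums(cm)              # per-class recall
names(sensitivity) <- classes

round(c(accuracy = accuracy, nir = nir, kappa = cohen_kappa), 4)
round(sensitivity, 4)
```

The no-information rate is the benchmark here: an accuracy of about 0.32 is only modestly above the 0.26 obtained by always predicting the largest class.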
# Radial-kernel support vector machine fitted on the same training data
model.svm = svm(formula = Member ~ .,
data = data_train,
type = 'C-classification',
kernel = 'radial')
# Column 19 is Member, so it is dropped from the predictors
y_pred = predict(model.svm, newdata = data_test[-19])
confusionMatrix(as.factor(y_pred),as.factor(data_test$Member))
## Confusion Matrix and Statistics
##
## Reference
## Prediction Champs Reproachables Hunters Unpassionate Unwary
## Champs 40 46 51 14 18
## Reproachables 1 1 1 0 0
## Hunters 35 34 59 12 22
## Unpassionate 16 22 6 67 6
## Unwary 1 0 1 0 0
##
## Overall Statistics
##
## Accuracy : 0.3687
## 95% CI : (0.3241, 0.4149)
## No Information Rate : 0.2605
## P-Value [Acc > NIR] : 2.698e-07
##
## Kappa : 0.1857
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Champs Class: Reproachables Class: Hunters
## Sensitivity 0.4301 0.009709 0.5000
## Specificity 0.6417 0.994286 0.6925
## Pos Pred Value 0.2367 0.333333 0.3642
## Neg Pred Value 0.8134 0.773333 0.7973
## Prevalence 0.2053 0.227373 0.2605
## Detection Rate 0.0883 0.002208 0.1302
## Detection Prevalence 0.3731 0.006623 0.3576
## Balanced Accuracy 0.5359 0.501997 0.5963
## Class: Unpassionate Class: Unwary
## Sensitivity 0.7204 0.000000
## Specificity 0.8611 0.995086
## Pos Pred Value 0.5726 0.000000
## Neg Pred Value 0.9226 0.898004
## Prevalence 0.2053 0.101545
## Detection Rate 0.1479 0.000000
## Detection Prevalence 0.2583 0.004415
## Balanced Accuracy 0.7908 0.497543
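The near-zero sensitivity for two classes is easier to see by comparing how often each class is predicted with how often it actually occurs. The sketch below uses the counts from the SVM confusion matrix printed above; the variable names are illustrative.

```r
# Counts copied from the SVM confusion matrix above
classes <- c("Champs", "Reproachables", "Hunters", "Unpassionate", "Unwary")
cm_svm <- matrix(c(
  40, 46, 51, 14, 18,
   1,  1,  1,  0,  0,
  35, 34, 59, 12, 22,
  16, 22,  6, 67,  6,
   1,  0,  1,  0,  0
), nrow = 5, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

predicted_n <- rowSums(cm_svm)                 # how often each class is predicted
actual_n <- colSums(cm_svm)                    # how often each class occurs in the test set
predicted_share <- predicted_n / sum(cm_svm)   # "Detection Prevalence" in the output
actual_share <- actual_n / sum(cm_svm)         # "Prevalence" in the output
round(cbind(predicted_share, actual_share), 4)
```

The model predicts Reproachables and Unwary for only a handful of test cases, so its higher overall accuracy comes from the other three classes.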
classifier = randomForest(x = data_train[-19],
y = data_train$Member,
ntree = 1000)
y_pred = predict(classifier, newdata = data_test[-19])
confusionMatrix(as.factor(y_pred),as.factor(data_test$Member))
## Confusion Matrix and Statistics
##
## Reference
## Prediction Champs Reproachables Hunters Unpassionate Unwary
## Champs 27 27 32 10 7
## Reproachables 16 12 16 7 10
## Hunters 31 34 47 16 13
## Unpassionate 9 19 9 59 5
## Unwary 10 11 14 1 11
##
## Overall Statistics
##
## Accuracy : 0.3444
## 95% CI : (0.3007, 0.3901)
## No Information Rate : 0.2605
## P-Value [Acc > NIR] : 4.716e-05
##
## Kappa : 0.1651
##
## Mcnemar's Test P-Value : 0.02794
##
## Statistics by Class:
##
## Class: Champs Class: Reproachables Class: Hunters
## Sensitivity 0.2903 0.11650 0.3983
## Specificity 0.7889 0.86000 0.7194
## Pos Pred Value 0.2621 0.19672 0.3333
## Neg Pred Value 0.8114 0.76786 0.7724
## Prevalence 0.2053 0.22737 0.2605
## Detection Rate 0.0596 0.02649 0.1038
## Detection Prevalence 0.2274 0.13466 0.3113
## Balanced Accuracy 0.5396 0.48825 0.5589
## Class: Unpassionate Class: Unwary
## Sensitivity 0.6344 0.23913
## Specificity 0.8833 0.91155
## Pos Pred Value 0.5842 0.23404
## Neg Pred Value 0.9034 0.91379
## Prevalence 0.2053 0.10155
## Detection Rate 0.1302 0.02428
## Detection Prevalence 0.2230 0.10375
## Balanced Accuracy 0.7589 0.57534
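For a side-by-side view, the sketch below collects the accuracy and kappa values reported in the three outputs above and ranks the models. The model labels are illustrative, and ranking on accuracy alone is an assumption of this sketch: the per-class statistics above show that the SVM almost never predicts Reproachables or Unwary, which may matter more than overall accuracy depending on how the segments are used.

```r
# Values copied from the three confusionMatrix() outputs above
results <- data.frame(
  model = c("multinom", "svm_radial", "random_forest"),
  accuracy = c(0.3223, 0.3687, 0.3444),
  kappa = c(0.1410, 0.1857, 0.1651),
  stringsAsFactors = FALSE
)
nir <- 0.2605                                        # no-information rate, same test set for all
results$gain_over_nir <- results$accuracy - nir

ranked <- results[order(-results$accuracy), ]        # best accuracy first
best <- results$model[which.max(results$accuracy)]
ranked
```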