Before the data was loaded, it was pre-cleanded in Excel. There were 100 responses to the survey, however, 27 participants quit after the informed consent. These were removed from the data. While loading the data in R, the Likert-scale variables from the survey get the following names.
# loading the data into a table with variable names:
data <- read.table("UI_data.csv", header = FALSE, sep = ";", col.names = c("age", "job_study", "tech_skill", "cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "comp_improvement", "comp_high_effort", "accessibility", "communication", "cust_relationship", "informal_culture", "general"))
## Warning in read.table("UI_data.csv", header = FALSE, sep = ";", col.names =
## c("age", : incomplete final line found by readTableHeader on 'UI_data.csv'
The table added 8 empty rows at the end. These are being removed.
data <- head(data, n = nrow(data) - 8)
I start with some summary statistics to obtain a general insight into the data.
summary(data)
## age job_study tech_skill cust_centered
## Length:65 Length:65 Min. :1.000 Min. :1.000
## Class :character Class :character 1st Qu.:4.000 1st Qu.:3.000
## Mode :character Mode :character Median :4.000 Median :4.000
## Mean :3.969 Mean :3.431
## 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000
##
## support community social_media responsiveness compensation
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.0 1st Qu.:2.000
## Median :4.000 Median :2.000 Median :2.000 Median :4.0 Median :4.000
## Mean :3.723 Mean :2.138 Mean :2.615 Mean :3.4 Mean :3.569
## 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:4.0 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.0 Max. :5.000
##
## comp_improvement comp_high_effort accessibility communication
## Min. :1.000 Min. :1.000 Min. :2.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:4.000 1st Qu.:3.000
## Median :4.000 Median :3.000 Median :4.000 Median :4.000
## Mean :3.609 Mean :3.354 Mean :4.308 Mean :3.446
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :1
## cust_relationship informal_culture general
## Min. :1.000 Min. :1.000 Length:65
## 1st Qu.:4.000 1st Qu.:3.000 Class :character
## Median :4.000 Median :3.000 Mode :character
## Mean :4.092 Mean :3.154
## 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000
##
A subset was made to perform analyses on the variables the literature suggests to improve users’s motivation for user involvement. This is all variables, except comp_improvement and comp_high_effort, since these are there to obtain more insights in the compensation variable. Also age, job_study and general are removed here, because they do not operate on a Likert scale. Lastly, tech_skill is excluded, because the literature suggests it does not improve user involvement motivation, but it does improve the user involvement quality for the company’s side.
variables <- data[, c("cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "accessibility", "communication", "cust_relationship", "informal_culture")]
To check what variables are the most important, a barplot is created. This shows us that accessibility, customer relationship and support obtain the highest mean scores.
means <- colMeans(variables)
sorted_means <- sort(means)
barplot(sorted_means, main = "Means of UI improvement variables", xlab = "Variables\n\n\n", ylab = "Mean", las = 2, cex.names = 0.6)
In order to get an idea about the spread of the variables, boxplots are created. From the boxplot we can see the community variable is highly skewed. To obtain a better insight in the spread, a histogram per variable was made. It looks like we do not deal with normal distributed variables, except for maybe informal_culture. This will be checked later. Due to the distribution of the variables, I think a violin plots might be easier comparable than the boxplots, therefore these are created per variable as well.
#Converting data to long format:
df_long <- reshape2::melt(variables)
## No id variables; using all as measure variables
#Boxplots:
ggplot(df_long, aes(x = variable, y = value)) +
geom_boxplot() +
xlab("Variables") +
ylab("Likert Scale Value") +
ggtitle("Likert Scale Values per Variable") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
#Histograms:
ggplot(df_long, aes(value)) +
geom_histogram() +
facet_wrap(~ variable, scales = "free")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#Violin plots:
ggplot(reshape2::melt(variables), aes(variable, value)) +
geom_violin() +
labs(x = "Variables", y = "Likert Scale Values") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
## No id variables; using all as measure variables
The first hypothesis I test is the following.
H0: There is no significant difference between the variables the literature suggests to improve users’ motivation to provide feedback
H1: There is a significant difference between the variables the literature suggests to improve users’ motivation to provide feedback
This hypothesis could tested by repeated measures ANOVA, since there is one group with multiple variables to compare. First, I start checking the assumptions. When doing a Shapiro-Wilk test, it appears all variables receive a p-value smaller than 0.05. This means all variables are not normally distributed at a alpha = 0.05 signifcance level. Therefore, I will continue with a non-parametric test.
#Shapiro-Wilk test per variable:
normality_tests <- lapply(variables, shapiro.test)
#p-values:
(p_values <- sapply(normality_tests, "[[", "p.value"))
## cust_centered support community social_media
## 7.781417e-06 7.778587e-07 4.301240e-08 5.661493e-05
## responsiveness compensation accessibility communication
## 1.112070e-06 3.965562e-06 1.116296e-08 1.315758e-05
## cust_relationship informal_culture
## 4.252946e-09 1.582941e-04
Friedman’s non-parametric test is used to analyze the variance between the variables. The test results in a p-value of 0.0000000000002488. This indicates the differences between some of the variables are statistically significant.
# transposing the data to recognize the variables as separate groups:
var_transposed <- t(data[, c("cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "accessibility", "communication", "cust_relationship", "informal_culture")])
(friedman_res <- friedman.test(var_transposed))
##
## Friedman rank sum test
##
## data: var_transposed
## Friedman chi-squared = 164, df = 64, p-value = 1.006e-10
To find out what variables hold significant differences, I perform a Nemenyi test as post-hoc analysis. The results are converted to a table that shows the variable comparisons with ascending p-values.
# Creating a vector for each variable in the variables dataframe:
cust_centered <- variables$cust_centered #1
support <- variables$support #2
community <- variables$community #3
social_media <- variables$social_media #4
responsiveness <- variables$responsiveness #5
compensation <- variables$compensation #6
accessibility <- variables$accessibility #7
communication <- variables$communication #8
cust_relationship <- variables$cust_relationship #9
informal_culture <- variables$informal_culture #10
# Combining the vectors to a list
data_list <- list(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture)
# Nmenyi:
nemenyi <- NemenyiTest(data_list)
# Creating a vector to map comparisons to variable names
variable_names <- c("cust_centered", "support", "community", "social_media",
"responsiveness", "compensation", "accessibility",
"communication", "cust_relationship", "informal_culture")
#Creating a data frame with the results
nemenyi_df <- data.frame( comparison = c("cust_centered-support", "cust_centered-community", "cust_centered-social_media",
"cust_centered-responsiveness", "cust_centered-compensation", "cust_centered-accessibility",
"cust_centered-communication", "cust_centered-cust_relationship", "cust_centered-informal_culture",
"support-community", "support-social_media", "support-responsiveness", "support-compensation",
"support-accessibility", "support-communication", "support-cust_relationship", "support-informal_culture",
"community-social_media", "community-responsiveness", "community-compensation", "community-accessibility",
"community-communication", "community-cust_relationship", "community-informal_culture",
"social_media-responsiveness", "social_media-compensation", "social_media-accessibility",
"social_media-communication", "social_media-cust_relationship", "social_media-informal_culture",
"responsiveness-compensation", "responsiveness-accessibility", "responsiveness-communication",
"responsiveness-cust_relationship", "responsiveness-informal_culture",
"compensation-accessibility", "compensation-communication", "compensation-cust_relationship",
"compensation-informal_culture", "accessibility-communication", "accessibility-cust_relationship",
"accessibility-informal_culture", "communication-cust_relationship", "communication-informal_culture",
"cust_relationship-informal_culture"),
mean.rank.diff = c(58.191781, -196.047945, -132.178082, 12.554795, 30.143836,
156.917808, -7.821918, 121.869863, -44.315068, -254.239726,
-190.369863, -45.636986, -28.047945, 98.726027, -66.013699,
63.678082, -102.506849, 63.869863, 208.602740, 226.191781,
352.965753, 188.226027, 317.917808, 151.732877, 144.732877,
162.321918, 289.095890, 124.356164, 254.047945, 87.863014,
17.589041, 144.363014, -20.376712, 109.315068, -56.869863,
126.773973, -37.965753, 91.726027, -74.458904, -164.739726,
-35.047945, -201.232877, 129.691781, -36.493151, -166.184932),
pval = c(0.81410, 8.7e-07, 0.00592, 1.00000, 0.99746, 0.00030, 1.00000, 0.01732,
0.96036, 1.5e-11, 2.2e-06, 0.95226, 0.99855, 0.12684, 0.67515, 0.71982,
0.09558, 0.71623, 1.0e-07, 4.1e-09, 1.0e-13, 3.1e-06, 1.2e-13, 0.00058,
0.00139, 0.00014, 1.2e-13, 0.01350, 1.5e-11, 0.25896, 0.99997, 0.00146,
0.99989, 0.05500, 0.83404, 0.01053, 0.98599, 0.20442, 0.50371, 0.00010,
0.99210, 3.7e-07, 0.00775, 0.98942, 8.4e-05))
# Ascending the p-values
asc_nemenyi <- nemenyi_df[order(nemenyi_df$pval), ]
# Creating a better look:
kable(asc_nemenyi, format = "markdown", align = c("c", "c", "c"), caption = "Nemenyi Table")
| comparison | mean.rank.diff | pval | |
|---|---|---|---|
| 21 | community-accessibility | 352.965753 | 0.0000000 |
| 23 | community-cust_relationship | 317.917808 | 0.0000000 |
| 27 | social_media-accessibility | 289.095890 | 0.0000000 |
| 10 | support-community | -254.239726 | 0.0000000 |
| 29 | social_media-cust_relationship | 254.047945 | 0.0000000 |
| 20 | community-compensation | 226.191781 | 0.0000000 |
| 19 | community-responsiveness | 208.602740 | 0.0000001 |
| 42 | accessibility-informal_culture | -201.232877 | 0.0000004 |
| 2 | cust_centered-community | -196.047945 | 0.0000009 |
| 11 | support-social_media | -190.369863 | 0.0000022 |
| 22 | community-communication | 188.226027 | 0.0000031 |
| 45 | cust_relationship-informal_culture | -166.184932 | 0.0000840 |
| 40 | accessibility-communication | -164.739726 | 0.0001000 |
| 26 | social_media-compensation | 162.321918 | 0.0001400 |
| 6 | cust_centered-accessibility | 156.917808 | 0.0003000 |
| 24 | community-informal_culture | 151.732877 | 0.0005800 |
| 25 | social_media-responsiveness | 144.732877 | 0.0013900 |
| 32 | responsiveness-accessibility | 144.363014 | 0.0014600 |
| 3 | cust_centered-social_media | -132.178082 | 0.0059200 |
| 43 | communication-cust_relationship | 129.691781 | 0.0077500 |
| 36 | compensation-accessibility | 126.773973 | 0.0105300 |
| 28 | social_media-communication | 124.356164 | 0.0135000 |
| 8 | cust_centered-cust_relationship | 121.869863 | 0.0173200 |
| 34 | responsiveness-cust_relationship | 109.315068 | 0.0550000 |
| 17 | support-informal_culture | -102.506849 | 0.0955800 |
| 14 | support-accessibility | 98.726027 | 0.1268400 |
| 38 | compensation-cust_relationship | 91.726027 | 0.2044200 |
| 30 | social_media-informal_culture | 87.863014 | 0.2589600 |
| 39 | compensation-informal_culture | -74.458904 | 0.5037100 |
| 15 | support-communication | -66.013699 | 0.6751500 |
| 18 | community-social_media | 63.869863 | 0.7162300 |
| 16 | support-cust_relationship | 63.678082 | 0.7198200 |
| 1 | cust_centered-support | 58.191781 | 0.8141000 |
| 35 | responsiveness-informal_culture | -56.869863 | 0.8340400 |
| 12 | support-responsiveness | -45.636986 | 0.9522600 |
| 9 | cust_centered-informal_culture | -44.315068 | 0.9603600 |
| 37 | compensation-communication | -37.965753 | 0.9859900 |
| 44 | communication-informal_culture | -36.493151 | 0.9894200 |
| 41 | accessibility-cust_relationship | -35.047945 | 0.9921000 |
| 5 | cust_centered-compensation | 30.143836 | 0.9974600 |
| 13 | support-compensation | -28.047945 | 0.9985500 |
| 33 | responsiveness-communication | -20.376712 | 0.9998900 |
| 31 | responsiveness-compensation | 17.589041 | 0.9999700 |
| 4 | cust_centered-responsiveness | 12.554795 | 1.0000000 |
| 7 | cust_centered-communication | -7.821918 | 1.0000000 |
The next hypothesis tested is the following.
H0: There is no significant difference in motivational factors between users who have above average technology skills (scores 4 and 5 from the Likert-scale question in the survey) and users who indicate they are neutral or negative about their amount of technology skills (scores 1, 2 and 3)
H1: There is a significant difference in motivational factors between users who have above average technology skills (scores 4 and 5 from the Likert-scale question in the survey) and users who indicate they are neutral or negative about their amount of technology skills (scores 1, 2 and 3)
This is analyzed by a Wilcoxon test for each variable. None of the same variables are significantly different in the different subsets. Accessibility comes close to reaching significant difference at 90% confidence level with a p-value of 0.1149.
# creating a subset for the participants who scored their technology skills with an 1, 2 or 3:
neg_neutral_skills <- subset(data, tech_skill <= 3, select=c(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture))
# creating a subset for the participants who scored their technology skills with a 4 or 5:
pos_skills <- subset(data, tech_skill >= 4, select=c(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture))
# comparing the cust_centered variable:
cust_centred_vec_pos <- pos_skills$cust_centered
cust_centred_vec_neg <- neg_neutral_skills$cust_centered
wilcox.test(cust_centred_vec_pos, cust_centred_vec_neg, alternative = "two.sided") #p-value = 0.6919
##
## Wilcoxon rank sum test with continuity correction
##
## data: cust_centred_vec_pos and cust_centred_vec_neg
## W = 325.5, p-value = 0.5989
## alternative hypothesis: true location shift is not equal to 0
# comparing the support variable:
support_vec_pos <- pos_skills$support
support_vec_neg <- neg_neutral_skills$support
wilcox.test(support_vec_pos , support_vec_neg, alternative = "two.sided") #p-value = 0.7919
##
## Wilcoxon rank sum test with continuity correction
##
## data: support_vec_pos and support_vec_neg
## W = 350, p-value = 0.9113
## alternative hypothesis: true location shift is not equal to 0
# comparing the community variable:
community_vec_pos <- pos_skills$community
community_vec_neg <- neg_neutral_skills$community
wilcox.test(community_vec_pos, community_vec_neg, alternative = "two.sided") #p-value = 0.9302
##
## Wilcoxon rank sum test with continuity correction
##
## data: community_vec_pos and community_vec_neg
## W = 355.5, p-value = 0.9858
## alternative hypothesis: true location shift is not equal to 0
# comparing the social_media variable:
social_media_vec_pos <- pos_skills$social_media
social_media_vec_neg <- neg_neutral_skills$social_media
wilcox.test(social_media_vec_pos, social_media_vec_neg, alternative = "two.sided") #p-value = 0.5845
##
## Wilcoxon rank sum test with continuity correction
##
## data: social_media_vec_pos and social_media_vec_neg
## W = 332, p-value = 0.6855
## alternative hypothesis: true location shift is not equal to 0
# comparing the responsiveness variable:
responsiveness_vec_pos <- pos_skills$responsiveness
responsiveness_vec_neg <- neg_neutral_skills$responsiveness
wilcox.test(responsiveness_vec_pos, responsiveness_vec_neg, alternative = "two.sided") #p-value = 0.5844
##
## Wilcoxon rank sum test with continuity correction
##
## data: responsiveness_vec_pos and responsiveness_vec_neg
## W = 426.5, p-value = 0.2379
## alternative hypothesis: true location shift is not equal to 0
# comparing the compensation variable:
compensation_vec_pos <- pos_skills$compensation
compensation_vec_neg <- neg_neutral_skills$compensation
wilcox.test(compensation_vec_pos, compensation_vec_neg, alternative = "two.sided") #p-value = 0.4867
##
## Wilcoxon rank sum test with continuity correction
##
## data: compensation_vec_pos and compensation_vec_neg
## W = 353.5, p-value = 0.9604
## alternative hypothesis: true location shift is not equal to 0
# comparing the accessibility variable:
accessibility_vec_pos <- pos_skills$accessibility
accessibility_vec_neg <- neg_neutral_skills$accessibility
wilcox.test(accessibility_vec_pos, accessibility_vec_neg, alternative = "two.sided") #p-value = 0.1149
##
## Wilcoxon rank sum test with continuity correction
##
## data: accessibility_vec_pos and accessibility_vec_neg
## W = 463.5, p-value = 0.05998
## alternative hypothesis: true location shift is not equal to 0
# comparing the communication variable:
communication_vec_pos <- pos_skills$communication
communication_vec_neg <- neg_neutral_skills$communication
wilcox.test(communication_vec_pos, communication_vec_neg, alternative = "two.sided") #p-value = 0.2973
##
## Wilcoxon rank sum test with continuity correction
##
## data: communication_vec_pos and communication_vec_neg
## W = 424, p-value = 0.2628
## alternative hypothesis: true location shift is not equal to 0
# comparing the cust_relationship variable:
cust_relationship_vec_pos <- pos_skills$cust_relationship
cust_relationship_vec_neg <- neg_neutral_skills$cust_relationship
wilcox.test(cust_relationship_vec_pos, cust_relationship_vec_neg, alternative = "two.sided") #p-value = 0.9304
##
## Wilcoxon rank sum test with continuity correction
##
## data: cust_relationship_vec_pos and cust_relationship_vec_neg
## W = 383, p-value = 0.6528
## alternative hypothesis: true location shift is not equal to 0
# comparing the informal_culture variable:
informal_culture_vec_pos <- pos_skills$informal_culture
informal_culture_vec_neg <- neg_neutral_skills$informal_culture
wilcox.test(informal_culture_vec_pos, informal_culture_vec_neg, alternative = "two.sided") #p-value = 0.5907
##
## Wilcoxon rank sum test with continuity correction
##
## data: informal_culture_vec_pos and informal_culture_vec_neg
## W = 304, p-value = 0.3799
## alternative hypothesis: true location shift is not equal to 0
The last hypothesis tested is the following.
H0: There is no significant relationship between age and technology skills H1: There is a significant relationship between age and technology skills
This is tested via ordinal logistic regression analysis. The predictor is age , outcome: technology skills.
There is a weak negative relationship between age and tech skill level. The coefficient for age is -0.02168. So, when age increases by one unit, the log-odds of being in a higher category of tech_skill_cat decrease by 0.02168 units.
The intercept for the “very low|low” category is -5.0507, meaning that the log-odds are -5.0507 for being in the “very low” category instead of the “low” category. This can be continued for the other categories. The estimated model can be written as:
logit(P(Y≤1)) = -5.0507 - 0.02168(age) logit(P(Y≤2)) = -3.9138 - 0.02168(age) logit(P(Y≤3)) = -1.9232 - 0.02168(age) logit(P(Y≤4)) = 0.4838 - 0.02168(age)
As age increases, the probability of falling into the “very low|low” and “low|average” categories increases, and thus the level of tech_skill decreases.
According to the odds ratio of age, a one-unit increase in age is associated with a 2.144% decrease in the odds of moving to a higher category of technology skill.
The confidence interval for tech_skill includes 0. This means that age is not a significant predictor for tech_skill in general. However, it does hold that for the lower levels of tech_skill the earlier mentioned weak negative relationship exists. So, older individuals are more likely to have slightly worse technology skills than younger individuals, but the effect is not strong enough to be detected in the overall analysis.
# Age mean replacement function:
get_midpoint <- function(age_range) {
min_age <- as.numeric(str_extract(age_range, "\\d+"))
max_age <- as.numeric(str_extract(age_range, "\\d+$"))
return((min_age + max_age) / 2)}
# Creating a new dataframe with age as mean from the ranges:
new_df <- data %>% mutate(age = sapply(age, get_midpoint))
# Ordinal logistic regression:
# Creating a vector of labels for the categories:
labels <- c("very low", "low", "average", "high", "very high")
# Cutting the 'tech_skill' variable into 5 categories and assigning the labels
new_df$tech_skill_cat <- cut(new_df$tech_skill,
breaks = c(0, 1, 2, 3, 4, 5),
labels = labels,
include.lowest = TRUE)
olr_model <- polr(tech_skill_cat ~ age, data = new_df, Hess = TRUE)
summary(olr_model)
## Call:
## polr(formula = tech_skill_cat ~ age, data = new_df, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## age -0.009222 0.03391 -0.272
##
## Intercepts:
## Value Std. Error t value
## very low|low -4.4467 1.4644 -3.0366
## low|average -3.7381 1.2831 -2.9132
## average|high -1.5771 1.0907 -1.4459
## high|very high 0.9229 1.0728 0.8603
##
## Residual Deviance: 143.7039
## AIC: 153.7039
# Storing table:
ctable <- coef(summary(olr_model))
# Calculating p-values and combining in table:
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))
## Value Std. Error t value p value
## age -0.009222366 0.03390698 -0.2719902 0.785629566
## very low|low -4.446744381 1.46437507 -3.0366157 0.002392502
## low|average -3.738074474 1.28313997 -2.9132243 0.003577176
## average|high -1.577073268 1.09070962 -1.4459149 0.148201080
## high|very high 0.922857514 1.07277578 0.8602520 0.389650143
# Confidence intervals:
(ci <- confint(olr_model))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## -0.07568570 0.05795899
## Odds ratio:
exp(coef(olr_model))
## age
## 0.99082