UI_analyse

Before the data was loaded, it was pre-cleanded in Excel. There were 100 responses to the survey, however, 27 participants quit after the informed consent. These were removed from the data. While loading the data in R, the Likert-scale variables from the survey get the following names.

# loading the data into a table with variable names:

data <- read.table("UI_data.csv", header = FALSE, sep = ";", col.names = c("age", "job_study", "tech_skill", "cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "comp_improvement", "comp_high_effort", "accessibility", "communication", "cust_relationship", "informal_culture", "general"))

## Warning in read.table("UI_data.csv", header = FALSE, sep = ";", col.names =
## c("age", : incomplete final line found by readTableHeader on 'UI_data.csv'

The table added 8 empty rows at the end. These are being removed.

data <- head(data, n = nrow(data) - 8)

I start with some summary statistics to obtain a general insight into the data.

summary(data)

##      age             job_study           tech_skill    cust_centered  
##  Length:65          Length:65          Min.   :1.000   Min.   :1.000  
##  Class :character   Class :character   1st Qu.:4.000   1st Qu.:3.000  
##  Mode  :character   Mode  :character   Median :4.000   Median :4.000  
##                                        Mean   :3.969   Mean   :3.431  
##                                        3rd Qu.:4.000   3rd Qu.:4.000  
##                                        Max.   :5.000   Max.   :5.000  
##                                                                       
##     support        community      social_media   responsiveness  compensation  
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.0    Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.0    1st Qu.:2.000  
##  Median :4.000   Median :2.000   Median :2.000   Median :4.0    Median :4.000  
##  Mean   :3.723   Mean   :2.138   Mean   :2.615   Mean   :3.4    Mean   :3.569  
##  3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.:4.0    3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.0    Max.   :5.000  
##                                                                                
##  comp_improvement comp_high_effort accessibility   communication  
##  Min.   :1.000    Min.   :1.000    Min.   :2.000   Min.   :1.000  
##  1st Qu.:3.000    1st Qu.:3.000    1st Qu.:4.000   1st Qu.:3.000  
##  Median :4.000    Median :3.000    Median :4.000   Median :4.000  
##  Mean   :3.609    Mean   :3.354    Mean   :4.308   Mean   :3.446  
##  3rd Qu.:4.000    3rd Qu.:4.000    3rd Qu.:5.000   3rd Qu.:4.000  
##  Max.   :5.000    Max.   :5.000    Max.   :5.000   Max.   :5.000  
##  NA's   :1                                                        
##  cust_relationship informal_culture   general         
##  Min.   :1.000     Min.   :1.000    Length:65         
##  1st Qu.:4.000     1st Qu.:3.000    Class :character  
##  Median :4.000     Median :3.000    Mode  :character  
##  Mean   :4.092     Mean   :3.154                      
##  3rd Qu.:5.000     3rd Qu.:4.000                      
##  Max.   :5.000     Max.   :5.000                      
##

A subset was made to perform analyses on the variables the literature suggests to improve users’s motivation for user involvement. This is all variables, except comp_improvement and comp_high_effort, since these are there to obtain more insights in the compensation variable. Also age, job_study and general are removed here, because they do not operate on a Likert scale. Lastly, tech_skill is excluded, because the literature suggests it does not improve user involvement motivation, but it does improve the user involvement quality for the company’s side.

variables <- data[, c("cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "accessibility", "communication", "cust_relationship", "informal_culture")]

To check what variables are the most important, a barplot is created. This shows us that accessibility, customer relationship and support obtain the highest mean scores.

means <- colMeans(variables)
sorted_means <- sort(means)
barplot(sorted_means, main = "Means of UI improvement variables", xlab = "Variables\n\n\n", ylab = "Mean", las = 2, cex.names = 0.6)

In order to get an idea about the spread of the variables, boxplots are created. From the boxplot we can see the community variable is highly skewed. To obtain a better insight in the spread, a histogram per variable was made. It looks like we do not deal with normal distributed variables, except for maybe informal_culture. This will be checked later. Due to the distribution of the variables, I think a violin plots might be easier comparable than the boxplots, therefore these are created per variable as well.

#Converting data to long format:
df_long <- reshape2::melt(variables)

## No id variables; using all as measure variables

#Boxplots:
ggplot(df_long, aes(x = variable, y = value)) +
  geom_boxplot() +
  xlab("Variables") +
  ylab("Likert Scale Value") +
  ggtitle("Likert Scale Values per Variable") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

#Histograms: 
ggplot(df_long, aes(value)) +
  geom_histogram() +
  facet_wrap(~ variable, scales = "free")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Violin plots:
ggplot(reshape2::melt(variables), aes(variable, value)) +
  geom_violin() +
  labs(x = "Variables", y = "Likert Scale Values") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

## No id variables; using all as measure variables

The first hypothesis I test is the following.

H0: There is no significant difference between the variables the literature suggests to improve users’ motivation to provide feedback

H1: There is a significant difference between the variables the literature suggests to improve users’ motivation to provide feedback

This hypothesis could tested by repeated measures ANOVA, since there is one group with multiple variables to compare. First, I start checking the assumptions. When doing a Shapiro-Wilk test, it appears all variables receive a p-value smaller than 0.05. This means all variables are not normally distributed at a alpha = 0.05 signifcance level. Therefore, I will continue with a non-parametric test.

#Shapiro-Wilk test per variable:
normality_tests <- lapply(variables, shapiro.test) 

#p-values:
(p_values <- sapply(normality_tests, "[[", "p.value"))

##     cust_centered           support         community      social_media 
##      7.781417e-06      7.778587e-07      4.301240e-08      5.661493e-05 
##    responsiveness      compensation     accessibility     communication 
##      1.112070e-06      3.965562e-06      1.116296e-08      1.315758e-05 
## cust_relationship  informal_culture 
##      4.252946e-09      1.582941e-04

Friedman’s non-parametric test is used to analyze the variance between the variables. The test results in a p-value of 0.0000000000002488. This indicates the differences between some of the variables are statistically significant.

# transposing the data to recognize the variables as separate groups:
var_transposed <- t(data[, c("cust_centered", "support", "community", "social_media", "responsiveness", "compensation", "accessibility", "communication", "cust_relationship", "informal_culture")])

(friedman_res <- friedman.test(var_transposed))

## 
##  Friedman rank sum test
## 
## data:  var_transposed
## Friedman chi-squared = 164, df = 64, p-value = 1.006e-10

To find out what variables hold significant differences, I perform a Nemenyi test as post-hoc analysis. The results are converted to a table that shows the variable comparisons with ascending p-values.

# Creating a vector for each variable in the variables dataframe:

cust_centered <- variables$cust_centered #1
support <- variables$support #2
community <- variables$community #3
social_media <- variables$social_media #4
responsiveness <- variables$responsiveness #5
compensation <- variables$compensation #6
accessibility <- variables$accessibility #7
communication <- variables$communication #8
cust_relationship <- variables$cust_relationship #9
informal_culture <- variables$informal_culture #10

# Combining the vectors to a list
data_list <- list(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture)

# Nmenyi:
nemenyi <- NemenyiTest(data_list)


# Creating a vector to map comparisons to variable names
variable_names <- c("cust_centered", "support", "community", "social_media", 
                    "responsiveness", "compensation", "accessibility", 
                    "communication", "cust_relationship", "informal_culture")

#Creating a data frame with the results
nemenyi_df <- data.frame( comparison = c("cust_centered-support", "cust_centered-community", "cust_centered-social_media", 
                 "cust_centered-responsiveness", "cust_centered-compensation", "cust_centered-accessibility", 
                 "cust_centered-communication", "cust_centered-cust_relationship", "cust_centered-informal_culture",
                 "support-community", "support-social_media", "support-responsiveness", "support-compensation", 
                 "support-accessibility", "support-communication", "support-cust_relationship", "support-informal_culture", 
                 "community-social_media", "community-responsiveness", "community-compensation", "community-accessibility", 
                 "community-communication", "community-cust_relationship", "community-informal_culture", 
                 "social_media-responsiveness", "social_media-compensation", "social_media-accessibility", 
                 "social_media-communication", "social_media-cust_relationship", "social_media-informal_culture", 
                 "responsiveness-compensation", "responsiveness-accessibility", "responsiveness-communication", 
                 "responsiveness-cust_relationship", "responsiveness-informal_culture", 
                 "compensation-accessibility", "compensation-communication", "compensation-cust_relationship", 
                 "compensation-informal_culture", "accessibility-communication", "accessibility-cust_relationship", 
                 "accessibility-informal_culture", "communication-cust_relationship", "communication-informal_culture", 
                 "cust_relationship-informal_culture"), 
  mean.rank.diff = c(58.191781, -196.047945, -132.178082, 12.554795, 30.143836,
                     156.917808, -7.821918, 121.869863, -44.315068, -254.239726,
                     -190.369863, -45.636986, -28.047945, 98.726027, -66.013699,
                     63.678082, -102.506849, 63.869863, 208.602740, 226.191781,
                     352.965753, 188.226027, 317.917808, 151.732877, 144.732877,
                     162.321918, 289.095890, 124.356164, 254.047945, 87.863014,
                     17.589041, 144.363014, -20.376712, 109.315068, -56.869863,
                     126.773973, -37.965753, 91.726027, -74.458904, -164.739726,
                     -35.047945, -201.232877, 129.691781, -36.493151, -166.184932),
  pval = c(0.81410, 8.7e-07, 0.00592, 1.00000, 0.99746, 0.00030, 1.00000, 0.01732,
           0.96036, 1.5e-11, 2.2e-06, 0.95226, 0.99855, 0.12684, 0.67515, 0.71982, 
           0.09558, 0.71623, 1.0e-07, 4.1e-09, 1.0e-13, 3.1e-06, 1.2e-13, 0.00058, 
           0.00139, 0.00014, 1.2e-13, 0.01350, 1.5e-11, 0.25896, 0.99997, 0.00146, 
           0.99989, 0.05500, 0.83404, 0.01053, 0.98599, 0.20442, 0.50371, 0.00010, 
           0.99210, 3.7e-07, 0.00775, 0.98942, 8.4e-05))

# Ascending the p-values
asc_nemenyi <- nemenyi_df[order(nemenyi_df$pval), ]


# Creating a better look:
kable(asc_nemenyi, format = "markdown", align = c("c", "c", "c"), caption = "Nemenyi Table")

Nemenyi Table
	comparison	mean.rank.diff	pval
21	community-accessibility	352.965753	0.0000000
23	community-cust_relationship	317.917808	0.0000000
27	social_media-accessibility	289.095890	0.0000000
10	support-community	-254.239726	0.0000000
29	social_media-cust_relationship	254.047945	0.0000000
20	community-compensation	226.191781	0.0000000
19	community-responsiveness	208.602740	0.0000001
42	accessibility-informal_culture	-201.232877	0.0000004
2	cust_centered-community	-196.047945	0.0000009
11	support-social_media	-190.369863	0.0000022
22	community-communication	188.226027	0.0000031
45	cust_relationship-informal_culture	-166.184932	0.0000840
40	accessibility-communication	-164.739726	0.0001000
26	social_media-compensation	162.321918	0.0001400
6	cust_centered-accessibility	156.917808	0.0003000
24	community-informal_culture	151.732877	0.0005800
25	social_media-responsiveness	144.732877	0.0013900
32	responsiveness-accessibility	144.363014	0.0014600
3	cust_centered-social_media	-132.178082	0.0059200
43	communication-cust_relationship	129.691781	0.0077500
36	compensation-accessibility	126.773973	0.0105300
28	social_media-communication	124.356164	0.0135000
8	cust_centered-cust_relationship	121.869863	0.0173200
34	responsiveness-cust_relationship	109.315068	0.0550000
17	support-informal_culture	-102.506849	0.0955800
14	support-accessibility	98.726027	0.1268400
38	compensation-cust_relationship	91.726027	0.2044200
30	social_media-informal_culture	87.863014	0.2589600
39	compensation-informal_culture	-74.458904	0.5037100
15	support-communication	-66.013699	0.6751500
18	community-social_media	63.869863	0.7162300
16	support-cust_relationship	63.678082	0.7198200
1	cust_centered-support	58.191781	0.8141000
35	responsiveness-informal_culture	-56.869863	0.8340400
12	support-responsiveness	-45.636986	0.9522600
9	cust_centered-informal_culture	-44.315068	0.9603600
37	compensation-communication	-37.965753	0.9859900
44	communication-informal_culture	-36.493151	0.9894200
41	accessibility-cust_relationship	-35.047945	0.9921000
5	cust_centered-compensation	30.143836	0.9974600
13	support-compensation	-28.047945	0.9985500
33	responsiveness-communication	-20.376712	0.9998900
31	responsiveness-compensation	17.589041	0.9999700
4	cust_centered-responsiveness	12.554795	1.0000000
7	cust_centered-communication	-7.821918	1.0000000

The next hypothesis tested is the following.

H0: There is no significant difference in motivational factors between users who have above average technology skills (scores 4 and 5 from the Likert-scale question in the survey) and users who indicate they are neutral or negative about their amount of technology skills (scores 1, 2 and 3)

H1: There is a significant difference in motivational factors between users who have above average technology skills (scores 4 and 5 from the Likert-scale question in the survey) and users who indicate they are neutral or negative about their amount of technology skills (scores 1, 2 and 3)

This is analyzed by a Wilcoxon test for each variable. None of the same variables are significantly different in the different subsets. Accessibility comes close to reaching significant difference at 90% confidence level with a p-value of 0.1149.

# creating a subset for the participants who scored their technology skills with an 1, 2 or 3:

neg_neutral_skills <- subset(data, tech_skill <=  3,  select=c(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture))


# creating a subset for the participants who scored their technology skills with a 4 or 5:

pos_skills <- subset(data, tech_skill >=  4,  select=c(cust_centered, support, community, social_media, responsiveness, compensation, accessibility, communication, cust_relationship, informal_culture))



# comparing the cust_centered variable:
cust_centred_vec_pos <- pos_skills$cust_centered
cust_centred_vec_neg <- neg_neutral_skills$cust_centered

wilcox.test(cust_centred_vec_pos, cust_centred_vec_neg, alternative = "two.sided") #p-value = 0.6919

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  cust_centred_vec_pos and cust_centred_vec_neg
## W = 325.5, p-value = 0.5989
## alternative hypothesis: true location shift is not equal to 0

# comparing the support variable:

support_vec_pos <- pos_skills$support
support_vec_neg <- neg_neutral_skills$support

wilcox.test(support_vec_pos , support_vec_neg, alternative = "two.sided") #p-value = 0.7919

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  support_vec_pos and support_vec_neg
## W = 350, p-value = 0.9113
## alternative hypothesis: true location shift is not equal to 0

# comparing the community variable:

community_vec_pos <- pos_skills$community
community_vec_neg <- neg_neutral_skills$community

wilcox.test(community_vec_pos, community_vec_neg, alternative = "two.sided") #p-value = 0.9302

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  community_vec_pos and community_vec_neg
## W = 355.5, p-value = 0.9858
## alternative hypothesis: true location shift is not equal to 0

# comparing the social_media variable:
social_media_vec_pos <- pos_skills$social_media
social_media_vec_neg <- neg_neutral_skills$social_media

wilcox.test(social_media_vec_pos, social_media_vec_neg, alternative = "two.sided") #p-value = 0.5845

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  social_media_vec_pos and social_media_vec_neg
## W = 332, p-value = 0.6855
## alternative hypothesis: true location shift is not equal to 0

# comparing the responsiveness variable:
responsiveness_vec_pos <- pos_skills$responsiveness
responsiveness_vec_neg <- neg_neutral_skills$responsiveness

wilcox.test(responsiveness_vec_pos, responsiveness_vec_neg, alternative = "two.sided") #p-value = 0.5844

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  responsiveness_vec_pos and responsiveness_vec_neg
## W = 426.5, p-value = 0.2379
## alternative hypothesis: true location shift is not equal to 0

# comparing the compensation variable:
compensation_vec_pos <- pos_skills$compensation
compensation_vec_neg <- neg_neutral_skills$compensation

wilcox.test(compensation_vec_pos, compensation_vec_neg, alternative = "two.sided") #p-value = 0.4867

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  compensation_vec_pos and compensation_vec_neg
## W = 353.5, p-value = 0.9604
## alternative hypothesis: true location shift is not equal to 0

# comparing the accessibility variable:
accessibility_vec_pos <- pos_skills$accessibility
accessibility_vec_neg <- neg_neutral_skills$accessibility

wilcox.test(accessibility_vec_pos, accessibility_vec_neg, alternative = "two.sided") #p-value = 0.1149

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  accessibility_vec_pos and accessibility_vec_neg
## W = 463.5, p-value = 0.05998
## alternative hypothesis: true location shift is not equal to 0

# comparing the communication variable:
communication_vec_pos <- pos_skills$communication
communication_vec_neg <- neg_neutral_skills$communication

wilcox.test(communication_vec_pos, communication_vec_neg, alternative = "two.sided") #p-value = 0.2973

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  communication_vec_pos and communication_vec_neg
## W = 424, p-value = 0.2628
## alternative hypothesis: true location shift is not equal to 0

# comparing the cust_relationship variable:
cust_relationship_vec_pos <- pos_skills$cust_relationship
cust_relationship_vec_neg <- neg_neutral_skills$cust_relationship

wilcox.test(cust_relationship_vec_pos, cust_relationship_vec_neg, alternative = "two.sided") #p-value = 0.9304

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  cust_relationship_vec_pos and cust_relationship_vec_neg
## W = 383, p-value = 0.6528
## alternative hypothesis: true location shift is not equal to 0

# comparing the informal_culture variable:
informal_culture_vec_pos <- pos_skills$informal_culture
informal_culture_vec_neg <- neg_neutral_skills$informal_culture

wilcox.test(informal_culture_vec_pos, informal_culture_vec_neg, alternative = "two.sided") #p-value = 0.5907

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  informal_culture_vec_pos and informal_culture_vec_neg
## W = 304, p-value = 0.3799
## alternative hypothesis: true location shift is not equal to 0

The last hypothesis tested is the following.

H0: There is no significant relationship between age and technology skills H1: There is a significant relationship between age and technology skills

This is tested via ordinal logistic regression analysis. The predictor is age , outcome: technology skills.

There is a weak negative relationship between age and tech skill level. The coefficient for age is -0.02168. So, when age increases by one unit, the log-odds of being in a higher category of tech_skill_cat decrease by 0.02168 units.

The intercept for the “very low|low” category is -5.0507, meaning that the log-odds are -5.0507 for being in the “very low” category instead of the “low” category. This can be continued for the other categories. The estimated model can be written as:

logit(P(Y≤1)) = -5.0507 - 0.02168(age) logit(P(Y≤2)) = -3.9138 - 0.02168(age) logit(P(Y≤3)) = -1.9232 - 0.02168(age) logit(P(Y≤4)) = 0.4838 - 0.02168(age)

As age increases, the probability of falling into the “very low|low” and “low|average” categories increases, and thus the level of tech_skill decreases.

According to the odds ratio of age, a one-unit increase in age is associated with a 2.144% decrease in the odds of moving to a higher category of technology skill.

The confidence interval for tech_skill includes 0. This means that age is not a significant predictor for tech_skill in general. However, it does hold that for the lower levels of tech_skill the earlier mentioned weak negative relationship exists. So, older individuals are more likely to have slightly worse technology skills than younger individuals, but the effect is not strong enough to be detected in the overall analysis.

# Age mean replacement function:
get_midpoint <- function(age_range) {
  min_age <- as.numeric(str_extract(age_range, "\\d+"))
  max_age <- as.numeric(str_extract(age_range, "\\d+$"))
  return((min_age + max_age) / 2)}

# Creating a new dataframe with age as mean from the ranges:
new_df <- data %>% mutate(age = sapply(age, get_midpoint))


# Ordinal logistic regression:
# Creating a vector of labels for the categories:
labels <- c("very low", "low", "average", "high", "very high")

# Cutting the 'tech_skill' variable into 5 categories and assigning the labels
new_df$tech_skill_cat <- cut(new_df$tech_skill, 
                             breaks = c(0, 1, 2, 3, 4, 5), 
                             labels = labels, 
                             include.lowest = TRUE)

olr_model <- polr(tech_skill_cat ~ age, data = new_df, Hess = TRUE)

summary(olr_model)

## Call:
## polr(formula = tech_skill_cat ~ age, data = new_df, Hess = TRUE)
## 
## Coefficients:
##         Value Std. Error t value
## age -0.009222    0.03391  -0.272
## 
## Intercepts:
##                Value   Std. Error t value
## very low|low   -4.4467  1.4644    -3.0366
## low|average    -3.7381  1.2831    -2.9132
## average|high   -1.5771  1.0907    -1.4459
## high|very high  0.9229  1.0728     0.8603
## 
## Residual Deviance: 143.7039 
## AIC: 153.7039

# Storing table:
ctable <- coef(summary(olr_model))

# Calculating p-values and combining in table:
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))

##                       Value Std. Error    t value     p value
## age            -0.009222366 0.03390698 -0.2719902 0.785629566
## very low|low   -4.446744381 1.46437507 -3.0366157 0.002392502
## low|average    -3.738074474 1.28313997 -2.9132243 0.003577176
## average|high   -1.577073268 1.09070962 -1.4459149 0.148201080
## high|very high  0.922857514 1.07277578  0.8602520 0.389650143

# Confidence intervals:
(ci <- confint(olr_model))

## Waiting for profiling to be done...

##       2.5 %      97.5 % 
## -0.07568570  0.05795899

## Odds ratio:
exp(coef(olr_model))

##     age 
## 0.99082