This aim of this document is to communicate an analysis of survey data in the context of a the market research done by students at ULB/Sovay Brussels School on citizen initiatives and, in particular, on civic crowdfunding platforms in the region of Brussels. This research originated from a need by BrusselsTogether to better understand the various factors that could be associated with the success of their own civic crowdfunding platform.
The present document provides all of the associated code written in the R programming language in order for anyone to be able to check and reproduce the findings presented here.
The online survey was launched in April 2018 by the students and was designed based on a qualitative research which was conducted beforehand.
All of the input files, as well as the R code, can also be retrieved on our Github page.
In the survey, the first step was to split the respondents between two categories: citizens and ‘entrepreneurs’ (i.e. active members of a local citizens initiative, social project or social organization). Therefore, in this document we are going to analyze those two groups separately.
library(readr)
library(lubridate)
library(stringr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(hrbrthemes)
library(knitr)
library(kableExtra)
library(lavaan)
library(reshape2)
library(RColorBrewer)
library(ggthemes)
library(tibble)
library(broom)
library(rebus)
There are only three input files (Citizen initiative.csv which comes straight from Google Forms, Replacement_of_csv_header.txt and header_lookup.txt).
survey_df <- read_delim("inputs/Citizen initiative.csv",
delim = ",", skip = 1,
col_names = FALSE)
temp_header <- read_delim("inputs/Replacement_of_csv_header.txt",
delim = ",")
colnames(survey_df) <- colnames(temp_header)
header_lookup <- read_delim("inputs/header_lookup.txt", delim = ";")
The preprocessing code (essentially cleaning and tidying of the data) is available below. If interested, the reader can have a look by clicking on the Code button on the right.
# Dates preprocessing
survey_df$date <- str_replace(string = survey_df$date,
pattern = "M GMT.*$",
replacement = "M")
survey_df$date <- str_replace_all(string = survey_df$date,
pattern = "/",
replacement = "-")
survey_df$date <- ymd_hms(survey_df$date)
survey_df <- survey_df %>%
filter(date > "2018-04-27 20:00:00")
# Language preprocessing
lut_lang <- c("English" = "English",
"Nederlands" = "Dutch",
"Français" = "French")
survey_df$lang <- lut_lang[survey_df$lang]
# Grouping preprocessing
lut_group_eng <- c("Citizen with no active implication in a local citizen's initiative, social project or social organization." = "Citizen_not_active",
"Active member of a local citizen's initiative, social project or social organization." = "Active_member",
"Citizen with casual volonteering for a local citizen's initiative, social project or social organization." = "Citizen_casual_vol")
lut_group_nl <- c("Een burger die geen deel uitmaakt van een lokaal burgerinitiatief, een sociaal project of een sociale organisatie." = "Citizen_not_active",
"Een burger die occasioneel bijdraagt als vrijliger van een lokaal burgerinitiatief, een sociaal project of een sociale organisatie." = "Citizen_casual_vol",
"Een actief lid van een lokaal burgerinitiatief, van een sociaal project of een sociale organisatie." = "Active_member")
lut_group_fr <- c("Citoyen.ne sans appartenance à une initiative citoyenne locale, un projet social ou une organisation sociale." = "Citizen_not_active",
"Citoyen.ne contribuant de manière occasionnelle en tant que volontaire à une initiative citoyenne locale, un projet social ou une organisation sociale." = "Citizen_casual_vol",
"Membre actif d'une initiative citoyenne locale, d'un projet social ou d'une organisation sociale." = "Active_member")
survey_df$grouping_eng <- lut_group_eng[survey_df$grouping_eng]
survey_df$grouping_nl <- lut_group_nl[survey_df$grouping_nl]
survey_df$grouping_fr <- lut_group_fr[survey_df$grouping_fr]
# Entrepreuneurs interest preprocessing
lut_e_int_eng <- c("I am not interested in joining a civic crowdfunding platform." = "Not_interested",
"I could be interested to join a civic crowdfunding platform." = "Could_be_interested",
"I am ready to join a civic crowdfunding platform." = "Is_ready")
survey_df$e_interest_nl <- str_replace(survey_df$e_interest_nl,
pattern = "een " %R% ANY_CHAR %R% "civic crowdfunding" %R% ANY_CHAR %R% " platform",
replacement = "een civic crowdfunding platform")
lut_e_int_nl <- c("Ik ben niet geinteresseerd bij het toetreden van een civic crowdfunding platform." = "Not_interested",
"Ik zou geinteresseerd kunnen zijn in het toetreden van een civic crowdfunding platform." = "Could_be_interested",
"Ik wil deel uitmaken van een civic crowdfunding platform." = "Is_ready")
lut_e_int_fr <- c("Je ne suis pas intéressé.e par l'adhésion à une plateforme de crowdfunding citoyen." = "Not_interested",
"Je pourrais être intéressé.e par l'adhésion à une plateforme de crowdfunding citoyen." = "Could_be_interested",
"Je suis prêt.e à rejoindre une plateforme de crowdfunding citoyen." = "Is_ready")
survey_df$e_interest_eng <- lut_e_int_eng[survey_df$e_interest_eng]
survey_df$e_interest_nl <- lut_e_int_nl[survey_df$e_interest_nl]
survey_df$e_interest_fr <- lut_e_int_fr[survey_df$e_interest_fr]
# Likert scale questions preprocessing
lut_likert_nl <- c("Helemaal niet akkoord" = "Strongly disagree",
"Niet akkoord" = "Disagree",
"Geen mening" = "Neither agree nor disagree",
"Akkoord" = "Agree",
"Helemaal akkoord" = "Strongly agree",
"Helemaal niet akkoord" = "Strongly disagree",
"niet akkoord" = "Disagree",
"geen mening" = "Neither agree nor disagree",
"akkoord" = "Agree",
"helemaal akkoord" = "Strongly agree")
lut_likert_fr <- c("Pas du tout d'accord" = "Strongly disagree",
"Pas d'accord" = "Disagree",
"Sans opinion" = "Neither agree nor disagree",
"D'accord" = "Agree",
"Tout à fait d'accord" = "Strongly agree")
survey_df[] <- lapply(colnames(survey_df),
function(x) {
y <- survey_df[[x]]
if (str_detect(x, pattern = "^._.[0-9]_nl$")) {
y <- lut_likert_nl[y]
}
return(y)
})
survey_df[] <- lapply(colnames(survey_df),
function(x) {
y <- survey_df[[x]]
if (str_detect(x, pattern = "^._.[0-9]_fr$")) {
y <- lut_likert_fr[y]
}
return(y)
})
# Citizen interest preprocessing
lut_c_int_eng <- c("I am not interested in citizen's initiatives." = "Not_interested",
"I could be interested to fund a citizen's initiative." = "Could_be_interested_Fund",
"I could be interested to volunteer for a citizen's initiative." = "Could_be_interested_Vol",
"I am ready to fund a citizen's initiative." = "Is_ready_Fund",
"I am ready to volunteer for a citizen's initiative." = "Is_ready_Vol")
lut_c_int_nl <- c("Ik ben niet geinteresseerd door burgerinitiatieven" = "Not_interested",
"Ik zou eventueel geinteresseerd kunnen zijn door het financieren van burgerinitiatieven" = "Could_be_interested_Fund",
"Ik zou eventueel geinteresseerd kunnen zijn door het financieel te ondersteunen van burgerinitiatieven" = "Could_be_interested_Fund",
"Ik zou eventueel geinteresseerd zijn om een vrijwillige bijdrage te leveren aan een burgerinitiatief" = "Could_be_interested_Vol",
"Ik ben klaar om burgerinitiatieven te financieren" = "Is_ready_Fund",
"Ik ben klaar om vrijwilig mee te draaien in een burgerinitiatief" = "Is_ready_Vol")
lut_c_int_fr <- c("Je ne suis pas intéressé.e par les initiatives citoyennes" = "Not_interested",
"Je pourrais être éventuellement intéressé.e par le financement d'une initiative citoyenne" = "Could_be_interested_Fund",
"Je pourrais être éventuellement intéressé.e par une contribution en tant que volontaire dans une initiative citoyenne" = "Could_be_interested_Vol",
"Je suis prêt.e à financer une initiative citoyenne" = "Is_ready_Fund",
"Je suis prêt.e à contribuer en tant que volontaire dans une initiative citoyenne" = "Is_ready_Vol")
survey_df$c_interest_eng <- lut_c_int_eng[survey_df$c_interest_eng]
survey_df$c_interest_nl <- lut_c_int_nl[survey_df$c_interest_nl]
survey_df$c_interest_fr <- lut_c_int_fr[survey_df$c_interest_fr]
# Citizen already investing question preprocessing
lut_c_invest_eng <- c("0\u20AC" = "0",
"1 - 99\u20AC /year" = "1_99",
"100 - 199\u20AC /year" = "100_199",
"200 - 499\u20AC /year" = "200_499",
"500\u20AC + /year" = "500_plus")
lut_c_invest_nl <- c("0€" = "0",
"1 - 99€ /jaar" = "1_99",
"100 - 199€ /jaar" = "100_199",
"200 - 499€ /jaar" = "200_499",
"500€ + /jaar" = "500_plus")
lut_c_invest_fr <- c("0€" = "0",
"1 - 99€ /an" = "1_99",
"100 - 199€ /an" = "100_199",
"200 - 499€ /an" = "200_499",
"500€ + /an" = "500_plus")
# survey_df$c_invest_eng <- lut_c_invest_eng[survey_df$c_invest_eng]
# survey_df$c_invest_nl <- lut_c_invest_nl[survey_df$c_invest_nl]
# survey_df$c_invest_fr <- lut_c_invest_fr[survey_df$c_invest_fr]
replace_amount <- function(x, a, b) {
mask <- str_detect(x, pattern = paste0("^", a))
x[which(mask)] <- b
return(x)
}
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
a = "0",
b = "0")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
a = "1 ",
b = "1_99")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
a = "10",
b = "100_199")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
a = "2",
b = "200_499")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
a = "5",
b = "500_plus")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
a = "0",
b = "0")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
a = "1 ",
b = "1_99")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
a = "10",
b = "100_199")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
a = "2",
b = "200_499")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
a = "5",
b = "500_plus")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
a = "0",
b = "0")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
a = "1 ",
b = "1_99")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
a = "10",
b = "100_199")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
a = "2",
b = "200_499")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
a = "5",
b = "500_plus")
# Contrib neighb question preprocessing
survey_df$ec_neighb_nl <- lut_likert_nl[survey_df$ec_neighb_nl]
survey_df$ec_neighb_fr <- lut_likert_fr[survey_df$ec_neighb_fr]
# Location question preprocessing
lut_loc_eng <- c("Brussels" = "Brussels",
"Outside Belgium " = "Brussels",
"Flanders" = "Flanders",
"Wallonia" = "Wallonia")
lut_loc_nl <- c("Brussel" = "Brussels",
"Brussels" = "Brussels",
"Vlanderen" = "Flanders",
"Vlaanderen" = "Flanders",
"Wallonië" = "Wallonia")
lut_loc_fr <- c("Bruxelles-Capitale" = "Brussels",
"cameroun/ngaoundéré" = "Brussels",
"Flandre" = "Flanders",
"Wallonie" = "Wallonia")
survey_df$ec_loc_eng <- lut_loc_eng[survey_df$ec_loc_eng]
survey_df$ec_loc_nl <- lut_loc_nl[survey_df$ec_loc_nl]
survey_df$ec_loc_fr <- lut_loc_fr[survey_df$ec_loc_fr]
# Gender question preprocessing
lut_gen_eng <- c("Female" = "Female",
"Male" = "Male",
"Other/I prefer keep it for myself" = "Other")
lut_gen_nl <- c("een vrow" = "Female",
"een man" = "Male",
"anders/Ik hou het liever voor mezelf." = "Other")
lut_gen_fr <- c("Une femme" = "Female",
"Un homme" = "Male",
"Autre/Je préfère le garder pour moi" = "Other")
survey_df$ec_gender_eng <- lut_gen_eng[survey_df$ec_gender_eng]
survey_df$ec_gender_nl <- lut_gen_nl[survey_df$ec_gender_nl]
survey_df$ec_gender_fr <- lut_gen_fr[survey_df$ec_gender_fr]
# Age question preprocessing
lut_age_eng <- c("Under 14 years old." = "14_and_below",
"14-17 years old." = "14_17",
"18-24 years old." = "18_24",
"25-34 years old." = "25_34",
"35-44 years old." = "35_44",
"45-54 years old." = "45_54",
"55-64 years old." = "55_64",
"65-74 years old." = "65_74",
"75 years or older." = "75_plus")
lut_age_nl <- c("onder 14 jaar" = "14_and_below",
"14-17 jaar" = "14_17",
"18-24jaar" = "18_24",
"25-34jaar" = "25_34",
"35-44jaar" = "35_44",
"45-54jaar" = "45_54",
"55-64jaar" = "55_64",
"65-74jaar" = "65_74",
"75 jaar of ouder" = "75_plus")
lut_age_fr <- c("En dessous de 14 ans." = "14_and_below",
"14-17 ans." = "14_17",
"18-24 ans." = "18_24",
"25-34 ans." = "25_34",
"35-44 ans." = "35_44",
"45-54 ans." = "45_54",
"55-64 ans." = "55_64",
"65-74 ans." = "65_74",
"75 ans ou plus." = "75_plus")
survey_df$ec_age_eng <- lut_age_eng[survey_df$ec_age_eng]
survey_df$ec_age_nl <- lut_age_nl[survey_df$ec_age_nl]
survey_df$ec_age_fr <- lut_age_fr[survey_df$ec_age_fr]
# Education question preprocessing
survey_df$ec_edu_eng <- str_replace(survey_df$ec_edu_eng,
pattern = "Bachelor" %R% ANY_CHAR %R% "s",
replacement = "Bachelor")
survey_df$ec_edu_eng <- str_replace(survey_df$ec_edu_eng,
pattern = "Master" %R% ANY_CHAR %R% "s",
replacement = "Masters")
lut_edu_eng <- c("Primary school or no schooling completed" = "Primary_or_None",
"High-School" = "High_school",
"Trade/technical/vocational training." = "Technical",
"Bachelor degree." = "Bachelor",
"Masters degree." = "Master",
"Doctorate degree." = "PhD")
lut_edu_nl <- c("Lagere school of zonder diploma" = "Primary_or_None",
"Humaniora" = "High_school",
"Technische Humaniora" = "Technical",
"Bachelor" = "Bachelor",
"Master" = "Master",
"Doctoraat" = "PhD")
lut_edu_fr <- c("Primaire ou sans diplôme" = "Primary_or_None",
"Diplôme secondaire" = "High_school",
"Diplôme secondaire + des formations qualifiantes" = "High_school",
"Diplôme secondaire technique" = "Technical",
"Bachelier" = "Bachelor",
"Master" = "Master",
"Double master même ;-)" = "Master",
"Doctorat" = "PhD",
"Post universirtaire" = "PhD")
survey_df$ec_edu_eng <- lut_edu_eng[survey_df$ec_edu_eng]
survey_df$ec_edu_nl <- lut_edu_nl[survey_df$ec_edu_nl]
survey_df$ec_edu_fr <- lut_edu_fr[survey_df$ec_edu_fr]
# Merging columns from the different languages
all_cols <- names(survey_df)
eng_cols <- all_cols[str_detect(all_cols, pattern = "eng$")]
nl_cols <- all_cols[str_detect(all_cols, pattern = "nl$")]
fr_cols <- all_cols[str_detect(all_cols, pattern = "fr$")]
merged_cols <- str_replace(eng_cols,
pattern = "_eng$",
replacement = "")
temp_df <- tbl_df(data.frame(matrix(nrow = nrow(survey_df),
ncol = length(merged_cols))))
colnames(temp_df) <- merged_cols
survey_df <- tbl_df(cbind(survey_df, temp_df))
survey_df[is.na(survey_df)] <- ""
for (i in 1:length(merged_cols)) {
m <- merged_cols[i]
e <- eng_cols[i]
n <- nl_cols[i]
f <- fr_cols[i]
survey_df[[m]] <- paste0(survey_df[[e]], survey_df[[n]], survey_df[[f]])
}
survey_df$lang <- factor(survey_df$lang,
levels = c("English", "Dutch", "French"))
survey_df$ec_gender <- factor(survey_df$ec_gender,
levels = c("Female", "Male", "Other"))
survey_df$grouping <- factor(survey_df$grouping,
levels = c("Active_member", "Citizen_casual_vol", "Citizen_not_active"))
# Split survey_df_e/survey_df_c
survey_df_e <- survey_df %>%
filter(grouping == "Active_member") %>%
select(date, lang, grouping:e_g6, ec_neighb:ec_edu)
temp_coln <- str_replace(names(survey_df_e), "^e_", "")
temp_coln <- str_replace(temp_coln, "^ec_", "")
temp_coln <- str_replace(temp_coln, "lang", "language")
names(survey_df_e) <- temp_coln
survey_df_e$age <- factor(survey_df_e$age, ordered = TRUE,
levels = c("14_and_below",
"14_17",
"18_24",
"25_34",
"35_44",
"45_54",
"55_64",
"65_74",
"75_plus"))
survey_df_e$loc <- factor(survey_df_e$loc, ordered = TRUE,
levels = c("Brussels",
"Flanders",
"Wallonia"))
survey_df_e$edu <- factor(survey_df_e$edu, ordered = TRUE,
levels = c("Primary_or_None",
"High_school",
"Technical",
"Bachelor",
"Master",
"PhD"))
survey_df_e$interest <- factor(survey_df_e$interest, ordered = TRUE,
levels = c("Not_interested",
"Could_be_interested",
"Is_ready"))
survey_df_c <- survey_df %>%
filter(grouping != "Active_member") %>%
select(date, lang, grouping, c_interest:ec_edu)
temp_coln <- str_replace(names(survey_df_c), "^c_", "")
temp_coln <- str_replace(temp_coln, "^ec_", "")
temp_coln <- str_replace(temp_coln, "lang", "language")
names(survey_df_c) <- temp_coln
survey_df_c$age <- factor(survey_df_c$age, ordered = TRUE,
levels = c("14_and_below",
"14_17",
"18_24",
"25_34",
"35_44",
"45_54",
"55_64",
"65_74",
"75_plus"))
survey_df_c$loc <- factor(survey_df_c$loc, ordered = TRUE,
levels = c("Brussels",
"Flanders",
"Wallonia"))
survey_df_c$edu <- factor(survey_df_c$edu, ordered = TRUE,
levels = c("Primary_or_None",
"High_school",
"Technical",
"Bachelor",
"Master",
"PhD"))
survey_df_c$interest <- factor(survey_df_c$interest, ordered = TRUE,
levels = c("Not_interested",
"Could_be_interested_Vol",
"Could_be_interested_Fund",
"Is_ready_Vol",
"Is_ready_Fund"))
survey_df_c$grouping <- factor(survey_df_c$grouping, ordered = TRUE,
levels = c("Citizen_not_active",
"Citizen_casual_vol"))
convert_to_fact_lik <- function(df, columns) {
for (column in columns) {
df[[column]] <- factor(df[[column]],
levels = c("Strongly disagree",
"Disagree",
"Neither agree nor disagree",
"Agree",
"Strongly agree"))
}
return(df)
}
e_likert_cols <- names(survey_df_e)[5:(length(survey_df_e) - 5)]
c_likert_cols <- names(survey_df_c)[5:(length(survey_df_c) - 6)]
survey_df_e <- convert_to_fact_lik(survey_df_e, e_likert_cols)
survey_df_c <- convert_to_fact_lik(survey_df_c, c_likert_cols)
survey_df_e <- convert_to_fact_lik(survey_df_e, "neighb")
survey_df_c <- convert_to_fact_lik(survey_df_c, "neighb")
survey_df_c$invest <- factor(survey_df_c$invest, ordered = TRUE,
levels = c("", "0", "1_99", "100_199",
"200_499", "500_plus"))
survey_df_e_j <- survey_df_e %>% select(date,
language,
grouping,
loc,
gender,
age,
edu) %>% mutate(resp_type = "Entrepreneur")
survey_df_c_j <- survey_df_c %>% select(date,
language,
grouping,
loc,
gender,
age,
edu) %>% mutate(resp_type = "Citizen")
survey_df_j <- survey_df_e_j %>% bind_rows(survey_df_c_j)
survey_df_j$resp_type <- factor(survey_df_j$resp_type,
levels = c("Entrepreneur", "Citizen"))
Now that we have imported, cleaned and tidied the data, we can start doing some exploratory analysis. In this document, we will often refer to Exploratory Data Analysis as the acronym EDA.
Note that the respondents were given the possibility to fill the questionnaire in either English, Dutch or French.
The survey was aimed at both social entrepreneurs and citizens of Brussels but each group is associated with a distinct set of hypotheses that we would like to investigate. Therefore, the first question in the survey was discriminating so that the rest of the questionnaire was split into two branches.
The discriminating question is shown in the table below along with the possible choices and the related categories in the R code.
x <- list(q = names(lut_group_eng)[c(2, 3, 1)], v = levels(survey_df$grouping)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[3], "Category in R code")
x %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| Which of the following groups do you best identify yourself with? | Category in R code |
|---|---|
| Active member of a local citizen’s initiative, social project or social organization. | Active_member |
| Citizen with casual volonteering for a local citizen’s initiative, social project or social organization. | Citizen_casual_vol |
| Citizen with no active implication in a local citizen’s initiative, social project or social organization. | Citizen_not_active |
Let’s have a look at the balance between each group:
survey_df %>%
count(grouping) %>%
rename(grouping_count = n) %>%
mutate(grouping = reorder(grouping, grouping_count)) %>%
ggplot(aes(x = grouping, y = grouping_count)) +
geom_col(fill = "#1C9099", alpha = 0.8) +
theme_ipsum_rc(grid = "X") +
scale_fill_ipsum() +
coord_flip() +
labs(title = "Distribution of responses to discriminating question", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
options(scipen = 999, digits = 2)
survey_df %>%
count(grouping) %>%
mutate(total = sum(n),
percent = n / total) %>%
select(grouping, n, percent) %>%
arrange(desc(n)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| grouping | n | percent |
|---|---|---|
| Citizen_not_active | 92 | 0.48 |
| Citizen_casual_vol | 60 | 0.31 |
| Active_member | 39 | 0.20 |
Then we split the respondents into two group: first the citizens, including those with no active implication in a local citizen’s initiative as well as those with casual volonteering, and second, the entrepreneurs (the active members of a local citizen’s initiative).
In the survey, we asked several questions that can be referred to as control variables, such as gender, education etc.
The next few plots will decribe our sample in terms of those variables.
First, we can create a bar plot showing the count of respondents, broken down into citizens/entrepreneurs, gender and language:
my_cols <- brewer.pal(3, "Paired")
survey_df_j %>%
ggplot(aes(x = gender, fill = language)) +
geom_bar(alpha = 0.8) +
facet_wrap(~ resp_type) +
labs(title = "Composition of respondents") +
theme_ipsum_rc() +
scale_fill_manual(values = my_cols) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
The corresponding table with the values is given here:
addmargins(table(survey_df_e$language, survey_df_e$gender) %>%
cbind(table(survey_df_c$language, survey_df_c$gender))) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14) %>%
column_spec(8, bold = T) %>%
row_spec(4, bold = T) %>%
add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
| Female | Male | Other | Female | Male | Other | Sum | |
|---|---|---|---|---|---|---|---|
| English | 2 | 1 | 0 | 5 | 12 | 0 | 20 |
| Dutch | 0 | 2 | 0 | 6 | 5 | 0 | 13 |
| French | 11 | 22 | 1 | 68 | 55 | 1 | 158 |
| Sum | 13 | 25 | 1 | 79 | 72 | 1 | 191 |
As we can see, the sample size for the entrepreneurs is 39 observations, and the sample size for the citizens is 152 observations.
Let’s now break the count of respondents by location and education.
my_cols <- brewer.pal(9, "BuPu")
survey_df_j %>%
ggplot(aes(x = loc, fill = edu)) +
geom_bar(alpha = 0.8) +
facet_wrap(~ resp_type) +
labs(title = "Composition of respondents") +
theme_ipsum_rc() +
scale_fill_manual(values = my_cols[2:7]) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
The corresponding table with the values is given here:
addmargins(table(survey_df_e$edu, survey_df_e$loc) %>%
cbind(table(survey_df_c$edu, survey_df_c$loc))) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14) %>%
column_spec(8, bold = T) %>%
row_spec(7, bold = T) %>%
add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
| Brussels | Flanders | Wallonia | Brussels | Flanders | Wallonia | Sum | |
|---|---|---|---|---|---|---|---|
| Primary_or_None | 1 | 0 | 0 | 0 | 0 | 1 | 2 |
| High_school | 2 | 0 | 0 | 7 | 0 | 0 | 9 |
| Technical | 0 | 0 | 0 | 2 | 1 | 0 | 3 |
| Bachelor | 4 | 1 | 0 | 8 | 9 | 3 | 25 |
| Master | 22 | 4 | 3 | 82 | 15 | 19 | 145 |
| PhD | 2 | 0 | 0 | 4 | 0 | 1 | 7 |
| Sum | 31 | 5 | 3 | 103 | 25 | 24 | 191 |
We can observe that there is a disproportionate share of respondents with a masters degree among both entrepreneurs and citizens. Therefore, the samples are very likely to be biased.
Finally, we can break the count of respondents by location and age:
my_cols <- brewer.pal(9, "YlGnBu")
survey_df_j %>%
ggplot(aes(x = loc, fill = age)) +
geom_bar(alpha = 0.8) +
facet_wrap(~ resp_type) +
labs(title = "Composition of respondents") +
theme_ipsum_rc() +
scale_fill_manual(values = my_cols[3:9]) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
The corresponding table with the values is given here:
addmargins(table(survey_df_e$age, survey_df_e$loc) %>%
cbind(table(survey_df_c$age, survey_df_c$loc))) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14) %>%
column_spec(8, bold = T) %>%
row_spec(10, bold = T) %>%
add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
| Brussels | Flanders | Wallonia | Brussels | Flanders | Wallonia | Sum | |
|---|---|---|---|---|---|---|---|
| 14_and_below | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14_17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18_24 | 3 | 0 | 0 | 11 | 1 | 3 | 18 |
| 25_34 | 13 | 0 | 1 | 57 | 10 | 15 | 96 |
| 35_44 | 4 | 1 | 1 | 11 | 1 | 2 | 20 |
| 45_54 | 6 | 1 | 0 | 17 | 7 | 2 | 33 |
| 55_64 | 3 | 3 | 1 | 5 | 5 | 2 | 19 |
| 65_74 | 1 | 0 | 0 | 1 | 1 | 0 | 3 |
| 75_plus | 1 | 0 | 0 | 1 | 0 | 0 | 2 |
| Sum | 31 | 5 | 3 | 103 | 25 | 24 | 191 |
Our population of interest in this chapter on citizens is the whole population in the Brussels-Capital Region (therefore including all its 19 municipalities).
Concerning the citizens, the objective of this survey is to test two hypotheses that were derived from our qualitative research:
The interest to support citizens initiatives is our dependent variable (this is the outcome that BrusselsTogether would like to better understand) and is measured by the question shown in the table below. Also shown in this table is the related categories in the R code.
x <- list(q = names(lut_c_int_eng)[c(1, 3, 2, 5, 4)], v = levels(survey_df_c$interest)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[16], "Category in R code")
x %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| Please indicate your level of interest for citizen’s initiatives: | Category in R code |
|---|---|
| I am not interested in citizen’s initiatives. | Not_interested |
| I could be interested to volunteer for a citizen’s initiative. | Could_be_interested_Vol |
| I could be interested to fund a citizen’s initiative. | Could_be_interested_Fund |
| I am ready to volunteer for a citizen’s initiative. | Is_ready_Vol |
| I am ready to fund a citizen’s initiative. | Is_ready_Fund |
The answers are distributed as shown in the figure below:
survey_df_c %>%
count(interest) %>%
rename(interest_count = n) %>%
mutate(interest = reorder(interest, interest_count)) %>%
ggplot(aes(x = interest, y = interest_count)) +
geom_col(fill = "#2C7FB8", alpha = 0.8) +
theme_ipsum_rc(grid = "X") +
scale_fill_ipsum() +
coord_flip() +
labs(title = "Distribution of responses to interest question", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
options(scipen = 999, digits = 2)
survey_df_c %>%
count(interest) %>%
mutate(total = sum(n),
percent = n / total) %>%
select(interest, n, percent) %>%
arrange(desc(n)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| interest | n | percent |
|---|---|---|
| Could_be_interested_Vol | 48 | 0.32 |
| Could_be_interested_Fund | 46 | 0.30 |
| Is_ready_Vol | 38 | 0.25 |
| Is_ready_Fund | 13 | 0.09 |
| Not_interested | 7 | 0.05 |
The independent variables (the wish for more participation in debates on public issues for H3 and the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms for H4) are what we think could be associated with the dependent variable (although we could not make causal inference from an observational study like this one). If the correlations are verified, then we could provide BrusselsTogether with actionnable recommendations, such as making the platform more participative or playing a role of intermediary between citizens initiatives and the local authorities.
For each of the two independent variables, we asked a set of questions which, in our opinion, should reflect their corresponding independent variable. In this case, we could say that the independent variables are latent variables that are each measured by a set of several items. We will dig more into the idea of latent variables in the chapter Confirmatory Factor Analysis.
The table below displays the items and their related variable name in the R code. The items related to the need for more participation in debates on public issues are p1 to p5 and the items related to opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms are g1 to g6.
Additionnaly, we asked the respondents whether their interest in contributing to the future of their neighborhood had increased in the past several years (variable name neighb).
All of those items were measured on a 5-points Likert scale, from Strongly_disagree to Strongly_agree.
meta <- data_frame(question = header_lookup$google_f_header[c(17:27, 29)],
variable_name = names(survey_df_c)[c(5:15, 17)])
meta %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| question | variable_name |
|---|---|
| Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. | p1 |
| I would like to participate in debates on local public issues. | p2 |
| It is relatively easy for citizens to find places where local public issues are collectively debated. | p3 |
| I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. | p4 |
| Face-to-face interactions are necessary for collective action. | p5 |
| Civic crowdfunding represents a direct threat to public funding of services. | g1 |
| I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. | g2 |
| The local government is able to identify the projects that are most wanted by citizens. | g3 |
| Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. | g4 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. | g5 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. | g6 |
| My interest in contributing to the future of my neighborhood has increased in the past several years. | neighb |
The answers are distributed as shown in the three plots below:
lik_plot <- function(t, my_df_names, mylevels, factor_levels, p_title) {
tab <- rownames_to_column(data.frame(unclass(t)), "rownames_col")
names(tab) <- my_df_names
tab$interest <- factor(tab$interest, ordered = T, levels = factor_levels)
numlevels <- length(tab[1,]) - 1
numcenter <- ceiling(numlevels / 2) + 1
tab$midvalues <- tab[,numcenter] / 2
tab2 <- cbind(tab[,1],
tab[,2:ceiling(numlevels / 2)],
tab$midvalues,
tab$midvalues,
tab[,numcenter:numlevels + 1])
colnames(tab2) <- c("outcome",
mylevels[1:floor(numlevels / 2)],
"midlow",
"midhigh",
mylevels[numcenter:numlevels])
numlevels <- length(mylevels) + 1
point1 <- 2
point2 <- ((numlevels) / 2) + 1
point3 <- point2 + 1
point4 <- numlevels + 1
mymin <- (ceiling(max(rowSums(tab2[,point1:point2])) * 4) / 4) * -100
mymax <- (ceiling(max(rowSums(tab2[,point3:point4])) * 4) / 4) * 100
numlevels <- length(tab[1,]) - 1
temp.rows <- length(tab2[,1])
pal <- brewer.pal((numlevels - 1),"BrBG")
pal[ceiling(numlevels / 2)] <- "#DFDFDF"
legend.pal <- pal
pal <- c(pal[1:(ceiling(numlevels / 2) - 1)],
pal[ceiling(numlevels / 2)],
pal[ceiling(numlevels / 2)],
pal[(ceiling(numlevels / 2) + 1):(numlevels - 1)])
tab3 <- melt(tab2, id = "outcome")
tab3$col <- rep(pal, each = temp.rows)
tab3$value <- tab3$value * 100
tab3$outcome <- str_wrap(tab3$outcome, width = 70)
tab3$outcome <- factor(tab3$outcome, ordered = T,
levels = str_wrap(factor_levels, width = 70))
highs <- na.omit(tab3[(length(tab3[,1]) / 2) + 1:length(tab3[,1]),])
lows <- na.omit(tab3[1:(length(tab3[,1]) / 2),])
lows <- lows[rev(rownames(lows)),]
highs$col <- factor(highs$col, ordered = T, levels = c("#018571",
"#80CDC1",
"#DFDFDF"))
lows$col <- factor(lows$col, ordered = T, levels = c("#A6611A",
"#DFC27D",
"#DFDFDF"))
highs$outcome <- factor(highs$outcome, ordered = T,
levels = rev(levels(highs$outcome)))
lows$outcome <- factor(lows$outcome, ordered = T,
levels = rev(levels(lows$outcome)))
# lows$outcome
# highs$outcome <- factor(highs$outcome, ordered = T,
# levels = tab2$outcome[order(-(tab2[,5] + tab2[,6] + tab2[,7]))])
ggplot() +
geom_bar(data = highs, alpha = 0.8, aes(x = outcome, y = value, fill = col), position = "stack", stat = "identity") +
geom_bar(data = lows, alpha = 0.8, aes(x = outcome, y = -value, fill = col), position = "stack", stat = "identity") +
geom_hline(yintercept = 0, color = c("white")) +
scale_fill_identity("Percent", labels = mylevels, breaks = legend.pal, guide = "legend") +
theme_fivethirtyeight() +
coord_flip() +
theme_ipsum_rc() +
labs(title = p_title, y = "",x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(legend.position = "bottom") +
scale_y_continuous(breaks = seq(-100,100,25), limits = c(-100,100), labels = function(x) paste0(x, "%")) +
theme(plot.margin = unit(c(1,1,1,0), "cm")) +
theme(
# panel.grid.major.x = element_blank(),
# panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()
)
}
questions_c <- header_lookup$google_f_header[17:27]
quest_code_c <- names(survey_df_c)[5:15]
survey_df_c_df <- as.data.frame(survey_df_c)
names(survey_df_c_df)[5:15] <- questions_c
my_df_names_c_p <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_c_p <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_c_p <- c("Personally funding a citizen's initiative is enough to satisfy my desire to engage in this project.",
"I would like to participate in debates on local public issues.",
"It is relatively easy for citizens to find places where local public issues are collectively debated.",
"I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project.",
"Face-to-face interactions are necessary for collective action.")
temp_c_p <- survey_df_c_df[,5:9]
temp_c_p_gathered <- temp_c_p %>%
gather(key = "question", value = "answer")
temp_c_p_gathered$answer <- factor(temp_c_p_gathered$answer, ordered = T,
levels = mylevels_c_p)
t_c_p <- prop.table(table(temp_c_p_gathered$question,
temp_c_p_gathered$answer), 1)
c_p_title <- "\n\nCitizens - Participative questions\n"
lik_plot(t_c_p, my_df_names_c_p, mylevels_c_p, factor_levels_c_p, c_p_title)
We can observe from this plot that the responses to some of the questions are polarized. Overall, it seems that most of the respondents disagree with the statement: It is relatively easy for citizens to find places where local public issues are collectively debated.
my_df_names_c_g <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_c_g <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_c_g <- c("Civic crowdfunding represents a direct threat to public funding of services.",
"I regard crowdfunded citizen's initiatives as a sort of _Do-It-Yourself government_.",
"The local government is able to identify the projects that are most wanted by citizens.",
"Civic crowdfunding platforms could play a role of intermediary between citizen's initiatives and the local government.",
"General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by the local government.",
"General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by non-profit organizations subsidized by the local government.")
temp_c_g <- survey_df_c_df[,10:15]
temp_c_g_gathered <- temp_c_g %>%
gather(key = "question", value = "answer")
temp_c_g_gathered$answer <- factor(temp_c_g_gathered$answer, ordered = T,
levels = mylevels_c_g)
t_c_g <- prop.table(table(temp_c_g_gathered$question,
temp_c_g_gathered$answer), 1)
c_g_title <- "Citizens - Government questions\n"
lik_plot(t_c_g, my_df_names_c_g, mylevels_c_g, factor_levels_c_g, c_g_title)
Here again, we can observe from this plot that the responses to some of the questions are quite polarized:
questions_neighb <- header_lookup$google_f_header[29]
quest_code_neighb <- names(survey_df_c)[17]
names(survey_df_c_df)[17] <- questions_neighb
my_df_names_c_neighb <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_c_neighb <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_c_neighb <- c("My interest in contributing to the future of my neighborhood has increased in the past several years.")
temp_c_neighb <- as.data.frame(survey_df_c_df[,17])
names(temp_c_neighb) <- "My interest in contributing to the future of my neighborhood has increased in the past several years."
temp_c_neighb_gathered <- temp_c_neighb %>%
gather(key = "question", value = "answer")
temp_c_neighb_gathered$answer <- factor(temp_c_neighb_gathered$answer,
ordered = T,
levels = mylevels_c_neighb)
t_c_neighb <- prop.table(table(temp_c_neighb_gathered$question,
temp_c_neighb_gathered$answer), 1)
c_neighb_title <- "Citizens - Interest in future of neighb. question\n"
lik_plot(t_c_neighb, my_df_names_c_neighb, mylevels_c_neighb, factor_levels_c_neighb, c_neighb_title)
Finally, we asked how much the respondents were already investing in citizens initiatives. This could be regarded as a secondary dependent variable capturing the interest in citizens initiatives.
Note that this feature was added a bit later after having launched the online survey, so the first few observations do not contain information for this question.
This question, along with its corresponding choices and categories in the R code, is shown in the table below.
x <- list(q = names(lut_c_invest_eng),
v = levels(survey_df_c$invest)[2:6]) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[28], "Category in R code")
x %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| I already invest in social projects: | Category in R code |
|---|---|
| 0€ | 0 |
| 1 - 99€ /year | 1_99 |
| 100 - 199€ /year | 100_199 |
| 200 - 499€ /year | 200_499 |
| 500€ + /year | 500_plus |
The answers are distributed as shown in the figure below:
survey_df_c %>%
filter(invest != "") %>%
count(invest) %>%
rename(invest_count = n) %>%
mutate(invest = reorder(invest, invest_count)) %>%
ggplot(aes(x = invest, y = invest_count)) +
geom_col(fill = "#31A354", alpha = 0.8) +
theme_ipsum_rc(grid = "X") +
scale_fill_ipsum() +
coord_flip() +
labs(title = "Distribution of responses to past investment question", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
options(scipen = 999, digits = 2)
survey_df_c %>%
filter(invest != "") %>%
count(invest) %>%
mutate(total = sum(n),
percent = n / total) %>%
select(invest, n, percent) %>%
arrange(desc(n)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| invest | n | percent |
|---|---|---|
| 0 | 73 | 0.57 |
| 1_99 | 43 | 0.33 |
| 100_199 | 7 | 0.05 |
| 500_plus | 5 | 0.04 |
| 200_499 | 1 | 0.01 |
In order to have a rough sense of the relationship between, on the one hand, the answers to the indenpendent variables questions and the control variable questions, and on the other hand, the answers to the dependent variables questions, in the next few sections we are going to break down the previous vizualisations by category of answer to the dependent variables questions.
Note that we will not comment these plots here, as we will attempt to quantify these relationships later in the chapter Logistic Regression Modeling.
First, let us break down the distribution of answers related to the independent variables by interest in citizens initiatives.
While examining the plots below, we should keep in mind that, as shown previously by the distribution of the interest, the proportion of people ‘ready to fund’ and ‘not interested’ is only 9% and 5% of the sample, respectively.
Here below is a series of plots related to the ‘participative’ questions.
my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels <- c("Not_interested",
"Could_be_interested_Vol",
"Could_be_interested_Fund",
"Is_ready_Vol",
"Is_ready_Fund")
t_p1 <- prop.table(table(survey_df_c$interest, survey_df_c$p1), 1)
p1_title <- "\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"
lik_plot(t_p1, my_df_names, mylevels, factor_levels, p1_title)
t_p2 <- prop.table(table(survey_df_c$interest, survey_df_c$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"
lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)
t_p3 <- prop.table(table(survey_df_c$interest, survey_df_c$p3), 1)
p3_title <- "\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"
lik_plot(t_p3, my_df_names, mylevels, factor_levels, p3_title)
t_p4 <- prop.table(table(survey_df_c$interest, survey_df_c$p4), 1)
p4_title <- "I would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"
lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)
t_p5 <- prop.table(table(survey_df_c$interest, survey_df_c$p5), 1)
p5_title <- "\n\nFace-to-face interactions are necessary for collective action.\n"
lik_plot(t_p5, my_df_names, mylevels, factor_levels, p5_title)
Here below is a series of plots related to the ‘government’ questions.
my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels <- c("Not_interested",
"Could_be_interested_Vol",
"Could_be_interested_Fund",
"Is_ready_Vol",
"Is_ready_Fund")
t_g1 <- prop.table(table(survey_df_c$interest, survey_df_c$g1), 1)
g1_title <- "\n\nCivic crowdfunding represents a direct threat to public funding of services.\n"
lik_plot(t_g1, my_df_names, mylevels, factor_levels, g1_title)
t_g2 <- prop.table(table(survey_df_c$interest, survey_df_c$g2), 1)
g2_title <- "\nI regard crowdfunded citizen's initiatives as\na sort of _Do-It-Yourself government_.\n"
lik_plot(t_g2, my_df_names, mylevels, factor_levels, g2_title)
t_g3 <- prop.table(table(survey_df_c$interest, survey_df_c$g3), 1)
g3_title <- "\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"
lik_plot(t_g3, my_df_names, mylevels, factor_levels, g3_title)
t_g4 <- prop.table(table(survey_df_c$interest, survey_df_c$g4), 1)
g4_title <- "\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"
lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)
t_g5 <- prop.table(table(survey_df_c$interest, survey_df_c$g5), 1)
g5_title <- "\nGeneral interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by the local government.\n"
lik_plot(t_g5, my_df_names, mylevels, factor_levels, g5_title)
t_g6 <- prop.table(table(survey_df_c$interest, survey_df_c$g6), 1)
g6_title <- "General interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"
lik_plot(t_g6, my_df_names, mylevels, factor_levels, g6_title)
Here below is the plot related to the ‘interest in the future of my neighborhood’ question.
t_neighb <- prop.table(table(survey_df_c$interest, survey_df_c$neighb), 1)
neighb_title <- "\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"
lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)
Just like in the section above, while examining the following plots, we should keep in mind that, as shown previously by the distribution of ‘past investment’, the proportions are highly imbalanced, since people investing 100_199, 200_499 and 500_plus represent only 5%, 1% and 4% of the sample, respectively.
Here below is a series of plots related to the ‘participative’ questions.
my_df_names_c_inv <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_c_inv <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_c_inv <- c("0",
"1_99",
"100_199",
"200_499",
"500_plus")
survey_df_c_inv <- survey_df_c %>%
filter(invest != "")
t_p1 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p1), 1)
p1_title <- "\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"
lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p1_title)
t_p2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"
lik_plot(t_p2, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p2_title)
t_p3 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p3), 1)
p3_title <- "\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"
lik_plot(t_p3, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p3_title)
t_p4 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p4), 1)
p4_title <- "I would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"
lik_plot(t_p4, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p4_title)
t_p5 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p5), 1)
p5_title <- "\nFace-to-face interactions are necessary for collective action.\n"
lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p1_title)
Here below is a series of plots related to the ‘government’ questions.
t_g1 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g1), 1)
g1_title <- "\n\nCivic crowdfunding represents a direct threat to public funding of services.\n"
lik_plot(t_g1, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, g1_title)
t_g2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g2), 1)
g2_title <- "\nI regard crowdfunded citizen's initiatives as\na sort of _Do-It-Yourself government_.\n"
lik_plot(t_g2, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, g2_title)
t_g3 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g3), 1)
g3_title <- "\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"
lik_plot(t_g3, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, g3_title)
t_g4 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g4), 1)
g4_title <- "\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"
lik_plot(t_g4, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, g4_title)
t_g5 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g5), 1)
g5_title <- "\nGeneral interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by the local government.\n"
lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p1_title)
t_g6 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g6), 1)
g6_title <- "General interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"
lik_plot(t_g6, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, g6_title)
Here below is the plot related to the ‘interest in the future of my neighborhood’ question.
t_neighb <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$neighb), 1)
neighb_title <- "\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"
lik_plot(t_neighb, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, neighb_title)
Here below is a plot showing the education versus answers to the independent variables questions, broken down by the independent variables questions.
survey_df_c <- survey_df_c %>%
mutate(edu_num = unclass(survey_df_c$edu))
tidy_survey_df_c <- survey_df_c %>%
select(edu_num, p1:g6, neighb) %>%
data.matrix() %>%
as_data_frame() %>%
gather(key, value, -edu_num)
tidy_survey_df_c %>%
group_by(key, value) %>%
summarize(edu_num = mean(edu_num, na.rm = TRUE)) %>%
ggplot(aes(value, edu_num, color = key)) +
geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
geom_point() +
labs(title = "Relationship btw education and answers to likert scale questions",
x = "Answer to likert scale question", y = "Mean education") +
facet_wrap(~factor(key, ordered = T,
levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
"g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
theme_ipsum_rc() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(legend.position = "none") +
theme(plot.margin = unit(c(1,1,1,0), "cm")) +
theme(
panel.grid.minor.y = element_blank()
)
For this plot, the education is coded as an integer value as shown in the following table (see the column edu_num). This table also shows the count for each level (under the column n).
options(scipen = 999, digits = 3)
survey_df_c %>%
count(edu, edu_num) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| edu | edu_num | n |
|---|---|---|
| Primary_or_None | 1 | 1 |
| High_school | 2 | 7 |
| Technical | 3 | 3 |
| Bachelor | 4 | 20 |
| Master | 5 | 116 |
| PhD | 6 | 5 |
Here below is a reminder of the questions and their variable names.
meta <- data_frame(question = header_lookup$google_f_header[c(17:27, 29)],
variable_name = names(survey_df_c)[c(5:15, 17)])
meta %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| question | variable_name |
|---|---|
| Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. | p1 |
| I would like to participate in debates on local public issues. | p2 |
| It is relatively easy for citizens to find places where local public issues are collectively debated. | p3 |
| I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. | p4 |
| Face-to-face interactions are necessary for collective action. | p5 |
| Civic crowdfunding represents a direct threat to public funding of services. | g1 |
| I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. | g2 |
| The local government is able to identify the projects that are most wanted by citizens. | g3 |
| Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. | g4 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. | g5 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. | g6 |
| My interest in contributing to the future of my neighborhood has increased in the past several years. | neighb |
We can observe from this plot that there is a possible positive relationship between education and the extent to which respondents agreed with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project, although this should be confirmed/quantified by the statistical modeling that we will perform in the next chapter.
Here below is a plot showing the mean age versus answers to the independent variables questions, broken down by the independent variables questions.
survey_df_c <- survey_df_c %>%
mutate(age_num = ifelse(age == "14_and_below", 12,
ifelse(age == "14_17", 16,
ifelse(age == "18_24", 20,
ifelse(age == "25_34", 30,
ifelse(age == "35_44", 40,
ifelse(age == "45_54", 50,
ifelse(age == "55_64", 60,
ifelse(age == "65_74", 70, 80)))))))))
tidy_survey_df_c <- survey_df_c %>%
select(age_num, p1:g6, neighb) %>%
data.matrix() %>%
as_data_frame() %>%
gather(key, value, -age_num)
tidy_survey_df_c %>%
group_by(key, value) %>%
summarize(age_num = mean(age_num, na.rm = TRUE)) %>%
ggplot(aes(value, age_num, color = key)) +
geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
geom_point() +
labs(title = "Relationship btw age and answers to likert scale questions",
x = "Answer to likert scale question", y = "Mean age") +
facet_wrap(~factor(key, ordered = T,
levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
"g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
theme_ipsum_rc() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(legend.position = "none") +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
For this plot, the age group is coded as an integer value as shown in the following table (see the column age_num). This table also shows the count for each level (under the column n).
options(scipen = 999, digits = 3)
survey_df_c %>%
count(age, age_num) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| age | age_num | n |
|---|---|---|
| 18_24 | 20 | 15 |
| 25_34 | 30 | 82 |
| 35_44 | 40 | 14 |
| 45_54 | 50 | 26 |
| 55_64 | 60 | 12 |
| 65_74 | 70 | 2 |
| 75_plus | 80 | 1 |
Similar to the previous plot, we can observe from this plot that there is a possible positive relationship between age and the extent to which the participant agreed with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project, although this should be confirmed/quantified with the statistical modeling that we will perform in the next chapter.
In this chapter, we are going to fit a series of multiple binary logistic regression models (which are the most basic model used to predict a categorical response, such as our two dependent variables ‘interest’ and ‘past investment’). This is useful because these models will allow us to see which predictors (in our case the independent variables and control variables) are associated with the likelihood of people to be interested in supporting citizens initiatives.
This kind of models can only predict a target variable which has two classes, thus the term binary. However, our dependent variables have more than two classes (for example, there are five classes for the ‘interest’ variable, from ‘Not interested’ up to ‘Ready to fund’). Therefore, we will fit several models one after the other after having split our dependent variable classes into two groups in several ways. This is called the ‘one versus rest’ approach. There are of course more complex models that can handle more than two classes in the target variable, but we will keep things simple here.
Note that designing a survey with ordinal categorical variables (i.e. categorical variables which have a natural order) such as questions with choices on a 5-point Likert scale or age or education makes it easier to interpret the results of the models, as we will see next.
In this section, we will fit a model with ‘Interest’ (in supporting citizens initiatives) as the target variable and all of the independent variables, as well as the control variables, as the explanatory variables.
In this model, we will split the Interest variable into the two following groups:
Not_interested;Could_be_interested_Vol, Could_be_interested_Fund, Is_ready_Vol and Is_ready_Fund).Also, we filter out observations with gender other than Female and Male. Then, Female are represented by the integer value 1 and Male are represented by the integer value 2 (there is no particular reason for this, we could do it the other way around).
Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly agree).
Answers to the question about the age of the respondent range from 1 (Under_14) to 9 (75_plus).
Answers to the education question range from 1 (Primary_or_None) to 6 (PhD).
Finally, answers to the location and language questions have three levels:
Brussels = 1, Flanders = 2, Wallonia = 3;English = 1, Dutch = 2, French = 3.Thus, the first few rows of the input to the model look like this:
survey_df_c_mat <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
mutate(interest_binary = ifelse(interest == "Not_interested",
0,
1)) %>%
select(interest_binary, language, p1:edu, -invest) %>%
data.matrix() %>%
as_data_frame()
survey_df_c_mat %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| interest_binary | language | p1 | p2 | p3 | p4 | p5 | g1 | g2 | g3 | g4 | g5 | g6 | neighb | loc | gender | age | edu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 2 | 2 | 2 | 4 | 4 | 2 | 3 | 2 | 4 | 2 | 2 | 4 | 1 | 1 | 4 | 5 |
| 1 | 1 | 3 | 3 | 2 | 2 | 4 | 1 | 4 | 2 | 4 | 4 | 4 | 4 | 1 | 2 | 6 | 6 |
| 1 | 3 | 2 | 4 | 2 | 3 | 5 | 4 | 4 | 2 | 5 | 3 | 3 | 5 | 1 | 2 | 5 | 4 |
| 1 | 3 | 3 | 3 | 3 | 3 | 4 | 2 | 3 | 3 | 4 | 4 | 4 | 3 | 2 | 2 | 8 | 3 |
| 1 | 3 | 3 | 3 | 2 | 4 | 1 | 4 | 3 | 2 | 3 | 3 | 4 | 4 | 1 | 2 | 6 | 5 |
| 1 | 2 | 4 | 5 | 4 | 4 | 4 | 2 | 5 | 2 | 4 | 4 | 2 | 4 | 1 | 1 | 4 | 5 |
Let us now examine the output of the model:
glm1 <- glm(interest_binary ~ ., family = "binomial",
data = survey_df_c_mat)
summary(glm1)
##
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.00013589 0.00000002 0.00000002 0.00000002 0.00011225
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -712.10 260294.49 0.00 1
## language 104.17 25318.23 0.00 1
## p1 -2.82 8753.77 0.00 1
## p2 -6.25 13890.32 0.00 1
## p3 -34.86 21457.88 0.00 1
## p4 34.80 17255.02 0.00 1
## p5 -23.91 12686.23 0.00 1
## g1 11.38 18054.07 0.00 1
## g2 35.47 16975.20 0.00 1
## g3 88.56 23372.16 0.00 1
## g4 87.05 24097.56 0.00 1
## g5 28.90 10846.46 0.00 1
## g6 -32.60 14709.20 0.00 1
## neighb 103.56 20100.27 0.01 1
## loc -28.77 15039.57 0.00 1
## gender -134.84 59506.52 0.00 1
## age -38.48 11978.54 0.00 1
## edu 46.11 9663.82 0.00 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 56.669538353445 on 150 degrees of freedom
## Residual deviance: 0.000000084893 on 133 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
This is a lot of output, but we are going to focus on the Coefficients table in the middle.
In order to be able to interpret anything meaningful, we first have to look at the p-values represented in the last column of the Ceofficients table (Pr(>|z|)). The p-value is used to test for statistical significance. We won’t go into the theory of inference for regression, but simply put, low p-values (below 0.05) generally mean that we can rule out that the effect of the coefficient is due to chance. In other words, we are looking for coefficients for which the p-value is below 0.05.
In the present case, all p-values are about 1, so this model is useless. In fact, the model did not converge. This is probably due to the high imbalance in the target variable. Indeed, as already mentionned, the class Not_interested represents only 5% of the sample or 7 out of 152 observations.
Therefore, we cannot infer anything regarding factors that could be associated with the likelihood of people not to be interested in citizens initiatives as opposed to being somewhat interested (‘Could be interested’) or definetly interested (‘Is ready’).
In this second model, we are filtering out the observations where people are not interested in citizens initiatives, therefore we are now focusing on a subset of the data. Then we split the observations into the following two groups, related to the interest variable:
Could_be_interested_Vol and Could_be_interested_Fund;Is_ready_Vol and Is_ready_Fund.The rest of the model is similar to the model 1.
survey_df_c_mat <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
filter(interest != "Not_interested") %>%
mutate(interest_binary = ifelse(interest %in% c("Is_ready_Vol",
"Is_ready_Fund"),
1,
0)) %>%
select(interest_binary, language, p1:edu, -invest) %>%
data.matrix() %>%
as_data_frame()
glm2 <- glm(interest_binary ~ ., family = "binomial",
data = survey_df_c_mat)
summary(glm2)
##
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.708 -0.860 -0.505 0.952 2.208
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.0048 3.1302 -0.64 0.5219
## language -0.1270 0.3240 -0.39 0.6950
## p1 -0.3370 0.2209 -1.53 0.1272
## p2 0.7007 0.2650 2.64 0.0082 **
## p3 0.1394 0.2150 0.65 0.5166
## p4 -0.6186 0.2582 -2.40 0.0166 *
## p5 -0.2407 0.2119 -1.14 0.2560
## g1 0.2197 0.2415 0.91 0.3629
## g2 0.1062 0.2018 0.53 0.5985
## g3 0.3367 0.2103 1.60 0.1093
## g4 -0.5030 0.2553 -1.97 0.0488 *
## g5 -0.1043 0.2613 -0.40 0.6898
## g6 -0.0468 0.3032 -0.15 0.8774
## neighb 0.4887 0.2442 2.00 0.0453 *
## loc -0.0153 0.2885 -0.05 0.9577
## gender 0.1171 0.4288 0.27 0.7847
## age 0.3703 0.1818 2.04 0.0417 *
## edu 0.0259 0.2613 0.10 0.9211
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 187.20 on 143 degrees of freedom
## Residual deviance: 151.49 on 126 degrees of freedom
## AIC: 187.5
##
## Number of Fisher Scoring iterations: 5
Now we have several variables for which the p-value of the coefficient is less than 0.05. Let’s extract them into the following table, which is ordered by estimate in descending order:
options(scipen = 999, digits = 3)
glm2 %>%
tidy() %>%
filter(p.value < 0.05) %>%
arrange(desc(estimate)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| p2 | 0.701 | 0.265 | 2.64 | 0.008 |
| neighb | 0.489 | 0.244 | 2.00 | 0.045 |
| age | 0.370 | 0.182 | 2.04 | 0.042 |
| g4 | -0.503 | 0.255 | -1.97 | 0.049 |
| p4 | -0.619 | 0.258 | -2.40 | 0.017 |
We can interpret this table as follows:
p2 is above zero, then people agreeing with the statement: I would like to participate in debates on local public issues are more likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I am ready to…’ rather than ‘I could be interested to…’), after controlling for the other variables in the model.Let’s have a look at a plot similar to what we did in the Exploratory Data Analysis (EDA), but without the category Not_interested:
t_p2 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p2[survey_df_c$interest != "Not_interested"]), 1)
p2_title <- "\n\n\nI would like to participate in debates on local public issues.\n"
lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)
This confirms visually the positive relationship identified in point 1.
neighb is positive, then people agreeing with the statement: My interest in contributing to the future of my neighborhood has increased in the past several years are more likely to be ‘ready’ to fund or volunteer, after controlling for the other variables in the model.Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested:
t_neighb <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$neighb[survey_df_c$interest != "Not_interested"]), 1)
neighb_title <- "\n\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"
lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)
This confirms visually the positive relationship identified in point 2.
age is above zero, then older people are more likely to be ‘ready’ to fund or volunteer, after controlling for the other variables in the model.Let’s have a look at a bar chart representing the proportions of interest conditionned on the age group:
my_cols <- brewer.pal(9, "YlGnBu")
survey_df_c %>%
filter(interest != "Not_interested") %>%
ggplot(aes(x = interest, fill = age)) +
geom_bar(alpha = 0.8, position = "fill") +
labs(title = "Proportions of interest conditionned on age group", y = "Proportion") +
theme_ipsum_rc() +
scale_fill_manual(values = my_cols[3:9]) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
This plot seems to confirm the positive relationship identified in point 3.
g4 is negative, then people agreeing with the statement: Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government are less likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I could be interested to…’), after controlling for the other variables in the model.Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested:
t_g4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$g4[survey_df_c$interest != "Not_interested"]), 1)
g4_title <- "\n\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"
lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)
This confirms visually the negative relationship identified in point 4.
p4 is negative, then people agreeing with the statement: I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project are less likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I could be interested to…’), after controlling for the other variables in the model.Let’s try to confirm this visually:
t_p4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p4[survey_df_c$interest != "Not_interested"]), 1)
p4_title <- "\nI would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"
lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)
The negative relationship identified in this last point is less clear in this plot.
Let’s build a third model in which we are filtering out the observations where people are not interested in citizens initiatives, and where we split the observations into the two following groups, related to the interest variable:
Could_be_interested_Vol and Is_ready_Vol;Could_be_interested_Fund and Is_ready_Fund.In other words, we are now interested to know what factors are associated with the difference between those willing to volunteer and those willing to fund.
The rest of the model is similar to the model 1.
survey_df_c_mat <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
filter(interest != "Not_interested") %>%
mutate(interest_binary = ifelse(interest %in% c("Is_ready_Vol",
"Could_be_interested_Vol"),
0,
1)) %>%
select(interest_binary, language, p1:edu, -invest) %>%
data.matrix() %>%
as_data_frame()
glm3 <- glm(interest_binary ~ ., family = "binomial",
data = survey_df_c_mat)
summary(glm3)
##
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.743 -0.869 -0.497 0.993 2.120
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.8005 2.9507 -0.95 0.343
## language -0.0840 0.3341 -0.25 0.801
## p1 0.4846 0.2081 2.33 0.020 *
## p2 -0.2998 0.2137 -1.40 0.161
## p3 -0.2339 0.2104 -1.11 0.266
## p4 0.5874 0.2482 2.37 0.018 *
## p5 -0.2857 0.1980 -1.44 0.149
## g1 0.1327 0.2354 0.56 0.573
## g2 -0.1770 0.1968 -0.90 0.369
## g3 -0.2764 0.2049 -1.35 0.177
## g4 0.3355 0.2667 1.26 0.208
## g5 -0.3865 0.2565 -1.51 0.132
## g6 0.0634 0.2883 0.22 0.826
## neighb 0.0558 0.2108 0.26 0.791
## loc 0.0818 0.2714 0.30 0.763
## gender 0.2298 0.4192 0.55 0.583
## age 0.0862 0.1690 0.51 0.610
## edu 0.2534 0.2742 0.92 0.355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 194.91 on 143 degrees of freedom
## Residual deviance: 161.57 on 126 degrees of freedom
## AIC: 197.6
##
## Number of Fisher Scoring iterations: 4
We have two variables for which the p-value of the coefficient is less than 0.05. Let’s extract them into the following table, which is ordered by estimate in descending order:
options(scipen = 999, digits = 3)
glm3 %>%
tidy() %>%
filter(p.value < 0.05) %>%
arrange(desc(estimate)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| p4 | 0.587 | 0.248 | 2.37 | 0.018 |
| p1 | 0.485 | 0.208 | 2.33 | 0.020 |
We can interpret this table as follows:
p4 is positive, then people agreeing with the statement: I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project are more likely to be willing to fund rather than to volunteer, after controlling for the other variables in the model.Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested and with the class labels for the dependent variable re-ordered (on the y axis):
t_p4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p4[survey_df_c$interest != "Not_interested"]), 1)
p4_title <- "\nI would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"
factor_levels_m3 <- c("Could_be_interested_Vol", "Is_ready_Vol", "Could_be_interested_Fund", "Is_ready_Fund")
lik_plot(t_p4, my_df_names, mylevels, factor_levels_m3, p4_title)
This confirms visually the positive relationship identified in point 1, and it seems to make more sense, visually, than the relationship identified in point 5 when interpreting the model 2.
p1 is positive, then people agreeing with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project are more likely to be willing to fund rather than to volunteer, after controlling for the other variables in the model.Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested and with the class labels for the dependent variable re-ordered:
t_p1 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p1[survey_df_c$interest != "Not_interested"]), 1)
p1_title <- "\n\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"
factor_levels_m3 <- c("Could_be_interested_Vol", "Is_ready_Vol", "Could_be_interested_Fund", "Is_ready_Fund")
lik_plot(t_p1, my_df_names, mylevels, factor_levels_m3, p1_title)
This confirms visually the positive relationship identified in point 2.
In our last model, we are going to change our target variable. Instead of modeling the interest in citizens initiatives explained by all of the other variables, we will be modeling the respondents’ past investment in citizens initiatives explained by all of the other variables (independent variables and control variables).
In order for this target to be binary, we will split the observations into the two following groups, related to the invest variable:
0, that is, the respondent did not report already investing in citizens initiatives;1_99, 100_199, 200_499, 500_plus).All of the other settings are the same as for the model 1.
survey_df_c_mat <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
filter(invest != "") %>%
mutate(invest_binary = ifelse(invest == "0", 0, 1)) %>%
select(invest_binary, language, p1:edu, -invest) %>%
data.matrix() %>%
as_data_frame()
glm4 <- glm(invest_binary ~ ., family = "binomial",
data = survey_df_c_mat)
summary(glm4)
##
## Call:
## glm(formula = invest_binary ~ ., family = "binomial", data = survey_df_c_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.738 -0.971 -0.514 0.950 2.413
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.79136 2.92178 0.27 0.7865
## language 0.02150 0.34018 0.06 0.9496
## p1 0.20232 0.21531 0.94 0.3474
## p2 0.65330 0.24957 2.62 0.0089 **
## p3 0.05238 0.21171 0.25 0.8046
## p4 -0.42688 0.25583 -1.67 0.0952 .
## p5 0.03403 0.21480 0.16 0.8741
## g1 -0.49469 0.27726 -1.78 0.0744 .
## g2 0.10823 0.20656 0.52 0.6003
## g3 0.05635 0.20918 0.27 0.7876
## g4 -0.25723 0.25098 -1.02 0.3054
## g5 0.08609 0.26612 0.32 0.7463
## g6 -0.40861 0.31412 -1.30 0.1933
## neighb 0.35190 0.22921 1.54 0.1247
## loc -0.00567 0.27556 -0.02 0.9836
## gender -0.51515 0.44109 -1.17 0.2428
## age -0.13074 0.18507 -0.71 0.4799
## edu -0.01773 0.27137 -0.07 0.9479
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 175.44 on 127 degrees of freedom
## Residual deviance: 146.42 on 110 degrees of freedom
## AIC: 182.4
##
## Number of Fisher Scoring iterations: 4
We have a single variable for which the p-value of the coefficient is less than 0.05. Let’s extract it into the following table:
options(scipen = 999, digits = 3)
glm4 %>%
tidy() %>%
filter(p.value < 0.05) %>%
arrange(desc(estimate)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| p2 | 0.653 | 0.25 | 2.62 | 0.009 |
We can interpret this result as follows:
p2 is positive, then people agreeing with the statement: I would like to participate in debates on local public issues are more likely to be already investing in citizens initiatives than not investing, after controlling for the other variables in the model.Let’s re-examine a plot that we already showed in the Exploratory Data Analysis:
t_p2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"
lik_plot(t_p2, my_df_names_c_inv, mylevels_c_inv,
factor_levels_c_inv, p2_title)
As a reminder from the EDA, observations where the investment is 0 represent about 57% of the sample.
From this plot we can visually confirm the result from the model 4.
This chapter is optionnal for the reader, because we are not confident that our results are correct. We intend to update this part of the document in the future as our knowledge about dimensionality reduction techniques, factor analyis and structural equation modeling evolves.
By design, there are questions that are intended to collectively measure one latent variable. For example, there are five questions related to the wish from the citizens for more participation in debates on public issues. Also, there are six questions related to whether a civic crowdfunding platform could be regarded as an intermediary between the citizens initiatives and the local authorities.
Confirmatory Factor Analysis (CFA) allows us to reduce the dimensionality of several items into their corresponding hypothesized latent variable. The resulting latent variable is a continuous numeric variable that captures the variability of its corresponding items.
Let’s try it out.
As a reminder, here below are the questions related to the two latent variables (p for participative and g for relationship between citizens initiatives and the government):
meta <- data_frame(question = header_lookup$google_f_header[17:27],
variable = names(survey_df_c)[5:15])
meta %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| question | variable |
|---|---|
| Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. | p1 |
| I would like to participate in debates on local public issues. | p2 |
| It is relatively easy for citizens to find places where local public issues are collectively debated. | p3 |
| I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. | p4 |
| Face-to-face interactions are necessary for collective action. | p5 |
| Civic crowdfunding represents a direct threat to public funding of services. | g1 |
| I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. | g2 |
| The local government is able to identify the projects that are most wanted by citizens. | g3 |
| Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. | g4 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. | g5 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. | g6 |
After converting the categorical variables into an integer value, we can have a look at the first ten observations:
survey_df_c_mat <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
select(language:edu) %>%
data.matrix() %>%
as_data_frame()
survey_df_c_mat %>%
select(interest:neighb, age, edu, loc, gender) %>%
head(n = 10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| interest | p1 | p2 | p3 | p4 | p5 | g1 | g2 | g3 | g4 | g5 | g6 | invest | neighb | age | edu | loc | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 2 | 2 | 4 | 4 | 2 | 3 | 2 | 4 | 2 | 2 | 1 | 4 | 4 | 5 | 1 | 1 |
| 2 | 3 | 3 | 2 | 2 | 4 | 1 | 4 | 2 | 4 | 4 | 4 | 1 | 4 | 6 | 6 | 1 | 2 |
| 4 | 2 | 4 | 2 | 3 | 5 | 4 | 4 | 2 | 5 | 3 | 3 | 1 | 5 | 5 | 4 | 1 | 2 |
| 2 | 3 | 3 | 3 | 3 | 4 | 2 | 3 | 3 | 4 | 4 | 4 | 1 | 3 | 8 | 3 | 2 | 2 |
| 3 | 3 | 3 | 2 | 4 | 1 | 4 | 3 | 2 | 3 | 3 | 4 | 1 | 4 | 6 | 5 | 1 | 2 |
| 3 | 4 | 5 | 4 | 4 | 4 | 2 | 5 | 2 | 4 | 4 | 2 | 1 | 4 | 4 | 5 | 1 | 1 |
| 3 | 2 | 4 | 2 | 5 | 2 | 2 | 2 | 2 | 4 | 2 | 4 | 1 | 2 | 4 | 2 | 1 | 2 |
| 4 | 3 | 4 | 2 | 3 | 4 | 3 | 2 | 3 | 4 | 2 | 2 | 1 | 4 | 5 | 5 | 1 | 1 |
| 4 | 4 | 5 | 2 | 4 | 4 | 2 | 1 | 2 | 2 | 2 | 1 | 1 | 5 | 4 | 5 | 1 | 1 |
| 3 | 4 | 2 | 1 | 4 | 4 | 2 | 2 | 2 | 5 | 3 | 3 | 1 | 2 | 4 | 5 | 1 | 1 |
Note that the following variables were not displayed in this table because they didn’t fit into the webpage: language and grouping.
In order to perform the CFA, we will use the lavaan (latent variable analysis) R package.
First we need to describe the model that we want to fit. We will use the variable names partcip_will and civic_crowd_gov to refer to the wish from the citizens for more participation in debates on public issues and whether a civic crowdfunding platform could be regarded as an intermediary between the citizens initiatives and the local authorities, respectively.
model <- "
partcip_will =~ p2 + p1 + p3 + p4 + p5
civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
"
cat(model)
##
## partcip_will =~ p2 + p1 + p3 + p4 + p5
## civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
Next, we can fit this model using the cfa() function and look at the summary statistics.
fit1 <- cfa(model = model, data = survey_df_c_mat)
summary(fit1, standardized = TRUE)
## lavaan (0.5-23.1097) converged normally after 93 iterations
##
## Number of observations 151
##
## Estimator ML
## Minimum Function Test Statistic 60.181
## Degrees of freedom 43
## P-value (Chi-square) 0.043
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## partcip_will =~
## p2 1.000 0.270 0.247
## p1 0.294 0.284 1.035 0.300 0.080 0.076
## p3 -0.322 0.267 -1.205 0.228 -0.087 -0.090
## p4 4.006 4.250 0.943 0.346 1.084 1.144
## p5 0.180 0.290 0.620 0.535 0.049 0.044
## civic_crowd_gov =~
## g4 1.000 0.106 0.127
## g1 0.408 0.840 0.486 0.627 0.043 0.050
## g2 1.901 1.730 1.099 0.272 0.202 0.179
## g3 -1.867 1.647 -1.134 0.257 -0.198 -0.198
## g5 5.622 4.254 1.322 0.186 0.596 0.635
## g6 6.347 4.874 1.302 0.193 0.673 0.802
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## partcip_will ~~
## civic_crowd_gv 0.008 0.011 0.741 0.459 0.288 0.288
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .p2 1.126 0.149 7.573 0.000 1.126 0.939
## .p1 1.078 0.124 8.688 0.000 1.078 0.994
## .p3 0.921 0.106 8.681 0.000 0.921 0.992
## .p4 -0.276 1.180 -0.234 0.815 -0.276 -0.308
## .p5 1.192 0.137 8.692 0.000 1.192 0.998
## .g4 0.683 0.079 8.635 0.000 0.683 0.984
## .g1 0.760 0.088 8.681 0.000 0.760 0.998
## .g2 1.224 0.143 8.578 0.000 1.224 0.968
## .g3 0.964 0.113 8.552 0.000 0.964 0.961
## .g5 0.526 0.110 4.775 0.000 0.526 0.597
## .g6 0.252 0.120 2.101 0.036 0.252 0.357
## partcip_will 0.073 0.087 0.841 0.400 1.000 1.000
## civic_crowd_gv 0.011 0.017 0.669 0.504 1.000 1.000
Then, we can extract the standardized parameter estimates as follows.
options(scipen = 999, digits = 3)
sfit1 <- standardizedsolution(fit1)
sfit1 %>%
filter(op == "=~") %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| lhs | op | rhs | est.std | se | z | pvalue |
|---|---|---|---|---|---|---|
| partcip_will | =~ | p2 | 0.247 | 0.145 | 1.706 | 0.088 |
| partcip_will | =~ | p1 | 0.076 | 0.081 | 0.938 | 0.348 |
| partcip_will | =~ | p3 | -0.090 | 0.085 | -1.059 | 0.290 |
| partcip_will | =~ | p4 | 1.144 | 0.575 | 1.989 | 0.047 |
| partcip_will | =~ | p5 | 0.044 | 0.075 | 0.597 | 0.551 |
| civic_crowd_gov | =~ | g4 | 0.127 | 0.094 | 1.350 | 0.177 |
| civic_crowd_gov | =~ | g1 | 0.050 | 0.095 | 0.521 | 0.602 |
| civic_crowd_gov | =~ | g2 | 0.179 | 0.093 | 1.919 | 0.055 |
| civic_crowd_gov | =~ | g3 | -0.198 | 0.093 | -2.127 | 0.033 |
| civic_crowd_gov | =~ | g5 | 0.635 | 0.095 | 6.656 | 0.000 |
| civic_crowd_gov | =~ | g6 | 0.802 | 0.107 | 7.504 | 0.000 |
Finally, we can extract the most important fit statistics as follows.
fit_1 <- fitmeasures(fit1, c("npar", "chisq", "df", "cfi", "rmsea", "srmr"))
cbind(read.table(text = names(fit_1)), fit_1) %>%
select(fit_1) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| fit_1 | |
|---|---|
| npar | 23.0000 |
| chisq | 60.1812 |
| df | 43.0000 |
| cfi | 0.8058 |
| rmsea | 0.0514 |
| srmr | 0.0731 |
Modification indices can be requested in order to determine if changes should be made to the model such as adding error covariance terms or removing poor performing items, in order to improve the model. Here, we sorted the output by mi in decreasing order.
modificationindices(fit1) %>%
arrange(desc(mi)) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox |
|---|---|---|---|---|---|---|---|
| p1 | ~~ | p5 | 12.89 | -0.331 | -0.331 | -0.291 | -0.291 |
| p3 | ~~ | g2 | 6.65 | 0.224 | 0.224 | 0.206 | 0.206 |
| g1 | ~~ | g2 | 6.54 | 0.202 | 0.202 | 0.206 | 0.206 |
| p5 | ~~ | g1 | 5.25 | 0.177 | 0.177 | 0.186 | 0.186 |
| p2 | ~~ | g4 | 2.34 | 0.108 | 0.108 | 0.119 | 0.119 |
| p5 | ~~ | g4 | 2.25 | 0.110 | 0.110 | 0.121 | 0.121 |
| p2 | ~~ | p5 | 2.25 | 0.142 | 0.142 | 0.118 | 0.118 |
| g4 | ~~ | g6 | 2.14 | -0.090 | -0.090 | -0.129 | -0.129 |
| p2 | ~~ | g6 | 1.71 | -0.092 | -0.092 | -0.100 | -0.100 |
| p2 | ~~ | p1 | 1.59 | -0.116 | -0.116 | -0.102 | -0.102 |
In the lavaan package, ~~ means covariance terms. Let’s add p1 ~~ p5 and see if the model improves.
model2 <- "
partcip_will =~ p2 + p1 + p3 + p4 + p5
civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
p1 ~~ p5
"
cat(model2)
##
## partcip_will =~ p2 + p1 + p3 + p4 + p5
## civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
## p1 ~~ p5
fit2 <- cfa(model = model2, data = survey_df_c_mat)
fits <- list(fit_1 = fit1, fit_2 = fit2)
round(sapply(fits, function(X)
fitmeasures(X, c("npar", "chisq", "df", "cfi", "rmsea", "srmr"))), 3) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| fit_1 | fit_2 | |
|---|---|---|
| npar | 23.000 | 24.000 |
| chisq | 60.181 | 46.640 |
| df | 43.000 | 42.000 |
| cfi | 0.806 | 0.948 |
| rmsea | 0.051 | 0.027 |
| srmr | 0.073 | 0.063 |
As we can see, the \(\chi^2\) has improved by 12, the comparative fit index (cfi) has increased by 0.12, the root mean square error of approximation (rmsea) has improved by 0.019 and the standardized root mean square residual (srmr) has improved by 0.008.
Instead of qualitatively assessing this improvement, we can perform a significance test to compare the two models.
anova(fit1, fit2)
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit2 42 4572 4644 46.6
## fit1 43 4583 4653 60.2 13.5 1 0.00023 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As evidenced by the p-value that is below 0.05, it seems that adding the covariance term significantly improved the model. We will then consider this latter model for the rest of this analysis.
We can now save the latent variables and add them to our dataset. In the following two tables we show, for the first six observations, the items p1 to p5 along with their corresponding latent variable, as well as the items g1 to g6 along with their own latent variable.
pfit2 <- data.frame(predict(fit2))
survey_df_c_mat2 <- cbind(survey_df_c_mat, pfit2)
survey_df_c <- survey_df_c %>%
filter(gender %in% c("Female", "Male")) %>%
cbind(partcip_will = survey_df_c_mat2$partcip_will,
civic_crowd_gov = survey_df_c_mat2$civic_crowd_gov)
survey_df_c %>%
select(p1:p5, partcip_will) %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| p1 | p2 | p3 | p4 | p5 | partcip_will |
|---|---|---|---|---|---|
| Disagree | Disagree | Disagree | Agree | Agree | 0.136 |
| Neither agree nor disagree | Neither agree nor disagree | Disagree | Disagree | Agree | -0.489 |
| Disagree | Agree | Disagree | Neither agree nor disagree | Strongly agree | -0.169 |
| Neither agree nor disagree | Neither agree nor disagree | Neither agree nor disagree | Neither agree nor disagree | Agree | -0.169 |
| Neither agree nor disagree | Neither agree nor disagree | Disagree | Agree | Strongly disagree | 0.149 |
| Agree | Strongly agree | Agree | Agree | Agree | 0.154 |
survey_df_c %>%
select(g1:g6, civic_crowd_gov) %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| g1 | g2 | g3 | g4 | g5 | g6 | civic_crowd_gov |
|---|---|---|---|---|---|---|
| Disagree | Neither agree nor disagree | Disagree | Agree | Disagree | Disagree | -0.101 |
| Strongly disagree | Agree | Disagree | Agree | Agree | Agree | 0.105 |
| Agree | Agree | Disagree | Strongly agree | Neither agree nor disagree | Neither agree nor disagree | 0.014 |
| Disagree | Neither agree nor disagree | Neither agree nor disagree | Agree | Agree | Agree | 0.107 |
| Agree | Neither agree nor disagree | Disagree | Neither agree nor disagree | Neither agree nor disagree | Agree | 0.088 |
| Disagree | Strongly agree | Disagree | Agree | Agree | Disagree | -0.019 |
Let’s see now if we can better assess the relationship between our independent variables (partcip_will and civic_crowd_gov) and our dependent variable, which is the degree of interest from citizens in citizens initiatives.
Our independent variables are now reduced to a continuous numeric value, thus we can use boxplots to explore the relationships visually.
survey_df_c %>%
ggplot(aes(y = partcip_will, x = interest, fill = interest)) +
geom_boxplot(alpha = .5) +
labs(title = "Relationship btw particip_will and interest",
x = "") +
theme_ipsum_rc() +
scale_fill_ipsum() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()) +
coord_flip() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
survey_df_c %>%
ggplot(aes(y = civic_crowd_gov, x = interest, fill = interest)) +
geom_boxplot(alpha = .5) +
labs(title = "Relationship btw civic_crowd_gov and interest",
x = "") +
theme_ipsum_rc() +
scale_fill_ipsum() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()) +
coord_flip() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
From these two plots, we can qualitatively assess that:
Again, we are not claiming here that these results are correct and we will not include them in our conclusions. We will refine this part of the document further in the next following months.
Our sample frame here is all active members (whom we simply call ‘entrepreneurs’ in this document) of a local citizens initiative, social project or social organization in the Brussels-Capital Region.
The EDA of the control variables for the entrepreneurs was shared with the EDA of the control variables for the citizens. Therefore, please refer back to that section if needed.
Concerning the entrepeneurs (active members of a local citizens initiative, social project or social organization), the objective of this survey is to test two hypotheses that were derived from our qualitative research:
The interest to join a civic crowdfunding platform is thus our dependent variable (this is the outcome that BrusselsTogether would like to better understand) and is measured by the question shown in the table below. Also shown in this table is the related categories in the R code.
x <- list(q = names(lut_e_int_eng), v = levels(survey_df_e$interest)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[4], "Category in R code")
x %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| As an active member of a local citizen’s initiative, social project or social organization, please indicate your level of interest in joining a civic crowdfunding platform: | Category in R code |
|---|---|
| I am not interested in joining a civic crowdfunding platform. | Not_interested |
| I could be interested to join a civic crowdfunding platform. | Could_be_interested |
| I am ready to join a civic crowdfunding platform. | Is_ready |
The answers are distributed as shown in the figure below:
survey_df_e %>%
count(interest) %>%
rename(interest_count = n) %>%
mutate(interest = reorder(interest, interest_count)) %>%
ggplot(aes(x = interest, y = interest_count)) +
geom_col(fill = "#2C7FB8", alpha = 0.8) +
theme_ipsum_rc(grid = "X") +
scale_fill_ipsum() +
coord_flip() +
labs(title = "Distribution of responses to interest question", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(plot.margin = unit(c(1,1,1,0), "cm"))
options(scipen = 999, digits = 2)
survey_df_e %>%
count(interest) %>%
mutate(total = sum(n),
percent = n / total) %>%
select(interest, n, percent) %>%
arrange(desc(n)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| interest | n | percent |
|---|---|---|
| Could_be_interested | 31 | 0.79 |
| Is_ready | 7 | 0.18 |
| Not_interested | 1 | 0.03 |
The independent variables (the appeal of a more participative platform for H1 and the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms for H2) are what we think could be associated with the dependent variable (although we could not make causal inference from an observational study like this one). If the correlations are verified, then we could provide BrusselsTogether with actionnable recommendations, such as making the platform more participative or playing a role of intermediary between citizens initiatives and the local authorities.
For each of the two independent variables, we asked a set of questions which, in our opinion, should reflect their corresponding independent variable. In this case, we could say that the independent variables are latent variables that are each measured by a set of several items.
The table below displays the items along with their related variable name in the R code. The items related to the appeal of a more participative platform are p1 to p5 and the items related to the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms are g1 to g6.
Additionnaly, we asked the respondents whether their interest in contributing to the future of their neighborhood had increased in the past several years (variable name neighb).
All of those items were measured on a 5-points Likert scale, from Strongly_disagree to Strongly_agree.
meta <- data_frame(question = header_lookup$google_f_header[c(5:15, 29)],
variable_name = names(survey_df_e)[c(5:16)])
meta %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| question | variable_name |
|---|---|
| Face-to-face interactions are necessary for collective action. | p1 |
| For most citizens on a civic crowdfunding platform, personnally funding a citizen’s initiative is enough to satisfy their desire to engage in this project. | p2 |
| Citizens would like to participate in debates on local public issues that they care about. | p3 |
| It is relatively easy for citizens to find places where local public issues are collectively debated. | p4 |
| I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially. | p5 |
| Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government. | g1 |
| I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. | g2 |
| The local government is able to identify the projects that are most wanted by citizens. | g3 |
| Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. | g4 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. | g5 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. | g6 |
| My interest in contributing to the future of my neighborhood has increased in the past several years. | neighb |
The answers are distributed as shown in the three plots below:
questions_e <- header_lookup$google_f_header[5:15]
quest_code_e <- names(survey_df_e)[5:15]
survey_df_e_df <- as.data.frame(survey_df_e)
names(survey_df_e_df)[5:15] <- questions_e
my_df_names_e_p <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_e_p <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_e_p <- c("Face-to-face interactions are necessary for collective action.",
"For most citizens on a civic crowdfunding platform, personnally funding a citizen's initiative is enough to satisfy their desire to engage in this project.",
"Citizens would like to participate in debates on local public issues that they care about.",
"It is relatively easy for citizens to find places where local public issues are collectively debated.",
"I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially.")
temp_e_p <- survey_df_e_df[,5:9]
temp_e_p_gathered <- temp_e_p %>%
gather(key = "question", value = "answer")
temp_e_p_gathered$answer <- factor(temp_e_p_gathered$answer, ordered = T,
levels = mylevels_e_p)
t_e_p <- prop.table(table(temp_e_p_gathered$question,
temp_e_p_gathered$answer), 1)
e_p_title <- "\n\nEntrepreneurs - Participative questions\n"
lik_plot(t_e_p, my_df_names_e_p, mylevels_e_p, factor_levels_e_p, e_p_title)
We can observe from this plot that the responses to most questions are polarized. In particular:
my_df_names_e_g <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_e_g <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_e_g <- c("Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government.",
"I regard crowdfunded citizen's initiatives as a sort of_Do-It-Yourself government_.",
"The local government is able to identify the projects that are most wanted by citizens.",
"Civic crowdfunding platforms could play a role of intermediary between citizen's initiatives and the local government.",
"General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by the local government.",
"General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by non-profit organizations subsidized by the local government.")
temp_e_g <- survey_df_e_df[,10:15]
temp_e_g_gathered <- temp_e_g %>%
gather(key = "question", value = "answer")
temp_e_g_gathered$answer <- factor(temp_e_g_gathered$answer, ordered = T,
levels = mylevels_e_g)
t_e_g <- prop.table(table(temp_e_g_gathered$question,
temp_e_g_gathered$answer), 1)
e_g_title <- "Entrepreneurs - Government questions\n"
lik_plot(t_e_g, my_df_names_e_g, mylevels_e_g, factor_levels_e_g, e_g_title)
Once again, we can observe from this plot that the responses to some questions are polarized. In particular:
questions_neighb <- header_lookup$google_f_header[29]
quest_code_neighb <- names(survey_df_e)[16]
names(survey_df_e_df)[16] <- questions_neighb
my_df_names_e_neighb <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels_e_neighb <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels_e_neighb <- c("My interest in contributing to the future of my neighborhood has increased in the past several years.")
temp_e_neighb <- as.data.frame(survey_df_e_df[,16])
names(temp_e_neighb) <- "My interest in contributing to the future of my neighborhood has increased in the past several years."
temp_e_neighb_gathered <- temp_e_neighb %>%
gather(key = "question", value = "answer")
temp_e_neighb_gathered$answer <- factor(temp_e_neighb_gathered$answer,
ordered = T,
levels = mylevels_e_neighb)
t_e_neighb <- prop.table(table(temp_e_neighb_gathered$question,
temp_e_neighb_gathered$answer), 1)
e_neighb_title <- "Entrepreneurs - Interest in future of neighb. question\n"
lik_plot(t_e_neighb, my_df_names_e_neighb, mylevels_e_neighb, factor_levels_e_neighb, e_neighb_title)
The response to this question is polarized once again, as entrepreneur seem to generally agree with the statement: My interest in contributing to the future of my neighborhood has increased in the past several years.
In order to have a rough sense of the relationship between, on the one hand, the answers to the indenpendent variables questions and the control variable questions, and on the other hand, the answers to the dependent variable question, in this section we are going to break down the previous vizualisations by category of answer to the dependent variable question. We will try to confirm those relationships quantitatively later on, in the chapter Logistic Regression Modeling.
Let us then break down the distribution of answers related to the independent variables by interest in joining a civic crowdfunding platform.
As for the analysis of the citizens responses, we should keep in mind that people ‘ready to join’ and ‘not interested’ represent only a fraction of the sample.
Here below is a series of plots related to the ‘participative’ questions.
my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels <- c("Not_interested",
"Could_be_interested",
"Is_ready")
t_p1 <- prop.table(table(survey_df_e$interest, survey_df_e$p1), 1)
p1_title <- "\n\n\n\nFace-to-face interactions are necessary for collective action.\n"
lik_plot(t_p1, my_df_names, mylevels, factor_levels, p1_title)
t_p2 <- prop.table(table(survey_df_e$interest, survey_df_e$p2), 1)
p2_title <- "\n\nFor most citizens on a civic crowdfunding platform,\npersonnally funding a citizen's initiative is enough to\nsatisfy their desire to engage in this project.\n"
lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)
t_p3 <- prop.table(table(survey_df_e$interest, survey_df_e$p3), 1)
p3_title <- "\n\n\nCitizens would like to participate in debates\non local public issues that they care about.\n"
lik_plot(t_p3, my_df_names, mylevels, factor_levels, p3_title)
t_p4 <- prop.table(table(survey_df_e$interest, survey_df_e$p4), 1)
p4_title <- "\n\n\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"
lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)
t_p5 <- prop.table(table(survey_df_e$interest, survey_df_e$p5), 1)
p5_title <- "\n\nI would welcome the possibility for citizens on a crowdfunding platform\nto express their disagreement about some aspects\nof the projects they contribute to financially.\n"
lik_plot(t_p5, my_df_names, mylevels, factor_levels, p5_title)
Here below is a series of plots related to the ‘government’ questions.
my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
factor_levels <- c("Not_interested",
"Could_be_interested",
"Is_ready")
t_g1 <- prop.table(table(survey_df_e$interest, survey_df_e$g1), 1)
g1_title <- "\n\n\nCivic crowdfunding represents a direct threat to the funding of\npublic goods and services by the local government.\n"
lik_plot(t_g1, my_df_names, mylevels, factor_levels, g1_title)
t_g2 <- prop.table(table(survey_df_e$interest, survey_df_e$g2), 1)
g2_title <- "\n\n\nI regard crowdfunded citizen's initiatives\nas a sort of_Do-It-Yourself government_.\n"
lik_plot(t_g2, my_df_names, mylevels, factor_levels, g2_title)
t_g3 <- prop.table(table(survey_df_e$interest, survey_df_e$g3), 1)
g3_title <- "\n\n\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"
lik_plot(t_g3, my_df_names, mylevels, factor_levels, g3_title)
t_g4 <- prop.table(table(survey_df_e$interest, survey_df_e$g4), 1)
g4_title <- "\n\n\nCivic crowdfunding platforms could play a role of intermediary\nbetween citizen's initiatives and the local government.\n"
lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)
t_g5 <- prop.table(table(survey_df_e$interest, survey_df_e$g5), 1)
g5_title <- "\n\n\nGeneral interest issues are addressed more efficiently by\ncrowdfunded citizen's initiatives than by the local government.\n"
lik_plot(t_g5, my_df_names, mylevels, factor_levels, g5_title)
t_g6 <- prop.table(table(survey_df_e$interest, survey_df_e$g6), 1)
g6_title <- "\n\nGeneral interest issues are addressed more efficiently by\ncrowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"
lik_plot(t_g6, my_df_names, mylevels, factor_levels, g6_title)
Here below is the plot related to the ‘interest in the future of my neighborhood’ question.
t_neighb <- prop.table(table(survey_df_e$interest, survey_df_e$neighb), 1)
neighb_title <- "\n\n\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"
lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)
Here below is a plot showing the education versus answers to the independent variables questions, broken down by the independent variables questions.
survey_df_e <- survey_df_e %>%
mutate(edu_num = unclass(survey_df_e$edu))
tidy_survey_df_e <- survey_df_e %>%
select(edu_num, p1:g6, neighb) %>%
data.matrix() %>%
as_data_frame() %>%
gather(key, value, -edu_num)
tidy_survey_df_e %>%
group_by(key, value) %>%
summarize(edu_num = mean(edu_num, na.rm = TRUE)) %>%
ggplot(aes(value, edu_num, color = key)) +
geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
geom_point() +
labs(title = "Relationship btw education and answers to likert scale questions",
x = "Answer to likert scale question", y = "Mean education") +
facet_wrap(~factor(key, ordered = T,
levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
"g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
theme_ipsum_rc() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(legend.position = "none") +
theme(plot.margin = unit(c(1,1,1,0), "cm")) +
theme(
panel.grid.minor.y = element_blank()
)
For this plot, the education is coded as an integer value as shown in the following table (see the column edu_num). This table also shows the count for each level (under the column n).
options(scipen = 999, digits = 3)
survey_df_e %>%
count(edu, edu_num) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| edu | edu_num | n |
|---|---|---|
| Primary_or_None | 1 | 1 |
| High_school | 2 | 2 |
| Bachelor | 4 | 5 |
| Master | 5 | 29 |
| PhD | 6 | 2 |
Here below is a reminder of the questions and their variable names.
meta <- data_frame(question = header_lookup$google_f_header[c(5:15, 29)],
variable_name = names(survey_df_e)[c(5:16)])
meta %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| question | variable_name |
|---|---|
| Face-to-face interactions are necessary for collective action. | p1 |
| For most citizens on a civic crowdfunding platform, personnally funding a citizen’s initiative is enough to satisfy their desire to engage in this project. | p2 |
| Citizens would like to participate in debates on local public issues that they care about. | p3 |
| It is relatively easy for citizens to find places where local public issues are collectively debated. | p4 |
| I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially. | p5 |
| Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government. | g1 |
| I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. | g2 |
| The local government is able to identify the projects that are most wanted by citizens. | g3 |
| Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. | g4 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. | g5 |
| General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. | g6 |
| My interest in contributing to the future of my neighborhood has increased in the past several years. | neighb |
We can observe from this plot that there is a possible positive relationship between education and the extent to which respondents agreed with the statement: The local government is able to identify the projects that are most wanted by citizens, although this should be confirmed by the statistical modeling that we will perform in the next chapter.
Here below is a plot showing the mean age versus answers to the independent variables questions, broken down by the independent variables questions.
survey_df_e <- survey_df_e %>%
mutate(age_num = ifelse(age == "14_and_below", 12,
ifelse(age == "14_17", 16,
ifelse(age == "18_24", 20,
ifelse(age == "25_34", 30,
ifelse(age == "35_44", 40,
ifelse(age == "45_54", 50,
ifelse(age == "55_64", 60,
ifelse(age == "65_74", 70, 80)))))))))
tidy_survey_df_e <- survey_df_e %>%
select(age_num, p1:g6, neighb) %>%
data.matrix() %>%
as_data_frame() %>%
gather(key, value, -age_num)
tidy_survey_df_e %>%
group_by(key, value) %>%
summarize(age_num = mean(age_num, na.rm = TRUE)) %>%
ggplot(aes(value, age_num, color = key)) +
geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
geom_point() +
labs(title = "Relationship btw age and answers to likert scale questions",
x = "Answer to likert scale question", y = "Mean age") +
facet_wrap(~factor(key, ordered = T,
levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
"g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
theme_ipsum_rc() +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(hjust = 0)) +
theme(legend.position = "none") +
theme(plot.margin = unit(c(1,1,1,0), "cm")) +
theme(
panel.grid.minor.y = element_blank()
)
For this plot, the age group is coded as an integer value as shown in the following table (see the column age_num). This table also shows the count for each level (under the column n).
options(scipen = 999, digits = 3)
survey_df_e %>%
count(age, age_num) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = F, position = "left", font_size = 14)
| age | age_num | n |
|---|---|---|
| 18_24 | 20 | 3 |
| 25_34 | 30 | 14 |
| 35_44 | 40 | 6 |
| 45_54 | 50 | 7 |
| 55_64 | 60 | 7 |
| 65_74 | 70 | 1 |
| 75_plus | 80 | 1 |
From this plot we cannot spot any clear trend.
Similar to what we did in Part I for the citizens, we are going to fit a series of multiple binary logistic regression models in order to see which predictors (in our case the independent variables) are associated with the likelihood of entrepreneurs to be interested in joining a civic crowdfunding platforms.
Here again, our dependent variable has more than two classes (three in this case: Not_interested, Could_be_interested and Is_Ready). Therefore, we will fit several models after having split our dependent variable classes into two groups, in several ways.
In this section and also in the next one, we will fit a model with Interest (in joining a civic crowdfunding platform) as the target variable and all of the independent variables, as well as the control variables, as the explanatory variables.
For the first model, we will split the Interest variable into the two following groups:
Not_interested;Could_be_interested and Is_Ready).Also, we filter out observations with gender other than Female and Male. Then, Females are represented by the integer value 1 and Males are represented by the integer value 2.
Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly_agree).
Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly agree).
Answers to the question about the age of the respondent range from 1 (Under_14) to 9 (75_plus).
Answers to the education question range from 1 (Primary_or_None) to 6 (PhD).
Finally, answers to the location and language questions have three levels:
Brussels = 1, Flanders = 2, Wallonia = 3;English = 1, Dutch = 2, French = 3.Thus, the first few rows of the input to the model look like this:
survey_df_e_mat <- survey_df_e %>%
filter(gender %in% c("Female", "Male")) %>%
mutate(interest_binary = ifelse(interest == "Not_interested", 0, 1)) %>%
select(interest_binary, language, p1:edu) %>%
data.matrix() %>%
as_data_frame()
survey_df_e_mat %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = T, font_size = 14)
| interest_binary | language | p1 | p2 | p3 | p4 | p5 | g1 | g2 | g3 | g4 | g5 | g6 | neighb | loc | gender | age | edu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 5 | 3 | 5 | 1 | 3 | 4 | 3 | 1 | 3 | 3 | 1 | 5 | 1 | 2 | 6 | 5 |
| 1 | 3 | 4 | 4 | 4 | 2 | 2 | 2 | 4 | 2 | 4 | 3 | 4 | 4 | 1 | 2 | 5 | 1 |
| 1 | 3 | 4 | 3 | 5 | 2 | 5 | 2 | 2 | 1 | 2 | 3 | 2 | 4 | 2 | 2 | 5 | 4 |
| 1 | 1 | 5 | 5 | 5 | 2 | 3 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 1 | 1 | 5 | 6 |
| 1 | 3 | 4 | 4 | 3 | 4 | 2 | 2 | 2 | 4 | 3 | 2 | 2 | 5 | 3 | 2 | 7 | 5 |
| 1 | 3 | 3 | 4 | 3 | 2 | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 1 | 1 | 4 | 2 |
Let us now examine the output of the model:
glm_e_1 <- glm(interest_binary ~ ., family = "binomial",
data = survey_df_e_mat)
summary(glm_e_1)
##
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_e_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.000008053 0.000000021 0.000000021 0.000002826 0.000007360
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 43.232 3077549.046 0 1
## language 5.733 358960.435 0 1
## p1 3.129 187509.506 0 1
## p2 4.139 212730.988 0 1
## p3 -7.843 143647.712 0 1
## p4 -2.787 119859.452 0 1
## p5 -1.936 237211.576 0 1
## g1 3.348 328149.285 0 1
## g2 -1.828 173352.396 0 1
## g3 -1.600 189327.074 0 1
## g4 7.895 238950.246 0 1
## g5 -0.410 181919.554 0 1
## g6 -9.881 142554.362 0 1
## neighb -3.069 317087.240 0 1
## loc 0.296 458929.396 0 1
## gender 3.281 834734.769 0 1
## age -1.442 279644.120 0 1
## edu 2.479 82403.834 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9.24862260353271 on 37 degrees of freedom
## Residual deviance: 0.00000000037884 on 20 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
In the present case, all p-values are about 1, so this model is useless. In fact, the model did not converge. This is probably due to the high imbalance in the response variable. Indeed, as already mentionned, the class Not_interested represents only 5% of the sample or 1 out of 39 observations.
Therefore, we cannot infer anything regarding factors that are associated with the likelihood of entrepreneurs not to be interested in joining a civic crowdfunding platform as opposed to being somewhat interested (Could_be_interested) or definetly interested (Is_Ready).
In this second model, we are filtering out the observations where entrepeneurs are not interested in joining a civic crowdfunding platform, therefore we are now focusing on a subset of the data. Then we split the observations into the two following groups, related to the interest variable:
Could_be_interested;Is_Ready.The rest of the model is similar to the model 1.
survey_df_e_mat <- survey_df_e %>%
filter(gender %in% c("Female", "Male")) %>%
filter(interest != "Not_interested") %>%
mutate(interest_binary = ifelse(interest == "Is_ready", 1, 0)) %>%
select(interest_binary, language, p1:edu) %>%
data.matrix() %>%
as_data_frame()
glm_e_2 <- glm(interest_binary ~ ., family = "binomial",
data = survey_df_e_mat)
summary(glm_e_2)
##
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_e_mat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.000018263 -0.000003896 -0.000000021 -0.000000021 0.000017733
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -274.70 2119914.75 0 1
## language -50.46 359104.27 0 1
## p1 27.77 262051.60 0 1
## p2 -34.72 184517.74 0 1
## p3 1.86 295410.54 0 1
## p4 40.90 202430.88 0 1
## p5 20.45 122281.08 0 1
## g1 30.20 122849.81 0 1
## g2 -23.19 250528.64 0 1
## g3 -4.76 186271.93 0 1
## g4 24.36 128692.86 0 1
## g5 -9.68 109096.52 0 1
## g6 -4.43 141918.69 0 1
## neighb 20.86 248870.19 0 1
## loc 16.70 489558.79 0 1
## gender 80.43 136489.25 0 1
## age -25.97 61604.18 0 1
## edu 8.59 332359.40 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 35.8933405491689 on 36 degrees of freedom
## Residual deviance: 0.0000000017871 on 19 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
Again, the model did not converge. This is maybe due to the fact that the sample size for the entrepreneurs is too small overall to be able to build a logistic regression model.
It is now time to draw conclusions from this study.
First, we should mention that it is not possible to generalize those conclusions to the population of interest, since the respondents were not sampled at random. Furthermore, the sample is biased, as evidenced by the over-representation of respondents with a masters degree.
Let us remind ourselves the hypotheses that we hoped to confirm with this study.
Based on the observations we have made throughout this document, we can conclude the following:
So, overall, although we are not claiming here that all of our hypotheses are verified, there are multiple indications that a civic crowdfunding platform such as BrusselsTogether would be more attractive to both citizens and entrepreneurs should it strive for being a truly participative platform as well as a communication channel, an intermediary between the initiatives and the local authorities.