Summary

This aim of this document is to communicate an analysis of survey data in the context of a the market research done by students at ULB/Sovay Brussels School on citizen initiatives and, in particular, on civic crowdfunding platforms in the region of Brussels. This research originated from a need by BrusselsTogether to better understand the various factors that could be associated with the success of their own civic crowdfunding platform.

The present document provides all of the associated code written in the R programming language in order for anyone to be able to check and reproduce the findings presented here.

The online survey was launched in April 2018 by the students and was designed based on a qualitative research which was conducted beforehand.

All of the input files, as well as the R code, can also be retrieved on our Github page.

In the survey, the first step was to split the respondents between two categories: citizens and ‘entrepreneurs’ (i.e. active members of a local citizens initiative, social project or social organization). Therefore, in this document we are going to analyze those two groups separately.

Data Preprocessing

Importing libraries

library(readr)
library(lubridate)
library(stringr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(hrbrthemes)
library(knitr)
library(kableExtra)
library(lavaan)
library(reshape2)
library(RColorBrewer)
library(ggthemes)
library(tibble)
library(broom)
library(rebus)

Importing the input files

There are only three input files (Citizen initiative.csv which comes straight from Google Forms, Replacement_of_csv_header.txt and header_lookup.txt).

survey_df <- read_delim("inputs/Citizen initiative.csv",
                   delim = ",", skip = 1,
                   col_names = FALSE)

temp_header <- read_delim("inputs/Replacement_of_csv_header.txt",
                   delim = ",")

colnames(survey_df) <- colnames(temp_header)

header_lookup <- read_delim("inputs/header_lookup.txt", delim = ";")

Preprocessing

The preprocessing code (essentially cleaning and tidying of the data) is available below. If interested, the reader can have a look by clicking on the Code button on the right.

# Dates preprocessing
survey_df$date <- str_replace(string = survey_df$date,
                              pattern = "M GMT.*$",
                              replacement = "M")

survey_df$date <- str_replace_all(string = survey_df$date,
                                  pattern = "/",
                                  replacement = "-")

survey_df$date <- ymd_hms(survey_df$date)

survey_df <- survey_df %>% 
                filter(date > "2018-04-27 20:00:00")

# Language preprocessing
lut_lang <- c("English" = "English",
              "Nederlands" = "Dutch",
              "Français" = "French")

survey_df$lang <- lut_lang[survey_df$lang]

# Grouping preprocessing
lut_group_eng <- c("Citizen with no active implication in a local citizen's initiative, social project or social organization." = "Citizen_not_active",
                   "Active member of a local citizen's initiative, social project or social organization." = "Active_member",
                   "Citizen with casual volonteering for a local citizen's initiative, social project or social organization." = "Citizen_casual_vol")

lut_group_nl <- c("Een burger die geen deel uitmaakt van een lokaal burgerinitiatief, een sociaal project of een sociale organisatie." = "Citizen_not_active",
                  "Een burger die occasioneel bijdraagt als vrijliger van een lokaal burgerinitiatief, een sociaal project of een sociale organisatie." = "Citizen_casual_vol",
                  "Een actief lid van een lokaal burgerinitiatief, van een sociaal project of een sociale organisatie." = "Active_member")

lut_group_fr <- c("Citoyen.ne sans appartenance à une initiative citoyenne locale, un projet social ou une organisation sociale." = "Citizen_not_active",
                  "Citoyen.ne contribuant de manière occasionnelle en tant que volontaire à une initiative citoyenne locale, un projet social ou une organisation sociale." = "Citizen_casual_vol",
                  "Membre actif d'une initiative citoyenne locale, d'un projet social ou d'une organisation sociale." = "Active_member")


survey_df$grouping_eng <- lut_group_eng[survey_df$grouping_eng]
survey_df$grouping_nl <- lut_group_nl[survey_df$grouping_nl]
survey_df$grouping_fr <- lut_group_fr[survey_df$grouping_fr]

# Entrepreuneurs interest preprocessing
lut_e_int_eng <- c("I am not interested in joining a civic crowdfunding platform." = "Not_interested",
                   "I could be interested to join a civic crowdfunding platform." = "Could_be_interested",
                   "I am ready to join a civic crowdfunding platform." = "Is_ready")

survey_df$e_interest_nl <- str_replace(survey_df$e_interest_nl,
                                       pattern = "een " %R% ANY_CHAR %R% "civic crowdfunding" %R% ANY_CHAR %R% " platform",
                                       replacement = "een civic crowdfunding platform")

lut_e_int_nl <- c("Ik ben niet geinteresseerd bij het toetreden van een civic crowdfunding platform." = "Not_interested",
                  "Ik zou geinteresseerd kunnen zijn in het toetreden van een civic crowdfunding platform." = "Could_be_interested",
                  "Ik wil deel uitmaken van een civic crowdfunding platform." = "Is_ready")

lut_e_int_fr <- c("Je ne suis pas intéressé.e par l'adhésion à une plateforme de crowdfunding citoyen." = "Not_interested",
                  "Je pourrais être intéressé.e par l'adhésion à une plateforme de crowdfunding citoyen." = "Could_be_interested",
                  "Je suis prêt.e à rejoindre une plateforme de crowdfunding citoyen." = "Is_ready")

survey_df$e_interest_eng <- lut_e_int_eng[survey_df$e_interest_eng]
survey_df$e_interest_nl <- lut_e_int_nl[survey_df$e_interest_nl]
survey_df$e_interest_fr <- lut_e_int_fr[survey_df$e_interest_fr]

# Likert scale questions preprocessing
lut_likert_nl <- c("Helemaal niet akkoord" = "Strongly disagree",
                   "Niet akkoord" = "Disagree",
                   "Geen mening" = "Neither agree nor disagree",
                   "Akkoord" = "Agree",
                   "Helemaal akkoord" = "Strongly agree",
                   "Helemaal niet akkoord" = "Strongly disagree",
                   "niet akkoord" = "Disagree",
                   "geen mening" = "Neither agree nor disagree",
                   "akkoord" = "Agree",
                   "helemaal akkoord" = "Strongly agree")

lut_likert_fr <- c("Pas du tout d'accord" = "Strongly disagree",
                   "Pas d'accord" = "Disagree",
                   "Sans opinion" = "Neither agree nor disagree",
                   "D'accord" = "Agree",
                   "Tout à fait d'accord" = "Strongly agree")

survey_df[] <- lapply(colnames(survey_df),
                      function(x) {
                        y <- survey_df[[x]]
                        if (str_detect(x, pattern = "^._.[0-9]_nl$")) {
                          y <- lut_likert_nl[y]
                          }
                        return(y)
                        })

survey_df[] <- lapply(colnames(survey_df),
                      function(x) {
                        y <- survey_df[[x]]
                        if (str_detect(x, pattern = "^._.[0-9]_fr$")) {
                          y <- lut_likert_fr[y]
                        }
                        return(y)
                      })

# Citizen interest preprocessing
lut_c_int_eng <- c("I am not interested in citizen's initiatives." = "Not_interested",
                   "I could be interested to fund a citizen's initiative." = "Could_be_interested_Fund",
                   "I could be interested to volunteer for a citizen's initiative." = "Could_be_interested_Vol",
                   "I am ready to fund a citizen's initiative." = "Is_ready_Fund",
                   "I am ready to volunteer for a citizen's initiative." = "Is_ready_Vol")

lut_c_int_nl <- c("Ik ben niet geinteresseerd door burgerinitiatieven" = "Not_interested",
                  "Ik zou eventueel geinteresseerd kunnen zijn door het financieren van burgerinitiatieven" = "Could_be_interested_Fund",
                  "Ik zou eventueel geinteresseerd kunnen zijn door het financieel te ondersteunen van burgerinitiatieven" = "Could_be_interested_Fund",
                  "Ik zou eventueel geinteresseerd zijn om een vrijwillige bijdrage te leveren aan een burgerinitiatief" = "Could_be_interested_Vol",
                  "Ik ben klaar om burgerinitiatieven te financieren" = "Is_ready_Fund",
                  "Ik ben klaar om vrijwilig mee te draaien in een burgerinitiatief" = "Is_ready_Vol")

lut_c_int_fr <- c("Je ne suis pas intéressé.e par les initiatives citoyennes" = "Not_interested",
                  "Je pourrais être éventuellement intéressé.e par le financement d'une initiative citoyenne" = "Could_be_interested_Fund",
                  "Je pourrais être éventuellement intéressé.e par une contribution en tant que volontaire dans une initiative citoyenne" = "Could_be_interested_Vol",
                  "Je suis prêt.e à financer une initiative citoyenne" = "Is_ready_Fund",
                  "Je suis prêt.e à contribuer en tant que volontaire dans une initiative citoyenne" = "Is_ready_Vol")

survey_df$c_interest_eng <- lut_c_int_eng[survey_df$c_interest_eng]
survey_df$c_interest_nl <- lut_c_int_nl[survey_df$c_interest_nl]
survey_df$c_interest_fr <- lut_c_int_fr[survey_df$c_interest_fr]

# Citizen already investing question preprocessing
lut_c_invest_eng <- c("0\u20AC" = "0",
                      "1 - 99\u20AC /year" = "1_99",
                      "100 - 199\u20AC /year" = "100_199",
                      "200 - 499\u20AC /year" = "200_499",
                      "500\u20AC + /year" = "500_plus")

lut_c_invest_nl <- c("0€" = "0",
                     "1 - 99€ /jaar" = "1_99",
                     "100 - 199€ /jaar" = "100_199",
                     "200 - 499€ /jaar" = "200_499",
                     "500€ + /jaar" = "500_plus")

lut_c_invest_fr <- c("0€" = "0",
                     "1 - 99€ /an" = "1_99",
                     "100 - 199€ /an" = "100_199",
                     "200 - 499€ /an" = "200_499",
                     "500€ + /an" = "500_plus")

# survey_df$c_invest_eng <- lut_c_invest_eng[survey_df$c_invest_eng]
# survey_df$c_invest_nl <- lut_c_invest_nl[survey_df$c_invest_nl]
# survey_df$c_invest_fr <- lut_c_invest_fr[survey_df$c_invest_fr]

replace_amount <- function(x, a, b) {
  mask <- str_detect(x, pattern = paste0("^", a))
  x[which(mask)] <- b
  return(x)
}

survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
                                         a = "0",
                                         b = "0")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
                                         a = "1 ",
                                         b = "1_99")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
                                         a = "10",
                                         b = "100_199")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
                                         a = "2",
                                         b = "200_499")
survey_df$c_invest_eng <- replace_amount(survey_df$c_invest_eng,
                                         a = "5",
                                         b = "500_plus")


survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
                                         a = "0",
                                         b = "0")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
                                         a = "1 ",
                                         b = "1_99")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
                                         a = "10",
                                         b = "100_199")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
                                         a = "2",
                                         b = "200_499")
survey_df$c_invest_nl <- replace_amount(survey_df$c_invest_nl,
                                         a = "5",
                                         b = "500_plus")


survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
                                         a = "0",
                                         b = "0")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
                                         a = "1 ",
                                         b = "1_99")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
                                         a = "10",
                                         b = "100_199")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
                                         a = "2",
                                         b = "200_499")
survey_df$c_invest_fr <- replace_amount(survey_df$c_invest_fr,
                                         a = "5",
                                         b = "500_plus")



# Contrib neighb question preprocessing
survey_df$ec_neighb_nl <- lut_likert_nl[survey_df$ec_neighb_nl]
survey_df$ec_neighb_fr <- lut_likert_fr[survey_df$ec_neighb_fr]

# Location question preprocessing
lut_loc_eng <- c("Brussels" = "Brussels",
                 "Outside Belgium " = "Brussels",
                 "Flanders" = "Flanders",
                 "Wallonia" = "Wallonia")


lut_loc_nl <- c("Brussel" = "Brussels",
                "Brussels" = "Brussels",
                "Vlanderen" = "Flanders",
                "Vlaanderen" = "Flanders",
                "Wallonië" = "Wallonia")

lut_loc_fr <- c("Bruxelles-Capitale" = "Brussels",
                "cameroun/ngaoundéré" = "Brussels",
                "Flandre" = "Flanders",
                "Wallonie" = "Wallonia")

survey_df$ec_loc_eng <- lut_loc_eng[survey_df$ec_loc_eng]
survey_df$ec_loc_nl <- lut_loc_nl[survey_df$ec_loc_nl]
survey_df$ec_loc_fr <- lut_loc_fr[survey_df$ec_loc_fr]

# Gender question preprocessing
lut_gen_eng <- c("Female" = "Female",
                 "Male" = "Male",
                 "Other/I prefer keep it for myself" = "Other")

lut_gen_nl <- c("een vrow" = "Female",
                "een man" = "Male",
                "anders/Ik hou het liever voor mezelf." = "Other")

lut_gen_fr <- c("Une femme" = "Female",
                "Un homme" = "Male",
                "Autre/Je préfère le garder pour moi" = "Other")

survey_df$ec_gender_eng <- lut_gen_eng[survey_df$ec_gender_eng]
survey_df$ec_gender_nl <- lut_gen_nl[survey_df$ec_gender_nl]
survey_df$ec_gender_fr <- lut_gen_fr[survey_df$ec_gender_fr]

# Age question preprocessing
lut_age_eng <- c("Under 14 years old." = "14_and_below",
                 "14-17 years old." = "14_17",
                 "18-24 years old." = "18_24",
                 "25-34 years old." = "25_34",
                 "35-44 years old." = "35_44",
                 "45-54 years old." = "45_54",
                 "55-64 years old." = "55_64",
                 "65-74 years old." = "65_74",
                 "75 years or older." = "75_plus")

lut_age_nl <- c("onder 14 jaar" = "14_and_below",
                "14-17 jaar" = "14_17",
                "18-24jaar" = "18_24",
                "25-34jaar" = "25_34",
                "35-44jaar" = "35_44",
                "45-54jaar" = "45_54",
                "55-64jaar" = "55_64",
                "65-74jaar" = "65_74",
                "75 jaar of ouder" = "75_plus")

lut_age_fr <- c("En dessous de 14 ans." = "14_and_below",
                "14-17 ans." = "14_17",
                "18-24 ans." = "18_24",
                "25-34 ans." = "25_34",
                "35-44 ans." = "35_44",
                "45-54 ans." = "45_54",
                "55-64 ans." = "55_64",
                "65-74 ans." = "65_74",
                "75 ans ou plus." = "75_plus")

survey_df$ec_age_eng <- lut_age_eng[survey_df$ec_age_eng]
survey_df$ec_age_nl <- lut_age_nl[survey_df$ec_age_nl]
survey_df$ec_age_fr <- lut_age_fr[survey_df$ec_age_fr]

# Education question preprocessing
survey_df$ec_edu_eng <- str_replace(survey_df$ec_edu_eng,
                                    pattern = "Bachelor" %R% ANY_CHAR %R% "s",
                                    replacement = "Bachelor")

survey_df$ec_edu_eng <- str_replace(survey_df$ec_edu_eng,
                                    pattern = "Master" %R% ANY_CHAR %R% "s",
                                    replacement = "Masters")

lut_edu_eng <- c("Primary school or no schooling completed" = "Primary_or_None",
                 "High-School" = "High_school",
                 "Trade/technical/vocational training." = "Technical",
                 "Bachelor degree." = "Bachelor",
                 "Masters degree." = "Master",
                 "Doctorate degree." = "PhD")

lut_edu_nl <- c("Lagere school of zonder diploma" = "Primary_or_None",
                "Humaniora" = "High_school",
                "Technische Humaniora" = "Technical",
                "Bachelor" = "Bachelor",
                "Master" = "Master",
                "Doctoraat" = "PhD")

lut_edu_fr <- c("Primaire ou sans diplôme" = "Primary_or_None",
                "Diplôme secondaire" = "High_school",
                "Diplôme secondaire + des formations qualifiantes" = "High_school",
                "Diplôme secondaire technique" = "Technical",
                "Bachelier" = "Bachelor",
                "Master" = "Master",
                "Double master même ;-)" = "Master",
                "Doctorat" = "PhD",
                "Post universirtaire" = "PhD")

survey_df$ec_edu_eng <- lut_edu_eng[survey_df$ec_edu_eng]
survey_df$ec_edu_nl <- lut_edu_nl[survey_df$ec_edu_nl]
survey_df$ec_edu_fr <- lut_edu_fr[survey_df$ec_edu_fr]



# Merging columns from the different languages
all_cols <- names(survey_df)
eng_cols <- all_cols[str_detect(all_cols, pattern = "eng$")]
nl_cols <- all_cols[str_detect(all_cols, pattern = "nl$")]
fr_cols <- all_cols[str_detect(all_cols, pattern = "fr$")]

merged_cols <- str_replace(eng_cols,
                           pattern = "_eng$",
                           replacement = "")

temp_df <- tbl_df(data.frame(matrix(nrow = nrow(survey_df),
                             ncol = length(merged_cols))))
colnames(temp_df) <- merged_cols

survey_df <- tbl_df(cbind(survey_df, temp_df))

survey_df[is.na(survey_df)] <- ""

for (i in 1:length(merged_cols)) {
  m <- merged_cols[i]
  e <- eng_cols[i]
  n <- nl_cols[i]
  f <- fr_cols[i]
  survey_df[[m]] <- paste0(survey_df[[e]], survey_df[[n]], survey_df[[f]])
}

survey_df$lang <- factor(survey_df$lang,
                         levels = c("English", "Dutch", "French"))
survey_df$ec_gender <- factor(survey_df$ec_gender,
                              levels = c("Female", "Male", "Other"))
survey_df$grouping <- factor(survey_df$grouping,
                         levels = c("Active_member", "Citizen_casual_vol", "Citizen_not_active"))


# Split survey_df_e/survey_df_c

survey_df_e <- survey_df %>%
  filter(grouping == "Active_member") %>% 
  select(date, lang, grouping:e_g6, ec_neighb:ec_edu)

temp_coln <- str_replace(names(survey_df_e), "^e_", "")
temp_coln <- str_replace(temp_coln, "^ec_", "")
temp_coln <- str_replace(temp_coln, "lang", "language")
names(survey_df_e) <- temp_coln

survey_df_e$age <- factor(survey_df_e$age, ordered = TRUE,
                              levels = c("14_and_below",
                                         "14_17",
                                         "18_24",
                                         "25_34",
                                         "35_44",
                                         "45_54",
                                         "55_64",
                                         "65_74",
                                         "75_plus"))

survey_df_e$loc <- factor(survey_df_e$loc, ordered = TRUE,
                          levels = c("Brussels",
                                     "Flanders",
                                     "Wallonia"))

survey_df_e$edu <- factor(survey_df_e$edu, ordered = TRUE,
                          levels = c("Primary_or_None",
                                     "High_school",
                                     "Technical",
                                     "Bachelor",
                                     "Master",
                                     "PhD"))

survey_df_e$interest <- factor(survey_df_e$interest, ordered = TRUE,
                               levels = c("Not_interested",
                                          "Could_be_interested",
                                          "Is_ready"))

survey_df_c <- survey_df %>%
  filter(grouping != "Active_member") %>% 
  select(date, lang, grouping, c_interest:ec_edu)

temp_coln <- str_replace(names(survey_df_c), "^c_", "")
temp_coln <- str_replace(temp_coln, "^ec_", "")
temp_coln <- str_replace(temp_coln, "lang", "language")
names(survey_df_c) <- temp_coln

survey_df_c$age <- factor(survey_df_c$age, ordered = TRUE,
                          levels = c("14_and_below",
                                     "14_17",
                                     "18_24",
                                     "25_34",
                                     "35_44",
                                     "45_54",
                                     "55_64",
                                     "65_74",
                                     "75_plus"))

survey_df_c$loc <- factor(survey_df_c$loc, ordered = TRUE,
                          levels = c("Brussels",
                                     "Flanders",
                                     "Wallonia"))

survey_df_c$edu <- factor(survey_df_c$edu, ordered = TRUE,
                          levels = c("Primary_or_None",
                                     "High_school",
                                     "Technical",
                                     "Bachelor",
                                     "Master",
                                     "PhD"))

survey_df_c$interest <- factor(survey_df_c$interest, ordered = TRUE,
                                  levels = c("Not_interested",
                                             "Could_be_interested_Vol",
                                             "Could_be_interested_Fund",
                                             "Is_ready_Vol",
                                             "Is_ready_Fund"))


survey_df_c$grouping <- factor(survey_df_c$grouping, ordered = TRUE,
                               levels = c("Citizen_not_active",
                                          "Citizen_casual_vol"))

convert_to_fact_lik <- function(df, columns) {
  for (column in columns) {
    df[[column]] <- factor(df[[column]],
                           levels = c("Strongly disagree",
                                      "Disagree",
                                      "Neither agree nor disagree",
                                      "Agree",
                                      "Strongly agree"))
  }
  return(df)
}

e_likert_cols <- names(survey_df_e)[5:(length(survey_df_e) - 5)]
c_likert_cols <- names(survey_df_c)[5:(length(survey_df_c) - 6)]

survey_df_e <- convert_to_fact_lik(survey_df_e, e_likert_cols)
survey_df_c <- convert_to_fact_lik(survey_df_c, c_likert_cols)

survey_df_e <- convert_to_fact_lik(survey_df_e, "neighb")
survey_df_c <- convert_to_fact_lik(survey_df_c, "neighb")

survey_df_c$invest <- factor(survey_df_c$invest, ordered = TRUE,
                             levels = c("", "0", "1_99", "100_199",
                                        "200_499", "500_plus"))

survey_df_e_j <- survey_df_e %>% select(date,
                                        language,
                                        grouping,
                                        loc,
                                        gender,
                                        age,
                                        edu) %>% mutate(resp_type = "Entrepreneur")

survey_df_c_j <- survey_df_c %>% select(date,
                                        language,
                                        grouping,
                                        loc,
                                        gender,
                                        age,
                                        edu) %>% mutate(resp_type = "Citizen")

survey_df_j <- survey_df_e_j %>% bind_rows(survey_df_c_j)

survey_df_j$resp_type <- factor(survey_df_j$resp_type,
                                levels = c("Entrepreneur", "Citizen"))

Part I - Citizens

Exploratory Data Analysis

Now that we have imported, cleaned and tidied the data, we can start doing some exploratory analysis. In this document, we will often refer to Exploratory Data Analysis as the acronym EDA.

Note that the respondents were given the possibility to fill the questionnaire in either English, Dutch or French.

The survey was aimed at both social entrepreneurs and citizens of Brussels but each group is associated with a distinct set of hypotheses that we would like to investigate. Therefore, the first question in the survey was discriminating so that the rest of the questionnaire was split into two branches.

The discriminating question is shown in the table below along with the possible choices and the related categories in the R code.

x <- list(q = names(lut_group_eng)[c(2, 3, 1)], v = levels(survey_df$grouping)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[3], "Category in R code")
x %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
Which of the following groups do you best identify yourself with? Category in R code
Active member of a local citizen’s initiative, social project or social organization. Active_member
Citizen with casual volonteering for a local citizen’s initiative, social project or social organization. Citizen_casual_vol
Citizen with no active implication in a local citizen’s initiative, social project or social organization. Citizen_not_active

Let’s have a look at the balance between each group:

survey_df %>%
  count(grouping) %>%
  rename(grouping_count = n) %>%
  mutate(grouping = reorder(grouping, grouping_count)) %>%
  ggplot(aes(x = grouping, y = grouping_count)) +
  geom_col(fill = "#1C9099", alpha = 0.8) +
  theme_ipsum_rc(grid = "X") +
  scale_fill_ipsum() +
  coord_flip() +
  labs(title = "Distribution of responses to discriminating question", x = "") +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

options(scipen = 999, digits = 2)

survey_df %>% 
  count(grouping) %>% 
  mutate(total = sum(n),
         percent = n / total) %>% 
  select(grouping, n, percent) %>% 
  arrange(desc(n)) %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
grouping n percent
Citizen_not_active 92 0.48
Citizen_casual_vol 60 0.31
Active_member 39 0.20

Then we split the respondents into two group: first the citizens, including those with no active implication in a local citizen’s initiative as well as those with casual volonteering, and second, the entrepreneurs (the active members of a local citizen’s initiative).

Control variables

In the survey, we asked several questions that can be referred to as control variables, such as gender, education etc.

The next few plots will decribe our sample in terms of those variables.

First, we can create a bar plot showing the count of respondents, broken down into citizens/entrepreneurs, gender and language:

my_cols <- brewer.pal(3, "Paired")

survey_df_j %>% 
  ggplot(aes(x = gender, fill = language)) +
  geom_bar(alpha = 0.8) +
  facet_wrap(~ resp_type) +
  labs(title = "Composition of respondents") +
  theme_ipsum_rc() +
  scale_fill_manual(values = my_cols) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

The corresponding table with the values is given here:

addmargins(table(survey_df_e$language, survey_df_e$gender) %>%
             cbind(table(survey_df_c$language, survey_df_c$gender))) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14) %>%
  column_spec(8, bold = T) %>% 
  row_spec(4, bold = T) %>%
  add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
Entrepreneur
Citizen
Female Male Other Female Male Other Sum
English 2 1 0 5 12 0 20
Dutch 0 2 0 6 5 0 13
French 11 22 1 68 55 1 158
Sum 13 25 1 79 72 1 191

As we can see, the sample size for the entrepreneurs is 39 observations, and the sample size for the citizens is 152 observations.

Let’s now break the count of respondents by location and education.

my_cols <- brewer.pal(9, "BuPu")

survey_df_j %>% 
  ggplot(aes(x = loc, fill = edu)) +
  geom_bar(alpha = 0.8) +
  facet_wrap(~ resp_type) +
  labs(title = "Composition of respondents") +
  theme_ipsum_rc() +
  scale_fill_manual(values = my_cols[2:7]) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

The corresponding table with the values is given here:

addmargins(table(survey_df_e$edu, survey_df_e$loc) %>%
             cbind(table(survey_df_c$edu, survey_df_c$loc))) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14) %>%
  column_spec(8, bold = T) %>% 
  row_spec(7, bold = T) %>%
  add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
Entrepreneur
Citizen
Brussels Flanders Wallonia Brussels Flanders Wallonia Sum
Primary_or_None 1 0 0 0 0 1 2
High_school 2 0 0 7 0 0 9
Technical 0 0 0 2 1 0 3
Bachelor 4 1 0 8 9 3 25
Master 22 4 3 82 15 19 145
PhD 2 0 0 4 0 1 7
Sum 31 5 3 103 25 24 191

We can observe that there is a disproportionate share of respondents with a masters degree among both entrepreneurs and citizens. Therefore, the samples are very likely to be biased.

Finally, we can break the count of respondents by location and age:

my_cols <- brewer.pal(9, "YlGnBu")

survey_df_j %>% 
  ggplot(aes(x = loc, fill = age)) +
  geom_bar(alpha = 0.8) +
  facet_wrap(~ resp_type) +
  labs(title = "Composition of respondents") +
  theme_ipsum_rc() +
  scale_fill_manual(values = my_cols[3:9]) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

The corresponding table with the values is given here:

addmargins(table(survey_df_e$age, survey_df_e$loc) %>%
             cbind(table(survey_df_c$age, survey_df_c$loc))) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14) %>%
  column_spec(8, bold = T) %>% 
  row_spec(10, bold = T) %>%
  add_header_above(c(" ", "Entrepreneur" = 3, "Citizen" = 3, " " = 1))
Entrepreneur
Citizen
Brussels Flanders Wallonia Brussels Flanders Wallonia Sum
14_and_below 0 0 0 0 0 0 0
14_17 0 0 0 0 0 0 0
18_24 3 0 0 11 1 3 18
25_34 13 0 1 57 10 15 96
35_44 4 1 1 11 1 2 20
45_54 6 1 0 17 7 2 33
55_64 3 3 1 5 5 2 19
65_74 1 0 0 1 1 0 3
75_plus 1 0 0 1 0 0 2
Sum 31 5 3 103 25 24 191

Dependent and independent variables

Our population of interest in this chapter on citizens is the whole population in the Brussels-Capital Region (therefore including all its 19 municipalities).

Concerning the citizens, the objective of this survey is to test two hypotheses that were derived from our qualitative research:

  1. H3: The wish for more participation in debates on public issues is positively correlated with the interest to support citizens initiatives.
  2. H4: A positive opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms is positively correlated with the interest to support citizens initiatives.

The interest to support citizens initiatives is our dependent variable (this is the outcome that BrusselsTogether would like to better understand) and is measured by the question shown in the table below. Also shown in this table is the related categories in the R code.

x <- list(q = names(lut_c_int_eng)[c(1, 3, 2, 5, 4)], v = levels(survey_df_c$interest)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[16], "Category in R code")
x %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
Please indicate your level of interest for citizen’s initiatives: Category in R code
I am not interested in citizen’s initiatives. Not_interested
I could be interested to volunteer for a citizen’s initiative. Could_be_interested_Vol
I could be interested to fund a citizen’s initiative. Could_be_interested_Fund
I am ready to volunteer for a citizen’s initiative. Is_ready_Vol
I am ready to fund a citizen’s initiative. Is_ready_Fund

The answers are distributed as shown in the figure below:

survey_df_c %>%
  count(interest) %>%
  rename(interest_count = n) %>%
  mutate(interest = reorder(interest, interest_count)) %>%
  ggplot(aes(x = interest, y = interest_count)) +
  geom_col(fill = "#2C7FB8", alpha = 0.8) +
  theme_ipsum_rc(grid = "X") +
  scale_fill_ipsum() +
  coord_flip() +
  labs(title = "Distribution of responses to interest question", x = "") +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

options(scipen = 999, digits = 2)

survey_df_c %>% 
  count(interest) %>% 
  mutate(total = sum(n),
         percent = n / total) %>% 
  select(interest, n, percent) %>% 
  arrange(desc(n)) %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
interest n percent
Could_be_interested_Vol 48 0.32
Could_be_interested_Fund 46 0.30
Is_ready_Vol 38 0.25
Is_ready_Fund 13 0.09
Not_interested 7 0.05

The independent variables (the wish for more participation in debates on public issues for H3 and the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms for H4) are what we think could be associated with the dependent variable (although we could not make causal inference from an observational study like this one). If the correlations are verified, then we could provide BrusselsTogether with actionnable recommendations, such as making the platform more participative or playing a role of intermediary between citizens initiatives and the local authorities.

For each of the two independent variables, we asked a set of questions which, in our opinion, should reflect their corresponding independent variable. In this case, we could say that the independent variables are latent variables that are each measured by a set of several items. We will dig more into the idea of latent variables in the chapter Confirmatory Factor Analysis.

The table below displays the items and their related variable name in the R code. The items related to the need for more participation in debates on public issues are p1 to p5 and the items related to opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms are g1 to g6.

Additionnaly, we asked the respondents whether their interest in contributing to the future of their neighborhood had increased in the past several years (variable name neighb).

All of those items were measured on a 5-points Likert scale, from Strongly_disagree to Strongly_agree.

meta <- data_frame(question = header_lookup$google_f_header[c(17:27, 29)],
                   variable_name = names(survey_df_c)[c(5:15, 17)])

meta %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
question variable_name
Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. p1
I would like to participate in debates on local public issues. p2
It is relatively easy for citizens to find places where local public issues are collectively debated. p3
I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. p4
Face-to-face interactions are necessary for collective action. p5
Civic crowdfunding represents a direct threat to public funding of services. g1
I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. g2
The local government is able to identify the projects that are most wanted by citizens. g3
Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. g4
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. g5
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. g6
My interest in contributing to the future of my neighborhood has increased in the past several years. neighb

The answers are distributed as shown in the three plots below:

lik_plot <- function(t, my_df_names, mylevels, factor_levels, p_title) {
  tab <- rownames_to_column(data.frame(unclass(t)), "rownames_col")
  names(tab) <- my_df_names
  tab$interest <- factor(tab$interest, ordered = T, levels = factor_levels)
  
  numlevels <- length(tab[1,]) - 1
  numcenter <- ceiling(numlevels / 2) + 1
  tab$midvalues <- tab[,numcenter] / 2
  tab2 <- cbind(tab[,1],
                tab[,2:ceiling(numlevels / 2)],
                tab$midvalues,
                tab$midvalues,
                tab[,numcenter:numlevels + 1])
  
  colnames(tab2) <- c("outcome",
                      mylevels[1:floor(numlevels / 2)],
                      "midlow",
                      "midhigh",
                      mylevels[numcenter:numlevels])
  
  numlevels <- length(mylevels) + 1
  point1 <- 2
  point2 <- ((numlevels) / 2) + 1
  point3 <- point2 + 1
  point4 <- numlevels + 1
  mymin <- (ceiling(max(rowSums(tab2[,point1:point2])) * 4) / 4) * -100
  mymax <- (ceiling(max(rowSums(tab2[,point3:point4])) * 4) / 4) * 100
  
  numlevels <- length(tab[1,]) - 1
  temp.rows <- length(tab2[,1])
  pal <- brewer.pal((numlevels - 1),"BrBG")
  pal[ceiling(numlevels / 2)] <- "#DFDFDF"
  legend.pal <- pal
  pal <- c(pal[1:(ceiling(numlevels / 2) - 1)],
           pal[ceiling(numlevels / 2)],
           pal[ceiling(numlevels / 2)],
           pal[(ceiling(numlevels / 2) + 1):(numlevels - 1)])
  
  tab3 <- melt(tab2, id = "outcome")
  tab3$col <- rep(pal, each = temp.rows)
  tab3$value <- tab3$value * 100
  tab3$outcome <- str_wrap(tab3$outcome, width = 70)
  tab3$outcome <- factor(tab3$outcome, ordered = T,
                         levels = str_wrap(factor_levels, width = 70))
  highs <- na.omit(tab3[(length(tab3[,1]) / 2) + 1:length(tab3[,1]),])
  lows <- na.omit(tab3[1:(length(tab3[,1]) / 2),])
  lows <- lows[rev(rownames(lows)),]
  
  highs$col <- factor(highs$col, ordered = T, levels = c("#018571",
                                                         "#80CDC1",
                                                         "#DFDFDF"))
  lows$col <- factor(lows$col, ordered = T, levels = c("#A6611A",
                                                       "#DFC27D",
                                                       "#DFDFDF"))
  
  highs$outcome <- factor(highs$outcome, ordered = T,
                          levels = rev(levels(highs$outcome)))
  lows$outcome <- factor(lows$outcome, ordered = T,
                         levels = rev(levels(lows$outcome)))
  # lows$outcome
  # highs$outcome <- factor(highs$outcome, ordered = T,
  #                        levels = tab2$outcome[order(-(tab2[,5] + tab2[,6] + tab2[,7]))])
  
  ggplot() +
    geom_bar(data = highs, alpha = 0.8, aes(x = outcome, y = value, fill = col), position = "stack", stat = "identity") +
    geom_bar(data = lows, alpha = 0.8, aes(x = outcome, y = -value, fill = col), position = "stack", stat = "identity") +
    geom_hline(yintercept = 0, color = c("white")) +
    scale_fill_identity("Percent", labels = mylevels, breaks = legend.pal, guide = "legend") + 
    theme_fivethirtyeight() +
    coord_flip() +
    theme_ipsum_rc() +
    labs(title = p_title, y = "",x = "") +
    theme(plot.title = element_text(size = 14, hjust = 0.5)) +
    theme(axis.text.y = element_text(hjust = 0)) +
    theme(legend.position = "bottom") +
    scale_y_continuous(breaks = seq(-100,100,25), limits = c(-100,100), labels = function(x) paste0(x, "%")) +
    theme(plot.margin = unit(c(1,1,1,0), "cm")) +
    theme(
      # panel.grid.major.x = element_blank(),
      # panel.grid.minor.x = element_blank(),
      panel.grid.major.y = element_blank(),
      panel.grid.minor.y = element_blank()
      )
}
questions_c <- header_lookup$google_f_header[17:27]
quest_code_c <- names(survey_df_c)[5:15]
survey_df_c_df <- as.data.frame(survey_df_c)
names(survey_df_c_df)[5:15] <- questions_c



my_df_names_c_p <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                     "Agree", "Strongly agree")
mylevels_c_p <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                  "Agree", "Strongly agree")
factor_levels_c_p <- c("Personally funding a citizen's initiative is enough to satisfy my desire to engage in this project.",
                       "I would like to participate in debates on local public issues.",
                       "It is relatively easy for citizens to find places where local public issues are collectively debated.",
                       "I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project.",
                       "Face-to-face interactions are necessary for collective action.")

temp_c_p <- survey_df_c_df[,5:9]

temp_c_p_gathered <- temp_c_p %>% 
  gather(key = "question", value = "answer")

temp_c_p_gathered$answer <- factor(temp_c_p_gathered$answer, ordered = T,
                                   levels = mylevels_c_p)

t_c_p <- prop.table(table(temp_c_p_gathered$question,
                          temp_c_p_gathered$answer), 1)
c_p_title <- "\n\nCitizens - Participative questions\n"

lik_plot(t_c_p, my_df_names_c_p, mylevels_c_p, factor_levels_c_p, c_p_title)

We can observe from this plot that the responses to some of the questions are polarized. Overall, it seems that most of the respondents disagree with the statement: It is relatively easy for citizens to find places where local public issues are collectively debated.

my_df_names_c_g <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels_c_g <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels_c_g <- c("Civic crowdfunding represents a direct threat to public funding of services.",
                       "I regard crowdfunded citizen's initiatives as a sort of _Do-It-Yourself government_.",
                       "The local government is able to identify the projects that are most wanted by citizens.",
                       "Civic crowdfunding platforms could play a role of intermediary between citizen's initiatives and the local government.",
                       "General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by the local government.",
                       "General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by non-profit organizations subsidized by the local government.")

temp_c_g <- survey_df_c_df[,10:15]

temp_c_g_gathered <- temp_c_g %>% 
  gather(key = "question", value = "answer")

temp_c_g_gathered$answer <- factor(temp_c_g_gathered$answer, ordered = T,
                                   levels = mylevels_c_g)

t_c_g <- prop.table(table(temp_c_g_gathered$question,
                          temp_c_g_gathered$answer), 1)
c_g_title <- "Citizens - Government questions\n"

lik_plot(t_c_g, my_df_names_c_g, mylevels_c_g, factor_levels_c_g, c_g_title)

Here again, we can observe from this plot that the responses to some of the questions are quite polarized:

  • Overall, it seems that most of the respondents disagree with the statement: Civic crowdfunding represents a direct threat to public funding of services;
  • Quite interestingly for the purposes of our research, it seems that most of the respondents agree with the statement: Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government.
questions_neighb <- header_lookup$google_f_header[29]
quest_code_neighb <- names(survey_df_c)[17]
names(survey_df_c_df)[17] <- questions_neighb

my_df_names_c_neighb <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                          "Agree", "Strongly agree")
mylevels_c_neighb <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                       "Agree", "Strongly agree")
factor_levels_c_neighb <- c("My interest in contributing to the future of my neighborhood has increased in the past several years.")

temp_c_neighb <- as.data.frame(survey_df_c_df[,17])
names(temp_c_neighb) <- "My interest in contributing to the future of my neighborhood has increased in the past several years."

temp_c_neighb_gathered <- temp_c_neighb %>% 
  gather(key = "question", value = "answer")

temp_c_neighb_gathered$answer <- factor(temp_c_neighb_gathered$answer,
                                        ordered = T,
                                        levels = mylevels_c_neighb)

t_c_neighb <- prop.table(table(temp_c_neighb_gathered$question,
                          temp_c_neighb_gathered$answer), 1)
c_neighb_title <- "Citizens - Interest in future of neighb. question\n"

lik_plot(t_c_neighb, my_df_names_c_neighb, mylevels_c_neighb, factor_levels_c_neighb, c_neighb_title)

Finally, we asked how much the respondents were already investing in citizens initiatives. This could be regarded as a secondary dependent variable capturing the interest in citizens initiatives.

Note that this feature was added a bit later after having launched the online survey, so the first few observations do not contain information for this question.

This question, along with its corresponding choices and categories in the R code, is shown in the table below.

x <- list(q = names(lut_c_invest_eng),
     v = levels(survey_df_c$invest)[2:6]) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[28], "Category in R code")
x %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
I already invest in social projects: Category in R code
0€ 0
1 - 99€ /year 1_99
100 - 199€ /year 100_199
200 - 499€ /year 200_499
500€ + /year 500_plus

The answers are distributed as shown in the figure below:

survey_df_c %>%
  filter(invest != "") %>% 
  count(invest) %>%
  rename(invest_count = n) %>%
  mutate(invest = reorder(invest, invest_count)) %>%
  ggplot(aes(x = invest, y = invest_count)) +
  geom_col(fill = "#31A354", alpha = 0.8) +
  theme_ipsum_rc(grid = "X") +
  scale_fill_ipsum() +
  coord_flip() +
  labs(title = "Distribution of responses to past investment question", x = "") +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

options(scipen = 999, digits = 2)

survey_df_c %>% 
  filter(invest != "") %>% 
  count(invest) %>% 
  mutate(total = sum(n),
         percent = n / total) %>% 
  select(invest, n, percent) %>% 
  arrange(desc(n)) %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
invest n percent
0 73 0.57
1_99 43 0.33
100_199 7 0.05
500_plus 5 0.04
200_499 1 0.01

Distribution of answers for independent variables broken down by interest

In order to have a rough sense of the relationship between, on the one hand, the answers to the indenpendent variables questions and the control variable questions, and on the other hand, the answers to the dependent variables questions, in the next few sections we are going to break down the previous vizualisations by category of answer to the dependent variables questions.

Note that we will not comment these plots here, as we will attempt to quantify these relationships later in the chapter Logistic Regression Modeling.

First, let us break down the distribution of answers related to the independent variables by interest in citizens initiatives.

While examining the plots below, we should keep in mind that, as shown previously by the distribution of the interest, the proportion of people ‘ready to fund’ and ‘not interested’ is only 9% and 5% of the sample, respectively.

Here below is a series of plots related to the ‘participative’ questions.

my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels <- c("Not_interested",
                   "Could_be_interested_Vol",
                   "Could_be_interested_Fund",
                   "Is_ready_Vol",
                   "Is_ready_Fund")

t_p1 <- prop.table(table(survey_df_c$interest, survey_df_c$p1), 1)
p1_title <- "\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"

lik_plot(t_p1, my_df_names, mylevels, factor_levels, p1_title)

t_p2 <- prop.table(table(survey_df_c$interest, survey_df_c$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"

lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)

t_p3 <- prop.table(table(survey_df_c$interest, survey_df_c$p3), 1)
p3_title <- "\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"

lik_plot(t_p3, my_df_names, mylevels, factor_levels, p3_title)

t_p4 <- prop.table(table(survey_df_c$interest, survey_df_c$p4), 1)
p4_title <- "I would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"

lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)

t_p5 <- prop.table(table(survey_df_c$interest, survey_df_c$p5), 1)
p5_title <- "\n\nFace-to-face interactions are necessary for collective action.\n"

lik_plot(t_p5, my_df_names, mylevels, factor_levels, p5_title)

Here below is a series of plots related to the ‘government’ questions.

my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels <- c("Not_interested",
                   "Could_be_interested_Vol",
                   "Could_be_interested_Fund",
                   "Is_ready_Vol",
                   "Is_ready_Fund")

t_g1 <- prop.table(table(survey_df_c$interest, survey_df_c$g1), 1)
g1_title <- "\n\nCivic crowdfunding represents a direct threat to public funding of services.\n"

lik_plot(t_g1, my_df_names, mylevels, factor_levels, g1_title)

t_g2 <- prop.table(table(survey_df_c$interest, survey_df_c$g2), 1)
g2_title <- "\nI regard crowdfunded citizen's initiatives as\na sort of _Do-It-Yourself government_.\n"

lik_plot(t_g2, my_df_names, mylevels, factor_levels, g2_title)

t_g3 <- prop.table(table(survey_df_c$interest, survey_df_c$g3), 1)
g3_title <- "\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"

lik_plot(t_g3, my_df_names, mylevels, factor_levels, g3_title)

t_g4 <- prop.table(table(survey_df_c$interest, survey_df_c$g4), 1)
g4_title <- "\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"

lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)

t_g5 <- prop.table(table(survey_df_c$interest, survey_df_c$g5), 1)
g5_title <- "\nGeneral interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by the local government.\n"

lik_plot(t_g5, my_df_names, mylevels, factor_levels, g5_title)

t_g6 <- prop.table(table(survey_df_c$interest, survey_df_c$g6), 1)
g6_title <- "General interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"

lik_plot(t_g6, my_df_names, mylevels, factor_levels, g6_title)

Here below is the plot related to the ‘interest in the future of my neighborhood’ question.

t_neighb <- prop.table(table(survey_df_c$interest, survey_df_c$neighb), 1)
neighb_title <- "\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"

lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)

Distribution of answers for independent variables broken down by past investment

Just like in the section above, while examining the following plots, we should keep in mind that, as shown previously by the distribution of ‘past investment’, the proportions are highly imbalanced, since people investing 100_199, 200_499 and 500_plus represent only 5%, 1% and 4% of the sample, respectively.

Here below is a series of plots related to the ‘participative’ questions.

my_df_names_c_inv <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels_c_inv <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels_c_inv <- c("0",
                         "1_99",
                         "100_199",
                         "200_499",
                         "500_plus")

survey_df_c_inv <- survey_df_c %>% 
  filter(invest != "")

t_p1 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p1), 1)
p1_title <- "\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"

lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p1_title)

t_p2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"

lik_plot(t_p2, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p2_title)

t_p3 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p3), 1)
p3_title <- "\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"

lik_plot(t_p3, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p3_title)

t_p4 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p4), 1)
p4_title <- "I would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"

lik_plot(t_p4, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p4_title)

t_p5 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p5), 1)
p5_title <- "\nFace-to-face interactions are necessary for collective action.\n"

lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p1_title)

Here below is a series of plots related to the ‘government’ questions.

t_g1 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g1), 1)
g1_title <- "\n\nCivic crowdfunding represents a direct threat to public funding of services.\n"

lik_plot(t_g1, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, g1_title)

t_g2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g2), 1)
g2_title <- "\nI regard crowdfunded citizen's initiatives as\na sort of _Do-It-Yourself government_.\n"

lik_plot(t_g2, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, g2_title)

t_g3 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g3), 1)
g3_title <- "\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"

lik_plot(t_g3, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, g3_title)

t_g4 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g4), 1)
g4_title <- "\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"

lik_plot(t_g4, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, g4_title)

t_g5 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g5), 1)
g5_title <- "\nGeneral interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by the local government.\n"

lik_plot(t_p1, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p1_title)

t_g6 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$g6), 1)
g6_title <- "General interest issues are addressed more efficiently\nby crowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"

lik_plot(t_g6, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, g6_title)

Here below is the plot related to the ‘interest in the future of my neighborhood’ question.

t_neighb <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$neighb), 1)
neighb_title <- "\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"

lik_plot(t_neighb, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, neighb_title)

Distribution of answers for independent variables broken down by education

Here below is a plot showing the education versus answers to the independent variables questions, broken down by the independent variables questions.

survey_df_c <- survey_df_c %>% 
  mutate(edu_num = unclass(survey_df_c$edu))

tidy_survey_df_c <- survey_df_c %>% 
  select(edu_num, p1:g6, neighb) %>% 
  data.matrix() %>% 
  as_data_frame() %>% 
  gather(key, value, -edu_num)

tidy_survey_df_c %>%
  group_by(key, value) %>%
  summarize(edu_num = mean(edu_num, na.rm = TRUE)) %>%
  ggplot(aes(value, edu_num, color = key)) +
  geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
  geom_point() +
  labs(title = "Relationship btw education and answers to likert scale questions",
       x = "Answer to likert scale question", y = "Mean education") +
  facet_wrap(~factor(key, ordered = T,
                     levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
                                "g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
  theme_ipsum_rc() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(legend.position = "none") +
  theme(plot.margin = unit(c(1,1,1,0), "cm")) +
  theme(
    panel.grid.minor.y = element_blank()
  )

For this plot, the education is coded as an integer value as shown in the following table (see the column edu_num). This table also shows the count for each level (under the column n).

options(scipen = 999, digits = 3)

survey_df_c %>% 
  count(edu, edu_num) %>%
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
edu edu_num n
Primary_or_None 1 1
High_school 2 7
Technical 3 3
Bachelor 4 20
Master 5 116
PhD 6 5

Here below is a reminder of the questions and their variable names.

meta <- data_frame(question = header_lookup$google_f_header[c(17:27, 29)],
                   variable_name = names(survey_df_c)[c(5:15, 17)])

meta %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
question variable_name
Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. p1
I would like to participate in debates on local public issues. p2
It is relatively easy for citizens to find places where local public issues are collectively debated. p3
I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. p4
Face-to-face interactions are necessary for collective action. p5
Civic crowdfunding represents a direct threat to public funding of services. g1
I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. g2
The local government is able to identify the projects that are most wanted by citizens. g3
Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. g4
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. g5
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. g6
My interest in contributing to the future of my neighborhood has increased in the past several years. neighb

We can observe from this plot that there is a possible positive relationship between education and the extent to which respondents agreed with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project, although this should be confirmed/quantified by the statistical modeling that we will perform in the next chapter.

Distribution of answers for independent variables broken down by age

Here below is a plot showing the mean age versus answers to the independent variables questions, broken down by the independent variables questions.

survey_df_c <- survey_df_c %>% 
  mutate(age_num = ifelse(age == "14_and_below", 12,
                          ifelse(age == "14_17", 16,
                                 ifelse(age == "18_24", 20,
                                        ifelse(age == "25_34", 30,
                                               ifelse(age == "35_44", 40,
                                                      ifelse(age == "45_54", 50,
                                                             ifelse(age == "55_64", 60,
                                                                    ifelse(age == "65_74", 70, 80)))))))))

tidy_survey_df_c <- survey_df_c %>% 
  select(age_num, p1:g6, neighb) %>% 
  data.matrix() %>% 
  as_data_frame() %>% 
  gather(key, value, -age_num)

tidy_survey_df_c %>%
  group_by(key, value) %>%
  summarize(age_num = mean(age_num, na.rm = TRUE)) %>%
  ggplot(aes(value, age_num, color = key)) +
  geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
  geom_point() +
  labs(title = "Relationship btw age and answers to likert scale questions",
       x = "Answer to likert scale question", y = "Mean age") +
facet_wrap(~factor(key, ordered = T,
                   levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
                              "g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
  theme_ipsum_rc() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(legend.position = "none") +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

For this plot, the age group is coded as an integer value as shown in the following table (see the column age_num). This table also shows the count for each level (under the column n).

options(scipen = 999, digits = 3)

survey_df_c %>% 
  count(age, age_num) %>%
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
age age_num n
18_24 20 15
25_34 30 82
35_44 40 14
45_54 50 26
55_64 60 12
65_74 70 2
75_plus 80 1

Similar to the previous plot, we can observe from this plot that there is a possible positive relationship between age and the extent to which the participant agreed with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project, although this should be confirmed/quantified with the statistical modeling that we will perform in the next chapter.

Logistic Regression modeling

Introduction

In this chapter, we are going to fit a series of multiple binary logistic regression models (which are the most basic model used to predict a categorical response, such as our two dependent variables ‘interest’ and ‘past investment’). This is useful because these models will allow us to see which predictors (in our case the independent variables and control variables) are associated with the likelihood of people to be interested in supporting citizens initiatives.

This kind of models can only predict a target variable which has two classes, thus the term binary. However, our dependent variables have more than two classes (for example, there are five classes for the ‘interest’ variable, from ‘Not interested’ up to ‘Ready to fund’). Therefore, we will fit several models one after the other after having split our dependent variable classes into two groups in several ways. This is called the ‘one versus rest’ approach. There are of course more complex models that can handle more than two classes in the target variable, but we will keep things simple here.

Note that designing a survey with ordinal categorical variables (i.e. categorical variables which have a natural order) such as questions with choices on a 5-point Likert scale or age or education makes it easier to interpret the results of the models, as we will see next.

Logistic regression with Interest as the target variable

Model 1

In this section, we will fit a model with ‘Interest’ (in supporting citizens initiatives) as the target variable and all of the independent variables, as well as the control variables, as the explanatory variables.

In this model, we will split the Interest variable into the two following groups:

  • Class 0: Not_interested;
  • Class 1: The rest of the possible classes combined (Could_be_interested_Vol, Could_be_interested_Fund, Is_ready_Vol and Is_ready_Fund).

Also, we filter out observations with gender other than Female and Male. Then, Female are represented by the integer value 1 and Male are represented by the integer value 2 (there is no particular reason for this, we could do it the other way around).

Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly agree).

Answers to the question about the age of the respondent range from 1 (Under_14) to 9 (75_plus).

Answers to the education question range from 1 (Primary_or_None) to 6 (PhD).

Finally, answers to the location and language questions have three levels:

  • Brussels = 1, Flanders = 2, Wallonia = 3;
  • English = 1, Dutch = 2, French = 3.

Thus, the first few rows of the input to the model look like this:

survey_df_c_mat <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  mutate(interest_binary = ifelse(interest == "Not_interested",
                                  0,
                                  1)) %>% 
  select(interest_binary, language, p1:edu, -invest) %>% 
  data.matrix() %>% 
  as_data_frame()

survey_df_c_mat %>% 
  head() %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
interest_binary language p1 p2 p3 p4 p5 g1 g2 g3 g4 g5 g6 neighb loc gender age edu
1 3 2 2 2 4 4 2 3 2 4 2 2 4 1 1 4 5
1 1 3 3 2 2 4 1 4 2 4 4 4 4 1 2 6 6
1 3 2 4 2 3 5 4 4 2 5 3 3 5 1 2 5 4
1 3 3 3 3 3 4 2 3 3 4 4 4 3 2 2 8 3
1 3 3 3 2 4 1 4 3 2 3 3 4 4 1 2 6 5
1 2 4 5 4 4 4 2 5 2 4 4 2 4 1 1 4 5

Let us now examine the output of the model:

glm1 <- glm(interest_binary ~ .,  family = "binomial", 
                  data = survey_df_c_mat)

summary(glm1)
## 
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
## 
## Deviance Residuals: 
##         Min           1Q       Median           3Q          Max  
## -0.00013589   0.00000002   0.00000002   0.00000002   0.00011225  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -712.10  260294.49    0.00        1
## language       104.17   25318.23    0.00        1
## p1              -2.82    8753.77    0.00        1
## p2              -6.25   13890.32    0.00        1
## p3             -34.86   21457.88    0.00        1
## p4              34.80   17255.02    0.00        1
## p5             -23.91   12686.23    0.00        1
## g1              11.38   18054.07    0.00        1
## g2              35.47   16975.20    0.00        1
## g3              88.56   23372.16    0.00        1
## g4              87.05   24097.56    0.00        1
## g5              28.90   10846.46    0.00        1
## g6             -32.60   14709.20    0.00        1
## neighb         103.56   20100.27    0.01        1
## loc            -28.77   15039.57    0.00        1
## gender        -134.84   59506.52    0.00        1
## age            -38.48   11978.54    0.00        1
## edu             46.11    9663.82    0.00        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 56.669538353445  on 150  degrees of freedom
## Residual deviance:  0.000000084893  on 133  degrees of freedom
## AIC: 36
## 
## Number of Fisher Scoring iterations: 25

This is a lot of output, but we are going to focus on the Coefficients table in the middle.

In order to be able to interpret anything meaningful, we first have to look at the p-values represented in the last column of the Ceofficients table (Pr(>|z|)). The p-value is used to test for statistical significance. We won’t go into the theory of inference for regression, but simply put, low p-values (below 0.05) generally mean that we can rule out that the effect of the coefficient is due to chance. In other words, we are looking for coefficients for which the p-value is below 0.05.

In the present case, all p-values are about 1, so this model is useless. In fact, the model did not converge. This is probably due to the high imbalance in the target variable. Indeed, as already mentionned, the class Not_interested represents only 5% of the sample or 7 out of 152 observations.

Therefore, we cannot infer anything regarding factors that could be associated with the likelihood of people not to be interested in citizens initiatives as opposed to being somewhat interested (‘Could be interested’) or definetly interested (‘Is ready’).

Model 2

In this second model, we are filtering out the observations where people are not interested in citizens initiatives, therefore we are now focusing on a subset of the data. Then we split the observations into the following two groups, related to the interest variable:

  • Class 0: Could_be_interested_Vol and Could_be_interested_Fund;
  • Class 1: Is_ready_Vol and Is_ready_Fund.

The rest of the model is similar to the model 1.

survey_df_c_mat <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  filter(interest != "Not_interested") %>% 
  mutate(interest_binary = ifelse(interest %in% c("Is_ready_Vol",
                                                  "Is_ready_Fund"),
                                  1,
                                  0)) %>% 
  select(interest_binary, language, p1:edu, -invest) %>% 
  data.matrix() %>% 
  as_data_frame()

glm2 <- glm(interest_binary ~ .,  family = "binomial", 
                  data = survey_df_c_mat)

summary(glm2)
## 
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.708  -0.860  -0.505   0.952   2.208  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -2.0048     3.1302   -0.64   0.5219   
## language     -0.1270     0.3240   -0.39   0.6950   
## p1           -0.3370     0.2209   -1.53   0.1272   
## p2            0.7007     0.2650    2.64   0.0082 **
## p3            0.1394     0.2150    0.65   0.5166   
## p4           -0.6186     0.2582   -2.40   0.0166 * 
## p5           -0.2407     0.2119   -1.14   0.2560   
## g1            0.2197     0.2415    0.91   0.3629   
## g2            0.1062     0.2018    0.53   0.5985   
## g3            0.3367     0.2103    1.60   0.1093   
## g4           -0.5030     0.2553   -1.97   0.0488 * 
## g5           -0.1043     0.2613   -0.40   0.6898   
## g6           -0.0468     0.3032   -0.15   0.8774   
## neighb        0.4887     0.2442    2.00   0.0453 * 
## loc          -0.0153     0.2885   -0.05   0.9577   
## gender        0.1171     0.4288    0.27   0.7847   
## age           0.3703     0.1818    2.04   0.0417 * 
## edu           0.0259     0.2613    0.10   0.9211   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 187.20  on 143  degrees of freedom
## Residual deviance: 151.49  on 126  degrees of freedom
## AIC: 187.5
## 
## Number of Fisher Scoring iterations: 5

Now we have several variables for which the p-value of the coefficient is less than 0.05. Let’s extract them into the following table, which is ordered by estimate in descending order:

options(scipen = 999, digits = 3)

glm2 %>% 
  tidy() %>%
  filter(p.value < 0.05) %>%
  arrange(desc(estimate)) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
term estimate std.error statistic p.value
p2 0.701 0.265 2.64 0.008
neighb 0.489 0.244 2.00 0.045
age 0.370 0.182 2.04 0.042
g4 -0.503 0.255 -1.97 0.049
p4 -0.619 0.258 -2.40 0.017

We can interpret this table as follows:

  1. Since the effect size (i.e. the estimate of the coefficient) associated with the variable p2 is above zero, then people agreeing with the statement: I would like to participate in debates on local public issues are more likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I am ready to…’ rather than ‘I could be interested to…’), after controlling for the other variables in the model.

Let’s have a look at a plot similar to what we did in the Exploratory Data Analysis (EDA), but without the category Not_interested:

t_p2 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p2[survey_df_c$interest != "Not_interested"]), 1)
p2_title <- "\n\n\nI would like to participate in debates on local public issues.\n"

lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)

This confirms visually the positive relationship identified in point 1.

  1. Similarly, since the effect size associated with the variable neighb is positive, then people agreeing with the statement: My interest in contributing to the future of my neighborhood has increased in the past several years are more likely to be ‘ready’ to fund or volunteer, after controlling for the other variables in the model.

Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested:

t_neighb <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$neighb[survey_df_c$interest != "Not_interested"]), 1)
neighb_title <- "\n\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"

lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)

This confirms visually the positive relationship identified in point 2.

  1. Since the effect size associated with the variable age is above zero, then older people are more likely to be ‘ready’ to fund or volunteer, after controlling for the other variables in the model.

Let’s have a look at a bar chart representing the proportions of interest conditionned on the age group:

my_cols <- brewer.pal(9, "YlGnBu")

survey_df_c %>% 
  filter(interest != "Not_interested") %>% 
  ggplot(aes(x = interest, fill = age)) +
  geom_bar(alpha = 0.8, position = "fill") +
  labs(title = "Proportions of interest conditionned on age group", y = "Proportion") +
  theme_ipsum_rc() +
  scale_fill_manual(values = my_cols[3:9]) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

This plot seems to confirm the positive relationship identified in point 3.

  1. Since the effect size associated with the variable g4 is negative, then people agreeing with the statement: Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government are less likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I could be interested to…’), after controlling for the other variables in the model.

Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested:

t_g4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$g4[survey_df_c$interest != "Not_interested"]), 1)
g4_title <- "\n\nCivic crowdfunding platforms could play a role of\nintermediary between citizen's initiatives and the local government.\n"

lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)

This confirms visually the negative relationship identified in point 4.

  1. Finally, since the effect size associated with the variable p4 is negative, then people agreeing with the statement: I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project are less likely to be ‘ready’ to fund or volunteer (they are more likely to have answered ‘I could be interested to…’), after controlling for the other variables in the model.

Let’s try to confirm this visually:

t_p4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p4[survey_df_c$interest != "Not_interested"]), 1)
p4_title <- "\nI would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"

lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)

The negative relationship identified in this last point is less clear in this plot.

Model 3

Let’s build a third model in which we are filtering out the observations where people are not interested in citizens initiatives, and where we split the observations into the two following groups, related to the interest variable:

  • Class 0: Could_be_interested_Vol and Is_ready_Vol;
  • Class 1: Could_be_interested_Fund and Is_ready_Fund.

In other words, we are now interested to know what factors are associated with the difference between those willing to volunteer and those willing to fund.

The rest of the model is similar to the model 1.

survey_df_c_mat <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  filter(interest != "Not_interested") %>% 
  mutate(interest_binary = ifelse(interest %in% c("Is_ready_Vol",
                                                  "Could_be_interested_Vol"),
                                  0,
                                  1)) %>% 
  select(interest_binary, language, p1:edu, -invest) %>% 
  data.matrix() %>% 
  as_data_frame()

glm3 <- glm(interest_binary ~ .,  family = "binomial", 
                  data = survey_df_c_mat)

summary(glm3)
## 
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_c_mat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.743  -0.869  -0.497   0.993   2.120  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -2.8005     2.9507   -0.95    0.343  
## language     -0.0840     0.3341   -0.25    0.801  
## p1            0.4846     0.2081    2.33    0.020 *
## p2           -0.2998     0.2137   -1.40    0.161  
## p3           -0.2339     0.2104   -1.11    0.266  
## p4            0.5874     0.2482    2.37    0.018 *
## p5           -0.2857     0.1980   -1.44    0.149  
## g1            0.1327     0.2354    0.56    0.573  
## g2           -0.1770     0.1968   -0.90    0.369  
## g3           -0.2764     0.2049   -1.35    0.177  
## g4            0.3355     0.2667    1.26    0.208  
## g5           -0.3865     0.2565   -1.51    0.132  
## g6            0.0634     0.2883    0.22    0.826  
## neighb        0.0558     0.2108    0.26    0.791  
## loc           0.0818     0.2714    0.30    0.763  
## gender        0.2298     0.4192    0.55    0.583  
## age           0.0862     0.1690    0.51    0.610  
## edu           0.2534     0.2742    0.92    0.355  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 194.91  on 143  degrees of freedom
## Residual deviance: 161.57  on 126  degrees of freedom
## AIC: 197.6
## 
## Number of Fisher Scoring iterations: 4

We have two variables for which the p-value of the coefficient is less than 0.05. Let’s extract them into the following table, which is ordered by estimate in descending order:

options(scipen = 999, digits = 3)

glm3 %>% 
  tidy() %>%
  filter(p.value < 0.05) %>%
  arrange(desc(estimate)) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
term estimate std.error statistic p.value
p4 0.587 0.248 2.37 0.018
p1 0.485 0.208 2.33 0.020

We can interpret this table as follows:

  1. Since the effect size associated with the variable p4 is positive, then people agreeing with the statement: I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project are more likely to be willing to fund rather than to volunteer, after controlling for the other variables in the model.

Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested and with the class labels for the dependent variable re-ordered (on the y axis):

t_p4 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p4[survey_df_c$interest != "Not_interested"]), 1)
p4_title <- "\nI would be more inclined to contribute financially to\na project on a civic crowdfunding platform if I had the possibility\nto express my disagreement about some aspects of the project.\n"

factor_levels_m3 <- c("Could_be_interested_Vol", "Is_ready_Vol", "Could_be_interested_Fund", "Is_ready_Fund")
lik_plot(t_p4, my_df_names, mylevels, factor_levels_m3, p4_title)

This confirms visually the positive relationship identified in point 1, and it seems to make more sense, visually, than the relationship identified in point 5 when interpreting the model 2.

  1. Similarly, since the effect size associated with the variable p1 is positive, then people agreeing with the statement: Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project are more likely to be willing to fund rather than to volunteer, after controlling for the other variables in the model.

Let’s have a look at a plot similar to what we did in the EDA, but without the category Not_interested and with the class labels for the dependent variable re-ordered:

t_p1 <- prop.table(table(survey_df_c$interest[survey_df_c$interest != "Not_interested"], survey_df_c$p1[survey_df_c$interest != "Not_interested"]), 1)
p1_title <- "\n\nPersonally funding a citizen's initiative is enough to\nsatisfy my desire to engage in this project.\n"

factor_levels_m3 <- c("Could_be_interested_Vol", "Is_ready_Vol", "Could_be_interested_Fund", "Is_ready_Fund")
lik_plot(t_p1, my_df_names, mylevels, factor_levels_m3, p1_title)

This confirms visually the positive relationship identified in point 2.

Logistic regression with past investment as the target variable

Model 4

In our last model, we are going to change our target variable. Instead of modeling the interest in citizens initiatives explained by all of the other variables, we will be modeling the respondents’ past investment in citizens initiatives explained by all of the other variables (independent variables and control variables).

In order for this target to be binary, we will split the observations into the two following groups, related to the invest variable:

  • Class 0: 0, that is, the respondent did not report already investing in citizens initiatives;
  • Class 1: All of the other choices (1_99, 100_199, 200_499, 500_plus).

All of the other settings are the same as for the model 1.

survey_df_c_mat <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  filter(invest != "") %>%
  mutate(invest_binary = ifelse(invest == "0", 0, 1)) %>% 
  select(invest_binary, language, p1:edu, -invest) %>% 
  data.matrix() %>% 
  as_data_frame()

glm4 <- glm(invest_binary ~ .,  family = "binomial", 
            data = survey_df_c_mat)

summary(glm4)
## 
## Call:
## glm(formula = invest_binary ~ ., family = "binomial", data = survey_df_c_mat)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.738  -0.971  -0.514   0.950   2.413  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.79136    2.92178    0.27   0.7865   
## language     0.02150    0.34018    0.06   0.9496   
## p1           0.20232    0.21531    0.94   0.3474   
## p2           0.65330    0.24957    2.62   0.0089 **
## p3           0.05238    0.21171    0.25   0.8046   
## p4          -0.42688    0.25583   -1.67   0.0952 . 
## p5           0.03403    0.21480    0.16   0.8741   
## g1          -0.49469    0.27726   -1.78   0.0744 . 
## g2           0.10823    0.20656    0.52   0.6003   
## g3           0.05635    0.20918    0.27   0.7876   
## g4          -0.25723    0.25098   -1.02   0.3054   
## g5           0.08609    0.26612    0.32   0.7463   
## g6          -0.40861    0.31412   -1.30   0.1933   
## neighb       0.35190    0.22921    1.54   0.1247   
## loc         -0.00567    0.27556   -0.02   0.9836   
## gender      -0.51515    0.44109   -1.17   0.2428   
## age         -0.13074    0.18507   -0.71   0.4799   
## edu         -0.01773    0.27137   -0.07   0.9479   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 175.44  on 127  degrees of freedom
## Residual deviance: 146.42  on 110  degrees of freedom
## AIC: 182.4
## 
## Number of Fisher Scoring iterations: 4

We have a single variable for which the p-value of the coefficient is less than 0.05. Let’s extract it into the following table:

options(scipen = 999, digits = 3)

glm4 %>% 
  tidy() %>%
  filter(p.value < 0.05) %>%
  arrange(desc(estimate)) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
term estimate std.error statistic p.value
p2 0.653 0.25 2.62 0.009

We can interpret this result as follows:

  1. Since the effect size associated with the variable p2 is positive, then people agreeing with the statement: I would like to participate in debates on local public issues are more likely to be already investing in citizens initiatives than not investing, after controlling for the other variables in the model.

Let’s re-examine a plot that we already showed in the Exploratory Data Analysis:

t_p2 <- prop.table(table(survey_df_c_inv$invest, survey_df_c_inv$p2), 1)
p2_title <- "\n\nI would like to participate in debates on local public issues.\n"

lik_plot(t_p2, my_df_names_c_inv, mylevels_c_inv,
         factor_levels_c_inv, p2_title)

As a reminder from the EDA, observations where the investment is 0 represent about 57% of the sample.

From this plot we can visually confirm the result from the model 4.

Confirmatory Factor Analysis

Introduction

This chapter is optionnal for the reader, because we are not confident that our results are correct. We intend to update this part of the document in the future as our knowledge about dimensionality reduction techniques, factor analyis and structural equation modeling evolves.

By design, there are questions that are intended to collectively measure one latent variable. For example, there are five questions related to the wish from the citizens for more participation in debates on public issues. Also, there are six questions related to whether a civic crowdfunding platform could be regarded as an intermediary between the citizens initiatives and the local authorities.

Confirmatory Factor Analysis (CFA) allows us to reduce the dimensionality of several items into their corresponding hypothesized latent variable. The resulting latent variable is a continuous numeric variable that captures the variability of its corresponding items.

Let’s try it out.

Preprocessing

As a reminder, here below are the questions related to the two latent variables (p for participative and g for relationship between citizens initiatives and the government):

meta <- data_frame(question = header_lookup$google_f_header[17:27],
                   variable = names(survey_df_c)[5:15])

meta %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
question variable
Personally funding a citizen’s initiative is enough to satisfy my desire to engage in this project. p1
I would like to participate in debates on local public issues. p2
It is relatively easy for citizens to find places where local public issues are collectively debated. p3
I would be more inclined to contribute financially to a project on a civic crowdfunding platform if I had the possibility to express my disagreement about some aspects of the project. p4
Face-to-face interactions are necessary for collective action. p5
Civic crowdfunding represents a direct threat to public funding of services. g1
I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. g2
The local government is able to identify the projects that are most wanted by citizens. g3
Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. g4
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. g5
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. g6

After converting the categorical variables into an integer value, we can have a look at the first ten observations:

survey_df_c_mat <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  select(language:edu) %>% 
  data.matrix() %>% 
  as_data_frame()

survey_df_c_mat %>% 
  select(interest:neighb, age, edu, loc, gender) %>% 
  head(n = 10) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
interest p1 p2 p3 p4 p5 g1 g2 g3 g4 g5 g6 invest neighb age edu loc gender
2 2 2 2 4 4 2 3 2 4 2 2 1 4 4 5 1 1
2 3 3 2 2 4 1 4 2 4 4 4 1 4 6 6 1 2
4 2 4 2 3 5 4 4 2 5 3 3 1 5 5 4 1 2
2 3 3 3 3 4 2 3 3 4 4 4 1 3 8 3 2 2
3 3 3 2 4 1 4 3 2 3 3 4 1 4 6 5 1 2
3 4 5 4 4 4 2 5 2 4 4 2 1 4 4 5 1 1
3 2 4 2 5 2 2 2 2 4 2 4 1 2 4 2 1 2
4 3 4 2 3 4 3 2 3 4 2 2 1 4 5 5 1 1
4 4 5 2 4 4 2 1 2 2 2 1 1 5 4 5 1 1
3 4 2 1 4 4 2 2 2 5 3 3 1 2 4 5 1 1

Note that the following variables were not displayed in this table because they didn’t fit into the webpage: language and grouping.

Modeling

Initial model

In order to perform the CFA, we will use the lavaan (latent variable analysis) R package.

First we need to describe the model that we want to fit. We will use the variable names partcip_will and civic_crowd_gov to refer to the wish from the citizens for more participation in debates on public issues and whether a civic crowdfunding platform could be regarded as an intermediary between the citizens initiatives and the local authorities, respectively.

model <- "
partcip_will =~ p2 + p1 + p3 + p4 + p5
civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
"

cat(model)
## 
## partcip_will =~ p2 + p1 + p3 + p4 + p5
## civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6

Next, we can fit this model using the cfa() function and look at the summary statistics.

fit1 <- cfa(model = model, data = survey_df_c_mat)

summary(fit1, standardized = TRUE)
## lavaan (0.5-23.1097) converged normally after  93 iterations
## 
##   Number of observations                           151
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               60.181
##   Degrees of freedom                                43
##   P-value (Chi-square)                           0.043
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                      Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   partcip_will =~                                                         
##     p2                  1.000                               0.270    0.247
##     p1                  0.294    0.284    1.035    0.300    0.080    0.076
##     p3                 -0.322    0.267   -1.205    0.228   -0.087   -0.090
##     p4                  4.006    4.250    0.943    0.346    1.084    1.144
##     p5                  0.180    0.290    0.620    0.535    0.049    0.044
##   civic_crowd_gov =~                                                      
##     g4                  1.000                               0.106    0.127
##     g1                  0.408    0.840    0.486    0.627    0.043    0.050
##     g2                  1.901    1.730    1.099    0.272    0.202    0.179
##     g3                 -1.867    1.647   -1.134    0.257   -0.198   -0.198
##     g5                  5.622    4.254    1.322    0.186    0.596    0.635
##     g6                  6.347    4.874    1.302    0.193    0.673    0.802
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   partcip_will ~~                                                       
##     civic_crowd_gv    0.008    0.011    0.741    0.459    0.288    0.288
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .p2                1.126    0.149    7.573    0.000    1.126    0.939
##    .p1                1.078    0.124    8.688    0.000    1.078    0.994
##    .p3                0.921    0.106    8.681    0.000    0.921    0.992
##    .p4               -0.276    1.180   -0.234    0.815   -0.276   -0.308
##    .p5                1.192    0.137    8.692    0.000    1.192    0.998
##    .g4                0.683    0.079    8.635    0.000    0.683    0.984
##    .g1                0.760    0.088    8.681    0.000    0.760    0.998
##    .g2                1.224    0.143    8.578    0.000    1.224    0.968
##    .g3                0.964    0.113    8.552    0.000    0.964    0.961
##    .g5                0.526    0.110    4.775    0.000    0.526    0.597
##    .g6                0.252    0.120    2.101    0.036    0.252    0.357
##     partcip_will      0.073    0.087    0.841    0.400    1.000    1.000
##     civic_crowd_gv    0.011    0.017    0.669    0.504    1.000    1.000

Then, we can extract the standardized parameter estimates as follows.

options(scipen = 999, digits = 3)

sfit1 <- standardizedsolution(fit1)

sfit1 %>% 
  filter(op == "=~") %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
lhs op rhs est.std se z pvalue
partcip_will =~ p2 0.247 0.145 1.706 0.088
partcip_will =~ p1 0.076 0.081 0.938 0.348
partcip_will =~ p3 -0.090 0.085 -1.059 0.290
partcip_will =~ p4 1.144 0.575 1.989 0.047
partcip_will =~ p5 0.044 0.075 0.597 0.551
civic_crowd_gov =~ g4 0.127 0.094 1.350 0.177
civic_crowd_gov =~ g1 0.050 0.095 0.521 0.602
civic_crowd_gov =~ g2 0.179 0.093 1.919 0.055
civic_crowd_gov =~ g3 -0.198 0.093 -2.127 0.033
civic_crowd_gov =~ g5 0.635 0.095 6.656 0.000
civic_crowd_gov =~ g6 0.802 0.107 7.504 0.000

Finally, we can extract the most important fit statistics as follows.

fit_1 <- fitmeasures(fit1, c("npar", "chisq", "df", "cfi", "rmsea", "srmr"))

cbind(read.table(text = names(fit_1)), fit_1) %>% 
  select(fit_1) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
fit_1
npar 23.0000
chisq 60.1812
df 43.0000
cfi 0.8058
rmsea 0.0514
srmr 0.0731

Model improvement

Modification indices can be requested in order to determine if changes should be made to the model such as adding error covariance terms or removing poor performing items, in order to improve the model. Here, we sorted the output by mi in decreasing order.

modificationindices(fit1) %>% 
  arrange(desc(mi)) %>% 
  head(10) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
p1 ~~ p5 12.89 -0.331 -0.331 -0.291 -0.291
p3 ~~ g2 6.65 0.224 0.224 0.206 0.206
g1 ~~ g2 6.54 0.202 0.202 0.206 0.206
p5 ~~ g1 5.25 0.177 0.177 0.186 0.186
p2 ~~ g4 2.34 0.108 0.108 0.119 0.119
p5 ~~ g4 2.25 0.110 0.110 0.121 0.121
p2 ~~ p5 2.25 0.142 0.142 0.118 0.118
g4 ~~ g6 2.14 -0.090 -0.090 -0.129 -0.129
p2 ~~ g6 1.71 -0.092 -0.092 -0.100 -0.100
p2 ~~ p1 1.59 -0.116 -0.116 -0.102 -0.102

In the lavaan package, ~~ means covariance terms. Let’s add p1 ~~ p5 and see if the model improves.

model2 <- "
partcip_will =~ p2 + p1 + p3 + p4 + p5
civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
p1 ~~ p5
"

cat(model2)
## 
## partcip_will =~ p2 + p1 + p3 + p4 + p5
## civic_crowd_gov =~ g4 + g1 + g2 + g3 + g5 + g6
## p1 ~~ p5
fit2 <- cfa(model = model2, data = survey_df_c_mat)

fits <- list(fit_1 = fit1, fit_2 = fit2)

round(sapply(fits, function(X)
  fitmeasures(X, c("npar", "chisq", "df", "cfi", "rmsea", "srmr"))), 3) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
fit_1 fit_2
npar 23.000 24.000
chisq 60.181 46.640
df 43.000 42.000
cfi 0.806 0.948
rmsea 0.051 0.027
srmr 0.073 0.063

As we can see, the \(\chi^2\) has improved by 12, the comparative fit index (cfi) has increased by 0.12, the root mean square error of approximation (rmsea) has improved by 0.019 and the standardized root mean square residual (srmr) has improved by 0.008.

Instead of qualitatively assessing this improvement, we can perform a significance test to compare the two models.

anova(fit1, fit2)
## Chi Square Difference Test
## 
##      Df  AIC  BIC Chisq Chisq diff Df diff Pr(>Chisq)    
## fit2 42 4572 4644  46.6                                  
## fit1 43 4583 4653  60.2       13.5       1    0.00023 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As evidenced by the p-value that is below 0.05, it seems that adding the covariance term significantly improved the model. We will then consider this latter model for the rest of this analysis.

We can now save the latent variables and add them to our dataset. In the following two tables we show, for the first six observations, the items p1 to p5 along with their corresponding latent variable, as well as the items g1 to g6 along with their own latent variable.

pfit2 <- data.frame(predict(fit2))
survey_df_c_mat2 <- cbind(survey_df_c_mat, pfit2)

survey_df_c <- survey_df_c %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  cbind(partcip_will = survey_df_c_mat2$partcip_will,
        civic_crowd_gov = survey_df_c_mat2$civic_crowd_gov)

survey_df_c %>% 
  select(p1:p5, partcip_will) %>% 
  head() %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
p1 p2 p3 p4 p5 partcip_will
Disagree Disagree Disagree Agree Agree 0.136
Neither agree nor disagree Neither agree nor disagree Disagree Disagree Agree -0.489
Disagree Agree Disagree Neither agree nor disagree Strongly agree -0.169
Neither agree nor disagree Neither agree nor disagree Neither agree nor disagree Neither agree nor disagree Agree -0.169
Neither agree nor disagree Neither agree nor disagree Disagree Agree Strongly disagree 0.149
Agree Strongly agree Agree Agree Agree 0.154
survey_df_c %>% 
  select(g1:g6, civic_crowd_gov) %>% 
  head() %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
g1 g2 g3 g4 g5 g6 civic_crowd_gov
Disagree Neither agree nor disagree Disagree Agree Disagree Disagree -0.101
Strongly disagree Agree Disagree Agree Agree Agree 0.105
Agree Agree Disagree Strongly agree Neither agree nor disagree Neither agree nor disagree 0.014
Disagree Neither agree nor disagree Neither agree nor disagree Agree Agree Agree 0.107
Agree Neither agree nor disagree Disagree Neither agree nor disagree Neither agree nor disagree Agree 0.088
Disagree Strongly agree Disagree Agree Agree Disagree -0.019

Analysis using latent variables

Qualitative assessment of the relationship between dependent and independent variables

Let’s see now if we can better assess the relationship between our independent variables (partcip_will and civic_crowd_gov) and our dependent variable, which is the degree of interest from citizens in citizens initiatives.

Our independent variables are now reduced to a continuous numeric value, thus we can use boxplots to explore the relationships visually.

survey_df_c %>% 
  ggplot(aes(y = partcip_will, x = interest, fill = interest)) +
  geom_boxplot(alpha = .5) +
  labs(title = "Relationship btw particip_will and interest",
       x = "") +
  theme_ipsum_rc() +
  scale_fill_ipsum() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank()) +
  coord_flip() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

survey_df_c %>% 
  ggplot(aes(y = civic_crowd_gov, x = interest, fill = interest)) +
  geom_boxplot(alpha = .5) +
  labs(title = "Relationship btw civic_crowd_gov and interest",
       x = "") +
  theme_ipsum_rc() +
  scale_fill_ipsum() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank()) +
  coord_flip() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

From these two plots, we can qualitatively assess that:

  • negative values of both independent variables could be associated with not being interested in citizens initiatives;
  • there does not seem to be a significant difference between the other categories of interest: they are all associated with values of the latent variables centered at or above zero.

Again, we are not claiming here that these results are correct and we will not include them in our conclusions. We will refine this part of the document further in the next following months.

Part II - Entrepreneurs

Exploratory Data Analysis

Our sample frame here is all active members (whom we simply call ‘entrepreneurs’ in this document) of a local citizens initiative, social project or social organization in the Brussels-Capital Region.

Control variables

The EDA of the control variables for the entrepreneurs was shared with the EDA of the control variables for the citizens. Therefore, please refer back to that section if needed.

Dependent and independent variables

Concerning the entrepeneurs (active members of a local citizens initiative, social project or social organization), the objective of this survey is to test two hypotheses that were derived from our qualitative research:

  1. H1: A more participative platform is positively correlated with the interest to join a civic crowdfunding platform.
  2. H2: A positive opinion on projects that are co-created by social entrepreneurs and the local government is positively correlated with the interest to join a civic crowdfunding platform.

The interest to join a civic crowdfunding platform is thus our dependent variable (this is the outcome that BrusselsTogether would like to better understand) and is measured by the question shown in the table below. Also shown in this table is the related categories in the R code.

x <- list(q = names(lut_e_int_eng), v = levels(survey_df_e$interest)) %>% as_data_frame()
colnames(x) <- c(header_lookup$google_f_header[4], "Category in R code")
x %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
As an active member of a local citizen’s initiative, social project or social organization, please indicate your level of interest in joining a civic crowdfunding platform: Category in R code
I am not interested in joining a civic crowdfunding platform. Not_interested
I could be interested to join a civic crowdfunding platform. Could_be_interested
I am ready to join a civic crowdfunding platform. Is_ready

The answers are distributed as shown in the figure below:

survey_df_e %>%
  count(interest) %>%
  rename(interest_count = n) %>%
  mutate(interest = reorder(interest, interest_count)) %>%
  ggplot(aes(x = interest, y = interest_count)) +
  geom_col(fill = "#2C7FB8", alpha = 0.8) +
  theme_ipsum_rc(grid = "X") +
  scale_fill_ipsum() +
  coord_flip() +
  labs(title = "Distribution of responses to interest question", x = "") +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(plot.margin = unit(c(1,1,1,0), "cm"))

options(scipen = 999, digits = 2)

survey_df_e %>% 
  count(interest) %>% 
  mutate(total = sum(n),
         percent = n / total) %>% 
  select(interest, n, percent) %>% 
  arrange(desc(n)) %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
interest n percent
Could_be_interested 31 0.79
Is_ready 7 0.18
Not_interested 1 0.03

The independent variables (the appeal of a more participative platform for H1 and the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms for H2) are what we think could be associated with the dependent variable (although we could not make causal inference from an observational study like this one). If the correlations are verified, then we could provide BrusselsTogether with actionnable recommendations, such as making the platform more participative or playing a role of intermediary between citizens initiatives and the local authorities.

For each of the two independent variables, we asked a set of questions which, in our opinion, should reflect their corresponding independent variable. In this case, we could say that the independent variables are latent variables that are each measured by a set of several items.

The table below displays the items along with their related variable name in the R code. The items related to the appeal of a more participative platform are p1 to p5 and the items related to the opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms are g1 to g6.

Additionnaly, we asked the respondents whether their interest in contributing to the future of their neighborhood had increased in the past several years (variable name neighb).

All of those items were measured on a 5-points Likert scale, from Strongly_disagree to Strongly_agree.

meta <- data_frame(question = header_lookup$google_f_header[c(5:15, 29)],
                   variable_name = names(survey_df_e)[c(5:16)])

meta %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
question variable_name
Face-to-face interactions are necessary for collective action. p1
For most citizens on a civic crowdfunding platform, personnally funding a citizen’s initiative is enough to satisfy their desire to engage in this project. p2
Citizens would like to participate in debates on local public issues that they care about. p3
It is relatively easy for citizens to find places where local public issues are collectively debated. p4
I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially. p5
Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government. g1
I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. g2
The local government is able to identify the projects that are most wanted by citizens. g3
Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. g4
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. g5
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. g6
My interest in contributing to the future of my neighborhood has increased in the past several years. neighb

The answers are distributed as shown in the three plots below:

questions_e <- header_lookup$google_f_header[5:15]
quest_code_e <- names(survey_df_e)[5:15]
survey_df_e_df <- as.data.frame(survey_df_e)
names(survey_df_e_df)[5:15] <- questions_e


my_df_names_e_p <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                     "Agree", "Strongly agree")
mylevels_e_p <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                  "Agree", "Strongly agree")
factor_levels_e_p <- c("Face-to-face interactions are necessary for collective action.",
                       "For most citizens on a civic crowdfunding platform, personnally funding a citizen's initiative is enough to satisfy their desire to engage in this project.",
                       "Citizens would like to participate in debates on local public issues that they care about.",
                       "It is relatively easy for citizens to find places where local public issues are collectively debated.",
                       "I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially.")

temp_e_p <- survey_df_e_df[,5:9]

temp_e_p_gathered <- temp_e_p %>% 
  gather(key = "question", value = "answer")

temp_e_p_gathered$answer <- factor(temp_e_p_gathered$answer, ordered = T,
                                   levels = mylevels_e_p)

t_e_p <- prop.table(table(temp_e_p_gathered$question,
                          temp_e_p_gathered$answer), 1)
e_p_title <- "\n\nEntrepreneurs - Participative questions\n"

lik_plot(t_e_p, my_df_names_e_p, mylevels_e_p, factor_levels_e_p, e_p_title)

We can observe from this plot that the responses to most questions are polarized. In particular:

  • It seems that most of the respondents agree with the statement: Face-to-face interactions are necessary for collective action.
  • It seems that most of the respondents agree with the statement: Citizens would like to participate in debates on local public issues that they care about.
  • It seems that most of the respondents disagree with the statement: It is relatively easy for citizens to find places where local public issues are collectively debated.
  • It seems that most of the respondents agree with the statement: I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially.
my_df_names_e_g <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels_e_g <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels_e_g <- c("Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government.",
                       "I regard crowdfunded citizen's initiatives as a sort of_Do-It-Yourself government_.",
                       "The local government is able to identify the projects that are most wanted by citizens.",
                       "Civic crowdfunding platforms could play a role of intermediary between citizen's initiatives and the local government.",
                       "General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by the local government.",
                       "General interest issues are addressed more efficiently by crowdfunded citizen's initiatives than by non-profit organizations subsidized by the local government.")

temp_e_g <- survey_df_e_df[,10:15]

temp_e_g_gathered <- temp_e_g %>% 
  gather(key = "question", value = "answer")

temp_e_g_gathered$answer <- factor(temp_e_g_gathered$answer, ordered = T,
                                   levels = mylevels_e_g)

t_e_g <- prop.table(table(temp_e_g_gathered$question,
                          temp_e_g_gathered$answer), 1)
e_g_title <- "Entrepreneurs - Government questions\n"

lik_plot(t_e_g, my_df_names_e_g, mylevels_e_g, factor_levels_e_g, e_g_title)

Once again, we can observe from this plot that the responses to some questions are polarized. In particular:

  • It seems that most of the respondents disagree with the statement: Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government.
  • Quite interestingly for the validation of the hypothese H2, it seems that most of the respondents agree with the statement: Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government.
questions_neighb <- header_lookup$google_f_header[29]
quest_code_neighb <- names(survey_df_e)[16]
names(survey_df_e_df)[16] <- questions_neighb

my_df_names_e_neighb <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                          "Agree", "Strongly agree")
mylevels_e_neighb <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                       "Agree", "Strongly agree")
factor_levels_e_neighb <- c("My interest in contributing to the future of my neighborhood has increased in the past several years.")

temp_e_neighb <- as.data.frame(survey_df_e_df[,16])
names(temp_e_neighb) <- "My interest in contributing to the future of my neighborhood has increased in the past several years."

temp_e_neighb_gathered <- temp_e_neighb %>% 
  gather(key = "question", value = "answer")

temp_e_neighb_gathered$answer <- factor(temp_e_neighb_gathered$answer,
                                        ordered = T,
                                        levels = mylevels_e_neighb)

t_e_neighb <- prop.table(table(temp_e_neighb_gathered$question,
                          temp_e_neighb_gathered$answer), 1)
e_neighb_title <- "Entrepreneurs - Interest in future of neighb. question\n"

lik_plot(t_e_neighb, my_df_names_e_neighb, mylevels_e_neighb, factor_levels_e_neighb, e_neighb_title)

The response to this question is polarized once again, as entrepreneur seem to generally agree with the statement: My interest in contributing to the future of my neighborhood has increased in the past several years.

Distribution of answers for independent variables broken down by interest

In order to have a rough sense of the relationship between, on the one hand, the answers to the indenpendent variables questions and the control variable questions, and on the other hand, the answers to the dependent variable question, in this section we are going to break down the previous vizualisations by category of answer to the dependent variable question. We will try to confirm those relationships quantitatively later on, in the chapter Logistic Regression Modeling.

Let us then break down the distribution of answers related to the independent variables by interest in joining a civic crowdfunding platform.

As for the analysis of the citizens responses, we should keep in mind that people ‘ready to join’ and ‘not interested’ represent only a fraction of the sample.

Here below is a series of plots related to the ‘participative’ questions.

my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels <- c("Not_interested",
                   "Could_be_interested",
                   "Is_ready")

t_p1 <- prop.table(table(survey_df_e$interest, survey_df_e$p1), 1)
p1_title <- "\n\n\n\nFace-to-face interactions are necessary for collective action.\n"

lik_plot(t_p1, my_df_names, mylevels, factor_levels, p1_title)

t_p2 <- prop.table(table(survey_df_e$interest, survey_df_e$p2), 1)
p2_title <- "\n\nFor most citizens on a civic crowdfunding platform,\npersonnally funding a citizen's initiative is enough to\nsatisfy their desire to engage in this project.\n"

lik_plot(t_p2, my_df_names, mylevels, factor_levels, p2_title)

t_p3 <- prop.table(table(survey_df_e$interest, survey_df_e$p3), 1)
p3_title <- "\n\n\nCitizens would like to participate in debates\non local public issues that they care about.\n"

lik_plot(t_p3, my_df_names, mylevels, factor_levels, p3_title)

t_p4 <- prop.table(table(survey_df_e$interest, survey_df_e$p4), 1)
p4_title <- "\n\n\nIt is relatively easy for citizens to find places where\nlocal public issues are collectively debated.\n"

lik_plot(t_p4, my_df_names, mylevels, factor_levels, p4_title)

t_p5 <- prop.table(table(survey_df_e$interest, survey_df_e$p5), 1)
p5_title <- "\n\nI would welcome the possibility for citizens on a crowdfunding platform\nto express their disagreement about some aspects\nof the projects they contribute to financially.\n"

lik_plot(t_p5, my_df_names, mylevels, factor_levels, p5_title)

Here below is a series of plots related to the ‘government’ questions.

my_df_names <- c("interest", "Strongly disagree", "Disagree", "Neither agree nor disagree",
                 "Agree", "Strongly agree")
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
factor_levels <- c("Not_interested",
                   "Could_be_interested",
                   "Is_ready")

t_g1 <- prop.table(table(survey_df_e$interest, survey_df_e$g1), 1)
g1_title <- "\n\n\nCivic crowdfunding represents a direct threat to the funding of\npublic goods and services by the local government.\n"

lik_plot(t_g1, my_df_names, mylevels, factor_levels, g1_title)

t_g2 <- prop.table(table(survey_df_e$interest, survey_df_e$g2), 1)
g2_title <- "\n\n\nI regard crowdfunded citizen's initiatives\nas a sort of_Do-It-Yourself government_.\n"

lik_plot(t_g2, my_df_names, mylevels, factor_levels, g2_title)

t_g3 <- prop.table(table(survey_df_e$interest, survey_df_e$g3), 1)
g3_title <- "\n\n\nThe local government is able to identify\nthe projects that are most wanted by citizens.\n"

lik_plot(t_g3, my_df_names, mylevels, factor_levels, g3_title)

t_g4 <- prop.table(table(survey_df_e$interest, survey_df_e$g4), 1)
g4_title <- "\n\n\nCivic crowdfunding platforms could play a role of intermediary\nbetween citizen's initiatives and the local government.\n"

lik_plot(t_g4, my_df_names, mylevels, factor_levels, g4_title)

t_g5 <- prop.table(table(survey_df_e$interest, survey_df_e$g5), 1)
g5_title <- "\n\n\nGeneral interest issues are addressed more efficiently by\ncrowdfunded citizen's initiatives than by the local government.\n"

lik_plot(t_g5, my_df_names, mylevels, factor_levels, g5_title)

t_g6 <- prop.table(table(survey_df_e$interest, survey_df_e$g6), 1)
g6_title <- "\n\nGeneral interest issues are addressed more efficiently by\ncrowdfunded citizen's initiatives than by\nnon-profit organizations subsidized by the local government.\n"

lik_plot(t_g6, my_df_names, mylevels, factor_levels, g6_title)

Here below is the plot related to the ‘interest in the future of my neighborhood’ question.

t_neighb <- prop.table(table(survey_df_e$interest, survey_df_e$neighb), 1)
neighb_title <- "\n\n\nMy interest in contributing to the future of my neighborhood\nhas increased in the past several years.\n"

lik_plot(t_neighb, my_df_names, mylevels, factor_levels, neighb_title)

Distribution of answers for independent variables broken down by education

Here below is a plot showing the education versus answers to the independent variables questions, broken down by the independent variables questions.

survey_df_e <- survey_df_e %>% 
  mutate(edu_num = unclass(survey_df_e$edu))

tidy_survey_df_e <- survey_df_e %>% 
  select(edu_num, p1:g6, neighb) %>% 
  data.matrix() %>% 
  as_data_frame() %>% 
  gather(key, value, -edu_num)

tidy_survey_df_e %>%
  group_by(key, value) %>%
  summarize(edu_num = mean(edu_num, na.rm = TRUE)) %>%
  ggplot(aes(value, edu_num, color = key)) +
  geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
  geom_point() +
  labs(title = "Relationship btw education and answers to likert scale questions",
       x = "Answer to likert scale question", y = "Mean education") +
  facet_wrap(~factor(key, ordered = T,
                     levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
                                "g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
  theme_ipsum_rc() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(legend.position = "none") +
  theme(plot.margin = unit(c(1,1,1,0), "cm")) +
  theme(
    panel.grid.minor.y = element_blank()
  )

For this plot, the education is coded as an integer value as shown in the following table (see the column edu_num). This table also shows the count for each level (under the column n).

options(scipen = 999, digits = 3)

survey_df_e %>% 
  count(edu, edu_num) %>%
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
edu edu_num n
Primary_or_None 1 1
High_school 2 2
Bachelor 4 5
Master 5 29
PhD 6 2

Here below is a reminder of the questions and their variable names.

meta <- data_frame(question = header_lookup$google_f_header[c(5:15, 29)],
                   variable_name = names(survey_df_e)[c(5:16)])

meta %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
question variable_name
Face-to-face interactions are necessary for collective action. p1
For most citizens on a civic crowdfunding platform, personnally funding a citizen’s initiative is enough to satisfy their desire to engage in this project. p2
Citizens would like to participate in debates on local public issues that they care about. p3
It is relatively easy for citizens to find places where local public issues are collectively debated. p4
I would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially. p5
Civic crowdfunding represents a direct threat to the funding of public goods and services by the local government. g1
I regard crowdfunded citizen’s initiatives as a sort of Do-It-Yourself government. g2
The local government is able to identify the projects that are most wanted by citizens. g3
Civic crowdfunding platforms could play a role of intermediary between citizen’s initiatives and the local government. g4
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by the local government. g5
General interest issues are addressed more efficiently by crowdfunded citizen’s initiatives than by non-profit organizations subsidized by the local government. g6
My interest in contributing to the future of my neighborhood has increased in the past several years. neighb

We can observe from this plot that there is a possible positive relationship between education and the extent to which respondents agreed with the statement: The local government is able to identify the projects that are most wanted by citizens, although this should be confirmed by the statistical modeling that we will perform in the next chapter.

Distribution of answers for independent variables broken down by age

Here below is a plot showing the mean age versus answers to the independent variables questions, broken down by the independent variables questions.

survey_df_e <- survey_df_e %>% 
  mutate(age_num = ifelse(age == "14_and_below", 12,
                          ifelse(age == "14_17", 16,
                                 ifelse(age == "18_24", 20,
                                        ifelse(age == "25_34", 30,
                                               ifelse(age == "35_44", 40,
                                                      ifelse(age == "45_54", 50,
                                                             ifelse(age == "55_64", 60,
                                                                    ifelse(age == "65_74", 70, 80)))))))))

tidy_survey_df_e <- survey_df_e %>% 
  select(age_num, p1:g6, neighb) %>% 
  data.matrix() %>% 
  as_data_frame() %>% 
  gather(key, value, -age_num)

tidy_survey_df_e %>%
  group_by(key, value) %>%
  summarize(age_num = mean(age_num, na.rm = TRUE)) %>%
  ggplot(aes(value, age_num, color = key)) +
  geom_line(size = 1.2, show.legend = FALSE, alpha = 0.5) +
  geom_point() +
  labs(title = "Relationship btw age and answers to likert scale questions",
       x = "Answer to likert scale question", y = "Mean age") +
  facet_wrap(~factor(key, ordered = T,
                     levels = c("p1", "p2", "p3", "p4", "p5", "g1", "g2",
                                "g3", "g4", "g5", "g6", "neighb")), nrow = 3) +
  theme_ipsum_rc() +
  theme(plot.title = element_text(size = 14, hjust = 0.5)) +
  theme(axis.text.y = element_text(hjust = 0)) +
  theme(legend.position = "none") +
  theme(plot.margin = unit(c(1,1,1,0), "cm")) +
  theme(
    panel.grid.minor.y = element_blank()
  )

For this plot, the age group is coded as an integer value as shown in the following table (see the column age_num). This table also shows the count for each level (under the column n).

options(scipen = 999, digits = 3)

survey_df_e %>% 
  count(age, age_num) %>%
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = F, position = "left", font_size = 14)
age age_num n
18_24 20 3
25_34 30 14
35_44 40 6
45_54 50 7
55_64 60 7
65_74 70 1
75_plus 80 1

From this plot we cannot spot any clear trend.

Logistic Regression modeling

Introduction

Similar to what we did in Part I for the citizens, we are going to fit a series of multiple binary logistic regression models in order to see which predictors (in our case the independent variables) are associated with the likelihood of entrepreneurs to be interested in joining a civic crowdfunding platforms.

Here again, our dependent variable has more than two classes (three in this case: Not_interested, Could_be_interested and Is_Ready). Therefore, we will fit several models after having split our dependent variable classes into two groups, in several ways.

Logistic regression with Interest as the target variable

Model 1

In this section and also in the next one, we will fit a model with Interest (in joining a civic crowdfunding platform) as the target variable and all of the independent variables, as well as the control variables, as the explanatory variables.

For the first model, we will split the Interest variable into the two following groups:

  • Class 0: Not_interested;
  • Class 1: The rest of the possible classes combined (Could_be_interested and Is_Ready).

Also, we filter out observations with gender other than Female and Male. Then, Females are represented by the integer value 1 and Males are represented by the integer value 2.

Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly_agree).

Answers to Likert scale questions range from 1 (Strongly_disagree) to 5 (Strongly agree).

Answers to the question about the age of the respondent range from 1 (Under_14) to 9 (75_plus).

Answers to the education question range from 1 (Primary_or_None) to 6 (PhD).

Finally, answers to the location and language questions have three levels:

  • Brussels = 1, Flanders = 2, Wallonia = 3;
  • English = 1, Dutch = 2, French = 3.

Thus, the first few rows of the input to the model look like this:

survey_df_e_mat <- survey_df_e %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  mutate(interest_binary = ifelse(interest == "Not_interested", 0, 1)) %>% 
  select(interest_binary, language, p1:edu) %>% 
  data.matrix() %>% 
  as_data_frame()

survey_df_e_mat %>% 
  head() %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = T, font_size = 14)
interest_binary language p1 p2 p3 p4 p5 g1 g2 g3 g4 g5 g6 neighb loc gender age edu
1 3 5 3 5 1 3 4 3 1 3 3 1 5 1 2 6 5
1 3 4 4 4 2 2 2 4 2 4 3 4 4 1 2 5 1
1 3 4 3 5 2 5 2 2 1 2 3 2 4 2 2 5 4
1 1 5 5 5 2 3 5 4 4 4 4 4 4 1 1 5 6
1 3 4 4 3 4 2 2 2 4 3 2 2 5 3 2 7 5
1 3 3 4 3 2 4 3 4 4 4 4 4 4 1 1 4 2

Let us now examine the output of the model:

glm_e_1 <- glm(interest_binary ~ .,  family = "binomial", 
                  data = survey_df_e_mat)

summary(glm_e_1)
## 
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_e_mat)
## 
## Deviance Residuals: 
##          Min            1Q        Median            3Q           Max  
## -0.000008053   0.000000021   0.000000021   0.000002826   0.000007360  
## 
## Coefficients:
##                Estimate  Std. Error z value Pr(>|z|)
## (Intercept)      43.232 3077549.046       0        1
## language          5.733  358960.435       0        1
## p1                3.129  187509.506       0        1
## p2                4.139  212730.988       0        1
## p3               -7.843  143647.712       0        1
## p4               -2.787  119859.452       0        1
## p5               -1.936  237211.576       0        1
## g1                3.348  328149.285       0        1
## g2               -1.828  173352.396       0        1
## g3               -1.600  189327.074       0        1
## g4                7.895  238950.246       0        1
## g5               -0.410  181919.554       0        1
## g6               -9.881  142554.362       0        1
## neighb           -3.069  317087.240       0        1
## loc               0.296  458929.396       0        1
## gender            3.281  834734.769       0        1
## age              -1.442  279644.120       0        1
## edu               2.479   82403.834       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9.24862260353271  on 37  degrees of freedom
## Residual deviance: 0.00000000037884  on 20  degrees of freedom
## AIC: 36
## 
## Number of Fisher Scoring iterations: 25

In the present case, all p-values are about 1, so this model is useless. In fact, the model did not converge. This is probably due to the high imbalance in the response variable. Indeed, as already mentionned, the class Not_interested represents only 5% of the sample or 1 out of 39 observations.

Therefore, we cannot infer anything regarding factors that are associated with the likelihood of entrepreneurs not to be interested in joining a civic crowdfunding platform as opposed to being somewhat interested (Could_be_interested) or definetly interested (Is_Ready).

Model 2

In this second model, we are filtering out the observations where entrepeneurs are not interested in joining a civic crowdfunding platform, therefore we are now focusing on a subset of the data. Then we split the observations into the two following groups, related to the interest variable:

  • Class 0: Could_be_interested;
  • Class 1: Is_Ready.

The rest of the model is similar to the model 1.

survey_df_e_mat <- survey_df_e %>% 
  filter(gender %in% c("Female", "Male")) %>% 
  filter(interest != "Not_interested") %>% 
  mutate(interest_binary = ifelse(interest == "Is_ready", 1, 0)) %>% 
  select(interest_binary, language, p1:edu) %>% 
  data.matrix() %>% 
  as_data_frame()

glm_e_2 <- glm(interest_binary ~ .,  family = "binomial", 
                  data = survey_df_e_mat)

summary(glm_e_2)
## 
## Call:
## glm(formula = interest_binary ~ ., family = "binomial", data = survey_df_e_mat)
## 
## Deviance Residuals: 
##          Min            1Q        Median            3Q           Max  
## -0.000018263  -0.000003896  -0.000000021  -0.000000021   0.000017733  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -274.70 2119914.75       0        1
## language        -50.46  359104.27       0        1
## p1               27.77  262051.60       0        1
## p2              -34.72  184517.74       0        1
## p3                1.86  295410.54       0        1
## p4               40.90  202430.88       0        1
## p5               20.45  122281.08       0        1
## g1               30.20  122849.81       0        1
## g2              -23.19  250528.64       0        1
## g3               -4.76  186271.93       0        1
## g4               24.36  128692.86       0        1
## g5               -9.68  109096.52       0        1
## g6               -4.43  141918.69       0        1
## neighb           20.86  248870.19       0        1
## loc              16.70  489558.79       0        1
## gender           80.43  136489.25       0        1
## age             -25.97   61604.18       0        1
## edu               8.59  332359.40       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 35.8933405491689  on 36  degrees of freedom
## Residual deviance:  0.0000000017871  on 19  degrees of freedom
## AIC: 36
## 
## Number of Fisher Scoring iterations: 25

Again, the model did not converge. This is maybe due to the fact that the sample size for the entrepreneurs is too small overall to be able to build a logistic regression model.

Conclusions

It is now time to draw conclusions from this study.

First, we should mention that it is not possible to generalize those conclusions to the population of interest, since the respondents were not sampled at random. Furthermore, the sample is biased, as evidenced by the over-representation of respondents with a masters degree.

Let us remind ourselves the hypotheses that we hoped to confirm with this study.

  • Hypotheses related to the entrepreneurs:
    • H1: A more participative platform is positively correlated with the interest to join a civic crowdfunding platform.
    • H2: A positive opinion on projects that are co-created by social entrepreneurs and the local government is positively correlated with the interest to join a civic crowdfunding platform.
  • Hypotheses related to the citizens:
    • H3: The wish for more participation in debates on public issues is positively correlated with the interest to support citizens initiatives.
    • H4: A positive opinion on projects that are co-created by social entrepreneurs and the local government through civic crowdfunding platforms is positively correlated with the interest to support citizens initiatives.

Based on the observations we have made throughout this document, we can conclude the following:

  1. For the entrepreneurs (active members of a local citizens initiative, social project or social organization):
    1. Unfortunately, the ‘response rate’ for the entrepreneurs was not sufficient for us to be able to fit a statistical model, thus we can only rely on summary statistics and visualizations to draw conclusions here.
    2. We observed that most entrepreneurs recognize in general the need for face-to-face interactions, most of them think that citizens would like to participate in debates on local public issues, and they would welcome the possibility for citizens on a crowdfunding platform to express their disagreement about some aspects of the projects they contribute to financially.
    3. However, we cannot use those results to validate H1 since they could not be compared with the entrepreneurs’ interest to join a civic crowdfunding platform.
    4. We also observed that entrepreneurs generally back the idea of civic crowdfunding platform playing a role of intermediary between citizens initiatives and the local authorities. This is an interesting insight, although once again this could not be correlated with the entrepreneurs’ interest to join a civic crowdfunding platform, therefore it is not possible to validate H2.
  2. For the citizens:
    1. We observed that most respondents think that it is difficult for citizens to find places where local public issues are collectively debated. Moreover, they seem to have a positive opinion about civic crowdfunding platforms, since there seems to be a consensus about the fact that the latter don’t represent a threat to funding of services by the government.
    2. They generally agree with the entrepreneurs about the role of intermediary between citizens initiatives and the local authorities that civic crowdfunding platforms could play.
    3. Respondents who want to participate in debates on local public issues are more likely to be willing to fund or volunteer in a citizens initiative.
    4. It is those who reported to be only potentially interested to fund or volunteer in a citizen initiative (as opposed to those who reported to be ready to fund or volunteer) that reported the most enthusiasm regarding the role of intermediary between citizen initiatives and local authorities that civic crowdfunding platforms could play. Perhaps would they be ready to fund or volunteer once this role is fulfilled?
    5. Similarly, respondents only potentially interested to fund or volunteer, in particular those willing to volunteer, were more likely to report that they would be more inclined to contribute financially if they had the possibility to express their disagreement on the projects they would support. Although, once again, we cannot draw causal inferences in this study, that makes a lot of sense.
    6. Respondents who agree that personally funding a citizens initiative is enough to satisfy their desire to engage in this project were more likely to be willing to fund rather than to volunteer. That is not such a surprising result after all.
    7. Respondents who reported that they would like to participate in debates on local public issues are more likely to be already investing in citizens initiatives than not investing at all. This is quite an interesting insight, which backs our hypothesis H3.

So, overall, although we are not claiming here that all of our hypotheses are verified, there are multiple indications that a civic crowdfunding platform such as BrusselsTogether would be more attractive to both citizens and entrepreneurs should it strive for being a truly participative platform as well as a communication channel, an intermediary between the initiatives and the local authorities.