coded_df <- readRDS("C:/Users/court/Google Drive/Research/SRHS/SRHS_coded.RDS")
coded_df <- coded_df %>%
mutate_if(
is.labelled,
~(as_factor(.)))
df <- coded_df %>% select(
ll_finrole2,
ll_size,
ll_comm_unit,
bp_advert_b,
bp_screen_b,
bp_dec_flex_b,
bp_dec_standard_b,
bp_dec_offer_b,
bp_vouch_b,
bp_crim_b,
bp_dis_b,
bp_keep_pct_num,
bp_install_b,
bp_comm_unit_fee_app_b,
bp_comm_unit_fee_sec_b,
bp_comm_unit_fee_clean_b,
bp_comm_unit_fee_park_b,
bp_fair_b,
bp_comm_unit_fee_incl_b,
bp_comm_unit_deposit_b,
bp_rent_raise_b,
bp_rent_raise_num,
bp_terminate_b,
bp_terminate_num,
bp_income_b,
bp_income_num,
ll_remain_1or5yr_b,
ll_remain_reg
)
# bp_court_b,
# bp_keep_rsn,
# bp_terminate_paylate,
# bp_terminate_pay,
# bp_rent_raise_rising,
# bp_rent_raise_regulation,
# bp_rent_raise_market,
# bp_rent_raise_s_tax,
# bp_rent_raise_s_repair,
# bp_rent_raise_s_new,
# bp_rent_raise_s_market,
# bp_rent_raise_s_reg,
df <- df %>% mutate(
ll_size = relevel(as.factor(ll_size), ref="small"),
ll_comm_unit = relevel(as.factor(ll_comm_unit), ref="2 bedroom"),
ll_finrole2 = as.factor(ll_finrole2)
)
df <- df %>% rename(
"Financial_role_" = ll_finrole2,
"Common_unit_" = ll_comm_unit,
"Advertising" = bp_advert_b,
"Screening" = bp_screen_b,
"Flexible_decision" = bp_dec_flex_b,
"Standard_criteria" = bp_dec_standard_b,
"Offers_to_nonstandard" = bp_dec_offer_b,
"Rent_to_vouchers" = bp_vouch_b,
"Rent_to_criminal" = bp_crim_b,
"Disability_accom." = bp_dis_b,
"Deposit_pct_kept" = bp_keep_pct_num,
"Allow_installments" = bp_install_b,
"Fair_market_price" = bp_fair_b,
"App_fee" = bp_comm_unit_fee_app_b,
"Security_deposit" = bp_comm_unit_fee_sec_b,
"Cleaning_fee" = bp_comm_unit_fee_clean_b,
"Parking_fee" = bp_comm_unit_fee_park_b,
"Utilities_included" = bp_comm_unit_fee_incl_b,
"Last_month_required" = bp_comm_unit_deposit_b,
"Raised_rent" = bp_rent_raise_b,
"Raised_rent_pct" = bp_rent_raise_num,
"Ever_terminated" = bp_terminate_b,
"Num_of_terminations" = bp_terminate_num,
"Income_above_avg_Seattle" = bp_income_b,
"Income_estimate" = bp_income_num,
"Still_mngr1_5yrs" = ll_remain_1or5yr_b,
"Stop_burden_regs" = ll_remain_reg)
# "Terminated_use_courts" = bp_court_b,
# --------------------------
df <- df %>%
mutate_all(as.factor)
cats = apply(df, 2, function(x) nlevels(as.factor(x)))
bp_mca <- MCA(df, graph = FALSE, na="average")
Below are two MCA analyses:
MCA #1: The first analysis was done for our meeting on November 19, 2021. This was an MCA on all business practices
MCA #2: The second analysis was done for our meeting on December 2, 2021. This was an MCA on the variables that loaded on Dimension 1 in the results of MCA #1.
Note: I have included Kyle’s comments on MCA #1 in this document to keep everyone up to date. Also, there is one change I’ve made to all analyses below: I’ve taken out the variable ‘terminated through courts’ as this had missing values for most respondents who never terminated. This created minor changes to the results, but overall things remained relatively unchanged.
I did a Multiple Correspondence Analysis (MCA) on all business practices. MCA is similar to PCA but for categorical data.
Some business practices were left out if they weren’t asked of all/most respondents (e.g. “If you raised the rent, why?”). Some landlord characteristics were also left in this analysis, including landlord size and financial role.
Here are the Eigen values of the first few dimensions and the percent of variance explained.
eig.val <- get_eigenvalue(bp_mca)
head(eig.val)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.14402592 7.762393 7.762393
## Dim.2 0.09478067 5.108281 12.870674
## Dim.3 0.07604360 4.098431 16.969105
## Dim.4 0.07135754 3.845872 20.814977
## Dim.5 0.06448769 3.475616 24.290593
## Dim.6 0.06127000 3.302196 27.592789
First dimension explains 8% of variance.
fviz_eig(bp_mca)
# column coordinates
#head(bp_mca$var$coord)
# data frames for ggplot
# bp_mca_cats_df = data.frame(bp_mca$var$coord, Variable = rep(names(cats), cats))
# bp_mca_vars_df = data.frame(bp_mca$var)
#
# bp_mca_obs_df = data.frame(bp_mca$ind$coord)
# # plot of variable categories
# ggplot(data = bp_mca_vars_df,
# aes(x = Dim.1,
# y = Dim.2,
# label = rownames(bp_mca_vars_df))) +
# geom_hline(yintercept = 0, colour = "gray70") +
# geom_vline(xintercept = 0,
# colour = "gray70") +
# geom_point(label=rownames(bp_mca_vars_df)) +
# theme(legend.position = "none") +
# ggtitle("MCA plot of variables using R package FactoMineR")
Here is a quick plot of variables along the first two dimensions. You can see the variables that contribute the most to these dimensions, such as landlord size (look at x-axis = 0.6) and renter income (y axis = 0.5).
fviz_mca_var(bp_mca, choice = "mca.cor",
col.var="darkgreen",
repel = TRUE, # Avoid text overlapping (slow)
ggtheme = theme_minimal(),
xlim = c(0,.6),
ylim = c(0,.5))
The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):
results <- data.frame(bp_mca$var$eta2)
results[ "vars" ] <- rownames(results)
results %>% filter(Dim.1>0.05) %>%
ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
geom_point()+
labs(x="",
y="",
title="Contribution to Dimension 1") +
theme_classic() +
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))
The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):
results %>% filter(Dim.2>0.05) %>%
ggplot(aes(x=reorder(vars, -Dim.2), y=Dim.2)) +
geom_point()+
labs(x="",
y="",
title="Contribution to Dimension 2") +
theme_classic() +
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))
The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):
# Contributions of variables to PC1
fviz_contrib(bp_mca, choice = "var", axes = 1, top = 20)
The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):
# Contributions of variables to PC2
fviz_contrib(bp_mca, choice = "var", axes = 2, top = 20)
This plot shows how cases are distributed along the dimensions. Big and small landlord are well differentiated along dimension 1 (x-axis), while not well differentiated along dimension 2 (y-axis).
fviz_pca_ind(bp_mca,
label = "none", # hide individual labels
habillage = df$ll_size, # color by groups
palette = c("#00AFBB", "#E7B800"),
addEllipses = TRUE # Concentration ellipses
)
This analysis focuses on all of the variables that loaded on the 1st dimension of MCA #1 (per Kyle’s suggestions, please see above). I selected these variables only, and ran another MCA.
The objective of this analysis is to see which variables fit closely together and to see if we should create an index of some of the business practices.
The variables kept for this analysis were:
df_2<- df %>% select(ll_size,
"Financial_role_",
"Rent_to_vouchers",
"Rent_to_criminal",
"Raised_rent_pct",
"Ever_terminated" ,
"Num_of_terminations",
"Disability_accom.",
"Income_estimate")
# "Terminated_use_courts",
cats = apply(df_2, 2, function(x) nlevels(as.factor(x)))
bp_mca_2 <- MCA(df_2, graph = FALSE, na="average")
Here are the Eigen values of the first few dimensions and the percent of variance explained.
eig.val <- get_eigenvalue(bp_mca_2)
head(eig.val)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.3225740 15.033827 15.03383
## Dim.2 0.1721780 8.024495 23.05832
## Dim.3 0.1249742 5.824524 28.88285
## Dim.4 0.1220761 5.689458 34.57230
## Dim.5 0.1192670 5.558538 40.13084
## Dim.6 0.1147635 5.348649 45.47949
First dimension explains 15% of variance.
fviz_eig(bp_mca_2)
Here is a plot of variables along the first two dimensions.
fviz_mca_var(bp_mca_2, choice = "mca.cor",
col.var="darkgreen",
repel = TRUE, # Avoid text overlapping (slow)
ggtheme = theme_minimal(),
xlim = c(0,.6),
ylim = c(0,.5))
Here are the ordered contributions to Dimension 1 from highest to lowest contribution:
results_2 <- data.frame(bp_mca_2$var$eta2)
results_2[ "vars" ] <- rownames(results_2)
results_2 %>% filter(Dim.1>0.05) %>%
ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
geom_point()+
labs(x="",
y="",
title="Contribution to Dimension 1") +
theme_classic() +
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))
Comments on MCA #1
Comments from Kyle:
Three general observations to get this conversation rolling:
The two dimensions that you pull out and analyze are not only statistically distinct, but also conceptually distinct. The first dimension seems to be about management practices while the second is a little more of a hodge-podge dominated by financial positionality. This conceptual distinction makes sense to me.
Dimension 2 appears to be of only marginal utility, with an eigenvalue below 1 (thus hard to defend in published work).
Nevertheless, it is really interesting that large and small landlords are highly differentiated on the first dimension but not so much on the second. This might hint at an important challenge to popular assumptions about small landlords.
In terms of next steps, you might try running an MCA on just the variables that load highly on Dimension 1 to get a sense of how tightly they fit together without the muddy waters of the less central variables, and give us a sense of whether creating an index of permissive business practices might be worthwhile. This is essentially shifting us from an exploratory to a confirmatory factor analysis.