coded_df <- readRDS("C:/Users/court/Google Drive/Research/SRHS/SRHS_coded.RDS")
coded_df <- coded_df %>%
  mutate_if(
        is.labelled,
        ~(as_factor(.)))
df <- coded_df %>% select(
   ll_finrole2,
   ll_size,
   ll_comm_unit,
   bp_advert_b,
   bp_screen_b,
   bp_dec_flex_b,
   bp_dec_standard_b,
   bp_dec_offer_b,
   bp_vouch_b,
   bp_crim_b,
   bp_dis_b,
   bp_keep_pct_num,
   bp_install_b,
   bp_comm_unit_fee_app_b,
   bp_comm_unit_fee_sec_b,
   bp_comm_unit_fee_clean_b,
   bp_comm_unit_fee_park_b,
   bp_fair_b,
   bp_comm_unit_fee_incl_b,
   bp_comm_unit_deposit_b,
   bp_rent_raise_b,
   bp_rent_raise_num,
   bp_terminate_b,
   bp_terminate_num,
   bp_income_b,
   bp_income_num,
   ll_remain_1or5yr_b,
   ll_remain_reg
   )
#    bp_court_b,

   # bp_keep_rsn,
   # bp_terminate_paylate,
   # bp_terminate_pay,
   # bp_rent_raise_rising,
   # bp_rent_raise_regulation,
   # bp_rent_raise_market,
   # bp_rent_raise_s_tax,
   # bp_rent_raise_s_repair,
   # bp_rent_raise_s_new,
   # bp_rent_raise_s_market,
   # bp_rent_raise_s_reg,

df <- df %>% mutate(
  ll_size = relevel(as.factor(ll_size), ref="small"),
  ll_comm_unit = relevel(as.factor(ll_comm_unit), ref="2 bedroom"),
  ll_finrole2 = as.factor(ll_finrole2)
)

df <- df %>% rename(
                    "Financial_role_" = ll_finrole2,
                    "Common_unit_" = ll_comm_unit,
                    "Advertising" = bp_advert_b,
                    "Screening" = bp_screen_b,
                    "Flexible_decision" = bp_dec_flex_b,
                    "Standard_criteria" = bp_dec_standard_b,
                    "Offers_to_nonstandard" = bp_dec_offer_b,
                    "Rent_to_vouchers" = bp_vouch_b,
                    "Rent_to_criminal" = bp_crim_b,
                    "Disability_accom." = bp_dis_b,
                    "Deposit_pct_kept" = bp_keep_pct_num,
                    "Allow_installments" = bp_install_b,
                    "Fair_market_price" = bp_fair_b,
                    "App_fee"   =   bp_comm_unit_fee_app_b,
                    "Security_deposit" =    bp_comm_unit_fee_sec_b,
                    "Cleaning_fee"  =   bp_comm_unit_fee_clean_b,
                    "Parking_fee"   =   bp_comm_unit_fee_park_b,
                    "Utilities_included" = bp_comm_unit_fee_incl_b,
                    "Last_month_required" = bp_comm_unit_deposit_b,
                    "Raised_rent" = bp_rent_raise_b,
                    "Raised_rent_pct" = bp_rent_raise_num,
                     "Ever_terminated" = bp_terminate_b,
                     "Num_of_terminations" = bp_terminate_num,
                     "Income_above_avg_Seattle" = bp_income_b,
                     "Income_estimate" = bp_income_num,
                     "Still_mngr1_5yrs" = ll_remain_1or5yr_b,
                     "Stop_burden_regs" = ll_remain_reg)
                   
#                      "Terminated_use_courts" = bp_court_b,


# --------------------------

df <- df %>%
  mutate_all(as.factor)

Outline

cats = apply(df, 2, function(x) nlevels(as.factor(x)))

bp_mca <- MCA(df, graph = FALSE, na="average")

Below are two MCA analyses:

  1. MCA #1: The first analysis was done for our meeting on November 19, 2021. This was an MCA on all business practices

  2. MCA #2: The second analysis was done for our meeting on December 2, 2021. This was an MCA on the variables that loaded on Dimension 1 in the results of MCA #1.

Note: I have included Kyle’s comments on MCA #1 in this document to keep everyone up to date. Also, there is one change I’ve made to all analyses below: I’ve taken out the variable ‘terminated through courts’ as this had missing values for most respondents who never terminated. This created minor changes to the results, but overall things remained relatively unchanged.

MCA #1

I did a Multiple Correspondence Analysis (MCA) on all business practices. MCA is similar to PCA but for categorical data.

Some business practices were left out if they weren’t asked of all/most respondents (e.g. “If you raised the rent, why?”). Some landlord characteristics were also left in this analysis, including landlord size and financial role.

BP dimensions

Here are the Eigen values of the first few dimensions and the percent of variance explained.

eig.val <- get_eigenvalue(bp_mca)
head(eig.val)
##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.14402592         7.762393                    7.762393
## Dim.2 0.09478067         5.108281                   12.870674
## Dim.3 0.07604360         4.098431                   16.969105
## Dim.4 0.07135754         3.845872                   20.814977
## Dim.5 0.06448769         3.475616                   24.290593
## Dim.6 0.06127000         3.302196                   27.592789

First dimension explains 8% of variance.

fviz_eig(bp_mca)

# column coordinates
#head(bp_mca$var$coord)

# data frames for ggplot
# bp_mca_cats_df = data.frame(bp_mca$var$coord, Variable = rep(names(cats), cats))
# bp_mca_vars_df = data.frame(bp_mca$var)
# 
# bp_mca_obs_df = data.frame(bp_mca$ind$coord)
# # plot of variable categories
# ggplot(data = bp_mca_vars_df, 
#        aes(x = Dim.1, 
#            y = Dim.2, 
#            label = rownames(bp_mca_vars_df))) +
#    geom_hline(yintercept = 0, colour = "gray70") +
#    geom_vline(xintercept = 0, 
#               colour = "gray70") +
#    geom_point(label=rownames(bp_mca_vars_df)) +
#    theme(legend.position = "none") +
#    ggtitle("MCA plot of variables using R package FactoMineR")

Here is a quick plot of variables along the first two dimensions. You can see the variables that contribute the most to these dimensions, such as landlord size (look at x-axis = 0.6) and renter income (y axis = 0.5).

fviz_mca_var(bp_mca, choice = "mca.cor", 
             col.var="darkgreen",
            repel = TRUE, # Avoid text overlapping (slow)
            ggtheme = theme_minimal(),
            xlim = c(0,.6),
            ylim = c(0,.5))

Variable contributions

Variable contributions to dimension 1

The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):

  1. Landlord size
  2. Rents to vouchers
  3. Terminated number
  4. Tenant income
  5. Ever terminated
  6. Rents to criminal history
  7. % Raised the rent last year
  8. Tenant income above avg. HH ($50,000)
  9. Ever Raised rent in last year
  10. Financial role
  11. Tenant ever requestested disability accommodation
results <- data.frame(bp_mca$var$eta2)
results[ "vars" ] <- rownames(results)

results %>% filter(Dim.1>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 1") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1)) 

Variable contributions to dimension 2

The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):

  1. Tenant income
  2. Tenant income above avg. HH ($50,000)
  3. % Raised rent in last year
  4. Ever Raised rent in last year
  5. Decisions made on standard criteria
  6. Word-of-mouth advertising
  7. Screening
  8. Security deposit
  9. Rent above fair market price
  10. % Deposit kept
  11. Flexible lease decisions
results %>% filter(Dim.2>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.2), y=Dim.2)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 2") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1)) 

Categorical contributions

Categorical contributions to dimension 1

The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):

  1. Landlord size: big
  2. Rents to vouchers: yes
  3. Terminated number: 1 or more
  4. Terminated number: 1-5 (coded as 3)
  5. Rents to criminal history: yes
  6. Ever had tenant request disability accommodation: yes
  7. Financial role: Primary income
  8. Tenant income: <=$50,000 (binary)
  9. Raised the rent last year: No
  10. Landlord size: small
# Contributions of variables to PC1
fviz_contrib(bp_mca, choice = "var", axes = 1, top = 20)

Categorical contributions to dimension 2

The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):

  1. Tenant income: <=$50,000 (binary)
  2. Tenant income: $25,000-50,000
  3. % raised rent in last year: 0%
  4. Raised the rent last year: No
  5. Word-of-mouth advertising: Yes
  6. Decisions based on standard criteria: No
  7. Tenant income: >$50,000 (binary)
  8. Charge security deposit: Yes
  9. Tenant income: <$25,000
  10. Raised the rent last year: Yes
# Contributions of variables to PC2
fviz_contrib(bp_mca, choice = "var", axes = 2, top = 20)

Individual contributions

This plot shows how cases are distributed along the dimensions. Big and small landlord are well differentiated along dimension 1 (x-axis), while not well differentiated along dimension 2 (y-axis).

fviz_pca_ind(bp_mca,
             label = "none", # hide individual labels
             habillage = df$ll_size, # color by groups
             palette = c("#00AFBB", "#E7B800"),
             addEllipses = TRUE # Concentration ellipses
             )

Comments on MCA #1

Comments from Kyle:

Three general observations to get this conversation rolling:

  1. The two dimensions that you pull out and analyze are not only statistically distinct, but also conceptually distinct. The first dimension seems to be about management practices while the second is a little more of a hodge-podge dominated by financial positionality. This conceptual distinction makes sense to me.

  2. Dimension 2 appears to be of only marginal utility, with an eigenvalue below 1 (thus hard to defend in published work).

  3. Nevertheless, it is really interesting that large and small landlords are highly differentiated on the first dimension but not so much on the second. This might hint at an important challenge to popular assumptions about small landlords.

In terms of next steps, you might try running an MCA on just the variables that load highly on Dimension 1 to get a sense of how tightly they fit together without the muddy waters of the less central variables, and give us a sense of whether creating an index of permissive business practices might be worthwhile. This is essentially shifting us from an exploratory to a confirmatory factor analysis.

MCA #2

This analysis focuses on all of the variables that loaded on the 1st dimension of MCA #1 (per Kyle’s suggestions, please see above). I selected these variables only, and ran another MCA.

The objective of this analysis is to see which variables fit closely together and to see if we should create an index of some of the business practices.

The variables kept for this analysis were:

  1. Landlord size
  2. Rents to vouchers
  3. Ever terminated
  4. Terminated number
  5. Tenant income
  6. Rents to criminal history
  7. Settle terminations in court
  8. % Raised the rent last year
  9. Financial role
  10. Tenant ever requested disability accommodation
df_2<- df %>% select(ll_size,
                    "Financial_role_",
                    "Rent_to_vouchers",
                    "Rent_to_criminal",
                    "Raised_rent_pct",
                     "Ever_terminated" ,
                     "Num_of_terminations",
                    "Disability_accom.",
                     "Income_estimate")

#                     "Terminated_use_courts",
cats = apply(df_2, 2, function(x) nlevels(as.factor(x)))

bp_mca_2 <- MCA(df_2, graph = FALSE, na="average")

BP dimensions

Here are the Eigen values of the first few dimensions and the percent of variance explained.

eig.val <- get_eigenvalue(bp_mca_2)
head(eig.val)
##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1  0.3225740        15.033827                    15.03383
## Dim.2  0.1721780         8.024495                    23.05832
## Dim.3  0.1249742         5.824524                    28.88285
## Dim.4  0.1220761         5.689458                    34.57230
## Dim.5  0.1192670         5.558538                    40.13084
## Dim.6  0.1147635         5.348649                    45.47949

First dimension explains 15% of variance.

fviz_eig(bp_mca_2)

Here is a plot of variables along the first two dimensions.

  • Number of terminations and ever terminated look like they hang together, we may not need both of them.
  • Terminations, landlord size, and vouchers are the most important variables
  • Interestingly, the two financial vars (raised rent, income) contribute the least to the first dimension.
fviz_mca_var(bp_mca_2, choice = "mca.cor", 
             col.var="darkgreen",
            repel = TRUE, # Avoid text overlapping (slow)
            ggtheme = theme_minimal(),
            xlim = c(0,.6),
            ylim = c(0,.5))

Variable contributions

Variable contributions to dimension 1

Here are the ordered contributions to Dimension 1 from highest to lowest contribution:

  1. Terminated number
  2. Ever terminated
  3. Landlord size
  4. Rents to vouchers
  5. Rents to criminal history
  6. Tenant ever requested disability accommodation
  7. Financial role
  8. Tenant income
  9. % Raised the rent last year
results_2 <- data.frame(bp_mca_2$var$eta2)
results_2[ "vars" ] <- rownames(results_2)

results_2 %>% filter(Dim.1>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 1") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))