coded_df <- readRDS("C:/Users/court/Google Drive/Research/SRHS/SRHS_coded.RDS")
coded_df <- coded_df %>%
  mutate_if(
        is.labelled,
        ~(as_factor(.)))

df <- coded_df %>% select(
   ll_finrole2,
   ll_size,
   ll_comm_unit,
   bp_advert_b,
   bp_screen_b,
   bp_dec_flex_b,
   bp_dec_standard_b,
   bp_dec_offer_b,
   bp_vouch_b,
   bp_crim_b,
   bp_dis_b,
   bp_keep_pct_num,
   bp_install_b,
   bp_comm_unit_fee_app_b,
   bp_comm_unit_fee_sec_b,
   bp_comm_unit_fee_clean_b,
   bp_comm_unit_fee_park_b,
   bp_fair_b,
   bp_comm_unit_fee_incl_b,
   bp_comm_unit_deposit_b,
   bp_rent_raise_b,
   bp_rent_raise_num,
   bp_terminate_b,
   bp_terminate_num,
   bp_income_b,
   bp_income_num,
   ll_remain_1or5yr_b,
   ll_remain_reg
   )
#    bp_court_b,

   # bp_keep_rsn,
   # bp_terminate_paylate,
   # bp_terminate_pay,
   # bp_rent_raise_rising,
   # bp_rent_raise_regulation,
   # bp_rent_raise_market,
   # bp_rent_raise_s_tax,
   # bp_rent_raise_s_repair,
   # bp_rent_raise_s_new,
   # bp_rent_raise_s_market,
   # bp_rent_raise_s_reg,

df <- df %>% mutate(
  ll_size = relevel(as.factor(ll_size), ref="small"),
  ll_comm_unit = relevel(as.factor(ll_comm_unit), ref="2 bedroom"),
  ll_finrole2 = as.factor(ll_finrole2)
)

df <- df %>% rename(
                    "Financial_role_" = ll_finrole2,
                    "Common_unit_" = ll_comm_unit,
                    "Advertising" = bp_advert_b,
                    "Screening" = bp_screen_b,
                    "Flexible_decision" = bp_dec_flex_b,
                    "Standard_criteria" = bp_dec_standard_b,
                    "Offers_to_nonstandard" = bp_dec_offer_b,
                    "Rent_to_vouchers" = bp_vouch_b,
                    "Rent_to_criminal" = bp_crim_b,
                    "Disability_accom." = bp_dis_b,
                    "Deposit_pct_kept" = bp_keep_pct_num,
                    "Allow_installments" = bp_install_b,
                    "Fair_market_price" = bp_fair_b,
                    "App_fee"   =   bp_comm_unit_fee_app_b,
                    "Security_deposit" =    bp_comm_unit_fee_sec_b,
                    "Cleaning_fee"  =   bp_comm_unit_fee_clean_b,
                    "Parking_fee"   =   bp_comm_unit_fee_park_b,
                    "Utilities_included" = bp_comm_unit_fee_incl_b,
                    "Last_month_required" = bp_comm_unit_deposit_b,
                    "Raised_rent" = bp_rent_raise_b,
                    "Raised_rent_pct" = bp_rent_raise_num,
                     "Ever_terminated" = bp_terminate_b,
                     "Num_of_terminations" = bp_terminate_num,
                     "Income_above_avg_Seattle" = bp_income_b,
                     "Income_estimate" = bp_income_num,
                     "Still_mngr1_5yrs" = ll_remain_1or5yr_b,
                     "Stop_burden_regs" = ll_remain_reg)
                   
#                      "Terminated_use_courts" = bp_court_b,


# --------------------------

df <- df %>%
  mutate_all(as.factor)

Outline

cats = apply(df, 2, function(x) nlevels(as.factor(x)))

bp_mca <- MCA(df, graph = FALSE, na="average")

Below are two MCA analyses:

MCA #1: The first analysis was done for our meeting on November 19, 2021. This was an MCA on all business practices
MCA #2: The second analysis was done for our meeting on December 2, 2021. This was an MCA on the variables that loaded on Dimension 1 in the results of MCA #1.

Note: I have included Kyle’s comments on MCA #1 in this document to keep everyone up to date. Also, there is one change I’ve made to all analyses below: I’ve taken out the variable ‘terminated through courts’ as this had missing values for most respondents who never terminated. This created minor changes to the results, but overall things remained relatively unchanged.

MCA #1

I did a Multiple Correspondence Analysis (MCA) on all business practices. MCA is similar to PCA but for categorical data.

Some business practices were left out if they weren’t asked of all/most respondents (e.g. “If you raised the rent, why?”). Some landlord characteristics were also left in this analysis, including landlord size and financial role.

BP dimensions

Here are the Eigen values of the first few dimensions and the percent of variance explained.

eig.val <- get_eigenvalue(bp_mca)
head(eig.val)

##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.14402592         7.762393                    7.762393
## Dim.2 0.09478067         5.108281                   12.870674
## Dim.3 0.07604360         4.098431                   16.969105
## Dim.4 0.07135754         3.845872                   20.814977
## Dim.5 0.06448769         3.475616                   24.290593
## Dim.6 0.06127000         3.302196                   27.592789

First dimension explains 8% of variance.

fviz_eig(bp_mca)

# column coordinates
#head(bp_mca$var$coord)

# data frames for ggplot
# bp_mca_cats_df = data.frame(bp_mca$var$coord, Variable = rep(names(cats), cats))
# bp_mca_vars_df = data.frame(bp_mca$var)
# 
# bp_mca_obs_df = data.frame(bp_mca$ind$coord)
# # plot of variable categories
# ggplot(data = bp_mca_vars_df, 
#        aes(x = Dim.1, 
#            y = Dim.2, 
#            label = rownames(bp_mca_vars_df))) +
#    geom_hline(yintercept = 0, colour = "gray70") +
#    geom_vline(xintercept = 0, 
#               colour = "gray70") +
#    geom_point(label=rownames(bp_mca_vars_df)) +
#    theme(legend.position = "none") +
#    ggtitle("MCA plot of variables using R package FactoMineR")

Here is a quick plot of variables along the first two dimensions. You can see the variables that contribute the most to these dimensions, such as landlord size (look at x-axis = 0.6) and renter income (y axis = 0.5).

fviz_mca_var(bp_mca, choice = "mca.cor", 
             col.var="darkgreen",
            repel = TRUE, # Avoid text overlapping (slow)
            ggtheme = theme_minimal(),
            xlim = c(0,.6),
            ylim = c(0,.5))

Variable contributions

Variable contributions to dimension 1

The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):

Landlord size
Rents to vouchers
Terminated number
Tenant income
Ever terminated
Rents to criminal history
% Raised the rent last year
Tenant income above avg. HH ($50,000)
Ever Raised rent in last year
Financial role
Tenant ever requestested disability accommodation

results <- data.frame(bp_mca$var$eta2)
results[ "vars" ] <- rownames(results)

results %>% filter(Dim.1>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 1") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))

Variable contributions to dimension 2

The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):

Tenant income
Tenant income above avg. HH ($50,000)
% Raised rent in last year
Ever Raised rent in last year
Decisions made on standard criteria
Word-of-mouth advertising
Screening
Security deposit
Rent above fair market price
% Deposit kept
Flexible lease decisions

results %>% filter(Dim.2>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.2), y=Dim.2)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 2") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))

Categorical contributions

Categorical contributions to dimension 1

The most valuable contributions to Dimension 1 are from the following categories (listed here because bar chart is difficult to read):

Landlord size: big
Rents to vouchers: yes
Terminated number: 1 or more
Terminated number: 1-5 (coded as 3)
Rents to criminal history: yes
Ever had tenant request disability accommodation: yes
Financial role: Primary income
Tenant income: <=$50,000 (binary)
Raised the rent last year: No
Landlord size: small

# Contributions of variables to PC1
fviz_contrib(bp_mca, choice = "var", axes = 1, top = 20)

Categorical contributions to dimension 2

The most valuable contributions to Dimension 2 are from the following categories (listed here because bar chart is difficult to read):

Tenant income: <=$50,000 (binary)
Tenant income: $25,000-50,000
% raised rent in last year: 0%
Raised the rent last year: No
Word-of-mouth advertising: Yes
Decisions based on standard criteria: No
Tenant income: >$50,000 (binary)
Charge security deposit: Yes
Tenant income: <$25,000
Raised the rent last year: Yes

# Contributions of variables to PC2
fviz_contrib(bp_mca, choice = "var", axes = 2, top = 20)

Individual contributions

This plot shows how cases are distributed along the dimensions. Big and small landlord are well differentiated along dimension 1 (x-axis), while not well differentiated along dimension 2 (y-axis).

fviz_pca_ind(bp_mca,
             label = "none", # hide individual labels
             habillage = df$ll_size, # color by groups
             palette = c("#00AFBB", "#E7B800"),
             addEllipses = TRUE # Concentration ellipses
             )

Comments on MCA #1

Comments from Kyle:

Three general observations to get this conversation rolling:

The two dimensions that you pull out and analyze are not only statistically distinct, but also conceptually distinct. The first dimension seems to be about management practices while the second is a little more of a hodge-podge dominated by financial positionality. This conceptual distinction makes sense to me.
Dimension 2 appears to be of only marginal utility, with an eigenvalue below 1 (thus hard to defend in published work).
Nevertheless, it is really interesting that large and small landlords are highly differentiated on the first dimension but not so much on the second. This might hint at an important challenge to popular assumptions about small landlords.

In terms of next steps, you might try running an MCA on just the variables that load highly on Dimension 1 to get a sense of how tightly they fit together without the muddy waters of the less central variables, and give us a sense of whether creating an index of permissive business practices might be worthwhile. This is essentially shifting us from an exploratory to a confirmatory factor analysis.

MCA #2

This analysis focuses on all of the variables that loaded on the 1st dimension of MCA #1 (per Kyle’s suggestions, please see above). I selected these variables only, and ran another MCA.

The objective of this analysis is to see which variables fit closely together and to see if we should create an index of some of the business practices.

The variables kept for this analysis were:

Landlord size
Rents to vouchers
Ever terminated
Terminated number
Tenant income
Rents to criminal history
Settle terminations in court
% Raised the rent last year
Financial role
Tenant ever requested disability accommodation

df_2<- df %>% select(ll_size,
                    "Financial_role_",
                    "Rent_to_vouchers",
                    "Rent_to_criminal",
                    "Raised_rent_pct",
                     "Ever_terminated" ,
                     "Num_of_terminations",
                    "Disability_accom.",
                     "Income_estimate")

#                     "Terminated_use_courts",

cats = apply(df_2, 2, function(x) nlevels(as.factor(x)))

bp_mca_2 <- MCA(df_2, graph = FALSE, na="average")

BP dimensions

Here are the Eigen values of the first few dimensions and the percent of variance explained.

eig.val <- get_eigenvalue(bp_mca_2)
head(eig.val)

##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1  0.3225740        15.033827                    15.03383
## Dim.2  0.1721780         8.024495                    23.05832
## Dim.3  0.1249742         5.824524                    28.88285
## Dim.4  0.1220761         5.689458                    34.57230
## Dim.5  0.1192670         5.558538                    40.13084
## Dim.6  0.1147635         5.348649                    45.47949

First dimension explains 15% of variance.

fviz_eig(bp_mca_2)

Here is a plot of variables along the first two dimensions.

Number of terminations and ever terminated look like they hang together, we may not need both of them.
Terminations, landlord size, and vouchers are the most important variables
Interestingly, the two financial vars (raised rent, income) contribute the least to the first dimension.

fviz_mca_var(bp_mca_2, choice = "mca.cor", 
             col.var="darkgreen",
            repel = TRUE, # Avoid text overlapping (slow)
            ggtheme = theme_minimal(),
            xlim = c(0,.6),
            ylim = c(0,.5))

Variable contributions

Variable contributions to dimension 1

Here are the ordered contributions to Dimension 1 from highest to lowest contribution:

Terminated number
Ever terminated
Landlord size
Rents to vouchers
Rents to criminal history
Tenant ever requested disability accommodation
Financial role
Tenant income
% Raised the rent last year

results_2 <- data.frame(bp_mca_2$var$eta2)
results_2[ "vars" ] <- rownames(results_2)

results_2 %>% filter(Dim.1>0.05) %>% 
   ggplot(aes(x=reorder(vars, -Dim.1), y=Dim.1)) +
   geom_point()+
   labs(x="",
        y="",
        title="Contribution to Dimension 1") +
      theme_classic() +
   theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))

SRHS Data Reduction

Courtney Allen

02 December, 2021

Outline

MCA #1

BP dimensions

Variable contributions

Variable contributions to dimension 1

Variable contributions to dimension 2

Categorical contributions

Categorical contributions to dimension 1

Categorical contributions to dimension 2

Individual contributions

Comments on MCA #1

MCA #2

BP dimensions

Variable contributions

Variable contributions to dimension 1