Importing Data

library(readr)
ahs <- read_csv("C:/Users/RArev/Desktop/households.csv")

Creating New Variable Type

data2$Income.Class <- data2$Income

Grouping Income Levels

data2$Income.Class[data2$Income <= 25000] <- '1'
data2$Income.Class[data2$Income > 25000 & data2$Income < 45000] <- '2'
data2$Income.Class[data2$Income >= 45000 & data2$Income <= 140000 ] <- '3'
data2$Income.Class[data2$Income > 140000] <- '4'
data3$Income.Class <- as.factor(data3$Income.Class)
data3$Rent <- as.numeric(data3$Rent)
data3$Income <- as.numeric(data3$Income)
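
The same four income bands can be checked on a handful of boundary values. The sketch below uses made-up incomes (not survey rows) and a hypothetical vector name, `income_class`, to confirm that 25000 falls in class 1 and that 45000 and 140000 both fall in class 3, as the assignments above imply.

```r
# Hypothetical incomes chosen to sit on and around the cut points
income <- c(3000, 25000, 27900, 45000, 140000, 150000)

# Same boundaries as above:
# <= 25000 -> 1, (25000, 45000) -> 2, [45000, 140000] -> 3, > 140000 -> 4
income_class <- ifelse(income <= 25000, "1",
                ifelse(income < 45000, "2",
                ifelse(income <= 140000, "3", "4")))

# Fix the level order explicitly so empty classes are still represented
income_class <- factor(income_class, levels = c("1", "2", "3", "4"))

print(data.frame(income, income_class))
```

Setting `levels` explicitly is a design choice in this sketch: it keeps all four classes in the factor even if one of them has no observations.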

Data

head(data3)
## # A tibble: 6 x 5
##   Gender Income  Rent Education Income.Class
##    <dbl>  <dbl> <dbl>     <dbl> <fct>       
## 1      2   3000   600        31 1           
## 2      2   4000   180        31 1           
## 3      1   9000   130        31 1           
## 4      2  27900     0        31 2           
## 5      2  10000   180        32 1           
## 6      2  10000     0        32 1

Research Question & Analysis

For this week’s assignment, I will be continuing with the 2015 American Housing Survey.

The question I will be researching is whether there is a difference in monthly rent between males and females.

The variables used for this week’s analysis are:

* HHSEX = Gender (categorical, coded 1/2)
  * 1 = male
  * 2 = female
* HINCP = Income (numeric)
* RENT = Monthly.Rent (numeric)
* Education

length(unique(data3$Education))
## [1] 17

Ecological Analysis

library(dplyr)
# Aggregate to one row per education level before fitting the regression
data.a <- data3 %>%
  group_by(Education) %>%
  summarise(mean_s = mean(Rent), mean_p = mean(Gender))
ecoreg <- lm(mean_s~mean_p, data = data.a)
summary(ecoreg)
## 
## Call:
## lm(formula = mean_s ~ mean_p, data = data.a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -302.22  -48.69   11.71   65.02  136.59 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1112.8      319.2   3.486  0.00332 **
## mean_p        -463.6      209.1  -2.217  0.04249 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 104.6 on 15 degrees of freedom
## Multiple R-squared:  0.2468, Adjusted R-squared:  0.1966 
## F-statistic: 4.915 on 1 and 15 DF,  p-value: 0.04249

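The first analysis is an ecological regression, which is fit to the mean rent and mean gender of each of the 17 education levels rather than to individual households. It shows a significant negative association between the two group means (p = 0.042). Because Gender is coded 1 = male and 2 = female, the coefficient of -463.6 means that education levels with a higher share of female householders have lower average rent: moving from an all-male group to an all-female group is associated with a mean rent that is 463.6 lower. This is a statement about education groups, and it does not by itself show that individual women pay less than individual men.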
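
Because Gender is coded 1 (male) and 2 (female), the group mean of Gender is 1 plus the share of women in that group, which is why a one-unit change in `mean_p` spans an all-male to an all-female group. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the survey):

```r
# Made-up data: Gender coded 1 = male, 2 = female, as in the survey
toy <- data.frame(
  Education = c(31, 31, 31, 31, 32, 32, 32, 32),
  Gender    = c(1, 2, 2, 2, 1, 1, 2, 1)
)

# Group mean of the 1/2 code, and the share of women in each group
mean_gender  <- tapply(toy$Gender, toy$Education, mean)
share_female <- tapply(toy$Gender == 2, toy$Education, mean)

# The two columns differ by exactly 1 in every group
print(cbind(mean_gender, share_female))
```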

Complete Pooling Model

cpooling <- lm(Rent ~ Gender, data = data3)
summary(cpooling)
## 
## Call:
## lm(formula = Rent ~ Gender, data = data3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -396.9 -394.1 -392.9  273.1 6205.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  391.248     50.325   7.774 1.35e-14 ***
## Gender         2.824     31.255   0.090    0.928    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 624.5 on 1601 degrees of freedom
## Multiple R-squared:  5.098e-06,  Adjusted R-squared:  -0.0006195 
## F-statistic: 0.008162 on 1 and 1601 DF,  p-value: 0.928

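In the complete pooling model, which ignores education and uses all 1,603 households, there is no significant relationship between Rent and Gender (p = 0.928). The coefficient of 2.82 means that rent for females is estimated to be 2.82 higher than for males, a difference that is small relative to its standard error of 31.26. So far the two approaches disagree: the ecological regression suggests a gender association at the group level, while complete pooling shows essentially no difference between individual males and females.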
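
With a two-level predictor coded 1/2, the slope of a pooled regression is simply the difference between the two group means. The sketch below illustrates this on six made-up rents (hypothetical values, not the AHS data):

```r
# Made-up rents for three males (Gender = 1) and three females (Gender = 2)
toy <- data.frame(
  Gender = c(1, 1, 1, 2, 2, 2),
  Rent   = c(400, 0, 650, 500, 300, 700)
)

# Complete pooling: one regression over all rows
fit   <- lm(Rent ~ Gender, data = toy)
slope <- unname(coef(fit)["Gender"])

# Difference in mean rent, female minus male
gap <- mean(toy$Rent[toy$Gender == 2]) - mean(toy$Rent[toy$Gender == 1])

print(c(slope = slope, gap = gap))
```

Reading the coefficient as "female minus male" follows from the coding: Gender increases by one unit when moving from male (1) to female (2).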

No-Pooling Intercept Model

library(ggplot2)
# Fit a separate regression within each education level
dcoef <- data3 %>%
  group_by(Education) %>%
  do(mod = lm(Rent ~ Gender, data= .))
coef <- dcoef %>% do(data.frame(intc = coef( .$mod)[1]))
ggplot(coef, aes(x=intc)) + geom_histogram(fill = "darkgreen") + xlab("Rent according to Gender")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

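With the no-pooling intercept model, we fit a separate regression within each education level to see whether the estimates vary across groups. The histogram shows the distribution of the 17 intercepts. Because Gender is coded 1 and 2, each intercept is the fitted rent at Gender = 0, so it is a baseline for that education level rather than an observed average. The histogram alone does not show which intercepts belong to which education levels, so a statement that rent is lower for respondents with low education would need the coefficients listed by group.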

No-Pooling Slope Model

dcoef <- data3 %>%
  group_by(Education) %>%
  do(mod = lm(Rent ~ Gender, data = .))
coef <- dcoef %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(coef, aes(x = sexc)) + geom_histogram(fill = 'darkgreen') + xlab("Difference in low and high rent according to Gender Education Level")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The no-pooling slope model uses the same per-education regressions but plots the Gender slope, which is the estimated difference in rent between females and males within each education level. The histogram suggests that these gender differences vary by as much as about $500 across education levels. The slopes describe gender gaps within an education level, so they do not show that people with less education pay less than their more-educated counterparts.
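
The per-group slopes can also be listed by education level, which makes it possible to see which group each value belongs to. This is a base R sketch on made-up data (the `toy` data frame is hypothetical), offered as one possible alternative to the `do()` pipeline rather than a replacement for it:

```r
# Made-up data: three education levels, two males and two females in each
toy <- data.frame(
  Education = rep(c(31, 32, 33), each = 4),
  Gender    = rep(c(1, 1, 2, 2), times = 3),
  Rent      = c(300, 500, 450, 550,
                  0, 200, 600, 400,
                800, 600, 500, 700)
)

# One regression per education level; keep only the Gender slope
slopes <- sapply(split(toy, toy$Education), function(d) {
  coef(lm(Rent ~ Gender, data = d))[["Gender"]]
})

# Named vector: names are education codes, values are female-minus-male gaps
print(slopes)
```

A group that contains only one gender would return `NA` for its slope, so it is worth checking group composition before interpreting the histogram.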

Random Intercept Model

library(nlme)
m1 <- lme(Rent ~ Gender, data = data3, random = ~ 1|Education, method ="ML")
summary(m1)
## Linear mixed-effects model fit by maximum likelihood
##  Data: data3 
##        AIC      BIC    logLik
##   25191.59 25213.11 -12591.79
## 
## Random effects:
##  Formula: ~1 | Education
##         (Intercept) Residual
## StdDev:    21.34911 623.7095
## 
## Fixed effects: Rent ~ Gender 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 393.4917  50.98179 1585 7.718279  0.0000
## Gender        3.9267  31.28595 1585 0.125509  0.9001
##  Correlation: 
##        (Intr)
## Gender -0.94 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.6592620 -0.6382389 -0.6040251  0.4403737  9.9326770 
## 
## Number of Observations: 1603
## Number of Groups: 17

Using the random intercept model, the standard deviation of the education-level intercepts is 21.35, which is small next to the residual standard deviation of 623.71. The fixed intercept of 393.5 is the fitted rent at Gender = 0, and the Gender coefficient indicates that rent for females is on average 3.9 higher than for males within the same education level. This difference is not significant (p = 0.90).
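
Since Gender takes the values 1 and 2, the fitted rent for each gender comes from plugging those codes into the fixed-effects equation. A short sketch using the estimates printed in the summary above (the variable names are my own):

```r
# Fixed-effect estimates copied from summary(m1)
intercept <- 393.4917
b_gender  <- 3.9267

# Fitted rent for a typical education level (random intercept of zero)
rent_male   <- intercept + b_gender * 1  # Gender = 1
rent_female <- intercept + b_gender * 2  # Gender = 2

print(c(male = rent_male, female = rent_female))
```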

Random Slope Model

m2 <- lme(Rent ~ Gender, data = data3, random = ~ Gender|Education, method = "ML", control = lmeControl(returnObject=TRUE))
summary(m2)
## Linear mixed-effects model fit by maximum likelihood
##  Data: data3 
##       AIC      BIC    logLik
##   25194.7 25226.97 -12591.35
## 
## Random effects:
##  Formula: ~Gender | Education
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept)  51.17915 (Intr)
## Gender       50.19566 -1    
## Residual    623.07310       
## 
## Fixed effects: Rent ~ Gender 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 383.0838  53.30271 1585 7.186948  0.0000
## Gender       10.9097  35.75525 1585 0.305122  0.7603
##  Correlation: 
##        (Intr)
## Gender -0.945
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.7296488 -0.6322641 -0.5722483  0.4371034  9.9598823 
## 
## Number of Observations: 1603
## Number of Groups: 17

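For the random slope model, the fixed intercept is 383.1 and the fixed Gender effect is 10.91, meaning rent for females is estimated to be 10.91 higher than for males, which is again not significant (p = 0.76). Across education levels, the intercepts have a standard deviation of 51.18 and the Gender slopes have a standard deviation of 50.20. The estimated correlation between the random intercepts and slopes is -1, so education levels with higher intercepts have smaller gender slopes. A correlation of exactly -1 usually indicates that the data cannot support separate intercept and slope variation, so these variance estimates should be read with caution. The -0.945 in the output is the correlation between the two fixed-effect estimates, not between the random effects.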

Comparing Models

AIC(cpooling, m1, m2)
##          df      AIC
## cpooling  3 25189.92
## m1        4 25191.59
## m2        6 25194.70

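Based on the AIC, the complete pooling model has the lowest value (25189.92), followed by the random intercept model (25191.59) and the random slope model (25194.70), so it is the best fit of the three. The differences are small, which suggests that adding random effects for education does little to improve the model. The ecological regression is not part of this comparison because it was fit to the 17 group means rather than to the 1,603 households, so its AIC is not comparable.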
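
One way to read an AIC table is to express each model as a difference from the smallest value. The sketch below does this with the three AIC values printed above (the vector names `aic`, `delta`, and `best` are my own):

```r
# AIC values copied from the AIC(cpooling, m1, m2) output
aic <- c(cpooling = 25189.92, m1 = 25191.59, m2 = 25194.70)

# Difference from the best (lowest) AIC; the best model has delta 0
delta <- aic - min(aic)
best  <- names(which.min(aic))

print(round(delta, 2))
print(best)
```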

The Intra-Class Correlation - Is rent determined by gender or education?

m0 <- lme(Rent ~ 1, random = ~ 1|Education, data = data3, method = "ML")
summary(m0)
## Linear mixed-effects model fit by maximum likelihood
##  Data: data3 
##        AIC      BIC   logLik
##   25189.61 25205.74 -12591.8
## 
## Random effects:
##  Formula: ~1 | Education
##         (Intercept) Residual
## StdDev:    21.18852 623.7169
## 
## Fixed effects: Rent ~ 1 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 399.4591  17.42984 1586 22.91812       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.6560695 -0.6413460 -0.6077210  0.4404050  9.9295071 
## 
## Number of Observations: 1603
## Number of Groups: 17

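21.18852^2/(21.18852^2+623.7169^2)=0.00115

The intra-class correlation is computed from the variances, that is, the squared standard deviations. About 0.12% of the total variation in rent can be attributed to educational level, while the remaining 99.88% is variation between individuals within the same education level. Because m0 has no predictors, that individual-level share cannot be attributed to gender specifically.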
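
The intra-class correlation is conventionally defined as the between-group variance divided by the total variance. A minimal sketch that computes it from the two standard deviations reported by summary(m0) (the variable names are my own):

```r
# Standard deviations copied from the "Random effects" block of summary(m0)
sd_education <- 21.18852   # between education levels
sd_residual  <- 623.7169   # within education levels

# Square the standard deviations to get variances before taking the ratio
var_education <- sd_education^2
var_residual  <- sd_residual^2

icc <- var_education / (var_education + var_residual)
print(round(icc, 5))
```

Taking the ratio of the standard deviations instead of the variances would overstate the education-level share considerably, so the squaring step matters.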

Confidence Interval

intervals(m0)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower     est.    upper
## (Intercept) 365.2819 399.4591 433.6364
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: Education 
##                    lower     est.    upper
## sd((Intercept)) 2.477322 21.18852 181.2252
## 
##  Within-group standard error:
##    lower     est.    upper 
## 602.4622 623.7169 645.7215

After conducting my analysis of the data, I conclude that there is no evidence of a gender difference in monthly rent at the individual level: the Gender coefficient is small and non-significant in the complete pooling, random intercept, and random slope models. Only the ecological regression shows a significant association, and it describes education groups rather than individuals. Grouping by education level accounted for very little of the variation in rent. The lack of a gender difference is surprising, as I was expecting males to pay more in rent than their female counterparts.