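library(readr)
library(dplyr)    # provides %>%, group_by(), summarise(), do()
library(ggplot2)  # provides ggplot(), geom_histogram()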
ahs <- read_csv("C:/Users/RArev/Desktop/households.csv")
data2$Income.Class <- data2$Income
data2$Income.Class[data2$Income <= 25000] <- '1'
data2$Income.Class[data2$Income > 25000 & data2$Income < 45000] <- '2'
data2$Income.Class[data2$Income >= 45000 & data2$Income <= 140000 ] <- '3'
data2$Income.Class[data2$Income > 140000] <- '4'
data3$Income.Class <- as.factor(data3$Income.Class)
data3$Rent <- as.numeric(data3$Rent)
data3$Income <- as.numeric(data3$Income)
head(data3)
## # A tibble: 6 x 5
##   Gender Income  Rent Education Income.Class
##    <dbl>  <dbl> <dbl>     <dbl> <fct>
## 1      2   3000   600        31 1
## 2      2   4000   180        31 1
## 3      1   9000   130        31 1
## 4      2  27900     0        31 2
## 5      2  10000   180        32 1
## 6      2  10000     0        32 1
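The four assignment steps used to build `Income.Class` can also be written as a single expression. The following is a minimal sketch on a hypothetical `income` vector (not the survey data); it is meant to reproduce the same cut points, including placing an income of exactly 45000 in class 3.

```r
# Hypothetical incomes chosen to sit on and around each boundary
income <- c(3000, 25000, 27900, 45000, 140000, 150000)

# Nested ifelse() mirrors the original rules:
#   <= 25000 -> 1, (25000, 45000) -> 2, [45000, 140000] -> 3, > 140000 -> 4
income_class <- factor(
  ifelse(income <= 25000, "1",
  ifelse(income <  45000, "2",
  ifelse(income <= 140000, "3", "4"))),
  levels = c("1", "2", "3", "4")
)

print(data.frame(income, income_class))
```

Building the factor in one step avoids first copying the numeric income into the class column and then overwriting it piece by piece.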
For this week’s assignment, I will be continuing with the 2015 American Housing Survey.
The question I will be researching is whether there is a difference in monthly rent between males and females.
The variables used in this week's analysis are:

* HHSEX = Gender (categorical; stored as the numeric codes below)
    * 1 = male
    * 2 = female
* HINCP = Income (numeric)
* RENT = Monthly.Rent (numeric)
* Education
length(unique(data3$Education))
## [1] 17
data.a <- data3 %>%
group_by(Education) %>%
summarise(mean_s = mean(Rent), mean_p = mean(Gender))
ecoreg <- lm(mean_s~mean_p, data = data.a)
summary(ecoreg)
##
## Call:
## lm(formula = mean_s ~ mean_p, data = data.a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -302.22 -48.69 11.71 65.02 136.59
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1112.8 319.2 3.486 0.00332 **
## mean_p -463.6 209.1 -2.217 0.04249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104.6 on 15 degrees of freedom
## Multiple R-squared: 0.2468, Adjusted R-squared: 0.1966
## F-statistic: 4.915 on 1 and 15 DF, p-value: 0.04249
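The first analysis is an ecological regression, which is fit to the education-level means rather than to individual respondents. It shows a statistically significant negative association between mean rent and the mean gender code (estimate = -463.6, p = 0.042). Because Gender is coded 1 = male and 2 = female, this means that education groups with a higher share of female respondents tend to have lower average rent: moving from an all-male group to an all-female group corresponds to a mean rent that is about 463.6 lower. This is a group-level pattern based on only 17 education groups, so it cannot by itself tell us that individual females pay less than individual males.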
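To read the `mean_p` coefficient, it helps to see what the mean of a 1/2 code represents. The sketch below uses a hypothetical `gender` vector (not the survey data) to show that the group mean of the code, minus 1, is the proportion of female respondents in that group.

```r
# Hypothetical gender codes for one education group: 1 = male, 2 = female
gender <- c(1, 2, 2, 1, 2)

mean_code   <- mean(gender)        # what summarise(mean_p = mean(Gender)) computes
prop_female <- mean(gender == 2)   # share of female respondents

# The two quantities differ by exactly 1
print(c(mean_code = mean_code, prop_female = prop_female))
```

A one-unit change in `mean_p` therefore corresponds to going from a group with no female respondents to a group made up entirely of female respondents.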
cpooling <- lm(Rent ~ Gender, data = data3)
summary(cpooling)
##
## Call:
## lm(formula = Rent ~ Gender, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -396.9 -394.1 -392.9 273.1 6205.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 391.248 50.325 7.774 1.35e-14 ***
## Gender 2.824 31.255 0.090 0.928
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 624.5 on 1601 degrees of freedom
## Multiple R-squared: 5.098e-06, Adjusted R-squared: -0.0006195
## F-statistic: 0.008162 on 1 and 1601 DF, p-value: 0.928
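Using complete pooling, the relationship between Rent and Gender is positive but not statistically significant (estimate = 2.82, p = 0.928). Females are estimated to pay about 2.82 more per month than males, a difference that is indistinguishable from zero, and the model explains essentially none of the variation in rent (R-squared = 5.098e-06). Thus far, the two methods disagree: the ecological regression shows a significant negative association at the education-group level, while complete pooling shows no gender difference at the individual level.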
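With a predictor that takes only two values one unit apart, the slope from `lm()` is simply the difference between the two group means. The sketch below illustrates this on a small hypothetical data frame `toy` (not the survey data).

```r
# Hypothetical rents for three males (1) and three females (2)
toy <- data.frame(
  Rent   = c(400, 380, 420, 390, 410, 405),
  Gender = c(1, 1, 1, 2, 2, 2)
)

fit   <- lm(Rent ~ Gender, data = toy)
slope <- unname(coef(fit)["Gender"])

# Female mean minus male mean
mean_diff <- mean(toy$Rent[toy$Gender == 2]) - mean(toy$Rent[toy$Gender == 1])

print(c(slope = slope, mean_diff = mean_diff))
```

Under this coding the intercept is the fitted value at Gender = 0, which lies outside the observed codes, so the male mean is the intercept plus one slope.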
dcoef <- data3 %>%
group_by(Education) %>%
do(mod = lm(Rent ~ Gender, data= .))
coef <- dcoef %>% do(data.frame(intc = coef( .$mod)[1]))
ggplot(coef, aes(x=intc)) + geom_histogram(fill = "darkgreen") + xlab("Rent according to Gender")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
With the no-pooling intercept model, a separate regression is fit within each education level to see how much the results vary across groups. Each group's intercept is the fitted rent at Gender = 0; because Gender is coded 1 and 2, this is an extrapolation rather than the average rent for either gender. The histogram shows how the intercepts are spread across the 17 education groups, but on its own it does not identify which education levels sit at the low end.
dcoef <- data3 %>%
group_by(Education) %>%
do(mod = lm(Rent ~ Gender, data = .))
coef <- dcoef %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(coef, aes(x = sexc)) + geom_histogram(fill = 'darkgreen') + xlab("Gender slope (female minus male rent) within each education level")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The no-pooling slope model estimates the gender difference in rent (female minus male) separately within each education level. The histogram suggests that the estimated difference varies from group to group, with differences on the order of $500 in some groups. Each slope compares males and females within one education level, so it does not tell us whether people with lower education pay less than their higher-education counterparts.
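The two no-pooling steps fit the same per-group regressions twice, once to pull out the intercepts and once for the slopes. As a sketch of an alternative, the base R code below fits each group once and collects both coefficients in one matrix. The data frame `toy` and its values are hypothetical stand-ins for `data3`.

```r
# Hypothetical data: three education levels, six respondents each
toy <- data.frame(
  Education = rep(c(31, 32, 33), each = 6),
  Gender    = rep(c(1, 2), times = 9),
  Rent      = c(600, 180, 130, 0, 180, 0,
                500, 450, 700, 650, 300, 350,
                900, 800, 1000, 950, 1100, 1000)
)

# One lm() per education level
fits <- lapply(split(toy, toy$Education),
               function(d) lm(Rent ~ Gender, data = d))

# Rows = education levels, columns = intercept and gender slope
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(coefs)
```

Keeping the education level as the row name makes it possible to see which group each intercept or slope belongs to, something a histogram alone does not show.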
library(nlme)
m1 <- lme(Rent ~ Gender, data = data3, random = ~ 1|Education, method ="ML")
summary(m1)
## Linear mixed-effects model fit by maximum likelihood
## Data: data3
## AIC BIC logLik
## 25191.59 25213.11 -12591.79
##
## Random effects:
## Formula: ~1 | Education
## (Intercept) Residual
## StdDev: 21.34911 623.7095
##
## Fixed effects: Rent ~ Gender
## Value Std.Error DF t-value p-value
## (Intercept) 393.4917 50.98179 1585 7.718279 0.0000
## Gender 3.9267 31.28595 1585 0.125509 0.9001
## Correlation:
## (Intr)
## Gender -0.94
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.6592620 -0.6382389 -0.6040251 0.4403737 9.9326770
##
## Number of Observations: 1603
## Number of Groups: 17
Using the random intercept model, the standard deviation of the education-level intercepts is 21.35, which is small relative to the residual standard deviation of 623.71. The fixed intercept of 393.5 is the fitted rent at Gender = 0, and the Gender coefficient indicates that rent for females is on average 3.9 higher than for males. This difference is not statistically significant (p = 0.90).
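Because Gender is coded 1 and 2 rather than 0 and 1, the fixed intercept is not itself the average rent for either gender. A short sketch of the arithmetic, using the fixed-effect estimates printed in the summary above (the variable names are my own):

```r
# Fixed-effect estimates copied from summary(m1)
b0 <- 393.4917   # (Intercept): fitted rent at Gender = 0
b1 <- 3.9267     # Gender: change in rent per one-unit change in the code

# Fitted rent for a typical education level (random effect = 0)
pred <- b0 + b1 * c(male = 1, female = 2)
print(pred)
```

The two fitted values differ by the Gender coefficient, which is the quantity the t-test in the summary evaluates.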
m2 <- lme(Rent ~ Gender, data = data3, random = ~ Gender|Education, method = "ML", control = lmeControl(returnObject=TRUE))
summary(m2)
## Linear mixed-effects model fit by maximum likelihood
## Data: data3
## AIC BIC logLik
## 25194.7 25226.97 -12591.35
##
## Random effects:
## Formula: ~Gender | Education
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 51.17915 (Intr)
## Gender 50.19566 -1
## Residual 623.07310
##
## Fixed effects: Rent ~ Gender
## Value Std.Error DF t-value p-value
## (Intercept) 383.0838 53.30271 1585 7.186948 0.0000
## Gender 10.9097 35.75525 1585 0.305122 0.7603
## Correlation:
## (Intr)
## Gender -0.945
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.7296488 -0.6322641 -0.5722483 0.4371034 9.9598823
##
## Number of Observations: 1603
## Number of Groups: 17
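For the random slope model, the fixed intercept is 383.08, and the intercepts vary across education levels with a standard deviation of 51.18. The fixed Gender effect indicates that rent for females is on average 10.91 higher than for males, which is not statistically significant (p = 0.76), and this slope varies across education levels with a standard deviation of 50.20. The random intercepts and slopes have an estimated correlation of -1, meaning that education levels with higher intercepts have smaller gender slopes. A correlation of exactly -1 usually signals that the random-effects structure is more complex than the data can support, so these variance estimates should be read with caution. (The -0.945 in the output is the correlation between the two fixed-effect estimates, not between the random effects.)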
AIC(cpooling, m1, m2)
## df AIC
## cpooling 3 25189.92
## m1 4 25191.59
## m2 6 25194.70
Based on the AIC results, the complete-pooling model has the lowest value (25189.92), narrowly ahead of the random intercept model (25191.59) and the random slope model (25194.70). The ecological regression is not part of this comparison because it was fit to the 17 group means rather than to the 1,603 individual observations, so its AIC would not be comparable.
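Since the random intercept model is nested within the random slope model and both were fit by maximum likelihood, a likelihood-ratio test is another way to compare them. The sketch below works only from the log-likelihoods and degrees of freedom printed above; on the fitted objects, `anova(m1, m2)` from nlme would be the usual route. The chi-square reference distribution is an approximation here, because variance parameters are tested on the boundary of their range.

```r
# Values copied from the model summaries and the AIC table
loglik_m1 <- -12591.79; df_m1 <- 4   # random intercept
loglik_m2 <- -12591.35; df_m2 <- 6   # random intercept + slope

# Likelihood-ratio statistic and approximate p-value
lr_stat <- 2 * (loglik_m2 - loglik_m1)
p_value <- pchisq(lr_stat, df = df_m2 - df_m1, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value))
```

A large p-value here would agree with the AIC ordering: the extra random-slope parameters do not improve the fit enough to justify them.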
m0 <- lme(Rent ~ 1, random = ~ 1|Education, data = data3, method = "ML")
summary(m0)
## Linear mixed-effects model fit by maximum likelihood
## Data: data3
## AIC BIC logLik
## 25189.61 25205.74 -12591.8
##
## Random effects:
## Formula: ~1 | Education
## (Intercept) Residual
## StdDev: 21.18852 623.7169
##
## Fixed effects: Rent ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 399.4591 17.42984 1586 22.91812 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.6560695 -0.6413460 -0.6077210 0.4404050 9.9295071
##
## Number of Observations: 1603
## Number of Groups: 17
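The intraclass correlation is computed from variances, not standard deviations: ICC = 448.95 / (448.95 + 389022.77) = 0.00115, where 448.95 and 389022.77 are the squares of the standard deviations 21.18852 and 623.7169 from the null model. About 0.12% of the total variation in rent can be attributed to educational level, while the remaining 99.88% is variation among individuals within the same education level. Because this model contains no predictors, that remaining share cannot be attributed to gender specifically.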
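The intraclass correlation is conventionally a ratio of variances, so the standard deviations reported by `summary(m0)` need to be squared before they are combined. A minimal sketch of that calculation, with the two standard deviations copied from the output above (the variable names are my own):

```r
# Standard deviations reported for the null model m0
sd_between <- 21.18852   # education-level intercepts
sd_within  <- 623.7169   # residual (within education level)

# Share of total variance that lies between education levels
icc <- sd_between^2 / (sd_between^2 + sd_within^2)
print(round(icc, 5))
```

On the fitted object, `VarCorr(m0)` reports the variances alongside the standard deviations, which avoids squaring by hand.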
intervals(m0)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 365.2819 399.4591 433.6364
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: Education
## lower est. upper
## sd((Intercept)) 2.477322 21.18852 181.2252
##
## Within-group standard error:
## lower est. upper
## 602.4622 623.7169 645.7215
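After conducting my analysis of the data, I conclude that there is no statistically significant gender difference in monthly rent when analyzed at the individual level. The complete-pooling, random intercept, and random slope models all estimate that females pay slightly more than males (between about 3 and 11 per month), but none of these differences is distinguishable from zero. Only the ecological regression shows a significant association, and it describes education-group averages rather than individuals. Grouping by education level made little difference, since education accounts for very little of the variation in rent. This is surprising, as I was expecting males to pay more in rent than their female counterparts.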