| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| MedianHomeValue2000 | 1,280 | 105,661 | 154,884 | 187,119 | 224,337 | 1,288,551 |
| MedianHomeValue2010 | 9,999 | 123,200 | 193,200 | 246,570 | 312,000 | 1,000,001 |
| MHV.Change.00.to.10 | -1,228,651 | 7,187 | 36,268 | 60,047 | 94,881 | 1,000,001 |
| MHV.Growth.00.to.12 | -97 | 6 | 25 | 34 | 50 | 17,494 |
# Part 1 - Select Three IVs
Hypothesis: Higher levels of college-educated individuals in 2000 will predict a larger increase in home value between 2000 and 2010.
Hypothesis: Higher levels of vacant housing units in 2000 will predict a larger increase in home value between 2000 and 2010.
Hypothesis: Higher levels of employment in professional fields in 2000 will predict a larger increase in home value between 2000 and 2010.
I have decided to apply the log10 transformation to each variable. This choice facilitates interpretation and also significantly reduces the skew in the distributions. Although some skewness remains, the transformed variables exhibit a more acceptable degree of skewness compared to their original distributions.
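As a minimal sketch of the transformation, the snippet below logs three percentage variables. The `+ 1` offset is an assumption (it keeps tracts at 0% finite, since `log10(0)` is `-Inf`), and the values in `d` are hypothetical stand-ins rather than the lab data.

```r
# Hypothetical tract-level percentages (not the lab data).
d <- data.frame(
  p.col    = c(0, 11.88, 21.11, 35.89, 100),
  p.vacant = c(0, 3.10, 5.11, 8.53, 100),
  p.prof   = c(0, 22.69, 31.35, 42.87, 100)
)

# Assumption: add 1 before logging so that 0% maps to 0 instead of -Inf.
d$log_col    <- log10(d$p.col + 1)
d$log_vacant <- log10(d$p.vacant + 1)
d$log_prof   <- log10(d$p.prof + 1)

# Range of each logged variable: 0 at 0%, just over 2 at 100%.
print(round(sapply(d[, c("log_col", "log_vacant", "log_prof")], range), 3))
```

With this offset every logged variable stays between 0 and about 2, which compresses the long right tail of the raw percentages.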
| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| p.col | 0.00 | 11.88 | 21.11 | 25.67 | 35.89 | 100.00 |
| p.vacant | 0.00 | 3.10 | 5.11 | 7.17 | 8.53 | 100.00 |
| p.prof | 0.00 | 22.69 | 31.35 | 33.55 | 42.87 | 100.00 |
| Dependent variable: mhv.growth | (1) | (2) | (3) |
|---|---|---|---|
| Log College Graduates | 2.67*** (0.46) | | 2.34** (1.15) |
| Log Vacant Housing | | | 3.67*** (0.55) |
| Log Professional Occupations | | 4.27*** (0.74) | 2.45 (1.84) |
| Constant | 25.92*** (0.63) | 23.04*** (1.11) | 19.72*** (1.65) |
| Observations | 58,822 | 58,797 | 58,788 |
| Adjusted R2 | 0.001 | 0.001 | 0.001 |
| Residual Std. Error | 35.17 (df = 58820) | 35.16 (df = 58795) | 35.15 (df = 58784) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | |
The percentage of college graduates (log_col) and the percentage of professional workers (log_prof) are strongly and positively correlated (r = 0.91). This high correlation suggests that multicollinearity may be a concern when both variables are included in a regression model.
In Model 3 the coefficients of both variables (log_col and log_prof) are smaller than in Model 1 and Model 2. This is likely because the two variables share most of their variation, so the model cannot cleanly separate their effects. More concerning, log_prof is no longer significant in Model 3.
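A correlation like this can be checked with `cor()`. The sketch below uses simulated stand-ins for the logged predictors (not the lab data), with `log_prof` deliberately built to track `log_col`; the seed, sample size, and coefficients are arbitrary assumptions.

```r
set.seed(528)  # arbitrary seed for a reproducible toy example
n <- 1000

# Simulated stand-ins for the logged predictors (not the lab data).
log_col    <- rnorm(n, mean = 1.3, sd = 0.35)
log_prof   <- 0.3 + 0.9 * log_col + rnorm(n, sd = 0.12)  # built to track log_col
log_vacant <- rnorm(n, mean = 0.8, sd = 0.30)            # unrelated by construction

toy <- data.frame(log_col, log_vacant, log_prof)

# Pairwise Pearson correlations. "pairwise.complete.obs" drops NAs pair by
# pair, which matters for real census data with missing values.
r <- cor(toy, use = "pairwise.complete.obs")
print(round(r, 2))
```

In the printed matrix, the off-diagonal entry for `log_col` and `log_prof` is the one to watch; a value near 1 is the warning sign described above.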
The standard errors of the coefficients in Model 3 are larger than in Model 1 and Model 2, indicating increased uncertainty in the estimates when both variables are included.
The adjusted R-squared value does not increase substantially when both variables are included in Model 3, suggesting that the addition of the second correlated variable does not improve the model significantly.
The residual standard error remains almost unchanged across the models.
Given these results, multicollinearity is most likely an issue with this model. This can be addressed by choosing the one variable that best captures the underlying construct, or by creating an index that combines both variables.
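One way to put a number on the concern is the variance inflation factor (VIF), and one way to build the index is to average the z-scores of the two correlated variables. The sketch below does both on simulated data; `vif_by_hand` and `human_capital` are hypothetical names introduced here, and nothing in it reproduces the lab's actual results.

```r
set.seed(528)  # arbitrary seed
n <- 1000
log_col    <- rnorm(n, 1.3, 0.35)
log_prof   <- 0.3 + 0.9 * log_col + rnorm(n, sd = 0.12)  # tracks log_col
log_vacant <- rnorm(n, 0.8, 0.30)
toy <- data.frame(log_col, log_vacant, log_prof)

# VIF for one predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on all the other predictors in `data`.
vif_by_hand <- function(var, data) {
  others <- setdiff(names(data), var)
  fit <- lm(reformulate(others, response = var), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(names(toy), vif_by_hand, data = toy)
print(round(vifs, 2))

# Remedy sketched here: replace the two correlated predictors with one
# index, the mean of their z-scores.
toy$human_capital <- rowMeans(scale(toy[, c("log_col", "log_prof")]))
reduced <- toy[, c("log_vacant", "human_capital")]
vifs_index <- sapply(names(reduced), vif_by_hand, data = reduced)
print(round(vifs_index, 2))
```

A VIF close to 1 means a predictor is nearly independent of the others; the larger it gets, the more its standard error is inflated by overlap with the other predictors.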
I don’t see evidence of a strong linear or a strong non-linear relationship between these variables. It does look as though a small relationship exists between vacancies and MHV.Growth.
| Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
|---|---|---|---|---|---|---|
| reg.data.mhv.00 | 1,280 | 105,661 | 154,884 | 187,119 | 224,337 | 1,288,551 |
| reg.data.mhv.10 | 9,999 | 123,200 | 193,200 | 246,570 | 312,000 | 1,000,001 |
| reg.data.mhv.change | -1,228,651 | 7,187 | 36,268 | 60,047 | 94,881 | 1,000,001 |
| reg.data.mhv.growth | -97 | 6 | 25 | 29 | 49 | 200 |
| reg.data.log_col | 0 | 1 | 1 | 1 | 2 | 2 |
| reg.data.log_vacant | 0 | 1 | 1 | 1 | 1 | 2 |
| reg.data.log_prof | 0 | 1 | 2 | 1 | 2 | 2 |
Typical (median) change in home value, 2000 to 2010: $36,267.8
Largest change in home value, 2000 to 2010: $1,000,001
Using the reg.data dataset, the correlation coefficient is 0.78, indicating a strong positive relationship between the change in home value and the growth in home value between 2000 and 2010. Using the d dataset, the correlation coefficient is 0.54, which suggests only a moderate positive relationship. The d dataset includes growth rates over 200%. I went with reg.data because it made sense to me to use the dataset that omits those outliers.
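The sensitivity of the correlation to extreme growth rates can be illustrated with simulated data. This sketch assumes that reg.data is simply d with growth rates above 200% dropped; all values are made up, and the 20 very cheap tracts are there only to produce extreme growth rates.

```r
set.seed(528)  # arbitrary seed
n <- 2000

# Simulated 2000 home values: mostly typical tracts plus 20 tracts with very
# low starting values, which produce extreme growth rates.
mhv.00 <- c(round(exp(rnorm(n - 20, log(150000), 0.5))),
            runif(20, 1500, 10000))
mhv.change <- rnorm(n, mean = 60000, sd = 80000)
mhv.growth <- 100 * mhv.change / mhv.00

d <- data.frame(mhv.00, mhv.change, mhv.growth)

# Assumption: the trimmed dataset keeps only growth rates of 200% or less.
reg.data <- d[d$mhv.growth <= 200, ]

r_all  <- cor(d$mhv.change, d$mhv.growth)
r_trim <- cor(reg.data$mhv.change, reg.data$mhv.growth)
print(round(c(all = r_all, trimmed = r_trim), 2))
```

A handful of tracts with tiny starting values can dominate the variance of the growth rate, which pulls the untrimmed correlation down.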
Although the two variables have a relatively strong correlation, they do not necessarily measure the same thing. The change in home value represents the difference between home values in 2000 and 2010, while the growth in home value represents the percentage increase or decrease in home values during the same period. The correlation suggests that there is a relationship between the change and growth in home values, but they are not identical measures.
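A small worked example makes the distinction concrete. The two tracts below are hypothetical: they gain the same dollar amount but grow at very different rates because they start from different values.

```r
# Two hypothetical tracts with the same dollar change but different growth.
tracts <- data.frame(
  tract  = c("A", "B"),
  mhv.00 = c(100000, 400000),
  mhv.10 = c(150000, 450000)
)

# Change is a dollar difference; growth is that difference as a percentage
# of the 2000 value.
tracts$mhv.change <- tracts$mhv.10 - tracts$mhv.00
tracts$mhv.growth <- 100 * tracts$mhv.change / tracts$mhv.00

print(tracts)
```

Both tracts gain $50,000, yet tract A grows by 50% and tract B by 12.5%.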
(Despite the likely multicollinearity issue, I'm keeping log_prof in the model for explanatory purposes.)
| Dependent variable: | mhv.change (Model 1: Change in Median Home Value) | mhv.growth (Model 2: Median Home Value Growth) |
|---|---|---|
| Constant | 1,554.83 (3,951.38) | 19.72*** (1.65) |
| Log College Graduates | 51,011.43*** (2,740.31) | 2.34** (1.15) |
| Log Vacant Housing | -19,708.01*** (1,361.62) | 3.67*** (0.55) |
| Log Professional Occupations | 4,565.85 (4,327.64) | 2.45 (1.84) |
| Observations | 59,500 | 58,788 |
| R2 | 0.05 | 0.001 |
| Adjusted R2 | 0.05 | 0.001 |
| Residual Std. Error | 88,432.31 (df = 59496) | 35.15 (df = 58784) |
| F Statistic | 958.51*** (df = 3; 59496) | 26.79*** (df = 3; 58784) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | |
Yes, two of the variables, the number of college graduates (log_col) and vacant housing (log_vacant), predicted changes to home value and home value growth in a meaningful and statistically significant way.
In the first model (change in median home value), the number of college graduates (log_col) had the largest impact. In the second model (median home value growth), vacant housing (log_vacant) had the largest impact.
The number of college graduates and vacant housing significantly influenced changes in home values and home value growth, while the number of professional occupations did not have a significant effect.
So, two of the three predictions were supported.
This analysis explored the relationship between the change in median home value and the growth in median home value with three variables: % of college graduates, % of vacant housing, and % of professional occupations. The % of college graduates and % of vacant housing significantly influenced changes in home values and home value growth.
In the first analysis (model 1), areas with more college graduates experienced a greater increase in median home value. In contrast, areas with more vacant housing saw a smaller increase or even a decrease in median home value. In the second analysis (model 2), the growth of median home values was also significantly influenced by the number of college graduates and vacant housing. However, the number of vacant houses had a more substantial impact on home value growth in this model.
The number of professional occupations did not have a significant effect on either change in home values or home value growth.
I will calculate the effect sizes using the full models
model1 <- lm(mhv.change ~ log_col + log_vacant + log_prof, data = reg.data)
and
model2 <- lm(mhv.growth ~ log_col + log_vacant + log_prof, data = reg.data)
in order to interpret the relative importance of each variable in predicting the outcome.
| Model | Log College Graduates | Log Vacant Housing | Log Professional Occupations |
|---|---|---|---|
| Change in Median Home Value | 23,315.81 | -7,215.34 | 1,222.36 |
| Median Home Value Growth | 1.07 | 1.34 | 0.65 |
In Model 1, college graduates has the largest effect size, indicating that it has the largest impact on the outcome. In Model 2, vacant housing has the largest effect size, and the log of professional occupations has the smallest.
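One common way to compute effect sizes of this kind is to multiply each slope by the interquartile range of its predictor, which gives the predicted change in the outcome when moving from the 25th to the 75th percentile. Whether the table above was produced exactly this way is an assumption. The sketch below uses simulated data, and `effect_size` is a hypothetical helper.

```r
set.seed(528)  # arbitrary seed
n <- 1000

# Simulated predictors and outcome (not the lab data).
toy <- data.frame(
  log_col    = rnorm(n, 1.3, 0.35),
  log_vacant = rnorm(n, 0.8, 0.30),
  log_prof   = rnorm(n, 1.5, 0.20)
)
toy$mhv.growth <- 20 + 2 * toy$log_col + 4 * toy$log_vacant +
  1 * toy$log_prof + rnorm(n, sd = 2)

fit <- lm(mhv.growth ~ log_col + log_vacant + log_prof, data = toy)

# Effect size = slope * interquartile range of the predictor, i.e. the
# predicted change in the outcome between the 25th and 75th percentile.
effect_size <- function(fit, data) {
  b   <- coef(fit)[-1]                              # drop the intercept
  iqr <- sapply(data[names(b)], IQR, na.rm = TRUE)  # IQR of each predictor
  b * iqr
}

es <- effect_size(fit, toy)
print(round(es, 2))
```

Because each slope is scaled by how much its predictor typically varies, the results are in the units of the outcome and can be compared across predictors within one model.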