Introduction

In this paper I argue that there is a strong positive association between median income and the representation of women in state legislatures across the 50 states. I construct a model to test income’s relationship with the share of female legislators, controlling for student spending, new immigration, and population density. Estimates from that model show a strong positive relationship between income and female political representation. I also find evidence that income is not associated with every form of gender equality: it has no clear relationship with pay equality. Policies to address gender inequality, therefore, have to take into consideration the important differences between kinds of inequality.

In the first section (Theory), I lay out the theory used to construct my model. In the second section (Analysis), I describe the data, estimate the model, conduct diagnostic tests, and examine an important empirical implication of my findings from the first regression. I then discuss the implications of my findings and state my conclusions.

Theory

I posit that four main causal factors help explain the variation in female political representation among the 50 states. Education spending is the first. With higher spending on students, more women will be able to acquire the education necessary to be considered credible candidates for office. With fewer educational opportunities, fewer women will have the chance to climb the educational ladder and gain a seat in the legislature. Income is the second factor. Women from more well-off backgrounds will likewise have greater access to the resources needed to start a career in politics. It also seems reasonable that citizens of states where few people are in constant fear for their financial security would be more likely to see unequal representation as a serious issue and want to address it, whereas in low-income states more people are preoccupied with their own security. Population density is the third factor. The more densely populated an area is, the more likely it is to contain a woman with impressive qualifications and an attitude or platform that appeals to voters. There will naturally be more competition for seats in densely populated districts, and with more challengers it is more likely that one of them is a woman. Finally, I argue that the amount of new immigration will have a positive effect on the share of female legislators. Female politicians tend to be more Democratic and more progressive on immigration issues, so new immigrants are probably more likely to vote for them, and the more immigration there is, the greater the impact of that vote on elections. My basic model is as follows:

Female Political Representation = f(education spending, income, population density, new immigration)
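
Written out as a regression equation, and assuming the linear, additive form that the lm() calls in the Analysis section estimate, the model might look like the sketch below. The symbols \beta_0 through \beta_4 and \varepsilon_i are notation introduced here for illustration; the variable names are the ones used in the code.

```latex
\text{femleg}_i = \beta_0 + \beta_1\,\text{stuspend}_i + \beta_2\,\text{medinc}_i
                + \beta_3\,\text{density}_i + \beta_4\,\text{newimmig}_i + \varepsilon_i
```

Under the theory above, each of \beta_1 through \beta_4 is expected to be positive.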

Analysis

Description of Data

To orient the reader, I generated a histogram (Figure 1) of female political representation in the legislature to show its distribution. As the histogram shows, the percentage of female legislators ranges from about 8% to about 38% among the 50 states. The distribution seems to be bimodal, with modes falling between 14% and 19% and between 29% and 34%. The data also seem to be skewed left, meaning the mean will be less than the median. I’d predict the mean to be closer to 20% and the median closer to 25%. A quick look at a scatterplot of representation against median income (Figure 2) indicates that Maryland has the highest percentage, at about 35.5%. South Carolina has the lowest percentage and shares low levels of political representation for women with states like Kentucky and Alabama. The scatterplot also shows a strong positive relationship between median income and female political representation, which is consistent with the theory articulated above.

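library(ggplot2)  # provides ggplot(), geom_histogram(), labs()
# Figure 1: distribution of the share of female legislators
ggplot(states, aes(femleg)) + geom_histogram(bins = 6, colour = "blue", fill = "white") +
  labs(title = "Figure 1: Histogram of Female Legislators") +
  xlab("Percentage of Legislature That's Female")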
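
The claim that a left-skewed distribution has a mean below its median can be checked directly. The sketch below uses a handful of made-up percentages with a long left tail; they are hypothetical values chosen for illustration, not the states data.

```r
# HYPOTHETICAL values with one low straggler (NOT the states data)
x <- c(8, 25, 28, 30, 31, 33, 35)

# With a long left tail, the low value drags the mean below the median
skew_check <- c(mean = mean(x), median = median(x))
print(round(skew_check, 2))
```

On the actual data, the same comparison would be mean(states$femleg) against median(states$femleg).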

# Figure 2: female representation against median income, with a linear fit
ggplot(states, aes(medinc, femleg)) + geom_point() +
  geom_text(aes(label = st), size = 3, hjust = 0, vjust = -1,
            show.legend = FALSE) +
  labs(title = "Figure 2: Income Increases Female Legislators") +
  geom_smooth(method = "lm", se = FALSE, fullrange = FALSE) + theme_minimal() +
  ylab("Percent Legislature Female") +
  xlab("State Median Income")

Estimation

Armed with the data and the model, let’s proceed to estimate the relationship between income and female political representation, controlling for education spending, population density, and new immigration.

The regression results (Table 1) indicate that, after controlling for education spending, population density, and new immigration, median income (medinc) yields the strongest estimate. For every $1 increase in median income, there is a .0005304 percentage point increase in female representation, holding the other variables constant. In other words, for every $1,000 increase in median income, there is a .5304 percentage point increase in female political representation, after controlling for the other variables. For every person added per square mile to a state’s population, there is an associated .0066 percentage point decrease in female representation, although that estimate is not statistically significant. Of the variables in the regression, only the variable measuring states’ median income was statistically significant. We can be 99% confident that the estimate on the median income variable is not zero. The estimates also indicate that while the model fits reasonably well, there is much more to explain: the adjusted R-squared statistic is .200, meaning that the model explains 20 percent of the variation in female legislative representation across the 50 states. So while we’ve made a good start here, 80% of the variation in female representation is left to be explained. The second regression table (Table 2) indicates that logging density has relatively little impact on the coefficient for income.
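
The unit conversions in this paragraph can be reproduced from the Table 1 output alone. The sketch below copies the estimates from that table and rescales them; the degrees of freedom used for the p-value are an assumption (50 states, 5 estimated coefficients, no missing data), not something the output reports.

```r
# Estimates and standard error copied from Table 1 (model femleg.lm)
b_medinc  <- 0.0005304   # percentage points per $1 of median income
se_medinc <- 0.0001468
b_density <- -0.006614   # percentage points per person per square mile

# Rescale to units that are easier to read
per_1000_dollars <- b_medinc * 1000   # effect of a $1,000 increase
per_100_people   <- b_density * 100   # effect of 100 more people per square mile

# Rebuild the t statistic and a two-sided p-value.
# ASSUMPTION: 50 observations and 5 coefficients, so 45 residual degrees of freedom.
t_medinc <- b_medinc / se_medinc
p_medinc <- 2 * pt(-abs(t_medinc), df = 50 - 5)

print(round(c(per_1000_dollars = per_1000_dollars,
              per_100_people   = per_100_people,
              t_medinc         = t_medinc), 4))
print(p_medinc < 0.01)   # is medinc significant at the 99% level?
```

Reading the coefficient per $1,000 rather than per $1 changes nothing about the fit; it only moves the decimal point to a scale on which state incomes actually differ.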

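# Baseline regression model: density enters in levels
femleg.lm <- lm(femleg ~ stuspend + medinc + density + newimmig, data=states)

# Same model with density logged (base 2), so a one-unit change is a doubling
femleg1.lm <- lm(femleg ~ stuspend + medinc + log2(density) + newimmig, data=states)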

pander(femleg.lm, caption=c("Table 1: Female Legislators on Student Spending"))
Table 1: Female Legislators on Student Spending

              Estimate     Std. Error   t value   Pr(>|t|)
stuspend      -0.0001196   0.0007504    -0.1593   0.8741
medinc        0.0005304    0.0001468    3.613     0.000759
density       -0.006614    0.005006     -1.321    0.1931
newimmig      2.488e-05    2.403e-05    1.035     0.306
(Intercept)   -2.531       6.732        -0.376    0.7087

pander(femleg1.lm, caption=c("Table 2: Female Legislators on Student Spending (density is logged)"))
Table 2: Female Legislators on Student Spending (density is logged)

                Estimate     Std. Error   t value   Pr(>|t|)
stuspend        -0.000548    0.0006972    -0.7861   0.436
medinc          0.0005071    0.0001486    3.413     0.00137
log2(density)   -0.1523      0.4933       -0.3086   0.759
newimmig        2.213e-05    2.525e-05    0.8766    0.3854
(Intercept)     1.288        6.603        0.1951    0.8462

stargazer(femleg.lm, femleg1.lm, header=FALSE)

Diagnostics

It appears there is a strong empirical relationship between income and female political representation in the 50 states. Before concluding, I test the stability of my results with respect to outliers and the specification of the model. To identify outlying cases, I examine a residual plot, Cook’s distance, and leverages from the model above in which the density variable is logged. The residual plot from the regression shows that the biggest outliers are New Mexico, Kansas, South Carolina, Vermont, Pennsylvania, Nevada, North Dakota, Delaware, and Virginia. The residual plot also shows no pattern when the residuals are plotted against the fitted values, indicating that a linear specification is an appropriate way to estimate our model.

residualPlot(femleg1.lm, id.n=10, labels = states$st)

Plots of Cook’s distance and leverage show that Alaska, California, and New York are the cases with the most influence on our estimates. Cook’s distance, which combines the size of a case’s residual with its leverage, singles out Alaska as the most influential case. The leverage (hat-value) plot indicates that California and New York carry the most weight in the estimates. Both lie at the extremes of population density, which might explain their relatively high leverage values. Since these two cases exert more influence on the estimates than the other states, I remove them from the regression to test the stability of my results with respect to their inclusion (Model 2, Table 2). Removing those cases has only a slight impact on the estimate for income. Neither the level of statistical significance (which remains at 99% confidence) nor the estimate changes appreciably. Thus, my results seem to be stable with respect to outliers.

influenceIndexPlot(femleg1.lm, id.n=10, labels = states$st)
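
The same diagnostics can be pulled out as numbers with base R. Because the states data are not reproduced here, the sketch below uses R's built-in mtcars data purely as a stand-in, and the cutoffs (4/n for Cook's distance, 2p/n for hat-values) are common rules of thumb assumed for illustration, not thresholds used in this paper.

```r
# Stand-in model: mtcars ships with R; it is NOT the states data from this paper
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)
p <- length(coef(fit))          # number of estimated coefficients, incl. intercept

cooks <- cooks.distance(fit)    # combines residual size and leverage
hats  <- hatvalues(fit)         # leverage only

# Rule-of-thumb cutoffs (conventions, not formal tests)
influential   <- names(cooks)[cooks > 4 / n]
high_leverage <- names(hats)[hats > 2 * p / n]

# Refit without the high-leverage rows and compare coefficients side by side
fit_trim <- update(fit, subset = !(rownames(mtcars) %in% high_leverage))
print(round(cbind(full = coef(fit), trimmed = coef(fit_trim)), 4))
```

If the coefficient of interest barely moves between the two columns, the result is stable with respect to the flagged cases, which is the comparison the paragraph above makes for California and New York.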

To test whether my results are stable with respect to the specification of the model, I removed the immigration variable (newimmig) from the model and re-ran the regression (Model 3 in Table 2). I removed immigration because its tie to female legislators seems the most far-fetched. While immigrants probably are more likely to vote for candidates with immigrant-friendly platforms, the connection to female legislators is less obvious than for the other variables. After removing it from the regression, I find that the coefficient on medinc remains about the same in magnitude and significance, which implies that my results do not depend on the inclusion of that variable.

femleg2.lm <- update(femleg1.lm, subset= states$st != "CA" & states$st != "NY")

femleg3.lm <- lm(femleg ~ stuspend + medinc + log2(density), data=states)

# Those wanting to make a regression table for a Word document should use pander here.
pander(femleg2.lm)
Fitting linear model: femleg ~ stuspend + medinc + log2(density) + newimmig

                Estimate     Std. Error   t value   Pr(>|t|)
stuspend        -0.00058     0.0008941    -0.6488   0.5199
medinc          0.0005129    0.0001668    3.076     0.003641
log2(density)   -0.08433     0.5323       -0.1584   0.8749
newimmig        1.189e-06    4.946e-05    0.02404   0.9809
(Intercept)     1.038        6.774        0.1532    0.879

pander(femleg3.lm)
Fitting linear model: femleg ~ stuspend + medinc + log2(density)

                Estimate     Std. Error   t value    Pr(>|t|)
stuspend        -0.0005135   0.0006943    -0.7396    0.4633
medinc          0.0005083    0.0001482    3.43       0.001284
log2(density)   -0.0265      0.4708       -0.05628   0.9554
(Intercept)     0.6776       6.55         0.1035     0.9181

stargazer(femleg1.lm, femleg2.lm, femleg3.lm, header=FALSE)

Empirical implication

It appears there’s a strong relationship between income and female equality in the form of political representation. If median income is an important factor in generating more equal political outcomes for women, we might expect income to have the same influence on equality in other spheres, such as pay. In other words, if we think income is an important factor in generating gender equality, then it should have the same association with women’s wage equality (women’s income compared to men’s). To test that, I used the same model designed to estimate income’s association with political representation to estimate the relationship between income and pay equality.

First, we can simply look at a scatterplot of pay equality against median income. From that plot we see the results are not as clear as I predicted they would be: there is only a very slightly positive relationship between the two. Looking at the scatterplot (Figure 3), we see that the states with the highest levels of pay equality are Arkansas, North Dakota, and California. These states have very different population densities from each other, which makes me doubt that density has much of a relationship with pay equality. So I plotted the log of density against pay equity (Figure 4), finding a positive relationship that is not statistically significant. When I run the regression that tests the relationship between median income and pay equity, controlling for density, immigration, and student spending, I find no statistically significant relationship between median income and pay equity.

The regression analysis tells us the following (Table 3). First, each dollar increase in median income is associated with a .0001 percentage point increase in the equality of pay between men and women. Note that the coefficient is not statistically significant, so we can’t be sure it differs from zero. Each time density doubles, there is an associated .5 percentage point increase in pay equity. However, this estimate is again not statistically significant, so we can’t be sure it differs from zero either. Finally, the fit for this model is considerably worse than for female political representation. With an adjusted R-squared of about .1, our model accounts for only about 10% of the variation in female pay equality, which leaves a lot to be desired.
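
Because density enters the model as log2(density), its coefficient is read per doubling rather than per person. The sketch below works through that arithmetic with the estimates from the pay-equality regression output (payeq.lm); the helper name effect_of_fold_change is made up for illustration.

```r
# Estimates copied from the pay-equality regression output (payeq.lm)
b_logdensity <- 0.4926      # percentage points per doubling of density
b_medinc     <- 9.697e-05   # percentage points per $1 of median income

# Hypothetical helper: predicted change in pay equity when density is multiplied by k.
# With a log2 regressor, a k-fold change shifts the outcome by b * log2(k).
effect_of_fold_change <- function(k, b = b_logdensity) b * log2(k)

doubling    <- effect_of_fold_change(2)   # log2(2) = 1, so this is the coefficient itself
quadrupling <- effect_of_fold_change(4)   # log2(4) = 2, so twice the coefficient

# Median income rescaled to a $1,000 change
per_1000_dollars <- b_medinc * 1000

print(c(doubling = doubling, quadrupling = quadrupling,
        per_1000_dollars = per_1000_dollars))
```

Neither estimate is statistically significant in the output, so these are descriptions of the fitted line rather than evidence of an effect.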

# Figure 3: pay equality against median income, with a linear fit
ggplot(states, aes(medinc, percwom)) + geom_point() +
  geom_text(aes(label = st), size = 3, hjust = 0, vjust = -1, show.legend = FALSE) +
  labs(title = "Figure 3: Median income increases Female Pay Equality") +
  geom_smooth(method = "lm", se = FALSE, fullrange = FALSE) + theme_minimal() +
  ylab("Percentage of Women's Income When Compared to Men's") +
  xlab("Median Income")

# Figure 4: pay equality against logged population density, with a linear fit
ggplot(states, aes(log2(density), percwom)) + geom_point() +
  geom_text(aes(label = st), size = 3, hjust = 0, vjust = -1, show.legend = FALSE) +
  labs(title = "Figure 4: Population Density (Logged) and Pay Equity") +
  geom_smooth(method = "lm", se = FALSE, fullrange = FALSE) + theme_minimal() +
  ylab("Percentage of Women's Income When Compared to Men's") +
  xlab("Log2 of people per square mile")

payeq.lm <- lm(percwom ~ stuspend + medinc + log2(density) + newimmig, data=states)
pander(payeq.lm)
Fitting linear model: percwom ~ stuspend + medinc + log2(density) + newimmig

                Estimate     Std. Error   t value   Pr(>|t|)
stuspend        -0.0008756   0.0004481    -1.954    0.05693
medinc          9.697e-05    9.55e-05     1.015     0.3153
log2(density)   0.4926       0.3171       1.553     0.1273
newimmig        2.775e-05    1.623e-05    1.71      0.09413
(Intercept)     77.58        4.244        18.28     1.871e-22

stargazer(payeq.lm, header=FALSE)

Discussion

The policy implications of this work are several. First, we shouldn’t conflate all issues of gender inequality, because some have different causes than others. Second, while higher median incomes seem to help increase the political representation of women, they seem less effective in generating equality for women in the economic sphere. The failure of the empirical implication to hold should be investigated further to determine the best means of narrowing the wage gap; the variables examined here do not appear to be the answer. In the future, we should look into other socio-economic variables to account for the discrepancy. One possible place to start is state education levels.

Conclusion

In this paper I argued that while income has a strong positive association with female political representation, that association does not extend to other issues of gender inequality. While regression results showed that income has a strong association with female representation, controlling for education spending, immigration, and population density, it does not have a strong association with pay equality.

The main implication of this paper is that not all issues of gender inequity should be treated the same, and that further investigation into the causes of the gender wage gap is necessary.