## [1] 25000 12
The data set has 25,000 observations and 12 variables.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.004 4.600 8.000 10.616 13.000 143.600
These are the summary statistics for household income, which appears to be recorded in tens of thousands of dollars. The median is $80,000 and the mean is $106,160. Based on the plot and some basic inference from the summary statistics, I would say this data is skewed right. The plot shows the majority of the observations clustered on the left-hand side, and the mean sits well above the median, which is what a long right tail produces. Also, as a practicing economist, I know that average income, at least in America, is close to $60,000 a year, and even that could be a high estimate. The summary puts the first quartile at $46,000, meaning only 25% of households have an income below that. This seems high, considering that people all over the country work minimum-wage jobs and make less than $46,000 in a year. Thus I believe this data is skewed right.
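The mean-versus-median comparison can be sketched in a few lines of R. This is only an illustration on simulated data: `hh_income_sim` is a hypothetical log-normal stand-in, not the original data set, and the conversion assumes income is recorded in $10,000s, as the interpretation above implies.

```r
# Simulated stand-in for household income in $10,000s (NOT the original data).
set.seed(1)
hh_income_sim <- rlnorm(25000, meanlog = 2, sdlog = 0.8)

# In a right-skewed distribution the long upper tail pulls the mean above the median.
income_mean   <- mean(hh_income_sim)
income_median <- median(hh_income_sim)
mean_exceeds_median <- income_mean > income_median

# Convert the reported summary values from $10,000s to dollars.
reported <- c(q1 = 4.6, median = 8.0, mean = 10.616)
reported_dollars <- reported * 10000
print(reported_dollars)
```

A mean above the median is a quick symptom of right skew, not a formal test; the histogram remains the more direct evidence.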
After dividing household income by the number of individuals in the household and re-plotting the data, it is quite obvious that the histogram is still very much skewed right. This does not really change my perspective, and it makes sense to me: if you are poor you tend to have more kids on average, and thus more people in a single home. This adds to the frequency of low values of income per capita. We can see that there are a lot of poor families that have to live together to try to cut costs.
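A minimal sketch of the per-capita step, assuming the columns are named `hh_income` and `hh_size` as in the regression tables. The data frame `sim` is simulated for illustration only, and `plot = FALSE` just keeps the example from opening a graphics device.

```r
# Simulated households (NOT the original data).
set.seed(2)
n <- 1000
sim <- data.frame(
  hh_income = rlnorm(n, meanlog = 2, sdlog = 0.8),  # income in $10,000s
  hh_size   = sample(1:6, n, replace = TRUE)        # people per household
)

# Income per capita: household income divided by household size.
sim$income_per_capita <- sim$hh_income / sim$hh_size

# Bin the values as a histogram would, without drawing the plot.
h <- hist(sim$income_per_capita, breaks = 30, plot = FALSE)

# Right skew shows up again as a mean above the median.
per_capita_mean   <- mean(sim$income_per_capita)
per_capita_median <- median(sim$income_per_capita)
```

Dropping `plot = FALSE` draws the histogram itself.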
##
## ===============================================
## Dependent variable:
## ---------------------------
## hh_size
## -----------------------------------------------
## hh_income 0.024***
## (0.001)
##
## Constant 2.580***
## (0.013)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.027
## Adjusted R2 0.027
## Residual Std. Error 1.479 (df = 24998)
## F Statistic 690.440*** (df = 1; 24998)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
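The data above is skewed right. The graph reminds me of the graph of 1/x. The outcome variable is household size (the number of people who live in the house), and the explanatory variable is household income. The coefficient of 0.024 means that a one-unit increase in household income is associated with 0.024 more people in the household, although the R2 of 0.027 shows that income explains very little of the variation in household size.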
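A regression of this form can be sketched with `lm()`. The data here are simulated so that the true intercept and slope match the reported 2.580 and 0.024; the noise level and the continuous household size are assumptions made purely for illustration.

```r
# Simulated data (NOT the original sample); household size is treated as
# continuous here only to keep the sketch short.
set.seed(3)
n <- 5000
hh_income <- rlnorm(n, meanlog = 2, sdlog = 0.8)
hh_size   <- 2.58 + 0.024 * hh_income + rnorm(n, sd = 1.5)

# Outcome on the left of the formula, explanatory variable on the right.
fit <- lm(hh_size ~ hh_income)

slope <- coef(fit)[["hh_income"]]   # estimated change in hh_size per unit of income
r2    <- summary(fit)$r.squared     # share of variation in hh_size explained
print(c(slope = slope, r2 = r2))
```

A significant slope together with a small R2 is not a contradiction: with many observations a weak relationship can still be estimated precisely.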
##
## ===============================================
## Dependent variable:
## ---------------------------
## time_commuting
## -----------------------------------------------
## hh_income 0.175***
## (0.020)
##
## Constant 34.880***
## (0.301)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.003
## Adjusted R2 0.003
## Residual Std. Error 33.093 (df = 24998)
## F Statistic 74.569*** (df = 1; 24998)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The regression table shows a very small p-value on household income (p < 0.01), indicating statistical significance. A one-unit increase in household income is associated with a 0.175 increase in time commuting.
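The arithmetic behind this reading can be checked directly from the reported estimates. The helper `predict_commute` is a hypothetical name introduced for this sketch, and the t-statistic uses the rounded values printed in the table, so it is only approximate.

```r
# Reported estimates from the table above.
b0 <- 34.880   # constant
b1 <- 0.175    # coefficient on hh_income
se <- 0.020    # standard error of the coefficient

# Fitted commuting time at a given household income.
predict_commute <- function(hh_income) b0 + b1 * hh_income

at_median <- predict_commute(8)                        # fitted value at median income
one_unit  <- predict_commute(9) - predict_commute(8)   # effect of one more unit

# A large t-statistic is what lies behind the very small p-value.
t_stat <- b1 / se
print(c(at_median = at_median, one_unit = one_unit, t_stat = t_stat))
```

The one-unit difference equals the slope regardless of the starting income, which is what makes the level-level model linear.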
##
## ===============================================
## Dependent variable:
## ---------------------------
## log(time_commuting)
## -----------------------------------------------
## hh_income 0.007***
## (0.001)
##
## Constant 3.181***
## (0.008)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.006
## Adjusted R2 0.006
## Residual Std. Error 0.887 (df = 24998)
## F Statistic 159.839*** (df = 1; 24998)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The regression table shows a very small p-value on household income (p < 0.01), indicating statistical significance. With time commuting in logs, a one-unit increase in household income is associated with roughly a 0.7% increase in time commuting.
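The 0.7% figure is the usual approximation for a log-level model, where the percentage change is about 100 times the coefficient. A short sketch comparing it with the exact value, assuming the outcome is the natural log of time commuting:

```r
# Reported coefficient on hh_income when the outcome is log(time_commuting).
b1 <- 0.007

# Approximation: percent change in the outcome per one-unit increase.
approx_pct <- 100 * b1

# Exact percent change implied by the log specification.
exact_pct <- 100 * (exp(b1) - 1)

print(c(approx = approx_pct, exact = exact_pct))  # both about 0.70
```

The two values agree closely because the coefficient is small; the approximation degrades as coefficients grow.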
##
## ===============================================
## Dependent variable:
## ---------------------------
## log(time_commuting)
## -----------------------------------------------
## log(hh_income) 0.138***
## (0.006)
##
## Constant 2.976***
## (0.014)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.018
## Adjusted R2 0.018
## Residual Std. Error 0.882 (df = 24998)
## F Statistic 454.468*** (df = 1; 24998)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The regression table shows a very small p-value on household income (p < 0.01), indicating statistical significance. With both variables in logs, the coefficient is an elasticity: a 1% increase in household income is associated with roughly a 0.138% increase in time commuting.
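The table above reports a coefficient of 0.138 on household income. Assuming both variables enter in natural logs, as the 1% interpretation implies, that coefficient is an elasticity, and the implied percentage changes can be sketched as follows. The function name `pct_change_commute` is hypothetical.

```r
# Reported coefficient from the table above, read as an elasticity.
elasticity <- 0.138

# Exact percent change in time commuting for a given percent change in income.
pct_change_commute <- function(pct_income) {
  100 * ((1 + pct_income / 100)^elasticity - 1)
}

pct_1  <- pct_change_commute(1)    # about 0.137
pct_10 <- pct_change_commute(10)   # about 1.32
print(c(pct_1 = pct_1, pct_10 = pct_10))
```

For small changes the result is close to the elasticity times the percent change in income; the gap widens for larger changes.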
##
## ===============================================
## Dependent variable:
## ---------------------------
## time_commuting
## -----------------------------------------------
## hh_income 0.179***
## (0.020)
##
## hh_share_nonwhite 1.310**
## (0.522)
##
## Constant 34.535***
## (0.330)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.003
## Adjusted R2 0.003
## Residual Std. Error 33.090 (df = 24997)
## F Statistic 40.442*** (df = 2; 24997)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The regression table shows a very small p-value on household income (p < 0.01), indicating statistical significance; the nonwhite share is significant at the 5% level. Holding the household's nonwhite share constant, a one-unit increase in household income is associated with a 0.179 increase in time commuting. Compared with the result from question 10, which was 0.175, the income coefficient rises by 0.004 once hh_share_nonwhite is included.
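Why a coefficient moves when a control is added can be sketched on simulated data. In OLS, the gap between the income coefficient without and with the control equals the control's coefficient times the slope from regressing the control on income. Everything below is simulated, with variable names borrowed from the tables, so the numbers are illustrative and are not the original results.

```r
# Simulated data (NOT the original sample).
set.seed(4)
n <- 2000
hh_income         <- rlnorm(n, meanlog = 2, sdlog = 0.8)
hh_share_nonwhite <- plogis(-0.05 * hh_income + rnorm(n))  # values in (0, 1)
time_commuting    <- 35 + 0.18 * hh_income + 1.3 * hh_share_nonwhite +
  rnorm(n, sd = 5)

short <- lm(time_commuting ~ hh_income)                      # no control
long  <- lm(time_commuting ~ hh_income + hh_share_nonwhite)  # with control
aux   <- lm(hh_share_nonwhite ~ hh_income)                   # control on income

# Change in the income coefficient from adding the control ...
gap <- coef(short)[["hh_income"]] - coef(long)[["hh_income"]]

# ... equals (control coefficient) x (slope of control on income).
implied <- coef(long)[["hh_share_nonwhite"]] * coef(aux)[["hh_income"]]
print(c(gap = gap, implied = implied))
```

When the control has a positive coefficient and is negatively related to income, as in this simulation, the income coefficient rises after the control is added.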
##
## ===============================================
## Dependent variable:
## ---------------------------
## time_commuting
## -----------------------------------------------
## i_moved -0.033
## (0.536)
##
## Constant 36.748***
## (0.233)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.00000
## Adjusted R2 -0.00004
## Residual Std. Error 33.143 (df = 24998)
## F Statistic 0.004 (df = 1; 24998)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
If you moved this year (i_moved == 1), your estimated time commuting is lower by 0.033. When i_moved == 0, the expected time commuting is just the constant (in this case 36.748). However, because the p-value is large, this difference is not statistically significant.
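With a single 0/1 regressor, the constant is the mean of the outcome for the group coded 0 and the coefficient is the difference in group means. A minimal sketch on simulated data, where moving has no true effect on commuting time; the distribution and the share of movers are assumptions for illustration.

```r
# Simulated data (NOT the original sample).
set.seed(5)
n <- 2000
i_moved        <- rbinom(n, 1, 0.2)          # 1 if the household moved this year
time_commuting <- rexp(n, rate = 1 / 36.7)   # drawn independently of i_moved

fit <- lm(time_commuting ~ i_moved)

mean_stayed <- mean(time_commuting[i_moved == 0])
mean_moved  <- mean(time_commuting[i_moved == 1])

intercept <- coef(fit)[["(Intercept)"]]   # equals mean_stayed
moved_gap <- coef(fit)[["i_moved"]]       # equals mean_moved - mean_stayed
print(c(intercept = intercept, moved_gap = moved_gap))
```

The estimated gap is rarely exactly zero even when there is no true effect, which is why the standard error and p-value matter for the interpretation.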
##
## ===============================================
## Dependent variable:
## ---------------------------
## time_commuting
## -----------------------------------------------
## i_moved -0.071
## (0.536)
##
## hh_share_nonwhite 0.973*
## (0.522)
##
## Constant 36.529***
## (0.261)
##
## -----------------------------------------------
## Observations 25,000
## R2 0.0001
## Adjusted R2 0.0001
## Residual Std. Error 33.141 (df = 24997)
## F Statistic 1.741 (df = 2; 24997)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01