Problem Set 1

Getting to Know Your Data

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.004   4.600   8.000  10.616  13.000 143.600

These are the summary statistics of household income, which is measured in units of $10,000. The median is therefore $80,000 and the mean is $106,160. Based on the plot and the summary statistics, I would say that this data is skewed right. The histogram shows most of the observations clustered on its left-hand side, and the mean lies well above the median, which is what a long right tail produces. The maximum ($1,436,000) is also more than ten times the third quartile ($130,000). As a practicing economist, I also know that average income in America is close to $60,000 a year, and even that may be a high estimate. Here, however, the first quartile is $46,000: a quarter of households earn $46,000 or less and three quarters earn more. That seems high, given how many people across the country work minimum-wage jobs and do not make $46,000 in a year. Thus I believe this data is skewed right.
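As a quick numeric check of the skew argument, the sketch below uses only the six numbers printed in the summary output. It assumes that one unit of income equals $10,000, which is what the dollar figures in the paragraph imply. It computes Bowley's quartile skewness, (Q3 + Q1 - 2·median) / (Q3 - Q1), which is positive when the upper half of the middle 50% is more spread out than the lower half. The helper name `skew_checks` is hypothetical.

```python
# Summary statistics copied from the R output above (units: $10,000, assumed).
SUMMARY = {
    "min": 0.004,
    "q1": 4.600,
    "median": 8.000,
    "mean": 10.616,
    "q3": 13.000,
    "max": 143.600,
}

UNIT_DOLLARS = 10_000  # assumption: one unit of household income = $10,000


def skew_checks(s):
    """Return simple right-skew indicators from a five-number summary plus mean."""
    # Bowley (quartile) skewness: > 0 means Q3 is farther from the median than Q1 is.
    bowley = (s["q3"] + s["q1"] - 2 * s["median"]) / (s["q3"] - s["q1"])
    return {
        "mean_above_median": s["mean"] > s["median"],
        "bowley": bowley,
        "median_dollars": s["median"] * UNIT_DOLLARS,
        "mean_dollars": s["mean"] * UNIT_DOLLARS,
    }


checks = skew_checks(SUMMARY)
print(checks["mean_above_median"], round(checks["bowley"], 3))
print(checks["median_dollars"], round(checks["mean_dollars"]))
```

Both indicators point the same way: the mean exceeds the median, and the quartile-based measure is positive. This check does not depend on the plot.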

After dividing household income by the number of people in the household and re-plotting the data, the histogram is clearly still skewed right. This does not change my view, and it makes sense to me. Poorer households tend to have more children on average, and therefore more people in a single home. Dividing by household size pushes their per-capita income even further to the left, which adds to the frequency of low values of income per capita. This suggests that many low-income families live together to cut costs.
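The per-capita step above amounts to dividing each household's income by its size. Below is a minimal sketch using a few made-up households for illustration only. The records are hypothetical; only the column names `hh_income` and `hh_size` come from the regression tables.

```python
from statistics import mean, median

# Hypothetical households: (hh_income in $10,000 units, hh_size). Not the problem-set data.
households = [
    (4.6, 5),
    (8.0, 4),
    (8.0, 2),
    (13.0, 3),
    (60.0, 2),
]


def per_capita(rows):
    """Divide household income by household size for each record."""
    return [income / size for income, size in rows]


pc = per_capita(households)
print([round(x, 2) for x in pc])
# A mean above the median is one sign that the per-capita values are still right-skewed.
print(mean(pc) > median(pc))
```

A single high-income household of modest size is enough to keep a long right tail after the division.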

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               hh_size          
## -----------------------------------------------
## hh_income                    0.024***          
##                               (0.001)          
##                                                
## Constant                     2.580***          
##                               (0.013)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                             0.027           
## Adjusted R2                    0.027           
## Residual Std. Error     1.479 (df = 24998)     
## F Statistic         690.440*** (df = 1; 24998) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

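The plot of this data is skewed right and resembles the graph of 1/x. In this regression, the outcome variable is household size (hh_size, the number of people who live in the house) and the explanatory variable is household income (hh_income). The coefficient of 0.024 means that each one-unit ($10,000) increase in household income is associated with 0.024 more people in the household. The R2 of 0.027 means that income explains only about 2.7% of the variation in household size.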
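The coefficients in these tables are ordinary least squares estimates. With a single regressor, they have a closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below implements that formula with the standard library and also returns R2. The small data set is hypothetical and is not the problem-set data.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope: covariance over variance
    a = y_bar - b * x_bar  # intercept: fitted line passes through the means
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data: household income (in $10,000s) and household size.
income = [2.0, 4.0, 6.0, 8.0, 10.0]
size = [2, 3, 2, 4, 3]
print(simple_ols(income, size))
```

In R, `lm(hh_size ~ hh_income)` produces the same point estimates for the real data, along with standard errors.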

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                           time_commuting       
## -----------------------------------------------
## hh_income                    0.175***          
##                               (0.020)          
##                                                
## Constant                     34.880***         
##                               (0.301)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                             0.003           
## Adjusted R2                    0.003           
## Residual Std. Error     33.093 (df = 24998)    
## F Statistic          74.569*** (df = 1; 24998) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

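The regression table shows that the coefficient on hh_income is significant at the 1% level (three stars), so its p-value is very small. Each one-unit ($10,000) increase in household income is associated with a 0.175-unit increase in time spent commuting. The R2 of 0.003, however, shows that income explains very little of the variation in commuting time.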
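Treating the fitted line as a prediction rule makes the 0.175 slope concrete. The sketch below plugs two household incomes into the estimated equation (constant 34.880, slope 0.175, taken from the table). It assumes hh_income is measured in $10,000, carried over from the summary statistics. The helper name is hypothetical.

```python
# Coefficients from the time_commuting ~ hh_income table above.
CONSTANT = 34.880
SLOPE = 0.175


def predicted_commute(hh_income):
    """Fitted time_commuting for a given hh_income (assumed units: $10,000)."""
    return CONSTANT + SLOPE * hh_income


# Median ($80,000) versus third-quartile ($130,000) household income.
low, high = predicted_commute(8.0), predicted_commute(13.0)
print(round(low, 3), round(high, 3), round(high - low, 3))
```

Moving from the median to the third quartile of income shifts the prediction by less than one unit of time_commuting. This is consistent with the R2 of 0.003.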

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                         log(time_commuting)
## -----------------------------------------------
## hh_income                    0.007***          
##                               (0.001)          
##                                                
## Constant                     3.181***          
##                               (0.008)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                             0.006           
## Adjusted R2                    0.006           
## Residual Std. Error     0.887 (df = 24998)     
## F Statistic         159.839*** (df = 1; 24998) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The regression table shows that the coefficient on hh_income is significant at the 1% level, so its p-value is very small. Because the dependent variable here is log(time_commuting), the coefficient of 0.007 means that a one-unit ($10,000) increase in household income is associated with roughly a 0.7% increase in commuting time.
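The 0.7% reading uses the usual approximation that, in a model of log(y) on x, a coefficient β corresponds to a 100·β percent change in y per unit of x. The exact change is 100·(e^β - 1) percent. The sketch below computes the exact figure, assuming, as the percentage interpretation implies, that the model is log(time_commuting) on hh_income.

```python
import math


def pct_change_log_level(coef, delta_x=1.0):
    """Exact % change in y for a delta_x change in x, assuming log(y) = a + coef * x."""
    return 100 * (math.exp(coef * delta_x) - 1)


print(round(pct_change_log_level(0.007), 3))       # one unit ($10,000) of hh_income
print(round(pct_change_log_level(0.007, 5.0), 3))  # five units ($50,000)
```

For a coefficient this small, the approximation and the exact value agree to two decimal places. The gap grows only for large coefficients or large changes in x.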

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                         log(time_commuting)
## -----------------------------------------------
## log(hh_income)               0.138***
##                               (0.006)          
##                                                
## Constant                     2.976***          
##                               (0.014)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                             0.018           
## Adjusted R2                    0.018           
## Residual Std. Error     0.882 (df = 24998)     
## F Statistic         454.468*** (df = 1; 24998) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
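This table has no written interpretation. The unmatched closing parentheses suggest that both commuting time and household income enter in logs. Under that assumption, the 0.138 coefficient is an elasticity: a 1% increase in household income is associated with roughly a 0.138% increase in commuting time. The sketch below computes the exact implied change for a given percentage change in income. The function name is hypothetical.

```python
def pct_change_log_log(elasticity, pct_change_x):
    """Exact % change in y when x changes by pct_change_x %, assuming log(y) = a + elasticity * log(x)."""
    ratio = 1 + pct_change_x / 100
    return 100 * (ratio ** elasticity - 1)


print(round(pct_change_log_log(0.138, 1.0), 4))   # close to the coefficient itself
print(round(pct_change_log_log(0.138, 10.0), 3))  # a 10% rise in household income
```

In a log-log model, the coefficient does not depend on the units of income. That is one reason to prefer it when income is heavily right-skewed.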

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                           time_commuting       
## -----------------------------------------------
## hh_income                    0.179***          
##                               (0.020)          
##                                                
## hh_share_nonwhite             1.310**          
##                               (0.522)          
##                                                
## Constant                     34.535***         
##                               (0.330)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                             0.003           
## Adjusted R2                    0.003           
## Residual Std. Error     33.090 (df = 24997)    
## F Statistic          40.442*** (df = 2; 24997) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

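The regression table shows that both coefficients are statistically significant: hh_income at the 1% level and hh_share_nonwhite at the 5% level. Holding the nonwhite share constant, a one-unit increase in household income is associated with a 0.179 increase in commuting time. Compared with the result from question 10 (0.175), the income coefficient rises by 0.004 once hh_share_nonwhite is controlled for, so leaving that variable out biased the income estimate only slightly. The 0.004 difference is not itself an effect of being nonwhite. That effect is the hh_share_nonwhite coefficient of 1.310: if the share runs from 0 to 1, then at a given income, moving from an all-white to an all-nonwhite household is associated with 1.310 more units of commuting time.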
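The OLS omitted-variable identity explains why the income coefficient moves so little. For two regressions on the same sample, the slope from the short regression equals the slope from the long regression plus γ·δ. Here γ is the coefficient on the added control, and δ is the slope from regressing that control on income. Rearranging gives the δ implied by the two tables. Both regressions use the same 25,000 observations, but the inputs are rounded table entries, so the result is only approximate.

```python
def implied_auxiliary_slope(b_short, b_long, gamma):
    """Solve b_short = b_long + gamma * delta for delta (OLS omitted-variable identity)."""
    if gamma == 0:
        raise ValueError("gamma must be nonzero")
    return (b_short - b_long) / gamma


# hh_income coefficient without (0.175) and with (0.179) the control, and the
# hh_share_nonwhite coefficient in the longer regression (1.310); rounded table values.
delta = implied_auxiliary_slope(b_short=0.175, b_long=0.179, gamma=1.310)
print(round(delta, 5))
```

The small negative δ suggests that higher-income households in this sample have a slightly lower nonwhite share. Omitting the control therefore pulled the income coefficient down a little, from 0.179 to 0.175.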

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                           time_commuting       
## -----------------------------------------------
## i_moved                       -0.033           
##                               (0.536)          
##                                                
## Constant                     36.748***         
##                               (0.233)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                            0.00000          
## Adjusted R2                  -0.00004          
## Residual Std. Error     33.143 (df = 24998)    
## F Statistic            0.004 (df = 1; 24998)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Households that moved this year (i_moved == 1) have an average commuting time 0.033 lower than those that did not. When i_moved == 0, the predicted commuting time is just the constant, 36.748. However, the standard error (0.536) is far larger than the coefficient, so the p-value is large and the difference is not statistically significant.
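The t-statistic and a 95% confidence interval make the "not significant" conclusion concrete. The sketch below uses the coefficient, standard error, and constant from the i_moved table. It assumes the normal critical value 1.96, a close approximation with about 25,000 degrees of freedom.

```python
# i_moved regression: coefficient, standard error, and constant from the table.
COEF, SE, CONSTANT = -0.033, 0.536, 36.748
Z_95 = 1.96  # normal critical value; assumed adequate with ~25,000 degrees of freedom

t_stat = COEF / SE
ci_low, ci_high = COEF - Z_95 * SE, COEF + Z_95 * SE

# With a single 0/1 regressor, the fitted values are just the two group means.
mean_stayed = CONSTANT        # i_moved == 0
mean_moved = CONSTANT + COEF  # i_moved == 1

print(round(t_stat, 3), (round(ci_low, 3), round(ci_high, 3)))
print(round(mean_stayed, 3), round(mean_moved, 3))
print("zero inside 95% CI:", ci_low < 0 < ci_high)
```

The interval runs from about -1.08 to 1.02 and contains zero. The data are therefore consistent with moving having no association with commuting time.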

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                           time_commuting       
## -----------------------------------------------
## i_moved                       -0.071           
##                               (0.536)          
##                                                
## hh_share_nonwhite             0.973*           
##                               (0.522)          
##                                                
## Constant                     36.529***         
##                               (0.261)          
##                                                
## -----------------------------------------------
## Observations                  25,000           
## R2                            0.0001           
## Adjusted R2                   0.0001           
## Residual Std. Error     33.141 (df = 24997)    
## F Statistic            1.741 (df = 2; 24997)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
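This last table also has no written interpretation. The significance stars can be reproduced from each coefficient and its standard error. This is a minimal sketch of a two-sided test using a normal approximation. R computes these p-values from the t distribution, which is practically identical at this sample size. Applied to this table, i_moved (-0.071, SE 0.536) stays insignificant, while hh_share_nonwhite (0.973, SE 0.522) is significant only at the 10% level, matching its single star.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef/se under a standard normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def stars(coef, se):
    """Return stargazer-style stars: *p<0.1; **p<0.05; ***p<0.01."""
    p = two_sided_p(coef, se)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


for name, coef, se in [("i_moved", -0.071, 0.536), ("hh_share_nonwhite", 0.973, 0.522)]:
    print(name, round(two_sided_p(coef, se), 3), stars(coef, se) or "(none)")
```

The same function also reproduces the stars in the earlier tables. For example, hh_share_nonwhite at 1.310 (SE 0.522) gets two stars, and hh_income at 0.175 (SE 0.020) gets three.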

Problem Set 1

Cole Wilson

3/5/2020

Setup