Question 1

## [1] "1a     Teleworking does appear to have a significant effect on income: the ANOVA p-value is less than .05, and teleworkers earn $350.76 more on average than those who do not telework."

## [1] "1c     The model assumes that teleworking is the only factor affecting income.\nTeleworking explains only 5% of the variation in income."
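
The reasoning in 1a and 1c can be illustrated with a minimal sketch. It uses simulated data, not the assignment's dataset; the names `sim`, `telework`, and `fit1` and all simulated numbers are hypothetical. It shows that with a single binary predictor the fitted slope is the difference in group means, and that R^2 can stay low even when the effect is significant.

```r
# Hypothetical simulated data; NOT the assignment's dataset.
set.seed(1)
n <- 500
sim <- data.frame(
  telework = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$weekly_earnings <- 900 + 350 * (sim$telework == "yes") + rnorm(n, sd = 600)

# One-predictor model: income explained by telework status alone.
fit1 <- lm(weekly_earnings ~ telework, data = sim)

# With one binary predictor, the slope equals the gap between group means.
group_means <- tapply(sim$weekly_earnings, sim$telework, mean)
mean_gap <- unname(group_means["yes"] - group_means["no"])
slope <- unname(coef(fit1)["teleworkyes"])

anova(fit1)                    # F-test for the telework effect
r2 <- summary(fit1)$r.squared  # share of variance explained by telework
```
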

Question 2

## [1] "2a     By adding whether the person works hourly or has a salary, we can more easily distinguish between independent contractors who work from home and employees who have been allowed to work from home."
## [1] "2b     The overall model and both variables are significant, with p-values below .05, and the R^2 is considerably higher than in the original model. Telecommuters typically earn $218.62 more per week, and people who are paid hourly earn $540.66 more on average than those who are not. Yes, it is still a naïve model, as it does not take into consideration any interaction between the two x variables."

## [1] "2d     There is an improvement in fit: the p-value is less than .05, and the adjusted R^2 of the new model is considerably higher than the R^2 of the original model (0.21 vs 0.05)."
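
A comparison like the one in 2d could be run roughly as follows. This is a sketch on simulated data, assuming the two models are nested; the names `small`, `large`, and `hourly` and the simulated effect sizes are hypothetical and are not taken from the assignment.

```r
# Hypothetical simulated data; NOT the assignment's dataset.
set.seed(2)
n <- 500
sim <- data.frame(
  telework = factor(sample(c("no", "yes"), n, replace = TRUE)),
  hourly   = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$weekly_earnings <- 900 + 200 * (sim$telework == "yes") +
  500 * (sim$hourly == "yes") + rnorm(n, sd = 600)

small <- lm(weekly_earnings ~ telework, data = sim)           # original model
large <- lm(weekly_earnings ~ telework + hourly, data = sim)  # adds pay type

# F-test for whether the added variable improves the fit.
cmp <- anova(small, large)
p_value <- cmp[2, "Pr(>F)"]

# Adjusted R^2 penalizes extra predictors, so it suits this comparison.
adj_small <- summary(small)$adj.r.squared
adj_large <- summary(large)$adj.r.squared
```
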

Question 3

## [1] "3a     weekly_earnings = b0 + b1*(hours_worked)"
## [1] "3b     weekly_earnings = 66.0433 + 22.5887*(hours_worked)"
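
As a quick worked example of the fitted equation in 3b, the snippet below plugs in a 40-hour week. The 40-hour value is only an illustrative assumption; the coefficients are the ones reported above.

```r
# Coefficients from the fitted equation in 3b.
b0 <- 66.0433
b1 <- 22.5887

# Predicted weekly earnings for an illustrative 40-hour week.
pred_40 <- b0 + b1 * 40
pred_40
```
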
## [1] "3c     The model does not take into consideration any other variables.\nHours worked explains only 15.5% of the variation in weekly earnings.\nThe model does not take into consideration any potential interactions between hours worked and other variables."
## [1] "3d     By replacing hours_worked with poly(hours_worked, 3, raw=TRUE), we could capture a curved (cubic) relationship while still fitting a linear model."
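
To show what `poly( , 3, raw=TRUE)` does, here is a small sketch on simulated data (the data and the names `fit_poly` and `fit_manual` are hypothetical). With `raw = TRUE`, the columns are simply the variable, its square, and its cube, so the model stays linear in its coefficients.

```r
# Hypothetical simulated data; NOT the assignment's dataset.
set.seed(3)
hours_worked <- runif(200, 5, 70)
weekly_earnings <- 100 + 20 * hours_worked + rnorm(200, sd = 150)

# Raw polynomial columns are x, x^2, x^3.
P <- poly(hours_worked, 3, raw = TRUE)
manual <- cbind(hours_worked, hours_worked^2, hours_worked^3)
max_diff <- max(abs(unclass(P) - manual))

# Both formulations fit the same cubic model.
fit_poly <- lm(weekly_earnings ~ poly(hours_worked, 3, raw = TRUE))
fit_manual <- lm(weekly_earnings ~ hours_worked + I(hours_worked^2) +
                   I(hours_worked^3))
coef_gap <- max(abs(unname(coef(fit_poly)) - unname(coef(fit_manual))))
```
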

Question 4

## [1] "4a      weekly_earnings = b0 + b1*(age) + b2*(education)"
## [1] "4b      weekly_earnings = -650.91290 + 51.49191*(poly(age, 2, raw=TRUE)1) - 0.50467*(poly(age, 2, raw=TRUE)2) - 107.61850*(as.factor(education)32) - 127.12319*(as.factor(education)33) ... + 1070.13295*(as.factor(education)46)"
## [1] "4c      The model only explains roughly 19% of the variance of weekly earnings.\nThe model assumes that only the listed variables influence weekly earnings.\nThe model assumes that there are no interactions between the independent variables."
## [1] "4d      Please view the above plots. Age appears to meet the linearity assumption after transforming it with a second-degree polynomial, as its relationship with earnings seems to be inherently parabolic. Education does not meet the requirements for linearity, as it appears to follow a slight wavelike pattern, but more importantly it is categorical."
## [1] "4e      Age and education may be collinear, which could be tested with a variance inflation factor test. Over 80% of the variance in weekly_earnings is still unexplained, which could be better explained by adding more variables to the model. Age still might not be truly linear."

Question 5

## [1] "5a     weekly_earnings = b0 + b1*(age) + b2*(sex) + b3*(hours_worked) + b4*(union_member) + b5*(education)"
## [1] "5b     weekly_earnings = -4.329e+02 + 3.193e+01*(poly(age, 2, raw=TRUE)1) - 2.810e-01*(poly(age, 2, raw=TRUE)2) - 1.626e+02*(as.factor(sex)2) - 2.235e+01*(poly(hours_worked, 3, raw=TRUE)1) + 1.091e+00*(poly(hours_worked, 3, raw=TRUE)2) - 8.220e-03*(poly(hours_worked, 3, raw=TRUE)3) - 5.751e+01*(as.factor(union_member)2) - 9.684e+01*(as.factor(education)32) - 5.119e+01*(as.factor(education)33) ... + 9.178e+02*(as.factor(education)46)"
## [1] "5c     After checking the variance inflation factor, I can safely conclude that none of the variables in my model seem to be collinear."
##                                       GVIF Df GVIF^(1/(2*Df))
## poly(age, 2, raw = TRUE)          1.102250  2        1.024637
## as.factor(sex)                    1.071395  1        1.035082
## poly(hours_worked, 3, raw = TRUE) 1.152645  3        1.023959
## as.factor(union_member)           1.013773  1        1.006863
## as.factor(education)              1.085609 15        1.002742
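
The last column of the table can be recomputed from the first two, which makes it easier to see how it is scaled. The sketch below only reuses the GVIF and Df values printed above; the short vector names are my own labels.

```r
# GVIF and Df values copied from the table above.
gvif <- c(age = 1.102250, sex = 1.071395, hours_worked = 1.152645,
          union_member = 1.013773, education = 1.085609)
df <- c(age = 2, sex = 1, hours_worked = 3, union_member = 1, education = 15)

# GVIF^(1/(2*Df)) rescales each term by its degrees of freedom,
# so multi-column terms (polynomials, factors) are comparable.
adjusted <- gvif^(1 / (2 * df))
round(adjusted, 6)
```

All adjusted values are close to 1, which is consistent with the conclusion in 5c.
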
## [1] "5d     The model is best at making estimates for weekly earnings between -$1682.00 (the minimum of my model) and $2535.47."
## [1] "5e     In a scenario where someone is 34 years old, male, works 40 hours, is not a union member, and has a bachelor's degree, their predicted weekly earnings would be $1236.63. The possible range of their weekly earnings is probably between $708.43 and $1764.83."
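
As a sanity check on 5e, the reported range should be centered on the point prediction. That is what one would expect if the range came from something like `predict(model, newdata, interval = "prediction")`, though the exact call used is an assumption here. The numbers below are the ones reported above.

```r
# Values reported in 5e.
fit   <- 1236.63
lower <- 708.43
upper <- 1764.83

# An interval from a linear model is symmetric about the fitted value.
midpoint   <- (lower + upper) / 2
half_width <- (upper - lower) / 2
c(midpoint = midpoint, half_width = half_width)
```
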