van Emmerik Homework 2

QUESTION 1

a) We know that there is an interaction because the three lines do not run parallel to each other, indicating that the relationship between “species rank in abundance” and “log abundance” was influenced by whether there had been a fire or not.

b) Species rank and year (the two explanatory variables in the model).

c) We see that the dotted lines have a steeper gradient, indicating low evenness, because the high-ranking species have much higher abundances.

QUESTION 2

a) Based on this graph, private colleges have higher graduation rates with lower admissions rates, and public colleges have lower graduation rates with higher admissions rates. Private colleges' graduation rate appears to go down as admissions rate increases, and public institutions' graduation rate likewise increases as admissions rate decreases.

b) GradRate = 0.85 - 0.31 AdmisRate - 0.22 TypePublic - 0.12 AdmisRate*TypePublic + scatter

c) Public: GradRate = 0.85 - 0.31 AdmisRate - 0.22(1) - 0.12 AdmisRate(1) = 0.63 - 0.43 AdmisRate

Private: GradRate = 0.85 - 0.31 AdmisRate - 0.22(0) - 0.12 AdmisRate(0) = 0.85 - 0.31 AdmisRate

d) The graduation rate for public colleges decreases at a faster rate with admissions rate than it does for private colleges (slopes of -0.43 versus -0.31), and the intercept for private colleges is higher (0.85 versus 0.63), meaning that the private-college line starts at a higher graduation rate than the public-college line.
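The group-specific lines in part c) can be checked with a short sketch. It assumes the four-decimal estimates printed by summary(mod2) and uses a hypothetical helper, group_line, that is not part of the original analysis. Working from the unrounded estimates gives a public intercept of 0.6355 and a public slope of -0.4209, which differ slightly from the two-decimal arithmetic above only because of rounding.

```r
# Estimates copied from the summary(mod2) coefficient table
b <- c(intercept = 0.8522, admis = -0.3053,
       public = -0.2167, admis_public = -0.1156)

# Hypothetical helper: intercept and slope of the fitted line for one group.
# type_public is the indicator TypePublic (1 = public, 0 = private).
group_line <- function(type_public) {
  c(intercept = unname(b["intercept"] + b["public"] * type_public),
    slope     = unname(b["admis"] + b["admis_public"] * type_public))
}

private_line <- group_line(0)
public_line  <- group_line(1)

print(rbind(private = private_line, public = public_line))
```

Setting the indicator to 0 or 1 is all that distinguishes the two lines, which is why the interaction coefficient shows up only in the public slope.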

library(mosaic)
## Loading required package: grid
## Loading required package: lattice
## 
## Attaching package: 'mosaic'
## 
## The following objects are masked from 'package:stats':
## 
## D, IQR, binom.test, cor, cov, fivenum, median, prop.test, sd, t.test, var
## 
## The following objects are masked from 'package:base':
## 
## max, mean, min, print, prod, range, sample, sum
hw2 = read.csv("http://www.macalester.edu/~ajohns24/data/College.csv")
mod1 = lm(GradRate ~ AdmisRate, hw2)
summary(mod1)
## 
## Call:
## lm(formula = GradRate ~ AdmisRate, data = hw2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7028 -0.1043  0.0271  0.1306  0.3650 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.9343     0.0362    25.8   <2e-16 ***
## AdmisRate    -0.7233     0.0629   -11.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.183 on 193 degrees of freedom
## Multiple R-squared:  0.407,  Adjusted R-squared:  0.404 
## F-statistic:  132 on 1 and 193 DF,  p-value: <2e-16
xyplot(GradRate ~ AdmisRate, groups = Type, data = hw2, auto.key = TRUE)

[Plot: GradRate versus AdmisRate, grouped by Type]

mod2 = lm(GradRate ~ AdmisRate * Type, hw2)
summary(mod2)
## 
## Call:
## lm(formula = GradRate ~ AdmisRate * Type, data = hw2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7545 -0.0714  0.0277  0.0823  0.3386 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.8522     0.0343   24.81  < 2e-16 ***
## AdmisRate             -0.3053     0.0738   -4.14  5.3e-05 ***
## TypePublic            -0.2167     0.0709   -3.06   0.0026 ** 
## AdmisRate:TypePublic  -0.1156     0.1180   -0.98   0.3288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.14 on 191 degrees of freedom
## Multiple R-squared:  0.657,  Adjusted R-squared:  0.651 
## F-statistic:  122 on 3 and 191 DF,  p-value: <2e-16

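QUESTION 3

a) This is a bad idea because removing the intercept forces the fitted line through the origin. In the previous answer the slope for AdmisRate was negative, but when the intercept is taken out the slope turns positive, so the model misrepresents the direction of the relationship.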

mod = lm(GradRate ~ AdmisRate - 1, hw2)
plot(GradRate ~ AdmisRate, hw2)

[Plot: GradRate versus AdmisRate]
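The sign flip can be illustrated with a minimal sketch on simulated data. The data below are hypothetical (not the College data) and are built only to mimic a negative trend between two positive variables. Without an intercept, the least-squares slope is sum(x * y) / sum(x^2), which is positive whenever both variables are positive.

```r
# Simulated data (hypothetical, not the College data): y falls as x rises,
# but both stay positive, as graduation and admission rates do.
set.seed(1)
x <- runif(100, min = 0.2, max = 0.9)
y <- 0.9 - 0.7 * x + rnorm(100, sd = 0.05)

with_intercept <- lm(y ~ x)      # ordinary fit
no_intercept   <- lm(y ~ x - 1)  # line forced through (0, 0)

slope_with    <- unname(coef(with_intercept)["x"])
slope_without <- unname(coef(no_intercept)["x"])

# Closed form for the no-intercept slope
closed_form <- sum(x * y) / sum(x^2)

print(c(with_intercept = slope_with,
        no_intercept   = slope_without,
        closed_form    = closed_form))
```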

QUESTION 4

a) Metropolitan interacts with poverty because the percentage of a state's population that resides in metropolitan areas (Metropolitan) affects the relationship between the percentage of the state that lives below the poverty line (poverty) and the number of serious crimes per 100,000 people (CrimeRate). This makes intuitive sense because a state is going to have a larger percentage of people living under the poverty line if a higher percentage of the state is living in metropolitan areas.

b) CrimeRate = -33.03 + 7.74 poverty + 1.85 Metropolitan + 0.31 poverty*Metropolitan

Metropolitan = 30: CrimeRate = -33.03 + 7.74 poverty + 1.85(30) + 0.31(30) poverty = 22.47 + 17.04 poverty

Metropolitan = 70: CrimeRate = -33.03 + 7.74 poverty + 1.85(70) + 0.31(70) poverty = 96.47 + 29.44 poverty

Metropolitan = 100: CrimeRate = -33.03 + 7.74 poverty + 1.85(100) + 0.31(100) poverty = 151.97 + 38.74 poverty

c) The states with a metropolitan rate of 100 have the largest intercept (151.97) and see crime rate increase at the highest rate with poverty (38.74). This makes sense because crime and poverty are concentrated in metropolitan areas, so we would expect crime rates and poverty to increase with a state's metropolitan rate.

d) The positive sign on the interaction coefficient means that the relationship between poverty rate and crime rate gets steeper as metropolitan rate increases: each additional percentage point of Metropolitan raises the slope on poverty by about 0.31.
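The substitutions in part b) can be reproduced with a small sketch. It assumes the two-decimal coefficients used in the worked equation, and the variable names are illustrative only.

```r
# Rounded coefficients copied from the worked equation in part b)
b0    <- -33.03  # intercept
b_pov <- 7.74    # poverty
b_met <- 1.85    # Metropolitan
b_int <- 0.31    # poverty:Metropolitan

metro <- c(30, 70, 100)

# For a fixed Metropolitan value m, the model reduces to
# CrimeRate = (b0 + b_met * m) + (b_pov + b_int * m) * poverty
lines_by_metro <- data.frame(
  Metropolitan  = metro,
  intercept     = b0 + b_met * metro,
  poverty_slope = b_pov + b_int * metro
)

print(lines_by_metro)
```

Both the intercept and the poverty slope grow with Metropolitan because the main-effect and interaction coefficients are positive.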

crime = read.csv("http://www.macalester.edu/~ajohns24/data/USCrime.csv")
crimesub = subset(crime, State != "DC")
crimemod = lm(CrimeRate ~ poverty * Metropolitan, crimesub)
summary(crimemod)
## 
## Call:
## lm(formula = CrimeRate ~ poverty * Metropolitan, data = crimesub)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -239.8  -88.6  -43.8   43.7  379.4 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           -33.035    311.325   -0.11     0.92
## poverty                 7.736     25.889    0.30     0.77
## Metropolitan            1.852      4.423    0.42     0.68
## poverty:Metropolitan    0.314      0.379    0.83     0.41
## 
## Residual standard error: 146 on 46 degrees of freedom
## Multiple R-squared:  0.439,  Adjusted R-squared:  0.403 
## F-statistic:   12 on 3 and 46 DF,  p-value: 6.19e-06

QUESTION 5

a) Salary increases with experience, levels out after about 30 years of experience, and then slowly decreases a bit.

b) Salary = 46.14 + 1.14 exper

c) Salary = 46.14 + (1.14*30) = 80.34, so about $80,000 is the estimated annual salary of a worker with 30 years of experience.
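The prediction in part c) is a plug-in calculation, sketched below. It assumes the coefficients stated in part b), and predict_salary is a hypothetical helper rather than anything from the original analysis.

```r
# Hypothetical helper using the fitted line from part b):
# Salary = 46.14 + 1.14 * exper
predict_salary <- function(exper, intercept = 46.14, slope = 1.14) {
  intercept + slope * exper
}

salary_at_30 <- predict_salary(30)
print(salary_at_30)

# The same line evaluated at several experience levels
print(data.frame(exper = c(0, 10, 20, 30),
                 salary = predict_salary(c(0, 10, 20, 30))))
```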

dat1 = read.csv("http://dl.dropbox.com/u/7315092/Data/SalarySim.csv")
xyplot(salary ~ exper, dat1)

[Plot: salary versus exper]

mod4 = lm(salary ~ exper, dat1)
xyplot(salary + fitted(mod4) ~ exper, dat1)

[Plot: salary and fitted(mod4) versus exper]

QUESTION 6

a) BDI = 25.79 + 6.98 + (-1.75 psisat), using the employed part-time coefficient and the reference marital category.

b) BDI = 25.785 + 0.696 + 3.923 - 1.752(5) = 21.64 for a single person who is financially supported by their parents.

c) Because there is no interaction term (so the lines run parallel), the intercept indicates which group is going to have the highest or lowest depression levels at any given psisat. Based on this, single people in the government assistance employment group are going to have the highest depression level because their intercept is the largest (25.79 + 10.71 + 0.70 = 37.20).

d) Those with the lowest depression levels are the group with the smallest intercept. Every employment coefficient in the table is positive, so this is the reference employment category (the level absorbed into the intercept) with a marital status of other (25.79 - 3.52 = 22.27), which is lower than employment other with marital other (25.79 + 2.76 - 3.52 = 25.03).
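Because the fitted lines are parallel, comparing groups comes down to comparing intercepts. The sketch below tabulates every employment and marital combination. It assumes the estimates printed in the coefficient table, and "reference" is a placeholder label for the level absorbed into the intercept, whose actual name does not appear in the output.

```r
# Estimates copied from the coefficient table; "reference" is a placeholder
# label for the baseline level (its real name is not shown in the output).
intercept  <- 25.785
employment <- c(reference = 0, "employed part-time" = 6.979,
                "govt assistance" = 10.714, other = 2.763,
                "parental support" = 3.923)
marital    <- c(reference = 0, other = -3.522, single = 0.696)

# Group intercepts: rows are employment levels, columns are marital levels
group_intercepts <- intercept + outer(employment, marital, "+")
print(round(group_intercepts, 2))

# Locate the largest and smallest group intercepts
hi_pos <- which(group_intercepts == max(group_intercepts), arr.ind = TRUE)
lo_pos <- which(group_intercepts == min(group_intercepts), arr.ind = TRUE)

highest <- c(employment = rownames(group_intercepts)[hi_pos[1, "row"]],
             marital    = colnames(group_intercepts)[hi_pos[1, "col"]])
lowest  <- c(employment = rownames(group_intercepts)[lo_pos[1, "row"]],
             marital    = colnames(group_intercepts)[lo_pos[1, "col"]])

print(highest)
print(lowest)
```

The table makes it easy to see that a baseline level contributes 0 to the intercept, so it has to be included in the comparison even though it has no row in the summary output.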

dat6 = read.csv("http://dl.dropbox.com/u/7315092/Data/socsupport.csv")
mod = lm(BDI ~ employment + marital + psisat, dat6)
summary(mod)
## 
## Call:
## lm(formula = BDI ~ employment + marital + psisat, data = dat6)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -17.92  -5.15  -0.68   3.91  33.58 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    25.785      7.690    3.35   0.0012 ** 
## employmentemployed part-time    6.979      6.259    1.12   0.2679    
## employmentgovt assistance      10.714      6.570    1.63   0.1066    
## employmentother                 2.763      6.436    0.43   0.6688    
## employmentparental support      3.923      6.580    0.60   0.5526    
## maritalother                   -3.522      3.834   -0.92   0.3608    
## maritalsingle                   0.696      3.097    0.22   0.8227    
## psisat                         -1.752      0.372   -4.71  9.1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.28 on 87 degrees of freedom
## Multiple R-squared:  0.304,  Adjusted R-squared:  0.248 
## F-statistic: 5.44 on 7 and 87 DF,  p-value: 3.36e-05