Problem #2: Texts Sent by Harvard Students

2a: Summary Stats

mean(fltxt)
length(fltxt)
sd(fltxt)

mean(mltxt)
length(mltxt)
sd(mltxt)

Women:
Mean = 3.1841049
Sample Size = 30
Standard Deviation = 0.9537804

Men:
Mean = 3.1133718
Sample Size = 50
Standard Deviation = 1.1732823

2b: Men’s and Women’s Variance t-test

Problem #3: Multiple Regression on Texts

3a: Investigating log(texts)

Based on the histogram of log(texts), the data are slightly left-skewed, but not so strongly that it would be unreasonable to model them with a Normal distribution.

3b: Fitting the Model

model1 = lm(formula=ltxt~female+sophomore+glasses+iphone+height,data=txtdata)
summary(model1)
## 
## Call:
## lm(formula = ltxt ~ female + sophomore + glasses + iphone + height, 
##     data = txtdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8474 -0.5999  0.0286  0.6492  2.2246 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.57052    1.73657  -0.904  0.36873    
## female       0.59640    0.29080   2.051  0.04382 *  
## sophomore    0.90933    0.21721   4.186 7.72e-05 ***
## glasses     -0.35428    0.22445  -1.578  0.11874    
## iphone       0.66932    0.22521   2.972  0.00399 ** 
## height       0.05785    0.02381   2.430  0.01753 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9206 on 74 degrees of freedom
## Multiple R-squared:  0.3323, Adjusted R-squared:  0.2872 
## F-statistic: 7.366 on 5 and 74 DF,  p-value: 1.191e-05

Model 1:
\(\log(texts) = -1.571 + 0.596x_{female} + 0.909x_{sophomore} - 0.354x_{glasses} + 0.669x_{iphone} + 0.058x_{height}\)

3c: Standard Deviation of the Residuals

From the summary in part 3b, the standard deviation of the residuals (reported as the residual standard error) is \(0.921\), on 74 degrees of freedom.
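
As a reminder of where that figure comes from (a sketch of the standard definition, not something printed in the output), the residual standard error divides the residual sum of squares by the residual degrees of freedom rather than by \(n\). The symbols \(e_i\) (the residuals), \(n\) (the number of observations), and \(p\) (the number of predictors) are notation introduced here. With \(n = 30 + 50 = 80\) students and \(p = 5\) predictors plus an intercept, the denominator matches the 74 degrees of freedom in the summary:

\[\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - p - 1}}, \qquad n - p - 1 = 80 - 5 - 1 = 74\]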

3d: Significant Predictors?

A predictor is significant in a multiple regression model if its p-value is below \(\alpha = 0.05\). According to our summary, there are 4 significant predictors in this model: female, sophomore, iphone, and height. The result that surprised me the most was height coming back significant. The others are reasonably explainable, but it doesn’t seem likely that height would be the third most significant predictor (by p-value, after sophomore and iphone) of the log number of text messages sent by a Harvard student. Of all the variables, I expected height to be the least useful.
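
The screening rule above can be checked mechanically. The following is a minimal sketch that re-enters the estimates and p-values from the summary in part 3b by hand, so it runs without the original data. The names `coefs`, `alpha`, `predictors`, and `significant` are made up for this illustration.

```r
# Coefficient table transcribed from the summary(model1) output in part 3b.
coefs <- data.frame(
  term     = c("(Intercept)", "female", "sophomore", "glasses", "iphone", "height"),
  estimate = c(-1.57052, 0.59640, 0.90933, -0.35428, 0.66932, 0.05785),
  p_value  = c(0.36873, 0.04382, 7.72e-05, 0.11874, 0.00399, 0.01753),
  stringsAsFactors = FALSE
)

alpha <- 0.05

# Drop the intercept, keep predictors with p-value below alpha,
# and order them from smallest to largest p-value.
predictors  <- coefs[coefs$term != "(Intercept)", ]
significant <- predictors[predictors$p_value < alpha, ]
significant <- significant[order(significant$p_value), ]

print(significant$term)
```

With the fitted model still in memory, `coef(summary(model1))` should give the same table as a matrix, so the hand-transcription step is only needed here to keep the example self-contained.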

3e: Interpretation of Binary Predictor Coefficient

The interpretation of the female coefficient is that, holding all other variables constant, a female student’s fitted value of log texts is 0.596 higher than a male student’s.
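
Because the response is on the log scale, the coefficient can also be read multiplicatively. Assuming ltxt is the natural log of texts (the default for R's `log()`; the write-up does not state the base), exponentiating the coefficient gives the ratio of fitted text counts for a woman versus a man with otherwise identical predictors:

\[\frac{\widehat{texts}_{female}}{\widehat{texts}_{male}} = e^{0.5964} \approx 1.82\]

Under that assumption, the model predicts that a female student sends roughly 80% more texts than a comparable male student.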

3f: Confidence Interval