mean(fltxt)
length(fltxt)
sd(fltxt)
mean(mltxt)
length(mltxt)
sd(mltxt)
Women:
Mean = 3.1841049
Sample Size = 30
Standard Deviation = 0.9537804
Men:
Mean = 3.1133718
Sample Size = 50
Standard Deviation = 1.1732823
Based on the histogram of log(texts), the data appear slightly left-skewed, but not to the degree that it would be unreasonable to model them with a Normal distribution.
model1 = lm(formula=ltxt~female+sophomore+glasses+iphone+height,data=txtdata)
summary(model1)
##
## Call:
## lm(formula = ltxt ~ female + sophomore + glasses + iphone + height,
## data = txtdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8474 -0.5999 0.0286 0.6492 2.2246
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.57052 1.73657 -0.904 0.36873
## female 0.59640 0.29080 2.051 0.04382 *
## sophomore 0.90933 0.21721 4.186 7.72e-05 ***
## glasses -0.35428 0.22445 -1.578 0.11874
## iphone 0.66932 0.22521 2.972 0.00399 **
## height 0.05785 0.02381 2.430 0.01753 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9206 on 74 degrees of freedom
## Multiple R-squared: 0.3323, Adjusted R-squared: 0.2872
## F-statistic: 7.366 on 5 and 74 DF, p-value: 1.191e-05
Model 1:
\(\log(\text{texts}) = 0.596x_{female} + 0.909x_{sophomore} - 0.354x_{glasses} + 0.669x_{iphone} + 0.058x_{height} - 1.571\)
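As a quick check on how the fitted equation is used, here is a minimal sketch that plugs one student into Model 1. The student is hypothetical (a female sophomore with an iPhone and no glasses), and the height of 65 assumes height is recorded in inches, which the write-up does not state. The coefficients are the rounded values from the equation above, so the result differs slightly from what `predict(model1, ...)` would give.

```r
# Rounded coefficients copied from the Model 1 equation
b <- c(intercept = -1.571, female = 0.596, sophomore = 0.909,
       glasses = -0.354, iphone = 0.669, height = 0.058)

# Hypothetical student (illustration only): female sophomore, no glasses,
# owns an iPhone, height 65 (units assumed to be inches)
x <- c(intercept = 1, female = 1, sophomore = 1,
       glasses = 0, iphone = 1, height = 65)

# Fitted value on the log scale: sum of coefficient * predictor value
pred_log <- sum(b * x[names(b)])
pred_log

# Back-transform to the original scale, assuming ltxt is the natural log of texts
exp(pred_log)
```

The first value is the prediction for log(texts); the second is only meaningful if the response was built with R's default natural logarithm.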
From the summary in part 3b, the standard deviation of the residuals (reported as the residual standard error) was calculated to be \(0.921\).
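For reference, the residual standard error that R reports is the usual estimate below, where \(e_i\) are the residuals, \(n = 80\) is the total sample size (30 women and 50 men), and \(p = 5\) is the number of predictors. The degrees of freedom match the 74 shown in the summary output.

\[
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - p - 1}}, \qquad n - p - 1 = 80 - 5 - 1 = 74
\]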
A predictor is significant in a multiple regression model if its p-value is below \(\alpha = 0.05\). According to our summary, there are 4 significant predictors in this model: female, sophomore, iphone, and height. The result that personally surprised me the most was height coming back significant. The others are reasonably explainable, but the fact that height is the third most significant predictor of the log number of text messages sent by a Harvard student doesn’t seem likely. Of all the predictors, I expected height to be the least useful.
The interpretation of the female coefficient is that if the person in question is female, her fitted value for log(texts) is 0.596 higher than that of a male, holding all other variables constant.
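Because the response is on the log scale, the coefficient can also be read as a multiplicative effect on texts. Assuming ltxt is the natural log of texts (R's default for `log`, though the write-up does not show how ltxt was created), exponentiating the coefficient gives the ratio of predicted texts for a female versus a male with otherwise identical predictors:

\[
\frac{\widehat{\text{texts}}_{female}}{\widehat{\text{texts}}_{male}} = e^{0.596} \approx 1.82
\]

Under that assumption, the model predicts roughly 82% more texts for a female student than for a comparable male student.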