Facebook Attraction

In this WPA, you will analyze data from a study on attraction. In the study, 1000 heterosexual University students viewed the Facebook profile of another student (the “target”) of the opposite sex. Based on a target’s profile, each participant made three judgments about the target - intelligence, attractiveness, and dateability. The primary judgement was a dateability rating indicating how dateable the person was on a scale of 0 to 100.

The data are located in a tab-delimited text file at http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt

Here is how the first few rows of the data should look:

head(facebook)
##   session sex age haircolor university    education shirtless intelligence
## 1       1   m  23     brown   3.Geneva    3.Masters     2.Yes        1.low
## 2       1   m  19    blonde   2.Zurich 1.HighSchool      1.No     2.medium
## 3       1   f  22     brown   2.Zurich  2.Bachelors     2.Yes        1.low
## 4       1   f  22       red   2.Zurich  2.Bachelors      1.No     2.medium
## 5       1   m  23     brown   3.Geneva  2.Bachelors      1.No     2.medium
## 6       1   m  26    blonde   2.Zurich    3.Masters     2.Yes       3.high
##   attractiveness dateability
## 1         3.high          15
## 2       2.medium          44
## 3       2.medium         100
## 4         3.high         100
## 5       2.medium          63
## 6         3.high          76

Datafile description

The data file has 1000 rows and 10 columns. Here are the columns

Data loading and preparation

A. Open your WPA.RProject and open a new script. Save the script with the name WPA7.R.

B. Using read.table(), load the tab-delimited text file containing the data into R from http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt and assign it to a new object called facebook. Make sure to specify that the file is tab-delimited with the argument sep = \t and contains a header with the argument header = T.

C. Using write.table(), save the data as a text file called facebook.txt into the data folder in your working directory. That way you’ll always have access to the data even if it’s deleted from the website you downloaded it from.

facebook <- read.table("http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt",
                      sep = "\t",
                      header = T
                      )

Understand the data

D. Look at the first few rows of the dataframe with the head() function to make sure it looks ok.

head(facebook)
##   session sex age haircolor university    education shirtless intelligence
## 1       1   m  23     brown   3.Geneva    3.Masters     2.Yes        1.low
## 2       1   m  19    blonde   2.Zurich 1.HighSchool      1.No     2.medium
## 3       1   f  22     brown   2.Zurich  2.Bachelors     2.Yes        1.low
## 4       1   f  22       red   2.Zurich  2.Bachelors      1.No     2.medium
## 5       1   m  23     brown   3.Geneva  2.Bachelors      1.No     2.medium
## 6       1   m  26    blonde   2.Zurich    3.Masters     2.Yes       3.high
##   attractiveness dateability
## 1         3.high          15
## 2       2.medium          44
## 3       2.medium         100
## 4         3.high         100
## 5       2.medium          63
## 6         3.high          76

E. Using the summary() function, look at summary statistics for each column in the dataframe. Make sure everything looks ok.

summary(facebook)
##     session     sex          age         haircolor      university 
##  Min.   : 1.0   f:479   Min.   :17.00   blonde:250   1.Basel :310  
##  1st Qu.:13.0   m:521   1st Qu.:21.00   brown :500   2.Zurich:363  
##  Median :25.5           Median :22.00   red   :250   3.Geneva:327  
##  Mean   :25.5           Mean   :22.33                              
##  3rd Qu.:38.0           3rd Qu.:23.00                              
##  Max.   :50.0           Max.   :30.00                              
##         education   shirtless     intelligence  attractiveness
##  1.HighSchool:220   1.No :756   1.low   :309   1.low   :230   
##  2.Bachelors :497   2.Yes:244   2.medium:348   2.medium:317   
##  3.Masters   :189               3.high  :343   3.high  :453   
##  4.PhD       : 94                                             
##                                                               
##                                                               
##   dateability   
##  Min.   :  0.0  
##  1st Qu.: 35.0  
##  Median : 53.0  
##  Mean   : 54.2  
##  3rd Qu.: 74.0  
##  Max.   :100.0

Answer guidelines

For example, here is how I would analyze and answer the question: “Was there an effect of diets on Chicken Weights?”"

# ANOVA on Chicken Weights
#   IV = Diet, DV = weight

# ANOVA
summary(aov(formula = weight ~ Diet,
            data = ChickWeight
            ))
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## Diet          3  155863   51954   10.81 6.43e-07 ***
## Residuals   574 2758693    4806                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA was significant, so I'll conduct post-hoc tests

# Tukey post-hoc tests
TukeyHSD(aov(formula = weight ~ Diet,
            data = ChickWeight
            ))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ Diet, data = ChickWeight)
## 
## $Diet
##          diff         lwr      upr     p adj
## 2-1 19.971212  -0.2998092 40.24223 0.0552271
## 3-1 40.304545  20.0335241 60.57557 0.0000025
## 4-1 32.617257  12.2353820 52.99913 0.0002501
## 3-2 20.333333  -2.7268370 43.39350 0.1058474
## 4-2 12.646045 -10.5116315 35.80372 0.4954239
## 4-3 -7.687288 -30.8449649 15.47039 0.8277810

Answer: There was a significant main effect of diets on chicken weights (F(3, 574) = 10.81, p < .01). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, p < .01) and diets 1 and 4 (diff = 32.62, p < .01). All other pairwise differences were not significant at the 0.05 significance threshold.

One-way ANOVAS

  1. Was there a main effect of the university on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests
uni.aov <- aov(formula = dateability ~ university,
            data = facebook
            )

summary(uni.aov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## university    2  19769    9884   13.99 1.01e-06 ***
## Residuals   997 704173     706                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# It is significant

TukeyHSD(uni.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dateability ~ university, data = facebook)
## 
## $university
##                         diff        lwr       upr     p adj
## 2.Zurich-1.Basel  -10.275500 -15.099617 -5.451383 0.0000020
## 3.Geneva-1.Basel   -8.641689 -13.586611 -3.696767 0.0001311
## 3.Geneva-2.Zurich   1.633811  -3.122189  6.389811 0.6991862

Answer: There was a significant main effect of university on dateability (F(2, 997) = 13.99, p < .01). There was a significant difference between Zurich and Basel (diff = -10, p < .01) and between Geneva and basel (diff = -8.64, p < .01). The difference between Geneva and Zurich was non-significant (diff - 1.63, p = 0.70)

  1. Was there a main effect of intelligence on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests
intelligence.aov <- aov(formula = dateability ~ intelligence,
            data = facebook)

summary(intelligence.aov) # Significant!
##               Df Sum Sq Mean Sq F value Pr(>F)   
## intelligence   2   7070    3535   4.917 0.0075 **
## Residuals    997 716871     719                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(intelligence.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dateability ~ intelligence, data = facebook)
## 
## $intelligence
##                      diff       lwr       upr     p adj
## 2.medium-1.low   6.535264  1.615557 11.454970 0.0053173
## 3.high-1.low     2.865833 -2.070710  7.802375 0.3610523
## 3.high-2.medium -3.669431 -8.458242  1.119380 0.1705679

Answer: There was a significant main effect of intelligence on dateability (F(2, 997) = 4.92, p < .01). There was a significant difference between medium and low intelligence (diff = -6.5, p < .01). The difference between high and low intelligence was non-significant (diff = 2.87, p = 0.36) as was the difference between high and medium intelligence (diff = -3.67, p = 0.17)

  1. Was there a main effect of haircolor on dateability? Conduct a one-way ANOVA. If the result is significant (p < .05), conduct post-hoc tests
hair.aov <- aov(formula = dateability ~ haircolor,
            data = facebook
            )

summary(hair.aov) # Non-significant
##              Df Sum Sq Mean Sq F value Pr(>F)
## haircolor     2   1779   889.7   1.228  0.293
## Residuals   997 722162   724.3

Answer: There was a no significant main effect of haircolor on dateability (F(2, 997) = 1.23, p = 0.29).

Multi-independent ANOVAs

  1. Conduct a two-way ANOVA on dateability with both intelligence and university as IVs
int.uni.aov <- aov(formula = dateability ~ intelligence + university,
            data = facebook
            )

summary(int.uni.aov)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## intelligence   2   7070    3535   5.039  0.00665 ** 
## university     2  18838    9419  13.426 1.76e-06 ***
## Residuals    995 698033     702                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(int.uni.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dateability ~ intelligence + university, data = facebook)
## 
## $intelligence
##                      diff       lwr       upr     p adj
## 2.medium-1.low   6.535264  1.675738 11.394790 0.0046796
## 3.high-1.low     2.865833 -2.010323  7.741988 0.3520433
## 3.high-2.medium -3.669431 -8.399663  1.060800 0.1632717
## 
## $university
##                        diff        lwr       upr     p adj
## 2.Zurich-1.Basel  -9.988136 -14.796014 -5.180258 0.0000038
## 3.Geneva-1.Basel  -8.490653 -13.418930 -3.562376 0.0001673
## 3.Geneva-2.Zurich  1.497483  -3.242507  6.237473 0.7388000

Do your results for each variable change compared to your previous one-way ANOVAs?

Answer: No, the results are pretty much the same

  1. Conduct a multi-way anova including ALL independent variables predicting dateability. Conduct post-hoc tests on the conditions that differ.
big.aov <- aov(formula = dateability ~ .,
            data = facebook)

summary(big.aov)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## session          1    119     119   0.215  0.64268    
## sex              1  36745   36745  66.443 1.09e-15 ***
## age              1   1413    1413   2.556  0.11023    
## haircolor        2   1849     924   1.671  0.18852    
## university       2  19430    9715  17.567 3.19e-08 ***
## education        3   2052     684   1.237  0.29510    
## shirtless        1    108     108   0.195  0.65907    
## intelligence     2   5922    2961   5.355  0.00487 ** 
## attractiveness   2 112121   56061 101.370  < 2e-16 ***
## Residuals      984 544182     553                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Problem! We can't conduct a Tukey HSD test on non-categorical independent variables! (Specifically, session and age).
#  I will repeat the ANOVA without these variables, then conduct post-hoc tests:

big2.aov <- aov(formula = dateability ~ sex + haircolor + university + education + shirtless + intelligence + attractiveness,
            data = facebook)

summary(big2.aov)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## sex              1  36514   36514  66.103 1.28e-15 ***
## haircolor        2   1869     935   1.692  0.18466    
## university       2  19172    9586  17.354 3.92e-08 ***
## education        3   3186    1062   1.922  0.12424    
## shirtless        1    106     106   0.192  0.66106    
## intelligence     2   5904    2952   5.345  0.00491 ** 
## attractiveness   2 112544   56272 101.872  < 2e-16 ***
## Residuals      986 544646     552                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(big2.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dateability ~ sex + haircolor + university + education + shirtless + intelligence + attractiveness, data = facebook)
## 
## $sex
##          diff       lwr      upr p adj
## m-f -12.09605 -15.01559 -9.17652     0
## 
## $haircolor
##                   diff       lwr     upr     p adj
## brown-blonde 0.5017632 -3.771443 4.77497 0.9589918
## red-blonde   3.4562264 -1.478048 8.39050 0.2276581
## red-brown    2.9544631 -1.318744 7.22767 0.2364246
## 
## $university
##                         diff        lwr       upr     p adj
## 2.Zurich-1.Basel  -9.8015495 -14.067852 -5.535247 0.0000003
## 3.Geneva-1.Basel  -9.0414117 -13.414550 -4.668273 0.0000042
## 3.Geneva-2.Zurich  0.7601379  -3.445924  4.966199 0.9055940
## 
## $education
##                                diff         lwr       upr     p adj
## 2.Bachelors-1.HighSchool  3.2829164  -1.6148596  8.180692 0.3111948
## 3.Masters-1.HighSchool    5.4706664  -0.5279113 11.469244 0.0883508
## 4.PhD-1.HighSchool        2.8229708  -4.6298115 10.275753 0.7638807
## 3.Masters-2.Bachelors     2.1877500  -2.9809560  7.356456 0.6962122
## 4.PhD-2.Bachelors        -0.4599456  -7.2626327  6.342742 0.9981280
## 4.PhD-3.Masters          -2.6476956 -10.2812581  4.985867 0.8087188
## 
## $shirtless
##                 diff       lwr      upr     p adj
## 2.Yes-1.No 0.7529005 -2.642917 4.148718 0.6635954
## 
## $intelligence
##                      diff       lwr        upr     p adj
## 2.medium-1.low   5.917902  1.605769 10.2300343 0.0037664
## 3.high-1.low     2.356941 -1.969949  6.6838298 0.4076096
## 3.high-2.medium -3.560961 -7.758364  0.6364413 0.1148470
## 
## $attractiveness
##                     diff       lwr      upr p adj
## 2.medium-1.low  12.98680  8.208449 17.76516     0
## 3.high-1.low    24.54165 20.075066 29.00824     0
## 3.high-2.medium 11.55485  7.515186 15.59451     0

Answer: I found significant effects for sex (F(1, 986) = 66.10, p < .01), university (F(2, 986) = 17.35, p < .01), intelligence (F(2, 986) = 5.35, p < .01), and attractiveness (F(2, 986) = 101.87, p < .01). All other variables were non-significant at the .05 threshold. For the university variable, post-hoc tests revealed significant differences between Zurich and Basel (diff = -9.80, p < .01) and Geneva and Basel (diff = -9.04, p < .01). For the intelligence variable, post-hoc tests revealed significant differences between medium and low intelligence (dif = 5.92, p < .01). For the attractiveness variable, post-hoc tests showed significant differences between all group pairs (all ps < .01).

Add a new column to the dataframe called all.aov.dateability that has the predicted dateability for each person according to the multi-way ANOVA you just ran

facebook$all.aov.dateability <- big2.aov$fitted.values

Create a scatterplot showing the relationship between the actual dateability and predicted dateability. Add appropriate labels to the plot

 plot(x = facebook$dateability, 
      y = facebook$all.aov.dateability,
      main = "Predicted vs. Actual Dateability",
      xlab = "Actual",
      ylab = "Model Predictions"
      )

ANOVAs on subsets of data

  1. It turns out that the experimenter who ran sessions 1 through 30 (a man) was trying to score a date and slipped in his own profile picture into the study. We can’t trust these data. Repeat your multi anova from question 5 ONLY for sessions 31 through 50. Do your conclusions change compared to when you analyzed the data from all sessions?
big2.b.aov <- aov(formula = dateability ~ sex + haircolor + university + education + shirtless + intelligence + attractiveness,
            data = subset(facebook, session >= 31 & session <= 50))

summary(big2.b.aov)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## sex              1   6933    6933  12.425 0.000474 ***
## haircolor        2   2459    1230   2.203 0.111806    
## university       2  12099    6050  10.842 2.62e-05 ***
## education        3   2059     686   1.230 0.298479    
## shirtless        1    184     184   0.330 0.565884    
## intelligence     2    543     271   0.487 0.615135    
## attractiveness   2  36084   18042  32.334 1.04e-13 ***
## Residuals      386 215386     558                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: When I only include sessions 31 through 50, the main effect of intelligence is no longer significant (F(2, 386) = 0.49, p = 0.62). All other variables are qualitatively the same

Interactions

  1. Create a plot (e.g.; pirateplot(), barplot(), boxplot()) showing the distribution of dateability based on two independent variables: sex and shirtless
library(yarrr)
pirateplot(dateability ~ sex + shirtless, 
           data = facebook)

Based on what you see in the plot, do you expect there to be an interaction between sex and shirtless? Why or why not?

Answer: Yes, the effect of shirtless appears to differ between the two sexes

Test your prediction with the appropriate ANOVA

sex.shirt.int.aov <- aov(formula = dateability ~ sex * shirtless,
                         data = facebook
                         )

summary(sex.shirt.int.aov)
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## sex             1  36514   36514  62.689 6.43e-15 ***
## shirtless       1    247     247   0.425    0.515    
## sex:shirtless   1 107048  107048 183.786  < 2e-16 ***
## Residuals     996 580132     582                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: My prediction was correct! The interaction is significant (F(1, 996) = 183.79, p < .01)

CHECKPOINT!

Understanding how one-way ANOVAs make predictions

  1. Conduct a one-way ANOVA on the effect of attractiveness on dateability:
att.aov <- aov(formula = dateability ~ attractiveness,
            data = facebook
            )

summary(att.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## attractiveness   2 133708   66854   112.9 <2e-16 ***
## Residuals      997 590234     592                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Add the fitted values from your previous ANOVA back to the dataframe as a new vector called attractiveness.aov.dateability

facebook$attractiveness.aov.dateability <- att.aov$fitted.values

Round attractiveness.aov.dateability to the nearest 3rd decimal place using the round() function. (Hint: Assign the variable in the dataframe to a rounded version of itself).

facebook$attractiveness.aov.dateability <- round(facebook$attractiveness.aov.dateability, 3)

Look at all the unique values of attractiveness.aov.dateability with table(). How many different values does the ANOVA predict?

table(facebook$attractiveness.aov.dateability)
## 
## 36.896 50.303 65.711 
##    230    317    453

Answer: There are only three different predictions….

Calculate the actual mean dateability for each level of attractiveness with aggregate() or dplyr()

aggregate(dateability ~ attractiveness, 
          FUN = mean, 
          data = facebook)
##   attractiveness dateability
## 1          1.low    36.89565
## 2       2.medium    50.30284
## 3         3.high    65.71082

Based on what you’ve found, how does an ANOVA fit specific values to data? In other words, if you conduct a one-way ANOVA, and make predictions for groups based on that model, what will the ANOVA predict for each observation in each group?

Answer: ANOVA just predicts the group mean for each group member!

One-way vs. multi-way ANOVAS

  1. Let’s study the relationship between university and attractiveness on dateability.

Conduct a one-way ANOVA on dateability with attractiveness as the IV (I know you just did it, but do it again)

att.aov <- aov(formula = dateability ~ attractiveness,
            data = facebook
            )

summary(att.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## attractiveness   2 133708   66854   112.9 <2e-16 ***
## Residuals      997 590234     592                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: There appears to be a significant effect of attractiveness on dateability (F(2, 997) = 112.9, p < .01)

Conduct a one-way ANOVA on dateability with university as the IV

uni.aov <- aov(formula = dateability ~ university,
            data = facebook
            )

summary(uni.aov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## university    2  19769    9884   13.99 1.01e-06 ***
## Residuals   997 704173     706                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: There appears to be a significant effect of university on dateability (F(2, 997) = 13.99, p < .01)

Conduct a single multi-way ANVOA with both variables (use formula = attractiveness + university). What is your conclusion?

att.uni.aov <- aov(formula = dateability ~ attractiveness + university,
                    data = facebook
                    )
summary(att.uni.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## attractiveness   2 133708   66854 112.739 <2e-16 ***
## university       2    203     101   0.171  0.843    
## Residuals      995 590031     593                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Did something change? If so, explain your findings!

Answer: Yes! The effect of university is not significant when we include attractiveness in the model. The problem is that there is an imbalance between attractiveness and university. Specifically, students from basel are generally more attractive than students from other universities. Therefore, when we only include university in the model, it looks like there is an effect of university. However, the true effect is only on attractiveness!

# Show the relationship between university and attractiveness

with(facebook, table(university, 
                     attractiveness))
##           attractiveness
## university 1.low 2.medium 3.high
##   1.Basel     34       29    247
##   2.Zurich   128      114    121
##   3.Geneva    68      174     85

Using lm() coefficients to understand group differences

  1. Conduct a multi-way ANOVA on dateability with sex and education as independent variables. No matter if they are significant or not, conduct post-hoc tests.
sex.ed.aov <- aov(dateability ~ sex + education,
                  data = facebook)

summary(sex.ed.aov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## sex           1  36514   36514   53.07 6.55e-13 ***
## education     3   2807     936    1.36    0.254    
## Residuals   995 684620     688                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(sex.ed.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dateability ~ sex + education, data = facebook)
## 
## $sex
##          diff       lwr       upr p adj
## m-f -12.09605 -15.35445 -8.837661     0
## 
## $education
##                                diff        lwr       upr     p adj
## 2.Bachelors-1.HighSchool  2.5729442  -2.893274  8.039163 0.6197997
## 3.Masters-1.HighSchool    5.2384035  -1.456377 11.933184 0.1835943
## 4.PhD-1.HighSchool        2.1344785  -6.183284 10.452241 0.9118887
## 3.Masters-2.Bachelors     2.6654593  -3.103134  8.434052 0.6339136
## 4.PhD-2.Bachelors        -0.4384657  -8.030682  7.153750 0.9988299
## 4.PhD-3.Masters          -3.1039250 -11.623449  5.415599 0.7846274

Repeat your analysis using regression instead of ANOVA to get regression coefficients.

summary(lm(dateability ~ sex + education,
           data = facebook
           ))
## 
## Call:
## lm(formula = dateability ~ sex + education, data = facebook)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -60.58 -19.58  -0.36  20.75  54.05 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            58.011      1.967  29.495  < 2e-16 ***
## sexm                  -12.056      1.661  -7.258  7.9e-13 ***
## education2.Bachelors    2.572      2.124   1.211   0.2262    
## education3.Masters      5.239      2.602   2.014   0.0443 *  
## education4.PhD          2.135      3.232   0.660   0.5091    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.23 on 995 degrees of freedom
## Multiple R-squared:  0.05432,    Adjusted R-squared:  0.05051 
## F-statistic: 14.29 on 4 and 995 DF,  p-value: 2.406e-11

What are the default values of sex and education in your regression analysis?

Answer: The default for sex is female, for education it is 1.High School

What dateability does the regression predict for a female with a high-school education?

Answer: 58.011 (the intercept)

What dateability does the regression predict for a male with a PhD?

Answer: 58.011 - 12.056 + 2.135 = 72.202

Are the significance levels for group differences the same in your ANOVA post-hoc tests and your regression analysis?

Answer: Not quite. In the Tukey test, we found no significant post-hoc differences between education groups. In the regression analysis, there is a significant difference between Masters and High School.

More interactions

  1. Create a plot (e.g.; pirateplot(), barplot(), boxplot()) showing the distribution of dateability based on two independent variables: sex and shirtless
library(yarrr)

pirateplot(dateability ~ sex + shirtless,
           data = facebook
           )

Based on what you see in the plot, do you expect there to be an interaction between sex and shirtless? Why or why not?

Answer: Yes, there appears to be a strong interaction. For non-shirtless people, gender does not appear to make a difference. For shirtless people, women are strongly preferred to men!

Test your prediction with the appropriate ANOVA

summary(aov(dateability ~ sex * shirtless, data = facebook))
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## sex             1  36514   36514  62.689 6.43e-15 ***
## shirtless       1    247     247   0.425    0.515    
## sex:shirtless   1 107048  107048 183.786  < 2e-16 ***
## Residuals     996 580132     582                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: Our prediction is confirmed! There is a significant interaction (F(1, 996) = 183.79, p < 0.01)

  1. Create a plot (e.g.; pirateplot(), barplot(), boxplot()) showing the distribution of dateability based on two independent variables: university and haircolor
pirateplot(dateability ~ university + haircolor,
           data = facebook
           )

Based on what you see in the plot, do you expect there to be an interaction between university and intelligence? Why or why not?

Answer: No, there does not appear to be an interaction. The effect of school on dateability appears the same in all haircolor conditions

Test your prediction with the appropriate ANOVA

summary(aov(dateability ~ university * haircolor,
            data = facebook))
##                       Df Sum Sq Mean Sq F value   Pr(>F)    
## university             2  19769    9884  13.967 1.04e-06 ***
## haircolor              2   1647     823   1.163    0.313    
## university:haircolor   4   1217     304   0.430    0.787    
## Residuals            991 701309     708                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: Confirmed. We do not find a significant interaction between haircolor and university (F(4, 991) = 0.43, p = 0.79)

Predicting new data

Here are data for 3 of your friends:

new.data <- data.frame(sex = c("m", "f", "m"),
                       age = c(22, 23, 26),
                       university = c("1.Basel", "1.Basel", "2.Zurich"),
                       intelligence = c("3.high", "1.low", "1.low"),
                       shirtless = c("1.No", "2.Yes", "2.Yes"),
                       attractiveness = c("3.high", "3.high", "1.low"),
                       stringsAsFactors = F
                       )

library(knitr)
kable(new.data, format = "markdown")
sex age university intelligence shirtless attractiveness
m 22 1.Basel 3.high 1.No 3.high
f 23 1.Basel 1.low 2.Yes 3.high
m 26 2.Zurich 1.low 2.Yes 1.low
  1. Create a new ANOVA object on the original data set containing ONLY variables in the data set for your 3 friends. Then, using this ANOVA, predict the ratings of your 3 friends
new.aov <- aov(formula = dateability ~ sex + age + university + intelligence + shirtless + attractiveness, 
               data = facebook)


new.data <- data.frame(sex = c("m", "f", "m"),
                       age = c(22, 23, 26),
                       university = c("1.Basel", "1.Basel", "2.Zurich"),
                       intelligence = c("3.high", "1.low", "1.low"),
                       shirtless = c("1.No", "2.Yes", "2.Yes"),
                       attractiveness = c("3.high", "3.high", "1.low"),
                       stringsAsFactors = F
                       )

predict(new.aov, 
        new.data)
##        1        2        3 
## 58.69826 69.01925 30.39767

Answer: The model predicts that my friends will have dateability ratings of 58.70, 69.02 and 30.40 respectively