Read in Data

library(readxl)

data1 <- read_excel("Downloads/_data/20201228perceptionOfAttractivenessRawData.xlsx")
data2 <- read.delim("Downloads/_data/lighting_tall.tsv")

View Data

head(data1)
  • Notice that data1 is wide form (77 columns) – not tidy format – not suitable for analysis.
head(data2)
  • Notice that data2 is long form (6 columns) – tidy format – suitable for analysis.

Data Comparison : Representation

  • Use long format, because most statistical modeling and visualization tools (e.g., regression, ANOVA, ggplot) expect each observation to be one row with variables stored in columns. This data is considered in tidy format.

Data Visualization :

Recall : Our focus was on the rating given subject lightings. We wish to know whats the best lighting for the best rating.

This is my personal plot to get an idea of whats going on :

tidy <- data2
library(tidyverse)
library(stringr)

tidy2 <- tidy %>%
  mutate(
    lightingGroup = str_extract(subjectLighting, "^[A-Z]"),   # A,B,D,...
    lightingNum   = str_extract(subjectLighting, "[1-6]")      # 1–6
  )
ggplot(
  tidy2 %>% filter(Sex == "Male"),
  aes(x = rating, fill = lightingNum)
) +
  geom_histogram(
    aes(y = after_stat(density)),
    position = "identity",
    alpha = 0.4,
    bins = 10
  ) +
  facet_wrap(~ lightingGroup, ncol = 3) +
  theme_minimal()

The key thing to take note of is that each of these are roughly normally distributed – we are extremely lucky

Personal Analysis :

stats <- tidy2 %>%
  group_by(lightingGroup) %>%
  summarise(
    mean = mean(rating, na.rm = TRUE),
    sd   = sd(rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    label = paste0(
      lightingGroup,
      "  (center =", round(mean,1),
      ", SD=", round(sd,1), ")"
    )
  )

curve_data <- stats %>%
  rowwise() %>%
  do({
    data.frame(
      lightingGroup = .$lightingGroup,
      x = seq(0, 10, length.out = 400),   # rating scale range
      y = dnorm(seq(0, 10, length.out = 400), .$mean, .$sd)
    )
  }) %>%
  ungroup()

curve_data <- curve_data %>%
  left_join(stats[, c("lightingGroup","label")], by="lightingGroup")
g <- ggplot(curve_data, aes(x = x, y = y, color = label)) +
  geom_line(linewidth = 1.2) +
  labs(
    title = "Normal Distributions of Ratings by Lighting Group",
    x = "Rating",
    y = "Density",
    color = "Lighting Group\n(mean, sd)"
  ) +
  theme_minimal()

Two-way ANOVA

fit <- aov(rating ~ lightingGroup + Sex + lightingGroup:Sex, data = tidy2)
summary(fit)
##                      Df Sum Sq Mean Sq F value   Pr(>F)    
## lightingGroup         9  15081  1675.7 512.103  < 2e-16 ***
## Sex                   1    106   105.6  32.260 1.38e-08 ***
## lightingGroup:Sex     9    187    20.7   6.335 5.29e-09 ***
## Residuals         11980  39201     3.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Lighting condition strongly affects ratings (< 2e-16) – Most Sign – Experiment was worth while

  • Gender also affects ratings (1.38e-08).

  • lighting, gender interaction is sign (5.29e-09)

    • Best lighting can differ slightly by gender, but overall trends remain clear.

Diagnostics

plot(fit, which =1)

Recall :

\[ \text{Residuals}=y -\hat{y} \]

Therefore, \(\text{Residuals} > 0 \implies \hat{y}>y\) , \(\text{Residuals} = 0 \implies \hat{y}=y\), \(\text{Residuals} < 0 \implies \hat{y}<y\)

  • This is less than ideal, this isnt going to disturb prediction but inference

  • There is a clear downward trend

    • Model appears to over estimate at first then underestimate
plot(fit, which =2)

hist(residuals(fit))

  • Our error appears to be normally distributed

  • Again – extremely lucky

plot(fit, which =1)

Pairwise Comparisons

TukeyHSD(fit, "lightingGroup")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = rating ~ lightingGroup + Sex + lightingGroup:Sex, data = tidy2)
## 
## $lightingGroup
##           diff         lwr          upr     p adj
## B-A -1.6341667 -1.86784603 -1.400487305 0.0000000
## D-A -1.3683333 -1.60201269 -1.134653972 0.0000000
## G-A  0.6141667  0.38048731  0.847846028 0.0000000
## H-A -0.8291667 -1.06284603 -0.595487305 0.0000000
## I-A -0.2866667 -0.52034603 -0.052987305 0.0041190
## J-A  1.9050000  1.67132064  2.138679361 0.0000000
## K-A  1.1808333  0.94715397  1.414512695 0.0000000
## N-A  1.2583333  1.02465397  1.492012695 0.0000000
## O-A -0.5300000 -0.76367936 -0.296320639 0.0000000
## D-B  0.2658333  0.03215397  0.499512695 0.0118752
## G-B  2.2483333  2.01465397  2.482012695 0.0000000
## H-B  0.8050000  0.57132064  1.038679361 0.0000000
## I-B  1.3475000  1.11382064  1.581179361 0.0000000
## J-B  3.5391667  3.30548731  3.772846028 0.0000000
## K-B  2.8150000  2.58132064  3.048679361 0.0000000
## N-B  2.8925000  2.65882064  3.126179361 0.0000000
## O-B  1.1041667  0.87048731  1.337846028 0.0000000
## G-D  1.9825000  1.74882064  2.216179361 0.0000000
## H-D  0.5391667  0.30548731  0.772846028 0.0000000
## I-D  1.0816667  0.84798731  1.315346028 0.0000000
## J-D  3.2733333  3.03965397  3.507012695 0.0000000
## K-D  2.5491667  2.31548731  2.782846028 0.0000000
## N-D  2.6266667  2.39298731  2.860346028 0.0000000
## O-D  0.8383333  0.60465397  1.072012695 0.0000000
## H-G -1.4433333 -1.67701269 -1.209653972 0.0000000
## I-G -0.9008333 -1.13451269 -0.667153972 0.0000000
## J-G  1.2908333  1.05715397  1.524512695 0.0000000
## K-G  0.5666667  0.33298731  0.800346028 0.0000000
## N-G  0.6441667  0.41048731  0.877846028 0.0000000
## O-G -1.1441667 -1.37784603 -0.910487305 0.0000000
## I-H  0.5425000  0.30882064  0.776179361 0.0000000
## J-H  2.7341667  2.50048731  2.967846028 0.0000000
## K-H  2.0100000  1.77632064  2.243679361 0.0000000
## N-H  2.0875000  1.85382064  2.321179361 0.0000000
## O-H  0.2991667  0.06548731  0.532846028 0.0020856
## J-I  2.1916667  1.95798731  2.425346028 0.0000000
## K-I  1.4675000  1.23382064  1.701179361 0.0000000
## N-I  1.5450000  1.31132064  1.778679361 0.0000000
## O-I -0.2433333 -0.47701269 -0.009653972 0.0333342
## K-J -0.7241667 -0.95784603 -0.490487305 0.0000000
## N-J -0.6466667 -0.88034603 -0.412987305 0.0000000
## O-J -2.4350000 -2.66867936 -2.201320639 0.0000000
## N-K  0.0775000 -0.15617936  0.311179361 0.9891254
## O-K -1.7108333 -1.94451269 -1.477153972 0.0000000
## O-N -1.7883333 -2.02201269 -1.554653972 0.0000000
  • Many lighting groups differ significantly from each other.

  • J has the highest mean rating and is significantly higher than all other groups.

  • N and K are the next highest performers and are also significantly better than most others.

  • B and D are consistently among the lowest-rated lighting conditions.

Ratio between center and variation

r <- stats %>%
  mutate(score = mean/sd) %>%
  arrange(desc(score))
r
  • J — highest mean and strong consistency – best performer

  • N — second best overall

  • K — third best, similar consistency to N

  • A and G — mid-tier

  • B and D — lowest performance

Client Graphic & Recommendation :

Graphic & Basic Summary :

g

knitr::kable(
  head(r, 5),
  digits = 2,
  col.names = c("Lighting", "Mean", "SD", "Label", "Performance Score (Mean/SD)")
)
Lighting Mean SD Label Performance Score (Mean/SD)
J 7.22 1.48 J (center =7.2, SD=1.5) 4.90
N 6.58 1.47 N (center =6.6, SD=1.5) 4.49
K 6.50 1.57 K (center =6.5, SD=1.6) 4.15
G 5.93 1.93 G (center =5.9, SD=1.9) 3.07
A 5.32 1.76 A (center =5.3, SD=1.8) 3.02
  • As we can see generally higher scores come with narrower dist. – people are strong on their preferences

Recommendation :

Lighting has a clear impact on ratings, and some lighting setups consistently perform better than others. Lighting J produced the highest and most consistent scores, making it the best overall choice, with N and K as strong secondary options. For the most reliable and highest-rated results, we recommend using Lighting J