Data605 - Discussion12

Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?

The data come from researchers at the University of Texas at Austin. For a sample of 463 professors, they record a teaching evaluation score (higher is better) and a standardized beauty score (0 is average, negative is below average, positive is above average). We will explore the regression of teaching evaluation score on beauty score.
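
To make explicit what the residual analysis below is checking, the simple linear model can be written as follows. The symbols \beta_0, \beta_1, \varepsilon_i and \sigma are notation introduced here for illustration and do not appear in the original write-up.

$$
\text{ev}_i = \beta_0 + \beta_1\,\text{bty}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independently}, \quad i = 1, \dots, 463
$$

Under this form, a residuals-versus-fitted plot speaks to linearity and constant variance, while a histogram and a Q-Q plot of the residuals speak to the normality of \varepsilon_i.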

library(ggplot2)
library(dplyr)

# read the data from github
pb <- read.csv("https://raw.githubusercontent.com/amit-kapoor/data605/master/profbeautyevals.csv")
bty <- pb$btystdave
ev <- pb$courseevaluation

# fit a simple linear regression of evaluation score on beauty score
m_ev_bty <- lm(ev ~ bty)

# scatter plot of teaching evaluation against beauty score, with the fitted line
pb %>% 
  ggplot(aes(btystdave, courseevaluation)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# residuals against fitted values
data.frame(fitted_value = fitted(m_ev_bty), residual = resid(m_ev_bty)) %>% 
  ggplot(aes(fitted_value, residual)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Residual Analysis",
       x = "Fitted values", y = "Residuals") +
  theme_minimal()

# histogram of residuals
hist(resid(m_ev_bty), main = "Histogram of Residuals", xlab = "Residuals", ylab = "")

# normal Q-Q plot of residuals
data.frame(residual = resid(m_ev_bty)) %>% 
  ggplot(aes(sample = residual)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot") +
  theme_minimal()
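
The plots above are judged by eye, so a few numeric summaries can help back up the reading. The sketch below is not part of the original analysis: `residual_checks` is a hypothetical helper, and the demo data are simulated only so the block runs on its own. It reports the mean residual, the Shapiro-Wilk p-value for normality, and the correlation between absolute residuals and fitted values as a rough indicator of changing spread.

```r
# Hypothetical helper (not from the original analysis): numeric summaries
# that complement the residual plots for a fitted lm object.
residual_checks <- function(model) {
  res <- resid(model)
  fit <- fitted(model)
  list(
    # OLS with an intercept forces this to be numerically zero
    mean_residual = mean(res),
    # small p-value suggests the residuals are not normally distributed
    shapiro_p = shapiro.test(res)$p.value,
    # values far from zero hint that residual spread changes with the fit
    spread_trend = cor(abs(res), fit)
  )
}

# Simulated stand-in data (an assumption, used only to make this runnable)
set.seed(1)
x <- rnorm(200)
y <- 4 + 0.1 * x + rnorm(200, sd = 0.5)
demo <- residual_checks(lm(y ~ x))
str(demo)
```

With the model fitted earlier, the call would be `residual_checks(m_ev_bty)`. These numbers are a supplement to the plots, not a replacement for them: with several hundred observations, the Shapiro-Wilk test can flag departures from normality that have little practical effect on the fit.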

Amit Kapoor

4/12/2020

Conclusion