Objective

Impact of beauty on instructors’ teaching ratings. Dataset: TeachingRatings (AER package)

Packages

require(AER)

## Loading required package: AER

## Warning: package 'AER' was built under R version 3.3.2

## Loading required package: car

## Warning: package 'car' was built under R version 3.3.2

## Loading required package: lmtest

## Warning: package 'lmtest' was built under R version 3.3.2

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 3.3.2

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## Warning: package 'sandwich' was built under R version 3.3.2

## Loading required package: survival

require(ggplot2)

## Loading required package: ggplot2

require(gridExtra)

## Loading required package: gridExtra

require(GGally)

## Loading required package: GGally

require(e1071)

## Loading required package: e1071

require(ellipse)

## Loading required package: ellipse

## 
## Attaching package: 'ellipse'

## The following object is masked from 'package:car':
## 
##     ellipse

require(car)
require(scatterplot3d)

## Loading required package: scatterplot3d

## Warning: package 'scatterplot3d' was built under R version 3.3.2

library(lmtest)
require(faraway)

## Loading required package: faraway

## 
## Attaching package: 'faraway'

## The following object is masked from 'package:GGally':
## 
##     happy

## The following object is masked from 'package:survival':
## 
##     rats

## The following objects are masked from 'package:car':
## 
##     logit, vif

data("TeachingRatings")

Introduction

The data cover course evaluations, course characteristics, and professor characteristics for 463 courses for the academic years 2000-2002 at the University of Texas at Austin. The data frame contains 463 observations on 12 variables.

Minority - factor. Does the instructor belong to a minority (non-Caucasian)?
Age - the professor’s age.
Gender - factor indicating instructor’s gender.
Credits - factor. Is the course a single-credit elective (e.g., yoga, aerobics, dance)?
Beauty - rating of the instructor’s physical appearance by a panel of six students, averaged across the six panelists, shifted to have a mean of zero.
Eval - course overall teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent).
Division - factor. Is the course an upper or lower division course? (Lower division courses are mainly large freshman and sophomore courses.)
Native - factor. Is the instructor a native English speaker?
Tenure - factor. Is the instructor on tenure track?
Students - number of students that participated in the evaluation.
Allstudents - number of students enrolled in the course.
Prof - factor indicating instructor identifier.

Structure of data

str(TeachingRatings)

## 'data.frame':    463 obs. of  12 variables:
##  $ minority   : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
##  $ age        : int  36 59 51 40 31 62 33 51 33 47 ...
##  $ gender     : Factor w/ 2 levels "male","female": 2 1 1 2 2 1 2 2 2 1 ...
##  $ credits    : Factor w/ 2 levels "more","single": 1 1 1 1 1 1 1 1 1 1 ...
##  $ beauty     : num  0.29 -0.738 -0.572 -0.678 1.51 ...
##  $ eval       : num  4.3 4.5 3.7 4.3 4.4 4.2 4 3.4 4.5 3.9 ...
##  $ division   : Factor w/ 2 levels "upper","lower": 1 1 1 1 1 1 1 1 1 1 ...
##  $ native     : Factor w/ 2 levels "yes","no": 1 1 1 1 1 1 1 1 1 1 ...
##  $ tenure     : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 1 ...
##  $ students   : num  24 17 55 40 42 182 33 25 48 16 ...
##  $ allstudents: num  43 20 55 46 48 282 41 41 60 19 ...
##  $ prof       : Factor w/ 94 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

head(TeachingRatings)

##   minority age gender credits     beauty eval division native tenure
## 1      yes  36 female    more  0.2899157  4.3    upper    yes    yes
## 2       no  59   male    more -0.7377322  4.5    upper    yes    yes
## 3       no  51   male    more -0.5719836  3.7    upper    yes    yes
## 4       no  40 female    more -0.6779634  4.3    upper    yes    yes
## 5       no  31 female    more  1.5097940  4.4    upper    yes    yes
## 6       no  62   male    more  0.5885687  4.2    upper    yes    yes
##   students allstudents prof
## 1       24          43    1
## 2       17          20    2
## 3       55          55    3
## 4       40          46    4
## 5       42          48    5
## 6      182         282    6

Summary of data

summary(TeachingRatings)

##  minority       age           gender      credits        beauty          
##  no :399   Min.   :29.00   male  :268   more  :436   Min.   :-1.4504940  
##  yes: 64   1st Qu.:42.00   female:195   single: 27   1st Qu.:-0.6562689  
##            Median :48.00                             Median :-0.0680143  
##            Mean   :48.37                             Mean   : 0.0000001  
##            3rd Qu.:57.00                             3rd Qu.: 0.5456024  
##            Max.   :73.00                             Max.   : 1.9700230  
##                                                                          
##       eval        division   native    tenure       students     
##  Min.   :2.100   upper:306   yes:435   no :102   Min.   :  5.00  
##  1st Qu.:3.600   lower:157   no : 28   yes:361   1st Qu.: 15.00  
##  Median :4.000                                   Median : 23.00  
##  Mean   :3.998                                   Mean   : 36.62  
##  3rd Qu.:4.400                                   3rd Qu.: 40.00  
##  Max.   :5.000                                   Max.   :380.00  
##                                                                  
##   allstudents          prof    
##  Min.   :  8.00   34     : 13  
##  1st Qu.: 19.00   50     : 13  
##  Median : 29.00   82     : 11  
##  Mean   : 55.18   10     : 10  
##  3rd Qu.: 60.00   20     : 10  
##  Max.   :581.00   58     : 10  
##                   (Other):396

Analysis:

The summary shows no missing values in any of the 12 variables, so the dataset does not need cleaning.

Graphical Representation of Distribution ‘eval’ Variable

plot1 = qplot(eval, data = TeachingRatings, fill = I("red"), xlab = "Evaluation")
plot2 = qplot(eval, data = TeachingRatings, geom = "density", fill = I("red"))
plot3 = qplot(sample = eval, data = TeachingRatings) 
grid.arrange(plot1, plot2, plot3, ncol = 3)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Analysis:

The plots above show that the eval variable is slightly negatively skewed (skewed to the left): the mean (3.998) is just below the median (4.000) and the longer tail is on the low side.

Overlay Plot

ggplot(TeachingRatings, aes(x = eval, y = ..density..)) + geom_histogram(fill = "cornsilk", colour = "grey60", size = .2) + geom_density()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Boxplot

boxplot(TeachingRatings$eval, ylab = "eval", main = "Box Plot")

Analysis:

Most evaluation scores lie between about 2.5 and 5.0.
A few low outliers are present; the minimum score is 2.1.

Identify Skewness of Numeric Variables

Skewness is computed for the numeric variables only; it is not applicable to the factor variables.

skewness(TeachingRatings$eval)

## [1] -0.4643668

skewness(TeachingRatings$age)

## [1] 0.04835668

skewness(TeachingRatings$beauty)

## [1] 0.5124669

skewness(TeachingRatings$students)

## [1] 4.465658

skewness(TeachingRatings$allstudents)

## [1] 4.128874

Analysis:

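The distributions of eval (-0.46) and age (0.05) are approximately symmetric, although eval leans slightly to the left.
The distribution of beauty (0.51) is moderately skewed to the right.
The distributions of students (4.47) and allstudents (4.13) are highly skewed to the right.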
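The labels above (approximately symmetric, moderately skewed, highly skewed) match a common rule of thumb: an absolute skewness below 0.5 is roughly symmetric, 0.5 to 1 is moderate, and above 1 is high. The sketch below shows one way the statistic can be computed by hand. It assumes the default `type = 3` of `e1071::skewness`, which divides the third central moment by the cubed sample standard deviation. The function name `skew_b1` and the toy vectors are illustrative and not part of the original analysis.

```r
# Sample skewness: third central moment over the cubed sample SD.
# Assumption: this mirrors e1071::skewness(x, type = 3), the default.
skew_b1 <- function(x) {
  x <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)   # third central moment
  m3 / sd(x)^3                  # sd() uses the n - 1 denominator
}

symmetric_x  <- c(1, 2, 3, 4, 5)       # toy data, symmetric about 3
right_tail_x <- c(1, 1, 2, 2, 3, 20)   # toy data with one large value

skew_b1(symmetric_x)    # zero for symmetric data
skew_b1(right_tail_x)   # positive: long right tail
```

Applied to `TeachingRatings$students`, the same function should give a large positive value, in line with the 4.47 reported above.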

Graphical Representation of Distribution of Two or More Variables

Create subset of numeric variables

TeachingRatings_subset = subset(TeachingRatings, select = c(age, beauty, eval, students, allstudents))

Boxplot and stripchart

oldpar = par(mfrow = c(1,2))  # two panels side by side; remember the old settings
boxplot(TeachingRatings_subset, main = "Boxplot of variables",col = (c("gold","darkgreen","red","blue","pink")))
stripchart(TeachingRatings_subset, vertical = TRUE, method = "jitter", col = "orange", pch = 1, main="Stripcharts of variables")
par(oldpar)  # restore the previous graphics settings

Analysis:

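The age variable is spread widely, from 29 to 73, while students and allstudents have long right tails (maxima of 380 and 581) that stretch the shared axis.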

Boxplot between gender and eval

ggplot(aes(gender, eval), data = TeachingRatings) + geom_boxplot(aes(fill = gender))

Analysis:

Male professors tend to have higher evaluation scores than female professors.
There are outliers present.

Boxplot between minority and eval

ggplot(aes(minority, eval), data = TeachingRatings) + geom_boxplot(aes(fill = minority))

Analysis:

Professors in the minority category tend to have lower evaluation scores.
There are outliers present.

Boxplot of Minority with Nativity

df1 <- data.frame(Minority = TeachingRatings$minority, Nativity = TeachingRatings$native, o = TeachingRatings$eval)
df1$MinorityNativity <- interaction(df1$Minority, df1$Nativity)
ggplot(aes(y = o, x = MinorityNativity), data = df1) + 
  geom_boxplot(aes(fill = Minority)) + ggtitle("Interactive Model of Minority and Nativity") +
  labs(y = "Evaluation", x = "Minority and Nativity")

Analysis:

Native English-speaking professors appear to receive higher ratings than non-native professors.

Boxplot of Gender with Credits

df2 <- data.frame(Gender = TeachingRatings$gender, Credits = TeachingRatings$credits, o = TeachingRatings$eval)
df2$GenderCredits <- interaction(df2$Gender, df2$Credits)
ggplot(aes(y = o, x = GenderCredits), data = df2) + 
  geom_boxplot(aes(fill = Gender)) + ggtitle("Interactive Model of Gender and credits") +
  labs(y = "Evaluation", x = "Gender and credits")

Analysis:

Male professors teaching single-credit courses are rated much higher than female professors teaching single-credit courses.
Male and female professors teaching courses with more credits are rated almost the same.
It’s reasonable to say that single-credit courses receive higher evaluation scores.

Boxplot of Tenure with Credits

df3 <- data.frame(Tenure = TeachingRatings$tenure, Credits = TeachingRatings$credits, o = TeachingRatings$eval)
df3$TenureCredits <- interaction(df3$Tenure, df3$Credits)
ggplot(aes(y = o, x = TenureCredits), data = df3) + 
  geom_boxplot(aes(fill = Credits)) + ggtitle("Interactive Model of Tenure and Credits") +
  labs(y = "Evaluation", x = "Tenure and Credits")

Analysis:

From the last two graphs, single-credit courses tend to receive higher evaluation scores than courses with more credits.
The tenure variable alone may not affect the evaluation score much.

Regression and Correlation

summary(TeachingRatings_subset)

##       age            beauty                eval          students     
##  Min.   :29.00   Min.   :-1.4504940   Min.   :2.100   Min.   :  5.00  
##  1st Qu.:42.00   1st Qu.:-0.6562689   1st Qu.:3.600   1st Qu.: 15.00  
##  Median :48.00   Median :-0.0680143   Median :4.000   Median : 23.00  
##  Mean   :48.37   Mean   : 0.0000001   Mean   :3.998   Mean   : 36.62  
##  3rd Qu.:57.00   3rd Qu.: 0.5456024   3rd Qu.:4.400   3rd Qu.: 40.00  
##  Max.   :73.00   Max.   : 1.9700230   Max.   :5.000   Max.   :380.00  
##   allstudents    
##  Min.   :  8.00  
##  1st Qu.: 19.00  
##  Median : 29.00  
##  Mean   : 55.18  
##  3rd Qu.: 60.00  
##  Max.   :581.00

Finding Z-Scores

TeachingRatings_r = data.frame(scale(TeachingRatings_subset))

Summary table of Z - Score

summary(TeachingRatings_r)

##       age               beauty              eval          
##  Min.   :-1.97547   Min.   :-1.83922   Min.   :-3.421139  
##  1st Qu.:-0.64931   1st Qu.:-0.83214   1st Qu.:-0.717781  
##  Median :-0.03724   Median :-0.08624   Median : 0.003114  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.88087   3rd Qu.: 0.69182   3rd Qu.: 0.724009  
##  Max.   : 2.51307   Max.   : 2.49798   Max.   : 1.805352  
##     students         allstudents      
##  Min.   :-0.70247   Min.   :-0.62842  
##  1st Qu.:-0.48034   1st Qu.:-0.48189  
##  Median :-0.30263   Median :-0.34869  
##  Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.07499   3rd Qu.: 0.06424  
##  Max.   : 7.62744   Max.   : 7.00417

Analysis:

The mean after rescaling the variables is 0 for all the attributes.
Most values lie between -2 and +3; the exceptions are low eval scores (down to -3.42) and very large classes in students and allstudents (up to 7.63 and 7.00).
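As a minimal sketch of what `scale()` does with its default arguments, the z-score of a value is its distance from the mean in units of the standard deviation. The helper `z_score` and the vector `toy_students` are illustrative names with made-up values.

```r
# Standardise a numeric vector by hand: subtract the mean, divide by the SD.
z_score <- function(x) (x - mean(x)) / sd(x)

toy_students <- c(5, 15, 23, 40, 380)   # illustrative values only

z_manual <- z_score(toy_students)
z_scale  <- as.numeric(scale(toy_students))   # scale() centres and scales by default

all.equal(z_manual, z_scale)   # the two approaches agree
mean(z_manual)                 # zero up to floating-point error
sd(z_manual)                   # one after rescaling
```

One large value dominates the toy vector in the same way that a few very large classes dominate students and allstudents.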

Boxplot and stripchart on the basis of Z-Score

oldpar = par(mfrow = c(1,2))  # two panels side by side; remember the old settings
boxplot(TeachingRatings_r, main = "Boxplot of re-scaled variables",col = (c("gold","darkgreen","red","blue","pink")))
stripchart(TeachingRatings_r, vertical = TRUE, method = "jitter", col = (c("gold","darkgreen","red","blue","pink")), pch = 1, main = "Stripcharts of re-scaled variables")
par(oldpar)  # restore the previous graphics settings

Analysis:

The plots confirm the rescaling: all the variables are now centred on a mean of 0 and share a common scale.

Correlation matrix

cor(TeachingRatings_subset)

##                     age      beauty         eval    students  allstudents
## age          1.00000000 -0.29789253 -0.051696191 -0.03046108 -0.012626464
## beauty      -0.29789253  1.00000000  0.189039091  0.13064984  0.099601914
## eval        -0.05169619  0.18903909  1.000000000  0.03546667 -0.001229338
## students    -0.03046108  0.13064984  0.035466674  1.00000000  0.972056127
## allstudents -0.01262646  0.09960191 -0.001229338  0.97205613  1.000000000

Analysis:

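Eval & Age have a very weak negative linear relationship (-0.05).
Eval & Beauty have a weak positive linear relationship (0.19).
Eval & Students have a very weak positive linear relationship (0.04).
Eval & Allstudents have essentially no linear relationship (-0.001).
Students & Allstudents are almost perfectly correlated (0.97), so they carry nearly the same information.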
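A weak correlation can still be distinguishable from zero when the sample is large. The sketch below applies the usual t-test for a Pearson correlation to two coefficients from the matrix above, with n = 463. It assumes independent observations, which is doubtful here because several courses share an instructor, so the p-values are indicative only. The helper name `cor_t_test` is illustrative.

```r
# t statistic and two-sided p-value for H0: the true correlation is zero.
# Assumption: independent, roughly bivariate-normal observations.
cor_t_test <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p_val  <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, p = p_val)
}

n_obs <- 463   # number of courses in TeachingRatings

beauty_test   <- cor_t_test(0.18903909, n_obs)   # eval vs beauty
students_test <- cor_t_test(0.03546667, n_obs)   # eval vs students

beauty_test     # t is a little above 4, p far below 0.05
students_test   # t is below 1, p well above 0.05
```

With the real data, `cor.test(TeachingRatings$eval, TeachingRatings$beauty)` performs the same test directly.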

Identify Predictors and Outliers

Generalized Pairs Plot

ggpairs(TeachingRatings, columns = c(1, 3, 7:9, 2, 5, 10, 6), mapping = aes(colour = gender))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Gender: blue = female, red = male

Analysis:

From the scatterplot matrix, it is reasonable to say that beauty and age seem to be the better predictors, followed by students and allstudents.
There are outliers present.

Linear Model

Fit a model for eval against all the other variables

fit = lm(eval ~ ., data = TeachingRatings) 
summary(fit)

## 
## Call:
## lm(formula = eval ~ ., data = TeachingRatings)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.46212 -0.19511  0.00898  0.18983  1.00008 
## 
## Coefficients: (6 not defined because of singularities)
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.1386717 36.0968676   0.364 0.716081    
## minorityyes   -1.6043290  4.5205210  -0.355 0.722870    
## age           -0.1020625  0.4188715  -0.244 0.807630    
## genderfemale   0.6301714  0.2747557   2.294 0.022383 *  
## creditssingle  0.3926076  0.1324904   2.963 0.003243 ** 
## beauty        -1.3043190  6.6622697  -0.196 0.844894    
## divisionlower -0.0584322  0.0733228  -0.797 0.426017    
## nativeno       0.0466904  3.1888844   0.015 0.988326    
## tenureyes     -3.7843180 14.4927882  -0.261 0.794149    
## students      -0.0003343  0.0024521  -0.136 0.891641    
## allstudents   -0.0029406  0.0014310  -2.055 0.040598 *  
## prof2         -0.6494872  1.8428949  -0.352 0.724721    
## prof3         -1.0498919  4.0645450  -0.258 0.796317    
## prof4         -2.6839369  9.3901062  -0.286 0.775173    
## prof5         -0.3245685  1.4626494  -0.222 0.824512    
## prof6          3.0135233  8.3048114   0.363 0.716916    
## prof7         -2.8180458  8.6458919  -0.326 0.744656    
## prof8         -0.9993021  2.0003539  -0.500 0.617684    
## prof9         -2.0263746  6.8098101  -0.298 0.766203    
## prof10        -2.9915172 12.8149360  -0.233 0.815551    
## prof11        -0.5674736  4.0907442  -0.139 0.889747    
## prof12        -5.2362883 21.0433760  -0.249 0.803630    
## prof13        -0.8893128  2.5757762  -0.345 0.730098    
## prof14        -1.0592936  5.9952185  -0.177 0.859850    
## prof15        -2.3315374  2.4798728  -0.940 0.347745    
## prof16        -0.3477871  3.1815474  -0.109 0.913014    
## prof17        -5.0550549 19.2072914  -0.263 0.792557    
## prof18        -3.9270391 16.3612888  -0.240 0.810449    
## prof19        -0.9322029  2.8259268  -0.330 0.741684    
## prof20        -4.3660730 12.0499005  -0.362 0.717313    
## prof21        -4.5256697 12.7460120  -0.355 0.722746    
## prof22        -3.1912762 13.8065872  -0.231 0.817334    
## prof23         0.3225347  0.6282508   0.513 0.607993    
## prof24         1.6081037  4.4724699   0.360 0.719387    
## prof25         0.1573405  4.4647874   0.035 0.971907    
## prof26         0.4967518  0.6232505   0.797 0.425949    
## prof27         1.8327860  2.9478535   0.622 0.534504    
## prof28         1.1144268  4.2096446   0.265 0.791366    
## prof29         0.9001250  2.8660077   0.314 0.753648    
## prof30        -2.9289498  1.7902831  -1.636 0.102696    
## prof31        -2.9770806  9.7805324  -0.304 0.761005    
## prof32        -1.9190175  8.0807807  -0.237 0.812418    
## prof33        -0.0535542  1.4464828  -0.037 0.970486    
## prof34         0.0492405  2.0160550   0.024 0.980528    
## prof35        -0.9337594  6.1894514  -0.151 0.880167    
## prof36         1.3231659  8.2582198   0.160 0.872793    
## prof37        -0.6304172  4.4340350  -0.142 0.887019    
## prof38         0.1984293  0.6124225   0.324 0.746118    
## prof39         0.7934674  0.9390916   0.845 0.398703    
## prof40        -0.8055888  0.4341491  -1.856 0.064322 .  
## prof41        -0.6729398  5.6028405  -0.120 0.904465    
## prof42        -0.7982813  5.1262982  -0.156 0.876338    
## prof43        -3.0353337  6.6798139  -0.454 0.649808    
## prof44         1.1035560  5.9280106   0.186 0.852423    
## prof45         1.7699295  5.5558923   0.319 0.750236    
## prof46        -0.3622876  1.2399309  -0.292 0.770312    
## prof47        -0.3107110  3.4392986  -0.090 0.928065    
## prof48        -1.0727251  0.3168721  -3.385 0.000788 ***
## prof49        -5.6826892 19.5574376  -0.291 0.771550    
## prof50        -0.6246916  4.2033557  -0.149 0.881938    
## prof51        -1.0313551  2.9927472  -0.345 0.730580    
## prof52         1.7411649  6.4869813   0.268 0.788536    
## prof53         0.0719133  1.4550675   0.049 0.960610    
## prof54        -4.4452413 17.0540241  -0.261 0.794504    
## prof55        -0.9945627  0.4318984  -2.303 0.021854 *  
## prof56        -2.0520824 10.3765111  -0.198 0.843341    
## prof57        -0.3872228  1.8890156  -0.205 0.837697    
## prof58        -2.0562549  7.3448334  -0.280 0.779667    
## prof59        -2.7999305 12.4156575  -0.226 0.821704    
## prof60        -1.2816455  1.0323594  -1.241 0.215229    
## prof61        -1.9744362 10.0178872  -0.197 0.843866    
## prof62        -0.1425869  1.0703896  -0.133 0.894100    
## prof63         0.4275225  5.3229625   0.080 0.936029    
## prof64         0.5248498  7.2513765   0.072 0.942340    
## prof65        -5.6579336 21.6004357  -0.262 0.793518    
## prof66        -0.4879983  1.9565841  -0.249 0.803181    
## prof67         1.5875309  8.7740774   0.181 0.856520    
## prof68        -2.4585656  5.9668559  -0.412 0.680554    
## prof69        -1.0555423  2.3890812  -0.442 0.658882    
## prof70        -1.1839153  6.8826579  -0.172 0.863522    
## prof71        -2.6727493 14.5557250  -0.184 0.854412    
## prof72         0.9727225  5.2238491   0.186 0.852385    
## prof73         3.4017003  5.6521512   0.602 0.547653    
## prof74        -2.0374718  9.4786592  -0.215 0.829924    
## prof75        -0.9431379  0.4579671  -2.059 0.040165 *  
## prof76        -0.4763093  0.3592227  -1.326 0.185687    
## prof77        -7.0507263 27.0450841  -0.261 0.794469    
## prof78        -0.8982174  3.7788599  -0.238 0.812251    
## prof79        -0.4938072  2.7256979  -0.181 0.856337    
## prof80         0.4676497  6.3317582   0.074 0.941164    
## prof81        -4.8609457 20.8182944  -0.233 0.815509    
## prof82        -0.3883115  2.8699617  -0.135 0.892448    
## prof83        -3.4104781  8.7702420  -0.389 0.697600    
## prof84         3.1205717 13.7035582   0.228 0.819992    
## prof85                NA         NA      NA       NA    
## prof86         1.6174389  7.7929949   0.208 0.835696    
## prof87        -0.3530191  2.9813985  -0.118 0.905810    
## prof88        -1.0902595  3.5360276  -0.308 0.758008    
## prof89         1.4308338  9.3300214   0.153 0.878201    
## prof90                NA         NA      NA       NA    
## prof91                NA         NA      NA       NA    
## prof92                NA         NA      NA       NA    
## prof93                NA         NA      NA       NA    
## prof94                NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3908 on 365 degrees of freedom
## Multiple R-squared:  0.6082, Adjusted R-squared:  0.504 
## F-statistic:  5.84 on 97 and 365 DF,  p-value: < 2.2e-16

Analysis:

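The overall F-statistic and its p-value indicate whether the model explains more variation than an intercept-only model.
If the p-value is less than or equal to alpha (p <= .05), the result is statistically significant. If the p-value is greater than alpha (p > .05), the result is statistically insignificant.
The F-ratio 5.84 (on 97 and 365 degrees of freedom) is large and the p-value (< 2.2e-16) is less than 0.05. The result is significant.
Six coefficients are not defined because of singularities and most standard errors are very large, which is consistent with prof (the instructor identifier) being collinear with the instructor-level variables, so individual coefficients from this model should be read with caution.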
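The F-statistic and the adjusted R-squared in the summary can both be recovered from R-squared and the degrees of freedom. The sketch below reproduces them from the printed output (R-squared 0.6081562, 97 model and 365 residual degrees of freedom, 463 observations). The helper names `f_from_r2` and `adj_r2` are illustrative.

```r
# Overall F statistic from R-squared and the degrees of freedom.
f_from_r2 <- function(r2, df_model, df_resid) {
  (r2 / df_model) / ((1 - r2) / df_resid)
}

# Adjusted R-squared penalises R-squared for the number of estimated terms.
adj_r2 <- function(r2, n, df_resid) {
  1 - (1 - r2) * (n - 1) / df_resid
}

r2_full <- 0.6081562   # taken from the model output
f_stat  <- f_from_r2(r2_full, df_model = 97, df_resid = 365)
r2_adj  <- adj_r2(r2_full, n = 463, df_resid = 365)

f_stat                                    # about 5.84
r2_adj                                    # about 0.504
pf(f_stat, 97, 365, lower.tail = FALSE)   # upper-tail p-value, effectively zero
```

The gap between R-squared (0.608) and adjusted R-squared (0.504) reflects the 97 estimated terms, most of them instructor dummies.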

Fit Plot and Residual Plot

Fit plot for the variable ‘eval’

qplot(fitted.values(fit), eval, data = TeachingRatings) + geom_abline(intercept = 0, slope = 1, color = "green")

Analysis:

The fit is not close: the points are widely spread around the line rather than lying near it, although a mild pattern can be observed.

Residual plot for the variable ‘eval’

ggplot(fit, aes(.fitted, .resid)) + geom_point() + geom_hline(yintercept = 0, color = "red", linetype = "dashed") + ggtitle("Residual Plot")

Analysis:

The points in the residual plot are randomly dispersed around the horizontal axis, which suggests that a linear regression model is appropriate for the data.

Exploring model structure

cor(fit$resid, TeachingRatings$eval)

## [1] 0.6259743

Analysis:

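For a least-squares fit with an intercept, this correlation always equals the square root of 1 - R-squared (here sqrt(1 - 0.6082), about 0.626), so it restates how much variation is left unexplained rather than pointing to a separate issue with the model. Residuals plotted against the fitted values and the predictors are more informative.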
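The correlation between the residuals and the response is tied directly to R-squared in an ordinary least-squares fit with an intercept. The sketch below checks this on the built-in `mtcars` data, used only as a stand-in because it ships with R; the model `mpg ~ wt + hp` is arbitrary.

```r
# In OLS with an intercept, residuals are uncorrelated with the fitted
# values, so cor(residuals, response) = sd(residuals) / sd(response)
# = sqrt(1 - R^2).
toy_fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model, not the report's

r2_toy     <- summary(toy_fit)$r.squared
cor_resid  <- cor(resid(toy_fit), mtcars$mpg)
cor_fitted <- cor(resid(toy_fit), fitted(toy_fit))

all.equal(cor_resid, sqrt(1 - r2_toy))   # identity holds
cor_fitted                               # zero up to rounding

sqrt(1 - 0.6081562)   # about 0.626, the value printed above for eval
```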

plot1 = qplot(minority, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot2 = qplot(age, fit$resid, data = TeachingRatings) + 
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot3 = qplot(gender, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot4 = qplot(credits, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot5 = qplot(beauty, fit$resid, data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot6 = qplot(division, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot7 = qplot(native, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot8 = qplot(tenure, fit$resid, geom = "boxplot", data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot9 = qplot(students, fit$resid, data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot10 = qplot(allstudents, fit$resid, data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
plot11 = qplot(prof, fit$resid, data = TeachingRatings) +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed")
grid.arrange(plot1, plot3, plot4, plot6, plot7, plot8, nrow = 3)

grid.arrange(plot2, plot5, plot9, plot10, plot11, nrow = 3)

Analysis:

We see no pronounced patterns, which indicates that we do not need to include squares or other transforms of the predictors.

Normality of the Residual

mod = fortify(fit)
plot1 = qplot(.stdresid, data = mod, geom = "histogram")
plot2 = qplot(.stdresid, data = mod, geom = "density")
plot3 = qplot(sample = .stdresid, data = mod, geom = "qq") + geom_abline()
grid.arrange(plot1, plot2, plot3, nrow = 1)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 7 rows containing non-finite values (stat_bin).

## Warning: Removed 7 rows containing non-finite values (stat_density).

## Warning: Removed 7 rows containing non-finite values (stat_qq).

Analysis:

The residuals do not look as though they come from a normal distribution.

Comparing Models

fit.bg = lm(eval ~ ., data = TeachingRatings)
fit.sm = lm(eval ~ 1, data = TeachingRatings)
anova(fit.sm, fit.bg)

## Analysis of Variance Table
## 
## Model 1: eval ~ 1
## Model 2: eval ~ minority + age + gender + credits + beauty + division + 
##     native + tenure + students + allstudents + prof
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    462 142.239                                  
## 2    365  55.735 97    86.503 5.8401 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis:

The F-ratio of 5.8401 is large and the p-value (< 2.2e-16) is far below 0.05, so we conclude that the big model fits the data better than the small (intercept-only) model.

Hypothesis Testing

Fit the model for variable ‘eval’ against all the other variables except minority

fit_1 = lm(eval ~ age + gender + credits + beauty + division + native + tenure + students + allstudents + prof, data = TeachingRatings)
summary(fit_1)$r.squared

## [1] 0.6081562

Analysis:

R-squared is 0.608, the same as for the model on all variables (F-ratio 5.84, p-value < 2.2e-16), even though minority is left out here.
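One way a variable can be left out without changing R-squared is when it is constant within each level of an identifier factor that stays in the model. The toy example below illustrates this. The data frame `toy` and its values are invented, and whether this is exactly what happens with the instructor-level variables and prof in TeachingRatings is an assumption suggested by the identical R-squared values and the "not defined because of singularities" note in the full model.

```r
# Toy data: three instructors, three courses each; age never varies
# within an instructor, so it is a linear combination of the prof dummies.
toy <- data.frame(
  prof  = factor(rep(1:3, each = 3)),
  age   = rep(c(40, 50, 60), each = 3),
  score = c(4.1, 4.3, 4.0, 3.5, 3.7, 3.6, 4.6, 4.4, 4.5)
)

with_age    <- lm(score ~ age + prof, data = toy)
without_age <- lm(score ~ prof, data = toy)

coef(with_age)   # one coefficient is NA (aliased)
all.equal(summary(with_age)$r.squared,
          summary(without_age)$r.squared)   # same fit either way
```

`alias(with_age)` lists which terms are linearly dependent on the others.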

Remove factor variables, credits and tenure

fit_2 = lm(eval ~ age + gender + beauty + division + native + students + allstudents + prof, data = TeachingRatings) 
summary(fit_1)$r.squared

## [1] 0.6081562

summary(fit_2)$r.squared

## [1] 0.5987293

anova(fit_1,fit_2)

## Analysis of Variance Table
## 
## Model 1: eval ~ age + gender + credits + beauty + division + native + 
##     tenure + students + allstudents + prof
## Model 2: eval ~ age + gender + beauty + division + native + students + 
##     allstudents + prof
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    365 55.735                                
## 2    366 57.076 -1   -1.3409 8.7811 0.003243 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis:

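The F-value is 8.7811 and the p-value is 0.0032, which is well below 0.05, so the reduced model fits significantly worse. The comparison uses a single degree of freedom and its p-value matches that of the creditssingle coefficient in the full model (0.003243).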
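The F-values in the anova tables come from comparing residual sums of squares (RSS) of nested models. As a rough check, the sketch below recomputes two of them from the rounded RSS and degrees of freedom printed in the tables, so the last digits differ slightly. The helper name `nested_f` is illustrative.

```r
# Partial F test for two nested linear models, from their RSS and residual df.
nested_f <- function(rss_small, df_small, rss_big, df_big) {
  df_diff <- df_small - df_big
  f_stat  <- ((rss_small - rss_big) / df_diff) / (rss_big / df_big)
  p_val   <- pf(f_stat, df_diff, df_big, lower.tail = FALSE)
  c(F = f_stat, p = p_val)
}

# Intercept-only model against the full model
null_vs_full <- nested_f(rss_small = 142.239, df_small = 462,
                         rss_big = 55.735, df_big = 365)

# fit_2 (credits and tenure removed) against fit_1
fit2_vs_fit1 <- nested_f(rss_small = 57.076, df_small = 366,
                         rss_big = 55.735, df_big = 365)

null_vs_full   # F about 5.84
fit2_vs_fit1   # F about 8.78, p about 0.003
```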

Additionally remove factor variables division, native and gender

fit_3 = lm(eval ~ age + beauty + students + allstudents + prof, data = TeachingRatings) 
summary(fit_1)$r.squared

## [1] 0.6081562

summary(fit_3)$r.squared

## [1] 0.5985692

anova(fit_1,fit_3)

## Analysis of Variance Table
## 
## Model 1: eval ~ age + gender + credits + beauty + division + native + 
##     tenure + students + allstudents + prof
## Model 2: eval ~ age + beauty + students + allstudents + prof
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1    365 55.735                              
## 2    367 57.099 -2   -1.3636 4.4651 0.01214 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis:

The F-value has come down to 4.4651 and the p-value is 0.01214, which is still less than 0.05.
age variable: the p-value 0.020742 is less than alpha (0.05), so we reject the null hypothesis.
beauty variable: the p-value 0.403375 is greater than alpha (0.05), so we fail to reject the null hypothesis.
students variable: the p-value 0.887564 is greater than alpha (0.05), so we fail to reject the null hypothesis.
allstudents variable: the p-value 0.028405 is less than alpha (0.05), so we reject the null hypothesis.

Remove age

fit_4 = lm(eval ~ beauty + students + allstudents + prof, data = TeachingRatings) 
summary(fit_1)$r.squared

## [1] 0.6081562

summary(fit_4)$r.squared

## [1] 0.5985692

anova(fit_1,fit_4)

## Analysis of Variance Table
## 
## Model 1: eval ~ age + gender + credits + beauty + division + native + 
##     tenure + students + allstudents + prof
## Model 2: eval ~ beauty + students + allstudents + prof
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1    365 55.735                              
## 2    367 57.099 -2   -1.3636 4.4651 0.01214 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis:

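The results are identical to those for fit_3 (RSS 57.099 on 367 degrees of freedom), so removing age does not change the fit once prof is in the model. The F-value is 4.4651 and the p-value, 0.01214, is less than alpha (p < .05), so we reject the null hypothesis that the removed terms have no effect.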

Remove beauty

fit_5 = lm(eval ~ students + allstudents + prof, data = TeachingRatings) 
summary(fit_5)

## 
## Call:
## lm(formula = eval ~ students + allstudents + prof, data = TeachingRatings)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.51125 -0.19979  0.00942  0.19494  1.01597 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.3470122  0.2077959  20.920  < 2e-16 ***
## students    -0.0003496  0.0024708  -0.141 0.887564    
## allstudents -0.0031156  0.0014160  -2.200 0.028405 *  
## prof2       -0.6950659  0.3038517  -2.288 0.022734 *  
## prof3       -0.4785411  0.3421520  -1.399 0.162771    
## prof4       -0.2394047  0.2454749  -0.975 0.330068    
## prof5        0.1900744  0.2562636   0.742 0.458735    
## prof6        0.9739548  0.2642924   3.685 0.000263 ***
## prof7       -0.3770876  0.2672464  -1.411 0.159088    
## prof8       -0.2339032  0.2515044  -0.930 0.352974    
## prof9        0.0558806  0.2497346   0.224 0.823069    
## prof10       0.2778802  0.2387573   1.164 0.245237    
## prof11      -0.9869112  0.3044501  -3.242 0.001297 ** 
## prof12      -0.1285482  0.2681507  -0.479 0.631948    
## prof13      -0.4433878  0.2500451  -1.773 0.077020 .  
## prof14      -0.6628805  0.2820196  -2.350 0.019278 *  
## prof15      -1.1817290  0.2818103  -4.193 3.45e-05 ***
## prof16       0.1379471  0.2559060   0.539 0.590177    
## prof17      -0.0592205  0.2668526  -0.222 0.824498    
## prof18       0.0173654  0.2417590   0.072 0.942777    
## prof19       0.0844698  0.2414515   0.350 0.726658    
## prof20      -0.7962473  0.2395309  -3.324 0.000976 ***
## prof21      -0.6528306  0.2584028  -2.526 0.011943 *  
## prof22      -0.9593634  0.4434698  -2.163 0.031162 *  
## prof23      -0.0551294  0.2662047  -0.207 0.836051    
## prof24       0.2344740  0.2512613   0.933 0.351337    
## prof25       0.0150680  0.3440225   0.044 0.965088    
## prof26       0.0001329  0.2651912   0.001 0.999601    
## prof27      -0.2973520  0.2524220  -1.178 0.239562    
## prof28      -0.3895891  0.2838741  -1.372 0.170777    
## prof29      -0.2670415  0.2860347  -0.934 0.351125    
## prof30      -2.0061287  0.4444934  -4.513 8.61e-06 ***
## prof31      -0.5977400  0.2523059  -2.369 0.018348 *  
## prof32      -0.5009359  0.3063302  -1.635 0.102847    
## prof33       0.2204366  0.2684540   0.821 0.412103    
## prof34      -0.5905636  0.2305408  -2.562 0.010817 *  
## prof35      -0.0267562  0.3057718  -0.088 0.930319    
## prof36      -0.4844748  0.2812928  -1.722 0.085855 .  
## prof37      -0.6487074  0.2469256  -2.627 0.008972 ** 
## prof38      -0.1704852  0.3062421  -0.557 0.578071    
## prof39       0.1910362  0.2457589   0.777 0.437464    
## prof40      -0.7888025  0.4439598  -1.777 0.076439 .  
## prof41       0.3605282  0.2698872   1.336 0.182427    
## prof42       0.2576044  0.2810964   0.916 0.360045    
## prof43      -0.1880755  0.2826296  -0.665 0.506182    
## prof44      -0.2520091  0.3029762  -0.832 0.406075    
## prof45       0.2128984  0.2820087   0.755 0.450772    
## prof46      -0.4793289  0.3051750  -1.571 0.117121    
## prof47      -0.7319911  0.4428641  -1.653 0.099215 .  
## prof48      -0.8107118  0.3054797  -2.654 0.008303 ** 
## prof49      -0.4160068  0.2514346  -1.655 0.098874 .  
## prof50      -0.0535531  0.2285335  -0.234 0.814857    
## prof51       0.2954874  0.2605779   1.134 0.257547    
## prof52       0.1201735  0.2810010   0.428 0.669148    
## prof53       0.4346320  0.2483983   1.750 0.080999 .  
## prof54      -0.4025378  0.2560864  -1.572 0.116839    
## prof55      -0.9788460  0.3046807  -3.213 0.001431 ** 
## prof56       0.1080650  0.2734638   0.395 0.692946    
## prof57      -0.2060806  0.3421015  -0.602 0.547283    
## prof58      -0.1015104  0.2334153  -0.435 0.663897    
## prof59      -0.8962169  0.3451706  -2.596 0.009798 ** 
## prof60      -1.3107118  0.3054797  -4.291 2.28e-05 ***
## prof61      -0.0633354  0.4420009  -0.143 0.886138    
## prof62       0.0510323  0.4432009   0.115 0.908393    
## prof63      -0.2338352  0.3420928  -0.684 0.494695    
## prof64      -0.5219975  0.3025884  -1.725 0.085350 .  
## prof65      -0.1087745  0.2483484  -0.438 0.661650    
## prof66      -0.6154672  0.2605524  -2.362 0.018690 *  
## prof67      -0.3523139  0.3458348  -1.019 0.309000    
## prof68      -1.7154487  0.3057442  -5.611 3.97e-08 ***
## prof69      -1.1218336  0.4415050  -2.541 0.011467 *  
## prof70       0.2375440  0.2421761   0.981 0.327301    
## prof71       0.5111179  0.2380797   2.147 0.032461 *  
## prof72      -0.3034285  0.2562312  -1.184 0.237101    
## prof73       2.0999060  0.3895491   5.391 1.26e-07 ***
## prof74      -0.2122714  0.2792482  -0.760 0.447651    
## prof75      -0.5952872  0.3436035  -1.732 0.084028 .  
## prof76      -0.6660992  0.3453437  -1.929 0.054526 .  
## prof77      -0.2189106  0.2570192  -0.852 0.394920    
## prof78      -0.2702403  0.2821959  -0.958 0.338878    
## prof79      -0.3172439  0.3025911  -1.048 0.295133    
## prof80      -0.3822178  0.2812016  -1.359 0.174908    
## prof81       0.2721896  0.2834314   0.960 0.337518    
## prof82      -0.2484547  0.2363601  -1.051 0.293872    
## prof83      -0.1108817  0.2696040  -0.411 0.681110    
## prof84      -0.0200726  0.2691894  -0.075 0.940600    
## prof85       0.5027269  0.2466783   2.038 0.042268 *  
## prof86      -0.0096631  0.3051476  -0.032 0.974755    
## prof87       0.3079574  0.3426071   0.899 0.369314    
## prof88      -0.9037480  0.2525310  -3.579 0.000392 ***
## prof89      -0.4585025  0.3040943  -1.508 0.132475    
## prof90      -0.5058982  0.3446608  -1.468 0.143011    
## prof91       0.2832747  0.3031993   0.934 0.350771    
## prof92      -0.3617047  0.2521869  -1.434 0.152346    
## prof93       0.1133988  0.2555754   0.444 0.657521    
## prof94      -0.6598206  0.2799070  -2.357 0.018934 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3944 on 367 degrees of freedom
## Multiple R-squared:  0.5986, Adjusted R-squared:  0.4947 
## F-statistic:  5.76 on 95 and 367 DF,  p-value: < 2.2e-16

summary(fit_1)$r.squared

## [1] 0.6081562

summary(fit_5)$r.squared

## [1] 0.5985692

anova(fit_1,fit_5)

## Analysis of Variance Table
## 
## Model 1: eval ~ age + gender + credits + beauty + division + native + 
##     tenure + students + allstudents + prof
## Model 2: eval ~ students + allstudents + prof
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1    365 55.735                              
## 2    367 57.099 -2   -1.3636 4.4651 0.01214 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis:

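The F-statistic is 4.4651 and the p-value, 0.01214, is less than alpha (i.e. p < .05), so we reject the null hypothesis that the terms dropped from fit_1 have zero coefficients. By this test fit_1 fits significantly better than fit_5, although the gain in R-squared is small (0.6082 versus 0.5986). The analysis below nevertheless continues with the simpler model fit_5.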
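The F-statistic in the table can be reproduced by hand from the residual sums of squares and degrees of freedom it reports. The sketch below uses the rounded values printed above, so it agrees with the table only to about two decimal places.

```r
# Values copied from the anova table above (rounded as printed)
rss_full <- 55.735; df_full <- 365   # Model 1 (fit_1)
rss_red  <- 57.099; df_red  <- 367   # Model 2 (fit_5)

# Partial F-statistic: extra RSS per dropped degree of freedom,
# relative to the residual variance of the full model
f_stat <- ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)
p_val  <- pf(f_stat, df_red - df_full, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

A small p-value here means that the reduced model loses a significant amount of fit, which is why the null hypothesis is rejected rather than accepted.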

Goodness of the Model

Log Transformation

fit_6 = lm(log(eval) ~ students + allstudents + prof, data = TeachingRatings)

Sqrt Transformation

fit_7 = lm(sqrt(eval) ~ students + allstudents + prof, data = TeachingRatings)

Fit Plot

p1 = qplot(fitted(fit_5), eval, data = TeachingRatings) + 
  geom_abline(intercept = 0, slope = 1, color = "red") 
p2 = qplot(fitted(fit_6), log(TeachingRatings$eval), data = TeachingRatings) + 
  geom_abline(intercept = 0, slope = 1, color = "red")
p3 = qplot(fitted(fit_7), sqrt(TeachingRatings$eval), data = TeachingRatings) + 
  geom_abline(intercept = 0, slope = 1, color = "red")
grid.arrange(p1,p2,p3)

Analysis:

The original fit looks reasonable: the points are scattered fairly tightly around the red reference line.
There is not much difference after either transformation.
We can therefore consider using the original model.

Pearson Correlation

cor(fitted(fit_5), TeachingRatings$eval)

## [1] 0.7736726

cor(fitted(fit_6), log(TeachingRatings$eval))

## [1] 0.7725017

cor(fitted(fit_7), sqrt(TeachingRatings$eval))

## [1] 0.7732936

Analysis:

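The Pearson correlation between fitted and observed values is nearly identical for the three models (0.7737, 0.7725 and 0.7733). It is marginally highest for the original model, so the transformations offer no improvement.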
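For a least-squares model with an intercept, the squared correlation between fitted and observed values equals the model's R-squared. The short check below uses only the two numbers printed earlier for fit_5; the variable names are chosen here for illustration.

```r
r_fit5  <- 0.7736726   # cor(fitted(fit_5), eval), as printed above
r2_fit5 <- 0.5985692   # summary(fit_5)$r.squared, as printed above

# The squared correlation should match R-squared up to rounding
r_fit5^2
all.equal(r_fit5^2, r2_fit5, tolerance = 1e-6)
```

Because each correlation is computed on its own response scale (eval, log(eval), sqrt(eval)), the three values are really R-squared comparisons across different scales, which is one more reason to treat the tiny differences as negligible.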

Plot the model fit_5 to estimate the goodness

plot(fit_5)

## Warning: not plotting observations with leverage one:
##   22, 30, 40, 47, 61, 62, 69

## Warning: not plotting observations with leverage one:
##   22, 30, 40, 47, 61, 62, 69

Analysis:

The first plot (residuals vs. fitted values) is not entirely random: most points are concentrated between fitted values of about 3.0 and 4.8, and three observations are flagged as outliers.
The second plot (normal Q-Q) follows the straight line in the middle, which suggests approximately normal errors there, but departs from it towards both extremes. Points 99, 126 and 459 deviate most from the line.
The third plot (scale-location), like the first, shows most points concentrated between 3.0 and 4.8, with no clear pattern.
The last plot (residuals vs. leverage, with Cook's distance contours) tells us that points 46 and 234 have the greatest influence on the regression (influential points).
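The "leverage one" warnings above are what one would expect when an instructor appears in only one course: that instructor's dummy variable fits the single observation exactly. The sketch below checks this reading, assuming the AER package is installed; `single` and `lev_one` are names introduced here for illustration.

```r
library(AER)               # provides the TeachingRatings data
data("TeachingRatings")
fit_5 <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)

# Instructors who appear in exactly one course
courses_per_prof <- table(TeachingRatings$prof)
single <- names(courses_per_prof[courses_per_prof == 1])

# Observations whose leverage (hat value) is numerically equal to one
lev_one <- which(hatvalues(fit_5) > 1 - 1e-8)

single
TeachingRatings$prof[lev_one]
```

These observations have zero residuals by construction, so they carry no information about the error variance and are dropped from the diagnostic plots.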

Confidence Interval

CI for fit_5

95% level (2.5% to 97.5%)

confint(fit_5)

##                    2.5 %        97.5 %
## (Intercept)  3.938392272  4.7556322269
## students    -0.005208219  0.0045090601
## allstudents -0.005900103 -0.0003311933
## prof2       -1.292574780 -0.0975569671
## prof3       -1.151365512  0.1942832214
## prof4       -0.722118553  0.2433090621
## prof5       -0.313855009  0.6940037589
## prof6        0.454237273  1.4936722355
## prof7       -0.902614016  0.1484387866
## prof8       -0.728473782  0.2606672956
## prof9       -0.435209791  0.5469709169
## prof10      -0.191623806  0.7473841102
## prof11      -1.585596789 -0.3882255219
## prof12      -0.655852936  0.3987565112
## prof13      -0.935088712  0.0483131025
## prof14      -1.217457614 -0.1083034012
## prof15      -1.735894624 -0.6275633061
## prof16      -0.365279028  0.6411732435
## prof17      -0.583972435  0.4655314552
## prof18      -0.458041232  0.4927721280
## prof19      -0.390332346  0.5592718836
## prof20      -1.267272617 -0.3252219843
## prof21      -1.160966427 -0.1446947504
## prof22      -1.831424201 -0.0873025796
## prof23      -0.578607391  0.4683485629
## prof24      -0.259618563  0.7285665576
## prof25      -0.661434734  0.6915707183
## prof26      -0.521352055  0.5216177705
## prof27      -0.793726930  0.1990229476
## prof28      -0.947812926  0.1686347371
## prof29      -0.829514175  0.2954311231
## prof30      -2.880202243 -1.1320551137
## prof31      -1.093886656 -0.1015932543
## prof32      -1.103318536  0.1014466734
## prof33      -0.307464498  0.7483376983
## prof34      -1.043910273 -0.1372168475
## prof35      -0.628040866  0.5745285115
## prof36      -1.037622814  0.0686731899
## prof37      -1.134274050 -0.1631408320
## prof38      -0.772694722  0.4317243358
## prof39      -0.292236165  0.6743085180
## prof40      -1.661826888  0.0842218061
## prof41      -0.170191082  0.8912475349
## prof42      -0.295157236  0.8103660835
## prof43      -0.743852223  0.3677011853
## prof44      -0.847796350  0.3437782411
## prof45      -0.341657231  0.7674540398
## prof46      -1.079439947  0.1207821850
## prof47      -1.602860863  0.1388785644
## prof48      -1.411422120 -0.2100015419
## prof49      -0.910440064  0.0784263678
## prof50      -0.502952586  0.3958463883
## prof51      -0.216925709  0.8079005266
## prof52      -0.432400500  0.6727475599
## prof53      -0.053830548  0.9230944566
## prof54      -0.906118538  0.1010430125
## prof55      -1.577985022 -0.3797069824
## prof56      -0.429687569  0.6458174831
## prof57      -0.878805621  0.4666444929
## prof58      -0.560509669  0.3574888309
## prof59      -1.574977277 -0.2174565662
## prof60      -1.911422120 -0.7100015419
## prof61      -0.932507547  0.8058366705
## prof62      -0.820499595  0.9225641797
## prof63      -0.906543092  0.4388727814
## prof64      -1.117022146  0.0730272254
## prof65      -0.597138993  0.3795899033
## prof66      -1.127830141 -0.1031043312
## prof67      -1.032380311  0.3277525683
## prof68      -2.316679111 -1.1142182108
## prof69      -1.990030671 -0.2536365172
## prof70      -0.238682863  0.7137709112
## prof71       0.042946332  0.9792895328
## prof72      -0.807294017  0.2004370483
## prof73       1.333877482  2.8659344460
## prof74      -0.761398799  0.3368559644
## prof75      -1.270966060  0.0803915655
## prof76      -1.345199919  0.0130014355
## prof77      -0.724325667  0.2865045152
## prof78      -0.825164060  0.2846834691
## prof79      -0.912273742  0.2777860277
## prof80      -0.935186406  0.1707507472
## prof81      -0.285163766  0.8295430113
## prof82      -0.713244705  0.2163353753
## prof83      -0.641044190  0.4192808356
## prof84      -0.549419752  0.5092745267
## prof85       0.017646603  0.9878072622
## prof86      -0.609720219  0.5903940337
## prof87      -0.365761910  0.9816767121
## prof88      -1.400337313 -0.4071586570
## prof89      -1.056488459  0.1394835396
## prof90      -1.183656118  0.1718597930
## prof91      -0.312951302  0.8795006273
## prof92      -0.857617317  0.1342080074
## prof93      -0.389177236  0.6159747881
## prof94      -1.210243315 -0.1093977970

Using Bonferroni Correction, 99%

confint(fit_5, level = 0.99)

##                    0.5 %        99.5 %
## (Intercept)  3.808968111  4.8850563878
## students    -0.006747119  0.0060479603
## allstudents -0.006782037  0.0005507404
## prof2       -1.481826635  0.0916948880
## prof3       -1.364472393  0.4073901026
## prof4       -0.875010807  0.3962013162
## prof5       -0.473466974  0.8536157241
## prof6        0.289624668  1.6582848403
## prof7       -1.069066508  0.3148912785
## prof8       -0.885121475  0.4173149890
## prof9       -0.590755189  0.7025163148
## prof10      -0.340332041  0.8960923456
## prof11      -1.775221355 -0.1986009562
## prof12      -0.822868684  0.5657722597
## prof13      -1.090827493  0.2040518838
## prof14      -1.393111475  0.0673504596
## prof15      -1.911418165 -0.4520397652
## prof16      -0.524668250  0.8005624655
## prof17      -0.750179629  0.6317386498
## prof18      -0.608619065  0.6433499607
## prof19      -0.540718692  0.7096582295
## prof20      -1.416462720 -0.1760318817
## prof21      -1.321910723  0.0162495453
## prof22      -2.107636194  0.1889094140
## prof23      -0.744411076  0.6341522476
## prof24      -0.416114864  0.8850628586
## prof25      -0.875706680  0.9058426641
## prof26      -0.686524467  0.6867901824
## prof27      -0.950946140  0.3562421573
## prof28      -1.124621831  0.3454436425
## prof29      -1.007668829  0.4735857769
## prof30      -3.157051746 -0.8552056109
## prof31      -1.251033575  0.0555536645
## prof32      -1.294114061  0.2922421984
## prof33      -0.474669139  0.9155423393
## prof34      -1.187500946  0.0063738262
## prof35      -0.818488643  0.7649762884
## prof36      -1.212824028  0.2438744036
## prof37      -1.288069885 -0.0093449964
## prof38      -0.963435428  0.6224650417
## prof39      -0.445305326  0.8273776793
## prof40      -1.938344067  0.3607389852
## prof41      -0.338288348  1.0593448012
## prof42      -0.470236081  0.9854449291
## prof43      -0.919886039  0.5437350005
## prof44      -1.036502911  0.5324848022
## prof45      -0.517304292  0.9431011000
## prof46      -1.269515997  0.3108582347
## prof47      -1.878695595  0.4147132962
## prof48      -1.601687964 -0.0197356974
## prof49      -1.067044263  0.2350305663
## prof50      -0.645293036  0.5381868384
## prof51      -0.379224767  0.9701995854
## prof52      -0.607419916  0.8477669766
## prof53      -0.208543614  1.0778075224
## prof54      -1.065620086  0.2605445612
## prof55      -1.767753191 -0.1899388136
## prof56      -0.600012499  0.8161424139
## prof57      -1.091881047  0.6797199193
## prof58      -0.705890698  0.5028698598
## prof59      -1.789964293 -0.0024695507
## prof60      -2.101687964 -0.5197356974
## prof61      -1.207804588  1.0811337118
## prof62      -1.096544060  1.1986086449
## prof63      -1.119613096  0.6519427853
## prof64      -1.305487162  0.2614922414
## prof65      -0.751821001  0.5342719119
## prof66      -1.290113296  0.0591788235
## prof67      -1.247781009  0.5431532661
## prof68      -2.507109708 -0.9237876133
## prof69      -2.265018886  0.0213516975
## prof70      -0.389520484  0.8646085320
## prof71      -0.105339899  1.1275757642
## prof72      -0.966885758  0.3600287896
## prof73       1.091249616  3.1085623127
## prof74      -0.935326542  0.5107837077
## prof75      -1.484977044  0.2944025492
## prof76      -1.560294726  0.2280962428
## prof77      -0.884408208  0.4465870556
## prof78      -1.000927720  0.4604471285
## prof79      -1.100740404  0.4662526904
## prof80      -1.110330789  0.3458951307
## prof81      -0.461696972  1.0060762172
## prof82      -0.860459879  0.3635505488
## prof83      -0.808965100  0.5872017453
## prof84      -0.717082405  0.6769371793
## prof85      -0.135995211  1.1414490763
## prof86      -0.799779185  0.7804529989
## prof87      -0.579152251  1.1950670534
## prof88      -1.557624427 -0.2498715429
## prof89      -1.245891426  0.3288865067
## prof90      -1.398325639  0.3865293135
## prof91      -0.501796805  1.0683461303
## prof92      -1.014690107  0.2912807979
## prof93      -0.548360542  0.7751580934
## prof94      -1.384581349  0.0649402373

Analysis:

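For fit_1, all confidence intervals except the one for the credits variable contain zero.
For fit_5, most of the 99% intervals contain zero, including those for students and allstudents. The exceptions are the intercept and 13 instructor coefficients (prof6, prof11, prof15, prof20, prof30, prof37, prof48, prof55, prof59, prof60, prof68, prof73 and prof88).
Widening the intervals in this way is conservative: several coefficients that were significant at the 95% level, such as allstudents, are no longer significant.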
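A Bonferroni correction for a family of m intervals with familywise level 1 - alpha uses an individual level of 1 - alpha/m, so a 99% level corresponds to a family of only five intervals at alpha = 0.05. A minimal sketch of the adjustment for every coefficient of fit_5, assuming the AER package is installed (`ci_bonf` and `excl_zero` are illustrative names):

```r
library(AER)               # provides the TeachingRatings data
data("TeachingRatings")
fit_5 <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)

alpha <- 0.05
m <- length(coef(fit_5))                 # number of intervals in the family
ci_bonf <- confint(fit_5, level = 1 - alpha / m)

# Coefficients whose Bonferroni-adjusted interval excludes zero
excl_zero <- ci_bonf[, 1] > 0 | ci_bonf[, 2] < 0
m
names(which(excl_zero))
```

With this many coefficients the adjusted intervals are wider than the 99% intervals shown above, so fewer coefficients will be flagged.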

Joint Confidence Region

Check hypothesis for students and allstudents variables

plot(ellipse(fit_5, c("students", "allstudents")), 
     type = "l", 
     main = "Joint Confidence Region")
points(0,0)
points(coef(fit_5)["students"], coef(fit_5)["allstudents"], 
       pch=18)
abline(v = confint(fit_5)["students",], lty = 2, col = 'red')
abline(h = confint(fit_5)["allstudents",], lty = 2, col = 'red')

Analysis:

The zero point is outside the allstudents CI but inside the students CI.
Of the two individual 95% CIs, only the one for allstudents rejects its null hypothesis, H0: allstudents = 0.
Note that the zero point is outside the 95% CR.
The joint 95% CR therefore rejects the joint null hypothesis H0: students = allstudents = 0.
The 95% CR is equivalent to testing the full model lm(eval ~ students + allstudents + prof, data = TeachingRatings) versus the reduced model lm(eval ~ prof, data = TeachingRatings) using a level of significance equal to 5%.
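The equivalent F-test mentioned in the last point can be run directly by comparing the two nested models. This is a sketch assuming the AER package is installed; `fit_red` is a name introduced here for the reduced model.

```r
library(AER)               # provides the TeachingRatings data
data("TeachingRatings")
fit_5   <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)
fit_red <- lm(eval ~ prof, data = TeachingRatings)   # drops both class-size terms

# Partial F-test of H0: students = allstudents = 0
cmp <- anova(fit_red, fit_5)
cmp
```

A p-value below 0.05 in this table corresponds to the origin lying outside the 95% joint confidence ellipse.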

Checking for Non-Constant Variance

mod <- fortify(fit_5)
p1 <- qplot(.fitted, .resid, data = mod) + 
  geom_hline(yintercept = 0, linetype = "dashed") + 
  labs(title = "Residuals vs Fitted", x = "Fitted", y = "Residuals") + 
  geom_smooth(color = "red", se = F)
p2 <- qplot(.fitted, abs(.resid), data = mod) + 
  geom_hline(yintercept = 0, linetype = "dashed") + 
  labs(title = "Scale-Location", x = "Fitted", y = "|Residuals|") + 
  geom_smooth(method = "lm", color = "red", se = F)
grid.arrange(p1, p2, nrow = 2)

Analysis:

We do not see a clear pattern in the Residuals vs Fitted plot, so the plots give no visual indication of non-constant variance.
Heteroskedasticity is suggested when the red line departs from a flat, straight line.
In our case the red line is roughly straight, so the plots alone do not reveal heteroskedasticity. The formal tests below examine this further.

An approximate test of non-constant error variance.

summary(lm(abs(residuals(fit_5)) ~ fitted(fit_5)))

## 
## Call:
## lm(formula = abs(residuals(fit_5)) ~ fitted(fit_5))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33889 -0.17483 -0.05049  0.12348  1.24902 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.44193    0.10138   4.359 1.61e-05 ***
## fitted(fit_5) -0.04480    0.02521  -1.777   0.0763 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2326 on 461 degrees of freedom
## Multiple R-squared:  0.006802,   Adjusted R-squared:  0.004648 
## F-statistic: 3.157 on 1 and 461 DF,  p-value: 0.07625

Analysis:

The null hypothesis says that the error variance is constant, i.e. that the slope of the absolute residuals on the fitted values is zero.
The t-test rejects this hypothesis at a 10% level of significance, since the p-value, 0.07625, is less than 0.10, but not at the 5% level.
Hence, there is only weak evidence of nonconstant error variance.

Breusch-Pagan test to formally check presence of heteroscedasticity

bptest(fit_5)

## 
##  studentized Breusch-Pagan test
## 
## data:  fit_5
## BP = 165.02, df = 95, p-value = 1.134e-05

Analysis:

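The null hypothesis of the Breusch-Pagan test is constant error variance (homoskedasticity). Since the p-value, 1.134e-05, is far below 0.05, we reject it and conclude that heteroskedasticity is present.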
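To see what the studentized Breusch-Pagan statistic measures, it can be rebuilt by hand: regress the squared residuals on the model's regressors and multiply the resulting R-squared by the sample size. The sketch below assumes the AER package is installed; `aux`, `bp_manual`, `df_bp` and `p_manual` are illustrative names.

```r
library(AER)   # also attaches lmtest, which provides bptest()
data("TeachingRatings")
fit_5 <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)

# Auxiliary regression: squared residuals on the original regressors
aux <- lm(residuals(fit_5)^2 ~ students + allstudents + prof,
          data = TeachingRatings)

n         <- nobs(fit_5)
bp_manual <- n * summary(aux)$r.squared      # studentized BP statistic
df_bp     <- length(coef(aux)) - 1           # regressors excluding the intercept
p_manual  <- pchisq(bp_manual, df = df_bp, lower.tail = FALSE)

c(BP = bp_manual, df = df_bp, p = p_manual)
```

A large statistic means the regressors explain a meaningful share of the variation in the squared residuals, which is evidence against constant variance.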

An F-test for non-constant error variance between two groups defined by a predictor

group <- TeachingRatings$students > 36
p1 <- qplot(students, .resid, data = mod, color = group)
p2 <- qplot(group, .resid, data = mod, geom = "boxplot")
grid.arrange(p1, p2, nrow = 2)

var.test(residuals(fit_5)[TeachingRatings$students > 36], residuals(fit_5)[TeachingRatings$students < 36])

## 
##  F test to compare two variances
## 
## data:  residuals(fit_5)[TeachingRatings$students > 36] and residuals(fit_5)[TeachingRatings$students < 36]
## F = 0.51042, num df = 126, denom df = 329, p-value = 1.973e-05
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3850553 0.6896998
## sample estimates:
## ratio of variances 
##          0.5104208

Analysis:

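The null hypothesis is that the two groups have equal residual variance. We reject it at a 10% level of significance, and at any conventional level, since the p-value, 1.973e-05, is far less than 0.10. The estimated variance ratio of 0.51 indicates that the residual variance for courses with more than 36 students is about half that for courses with fewer than 36. Note that courses with exactly 36 students fall in neither group of the test.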
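The F-statistic reported by var.test is simply the ratio of the two sample variances of the residuals. The sketch below recomputes it and counts the courses that fall in neither group, assuming the AER package is installed; the variable names are introduced here for illustration.

```r
library(AER)               # provides the TeachingRatings data
data("TeachingRatings")
fit_5 <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)

res     <- residuals(fit_5)
large   <- TeachingRatings$students > 36
small   <- TeachingRatings$students < 36
n_equal <- sum(TeachingRatings$students == 36)   # courses in neither group

# Ratio of residual variances: large classes over small classes
f_manual <- var(res[large]) / var(res[small])
c(F = f_manual, n_large = sum(large), n_small = sum(small), n_equal = n_equal)
```

Replacing `<` with `<=` in the second group would include every course in the comparison.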

Checking for Non-Normal Errors

Normal QQ-plots for detecting nonnormality

gs <- lm(sqrt(eval) ~ students + allstudents + prof, data = TeachingRatings)
modgg <- fortify(fit_5)
modgs <- fortify(gs)
p1 <- qplot(sample = scale(.resid), data = modgg) + 
  geom_abline(intercept = 0, slope = 1, color = "red") + 
  labs(title = "Untransformed y", y = "Residuals")

p2 <- qplot(sample = scale(.resid), data = modgs) + 
  geom_abline(intercept = 0, slope = 1, color = "red") + 
  labs(title = "Sqrt-Transformed y", y = "Residuals")
grid.arrange(p1, p2, nrow = 2)

Analysis:

We do not see much difference between the two Q-Q plots. To check further, we plot histograms with kernel density estimates.

Histograms, kernel density plots

p1 <- qplot(scale(.resid), data = modgg, geom = "blank") + 
  geom_line(aes(y = ..density.., colour = "Empirical"), stat = "density") + 
  stat_function(fun = dnorm, aes(colour = "Normal")) + 
  geom_histogram(aes(y = ..density..), alpha = 0.4) + 
  scale_colour_manual(name = "Density", values = c("red", "blue")) + 
  theme(legend.position = c(0.85, 0.85)) + labs(title = "Untransformed y", 
                                                y = "Residuals")
p2 <- qplot(scale(.resid), data = modgs, geom = "blank") + 
  geom_line(aes(y = ..density.., colour = "Empirical"), stat = "density") + 
  stat_function(fun = dnorm, aes(colour = "Normal")) + 
  geom_histogram(aes(y = ..density..), alpha = 0.4) + 
  scale_colour_manual(name = "Density", values = c("red", "blue")) + 
  theme(legend.position = c(0.85, 0.85)) + labs(title = "Sqrt-Transformed y", 
                                                y = "Residuals")
grid.arrange(p1, p2, nrow = 2)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Analysis:

The residual distribution of the sqrt-transformed model looks different from that of the untransformed model.

The Shapiro-Wilk test of normality

shapiro.test(residuals(fit_5))

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(fit_5)
## W = 0.98411, p-value = 5.904e-05

shapiro.test(residuals(gs))

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(gs)
## W = 0.97694, p-value = 1.058e-06

Analysis:

The null hypothesis says that the errors are normal.
We reject the null hypothesis of normality for the residuals of both the untransformed and the sqrt-transformed model at a 10% level of significance, since both p-values (5.904e-05 and 1.058e-06) are far less than 0.10.
Hence, the errors are non-normal, and the sqrt transformation does not help.

Box-Cox Power Transform

lambda <- powerTransform(fit_5)
lam <- lambda$lambda
glam <- lm(eval^lam ~ students + allstudents + prof, data = TeachingRatings)
modlam <- fortify(glam)

p1 <- qplot(sample = scale(.resid), data = modgs) + 
  geom_abline(intercept = 0, slope = 1, color = "red") + 
  labs(title = "Normal QQ-Plot", y = "Residuals Sqrt-transformed")

p2 <- qplot(sample = scale(.resid), data = modlam) + 
  geom_abline(intercept = 0, slope = 1, color = "red") + 
  labs(title = "Normal QQ-Plot", y = "Residuals Box-Cox-Transform")

grid.arrange(p1, p2, nrow = 1)

shapiro.test(residuals(glam))

## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(glam)
## W = 0.99193, p-value = 0.01298

Analysis:

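Here the power transform does not fully resolve the problem: outliers are still present and the Shapiro-Wilk test still rejects normal errors.
The Shapiro-Wilk test concludes that the errors are not normal since the p-value, 0.01298, is less than 0.10. The p-value is nevertheless much larger than for the untransformed model (5.904e-05), so the transformation brings the residuals closer to normality.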
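The code above uses the estimated power without printing it. A minimal sketch for inspecting the estimate, assuming the AER package (which attaches car) is installed:

```r
library(AER)   # also attaches car, which provides powerTransform()
data("TeachingRatings")
fit_5 <- lm(eval ~ students + allstudents + prof, data = TeachingRatings)

lambda <- powerTransform(fit_5)
lam <- lambda$lambda        # estimated Box-Cox power for eval
lam
summary(lambda)             # includes likelihood ratio tests for the power
```

Looking at the estimate helps judge whether a simpler, more interpretable power would serve as well as the exact value.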

Data Analysis: & Decision Making - Oral Presentation Report

Project Group 9: Akshat Shah, Madhuri Rupaakula, Namita Kadam, Shraddha Somani

2016-12-06
