Recipes for the Design of Experiments:

Recipe Outline

Trevor Corrao

RPI

10/10/16 Version 1

1. Setting

System under test

For this recipe, a dataset of responses to a clerical survey in a large financial organization is analyzed. These responses were used to generate ratings for 30 randomly selected departments within the corporation.

  summary(attitude)
##      rating        complaints     privileges       learning    
##  Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00  
##  1st Qu.:58.75   1st Qu.:58.5   1st Qu.:45.00   1st Qu.:47.00  
##  Median :65.50   Median :65.0   Median :51.50   Median :56.50  
##  Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37  
##  3rd Qu.:71.75   3rd Qu.:77.0   3rd Qu.:62.50   3rd Qu.:66.75  
##  Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00  
##      raises         critical        advance     
##  Min.   :43.00   Min.   :49.00   Min.   :25.00  
##  1st Qu.:58.25   1st Qu.:69.25   1st Qu.:35.00  
##  Median :63.50   Median :77.50   Median :41.00  
##  Mean   :64.63   Mean   :74.77   Mean   :42.93  
##  3rd Qu.:71.00   3rd Qu.:80.00   3rd Qu.:47.75  
##  Max.   :88.00   Max.   :92.00   Max.   :72.00

Factors and Levels

The four factors being considered for the experiment are:

  1. Complaints: Handling of employee complaints (Levels: 37-90)

  2. Learning: Opportunity to learn (Levels: 34-75)

  3. Raises: Raises based on performance (Levels: 43-88)

  4. Critical: Too critical (Levels: 49-92)

  attitude$complaints[attitude$complaints>=37 & attitude$complaints<=90]
##  [1] 51 64 70 63 78 55 67 75 82 61 53 60 62 83 77 90 85 60 70 58 40 61 66
## [24] 37 54 77 75 57 85 82
  attitude$learning[attitude$learning>=34 & attitude$learning<=75]
##  [1] 39 54 69 47 66 44 56 55 67 47 58 39 42 45 72 72 69 75 57 54 34 62 50
## [24] 58 48 63 74 45 71 59
  attitude$raises[attitude$raises>=43 & attitude$raises<=88]
##  [1] 61 63 76 54 71 54 66 70 71 62 58 59 55 59 79 60 79 55 75 64 43 66 63
## [24] 50 66 88 80 51 77 64
  attitude$critical[attitude$critical>=49 & attitude$critical<=92]
##  [1] 92 73 86 84 83 49 68 66 83 80 67 74 63 77 77 54 79 80 85 78 64 80 80
## [24] 57 75 76 78 83 74 78
  attitude$complaints=as.factor(attitude$complaints)
  attitude$learning=as.factor(attitude$learning)
  attitude$raises=as.factor(attitude$raises)
  attitude$critical=as.factor(attitude$critical)

After conversion to factors, each distinct score is treated as its own nominal level. The scores fall on a scale with an interval of 1, so practically speaking, the levels are integers.
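A minimal sketch of what this conversion does, assuming R's built-in `attitude` data frame (the Chatterjee-Price data shipped with R) stands in for the CSV used in this recipe. The copy name `att` is only illustrative.

```r
# Work on a copy of the built-in data (assumed to match attitude.csv).
att <- datasets::attitude

# Before conversion the scores are plain numbers.
is.numeric(att$complaints)

# After conversion every distinct score becomes its own unordered level.
att$complaints <- as.factor(att$complaints)
n_levels <- nlevels(att$complaints)
n_levels

# Count how many levels are observed in only one department.
level_counts <- table(att$complaints)
sum(level_counts == 1)
```

With this many levels and only 30 departments, most of the later ANOVA degrees of freedom are spent on the factor itself.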

Response Variable

The rating is the response variable. This represents a department's overall rating, based on the percentage of favorable responses provided in the survey for each department.

The Data: How is it organized and what does it look like?

The dataset represents the survey answers provided by clerical employees of a large financial organization. The data were gathered from approximately 35 employees in each of the 30 randomly selected departments within the firm. The numbers that we see are the percentages of favorable survey responses to the questions.

The structure of the attitude data is as follows:

  str(attitude)
## 'data.frame':    30 obs. of  7 variables:
##  $ rating    : num  43 63 71 61 81 43 58 71 72 67 ...
##  $ complaints: Factor w/ 23 levels "37","40","51",..: 3 13 16 12 19 6 15 17 20 10 ...
##  $ privileges: num  30 51 68 45 56 49 42 50 72 45 ...
##  $ learning  : Factor w/ 23 levels "34","39","42",..: 2 9 19 6 17 4 11 10 18 6 ...
##  $ raises    : Factor w/ 21 levels "43","50","51",..: 9 11 17 4 15 4 13 14 15 10 ...
##  $ critical  : Factor w/ 21 levels "49","54","57",..: 21 9 20 18 17 1 8 6 17 16 ...
##  $ advance   : num  45 47 48 35 47 34 35 41 31 41 ...

First and last 6 observations of the dataset:

 head(attitude)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34
 tail(attitude)
##    rating complaints privileges learning raises critical advance
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

The experiment will be a multifactor design with 4 factors to test if the variation in attitude ratings can be explained by the results of the complaints, learning, raises, and critical portions of the survey. An ANOVA will be built to test the null hypothesis that the variation in attitude rating cannot be explained by anything besides random variation.

What is the rationale for this design?

This design allows for the analysis of the difference of means between groups and for the visualization of variation amongst groups.

Randomize: What is the randomization Scheme?

The data are random because they are drawn from approximately 35 employees in each of 30 randomly selected business departments. Of course, there will be a certain degree of monotony due to the nature of the business and the nature of the role that the surveyees held. But for the most part, it was a random survey that allowed for an integer rating scale. These integers were discretely selected for each factor by each surveyee independently.

Replicate: Are there replicates and/or repeated measures?

There are replicates, because the survey was filled out by multiple different employees. Each completed survey is a replication. There are no repeated measures. An example of a repeated measure would be having the same clerk fill out the survey several times.

Blocking: Did you use blocking in the design?

Yes, somewhat. I am blocking the other factors that were measured during the survey, including privileges and advance. I did not block any number of experiments, simply the factors.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Summary Stats for the entire dataset:

  summary(attitude)
##      rating        complaints   privileges       learning      raises  
##  Min.   :40.00   60     : 2   Min.   :30.00   39     : 2   66     : 3  
##  1st Qu.:58.75   61     : 2   1st Qu.:45.00   45     : 2   54     : 2  
##  Median :65.50   70     : 2   Median :51.50   47     : 2   55     : 2  
##  Mean   :64.63   75     : 2   Mean   :53.13   54     : 2   59     : 2  
##  3rd Qu.:71.75   77     : 2   3rd Qu.:62.50   58     : 2   63     : 2  
##  Max.   :85.00   82     : 2   Max.   :83.00   69     : 2   64     : 2  
##                  (Other):18                   (Other):18   (Other):17  
##     critical     advance     
##  80     : 4   Min.   :25.00  
##  78     : 3   1st Qu.:35.00  
##  83     : 3   Median :41.00  
##  74     : 2   Mean   :42.93  
##  77     : 2   3rd Qu.:47.75  
##  49     : 1   Max.   :72.00  
##  (Other):15

Mean Attitude Rating:

 mean(attitude$rating)
## [1] 64.63333

Boxplots for complaints, learning, raises, and critical.

  boxplot(attitude$rating~attitude$complaints, xlab="Handling of employee complaints", ylab="Rating")

  boxplot(attitude$rating~attitude$learning, xlab="Opportunity to learn", ylab="Rating")

  boxplot(attitude$rating~attitude$raises, xlab="Raises based on performance", ylab="Rating")

  boxplot(attitude$rating~attitude$critical, xlab="Too critical", ylab="Rating")

After analyzing the box plots for these factors, there are a lot of things we can conclude. First and foremost, the plots look rather sloppy. They are still accurate, though. They are just representing the independent measurements taken during the experiments. The scale was so broad that it was highly unlikely for the scores to land on exactly the same numbers for each factor. A simple way to improve the visual appeal, and thus the readability and usability, of these box plots would be to change the survey scale to a 1-10 scale. This way the box plots would look cleaner and be easier to interpret. But this change may also decrease the randomization and customization of responses.
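One way to approximate a coarser scale without changing the survey is to group the existing scores into bands before plotting. The sketch below assumes the built-in `attitude` data frame matches the CSV used here. The cut points and the column name `complaints_band` are hypothetical choices for illustration, not part of the original analysis.

```r
att <- datasets::attitude

# Hypothetical cut points: three equal-width bands covering the observed 37-90 range.
att$complaints_band <- cut(att$complaints,
                           breaks = c(30, 50, 70, 90),
                           include.lowest = TRUE)

# Departments per band.
band_counts <- table(att$complaints_band)
band_counts

# One box per band instead of one box per distinct score.
boxplot(rating ~ complaints_band, data = att,
        xlab = "Handling of employee complaints (banded)", ylab = "Rating")
```

The same banding could be applied to learning, raises, and critical, at the cost of discarding detail within each band.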

The complaints box plot suggests that the higher the survey results for the handling of employee complaints, the higher the department rating. This would make sense, realistically, because employees take their personal needs and grievances to heart. If they are not met by their employers, the employee feedback will not be good!

The learning box plot suggests that the lower the survey results for the opportunities to learn, the lower the rating is for the department. This makes sense when applied to the workplace because employees are often looking for ways to better themselves. They are looking for classes, workshops, and opportunities which are hosted, endorsed, or sponsored by their employer. So this relationship is relatively acceptable.

The boxplot for raises is relatively fair. The departments that offer fewer performance-based raises are typically rated lower than those that offer more. The departments that offer more raises often receive higher ratings, and the middle-of-the-road departments, in terms of raises, are averaging middle-of-the-road results in terms of ratings. This positive relationship is certainly feasible when thought about in terms of work environment. Most employees are going to want compensation for their hard work. But, like the plot shows, there are exceptional cases.

The boxplot for critical suggests that employers who are too critical actually rank highly on the attitude rating scale. This is contrary to what I would have expected, but I can definitely understand why. Many employees like structure and guidance, but not insults and condescension. This could explain why the highest ranking departments are in the middle of the road when being considered too critical.

Testing

Model 1: ANOVA for the effect of complaint handling on rating. In simpler terms, the test is being performed to see if the way departments handle their employees' complaints has an impact on the rating of a department.

  model1=aov(attitude$rating~attitude$complaints)
anova(model1)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                     Df Sum Sq Mean Sq F value  Pr(>F)  
## attitude$complaints 22   4077 185.317  5.8964 0.01111 *
## Residuals            7    220  31.429                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 0.01111 and the F statistic came out to 5.8964. In other words, if complaint handling had no effect on rating, an F statistic at least this large would arise by chance with probability 0.01111. Since this is below 0.05, we reject our null hypothesis. Complaint handling could potentially be enough to predict the rating of a department.
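As a worked check, the F statistic and p-value can be recomputed by hand from the mean squares and degrees of freedom printed in the Model 1 table. This is only a sketch of the arithmetic behind `anova()`, with variable names chosen for illustration.

```r
# Values copied from the Model 1 ANOVA table.
ms_factor   <- 185.317   # mean square for complaints
ms_residual <- 31.429    # residual mean square
df_factor   <- 22
df_residual <- 7

# The F statistic is the ratio of the two mean squares.
f_stat <- ms_factor / ms_residual

# Upper-tail probability of the F distribution under the null hypothesis.
p_value <- pf(f_stat, df_factor, df_residual, lower.tail = FALSE)

round(f_stat, 4)
round(p_value, 5)
```

The same two lines, with the values from the other tables, reproduce the p-values for Models 2 through 4.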

Model 2: ANOVA for the effect of the opportunity to learn on rating. In simpler terms, the test is being performed to see if the resources for growth provided by the department have an impact on the rating of a department.

  model2=aov(attitude$rating~attitude$learning)
anova(model2)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                   Df Sum Sq Mean Sq F value Pr(>F)
## attitude$learning 22   3406  154.82  1.2163 0.4205
## Residuals          7    891  127.29

The p-value is 0.4205 and the F statistic came out to 1.2163. An F statistic this large would be quite likely to arise by chance even if learning opportunities had no effect, so we fail to reject our null hypothesis. Learning opportunities alone do not give us enough evidence to accurately predict the department rating.

Model 3: ANOVA for the effect of performance-based raises on rating. In simpler terms, the test is being performed to see if the chance for personal financial growth provided by the department has an impact on the rating of a department.

  model3=aov(attitude$rating~attitude$raises)
anova(model3)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                 Df Sum Sq Mean Sq F value Pr(>F)
## attitude$raises 20 3498.8 174.940  1.9726 0.1477
## Residuals        9  798.2  88.685

The p-value is 0.1477 and the F statistic came out to 1.9726. Since the p-value is above 0.05, we fail to reject our null hypothesis. Performance-based raises alone do not give us enough evidence to accurately predict the department rating.

Model 4: ANOVA for the effect of overly critical feedback on rating. In simpler terms, the test is being performed to see if harsh feedback from the department has an impact on the rating of a department.

  model4=aov(attitude$rating~attitude$critical)
anova(model4)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                   Df Sum Sq Mean Sq F value Pr(>F)
## attitude$critical 20 2785.7  139.29  0.8295 0.6552
## Residuals          9 1511.2  167.92

The p-value is 0.6552 and the F statistic came out to 0.8295. Since the p-value is above 0.05, we fail to reject our null hypothesis. The fact that feedback is too critical within a department does not give us enough evidence to accurately predict the department rating.

Model 5: ANOVA to test whether the integration of all four factors affects the rating. We already know that complaints plays a significant role, but interaction and relationships among factors may create some variation… (See Diagnostics/Adequacy Checking for details)

Besides the integration of all four factors, we also had to examine the interaction effects of the factors. Since there are 4 factors, there are 6 two-way interactions to examine.
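The count of six follows from choosing 2 factors out of 4. A short sketch that enumerates the pairs, with the separator and variable names chosen only for illustration:

```r
factors <- c("complaints", "learning", "raises", "critical")

# Number of unordered pairs: choose(4, 2).
n_pairs <- choose(length(factors), 2)

# Each column of the combn() result is one pair of factors.
pairs <- combn(factors, 2)
pair_labels <- apply(pairs, 2, paste, collapse = " x ")

n_pairs
pair_labels
```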

Interaction 1: Complaints factor and Learning factor.

  model7=aov(attitude$rating~attitude$complaints*attitude$learning)
anova(model7)
## Warning in anova.lm(model7): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
## 
## Response: attitude$rating
##                     Df Sum Sq Mean Sq F value Pr(>F)
## attitude$complaints 22   4077 185.317               
## attitude$learning    7    220  31.429               
## Residuals            0      0
  interaction.plot(attitude$complaints, attitude$learning, attitude$rating)

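As you can see, there is no apparent interaction between the two factors. Note that the ANOVA table has no interaction row: the two main effects already use all 29 available degrees of freedom, so the interaction cannot be tested formally and the plot is the only evidence. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship.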
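The "essentially perfect fit" warning can be examined directly by checking how many residual degrees of freedom the model leaves. The sketch below assumes the built-in `attitude` data frame matches the CSV used here, and uses the `data =` form of the formula rather than the `attitude$` form in the recipe.

```r
att <- datasets::attitude
att$complaints <- as.factor(att$complaints)
att$learning   <- as.factor(att$learning)

fit <- aov(rating ~ complaints * learning, data = att)

# Residual degrees of freedom: 0 means the model reproduces every rating exactly.
resid_df <- df.residual(fit)
resid_df

# Coefficients that could not be estimated are stored as NA.
n_unestimable <- sum(is.na(fit$coefficients))
n_unestimable
```

With no residual degrees of freedom there is no error term to compare against, which is why the F value and Pr(>F) columns are blank.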

Interaction 2: Complaints factor and raises factor.

  model8=aov(attitude$rating~attitude$complaints*attitude$raises)
anova(model8)
## Warning in anova.lm(model8): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
## 
## Response: attitude$rating
##                     Df Sum Sq Mean Sq F value Pr(>F)
## attitude$complaints 22   4077 185.317               
## attitude$raises      7    220  31.429               
## Residuals            0      0
  interaction.plot(attitude$complaints, attitude$raises, attitude$rating)

As you can see, there is no apparent interaction between the two factors. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship. This graph suggests that the factors are still independent and non-causal.

Interaction 3: Complaints factor and critical factor

  model9=aov(attitude$rating~attitude$complaints*attitude$critical)
anova(model9)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                     Df Sum Sq Mean Sq F value Pr(>F)
## attitude$complaints 22 4077.0 185.317 41.1815 0.1224
## attitude$critical    6  215.5  35.917  7.9815 0.2645
## Residuals            1    4.5   4.500
  interaction.plot(attitude$complaints, attitude$critical, attitude$rating)

As you can see, there is no apparent interaction between the two factors. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship. This graph suggests that the factors are still independent and non-causal.

Interaction 4: Learning factor and Raises factor

  model10=aov(attitude$rating~attitude$learning*attitude$raises)
anova(model10)
## Warning in anova.lm(model10): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
## 
## Response: attitude$rating
##                   Df Sum Sq Mean Sq F value Pr(>F)
## attitude$learning 22   3406  154.82               
## attitude$raises    7    891  127.29               
## Residuals          0      0
  interaction.plot(attitude$learning, attitude$raises, attitude$rating)

As you can see, there is no apparent interaction between the two factors. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship. This graph suggests that the factors are still independent and non-causal.

Interaction 5: Learning factor and Critical Factor

  model11=aov(attitude$rating~attitude$learning*attitude$critical)
anova(model11)
## Warning in anova.lm(model11): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
## 
## Response: attitude$rating
##                   Df Sum Sq Mean Sq F value Pr(>F)
## attitude$learning 22   3406  154.82               
## attitude$critical  7    891  127.29               
## Residuals          0      0
  interaction.plot(attitude$learning, attitude$critical, attitude$rating)

As you can see, there is no apparent interaction between the two factors. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship. This graph suggests that the factors are still independent and non-causal.

Interaction 6: Raises Factor and Critical Factor

  model12=aov(attitude$rating~attitude$raises*attitude$critical)
anova(model12)
## Analysis of Variance Table
## 
## Response: attitude$rating
##                   Df Sum Sq Mean Sq F value Pr(>F)
## attitude$raises   20 3498.8 174.940  0.6333 0.7692
## attitude$critical  7  245.7  35.095  0.1270 0.9838
## Residuals          2  552.5 276.250
  interaction.plot(attitude$raises, attitude$critical, attitude$rating)

As you can see, there is no apparent interaction between the two factors. The graph is not a great representation of the interaction because each level is a single independent integer. But the lack of crossing lines and trends suggests that a 2-way interaction is not present in this relationship. This graph suggests that the factors are still independent and non-causal.

Clearly, after reviewing the 2-way interactions within the experiment, there is little to no interaction between these independent factors. Again, if I were to reduce the scale or make the levels ranges rather than independent integers, we might see more interactions. But given the way this data is set up to influence the rating, it was not necessary to calibrate or manipulate it. The data that did appear on the graphs often moved parallel to each other, which is an instant indicator of NO INTERACTION EFFECTS.
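A sketch of the "ranges rather than independent integers" idea, assuming the built-in `attitude` data frame matches the CSV used here. The median split, the helper `split_at_median`, and the `_lvl` column names are hypothetical choices for illustration, and no particular result is claimed.

```r
att <- datasets::attitude

# Hypothetical two-level banding: at or below the median versus above it.
split_at_median <- function(x) {
  cut(x, breaks = c(-Inf, median(x), Inf), labels = c("low", "high"))
}
att$complaints_lvl <- split_at_median(att$complaints)
att$learning_lvl   <- split_at_median(att$learning)

# Departments in each of the four cells.
table(att$complaints_lvl, att$learning_lvl)

# With few levels there are residual degrees of freedom left for F-tests.
fit_banded <- aov(rating ~ complaints_lvl * learning_lvl, data = att)
summary(fit_banded)

interaction.plot(att$complaints_lvl, att$learning_lvl, att$rating,
                 xlab = "Complaints", trace.label = "Learning",
                 ylab = "Mean rating")
```

With only two levels per factor the interaction plot has two lines, so parallel versus crossing behavior is much easier to read than with one line per integer score.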

Estimation

To best estimate the rating of a department, one could simply use the complaints factor, since the fit is near perfect to the actual rating. Likewise, one can integrate the factors, and achieve another essentially perfect fit.
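How close the complaints-only fit is can be quantified with R-squared. The sketch below assumes the built-in `attitude` data frame matches the CSV used here. It also reports adjusted R-squared, which is an addition to the original analysis and accounts for the 22 degrees of freedom the factor consumes.

```r
att <- datasets::attitude
att$complaints <- as.factor(att$complaints)

fit <- aov(rating ~ complaints, data = att)
fit_summary <- summary.lm(fit)

# Share of the rating variance reproduced by the 23 complaint levels.
r_squared <- fit_summary$r.squared

# Adjusted R-squared penalizes the 22 degrees of freedom spent on those levels.
adj_r_squared <- fit_summary$adj.r.squared

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 3)
```

R-squared here equals the complaints sum of squares divided by the total, which is 4077 / (4077 + 220) using the Model 1 table.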

Diagnostics

When attempting to include complaints in the final ANOVA model, we get an error message since it is already an accepted method of prediction. So, I removed complaints from the model and observed the interaction between the other three factors. After doing that, I got another message saying that the fit was essentially perfect again, leading me to believe that when combining the factors, there is a high probability that we would be able to predict the ratings of the departments. This indicates that the results of our analysis are very accurate.
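One thing worth checking when a combined model misbehaves is how many parameters it asks for compared with the number of observations. The sketch below counts them for a main-effects model on all four factors, assuming the built-in `attitude` data frame matches the CSV used here. It is offered as a possible contributing reason, not as a diagnosis of the specific message.

```r
att <- datasets::attitude
factor_names <- c("complaints", "learning", "raises", "critical")

# Distinct values per factor; each would become one level after as.factor().
levels_per_factor <- sapply(att[factor_names], function(x) length(unique(x)))

# Main-effects model: one intercept plus (levels - 1) parameters per factor.
n_parameters <- 1 + sum(levels_per_factor - 1)
n_observations <- nrow(att)

levels_per_factor
c(parameters = n_parameters, observations = n_observations)
```

When the parameter count exceeds the number of departments, a perfect fit is guaranteed by construction and leaves nothing to estimate the error from.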

Below you can find the qqnorm plots for the models individually. Take particular notice of the clustering and straight lines present in some of the more accurate predictor models.

  qqnorm(residuals(model1))
qqline(residuals(model1))

  qqnorm(residuals(model3))
qqline(residuals(model3))

  qqnorm(residuals(model4))
qqline(residuals(model4))

The first model is the complaints model. The other two are raises and critical (the ones which we determined, in Testing, could not be used alone as accurate predictors).
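The clustering in these plots can be examined by counting residuals that are exactly zero, which happens for any level observed in only one department. The sketch below assumes the built-in `attitude` data frame matches the CSV used here; the tolerance `1e-8` is an arbitrary choice for floating-point comparison.

```r
att <- datasets::attitude
att$complaints <- as.factor(att$complaints)
fit <- aov(rating ~ complaints, data = att)

# Levels seen in only one department are fitted exactly.
n_single_levels <- sum(table(att$complaints) == 1)

# Count residuals that are zero up to floating-point error.
res <- residuals(fit)
n_zero_residuals <- sum(abs(res) < 1e-8)

c(single_levels = n_single_levels, zero_residuals = n_zero_residuals)
```

A run of zero residuals shows up as a flat segment in the middle of a Q-Q plot, so that pattern reflects the one-observation levels rather than the shape of the error distribution.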

4. References to the Literature.

N/A

5. Appendices

Raw Data: The Chatterjee-Price Attitude Data.

  attitude <- read.csv("~/DoE/attitude.csv")
  View(attitude)

All complete R code is included in the R Markdown file.