STAT 565 Final

Ryan Masson | Portland State University | 3/20/24

Problem 1

(a)

If we want to study the effect of each element added to the glass, we need to randomly assign each element (the treatments) to groups of the glass (the units). This random assignment is essential to any analysis we would conduct on the data later. It would also be ideal to have a balanced design, where there is an equal number of units assigned to each treatment. We should pay attention to these issues, as well as anything else that could create systematic error or bias in the results. We want to avoid systematic error so we can minimize the random error, or natural variance in the phenomenon we are studying.

(b)

Show the code
setwd('~/Documents/psu/STAT 565/final/')
glass <- read.table('final1-1.txt', header = TRUE)
names(glass)[names(glass) == "Respnse"] <- "Response"
glass$element_factor <- as.factor(glass$Element) 

# fit ANOVA model
glass.model <- aov(Response~element_factor, data = glass)
print(anova(glass.model))
Analysis of Variance Table

Response: Response
               Df  Sum Sq Mean Sq F value    Pr(>F)    
element_factor 10 2149.13 214.913  17.749 4.892e-15 ***
Residuals      65  787.06  12.109                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I fitted a one-way ANOVA model of the response as a function of the level of the element factor. The overall F-test p-value for the separate means model is less than 0.001. This means I reject the null hypothesis that all the treatment means are equal. Generally speaking, adding different elements certainly does change the property of glass that we are studying.

(c)

Show the code
for (i in 1:76) {
  if (glass$Element[i] == 'Li' | glass$Element[i] == "Na" | glass$Element[i] == "K") {
    glass$groups[i] <- "alkali_metal"
  }
  if (glass$Element[i] == 'Ca' | glass$Element[i] == "Sr") {
    glass$groups[i] <- "alkaline_earth_metal"
  }
  if (glass$Element[i] == 'Al') {
    glass$groups[i] <- "post_transition_metal"
  }
  if (glass$Element[i] == 'Y') {
    glass$groups[i] <- "transition_metal"
  }
  if (glass$Element[i] == 'La' | glass$Element[i] == "Ce" | glass$Element[i] == "Eu" | glass$Element[i] == "Er") {
    glass$groups[i] <- "lanthanide"
  }
}

glass$groups_factor <- as.factor(glass$groups)

elementgroup.pairwise.test = pairwise.t.test(x = glass$Response, g = glass$groups_factor, paired = F,p.adjust.method = "bonferroni")
print(elementgroup.pairwise.test)

    Pairwise comparisons using t tests with pooled SD 

data:  glass$Response and glass$groups_factor 

                      alkali_metal alkaline_earth_metal lanthanide
alkaline_earth_metal  0.10844      -                    -         
lanthanide            0.00013      1.00000              -         
post_transition_metal 0.08883      1.00000              1.00000   
transition_metal      3.4e-06      0.01212              0.11211   
                      post_transition_metal
alkaline_earth_metal  -                    
lanthanide            -                    
post_transition_metal -                    
transition_metal      0.18302              

P value adjustment method: bonferroni 

I performed all pairwise tests using Bonferroni adjusted p-values. See the printed object for the p-values of comparisons between each element group.

(d)

Show the code
### NEED TO ADAPT THIS CODE
#library(multcomp)
#K = matrix(c(0.5,0.5,-1/3,-1/3,-1/3), 1)
#glht(model31.1, linfct = K)
#K = rbind(c(1,1,0,-1,-1),
#          c(0.5,0.5,-1/3,-1/3,-1/3),
#          c(1,0,0,0,-1)
#      )
#summary(glht(model31.1, linfct = K))

#library(DescTools)
#ScheffeTest(model31.aov)
#ScheffeTest(model31.aov,which = "tempC",contrasts = t(K),conf.level = 0.99)

(e)

Show the code
library(agricolae)
glass.model.REGWR = REGW.test(glass.model,"element_factor")
print(glass.model.REGWR)
$statistics
   MSerror Df     Mean       CV
  12.10859 65 31.61854 11.00538

$parameters
  test         name.t ntr alpha
  REGW element_factor  11  0.05

$regw
NULL

$means
   Response      std r       se      Min      Max      Q25      Q50      Q75
Al 32.50330 3.867774 7 1.315218 25.78987 38.48055 31.29189 32.30957 34.17968
Ca 32.88923 1.998121 7 1.315218 30.18710 36.03191 31.57584 32.80098 34.02649
Ce 32.59060 1.439982 7 1.315218 31.14180 34.79646 31.63426 32.20274 33.36235
Er 40.89458 4.377775 7 1.315218 34.84494 46.72323 38.14279 40.95398 43.72718
Eu 30.95724 1.406045 7 1.315218 29.12784 33.76350 30.50582 30.87106 30.96331
K  20.73267 6.107409 7 1.315218 15.28081 31.05793 16.05358 18.50718 24.08778
La 29.50194 1.197264 7 1.315218 28.31012 31.95274 28.93102 29.17763 29.60553
Li 33.30733 2.424102 6 1.420598 31.24266 37.20922 31.62013 32.23587 34.68809
Na 26.17300 5.159950 7 1.315218 20.65303 34.90209 22.24027 25.16100 29.00718
Sr 29.33388 2.972042 7 1.315218 26.24101 33.51035 27.33755 28.26171 31.32451
Y  39.16136 3.149597 7 1.315218 34.98280 43.07326 36.53478 40.22912 41.38741

$comparison
NULL

$groups
   Response groups
Er 40.89458      a
Y  39.16136      a
Li 33.30733      b
Ca 32.88923      b
Ce 32.59060      b
Al 32.50330      b
Eu 30.95724     bc
La 29.50194     bc
Sr 29.33388     bc
Na 26.17300     cd
K  20.73267      d

attr(,"class")
[1] "group"

To control the strong familywise error rate, I use the Ryan, Einot and Gabriel and Welsch multiple range test (REGW). The strong familywise error rate is the probability of making any false discoveries at all in the analysis. This is a rather stringent error rate. More stringent procedures are less powerful: as we restrict the Type I error rate, it becomes more difficult to make rejections and detect significant effects. This can be seen in the output from the REGWR procedure, because the $comparison field is NULL. There were no significant pairwise comparisons for the elements. This contrasts with the above pairwise comparison procedure, which did find some differences.

(f)

Element Er, Erbium, has the largest mean response.

Problem 2

(a)

Show the code
patents <- read.table("final2.txt", header = TRUE)
#patents$Year <- as.factor(patents$Year)
patents$Institution <- as.factor(patents$Institution)
patents$Expenditure <- as.factor(patents$Expenditure)
patent_model <- aov(Licenses ~ Year*Institution*Expenditure, data = patents)

When I convert all the factor columns to the factor data type, I get the warning “ANOVA F-tests on an essentially perfect fit are unreliable,” and the assumptions checks seen below seem totally wrong. So instead I am going to treat the Year variable as a continuous variable, because it seemingly does not break the whole model.

(b)

Show the code
summary(patent_model)
                  Df Sum Sq Mean Sq F value   Pr(>F)    
Year               1    615     615   6.120   0.0147 *  
Institution       31 449573   14502 144.242  < 2e-16 ***
Year:Institution  31  16196     522   5.197 1.35e-11 ***
Residuals        128  12869     101                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(c)

Year and Institution have significant main effects on the response, the number of patent licenses.

Show the code
(coef(patent_model))
                                                    (Intercept) 
                                                  -7.958000e+02 
                                                           Year 
                                                   4.000000e-01 
                         InstitutionCedars-Sinai_Medical_Center 
                                                  -5.769048e+02 
                                    InstitutionCleveland_Clinic 
                                                   1.357810e+02 
                           InstitutionColorado_State_University 
                                                  -4.124448e+03 
                                  InstitutionCornell_University 
                                                   4.591682e+04 
                                   InstitutionDrexel_University 
                                                  -1.262257e+03 
                                     InstitutionDuke_University 
                                                   1.420700e+04 
                            InstitutionEast_Carolina_University 
                                                  -2.365619e+02 
          InstitutionH_Lee_Moffitt_Cancer_Ctr_&_Res_Institution 
                                                  -3.450095e+02 
                InstitutionHackensack_University_Medical_Center 
                                                   7.958000e+02 
                               InstitutionIowa_State_University 
                                                  -7.599933e+03 
                            InstitutionJohns_Hopkins_University 
                                                  -4.835581e+03 
         InstitutionKansas_State_University_Research_Foundation 
                                                  -6.152781e+03 
                   InstitutionLouisiana_State_University_System 
                                                  -4.243895e+03 
       InstitutionMassachusetts_Institution_of_Technology_(MIT) 
                                                  -2.029356e+04 
                           InstitutionMichigan_State_University 
                                                  -1.182377e+04 
                   InstitutionMichigan_Technological_University 
                                                   2.474790e+03 
                            InstitutionMontana_State_University 
                                                   2.772076e+03 
                     InstitutionNorth_Carolina_State_University 
                                                  -1.604800e+04 
                        InstitutionNorthern_Illinois_University 
                                                   8.538571e+02 
                  InstitutionOregon_Health_&_Science_University 
                                                  -1.248579e+04 
                               InstitutionPenn_State_University 
                                                   9.913048e+02 
                                InstitutionPrinceton_University 
                                                   6.666667e-01 
                                     InstitutionRice_University 
                                                   2.703686e+03 
               InstitutionSt._Jude_Children's_Research_Hospital 
                                                   1.386705e+03 
                         InstitutionTexas_A&M_University_System 
                                                  -4.980305e+03 
                               InstitutionUniversity_of_Alabama 
                                                   7.991333e+02 
                     InstitutionUniversity_of_California_System 
                                                  -9.054867e+03 
          InstitutionUniversity_of_Kentucky_Research_Foundation 
                                                   2.299619e+03 
     InstitutionUniversity_of_North_Texas_Health_Science_Center 
                                                   6.833524e+02 
                                InstitutionUniversity_of_Toledo 
                                                   2.415733e+03 
                                  InstitutionWistar_Institution 
                                                  -1.524381e+04 
                    Year:InstitutionCedars-Sinai_Medical_Center 
                                                   2.857143e-01 
                               Year:InstitutionCleveland_Clinic 
                                                  -5.714286e-02 
                      Year:InstitutionColorado_State_University 
                                                   2.057143e+00 
                             Year:InstitutionCornell_University 
                                                  -2.274286e+01 
                              Year:InstitutionDrexel_University 
                                                   6.285714e-01 
                                Year:InstitutionDuke_University 
                                                  -7.000000e+00 
                       Year:InstitutionEast_Carolina_University 
                                                   1.142857e-01 
     Year:InstitutionH_Lee_Moffitt_Cancer_Ctr_&_Res_Institution 
                                                   1.714286e-01 
           Year:InstitutionHackensack_University_Medical_Center 
                                                  -4.000000e-01 
                          Year:InstitutionIowa_State_University 
                                                   3.800000e+00 
                       Year:InstitutionJohns_Hopkins_University 
                                                   2.457143e+00 
    Year:InstitutionKansas_State_University_Research_Foundation 
                                                   3.057143e+00 
              Year:InstitutionLouisiana_State_University_System 
                                                   2.114286e+00 
  Year:InstitutionMassachusetts_Institution_of_Technology_(MIT) 
                                                   1.011429e+01 
                      Year:InstitutionMichigan_State_University 
                                                   5.885714e+00 
              Year:InstitutionMichigan_Technological_University 
                                                  -1.228571e+00 
                       Year:InstitutionMontana_State_University 
                                                  -1.371429e+00 
                Year:InstitutionNorth_Carolina_State_University 
                                                   8.000000e+00 
                   Year:InstitutionNorthern_Illinois_University 
                                                  -4.285714e-01 
             Year:InstitutionOregon_Health_&_Science_University 
                                                   6.228571e+00 
                          Year:InstitutionPenn_State_University 
                                                  -4.857143e-01 
                           Year:InstitutionPrinceton_University 
                                                   3.755374e-12 
                                Year:InstitutionRice_University 
                                                  -1.342857e+00 
          Year:InstitutionSt._Jude_Children's_Research_Hospital 
                                                  -6.857143e-01 
                    Year:InstitutionTexas_A&M_University_System 
                                                   2.485714e+00 
                          Year:InstitutionUniversity_of_Alabama 
                                                  -4.000000e-01 
                Year:InstitutionUniversity_of_California_System 
                                                   4.600000e+00 
     Year:InstitutionUniversity_of_Kentucky_Research_Foundation 
                                                  -1.142857e+00 
Year:InstitutionUniversity_of_North_Texas_Health_Science_Center 
                                                  -3.428571e-01 
                           Year:InstitutionUniversity_of_Toledo 
                                                  -1.200000e+00 
                             Year:InstitutionWistar_Institution 
                                                   7.571429e+00 

(d)

Show the code
# plot residuals against time
plot(patents$Year, patent_model$residuals)

Show the code
# normality of residuals check (npp)
qqnorm(residuals(patent_model))
qqline(residuals(patent_model))

Show the code
# equal variance check (residuals against fitted)
plot(patent_model$fitted.values, patent_model$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")

The p-value for the Year factor is significant, indicating that there is likely a temporal trend. The plot of residuals against time could also show this. The two other plots show a departure from normality in the residuals and an increasing variance across the range of fitted values. This model is violating many assumptions, so the standard analysis may not be adequate here.

(e)

The issue of the autocorrelation will need to be addressed, along with the other assumptions. Conclusions about the effects of the institution and expenditure factors cannot be reached until these are dealt with. Also, more extensive analysis of the interactions than what I did above could help improve the understanding of institutions’ patent output. More factors could be added to the data to analyze.