Bok, Joonhyuk

Rensselaer Polytechnic Institute

Oct 10 2016 version 3

1. Setting

1.1 System under test

Among the 100 interesting data sets, we select the Global health data, which is helpful for identifying high-impact ways to improve world health.

From the Global health data, we select the ‘Mental health’ topic, which includes Mental health governance (3 factors: legislation, plan, and policy), Human resources (1 factor: psychiatrists), and Suicide rates (1 response variable).

The data set was collected to examine the effects of mental health care on the suicide rate. It consists of 160 observations (countries) on 5 analysis variables (4 factors and 1 response), plus a country identifier column.

# Load the mental health subset of the Global health data (the path is machine-specific)
Mental <- read.table("C:/Users/bokjh3/Desktop/Global health data.csv", header=TRUE, sep=",")
head(Mental)
##       country X2012.suicide.rate X2011.legislation X2011.plan X2011.policy
## 1 Afghanistan                4.0                 2          2            2
## 2     Albania                6.5                 2          2            2
## 3     Algeria                1.8                 2          2            2
## 4      Angola               10.6                 1          2            1
## 5     Armenia                3.3                 2          1            1
## 6   Australia               11.6                 2          2            2
##   X2011.Psychiatrists
## 1                   1
## 2                   1
## 3                   1
## 4                   1
## 5                   1
## 6                   2

1.2 Factors and Levels

The factors in the dataset are defined below. Each factor has 2 levels (no or yes; below average or above average).

  • 2011 legislation: Existence of dedicated mental health legislation in 2011 (No = 1, Yes = 2)
  • 2011 plan: Existence of a mental health plan in 2011 (No = 1, Yes = 2)
  • 2011 policy: Existence of an officially approved mental health policy in 2011 (No = 1, Yes = 2)
  • 2011 psychiatrists: Psychiatrists working in mental health per 100,000 population in 2011 (below average = 1, above average = 2)

str(Mental)
## 'data.frame':    160 obs. of  6 variables:
##  $ country            : Factor w/ 160 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ X2012.suicide.rate : num  4 6.5 1.8 10.6 3.3 11.6 15.6 1.7 7.2 6.6 ...
##  $ X2011.legislation  : int  2 2 2 1 2 2 2 2 1 1 ...
##  $ X2011.plan         : int  2 2 2 2 1 2 2 2 2 2 ...
##  $ X2011.policy       : int  2 2 2 1 1 2 1 2 2 2 ...
##  $ X2011.Psychiatrists: int  1 1 1 1 1 2 2 2 2 1 ...
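
The str() output shows the four factors stored as integer codes (1 and 2). A minimal sketch of how those codes could be turned into labelled R factors is given here. It uses only the six rows printed by head(Mental) above, so the object name Mental6 is a stand-in introduced for illustration and is not part of the original analysis.

```r
# First six rows as printed by head(Mental); the full file is not reproduced here.
Mental6 <- data.frame(
  country             = c("Afghanistan", "Albania", "Algeria", "Angola", "Armenia", "Australia"),
  X2012.suicide.rate  = c(4.0, 6.5, 1.8, 10.6, 3.3, 11.6),
  X2011.legislation   = c(2L, 2L, 2L, 1L, 2L, 2L),
  X2011.plan          = c(2L, 2L, 2L, 2L, 1L, 2L),
  X2011.policy        = c(2L, 2L, 2L, 1L, 1L, 2L),
  X2011.Psychiatrists = c(1L, 1L, 1L, 1L, 1L, 2L)
)

# Recode the three yes/no governance variables (No = 1, Yes = 2).
yes_no <- c("X2011.legislation", "X2011.plan", "X2011.policy")
Mental6[yes_no] <- lapply(Mental6[yes_no], factor,
                          levels = 1:2, labels = c("No", "Yes"))

# Recode the psychiatrist density variable (below average = 1, above average = 2).
Mental6$X2011.Psychiatrists <- factor(Mental6$X2011.Psychiatrists,
                                      levels = 1:2,
                                      labels = c("Below average", "Above average"))

str(Mental6)
```

With two-level variables the ANOVA tables are the same whether the codes are stored as integers or as factors, but labelled factors make plots and tables easier to read.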

1.4 Response variables

The response variable is the 2012 suicide rate, defined as the suicide mortality rate per 100,000 population in 2012. It is a continuous variable.

1.5 The Data: How is it organized and what does it look like?

There is a one-year time lag between the factors (year 2011) and the response variable (year 2012) because of the years available in the Global health data. Countries with missing values are left out of the analysis.
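
Dropping countries with missing values can be sketched as follows. The small data frame raw and its values are hypothetical and only illustrate the mechanics; the original report does not show how the filtering was done.

```r
# Hypothetical raw extract with missing entries (not the real data).
raw <- data.frame(
  country            = c("A", "B", "C", "D"),
  X2012.suicide.rate = c(4.0, NA, 1.8, 10.6),
  X2011.plan         = c(2L, 2L, NA, 1L)
)

# Keep only rows where every variable is observed.
complete <- raw[complete.cases(raw), ]

# Number of countries removed because of missing values.
n_dropped <- nrow(raw) - nrow(complete)
n_dropped
```

Reporting the number of dropped countries alongside the 160 retained ones would make the scope of the analysis explicit.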

2. Experimental Design

2.1 How will the experiment be organized and conducted to test the hypothesis?

We use four factors, each with two levels, and analyze their main effects and two-way interaction effects on the suicide rate. The null hypothesis is that none of the factors has a statistically significant main or interaction effect on the suicide rate.
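
As an illustration of this design, all four main effects and all six two-way interactions can be written as a single model formula. The sketch below runs on simulated data; the data frame sim, its short column names, and the effect sizes are assumptions made for the example and say nothing about the real results.

```r
set.seed(1)
n <- 160

# Simulated two-level factors (stand-ins for the four 2011 variables).
sim <- data.frame(
  legislation   = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  plan          = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  policy        = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  psychiatrists = factor(sample(1:2, n, replace = TRUE), levels = 1:2)
)

# Simulated response: only psychiatrists has an effect in this toy example.
sim$rate <- 8 + 3 * (sim$psychiatrists == "2") + rnorm(n, sd = 6)

# (a + b + c + d)^2 expands to all main effects plus all two-way interactions.
fit2 <- aov(rate ~ (legislation + plan + policy + psychiatrists)^2, data = sim)
term_labels <- attr(terms(fit2), "term.labels")
term_labels
summary(fit2)
```

Fitting one model with every term uses a single residual error estimate, whereas fitting each pair separately gives each test its own residual.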

2.2 What is the rationale for this design?

Some people might argue that suicide is a personal matter. However, we think that the suicide rate could be reduced through institutional arrangements such as government policy or aid from others. Therefore, we investigate the impact of 4 factors (government legislation, government plan, government policy, and accessibility of psychiatrists) on reducing the suicide rate.

2.3 Randomize: What is the Randomization Scheme?

Randomization is a technique used to balance the effect of extraneous or uncontrollable conditions that can impact the results of an experiment. In this study we do not randomize, because the data include all countries with complete data (a population rather than a sample) and the factor levels are observed rather than assigned.

2.4 Replicate: Are there replicates and/or repeated measures?

Replicates are multiple experimental runs with the same factor settings (levels). Replicates are subject to the same sources of variability, independently of each other. We can replicate combinations of factor levels, groups of factor level combinations, or entire designs. The Global health data were collected without repeated measures: each country contributes a single observation.
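
How many countries share the same factor settings can be checked by tabulating the 16 combinations of the four two-level factors. The sketch below uses simulated factor columns, so sim and its column names are assumptions; with the real data the same call could be applied to the four factor columns of Mental.

```r
set.seed(3)
n <- 160

# Simulated stand-ins for the four two-level factors.
sim <- data.frame(
  legislation   = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  plan          = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  policy        = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  psychiatrists = factor(sample(1:2, n, replace = TRUE), levels = 1:2)
)

# One row per factor-level combination, with the number of observations in it.
cells <- as.data.frame(table(sim))
cells[order(cells$Freq), ]
```

Unequal cell counts indicate an unbalanced layout, which matters when reading sequential ANOVA tables.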

2.5 Block: Did you use blocking in the design?

In experimental design, blocking is a technique used to deal with nuisance factors that may affect the results of the experiment. The experiment is organized into blocks, where the nuisance factor is maintained at a constant level in each block. Blocking is not used in this design, because the factors are answers to survey questions rather than treatments applied by an experimenter.

3. Statistical Analysis

3.1 Graphics and descriptive summary

First, we examine a descriptive summary of the dataset.

summary(Mental)
##         country    X2012.suicide.rate X2011.legislation   X2011.plan   
##  Afghanistan:  1   Min.   : 0.300     Min.   :1.0       Min.   :1.000  
##  Albania    :  1   1st Qu.: 3.775     1st Qu.:1.0       1st Qu.:2.000  
##  Algeria    :  1   Median : 7.550     Median :2.0       Median :2.000  
##  Angola     :  1   Mean   : 9.401     Mean   :1.6       Mean   :1.775  
##  Armenia    :  1   3rd Qu.:13.000     3rd Qu.:2.0       3rd Qu.:2.000  
##  Australia  :  1   Max.   :36.800     Max.   :2.0       Max.   :2.000  
##  (Other)    :154                                                       
##   X2011.policy   X2011.Psychiatrists
##  Min.   :1.000   Min.   :1.000      
##  1st Qu.:1.000   1st Qu.:1.000      
##  Median :2.000   Median :1.000      
##  Mean   :1.656   Mean   :1.306      
##  3rd Qu.:2.000   3rd Qu.:2.000      
##  Max.   :2.000   Max.   :2.000      
## 

In this section, the levels of each factor are shown in a boxplot to examine the main effect of each factor on the response variable, suicide rate. Keep in mind that No is 1 and Yes is 2 (for psychiatrists, ‘Below average’ is 1 and ‘Above average’ is 2).

boxplot(Mental$X2012.suicide.rate~Mental$X2011.legislation, xlab="Mental health legislation", ylab="Suicide rate")
title("Impact of legislation on Suicide")

boxplot(Mental$X2012.suicide.rate~Mental$X2011.plan, xlab="Mental health plan", ylab="Suicide rate")
title("Impact of plan on Suicide")

boxplot(Mental$X2012.suicide.rate~Mental$X2011.policy, xlab="Mental health policy", ylab="Suicide rate")
title("Impact of policy on Suicide")

boxplot(Mental$X2012.suicide.rate~Mental$X2011.Psychiatrists, xlab="Psychiatrists working in mental health", ylab="Suicide rate")
title("Impact of psychiatrists on Suicide")
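
The boxplots can be complemented by group means per factor level. The sketch below computes them with tapply() on the six rows printed by head(Mental) only, so the numbers it produces are not the full-data means; Mental6 is a stand-in name introduced for illustration.

```r
# First six rows as printed by head(Mental); not the full 160-country data set.
Mental6 <- data.frame(
  X2012.suicide.rate  = c(4.0, 6.5, 1.8, 10.6, 3.3, 11.6),
  X2011.legislation   = c(2L, 2L, 2L, 1L, 2L, 2L),
  X2011.plan          = c(2L, 2L, 2L, 2L, 1L, 2L),
  X2011.policy        = c(2L, 2L, 2L, 1L, 1L, 2L),
  X2011.Psychiatrists = c(1L, 1L, 1L, 1L, 1L, 2L)
)

factors <- c("X2011.legislation", "X2011.plan", "X2011.policy", "X2011.Psychiatrists")

# Mean suicide rate at level 1 and level 2 of each factor.
means <- lapply(factors, function(f) {
  tapply(Mental6$X2012.suicide.rate, Mental6[[f]], mean)
})
names(means) <- factors
means
```

Replacing Mental6 with the full Mental data frame would give the level means that correspond to the boxplots above.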

3.2 Testing

In this section, we examine the main effects of all factors by using ANOVA. Legislation (p = 0.0023) and psychiatrists (p = 1.02e-08) are statistically significant even at the 1% significance level, while plan and policy are not significant.

# Main-effects model: each of the four factors entered once
me <- aov(Mental$X2012.suicide.rate~Mental$X2011.legislation+Mental$X2011.plan+Mental$X2011.policy+Mental$X2011.Psychiatrists)
summary(me)
##                             Df Sum Sq Mean Sq F value   Pr(>F)    
## Mental$X2011.legislation     1    386   385.6   9.588  0.00233 ** 
## Mental$X2011.plan            1     45    45.5   1.131  0.28923    
## Mental$X2011.policy          1      7     6.5   0.163  0.68714    
## Mental$X2011.Psychiatrists   1   1474  1474.3  36.660 1.02e-08 ***
## Residuals                  155   6233    40.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
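
Note that summary() on an aov fit reports sequential sums of squares, so with unbalanced data the result for a term depends on the order of entry. This is visible in the report itself: plan has a sum of squares of 45.5 when entered after legislation but 89.1 when entered first. A minimal sketch of an order-independent check with drop1() follows. It uses simulated data whose level proportions loosely mirror summary(Mental); the data frame sim, its column names, and the effect size are assumptions.

```r
set.seed(2)
n <- 160

# Simulated unbalanced two-level factors (not the real data).
sim <- data.frame(
  legislation   = factor(rbinom(n, 1, 0.60), levels = 0:1, labels = c("No", "Yes")),
  plan          = factor(rbinom(n, 1, 0.78), levels = 0:1, labels = c("No", "Yes")),
  policy        = factor(rbinom(n, 1, 0.66), levels = 0:1, labels = c("No", "Yes")),
  psychiatrists = factor(rbinom(n, 1, 0.31), levels = 0:1, labels = c("Below", "Above"))
)
sim$rate <- 8 + 4 * (sim$psychiatrists == "Above") + rnorm(n, sd = 6)

fit <- aov(rate ~ legislation + plan + policy + psychiatrists, data = sim)

seq_tab <- summary(fit)[[1]]        # sequential tests: order of terms matters
d1      <- drop1(fit, test = "F")   # each term tested after all the others
seq_tab
d1
```

Only the last term in the formula has the same sum of squares in both tables; for the others, the drop1() table shows the contribution of each factor after adjusting for the remaining three.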

The ANOVA results for the two-way interaction effects are examined below, one pair of factors at a time. None of the interactions is significant at the 5% level.

legislation x plan: p = 0.787 > 0.05, so the interaction effect is not significant.

ie_12 <- aov(Mental$X2012.suicide.rate~Mental$X2011.legislation*Mental$X2011.plan)
anova(ie_12)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                             Df Sum Sq Mean Sq F value
## Mental$X2011.legislation                     1  385.6  385.57  7.8008
## Mental$X2011.plan                            1   45.5   45.48  0.9202
## Mental$X2011.legislation:Mental$X2011.plan   1    3.6    3.62  0.0732
## Residuals                                  156 7710.7   49.43        
##                                              Pr(>F)   
## Mental$X2011.legislation                   0.005877 **
## Mental$X2011.plan                          0.338914   
## Mental$X2011.legislation:Mental$X2011.plan 0.787109   
## Residuals                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.plan,Mental$X2011.legislation,Mental$X2012.suicide.rate)

legislation x policy: p = 0.324 > 0.05, so the interaction effect is not significant.

ie_13 <- aov(Mental$X2012.suicide.rate~Mental$X2011.legislation*Mental$X2011.policy)
anova(ie_13)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                               Df Sum Sq Mean Sq F value
## Mental$X2011.legislation                       1  385.6  385.57  7.8011
## Mental$X2011.policy                            1    0.9    0.94  0.0190
## Mental$X2011.legislation:Mental$X2011.policy   1   48.5   48.47  0.9807
## Residuals                                    156 7710.4   49.43        
##                                                Pr(>F)   
## Mental$X2011.legislation                     0.005876 **
## Mental$X2011.policy                          0.890530   
## Mental$X2011.legislation:Mental$X2011.policy 0.323564   
## Residuals                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.policy,Mental$X2011.legislation,Mental$X2012.suicide.rate)

legislation x psychiatrists: p = 0.730 > 0.05, so the interaction effect is not significant.

ie_14 <- aov(Mental$X2012.suicide.rate~Mental$X2011.legislation*Mental$X2011.Psychiatrists)
anova(ie_14)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                                      Df Sum Sq Mean Sq
## Mental$X2011.legislation                              1  385.6  385.57
## Mental$X2011.Psychiatrists                            1 1476.6 1476.60
## Mental$X2011.legislation:Mental$X2011.Psychiatrists   1    4.8    4.80
## Residuals                                           156 6278.4   40.25
##                                                     F value    Pr(>F)    
## Mental$X2011.legislation                             9.5803  0.002332 ** 
## Mental$X2011.Psychiatrists                          36.6890 9.971e-09 ***
## Mental$X2011.legislation:Mental$X2011.Psychiatrists  0.1192  0.730389    
## Residuals                                                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.Psychiatrists,Mental$X2011.legislation,Mental$X2012.suicide.rate)

plan x policy: p = 0.068 > 0.05, so the interaction effect is not significant at the 5% level, although it is significant at the 10% level.

ie_23 <- aov(Mental$X2012.suicide.rate~Mental$X2011.plan*Mental$X2011.policy)
anova(ie_23)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                        Df Sum Sq Mean Sq F value  Pr(>F)  
## Mental$X2011.plan                       1   89.1  89.051  1.7620 0.18631  
## Mental$X2011.policy                     1    1.4   1.430  0.0283 0.86662  
## Mental$X2011.plan:Mental$X2011.policy   1  170.7 170.704  3.3776 0.06799 .
## Residuals                             156 7884.2  50.540                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.policy,Mental$X2011.plan,Mental$X2012.suicide.rate)

plan x psychiatrists: p = 0.884 > 0.05, so the interaction effect is not significant.

ie_24 <- aov(Mental$X2012.suicide.rate~Mental$X2011.plan*Mental$X2011.Psychiatrists)
anova(ie_24)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                               Df Sum Sq Mean Sq F value
## Mental$X2011.plan                              1   89.1   89.05  2.2028
## Mental$X2011.Psychiatrists                     1 1749.1 1749.08 43.2665
## Mental$X2011.plan:Mental$X2011.Psychiatrists   1    0.9    0.87  0.0214
## Residuals                                    156 6306.4   40.43        
##                                                 Pr(>F)    
## Mental$X2011.plan                               0.1398    
## Mental$X2011.Psychiatrists                   6.835e-10 ***
## Mental$X2011.plan:Mental$X2011.Psychiatrists    0.8838    
## Residuals                                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.Psychiatrists,Mental$X2011.plan,Mental$X2012.suicide.rate)

policy x psychiatrists: p = 0.572 > 0.05, so the interaction effect is not significant.

ie_34 <- aov(Mental$X2012.suicide.rate~Mental$X2011.policy*Mental$X2011.Psychiatrists)
anova(ie_34)
## Analysis of Variance Table
## 
## Response: Mental$X2012.suicide.rate
##                                                 Df Sum Sq Mean Sq F value
## Mental$X2011.policy                              1   32.3   32.35  0.8007
## Mental$X2011.Psychiatrists                       1 1798.0 1797.98 44.5066
## Mental$X2011.policy:Mental$X2011.Psychiatrists   1   13.0   12.99  0.3216
## Residuals                                      156 6302.1   40.40        
##                                                   Pr(>F)    
## Mental$X2011.policy                               0.3723    
## Mental$X2011.Psychiatrists                     4.169e-10 ***
## Mental$X2011.policy:Mental$X2011.Psychiatrists    0.5715    
## Residuals                                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(Mental$X2011.Psychiatrists,Mental$X2011.policy,Mental$X2012.suicide.rate)
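
The six pairwise fits above follow the same pattern, so they could also be generated in a loop over all factor pairs. The sketch below does this with combn() and reformulate() on simulated data; sim, its short column names, and the simulated effect are assumptions made for illustration.

```r
set.seed(6)
n <- 160

# Simulated stand-ins for the four factors and the response (not the real data).
sim <- data.frame(
  legislation   = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  plan          = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  policy        = factor(sample(1:2, n, replace = TRUE), levels = 1:2),
  psychiatrists = factor(sample(1:2, n, replace = TRUE), levels = 1:2)
)
sim$rate <- 8 + 3 * (sim$psychiatrists == "2") + rnorm(n, sd = 6)

factors <- c("legislation", "plan", "policy", "psychiatrists")
pairs   <- combn(factors, 2, simplify = FALSE)   # the 6 unordered pairs

# For each pair, fit rate ~ a * b and keep the p-value of the a:b row.
p_int <- sapply(pairs, function(p) {
  f   <- reformulate(paste(p, collapse = " * "), response = "rate")
  tab <- anova(aov(f, data = sim))
  tab[3, "Pr(>F)"]                               # row 3 is the interaction term
})
names(p_int) <- sapply(pairs, paste, collapse = ":")
round(p_int, 3)
```

Generating the models from the list of factor names avoids copy-and-paste slips such as reusing an object name for two different models.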

3.3 Diagnostics / Model Adequacy Checking

Quantile-Quantile (Q-Q) plots are graphs used to verify the distributional assumption for a set of data. A roughly linear Q-Q plot of the residuals would support the normality assumption behind ANOVA. However, for the main-effects model the residuals depart from the reference line, so they do not appear to be normally distributed.

qqnorm(residuals(me))
qqline(residuals(me))
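
The visual Q-Q check can be supplemented with a formal test such as Shapiro-Wilk on the residuals. The sketch below also refits the model on a log-transformed response, which is one common remedy for right-skewed rates; that transformation is an assumption of this example and is not part of the original analysis. The data are simulated, and sim and its columns are hypothetical.

```r
set.seed(4)
n <- 160

# Simulated right-skewed, strictly positive response (not the real data).
sim <- data.frame(
  legislation   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  psychiatrists = factor(sample(c("Below", "Above"), n, replace = TRUE))
)
sim$rate <- rlnorm(n, meanlog = 2, sdlog = 0.7)

fit_raw <- aov(rate ~ legislation + psychiatrists, data = sim)
fit_log <- aov(log(rate) ~ legislation + psychiatrists, data = sim)

# Shapiro-Wilk test of normality on each set of residuals.
sw_raw <- shapiro.test(residuals(fit_raw))
sw_log <- shapiro.test(residuals(fit_log))
c(raw = sw_raw$p.value, log = sw_log$p.value)
```

A small p-value is evidence against normal residuals. Since the smallest observed suicide rate in summary(Mental) is 0.3, a log transform would be defined for every country in the real data.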

The residuals vs. fits plot is a common graph used in residual analysis. It is a scatter plot of the residuals as a function of the fitted values, or estimated responses. For the main-effects model, the plot shows a few mild outliers in the suicide rate.

plot(fitted(me),residuals(me))
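
Potential outliers can be flagged numerically with standardized residuals. The sketch below uses rstandard() on a simulated two-factor model; the cutoff of 3 is a rule of thumb chosen for this example, and sim and its columns are hypothetical.

```r
set.seed(5)
n <- 160

# Simulated data with two two-level factors (not the real data).
sim <- data.frame(
  legislation   = factor(sample(c("No", "Yes"), n, replace = TRUE),
                         levels = c("No", "Yes")),
  psychiatrists = factor(sample(c("Below", "Above"), n, replace = TRUE),
                         levels = c("Below", "Above"))
)
sim$rate <- 8 + 4 * (sim$psychiatrists == "Above") + rnorm(n, sd = 6)

fit <- aov(rate ~ legislation + psychiatrists, data = sim)

# Standardized residuals; values beyond +/-3 are flagged as potential outliers.
r_std   <- rstandard(fit)
flagged <- which(abs(r_std) > 3)
flagged

# With only categorical predictors, fitted values take few distinct values,
# so a residuals vs. fits plot shows vertical strips rather than a cloud.
n_fitted <- length(unique(round(fitted(fit), 8)))
n_fitted
```

The same calls applied to the main-effects model would list which countries lie far from their fitted group value.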

4. References to the literature

Montgomery, Douglas C. Design and Analysis of Experiments, 8th Edition.