Final-Exam-STA-580

1. An educator believes that a new reading curriculum will help elementary school students improve some aspects of their reading ability. She arranges for a third-grade class of 21 students to take part in the new curriculum for an eight-week period. A control classroom of 23 third-graders follows the standard curriculum. At the end of the eight weeks, all students are given a Degree of Reading Power (DRP) test, which measures the aspects of reading that the treatment is designed to improve (data file “DRPscores.txt”). Test the hypothesis that the treatment group performed better than the control group on the test. State your conclusions.

require(UsingR)

## Loading required package: UsingR

## Loading required package: MASS

## Warning: package 'MASS' was built under R version 3.2.5

## Loading required package: HistData

## Warning: package 'HistData' was built under R version 3.2.5

## Loading required package: Hmisc

## Warning: package 'Hmisc' was built under R version 3.2.5

## Loading required package: lattice

## Warning: package 'lattice' was built under R version 3.2.5

## Loading required package: survival

## Warning: package 'survival' was built under R version 3.2.5

## Loading required package: Formula

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.2.5

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

## 
## Attaching package: 'UsingR'

## The following object is masked from 'package:survival':
## 
##     cancer

getwd()

## [1] "/Users/yusufsultan"

setwd("/Users/yusufsultan")
DRP1 <- read.csv("DRPscores.csv")
str(DRP1)

## 'data.frame':    44 obs. of  2 variables:
##  $ Treatment: Factor w/ 2 levels "Control","Treat": 2 2 2 2 2 2 2 2 2 2 ...
##  $ X24      : int  24 56 43 59 58 52 71 62 43 54 ...

head(DRP1)

##   Treatment X24
## 1     Treat  24
## 2     Treat  56
## 3     Treat  43
## 4     Treat  59
## 5     Treat  58
## 6     Treat  52

tail(DRP1)

##    Treatment X24
## 39   Control  55
## 40   Control  54
## 41   Control  28
## 42   Control  20
## 43   Control  48
## 44   Control  85

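# Split the scores by group (rows 1-21 are Treat, rows 22-44 are Control)
Treat <- DRP1[1:21, 2]
Control <- DRP1[22:44, 2]
# Sample variance of the scores within each group
aggregate(X24 ~ Treatment, data = DRP1, var)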

##   Treatment      X24
## 1   Control 294.0791
## 2     Treat 121.1619

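# Welch two-sample t-test; t.test() is two-sided by default, while the
# question asks the directional "Treat better than Control"
drpTTest <- t.test(Treat,Control)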
drpTTest

## 
##  Welch Two Sample t-test
## 
## data:  Treat and Control
## t = 2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.23302 18.67588
## sample estimates:
## mean of x mean of y 
##  51.47619  41.52174
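The problem asks whether the treatment group did better, which is a one-sided alternative, while the output above reports a two-sided p-value. The sketch below rebuilds the Welch statistic from the group means and variances printed above and converts it to a one-sided p-value. It assumes those rounded summary values are accurate; the variable names are illustrative only.

```r
# Summary statistics copied from the aggregate() and t.test() output above
mean_treat   <- 51.47619; var_treat   <- 121.1619; n_treat   <- 21
mean_control <- 41.52174; var_control <- 294.0791; n_control <- 23

# Squared standard error contributed by each group
v_treat   <- var_treat / n_treat
v_control <- var_control / n_control

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean_treat - mean_control) / sqrt(v_treat + v_control)
df_welch <- (v_treat + v_control)^2 /
  (v_treat^2 / (n_treat - 1) + v_control^2 / (n_control - 1))

# One-sided p-value for H1: mean(Treat) > mean(Control)
p_one_sided <- pt(t_stat, df_welch, lower.tail = FALSE)
c(t = t_stat, df = df_welch, p_one_sided = p_one_sided)
```

Because the reported t statistic is positive, the one-sided p-value is half of the two-sided 0.02638, about 0.013. With the raw data loaded, the same test should be available directly as `t.test(Treat, Control, alternative = "greater")`.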

# Side-by-side boxplots of the DRP scores for the two groups
boxplot(X24 ~ Treatment, data = DRP1, ylab = "DRP score")

qplot(X24,Treatment,data=DRP1,geom = c("point","smooth"))

## `geom_smooth()` using method = 'loess'

# Now test for normality of X24 (both groups pooled)
shapiro.test(DRP1$X24)

## 
##  Shapiro-Wilk normality test
## 
## data:  DRP1$X24
## W = 0.97522, p-value = 0.4554

# Shapiro-Wilk normality test for the Treat and Control groups separately
shapiro.test(DRP1$X24[DRP1$Treatment == "Treat"])

## 
##  Shapiro-Wilk normality test
## 
## data:  DRP1$X24[DRP1$Treatment == "Treat"]
## W = 0.96635, p-value = 0.6517

shapiro.test(DRP1$X24[DRP1$Treatment == "Control"])

## 
##  Shapiro-Wilk normality test
## 
## data:  DRP1$X24[DRP1$Treatment == "Control"]
## W = 0.97181, p-value = 0.7322

# Neither test rejects normality; now check visually (both groups look roughly normally distributed)
ggplot(DRP1 ,aes(x=X24,fill =Treatment))+
    geom_histogram(binwidth = .5 ,alpha =1/2 )

require(plyr)

## Loading required package: plyr

## Warning: package 'plyr' was built under R version 3.2.5

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:Hmisc':
## 
##     is.discrete, summarize

X24Summary <- ddply(DRP1 ,"Treatment",summarize,X24.mean=mean(X24),X24.sd=sd(X24),
                     Lower= X24.mean-2*X24.sd/sqrt(NROW(X24)),Upper =X24.mean+2*X24.sd/sqrt(NROW(X24)))
X24Summary

##   Treatment X24.mean   X24.sd    Lower    Upper
## 1   Control 41.52174 17.14873 34.37022 48.67326
## 2     Treat 51.47619 11.00736 46.67219 56.28019

ggplot(X24Summary ,aes(x=X24.mean,y =Treatment))+
    geom_point()+
    geom_errorbarh(aes(xmin=Lower ,xmax = Upper),height =.2)

# Plot showing the mean of X24, plus or minus two standard errors, broken down by Treatment

2. A company is rated as acceptable in quality control if more than 90% of units produced at its facilities are found to be defect-free, and it is rated as excellent in quality control if more than 95% are defect-free. Suppose that a random sample of 500 units is selected and tested for defects and that 18 units are found to have defects.

a. Do these data show at the 5% level of significance that the company is acceptable?

b. Does it show that the company is excellent? Construct a 95% confidence interval for the proportion of defect-free units.

c. What sample size should a reliability engineer use to estimate this proportion to within 2% with 95% confidence if it is assumed that the proportion of units that are defect-free is at least 90%?
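No code accompanies problem 2, so the following is only a sketch of one way to work it from the numbers in the problem statement. It assumes the large-sample normal approximation (one-sided z tests and a Wald interval); the helper `z_test_greater` is a hypothetical name introduced here, not a function from any package.

```r
n       <- 500                 # units sampled
defects <- 18                  # units found defective
p_hat   <- (n - defects) / n   # sample proportion defect-free (0.964)

# One-sided z test of H0: p = p0 against H1: p > p0 (hypothetical helper)
z_test_greater <- function(p_hat, p0, n) {
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
  c(z = z, p_value = pnorm(z, lower.tail = FALSE))
}

acceptable <- z_test_greater(p_hat, 0.90, n)   # "more than 90% defect-free"
excellent  <- z_test_greater(p_hat, 0.95, n)   # "more than 95% defect-free"

# 95% Wald confidence interval for the defect-free proportion
z_crit <- qnorm(0.975)
margin <- z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci_95  <- c(lower = p_hat - margin, upper = p_hat + margin)

# Sample size for a margin of 0.02, using p = 0.90, the most variable
# value allowed by the assumption p >= 0.90
n_required <- ceiling(z_crit^2 * 0.90 * 0.10 / 0.02^2)

acceptable; excellent; ci_95; n_required
```

Under these assumptions the statistic is about 4.77 against the 90% threshold and about 1.44 against the 95% threshold, compared with a one-sided 5% critical value of 1.645. An exact alternative would be `binom.test()`, which avoids the normal approximation.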

3. A large corporation requires that its employees attend a 1-day sexual harassment seminar. The Director of Human Resources of this corporation would like to determine whether or not the information presented in this seminar is retained over a long period of time. To this end, a random sample of 40 employees is selected from recently hired employees who are scheduled to take this seminar. Each of the employees in this sample completes a test of knowledge concerning sexual harassment and related legal issues immediately after the seminar and then takes a similar test 6 months later. The scores are contained in the file “harass.txt”.

require(UsingR)
getwd()

## [1] "/Users/yusufsultan"

setwd("/Users/yusufsultan")
harass <- read.table('harass.txt',sep = '\t',header = T)
str(harass)

## 'data.frame':    40 obs. of  3 variables:
##  $ Employee: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Test1   : int  72 92 93 87 90 84 99 87 85 88 ...
##  $ Test2   : int  36 91 83 89 74 95 107 94 93 55 ...

head(harass)

##   Employee Test1 Test2
## 1        1    72    36
## 2        2    92    91
## 3        3    93    83
## 4        4    87    89
## 5        5    90    74
## 6        6    84    95

tail(harass)

##    Employee Test1 Test2
## 35       35    70    79
## 36       36    87    94
## 37       37    84    62
## 38       38    94   109
## 39       39    72    64
## 40       40    95    52

t.test(harass$Test1,harass$Test2 ,paired = TRUE )

## 
##  Paired t-test
## 
## data:  harass$Test1 and harass$Test2
## t = 1.5301, df = 39, p-value = 0.1341
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.754618 12.654618
## sample estimates:
## mean of the differences 
##                    5.45
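Retention is a directional question (scores falling between the two tests), so a one-sided reading of the paired test may also be of interest. The sketch below works only from the statistics printed above; it assumes the rounded values of t and the mean difference, so the last digits can differ slightly from R's own output.

```r
# Values copied from the paired t-test output above
t_stat    <- 1.5301   # paired t statistic
df        <- 39       # n - 1 for 40 employees
mean_diff <- 5.45     # mean of Test1 - Test2

# Two-sided p-value (reproduces the reported 0.1341)
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# One-sided p-value for H1: mean(Test1 - Test2) > 0, i.e. scores dropped
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# Standard error recovered from t, then the 95% interval for the mean drop
se_diff <- mean_diff / t_stat
ci_95   <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff

c(p_two_sided = p_two_sided, p_one_sided = p_one_sided)
ci_95
```

With the data loaded, the one-sided version should correspond to `t.test(harass$Test1, harass$Test2, paired = TRUE, alternative = "greater")`.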

qplot(Test1,Test2,data=harass,geom = c("point","smooth"))

## `geom_smooth()` using method = 'loess'

TestDiff <- harass$Test1 - harass$Test2
ggplot(harass,aes(x=Test1 - Test2))+
  geom_density()+
  geom_vline(xintercept = mean(TestDiff))+
  geom_vline(xintercept = mean(TestDiff)+
  2*c(-1,1)*sd(TestDiff)/sqrt(nrow(harass)),linetype = 2)

# Density plot showing the difference of test 1 and test 2

4. The goal for the mean time to resolve software problems by the software support group of a large corporation is 24 hours. Suppose that 60 software problem items are randomly selected from all such items over the past quarter, and the mean time to resolve the problems was 22.4 hours with a standard deviation of 9.6 hours.

a. Do these data show at the 10% level of significance that the mean resolution time is less than 24 hours?

b. Construct a 90% confidence interval for the mean resolution time.

c. What sample size would be required to estimate the mean time to within 0.5 hours with 90% confidence if it is assumed that the standard deviation will be no more than 10 hours?
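Problem 4 gives only summary statistics, so it can be worked without a data file. The sketch below assumes a one-sample t procedure for the test and interval, and the usual normal-based formula with the planning value of 10 hours for the sample size; other textbook conventions (for example, using z throughout) are equally plausible.

```r
n     <- 60     # sampled problem items
x_bar <- 22.4   # sample mean resolution time (hours)
s     <- 9.6    # sample standard deviation (hours)
mu0   <- 24     # target mean under H0

# Lower-tailed one-sample t test of H0: mu = 24 against H1: mu < 24
se      <- s / sqrt(n)
t_stat  <- (x_bar - mu0) / se
p_lower <- pt(t_stat, df = n - 1)

# 90% confidence interval for the mean resolution time
ci_90 <- x_bar + c(-1, 1) * qt(0.95, df = n - 1) * se

# Sample size for a margin of 0.5 hours at 90% confidence, sigma <= 10
n_required <- ceiling((qnorm(0.95) * 10 / 0.5)^2)

c(t = t_stat, p_lower = p_lower)
ci_90
n_required
```

The statistic comes out near -1.29, which sits very close to the one-sided 10% critical value (roughly -1.30 with 59 degrees of freedom, -1.28 for the normal), so the decision is sensitive to which reference distribution is used.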

5. A study of industries in North Texas compared the experience of entry-level managers in telecommunication companies with the experience of entry-level managers in software services companies. Suppose that a random sample of 20 entry-level managers was selected separately from each group of companies and the experience of each manager was obtained.

Does this data show that there is a difference at the 5% level of significance?

Construct a 90% confidence interval for the difference between the means.

6. A large corporation would like to determine if employee job satisfaction will improve if it includes profit sharing based on quality scores for its factory workers. To answer this question, a pilot program was begun at one of its factories. A random sample of 30 workers from this factory was selected and, separately, a random sample of 30 workers was selected from another of its factories that did not implement this program. Prior to the start of the program, each worker in these samples was given a test of job satisfaction as part of their normal review process. This test was then administered to the same employees six months after the start of the new program. Use 5% level of significance for the following questions. The data are contained in the file “Pilot.txt”.

a. Is there a difference between the mean satisfaction scores of these two factories before the pilot program is started?

b. Let SatisImprov be defined as SatisImprov = After − Before. Is there a difference between the means of SatisImprov at these factories?

c. Construct a 95% confidence interval for SatisImprov at the pilot factory.

7. To determine which muscles need to be subjected to a conditioning program in order to improve one's performance on the flat serve used in tennis, the study “An Electromyographic-Cinematographic Analysis of the Tennis Serve” was conducted by the Department of Health, Physical Education, and Recreation at the Virginia Polytechnic Institute and State University in 1978. Five different muscles

1: anterior deltoid

2: pectoralis major

3: posterior deltoid

4: middle deltoid

5: triceps

were tested on each of three subjects, and the experiment was carried out three times for each treatment combination. The electromyographic data, recorded during the serve, are in the data file “electromyographic.txt”.

library(afex)

## Warning: package 'afex' was built under R version 3.2.5

## Loading required package: lme4

## Warning: package 'lme4' was built under R version 3.2.5

## Loading required package: Matrix

## Warning: package 'Matrix' was built under R version 3.2.5

## Loading required package: lsmeans

## Loading required package: estimability

## Warning: package 'estimability' was built under R version 3.2.5

## ************
## Welcome to afex. For support visit: http://afex.singmann.science/

## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'KR', 'S', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to lsmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set orthogonal sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************

## 
## Attaching package: 'afex'

## The following object is masked from 'package:lme4':
## 
##     lmer

getwd()

## [1] "/Users/yusufsultan"

setwd("/Users/yusufsultan")
electrom <- read.table('electromyographic.txt',header = T)
str(electrom)

## 'data.frame':    45 obs. of  3 variables:
##  $ Subject          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Muscle           : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ electromyographic: num  32 59 38 5 1.5 2 58 61 66 10 ...

head(electrom)

##   Subject Muscle electromyographic
## 1       1      1              32.0
## 2       1      1              59.0
## 3       1      1              38.0
## 4       1      2               5.0
## 5       1      2               1.5
## 6       1      2               2.0

tail(electrom)

##    Subject Muscle electromyographic
## 40       3      4                63
## 41       3      4                46
## 42       3      4                55
## 43       3      5                61
## 44       3      5                85
## 45       3      5                95

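# Note: Subject and Muscle are stored as integers, so this call fits them as
# numeric covariates (1 Df each in the table below) rather than as factors
electromANOVA <- aov(lm(electromyographic ~ Subject * Muscle,electrom))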
summary(electromANOVA)

##                Df Sum Sq Mean Sq F value  Pr(>F)   
## Subject         1   3730    3730   7.795 0.00792 **
## Muscle          1    540     540   1.129 0.29420   
## Subject:Muscle  1   1932    1932   4.038 0.05109 . 
## Residuals      41  19618     478                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use the α = 0.01 level of significance to test the hypotheses that:

a. Different subjects have equal electromyographic measurements.

b. Different muscles have no effect on electromyographic measurements.

c. Subjects and type of muscle do not interact.
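The ANOVA table above shows 1 degree of freedom for each term because `Subject` and `Muscle` were read as integers. A two-factor analysis with 3 subjects, 5 muscles and 3 replicates would normally have 2, 4, 8 and 30 degrees of freedom. The sketch below shows the factor conversion on simulated values laid out like the data file; the numbers are random stand-ins, not the exam data, so only the structure of the table is meaningful.

```r
set.seed(580)  # arbitrary seed; values below are simulated, NOT the exam data

# Hypothetical stand-in with the same layout as electromyographic.txt:
# 3 subjects x 5 muscles x 3 replicates = 45 rows
sim <- expand.grid(Rep = 1:3, Muscle = 1:5, Subject = 1:3)
sim$electromyographic <- round(rnorm(nrow(sim), mean = 40, sd = 15), 1)

# Convert the integer codes to factors so each level gets its own effect
sim$Subject <- factor(sim$Subject)
sim$Muscle  <- factor(sim$Muscle)

# Two-way ANOVA with interaction
fit <- aov(electromyographic ~ Subject * Muscle, data = sim)
anova_table <- summary(fit)[[1]]
anova_table
```

Applying the same two `factor()` conversions to `electrom` before calling `aov()` should give F tests that match the three hypotheses, each of which can then be compared with the 0.01 level.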

8. (only for graduate students) A quality control engineer studied how the experience of a system control engineer relates to the engineer's capacity to complete, within a given time, a complex control design, including the debugging of all computer programs and control devices. A group of 25 engineers with widely differing experience (measured in months) were given the same control design project. The results of the study are in the data file “ExperienceCompetingTask.txt”, with y = 1 if the project was successfully completed in the allocated time and y = 0 if it was not.

Determine whether experience is associated with the probability of completing the task.

Compute the probability of successfully completing the task for an engineer having 24 months of experience. Place a 95% confidence interval on your estimate.
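No solution code is given for problem 8. A common approach is logistic regression, with the confidence interval for the predicted probability built on the logit scale and then back-transformed. The sketch below illustrates that approach on made-up values; the data frame `toy` and its column names are assumptions for illustration and do not reproduce “ExperienceCompetingTask.txt”.

```r
# Made-up illustration data (NOT the contents of ExperienceCompetingTask.txt)
toy <- data.frame(
  months = c(2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 18,
             19, 20, 22, 23, 24, 26, 27, 28, 30, 32, 34, 36),
  y      = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
             1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1)
)

# Logistic regression of task completion on months of experience
fit <- glm(y ~ months, family = binomial, data = toy)
summary(fit)$coefficients      # Wald z test for the months slope
anova(fit, test = "Chisq")     # likelihood ratio test for the same slope

# Predicted probability at 24 months, with a 95% interval computed on the
# logit scale and mapped back to the probability scale
pred     <- predict(fit, newdata = data.frame(months = 24),
                    type = "link", se.fit = TRUE)
link_fit <- as.numeric(pred$fit)
link_se  <- as.numeric(pred$se.fit)
z_crit   <- qnorm(0.975)
prob_24  <- c(estimate = plogis(link_fit),
              lower    = plogis(link_fit - z_crit * link_se),
              upper    = plogis(link_fit + z_crit * link_se))
prob_24
```

Working on the logit scale keeps both interval endpoints between 0 and 1, which an interval built directly on the probability scale does not guarantee.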

9. (only for graduate students) Geriatric study. A researcher in geriatrics designed a prospective study to investigate the effects of two interventions on the frequency of falls. One hundred subjects were randomly assigned to one of the two interventions: education only (X1 = 0) and education plus aerobic exercise training (X1 = 1). Subjects were at least 65 years of age and in reasonably good health. Three variables considered to be important as control variables were gender (X2: 0 = female; 1 = male), a balance index (X3), and a strength index (X4). The higher the balance index, the more stable the subject; and the higher the strength index, the stronger the subject. Each subject kept a diary recording the number of falls (Y) during the six months of the study. The data are in the file “GeriatricStudy.txt”.

a. Fit the Poisson regression model with the response function μ(X, β) = exp(β0 + β1X1 + β2X2 + β3X3 + β4X4). State the estimated regression coefficients, their estimated standard deviations, and the estimated response function.

b. Obtain the deviance residuals and present them in an index plot. Do there appear to be any outlying cases?

c. Assuming that the fitted model is appropriate, use the likelihood ratio test to determine whether gender (X2) can be dropped from the model; control α at 0.05. State the full and reduced models, decision rule, and conclusion. What is the P-value of the test?

d. For the fitted model containing only X1, X3 and X4 in first-order terms, obtain an approximate 95% confidence interval for β1. Interpret your confidence interval. Does aerobic exercise reduce the frequency of falls when controlling for balance and strength?

# Problem 6: job satisfaction data (Pilot.txt)
require(UsingR)
setwd("/Users/yusufsultan")
Pilot <- read.table('Pilot.txt',header = T)
str(Pilot)

## 'data.frame':    60 obs. of  3 variables:
##  $ Factory: Factor w/ 2 levels "NonPilot","Pilot": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Before : int  55 106 64 66 62 87 71 85 105 103 ...
##  $ After  : int  60 111 58 82 68 90 79 88 99 104 ...

head(Pilot)

##   Factory Before After
## 1   Pilot     55    60
## 2   Pilot    106   111
## 3   Pilot     64    58
## 4   Pilot     66    82
## 5   Pilot     62    68
## 6   Pilot     87    90

tail(Pilot)

##     Factory Before After
## 55 NonPilot     31    31
## 56 NonPilot     88    91
## 57 NonPilot    106   107
## 58 NonPilot     72    72
## 59 NonPilot     75    75
## 60 NonPilot     84    87

t.test(Pilot$Before,Pilot$After ,paired = TRUE )

## 
##  Paired t-test
## 
## data:  Pilot$Before and Pilot$After
## t = -3.4994, df = 59, p-value = 0.0008941
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.222211 -0.877789
## sample estimates:
## mean of the differences 
##                   -2.05

qplot(Before,After,data=Pilot,geom = c("point","smooth"))

## `geom_smooth()` using method = 'loess'

TestDiff <- Pilot$Before - Pilot$After
ggplot(Pilot,aes(x=Before - After))+
  geom_density()+
  geom_vline(xintercept = mean(TestDiff))+
  geom_vline(xintercept = mean(TestDiff)+
  2*c(-1,1)*sd(TestDiff)/sqrt(nrow(Pilot)),linetype = 2)

# Density plot showing the difference between the Before and After satisfaction scores
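The paired test above pools all 60 workers, whereas the questions in problem 6 compare the two factories. The sketch below shows how those comparisons might be set up. To stay self-contained it uses only the 12 rows printed by `head(Pilot)` and `tail(Pilot)`, so its numerical results are illustrative and not answers for the full sample; `pilot_excerpt` is a name introduced here.

```r
# Only the 12 rows shown by head(Pilot) and tail(Pilot) above
pilot_excerpt <- data.frame(
  Factory = factor(rep(c("Pilot", "NonPilot"), each = 6)),
  Before  = c(55, 106, 64, 66, 62, 87,  31, 88, 106, 72, 75, 84),
  After   = c(60, 111, 58, 82, 68, 90,  31, 91, 107, 72, 75, 87)
)
pilot_excerpt$SatisImprov <- pilot_excerpt$After - pilot_excerpt$Before

# Baseline comparison: two independent samples of Before scores
t.test(Before ~ Factory, data = pilot_excerpt)

# Comparison of mean improvement between the two factories
t.test(SatisImprov ~ Factory, data = pilot_excerpt)

# 95% confidence interval for mean improvement at the pilot factory only
pilot_improv <- pilot_excerpt$SatisImprov[pilot_excerpt$Factory == "Pilot"]
t.test(pilot_improv)$conf.int
```

Replacing `pilot_excerpt` with the full `Pilot` data frame (after adding the `SatisImprov` column) should give the analyses the three questions ask for, assuming the Welch form of the two-sample test is acceptable.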

# Problem 9: geriatric falls study (GeriatricStudy.txt)
setwd("/Users/yusufsultan")
Geriatric <- read.table("GeriatricStudy.txt")
colnames(Geriatric)=c("NumberofFalls","Intervention","Gender","Balance","Strength")
NumberofFalls=Geriatric[,1]
Intervention=Geriatric[,2]
Intervention<-factor(Intervention)
Gender=Geriatric[,3]
Gender<-factor(Gender)
Balance=Geriatric[,4]
Strength=Geriatric[,5]
pois.Geriatric<-
glm(NumberofFalls~Intervention+Gender+Balance+Strength,family="poisson")
summary(pois.Geriatric)

## 
## Call:
## glm(formula = NumberofFalls ~ Intervention + Gender + Balance + 
##     Strength, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1854  -0.7819  -0.2564   0.5449   2.3626  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    0.489467   0.336869   1.453  0.14623    
## Intervention1 -1.069403   0.133154  -8.031 9.64e-16 ***
## Gender1       -0.046606   0.119970  -0.388  0.69766    
## Balance        0.009470   0.002953   3.207  0.00134 ** 
## Strength       0.008566   0.004312   1.986  0.04698 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 199.19  on 99  degrees of freedom
## Residual deviance: 108.79  on 95  degrees of freedom
## AIC: 377.29
## 
## Number of Fisher Scoring iterations: 5

summary(Geriatric)

##  NumberofFalls    Intervention     Gender        Balance     
##  Min.   : 0.00   Min.   :0.0   Min.   :0.00   Min.   :13.00  
##  1st Qu.: 1.00   1st Qu.:0.0   1st Qu.:0.00   1st Qu.:39.00  
##  Median : 3.00   Median :0.5   Median :1.00   Median :51.50  
##  Mean   : 3.04   Mean   :0.5   Mean   :0.53   Mean   :52.83  
##  3rd Qu.: 4.00   3rd Qu.:1.0   3rd Qu.:1.00   3rd Qu.:66.25  
##  Max.   :11.00   Max.   :1.0   Max.   :1.00   Max.   :98.00  
##     Strength    
##  Min.   :18.00  
##  1st Qu.:52.00  
##  Median :60.00  
##  Mean   :60.78  
##  3rd Qu.:70.25  
##  Max.   :90.00

deviance.residuals<-residuals(pois.Geriatric,type="deviance")
plot(deviance.residuals)

pois.Geriatric.reduced<-
glm(NumberofFalls~Intervention+Balance+Strength,family="poisson")
anova(pois.Geriatric.reduced,pois.Geriatric,test="LRT")

## Analysis of Deviance Table
## 
## Model 1: NumberofFalls ~ Intervention + Balance + Strength
## Model 2: NumberofFalls ~ Intervention + Gender + Balance + Strength
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        96     108.94                     
## 2        95     108.79  1    0.151   0.6976
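The decision rule for dropping gender can be written out from the deviance table above. The sketch below uses only the reported values. It also forms a Wald interval for the intervention coefficient from the full-model output; since the summary of the model without gender is not printed, that interval is an approximation to the one the question asks for, not a replacement.

```r
# Likelihood ratio statistic and degrees of freedom from the table above
lrt_stat <- 0.151
lrt_df   <- 1

# Decision rule at alpha = 0.05: conclude gender is needed only if the
# statistic exceeds the chi-square critical value
crit_value  <- qchisq(0.95, lrt_df)
p_value     <- pchisq(lrt_stat, lrt_df, lower.tail = FALSE)
drop_gender <- lrt_stat <= crit_value

# Approximate 95% Wald interval for the intervention coefficient, using the
# estimate and standard error reported for the FULL model
b1    <- -1.069403
se_b1 <- 0.133154
ci_b1 <- b1 + c(-1, 1) * qnorm(0.975) * se_b1

# Back-transform to a rate ratio (exercise plus education vs education only)
rate_ratio_ci <- exp(ci_b1)

c(crit_value = crit_value, p_value = p_value)
drop_gender
ci_b1
rate_ratio_ci
```

A rate-ratio interval lying entirely below 1 would indicate fewer expected falls with aerobic exercise, holding the other predictors fixed. The interval for the model the question specifies could be obtained with `confint.default(pois.Geriatric.reduced)`.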

pois.Geriatric.Intervention<-glm(NumberofFalls~Intervention,family="poisson")
pois.Geriatric.Gender<-glm(NumberofFalls~Gender,family="poisson")
pois.Geriatric.Balance<-glm(NumberofFalls~Balance,family="poisson")
pois.Geriatric.Strength<-glm(NumberofFalls~Strength,family="poisson")
summary(pois.Geriatric.Intervention)

## 
## Call:
## glm(formula = NumberofFalls ~ Intervention, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0057  -0.7620  -0.2495   0.6625   2.5703  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.50851    0.06652  22.678  < 2e-16 ***
## Intervention1 -1.06383    0.13132  -8.101 5.45e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 199.19  on 99  degrees of freedom
## Residual deviance: 123.98  on 98  degrees of freedom
## AIC: 386.48
## 
## Number of Fisher Scoring iterations: 5

summary(pois.Geriatric.Gender)

## 
## Call:
## glm(formula = NumberofFalls ~ Gender, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6337  -1.1678  -0.2573   0.2788   3.4356  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.24360    0.07833  15.877   <2e-16 ***
## Gender1     -0.26513    0.11501  -2.305   0.0211 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 199.19  on 99  degrees of freedom
## Residual deviance: 193.86  on 98  degrees of freedom
## AIC: 456.36
## 
## Number of Fisher Scoring iterations: 5

summary(pois.Geriatric.Balance)

## 
## Call:
## glm(formula = NumberofFalls ~ Balance, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8128  -1.0883  -0.3222   0.6404   3.7148  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 0.607376   0.177460   3.423  0.00062 ***
## Balance     0.009251   0.002986   3.098  0.00195 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 199.19  on 99  degrees of freedom
## Residual deviance: 189.60  on 98  degrees of freedom
## AIC: 452.09
## 
## Number of Fisher Scoring iterations: 5

summary(pois.Geriatric.Strength)

## 
## Call:
## glm(formula = NumberofFalls ~ Strength, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6911  -1.2380  -0.2467   0.6115   3.0345  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) 0.608621   0.262028   2.323   0.0202 *
## Strength    0.008170   0.004097   1.994   0.0461 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 199.19  on 99  degrees of freedom
## Residual deviance: 195.19  on 98  degrees of freedom
## AIC: 457.69
## 
## Number of Fisher Scoring iterations: 5