Question 1

A COVID frontliner wants to estimate the average amount that a resident in Maramag would donate to COVID-affected families in Maramag. Twenty residents were randomly selected from the Municipality of Maramag. The 20 randomly selected residents were contacted by telephone and asked how much they would be willing to donate. Their responses are given below.

30 20 15 8 10 40 20 25 20 28 20 25 50 40 20 10 25 25 20 15

1.1 Test at 0.05 level of significance using R for a two-tailed test.

1.2 Test at 0.05 level of significance using R for an appropriate one-tailed test.

mar = c(30, 20, 15, 8, 10, 40, 20, 25, 20, 28, 20, 25, 50, 40, 20, 10, 25, 25, 20, 15)
mean(mar)

## [1] 23.3

1.1 Test at 0.05 level of significance using R for a two-tailed test.

t.test(mar, alternative="two.sided", mu = 20, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  mar
## t = 1.3905, df = 19, p-value = 0.1804
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  18.33282 28.26718
## sample estimates:
## mean of x 
##      23.3

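Here the null hypothesis is that the mean donation equals 20 (the value passed as mu), and the alternative is that it differs from 20. Since the p-value (0.1804) is greater than the 0.05 significance level, we fail to reject the null hypothesis. That is, we do not have enough evidence that the mean donation differs from 20.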
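
As a cross-check, the t statistic and two-sided p-value reported above can be reproduced from the usual one-sample formula, t = (mean - mu0) / (s / sqrt(n)). The sketch below uses only base R; the names mu0, se, t_stat, and p_two are illustrative and are not part of the original analysis.

```r
# Donation data from Question 1
mar <- c(30, 20, 15, 8, 10, 40, 20, 25, 20, 28,
         20, 25, 50, 40, 20, 10, 25, 25, 20, 15)

mu0 <- 20                     # hypothesized mean used in the t.test() call
n   <- length(mar)
se  <- sd(mar) / sqrt(n)      # standard error of the sample mean

t_stat <- (mean(mar) - mu0) / se
p_two  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Should agree with the t.test() output: t = 1.3905, df = 19, p = 0.1804
round(c(t = t_stat, df = n - 1, p.value = p_two), 4)
```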

1.2 Test at 0.05 level of significance using R for an appropriate one-tailed test.

t.test(mar, alternative="greater", mu = 20, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  mar
## t = 1.3905, df = 19, p-value = 0.09022
## alternative hypothesis: true mean is greater than 20
## 95 percent confidence interval:
##  19.19641      Inf
## sample estimates:
## mean of x 
##      23.3

Since the p-value (0.09022) is greater than the significance level of 0.05, we fail to reject the null hypothesis. That is, there is not enough evidence to conclude that Maramag residents would donate more than 20 on average.

t.test(mar, alternative="less", mu = 20, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  mar
## t = 1.3905, df = 19, p-value = 0.9098
## alternative hypothesis: true mean is less than 20
## 95 percent confidence interval:
##      -Inf 27.40359
## sample estimates:
## mean of x 
##      23.3

Since the p-value (0.9098) is greater than the significance level of 0.05, we fail to reject the null hypothesis. That is, there is not enough evidence to conclude that Maramag residents would donate less than 20 on average.
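
The two one-tailed p-values and the two-tailed p-value all come from the same t statistic: "greater" uses the upper tail, "less" uses the lower tail, and the two-sided value is twice the smaller tail. The following sketch illustrates that relationship in base R; the variable names are illustrative only.

```r
# Donation data from Question 1
mar <- c(30, 20, 15, 8, 10, 40, 20, 25, 20, 28,
         20, 25, 50, 40, 20, 10, 25, 25, 20, 15)

fit    <- t.test(mar, mu = 20)        # two-sided test, as in 1.1
t_stat <- unname(fit$statistic)
df     <- unname(fit$parameter)

p_greater <- pt(t_stat, df, lower.tail = FALSE)  # upper tail: mean > 20
p_less    <- pt(t_stat, df)                      # lower tail: mean < 20
p_two     <- 2 * min(p_greater, p_less)          # two-sided

round(c(greater = p_greater, less = p_less, two.sided = p_two), 4)
```

Because the sample mean (23.3) lies above 20, the upper-tail ("greater") alternative is the one-tailed test with the smaller p-value, and the two one-tailed p-values sum to 1.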

Question 2

Refer to the pgviews data, answer the following:

2.1 Is the distribution of the data normally distributed?

2.2 Are the variances equal?

2.3 At 0.05 level of significance, do Site A and Site B differ statistically?

library(readxl)
pgviews <- read_excel("C:/Users/63966/Downloads/pgviews.xlsx")
View(pgviews)

ken <- head(pgviews, 20)
ken

## # A tibble: 20 × 3
##    Subject Site  Pages
##      <dbl> <chr> <dbl>
##  1       1 B         2
##  2       2 B         6
##  3       3 A         5
##  4       4 B         7
##  5       5 A         3
##  6       6 B         2
##  7       7 B         6
##  8       8 A         1
##  9       9 A         3
## 10      10 A         4
## 11      11 B         6
## 12      12 B         6
## 13      13 B         4
## 14      14 A         5
## 15      15 A         3
## 16      16 A         6
## 17      17 B         6
## 18      18 B         3
## 19      19 A         4
## 20      20 B         7

2.1 Is the distribution of the data normally distributed?

Here, the null hypothesis is that the data are normally distributed; the alternative hypothesis is that the data are not normally distributed.

shapiro.test(ken$Pages)

## 
##  Shapiro-Wilk normality test
## 
## data:  ken$Pages
## W = 0.92449, p-value = 0.1209

Since the p-value (0.1209) is greater than 0.05, we fail to reject the null hypothesis. That is, the test gives no evidence against normality, so we proceed under the normality assumption.
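
The test above pools both sites into one sample. Since the two-sample t-test in 2.3 assumes normality within each group, one might also check each site separately. The sketch below re-enters the 20 rows printed earlier as a plain data frame (ken_demo is a hypothetical name, used so that the example runs without the Excel file) and applies shapiro.test() per site. No output is shown because this check was not part of the original run.

```r
# The 20 rows printed above, re-entered by hand (hypothetical object name)
ken_demo <- data.frame(
  Site  = c("B", "B", "A", "B", "A", "B", "B", "A", "A", "A",
            "B", "B", "B", "A", "A", "A", "B", "B", "A", "B"),
  Pages = c(2, 6, 5, 7, 3, 2, 6, 1, 3, 4,
            6, 6, 4, 5, 3, 6, 6, 3, 4, 7)
)

# Split Pages by Site and run Shapiro-Wilk on each group
by_site <- lapply(split(ken_demo$Pages, ken_demo$Site), shapiro.test)
by_site
```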

2.2 Are the variances equal?

str(ken)

## tibble [20 × 3] (S3: tbl_df/tbl/data.frame)
##  $ Subject: num [1:20] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Site   : chr [1:20] "B" "B" "A" "B" ...
##  $ Pages  : num [1:20] 2 6 5 7 3 2 6 1 3 4 ...

var.test(Pages ~ Site, ken, alternative = "two.sided")

## 
##  F test to compare two variances
## 
## data:  Pages by Site
## F = 0.60957, num df = 8, denom df = 10, p-value = 0.4948
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1581284 2.6181715
## sample estimates:
## ratio of variances 
##          0.6095679

The p-value of 0.4948 is greater than the significance level of 0.05, so we fail to reject the null hypothesis of equal variances. There is no significant difference between the variances of Site A and Site B.
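
The F statistic is simply the ratio of the two sample variances, with degrees of freedom n - 1 for each site. A minimal sketch of that calculation follows, with the Pages values for each site re-entered from the table above; site_a, site_b, and the other names are illustrative.

```r
# Pages by site, taken from the 20 rows printed above
site_a <- c(5, 3, 1, 3, 4, 5, 3, 6, 4)
site_b <- c(2, 6, 7, 2, 6, 6, 6, 4, 6, 3, 7)

f_ratio <- var(site_a) / var(site_b)   # variance of A over variance of B
df1 <- length(site_a) - 1              # numerator df
df2 <- length(site_b) - 1              # denominator df

# Two-sided p-value: twice the smaller tail of the F distribution
p_lower <- pf(f_ratio, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# Should agree with var.test(): F = 0.60957, df = 8 and 10, p = 0.4948
round(c(F = f_ratio, df1 = df1, df2 = df2, p.value = p_value), 4)
```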

2.3 At 0.05 level of significance, do Site A and Site B differ statistically?

kyle <- t.test(Pages ~ Site, data = ken, var.equal = TRUE, conf.level = 0.95)
kyle

## 
##  Two Sample t-test
## 
## data:  Pages by Site
## t = -1.5765, df = 18, p-value = 0.1323
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -2.8510432  0.4065987
## sample estimates:
## mean in group A mean in group B 
##        3.777778        5.000000

The p-value of the test is 0.1323, which is greater than the significance level alpha = 0.05. Hence we do not have enough evidence to conclude that the mean number of pages viewed differs between Site A and Site B.
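
With var.equal = TRUE, the test pools the two sample variances before computing the standard error. The sketch below reproduces that calculation by hand in base R, using the Pages values re-entered from the table above; all variable names are illustrative.

```r
# Pages by site, taken from the 20 rows printed above
site_a <- c(5, 3, 1, 3, 4, 5, 3, 6, 4)
site_b <- c(2, 6, 7, 2, 6, 6, 6, 4, 6, 3, 7)

n_a <- length(site_a)
n_b <- length(site_b)
df  <- n_a + n_b - 2

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_a - 1) * var(site_a) + (n_b - 1) * var(site_b)) / df
se  <- sqrt(sp2 * (1 / n_a + 1 / n_b))

t_stat  <- (mean(site_a) - mean(site_b)) / se
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Should agree with the Two Sample t-test: t = -1.5765, df = 18, p = 0.1323
round(c(t = t_stat, df = df, p.value = p_value), 4)
```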

Question 3

Two medications for the treatment of panic disorder, Medication 1 and Medication 2, were compared with a placebo. A random sample of 36 patients was obtained from a listing of about 4,000 panic disorder patients who volunteered to participate in clinical trials. The 36 patients were randomly divided into three groups of equal size. The first group received Medication 1 for 10 weeks, the second group received Medication 2 for 10 weeks, and the third group received a placebo pill for 10 weeks. On week 11, all 36 patients were given the 7-item Panic Disorder Severity Scale (PDSC) which is scored on a 0 to 28 scale (lower scores are better). The PDSC scores are given below.

Group 1: 12 10 11 14 15 9 11 12 13 10 15 10

Group 2: 12 13 17 11 16 13 12 14 17 12 16 18

Group 3: 14 21 17 16 17 22 16 22 19 20 18 16

3.1 Is the distribution of the data normally distributed?

3.2 Are the variances equal?

3.3 At 0.05 level of significance, do the PDSC scores differ significantly among the different treatments?

3.4 Use R to conduct a multiple comparison test, provided that the result in 3.3 is significant.

3.1 Is the distribution of the data normally distributed?

K1 <- c(12, 10, 11, 14, 15, 9, 11, 12, 13, 10, 15, 10)
K1

##  [1] 12 10 11 14 15  9 11 12 13 10 15 10

K2 <- c(12, 13, 17, 11, 16, 13, 12, 14, 17, 12, 16, 18)
K2

##  [1] 12 13 17 11 16 13 12 14 17 12 16 18

K3 <- c(14, 21, 17, 16, 17, 22, 16, 22, 19, 20, 18, 16)
K3

##  [1] 14 21 17 16 17 22 16 22 19 20 18 16

K <- c(K1, K2, K3)
K

##  [1] 12 10 11 14 15  9 11 12 13 10 15 10 12 13 17 11 16 13 12 14 17 12 16 18 14
## [26] 21 17 16 17 22 16 22 19 20 18 16

str(K)

##  num [1:36] 12 10 11 14 15 9 11 12 13 10 ...

shapiro.test(K)

## 
##  Shapiro-Wilk normality test
## 
## data:  K
## W = 0.95851, p-value = 0.1933

Since the p-value (0.1933) is greater than 0.05, we fail to reject the null hypothesis. That is, the test gives no evidence against normality, so we proceed under the normality assumption.
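
The test above is applied to the 36 pooled scores. For a one-way ANOVA, the normality assumption concerns the residuals (each score minus its group mean), so a common complementary check is to run the same test on the model residuals. The sketch below shows one way to do that in base R; pdsc, fit, and res_vals are hypothetical names, and no output is shown because this check was not part of the original run.

```r
K1 <- c(12, 10, 11, 14, 15, 9, 11, 12, 13, 10, 15, 10)
K2 <- c(12, 13, 17, 11, 16, 13, 12, 14, 17, 12, 16, 18)
K3 <- c(14, 21, 17, 16, 17, 22, 16, 22, 19, 20, 18, 16)

# Long-format data frame: one row per patient (hypothetical object name)
pdsc <- data.frame(
  Group       = factor(rep(c("Group1", "Group2", "Group3"), each = 12)),
  Observation = c(K1, K2, K3)
)

fit      <- aov(Observation ~ Group, data = pdsc)
res_vals <- residuals(fit)   # score minus its own group mean

shapiro.test(res_vals)       # normality check on the residuals
```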

3.2 Are the variances equal?

library(readxl)
med <- read_excel("D:/Regression Analysis/Book1.xlsx")
View(med)

kenneth <- head(med, 37)
kenneth

## # A tibble: 36 × 2
##    Group  Observation
##    <chr>        <dbl>
##  1 Group1          12
##  2 Group1          10
##  3 Group1          11
##  4 Group1          14
##  5 Group1          15
##  6 Group1           9
##  7 Group1          11
##  8 Group1          12
##  9 Group1          13
## 10 Group1          10
## # … with 26 more rows

str(kenneth)

## tibble [36 × 2] (S3: tbl_df/tbl/data.frame)
##  $ Group      : chr [1:36] "Group1" "Group1" "Group1" "Group1" ...
##  $ Observation: num [1:36] 12 10 11 14 15 9 11 12 13 10 ...

res <- bartlett.test(Observation ~ Group, data = kenneth)
res

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Observation by Group
## Bartlett's K-squared = 0.67864, df = 2, p-value = 0.7123

The p-value of 0.7123 is greater than the significance level of 0.05, so we fail to reject the null hypothesis of equal variances. There is no significant difference among the three group variances.
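
To see what Bartlett's test is comparing, it can help to look at the three sample variances directly. The sketch below computes them from the raw scores and reruns the test on a plain list, so it does not need the Excel file; the list name groups is illustrative.

```r
groups <- list(
  Group1 = c(12, 10, 11, 14, 15, 9, 11, 12, 13, 10, 15, 10),
  Group2 = c(12, 13, 17, 11, 16, 13, 12, 14, 17, 12, 16, 18),
  Group3 = c(14, 21, 17, 16, 17, 22, 16, 22, 19, 20, 18, 16)
)

# Sample variance of each treatment group
group_var <- sapply(groups, var)
round(group_var, 2)

# bartlett.test() also accepts a list of numeric vectors
bt <- bartlett.test(groups)
bt
```

The largest variance (Group3) is about 1.66 times the smallest (Group1), which is consistent with the non-significant Bartlett result.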

3.3 At 0.05 level of significance, do the PDSC scores differ significantly among the different treatments?

set.seed(123)
dplyr::sample_n(kenneth, 36)

## # A tibble: 36 × 2
##    Group  Observation
##    <chr>        <dbl>
##  1 Group3          16
##  2 Group2          17
##  3 Group2          13
##  4 Group1          11
##  5 Group1          10
##  6 Group2          13
##  7 Group2          12
##  8 Group1          15
##  9 Group1          15
## 10 Group2          14
## # … with 26 more rows

library(ggpubr)

## Loading required package: ggplot2

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

group_by(med, Group) %>%
  summarise(
    count = n(),
    mean = mean(Observation, na.rm = TRUE),
    sd = sd(Observation, na.rm = TRUE)
  )

## # A tibble: 3 × 4
##   Group  count  mean    sd
##   <chr>  <int> <dbl> <dbl>
## 1 Group1    12  11.8  2.04
## 2 Group2    12  14.2  2.42
## 3 Group3    12  18.2  2.62

kenneth$Group <- ordered(kenneth$Group,
                         levels = c("Group1", "Group2", "Group3"))

levels(kenneth$Group)

## [1] "Group1" "Group2" "Group3"

library("ggpubr")
ggboxplot(med, x = "Group", y = "Observation",
          color = "Group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("Group1", "Group2", "Group3"),
          ylab = "Observation", xlab = "Group")

library("ggpubr")
ggline(med, x = "Group", y = "Observation",
       add = c("mean_se", "jitter"),
       order = c("Group1", "Group2", "Group3"),
       ylab = "Observation", xlab = "Group")

library("gplots")

## 
## Attaching package: 'gplots'

## The following object is masked from 'package:stats':
## 
##     lowess

# `frame` is not a plotmeans() argument and only triggers graphical-parameter
# warnings, so it is omitted here
plotmeans(Observation ~ Group, data = med,
          xlab = "Group", ylab = "Observation",
          main = "Mean Plot with 95% CI")

res.aov <- aov(Observation ~ Group, data = kenneth)

summary(res.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Group        2  245.2  122.58    21.8 9.25e-07 ***
## Residuals   33  185.6    5.62                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value (9.25e-07) is less than the significance level of 0.05, we conclude that the mean PDSC scores differ significantly among the three treatment groups. The “***” in the model summary marks this result as significant at the 0.001 level.
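
The entries of the ANOVA table can be reproduced from the raw scores: the Group sum of squares measures how far the group means lie from the grand mean, and the Residuals sum of squares measures the spread within each group. The following base R sketch works through that decomposition; all variable names are illustrative.

```r
groups <- list(
  Group1 = c(12, 10, 11, 14, 15, 9, 11, 12, 13, 10, 15, 10),
  Group2 = c(12, 13, 17, 11, 16, 13, 12, 14, 17, 12, 16, 18),
  Group3 = c(14, 21, 17, 16, 17, 22, 16, 22, 19, 20, 18, 16)
)

n_i   <- lengths(groups)            # group sizes
means <- sapply(groups, mean)       # group means
grand <- mean(unlist(groups))       # grand mean of all 36 scores

ss_between <- sum(n_i * (means - grand)^2)
ss_within  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

df_between <- length(groups) - 1
df_within  <- sum(n_i) - length(groups)

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Should agree with summary(res.aov): Sum Sq 245.2 and 185.6, F = 21.8
round(c(SS.group = ss_between, SS.resid = ss_within, F = f_stat), 2)
p_value
```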

3.4 Use R to conduct a multiple comparison test, provided that the result in 3.3 is significant.

TukeyHSD(res.aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Observation ~ Group, data = kenneth)
## 
## $Group
##                   diff        lwr      upr     p adj
## Group2-Group1 2.416667 0.04105701 4.792276 0.0454940
## Group3-Group1 6.333333 3.95772367 8.708943 0.0000006
## Group3-Group2 3.916667 1.54105701 6.292276 0.0008425

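All three pairwise differences are significant at the 0.05 level. Group2 versus Group1 differs by 2.42 points (adjusted p-value 0.0454940), Group3 versus Group1 by 6.33 points (adjusted p-value 0.0000006), and Group3 versus Group2 by 3.92 points (adjusted p-value 0.0008425). The Group2 versus Group1 difference is only marginally significant, since its confidence interval (0.04 to 4.79) barely excludes zero. Because lower scores are better, Medication 1 (Group1) had the lowest mean PDSC score, followed by Medication 2 (Group2), with the placebo (Group3) highest.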
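
With equal group sizes, the Tukey intervals all share one half-width: the studentized-range critical value times sqrt(MSE / n). The sketch below reproduces that half-width and the adjusted p-values from the raw scores using qtukey() and ptukey() from base R; the variable names are illustrative.

```r
groups <- list(
  Group1 = c(12, 10, 11, 14, 15, 9, 11, 12, 13, 10, 15, 10),
  Group2 = c(12, 13, 17, 11, 16, 13, 12, 14, 17, 12, 16, 18),
  Group3 = c(14, 21, 17, 16, 17, 22, 16, 22, 19, 20, 18, 16)
)

k     <- length(groups)                 # number of groups
n     <- 12                             # patients per group
df_w  <- k * n - k                      # residual degrees of freedom
means <- sapply(groups, mean)
mse   <- sum(sapply(groups, function(g) sum((g - mean(g))^2))) / df_w

se_q <- sqrt(mse / n)                           # SE on the studentized-range scale
hsd  <- qtukey(0.95, nmeans = k, df = df_w) * se_q  # common CI half-width

diffs <- c(
  "Group2-Group1" = means[["Group2"]] - means[["Group1"]],
  "Group3-Group1" = means[["Group3"]] - means[["Group1"]],
  "Group3-Group2" = means[["Group3"]] - means[["Group2"]]
)
p_adj <- ptukey(abs(diffs) / se_q, nmeans = k, df = df_w, lower.tail = FALSE)

# Columns correspond to diff, lwr, upr, and p adj in the TukeyHSD() output
cbind(diff = diffs, lwr = diffs - hsd, upr = diffs + hsd, p.adj = p_adj)
```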

Experimental Design (STAT 55) Midterm Exam

Kyle Kenneth Ruaya

2022-11-27
