LONG TEST 2. PART II.

library(readr)
library(BSDA)
## Loading required package: lattice
## 
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
## 
##     Orange
library(pander)
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v dplyr   1.0.2
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v purrr   0.3.4
## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggpubr)
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter

PROBLEM 1

Seven manufacturing companies agreed to implement a time management program in hopes of improving productivity. The average times, in minutes, it took the companies to produce the same quantity and kinds of part are listed on the table below. Does this information indicate that the program decreased the production time? Use a 0.05 level of significance and assume normal population distributions.

Problem: Is the mean production time of companies before the time management program greater than the mean production time after the program?

Definition of the parameters: \(\mu\)1: the mean production time (in minutes) of the companies before the program; \(\mu\)2: the mean production time (in minutes) of the companies after the program

Null Hypothesis, H0: The mean production time of the companies before the time management program is greater than or equal to the mean production time after the time management program. (\(\mu\)1 \(\ge\) \(\mu\)2)

Alternative Hypothesis, Ha: The mean production time of the companies before the time management program is lesser than the mean production time after the program. (\(\mu\)1 < \(\mu\)2)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: t-statistic

before<-c(75,112,89,95,80,105,110)
after<-c(70,110,88,100,80,100,99)
pander(t.test(before,after,paired=TRUE,alternative="less",mu=0))
Paired t-test: before and after
Test statistic df P value Alternative hypothesis mean of the differences
1.439 6 0.8999 less 2.714

Decision: Do not reject H0.

CONCLUSION: There is sufficient data to show that the average time of production of the companies before the time management program is greater than or equal to the average time of production after the time management program. Hence, the time management program has significantly decreased production time and, thus, improved productivity.

PROBLEM 2

The monthly returns in percentage of pesos of two investment portfolios were recorded for one year. Perform a hypothesis test at the 0.05 significance level to determine if there is sufficient evidence showing that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.

Problem: Is there significant difference in the mean monthly percentage returns between the two investment portfolios?

Definition of the parameters: \(\mu\)1: the mean monthly percentage returns (in percentage) of the investment portfolio 1; \(\mu\)2: the mean monthly percentage returns (in percentage) of the investment portfolio 2

Null Hypothesis, H0: There is no significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)1 = \(\mu\)2)

Alternative Hypothesis, Ha: There is a significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)1 \(\neq\) \(\mu\)2)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: t-statistic

Test for Equality of Variance

portfolio1<-c(2.1,1.2,-1.5,1.9,0.7,2.5,3.0,-2.2,1.8,0.5,2.0,1.5)
portfolio2<-c(2.9,3.5,-2.8,1.0,-3.0,2.6,-3.5,4.5,1.5,2.3,-1.0,0.8)
pander(var.test(portfolio1,portfolio2))
F test to compare two variances: portfolio1 and portfolio2 (continued below)
Test statistic num df denom df P value Alternative hypothesis
0.3335 11 11 0.082 two.sided
ratio of variances
0.3335

Since the p-value is greater than the significance level of 0.05, the null is not rejected which indicates that the population variances are equal.

pander(t.test(portfolio1, portfolio2, alternative="two.sided",mu=0,var.equal=TRUE))
Two Sample t-test: portfolio1 and portfolio2
Test statistic df P value Alternative hypothesis mean of x mean of y
0.4344 22 0.6683 two.sided 1.125 0.7333

Decision: Do not reject H0.

CONCLUSION: There is sufficient evidence to show that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.

PROBLEM 3

The following data represents the semi-monthly salary of the faculty (in thousands) of four state universities. Faculties were randomly selected from each school. At the 5% level of significance, is there a significant difference among the salary of the faculties of the four state universities?

Problem: Is there a significant difference among the salary of the faculties of the four state universities?

Definition of the parameters: \(\mu\)1: the mean semi-monthly salary of the faculty of University A; \(\mu\)2: the mean semi-monthly salary of the faculty of University B; \(\mu\)3: the mean semi-monthly salary of the faculty of University c; \(\mu\)4: the mean semi-monthly salary of the faculty of University D

Null Hypothesis, H0: There is no significant difference in the mean semi-monthly salary of the faculties in four state universities. (\(\mu\)1 = \(\mu\)2 = \(\mu\)3 = \(\mu\)4)

Alternative Hypothesis, Ha: There are at least two state universities that differ in the mean semi-monthly salary of their faculty. (\(\mu\)1 \(\neq\) \(\mu\)2 \(\neq\) \(\mu\)3 \(\neq\) \(\mu\)4)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: F-statistic

faculty<-read.csv("faculty.csv")
head(faculty)
##   university salary
## 1          a     15
## 2          a     20
## 3          a     16
## 4          a     13
## 5          a     17
## 6          b     12
levels(faculty$university)
## [1] "a" "b" "c" "d"

1. Checking for outliers in the data.

ggboxplot(faculty, x="university", y="salary")

faculty%>%group_by(university)%>%identify_outliers(salary)
## [1] university salary     is.outlier is.extreme
## <0 rows> (or 0-length row.names)

There are no outliers shown in the boxplot, therefore, the test shows that there are also no outliers.

2. Checking the normality assumption of residuals.

anovamod<-lm(salary~university,data=faculty)
ggqqplot(residuals(anovamod))

The generated QQ plot shows that the distribution of residuals is approximately normal.

shapiro_test(residuals(anovamod))
## # A tibble: 1 x 3
##   variable            statistic p.value
##   <chr>                   <dbl>   <dbl>
## 1 residuals(anovamod)     0.965   0.704

The Shapiro-Wilk test statistic shows a p-value of 0.7037 which is greater than the 0.05 significance level. Thus, the decision is to not reject the null hypothesis for the test which is “the data is approximately normal.”

3. Checking the homogeneity assumption of variances.

faculty%>%levene_test(salary~university)
## # A tibble: 1 x 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    14      1.24 0.333

From the output, the p-value shown which is 0.3330 is greater than 0.05. This indicates to fail to reject the null hypothesis for the test which is "the variances across groups are homogeneous or equal.

ANOVA Table

salary.anova<-aov(salary~university,faculty)
anova(salary.anova)
## Analysis of Variance Table
## 
## Response: salary
##            Df Sum Sq Mean Sq F value  Pr(>F)  
## university  3 136.19  45.398  2.9055 0.07184 .
## Residuals  14 218.75  15.625                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA table, it can be seen that at least three of the universities significantly differ in terms of the average semi-monthly salary of their faculty (0.07184).

Post-Hoc Analysis

pairwisecomp<-faculty%>%tukey_hsd(salary~university)
pander(pairwisecomp)
Table continues below
term group1 group2 null.value estimate conf.low conf.high
university a b 0 -1.45 -9.157 6.257
university a c 0 5.2 -2.066 12.47
university a d 0 -1.2 -8.907 6.507
university b c 0 6.65 -1.057 14.36
university b d 0 0.25 -7.874 8.374
university c d 0 -6.4 -14.11 1.307
p.adj p.adj.signif
0.946 ns
0.207 ns
0.968 ns
0.102 ns
1 ns
0.12 ns

Decision: Do not reject H0.

CONCLUSION: Based on the sample and from the post-hoc analysis, there is no significant difference among the semi-monthly salary of the faculties of the four state universities because all the computed p-values for these comparisons are greater than 0.05.