LONG TEST 2. PART II.

library(readr)
library(BSDA)

## Loading required package: lattice

## 
## Attaching package: 'BSDA'

## The following object is masked from 'package:datasets':
## 
##     Orange

library(pander)
library(tidyverse)

## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v dplyr   1.0.2
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v purrr   0.3.4

## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggpubr)
library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

PROBLEM 1

Seven manufacturing companies agreed to implement a time management program in hopes of improving productivity. The average times, in minutes, it took the companies to produce the same quantity and kinds of part are listed on the table below. Does this information indicate that the program decreased the production time? Use a 0.05 level of significance and assume normal population distributions.

Problem: Is the mean production time of companies before the time management program greater than the mean production time after the program?

Definition of the parameters: \(\mu\)₁: the mean production time (in minutes) of the companies before the program; \(\mu\)₂: the mean production time (in minutes) of the companies after the program

Null Hypothesis, H₀: The mean production time of the companies before the time management program is greater than or equal to the mean production time after the time management program. (\(\mu\)₁ \(\ge\) \(\mu\)₂)

Alternative Hypothesis, H_a: The mean production time of the companies before the time management program is lesser than the mean production time after the program. (\(\mu\)₁ < \(\mu\)₂)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: t-statistic

before<-c(75,112,89,95,80,105,110)
after<-c(70,110,88,100,80,100,99)
pander(t.test(before,after,paired=TRUE,alternative="less",mu=0))

Paired t-test: `before` and `after`
Test statistic	df	P value	Alternative hypothesis	mean of the differences
1.439	6	0.8999	less	2.714

Decision: Do not reject H₀.

CONCLUSION: There is sufficient data to show that the average time of production of the companies before the time management program is greater than or equal to the average time of production after the time management program. Hence, the time management program has significantly decreased production time and, thus, improved productivity.

PROBLEM 2

The monthly returns in percentage of pesos of two investment portfolios were recorded for one year. Perform a hypothesis test at the 0.05 significance level to determine if there is sufficient evidence showing that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.

Problem: Is there significant difference in the mean monthly percentage returns between the two investment portfolios?

Definition of the parameters: \(\mu\)₁: the mean monthly percentage returns (in percentage) of the investment portfolio 1; \(\mu\)₂: the mean monthly percentage returns (in percentage) of the investment portfolio 2

Null Hypothesis, H₀: There is no significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)₁ = \(\mu\)₂)

Alternative Hypothesis, H_a: There is a significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)₁ \(\neq\) \(\mu\)₂)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: t-statistic

Test for Equality of Variance

portfolio1<-c(2.1,1.2,-1.5,1.9,0.7,2.5,3.0,-2.2,1.8,0.5,2.0,1.5)
portfolio2<-c(2.9,3.5,-2.8,1.0,-3.0,2.6,-3.5,4.5,1.5,2.3,-1.0,0.8)
pander(var.test(portfolio1,portfolio2))

F test to compare two variances: `portfolio1` and `portfolio2` (continued below)
Test statistic	num df	denom df	P value	Alternative hypothesis
0.3335	11	11	0.082	two.sided

ratio of variances
0.3335

Since the p-value is greater than the significance level of 0.05, the null is not rejected which indicates that the population variances are equal.

pander(t.test(portfolio1, portfolio2, alternative="two.sided",mu=0,var.equal=TRUE))

Two Sample t-test: `portfolio1` and `portfolio2`
Test statistic	df	P value	Alternative hypothesis	mean of x	mean of y
0.4344	22	0.6683	two.sided	1.125	0.7333

Decision: Do not reject H₀.

CONCLUSION: There is sufficient evidence to show that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.

PROBLEM 3

The following data represents the semi-monthly salary of the faculty (in thousands) of four state universities. Faculties were randomly selected from each school. At the 5% level of significance, is there a significant difference among the salary of the faculties of the four state universities?

Problem: Is there a significant difference among the salary of the faculties of the four state universities?

Definition of the parameters: \(\mu\)₁: the mean semi-monthly salary of the faculty of University A; \(\mu\)₂: the mean semi-monthly salary of the faculty of University B; \(\mu\)₃: the mean semi-monthly salary of the faculty of University c; \(\mu\)₄: the mean semi-monthly salary of the faculty of University D

Null Hypothesis, H₀: There is no significant difference in the mean semi-monthly salary of the faculties in four state universities. (\(\mu\)₁ = \(\mu\)₂ = \(\mu\)₃ = \(\mu\)₄)

Alternative Hypothesis, H_a: There are at least two state universities that differ in the mean semi-monthly salary of their faculty. (\(\mu\)₁ \(\neq\) \(\mu\)₂ \(\neq\) \(\mu\)₃ \(\neq\) \(\mu\)₄)

Level of Significance: \(\alpha\) = 0.05

Test Statistic: F-statistic

faculty<-read.csv("faculty.csv")
head(faculty)

##   university salary
## 1          a     15
## 2          a     20
## 3          a     16
## 4          a     13
## 5          a     17
## 6          b     12

levels(faculty$university)

## [1] "a" "b" "c" "d"

1. Checking for outliers in the data.

ggboxplot(faculty, x="university", y="salary")

faculty%>%group_by(university)%>%identify_outliers(salary)

## [1] university salary     is.outlier is.extreme
## <0 rows> (or 0-length row.names)

There are no outliers shown in the boxplot, therefore, the test shows that there are also no outliers.

2. Checking the normality assumption of residuals.

anovamod<-lm(salary~university,data=faculty)
ggqqplot(residuals(anovamod))

The generated QQ plot shows that the distribution of residuals is approximately normal.

shapiro_test(residuals(anovamod))

## # A tibble: 1 x 3
##   variable            statistic p.value
##   <chr>                   <dbl>   <dbl>
## 1 residuals(anovamod)     0.965   0.704

The Shapiro-Wilk test statistic shows a p-value of 0.7037 which is greater than the 0.05 significance level. Thus, the decision is to not reject the null hypothesis for the test which is “the data is approximately normal.”

3. Checking the homogeneity assumption of variances.

faculty%>%levene_test(salary~university)

## # A tibble: 1 x 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    14      1.24 0.333

From the output, the p-value shown which is 0.3330 is greater than 0.05. This indicates to fail to reject the null hypothesis for the test which is "the variances across groups are homogeneous or equal.

ANOVA Table

salary.anova<-aov(salary~university,faculty)
anova(salary.anova)

## Analysis of Variance Table
## 
## Response: salary
##            Df Sum Sq Mean Sq F value  Pr(>F)  
## university  3 136.19  45.398  2.9055 0.07184 .
## Residuals  14 218.75  15.625                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA table, it can be seen that at least three of the universities significantly differ in terms of the average semi-monthly salary of their faculty (0.07184).

Post-Hoc Analysis

pairwisecomp<-faculty%>%tukey_hsd(salary~university)
pander(pairwisecomp)

Table continues below
term	group1	group2	estimate	conf.low	conf.high
university	a	b	-1.45	-9.157	6.257
university	a	c	5.2	-2.066	12.47
university	a	d	-1.2	-8.907	6.507
university	b	c	6.65	-1.057	14.36
university	b	d	0.25	-7.874	8.374
university	c	d	-6.4	-14.11	1.307

p.adj	p.adj.signif
0.946	ns
0.207	ns
0.968	ns
0.102	ns
1	ns
0.12	ns

Decision: Do not reject H₀.

CONCLUSION: Based on the sample and from the post-hoc analysis, there is no significant difference among the semi-monthly salary of the faculties of the four state universities because all the computed p-values for these comparisons are greater than 0.05.

LongTest3Garna

2194168 GARNA, JAN KATHLENE RAY L.

1769 MWF 5:00-6:00

LONG TEST 2. PART II.

PROBLEM 1

PROBLEM 2

PROBLEM 3

1. Checking for outliers in the data.

2. Checking the normality assumption of residuals.

3. Checking the homogeneity assumption of variances.

ANOVA Table

Post-Hoc Analysis