library(readr)
library(BSDA)
## Loading required package: lattice
##
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##
## Orange
library(pander)
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v dplyr 1.0.2
## v tibble 3.0.3 v stringr 1.4.0
## v tidyr 1.1.2 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggpubr)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
Seven manufacturing companies agreed to implement a time management program in hopes of improving productivity. The average times, in minutes, it took the companies to produce the same quantity and kinds of part are listed on the table below. Does this information indicate that the program decreased the production time? Use a 0.05 level of significance and assume normal population distributions.
Problem: Is the mean production time of companies before the time management program greater than the mean production time after the program?
Definition of the parameters: \(\mu\)1: the mean production time (in minutes) of the companies before the program; \(\mu\)2: the mean production time (in minutes) of the companies after the program
Null Hypothesis, H0: The mean production time of the companies before the time management program is greater than or equal to the mean production time after the time management program. (\(\mu\)1 \(\ge\) \(\mu\)2)
Alternative Hypothesis, Ha: The mean production time of the companies before the time management program is lesser than the mean production time after the program. (\(\mu\)1 < \(\mu\)2)
Level of Significance: \(\alpha\) = 0.05
Test Statistic: t-statistic
before<-c(75,112,89,95,80,105,110)
after<-c(70,110,88,100,80,100,99)
pander(t.test(before,after,paired=TRUE,alternative="less",mu=0))
| Test statistic | df | P value | Alternative hypothesis | mean of the differences |
|---|---|---|---|---|
| 1.439 | 6 | 0.8999 | less | 2.714 |
Decision: Do not reject H0.
CONCLUSION: There is sufficient data to show that the average time of production of the companies before the time management program is greater than or equal to the average time of production after the time management program. Hence, the time management program has significantly decreased production time and, thus, improved productivity.
The monthly returns in percentage of pesos of two investment portfolios were recorded for one year. Perform a hypothesis test at the 0.05 significance level to determine if there is sufficient evidence showing that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.
Problem: Is there significant difference in the mean monthly percentage returns between the two investment portfolios?
Definition of the parameters: \(\mu\)1: the mean monthly percentage returns (in percentage) of the investment portfolio 1; \(\mu\)2: the mean monthly percentage returns (in percentage) of the investment portfolio 2
Null Hypothesis, H0: There is no significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)1 = \(\mu\)2)
Alternative Hypothesis, Ha: There is a significant difference in the mean monthly percentage returns between the two investment portfolios. (\(\mu\)1 \(\neq\) \(\mu\)2)
Level of Significance: \(\alpha\) = 0.05
Test Statistic: t-statistic
Test for Equality of Variance
portfolio1<-c(2.1,1.2,-1.5,1.9,0.7,2.5,3.0,-2.2,1.8,0.5,2.0,1.5)
portfolio2<-c(2.9,3.5,-2.8,1.0,-3.0,2.6,-3.5,4.5,1.5,2.3,-1.0,0.8)
pander(var.test(portfolio1,portfolio2))
| Test statistic | num df | denom df | P value | Alternative hypothesis |
|---|---|---|---|---|
| 0.3335 | 11 | 11 | 0.082 | two.sided |
| ratio of variances |
|---|
| 0.3335 |
Since the p-value is greater than the significance level of 0.05, the null is not rejected which indicates that the population variances are equal.
pander(t.test(portfolio1, portfolio2, alternative="two.sided",mu=0,var.equal=TRUE))
| Test statistic | df | P value | Alternative hypothesis | mean of x | mean of y |
|---|---|---|---|---|---|
| 0.4344 | 22 | 0.6683 | two.sided | 1.125 | 0.7333 |
Decision: Do not reject H0.
CONCLUSION: There is sufficient evidence to show that there is no significant difference in the mean monthly percentage returns between the two investment portfolios.
The following data represents the semi-monthly salary of the faculty (in thousands) of four state universities. Faculties were randomly selected from each school. At the 5% level of significance, is there a significant difference among the salary of the faculties of the four state universities?
Problem: Is there a significant difference among the salary of the faculties of the four state universities?
Definition of the parameters: \(\mu\)1: the mean semi-monthly salary of the faculty of University A; \(\mu\)2: the mean semi-monthly salary of the faculty of University B; \(\mu\)3: the mean semi-monthly salary of the faculty of University c; \(\mu\)4: the mean semi-monthly salary of the faculty of University D
Null Hypothesis, H0: There is no significant difference in the mean semi-monthly salary of the faculties in four state universities. (\(\mu\)1 = \(\mu\)2 = \(\mu\)3 = \(\mu\)4)
Alternative Hypothesis, Ha: There are at least two state universities that differ in the mean semi-monthly salary of their faculty. (\(\mu\)1 \(\neq\) \(\mu\)2 \(\neq\) \(\mu\)3 \(\neq\) \(\mu\)4)
Level of Significance: \(\alpha\) = 0.05
Test Statistic: F-statistic
faculty<-read.csv("faculty.csv")
head(faculty)
## university salary
## 1 a 15
## 2 a 20
## 3 a 16
## 4 a 13
## 5 a 17
## 6 b 12
levels(faculty$university)
## [1] "a" "b" "c" "d"
ggboxplot(faculty, x="university", y="salary")
faculty%>%group_by(university)%>%identify_outliers(salary)
## [1] university salary is.outlier is.extreme
## <0 rows> (or 0-length row.names)
There are no outliers shown in the boxplot, therefore, the test shows that there are also no outliers.
anovamod<-lm(salary~university,data=faculty)
ggqqplot(residuals(anovamod))
The generated QQ plot shows that the distribution of residuals is approximately normal.
shapiro_test(residuals(anovamod))
## # A tibble: 1 x 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(anovamod) 0.965 0.704
The Shapiro-Wilk test statistic shows a p-value of 0.7037 which is greater than the 0.05 significance level. Thus, the decision is to not reject the null hypothesis for the test which is “the data is approximately normal.”
faculty%>%levene_test(salary~university)
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 14 1.24 0.333
From the output, the p-value shown which is 0.3330 is greater than 0.05. This indicates to fail to reject the null hypothesis for the test which is "the variances across groups are homogeneous or equal.
salary.anova<-aov(salary~university,faculty)
anova(salary.anova)
## Analysis of Variance Table
##
## Response: salary
## Df Sum Sq Mean Sq F value Pr(>F)
## university 3 136.19 45.398 2.9055 0.07184 .
## Residuals 14 218.75 15.625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table, it can be seen that at least three of the universities significantly differ in terms of the average semi-monthly salary of their faculty (0.07184).
pairwisecomp<-faculty%>%tukey_hsd(salary~university)
pander(pairwisecomp)
| term | group1 | group2 | null.value | estimate | conf.low | conf.high |
|---|---|---|---|---|---|---|
| university | a | b | 0 | -1.45 | -9.157 | 6.257 |
| university | a | c | 0 | 5.2 | -2.066 | 12.47 |
| university | a | d | 0 | -1.2 | -8.907 | 6.507 |
| university | b | c | 0 | 6.65 | -1.057 | 14.36 |
| university | b | d | 0 | 0.25 | -7.874 | 8.374 |
| university | c | d | 0 | -6.4 | -14.11 | 1.307 |
| p.adj | p.adj.signif |
|---|---|
| 0.946 | ns |
| 0.207 | ns |
| 0.968 | ns |
| 0.102 | ns |
| 1 | ns |
| 0.12 | ns |
Decision: Do not reject H0.
CONCLUSION: Based on the sample and from the post-hoc analysis, there is no significant difference among the semi-monthly salary of the faculties of the four state universities because all the computed p-values for these comparisons are greater than 0.05.