library(readxl)
library(janitor)
## Warning: package 'janitor' was built under R version 3.6.3
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
stat325 <- read_xlsx("STAT 325 Assignment Data.xlsx")
names(stat325)
stat325 <- stat325 %>% janitor::clean_names()
head(stat325,10)
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
str(stat325)
1. Present appropriate summaries for the data
The data has both numeric and categorical variables. Therefore, we summarize the numeric and the categorical variables separately.
Numeric Variables
To do this, we can create a data frame containing numeric variables only.
numeric_vars <- stat325[,c(3,7:14)]
The first few observations of the numeric variables
head(numeric_vars)
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
To compute summary statistics for the numeric variables, we can use the describe() function from the psych package. There are other ways to do the same, but describe() is comprehensive and simple.
library(psych)
num_summaries<- describe(numeric_vars)
knitr::kable(apply(num_summaries,2,round,2))
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1 | 153 | 40.15 | 10.79 | 39.00 | 39.96 | 8.90 | 19.00 | 67.00 | 48.00 | 0.20 | -0.44 | 0.87 |
| number_of_assistants | 2 | 153 | 12.39 | 7.28 | 12.00 | 12.07 | 8.90 | 2.00 | 31.00 | 29.00 | 0.31 | -0.80 | 0.59 |
| experience | 3 | 153 | 10.75 | 5.96 | 10.00 | 10.60 | 7.41 | 1.00 | 25.00 | 24.00 | 0.20 | -0.83 | 0.48 |
| marketing_budget | 4 | 153 | 1426151.90 | 889186.57 | 1288260.00 | 1364300.65 | 986833.39 | 165920.00 | 4599160.00 | 4433240.00 | 0.61 | 0.01 | 71886.47 |
| appraisal_score | 5 | 153 | 80.64 | 8.36 | 81.10 | 80.69 | 10.82 | 66.20 | 94.20 | 28.00 | -0.05 | -1.24 | 0.68 |
| quarter_1 | 6 | 153 | 18961.60 | 4879.04 | 19080.62 | 18920.92 | 4786.87 | 8574.16 | 30068.04 | 21493.88 | 0.02 | -0.58 | 394.45 |
| quarter_2 | 7 | 153 | 20703.36 | 4880.24 | 20767.62 | 20640.39 | 4280.09 | 9144.08 | 33333.32 | 24189.24 | 0.11 | -0.21 | 394.54 |
| quarter_3 | 8 | 153 | 23034.45 | 4628.41 | 22772.40 | 23035.63 | 4565.07 | 10948.20 | 32534.36 | 21586.16 | -0.01 | -0.51 | 374.18 |
| quarter_4 | 9 | 153 | 23356.82 | 5035.83 | 23329.86 | 23194.62 | 5514.12 | 13058.68 | 35907.48 | 22848.80 | 0.28 | -0.50 | 407.12 |
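As one of the other ways mentioned above, the same kind of table can be sketched in base R with sapply(). The data frame `toy` below is made up for illustration and only stands in for `numeric_vars`; the set of statistics is a subset of what describe() reports.

```r
# Toy data frame standing in for numeric_vars (values are made up).
toy <- data.frame(
  age        = c(25, 31, 40, 52, 38),
  experience = c(2, 5, 12, 20, 9)
)

# One row per variable, one column per statistic.
base_summary <- t(sapply(toy, function(x) {
  c(n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
}))
round(base_summary, 2)
```

This avoids the extra package dependency, at the cost of writing out each statistic by hand.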
Categorical Variables
For the categorical variables, we can create frequency tables and contingency tables.
categorical <- stat325[,c("sex","marital_status","education_level","department")]
apply(categorical, 2, table)
## $sex
##
## Female Male
## 82 71
##
## $marital_status
##
## Divorced Married Single Widowed
## 32 68 30 23
##
## $education_level
##
## Bachelors Certificate Diploma Post graduate
## 39 37 37 40
##
## $department
##
## Agriculture Energy Financial services Manufacturing
## 33 15 27 21
## Mining Tourism
## 27 30
Contingency tables.
table(categorical$sex, categorical$department)
##
## Agriculture Energy Financial services Manufacturing Mining Tourism
## Female 19 10 15 12 11 15
## Male 14 5 12 9 16 15
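Counts are easier to compare across departments as proportions. The sketch below re-enters the counts printed in the contingency table above as a matrix (the object name `dept_sex` is an assumption for illustration) and uses prop.table() and addmargins() from base R.

```r
# Counts copied from the sex-by-department table printed above.
dept_sex <- matrix(
  c(19, 10, 15, 12, 11, 15,
    14,  5, 12,  9, 16, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Female", "Male"),
    department = c("Agriculture", "Energy", "Financial services",
                   "Manufacturing", "Mining", "Tourism")
  )
)

# Share of each sex within each department (each column sums to 1).
within_dept <- prop.table(dept_sex, margin = 2)
round(within_dept, 3)

# The same counts with row and column totals added.
addmargins(dept_sex)
```

With the real data, `prop.table(table(categorical$sex, categorical$department), margin = 2)` should give the same proportions directly.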
2. Test the hypothesis of equality of proportions of female and male salespersons in each department and the entire organization
Testing the equality of sex proportions in each department
attach(stat325)
table1 <- table(department,sex)
prop.test(table1)
##
## 6-sample test for equality of proportions without continuity
## correction
##
## data: table1
## X-squared = 3.3385, df = 5, p-value = 0.648
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3 prop 4 prop 5 prop 6
## 0.5757576 0.6666667 0.5555556 0.5714286 0.4074074 0.5000000
The p-value is higher than 0.05. We fail to reject the null hypothesis. There is no significant difference in the proportion of female salespersons across the six departments.
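The six-sample test compares the female proportion across departments. The question can also be read as asking whether the female and male proportions are equal (0.5 each) within every department. A minimal sketch of that reading, using the counts from the contingency table in question 1, runs one exact binomial test per department; the Holm adjustment is an added assumption, included because six tests are run together.

```r
# Female and male counts per department, copied from the contingency table.
female <- c(Agriculture = 19, Energy = 10, `Financial services` = 15,
            Manufacturing = 12, Mining = 11, Tourism = 15)
male   <- c(14, 5, 12, 9, 16, 15)

# Exact binomial test of H0: P(female) = 0.5, one test per department.
p_values <- mapply(
  function(f, m) binom.test(f, f + m, p = 0.5)$p.value,
  female, male
)

# Holm adjustment for running six tests at once.
p_holm <- p.adjust(p_values, method = "holm")
round(cbind(raw = p_values, holm = p_holm), 3)
```

binom.test() is used instead of prop.test() here because some departments are small (Energy has 15 salespersons), where the normal approximation is weaker.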
Testing the equality of sex proportions in the entire organization
table2 <- table(sex)
prop.test(table2)
##
## 1-sample proportions test with continuity correction
##
## data: table2, null probability 0.5
## X-squared = 0.65359, df = 1, p-value = 0.4188
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.4537919 0.6162707
## sample estimates:
## p
## 0.5359477
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference between the proportions of male and female salespersons in the entire organization.
3. Is there a significant relationship between marital status and education level?
table3 <-table(stat325$marital_status, stat325$education_level)
chisq.test(table3)
##
## Pearson's Chi-squared test
##
## data: table3
## X-squared = 19.236, df = 9, p-value = 0.02326
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant relationship between marital status and education level.
4. Is there a significant relationship between education level and department?
Edu_Dep<-table(stat325$education_level, stat325$department)
chisq.test(Edu_Dep)
## Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: Edu_Dep
## X-squared = 20.611, df = 15, p-value = 0.1497
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant relationship between education level and department.
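The warning "Chi-squared approximation may be incorrect" is raised when some expected cell counts fall below 5. Expected counts depend only on the margins, so they can be reproduced from the frequency tables in question 1. The second half of the sketch shows a simulated p-value as one possible alternative; `toy_tab` is a made-up table that merely shares those margins, not the assignment data.

```r
# Margins copied from the frequency tables in question 1.
edu  <- c(Bachelors = 39, Certificate = 37, Diploma = 37, `Post graduate` = 40)
dept <- c(Agriculture = 33, Energy = 15, `Financial services` = 27,
          Manufacturing = 21, Mining = 27, Tourism = 30)

# Expected counts under independence: row total * column total / n.
expected <- outer(edu, dept) / sum(edu)
round(expected, 2)
low_cells <- sum(expected < 5)   # number of cells that trigger the warning
low_cells

# Made-up cell counts with the same margins (NOT the real cross-tabulation).
toy_tab <- matrix(c(9, 3, 7, 5, 8, 7,
                    8, 4, 6, 6, 5, 8,
                    7, 2, 8, 4, 9, 7,
                    9, 6, 6, 6, 5, 8),
                  nrow = 4, byrow = TRUE)

set.seed(325)  # makes the simulated p-value reproducible
sim <- chisq.test(toy_tab, simulate.p.value = TRUE, B = 10000)
sim
```

On the real data the equivalent call would be `chisq.test(Edu_Dep, simulate.p.value = TRUE)`, which does not rely on the large-sample approximation.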
5. Compare the mean quarterly and annual sales by sex, age, marital status, education level and department. For age categorize young employees as those aged less than 35 years and old employees otherwise
# Data preprocessing
# Annual sales
stat325$annual_sales <- quarter_1+quarter_2+quarter_3+quarter_4
# Agegroup
stat325$age_group <- ifelse(age<35,"Young","Old")
attach(stat325)
## The following objects are masked from stat325 (pos = 3):
##
## age, appraisal_score, department, education_level, experience,
## marital_status, marketing_budget, number_of_assistants,
## personel_number, quarter_1, quarter_2, quarter_3, quarter_4, sex
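The masking messages above come from attaching the same data frame more than once. As a sketch of an alternative, the derived columns can be built with explicit references and the tests given a `data =` argument, so attach() is not needed. The data frame `toy` is made up for illustration.

```r
# Made-up data standing in for stat325.
toy <- data.frame(
  age       = c(24, 29, 33, 36, 41, 47, 52, 58),
  quarter_1 = c(150, 162, 148, 195, 201, 210, 198, 224),
  quarter_2 = c(170, 175, 160, 205, 220, 215, 230, 241),
  quarter_3 = c(190, 200, 185, 230, 240, 236, 251, 260),
  quarter_4 = c(195, 198, 190, 236, 238, 245, 249, 266)
)

# Annual sales as the sum of the four quarters.
toy$annual_sales <- with(toy, quarter_1 + quarter_2 + quarter_3 + quarter_4)

# Young = aged less than 35 years, Old otherwise.
toy$age_group <- ifelse(toy$age < 35, "Young", "Old")

# The formula interface looks the variables up in `data`, no attach() needed.
res <- t.test(quarter_1 ~ age_group, data = toy)
res
```

The same `data =` argument is accepted by aov(), wilcox.test() and kruskal.test() when a formula is used.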
Quarter 1 and sex
t.test(quarter_1~sex)
##
## Welch Two Sample t-test
##
## data: quarter_1 by sex
## t = -1.329, df = 142.69, p-value = 0.186
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2626.2673 514.5749
## sample estimates:
## mean in group Female mean in group Male
## 18471.64 19527.48
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 1 sales between male and female salespersons.
Quarter 1 and age group
t.test(quarter_1~age_group)
##
## Welch Two Sample t-test
##
## data: quarter_1 by age_group
## t = 7.241, df = 98.331, p-value = 1.001e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3736.412 6557.399
## sample estimates:
## mean in group Old mean in group Young
## 20509.04 15362.13
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 1 sales between young and old salespersons.
Quarter 1 and marital status
q1_mstatus <- aov(quarter_1~marital_status)
summary(q1_mstatus)
## Df Sum Sq Mean Sq F value Pr(>F)
## marital_status 3 5.074e+07 16914948 0.706 0.55
## Residuals 149 3.568e+09 23943771
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 1 sales across marital status groups.
Quarter 1 and education level
q1_edu <- aov(quarter_1~education_level)
summary(q1_edu)
## Df Sum Sq Mean Sq F value Pr(>F)
## education_level 3 3.143e+08 104762883 4.724 0.00353 **
## Residuals 149 3.304e+09 22175021
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 1 sales across education levels.
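A significant ANOVA only says that at least one group mean differs; it does not say which. One common follow-up is Tukey's honest significant difference, sketched below on simulated data. The data frame `toy` and its group means are assumptions for illustration, not the assignment data.

```r
set.seed(1)  # reproducible simulated data

# Simulated sales for four education levels, ten salespersons each.
toy <- data.frame(
  education_level = factor(rep(
    c("Bachelors", "Certificate", "Diploma", "Post graduate"), each = 10
  )),
  quarter_1 = rnorm(40,
                    mean = rep(c(20000, 17000, 18000, 21000), each = 10),
                    sd = 3000)
)

fit <- aov(quarter_1 ~ education_level, data = toy)

# All six pairwise comparisons with family-wise adjusted p-values.
tukey <- TukeyHSD(fit)
tukey
```

On the real data the call would be `TukeyHSD(q1_edu)`, provided education_level is stored as a factor.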
Quarter 1 and department
q1_dept <- aov(quarter_1~department)
summary(q1_dept)
## Df Sum Sq Mean Sq F value Pr(>F)
## department 5 1.006e+08 20122778 0.841 0.523
## Residuals 147 3.518e+09 23930292
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 1 sales across departments.
Quarter 2 and sex
t.test(quarter_2~sex)
##
## Welch Two Sample t-test
##
## data: quarter_2 by sex
## t = -1.4582, df = 146.09, p-value = 0.1469
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2714.6023 409.5354
## sample estimates:
## mean in group Female mean in group Male
## 20168.53 21321.06
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 2 sales between male and female salespersons.
Quarter 2 and age group
t.test(quarter_2~age_group)
##
## Welch Two Sample t-test
##
## data: quarter_2 by age_group
## t = 6.2731, df = 96.79, p-value = 9.929e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3159.053 6083.273
## sample estimates:
## mean in group Old mean in group Young
## 22092.73 17471.57
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 2 sales between young and old salespersons.
Quarter 2 and marital status
q2_mstatus <- aov(quarter_2~marital_status)
summary(q2_mstatus)
## Df Sum Sq Mean Sq F value Pr(>F)
## marital_status 3 1.643e+08 54753924 2.361 0.0738 .
## Residuals 149 3.456e+09 23193850
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 2 sales across marital status groups.
Quarter 2 and education level
q2_edu <- aov(quarter_2~education_level)
summary(q2_edu)
## Df Sum Sq Mean Sq F value Pr(>F)
## education_level 3 2.177e+08 72572857 3.178 0.0259 *
## Residuals 149 3.402e+09 22835080
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 2 sales across education levels.
Quarter 2 and department
q2_dept <- aov(quarter_2~department)
summary(q2_dept)
## Df Sum Sq Mean Sq F value Pr(>F)
## department 5 2.131e+08 42624441 1.839 0.109
## Residuals 147 3.407e+09 23177029
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 2 sales across departments.
Quarter 3 and sex
t.test(quarter_3~sex)
##
## Welch Two Sample t-test
##
## data: quarter_3 by sex
## t = -2.0617, df = 144, p-value = 0.04104
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3014.10777 -63.50156
## sample estimates:
## mean in group Female mean in group Male
## 22320.37 23859.17
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 3 sales between male and female salespersons.
Quarter 3 and age group
t.test(quarter_3~age_group)
##
## Welch Two Sample t-test
##
## data: quarter_3 by age_group
## t = 5.3694, df = 99.238, p-value = 5.206e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2406.514 5227.490
## sample estimates:
## mean in group Old mean in group Young
## 24182.05 20365.05
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 3 sales between young and old salespersons.
Quarter 3 and marital status
q3_mstatus <- aov(quarter_3~marital_status)
summary(q3_mstatus)
## Df Sum Sq Mean Sq F value Pr(>F)
## marital_status 3 2.030e+08 67661315 3.302 0.0221 *
## Residuals 149 3.053e+09 20491172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 3 sales across marital status groups.
Quarter 3 and education level
q3_edu <- aov(quarter_3~education_level)
summary(q3_edu)
## Df Sum Sq Mean Sq F value Pr(>F)
## education_level 3 2.769e+08 92287639 4.615 0.00406 **
## Residuals 149 2.979e+09 19995340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 3 sales across education levels.
Quarter 3 and department
q3_dept <- aov(quarter_3~department)
summary(q3_dept)
## Df Sum Sq Mean Sq F value Pr(>F)
## department 5 1.172e+08 23449551 1.098 0.364
## Residuals 147 3.139e+09 21353203
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 3 sales across departments.
Quarter 4 and sex
t.test(quarter_4~sex)
##
## Welch Two Sample t-test
##
## data: quarter_4 by sex
## t = -0.95341, df = 143.38, p-value = 0.342
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2407.5750 840.7809
## sample estimates:
## mean in group Female mean in group Male
## 22993.28 23776.68
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 4 sales between male and female salespersons.
Quarter 4 and age group
t.test(quarter_4~age_group)
##
## Welch Two Sample t-test
##
## data: quarter_4 by age_group
## t = 6.3998, df = 115.51, p-value = 3.455e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3158.437 5989.766
## sample estimates:
## mean in group Old mean in group Young
## 24732.04 20157.94
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 4 sales between young and old salespersons.
Quarter 4 and marital status
q4_mstatus <- aov(quarter_4~marital_status)
summary(q4_mstatus)
## Df Sum Sq Mean Sq F value Pr(>F)
## marital_status 3 1.364e+08 45451395 1.821 0.146
## Residuals 149 3.718e+09 24955051
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 4 sales across marital status groups.
Quarter 4 and education level
q4_edu <- aov(quarter_4~education_level)
summary(q4_edu)
## Df Sum Sq Mean Sq F value Pr(>F)
## education_level 3 2.401e+08 80043657 3.3 0.0221 *
## Residuals 149 3.615e+09 24258562
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean quarter 4 sales across education levels.
Quarter 4 and department
q4_dept <- aov(quarter_4~department)
summary(q4_dept)
## Df Sum Sq Mean Sq F value Pr(>F)
## department 5 1.798e+08 35954460 1.438 0.214
## Residuals 147 3.675e+09 24999214
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean quarter 4 sales across departments.
Annual Sales and sex
t.test(annual_sales~sex)
##
## Welch Two Sample t-test
##
## data: annual_sales by sex
## t = -1.6724, df = 143.52, p-value = 0.09662
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9885.2905 824.1276
## sample estimates:
## mean in group Female mean in group Male
## 83953.81 88484.40
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean annual sales between male and female salespersons.
Annual Sales and age group
t.test(annual_sales~age_group)
##
## Welch Two Sample t-test
##
## data: annual_sales by age_group
## t = 8.0544, df = 116.3, p-value = 7.765e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 13693.86 22624.49
## sample estimates:
## mean in group Old mean in group Young
## 91515.86 73356.69
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean annual sales between young and old salespersons.
Annual Sales and marital status
an_mstatus <- aov(annual_sales~marital_status)
summary(an_mstatus)
## Df Sum Sq Mean Sq F value Pr(>F)
## marital_status 3 1.935e+09 645122373 2.373 0.0726 .
## Residuals 149 4.050e+10 271838511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean annual sales across marital status groups.
Annual Sales and education level
an_edu <- aov(annual_sales~education_level)
summary(an_edu)
## Df Sum Sq Mean Sq F value Pr(>F)
## education_level 3 3.798e+09 1.266e+09 4.881 0.00289 **
## Residuals 149 3.864e+10 2.593e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in mean annual sales across education levels.
Annual Sales and department
an_dept <- aov(annual_sales~department)
summary(an_dept)
## Df Sum Sq Mean Sq F value Pr(>F)
## department 5 1.901e+09 380205609 1.379 0.236
## Residuals 147 4.054e+10 275770593
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in mean annual sales across departments.
6. Is there a quarter where the sales were significantly different from others?
Quarter 1 and Quarter 2
t.test(quarter_1,quarter_2,paired = TRUE)
##
## Paired t-test
##
## data: quarter_1 and quarter_2
## t = -5.3236, df = 152, p-value = 3.591e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2388.165 -1095.351
## sample estimates:
## mean of the differences
## -1741.758
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference between mean quarter 1 and mean quarter 2 sales.
Quarter 1 and Quarter 3
t.test(quarter_1,quarter_3,paired = TRUE)
##
## Paired t-test
##
## data: quarter_1 and quarter_3
## t = -12.297, df = 152, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4727.220 -3418.479
## sample estimates:
## mean of the differences
## -4072.85
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference between mean quarter 1 and mean quarter 3 sales.
Quarter 1 and Quarter 4
t.test(quarter_1,quarter_4,paired = TRUE)
##
## Paired t-test
##
## data: quarter_1 and quarter_4
## t = -12.492, df = 152, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5090.333 -3700.098
## sample estimates:
## mean of the differences
## -4395.216
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference between mean quarter 1 and mean quarter 4 sales.
Quarter 2 and Quarter 3
t.test(quarter_2,quarter_3,paired = TRUE)
##
## Paired t-test
##
## data: quarter_2 and quarter_3
## t = -7.6167, df = 152, p-value = 2.581e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2935.752 -1726.431
## sample estimates:
## mean of the differences
## -2331.092
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference between mean quarter 2 and mean quarter 3 sales.
Quarter 2 and Quarter 4
t.test(quarter_2,quarter_4,paired = TRUE)
##
## Paired t-test
##
## data: quarter_2 and quarter_4
## t = -8.0137, df = 152, p-value = 2.726e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3307.638 -1999.277
## sample estimates:
## mean of the differences
## -2653.458
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference between mean quarter 2 and mean quarter 4 sales.
Quarter 3 and Quarter 4
t.test(quarter_3,quarter_4,paired = TRUE)
##
## Paired t-test
##
## data: quarter_3 and quarter_4
## t = -1.0227, df = 152, p-value = 0.3081
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -945.1123 300.3803
## sample estimates:
## mean of the differences
## -322.366
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference between mean quarter 3 and mean quarter 4 sales.
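Six pairwise tests are run on the same four quarters, so the chance of at least one false rejection is higher than 0.05. As a sketch, the reported p-values can be adjusted with p.adjust(). The two values printed as "< 2.2e-16" are entered as 2.2e-16, which is an upper bound and an assumption made only so the vector can be built.

```r
# p-values copied from the six paired t-tests above.
# 2.2e-16 stands in for the two results reported as "< 2.2e-16".
p_raw <- c(q1_q2 = 3.591e-07, q1_q3 = 2.2e-16, q1_q4 = 2.2e-16,
           q2_q3 = 2.581e-12, q2_q4 = 2.726e-13, q3_q4 = 0.3081)

# Holm adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")
signif(p_holm, 3)

# Comparisons that remain non-significant at the 0.05 level.
names(p_holm)[p_holm >= 0.05]
```

After adjustment the conclusions are unchanged: only quarter 3 versus quarter 4 is not significant, so no single quarter differs from all the others while resembling none of them.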
7. Repeat 5) but use the non-parametric approach.
Quarter 1 and sex
wilcox.test(quarter_1~sex)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_1 by sex
## W = 2481, p-value = 0.1161
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 1 sales between male and female salespersons.
Quarter 1 and age group
wilcox.test(quarter_1~age_group)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_1 by age_group
## W = 3993, p-value = 1.104e-09
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 1 sales between young and old salespersons.
Quarter 1 and marital status
stat325 <- transform(stat325,marital_status = as.factor(marital_status),education_level = as.factor(education_level),department = as.factor(department))
attach(stat325)
## The following objects are masked from stat325 (pos = 3):
##
## age, age_group, annual_sales, appraisal_score, department,
## education_level, experience, marital_status, marketing_budget,
## number_of_assistants, personel_number, quarter_1, quarter_2,
## quarter_3, quarter_4, sex
## The following objects are masked from stat325 (pos = 4):
##
## age, appraisal_score, department, education_level, experience,
## marital_status, marketing_budget, number_of_assistants,
## personel_number, quarter_1, quarter_2, quarter_3, quarter_4, sex
kruskal.test(quarter_1~marital_status)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_1 by marital_status
## Kruskal-Wallis chi-squared = 2.4976, df = 3, p-value = 0.4757
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 1 sales across marital status groups.
Quarter 1 and education level
kruskal.test(quarter_1~education_level)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_1 by education_level
## Kruskal-Wallis chi-squared = 14.312, df = 3, p-value = 0.00251
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 1 sales across education levels.
Quarter 1 and department
kruskal.test(quarter_1~department)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_1 by department
## Kruskal-Wallis chi-squared = 3.8982, df = 5, p-value = 0.5642
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 1 sales across departments.
Quarter 2 and sex
wilcox.test(quarter_2~sex)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_2 by sex
## W = 2573, p-value = 0.2169
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 2 sales between male and female salespersons.
Quarter 2 and age group
wilcox.test(quarter_2~age_group)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_2 by age_group
## W = 3802, p-value = 9.625e-08
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 2 sales between young and old salespersons.
Quarter 2 and marital status
kruskal.test(quarter_2~marital_status)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_2 by marital_status
## Kruskal-Wallis chi-squared = 6.8368, df = 3, p-value = 0.07728
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 2 sales across marital status groups.
Quarter 2 and education level
kruskal.test(quarter_2~education_level)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_2 by education_level
## Kruskal-Wallis chi-squared = 8.9715, df = 3, p-value = 0.02967
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 2 sales across education levels.
Quarter 2 and department
kruskal.test(quarter_2~department)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_2 by department
## Kruskal-Wallis chi-squared = 11.446, df = 5, p-value = 0.04322
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 2 sales across departments.
Quarter 3 and sex
wilcox.test(quarter_3~sex)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_3 by sex
## W = 2419, p-value = 0.07216
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 3 sales between male and female salespersons.
Quarter 3 and age group
wilcox.test(quarter_3~age_group)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_3 by age_group
## W = 3611, p-value = 4.792e-06
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 3 sales between young and old salespersons.
Quarter 3 and marital status
kruskal.test(quarter_3~marital_status)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_3 by marital_status
## Kruskal-Wallis chi-squared = 10.513, df = 3, p-value = 0.01468
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 3 sales across marital status groups.
Quarter 3 and education level
kruskal.test(quarter_3~education_level)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_3 by education_level
## Kruskal-Wallis chi-squared = 12.948, df = 3, p-value = 0.004751
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 3 sales across education levels.
Quarter 3 and department
kruskal.test(quarter_3~department)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_3 by department
## Kruskal-Wallis chi-squared = 6.247, df = 5, p-value = 0.2829
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 3 sales across departments.
Quarter 4 and sex
wilcox.test(quarter_4~sex)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_4 by sex
## W = 2691, p-value = 0.422
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 4 sales between male and female salespersons.
Quarter 4 and age group
wilcox.test(quarter_4~age_group)
##
## Wilcoxon rank sum test with continuity correction
##
## data: quarter_4 by age_group
## W = 3772, p-value = 1.845e-07
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 4 sales between young and old salespersons.
Quarter 4 and marital status
kruskal.test(quarter_4~marital_status)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_4 by marital_status
## Kruskal-Wallis chi-squared = 5.0574, df = 3, p-value = 0.1676
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 4 sales across marital status groups.
Quarter 4 and education level
kruskal.test(quarter_4~education_level)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_4 by education_level
## Kruskal-Wallis chi-squared = 10.02, df = 3, p-value = 0.01839
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of quarter 4 sales across education levels.
Quarter 4 and department
kruskal.test(quarter_4~department)
##
## Kruskal-Wallis rank sum test
##
## data: quarter_4 by department
## Kruskal-Wallis chi-squared = 7.731, df = 5, p-value = 0.1717
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of quarter 4 sales across departments.
Annual Sales and sex
wilcox.test(annual_sales~sex)
##
## Wilcoxon rank sum test with continuity correction
##
## data: annual_sales by sex
## W = 2489, p-value = 0.1231
## alternative hypothesis: true location shift is not equal to 0
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of annual sales between male and female salespersons.
Annual Sales and age group
wilcox.test(annual_sales~age_group)
##
## Wilcoxon rank sum test with continuity correction
##
## data: annual_sales by age_group
## W = 4042, p-value = 3.203e-10
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of annual sales between young and old salespersons.
Annual Sales and marital status
kruskal.test(annual_sales~marital_status)
##
## Kruskal-Wallis rank sum test
##
## data: annual_sales by marital_status
## Kruskal-Wallis chi-squared = 7.3164, df = 3, p-value = 0.06247
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of annual sales across marital status groups.
Annual Sales and education level
kruskal.test(annual_sales~education_level)
##
## Kruskal-Wallis rank sum test
##
## data: annual_sales by education_level
## Kruskal-Wallis chi-squared = 13.836, df = 3, p-value = 0.003138
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant difference in the distribution of annual sales across education levels.
Annual Sales and department
kruskal.test(annual_sales~department)
##
## Kruskal-Wallis rank sum test
##
## data: annual_sales by department
## Kruskal-Wallis chi-squared = 7.9549, df = 5, p-value = 0.1587
The p-value is greater than \[\alpha = 0.05\]. We fail to reject the null hypothesis. There is no significant difference in the distribution of annual sales across departments.
8. Repeat 6) but use the non-parametric approach
Quarter 1 and Quarter 2
wilcox.test(quarter_1,quarter_2,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_1 and quarter_2
## V = 3261, p-value = 1.678e-06
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant location shift between quarter 1 and quarter 2 sales.
Quarter 1 and Quarter 3
wilcox.test(quarter_1,quarter_3,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_1 and quarter_3
## V = 966, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\]. We reject the null hypothesis. There is a significant location shift between quarter 1 and quarter 3 sales.
Quarter 1 and Quarter 4
wilcox.test(quarter_1,quarter_4,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_1 and quarter_4
## V = 955, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\], so we reject the null hypothesis. There is a significant difference in location between quarter 1 and quarter 4 sales.
Quarter 2 and Quarter 3
wilcox.test(quarter_2,quarter_3,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_2 and quarter_3
## V = 2301.5, p-value = 6.299e-11
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\], so we reject the null hypothesis. There is a significant difference in location between quarter 2 and quarter 3 sales.
Quarter 2 and Quarter 4
wilcox.test(quarter_2,quarter_4,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_2 and quarter_4
## V = 2223.5, p-value = 2.413e-11
## alternative hypothesis: true location shift is not equal to 0
The p-value is less than \[\alpha = 0.05\], so we reject the null hypothesis. There is a significant difference in location between quarter 2 and quarter 4 sales.
Quarter 3 and Quarter 4
wilcox.test(quarter_3,quarter_4,paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: quarter_3 and quarter_4
## V = 5622.5, p-value = 0.6261
## alternative hypothesis: true location shift is not equal to 0
The p-value (0.6261) is greater than \[\alpha = 0.05\], so we fail to reject the null hypothesis. There is no significant difference in location between quarter 3 and quarter 4 sales.
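Because six paired tests are run on the same four quarters, the chance of at least one false rejection is higher than 0.05. One way to account for this is to adjust the p-values, for example with Holm's method. The sketch below reuses the p-values printed above; entering "< 2.2e-16" as its upper bound 2.2e-16 is an assumption, and the vector names are chosen here for readability.

```r
# p-values copied from the six Wilcoxon signed rank outputs above.
# "< 2.2e-16" is entered as its upper bound (an assumption).
p_raw <- c(q1_q2 = 1.678e-06, q1_q3 = 2.2e-16, q1_q4 = 2.2e-16,
           q2_q3 = 6.299e-11, q2_q4 = 2.413e-11, q3_q4 = 0.6261)

# Holm adjustment controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

result <- data.frame(raw = p_raw, holm = p_holm, reject = p_holm < 0.05)
print(result)
```

The conclusions are unchanged after adjustment: the first five comparisons remain significant at 0.05 and quarter 3 versus quarter 4 does not.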
9. Consider the relationship between age, number of assistants, experience, marketing budget, appraisal score, quarterly and annual sales
a. Determine the product moment correlation coefficients. Interpret your result
numeric_vars$annual_sales <- stat325$annual_sales
knitr::kable(round(cor(numeric_vars),2))
| | age | number_of_assistants | experience | marketing_budget | appraisal_score | quarter_1 | quarter_2 | quarter_3 | quarter_4 | annual_sales |
|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.00 | 0.79 | 0.87 | 0.76 | -0.10 | 0.66 | 0.65 | 0.65 | 0.64 | 0.76 |
| number_of_assistants | 0.79 | 1.00 | 0.91 | 0.96 | -0.13 | 0.77 | 0.72 | 0.74 | 0.76 | 0.87 |
| experience | 0.87 | 0.91 | 1.00 | 0.88 | -0.16 | 0.74 | 0.71 | 0.72 | 0.72 | 0.84 |
| marketing_budget | 0.76 | 0.96 | 0.88 | 1.00 | -0.12 | 0.77 | 0.73 | 0.73 | 0.76 | 0.87 |
| appraisal_score | -0.10 | -0.13 | -0.16 | -0.12 | 1.00 | 0.03 | 0.11 | 0.01 | 0.09 | 0.07 |
| quarter_1 | 0.66 | 0.77 | 0.74 | 0.77 | 0.03 | 1.00 | 0.66 | 0.63 | 0.62 | 0.84 |
| quarter_2 | 0.65 | 0.72 | 0.71 | 0.73 | 0.11 | 0.66 | 1.00 | 0.68 | 0.66 | 0.87 |
| quarter_3 | 0.65 | 0.74 | 0.72 | 0.73 | 0.01 | 0.63 | 0.68 | 1.00 | 0.68 | 0.86 |
| quarter_4 | 0.64 | 0.76 | 0.72 | 0.76 | 0.09 | 0.62 | 0.66 | 0.68 | 1.00 | 0.86 |
| annual_sales | 0.76 | 0.87 | 0.84 | 0.87 | 0.07 | 0.84 | 0.87 | 0.86 | 0.86 | 1.00 |
Interpretation: age, number_of_assistants, experience, and marketing_budget are strongly and positively correlated with one another (0.76 to 0.96) and with quarterly and annual sales (0.64 to 0.87). The strongest pair is number_of_assistants and marketing_budget (0.96). appraisal_score is only weakly correlated with every other variable (between -0.16 and 0.11).
b. Compute the Spearman’s Rank Correlation coefficients. Interpret your result.
knitr::kable(round(cor(numeric_vars,method = "spearman"),2))
| | age | number_of_assistants | experience | marketing_budget | appraisal_score | quarter_1 | quarter_2 | quarter_3 | quarter_4 | annual_sales |
|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.00 | 0.79 | 0.88 | 0.77 | -0.12 | 0.67 | 0.61 | 0.63 | 0.63 | 0.75 |
| number_of_assistants | 0.79 | 1.00 | 0.90 | 0.96 | -0.14 | 0.77 | 0.71 | 0.74 | 0.76 | 0.87 |
| experience | 0.88 | 0.90 | 1.00 | 0.88 | -0.16 | 0.74 | 0.67 | 0.71 | 0.72 | 0.84 |
| marketing_budget | 0.77 | 0.96 | 0.88 | 1.00 | -0.11 | 0.77 | 0.72 | 0.73 | 0.76 | 0.88 |
| appraisal_score | -0.12 | -0.14 | -0.16 | -0.11 | 1.00 | 0.02 | 0.12 | 0.00 | 0.07 | 0.07 |
| quarter_1 | 0.67 | 0.77 | 0.74 | 0.77 | 0.02 | 1.00 | 0.64 | 0.64 | 0.64 | 0.85 |
| quarter_2 | 0.61 | 0.71 | 0.67 | 0.72 | 0.12 | 0.64 | 1.00 | 0.64 | 0.64 | 0.84 |
| quarter_3 | 0.63 | 0.74 | 0.71 | 0.73 | 0.00 | 0.64 | 0.64 | 1.00 | 0.68 | 0.85 |
| quarter_4 | 0.63 | 0.76 | 0.72 | 0.76 | 0.07 | 0.64 | 0.64 | 0.68 | 1.00 | 0.86 |
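| annual_sales | 0.75 | 0.87 | 0.84 | 0.88 | 0.07 | 0.85 | 0.84 | 0.85 | 0.86 | 1.00 |
Interpretation: the rank correlations differ from the product moment correlations by at most 0.04, so the monotonic and linear associations tell the same story. age, number_of_assistants, experience, and marketing_budget are strongly and positively associated with one another (0.77 to 0.96) and with quarterly and annual sales (0.61 to 0.88), while appraisal_score is only weakly associated with every other variable (between -0.16 and 0.12).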
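The matrices above report the size of each coefficient but not whether it differs significantly from zero. For a single pair of variables, `cor.test()` from base R gives the test for either method. The sketch below runs on simulated data; the variable names mirror the report, but the values and the linear relationship used to generate them are assumptions.

```r
# Simulated stand-in for two columns of the data (values are hypothetical)
set.seed(9)
n <- 153  # number of observations reported in the numeric summaries
experience <- runif(n, 1, 25)
annual_sales <- 60000 + 1500 * experience + rnorm(n, sd = 8000)

# Product moment (Pearson) correlation with a t-test and confidence interval
pearson <- cor.test(experience, annual_sales, method = "pearson")

# Spearman rank correlation with its own test of zero association
spearman <- cor.test(experience, annual_sales, method = "spearman")

print(pearson)
print(spearman)
```

With real data that contain tied values, the Spearman call may warn that an exact p-value cannot be computed and fall back to an approximation.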
10. Fit multiple linear regression models for quarterly and annual sales on age, number of assistants, experience, marketing budget, and appraisal score. Comment on the significance of the fitted models as well as the significance of each of the independent variables
Quarter 1
fit1 <- lm(quarter_1 ~ age + number_of_assistants + experience + marketing_budget + appraisal_score, data = stat325)
summary(fit1)
##
## Call:
## lm(formula = quarter_1 ~ age + number_of_assistants + experience +
## marketing_budget + appraisal_score, data = stat325)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5789 -2454 -131 2114 6409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.232e+03 2.666e+03 1.962 0.05162 .
## age 1.729e+01 4.663e+01 0.371 0.71129
## number_of_assistants 1.292e+02 1.315e+02 0.983 0.32730
## experience 1.942e+02 1.232e+02 1.577 0.11694
## marketing_budget 1.994e-03 9.494e-04 2.100 0.03747 *
## appraisal_score 8.065e+01 2.972e+01 2.714 0.00745 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3009 on 147 degrees of freedom
## Multiple R-squared: 0.6322, Adjusted R-squared: 0.6197
## F-statistic: 50.53 on 5 and 147 DF, p-value: < 2.2e-16
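The models in this question all use the same five predictors, so they can also be fitted in one pass and their overall significance collected in a single table. The following is a minimal sketch on simulated data; the data-generating coefficients are assumptions, and only the column names follow the report.

```r
# Simulated stand-in data (all values hypothetical)
set.seed(10)
n <- 153
sim <- data.frame(
  age = round(runif(n, 19, 67)),
  number_of_assistants = rpois(n, 12),
  experience = round(runif(n, 1, 25)),
  marketing_budget = runif(n, 2e5, 4e6),
  appraisal_score = runif(n, 40, 100)
)
quarters <- c("quarter_1", "quarter_2", "quarter_3", "quarter_4")
for (qtr in quarters) {
  sim[[qtr]] <- 5000 + 0.002 * sim$marketing_budget +
    80 * sim$appraisal_score + rnorm(n, sd = 3000)
}
sim$annual_sales <- rowSums(sim[quarters])

predictors <- c("age", "number_of_assistants", "experience",
                "marketing_budget", "appraisal_score")
responses <- c(quarters, "annual_sales")

# One lm() per response, built from the shared predictor list
fits <- lapply(setNames(responses, responses), function(r) {
  lm(reformulate(predictors, response = r), data = sim)
})

# R-squared and overall F-test p-value for each model
model_summary <- t(sapply(fits, function(f) {
  s <- summary(f)
  fs <- s$fstatistic
  c(r_squared = s$r.squared,
    f_p_value = unname(pf(fs["value"], fs["numdf"], fs["dendf"],
                          lower.tail = FALSE)))
}))
print(model_summary)
```

Individual coefficient tables remain available through `coef(summary(fits$quarter_1))` and so on.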
Quarter 2
fit2 <- lm(quarter_2 ~ age + number_of_assistants + experience + marketing_budget + appraisal_score, data = stat325)
summary(fit2)
##
## Call:
## lm(formula = quarter_2 ~ age + number_of_assistants + experience +
## marketing_budget + appraisal_score, data = stat325)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6402.4 -2179.7 -346.7 2870.0 6367.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.961e+03 2.784e+03 1.064 0.289252
## age 5.363e+01 4.869e+01 1.102 0.272433
## number_of_assistants 6.921e+01 1.373e+02 0.504 0.614982
## experience 1.777e+02 1.286e+02 1.382 0.169017
## marketing_budget 2.047e-03 9.914e-04 2.064 0.040746 *
## appraisal_score 1.228e+02 3.103e+01 3.957 0.000118 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3142 on 147 degrees of freedom
## Multiple R-squared: 0.5992, Adjusted R-squared: 0.5855
## F-statistic: 43.95 on 5 and 147 DF, p-value: < 2.2e-16
Quarter 3
fit3 <- lm(quarter_3 ~ age + number_of_assistants + experience + marketing_budget + appraisal_score, data = stat325)
summary(fit3)
##
## Call:
## lm(formula = quarter_3 ~ age + number_of_assistants + experience +
## marketing_budget + appraisal_score, data = stat325)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6583 -2099 277 2229 5861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.057e+04 2.710e+03 3.899 0.000146 ***
## age 4.578e+01 4.739e+01 0.966 0.335573
## number_of_assistants 2.105e+02 1.336e+02 1.575 0.117450
## experience 1.366e+02 1.252e+02 1.092 0.276745
## marketing_budget 9.791e-04 9.649e-04 1.015 0.311914
## appraisal_score 6.398e+01 3.020e+01 2.118 0.035825 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3058 on 147 degrees of freedom
## Multiple R-squared: 0.5779, Adjusted R-squared: 0.5635
## F-statistic: 40.24 on 5 and 147 DF, p-value: < 2.2e-16
Quarter 4
fit4 <- lm(quarter_4 ~ age + number_of_assistants + experience + marketing_budget + appraisal_score, data = stat325)
summary(fit4)
##
## Call:
## lm(formula = quarter_4 ~ age + number_of_assistants + experience +
## marketing_budget + appraisal_score, data = stat325)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5466.6 -2282.3 -678.5 2356.2 7074.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.275e+03 2.731e+03 2.298 0.022996 *
## age 1.496e+01 4.776e+01 0.313 0.754534
## number_of_assistants 1.671e+02 1.347e+02 1.240 0.216821
## experience 1.474e+02 1.261e+02 1.168 0.244605
## marketing_budget 2.144e-03 9.725e-04 2.205 0.029005 *
## appraisal_score 1.212e+02 3.044e+01 3.980 0.000108 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3082 on 147 degrees of freedom
## Multiple R-squared: 0.6378, Adjusted R-squared: 0.6254
## F-statistic: 51.76 on 5 and 147 DF, p-value: < 2.2e-16
Annual Sales
fitAn <- lm(annual_sales ~ age + number_of_assistants + experience + marketing_budget + appraisal_score, data = stat325)
summary(fitAn)
##
## Call:
## lm(formula = annual_sales ~ age + number_of_assistants + experience +
## marketing_budget + appraisal_score, data = stat325)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13408 -4812 -1364 4629 19835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.503e+04 6.331e+03 3.954 0.000119 ***
## age 1.317e+02 1.107e+02 1.189 0.236272
## number_of_assistants 5.760e+02 3.122e+02 1.845 0.067106 .
## experience 6.560e+02 2.924e+02 2.243 0.026384 *
## marketing_budget 7.163e-03 2.254e-03 3.177 0.001811 **
## appraisal_score 3.886e+02 7.057e+01 5.506 1.59e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7145 on 147 degrees of freedom
## Multiple R-squared: 0.8232, Adjusted R-squared: 0.8172
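## F-statistic: 136.9 on 5 and 147 DF, p-value: < 2.2e-16
Comment: all five fitted models are significant overall, since each F-test has a p-value below 2.2e-16. The annual sales model explains the most variation (R-squared 0.8232); the quarterly models range from 0.5779 to 0.6378. At \[\alpha = 0.05\], appraisal_score is significant in every model, marketing_budget is significant in every model except Quarter 3, and experience is significant only for annual sales. age and number_of_assistants are not significant in any model.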
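Several predictors in these models are strongly correlated with each other (for instance, number_of_assistants and marketing_budget at 0.96 in the correlation matrices), which can inflate standard errors and make individual coefficients look non-significant even when the overall model is. A variance inflation factor (VIF) is one way to check this. The sketch below computes it by hand on simulated data; `vif_manual` is a hypothetical helper written for this example, and the simulated relationships are assumptions.

```r
# Simulated predictors with built-in collinearity (values hypothetical)
set.seed(11)
n <- 153
experience <- runif(n, 1, 25)
sim <- data.frame(
  experience = experience,
  age = 20 + 1.5 * experience + rnorm(n, sd = 5),
  number_of_assistants = 2 + experience + rnorm(n, sd = 2),
  appraisal_score = runif(n, 40, 100)
)
sim$marketing_budget <- 1e5 * sim$number_of_assistants + rnorm(n, sd = 1.5e5)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_values <- vif_manual(sim)
print(round(vif_values, 2))
```

Values near 1 indicate little overlap with the other predictors; a frequently quoted rule of thumb treats values above about 5 or 10 as a sign of problematic collinearity.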