Module: 208251 Regression Analysis and Non-Parametric Statistics
Instructor: Wisunee Puggard
Affiliation: Department of Statistics, Faculty of Science,
Chiang Mai University.
Objectives: Students will be able to
perform descriptive statistics, and
apply appropriate non-parametric statistical tests to answer
research questions of interest.
Quality of work life is one of the most important factors in motivating
people and improving their job satisfaction. The current study aimed to
determine the relationship between quality of work life and job
satisfaction among the faculty members of a university. In this
descriptive-analytic study, 20 faculty members were sampled, and a job
satisfaction questionnaire was used for data collection.
Import data: The data file is provided in Microsoft Teams.
# Change the path below to where JobSatisfaction.csv is saved on your computer
data=read.csv('/Users/wisuneepuggard/Desktop/LAB208251/JobSatisfaction.csv',header=TRUE)
data #To view dataset
##    Obs Age Gender   Status        Education Smoke Income Life.Satisfaction Job.Satisfaction
##  1   1  33   Male  Married        Undergrad    No  15050                 3                3
##  2   2  27   Male  Married        Undergrad   Yes  20600                 2                4
##  3   3  22 Female   Single           Master   Yes  15200                 4                1
##  4   4  34   Male  Married        Undergrad    No  22500                 3                4
##  5   5  24 Female   Single      High school    No  30400                 4                3
##  6   6  37 Female Divorced        Undergrad    No  33850                 3                3
##  7   7  25 Female   Single           Master   Yes  15100                 2                3
##  8   8  31   Male  Married        Undergrad   Yes  34900                 1                2
##  9   9  40   Male  Married        Undergrad    No  23200                 1                3
## 10  10  26   Male   Single Secondary school   Yes  32000                 1                3
## 11  11  41 Female  Married      High school   Yes  18650                 2                1
## 12  12  38   Male  Married      High school    No  28570                 2                3
## 13  13  28 Female   Single        Undergrad    No  26000                 2                4
## 14  14  44   Male Divorced      High school   Yes  39560                 3                2
## 15  15  21 Female  Married        Undergrad   Yes  18000                 4                3
## 16  16  25   Male   Single        Undergrad    No  13500                 2                1
## 17  17  30 Female   Single Secondary school    No  22350                 1                1
## 18  18  42   Male  Married        Undergrad   Yes  35000                 3                3
## 19  19  43 Female  Married        Undergrad    No  28000                 4                3
## 20  20  20   Male   Single           Master    No  15000                 3                4
str(data) #To view type of data
## 'data.frame': 20 obs. of 9 variables:
## $ Obs : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Age : int 33 27 22 34 24 37 25 31 40 26 ...
## $ Gender : chr "Male" "Male" "Female" "Male" ...
## $ Status : chr "Married" "Married" "Single" "Married" ...
## $ Education : chr "Undergrad" "Undergrad" "Master " "Undergrad" ...
## $ Smoke : chr "No" "Yes" "Yes" "No" ...
## $ Income : int 15050 20600 15200 22500 30400 33850 15100 34900 23200 32000 ...
## $ Life.Satisfaction: int 3 2 4 3 4 3 2 1 1 1 ...
## $ Job.Satisfaction : int 3 4 1 4 3 3 3 2 3 3 ...
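`str()` shows that the categorical variables are stored as plain character strings and that at least one Education value is `"Master "`, with a trailing space. A minimal base-R sketch for tidying such a column, assuming you want factor labels without stray whitespace; the short vector below is illustrative rather than the full data set:

```r
# Illustrative values only; the real column is data$Education
education <- c("Undergrad", "Master ", "High school", "Secondary school", "Master ")

education_clean  <- trimws(education)        # drop leading/trailing whitespace
education_factor <- factor(education_clean)  # store as a categorical variable

levels(education_factor)
table(education_factor)
```

On the real data this would be `data$Education <- factor(trimws(data$Education))`. The Kruskal-Wallis test later reports df = 3, i.e. four education groups, so the trailing space appears to be used consistently and does not change that result, but clean labels make tables and plots easier to read.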
Your task: Use appropriate non-parametric statistical tests to answer
the questions at a significance level of 0.05.
Research questions:
1) Is the median age equal to 30 years?
2) Is the median income greater than 20000?
3) Is there any difference between the incomes of male and female
staff?
4) Is there any difference between the life satisfaction levels of
smoking and non-smoking staff?
5) Is there any difference between the incomes of people with different
education backgrounds?
6) Is income related to job satisfaction level?
7) Is marital status related to (i.e. dependent on) life satisfaction
level?
Is the median age equal to 30 years?
summary(data$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 25.00 30.50 31.55 38.50 44.00
par(mfrow=c(1,2)) #create plots layout with 1 row and 2 columns
hist(data$Age) #create histogram of age
boxplot(data$Age) #create boxplot of age
#perform Wilcoxon test for one sample
wilcox.test(x=data$Age, mu=30, alternative = "two.sided")
## Warning in wilcox.test.default(x = data$Age, mu = 30, alternative =
## "two.sided"): cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data$Age, mu = 30, alternative =
## "two.sided"): cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Age
## V = 116, p-value = 0.4092
## alternative hypothesis: true location is not equal to 30
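The two warnings above have concrete causes in this data: one faculty member is exactly 30 years old, giving a zero difference that `wilcox.test()` drops, and several absolute differences from 30 are tied, so they receive average ranks and R falls back to a normal approximation. A minimal sketch that recomputes the signed-rank statistic V by hand, assuming the ages are copied from the data listing above:

```r
# Ages of the 20 faculty members, copied from the data listing
age <- c(33, 27, 22, 34, 24, 37, 25, 31, 40, 26,
         41, 38, 28, 44, 21, 25, 30, 42, 43, 20)

d <- age - 30                 # differences from the hypothesised median
n_zero <- sum(d == 0)         # zero differences are discarded by the test
d <- d[d != 0]

r <- rank(abs(d))             # tied |d| values share an average (mid) rank
V <- sum(r[d > 0])            # V = sum of the ranks of the positive differences

# Statistic reported by wilcox.test(), warnings silenced
V_r <- unname(suppressWarnings(wilcox.test(age, mu = 30))$statistic)
c(zeros = n_zero, V = V, V_r = V_r)
```

The hand-computed V should agree with the V = 116 printed above.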
Is the median income greater than 20000?
summary(data$Income)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13500 17300 22850 24372 30800 39560
par(mfrow=c(1,2)) #create plots layout with 1 row and 2 columns
hist(data$Income) #create histogram of income
boxplot(data$Income) #create boxplot of income
#perform Wilcoxon test for one sample
wilcox.test(x=data$Income, mu=20000, alternative = "greater")
##
## Wilcoxon signed rank exact test
##
## data: data$Income
## V = 159, p-value = 0.02203
## alternative hypothesis: true location is greater than 20000
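Because the income differences from 20000 contain no zeros and no ties, R reports an exact test here. A sketch of where that exact one-sided p-value comes from, using base R's `psignrank()` (the distribution function of the signed-rank statistic), with incomes copied from the data listing above:

```r
income <- c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
            18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000)

d <- income - 20000
n <- sum(d != 0)                    # 20 non-zero differences, all distinct
V <- sum(rank(abs(d))[d > 0])       # signed-rank statistic

# Exact upper-tail p-value: P(V >= observed V) under H0
p_exact <- psignrank(V - 1, n, lower.tail = FALSE)

res <- wilcox.test(income, mu = 20000, alternative = "greater")
c(V = V, p_manual = p_exact, p_wilcox = res$p.value)
```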
Is there any difference between the incomes of male and female staff?
summary(data$Income[data$Gender=="Male"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13500 17825 23200 25444 33450 39560
summary(data$Income[data$Gender=="Female"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15100 18000 22350 23061 28000 33850
boxplot(data$Income~data$Gender,xlab="Gender",ylab="Income")
#perform Mann-Whitney test for two independent groups
wilcox.test(x=data$Income[data$Gender=="Male"], y=data$Income[data$Gender=="Female"],
paired = FALSE, alternative = "two.sided")
##
## Wilcoxon rank sum exact test
##
## data: data$Income[data$Gender == "Male"] and data$Income[data$Gender == "Female"]
## W = 56, p-value = 0.6556
## alternative hypothesis: true location shift is not equal to 0
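R labels the statistic W, which is the Mann-Whitney form of the rank-sum statistic: the sum of the ranks of the first sample minus n1(n1 + 1)/2. A sketch that reproduces W from the incomes in the data listing, with male staff passed first as in the call above:

```r
male   <- c(15050, 20600, 22500, 34900, 23200, 32000, 28570, 39560, 13500, 35000, 15000)
female <- c(15200, 30400, 33850, 15100, 18650, 26000, 18000, 22350, 28000)

n1 <- length(male)
ranks <- rank(c(male, female))      # rank all 20 incomes together
R1 <- sum(ranks[seq_len(n1)])       # rank sum of the male sample
W <- R1 - n1 * (n1 + 1) / 2         # Mann-Whitney W as reported by R

W_r <- unname(wilcox.test(male, female)$statistic)
c(R1 = R1, W = W, W_r = W_r)
```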
Is there any difference between the life satisfaction levels
of smoking and non-smoking staff?
#install.packages("psych") # install the "psych" package once before using describeBy()
library(psych)
#Summarize life satisfaction for the smoking and non-smoking groups
describeBy(data$Life.Satisfaction,data$Smoke)
##
## Descriptive statistics by group
## group: No
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 11 2.55 1.04 3 2.56 1.48 1 4 3 -0.11 -1.36 0.31
## ------------------------------------------------------------
## group: Yes
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 9 2.44 1.13 2 2.44 1.48 1 4 3 0.12 -1.59 0.38
boxplot(data$Life.Satisfaction~data$Smoke,xlab="Smoke",ylab="Life Satisfaction")
#perform Mann-Whitney test for two independent groups
wilcox.test(x=data$Life.Satisfaction[data$Smoke=="No"], y=data$Life.Satisfaction[data$Smoke=="Yes"],
paired = FALSE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data$Life.Satisfaction[data$Smoke == :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: data$Life.Satisfaction[data$Smoke == "No"] and data$Life.Satisfaction[data$Smoke == "Yes"]
## W = 52.5, p-value = 0.8441
## alternative hypothesis: true location shift is not equal to 0
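Life satisfaction is a 1 to 4 score, so many observations are tied. Tied values share the average of the ranks they occupy (midranks), which is why R cannot give an exact p-value here. A sketch that shows the midranks and rebuilds W, with scores copied from the data listing above:

```r
life_no  <- c(3, 3, 4, 3, 1, 2, 2, 2, 1, 4, 3)   # Smoke == "No"  (n = 11)
life_yes <- c(2, 4, 2, 1, 1, 2, 3, 4, 3)         # Smoke == "Yes" (n = 9)

scores <- c(life_no, life_yes)
mid <- rank(scores)                  # default ties.method = "average" gives midranks
midranks <- tapply(mid, scores, unique)
midranks                             # one shared rank per score level 1, 2, 3, 4

n1 <- length(life_no)
W <- sum(mid[seq_len(n1)]) - n1 * (n1 + 1) / 2
W_r <- unname(suppressWarnings(wilcox.test(life_no, life_yes))$statistic)
c(W = W, W_r = W_r)
```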
Is there any difference between the incomes of people with
different education backgrounds?
#Compare income across education groups with side-by-side boxplots
boxplot(data$Income~data$Education,xlab="Education",ylab="Income")
kruskal.test(data$Income ~ data$Education, data = data)
##
## Kruskal-Wallis rank sum test
##
## data: data$Income by data$Education
## Kruskal-Wallis chi-squared = 5.6472, df = 3, p-value = 0.1301
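The Kruskal-Wallis statistic measures how far each group's rank sum departs from what equal groups would give. Without ties it is H = 12/(N(N+1)) Σ R_j²/n_j − 3(N+1), where R_j and n_j are the rank sum and size of group j, and it is referred to a chi-squared distribution with (number of groups − 1) degrees of freedom. A sketch applying this formula to the income data, with education labels written without the trailing space seen in `str()`:

```r
income <- c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
            18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000)
education <- c("Undergrad", "Undergrad", "Master", "Undergrad", "High school",
               "Undergrad", "Master", "Undergrad", "Undergrad", "Secondary school",
               "High school", "High school", "Undergrad", "High school", "Undergrad",
               "Undergrad", "Secondary school", "Undergrad", "Undergrad", "Master")

N   <- length(income)
r   <- rank(income)                      # income has no ties
R_j <- tapply(r, education, sum)         # rank sum per education group
n_j <- tapply(r, education, length)      # group sizes

H <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)
p <- pchisq(H, df = length(n_j) - 1, lower.tail = FALSE)

kw <- kruskal.test(income, factor(education))
c(H = H, H_r = unname(kw$statistic), p = p)
```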
Does income relate to job satisfaction level?
plot(data$Job.Satisfaction,data$Income,xlab="Job satisfaction",ylab="Income")
cor.test( ~ data$Job.Satisfaction + data$Income,
data=data,method = "spearman",
continuity = FALSE,conf.level = 0.95,
alternative = "two.sided")
## Warning in cor.test.default(x = c(3L, 4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 3L, :
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$Job.Satisfaction and data$Income
## S = 1312.7, p-value = 0.9567
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.01297123
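Spearman's rho is the Pearson correlation computed on ranks. Because Job.Satisfaction has many ties, the shortcut formula 1 − 6Σd²/(n(n² − 1)) is only approximate here, and R instead correlates the midranks; the printed S is then (n³ − n)(1 − rho)/6. A sketch confirming this with the data from the listing above:

```r
job <- c(3, 4, 1, 4, 3, 3, 3, 2, 3, 3, 1, 3, 4, 2, 3, 1, 1, 3, 3, 4)
income <- c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
            18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000)

rho_manual  <- cor(rank(job), rank(income))           # Pearson correlation of midranks
rho_builtin <- cor(job, income, method = "spearman")  # built-in Spearman

n <- length(job)
S <- (n^3 - n) * (1 - rho_manual) / 6                 # the S statistic printed by cor.test()
c(rho_manual = rho_manual, rho_builtin = rho_builtin, S = S)
```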
Is marital status related to (i.e. dependent on) life
satisfaction level?
table(data$Status,data$Life.Satisfaction)
##
## 1 2 3 4
## Divorced 0 0 2 0
## Married 2 3 3 2
## Single 2 3 1 2
chisq.test(data$Status,data$Life.Satisfaction)
## Warning in chisq.test(data$Status, data$Life.Satisfaction): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: data$Status and data$Life.Satisfaction
## X-squared = 5.8333, df = 6, p-value = 0.4421
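The warning "Chi-squared approximation may be incorrect" appears because `chisq.test()` found expected counts below 5. A sketch that rebuilds the table printed above, computes the expected counts E_ij = (row total × column total)/N and the Pearson statistic, and, as one commonly used alternative for sparse tables that is not part of the original analysis, runs Fisher's exact test:

```r
tab <- matrix(c(0, 0, 2, 0,
                2, 3, 3, 2,
                2, 3, 1, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(Status = c("Divorced", "Married", "Single"),
                              Life.Satisfaction = c("1", "2", "3", "4")))

expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # E_ij = row_i * col_j / N
X2 <- sum((tab - expected)^2 / expected)                  # Pearson chi-squared statistic

round(expected, 2)
sum(expected < 5)             # number of cells with expected count below 5

# Fisher's exact test avoids the large-sample approximation
p_fisher <- fisher.test(tab)$p.value
c(X2 = X2, p_fisher = p_fisher)
```

Here every one of the 12 cells has an expected count below 5, so the chi-squared p-value above should be read with caution.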
Are the numbers of employees in each marital status equal?
freq = table(data$Status)
prob = c(1/3,1/3,1/3)
chisq.test(freq,p=prob)
##
## Chi-squared test for given probabilities
##
## data: freq
## X-squared = 5.2, df = 2, p-value = 0.07427
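The goodness-of-fit test compares the observed counts with those expected under equal proportions, here 20/3 per marital status. With 2 degrees of freedom the chi-squared upper-tail probability has the closed form exp(−x/2), which gives a quick check on the printed p-value. A sketch using the counts of Status from the data listing:

```r
freq <- c(Divorced = 2, Married = 10, Single = 8)   # counts of table(data$Status)
expected <- sum(freq) * rep(1/3, 3)                  # 20/3 in each category

X2 <- sum((freq - expected)^2 / expected)
p  <- pchisq(X2, df = length(freq) - 1, lower.tail = FALSE)
c(X2 = X2, p = p, p_closed_form = exp(-X2 / 2))
```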
#————————————————————————————————
#————————————————————————————————
A researcher wants to examine whether music has an effect on the
perceived psychological effort required to perform an exercise session.
To test this, the researcher recruited 12 runners, each of whom ran
three times on a treadmill for 30 minutes. For consistency, the
treadmill speed was the same for all three runs. In a random order,
each subject ran: (a) listening to no music at all; (b) listening to
classical music; and (c) listening to dance music. At the end of each
run, subjects were asked to rate how hard the running session felt on a
scale of 1 to 10, with 1 being easy and 10 extremely hard.
1. Import the Practice2_Data.csv file
# Change the path below to where Practice2_Data.csv is saved on your computer
Practice2_Data <- read.csv('/Users/wisuneepuggard/Desktop/LAB208251/Practice2_Data.csv',header=TRUE)
data2 = data.frame(Practice2_Data)
str(data2)
## 'data.frame': 36 obs. of 3 variables:
## $ Runner: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Type : chr "None" "None" "None" "None" ...
## $ Scale : int 8 7 6 8 5 9 7 8 8 7 ...
2. Explore data
tapply(data2$Scale, data2$Type, summary)
## $Classical
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.000 6.750 7.500 7.417 8.000 9.000
##
## $Dance
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 6.0 6.5 6.5 7.0 8.0
##
## $None
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 7.000 7.500 7.417 8.000 9.000
boxplot(data2$Scale ~ data2$Type,xlab="Type of music",ylab="Scale")
3. Research questions:
A) Is there a difference in perceived effort based on music type?
B) Is there a difference between None and Classical?
C) Is there a difference between None and Dance?
D) Is the perceived effort with Classical higher than with Dance?
Your task: Use appropriate non-parametric statistical tests to answer
the questions at a significance level of 0.05.
#A)
friedman.test(y=data2$Scale,groups=data2$Type,blocks = data2$Runner)
##
## Friedman rank sum test
##
## data: data2$Scale, data2$Type and data2$Runner
## Friedman chi-squared = 7.6, df = 2, p-value = 0.02237
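`friedman.test()` also accepts a wide matrix with one row per block (runner) and one column per group (music type). A sketch on a small hypothetical data set (made-up scores, not the study data) that also computes the Friedman statistic by hand from within-runner ranks, Q = 12/(nk(k+1)) Σ R_j² − 3n(k+1); this hand formula applies only when no runner has tied scores:

```r
# Hypothetical effort scores: 4 runners (rows) x 3 music types (columns)
effort <- matrix(c(8, 7, 6,
                   7, 8, 6,
                   6, 7, 5,
                   9, 8, 7),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Runner = 1:4, Type = c("None", "Classical", "Dance")))

n <- nrow(effort)
k <- ncol(effort)
within_ranks <- t(apply(effort, 1, rank))   # rank the k conditions within each runner
R_j <- colSums(within_ranks)                # rank sum per music type

Q <- 12 / (n * k * (k + 1)) * sum(R_j^2) - 3 * n * (k + 1)

fr <- friedman.test(effort)                 # matrix input: rows = blocks, columns = groups
c(Q = Q, Q_r = unname(fr$statistic))
```

For the long-format `data2`, a wide matrix could be built with `xtabs(Scale ~ Runner + Type, data = data2)`, assuming exactly one score per runner and music type.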
#B)
wilcox.test(x=data2$Scale[data2$Type=="None"], y=data2$Scale[data2$Type=="Classical"],
paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data2$Scale[data2$Type == "None"] and data2$Scale[data2$Type == "Classical"]
## V = 23, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
#C)
wilcox.test(x=data2$Scale[data2$Type=="None"], y=data2$Scale[data2$Type=="Dance"],
paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data2$Scale[data2$Type == "None"] and data2$Scale[data2$Type == "Dance"]
## V = 36, p-value = 0.01038
## alternative hypothesis: true location shift is not equal to 0
#D)
wilcox.test(x=data2$Scale[data2$Type=="Classical"], y=data2$Scale[data2$Type=="Dance"],
paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "Classical"], :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "Classical"], :
## cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data2$Scale[data2$Type == "Classical"] and data2$Scale[data2$Type == "Dance"]
## V = 24.5, p-value = 0.08461
## alternative hypothesis: true location shift is not equal to 0
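Research question D is directional (is effort higher with Classical than with Dance?), whereas the call above uses `alternative = "two.sided"`. A one-sided version would keep Classical as `x` and set `alternative = "greater"`. The sketch below uses small hypothetical paired scores (not the study data) to show that, with the continuity-corrected normal approximation and an observed shift in the hypothesised direction, the one-sided p-value is exactly half the two-sided one. For the study data the observed shift is in that direction, so rerunning D one-sided should give roughly 0.08461/2 ≈ 0.042; check the printed output to confirm.

```r
# Hypothetical paired scores (made up, NOT the study data); includes ties and zeros
classical <- c(8, 7, 9, 7, 8, 6, 8, 7)
dance     <- c(6, 7, 7, 6, 6, 6, 7, 5)

# Use the normal approximation with continuity correction, as in the output above
two_sided <- wilcox.test(classical, dance, paired = TRUE, alternative = "two.sided",
                         exact = FALSE, correct = TRUE)
greater   <- wilcox.test(classical, dance, paired = TRUE, alternative = "greater",
                         exact = FALSE, correct = TRUE)

c(V = unname(greater$statistic),
  p_two_sided = two_sided$p.value,
  p_greater   = greater$p.value)
```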
To summarize: From the descriptive statistics, the medians (Q1 to Q3) of perceived effort for the no music, classical music and dance music running trials were 7.5 (7 to 8), 7.5 (6.75 to 8) and 6.5 (6 to 7), respectively. There was a statistically significant difference in perceived effort depending on the type of music listened to whilst running (Friedman test, χ²(2) = 7.6, p = 0.02237). Wilcoxon signed-rank tests were then conducted to compare pairs of music conditions. There were no significant differences between the no music and classical music running trials (V = 23, p = 1.00) or between the classical and dance music running trials (two-sided test, V = 24.5, p = 0.08461). However, there was a statistically significant reduction in perceived effort in the dance music trial compared with the no music trial (V = 36, p = 0.01038).
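The summary judges each pairwise Wilcoxon test at an unadjusted 0.05 level after the Friedman test. A common refinement, not part of the original analysis, is to adjust the three pairwise p-values for multiple comparisons. A sketch with base R's `p.adjust()`, using the two-sided p-values printed above:

```r
# Two-sided p-values reported above for questions B, C and D
p_raw <- c(None_vs_Classical = 1, None_vs_Dance = 0.01038, Classical_vs_Dance = 0.08461)

p_bonf <- p.adjust(p_raw, method = "bonferroni")   # min(1, 3 * p)
p_holm <- p.adjust(p_raw, method = "holm")         # step-down Holm adjustment

round(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm), 4)
```

With either adjustment, only the None vs Dance comparison stays below 0.05 (adjusted p ≈ 0.031).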
—————————————————————————————————–
—————————————————————————————————–
You must submit, on Mango (see the deadline there):
an R file with your code, and
a handwritten answer sheet.
Data were collected for a sample of 50 in-store transactions during one day in 2019. The store's manager wants to use these sample data to learn about customer behavior. Use the descriptive and non-parametric statistical methods presented in this module to summarize and analyze the data, and comment on your findings, including any that appear interesting and of potential value to the store's manager.
The data file ‘AStore_Data.csv’ is provided on Mango Canvas.
Your task: Use appropriate non-parametric statistical tests to answer the research questions at a significance level of 0.05.
Research questions:
A. Is the method of payment related to the sales?
B. Is the number of purchased items related to discounted amount?
C. Is the customer’s age related to discounted amount?
D. Is the method of payment related to marital status?
E. Is the number of customers paying by each method comparable?