Module: 208251 Regression Analysis and Non-Parametric Statistics

Instructor: Wisunee Puggard

Affiliation: Department of Statistics, Faculty of Science, Chiang Mai University.

Objectives: Students are able to

  1. perform descriptive statistics

  2. apply appropriate non-parametric statistical tests to answer research questions of interest.

Practice 1: Job and Life satisfaction case study

Quality of work life is one of the most important factors in motivating people and improving job satisfaction. This study aimed to determine the relationship between quality of work life and job satisfaction among faculty members of a university. In this descriptive-analytic study, 20 faculty members were sampled, and a job satisfaction questionnaire was used for data collection.

Import data: The data file is provided in MS Teams.

data=read.csv('/Users/wisuneepuggard/Desktop/LAB208251/JobSatisfaction.csv',header=TRUE)
data   #To view dataset
##    Obs Age Gender   Status        Education Smoke Income Life.Satisfaction
## 1    1  33   Male  Married        Undergrad    No  15050                 3
## 2    2  27   Male  Married        Undergrad   Yes  20600                 2
## 3    3  22 Female   Single          Master    Yes  15200                 4
## 4    4  34   Male  Married        Undergrad    No  22500                 3
## 5    5  24 Female   Single      High school    No  30400                 4
## 6    6  37 Female Divorced        Undergrad    No  33850                 3
## 7    7  25 Female   Single          Master    Yes  15100                 2
## 8    8  31   Male  Married        Undergrad   Yes  34900                 1
## 9    9  40   Male  Married        Undergrad    No  23200                 1
## 10  10  26   Male   Single Secondary school   Yes  32000                 1
## 11  11  41 Female  Married      High school   Yes  18650                 2
## 12  12  38   Male  Married      High school    No  28570                 2
## 13  13  28 Female   Single        Undergrad    No  26000                 2
## 14  14  44   Male Divorced      High school   Yes  39560                 3
## 15  15  21 Female  Married        Undergrad   Yes  18000                 4
## 16  16  25   Male   Single        Undergrad    No  13500                 2
## 17  17  30 Female   Single Secondary school    No  22350                 1
## 18  18  42   Male  Married        Undergrad   Yes  35000                 3
## 19  19  43 Female  Married        Undergrad    No  28000                 4
## 20  20  20   Male   Single          Master     No  15000                 3
##    Job.Satisfaction
## 1                 3
## 2                 4
## 3                 1
## 4                 4
## 5                 3
## 6                 3
## 7                 3
## 8                 2
## 9                 3
## 10                3
## 11                1
## 12                3
## 13                4
## 14                2
## 15                3
## 16                1
## 17                1
## 18                3
## 19                3
## 20                4
str(data) #To view type of data
## 'data.frame':    20 obs. of  9 variables:
##  $ Obs              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age              : int  33 27 22 34 24 37 25 31 40 26 ...
##  $ Gender           : chr  "Male" "Male" "Female" "Male" ...
##  $ Status           : chr  "Married" "Married" "Single" "Married" ...
##  $ Education        : chr  "Undergrad" "Undergrad" "Master " "Undergrad" ...
##  $ Smoke            : chr  "No" "Yes" "Yes" "No" ...
##  $ Income           : int  15050 20600 15200 22500 30400 33850 15100 34900 23200 32000 ...
##  $ Life.Satisfaction: int  3 2 4 3 4 3 2 1 1 1 ...
##  $ Job.Satisfaction : int  3 4 1 4 3 3 3 2 3 3 ...
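
For readers who do not have the CSV file at hand, the printed listing above can be typed in directly. The following is a minimal sketch that copies the 20 rows from the output above into a data frame named `data`, so that the later commands run unchanged. One assumption: text labels are stored without stray whitespace, whereas `str()` shows "Master " with a trailing space in the original file.

```r
# Inline copy of the 20 rows printed above (values taken from the listing).
# Assumption: labels are stored trimmed; the CSV appears to contain "Master ".
data <- data.frame(
  Obs = 1:20,
  Age = c(33, 27, 22, 34, 24, 37, 25, 31, 40, 26,
          41, 38, 28, 44, 21, 25, 30, 42, 43, 20),
  Gender = c("Male", "Male", "Female", "Male", "Female",
             "Female", "Female", "Male", "Male", "Male",
             "Female", "Male", "Female", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male"),
  Status = c("Married", "Married", "Single", "Married", "Single",
             "Divorced", "Single", "Married", "Married", "Single",
             "Married", "Married", "Single", "Divorced", "Married",
             "Single", "Single", "Married", "Married", "Single"),
  Education = c("Undergrad", "Undergrad", "Master", "Undergrad", "High school",
                "Undergrad", "Master", "Undergrad", "Undergrad", "Secondary school",
                "High school", "High school", "Undergrad", "High school", "Undergrad",
                "Undergrad", "Secondary school", "Undergrad", "Undergrad", "Master"),
  Smoke = c("No", "Yes", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes",
            "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No"),
  Income = c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
             18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000),
  Life.Satisfaction = c(3, 2, 4, 3, 4, 3, 2, 1, 1, 1, 2, 2, 2, 3, 4, 2, 1, 3, 4, 3),
  Job.Satisfaction  = c(3, 4, 1, 4, 3, 3, 3, 2, 3, 3, 1, 3, 4, 2, 3, 1, 1, 3, 3, 4)
)

str(data)  # check the variable types
```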

Your task: Use appropriate non-parametric statistical tests to answer the questions at a significance level of 0.05.

Research questions:
1) Is the median of age equal to 30 years old?
2) Is the mean income greater than 20000?
3) Is there any difference between income of male and female staff?
4) Is there any difference between life satisfaction level of smoking and non-smoking staff?
5) Is there any difference between income of people with different education background?
6) Does income relate to job satisfaction level?
7) Is marital status related to (or dependent on) life satisfaction level?

Research question 1:

Is the median of age equal to 30 years old?

summary(data$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.00   25.00   30.50   31.55   38.50   44.00
par(mfrow=c(1,2))  #create plots layout with 1 row and 2 columns
hist(data$Age)     #create histogram of age
boxplot(data$Age)  #create boxplot of age

#perform Wilcoxon test for one sample 
wilcox.test(x=data$Age, mu=30, alternative = "two.sided")
## Warning in wilcox.test.default(x = data$Age, mu = 30, alternative =
## "two.sided"): cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data$Age, mu = 30, alternative =
## "two.sided"): cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$Age
## V = 116, p-value = 0.4092
## alternative hypothesis: true location is not equal to 30
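
The two warnings above arise because one age equals 30 exactly (a zero difference) and several absolute differences from 30 are tied, so R falls back to the normal approximation. A minimal sketch, using the ages from the listing, that requests the approximation explicitly with `exact = FALSE` so the same test runs without warnings:

```r
# Ages copied from the data listing above.
age <- c(33, 27, 22, 34, 24, 37, 25, 31, 40, 26,
         41, 38, 28, 44, 21, 25, 30, 42, 43, 20)

# Why the exact p-value is unavailable: zeroes and ties among |age - 30|.
n_zero   <- sum(age == 30)                     # observations equal to mu
abs_diff <- abs(age[age != 30] - 30)
has_ties <- any(duplicated(abs_diff))          # tied absolute differences

# Ask for the normal approximation (with continuity correction) directly.
res <- wilcox.test(age, mu = 30, alternative = "two.sided",
                   exact = FALSE, correct = TRUE)
res$statistic  # V, as in the output above
res$p.value    # compare with the significance level 0.05
```

Since the p-value exceeds 0.05, the hypothesis that the median age is 30 is not rejected.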

Research question 2:

Is the mean income greater than 20000?

summary(data$Income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13500   17300   22850   24372   30800   39560
par(mfrow=c(1,2))  #create plots layout with 1 row and 2 columns
hist(data$Income)     #create histogram of income
boxplot(data$Income)  #create boxplot of income

#perform Wilcoxon test for one sample 
wilcox.test(x=data$Income, mu=20000, alternative = "greater")
## 
##  Wilcoxon signed rank exact test
## 
## data:  data$Income
## V = 159, p-value = 0.02203
## alternative hypothesis: true location is greater than 20000
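
The decision step can also be written out in code. A minimal sketch, assuming the incomes from the listing and a significance level stored in a variable named `alpha` (a name chosen here for illustration): the p-value is taken from the test object and compared with 0.05.

```r
# Incomes copied from the data listing above.
income <- c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
            18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000)

alpha <- 0.05  # significance level stated in the task

# One-sided test: H0: location = 20000 vs H1: location > 20000.
res <- wilcox.test(income, mu = 20000, alternative = "greater")

# Reject H0 when the p-value is below alpha.
decision <- if (res$p.value < alpha) "reject H0" else "do not reject H0"
cat(sprintf("V = %g, p-value = %.5f -> %s\n",
            res$statistic, res$p.value, decision))
```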

Research question 3:

Is there any difference between mean income of male and female staff?

summary(data$Income[data$Gender=="Male"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13500   17825   23200   25444   33450   39560
summary(data$Income[data$Gender=="Female"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15100   18000   22350   23061   28000   33850
boxplot(data$Income~data$Gender,xlab="Gender",ylab="Income")

#perform Mann-Whitney test for two independent groups
wilcox.test(x=data$Income[data$Gender=="Male"], y=data$Income[data$Gender=="Female"], 
            paired = FALSE, alternative = "two.sided")
## 
##  Wilcoxon rank sum exact test
## 
## data:  data$Income[data$Gender == "Male"] and data$Income[data$Gender == "Female"]
## W = 56, p-value = 0.6556
## alternative hypothesis: true location shift is not equal to 0
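
The same test can be written with a formula, which is often easier to read. A minimal sketch using the incomes and genders from the listing: note that the formula interface orders the groups alphabetically (Female first), so the reported W differs from the one above, while the two-sided p-value is the same. The two W values add up to the product of the group sizes.

```r
# Income and gender copied from the data listing above.
d <- data.frame(
  Income = c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
             18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000),
  Gender = c("Male", "Male", "Female", "Male", "Female",
             "Female", "Female", "Male", "Male", "Male",
             "Female", "Male", "Female", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male")
)

# Vector interface: x = Male, y = Female (as in the handout).
by_vectors <- wilcox.test(x = d$Income[d$Gender == "Male"],
                          y = d$Income[d$Gender == "Female"],
                          paired = FALSE, alternative = "two.sided")

# Formula interface: groups are taken in alphabetical order (Female, Male).
by_formula <- wilcox.test(Income ~ Gender, data = d,
                          alternative = "two.sided")

n_male   <- sum(d$Gender == "Male")
n_female <- sum(d$Gender == "Female")
c(W_vectors = unname(by_vectors$statistic),
  W_formula = unname(by_formula$statistic),
  n1_times_n2 = n_male * n_female)
```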

Research question 4:

Is there any difference between mean of life satisfaction level of smoking and non-smoking staff?

#install.packages("psych") # install the "psych" package once before using describeBy()
library(psych)
#Find summary of life satisfaction of smoking and non-smoking group!!
describeBy(data$Life.Satisfaction,data$Smoke)
## 
##  Descriptive statistics by group 
## group: No
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 11 2.55 1.04      3    2.56 1.48   1   4     3 -0.11    -1.36 0.31
## ------------------------------------------------------------ 
## group: Yes
##    vars n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 9 2.44 1.13      2    2.44 1.48   1   4     3 0.12    -1.59 0.38
boxplot(data$Life.Satisfaction~data$Smoke,xlab="Smoke",ylab="Life Satisfaction")

#perform Mann-Whitney test for two independent groups
wilcox.test(x=data$Life.Satisfaction[data$Smoke=="No"], y=data$Life.Satisfaction[data$Smoke=="Yes"],
            paired = FALSE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data$Life.Satisfaction[data$Smoke == :
## cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$Life.Satisfaction[data$Smoke == "No"] and data$Life.Satisfaction[data$Smoke == "Yes"]
## W = 52.5, p-value = 0.8441
## alternative hypothesis: true location shift is not equal to 0
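
If the psych package is not available, a comparable group summary can be obtained with base R only. A minimal sketch using the life satisfaction scores and smoking status from the listing; the object names `groups` and `summary_tab` are chosen here for illustration.

```r
# Life satisfaction and smoking status copied from the data listing above.
life  <- c(3, 2, 4, 3, 4, 3, 2, 1, 1, 1, 2, 2, 2, 3, 4, 2, 1, 3, 4, 3)
smoke <- c("No", "Yes", "Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes",
           "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No")

# Split the scores by group and compute n, median and mean for each group.
groups <- split(life, smoke)
summary_tab <- t(sapply(groups, function(v) {
  c(n = length(v), median = median(v), mean = mean(v))
}))
print(round(summary_tab, 2))
```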

Research question 5:

Is there any difference between mean income of people with different education background?

#Find summary of income for each education group!! 
boxplot(data$Income~data$Education,xlab="Education",ylab="Income")

kruskal.test(data$Income ~ data$Education, data = data) 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$Income by data$Education
## Kruskal-Wallis chi-squared = 5.6472, df = 3, p-value = 0.1301
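
One way to obtain the income summary for each education group is sketched below with base R. It assumes that every "Master" label in the file carries the trailing space that `str()` shows for the third observation, and removes it with `trimws()` so the group names are clean; the test itself is unaffected by the cleaning.

```r
# Income and education copied from the data listing above.
# Assumption: all "Master " entries have the trailing space seen in str().
d <- data.frame(
  Income = c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
             18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000),
  Education = c("Undergrad", "Undergrad", "Master ", "Undergrad", "High school",
                "Undergrad", "Master ", "Undergrad", "Undergrad", "Secondary school",
                "High school", "High school", "Undergrad", "High school", "Undergrad",
                "Undergrad", "Secondary school", "Undergrad", "Undergrad", "Master ")
)

d$Education <- trimws(d$Education)      # drop stray whitespace in labels

group_n      <- table(d$Education)      # group sizes
group_median <- tapply(d$Income, d$Education, median)
print(group_n)
print(group_median)

# Kruskal-Wallis test written with bare column names and a data argument.
kw <- kruskal.test(Income ~ Education, data = d)
kw
```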

Research question 6:

Does income relate to job satisfaction level?

plot(data$Job.Satisfaction,data$Income,xlab="Job satisfaction",ylab="Income")

cor.test( ~ data$Job.Satisfaction + data$Income,
          data=data,method = "spearman",
          continuity = FALSE,conf.level = 0.95,
          alternative = "two.sided")
## Warning in cor.test.default(x = c(3L, 4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 3L, :
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  data$Job.Satisfaction and data$Income
## S = 1312.7, p-value = 0.9567
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.01297123
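
The warning above appears because job satisfaction has many tied values, so an exact p-value cannot be computed. A minimal sketch, using the two columns from the listing, that requests the approximate p-value explicitly with `exact = FALSE` and extracts the estimate:

```r
# Job satisfaction and income copied from the data listing above.
job    <- c(3, 4, 1, 4, 3, 3, 3, 2, 3, 3, 1, 3, 4, 2, 3, 1, 1, 3, 3, 4)
income <- c(15050, 20600, 15200, 22500, 30400, 33850, 15100, 34900, 23200, 32000,
            18650, 28570, 26000, 39560, 18000, 13500, 22350, 35000, 28000, 15000)

# Spearman's rho is the Pearson correlation of the (mid-)ranks.
rho_by_ranks <- cor(rank(job), rank(income))

# Approximate test; exact = FALSE avoids the ties warning.
res <- cor.test(job, income, method = "spearman",
                exact = FALSE, alternative = "two.sided")
res$estimate   # rho
res$p.value    # compare with 0.05
```

With rho close to zero and a p-value far above 0.05, there is no evidence that income is related to job satisfaction level in this sample.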

Research question 7:

Is marital status related to (or dependent on) life satisfaction level?

table(data$Status,data$Life.Satisfaction)
##           
##            1 2 3 4
##   Divorced 0 0 2 0
##   Married  2 3 3 2
##   Single   2 3 1 2
chisq.test(data$Status,data$Life.Satisfaction)
## Warning in chisq.test(data$Status, data$Life.Satisfaction): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$Status and data$Life.Satisfaction
## X-squared = 5.8333, df = 6, p-value = 0.4421
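
The warning above is issued because the expected counts are small, which makes the chi-squared approximation unreliable. A minimal sketch, built from the contingency table printed above, that shows the expected counts and, as one possible remedy (an option of `chisq.test`, not necessarily the approach intended in this lab), a Monte Carlo p-value. The seed value is arbitrary.

```r
# Contingency table copied from the output above.
tab <- matrix(c(0, 0, 2, 0,
                2, 3, 3, 2,
                2, 3, 1, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(Status = c("Divorced", "Married", "Single"),
                              Life.Satisfaction = c("1", "2", "3", "4")))

# Same Pearson test as in the handout; the warning is silenced on purpose here.
res <- suppressWarnings(chisq.test(tab))
res$expected               # expected counts under independence
sum(res$expected < 5)      # number of cells with expected count below 5

# Optional: p-value by simulation instead of the chi-squared approximation.
set.seed(208251)           # arbitrary seed, for reproducibility only
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
sim$p.value
```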

Research question 8:

Are the numbers of employees in each marital status category equal?

freq = table(data$Status)
prob = c(1/3,1/3,1/3)
chisq.test(freq,p=prob)
## 
##  Chi-squared test for given probabilities
## 
## data:  freq
## X-squared = 5.2, df = 2, p-value = 0.07427
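
To see where the statistic comes from, the goodness-of-fit test can be reproduced by hand. A minimal sketch using the marital status counts from the table in research question 7: the statistic is the sum of (observed - expected)^2 / expected, compared with a chi-squared distribution with k - 1 degrees of freedom.

```r
# Observed counts of marital status (from the table in research question 7).
freq <- c(Divorced = 2, Married = 10, Single = 8)
prob <- rep(1/3, 3)                       # equal proportions under H0

expected <- sum(freq) * prob              # expected count in each category
x2 <- sum((freq - expected)^2 / expected) # chi-squared statistic
df <- length(freq) - 1
p  <- pchisq(x2, df = df, lower.tail = FALSE)
c(X.squared = x2, df = df, p.value = p)

# Cross-check against the built-in test.
res <- chisq.test(freq, p = prob)
```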

#————————————————————————————————
#————————————————————————————————

Practice 2: Music and exercise

A researcher wants to examine whether music has an effect on the perceived psychological effort required to perform an exercise session. To test this, the researcher recruited 12 runners who each ran three times on a treadmill for 30 minutes. For consistency, the treadmill speed was the same for all three runs. In a random order, each subject ran: (a) listening to no music at all; (b) listening to classical music; and (c) listening to dance music. At the end of each run, subjects were asked to record how hard the running session felt on a scale of 1 to 10, with 1 being easy and 10 extremely hard.

1. Import the Practice2_Data.csv file

Practice2_Data <- read.csv('/Users/wisuneepuggard/Desktop/LAB208251/Practice2_Data.csv',header=TRUE)
data2 = data.frame(Practice2_Data)
str(data2)
## 'data.frame':    36 obs. of  3 variables:
##  $ Runner: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Type  : chr  "None" "None" "None" "None" ...
##  $ Scale : int  8 7 6 8 5 9 7 8 8 7 ...

2. Explore data

tapply(data2$Scale, data2$Type, summary)
## $Classical
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.000   6.750   7.500   7.417   8.000   9.000 
## 
## $Dance
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.0     6.0     6.5     6.5     7.0     8.0 
## 
## $None
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.000   7.000   7.500   7.417   8.000   9.000
boxplot(data2$Scale ~ data2$Type,xlab="Type of music",ylab="Scale")

3. Research questions:
A) Is there a difference in perceived effort based on music type?
B) Is there a difference between None and Classical?
C) Is there a difference between None and Dance?
D) Is perceived effort higher with Classical than with Dance?

Your task: Use appropriate non-parametric statistical tests to answer the questions at a significance level of 0.05.

#A)
friedman.test(y=data2$Scale,group=data2$Type,blocks = data2$Runner)
## 
##  Friedman rank sum test
## 
## data:  data2$Scale, data2$Type and data2$Runner
## Friedman chi-squared = 7.6, df = 2, p-value = 0.02237
#B) 
wilcox.test(x=data2$Scale[data2$Type=="None"], y=data2$Scale[data2$Type=="Classical"], 
            paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data2$Scale[data2$Type == "None"] and data2$Scale[data2$Type == "Classical"]
## V = 23, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
#C) 
wilcox.test(x=data2$Scale[data2$Type=="None"], y=data2$Scale[data2$Type=="Dance"], 
            paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "None"], y =
## data2$Scale[data2$Type == : cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data2$Scale[data2$Type == "None"] and data2$Scale[data2$Type == "Dance"]
## V = 36, p-value = 0.01038
## alternative hypothesis: true location shift is not equal to 0
#D)
wilcox.test(x=data2$Scale[data2$Type=="Classical"], y=data2$Scale[data2$Type=="Dance"], 
            paired = TRUE, alternative = "two.sided")
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "Classical"], :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = data2$Scale[data2$Type == "Classical"], :
## cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data2$Scale[data2$Type == "Classical"] and data2$Scale[data2$Type == "Dance"]
## V = 24.5, p-value = 0.08461
## alternative hypothesis: true location shift is not equal to 0
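
Question D asks about a direction ("higher than"), which corresponds to a one-sided alternative, whereas the call above uses `alternative = "two.sided"`. A minimal sketch of the one-sided version follows. The scores below are hypothetical values made up for illustration, not the Practice2_Data.csv data, so no conclusion about the real data should be drawn from them.

```r
# Hypothetical paired scores for 12 runners (NOT the course data).
classical <- c(8, 7, 9, 6, 8, 7, 8, 9, 7, 6, 8, 7)
dance     <- c(7, 6, 7, 6, 7, 6, 8, 7, 6, 5, 7, 7)

# Two-sided: is there any difference between Classical and Dance?
two_sided <- wilcox.test(x = classical, y = dance, paired = TRUE,
                         alternative = "two.sided", exact = FALSE)

# One-sided: is Classical higher than Dance? (x - y shifted above 0)
one_sided <- wilcox.test(x = classical, y = dance, paired = TRUE,
                         alternative = "greater", exact = FALSE)

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

When the observed differences point in the hypothesized direction, the one-sided p-value from the normal approximation is half the two-sided one, so the choice of alternative can change the decision at the 0.05 level.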

To summarize: From the descriptive statistics, the medians (Q1 to Q3) of perceived effort for the no music, classical music and dance music running trials were 7.5 (7 to 8), 7.5 (6.75 to 8) and 6.5 (6 to 7), respectively. There was a statistically significant difference in perceived effort depending on which type of music was listened to whilst running (Friedman test, chi-squared = 7.6, df = 2, p = 0.02237). Wilcoxon signed-rank tests were conducted to test the difference between each pair of conditions. There were no significant differences between the no music and classical music running trials (V = 23, p = 1.00) or between the classical and dance music running trials (V = 24.5, p = 0.08461). However, there was a statistically significant reduction in perceived effort in the dance music vs no music trial (V = 36, p = 0.01038).

—————————————————————————————————–
—————————————————————————————————–

Assignment Lab5

You must submit:

  1. R file with your code, and

  2. Answer sheet with your handwriting

Submit both on Mango; the deadline is posted there.

Data were collected for a sample of 50 in-store transactions during one day in 2019. The store’s manager wants to use this sample data to learn about customers’ behavior. Use the methods of descriptive statistics and non-parametric statistics presented in this module to summarize and analyze the data, and comment on your findings. Comment on any findings that appear interesting and of potential value to the store’s manager.

The data file ‘AStore_Data.csv’ is provided on Mango Canvas.

Your task: Use appropriate non-parametric statistical tests to answer the research questions at a significance level of 0.05.

Research questions:

A. Is the method of payment related to the sales?

B. Is the number of purchased items related to discounted amount?

C. Is the customer’s age related to discounted amount?

D. Is the method of payment related to marital status?

E. Is the number of customers paying by each method comparable?