In the following questions, your task is to provide the most appropriate, simplest, and most efficient analysis that could be performed for the given scenario (worth 2% for all).
1. You are interested in testing whether a coin is fair or not (i.e., lands on each side equally). What test could we use?
Chi-square goodness-of-fit test: tests whether the observed data correspond to a population with a specified set of probabilities. Expected frequencies (a fair coin has a 1/2 probability for each side) are compared with the observed frequencies (an unfair coin would deviate from these). The coin is judged fair if the observed frequencies do not differ significantly from the expected ones.
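A minimal sketch in R, assuming hypothetical counts of 58 heads and 42 tails out of 100 flips:
# Goodness-of-fit test of the observed flip counts against the fair-coin probabilities of 1/2 each
flips <- c(heads = 58, tails = 42)
chisq.test(flips, p = c(0.5, 0.5))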
2. You are asked to find out if the proportion of males and females differs across academic majors, classified categorically as Art, History, Science, or Social Science. What test would you use?
Chi-square test of independence: tests whether two categorical variables are associated. Here both variables are categorical: sex (male/female) and academic major (Art, History, Science, Social Science). The observed counts in the sex-by-major contingency table are compared with the counts expected if sex and major were independent.
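A minimal sketch, assuming a hypothetical data frame majors with columns sex and major:
# Cross-tabulate sex by major and test whether the two variables are independent
tab <- table(majors$sex, majors$major)
chisq.test(tab)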
3. You have students’ math scores (a continuous variable) and have been asked to test whether there is a difference between test scores for students in two different sections of the same course. Assuming equal variances, what test would you use?
Independent-samples t-test: tests whether two independent groups (the two course sections) differ on a variable of interest (math scores).
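A minimal sketch, assuming a hypothetical data frame math with columns score and section:
# Two-sample t-test using the pooled (equal-variance) assumption stated in the question
t.test(score ~ section, data = math, var.equal = TRUE)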
4. You are testing the improvement of individuals’ health following treatment. You have a continuous health score from before treatment began and after it concluded for each individual. What test would you use?
Paired-samples t-test: tests whether a DV measured twice on the same individuals changes over time (before vs. after treatment).
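A minimal sketch, assuming hypothetical vectors before and after holding each individual's health score:
# Paired t-test: each before score is matched to the same individual's after score
t.test(before, after, paired = TRUE)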
5. You are interested in which of three industrial milling machines provides the fastest completion time for a standard task. What test(s) would you use?
One-way ANOVA: tests the significance of group differences when the IV (milling machine) has two or more categories, here three machines, and the DV (completion time) is continuous.
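A minimal sketch, assuming a hypothetical data frame milling with a continuous time column and a three-level machine factor:
# One-way ANOVA of completion time across the three machines
summary(aov(time ~ machine, data = milling))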
6. You are asked to see how several factors (all categorical) influence quality perception in consumers (measured as a continuous variable). What test(s) would you use?
Factorial ANOVA: tests the significance of group differences when the IVs (the several categorical factors) each have two or more categories and the DV (quality perception) is continuous; it also tests the interactions among the factors.
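A minimal sketch, assuming a hypothetical data frame quality with categorical factors brand and packaging and a continuous perception score:
# Factorial (two-way) ANOVA with main effects and their interaction
summary(aov(perception ~ brand * packaging, data = quality))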
7. You are interested in predicting income (a continuous variable) by age, sex, years of education, and years of industry experience (a mixture of continuous and categorical predictors). What test would you use?
Multiple linear regression: tests the effect that multiple IVs (predictors) have on a continuous DV, known as the response variable. The regression provides a fit that reduces unexplained variance by quantifying the relationship each predictor has with the DV.
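A minimal sketch, assuming a hypothetical data frame workers with columns income, age, sex, education, and experience:
# Multiple linear regression with a mix of continuous and categorical predictors
fit <- lm(income ~ age + sex + education + experience, data = workers)
summary(fit)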
library(readxl)
## Warning: package 'readxl' was built under R version 3.4.4
AirTraffic <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/AirTraffic.xlsx")
AirTraffic$civilianormilitary=replace(AirTraffic$civilianormilitary,
AirTraffic$civilianormilitary==0,"Civilian")
AirTraffic$civilianormilitary=replace(AirTraffic$civilianormilitary,
AirTraffic$civilianormilitary==1,"Military")
str(AirTraffic)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3000 obs. of 8 variables:
## $ subject : num 1 1 1 2 2 2 3 3 3 4 ...
## $ age : num 32 32 32 31 31 31 36 36 36 32 ...
## $ expierence : num 7 7 7 6 6 6 20 20 20 7 ...
## $ simulator : chr "C1" "C2" "NEW" "C1" ...
## $ Reaction : num 11.1 18.58 6.76 12.18 15.38 ...
## $ gender : chr "Female" "Female" "Female" "Female" ...
## $ airport : chr "PDX" "PDX" "PDX" "PDX" ...
## $ civilianormilitary: chr "Civilian" "Civilian" "Civilian" "Civilian" ...
Let's take a quick look at the reaction means:
tapply(X = AirTraffic$Reaction,INDEX = AirTraffic$simulator,FUN = mean)
## C1 C2 NEW
## 7.268982 16.507990 5.535704
tapply(X = AirTraffic$Reaction,INDEX = AirTraffic$gender, FUN= mean)
## Female Male
## 9.817140 9.727163
tapply(AirTraffic$Reaction, AirTraffic$civilianormilitary, FUN= mean)
## Civilian Military
## 10.479070 8.792932
The means differ considerably across simulators, with NEW having the lowest. Military participants have a slightly lower (faster) mean reaction time than civilians, and the means are very similar between males and females.
Let's see how the means for each category compare within each simulator separately:
Air_New = subset(AirTraffic, AirTraffic$simulator=="NEW",
select = c("subject","age","expierence",
"Reaction","gender","airport","civilianormilitary"))
Air_c1 = subset(AirTraffic, AirTraffic$simulator=="C1",
select = c("subject","age","expierence",
"Reaction","gender","airport","civilianormilitary"))
Air_c2 = subset(AirTraffic, AirTraffic$simulator=="C2",
select = c("subject","age","expierence",
"Reaction","gender","airport","civilianormilitary"))
tapply(Air_New$Reaction,Air_New$gender,FUN = mean)
## Female Male
## 5.664256 5.414155
tapply(Air_c1$Reaction,Air_c1$gender,FUN = mean)
## Female Male
## 7.318357 7.222297
tapply(Air_c2$Reaction,Air_c2$gender,FUN = mean)
## Female Male
## 16.46881 16.54504
# Civilian vs. military mean reaction time within each simulator
tapply(Air_New$Reaction,Air_New$civilianormilitary,FUN = mean)
## Civilian Military
## 6.176517 4.650772
tapply(Air_c1$Reaction,Air_c1$civilianormilitary,FUN = mean)
## Civilian Military
## 8.006233 6.250874
tapply(Air_c2$Reaction,Air_c2$civilianormilitary,FUN = mean)
## Civilian Military
## 17.25446 15.47715
Gender: NEW has the lowest mean reaction time for both genders.
Civilian and military: NEW has the lowest mean reaction time for both civilian and military participants.
Now that we know how the data differ by comparing the means, let's plot the data to display the level of association:
boxplot(AirTraffic$Reaction~AirTraffic$simulator, main="Reaction per Simulator",
xlab="Simulator", ylab="Reaction time", col=c(2,3,5))
interaction.plot(AirTraffic$gender,AirTraffic$simulator,AirTraffic$Reaction,
main="Interaction between simulator & gender",
ylab="Mean of Reaction", xlab="Gender")
interaction.plot(AirTraffic$expierence,AirTraffic$simulator,AirTraffic$Reaction,
main="Interaction between simulator & Experience",
ylab="Mean of Reaction", xlab="Experience")
interaction.plot(AirTraffic$age,AirTraffic$simulator,AirTraffic$Reaction,
main="Interaction between simulator & age",
ylab="Mean of Reaction", xlab="Age")
Observations:
Simulation results do not change based on gender across all three simulators. Only C2 shows a noticeable decrease in reaction time for participants who have more experience. Reaction time increases markedly for participants above 45 years old, and this holds true for all three simulators.
Plot distribution of reaction time and test for skewness, normality, and variance.
library(moments)
plot(density(AirTraffic$Reaction), col=2, lwd=2, lty="dashed")
agostino.test(AirTraffic$Reaction)
##
## D'Agostino skewness test
##
## data: AirTraffic$Reaction
## skew = 0.49073, z = 10.43100, p-value < 2.2e-16
## alternative hypothesis: data have a skewness
shapiro.test(AirTraffic$Reaction)
##
## Shapiro-Wilk normality test
##
## data: AirTraffic$Reaction
## W = 0.94254, p-value < 2.2e-16
Reaction time fails both tests: it is significantly skewed and departs from normality. The variance check mentioned above is sketched next.
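One way to check homogeneity of variance is Bartlett's test across simulators; a minimal sketch:
# Test whether reaction-time variance is equal across the three simulators
bartlett.test(Reaction ~ simulator, data = AirTraffic)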
Fit an ANOVA model to confirm that reaction times are in fact different among simulators.
Air_model <- anova(lm(Reaction ~ simulator, data = AirTraffic))
Air_model
Conclusion:
Reject H0 of equal mean reaction times across simulators: the ANOVA confirms that reaction times differ strongly across simulators. The means differ considerably, with NEW having the lowest and C2 the highest. The descriptive analyses clearly show the new simulator to be more effective in terms of response times than C1 and C2 across all demographics.
Exam2Q1.xlsx contains data from a study examining the effects of mood on the propensity to accept gambles relative to sure losses or gains. Specifically, participants were given $100 and then given a choice between (gain frame: keeping $25 of the $100 for sure vs. a 50% chance to win $100 more or lose $50 of the $100) and (loss frame: losing $75 of the $100 for sure vs. a 50% chance to win $100 or lose $50 of the $100). Each participant encountered each frame (within subjects) once in random order. In addition, before making their selections they watched videos (between subjects) selected to induce sadness (video of a child crying), happiness (video of a family reuniting), or neutrality (nature video). Appropriately analyze the data and report your findings in a formal summary (worth 4%).
library(readxl)
MoodEffects <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/MoodEffects.xlsx")
Change the names and add factors for easier analysis.
MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==1,"happy")
MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==3,"normal")
MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==2,"sad")
MoodEffects$mood=factor(x = MoodEffects$mood,levels = c("happy","normal","sad"))
MoodEffects$lossorgain=factor(x = MoodEffects$lossorgain,levels = c("gain","loss"))
str(MoodEffects)
## Classes 'tbl_df', 'tbl' and 'data.frame': 182 obs. of 4 variables:
## $ subject : num 1 2 3 4 5 6 7 8 9 10 ...
## $ mood : Factor w/ 3 levels "happy","normal",..: 1 3 1 3 1 3 1 3 1 3 ...
## $ acceptancerate: num 0.2812 0.2812 0.2188 0.0312 0.0625 ...
## $ lossorgain : Factor w/ 2 levels "gain","loss": 1 1 1 1 1 1 1 1 1 1 ...
Visualize the data to identify any associations prior to fitting the model.
interaction.plot(MoodEffects$mood,MoodEffects$lossorgain,MoodEffects$acceptancerate,
main="Interaction: Acceptance & Mood", xlab="Mood", ylab="Acceptance Rate")
interaction.plot(MoodEffects$lossorgain,MoodEffects$mood,MoodEffects$acceptancerate,
main="Interaction: Acceptance & Gain-loss", xlab="Gain/Loss", ylab="Acceptance Rate")
Check for skewness and normality.
plot(density(MoodEffects$acceptancerate))
agostino.test(MoodEffects$acceptancerate)
##
## D'Agostino skewness test
##
## data: MoodEffects$acceptancerate
## skew = 0.89965, z = 4.49720, p-value = 6.886e-06
## alternative hypothesis: data have a skewness
shapiro.test(MoodEffects$acceptancerate)
##
## Shapiro-Wilk normality test
##
## data: MoodEffects$acceptancerate
## W = 0.8899, p-value = 2.465e-10
Acceptance rate fails both tests: it is significantly skewed and departs from normality.
Fit a GLM to assess the effects of mood on acceptance rate and its interaction with the loss/gain frame. Because acceptance rate is recorded as a continuous proportion, the default Gaussian family is used here rather than a logistic model (an explicitly logistic specification is sketched after the deviance table).
effectmodel <- glm(acceptancerate ~ mood * lossorgain, data = MoodEffects)
anova(effectmodel, test="Chisq")
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: acceptancerate
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 181 14.1948
## mood 2 1.0078 179 13.1870 8.714e-05 ***
## lossorgain 1 3.5031 178 9.6839 7.538e-16 ***
## mood:lossorgain 2 0.1968 176 9.4871 0.1612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
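For comparison, an explicitly logistic specification would model each acceptance rate as a proportion of accepted gambles. A minimal sketch, assuming a hypothetical column ntrials holding the number of gambles behind each rate (not present in MoodEffects):
# Binomial GLM with logit link; weights supply the hypothetical number of gambles per rate
logit_model <- glm(acceptancerate ~ mood * lossorgain,
family = binomial, weights = ntrials, data = MoodEffects)
summary(logit_model)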
library(lme4)
## Warning: package 'lme4' was built under R version 3.4.4
## Loading required package: Matrix
effectmodel=lmer(acceptancerate~mood+(1|lossorgain),data= MoodEffects)
summary(effectmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: acceptancerate ~ mood + (1 | lossorgain)
## Data: MoodEffects
##
## REML criterion at convergence: 3.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2721 -0.6048 -0.1285 0.5681 2.7552
##
## Random effects:
## Groups Name Variance Std.Dev.
## lossorgain (Intercept) 0.0379 0.1947
## Residual 0.0544 0.2332
## Number of obs: 182, groups: lossorgain, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.39338 0.14053 2.799
## moodnormal -0.11574 0.04297 -2.694
## moodsad -0.17262 0.04096 -4.214
##
## Correlation of Fixed Effects:
## (Intr) mdnrml
## moodnormal -0.132
## moodsad -0.139 0.455
Interpretation:
The estimated acceptance rate for happy participants is about 39%, averaging over the gain/loss frames (modeled as a random intercept). Relative to happy, the acceptance rate drops by roughly 12 percentage points for normal mood and roughly 17 percentage points for sad mood. One way to attach p-values to these fixed effects is sketched below.
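summary() of an lmer fit reports t values but no p-values; a minimal sketch of one way to obtain them, assuming the lmerTest package is available:
# Refit with lmerTest so summary() adds Satterthwaite df and p-values for the fixed effects
library(lmerTest)
effectmodel_p <- lmer(acceptancerate ~ mood + (1 | lossorgain), data = MoodEffects)
summary(effectmodel_p)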
Verify the model by checking that the residuals are approximately normally distributed, fall close to 0, and show no patterns.
qqnorm(resid(effectmodel))
qqline(resid(effectmodel))
plot(resid(effectmodel))
The residuals do not show any patterns and are mostly close to 0, supporting the results of our model. Non-random patterns in residuals would usually indicate a lack of fit from the predictors.
Plot the model effects:
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(effectmodel, partial.residuals = T))
Conclusion:
The effect plots confirm the model results and align with the earlier visualizations, displaying a gradual drop in acceptance rate as mood changes from happy to sad.
Casinos should focus on keeping customers’ moods at happy or normal levels to keep acceptance rates high at all times, regardless of wins and losses.