ExerciseLevel: Whether the student exercises more than eight hours per week (HighExcs) or less than eight hours per week (LowExcs)
You will analyze these data through the lens of a data scientist working for the campus store in which the Student Survey data was collected.
Your team is thinking about running a TV ad marketing to college students and they want to know whether running a TV advertisement is worthwhile. If it is, they also want to know what their discount strategy should be. To answer these questions, you will use the Student Survey data set to determine whether it is worthwhile to run a TV ad, whether your store should advertise a discount on sports gear or science books, and whether there are any confounding factors that might influence the association between how much TV students watch and their interests. You will first determine the percentage of students in the HighTV category, then examine the association between TV level and preferred Award, and finally examine whether the exercise level of those students is a confounding factor.
The data set you will work with is loaded in the code chunk below. The variables in this version of the data set are TV (numerical), ExerciseLevel (categorical: HighExcs and LowExcs), TVLevel (categorical: HighTV and LowTV), and Award (categorical: Nobel or Olympic). Use the following command to load the data frame in R.
X1 <- c(1,4,5,7,8,10,13,14,16,17,18,19,20,22,23,24,25,28,32,33,
34,35,44,45,46,47,48,49,50,51,52,53,54,55,58,60,
62,63,64,66,67,68,69,70,76,77,78,79,84,85)
Year <- c("Senior","Junior","Sophomore","FirstYear",
"Sophomore","FirstYear","Sophomore","Junior",
"FirstYear","FirstYear","Sophomore","Sophomore",
"Sophomore","Sophomore","FirstYear","Sophomore",
"Sophomore","Junior","Sophomore","FirstYear",
"Senior","Junior","Sophomore","FirstYear",
"Sophomore","FirstYear","Sophomore","Junior",
"FirstYear","FirstYear","Sophomore","Sophomore",
"Sophomore","Sophomore","FirstYear","Sophomore",
"Sophomore","Junior","Sophomore","FirstYear",
"Sophomore","Sophomore",
"Sophomore","Sophomore","FirstYear","Sophomore",
"Sophomore","Junior","Sophomore","FirstYear")
Sex <- c("M","M","F","F","M","F","M","F","F","F","M",
"F","F","F","M","M","M","M","F","M","M","M","M","M",
"M","M","M","M","F","F","F","F","M",
"F","M","M","M","M","F","F",
"F","F","M",
"F","M","M","M","M","F","F")
Smoke <- c("Non-Smoker","Non-Smoker","Non-Smoker","Non-Smoker",
"Non-Smoker",
"Non-Smoker","Non-Smoker","Non-Smoker",
"Non-Smoker","Non-Smoker","Non-Smoker","Non-Smoker",
"Non-Smoker",
"Non-Smoker","Non-Smoker","Non-Smoker","Non-Smoker",
"Non-Smoker","Non-Smoker","Non-Smoker","Smoker","Smoker",
"Smoker","Smoker",
"Smoker","Smoker","Smoker","Smoker","Smoker","Smoker","Smoker",
"Smoker","Smoker","Smoker","Smoker",
"Smoker","Smoker","Smoker","Smoker","Smoker",
"Smoker","Smoker","Smoker","Smoker",
"Non-Smoker","Non-Smoker","Non-Smoker","Non-Smoker","Smoker","Smoker")
Award <- c("Olympic","Nobel","Olympic","Olympic","Olympic","Nobel",
"Nobel","Olympic","Nobel","Olympic","Olympic","Olympic",
"Olympic","Olympic","Olympic","Olympic","Olympic","Olympic",
"Nobel","Olympic","Olympic","Olympic","Olympic","Olympic","Nobel",
"Nobel","Olympic","Olympic","Olympic","Olympic","Nobel","Nobel","Olympic",
"Olympic","Olympic","Olympic","Olympic","Nobel","Nobel","Nobel","Olympic", "Nobel","Olympic","Nobel","Olympic","Olympic",
"Olympic","Nobel","Nobel","Nobel")
TV <- c(1,1,3,10,8,1,8,1,2,15,3,1,2,2,20,10,3,2,2,20,10,2,8,2,14,20,20,10,3,1,14,0,20,2,3,1,7,3,12,3,20,8,0,12,3,3,8,8,3,2)
ExerciseLevel <- c("HighExcs","HighExcs","LowExcs","HighExcs","HighExcs",
"HighExcs","HighExcs","HighExcs","LowExcs","LowExcs",
"HighExcs","HighExcs","LowExcs","HighExcs","HighExcs",
"HighExcs","HighExcs","HighExcs","LowExcs","LowExcs",
"HighExcs","HighExcs","LowExcs","HighExcs","HighExcs",
"HighExcs","HighExcs","HighExcs","LowExcs","LowExcs",
"HighExcs","HighExcs","LowExcs","HighExcs","HighExcs",
"HighExcs","HighExcs","HighExcs","LowExcs","LowExcs",
"HighExcs","HighExcs","LowExcs","HighExcs","HighExcs",
"HighExcs","HighExcs","HighExcs","LowExcs","LowExcs")
TVLevel <- c("LowTV","LowTV","LowTV","HighTV","HighTV","LowTV",
"LowTV","HighTV","LowTV","LowTV",
"LowTV","LowTV","LowTV","HighTV","HighTV","LowTV",
"LowTV","HighTV","LowTV","LowTV",
"LowTV","LowTV","LowTV","HighTV","HighTV","LowTV",
"LowTV","HighTV","LowTV","LowTV",
"LowTV","LowTV","LowTV","HighTV","HighTV","LowTV",
"LowTV","HighTV","LowTV","LowTV",
"LowTV","LowTV","LowTV","HighTV","HighTV","LowTV",
"LowTV","HighTV","LowTV","LowTV")
dat <- data.frame(X1,Year,Sex, Smoke, Award, TV, ExerciseLevel, TVLevel)
crimson = '#b31b1b' # crimson
lightGray = '#cecece' # lightGray
darkGray = '#606366' # darkGray
skyBlue = '#92b2c4' # skyblue
gold = '#fbb040' # gold
ecBlack = '#393f47' # ecBlack
Your team assumes that if at least 40% of the students are in the HighTV category, a televised ad campaign would be worthwhile. Determine the proportion of students in the HighTV category.
TvFactor <- factor(dat$TVLevel,
levels = c("0","1"),
labels = c("LowTV","HighTV"))
dat$TvFactor <- TvFactor
dat$TvFactor <- dat$TVLevel
head(dat, n = 10)
## X1 Year Sex Smoke Award TV ExerciseLevel TVLevel TvFactor
## 1 1 Senior M Non-Smoker Olympic 1 HighExcs LowTV LowTV
## 2 4 Junior M Non-Smoker Nobel 1 HighExcs LowTV LowTV
## 3 5 Sophomore F Non-Smoker Olympic 3 LowExcs LowTV LowTV
## 4 7 FirstYear F Non-Smoker Olympic 10 HighExcs HighTV HighTV
## 5 8 Sophomore M Non-Smoker Olympic 8 HighExcs HighTV HighTV
## 6 10 FirstYear F Non-Smoker Nobel 1 HighExcs LowTV LowTV
## 7 13 Sophomore M Non-Smoker Nobel 8 HighExcs LowTV LowTV
## 8 14 Junior F Non-Smoker Olympic 1 HighExcs HighTV HighTV
## 9 16 FirstYear F Non-Smoker Nobel 2 LowExcs LowTV LowTV
## 10 17 FirstYear F Non-Smoker Olympic 15 LowExcs LowTV LowTV
table(head(dat$TVLevel, n=11))
##
## HighTV LowTV
## 3 8
4 / (4 + 7)
## [1] 0.3636364
7 / (4 + 7)
## [1] 0.6363636
#or we can do the similar thing by prop cmmnd
prop.table(table(head(dat$TVLevel,n = 11)))
##
## HighTV LowTV
## 0.2727273 0.7272727
#the proportion of HighTV is 0.2727273
Visualize the proportion of students in each TV level with a barplot. Based on your results, should your team run a televised ad campaign?
tbl.dat <- table(dat$TVLevel)
tbl.dat
##
## HighTV LowTV
## 15 35
prop.table(tbl.dat)
##
## HighTV LowTV
## 0.3 0.7
# We are going to give all the amount of
#the data to a variable names counts
counts = table(dat$TVLevel)
head(counts, n = 10)
##
## HighTV LowTV
## 15 35
par(mar=2+c(5.1,4.1,4.1,2.1))
barplot(counts,
main = "Proportion of studentn in Each TV Level",
col = crimson,
ylab = "HighTV",
xlab = "LowTV?",
cex.axis = 1.5,
cex.names = 1.5,
cex.lab = 1.5,
cex.main = 1.5)
Your team assumes that if more students who watch a lot of TV are interested in Nobel Prizes, they will prefer science books. In contrast, your team assumes that if more students who watch a lot of TV are interested in the Olympics, they will prefer sports gear.
Create a table that shows the proportions of HighTV and LowTV students that prefer each award type. Your team will use this table to decide whether to run a sale on science books or sports gear based on their assumptions, above.
table(dat$TVLevel, dat$Award)
##
## Nobel Olympic
## HighTV 4 11
## LowTV 13 22
#After this, we are going to show this, in row axis
prop.table(table(dat$TVLevel, dat$Award), 1)
##
## Nobel Olympic
## HighTV 0.2666667 0.7333333
## LowTV 0.3714286 0.6285714
tbl <- table(dat$TVLevel, dat$Award)
tbl
##
## Nobel Olympic
## HighTV 4 11
## LowTV 13 22
t(tbl)
##
## HighTV LowTV
## Nobel 4 13
## Olympic 11 22
Create a stacked barplot to present the results of the previous analysis to your team. Recall that your team assumes that if more students who watch a lot of TV are interested in Nobel Prizes, they will prefer science books, whereas if more students who watch a lot of TV are interested in the Olympics, they will prefer sports gear.
Based on your results, should your team discount science books or sports gear?
barplot(tbl,
main="Proportion Of Stuent's Choice Based On Award Types",
cex.names = 1.5, col = c("red", "gray"),
legend.text=c("HighTV","LowTV"),
args.legend=list(x="topleft",cex=1))
TV ads are expensive, so before your team runs the campaign you decide to check for confounding factors. Your team has assumed that students who prefer the Olympics will be interested in sports gear, but the amount of exercise students do could influence the association between the award a student prefers and the amount of TV they watch. For instance, if the students who prefer the Olympics and watch more TV are primarily from the HighExcs group, the team would consider featuring exercise gear instead of sports gear in the TV ad.
Check whether the difference in the level of TV watched by students who are more interested in different awards is influenced by their level of exercise. Summarize this information by creating a table of proportions.
TvFactor <- factor(dat$TVLevel,
levels = c("0","1"),
labels = c("LowTV","HighTV"))
dat$TvFactor <- TvFactor
dat$TvFactor <- dat$TVLevel
head(dat, n = 10)
## X1 Year Sex Smoke Award TV ExerciseLevel TVLevel TvFactor
## 1 1 Senior M Non-Smoker Olympic 1 HighExcs LowTV LowTV
## 2 4 Junior M Non-Smoker Nobel 1 HighExcs LowTV LowTV
## 3 5 Sophomore F Non-Smoker Olympic 3 LowExcs LowTV LowTV
## 4 7 FirstYear F Non-Smoker Olympic 10 HighExcs HighTV HighTV
## 5 8 Sophomore M Non-Smoker Olympic 8 HighExcs HighTV HighTV
## 6 10 FirstYear F Non-Smoker Nobel 1 HighExcs LowTV LowTV
## 7 13 Sophomore M Non-Smoker Nobel 8 HighExcs LowTV LowTV
## 8 14 Junior F Non-Smoker Olympic 1 HighExcs HighTV HighTV
## 9 16 FirstYear F Non-Smoker Nobel 2 LowExcs LowTV LowTV
## 10 17 FirstYear F Non-Smoker Olympic 15 LowExcs LowTV LowTV
dat$Exlvl=ifelse(dat$ExerciseLevel=="HighExcs", 1, 0)
prop = aggregate(Exlvl ~ ExerciseLevel + Award,
FUN = mean,
data = dat)
prop
## ExerciseLevel Award Exlvl
## 1 HighExcs Nobel 1
## 2 LowExcs Nobel 0
## 3 HighExcs Olympic 1
## 4 LowExcs Olympic 0
Create a side-by-side barplot that shows how student exercise levels influence the association between a student’s preferred award and the amount of TV they watch.
library(lattice) # Load the lattice package so you can make a side-by-side barplot
barchart(Exlvl~ExerciseLevel,
groups=Award,
dat=prop,
pch=19, cex=1.5, ylim=c(0,1),
main="Proportion Of High & Low Excs Based On The Award Types",
par.settings = list(superpose.polygon =
list(col=c(crimson, darkGray))),
scales=list(cex=1.5),
xlab=list(label="HighExcs", cex = 1.5),
ylab=list(label="LowExcs", cex = 1.5),
auto.key=list(space="right",cex=1.5))
prop.Award=aggregate(Exlvl ~ Award,
FUN=mean, data=dat)
prop.Award
## Award Exlvl
## 1 Nobel 0.6470588
## 2 Olympic 0.7272727
Based on the side-by-side barplot and the table you made to understand how exercise level influences the relationship between award of interest and TV level, is exercise level a confounding factor? Why or why not?
Based on the bar chart information that is shown on the screen, both parties are on the same level. This means we need to see the relationship between the exercise level and whether it’s in the HighExcs or LowExcs. To bring an example, it gives us the privilege to assess the observed and unobserved confounders.
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.