Overview

This lab explores data collected from a random sample of 236 undergraduate students.

Raw Data

Each observation is an individual undergraduate, and the data set contains the following variables:

  • Gender: the gender (Male/Female) of the student
  • Alcohol: the typical number of alcoholic beverages consumed per week
  • Height: self-reported height, in inches
  • Cheat: whether the student would report an instance of cheating on an exam (0 = No, 1 = Yes)

load("drinking.RData")  # loads the data frame `data` into the workspace
x <- data
head(x)
##   Gender Alcohol Height Cheat
## 1 Female      15     64     0
## 2   Male      14     69     0
## 3 Female      NA     66     0
## 4 Female      10     63     0
## 5   Male      30     72     0
## 6 Female      20     67     0

Q1. What are the drinking habits of students at this university? In particular, what is the typical number of drinks a student has during a week? Do the data suggest that drinking is a problem in this university?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   1.000   4.539   7.000  36.000      30

Most of the data is clustered in the 0-10 range, and the distribution is heavily right-skewed, with frequencies falling off at higher levels of consumption. The skew is evidenced by the mean of about 4.5, which is well above the median of 1. Note that 30 of the 236 students did not answer this question.
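The comparison of mean and median above can be sketched in R. This is only an illustration: `skew_check` is a hypothetical helper that is not part of the original lab, the `toy` vector is made up and is not the lab data, and "mean above median" is a rough indicator of right skew rather than a formal test.

```r
# Hypothetical helper (not from the lab): summarises a numeric vector,
# ignoring missing values, and flags likely right skew when mean > median.
skew_check <- function(v) {
  m  <- mean(v, na.rm = TRUE)
  md <- median(v, na.rm = TRUE)
  list(mean = m, median = md, n_missing = sum(is.na(v)), right_skewed = m > md)
}

# Toy values for illustration only; they are NOT the lab data.
toy <- c(0, 0, 1, 2, 7, 30, NA)
skew_check(toy)

# With the lab data loaded as `x`, the equivalent checks would be:
# summary(x$Alcohol)
# hist(x$Alcohol, main = "Drinks per Week", xlab = "Alcoholic beverages per week")
```

A histogram is a natural companion to the numeric summary here, since it shows the long right tail directly.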

The data suggests that drinking is a problem at the university, but mainly among the heavier drinkers. Given the first-quartile value of 0, at least 25% of the students who responded report not drinking at all, and the median of 1 means that half have at most one drink per week. At the upper end, however, about 25% of students report between 7 and 36 alcoholic beverages per week. This is an issue for a subset of the university population, not for all of it.
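To put a direct number on the share of heavier drinkers rather than reading it off the quartiles, one could compute the proportion of respondents at or above a cutoff. The sketch below assumes a cutoff of 7 drinks per week (the third quartile reported above); `share_at_least` is a hypothetical helper and the `toy` vector is made up, not the lab data.

```r
# Hypothetical helper (not from the lab): proportion of non-missing values
# that are at or above `threshold`.
share_at_least <- function(v, threshold) {
  mean(v >= threshold, na.rm = TRUE)
}

# Toy values for illustration only; they are NOT the lab data.
toy <- c(0, 0, 1, 7, 12, NA)
share_at_least(toy, 7)

# With the lab data loaded as `x`, the equivalent call would be:
# share_at_least(x$Alcohol, 7)
```

Because `na.rm = TRUE` drops missing answers, the result describes only the students who responded.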

Q2. One of the statistics professors at this university uses the honor system when giving exams. If there were cheating going on during her exams, would the professor be likely to know about it?

tbl <- table(x$Cheat)  # counts of 0 (would not report) and 1 (would report)
names(tbl) <- c("% Would not report cheating", "% Would report cheating")
pct <- 100 * tbl / sum(tbl)  # percentage of responses in each category
pie(tbl, labels = paste0(round(pct), "%  ",
                         c("Would not report cheating", "Would report cheating")),
    main = "Would Students Report Cheating?")

round(100*tbl/sum(tbl), 2)
## % Would not report cheating     % Would report cheating 
##                       91.53                        8.47

The data shows that around 92% of students would not report cheating during an exam. This is reason enough for the university to reconsider any honor system rules in place. With only about 8% of students willing to report, the professor would probably not learn about isolated cheating. She would be likely to hear of it only if cheating were so widespread that at least one of the few willing reporters witnessed an instance and reported it.
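The "at least one reporter" reasoning can be made concrete with a rough calculation. The sketch below assumes that each student who witnesses cheating is, independently, a would-be reporter with probability 0.0847 (the sample proportion above); the independence assumption and the witness counts are illustrative and do not come from the lab. Under that assumption, the chance that at least one of n witnesses reports is 1 - (1 - p)^n.

```r
# Assumed reporting probability, taken from the sample proportion (8.47%).
p_report <- 0.0847

# Probability that at least one of `n_witnesses` independent witnesses
# would report; the independence assumption is a simplification.
p_at_least_one <- function(n_witnesses, p = p_report) {
  1 - (1 - p)^n_witnesses
}

# Illustrative witness counts (hypothetical, not from the lab).
sapply(c(1, 5, 10, 30), p_at_least_one)
```

The probability grows with the number of witnesses, which is why only widespread cheating would be likely to reach the professor.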