#install.packages("Stat2Data")
#install.packages("mosaic")
library(mosaic)
library(Stat2Data)
0.7 Measuring students. The registrar at a small liberal arts college computes descriptive summaries for all members of the entering class on a regular basis. For example, the mean and standard deviation of the high school GPAs for all entering students in a particular year were 3.16 and 0.5247, respectively. The Mathematics Department is interested in helping all students who want to take mathematics to identify the appropriate course, so they offer a placement exam. A randomly selected subset of students taking this exam during the past decade had an average score of 71.05 with a standard deviation of 8.96.
What is the population of interest to the registrar at this college?
The population of interest to the registrar at this college is all the members of the entering class.
Are the descriptive summaries computed by the registrar (3.16 and 0.5247) statistics or parameters? Explain.
They are parameters, because they are computed from the entire population (every member of the entering class) rather than from a sample.
What is the population of interest to the Mathematics Department?
All students who want to take math.
Are the numerical summaries (71.05 and 8.96) statistics or parameters? Explain.
They are statistics, because they are computed from a randomly selected subset (a sample) of the students who took the exam.
Hint: this is chapter 0. Don’t feel like you need to do anything too complicated for your “model.” Comparing plots or doing a simple linear regression is fine.
0.15 Statistics students survey. An instructor at a small liberal arts college distributed the data collection card similar to what is shown below on the first day of class. The data for two different sections of the course are shown in the file Day1Survey. Note that the names have not been entered into the dataset.
Data Collection Card
Directions: Please answer each question and return to me.
Your name (as you prefer): _______________
What is your current class standing? _______________
Sex: Male _______________ Female _______________
How many miles (approximately) did you travel to get to campus? _______________
Height (estimated) in inches: _______________
Handedness (Left, Right, Ambidextrous): _______________
How much money, in coins (not bills), do you have with you? $ _______________
Estimate the length of the white string (in inches): _______________
Estimate the length of the black string (in inches): _______________
How much do you expect to read this semester (in pages/week)? _______________
How many hours do you watch TV in a typical week? _______________
What is your resting pulse? _______________
How many text messages have you sent and received in the last 24 hours? _______________
The data for this survey are stored in Day1Survey.
Apply the four-step process to the survey data to address the question: “Is there evidence that the mean resting pulse rate for women is different from the mean resting pulse rate for men?”
Pick another question that interests you from the survey and compare the responses of men and women.
data(Day1Survey)
head(Day1Survey)
## Section Class Sex Distance Height Handedness Coins WhiteString
## 1 1 Senior F 400 62 Right 1.12 42
## 2 1 * F 450 61 Left 29.00 45
## 3 1 Freshman F 3000 61 Right 1.50 22
## 4 1 Freshman M 100 72 Right 0.07 40
## 5 1 N/A F 2000 69 Right 0.12 48
## 6 1 Senior M 500 73 Right 8.00 30
## BlackString Reading TV Pulse Texting
## 1 6 80 3 71 3
## 2 5 100 10 78 100
## 3 4 100 4 80 2
## 4 4 50 25 63 200
## 5 7 200 5 63 100
## 6 8 100 0 56 1
attach(Day1Survey)
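The four-step process:
1. Choose: “Is there evidence that the mean resting pulse rate for women is different from the mean resting pulse rate for men?”
2. Fit: $\hat{y}$ is the fitted (predicted) value, with $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. For a 1-unit increase in the predictor we would expect a change of $\hat{\beta}_1$ units in the response. Here $x$ is an indicator (1 if the condition on Sex is true, 0 otherwise), so $\hat{\beta}_1$ is the difference between the two group means.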
femalelm <- lm((Pulse~(Sex=="F"))) # Linear model of Pulse on an indicator for female
femalelm
##
## Call:
## lm(formula = (Pulse ~ (Sex == "F")))
##
## Coefficients:
## (Intercept) Sex == "F"TRUE
## 66.65 1.17
mean(Pulse~(Sex=="F")) # TRUE is the females' mean pulse and FALSE is the males' mean pulse
## FALSE TRUE
## 66.65385 67.82353
malelm<-lm((Pulse~(Sex=="M"))) # Same model with a male indicator, to check that it gives the same two group means
malelm
##
## Call:
## lm(formula = (Pulse ~ (Sex == "M")))
##
## Coefficients:
## (Intercept) Sex == "M"TRUE
## 67.82 -1.17
In the female-indicator model, the intercept $\hat{\beta}_0$ is 66.65 beats per minute and the slope $\hat{\beta}_1$ is 1.17. The intercept is the predicted resting pulse for the reference group (males, for whom the indicator is FALSE). For a female, the prediction is the intercept plus the slope: 66.65 + 1.17 = 67.82 beats per minute.
This makes sense, since the mean pulse for females is 67.82 beats per minute and the mean pulse for males is 66.65 beats per minute. The male-indicator model gives the same two fitted means with the reference group swapped (intercept 67.82, slope -1.17).
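One way to see why the intercept and slope line up with the group means is to fit the same kind of indicator model on a tiny made-up data set. The sketch below uses base R only; the data frame `toy` and its values are hypothetical and are not taken from Day1Survey.

```r
# Hypothetical toy data (NOT from Day1Survey), used only to illustrate the idea
toy <- data.frame(
  Pulse = c(60, 70, 80, 62, 66, 70, 74),
  Sex   = c("F", "F", "F", "M", "M", "M", "M")
)

# Regress Pulse on an indicator that is TRUE for females
fit <- lm(Pulse ~ (Sex == "F"), data = toy)
b0 <- unname(coef(fit)[1])  # intercept: mean of the reference group (males)
b1 <- unname(coef(fit)[2])  # slope: female mean minus male mean

# Group means computed directly, for comparison
group_means <- tapply(toy$Pulse, toy$Sex, mean)

c(male_mean = b0, female_mean = b0 + b1)
group_means
```

The two printed results should agree: the intercept reproduces the male mean and the intercept plus the slope reproduces the female mean.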
library(skimr)
##
## Attaching package: 'skimr'
## The following object is masked from 'package:mosaic':
##
## n_missing
Day1Survey %>%
group_by(Sex)%>%
skim(Pulse)
## Skim summary statistics
## n obs: 43
## n variables: 13
## group variables: Sex
##
## ── Variable type:integer ──────────────────────────────────────────────────────────────────────────────────────
## Sex variable missing complete n mean sd p0 p25 p50 p75 p100 hist
## F Pulse 0 17 17 67.82 11.38 51 60 72 75 90 ▃▃▁▁▇▂▁▁
## M Pulse 0 26 26 66.65 11.27 48 57 66 72 96 ▅▇▇▇▂▃▁▁
ggplot(data=Day1Survey) + geom_density(aes(x=Pulse, color=Sex))
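The density curves show the distribution of resting pulse for females and for males. Consistent with the summaries above, the female sample mean is slightly higher than the male sample mean (67.82 vs. 66.65 beats per minute), although the two distributions cover similar ranges.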
mean(residuals(femalelm)) # checks that the mean of the residuals is close to 0
## [1] 1.136042e-16
mean(residuals(malelm)) # checks that the mean of the residuals is close to 0
## [1] 3.083168e-17
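Both are zero up to floating-point error. This is expected for any least-squares fit that includes an intercept, so the check confirms the fit behaved as it should but says little about whether the model is appropriate.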
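To illustrate how much a near-zero residual mean can and cannot tell us, here is a minimal sketch on hypothetical values (not from Day1Survey). A straight line is deliberately fitted to curved data, and the residual mean is still essentially zero.

```r
# Hypothetical toy data (NOT from Day1Survey): y is clearly curved in x
x <- 1:10
y <- x^2

# A straight-line fit is a poor model for these values...
bad_fit <- lm(y ~ x)

# ...yet its residuals still average to zero (up to rounding error),
# because least squares with an intercept forces that to happen
mean_resid <- mean(residuals(bad_fit))

# The spread and pattern of the residuals are more informative
resid_sd <- sd(residuals(bad_fit))

c(mean_resid = mean_resid, resid_sd = resid_sd)
```

For the pulse models, a more informative check is whether the spread is similar in the two groups; the standard deviations in the skim output (11.38 for females and 11.27 for males) suggest that it is.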
Does resting pulse differ by gender?
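No, these data do not provide evidence that it does. The Welch two-sample t-test below gives a p-value of 0.7428, and the 95% confidence interval for the difference in means (-6.02 to 8.36 beats per minute) contains 0, so we fail to reject the null hypothesis of equal mean resting pulse.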
t.test(Pulse~Sex)
##
## Welch Two Sample t-test
##
## data: Pulse by Sex
## t = 0.33077, df = 34.12, p-value = 0.7428
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.015993 8.355360
## sample estimates:
## mean in group F mean in group M
## 67.82353 66.65385
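As a cross-check, the Welch test statistic can be rebuilt by hand from the group summaries already printed above (means from the t.test output, standard deviations and sample sizes from the skim output). This is a minimal sketch in base R; the variable names are my own, and because the standard deviations are rounded to two decimals the results should be close to, but not exactly equal to, what t.test reports.

```r
# Group summaries copied from the output above
mean_f <- 67.82353; sd_f <- 11.38; n_f <- 17
mean_m <- 66.65385; sd_m <- 11.27; n_m <- 26

# Standard error of the difference in means (unequal variances)
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m
se  <- sqrt(v_f + v_m)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean_f - mean_m) / se
df     <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 4)
round(ci, 3)
```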
Think of another question and answer it here!
1. Choose: the survey item “How many hours do you watch TV in a typical week?” The question is: “Is there evidence that the mean amount of TV time for women is different from the mean amount of TV time for men?”
2. Fit
tvfe<-lm(TV~(Sex=="F"))
tvfe
##
## Call:
## lm(formula = TV ~ (Sex == "F"))
##
## Coefficients:
## (Intercept) Sex == "F"TRUE
## 5.558 -1.881
ggplot(data=Day1Survey) + geom_density(aes(x=TV, color=Sex))
This means that, in this sample, females watch an average of 1.881 fewer hours of TV per week than males (about 3.68 vs. 5.56 hours), which is also suggested by the graph.
3. Assess
The residuals from each model should average to 0, which is checked below.
mean(residuals(tvfe)) # checks that the mean of the residuals is close to 0
## [1] 7.47217e-17
tvme<-lm(TV~(Sex=="M"))
mean(residuals(tvme)) # checks that the mean of the residuals is close to 0
## [1] -2.631056e-16
They are close to 0.
We then use a two-sample t-test to see whether we can reject the null hypothesis that males and females watch the same mean number of hours of TV per week.
t.test(TV~Sex)
##
## Welch Two Sample t-test
##
## data: TV by Sex
## t = -1.3256, df = 35.307, p-value = 0.1935
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.7613613 0.9989179
## sample estimates:
## mean in group F mean in group M
## 3.676471 5.557692
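With a p-value of 0.1935 and a 95% confidence interval for the difference in means (-4.76 to 1.00 hours) that contains 0, we fail to reject the null hypothesis. These data do not provide evidence that males and females watch different mean amounts of TV per week.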
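The p-value and confidence interval in the TV output can be recovered from the reported t statistic, degrees of freedom, and group means. The sketch below does this in base R; the variable names are my own, and the inputs are the rounded values printed by t.test, so agreement is only up to rounding.

```r
# Values copied from the t.test output above
t_stat <- -1.3256
df     <- 35.307
mean_f <- 3.676471
mean_m <- 5.557692

# Two-sided p-value: probability of a t statistic at least this extreme
p_value <- 2 * pt(-abs(t_stat), df)

# Back out the standard error from t = difference / se,
# then rebuild the 95% confidence interval for the difference
diff <- mean_f - mean_m
se   <- diff / t_stat
ci   <- diff + c(-1, 1) * qt(0.975, df) * se

round(p_value, 4)
round(ci, 3)
```

Because the interval includes 0, a difference of zero hours is compatible with the data at the 5% level, which matches the decision based on the p-value.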