library(ggplot2)
library(dplyr)
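It is the probability of observing a value at least as extreme as the one we observed, assuming the null hypothesis is true.
A p value cannot be the chance that a hypothesis is true, because by definition a p value is the chance of observing data at least as extreme as ours under the assumption that the null hypothesis is correct.
We need to assume that the null hypothesis is true because the p value is calculated from the distribution the test statistic would have under the null hypothesis. Without that assumption there is no probability on which to base a rejection.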
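As a rough illustration of that definition, the sketch below computes a two-sided p value for a made-up test statistic and checks it by simulating draws from the null distribution. The value 2.1 and the standard normal null distribution are assumptions chosen for the example; they do not come from the assignment.

```r
# Hypothetical test statistic; not taken from any output in this assignment
z_obs <- 2.1

# Two-sided p value: chance of a value at least this extreme under the null
p_two_sided <- 2 * pnorm(-abs(z_obs))

# Check by simulation: draw many test statistics assuming the null is true
set.seed(1)
z_null <- rnorm(1e5)
p_sim <- mean(abs(z_null) >= abs(z_obs))

p_two_sided
p_sim
```

Both numbers describe how often the null hypothesis alone would produce a result this extreme. Neither one is the probability that the null hypothesis is true.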
true
False, the value 0.1551 from the above output represents the standard error of the coefficient estimate, not of the y variable.
The t value of 8.838 means that the estimated coefficient is about 8.8 standard errors away from the value hypothesized under the null hypothesis (0).
False
The null hypothesis associated with the p-value 1.03*10^-11 from the above output is that the coefficient is equal to 0, not 1.3712. The value 1.3712 is the estimate.
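The link between the estimate, the standard error, and the t value can be checked with a short calculation. This is a sketch that assumes 1.3712 and 0.1551 sit on the same coefficient row of the output referred to above; the variable names are made up for the example.

```r
# Numbers quoted in the answers above
estimate  <- 1.3712
std_error <- 0.1551
null_value <- 0   # value of the coefficient under the null hypothesis

# t value = how many standard errors the estimate is from the null value
t_value <- (estimate - null_value) / std_error
t_value
```

The result is about 8.84, which agrees with the reported 8.838 up to rounding of the inputs.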
1.475
1-pnorm(0)
## [1] 0.5
0.5
that the null hypothesis cannot be ruled out.
College: Name of school
GradRate: College graduation rate (as a value from 0 to 100)
SFRatio: Student-to-faculty ratio
AdmisRate: Percentage of applicants accepted (as a value from 0 to 100)
FacultyPhD: Percentage of faculty with a PhD (as a value from 0 to 100)
Type: School type (Private or Public)
Region: Location of school: Midwest, NorthEast, South, or West
State: State in which the school is located
MathSAT: Average Math SAT score of entering first-years
VerbalSAT: Average Verbal SAT score of entering first-years
ACT: Average ACT score of entering first-years
USNews = read.csv( 'https://raw.githubusercontent.com/vittorioaddona/data/main/USNews.csv' )
GradRate, and
print out its summary. What are the hypotheses associated
with the p-value on the (Intercept) coefficient?
M12=lm(formula=GradRate~1,data=USNews)
M12
##
## Call:
## lm(formula = GradRate ~ 1, data = USNews)
##
## Coefficients:
## (Intercept)
## 60.53
summary(M12)
##
## Call:
## lm(formula = GradRate ~ 1, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.529 -12.529 -0.529 13.471 39.471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.5287 0.5522 109.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.86 on 1166 degrees of freedom
the null hypothesis is that the (Intercept) coefficient, which in this model is the mean GradRate, is equal to 0, and the alternate hypothesis
is that it is not equal to 0
(Intercept) coefficient is so tiny.
The estimated mean (60.53) is about 109.6 standard errors above the hypothesized mean of 0, so, assuming the null hypothesis is correct, the chance of getting an estimate at least this far from 0 is incredibly low.
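The standard error and t value on the (Intercept) row can be roughly reconstructed from other numbers in the same summary. This sketch assumes the usual formula for the standard error of a mean (residual standard error divided by the square root of the sample size) and uses rounded values copied from the output, so it only matches to a few digits.

```r
# Values copied from summary(M12)
resid_se <- 18.86        # residual standard error
n        <- 1166 + 1     # degrees of freedom + 1 estimated coefficient
estimate <- 60.5287      # (Intercept), i.e. the mean GradRate

# Standard error of the mean and the t value against a null value of 0
se_mean <- resid_se / sqrt(n)
t_value <- (estimate - 0) / se_mean

se_mean
t_value
```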
yes
USNews %>% summarize(mean(GradRate))
## mean(GradRate)
## 1 60.52871
1-pnorm(109.6)
## [1] 0
summary(M12)
##
## Call:
## lm(formula = GradRate ~ 1, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.529 -12.529 -0.529 13.471 39.471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.5287 0.5522 109.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.86 on 1166 degrees of freedom
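The p value of 0 reported by R comes from the t value of 109.6, which tests the null hypothesis that the mean graduation rate is 0, so we can reject that null hypothesis. It does not show that the graduation rate is greater than 60%. For that claim the test statistic is (60.5287 - 60) / 0.5522, which is about 0.96 and gives a one-sided p value of roughly 0.17, so we cannot conclude that the mean graduation rate is greater than 60%.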
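The claim about 60% can be checked directly from the summary output. This is a minimal sketch, assuming the hypothesized mean is 60 and the alternative is one-sided (greater than 60); the estimate, standard error, and degrees of freedom are copied from summary(M12).

```r
# Values copied from summary(M12)
estimate   <- 60.5287
std_error  <- 0.5522
df         <- 1166
null_value <- 60       # assumed hypothesized mean graduation rate

# Test statistic measured from 60 rather than from 0
t_60 <- (estimate - null_value) / std_error

# One-sided p value for the alternative "mean is greater than 60"
p_one_sided <- pt(t_60, df = df, lower.tail = FALSE)

t_60
p_one_sided
```

The t value of 109.6 in the summary measures the distance from 0, which is why it cannot be reused for a question about 60.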
GradRate and
SFRatio? Fit an appropriate model to answer this question,
and fill in the following pieces en route to your ultimate
conclusion:
M13=lm(formula=GradRate~SFRatio,data=USNews)
M13
##
## Call:
## lm(formula = GradRate ~ SFRatio, data = USNews)
##
## Coefficients:
## (Intercept) SFRatio
## 80.272 -1.338
summary(M13)
##
## Call:
## lm(formula = GradRate ~ SFRatio, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.419 -12.085 0.088 12.084 77.564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.2721 1.6813 47.74 <2e-16 ***
## SFRatio -1.3381 0.1084 -12.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.75 on 1165 degrees of freedom
## Multiple R-squared: 0.1157, Adjusted R-squared: 0.115
## F-statistic: 152.5 on 1 and 1165 DF, p-value: < 2.2e-16
the null hypothesis is that the coefficient for SFRatio is equal
to 0, the alternate hypothesis is that the coefficient is not
equal to 0
does a horizontal or non horizontal trend line best define the
relationship between GradRate and SFRatio?
-12.35
less than 2*10^-16
there is a relationship between SFRatio and GradRate
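The test statistic and p value for this question both come from the SFRatio row of summary(M13). The sketch below recomputes them from the rounded estimate and standard error, assuming a two-sided test on 1165 degrees of freedom as printed in the output.

```r
# SFRatio row of summary(M13)
estimate  <- -1.3381
std_error <- 0.1084
df        <- 1165

# Test statistic for the null hypothesis that the SFRatio coefficient is 0
t_slope <- (estimate - 0) / std_error

# Two-sided p value from the t distribution
p_slope <- 2 * pt(-abs(t_slope), df = df)

t_slope
p_slope
```

The t value on the (Intercept) row is a separate test of a separate coefficient and does not enter this calculation.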
GradRate and
SFRatio after controlling for Type? Fit an
appropriate model to answer this question, and fill in the following
pieces en route to your ultimate conclusion:
M14=lm(formula=GradRate~SFRatio+Type,data=USNews)
M14
##
## Call:
## lm(formula = GradRate ~ SFRatio + Type, data = USNews)
##
## Coefficients:
## (Intercept) SFRatio TypePublic
## 76.882 -0.798 -12.781
summary(M14)
##
## Call:
## lm(formula = GradRate ~ SFRatio + Type, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.024 -11.629 0.541 12.141 52.100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.8817 1.6255 47.298 < 2e-16 ***
## SFRatio -0.7980 0.1136 -7.026 3.62e-12 ***
## TypePublic -12.7811 1.1356 -11.255 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.86 on 1164 degrees of freedom
## Multiple R-squared: 0.2025, Adjusted R-squared: 0.2011
## F-statistic: 147.8 on 2 and 1164 DF, p-value: < 2.2e-16
confint(M14)
## 2.5 % 97.5 %
## (Intercept) 73.692546 80.0708959
## SFRatio -1.020902 -0.5751716
## TypePublic -15.009076 -10.5530809
the null hypothesis is that the SFRatio coefficient is equal to 0,
the alternate hypothesis is that the SFRatio coefficient is not equal to
0
which best represents the data, a horizontal line or a non horizontal
line
-7.026
3.62*10^-12
Because the p value for SFRatio is very small, we can reject the null hypothesis that its coefficient is 0. Thus we can conclude that even when controlling for Type, there is a relationship between SFRatio and GradRate.
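The confint(M14) interval for SFRatio tells the same story, since it excludes 0. As a sketch, the interval can be approximately rebuilt from the SFRatio row of summary(M14), assuming the standard estimate plus or minus critical value times standard error construction; small differences from confint() are due to rounding in the printed summary.

```r
# SFRatio row of summary(M14)
estimate  <- -0.7980
std_error <- 0.1136
df        <- 1164

# Critical value for a 95% interval, then estimate +/- margin
crit <- qt(0.975, df = df)
ci   <- estimate + c(-1, 1) * crit * std_error

ci
```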
GradRate and
Type after controlling for SFRatio? Fit an
appropriate model to answer this question, and fill in the following
pieces en route to your ultimate conclusion:
M15=lm(formula=GradRate~Type+SFRatio,data=USNews)
M15
##
## Call:
## lm(formula = GradRate ~ Type + SFRatio, data = USNews)
##
## Coefficients:
## (Intercept) TypePublic SFRatio
## 76.882 -12.781 -0.798
summary(M15)
##
## Call:
## lm(formula = GradRate ~ Type + SFRatio, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.024 -11.629 0.541 12.141 52.100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.8817 1.6255 47.298 < 2e-16 ***
## TypePublic -12.7811 1.1356 -11.255 < 2e-16 ***
## SFRatio -0.7980 0.1136 -7.026 3.62e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.86 on 1164 degrees of freedom
## Multiple R-squared: 0.2025, Adjusted R-squared: 0.2011
## F-statistic: 147.8 on 2 and 1164 DF, p-value: < 2.2e-16
the null hypothesis is that the TypePublic coefficient is equal to 0, the
alternate hypothesis is that the TypePublic coefficient is not equal to 0
does a horizontal or non horizontal line best represent the
relationship between type and GradRate.
-11.255
less than 2*10^-16
2*(1-pnorm(abs(-11.255)))
## [1] 0
since the p value is so low, the null hypothesis can be rejected meaning that there is a relationship between Type and GradRate when controlling for SFRatio
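To see what the TypePublic coefficient means in practice, the fitted model can be written as two parallel lines, one for private and one for public schools. The sketch below uses the coefficients from summary(M15); the helper name predict_grad and the student-to-faculty ratio of 15 are made up for the example.

```r
# Coefficients copied from summary(M15)
b_intercept <- 76.8817
b_public    <- -12.7811
b_sfratio   <- -0.7980

# Hypothetical helper: predicted GradRate for a given SFRatio and school type
predict_grad <- function(sf_ratio, is_public) {
  b_intercept + b_public * as.numeric(is_public) + b_sfratio * sf_ratio
}

private_15 <- predict_grad(15, FALSE)
public_15  <- predict_grad(15, TRUE)

# The gap between the two lines equals the TypePublic coefficient
# at every value of SFRatio
public_15 - private_15
```

The sum of the intercept and the TypePublic coefficient is the intercept of the public-school line. It is a predicted graduation rate, not a test statistic.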
summary output of the following three
models:
GradRate ~ VerbalSAT
GradRate ~ MathSAT
GradRate ~ MathSAT + VerbalSAT
M16=lm(formula=GradRate~VerbalSAT,data=USNews)
M16
##
## Call:
## lm(formula = GradRate ~ VerbalSAT, data = USNews)
##
## Coefficients:
## (Intercept) VerbalSAT
## -26.2801 0.1907
summary(M16)
##
## Call:
## lm(formula = GradRate ~ VerbalSAT, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.632 -9.441 -0.008 9.515 44.196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.280068 4.504936 -5.834 8.26e-09 ***
## VerbalSAT 0.190688 0.009641 19.778 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.88 on 705 degrees of freedom
## (460 observations deleted due to missingness)
## Multiple R-squared: 0.3569, Adjusted R-squared: 0.3559
## F-statistic: 391.2 on 1 and 705 DF, p-value: < 2.2e-16
M17=lm(formula=GradRate~MathSAT,data=USNews)
M17
##
## Call:
## lm(formula = GradRate ~ MathSAT, data = USNews)
##
## Coefficients:
## (Intercept) MathSAT
## -18.2299 0.1574
summary(M17)
##
## Call:
## lm(formula = GradRate ~ MathSAT, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.897 -10.146 -0.544 9.887 49.992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.229886 4.419489 -4.125 4.15e-05 ***
## MathSAT 0.157401 0.008583 18.339 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.27 on 705 degrees of freedom
## (460 observations deleted due to missingness)
## Multiple R-squared: 0.323, Adjusted R-squared: 0.322
## F-statistic: 336.3 on 1 and 705 DF, p-value: < 2.2e-16
M18=lm(formula=GradRate~MathSAT+VerbalSAT,data=USNews)
M18
##
## Call:
## lm(formula = GradRate ~ MathSAT + VerbalSAT, data = USNews)
##
## Coefficients:
## (Intercept) MathSAT VerbalSAT
## -26.88887 0.03306 0.15559
summary(M18)
##
## Call:
## lm(formula = GradRate ~ MathSAT + VerbalSAT, data = USNews)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.308 -9.251 -0.082 9.198 44.823
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.88887 4.51785 -5.952 4.18e-09 ***
## MathSAT 0.03306 0.02145 1.541 0.124
## VerbalSAT 0.15559 0.02472 6.293 5.46e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.87 on 704 degrees of freedom
## (460 observations deleted due to missingness)
## Multiple R-squared: 0.359, Adjusted R-squared: 0.3572
## F-statistic: 197.2 on 2 and 704 DF, p-value: < 2.2e-16
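MathSAT is a confounding
variable in the relationship between GradRate and
VerbalSAT? Briefly explain.
Only mildly, if at all. Controlling for MathSAT shrinks the VerbalSAT coefficient from 0.1907 to 0.1556, but it stays positive and highly significant, so the conclusion about VerbalSAT does not change.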
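VerbalSAT is a confounding
variable in the relationship between GradRate and
MathSAT? Briefly explain.
Yes. Controlling for VerbalSAT shrinks the MathSAT coefficient from 0.1574 to 0.0331, and it is no longer statistically significant (p value 0.124). A confounding variable does not have to change the sign of a coefficient; it only has to change the apparent relationship.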
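One rough way to gauge how much a control variable matters is to compare each coefficient before and after the other SAT score is added. This is a sketch using the estimates printed in the three summaries above; pct_change is a helper made up for the example, and percent change is only one informal yardstick, not a formal test.

```r
# Coefficients copied from summary(M16), summary(M17), and summary(M18)
verbal_alone    <- 0.190688
verbal_adjusted <- 0.15559
math_alone      <- 0.157401
math_adjusted   <- 0.03306

# Hypothetical helper: percent change in a coefficient after adjustment
pct_change <- function(before, after) 100 * (after - before) / before

pct_verbal <- pct_change(verbal_alone, verbal_adjusted)
pct_math   <- pct_change(math_alone, math_adjusted)

pct_verbal
pct_math
```

The VerbalSAT coefficient drops by under a fifth, while the MathSAT coefficient loses most of its size, even though neither one changes sign.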
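MathSAT and VerbalSAT. Stated
differently, why do you think we observed an association between
MathSAT and GradRate in Model 2?
Most likely because MathSAT and VerbalSAT move together: schools whose students score high on one section tend to score high on the other. In Model 2, MathSAT was partly standing in for VerbalSAT, which is why its association with GradRate mostly disappears once VerbalSAT is in the model.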
MathSAT
associated with GradRate once we control for
VerbalSAT? Which model is the null hypothesis model
for this research question, and which model is the alternative
hypothesis model?
The null hypothesis model is GradRate ~ VerbalSAT, where the coefficient for MathSAT is fixed at 0. The alternate hypothesis model is GradRate ~ MathSAT + VerbalSAT, where the coefficient for MathSAT is allowed to differ from 0. VerbalSAT appears in both models because it is the variable we are controlling for.
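Comparing a null model with an alternative model that adds one term can be done with anova(), and the result agrees with the t test on the added coefficient. The sketch below uses simulated data, not the USNews data; the data frame sim, its columns, and all of the numbers used to generate it are assumptions made up for the example.

```r
# Simulated stand-in data (hypothetical, not USNews)
set.seed(42)
n      <- 200
verbal <- rnorm(n, mean = 500, sd = 60)
math   <- verbal + rnorm(n, mean = 0, sd = 30)   # correlated with verbal
grad   <- 10 + 0.1 * verbal + rnorm(n, mean = 0, sd = 10)
sim    <- data.frame(grad, verbal, math)

# Null model leaves math out; alternative model adds it
null_model <- lm(grad ~ verbal, data = sim)
alt_model  <- lm(grad ~ math + verbal, data = sim)

# F test comparing the two nested models
comparison <- anova(null_model, alt_model)
f_stat <- comparison$F[2]
p_f    <- comparison$`Pr(>F)`[2]

# t test on the math coefficient in the alternative model
coefs  <- summary(alt_model)$coefficients
t_math <- coefs["math", "t value"]
p_t    <- coefs["math", "Pr(>|t|)"]

c(f_stat = f_stat, t_squared = t_math^2)
c(p_f = p_f, p_t = p_t)
```

When the models differ by a single coefficient, the F statistic equals the square of that coefficient's t value and the two p values coincide, so the MathSAT row of the two-predictor summary already answers the model-comparison question.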
1.541
CSHA = read.csv( 'https://raw.githubusercontent.com/vittorioaddona/data/main/CSHA.csv' )
Survival and Sex?
summary output from a certain model:
# model = lm( Survival ~ Sex - 1 , data=CSHA )
# summary( model )
River: Lumber or Wacamaw
Station: A station number (0, 1,…, 15)
Length: The fish’s length (in centimeters)
Weight: The fish’s weight (in grams)
Concen: The fish’s mercury concentration (in parts per million; ppm)
Mercury = read.csv('https://raw.githubusercontent.com/vittorioaddona/data/main/Mercury.csv')
Concen ~ Length + River
Concen ~ Length + River + Weight
summary output
corresponds to a certain hypothesis test. Can you describe in your own
words what each of these p-values is testing?
The comic is trying to show how confusing p values can be. For
example, the comic claims that an alternate hypothesis is true with a p
value greater than 0.05, when a p value that large would actually mean
we fail to reject the null hypothesis.