Name:_______________________ ENS 495 Fall 2016 12/15/2016 Test 3: Regression
#This is the test key.
#Notes regarding the test will be sent via email
#and the course Facebook page
# https://www.facebook.com/groups/930301587096169/?ref=bookmarks
#Correct answers are flagged with the text
# "**CORRECT**"
All questions are weighted as 2 points unless otherwise stated.
Question 1: Which TWO statements in the list below about correlation analysis are TRUE and can be used to complete the following sentence:
Correlation analysis investigates/determines whether… (circle 2 answers)
Question 2: Regression analysis differs from correlation analysis for which of the following reasons? (That is, what does regression analysis assume or do that correlation analysis does not or cannot?)
#This question should be re-worded if used in the future
Question 3: What mathematical method is used to fit a regression line to data?
Question 4: When you fit a regression line to data, the TWO parameters that describe the form of the line through the scatterplot are: (circle 2 answers)
Question 5: When you run a regression, uncertainty about the true value of the slope of the line can be characterized
Question 6: For the following questions, consider the graph above.
6a: Which line in the graph above represents the NULL Hypothesis or model (Ho)? (1 point)
6b: For the graph above, which line represents the Alternative Hypothesis or model (Ha)? (1 point)
6c) Just looking at the graph, which hypothesis do you think is most likely to be true? (1 point)
6d) What term best characterizes Line 1?
6e) What is the “word equation” that would be appropriate for a regression model for these data in the plot?
Question 7) A model using temp and ozone was fit to these data using the lm() function in R. The following output was produced using the summary() command.
           Estimate  Std. Error t value Pr(>|t|)
Intercept -146.9955     18.2872  -8.038 9.37e-13
Temp         2.4287      0.2331  10.418  < 2e-16
7a) Write the full mathematical equation described by this output.
7b) In the output above, what is the p-value? Write it below.
7c) How do you interpret this p-value?
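#Key note: a minimal sketch of how a coefficient table like the one in Question 7 can be produced and read in R.
#It ASSUMES the data are R's built-in airquality data set (columns Ozone and Temp); the test itself does not name the data set.
#The 80-degree temperature used for the prediction is a hypothetical value chosen only for illustration.

```r
# Assumption: built-in `airquality` data; rows with missing Ozone are dropped by lm()
fit <- lm(Ozone ~ Temp, data = airquality)

# Coefficient table: one row per parameter,
# columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
print(coef_table)

# Intercept and slope of the fitted line
b0 <- coef_table["(Intercept)", "Estimate"]
b1 <- coef_table["Temp", "Estimate"]

# Plug a (hypothetical) temperature into the fitted equation
ozone_at_80 <- b0 + b1 * 80

# p-value for the slope, i.e. the test of "slope = 0"
p_slope <- coef_table["Temp", "Pr(>|t|)"]
```

#The row for the predictor (here Temp) carries the p-value that tests the regression; the intercept row tests only whether the intercept differs from zero.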
Question 8) TRUE / FALSE (circle one) Regression, ANOVA and t-tests are all fundamentally different methods that require different statistical approaches. (1 point)
Question 9) In the figure above, the left-hand side of the plot shows a scatter plot of the data. The right-hand side shows a regression line through the data.
What are the vertical lines drawn from the regression line up or down to each data point?
Question 10) The plots above show diagnostics for the regression in the previous question.
10a) Which TWO plots provide information on the normality of the data? (circle 2 answers)
10b) Which plot provides information about whether there are any outliers and/or influential points in the data set?
10c) Is there an outlier/influential point in this data set, and if so what is it? (1 point)
Question 11: When doing regression, the process involves calculating the residuals of the regression model and doing what mathematically to them to determine the line that best fits the data?
Question 12: The standard error (SE) of the slope of a regression line represents:
Question 13: The value of R^2 from a regression tells us:
Question 14: TRUE / FALSE: If you have a very low p-value (highly significant difference), you must also have a very high R^2 value. Therefore, R^2 is highly correlated with p-values. (1 point)
Question 15: What are the key assumptions of regression? (4 points)
Question 16: Which of your 4 key assumptions is the most important to pay attention to? (1 point; refer to the list you made for Question 15)
Question 17: The left-hand graph above shows a plot of raw data, and the right-hand graph shows a diagnostic plot.
17a) What assumption of regression analysis does this diagnostic plot tell us about? (refer to the list you made above)
17b) Do these data appear to violate this assumption?
17c) Assuming these data do indeed violate this assumption, what could be done to try to fix the problem?
Question 18: Which of the following things does the log transformation NOT do?
Question 19) Which two statements are true about outliers?
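# Simulate a curved relationship: y is a 4th-order polynomial of x plus normal noise (sd = 10)
x <- runif(100, 1, 10)
y <- 10.0 + 12.0*x - 0.872*x^2 - 0.0072*x^3 - 0.000972*x^4 + rnorm(length(x), 0, 10)
# Scatter plot of y against x with a smoothed curve drawn through the points
scatter.smooth(x, y, main = "Question 20")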
Question 20) For the following questions consider this scatter plot of variable y plotted against variable x.
20a) The line through the data was not drawn using regression but instead using a technique used to help visualize curvy data. What is the name of this type of line? (1/2 point)
20b) What is the technical name for a line such as this that is not straight? (1/2 point)
Question 21) What R code would you add to a regression to model a curvy line? Assume the predictor is called “x” (1 point)
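#Key note: a minimal sketch of one common way to model a curved relationship, by adding a squared term to the regression formula.
#The simulated data, the seed, and the model names m_line and m_curve are assumptions made for illustration; they are not the data behind the Question 20 plot.

```r
set.seed(1)  # hypothetical seed, only so the sketch is repeatable

# Simulated curved data (simplified from the Question 20 simulation)
x <- runif(100, 1, 10)
y <- 10 + 12 * x - 0.872 * x^2 + rnorm(length(x), 0, 10)

m_line  <- lm(y ~ x)           # straight line only
m_curve <- lm(y ~ x + I(x^2))  # adds a squared term; I() protects ^ inside a formula

# Residual sums of squares: the larger (nested) model can never fit worse
rss_line  <- deviance(m_line)
rss_curve <- deviance(m_curve)

# F-test of whether the squared term improves the fit
print(anova(m_line, m_curve))
```

#poly(x, 2) is another way to add a quadratic term; I(x^2) is used here because its coefficient reads directly as the x^2 term.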
Question 22) Logistic regression is used to model
Question 23) Which TWO of these statements are true about logistic regression? (circle 2)
Question 24) This plot shows three sets of data (diamonds, circles, triangles) and regression lines running through them. What statement is true about these lines?
Question 25) The figure above, in the top-left panel, shows a scatter plot of raw data with a regression line and a “confidence band” (aka “confidence ellipse”) around the line. This band represents uncertainty about the true values of the parameters that define the line. Panels a, b, and c show 3 other possible regression lines as thick lines, plotted with the original regression line (now thin and dotted) and confidence band.
Of the 3 alternative regression lines, which ones are possible alternatives that are consistent with the data? (1 point)
Question 26) In the plot above, the raw data used previously is plotted with its 95% Confidence Band. Panels a, b and c represent similar regression lines fit to alternative data sets with different sample sizes.
Why do the confidence bands change size? (1 point)
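#Key note: a minimal sketch of how the width of a 95% confidence band can be compared across sample sizes.
#The helper band_width(), the simulated data, and the sample sizes 20 and 500 are all hypothetical; they are not the data sets shown in the figure.

```r
set.seed(2)  # hypothetical seed, only so the sketch is repeatable

# Simulate n points around a true line, fit a regression, and return the
# average width of the 95% confidence band over a grid of x values
band_width <- function(n) {
  x <- runif(n, 0, 10)
  y <- 2 + 0.5 * x + rnorm(n, 0, 2)
  fit <- lm(y ~ x)
  grid <- data.frame(x = seq(0, 10, length.out = 50))
  ci <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
  mean(ci[, "upr"] - ci[, "lwr"])
}

width_small <- band_width(20)   # small sample
width_large <- band_width(500)  # large sample
print(c(n20 = width_small, n500 = width_large))
```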
Question 27: The plot above shows four hypothetical regression lines with different intercepts and slopes.
27a) Which plots have positive (+) slopes? (circle all that apply; 1 point)
27b) Which plots have negative (-) slopes? (circle all that apply; 1 point)
27c) Which plots have zero (0) slopes? (circle all that apply; 1 point)
27d) Which plots have positive intercepts? (circle all that apply; 1 point)
27e) Which plots have negative intercepts? (circle all that apply; 1 point)
## [1] "Plant" "Type" "Treatment" "conc" "uptake"
Question 28) In a lab experiment, researchers were interested in the effect of carbon dioxide concentration (CO2) in the air on the rate at which plants can use CO2 for photosynthesis. Their response variable was CO2 uptake rate (“uptake”) and their predictor variable was CO2 concentration (“conc”).
28a) Write the R code to represent a “null” hypothesis (Ho); that is, a model that assumes “uptake” does not change with “conc” (1 point)
28b) Write the R code to represent an “alternative” hypothesis (Ha); that is, a model that assumes “uptake” does change with “conc” (1 point)
28c) Assume that the null model is called “m.null” and the alternative model is called “m.alt”. Write the one line of R code used to test whether the null hypothesis should be rejected.
anova(m.null,m.alt)
## Analysis of Variance Table
##
## Model 1: uptake ~ 1
## Model 2: uptake ~ conc
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 1470.7
## 2 28 1436.8 1 33.933 0.6613 0.423
summary(m.alt)
##
## Call:
## lm(formula = uptake ~ conc, data = CO2[i.use, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.0375 -5.1557 0.2152 6.7416 10.0625
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.420631 3.037488 7.052 1.14e-07 ***
## conc 0.004017 0.004940 0.813 0.423
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.163 on 28 degrees of freedom
## Multiple R-squared: 0.02307, Adjusted R-squared: -0.01182
## F-statistic: 0.6613 on 1 and 28 DF, p-value: 0.423
Question 29: Above is partial R output from an analysis of the data in Question 28.
Question 29a) What hypothesis do these data support?
Question 29b) Write a sentence using this output to describe the results of this study.
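#Key note: a minimal sketch of the null-versus-alternative model comparison used in Questions 28 and 29.
#It ASSUMES all rows of R's built-in CO2 data set are used. The test output was fit to a subset (CO2[i.use, ]) whose index is not shown, so the numbers from this sketch will differ from the output above.

```r
# Assumption: full built-in CO2 data set, not the subset used on the test
m.null <- lm(uptake ~ 1, data = CO2)     # intercept only: uptake does not depend on conc
m.alt  <- lm(uptake ~ conc, data = CO2)  # uptake changes linearly with conc

# F-test comparing the two nested models
cmp <- anova(m.null, m.alt)
print(cmp)

# With a single predictor, the model-comparison F-test and the t-test on
# the slope are the same test: F equals t squared and the p-values match
f_anova <- cmp[2, "F"]
p_anova <- cmp[2, "Pr(>F)"]
t_slope <- coef(summary(m.alt))["conc", "t value"]
p_slope <- coef(summary(m.alt))["conc", "Pr(>|t|)"]
```

#This is why the anova() table and the summary() table in the test output report the same p-value (0.423).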
Question 30: Which plot indicates a violation of an assumption of regression modeling?
Question 31: You are reading an old paper from the Journal of Aquatic Ecology and the authors state “The regression analysis we conducted was highly significant (p < 0.0001), supporting our hypothesis that pH impacted the abundance of insects in southwestern PA streams.”
31a) What can you conclude about the slope of the regression model from this p-value?
31b) What can you conclude about the R^2 value of the regression model from this p-value?
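#Key note: a minimal sketch, using simulated data, of how a p-value and R^2 can be read from the same regression and compared.
#The sample size, true slope, and seed are hypothetical values chosen for illustration; they have nothing to do with the pH study quoted in Question 31.

```r
set.seed(3)  # hypothetical seed, only so the sketch is repeatable

# Large sample, weak true relationship, lots of scatter
n <- 10000
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)

fit <- summary(lm(y ~ x))

p_value   <- fit$coefficients["x", "Pr(>|t|)"]  # test of "slope = 0"
r_squared <- fit$r.squared                      # share of variation explained

print(c(p_value = p_value, r_squared = r_squared))
```

#In this simulation the slope is clearly distinguishable from zero even though x explains only a small share of the variation in y.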