The four homework sets below target important ideas from each module. Your grades on these will count as bonus points toward your content quiz average. Remember to complete not only the required code blocks but also detailed written explanations of your analyses and findings.
I. Homework Set 1
Covers Modules S1, S2 and S3
Due Date: Friday, October 30, before midnight.
Module 1 Exercises
Item 2. Using the built-in R data frame InsectSprays, which records the number of insects in agricultural experimental units treated with different types of insecticide, conduct an analysis of the variable count. Include all relevant plots, and include titles and axis labels for your histogram and density plot.
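A minimal base-R sketch of one way to start Item 2. Which plots count as "relevant" is an assumption here (histogram, density, boxplot, normal Q-Q). Your module may use mosaic or lattice equivalents instead.

```r
# Sketch: descriptive analysis of InsectSprays$count using base R only
counts <- InsectSprays$count

# Numerical summaries: five-number summary plus mean, and standard deviation
summary(counts)
sd(counts)

# Histogram with a title and axis labels
hist(counts,
     main = "Histogram of Insect Counts (InsectSprays)",
     xlab = "Number of insects", ylab = "Frequency")

# Density plot with a title and axis labels
plot(density(counts),
     main = "Density of Insect Counts (InsectSprays)",
     xlab = "Number of insects", ylab = "Density")

# Additional views of shape and outliers
boxplot(counts, horizontal = TRUE,
        main = "Boxplot of Insect Counts", xlab = "Number of insects")
qqnorm(counts, main = "Normal Q-Q Plot of Insect Counts")
qqline(counts)
```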
Item 4. Using the built-in R data frame mtcars which includes miles per gallon and ten other variables for 32 different models in 1974, create an xyplot for miles per gallon (mpg variable) vs. displacement (disp variable) using a grouping variable of transmission type, automatic vs. manual (am).
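A short sketch of the grouped scatterplot using lattice, which is included with standard R installations. Recoding am (0 = automatic, 1 = manual) as a labeled factor is a choice made here so the legend is readable.

```r
library(lattice)

# Copy mtcars and label the transmission codes for the legend
mt <- mtcars
mt$Transmission <- factor(mt$am, levels = c(0, 1),
                          labels = c("Automatic", "Manual"))

p <- xyplot(mpg ~ disp, data = mt, groups = Transmission,
            auto.key = list(columns = 2),
            main = "Miles per Gallon vs. Displacement by Transmission",
            xlab = "Displacement (cu. in.)", ylab = "Miles per gallon")
print(p)  # lattice plots must be printed explicitly inside scripts
```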
Module 2 Exercises
Item 2. Find the theoretical probability in Exercise 1 by altering the code block that created the pFun function. Use your new pFun function and R’s sum function for the calculations. How does your theoretical calculation compare to your empirical estimate in Exercise 1?
pFun <- function(t) {
  # Binomial probability of exactly t successes in 16 trials with p = 0.5
  choose(16, t) * (.5)^t * (.5)^(16 - t)
}
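Because choose() and ^ are vectorized, pFun accepts a vector of success counts, so sum(pFun(a:b)) adds the probabilities of every outcome from a to b. The event "12 or more successes" below is only an illustration, not the event from Exercise 1; substitute your own range (and trial count, if you alter pFun). The dbinom line is a cross-check against R's built-in binomial function.

```r
# pFun as given: P(exactly t successes) in 16 trials with success probability 0.5
pFun <- function(t) {
  choose(16, t) * (.5)^t * (.5)^(16 - t)
}

# Illustrative event only: 12 or more successes out of 16
p_theory <- sum(pFun(12:16))
p_theory

# Cross-check with R's built-in binomial probability function
p_check <- sum(dbinom(12:16, size = 16, prob = 0.5))
all.equal(p_theory, p_check)

# Sanity check: probabilities over all possible outcomes sum to 1
sum(pFun(0:16))
```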
Item 4. How many successes out of 100 would Dr. Bristol need to have before you would believe she was not guessing at random? Explain your reasoning based on empirical or theoretical calculations.
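One possible theoretical approach, assuming 100 independent trials, a 0.5 chance of success under random guessing, and a one-sided cutoff of 0.05. The cutoff is an assumption you should justify or change. The sketch finds the smallest success count whose upper-tail probability under pure guessing is at most that cutoff.

```r
n <- 100
alpha <- 0.05  # assumed one-sided cutoff

# P(X >= k) under random guessing, for every possible k
k <- 0:n
upper_tail <- pbinom(k - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Smallest number of successes that would be this unlikely by chance
k_min <- min(k[upper_tail <= alpha])
k_min
upper_tail[k_min + 1]  # P(X >= k_min); index shifted because k starts at 0
```

An empirical version would replace pbinom with proportions from simulated rbinom draws.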
Module 3 Exercises
Item 2. Find the approximate IQ score that corresponds to the 25th percentile. Use the xqnorm function. IQ scores follow the \(N(100,15)\) distribution.
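For checking your xqnorm result: assuming the mosaic package, xqnorm(0.25, mean = 100, sd = 15) should return the same quantile as base R's qnorm, with a shaded plot added.

```r
# Base-R quantile: the IQ score with 25% of the N(100, 15) area below it
iq_q25 <- qnorm(0.25, mean = 100, sd = 15)
iq_q25
```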
Item 3. Find the percentile ranking for an SAT-Math score of 700 using the xpnorm function. SAT component scores follow the \(N(500,100)\) distribution.
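A base-R cross-check for xpnorm: pnorm returns the area to the left of a score, which is its percentile ranking. Assuming mosaic, xpnorm(700, mean = 500, sd = 100) should report the same area.

```r
# Proportion of N(500, 100) scores at or below 700
sat_pct <- pnorm(700, mean = 500, sd = 100)
round(100 * sat_pct, 1)  # expressed as a percentile
```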
Item 5. Find the “middle 90%” of the IQ distribution. The middle 90% is a symmetric interval that traps exactly 5% of the area below it, 5% above it, and thus 90% within it. Use the xqnorm function. IQ scores follow the \(N(100,15)\) distribution.
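Since the middle 90% leaves 5% in each tail, its endpoints are the 5th and 95th percentiles. A sketch in base R, with a check that the interval really holds 90% of the area. Assuming mosaic, xqnorm(c(0.05, 0.95), mean = 100, sd = 15) should give the same pair.

```r
# Endpoints of the middle 90%: the 5th and 95th percentiles
middle90 <- qnorm(c(0.05, 0.95), mean = 100, sd = 15)
middle90

# Check: area between the endpoints
diff(pnorm(middle90, mean = 100, sd = 15))
```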
II. Homework Set 2
Covers Modules S4 and S5
Due Date: Friday, November 13, before midnight.
Module 4 Exercises
Item 3. Using the AccDate variable from the Data3350 data frame, test the hypothesis that the Yes/No responses to the dating question depend upon whether or not the person is a member of a social Greek fraternity or sorority. Test at the \(\alpha = 0.1\) level, and include a mosaic plot with a description of its relationship to your \(p\)-value and conclusions.
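Data3350 is not bundled with R, so this sketch demonstrates the same workflow (chi-squared test of independence plus mosaic plot) on the built-in UCBAdmissions table, collapsed to a 2x2 Admit-by-Gender table. For a 2x2 table, chisq.test applies Yates' continuity correction by default.

```r
# Stand-in data: collapse UCBAdmissions over departments (Admit x Gender)
admit_gender <- margin.table(UCBAdmissions, margin = c(1, 2))
admit_gender

# Chi-squared test of independence; compare the p-value with alpha = 0.1
res <- chisq.test(admit_gender)
res
res$expected  # expected counts under independence; all should be at least 5

# Under independence the split lines inside each column would line up;
# clearly misaligned splits are the visual counterpart of a small p-value.
mosaicplot(admit_gender, main = "Admission Decision by Gender (UCBAdmissions)",
           color = TRUE)
```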
Module 5 Exercises
Item 1. Use the thrill-seeking variable Thrill from the Data3350 data frame to test for a significant difference between younger students and older students (G21 variable, with “Y” meaning yes, 21 or older). Test whether thrill-seeking levels are higher for younger students at the 0.05 level of significance.
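With t.test's formula interface, the alternative refers to mean(first level) minus mean(second level), so the direction depends on factor-level order. If G21's levels sort as "N" then "Y" (check with levels()), "higher for younger" would correspond to alternative = "greater". The sketch below shows the mechanics on mtcars, since Data3350 is not built in.

```r
# Stand-in data: is mpg lower for automatic than for manual cars?
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
levels(mt$am)  # "Automatic" is the first level

# H0: mu_Auto - mu_Manual = 0  vs.  Ha: mu_Auto - mu_Manual < 0 (Welch test)
res <- t.test(mpg ~ am, data = mt, alternative = "less")
res
res$p.value < 0.05
```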
Item 4. Use the Neuroticism variable Neuro from the Data3350 data frame to test for a significant difference in neuroticism between students who are and are not members of social Greek fraternities and sororities. Use the 0.05 level of significance.
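Unlike Item 1, this test is two-sided, which is t.test's default. A sketch on the built-in ToothGrowth data (tooth length by supplement type) standing in for Data3350, with boxplots to accompany the test.

```r
# Two-sided Welch two-sample t-test (the default alternative)
res <- t.test(len ~ supp, data = ToothGrowth)
res

# Side-by-side boxplots for a visual comparison of the two groups
boxplot(len ~ supp, data = ToothGrowth,
        main = "Tooth Length by Supplement Type",
        xlab = "Supplement", ylab = "Tooth length")
```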
Item 7. Use the Adult Playfulness variable Play from the Data3350 data frame to test for a significant difference in Playfulness across the levels of the Friend-making variable (Friends), which indicates whether the individual is most comfortable making friends with members of the same sex or the opposite sex, or has no preference (equal). Conduct a post hoc Tukey HSD test if needed.
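A sketch of one-way ANOVA followed by Tukey HSD, run on the built-in PlantGrowth data (three groups, like the three Friends levels). The 0.05 cutoff is assumed, since Item 7 does not state a level.

```r
fit <- aov(weight ~ group, data = PlantGrowth)  # groups: ctrl, trt1, trt2
summary(fit)

# Run the post hoc test only if the overall F test is significant
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]
if (p_anova < 0.05) {
  tk <- TukeyHSD(fit)
  print(tk)
  plot(tk)  # intervals that exclude 0 mark significantly different pairs
}
```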
III. Homework Set 3
Covers Modules S6 and S7
Due Date: Friday, November 20, before midnight
Module 6 Exercises
Item 2. Use the Corps variable in Data3350, where Y/N responses indicate whether the participant is in the UNG Corps of Cadets. Assuming the data frame is representative of the UNG Dahlonega campus, create a 90% confidence interval estimate for the percentage of students who are members of the Corps and interpret your findings. Hint: set success = “Y”.
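The success argument in the hint belongs, as far as we know, to the mosaic package's formula version of binom.test (for example binom.test(~ Corps, data = Data3350, success = "Y", conf.level = 0.90)). The base-R sketch below takes counts directly and uses the share of manual-transmission cars in mtcars as stand-in data.

```r
# Stand-in data: count "successes" (manual transmission, am == 1)
successes <- sum(mtcars$am == 1)
n <- nrow(mtcars)

# Exact (Clopper-Pearson) 90% confidence interval for the proportion
ci_exact <- binom.test(successes, n, conf.level = 0.90)$conf.int
ci_exact

# Score-based alternative from prop.test
ci_score <- prop.test(successes, n, conf.level = 0.90)$conf.int
ci_score
```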
Item 4. Use the TexRel variable in Data3350, where numeric values are scores on the Toxic Relationship Beliefs Scale (higher scores indicate more toxic beliefs). Assuming the data frame is representative of the UNG Dahlonega campus, create a 99% confidence interval estimate for the mean score and interpret your findings.
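A sketch of a one-sample t interval at 99% confidence, using mtcars$mpg as stand-in data. The second part rebuilds the same interval from the formula mean ± t* · s/√n to show where t.test's numbers come from.

```r
x <- mtcars$mpg
res <- t.test(x, conf.level = 0.99)
res$conf.int

# Same interval by hand: t* is the 99.5th percentile of t with n - 1 df
n <- length(x)
t_star <- qt(0.995, df = n - 1)
manual <- mean(x) + c(-1, 1) * t_star * sd(x) / sqrt(n)
manual
```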
Module 7 Exercises
Item 1. Use the Data3350 data frame to build and evaluate a linear model for narcissism (dependent variable) vs. thrill-seeking (independent variable) using the Narc and Thrill variables. Be sure to check the linearity and normality assumptions and analyze all regression statistics. Construct a confidence interval for the slope of the regression line using an appropriate method.
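A sketch of the regression workflow on mtcars (mpg vs. wt) standing in for Narc vs. Thrill. It shows two candidate slope intervals: the t-based confint and a percentile bootstrap, which may be preferable if the normality check looks doubtful. The seed and 2000 resamples are arbitrary choices.

```r
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)  # slope, R-squared, F test, residual standard error

# Diagnostics: residuals vs. fitted (linearity), normal Q-Q (normality), etc.
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# t-based 95% confidence interval for the slope
ci_t <- confint(fit, "wt", level = 0.95)
ci_t

# Percentile bootstrap interval: refit the model on resampled rows
set.seed(3350)
boot_slopes <- replicate(2000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))[["wt"]]
})
ci_boot <- quantile(boot_slopes, c(0.025, 0.975))
ci_boot
```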
Item 3. Use the Perc data frame to create linear models for the 25th percentile wage earners from 2000 Q1 through 2019 Q4. Look carefully at all diagnostic plots. Create models for the Trump era and the last 3 years of the Obama era, and conduct hypothesis tests of whether Trump-era growth was significantly greater than the historic trend and greater than Obama-era growth.
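One way (not necessarily the intended one) to test whether one group's slope exceeds another's is an interaction model: the interaction coefficient estimates the difference in slopes. For Item 3 the predictor would be time and the group an era factor; the names here are stand-ins, shown on the built-in ToothGrowth data (tooth length vs. dose, by supplement).

```r
# Separate slopes of len on dose for each supplement group
fit <- lm(len ~ dose * supp, data = ToothGrowth)
summary(fit)

# The interaction coefficient estimates (VC slope) - (OJ slope)
est <- coef(summary(fit))["dose:suppVC", ]
est

# One-sided p-value for Ha: VC slope > OJ slope
t_val <- est[["t value"]]
p_one_sided <- pt(t_val, df = fit$df.residual, lower.tail = FALSE)
p_one_sided
```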
IV. Homework Set 4
Covers Modules S8 and S9
Due Date: Friday, December 4, before midnight
Module 8 Exercises
Item 1. Add the categorical variable Sex (biological sex) to the Thrill-seeking model above to test the stereotype that males are more adventurous than females. The stats notation for the model is \[\text{Thrill} \sim \text{Anx} + \text{Narc} + \text{Play} + \text{Sex}\] Is the variable Sex a significant predictor? Does it add anything to the model? Did the diagnostic plots change in any significant way?
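A sketch of adding a two-level factor to an existing linear model and judging its contribution with a partial F test on nested models. It uses mtcars with transmission type as the stand-in factor; lm creates the indicator (dummy) variable automatically.

```r
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

reduced <- lm(mpg ~ wt + hp, data = mt)
full    <- lm(mpg ~ wt + hp + am, data = mt)

summary(full)                # the amManual row tests the factor's coefficient
cmp <- anova(reduced, full)  # partial F test comparing the nested models
cmp

# Compare diagnostic plots before and after adding the factor
par(mfrow = c(2, 2)); plot(reduced)
par(mfrow = c(2, 2)); plot(full)
par(mfrow = c(1, 1))
```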
Item 4. Using the OCD variable from the Data3350 data frame, build a two-predictor linear model using Perf and TypeA as predictors: \[\text{OCD} \sim \text{Perf} + \text{TypeA}\] Evaluate and analyze your model including all diagnostic plots.
- Add Anx as the third predictor in your model for OCD: \[\text{OCD} \sim \text{Perf} + \text{TypeA}+ \text{Anx}\] Evaluate and analyze your model including all diagnostic plots, and compare your three-predictor model with your results from the two-predictor model.
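A sketch of comparing a two-predictor model with a three-predictor model, on mtcars as stand-in data. Adjusted R² and AIC both penalize the extra predictor, and the correlation matrix is a quick check for overlap (multicollinearity) among predictors.

```r
two   <- lm(mpg ~ wt + hp, data = mtcars)
three <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Fit statistics that account for model size
c(two = summary(two)$adj.r.squared, three = summary(three)$adj.r.squared)
aics <- AIC(two, three)  # lower AIC suggests a better fit/complexity trade-off
aics

# How strongly the new predictor correlates with the existing ones
cor(mtcars[, c("wt", "hp", "qsec")])

# Diagnostic plots for each model
par(mfrow = c(2, 2)); plot(two)
par(mfrow = c(2, 2)); plot(three)
par(mfrow = c(1, 1))
```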
Module 9 Exercises
Item 1. A baseball card company claims that 25% of its cards are rookies, 65% are veterans but not All-Stars, and 10% are veteran All-Stars. Suppose a random sample of 200 cards has 70 rookies, 120 non-All-Star veterans, and 10 veteran All-Stars. Is the company’s claimed distribution credible? Test using a \(\chi^2\) GOF test with a 0.05 level of significance.
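A sketch of the chi-squared goodness-of-fit mechanics in base R using the counts stated in Item 1 (the category labels are just names chosen here). The last line shows each category's contribution to the statistic, which helps explain where any lack of fit comes from.

```r
observed <- c(Rookie = 70, Veteran = 120, AllStar = 10)
claimed  <- c(0.25, 0.65, 0.10)

res <- chisq.test(observed, p = claimed)
res
res$expected  # 200 times the claimed proportions; all should be at least 5

# Contribution of each category to the chi-squared statistic
(observed - res$expected)^2 / res$expected
```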
Item 3. Helena buys custom M&M’s for a bridal shower she’s throwing for her sister. She orders 10% yellow, 20% pale blue, 30% red and 40% pink. When she receives her 10 lb. package, she sees almost no yellow and far too many pale blue. She randomly selects 200 of the M&M’s and finds the following observed counts: \[\begin{array}{cccc}\text{Yellow}&\text{Blue}&\text{Red}&\text{Pink}\\ \hline 8&54&64&74\end{array}\]
