Homework 1 - BST682
Homework 1 Overview
This homework is intended to serve three main purposes: (1) familiarize you with our course homework submission policy, (2) refresh your probability and linear modeling skills, and (3) introduce you to R/Quarto. Give complete solutions, justifying your response when necessary.
This homework is due by 3:29pm on Thursday, January 30th. To complete this assignment, follow these steps:
1. Answer the questions below in a format you are comfortable with (e.g., LaTeX, R, Word, paper, etc.). As R/Quarto is a focus of our class this semester, we strongly recommend a Quarto .qmd file.
2. Convert this work to a PDF and name it lastname1.pdf (replacing ‘lastname’ with your last name in lowercase).
3. For any questions that require programming, provide a similarly named file that includes fully reproducible code (e.g., lastname1.r, lastname1.rmd, lastname1.sas, or, ideally, lastname1.qmd).
4. Submit these files to Canvas.
Problem 1: probability refresher 1
Tesla, AirBnB, and OpenAI have 4000, 1800, and 800 employees, respectively, and 30, 45, and 65 percent of these employees, respectively, are women. Resignations are equally likely among all employees, regardless of company or gender. One employee resigns, and this employee is a woman. What is the probability she worked for AirBnB?
Problem 2: probability refresher 2
You flip five fair coins. Assuming the flips are independent, what is the pmf for the number of tails flipped?
Problem 3: probability refresher 3
Do problem 1.6 (a,b) from Dobson and Barnett.
Problem 4: probability refresher 4
Assume annual rainfall in Lexington is normally distributed with a mean of 46 inches and a standard deviation of 4 inches. What is the probability that it takes more than 7 years before a year with more than 55 inches of rainfall occurs? What assumptions are you making?
Problem 5: linear models refresher
Using the data from Table 2.3 Birthweight and gestational age.xls, calculate by matrix algebra the effect estimate resulting from regressing birth weight on gestational age.
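As a reminder of the notation (a sketch only, assuming a single straight-line model with an intercept and one slope for gestational age; the symbols are assumed labels and do not come from the data file), the least-squares estimate in matrix form is
\[
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y},
\]
where \(y_i\) is the birth weight and \(x_i\) the gestational age of infant \(i\). Under this assumed model, the second element of \(\hat{\boldsymbol{\beta}}\) is the slope for gestational age.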
Problem 6: R intro 1
You will inevitably use the ol' Google or an LLM to problem solve while programming in R; many of you already do. Having go-to resources for answering your questions and/or developing new skills can be quite helpful. Search around for what might be (or already is) a resource you will turn to as you improve your R skills. Give the site and URL. What, in particular, makes this resource suitable for you?
Problem 7: R intro 2
Import the data from Table 2.3 Birthweight and gestational age.xls into R. Each observation should be a single row. Tip: I added a second sheet to make this easier if you prefer. Use the Import Dataset functionality in RStudio’s Environment tab and select Sheet 2. This simple example shows why some abhor Excel… Tip 2: Use the readxl package.
Plot birthweight by age and give each gender a different color on the same plot. Now, do the same plot stratified by gender (Tip: look at the Introduction to R notes). What observations do you have?
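For orientation, here is a minimal base-R sketch of the two kinds of plot described above. It uses made-up toy values, and the column names (age, weight, sex), colours, and axis units are hypothetical; they are not taken from the spreadsheet. The course notes may use a different plotting approach.

```r
# Toy data only: made-up values and hypothetical column names (age, weight, sex).
toy <- data.frame(
  age    = c(36, 38, 40, 37, 39, 41),
  weight = c(2700, 3000, 3300, 2600, 2950, 3200),
  sex    = factor(c("M", "M", "M", "F", "F", "F"))
)

# One colour per level of sex, looked up by name for each row.
group_cols <- c(F = "darkorange", M = "steelblue")
point_cols <- group_cols[as.character(toy$sex)]

# Single panel, coloured by sex.
plot(toy$age, toy$weight, col = point_cols, pch = 19,
     xlab = "Gestational age (weeks)", ylab = "Birth weight (g)")
legend("topleft", legend = names(group_cols), col = group_cols, pch = 19)

# Stratified version: one panel per sex, sharing the same axis limits.
old_par <- par(mfrow = c(1, 2))
for (s in levels(toy$sex)) {
  d <- toy[toy$sex == s, ]
  plot(d$age, d$weight, col = group_cols[[s]], pch = 19,
       xlim = range(toy$age), ylim = range(toy$weight),
       main = s, xlab = "Gestational age (weeks)", ylab = "Birth weight (g)")
}
par(old_par)  # restore the previous plotting layout
```

Keeping the axis limits the same across panels makes the stratified plots directly comparable.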
Problem 8: R intro 3
Using R and lm, confirm your regression parameter estimate in Problem 5.
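The kind of check intended here can be sketched on toy data. The numbers below are made up and the variable names (x, y, X, beta_matrix, beta_lm) are hypothetical; the sketch assumes a simple model with an intercept and one predictor, which may differ from the model you fit in Problem 5.

```r
# Toy data only: made-up numbers, not the homework data set.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Design matrix with an intercept column and one predictor column.
X <- cbind(intercept = 1, x = x)

# Solve the normal equations (X'X) b = X'y without forming an explicit inverse.
beta_matrix <- drop(solve(crossprod(X), crossprod(X, y)))

# Fit the same model with lm() and extract its coefficients.
fit <- lm(y ~ x)
beta_lm <- coef(fit)

# Compare the two sets of estimates numerically, ignoring their names.
all.equal(unname(beta_matrix), unname(beta_lm))
```

Using all.equal() rather than == allows for small floating-point differences between the two computations.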