Instructions
- This short homework is due on Monday Oct 21 by midnight. No late work is accepted unless prior arrangements have been made with the instructor.
- Please answer ALL questions (including those that do not involve R) in the locations marked in this template. Remember to periodically knit your document to check that the output appears as you want it to; you will turn in your knitted HTML document.
- You will need to upload this document in CANVAS. Please be sure to check that your file was uploaded correctly. It does not hurt to screenshot verification of the upload as CANVAS can be glitchy on occasion.
- Please answer the questions in the order in which they are posed. Write in complete sentences, and support your answers where asked.
Exercise 1: Vitamin supplements
Exercise 1.30 on page 35.
- This was an experiment rather than an observational study: researchers specifically recruited 400 volunteers to test how well varying amounts of Vitamin C reduce the duration of the common cold. They did not simply observe data that arose on their own; they conducted the study to look for a causal connection between the amount of Vitamin C and the reduction in the duration of the common cold.
- The explanatory variable in this study is the dose of Vitamin C taken by each study participant. The response variable is the duration of the common cold for each study participant.
Exercise 2: Income and Education in U.S. counties
Exercise 1.40 on page 37. For part (c), be sure to clearly make a confounding variable argument.
Write your answer here
Exercise 3: R Practice: Working with class data
As always, be sure to load the tidyverse package. For this problem, we will work with data from the survey that (most of) you filled out during the first week of class. To start, you can read in a partial and anonymized version of the data using the code below:
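A minimal sketch of the loading step, assuming the data are still served from the course URL shown below (the address may have moved since the course ran). The helper name `load_class_data` is hypothetical and exists only so the same call can be tried offline on a tiny made-up file. Passing `as.is = TRUE` keeps text columns as character strings instead of converting them to factors.

```r
# Assumed location of the partial, anonymized survey data.
class_url <- "https://anna-neufeld.github.io/Stat311/oiLabs/Week3/partial_class_data.csv"

# Hypothetical helper: works for a URL or a local file path.
load_class_data <- function(path) {
  read.csv(path, as.is = TRUE)
}

# Offline demonstration on a made-up CSV written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,mathSAT", "Senior,700", "Freshman,n/a"), tmp)
demo <- load_class_data(tmp)
str(demo)

# With a network connection, the real data would be read like this:
# classdat <- load_class_data(class_url)
```

Because one entry in the made-up file is not a number, the whole `mathSAT` column is read as text.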
Part A: Exploring the Data
How many observations are in the dataset? How many variables? For each variable, is it quantitative or categorical?
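One way to approach these questions is sketched here on a hypothetical toy data frame (the values are made up; substitute `classdat` for `toy`). Rows correspond to observations and columns to variables, and `str()` reports how R stores each column.

```r
# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(
  year = c("Freshman", "Senior", "Junior"),
  hours_sleep = c(7, 6.5, 8),
  stringsAsFactors = FALSE
)

nrow(toy)  # number of observations (rows)
ncol(toy)  # number of variables (columns)
str(toy)   # storage type of each column (chr, num, int, ...)
```

The storage type is only a hint: a `chr` column is usually categorical, but a column of numeric codes can be categorical too, so check what each variable actually measures.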
Write your answer here
Part B: Encountering Messy Data
One of the variables in the dataset is mathSAT. Try running the following code to get the average math SAT score of students who filled out the survey.
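A minimal sketch of this call, run on a hypothetical toy data frame instead of `classdat` (the values are made up). It assumes, as in the toy data, that at least one entry in the column is not a number, so R stores the whole column as text.

```r
library(dplyr)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(mathSAT = c("700", "650", "n/a"), stringsAsFactors = FALSE)

# Same call as for classdat, applied to the toy data.
# mean() cannot average text, so it returns NA and issues a warning.
result <- toy %>% summarize(meanSAT = mean(mathSAT))
result

# Checking the column type shows how R has stored the values.
class(toy$mathSAT)
```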
As you can see, this code does not produce a usable average: R returns NA with a warning instead of a number. Explore the variable in your console using classdat %>% select(mathSAT). Explain in words why this problem occurred.
Write your answer here
Part C: Dealing with Messy Data
The problem encountered in Part B is common when working with real data, so learning how to deal with it is very useful. The following code filters the data to include only individuals with valid numeric SAT scores.
The as.numeric(mathSAT) command converts the mathSAT column to a numerical variable; any entry that cannot be converted is given a value of NA. The expression !is.na() evaluates to TRUE for any row that does not have a value of NA (note that the ! symbol is frequently used in programming languages to mean "not"). Therefore, this filter() command keeps only individuals with valid (non-NA) numerical SAT scores.
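The filtering step described above can be sketched as follows on a hypothetical toy data frame (the `id` column and all values are made up for illustration).

```r
library(dplyr)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(
  id = 1:4,
  mathSAT = c("700", "650", "n/a", "seven hundred"),
  stringsAsFactors = FALSE
)

# Entries that are not numbers become NA (R also warns about the coercion).
as.numeric(toy$mathSAT)

# Keep only the rows whose mathSAT converts to a number.
valid <- toy %>% filter(!is.na(as.numeric(mathSAT)))
valid
```

Note that `filter()` only drops rows; the `mathSAT` column in `valid` is still stored as text.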
Using the code segment above as a starting point, compute the mean math SAT score for Stat 311 students who reported valid scores.
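One possible approach, sketched on a hypothetical toy data frame (values made up): because the filtered column is still text, it has to be converted again inside `mean()`.

```r
library(dplyr)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(mathSAT = c("700", "650", "n/a"), stringsAsFactors = FALSE)

mean_sat <- toy %>%
  filter(!is.na(as.numeric(mathSAT))) %>%         # drop non-numeric entries
  summarize(meanSAT = mean(as.numeric(mathSAT)))  # convert, then average
mean_sat
```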
Part D: Distribution of class years
Create a relative frequency table showing the distribution of the year variable in the dataset.
See exercise 1 of lab 2 for help. In order to make our table most readable, we should first tell R that year is an ordinal variable.
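A sketch of both steps on a hypothetical toy data frame (values made up). The `count()` plus `mutate()` pattern is one common way to build a relative frequency table and may differ from the approach used in Lab 2.

```r
library(dplyr)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(
  year = c("Senior", "Freshman", "Junior", "Senior", "Sophomore"),
  stringsAsFactors = FALSE
)

# Tell R that year is ordinal, so tables list the levels in class order.
toy <- toy %>%
  mutate(year = ordered(year, levels = c("Freshman", "Sophomore", "Junior", "Senior")))

# Relative frequency table: counts divided by the total number of rows.
freq <- toy %>%
  count(year) %>%
  mutate(prop = n / sum(n))
freq
```

Without the `ordered()` step, the rows would appear in alphabetical order (Freshman, Junior, Senior, Sophomore).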
Part E: Do seniors tend to get less sleep than freshmen?
Create side-by-side boxplots of the hours_sleep variable segmented by the variable year. Comment on what you see. Because the question asks specifically about seniors and freshmen, you can make your plot with only a subset of the data:
See exercise 1 on Lab 2 for tips on creating a side-by-side boxplot.
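A sketch of the subsetting and plotting steps on a hypothetical toy data frame (values made up), assuming `ggplot2` is the plotting tool in use. The name `two_years` is an illustrative choice that avoids reusing the name of R's built-in `subset()` function.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(
  year = c("Freshman", "Freshman", "Senior", "Senior", "Sophomore", "Freshman", "Senior"),
  hours_sleep = c(8, 7, 6, 6.5, 7.5, 7.5, 5.5),
  stringsAsFactors = FALSE
)

# Keep only seniors and freshmen.
two_years <- toy %>% filter(year == "Senior" | year == "Freshman")

# One box per class year, with hours of sleep on the vertical axis.
p <- ggplot(two_years, aes(x = year, y = hours_sleep)) +
  geom_boxplot()
print(p)
```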
Written comments here
Part F: Do dating preferences change over time?
On the survey, you were asked whether or not you would date someone you did not find attractive if they had a great personality. Do freshmen, sophomores, juniors, and seniors tend to give different answers to this question? Support your answer by including both a conditional distribution table (of date given year) and a standardized bar plot.
Note: In R, date is the name of the column in your dataset, while date() is the name of a built-in function. R may autocomplete date to date(). If it does this, be sure to delete the parentheses so that you are referring to the column of your dataset.
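A sketch of both pieces on a hypothetical toy data frame (values made up, including the "Yes"/"No" coding of `date`). Inside `count()` and `aes()`, the bare name `date` refers to the column, not the built-in function.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for classdat; the values are made up.
toy <- data.frame(
  year = c("Freshman", "Freshman", "Senior", "Senior", "Senior", "Junior"),
  date = c("Yes", "No", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Conditional distribution of date given year:
# proportions are computed within each year, so they sum to 1 per year.
cond_table <- toy %>%
  count(year, date) %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
cond_table

# Standardized bar plot: position = "fill" rescales every bar to height 1.
p <- ggplot(toy, aes(x = year, fill = date)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
print(p)
```

If the bars show noticeably different splits across years, that suggests the answers depend on class year; similar splits suggest they do not.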
Written answer here