library(tidyverse)
library(openintro)
data('hfi', package='openintro')

Exercise 1

There are 123 columns in this dataset, as reported by glimpse(hfi) (output not shown).
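The column count can also be checked directly instead of being read off the glimpse() output. This is a minimal base R sketch that assumes the openintro package is installed. dim() returns rows and columns together, and ncol() returns only the column count.

```r
# Load the lab's data (assumes the openintro package is installed)
library(openintro)
data("hfi", package = "openintro")

# dim() returns c(rows, columns); ncol() returns just the column count
dim(hfi)
ncol(hfi)
```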

Exercise 2

I would use a scatter plot to show the relationship between two numeric variables. In this example, pf_score increases roughly linearly with pf_expression_control.

ggplot(hfi, aes(x = pf_expression_control, y = pf_score)) +
  geom_point() +
  theme_classic()

Exercise 3

There is a strong positive relationship between pf_score and pf_expression_control: as pf_expression_control increases, pf_score tends to increase as well. There are a few outliers between 0.0 and 2.5, but I am unsure of the circumstances in those countries.
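One way to put a number on "strong positive" is the correlation coefficient. The sketch below is a suggested check rather than part of the lab instructions, and it assumes the openintro package is installed. `use = "complete.obs"` drops rows where either variable is missing, because the data contain NAs.

```r
library(openintro)
data("hfi", package = "openintro")

# Pearson correlation between predictor and response, ignoring rows
# where either value is NA
r <- cor(hfi$pf_expression_control, hfi$pf_score, use = "complete.obs")
r
```

A value close to 1 supports the visual impression of a strong, positive, linear relationship.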

Exercise 4

The line y = 0.4914x + 4.6171 (x = pf_expression_control, y = pf_score) gave the best fit. The smallest sum of squares I obtained was 952.153.
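lm() computes the least squares line directly, so it can be compared with the line found by trial and error. In this sketch (assuming openintro is installed), both sums of squared residuals are computed on the same complete-case rows. By definition, the least squares value can be no larger than that of any hand-picked line.

```r
library(openintro)
data("hfi", package = "openintro")

# Keep rows where both variables are observed, so every sum below
# uses the same observations
d <- hfi[complete.cases(hfi[, c("pf_expression_control", "pf_score")]), ]

# Least squares fit of pf_score on pf_expression_control
fit <- lm(pf_score ~ pf_expression_control, data = d)
coef(fit)

# Sum of squared residuals of the least squares line
sse <- sum(residuals(fit)^2)
sse

# Sum of squared residuals of the hand-picked line reported above
sse_hand <- sum((d$pf_score - (4.6171 + 0.4914 * d$pf_expression_control))^2)
sse_hand
```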

Exercise 5

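predicted hf_score = 5.153687 + 0.349862 * pf_expression_control

The slope means that each one-unit increase in pf_expression_control is associated with an increase of about 0.35 in the predicted hf_score.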

m1 <- lm(hf_score ~ pf_expression_control, data = hfi)
summary(m1)
## 
## Call:
## lm(formula = hf_score ~ pf_expression_control, data = hfi)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6198 -0.4908  0.1031  0.4703  2.2933 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           5.153687   0.046070  111.87   <2e-16 ***
## pf_expression_control 0.349862   0.008067   43.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.667 on 1376 degrees of freedom
##   (80 observations deleted due to missingness)
## Multiple R-squared:  0.5775, Adjusted R-squared:  0.5772 
## F-statistic:  1881 on 1 and 1376 DF,  p-value: < 2.2e-16
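The equation can be checked against the fitted model by pulling out the coefficients and predicting at a few predictor values. The values 0, 5 and 10 below are illustrative choices, not values from the lab. The sketch assumes openintro is installed. The prediction at 0 equals the intercept, and predictions 5 units apart differ by 5 times the slope.

```r
library(openintro)
data("hfi", package = "openintro")

# Same model as above: hf_score predicted from pf_expression_control
m1 <- lm(hf_score ~ pf_expression_control, data = hfi)

# Named vector: intercept, then slope
b <- coef(m1)
b

# Predicted hf_score at illustrative predictor values
new_x <- data.frame(pf_expression_control = c(0, 5, 10))
pred <- predict(m1, newdata = new_x)
pred

# Consecutive predictions are 5 units apart in x, so they differ by 5 * slope
diff(pred)
```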

Exercise 6

If someone based their guess on the least squares regression line and not the actual data, they might predict about 4.6 for the given pf_expression_control rating. There is a point where it can be 4.6 - 4.606060 = the residual of the

ggplot(data = hfi, aes(x = pf_expression_control, y = pf_score)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
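A prediction from the line and its residual can be computed with predict(). In this sketch, `x0 = 6.7` is a hypothetical rating chosen for illustration only, because the rating used in the exercise is not stated above. Because an observation may not sit exactly at x0, the residual is taken at the observation closest to it. A negative residual means the line overestimates that country's pf_score, and a positive one means it underestimates it. The sketch assumes openintro is installed.

```r
library(openintro)
data("hfi", package = "openintro")

# Complete cases, then the model drawn in the plot above
d <- hfi[complete.cases(hfi[, c("pf_expression_control", "pf_score")]), ]
m_pf <- lm(pf_score ~ pf_expression_control, data = d)

# Hypothetical rating, for illustration only
x0 <- 6.7
y_hat <- predict(m_pf, newdata = data.frame(pf_expression_control = x0))
y_hat

# Observation whose rating is closest to x0
i <- which.min(abs(d$pf_expression_control - x0))
obs_x <- d$pf_expression_control[i]
obs_y <- d$pf_score[i]

# Residual = observed - predicted, evaluated at that observation's own x
res <- obs_y - predict(m_pf, newdata = data.frame(pf_expression_control = obs_x))
res
```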

Exercise 7

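The residuals are scattered in a band around the dashed zero line with no apparent pattern or direction, which indicates that the relationship between the two variables is linear.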

ggplot(data = m1, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")
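The plot above relies on ggplot2 converting the lm object into a data frame. An alternative is to build the same diagnostic data explicitly with base R's fitted() and residuals(). This is a sketch that assumes openintro is installed. As a sanity check, residuals from a least squares model with an intercept always average to zero.

```r
library(openintro)
data("hfi", package = "openintro")

m1 <- lm(hf_score ~ pf_expression_control, data = hfi)

# One row per observation used in the fit (rows with NAs are dropped)
diag_df <- data.frame(fitted = fitted(m1), resid = residuals(m1))

# Base-graphics residuals vs. fitted plot with a dashed zero line
plot(diag_df$fitted, diag_df$resid, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = "dashed")

# With an intercept in the model, the residuals average to zero
mean(diag_df$resid)
```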

Exercise 8

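Yes, the residuals appear to meet the nearly normal residuals condition. A binwidth of 25 puts every residual (they range from about -2.6 to 2.3) into a single bar, so the binwidth needs to be on the order of 0.25 for the histogram to show the shape of the distribution.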

# binwidth must be on the scale of the residuals (about -2.6 to 2.3)
ggplot(data = m1, aes(x = .resid)) +
  geom_histogram(binwidth = 0.25) +
  xlab("Residuals")

# Normal probability plot with a reference line
ggplot(data = m1, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line()
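The same two checks can be sketched in base graphics. The bins of width 0.25 are an assumed choice that fits the residual range in the model summary. The sketch also computes the correlation between sample and theoretical quantiles from qqnorm(). This correlation is a rough numeric companion to the plot, not a formal test: values near 1 mean the points lie close to a straight line. The sketch assumes openintro is installed.

```r
library(openintro)
data("hfi", package = "openintro")

m1 <- lm(hf_score ~ pf_expression_control, data = hfi)
res <- residuals(m1)

# Histogram with bins of width 0.25 spanning -3 to 3
h <- hist(res, breaks = seq(-3, 3, by = 0.25), main = "", xlab = "Residuals")

# Normal probability plot with a reference line through the quartiles
qq <- qqnorm(res)
qqline(res)

# Correlation of theoretical (x) and sample (y) quantiles
rho <- cor(qq$x, qq$y)
rho
```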

Exercise 9

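Based on both the residuals vs. fitted plot and the normal probability plot, the conditions, including constant variability, appear to be met.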
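Constant variability can also be checked with numbers as well as by eye. The sketch below is a heuristic: it splits the observations into groups by quartiles of the fitted values and compares the standard deviation of the residuals in each group. It is not a formal test. Roughly equal spreads are consistent with the condition. `unique()` guards against repeated quartile cut points. The sketch assumes openintro is installed.

```r
library(openintro)
data("hfi", package = "openintro")

m1 <- lm(hf_score ~ pf_expression_control, data = hfi)
f <- fitted(m1)

# Group observations by quartile of the fitted values
breaks <- unique(quantile(f, probs = seq(0, 1, 0.25)))
grp <- cut(f, breaks = breaks, include.lowest = TRUE)

# Residual standard deviation within each group
spread <- tapply(residuals(m1), grp, sd)
spread
```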

Bonus questions

  1. Let's look at how pf_expression_killed relates to pf_score. There seems to be a positive trend: pf_score tends to increase as pf_expression_killed increases. However, there is a lot of noise in the plot.
ggplot(data = hfi, aes(x = pf_expression_killed, y = pf_score)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).
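The smoothed line in this plot is a simple linear model. Fitting that model explicitly gives two useful numbers. The sign of the slope gives the direction of the trend, and R-squared measures how much of the variation in pf_score the line explains. A low R-squared is one way to quantify "a lot of noise". This is a sketch that assumes openintro is installed. lm() drops the same incomplete rows that the warnings mention.

```r
library(openintro)
data("hfi", package = "openintro")

# Linear model behind the line in the bonus plot
m_killed <- lm(pf_score ~ pf_expression_killed, data = hfi)

# Intercept and slope
coef(m_killed)

# Share of variation in pf_score explained by the line
r2 <- summary(m_killed)$r.squared
r2
```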
