library(tidyverse)
library(openintro)
data('hfi', package='openintro')
Exercise 1
There are 123 columns in this dataset, according to glimpse(hfi).
Exercise 2
I would use a scatter plot to show the relationship between two numeric variables. For example, pf_score shows a positive linear relationship with pf_expression_control.
ggplot(hfi, aes(x = pf_expression_control, y = pf_score)) + geom_point() + theme_classic()

Exercise 3
There is a strong positive relationship between pf_score and pf_expression_control. As pf_expression_control increases, pf_score increases as well. There are a few outliers between 0.0 and 2.5, but I am unsure what circumstances explain them.
Exercise 4
The line y = 0.4914x + 4.6171 gave the best fit. The smallest sum of squares I obtained was 952.153.
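To make the sum of squares concrete, here is a minimal sketch of how it can be computed for any candidate line. The helper name `sum_sq_resid` and the toy vectors are my own assumptions for illustration; they are not part of the lab or of the hfi data.

```r
# Sum of squared residuals for a candidate line y = intercept + slope * x.
# sum_sq_resid is a hypothetical helper written for this sketch.
sum_sq_resid <- function(x, y, intercept, slope) {
  keep <- complete.cases(x, y)                      # drop pairs with missing values
  resid <- y[keep] - (intercept + slope * x[keep])  # observed minus predicted
  sum(resid^2)
}

# Toy vectors, made up for illustration only (not taken from hfi)
x_toy <- c(1, 2, 3, 4, NA)
y_toy <- c(5.2, 5.5, 6.2, 6.5, 7.0)

# Sum of squares for the line reported in this exercise
ss_candidate <- sum_sq_resid(x_toy, y_toy, intercept = 4.6171, slope = 0.4914)

# lm() minimizes this quantity, so its line can never do worse
fit <- lm(y_toy ~ x_toy)
ss_best <- sum(resid(fit)^2)
```

On the real data, the same helper could be called as sum_sq_resid(hfi$pf_expression_control, hfi$pf_score, 4.6171, 0.4914) and compared with the value reported above.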
Exercise 5
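hf_score = 5.153687 + 0.349862 * pf_expression_control
For each one-point increase in pf_expression_control, the predicted hf_score increases by about 0.35.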
m1 <- lm(hf_score ~ pf_expression_control, data = hfi)
summary(m1)
##
## Call:
## lm(formula = hf_score ~ pf_expression_control, data = hfi)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.6198 -0.4908  0.1031  0.4703  2.2933
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)           5.153687   0.046070  111.87   <2e-16 ***
## pf_expression_control 0.349862   0.008067   43.37   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.667 on 1376 degrees of freedom
## (80 observations deleted due to missingness)
## Multiple R-squared: 0.5775, Adjusted R-squared: 0.5772
## F-statistic: 1881 on 1 and 1376 DF, p-value: < 2.2e-16
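As a quick sanity check on how the pieces of this output fit together, the sketch below recomputes the slope's t value and the R-squared from the other numbers printed above. It relies on two facts that hold for a regression with a single predictor: the F-statistic equals the squared t value of the slope, and R-squared equals F / (F + residual degrees of freedom). The variable names are my own.

```r
# Values copied from the summary(m1) output above
estimate  <- 0.349862   # slope for pf_expression_control
std_error <- 0.008067   # its standard error
df_resid  <- 1376       # residual degrees of freedom
f_stat    <- 1881       # F-statistic on 1 and 1376 DF

t_value   <- estimate / std_error           # t statistic for the slope
r_squared <- f_stat / (f_stat + df_resid)   # valid with one predictor

round(t_value, 2)     # compare with the "t value" column
round(t_value^2)      # compare with the F-statistic
round(r_squared, 4)   # compare with Multiple R-squared
```

All three values agree with the printed output after rounding, which suggests the table was read correctly.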
Exercise 6
They might guess a value of about 4.6 from the pf expression control if they based it on the least squares regression line. The residual at such a point would be 4.6 - 4.606060 = -0.006060.
ggplot(data = hfi, aes(x = pf_expression_control, y = pf_score)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
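The idea of predicting from the line and then measuring the miss can be sketched as follows. The coefficients are the ones reported in Exercise 4; the function name `predict_pf_score` and the example country (x_new, observed) are hypothetical values I chose only to illustrate the arithmetic.

```r
# Line for pf_score reported in Exercise 4
intercept <- 4.6171
slope     <- 0.4914

# Hypothetical helper: predicted pf_score for a given pf_expression_control
predict_pf_score <- function(pf_expression_control) {
  intercept + slope * pf_expression_control
}

# Hypothetical country, values chosen only for illustration
x_new    <- 5
observed <- 7

predicted <- predict_pf_score(x_new)
residual  <- observed - predicted   # negative means the line overestimates
```

A residual is always observed minus predicted, so a negative residual means the line overestimates and a positive one means it underestimates.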

Exercise 7
The points are scattered around the zero line with no apparent pattern or direction, which indicates that the relationship is linear.
ggplot(data = m1, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")

Exercise 8
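Yes, the residuals appear to meet the nearly normal condition. With binwidth = 25 the histogram showed just one bucket, because the residuals only span about -2.6 to 2.3, so a narrower binwidth of 0.25 is used here to reveal the shape of the distribution.
ggplot(data = m1, aes(x = .resid)) +
  geom_histogram(binwidth = 0.25) +
  xlab("Residuals")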

ggplot(data = m1, aes(sample = .resid)) +
  stat_qq()

Exercise 9
Both plots suggest that the conditions are met.
Bonus questions
- Let's look at pf_expression_killed. There seems to be a positive trend: pf_score tends to increase as pf_expression_killed increases. However, there is a lot of noise in the plot.
ggplot(data = hfi, aes(x = pf_expression_killed, y = pf_score)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).
