Q1: How many variables are included in the Wages and Hours dataset?

A1: 9 variables.

setwd("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2")
Warning: The working directory was changed to C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
wages_hours <- read.csv("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2/Wages and Hours.csv")
str(wages_hours)
'data.frame':   39 obs. of  9 variables:
 $ HRS   : int  2157 2174 2062 2111 2134 2185 2210 2105 2267 2205 ...
 $ ERSP  : int  1121 1128 1214 1203 1013 1135 1100 1180 1298 885 ...
 $ ERNO  : int  291 301 326 49 594 287 295 310 252 264 ...
 $ NEIN  : int  380 398 185 117 730 382 474 255 431 373 ...
 $ ASSET : int  7250 7744 3068 1632 12710 7706 9338 4730 8317 6789 ...
 $ AGE   : num  38.5 39.3 40.1 22.4 57.7 38.6 39 39.9 38.9 38.8 ...
 $ DEP   : num  2.34 2.33 2.85 1.16 1.23 ...
 $ RACE  : num  32.1 31.2 NA 27.5 32.5 31.4 10.1 71.1 9.7 25.2 ...
 $ SCHOOL: num  10.5 10.5 8.9 11.5 8.8 10.7 11.2 9.3 11.1 9.5 ...
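
The working-directory warning printed after setwd() suggests setting the knitr root.dir option instead. A minimal sketch of such a setup chunk body is shown here, assuming the CSV stays in the same Lab 2 folder; the relative file name in read.csv() is an assumption that only works once root.dir points at that folder.

```r
# Setup chunk body: set the root directory once for all notebook chunks,
# instead of calling setwd() inside an individual chunk.
knitr::opts_knit$set(
  root.dir = "C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2"
)

# In later chunks the file can then be read with a relative path (assumed layout).
wages_hours <- read.csv("Wages and Hours.csv")

# dim() returns the number of rows (observations) and columns (variables).
dim(wages_hours)
```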

Q2: How many observations (or rows) are included in the Wages and Hours dataset?

A2: 39 observations.

Q3: What is the correlation between the average highest grade of school completed and average hours worked during the year? Is this association significant? Interpret the correlation coefficient.

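A3: SCHOOL and HRS have a positive Pearson correlation of r = 0.6656 (95% confidence interval 0.443 to 0.811). The association is statistically significant (t = 5.43, df = 37, p = 3.77e-06), so the null hypothesis of zero correlation is rejected at the 0.05 level. The coefficient indicates a moderately strong positive linear association: observations with a higher average highest grade of school completed tend to have higher average hours worked during the year. It describes association only and does not show that more schooling causes more hours worked.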

cor.test(wages_hours$SCHOOL, wages_hours$HRS)

    Pearson's product-moment correlation

data:  wages_hours$SCHOOL and wages_hours$HRS
t = 5.4251, df = 37, p-value = 3.77e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4431654 0.8108427
sample estimates:
      cor 
0.6656124 
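
To make the test output easier to read, the sketch below recomputes the t statistic, the two-sided p-value, and the 95% confidence interval from the reported correlation alone. It assumes only the values printed above (r = 0.6656124, df = 37) and uses the standard formulas for a Pearson correlation test (t = r * sqrt(df) / sqrt(1 - r^2) and the Fisher z-transformation for the interval); the variable names are illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.6656124   # sample Pearson correlation
df <- 37          # degrees of freedom, n - 2
n  <- df + 2      # number of observations

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 4))
print(signif(p_value, 3))
print(round(ci, 4))
```

The recomputed values should agree with the printed test to rounding, which is a quick way to check that the reported t, p-value, and interval all follow from r and the sample size.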

Q4: Produce a plot of the average highest grade of school completed (x) and average hours worked during the year (y). Be sure to include a smoothing line.

install.packages("ggplot2")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Qiu J/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.1/ggplot2_3.3.5.zip'
Content type 'application/zip' length 4130848 bytes (3.9 MB)
downloaded 3.9 MB
package ‘ggplot2’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Qiu J\AppData\Local\Temp\RtmpC8aHWb\downloaded_packages
library(ggplot2)
Warning: package ‘ggplot2’ was built under R version 4.1.1
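# Scatterplot of average schooling (x) against average hours worked (y).
# The object is not named `plot`, to avoid masking the base R function.
hrs_school_plot <- ggplot(wages_hours, aes(x = SCHOOL, y = HRS)) + geom_point()
# Add a smoothing line with a 95% confidence band.
hrs_school_plot + geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)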
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

Q5: Estimate a simple linear regression model where average hours worked during the year (y) is regressed on the highest grade of school completed (x).

HRS_SCHOOL_Mod <- lm(HRS ~ SCHOOL, data = wages_hours)
summary(HRS_SCHOOL_Mod)

Call:
lm(formula = HRS ~ SCHOOL, data = wages_hours)

Residuals:
   Min     1Q Median     3Q    Max 
-96.69 -31.30  10.94  27.89 115.78 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1763.811     69.295  25.454  < 2e-16 ***
SCHOOL        37.367      6.888   5.425 3.77e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 48.43 on 37 degrees of freedom
Multiple R-squared:  0.443, Adjusted R-squared:  0.428 
F-statistic: 29.43 on 1 and 37 DF,  p-value: 3.77e-06
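
One way to read the summary above is through the fitted line, predicted HRS = 1763.811 + 37.367 * SCHOOL. The sketch below uses only the printed estimates; predict_hrs is a hypothetical helper written for illustration, not part of the lab code. It also checks two standard identities for simple linear regression: R-squared equals the squared Pearson correlation, and the F statistic equals the squared t statistic of the slope.

```r
# Estimates copied from the summary() and cor.test() output above
b0    <- 1763.811    # intercept
b1    <- 37.367      # slope for SCHOOL
se_b1 <- 6.888       # standard error of the slope
r     <- 0.6656124   # Pearson correlation between SCHOOL and HRS

# Hypothetical helper: predicted average hours for a given average schooling level
predict_hrs <- function(school) b0 + b1 * school

pred_10 <- predict_hrs(10)
pred_11 <- predict_hrs(11)
diff_1  <- pred_11 - pred_10   # one more grade of schooling changes the prediction by b1

# Identities that hold in simple linear regression
r_squared <- r^2          # compare with Multiple R-squared
t_slope   <- b1 / se_b1   # compare with the t value for SCHOOL
f_stat    <- t_slope^2    # compare with the F-statistic

print(c(pred_10 = pred_10, diff_1 = diff_1))
print(round(c(r_squared = r_squared, t_slope = t_slope, f_stat = f_stat), 3))
```

Under this reading, each additional grade of average schooling is associated with about 37 more average hours worked per year. The intercept is the prediction at SCHOOL = 0, which may lie well outside the observed range of schooling and should be interpreted with caution.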