Q1: How many variables are included in the Wages and Hours dataset?
A1: 9 variables.
setwd("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2")
Warning: The working directory was changed to C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
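The warning above recommends setting the knitr `root.dir` option in the setup chunk rather than calling `setwd()` inside a chunk. A minimal sketch of that approach, assuming knitr is installed and this code sits in the notebook's setup chunk; the path is the one already used in this lab:

```r
# Setup chunk: set the root directory once so that it applies to later chunks.
# The path below is copied from this lab; adjust it for your own machine.
knitr::opts_knit$set(
  root.dir = "C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2"
)
```

With the root directory set this way, later chunks should be able to read the file by its relative name, for example `read.csv("Wages and Hours.csv")`.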
wages_hours<-read.csv("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Lab 2/Wages and Hours.csv")
str(wages_hours)
'data.frame': 39 obs. of 9 variables:
$ HRS : int 2157 2174 2062 2111 2134 2185 2210 2105 2267 2205 ...
$ ERSP : int 1121 1128 1214 1203 1013 1135 1100 1180 1298 885 ...
$ ERNO : int 291 301 326 49 594 287 295 310 252 264 ...
$ NEIN : int 380 398 185 117 730 382 474 255 431 373 ...
$ ASSET : int 7250 7744 3068 1632 12710 7706 9338 4730 8317 6789 ...
$ AGE : num 38.5 39.3 40.1 22.4 57.7 38.6 39 39.9 38.9 38.8 ...
$ DEP : num 2.34 2.33 2.85 1.16 1.23 ...
$ RACE : num 32.1 31.2 NA 27.5 32.5 31.4 10.1 71.1 9.7 25.2 ...
$ SCHOOL: num 10.5 10.5 8.9 11.5 8.8 10.7 11.2 9.3 11.1 9.5 ...
Q2: How many observations (or rows) are included in the Wages and Hours dataset?
A2: 39 observations.
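The counts in A1 and A2 are read off the first line of the `str()` output. They can also be obtained directly with `nrow()`, `ncol()`, and `dim()`. The sketch below is only an illustration: `wages_sample` is a hypothetical stand-in built from the first 10 values of three columns shown in the `str()` output, so its dimensions are 10 by 3, not the 39 by 9 of the full dataset.

```r
# Stand-in data: first 10 values of HRS, SCHOOL, and RACE from the str() output.
# This is NOT the full dataset; it only makes the example runnable on its own.
wages_sample <- data.frame(
  HRS    = c(2157, 2174, 2062, 2111, 2134, 2185, 2210, 2105, 2267, 2205),
  SCHOOL = c(10.5, 10.5, 8.9, 11.5, 8.8, 10.7, 11.2, 9.3, 11.1, 9.5),
  RACE   = c(32.1, 31.2, NA, 27.5, 32.5, 31.4, 10.1, 71.1, 9.7, 25.2)
)

n_obs  <- nrow(wages_sample)            # number of observations (rows)
n_vars <- ncol(wages_sample)            # number of variables (columns)
dims   <- dim(wages_sample)             # both at once: rows, then columns
n_miss <- colSums(is.na(wages_sample))  # missing values per column

print(dims)
print(n_miss)
```

Running the same calls on `wages_hours` would give the counts for the full data. The per-column count of missing values is worth checking because the `str()` output already shows an `NA` in RACE.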
Q3: What is the correlation between the average highest grade of school completed and average hours worked during the year? Is this association significant? Interpret the correlation coefficient.
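A3: SCHOOL and HRS are positively correlated (r = 0.6656, 95% confidence interval 0.44 to 0.81). The association is statistically significant (t = 5.4251, df = 37, p = 3.77e-06, well below 0.05). A coefficient of this size indicates a moderately strong positive linear relationship: observations with a higher average grade of school completed also tend to have higher average hours worked during the year. The correlation describes how the two variables move together and does not by itself show that one causes the other.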
cor.test(wages_hours$SCHOOL,wages_hours$HRS)
Pearson's product-moment correlation
data: wages_hours$SCHOOL and wages_hours$HRS
t = 5.4251, df = 37, p-value = 3.77e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4431654 0.8108427
sample estimates:
cor
0.6656124
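The t statistic and p-value that `cor.test()` reports follow from the correlation and its degrees of freedom through t = r * sqrt(df) / sqrt(1 - r^2). As a rough check, the sketch below recomputes both from the printed values only (r = 0.6656124, df = 37); the variable names are illustrative.

```r
# Values copied from the cor.test() output above.
r  <- 0.6656124
df <- 37

# t statistic for testing H0: true correlation = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom.
p_val <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 4))
print(signif(p_val, 3))
```

The results should agree with the reported t = 5.4251 and p-value = 3.77e-06 up to rounding.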
Q4: Produce a plot of the average highest grade of school completed (x) and average hours worked during the year (y). Be sure to include a smoothing line.
install.packages("ggplot2")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Qiu J/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.1/ggplot2_3.3.5.zip'
Content type 'application/zip' length 4130848 bytes (3.9 MB)
downloaded 3.9 MB
package ‘ggplot2’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Qiu J\AppData\Local\Temp\RtmpC8aHWb\downloaded_packages
library(ggplot2)
Warning: package ‘ggplot2’ was built under R version 4.1.1
plot <- ggplot(wages_hours, aes(x = SCHOOL, y = HRS)) + geom_point()
plot + geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
Q5: Estimate a simple linear regression model where average hours worked during the year (y) is regressed on the highest grade of school completed (x).
HRS_SCHOOL_Mod <- lm(HRS ~ SCHOOL, data = wages_hours)
summary(HRS_SCHOOL_Mod)
Call:
lm(formula = HRS ~ SCHOOL, data = wages_hours)
Residuals:
   Min     1Q Median     3Q    Max
-96.69 -31.30  10.94  27.89 115.78
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1763.811     69.295  25.454  < 2e-16 ***
SCHOOL        37.367      6.888   5.425 3.77e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 48.43 on 37 degrees of freedom
Multiple R-squared: 0.443, Adjusted R-squared: 0.428
F-statistic: 29.43 on 1 and 37 DF, p-value: 3.77e-06
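The coefficients above can be read as the prediction equation HRS = 1763.811 + 37.367 * SCHOOL: each additional grade of average schooling is associated with about 37 more average hours worked per year. The sketch below uses only numbers printed in the summary and in the earlier correlation output. The helper `predict_hrs` is a hypothetical name, and SCHOOL = 10 is an arbitrary value inside the observed range. It computes one fitted value and checks how the reported statistics relate to each other in a simple regression.

```r
# Values copied from the summary() and cor.test() outputs.
b0    <- 1763.811   # intercept
b1    <- 37.367     # slope for SCHOOL
se_b1 <- 6.888      # standard error of the slope
r     <- 0.6656124  # Pearson correlation between SCHOOL and HRS

# Fitted average hours for a given average grade completed (hypothetical helper).
predict_hrs <- function(school) b0 + b1 * school
fitted_10 <- predict_hrs(10)  # 1763.811 + 373.67 = 2137.481

# With one predictor these identities hold, up to rounding of the printed values:
t_slope   <- b1 / se_b1   # slope t value, about 5.425
r_squared <- r^2          # Multiple R-squared, about 0.443
f_stat    <- t_slope^2    # F-statistic, about 29.43

print(c(fitted_10 = fitted_10, t_slope = t_slope,
        r_squared = r_squared, f_stat = f_stat))
```

Because the model has a single predictor, the slope's t value and p-value match those from `cor.test()`, and R-squared is the squared correlation, so about 44% of the variation in HRS is accounted for by SCHOOL.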