In this project we were asked to build the best linear model for the sleep75 data. As the task required, we focused on the correlation between average hourly wage and minutes slept, with and without naps. We also checked how these variables depend on sex and age. First we wanted to look at the data in general. We used the kableExtra library to beautify our tables :)
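# Packages used below (assumption: sleep75 comes from the wooldridge package)
library(wooldridge)
library(dplyr)
library(kableExtra)
library(ggplot2)
# First six rows and the columns of interest
data<-sleep75[1:6,c(16,33,21,22,4,5,7,1)]
data<-na.omit(data)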
data %>%
mutate(sleepInHours = sleep /60) %>%
relocate(sleepInHours,.before=sleep)%>%
relocate(sleep,.after=age)%>%
mutate(sleepNapsInHours = slpnaps /60) %>%
relocate(sleepNapsInHours,.after=sleepInHours)%>%
relocate(slpnaps,.after=sleep)%>%
relocate(age,.after=sleepNapsInHours)%>%
kbl()%>%
kable_classic("hover",full_width=F) %>%
kable_material_dark()%>%
row_spec(0,angle=-20)%>%
add_header_above(c(" ","Data set"=9),bold = TRUE)
Data set

| male | hrwage | sleepInHours | sleepNapsInHours | age | clerical | construc | earns74 | sleep | slpnaps |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 7.070004 | 51.88333 | 52.71667 | 32 | 0 | 0 | 0 | 3113 | 3163 |
| 1 | 1.429999 | 48.66667 | 48.66667 | 31 | 0 | 0 | 9500 | 2920 | 2920 |
| 1 | 20.529997 | 44.50000 | 46.00000 | 44 | 0 | 0 | 42500 | 2670 | 2760 |
| 0 | 9.619998 | 51.38333 | 51.38333 | 30 | 0 | 0 | 42500 | 3083 | 3083 |
| 1 | 2.750000 | 57.46667 | 58.21667 | 64 | 0 | 0 | 2500 | 3448 | 3493 |
| 1 | 19.249998 | 67.71667 | 67.96667 | 41 | 0 | 0 | 0 | 4063 | 4078 |
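
The two hour columns in the table are simply the minute columns divided by 60. A minimal base R sketch of that conversion on the six rows shown is given here. Values around 50 hours suggest the totals are weekly, so the sketch also divides by 7 to get hours per day; that step and the column name `sleepPerDay` are assumptions added for illustration, not part of the original analysis.

```r
# Six rows copied from the table above (sleep and slpnaps are in minutes).
sleep_sample <- data.frame(
  sleep   = c(3113, 2920, 2670, 3083, 3448, 4063),
  slpnaps = c(3163, 2920, 2760, 3083, 3493, 4078)
)

# Minutes -> hours, the same arithmetic as the mutate() calls above.
sleep_sample$sleepInHours     <- sleep_sample$sleep / 60
sleep_sample$sleepNapsInHours <- sleep_sample$slpnaps / 60

# Assumption: the totals are weekly, so dividing by 7 gives hours per day.
# sleepPerDay is a hypothetical column used only in this sketch.
sleep_sample$sleepPerDay <- sleep_sample$sleepInHours / 7

print(round(sleep_sample, 2))
```

Read this way, the six people sleep somewhere between roughly 6 and 10 hours per day, which is easier to judge than a raw minute count.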
We added two columns with sleep time in hours, because hours are easier to read than minutes. Now that we have had a glimpse of the data, some general statistics are also useful: minimum and maximum values, mean, median and quartiles. For that we used the summary() function.
First, the mean of the male variable is 0.83, which means this sample contains more men than women. The people in the sample are between 30 and 64 years old. The minimum sleep is 2670 minutes and the maximum is 4063 minutes. The average hourly wage ranges from 1.430 to 20.530. Its mean is 10.108 and its median is 8.345; the mean lies above the median, and with only six rows we cannot conclude from these figures that the wage is normally distributed.
summary(data)
## male hrwage sleep slpnaps clerical
## Min. :0.0000 Min. : 1.430 Min. :2670 Min. :2760 Min. :0
## 1st Qu.:1.0000 1st Qu.: 3.830 1st Qu.:2961 1st Qu.:2961 1st Qu.:0
## Median :1.0000 Median : 8.345 Median :3098 Median :3123 Median :0
## Mean :0.8333 Mean :10.108 Mean :3216 Mean :3250 Mean :0
## 3rd Qu.:1.0000 3rd Qu.:16.842 3rd Qu.:3364 3rd Qu.:3410 3rd Qu.:0
## Max. :1.0000 Max. :20.530 Max. :4063 Max. :4078 Max. :0
## construc earns74 age
## Min. :0 Min. : 0 Min. :30.00
## 1st Qu.:0 1st Qu.: 625 1st Qu.:31.25
## Median :0 Median : 6000 Median :36.50
## Mean :0 Mean :16167 Mean :40.33
## 3rd Qu.:0 3rd Qu.:34250 3rd Qu.:43.25
## Max. :0 Max. :42500 Max. :64.00
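
A mean by itself says little about the shape of a distribution. One quick check, sketched here on the six hrwage values from the table, is to compare the mean with the median and to run a Shapiro-Wilk test. This is only an illustration under the assumption that these six rows are the data being summarised; with so few observations the test cannot confirm normality.

```r
# hrwage values for the six rows shown in the table above.
hrwage <- c(7.070004, 1.429999, 20.529997, 9.619998, 2.750000, 19.249998)

wage_mean   <- mean(hrwage)
wage_median <- median(hrwage)

# A mean clearly above the median hints at right skew rather than symmetry.
cat("mean:", round(wage_mean, 3), " median:", round(wage_median, 3), "\n")

# Shapiro-Wilk test from base R's stats package. With n = 6 it has very
# little power, so a large p-value is not evidence of normality.
print(shapiro.test(hrwage))
```

On the full sleep75 data the same two steps can be repeated after dropping missing wages, for example with `na.omit()`.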
Let's create a simple plot of sleep time with naps against average hourly wage. The plot is condensed: many people sleep between 3000 and 4000 minutes and earn a very low hourly wage. Let's compare it with the plot for sleep time without naps.
ggplot(data=sleep75, aes(slpnaps,hrwage))+
  geom_jitter(alpha = 0.5)+
  geom_smooth(method="lm", se=TRUE,color="deeppink4")+
  xlab("Sleep with naps")+
  ylab("Hour wage")+
  labs(title="Linear model for sleep with naps and hour wage")+
  theme(plot.title = element_text(size = 15, color = "deeppink4"))
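
The line drawn by `geom_smooth(method="lm")` can also be put into numbers. The sketch below computes the Pearson correlation and fits the same regression of hourly wage on sleep with naps. To stay self-contained it uses only the six rows from the table, so the printed values are illustrative; swapping in `sleep75` is assumed to give the figures that correspond to the plot.

```r
# Six rows from the table above; replace with sleep75 for the full sample.
wage_sleep <- data.frame(
  hrwage  = c(7.070004, 1.429999, 20.529997, 9.619998, 2.750000, 19.249998),
  slpnaps = c(3163, 2920, 2760, 3083, 3493, 4078)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values,
# which matters if hrwage contains NA in the full data.
r <- cor(wage_sleep$slpnaps, wage_sleep$hrwage, use = "complete.obs")

# Same regression that the plot draws: hrwage on slpnaps.
fit <- lm(hrwage ~ slpnaps, data = wage_sleep)

cat("correlation:", round(r, 3), "\n")
print(coef(fit))
```

For a regression with a single predictor, the squared correlation equals the model's R-squared, so either number describes how tightly the points follow the line.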
From the plot by sex below we can conclude that the vast majority of men earn more, while women have a lower average hourly wage. Although men and women sleep approximately the same amount, men still have the higher wage.
# Note: `data` holds only the six rows selected above
model_1 <- lm(slpnaps ~ hrwage + male,data = data)
library(ggiraphExtra)
# male is coded 0 = female, 1 = male
sex<-factor(sleep75$male,levels=c(0,1),labels=c("female","male"))
ggplot(data=sleep75, aes(hrwage,slpnaps,color=sex))+geom_point()+geom_smooth(method="lm")
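
The comparison between men and women can be backed by group means. The sketch below is a hedged illustration on the six rows from the table, which contain a single woman, so its output says nothing about the population. The name `sex_sample` is hypothetical, and running the same `aggregate()` call on `sleep75` is assumed to give the meaningful comparison.

```r
# Six rows from the table above (male: 1 = male, 0 = female).
sex_sample <- data.frame(
  male    = c(1, 1, 1, 0, 1, 1),
  hrwage  = c(7.070004, 1.429999, 20.529997, 9.619998, 2.750000, 19.249998),
  slpnaps = c(3163, 2920, 2760, 3083, 3493, 4078)
)

# Label the 0/1 indicator explicitly so the groups cannot be mixed up.
sex_sample$sex <- factor(sex_sample$male, levels = c(0, 1),
                         labels = c("female", "male"))

# Mean hourly wage and mean sleep with naps per group.
group_means <- aggregate(cbind(hrwage, slpnaps) ~ sex,
                         data = sex_sample, FUN = mean)
print(group_means)
```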
model_1 <- lm(slpnaps ~ hrwage + age,data = data)
ggplot(data=sleep75, aes(hrwage,slpnaps,color=age))+geom_point()+geom_smooth(method="lm")+scale_x_sqrt()
From the second plot we can say that there is no strong correlation between age and either sleep with naps or hourly wage. On average the values are the same across ages.
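
A correlation matrix and the coefficient table of the additive model give a numeric counterpart to this visual impression. The following is a minimal sketch on the six rows from the table, so the estimates rest on very few observations; the names `age_sample` and `fit_age` are hypothetical, and the full `sleep75` data is assumed to be the proper input.

```r
# Six rows from the table above.
age_sample <- data.frame(
  age     = c(32, 31, 44, 30, 64, 41),
  hrwage  = c(7.070004, 1.429999, 20.529997, 9.619998, 2.750000, 19.249998),
  slpnaps = c(3163, 2920, 2760, 3083, 3493, 4078)
)

# Pairwise Pearson correlations between the three variables.
cor_matrix <- cor(age_sample)
print(round(cor_matrix, 3))

# Additive model from the text: sleep with naps on hourly wage and age.
fit_age <- lm(slpnaps ~ hrwage + age, data = age_sample)

# Estimates, standard errors, t values and p-values per coefficient.
print(summary(fit_age)$coefficients)
```

Coefficients with large standard errors relative to their estimates would agree with the reading that age adds little to the relationship.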