1 The Data

Purpose

This time I want to research happiness at work at Y, a company in Indonesia. I want to know the actual conditions there before I resign.

The data

I want to learn more about the research process, so that I can carry out a case like this if I take on a researcher role one day. For the data, I collected 67 responses through Google Forms. I used Maslow's pyramid (hierarchy of needs) to derive the variables, with X for the factors and Y for the office. For details of the form, see https://forms.gle/ZYfm1vhoEXvD4X9i8.

knitr::include_graphics("Assets/download.jpg")

1.1 Library

First, I attach the libraries up front to avoid hassle later.

library(dplyr)
library(GGally)
library(MLmetrics)
library(lmtest)
library(car)

1.2 Sample



\[\Large n = \frac{N}{1 + N e^{2}}\]

Info:
n: sample size
N: population size
e: margin of error (0.05, i.e. 5%)

FYI: company Y has 80 members in total. Using Slovin's method, I calculate the minimum sample size.

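N <- 80           # population size
e <- 0.05 * 0.05  # squared margin of error (e^2)
Ne <- e * N       # N * e^2
y <- 1 + Ne       # denominator of Slovin's formula
Sample <- N / y   # minimum sample size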
Sample
## [1] 66.66667

Minimum sample: 66.67, rounded up to 67 responses.
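The same calculation can be wrapped in a small function so the rounding step is explicit. This is a minimal sketch; the function name `slovin` is my own and not part of any package.

```r
# Slovin's formula: n = N / (1 + N * e^2)
# N: population size, e: margin of error (e.g. 0.05 for 5%)
slovin <- function(N, e) {
  N / (1 + N * e^2)
}

n_raw <- slovin(N = 80, e = 0.05)  # unrounded sample size
n_min <- ceiling(n_raw)            # round up: a fraction of a respondent is not possible
print(n_raw)
print(n_min)
```

Rounding up rather than to the nearest integer keeps the sample at or above the size the formula asks for.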

1.3 Read Data

work <- read.csv("Data/Happiness at Work (Responses) - Form Responses 1.csv")
work

Legend:
- Timestamp: date and time the response was received.
- X1: Self-actualisation. People are self-aware, concerned with personal growth, less concerned with the opinions of others, and interested in fulfilling their potential.
- X2: Esteem needs. People need to sense that they are valued by others and feel that they are making a contribution to the world.
- X3: Belonging and love needs. At this level, the need for emotional relationships drives human behavior. Things that satisfy this need include friendships, romantic attachments, family, social groups, community groups, churches and religious organizations.
- X4: Safety needs. At this level, the needs for security and safety become primary: financial security, health and wellness, and safety against accidents and injury.
- X5: Physiological needs. These are the things that are vital to our survival. Examples include food, water, breathing, and homeostasis.

1.4 Inspect the data

glimpse(work)
## Rows: 67
## Columns: 6
## $ Timestamp <chr> "4/17/2021 19:52:53", "4/17/2021 19:53:01", "4/17/2021 20:09~
## $ X1        <int> 4, 3, 4, 3, 4, 4, 5, 5, 5, 2, 2, 3, 2, 3, 5, 2, 3, 5, 3, 4, ~
## $ X2        <int> 5, 4, 2, 3, 5, 3, 5, 5, 4, 1, 1, 3, 3, 3, 4, 2, 3, 5, 3, 4, ~
## $ X3        <int> 5, 5, 2, 5, 5, 4, 5, 5, 4, 2, 2, 3, 4, 4, 5, 3, 4, 5, 4, 4, ~
## $ X4        <int> 5, 3, 4, 4, 4, 3, 5, 5, 4, 1, 2, 4, 3, 3, 5, 2, 4, 4, 4, 4, ~
## $ X5        <int> 4, 4, 3, 4, 4, 3, 4, 5, 4, 5, 2, 4, 2, 3, 5, 1, 4, 4, 4, 4, ~

1.5 Check for missing values

anyNA(work)
## [1] FALSE

This means there are no missing values (NA), great!

1.6 Subsetting data

w1 <- work[,-c(1)]

1.7 Check Outliers

boxplot(w1)

Insight: the boxplot shows no outliers.
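The visual check can also be done numerically with `boxplot.stats()`, which returns the points a boxplot would draw as outliers. The sketch below uses a made-up data frame (`toy`) and a helper name (`count_outliers`) of my own; it does not use the survey responses.

```r
# Count the points per column that boxplot() would flag as outliers
# (more than 1.5 * IQR beyond the hinges).
count_outliers <- function(df) {
  sapply(df, function(x) length(boxplot.stats(x)$out))
}

# Made-up data for illustration only (not the survey responses).
toy <- data.frame(
  A = c(1, 2, 3, 4, 5),   # no extreme values
  B = c(3, 3, 4, 4, 40)   # 40 is far from the rest
)

outlier_counts <- count_outliers(toy)
print(outlier_counts)
```

Applied to the real data, `count_outliers(w1)` should return 0 for every column if the boxplot reading above is right.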

2 Correlation

ggcorr(w1, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

From this plot we get the following insights:
1. The Timestamp column is not part of the plot: it is not numeric and was already dropped in the subsetting step.
2. All variables are positively correlated.
3. X2 is the variable with the strongest correlation with X5.

3 Regression Models

w2 <- lm(X5 ~ X2, w1)
w2
## 
## Call:
## lm(formula = X5 ~ X2, data = w1)
## 
## Coefficients:
## (Intercept)           X2  
##      0.2616       0.8637
summary(w2)
## 
## Call:
## lm(formula = X5 ~ X2, data = w1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7165 -0.6484  0.1472  0.4197  3.8747 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.2616     0.3025   0.865     0.39    
## X2            0.8637     0.0924   9.348 1.21e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9646 on 65 degrees of freedom
## Multiple R-squared:  0.5734, Adjusted R-squared:  0.5669 
## F-statistic: 87.38 on 1 and 65 DF,  p-value: 1.215e-13

Insight: this model has a Multiple R-squared of 0.5734 and an Adjusted R-squared of 0.5669, so X2 alone explains about 57% of the variance in X5.
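Two quick checks tie these numbers back to the earlier sections. In a regression with one predictor, R-squared is the square of the correlation between X2 and X5, and the adjusted value follows from R-squared, the sample size, and the number of predictors. The sketch below only uses figures printed in the summary; n = 67 follows from the 65 residual degrees of freedom plus 2 estimated coefficients.

```r
r_squared <- 0.5734  # Multiple R-squared from summary(w2)
n <- 67              # number of responses
p <- 1               # number of predictors in w2

# Correlation between X2 and X5 (positive, because the slope is positive)
r <- sqrt(r_squared)

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(r, 3))
print(round(adj_r_squared, 4))
```

Small differences in the last digit are expected because the input is the rounded R-squared from the printout.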

4 Plot

plot(w1$X2, w1$X5)

5 Predictor Variables (Stepwise Regression - Backward)

w3 <- lm(X5 ~ ., w1)
step(w3, direction = "backward")
## Start:  AIC=-14.06
## X5 ~ X1 + X2 + X3 + X4
## 
##        Df Sum of Sq    RSS     AIC
## - X3    1    0.9482 47.738 -14.711
## <none>              46.789 -14.055
## - X1    1    2.4260 49.216 -12.668
## - X2    1    3.1145 49.904 -11.738
## - X4    1    8.4731 55.263  -4.904
## 
## Step:  AIC=-14.71
## X5 ~ X1 + X2 + X4
## 
##        Df Sum of Sq    RSS      AIC
## <none>              47.738 -14.7111
## - X1    1    1.7744 49.512 -14.2658
## - X2    1    2.5485 50.286 -13.2265
## - X4    1    8.4559 56.194  -5.7846
## 
## Call:
## lm(formula = X5 ~ X1 + X2 + X4, data = w1)
## 
## Coefficients:
## (Intercept)           X1           X2           X4  
##    -0.02764      0.21538      0.30314      0.43481
summary(lm(formula = X5 ~ X1 + X2 + X3 + X4, data = w1))
## 
## Call:
## lm(formula = X5 ~ X1 + X2 + X3 + X4, data = w1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6119 -0.4532  0.0377  0.2674  3.8156 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.1155     0.3165   0.365  0.71653   
## X1            0.2632     0.1468   1.793  0.07786 . 
## X2            0.3427     0.1687   2.031  0.04650 * 
## X3           -0.1177     0.1050  -1.121  0.26664   
## X4            0.4353     0.1299   3.351  0.00138 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8687 on 62 degrees of freedom
## Multiple R-squared:   0.67,  Adjusted R-squared:  0.6487 
## F-statistic: 31.47 on 4 and 62 DF,  p-value: 2.578e-14
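The AIC column in the backward-elimination table can be reproduced by hand. For linear models, `step()` reports n * log(RSS / n) + 2k, where k is the number of estimated coefficients including the intercept. The sketch below plugs in the RSS values from the table; the helper name `step_aic` is my own.

```r
# AIC as reported by step() for lm: n * log(RSS / n) + 2 * k
step_aic <- function(rss, n, k) {
  n * log(rss / n) + 2 * k
}

n <- 67

# Full model X5 ~ X1 + X2 + X3 + X4: RSS = 46.789, 5 coefficients
aic_full <- step_aic(rss = 46.789, n = n, k = 5)

# Reduced model X5 ~ X1 + X2 + X4: RSS = 47.738, 4 coefficients
aic_reduced <- step_aic(rss = 47.738, n = n, k = 4)

print(round(aic_full, 2))
print(round(aic_reduced, 2))
```

The reduced model has the lower AIC, which is why the procedure drops X3 and then stops.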

Model:
- w3: X5 = 0.1155 + 0.2632(X1) + 0.3427(X2) - 0.1177(X3) + 0.4353(X4)
- w2: X5 = 0.2616 + 0.8637(X2)
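To see how the two equations behave, they can be evaluated by hand for a respondent who answers 5 on every predictor. This is a sketch using the rounded coefficients printed above, so the results may differ from `predict()` in the later decimals.

```r
# Rounded coefficients copied from the model equations
pred_w2 <- 0.2616 + 0.8637 * 5

pred_w3 <- 0.1155 + 0.2632 * 5 + 0.3427 * 5 - 0.1177 * 5 + 0.4353 * 5

print(round(pred_w2, 2))
print(round(pred_w3, 2))
```

Both predictions stay below 5, with the w3 value closer to it.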

6 Model and Error

First, we use each model to predict X5 for a respondent who answers 5 on every predictor, and compare the prediction with an actual value of 5.

W2

predict(w2, data.frame(X2 = 5), interval = "confidence", level = 0.95)
##        fit      lwr      upr
## 1 4.580252 4.144836 5.015668

W3

predict(w3, data.frame(X1 = 5, X2 = 5, X3 = 5, X4 = 5), interval = "confidence", level = 0.95)
##       fit      lwr      upr
## 1 4.73258 4.332409 5.132751

Error w2

sqrt((5-4.580252)^2)
## [1] 0.419748

Error W3

sqrt((5-4.73258)^2)
## [1] 0.26742

Insight: the error from w3 is smaller than the error from w2, so w3 predicts better than w2 for this case.
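Note that `sqrt((actual - predicted)^2)` for a single point is just the absolute error. To compare models over all responses, a common choice is the root mean squared error (RMSE). The sketch below defines it in base R and runs it on made-up numbers. The name `rmse` is my own, and the commented-out lines show how it could be applied to the models in this report.

```r
# Root mean squared error over a whole set of observations
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up values for illustration only (not the survey responses)
actual    <- c(5, 4, 3, 4, 2)
predicted <- c(4.6, 4.1, 3.4, 3.7, 2.5)

single_point_error <- abs(actual[1] - predicted[1])  # same as sqrt((5 - 4.6)^2)
overall_error <- rmse(actual, predicted)

print(single_point_error)
print(overall_error)

# With the real data (not run here):
# rmse(w1$X5, fitted(w2))
# rmse(w1$X5, fitted(w3))
```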

7 Evaluation Models

Normality

hist(w2$residuals, breaks = 20)

shapiro.test(w2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  w2$residuals
## W = 0.92556, p-value = 0.0006304
hist(w3$residuals, breaks = 20)

shapiro.test(w3$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  w3$residuals
## W = 0.88952, p-value = 2.239e-05

Insight: the p-value is below 0.05 for both models (0.0006304 for w2 and 2.239e-05 for w3). H0 (the residuals are normally distributed) is therefore rejected for both w2 and w3, so neither model meets the normality assumption.
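The decision rule for the Shapiro-Wilk test can be written out explicitly. This sketch only restates the p-values printed above and compares them with a 5% significance level; the variable names are my own.

```r
alpha <- 0.05

# p-values copied from the shapiro.test() output
p_values <- c(w2 = 0.0006304, w3 = 2.239e-05)

# H0: residuals are normally distributed. Reject H0 when p < alpha.
reject_h0 <- p_values < alpha
print(reject_h0)
```

Both entries come out `TRUE`, meaning the normality hypothesis is rejected for both models.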

8 Heteroscedasticity

plot(w1$X2, w3$residuals)
abline(h = 0, col = "red")

bptest(w3)
## 
##  studentized Breusch-Pagan test
## 
## data:  w3
## BP = 3.2703, df = 4, p-value = 0.5137
plot(w1$X2, w2$residuals)
abline(h = 0, col = "red")

bptest(w2)
## 
##  studentized Breusch-Pagan test
## 
## data:  w2
## BP = 1.5997, df = 1, p-value = 0.206

Insight: neither model shows heteroscedasticity, because both p-values are above 0.05.
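For intuition about what the studentized Breusch-Pagan test measures, the statistic can be built by hand: regress the squared residuals on the predictors and multiply the R-squared of that auxiliary regression by n. The sketch below does this in base R on simulated data, not the survey responses. All variable names are my own.

```r
set.seed(1)  # simulated data, for illustration only

n <- 67
x <- sample(1:5, n, replace = TRUE)   # Likert-style predictor
y <- 0.3 + 0.9 * x + rnorm(n)         # response with constant error variance

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the auxiliary model
bp <- n * summary(aux)$r.squared
df <- 1  # number of predictors in the auxiliary model
p_value <- pchisq(bp, df = df, lower.tail = FALSE)

print(bp)
print(p_value)
```

A large p-value means the squared residuals do not vary systematically with the predictor, which is the reading applied to w2 and w3 above.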

9 Variance Inflation Factor (Multicollinearity)

vif(w3)
##       X1       X2       X3       X4 
## 3.268742 4.109520 1.821732 2.949984

*w2 cannot be tested with VIF because it has only one predictor.

All VIF values are below 10, so no multicollinearity is found between the variables.
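Each VIF is defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Inverting that formula shows how much of each predictor is explained by the others. The sketch below only uses the VIF values printed above.

```r
# VIF values copied from the vif(w3) output
vif_values <- c(X1 = 3.268742, X2 = 4.109520, X3 = 1.821732, X4 = 2.949984)

# VIF_j = 1 / (1 - R_j^2)  =>  R_j^2 = 1 - 1 / VIF_j
r2_from_others <- 1 - 1 / vif_values
print(round(r2_from_others, 3))

# Threshold used in this report
below_threshold <- vif_values < 10
print(below_threshold)
```

X2 has the highest value: roughly three quarters of its variance is explained by X1, X3 and X4, which is still under the threshold of 10 used here.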

10 Conclusion

  • Minimum sample: 66.67, rounded up to 67 responses.
  • There are no missing values.
  • There are no outliers.
  • Correlation: all variables are positively correlated, and X2 has the strongest correlation with X5.
  • The simple model w2 has a Multiple R-squared of 0.5734 and an Adjusted R-squared of 0.5669.
  • Models: w3: X5 = 0.1155 + 0.2632(X1) + 0.3427(X2) - 0.1177(X3) + 0.4353(X4) ; w2: X5 = 0.2616 + 0.8637(X2)
  • The prediction error of w3 is smaller than that of w2, so w3 predicts better than w2 for the tested case.
  • Normality: the Shapiro-Wilk p-value is below 0.05 for both w2 and w3, so H0 (normally distributed residuals) is rejected for both models.
  • Neither model shows heteroscedasticity, because both p-values are above 0.05.

The minimum sample I need is 67 responses, and the data has no missing values and no outliers. The process is: first I drop the Timestamp column, then I look at the correlations. X2 has the strongest correlation with X5, and I built two models: X5 = 0.1155 + 0.2632(X1) + 0.3427(X2) - 0.1177(X3) + 0.4353(X4) and X5 = 0.2616 + 0.8637(X2). The error from w3 is smaller than the error from w2, so w3 predicts better for the tested case. Neither model shows heteroscedasticity or multicollinearity, but the Shapiro-Wilk test rejects H0 (normally distributed residuals) for both w2 and w3.