Purpose
This time I want to research happiness at work at Company Y, a company in Indonesia; I want to know the actual conditions there before I resign.
The data
I also want to learn more about the research process, in case I take on that role one day. For the data, I collected n = 67 responses via a Google Form. I use Maslow's pyramid (hierarchy of needs) to define the variables, with the X's as the factors and Y as happiness at the office. For details of the form, see https://forms.gle/ZYfm1vhoEXvD4X9i8.
knitr::include_graphics("Assets/download.jpg")
First, I load the required libraries so the rest of the analysis runs without hassle:
library(dplyr)     # data wrangling (glimpse)
library(GGally)    # correlation heatmap (ggcorr)
library(MLmetrics) # error metrics
library(lmtest)    # Breusch-Pagan test (bptest)
library(car)       # variance inflation factor (vif)
\[\Large n = \frac{N}{1 + Ne^2}\]
Info:
n: sample size
N: total population
e: margin of error (0.05, i.e. 5%)
FYI: Company Y has 80 members in total. Using Slovin's formula, I compute the minimum sample size:
N <- 80                      # total population (Company Y members)
e <- 0.05                    # margin of error (5%)
Sample <- N / (1 + N * e^2)  # Slovin's formula
Sample
## [1] 66.66667
The minimum sample size is 66.67, which rounds up to 67 responses; this matches the 67 responses collected.
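Since a fractional respondent is impossible, the result should always be rounded up; a minimal sketch in R:
ceiling(Sample)  # rounds 66.66667 up to 67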
work <- read.csv("Data/Happiness at Work (Responses) - Form Responses 1.csv")
work
Legend:
- Timestamp: date and time the response was received.
- X1: Self-Actualization. People are self-aware, concerned with personal growth, less concerned with the opinions of others, and interested in fulfilling their potential.
- X2: Esteem Needs. People need to sense that they are valued by others and feel that they are making a contribution to the world.
- X3: Belongingness & Love Needs. At this level, the need for emotional relationships drives human behavior. Things that satisfy this need include friendships, romantic attachments, family, social groups, community groups, and churches and religious organizations.
- X4: Safety Needs. At this level, the needs for security and safety become primary: financial security, health and wellness, and safety against accidents and injury.
- X5: Physiological Needs. The basic physiological needs are probably fairly apparent; these are the things vital to our survival, such as food, water, breathing, and homeostasis.
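Since every X is a 1-to-5 Likert score, a quick numeric overview of each need level may help before modelling; a sketch with base R (Timestamp excluded):
summary(work[, -1])  # five-number summary plus mean for each Maslow variable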
glimpse(work)
## Rows: 67
## Columns: 6
## $ Timestamp <chr> "4/17/2021 19:52:53", "4/17/2021 19:53:01", "4/17/2021 20:09~
## $ X1 <int> 4, 3, 4, 3, 4, 4, 5, 5, 5, 2, 2, 3, 2, 3, 5, 2, 3, 5, 3, 4, ~
## $ X2 <int> 5, 4, 2, 3, 5, 3, 5, 5, 4, 1, 1, 3, 3, 3, 4, 2, 3, 5, 3, 4, ~
## $ X3 <int> 5, 5, 2, 5, 5, 4, 5, 5, 4, 2, 2, 3, 4, 4, 5, 3, 4, 5, 4, 4, ~
## $ X4 <int> 5, 3, 4, 4, 4, 3, 5, 5, 4, 1, 2, 4, 3, 3, 5, 2, 4, 4, 4, 4, ~
## $ X5 <int> 4, 4, 3, 4, 4, 3, 4, 5, 4, 5, 2, 4, 2, 3, 5, 1, 4, 4, 4, 4, ~
anyNA(work)
## [1] FALSE
There are no NA values, great!
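anyNA() only reports whether any missing value exists at all; had it returned TRUE, a per-column count (a sketch using base R) would show where they are:
colSums(is.na(work))  # number of NA values in each column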
w1 <- work[,-c(1)]  # drop the Timestamp column
boxplot(w1)
Insight: the boxplots show no outliers.
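The boxplot check is visual; as a numeric cross-check (a sketch using the usual 1.5 x IQR whisker rule):
# count values falling outside the boxplot whiskers, per variable
sapply(w1, function(x) {
  q <- quantile(x, c(0.25, 0.75))
  sum(x < q[1] - 1.5 * IQR(x) | x > q[2] + 1.5 * IQR(x))
})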
ggcorr(w1, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)
From the correlation heatmap we get the following insights:
1. The Timestamp column is not in this plot; it was dropped earlier because a timestamp is not a numeric score we can correlate.
2. All variables are positively correlated.
3. X2 has the strongest correlation with X5.
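The heatmap gives the picture; the exact coefficients behind it can be printed with base R (a sketch):
round(cor(w1), 2)  # numeric correlation matrix behind the heatmap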
w2 <- lm(X5 ~ X2, w1)
w2
##
## Call:
## lm(formula = X5 ~ X2, data = w1)
##
## Coefficients:
## (Intercept) X2
## 0.2616 0.8637
summary(w2)
##
## Call:
## lm(formula = X5 ~ X2, data = w1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7165 -0.6484 0.1472 0.4197 3.8747
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2616 0.3025 0.865 0.39
## X2 0.8637 0.0924 9.348 1.21e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9646 on 65 degrees of freedom
## Multiple R-squared: 0.5734, Adjusted R-squared: 0.5669
## F-statistic: 87.38 on 1 and 65 DF, p-value: 1.215e-13
Insight: the simple model explains about 57% of the variance in X5 (Multiple R-squared = 0.5734, Adjusted R-squared = 0.5669).
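To recall what that figure means, R-squared can be reproduced from the residuals by hand; a sketch of the definition, not a new result:
# R-squared = 1 - SSE/SST; should reproduce 0.5734
1 - sum(w2$residuals^2) / sum((w1$X5 - mean(w1$X5))^2)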
plot(w1$X2, w1$X5)
w3 <- lm(X5 ~ ., w1)
step(w3, direction = "backward")
## Start: AIC=-14.06
## X5 ~ X1 + X2 + X3 + X4
##
## Df Sum of Sq RSS AIC
## - X3 1 0.9482 47.738 -14.711
## <none> 46.789 -14.055
## - X1 1 2.4260 49.216 -12.668
## - X2 1 3.1145 49.904 -11.738
## - X4 1 8.4731 55.263 -4.904
##
## Step: AIC=-14.71
## X5 ~ X1 + X2 + X4
##
## Df Sum of Sq RSS AIC
## <none> 47.738 -14.7111
## - X1 1 1.7744 49.512 -14.2658
## - X2 1 2.5485 50.286 -13.2265
## - X4 1 8.4559 56.194 -5.7846
##
## Call:
## lm(formula = X5 ~ X1 + X2 + X4, data = w1)
##
## Coefficients:
## (Intercept) X1 X2 X4
## -0.02764 0.21538 0.30314 0.43481
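Note that the step() call above only prints the selection path; its return value, the reduced model X5 ~ X1 + X2 + X4, is not stored. A sketch that keeps it for later use (w4 is a name introduced here, not part of the original analysis):
w4 <- step(w3, direction = "backward", trace = 0)  # keep the backward-selected model
summary(w4)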
summary(lm(formula = X5 ~ X1 + X2 + X3 + X4, data = w1))
##
## Call:
## lm(formula = X5 ~ X1 + X2 + X3 + X4, data = w1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6119 -0.4532 0.0377 0.2674 3.8156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1155 0.3165 0.365 0.71653
## X1 0.2632 0.1468 1.793 0.07786 .
## X2 0.3427 0.1687 2.031 0.04650 *
## X3 -0.1177 0.1050 -1.121 0.26664
## X4 0.4353 0.1299 3.351 0.00138 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8687 on 62 degrees of freedom
## Multiple R-squared: 0.67, Adjusted R-squared: 0.6487
## F-statistic: 31.47 on 4 and 62 DF, p-value: 2.578e-14
Model equations:
- w3: X5 = 0.1155 + 0.2632(X1) + 0.3427(X2) - 0.1177(X3) + 0.4353(X4)
- w2: X5 = 0.2616 + 0.8637(X2)
First, we predict X5 with each model, setting every predictor to its maximum score of 5, and compare against the actual maximum of 5.
Prediction with w2:
predict(w2, data.frame(X2 = 5), interval = "confidence", level = 0.95)
## fit lwr upr
## 1 4.580252 4.144836 5.015668
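As a sanity check (a sketch), the fitted value can be reproduced by plugging X2 = 5 into the w2 equation:
sum(coef(w2) * c(1, 5))  # intercept + slope * 5, which equals the fit of 4.580252 above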
Prediction with w3:
predict(w3, data.frame(X1 = 5, X2 = 5, X3 = 5, X4 = 5), interval = "confidence", level = 0.95)
## fit lwr upr
## 1 4.73258 4.332409 5.132751
Absolute error of w2:
sqrt((5-4.580252)^2)
## [1] 0.419748
Absolute error of w3:
sqrt((5-4.73258)^2)
## [1] 0.26742
Insight: the prediction error of w3 (0.267) is smaller than that of w2 (0.420), so w3 predicts this case better than w2.
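A single point is a thin basis for comparing models; since MLmetrics is already loaded, a sketch comparing RMSE on all 67 responses:
RMSE(predict(w2, w1), w1$X5)  # root mean squared error of the simple model
RMSE(predict(w3, w1), w1$X5)  # root mean squared error of the full model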
Normality
hist(w2$residuals, breaks = 20)
shapiro.test(w2$residuals)
##
## Shapiro-Wilk normality test
##
## data: w2$residuals
## W = 0.92556, p-value = 0.0006304
hist(w3$residuals, breaks = 20)
shapiro.test(w3$residuals)
##
## Shapiro-Wilk normality test
##
## data: w3$residuals
## W = 0.88952, p-value = 2.239e-05
Insight: both models fail the normality test. The Shapiro-Wilk p-values are below 0.05 for w2 (0.00063) and for w3 (2.2e-05), so we reject H0 in both cases: the residuals of neither model are normally distributed.
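Histograms can hide tail behaviour; a Q-Q plot (a base-R sketch) makes the departure from normality easier to see:
qqnorm(w3$residuals)               # sample quantiles vs. theoretical normal quantiles
qqline(w3$residuals, col = "red")  # reference line for a perfectly normal sample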
plot(w1$X2, w3$residuals)
abline(h = 0, col = "red")
bptest(w3)
##
## studentized Breusch-Pagan test
##
## data: w3
## BP = 3.2703, df = 4, p-value = 0.5137
plot(w1$X2, w2$residuals)
abline(h = 0, col = "red")
bptest(w2)
##
## studentized Breusch-Pagan test
##
## data: w2
## BP = 1.5997, df = 1, p-value = 0.206
Insight: neither model shows heteroscedasticity, since both Breusch-Pagan p-values are above 0.05.
vif(w3)
## X1 X2 X3 X4
## 3.268742 4.109520 1.821732 2.949984
*w2 cannot be tested with VIF because it has only one predictor.
All VIF values are below 10, so there is no multicollinearity among the predictors.
Conclusion
The minimum sample size by Slovin's formula is 67, and the 67 form responses meet it; the data contain no NA values and no outliers. The process was: drop the Timestamp column, inspect the correlations (X2 has the strongest correlation with X5), then fit two models, w3: X5 = 0.1155 + 0.2632(X1) + 0.3427(X2) - 0.1177(X3) + 0.4353(X4) and w2: X5 = 0.2616 + 0.8637(X2). The prediction error of w3 is smaller than that of w2, neither model shows heteroscedasticity, but the residuals of both models fail the Shapiro-Wilk normality test.