You’re tasked with analyzing employee satisfaction data. Note: we will work exclusively in base R, without many of the common third-party packages installed. The data are already preprocessed and can be analyzed with built-in R functions. They describe an employee satisfaction survey, with an overall rating and several subcategories.
From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percent proportion of favourable responses to seven questions in each department. (source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/attitude)
# Load the built-in attitude dataset and preview its first rows
data(attitude)
head(attitude)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34
dim(attitude)
## [1] 30 7
names(attitude)
## [1] "rating" "complaints" "privileges" "learning" "raises"
## [6] "critical" "advance"
sapply(attitude, class)
##     rating complaints privileges   learning     raises   critical    advance
##  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"
sum(is.na(attitude))
## [1] 0
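The total above confirms there are no missing values anywhere. As a small sketch of how the same check could be broken down, the following counts missing values per column and counts fully observed rows; the variable names `na_per_column` and `n_complete` are illustrative only.

```r
# Count missing values per column rather than in total.
na_per_column <- colSums(is.na(attitude))
print(na_per_column)

# complete.cases() flags rows that have no missing value in any column.
n_complete <- sum(complete.cases(attitude))
print(n_complete)
```

A per-column view is mostly useful on messier data, where it shows which variables would need imputation or removal.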
summary(attitude)
##      rating        complaints     privileges       learning         raises
##  Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00   Min.   :43.00
##  1st Qu.:58.75   1st Qu.:58.5   1st Qu.:45.00   1st Qu.:47.00   1st Qu.:58.25
##  Median :65.50   Median :65.0   Median :51.50   Median :56.50   Median :63.50
##  Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37   Mean   :64.63
##  3rd Qu.:71.75   3rd Qu.:77.0   3rd Qu.:62.50   3rd Qu.:66.75   3rd Qu.:71.00
##  Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00   Max.   :88.00
##     critical        advance
##  Min.   :49.00   Min.   :25.00
##  1st Qu.:69.25   1st Qu.:35.00
##  Median :77.50   Median :41.00
##  Mean   :74.77   Mean   :42.93
##  3rd Qu.:80.00   3rd Qu.:47.75
##  Max.   :92.00   Max.   :72.00
# print mean of each feature
mean_value <- round(colMeans(attitude), 2)
print(mean_value)
##     rating complaints privileges   learning     raises   critical    advance
##      64.63      66.60      53.13      56.37      64.63      74.77      42.93
# max and min values of each feature
max_value <- sapply(attitude, max)
print(max_value)
##     rating complaints privileges   learning     raises   critical    advance
##         85         90         83         75         88         92         72
min_value <- sapply(attitude, min)
print(min_value)
##     rating complaints privileges   learning     raises   critical    advance
##         40         37         30         34         43         49         25
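# difference between the max and min of each feature (the range)
# stored as range_value to avoid reusing the name of the base function diff()
range_value <- max_value - min_value
print(range_value)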
##     rating complaints privileges   learning     raises   critical    advance
##         45         53         53         41         45         43         47
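The max-minus-min calculation above can also be done in a single pass per column. This is a minimal alternative sketch, not a replacement for the step above; the name `spread` is illustrative.

```r
# range() returns c(min, max) for a vector; diff() of that pair is max - min.
spread <- sapply(attitude, function(x) diff(range(x)))
print(spread)
```

Both approaches should give the same numbers; this form just avoids keeping the separate max and min vectors around.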
# Using boolean filtering, print only the rows of the dataframe where rating is greater than 75
filter_df <- attitude[attitude$rating > 75, ]
print(filter_df)
##    rating complaints privileges learning raises critical advance
## 5      81         78         56       66     71       83      47
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 27     78         75         58       74     80       78      49
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39
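Boolean filtering extends naturally to several conditions. The sketch below shows the same rating filter written with `subset()`, plus a combined condition; the `advance < 40` cutoff is an arbitrary value chosen only for illustration.

```r
# subset() evaluates the condition inside the data frame,
# so columns can be referenced by name without the attitude$ prefix.
high_rating <- subset(attitude, rating > 75)

# Logical vectors can be combined with & (and) or | (or).
# Here: high overall rating but a low score on advancement (illustrative cutoff).
high_rating_low_advance <- attitude[attitude$rating > 75 & attitude$advance < 40, ]
print(high_rating_low_advance)
```

Bracket indexing and `subset()` return the same rows here; `subset()` is convenient interactively, while explicit indexing is often preferred inside functions.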
# Linear regression model (lm): rating is the dependent variable, predicted by all other features.
# complaints is the only predictor significant at the 0.05 level (p-value: 0.000903). Holding the other predictors fixed, a one-unit increase in the complaints score is associated with an expected increase of about 0.61 in the overall rating. In addition, rows 6, 12, and 23 appear to be potential outliers based on the diagnostic plots.
m1 <- lm(rating ~ ., data = attitude)
summary(m1)
##
## Call:
## lm(formula = rating ~ ., data = attitude)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.9418  -4.3555   0.3158   5.5425  11.5990
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.78708   11.58926   0.931 0.361634
## complaints   0.61319    0.16098   3.809 0.000903 ***
## privileges  -0.07305    0.13572  -0.538 0.595594
## learning     0.32033    0.16852   1.901 0.069925 .
## raises       0.08173    0.22148   0.369 0.715480
## critical     0.03838    0.14700   0.261 0.796334
## advance     -0.21706    0.17821  -1.218 0.235577
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.068 on 23 degrees of freedom
## Multiple R-squared: 0.7326, Adjusted R-squared: 0.6628
## F-statistic: 10.5 on 6 and 23 DF, p-value: 1.24e-05
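Reading significance off the printed summary works, but the same numbers can be pulled out programmatically. The following is a minimal sketch that refits the same model under the illustrative name `fit`, keeps only the terms with p < 0.05, and adds 95% confidence intervals for the coefficients.

```r
# Refit the full model so this block runs on its own.
fit <- lm(rating ~ ., data = attitude)

# coef(summary(fit)) is the coefficient matrix shown in the printed summary.
coef_table <- coef(summary(fit))

# Keep only rows whose p-value is below 0.05 (drop = FALSE keeps the matrix shape).
significant <- coef_table[coef_table[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
print(significant)

# 95% confidence intervals for every coefficient.
ci <- confint(fit, level = 0.95)
print(ci)
```

An interval that excludes zero corresponds to a coefficient that is significant at the matching level, which gives a second way to read the table above.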
anova(m1)
## Analysis of Variance Table
##
## Response: rating
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## complaints  1 2927.58 2927.58 58.6026 9.056e-08 ***
## privileges  1    7.52    7.52  0.1505    0.7016
## learning    1  137.25  137.25  2.7473    0.1110
## raises      1    0.94    0.94  0.0189    0.8920
## critical    1    0.56    0.56  0.0113    0.9163
## advance     1   74.11   74.11  1.4835    0.2356
## Residuals  23 1149.00   49.96
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
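Note that `anova()` on a single `lm` fit reports sequential sums of squares, so each row depends on the order in which the terms enter the formula. One order-independent way to ask whether the remaining predictors add anything beyond complaints is a partial F-test between nested models. This is a sketch under that framing; the names `full`, `reduced`, and `cmp` are illustrative.

```r
# Full model with all six predictors, and a reduced model with complaints only.
full <- lm(rating ~ ., data = attitude)
reduced <- lm(rating ~ complaints, data = attitude)

# anova() on two nested models tests whether the extra terms
# in the larger model reduce the residual sum of squares significantly.
cmp <- anova(reduced, full)
print(cmp)
```

A large p-value in the second row would suggest that the five additional predictors, taken together, do not improve the fit much over complaints alone.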
plot(m1)
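The diagnostic plots label a few rows as unusual; the quantities behind those plots can also be inspected numerically. The sketch below refits the model as `fit` and tabulates standardized residuals, leverage, and Cook's distance. The `4 / n` cutoff is a common rule of thumb used here as an assumption, not a formal test.

```r
# Refit the full model so this block runs on its own.
fit <- lm(rating ~ ., data = attitude)

# Per-observation diagnostics from base R.
diagnostics <- data.frame(
  std_resid = rstandard(fit),      # standardized residuals
  leverage  = hatvalues(fit),      # diagonal of the hat matrix
  cooks     = cooks.distance(fit)  # influence on the fitted coefficients
)

# Show the five most influential rows by Cook's distance.
print(head(diagnostics[order(-diagnostics$cooks), ], 5))

# Flag rows above a rule-of-thumb cutoff of 4 / n (an assumption, not a strict rule).
flagged <- rownames(diagnostics)[diagnostics$cooks > 4 / nrow(attitude)]
print(flagged)
```

Rows flagged this way are candidates for a closer look rather than automatic removal; comparing them against the rows labeled in the plots is a reasonable sanity check.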
# Correlation matrix (Pearson, rounded to 2 decimals)
# complaints and rating have a strong positive correlation of 0.83, the largest off-diagonal value in the matrix
round(cor(attitude), 2)
##            rating complaints privileges learning raises critical advance
## rating       1.00       0.83       0.43     0.62   0.59     0.16    0.16
## complaints   0.83       1.00       0.56     0.60   0.67     0.19    0.22
## privileges   0.43       0.56       1.00     0.49   0.45     0.15    0.34
## learning     0.62       0.60       0.49     1.00   0.64     0.12    0.53
## raises       0.59       0.67       0.45     0.64   1.00     0.38    0.57
## critical     0.16       0.19       0.15     0.12   0.38     1.00    0.28
## advance      0.16       0.22       0.34     0.53   0.57     0.28    1.00
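The matrix gives point estimates only. As a minimal sketch of how the complaints and rating relationship could be tested formally, `cor.test()` returns the Pearson correlation together with a confidence interval and p-value; the name `ct` is illustrative.

```r
# Pearson correlation test between complaints and overall rating.
ct <- cor.test(attitude$complaints, attitude$rating)

print(ct$estimate)  # sample correlation
print(ct$conf.int)  # 95% confidence interval (default conf.level)
print(ct$p.value)   # p-value for H0: true correlation is zero
```

With only 30 departments the interval can be fairly wide, so it is worth reporting alongside the point estimate.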