Overview

You are tasked with analyzing employee satisfaction data. Note: we will be working exclusively in base R, without many common third-party packages installed. The data is preprocessed and can be analyzed using built-in R functions. These data describe an employee satisfaction survey: there is an overall rating and several subcategories.

  1. Print the mean of each feature in the file
  2. Print the max and min values of each feature
  3. Using boolean filtering, print only the rows of the dataframe where rating is greater than 75
  4. Create a model or models to find out which features are most closely related to rating
    1. What kind of model did you choose?
    2. What did it tell you about which features are most closely related to the ratings?

Attitude Description

From a survey of the clerical employees of a large financial organization, the data are aggregated from the questionnaires of the approximately 35 employees for each of 30 (randomly selected) departments. The numbers give the percent proportion of favourable responses to seven questions in each department. (source: https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/attitude)

Attitude Codebook

# Preview the attitude dataset (built into base R's datasets package, so no explicit load is needed)
head(attitude)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34
dim(attitude)
## [1] 30  7
names(attitude)
## [1] "rating"     "complaints" "privileges" "learning"   "raises"    
## [6] "critical"   "advance"
sapply(attitude, class)
##     rating complaints privileges   learning     raises   critical    advance 
##  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"
sum(is.na(attitude)) 
## [1] 0
summary(attitude)
##      rating        complaints     privileges       learning         raises     
##  Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00   Min.   :43.00  
##  1st Qu.:58.75   1st Qu.:58.5   1st Qu.:45.00   1st Qu.:47.00   1st Qu.:58.25  
##  Median :65.50   Median :65.0   Median :51.50   Median :56.50   Median :63.50  
##  Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37   Mean   :64.63  
##  3rd Qu.:71.75   3rd Qu.:77.0   3rd Qu.:62.50   3rd Qu.:66.75   3rd Qu.:71.00  
##  Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00   Max.   :88.00  
##     critical        advance     
##  Min.   :49.00   Min.   :25.00  
##  1st Qu.:69.25   1st Qu.:35.00  
##  Median :77.50   Median :41.00  
##  Mean   :74.77   Mean   :42.93  
##  3rd Qu.:80.00   3rd Qu.:47.75  
##  Max.   :92.00   Max.   :72.00
# print mean of each feature
mean_value <- round(colMeans(attitude), 2)
print(mean_value)
##     rating complaints privileges   learning     raises   critical    advance 
##      64.63      66.60      53.13      56.37      64.63      74.77      42.93
# max and min values of each feature
max_value <- sapply(attitude, max)
print(max_value)
##     rating complaints privileges   learning     raises   critical    advance 
##         85         90         83         75         88         92         72
min_value <- sapply(attitude, min)
print(min_value)
##     rating complaints privileges   learning     raises   critical    advance 
##         40         37         30         34         43         49         25
# range (max - min) of each feature; named value_range to avoid masking base R's diff()
value_range <- max_value - min_value
print(value_range)
##     rating complaints privileges   learning     raises   critical    advance 
##         45         53         53         41         45         43         47
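The per-feature statistics above can also be collected into a single table, which makes them easier to compare side by side. This is a minimal sketch using only base R; the name `feature_summary` is introduced here for illustration and is not part of the original analysis.

```r
# Build one summary table with one row per feature of the built-in dataset.
# `feature_summary` is a hypothetical name used only in this sketch.
feature_summary <- data.frame(
  mean = round(colMeans(attitude), 2),
  min  = sapply(attitude, min),
  max  = sapply(attitude, max)
)

# Spread between the largest and smallest value of each feature.
feature_summary$spread <- feature_summary$max - feature_summary$min

print(feature_summary)
```

Because each input is a named vector, the feature names should carry over as row names, so a single value can be looked up with, for example, `feature_summary["rating", "max"]`.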
# Using boolean filtering, print only the rows of the dataframe where rating is greater than 75
filter_df <- attitude[attitude$rating > 75, ]
print(filter_df)
##    rating complaints privileges learning raises critical advance
## 5      81         78         56       66     71       83      47
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 27     78         75         58       74     80       78      49
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39
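To make the boolean filtering step more explicit, the sketch below separates the logical mask from the indexing. It assumes only base R; `high_rating` is a name introduced here for illustration.

```r
# The comparison yields one TRUE/FALSE value per row (a logical mask).
# `high_rating` is a hypothetical name used only in this sketch.
high_rating <- attitude$rating > 75

sum(high_rating)    # number of rows that pass the filter
which(high_rating)  # positions of those rows

# Indexing with the mask keeps the TRUE rows; leaving the slot after the
# comma empty keeps every column.
attitude[high_rating, ]

# subset() is a base R alternative that selects the same rows.
subset(attitude, rating > 75)
```

Mask indexing and `subset()` behave the same here because the dataset has no missing values. With `NA` values in `rating`, mask indexing would return extra all-`NA` rows, whereas `subset()` drops them.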
# Linear regression model (lm): rating is the dependent variable, predicted by all other features.
# The most significant predictor of rating is complaints (p-value: 0.000903). Holding the other
# predictors constant, each one-unit increase in the complaints score is associated with an
# expected increase of approximately 0.61 in the overall rating.
# No other predictor is significant at the 0.05 level; learning comes closest (p-value: 0.0699).
# Additionally, rows 6, 12, and 23 appear to be outliers based on the diagnostic plots.

m1 <- lm(rating ~ ., data = attitude)
summary(m1)
## 
## Call:
## lm(formula = rating ~ ., data = attitude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.9418  -4.3555   0.3158   5.5425  11.5990 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.78708   11.58926   0.931 0.361634    
## complaints   0.61319    0.16098   3.809 0.000903 ***
## privileges  -0.07305    0.13572  -0.538 0.595594    
## learning     0.32033    0.16852   1.901 0.069925 .  
## raises       0.08173    0.22148   0.369 0.715480    
## critical     0.03838    0.14700   0.261 0.796334    
## advance     -0.21706    0.17821  -1.218 0.235577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.068 on 23 degrees of freedom
## Multiple R-squared:  0.7326, Adjusted R-squared:  0.6628 
## F-statistic:  10.5 on 6 and 23 DF,  p-value: 1.24e-05
anova(m1)
## Analysis of Variance Table
## 
## Response: rating
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## complaints  1 2927.58 2927.58 58.6026 9.056e-08 ***
## privileges  1    7.52    7.52  0.1505    0.7016    
## learning    1  137.25  137.25  2.7473    0.1110    
## raises      1    0.94    0.94  0.0189    0.8920    
## critical    1    0.56    0.56  0.0113    0.9163    
## advance     1   74.11   74.11  1.4835    0.2356    
## Residuals  23 1149.00   49.96                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
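The `anova(m1)` table above uses sequential sums of squares, so each row is tested after the terms listed before it and the result depends on the order of the predictors. As a hedged complement, the sketch below uses `drop1()` to test each predictor with all the others already in the model. The name `marginal` is introduced here for illustration.

```r
m1 <- lm(rating ~ ., data = attitude)

# Marginal F-test for each predictor: compares the full model against the
# model with that single predictor removed, so term order does not matter.
# `marginal` is a hypothetical name used only in this sketch.
marginal <- drop1(m1, test = "F")
print(marginal)

# Each predictor has one degree of freedom, so these p-values should agree
# with the t-test p-values reported by summary(m1).
all.equal(
  marginal[-1, "Pr(>F)"],
  unname(coef(summary(m1))[-1, "Pr(>|t|)"])
)
```

Read this way, the marginal tests and the coefficient table tell the same story, while the sequential table answers a different question: how much each term adds given only the terms above it.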
plot(m1)
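The diagnostic plots can be backed up with numbers. The sketch below lists the rows with the largest standardized residuals and the largest Cook's distances. It is one way to check which observations stand out, not necessarily the approach behind the comment above, and the names `std_res` and `cooks` are introduced here for illustration.

```r
m1 <- lm(rating ~ ., data = attitude)

# Standardized residuals: absolute values beyond roughly 2 are often worth
# a closer look (a rule of thumb, not a formal test).
std_res <- rstandard(m1)
head(sort(abs(std_res), decreasing = TRUE), 3)

# Cook's distance: how strongly each row influences the fitted coefficients.
cooks <- cooks.distance(m1)
head(sort(cooks, decreasing = TRUE), 3)
```

The names on the returned vectors are the row labels of `attitude`, so the output can be compared directly with the point labels on the plots.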

# Pearson correlation matrix
# complaints and rating have a strong positive relationship (correlation of 0.83)
round(cor(attitude), 2)
##            rating complaints privileges learning raises critical advance
## rating       1.00       0.83       0.43     0.62   0.59     0.16    0.16
## complaints   0.83       1.00       0.56     0.60   0.67     0.19    0.22
## privileges   0.43       0.56       1.00     0.49   0.45     0.15    0.34
## learning     0.62       0.60       0.49     1.00   0.64     0.12    0.53
## raises       0.59       0.67       0.45     0.64   1.00     0.38    0.57
## critical     0.16       0.19       0.15     0.12   0.38     1.00    0.28
## advance      0.16       0.22       0.34     0.53   0.57     0.28    1.00
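To read the relationship with rating directly off the matrix, the `rating` row can be extracted and sorted. This is a minimal sketch in base R; `rating_cor` is a name introduced here for illustration.

```r
# Correlation of every predictor with rating, strongest first.
# Column 1 is rating itself, so it is dropped from the row.
# `rating_cor` is a hypothetical name used only in this sketch.
rating_cor <- cor(attitude)["rating", -1]
round(sort(rating_cor, decreasing = TRUE), 2)
```

These are pairwise correlations, so they do not account for overlap among predictors. That is why learning and raises correlate fairly strongly with rating (0.62 and 0.59) yet are not significant in the regression once complaints is in the model.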