2020-03-13 Case Study

Overview

The ABC is the heart of development and learning excellence for the Syldavian Public Service. As such, ABC is dedicated to ensuring that public agencies and officers are future-ready by supporting initiatives that bring about change, learning and collaboration across the Public Service.

Technological disruptions over the years have led to significant strategic shifts within the Syldavian Public Service. Amidst this evolving landscape, Senior Management has asked whether ABC still meets public officers’ expectations as a central learning institution for the Public Service.

You have been asked to mine a dataset to address this query, and to generate additional insights which you think would be beneficial for the Senior Management. The dataset comprises course feedback from all participants who have attended at least one course at the College between 2016 and 2018 inclusive, and cuts across different course domains and course types.

Loading Libraries

# Loading Libraries
library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(ggpubr)

## Warning: package 'ggpubr' was built under R version 3.6.3

## Loading required package: magrittr

Loading Data

Questions 9 and 10 use different rating scales from the other questions in the Course Feedback (Question 9 is rated from 1 to 10 and Question 10 is answered Yes or No), so we will separate them from the rest.

# Loading Data
data <- read.csv('./LFG_Analyst_Interview_Data_R.csv', header=TRUE)
data8 <- subset(data, Question != "Q9" & Question != "Q10")
data9 <- subset(data, Question == "Q9")
data10 <- subset(data, Question == "Q10")
head(data8)

head(data9)

head(data10)
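The summaries that follow list response levels with a count of zero (for example "Yes" and "No" under Questions 1 to 8). This happens because subsetting a data frame keeps all the original factor levels. A minimal sketch of one way to remove them, using a small made-up data frame (`toy` and its values are illustrative only and are not taken from the dataset):

```r
# Toy stand-in for the feedback data; the values are illustrative only.
toy <- data.frame(
  Question = factor(c("Q1", "Q9", "Q10", "Q9", "Q1")),
  Response = factor(c("4", "8", "Yes", "9", "5"))
)

# Subsetting keeps every original factor level, even those with no rows left.
toy9 <- subset(toy, Question == "Q9")
levels(toy9$Response)

# droplevels() removes the unused levels from all factor columns.
toy9_clean <- droplevels(toy9)
levels(toy9_clean$Response)
table(toy9_clean$Response)
```

Applying `droplevels()` to each subset would be an alternative to dropping empty rows by position; the analysis below does not do this.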

Summary of Data

# Summary of Data
summary(data8, digits = 1)

##        ID             Year           Type       
##  Min.   :    1   Min.   :2016   Inhouse:436068  
##  1st Qu.:23443   1st Qu.:2016   Public :407871  
##  Median :46886   Median :2017                   
##  Mean   :46886   Mean   :2017                   
##  3rd Qu.:70329   3rd Qu.:2017                   
##  Max.   :93771   Max.   :2018                   
##                                                 
##                      Domain          Question         Response     
##  Data Analytics         :  5868   Q1     : 93771   4      :453234  
##  Effective Communication:581931   Q11    : 93771   5      :240603  
##  Governance             :   477   Q2     : 93771   3      :140234  
##  Personal Development   :243936   Q3     : 93771   2      :  9638  
##  Service Excellence     : 11727   Q4     : 93771   1      :   230  
##                                   Q5     : 93771   10     :     0  
##                                   (Other):281313   (Other):     0

summary(data9, digits = 1)

##        ID             Year           Type                           Domain     
##  Min.   :    1   Min.   :2016   Inhouse:48452   Data Analytics         :  652  
##  1st Qu.:23444   1st Qu.:2016   Public :45319   Effective Communication:64659  
##  Median :46886   Median :2017                   Governance             :   53  
##  Mean   :46886   Mean   :2017                   Personal Development   :27104  
##  3rd Qu.:70329   3rd Qu.:2017                   Service Excellence     : 1303  
##  Max.   :93771   Max.   :2018                                                  
##                                                                                
##     Question        Response    
##  Q9     :93771   9      :32447  
##  Q1     :    0   8      :27345  
##  Q10    :    0   10     :21305  
##  Q11    :    0   7      :10709  
##  Q2     :    0   6      : 1821  
##  Q3     :    0   5      :  141  
##  (Other):    0   (Other):    3

summary(data10, digits = 1)

##        ID             Year           Type                           Domain     
##  Min.   :    1   Min.   :2016   Inhouse:48452   Data Analytics         :  652  
##  1st Qu.:23444   1st Qu.:2016   Public :45319   Effective Communication:64659  
##  Median :46886   Median :2017                   Governance             :   53  
##  Mean   :46886   Mean   :2017                   Personal Development   :27104  
##  3rd Qu.:70329   3rd Qu.:2017                   Service Excellence     : 1303  
##  Max.   :93771   Max.   :2018                                                  
##                                                                                
##     Question        Response    
##  Q10    :93771   Yes    :83650  
##  Q1     :    0   No     :10121  
##  Q11    :    0   1      :    0  
##  Q2     :    0   10     :    0  
##  Q3     :    0   2      :    0  
##  Q4     :    0   3      :    0  
##  (Other):    0   (Other):    0

Setting Up Contingency Tables

Visualisation of Simple Contingency Table for Question 9

# Setting Up a Data Frame for Response to Question 9
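Q9Response <- as.data.frame(table(data9$Response))
# Levels sort as text ("1", "10", "2", ..., "9", "No", "Yes"), so rows 11 and 12
# are the unused "No" and "Yes" levels; drop them
Q9Response <- Q9Response[-c(11, 12), ]
# Move "10" (row 2) to the end so that the ratings run from 1 to 10
Q9Response <- Q9Response[c(1, 3, 4, 5, 6, 7, 8, 9, 10, 2), ]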
print(Q9Response)

##    Var1  Freq
## 1     1     0
## 3     2     0
## 4     3     0
## 5     4     3
## 6     5   141
## 7     6  1821
## 8     7 10709
## 9     8 27345
## 10    9 32447
## 2    10 21305

# Visualisation of Simple Contingency Table for Question 9
ggbarplot(Q9Response, x = "Var1", y = "Freq",
          title = "How likely are you to recommend this course to your colleagues?",
          xlab = "Response", ylab = "Frequency", order = c(1:10))

The Bar Plot above shows that the majority of responses gave a rating of 8 to 10. This suggests that most participants are likely to recommend their course to their colleagues.

Computation of Relative Frequencies and Percentages for Question 9

# Computation of Relative Frequencies for Question 9
Q9_prop <- Q9Response$Freq/sum(Q9Response$Freq)
print(Q9_prop)

##  [1] 0.000000e+00 0.000000e+00 0.000000e+00 3.199283e-05 1.503663e-03
##  [6] 1.941965e-02 1.142038e-01 2.916147e-01 3.460238e-01 2.272024e-01

# Computation of Relative Percentages for Question 9
Q9_percent <- round(Q9_prop, 2)*100
print(Q9_percent)

##  [1]  0  0  0  0  0  2 11 29 35 23

For Question 9, ratings of 8 to 10 make up the majority of responses (about 86%). This suggests that most participants are likely to recommend their course to their colleagues.
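Rounded percentages do not always add up to the rounded total, so a combined share is best computed from the counts. A short sketch using the Question 9 frequencies printed earlier (the vector `q9_counts` is typed in by hand for illustration and is not an object from the analysis):

```r
# Counts for ratings 1 to 10, transcribed from the Q9 table printed above.
q9_counts <- c(0, 0, 0, 3, 141, 1821, 10709, 27345, 32447, 21305)
names(q9_counts) <- 1:10

# Share of responses rated 8, 9 or 10, computed before any rounding.
top3_share <- sum(q9_counts[c("8", "9", "10")]) / sum(q9_counts)
round(top3_share * 100, 1)

# Summing the already-rounded percentages gives a slightly different figure.
rounded_sum <- sum(round(q9_counts / sum(q9_counts), 2)[c("8", "9", "10")] * 100)
rounded_sum
```

The first figure is about 86.5, while the sum of the rounded percentages is 87.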

Visualisation of Simple Contingency Table for Question 10

# Setting Up a Data Frame for Response to Question 10
Q10Response <- as.data.frame(table(data10$Response))
# Rows 1 to 10 are the unused numeric levels; keep only "No" and "Yes"
Q10Response <- Q10Response[-c(1:10), ]
print(Q10Response)

##    Var1  Freq
## 11   No 10121
## 12  Yes 83650

# Visualisation of Simple Contingency Table for Question 10
ggbarplot(Q10Response, x = "Var1", y = "Freq",
          title = "Would you like to be updated on ABC's events and courses?",
          xlab = "Response", ylab = "Frequency")

The Bar Plot above shows that the majority of responses were “Yes”. This suggests that most participants would like to be updated on ABC’s events and courses.

Computation of Relative Frequencies and Percentages for Question 10

# Computation of Relative Frequencies for Question 10
Q10_prop <- Q10Response$Freq/sum(Q10Response$Freq)
print(Q10_prop)

## [1] 0.1079332 0.8920668

# Computation of Relative Percentages for Question 10
Q10_percent <- round(Q10_prop, 2)*100
print(Q10_percent)

## [1] 11 89

For Question 10, “Yes” makes up the majority of responses (89%). This suggests that most participants would like to be updated on ABC’s events and courses.

Setting Up Two-Way Contingency Tables for Questions 1 to 8 and 11

# Setting Up Variables
ID <- data8$ID
Year <- data8$Year
Type <- data8$Type
Domain <- data8$Domain
Question <- data8$Question
Response <- data8$Response

# Setting Up Data Frames
ID_df <- as.data.frame(table(ID, Response))
ID_df <- subset(ID_df, Response == "1" | Response == "2" | Response == "3" |
                  Response == "4" | Response == "5")
Year_df <- as.data.frame(table(Year, Response))
Year_df <- subset(Year_df, Response == "1" | Response == "2" | Response == "3" |
                    Response == "4" | Response == "5")
Type_df <- as.data.frame(table(Type, Response))
Type_df <- subset(Type_df, Response == "1" | Response == "2" | Response == "3" |
                    Response == "4" | Response == "5")
Domain_df <- as.data.frame(table(Domain, Response))
Domain_df <- subset(Domain_df, Response == "1" | Response == "2" | Response == "3" |
                      Response == "4" | Response == "5")
Question_df <- as.data.frame(table(Question, Response))
Question_df <- subset(Question_df, Response == "1" | Response == "2" | Response == "3" |
                        Response == "4" | Response == "5")
Question_df <- subset(Question_df, Question != "Q9" & Question != "Q10")

# Summary of Data Frames
summary(ID_df)

##        ID            Response          Freq    
##  1      :     5   1      :93771   Min.   :0.0  
##  2      :     5   2      :93771   1st Qu.:0.0  
##  3      :     5   3      :93771   Median :0.0  
##  4      :     5   4      :93771   Mean   :1.8  
##  5      :     5   5      :93771   3rd Qu.:3.0  
##  6      :     5   10     :    0   Max.   :9.0  
##  (Other):468825   (Other):    0

summary(Year_df)

##    Year      Response      Freq       
##  2016:5   1      :3   Min.   :    47  
##  2017:5   2      :3   1st Qu.:  2466  
##  2018:5   3      :3   Median : 35914  
##           4      :3   Mean   : 56263  
##           5      :3   3rd Qu.: 88418  
##           10     :0   Max.   :225313  
##           (Other):0

summary(Type_df)

##       Type      Response      Freq       
##  Inhouse:5   1      :2   Min.   :    98  
##  Public :5   2      :2   1st Qu.:  4593  
##              3      :2   Median : 70117  
##              4      :2   Mean   : 84394  
##              5      :2   3rd Qu.:125447  
##              10     :0   Max.   :232515  
##              (Other):0

summary(Domain_df)

##                      Domain     Response      Freq       
##  Data Analytics         :5   1      :5   Min.   :     0  
##  Effective Communication:5   2      :5   1st Qu.:    61  
##  Governance             :5   3      :5   Median :  1450  
##  Personal Development   :5   4      :5   Mean   : 33758  
##  Service Excellence     :5   5      :5   3rd Qu.:  6793  
##                              10     :0   Max.   :312359  
##                              (Other):0

summary(Question_df)

##     Question     Response      Freq      
##  Q1     : 5   1      :9   Min.   :    0  
##  Q11    : 5   2      :9   1st Qu.:  422  
##  Q2     : 5   3      :9   Median :14601  
##  Q3     : 5   4      :9   Mean   :18754  
##  Q4     : 5   5      :9   3rd Qu.:32749  
##  Q5     : 5   10     :0   Max.   :74840  
##  (Other):15   (Other):0
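The five data frames above are built with the same two steps: cross-tabulate against Response, then keep the ratings 1 to 5. A sketch of how that pattern could be wrapped in a function follows. The helper `make_freq_df()` and the toy vectors are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: cross-tabulate a grouping variable against Response
# and keep only the rows for the 5-point ratings.
likert_levels <- c("1", "2", "3", "4", "5")

make_freq_df <- function(group, response, group_name) {
  df <- as.data.frame(table(group, response))
  names(df) <- c(group_name, "Response", "Freq")
  df[df$Response %in% likert_levels, ]
}

# Toy vectors standing in for data8$Year and data8$Response.
toy_year <- c(2016, 2016, 2017, 2018, 2018)
toy_resp <- factor(c("4", "5", "3", "4", "Yes"),
                   levels = c("1", "2", "3", "4", "5", "Yes"))

toy_df <- make_freq_df(toy_year, toy_resp, "Year")
toy_df
```

With the real data, a call such as `make_freq_df(Year, Response, "Year")` would be expected to give the same rows as `Year_df`, assuming Response is a factor as in the summaries above.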

Visualisations of Two-Way Contingency Tables for Questions 1 to 8 and 11

We will skip the visualisation of the two-way contingency table for ID against Response as there are too many unique values for ID.

# Visualisation of Year against Response
ggbarplot(Year_df, x = "Year", y = "Freq", title = "General Response over the Years",
          color = "black", fill = "Response", xlab = "Year", ylab = "Frequency",
          position = position_dodge())

From the Bar Plot shown above, the general response in each year is a rating of 4 to 5. This suggests that the majority of participants agree or strongly agree with the objectives set out for their courses, over the years.

# Visualisation of Type against Response
ggbarplot(Type_df, x = "Type", y = "Freq",
          title = "General Response based on Type of Course",
          color = "black", fill = "Response", xlab = "Type of Course",
          ylab = "Frequency", position = position_dodge())

From the Bar Plot shown above, the general response for both types of course is a rating of 4 to 5. The two distributions look similar, which suggests that participants rate inhouse and public courses about equally and show no particular preference for either.

# Visualisation of Domain against Response
ggbarplot(Domain_df, x = "Domain", y = "Freq",
          title = "General Response based on Course Domain",
          color = "black", fill = "Response", xlab = "Course Domain",
          ylab = "Frequency", position = position_dodge()) +
  scale_x_discrete(limits = c("Data Analytics", "Effective Communication", "Governance",
                              "Personal Development", "Service Excellence"),
                   labels = c("Data\nAnalytics", "Effective\nCommunication", "Governance",
                              "Personal\nDevelopment", "Service\nExcellence"))

From the Bar Plot shown above, the general response based on course domain is a rating of 4 to 5, particularly in the areas of Effective Communication and Personal Development. As the plot shows absolute frequencies, this suggests that the majority of participants have taken a course in either Effective Communication or Personal Development. They may have a preference for these courses from ABC.

# Visualisation of Question against Response
ggbarplot(Question_df, x = "Question", y = "Freq",
          title = "General Response based on Questions in Course Feedback",
          color = "black", fill = "Response",
          xlab = "Questions in Course Feedback", ylab = "Frequency",
          order = c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q11"),
          position = position_dodge())

From the Bar Plot shown above, the general response is a rating of 4 to 5 for all the questions, excluding Questions 9 and 10. This suggests that the majority of participants agree or strongly agree with the objectives set out for their courses.

Computation of Relative Frequencies and Percentages for Questions 1 to 8 and 11

# Computation of Relative Frequencies for Year against Response
Year_prop <- table(Year, Response)
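# Response levels sort as text ("1", "10", "2", ..., "9", "No", "Yes");
# drop columns "10", "6" to "9", "No" and "Yes" to keep ratings 1 to 5
Year_prop <- Year_prop[, -c(2, 7, 8, 9, 10, 11, 12)]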
Year_prop <- prop.table(Year_prop, 1)
print(Year_prop)

##       Response
## Year              1            2            3            4            5
##   2016 0.0001911362 0.0091704691 0.1460524282 0.5261693873 0.3184165792
##   2017 0.0002519526 0.0112898775 0.1674837131 0.5406495579 0.2803248989
##   2018 0.0004302356 0.0147714235 0.1904178802 0.5435144736 0.2508659871

# Computation of Relative Percentages for Year against Response
Year_percent <- round(Year_prop, 2)*100
print(Year_percent)

##       Response
## Year    1  2  3  4  5
##   2016  0  1 15 53 32
##   2017  0  1 17 54 28
##   2018  0  1 19 54 25

From 2016 to 2018, a rating of 4 to 5 is the general response in every year (84%, 82% and 79% respectively, computed from the unrounded proportions). This suggests that the majority of participants agree or strongly agree with the objectives set out for their courses, although the share has declined slightly each year.
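The combined share of ratings 4 and 5 can be read off the unrounded proportions with `rowSums()`. A small sketch, with the matrix `year_prop` typed in from the printed output for illustration:

```r
# Row proportions for ratings 1 to 5, transcribed from the output above.
year_prop <- matrix(
  c(0.0001911362, 0.0091704691, 0.1460524282, 0.5261693873, 0.3184165792,
    0.0002519526, 0.0112898775, 0.1674837131, 0.5406495579, 0.2803248989,
    0.0004302356, 0.0147714235, 0.1904178802, 0.5435144736, 0.2508659871),
  nrow = 3, byrow = TRUE,
  dimnames = list(Year = c("2016", "2017", "2018"),
                  Response = as.character(1:5))
)

# Combined share of ratings 4 and 5 per year, rounded only at the end.
top2_by_year <- rowSums(year_prop[, c("4", "5")])
round(top2_by_year * 100, 1)
```

In the analysis itself, the same call could be applied to `Year_prop` directly, which avoids adding percentages that have already been rounded.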

# Computation of Relative Frequencies for Type against Response
Type_prop <- table(Type, Response)
Type_prop <- Type_prop[, -c(2, 7, 8, 9, 10, 11, 12)]
Type_prop <- prop.table(Type_prop, 1)
print(Type_prop)

##          Response
## Type                 1            2            3            4            5
##   Inhouse 0.0002247356 0.0100144932 0.1570741261 0.5332081235 0.2994785217
##   Public  0.0003236317 0.0129232037 0.1758864935 0.5411490398 0.2697176313

# Computation of Relative Percentages for Type against Response
Type_percent <- round(Type_prop, 2)*100
print(Type_percent)

##          Response
## Type       1  2  3  4  5
##   Inhouse  0  1 16 53 30
##   Public   0  1 18 54 27

For both types of course, a rating of 4 to 5 is the general response (83% for inhouse courses and 81% for public courses). As the two shares are close, this suggests that participants rate inhouse and public courses about equally and show no particular preference for either.

# Computation of Relative Frequencies for Domain against Response
Domain_prop <- table(Domain, Response)
Domain_prop <- Domain_prop[, -c(2, 7, 8, 9, 10, 11, 12)]
Domain_prop <- prop.table(Domain_prop, 1)
print(Domain_prop)

##                          Response
## Domain                               1            2            3            4
##   Data Analytics          0.0000000000 0.0103953647 0.2471029312 0.4645535106
##   Effective Communication 0.0001374733 0.0086814416 0.1400939287 0.5367629496
##   Governance              0.0230607966 0.4633123690 0.4675052411 0.0461215933
##   Personal Development    0.0004550374 0.0174021055 0.2327823691 0.5474058770
##   Service Excellence      0.0023876524 0.0050311248 0.0214888718 0.3918308178
##                          Response
## Domain                               5
##   Data Analytics          0.2779481936
##   Effective Communication 0.3143242068
##   Governance              0.0000000000
##   Personal Development    0.2019546110
##   Service Excellence      0.5792615332

# Computation of Relative Percentages for Domain against Response
Domain_percent <- round(Domain_prop, 2)*100
print(Domain_percent)

##                          Response
## Domain                     1  2  3  4  5
##   Data Analytics           0  1 25 46 28
##   Effective Communication  0  1 14 54 31
##   Governance               2 46 47  5  0
##   Personal Development     0  2 23 55 20
##   Service Excellence       0  1  2 39 58

For the course domain, a rating of 4 to 5 is the general response in Data Analytics (74%), Effective Communication (85%), Personal Development (75%), and Service Excellence (97%). Governance is the exception: only 5% of its responses are a 4 or 5, although this domain accounts for very few responses (477 out of 843,939). As in the Bar Plot for course domain, absolute frequencies are highest in Effective Communication and Personal Development. This suggests that the majority of participants have taken a course in these areas. They may have a preference for these courses from ABC.
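To put the domain percentages in context, it helps to see how the feedback records are spread across domains. A sketch using the per-domain counts from the summary of `data9`, typed in by hand for illustration (treating each Question 9 response as one feedback record is an assumption):

```r
# Records per domain, transcribed from the summary of data9 above
# (assumes one Q9 response per feedback record).
domain_n <- c("Data Analytics" = 652, "Effective Communication" = 64659,
              "Governance" = 53, "Personal Development" = 27104,
              "Service Excellence" = 1303)

# Share of all records in each domain, as a percentage.
domain_share <- domain_n / sum(domain_n)
round(domain_share * 100, 1)

# Combined share of the two largest domains.
top_two <- sum(domain_share[c("Effective Communication", "Personal Development")])
round(top_two * 100, 1)
```

Effective Communication and Personal Development together cover roughly 98% of the records, so percentages for the smaller domains, Governance in particular, rest on far fewer responses.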

# Computation of Relative Frequencies for Question against Response
Question_prop <- table(Question, Response)
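# Keep only the columns for ratings 1 to 5
Question_prop <- Question_prop[, -c(2, 7, 8, 9, 10, 11, 12)]
# Question levels sort as text ("Q1", "Q10", "Q11", "Q2", ..., "Q9");
# drop the empty rows for Q10 (row 2) and Q9 (row 11)
Question_prop <- Question_prop[-c(2, 11), ]
# Move Q11 (now row 2) to the end so that the questions appear in order
Question_prop <- Question_prop[c(1, 3, 4, 5, 6, 7, 8, 9, 2), ]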
Question_prop <- prop.table(Question_prop, 1)
print(Question_prop)

##         Response
## Question            1            2            3            4            5
##      Q1  4.265711e-05 9.053972e-03 1.765045e-01 5.267087e-01 2.876902e-01
##      Q2  1.397020e-03 3.377377e-02 2.111527e-01 4.249821e-01 3.286944e-01
##      Q3  6.291924e-04 2.220303e-02 2.075908e-01 4.659330e-01 3.036440e-01
##      Q4  7.464995e-05 9.544529e-03 1.498118e-01 4.809163e-01 3.596528e-01
##      Q5  1.599642e-04 1.033369e-02 1.557091e-01 4.845528e-01 3.492444e-01
##      Q6  1.386356e-04 9.203272e-03 1.465378e-01 4.789007e-01 3.652195e-01
##      Q7  0.000000e+00 4.084418e-03 1.679304e-01 5.979461e-01 2.300391e-01
##      Q8  1.066428e-05 4.500325e-03 1.610626e-01 5.753591e-01 2.590673e-01
##      Q11 0.000000e+00 8.531422e-05 1.191946e-01 7.981146e-01 8.260550e-02

# Computation of Relative Percentages for Question against Response
Question_percent <- round(Question_prop, 2)*100
print(Question_percent)

##         Response
## Question  1  2  3  4  5
##      Q1   0  1 18 53 29
##      Q2   0  3 21 42 33
##      Q3   0  2 21 47 30
##      Q4   0  1 15 48 36
##      Q5   0  1 16 48 35
##      Q6   0  1 15 48 37
##      Q7   0  0 17 60 23
##      Q8   0  0 16 58 26
##      Q11  0  0 12 80  8

For the questions in course feedback, a rating of 4 to 5 is the general response, excluding Questions 9 and 10. The combined shares of ratings 4 and 5, computed from the unrounded proportions, are 81% for Question 1, 75% for Question 2, 77% for Question 3, 84% for Question 4, 83% for Question 5, 84% for Question 6, 83% for Question 7, 83% for Question 8, and 88% for Question 11. This suggests that the majority of participants agree or strongly agree with the objectives set out for their courses.

The ABC is said to meet public service officers’ expectations as a central learning institution for the Public Service if at least 75% of all public service officers answer "agree" or "strongly agree" to the following question: Does ABC meet your expectations as a central learning institution for the public service? Otherwise, the College is said to not meet public service officers’ expectations.

Essentially, this question is reflected in Question 11. Of all participants who attended at least one course at the College between 2016 and 2018 inclusive, across different course domains and course types, 88% gave a rating of 4 or 5. As this exceeds the 75% threshold, it can be said that the ABC has met the expectations of public service officers as a central learning institution for the Public Service.
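The comparison against the 75% threshold could also be checked formally. The sketch below is one possible approach and is not part of the original analysis: it applies a one-sided test of proportions with `prop.test()` from base R. The Question 11 counts are reconstructed from the printed proportions, so they are approximate, and the test assumes independent responses.

```r
# Q11 counts reconstructed from the printed proportions (an approximation):
# 93771 responses, of which about 74840 rated 4 and about 7746 rated 5.
n_q11 <- 93771
agree_q11 <- 74840 + 7746

# Observed share of "agree" or "strongly agree".
share_q11 <- agree_q11 / n_q11
round(share_q11 * 100, 1)

# One-sided test of H0: share <= 0.75 against H1: share > 0.75.
# Assumes responses are independent, which may not hold for repeat attendees.
res <- prop.test(agree_q11, n_q11, p = 0.75, alternative = "greater")
res$p.value
res$conf.int
```

With a sample this large, the observed share of about 88% lies well above 75%, so the test would be expected to agree with the conclusion drawn above.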