Survey analysis - General Study

Artem Larionov

September 4, 2016

Abstract

The purpose of this article is to show how simple visualization techniques can help us better understand trends in data gathered from employee surveys and identify directions for further analysis. Typically, such surveys give respondents a few choices, and they answer a question simply by clicking the relevant box. In contrast, the open-ended surveys on which this article is based ask respondents to answer questions in their own words. Here, I apply a few text analysis techniques to interpret the open-ended answers.

Introduction

In this article, I focus specifically on analyzing word frequencies and sentiments to understand tendencies in open-ended responses. I use R's base plotting system to visualize word frequencies and the “syuzhet” package to extract sentiments. The implementation of the analysis itself is not the main topic here, but if you are interested in it, you can find the source code at this link.

Word clouds

Word clouds (also known as tag clouds) are a visual representation of text data, used mainly to show keyword metadata (tags) on websites or to visualize free-form text. They depict word frequency in a given text as a weighted list: the more often a word occurs, the larger it is drawn. More information on word clouds can be found here. Below you can see word clouds for our questions and answers:
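To make the “weighted list” idea concrete, here is a minimal Python sketch (the original analysis was done in R) that maps each word to a font size growing linearly with its frequency. The function name, the size range and the linear scaling are assumptions for illustration; word cloud tools typically also drop stop words and may scale sizes differently.

```python
import re
from collections import Counter

def word_sizes(texts, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency."""
    words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) / span
            for w, c in counts.items()}

# Hypothetical answers, for illustration only
sizes = word_sizes(["good team", "good office", "quiet office"])
print(sizes)
```

Here “good” and “office” occur twice and get the largest size, while “team” and “quiet” get the smallest.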

As expected, the questions focus on “what” and “how” with respect to “company” and “work”. The answers look fairly positive, with a large “good” in the middle. The largest “know” is probably related to the “don’t” right above it, so it might be useful to find those answers and questions to understand whether employees don’t know something important or simply haven’t become familiar with the survey tool yet.
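Because a word cloud shows single words, the link between “don’t” and “know” is only a guess. One way to check it, sketched here in Python, is to count adjacent word pairs (bigrams) and then pull out the answers containing the phrase. The function name and the sample answers are made up for illustration.

```python
import re
from collections import Counter

def bigram_counts(texts):
    """Count adjacent word pairs so that phrases such as "don't know" stay together."""
    counts = Counter()
    for text in texts:
        # Normalize curly apostrophes so "don’t" and "don't" become the same token
        tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical answers, for illustration only
answers = ["I don\u2019t know", "don't know yet", "I know the process"]
bigrams = bigram_counts(answers)
print(bigrams.most_common(1))  # the most frequent word pair

# Answers containing the phrase, for a closer manual reading
dont_know = [a for a in answers if "don't know" in a.lower().replace("\u2019", "'")]
print(dont_know)
```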

Word frequency

Word frequency can be useful for certain types of questions because it summarizes all answers in terms of their most common words. First, all answers are split into separate words. Second, we count how often each word occurs. Third, we plot a bar chart showing the frequency of every word in the text, sorted from most to least frequent. This makes the most frequently used words, and the trends behind them, visible even across a very large number of answers. However, the method ignores the context of each word; for example, it cannot distinguish “very good” from “not very good”.
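The three steps above can be sketched in a few lines of Python. This is an illustration of the technique, not the author's R code: the function names and sample answers are hypothetical, and a plain-text bar chart stands in for the R plot.

```python
import re
from collections import Counter

def word_frequencies(answers):
    """Split answers into words, count each word, sort from most to least frequent."""
    words = []
    for answer in answers:
        # Step 1: split each answer into lowercase words (apostrophes kept, e.g. "don't")
        words.extend(re.findall(r"[a-z']+", answer.lower().replace("\u2019", "'")))
    # Step 2: count each word; most_common() also sorts by descending frequency
    return Counter(words).most_common()

def print_bar_chart(freqs, top=10):
    """Step 3: a plain-text stand-in for a bar plot of the top words."""
    for word, n in freqs[:top]:
        print(f"{word:>10} {'#' * n} {n}")

# Hypothetical answers, for illustration only
answers = ["Good team", "good office, good coffee", "not very good"]
freqs = word_frequencies(answers)
print_bar_chart(freqs)
```

Note that the “good” in “not very good” is counted like any other occurrence, which is exactly the loss of context described above.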

For instance, in the graph above the word “disturb” actually carries a positive meaning because it was used with a negation, as the two answers containing it show. It is always worth checking what lies behind the numbers.

## [1] "It does not disturb the working process"                             
## [2] "quiet. thats awesome. I have nothing that could disturb me from work"
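Output like the above can be produced by looking up every answer that contains a given word. A minimal Python sketch of such a lookup follows; the function name is hypothetical and the third answer is made up. Matching on whole words means that “disturbing” would not be returned for “disturb”, which may or may not be what you want.

```python
import re

def answers_containing(answers, word):
    """Return every answer in which `word` occurs as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [a for a in answers if pattern.search(a)]

answers = [
    "It does not disturb the working process",
    "quiet. thats awesome. I have nothing that could disturb me from work",
    "Nice colleagues",  # made-up answer that should not match
]
matches = answers_containing(answers, "disturb")
for i, answer in enumerate(matches, start=1):
    print(f"[{i}] {answer}")
```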

Sentiments

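According to Plutchik’s theory, there are eight basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. The “syuzhet” package provides functionality to check whether a word or sentence is associated with any of these emotions, and whether it is positive or negative. The output below shows this for a few sample words, where 1 means the word is associated with the emotion or polarity and 0 means it is not.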

##          anger anticipation disgust fear joy sadness surprise trust negative positive
## good         0            1       0    0   1       0        1     1        0        1
## friendly     0            1       0    0   1       0        0     1        0        1
## work         0            0       0    0   0       0        0     0        0        0
## just         0            0       0    0   0       0        0     0        0        0
## fine         0            0       0    0   0       0        0     0        0        0
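Conceptually, this kind of output comes from a word-to-emotion lexicon lookup (syuzhet’s `get_nrc_sentiment()` relies on the NRC Emotion Lexicon). The Python sketch below mimics the idea with a tiny hand-made lexicon containing only the two non-zero rows of the table above; it is not syuzhet’s implementation, the function name is hypothetical, and a real analysis would load the full lexicon.

```python
import re

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "sadness", "surprise", "trust", "negative", "positive"]

# Hand-made excerpt matching the non-zero rows of the table above.
# Words missing from the lexicon (e.g. "work", "just", "fine") score 0 everywhere.
LEXICON = {
    "good": {"anticipation", "joy", "surprise", "trust", "positive"},
    "friendly": {"anticipation", "joy", "trust", "positive"},
}

def emotion_scores(text):
    """Count, for each emotion and polarity, how many words of `text` carry it."""
    scores = dict.fromkeys(EMOTIONS, 0)
    for word in re.findall(r"[a-z']+", text.lower()):
        for emotion in LEXICON.get(word, ()):
            scores[emotion] += 1
    return scores

scores = emotion_scores("Good and friendly team, just fine")
print(scores)
```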

See the examples of sentiment graphs below.

To show which words in the answers are associated with each emotion, we can combine word frequencies with sentiments.
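A minimal Python sketch of this combination: count the words across all answers, then, for each emotion, collect the lexicon words that carry it together with their counts. The two-word lexicon reuses the non-zero rows of the emotion table above, and the function name and answers are hypothetical; a real analysis would use a full emotion lexicon.

```python
import re
from collections import Counter, defaultdict

# Hand-made excerpt matching the non-zero rows of the emotion table above
LEXICON = {
    "good": {"anticipation", "joy", "surprise", "trust", "positive"},
    "friendly": {"anticipation", "joy", "trust", "positive"},
}

def words_by_emotion(answers):
    """For each emotion, list the answer words carrying it with their frequencies."""
    counts = Counter(w for a in answers for w in re.findall(r"[a-z']+", a.lower()))
    result = defaultdict(Counter)
    for word, n in counts.items():
        for emotion in LEXICON.get(word, ()):
            result[emotion][word] += n
    return result

# Hypothetical answers, for illustration only
answers = ["Good team", "friendly and good people", "work is fine"]
by_emotion = words_by_emotion(answers)
for emotion, words in sorted(by_emotion.items()):
    print(emotion, words.most_common())
```

Each emotion's counter can then be drawn as its own small bar chart.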

A sentiment graph shows the overall trend across a large amount of text, but it also ignores context, so it cannot detect irony, sarcasm or negation.
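One common, admittedly crude, workaround for negation is to flip a word’s polarity when a negating word occurs shortly before it. The Python sketch below does this with a hypothetical polarity list, a hypothetical set of negators and a three-word window; it is not part of syuzhet, and it still cannot recognize irony or sarcasm.

```python
import re

POLARITY = {"good": 1, "friendly": 1, "disturb": -1}  # hypothetical word scores
NEGATORS = {"not", "no", "never", "nothing", "don't", "doesn't"}  # hypothetical list

def net_sentiment(text, window=3):
    """Sum word polarities, flipping a word's sign if a negator precedes it closely."""
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    score = 0
    for i, token in enumerate(tokens):
        value = POLARITY.get(token, 0)
        # Look back at most `window` tokens for a negating word
        if value and NEGATORS.intersection(tokens[max(0, i - window):i]):
            value = -value
        score += value
    return score

print(net_sentiment("It does not disturb the working process"))
print(net_sentiment("not very good"))
```

With this rule, “not disturb” counts as positive and “not very good” as negative, matching the examples discussed earlier.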

Sentiments and time series

Although the answers seem mostly positive, their sentiment may change over time (see the graphs below).
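To build such a time series, each answer needs a date and a sentiment score (for example, its positive count minus its negative count). A minimal Python sketch that averages the scores per calendar month follows; the function name, dates and scores are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_sentiment(scored_answers):
    """Average (date, score) pairs per calendar month, ordered chronologically."""
    by_month = defaultdict(list)
    for day, score in scored_answers:
        by_month[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(by_month.items())}

# Hypothetical (date, score) pairs, for illustration only
scored = [(date(2016, 6, 3), 2), (date(2016, 6, 20), 0),
          (date(2016, 7, 1), -1), (date(2016, 7, 15), 1)]
result = monthly_sentiment(scored)
print(result)
```

Plotting these monthly averages as a line shows whether the mood drifts over time.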

Conclusion

These techniques help reveal common trends and can serve as a summary or overview of a large amount of text, but their results still require a detailed analysis of the context.