1. Overview

This makeover visualises media consumption patterns across the different generations of Singapore residents, using a survey conducted in 2018 with 2300 Singapore Residents and Work Permit holders (note: the survey data source is confidential). The scope of this visualisation is limited to the media consumption of Singapore residents (i.e. Singapore citizens and PRs); in total, there are 1047 Singapore resident respondents in the survey.

1.1. Purpose of Visualisation

It would be interesting to compare media consumption across different generations. The respondents' years of birth range from 1941 to 1993, which means the survey covers four generations, namely the Silent Generation (or Greatest Generation), Baby Boomers, Generation X and Millennials, as seen in the image below.

Image Credit: Nielsen Total Audience Report Q1 2017

The survey contains demographic information along with details on the usage of different types of media, which can be grouped into the media groups shown in the table below.

No.	Media Group	List of Media
1	TV	Free-to-air Channels, Cable TV Channels
2	Radio	Local Radio Channels, Foreign Radio Channels
3	Newspapers & Magazines	Local Newspapers (Printed), Local Newspapers / News Brands (Digital – websites/apps), Foreign Newspapers (Print), Foreign Newspaper / News Brands (Digital – websites/apps), Locally Published Magazine (Print), Foreign Published Magazine (Print), Other Local/Foreign Publishers (Digital magazines)
4	Online Video & Music	Online Video Streaming Websites/Apps, Online Music Websites/Apps
5	Social Media	Social Media
6	Messaging Apps	Messaging Apps

One of the questions that respondents had to answer was

“Of the time you spent using these platforms, what percent of your time in the
past 4 weeks did you spend accessing each of them?”

The survey respondents had to allocate a percentage of their time over the past 4 weeks to each of the 15 media types (listed in the last column of the table above), and the percentages had to add up to 100%.

With the advent of the internet and the increase in ownership of smartphones and smart devices, there is a trend towards digital marketing and advertising. Many advertising and marketing firms, as well as government entities, want to know how consumers are consuming new media and social media so that they can run more effective targeted advertising campaigns.

Hence, for the purpose of this visualisation, we will only look at New Media (i.e. Newspapers & Magazines (Digital), Online Video & Music and Messaging Apps) and Social Media.

We will conduct a one-way Analysis of Variance (ANOVA) test to determine whether there are statistically significant differences between the means (i.e. percentage of time spent on new and social media in the past 4 weeks) across the different generation groups.

1.2. Sketch of Proposed DataViz Design

2. Suggestions

Use plotly to add interactivity to the ggstatsplot ANOVA graph.

3. DataViz Step-by-Step

3.1. Install and Load R packages

tidyverse contains a set of essential packages for data manipulation and exploration.
ggstatsplot is an extension of the ggplot2 package for creating information-rich plots that include details from statistical tests.
plotly creates interactive web graphics from ‘ggplot2’ graphs.

Important note: ggstatsplot requires ggplot2 version 3.3.0 to work, so ensure that you have the latest version installed.

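# DT and ggthemes are only called via `::` later on, but they still need to be installed
packages <- c('tidyverse', 'ggstatsplot', 'plotly', 'DT', 'ggthemes')

for (p in packages){
  # install the package if it cannot be loaded, then attach it
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}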
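The version requirement noted above can be checked before plotting. The snippet below is a minimal sketch in base R; the helper name `meets_min_version` is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from the original post): TRUE when `pkg` is
# installed and its version is at least `min_version`.
meets_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Warn early instead of failing later inside a ggstatsplot call
if (!meets_min_version("ggplot2", "3.3.0")) {
  message("ggplot2 is missing or older than 3.3.0; consider install.packages('ggplot2')")
}
```

The check only reports the problem; whether to upgrade automatically is left to the reader.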

3.2. Load the Data

data <- read_csv("data/survey_data.csv")

3.3. Data Wrangling

3.3.1. Create Generation groups

We have a column with the year of birth of respondents and we will create a column that segments the respondents according to the table below.

Generation	Year of Birth
Millennials	1980 - 1996
Generation X	1965 - 1979
Baby Boomers	1947 - 1964
Silent Generation	1917 - 1946

millennials <- seq(1980,1996)
genX <- seq(1965,1979)
boomer <- seq(1947,1964)
silent <- seq(1917,1946)
  
data <- data %>% mutate(Generation = 
                        case_when(`Year of Birth` %in% millennials ~ "Millennials",
                                  `Year of Birth` %in% genX ~ "Generation X",
                                  `Year of Birth` %in% boomer ~ "Baby Boomers",
                                  `Year of Birth` %in% silent ~ "Silent Generation",
                                   TRUE ~ ""))
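As an alternative to the `%in%` lookups, the same segmentation could be sketched with base R's `cut()`. The birth years below are made up for illustration (they are not survey data), and the break points are taken from the table above.

```r
# Toy birth years (made up for illustration, not survey data)
birth_year <- c(1941, 1946, 1947, 1964, 1965, 1979, 1980, 1993)

# cut() uses right-closed intervals by default, e.g. (1916, 1946],
# so each upper bound in the table belongs to the older generation
generation <- cut(
  birth_year,
  breaks = c(1916, 1946, 1964, 1979, 1996),
  labels = c("Silent Generation", "Baby Boomers", "Generation X", "Millennials")
)

# Count respondents per generation to sanity-check the boundaries
table(generation)
```

Unlike the `case_when` version, years outside 1917 to 1996 become `NA` rather than an empty string, and the result is a factor ordered from oldest to youngest.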

3.3.2. Segment the media into groups

Based on the media groups table above, we will combine the responses for media belonging to the same group.

Create the grouping.

tv_col <- c("A2_1 of Residents","A2_2 of Residents")
radio_col <- c("A2_3 of Residents", "A2_4 of Residents")
newspapers_magazines_print <- c("A2_5 of Residents", "A2_7 of Residents", 
                                "A2_9 of Residents", "A2_10 of Residents")
newspapers_magazines_digital <- c("A2_6 of Residents","A2_8 of Residents","A2_11 of Residents")
video_music_col <- c("A2_12 of Residents", "A2_13 of Residents")
socialmedia_col <- c("A2_14 of Residents")
messaging_col <- c("A2_15 of Residents")

Compute a row sum for each media type.

df <- data %>% mutate(TV = rowSums(data[,tv_col]),
                     Radio = rowSums(data[,radio_col]),
                    `Newspapers & Magazines (Print)` = rowSums(data[,newspapers_magazines_print]),
                    `Newspapers & Magazines (Digital)` = rowSums(data[,newspapers_magazines_digital]),
                    `Online Video & Music` = rowSums(data[,video_music_col]),
                    `Social Media` = rowSums(data[,socialmedia_col]),
                    `Messaging Apps` = rowSums(data[,messaging_col])) %>%
              select(RespID, Generation, 
                     TV, Radio, `Newspapers & Magazines (Print)`, 
                    `Newspapers & Magazines (Digital)`,`Online Video & Music`, 
                    `Social Media`, `Messaging Apps`)

# Create a new column with sum of responses and 
# filter to ensure responses with only 100% are selected          
df <- df %>% mutate(Total = rowSums(df[,3:9])) %>%
             filter(Total == 100)
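If the percentages in the data are stored as decimals rather than whole numbers, an exact `Total == 100` comparison may drop valid rows because of floating-point rounding. The sketch below uses a made-up toy data frame and an assumed tolerance of `1e-6` to show a more forgiving filter in base R.

```r
# Toy responses (made up): rows 1 and 2 sum to 100, row 3 sums to 90
toy <- data.frame(
  RespID = 1:3,
  a = c(33.3, 50, 40),
  b = c(33.3, 25, 40),
  c = c(33.4, 25, 10)
)

toy$Total <- rowSums(toy[, c("a", "b", "c")])

# Keep rows whose total is within a small tolerance of 100
kept <- toy[abs(toy$Total - 100) < 1e-6, ]
kept
```

With dplyr, the same idea could be written as `filter(abs(Total - 100) < 1e-6)` or with `dplyr::near(Total, 100)`.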

Check the data.

DT::datatable(
  head(df), extensions = 'FixedColumns',
  options = 
    list(dom = 't',
         columnDefs = list(list(width = '100px', targets = c(1, ncol(df)))),
         scrollX = TRUE,
         scrollCollapse = TRUE)
)

3.3.3. Check for Normality

Before performing a one-way ANOVA test, we have to check the distribution of the responses for each generation across each of the media groups. We will use gghistostats from the ggstatsplot package. As mentioned in section 1.1, we will only look at New Media (i.e. Newspapers & Magazines (Digital), Online Video & Music and Messaging Apps) and Social Media. Use the code chunk below to plot the histograms according to the media type.

(Note: The code for “Online Video & Music” and “Social Media” is the same, with only the x value in the grouped_gghistostats call changed.)

# for reproducibility
set.seed(123)

# plot histogram
ggstatsplot::grouped_gghistostats(
  data = dplyr::filter(
    .data = df,
    Generation %in% c("Millennials", "Generation X", "Baby Boomers", "Silent Generation")
  ),
  x = `Newspapers & Magazines (Digital)`,
  xlab = "% of Time Spent past 4 weeks",
  type = "robust", # use robust location measure
  grouping.var = Generation, # grouping variable
  normal.curve = TRUE, # superimpose a normal distribution curve
  normal.curve.args = list(color = "darkred", size = 1),
  ggtheme = ggthemes::theme_tufte(),
  ggplot.component = list( # modify the defaults from `ggstatsplot` for each plot
    ggplot2::scale_x_continuous(breaks = seq(0, 100, 10), limits = (c(0, 100))),
    ggplot2::theme(title = element_text(size = 17), 
                   plot.title = element_text(size = 17),
                   plot.subtitle = element_text(size = 15))
  ),
  messages = FALSE,
  plotgrid.args = list(nrow = 2),
  title.text = "Digital Newspapers & Magazines consumption across different generations",
  title.args = list(size = 20, fontface = "bold")
)

While these gghistostats plots are a good way to visualise the distribution, we can also use ggplot2 and plotly to make them interactive, as seen below.

histogram <- ggplot(df, aes(x = `Newspapers & Magazines (Digital)`)) + 
             geom_histogram(binwidth = 3, colour="grey30", 
                             aes(y=..density.., fill=..count..), 
                             alpha=0.5) +
            scale_fill_gradient("Count", low="#DCDCDC", high="#7C7C7C") +
            stat_function(fun = dnorm, args = list(mean = mean(df$`Newspapers & Magazines (Digital)`), 
                                                     sd = sd(df$`Newspapers & Magazines (Digital)`))) + 
            ggtitle("Distribution of Digital Newspapers & Magazines Media consumption past 4 weeks") +
            theme_bw() +
            facet_wrap(.~Generation)

ggplotly(histogram)

We can also use a QQ plot to check whether the distribution of new and social media consumption across the different generations is normal. Looking at the QQ plot below, we can see that the dots do not lie close to the line, which indicates that the distribution does not resemble a normal distribution.
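The code for the QQ plot is not shown in this post. A minimal sketch of how such a check could be done in base R follows; it uses simulated, positively skewed values rather than the survey data, and adds a Shapiro-Wilk test as an optional numeric complement that the original analysis does not mention.

```r
# for reproducibility
set.seed(123)

# Simulated, positively skewed "% of time" values (not survey data)
toy_pct <- pmin(rexp(200, rate = 1 / 10), 100)

# Points that bend away from the reference line suggest non-normality
qqnorm(toy_pct, main = "Normal Q-Q plot (simulated data)")
qqline(toy_pct, col = "darkred")

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(toy_pct)
sw$p.value
```

For the actual data, the same calls would be applied to one media column at a time, within each generation.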

Below are some of the key observations from Data Wrangling in summary:

The Silent Generation represents a very small proportion of the survey responses (n=13).
Almost all the distributions are positively skewed.

In view of these observations, a non-parametric ANOVA test would be conducted.

Note: With more than 2 groups, ggstatsplot::ggbetweenstats uses the Kruskal–Wallis one-way ANOVA test as the non-parametric test (reference).
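To see what the Kruskal–Wallis test reports outside of ggstatsplot, it can be run directly with `kruskal.test` from base R's stats package. The sketch below uses simulated skewed data with made-up group sizes and scales, so the numbers carry no meaning for the survey.

```r
# for reproducibility
set.seed(123)

# Simulated skewed consumption values for three groups (not survey data)
toy <- data.frame(
  Generation = rep(c("Baby Boomers", "Generation X", "Millennials"), each = 50),
  pct = c(rexp(50, rate = 1 / 5), rexp(50, rate = 1 / 10), rexp(50, rate = 1 / 30))
)

# Rank-based test of whether the groups come from the same distribution
kw <- kruskal.test(pct ~ Generation, data = toy)
kw
```

The test works on ranks, so it does not require the normality that the histograms and QQ plot ruled out.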

3.4. Use `ggstatsplot` to plot ANOVA graph

3.4.1. Plot the ANOVA graph

Use ggbetweenstats to plot the ANOVA plot, setting type to "np" (i.e. a non-parametric test) and pairwise.comparisons to TRUE. For the pairwise comparisons, the Dwass-Steel-Critchlow-Fligner test was used as it is a non-parametric test for groups with unequal variance (reference).

(Note: Only the code chunk for one plot is shown, as the plots for the other 3 media types share the same code chunk with a different media type.)

# for reproducibility
set.seed(123)

# plot
p1 <- ggstatsplot::ggbetweenstats(
      data = df,
      x = Generation,
      y = `Newspapers & Magazines (Digital)`,
      mean.plotting = TRUE,
      mean.ci = TRUE,
      pairwise.comparisons = TRUE, # display results from pairwise comparisons
      notch = TRUE,
      type = "np",
      k=3,
      title = "Differences in mean across \nDigital Newspapers & Magazines Consumption",
      messages = FALSE
    ) + 
      ggplot2::scale_y_continuous(breaks = seq(0, 110, by = 20)) +
      ggplot2::coord_cartesian(ylim = c(0, 110)) 

p1
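The pairwise comparisons shown on the plot can be approximated outside of ggstatsplot. Base R does not provide the Dwass-Steel-Critchlow-Fligner test, so the sketch below swaps in pairwise Wilcoxon rank-sum tests with a Holm adjustment as a stand-in; it is not the same procedure, and the data are simulated.

```r
# for reproducibility
set.seed(123)

# Simulated skewed consumption values for three groups (not survey data)
toy <- data.frame(
  Generation = rep(c("Baby Boomers", "Generation X", "Millennials"), each = 50),
  pct = c(rexp(50, rate = 1 / 5), rexp(50, rate = 1 / 10), rexp(50, rate = 1 / 30))
)

# Stand-in for the pairwise step: Wilcoxon rank-sum tests between every
# pair of groups, with Holm-adjusted p-values
pw <- pairwise.wilcox.test(toy$pct, toy$Generation, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values
pw$p.value
```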

3.4.2. Include interactivity in ANOVA graph

Use ggplotly to convert the ggstatsplot plot into a plotly object. The code below does this for Digital Newspapers & Magazines. Do the same for the other three media type plots.

Note that after converting into a plotly object, the statistical results of the one-way ANOVA test at the top of the chart are missing. As clarified by the creator of the ggstatsplot package (here), this is a ggplot2 issue, not a ggstatsplot issue. Instead, we will display the results in the subtitle when we combine all the plots together.

(Note: Only the code chunk for one plot is shown, as the plots for the other 3 media types share the same code chunk with a different media type.)

# Margin settings
m <- list(
  l = 50,
  r = 50,
  b = 100,
  t = 100,
  pad = 4
)

fig1 <- plotly::ggplotly(p1, tooltip=c("text","x","y"))
fig1 <- fig1 %>% layout(yaxis= list(title = "% of time spent past 4 weeks", 
                               titlefont=list(family='Arial', size=12),
                               tickfont=list(family='Arial', size = 13)),
                        xaxis=list(tickfont=list(family='Arial', size = 11)),
                        margin=m) 
fig1
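One way the missing test results could be put back is to write them into the plotly title as a second, smaller line. The sketch below assumes the results are available as a plain character string (ggstatsplot stores its subtitle as a plotmath expression, so it would need to be retyped or converted first); the helper `make_plotly_title` and the placeholder text are made up for illustration.

```r
# Hypothetical helper: join a title and a plain-text subtitle into one
# HTML string that plotly renders as a two-line title
make_plotly_title <- function(title, subtitle) {
  paste0(title, "<br><sup>", subtitle, "</sup>")
}

title_html <- make_plotly_title(
  "Digital Newspapers & Magazines",
  "Kruskal-Wallis test results go here"  # placeholder, not a real result
)

# Apply the title to a toy plotly figure when plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  toy_fig <- plotly::plot_ly(x = c("A", "B"), y = c(1, 2), type = "bar")
  toy_fig <- plotly::layout(toy_fig, title = list(text = title_html))
}
```

With the real plot, the same `layout()` call would be applied to `fig1`.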

4. Final Visualisation and Insights

Using plotly’s subplot function, we can combine all the plots together into one plotly object as seen below.
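The code for the combined plot is not shown in this post. The sketch below illustrates how `plotly::subplot` could arrange four figures in a 2 x 2 grid; it uses toy box plots built from simulated data in place of the four ggplotly figures, and the grid size, margin and title are assumptions.

```r
if (requireNamespace("plotly", quietly = TRUE)) {
  # for reproducibility
  set.seed(123)

  # Toy stand-ins for the four ggplotly figures (simulated data)
  make_toy_fig <- function(name) {
    plotly::plot_ly(y = rexp(30), type = "box", name = name)
  }
  media_types <- c("Digital Newspapers & Magazines", "Online Video & Music",
                   "Social Media", "Messaging Apps")
  figs <- lapply(media_types, make_toy_fig)

  # Arrange the figures in a 2 x 2 grid, keeping each y-axis title
  combined <- plotly::subplot(figs, nrows = 2, margin = 0.07, titleY = TRUE)
  combined <- plotly::layout(
    combined,
    title = "New and Social Media consumption by generation",
    showlegend = FALSE
  )
  combined
}
```

With the real figures, `figs` would be replaced by the four ggplotly objects, for example `list(fig1, fig2, fig3, fig4)`, where `fig2` to `fig4` are assumed names for the other three plots.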

Looking at the one-way ANOVA plots for each of New and Social Media types across each of the generations, the main insights gathered are as follows:

The null hypothesis of the ANOVA test is that the means of the groups are the same. Based on the test results, the differences across generation groups are statistically significant at the 5% significance level for all the new and social media types (p-value < 0.05, so the null hypothesis is rejected), which suggests that new and social media consumption in the past 4 weeks is not the same across generation groups.
Millennials have the highest mean consumption in the past 4 weeks for Digital Newspapers & Magazines (11.1%), Online Video & Music (28.8%) and Social Media (19.7%), followed by Generation X.
For Messaging Apps, the mean percentage of time spent in the past 4 weeks is about the same for Baby Boomers (19.2%), Generation X (20.1%) and Millennials (19.1%).
In all New and Social Media categories, the Silent Generation had negligible or minimal consumption compared to the other generations.
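Group means like the ones quoted above could be computed with a short summary step. The sketch below uses base R's `tapply` on a made-up toy data frame, so its values are unrelated to the survey figures.

```r
# Toy data (made up): two respondents per generation
toy <- data.frame(
  Generation = c("Generation X", "Generation X", "Millennials", "Millennials"),
  social_media = c(10, 20, 20, 30)
)

# Mean % of time per generation, rounded to 1 decimal place
group_means <- round(tapply(toy$social_media, toy$Generation, mean), 1)
group_means
```

Because the distributions are skewed, reporting the median alongside the mean (by swapping `mean` for `median`) may give a fairer picture of a typical respondent.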

5. Reflection on Advantages of incorporating Interactivity

5.1. Greater Attention to Detail

Unlike an interactive visualisation, a static visualisation cannot show the value of each individual point (e.g. compare the static ANOVA plot with the ggplotly ANOVA plot above) or show changes over time the way an animated bubble plot can. An interactive data visualisation also offers a greater level of detail, for instance by showing what each point in the graph represents rather than just displaying a point on a graph. While you could label every single point in a static visualisation, this becomes a problem when there are many data points. It is therefore wiser to include interactivity so that users can select and hover over points to see the details for a particular point.

5.2. Interactive User Experience, Data Discovery and Flexibility

With an interactive visualisation, users can explore the visualisation on their own, deriving insights and patterns, which gives greater flexibility (i.e. users are not restricted in what they can visualise or explore with the data). For example, sliders or filters can change the visualisation so that users can select the variables or parameters that interest them. As a result, there is greater engagement, as users can customise and explore what interests them rather than being limited to a default static visualisation.

5.3. Data Storytelling

An interactive visualisation does a much better job of telling a story than a static visualisation when it comes to visualising changes. For example, animations can display changes over time. An interactive visualisation is also much more compact than a static visualisation that shows the pattern for each year with multiple plots (e.g. a ternary plot or bubble plot faceted by year).

6. References

All about ggstatsplot

How are the Different Generations in Singapore consuming New and Social Media?

ISSS608 Visual Analytics and Applications | DataViz Makeover 9

Author: Cherie Wong

Date: 29 Mar 2020 23:49:18

