This makeover aims to visualise media consumption patterns across the different generations of Singapore residents, using a survey conducted in 2018 with 2,300 Singapore residents and Work Permit holders (note: the survey data source is confidential). The scope of this visualisation is limited to the media consumption of Singapore residents (i.e. Singapore citizens and PRs); in total, there are 1,047 Singapore resident respondents in the survey.
It would be interesting to look at media consumption across different generations. The survey data cover years of birth from 1941 to 1993, which means that there are four generations, namely the Silent Generation (or Greatest Generation), Baby Boomers, Generation X and Millennials, as seen from the image below.
The survey contains demographic information along with details on the usage of different types of media, which can be grouped into the media groups shown in the table below.
| No. | Media Group | List of Media |
|---|---|---|
| 1 | TV | Free-to-air Channels, Cable TV Channels |
| 2 | Radio | Local Radio Channels, Foreign Radio Channels |
| 3 | Newspapers & Magazines | Local Newspapers (Printed), Local Newspapers / News Brands (Digital – websites/apps), Foreign Newspapers (Print), Foreign Newspaper / News Brands (Digital – websites/apps), Locally Published Magazine (Print), Foreign Published Magazine (Print), Other Local/Foreign Publishers (Digital magazines) |
| 4 | Online Video & Music | Online Video Streaming Websites/Apps, Online Music Websites/Apps |
| 5 | Social Media | Social Media |
| 6 | Messaging Apps | Messaging Apps |
One of the questions that respondents had to answer was:
“Of the time you spent using these platforms, what percent of your time in the past 4 weeks did you spend accessing each of them?”
The survey respondents had to allocate a percentage of their time in the past 4 weeks to each of the 15 media (as identified in the last column of the table above), and the numbers had to add up to 100%.
With the advent of the internet and the increase in ownership of smartphones and smart devices, there is a trend towards digital marketing and advertising. Many advertising and marketing firms, as well as government entities, want to know how consumers are consuming new media and social media so that they can run more effective targeted advertising campaigns.
Hence, for the purpose of this visualisation, we will only look at New Media (i.e. Newspapers & Magazines (Digital), Online Video & Music and Messaging Apps) and Social Media.
We will conduct a one-way Analysis of Variance (ANOVA) test to determine whether there are statistically significant differences between the means (i.e. percentage of time spent in the past 4 weeks on new and social media) across the different generations.
Use plotly to add interactivity to the ggstatsplot ANOVA graph.
tidyverse contains a set of essential packages for data manipulation and exploration.
ggstatsplot is an extension of the ggplot2 package for creating information-rich plots that include details from statistical tests.
plotly creates interactive web graphics from ggplot2 graphs.
Important note: ggstatsplot requires ggplot2 version 3.3.0 to work, so ensure that you have the latest version installed.
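The package setup described above could be sketched as follows. This is a minimal sketch, not the original setup code: it only reports missing packages instead of installing them, it adds ggthemes because a later chunk calls ggthemes::theme_tufte(), and it assumes the version note means ggplot2 3.3.0 or newer.

```r
# Packages named in the text, plus ggthemes (an assumption: it is used later
# via ggthemes::theme_tufte() but is not listed in the package notes).
packages <- c("tidyverse", "ggstatsplot", "plotly", "ggthemes")

# Find which packages are not installed, without attaching anything yet.
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- packages[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything only once all packages are available.
  invisible(lapply(packages, library, character.only = TRUE))
}

# Check the installed ggplot2 version (assumes 3.3.0 or newer is acceptable).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot2_ok <- utils::packageVersion("ggplot2") >= "3.3.0"
  message("ggplot2 >= 3.3.0: ", ggplot2_ok)
}
```

Reporting rather than installing keeps the script free of side effects; run install.packages() on the reported names if anything is missing.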
We have a column with the year of birth of respondents and we will create a column that segments the respondents according to the table below.
| Generation | Year of Birth |
|---|---|
| Millennials | 1980 - 1996 |
| Generation X | 1965 - 1979 |
| Baby Boomers | 1947 - 1964 |
| Silent Generation | 1917 - 1946 |
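As a quick illustration of the mapping in the table above, here is a base-R sketch that uses cut() on hypothetical years of birth. It is an alternative way to express the same bands, not the approach used in this post, and it marks out-of-range years as NA rather than an empty string.

```r
# Hypothetical years of birth, chosen to cover every band and both edges.
year_of_birth <- c(1941, 1946, 1947, 1964, 1965, 1979, 1980, 1993)

# Band edges follow the table; cut() uses right-closed intervals by default:
# (1916, 1946], (1946, 1964], (1964, 1979], (1979, 1996].
generation <- cut(
  year_of_birth,
  breaks = c(1916, 1946, 1964, 1979, 1996),
  labels = c("Silent Generation", "Baby Boomers", "Generation X", "Millennials")
)

# Cross-check each year against its assigned generation.
print(data.frame(year_of_birth, generation))

# Years outside 1917-1996 would become NA, which makes stray values easy to spot.
print(table(generation, useNA = "ifany"))
```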
# Year-of-birth ranges for each generation (see the table above)
millennials <- seq(1980, 1996)
genX <- seq(1965, 1979)
boomer <- seq(1947, 1964)
silent <- seq(1917, 1946)
# Assign each respondent to a generation; unmatched years get an empty string
data <- data %>% mutate(Generation =
  case_when(`Year of Birth` %in% millennials ~ "Millennials",
            `Year of Birth` %in% genX ~ "Generation X",
            `Year of Birth` %in% boomer ~ "Baby Boomers",
            `Year of Birth` %in% silent ~ "Silent Generation",
            TRUE ~ ""))
Based on the table above, which shows the media groups, we will group the survey responses into the same media groups.
Create the grouping.
tv_col <- c("A2_1 of Residents","A2_2 of Residents")
radio_col <- c("A2_3 of Residents", "A2_4 of Residents")
newspapers_magazines_print <- c("A2_5 of Residents", "A2_7 of Residents",
"A2_9 of Residents", "A2_10 of Residents")
newspapers_magazines_digital <- c("A2_6 of Residents","A2_8 of Residents","A2_11 of Residents")
video_music_col <- c("A2_12 of Residents", "A2_13 of Residents")
socialmedia_col <- c("A2_14 of Residents")
messaging_col <- c("A2_15 of Residents")
Do a row sum for each of the media types.
# Sum the percentages within each media group.
# drop = FALSE keeps single-column selections two-dimensional, so rowSums()
# works whether `data` is a tibble or a plain data frame.
df <- data %>% mutate(TV = rowSums(data[, tv_col]),
    Radio = rowSums(data[, radio_col]),
    `Newspapers & Magazines (Print)` = rowSums(data[, newspapers_magazines_print]),
    `Newspapers & Magazines (Digital)` = rowSums(data[, newspapers_magazines_digital]),
    `Online Video & Music` = rowSums(data[, video_music_col]),
    `Social Media` = rowSums(data[, socialmedia_col, drop = FALSE]),
    `Messaging Apps` = rowSums(data[, messaging_col, drop = FALSE])) %>%
  select(RespID, Generation,
    TV, Radio, `Newspapers & Magazines (Print)`,
    `Newspapers & Magazines (Digital)`, `Online Video & Music`,
    `Social Media`, `Messaging Apps`)
# Create a new column with the sum of responses and
# keep only the respondents whose responses add up to 100%
df <- df %>% mutate(Total = rowSums(df[, 3:9])) %>%
  filter(Total == 100)
Check the data.
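One way to sanity-check the row-sum and 100% filter logic is to run it on a tiny made-up data set first. The sketch below is base R with hypothetical column names and values; it also compares totals with a small tolerance, which is an assumption on my part in case the percentages are not whole numbers.

```r
# Hypothetical toy responses: three respondents, three media columns (percentages).
toy <- data.frame(
  RespID = 1:3,
  TV     = c(50, 40, 33.3),
  Radio  = c(30, 40, 33.3),
  Social = c(20, 10, 33.4)
)
media_cols <- c("TV", "Radio", "Social")

# drop = FALSE keeps a one-column selection as a data frame, so rowSums() still works.
social_only <- rowSums(toy[, "Social", drop = FALSE])

# Total percentage allocated by each respondent.
toy$Total <- rowSums(toy[, media_cols])

# Compare with a small tolerance rather than `==`, in case of decimal percentages.
valid <- toy[abs(toy$Total - 100) < 1e-8, ]
print(valid)
```

Respondent 2 allocates only 90% in this toy data, so that row is the one the filter is meant to drop.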
Before performing a one-way ANOVA test, we have to check the distribution of the responses for each generation across each media group. We will use grouped_gghistostats from the ggstatsplot package. As mentioned in section 1.1, we will only look at New Media (i.e. Newspapers & Magazines (Digital), Online Video & Music and Messaging Apps) and Social Media. Use the code chunk below to plot the histograms according to the media type.
(Note: The code for “Online Video & Music” and “Social Media” is the same; only the x argument of grouped_gghistostats changes.)
# for reproducibility
set.seed(123)
# plot histogram
ggstatsplot::grouped_gghistostats(
data = dplyr::filter(
.data = df,
Generation %in% c("Millennials", "Generation X", "Baby Boomers", "Silent Generation")
),
x = `Newspapers & Magazines (Digital)`,
xlab = "% of Time Spent past 4 weeks",
type = "robust", # use robust location measure
grouping.var = Generation, # grouping variable
normal.curve = TRUE, # superimpose a normal distribution curve
normal.curve.args = list(color = "darkred", size = 1),
ggtheme = ggthemes::theme_tufte(),
ggplot.component = list( # modify the defaults from `ggstatsplot` for each plot
ggplot2::scale_x_continuous(breaks = seq(0, 100, 10), limits = (c(0, 100))),
ggplot2::theme(title = element_text(size = 17),
plot.title = element_text(size = 17),
plot.subtitle = element_text(size = 15))
),
messages = FALSE,
plotgrid.args = list(nrow = 2),
title.text = "Digital Newspapers & Magazines consumption across different generations",
title.args = list(size = 20, fontface = "bold")
)
While these gghistostats plots are a good way to visualise the distribution, we can also use ggplot2 and plotly to make them interactive, as seen below.
histogram <- ggplot(df, aes(x = `Newspapers & Magazines (Digital)`)) +
geom_histogram(binwidth = 3, colour="grey30",
aes(y=..density.., fill=..count..),
alpha=0.5) +
scale_fill_gradient("Count", low="#DCDCDC", high="#7C7C7C") +
stat_function(fun = dnorm, args = list(mean = mean(df$`Newspapers & Magazines (Digital)`),
sd = sd(df$`Newspapers & Magazines (Digital)`))) +
ggtitle("Distribution of Digital Newspapers & Magazines Media consumption past 4 weeks") +
theme_bw() +
facet_wrap(.~Generation)
ggplotly(histogram)
We can also use a QQ plot to check the distribution of new and social media consumption across the different generations. In the QQ plot below, the points do not lie close to the line, which indicates that the distribution does not resemble a normal distribution.
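A QQ plot of this kind could be produced roughly as follows. This is a base-R sketch on simulated, right-skewed data standing in for one media column; it is not the code behind the plot in this post, and the Shapiro-Wilk test is added only as a numeric complement.

```r
set.seed(123)

# Hypothetical stand-in for one media column: right-skewed percentages capped at 100.
pct_time <- pmin(round(rexp(200, rate = 1 / 10)), 100)

# Base-R normal QQ plot: points far from the line suggest non-normality.
qqnorm(pct_time, main = "Normal QQ plot (toy data)")
qqline(pct_time, col = "darkred")

# Shapiro-Wilk test as a numeric complement; a small p-value also points
# away from normality.
sw <- shapiro.test(pct_time)
print(sw$p.value)
```

With ggplot2, the same idea can be faceted by generation using stat_qq() and stat_qq_line() together with facet_wrap().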
Below are some of the key observations from Data Wrangling in summary:
In view of these observations, a non-parametric ANOVA test would be conducted.
Note: With more than 2 groups, ggstatsplot::ggbetweenstats uses the Kruskal–Wallis one-way ANOVA test when a non-parametric test is requested (reference).
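To see what the non-parametric test does outside of ggstatsplot, here is a small base-R sketch on simulated data. The data and column names are hypothetical, and the pairwise step uses Wilcoxon rank-sum tests with Holm adjustment as a base-R stand-in, which is a different procedure from the pairwise test that ggstatsplot reports.

```r
set.seed(123)

# Hypothetical toy data: percentage of time spent, by generation.
toy <- data.frame(
  Generation = factor(rep(c("Baby Boomers", "Generation X", "Millennials"), each = 30)),
  pct_time   = c(rexp(30, 1 / 5), rexp(30, 1 / 10), rexp(30, 1 / 25))
)

# Kruskal-Wallis rank-sum test: the non-parametric counterpart of one-way ANOVA.
kw <- kruskal.test(pct_time ~ Generation, data = toy)
print(kw)

# Base-R stand-in for pairwise comparisons: Wilcoxon rank-sum tests with Holm
# adjustment. This is NOT the test ggstatsplot reports; it is swapped in here
# to avoid extra packages.
pw <- pairwise.wilcox.test(toy$pct_time, toy$Generation, p.adjust.method = "holm")
print(pw)
```

The overall test answers whether any group differs; the pairwise table then indicates which pairs of generations drive that difference.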
ggstatsplot to plot ANOVA graph
Use ggbetweenstats to plot the ANOVA plot, setting type to "np" (non-parametric test) and pairwise.comparisons to TRUE. For pairwise comparisons, the Dwass-Steel-Critchlow-Fligner test is used, as it is a non-parametric test that does not assume equal variances (reference).
(Note: Only the code chunk for one of the plots is shown, as the plots for the other 3 media types use the same code with a different media type.)
# for reproducibility
set.seed(123)
# plot
p1 <- ggstatsplot::ggbetweenstats(
data = df,
x = Generation,
y = `Newspapers & Magazines (Digital)`,
mean.plotting = TRUE,
mean.ci = TRUE,
pairwise.comparisons = TRUE, # display results from pairwise comparisons
notch = TRUE,
type = "np",
k=3,
title = "Differences in mean across \nDigital Newspapers & Magazines Consumption",
messages = FALSE
) +
ggplot2::scale_y_continuous(breaks = seq(0, 110, by = 20)) +
ggplot2::coord_cartesian(ylim = c(0, 110))
p1
Use ggplotly to convert the ggstatsplot plot into a plotly object. The code below does this for Digital Newspapers & Magazines; do the same for the other three media type plots.
Note that after converting to a plotly object, the statistical results of the one-way ANOVA test at the top of the chart are missing. As clarified by the creator of the ggstatsplot package, this is a ggplot2 issue, not a ggstatsplot issue (reference). Instead, we will display the results in the subtitle when we combine all the plots together.
(Note: Only the code chunk for one of the plots is shown, as the plots for the other 3 media types use the same code with a different media type.)
# Margin settings
m <- list(
l = 50,
r = 50,
b = 100,
t = 100,
pad = 4
)
fig1 <- plotly::ggplotly(p1, tooltip=c("text","x","y"))
fig1 <- fig1 %>% layout(yaxis= list(title = "% of time spend past 4 weeks",
titlefont=list(family='Arial', size=12),
tickfont=list(family='Arial', size = 13)),
xaxis=list(tickfont=list(family='Arial', size = 11)),
margin=m)
fig1
Using plotly’s subplot function, we can combine all the plots into one plotly object, as seen below.
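The combining step might look something like the sketch below. It uses two simple box plots built from simulated data as hypothetical stand-ins for the converted figures, so the figure names, data and title are assumptions and not the code used to produce the combined plot in this post.

```r
library(plotly)

set.seed(123)

# Hypothetical stand-ins for the converted figures (fig1, fig2, ...).
toy <- data.frame(
  Generation   = rep(c("Baby Boomers", "Generation X", "Millennials"), each = 20),
  digital_news = runif(60, 0, 30),
  social_media = runif(60, 0, 40)
)
fig_a <- plot_ly(toy, x = ~Generation, y = ~digital_news, type = "box",
                 name = "Digital news")
fig_b <- plot_ly(toy, x = ~Generation, y = ~social_media, type = "box",
                 name = "Social media")

# Arrange the figures side by side; titleY = TRUE keeps each panel's y-axis title.
combined <- subplot(fig_a, fig_b, nrows = 1, margin = 0.05, titleY = TRUE)

# A shared title can carry summary text for the combined figure.
combined <- layout(combined, title = "Toy example: two media types side by side")
combined
```

For four plots, passing all four figures with nrows = 2 gives a 2-by-2 grid.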
Looking at the one-way ANOVA plots for each of New and Social Media types across each of the generations, the main insights gathered are as follows:
The null hypothesis of the ANOVA test is that the group means are the same (the Kruskal–Wallis variant used here tests whether the groups share the same distribution). Based on the test results, the differences across generation groups are statistically significant for all new and social media types at the 5% significance level (p-value < 0.05, so the null hypothesis is rejected), which suggests that new and social media consumption in the past 4 weeks is not the same across generation groups.
Millennials have the highest mean consumption in the past 4 weeks for Digital Newspapers & Magazines (11.1%), Online Video & Music (28.8%) and Social Media (19.7%), followed by Generation X.
For Messaging Apps, the mean percentage of time spent in the past 4 weeks is about the same for Baby Boomers (19.2%), Generation X (20.1%) and Millennials (19.1%).
In all New and Social Media categories, the Silent Generation had negligible or minimal consumption compared to the other generations.
Unlike an interactive visualisation, a static visualisation cannot show the value of each individual point (e.g. compare the static ANOVA plot with the ggplotly ANOVA plot above) or show changes over time the way an animated bubble plot can. An interactive visualisation also offers a greater level of detail, for instance by showing what each point in the graph represents rather than just displaying a point on a graph. While you could label every single point in a static visualisation, this becomes a problem when there are many data points. It is therefore wiser to include interactivity so that the user can select and hover over points to see the details for a particular point.
With an interactive visualisation, users can explore the visualisation on their own to derive insights and patterns, which gives greater flexibility (i.e. users are not restricted in what they can visualise or explore with the data). For example, sliders or filters can update the visualisation so that users can select the variables or parameters that interest them. As a result, users are more engaged, as they can customise and explore what interests them rather than being limited to a default static view.
An interactive visualisation does a much better job than a static one at telling a story when it comes to visualising changes. For example, animations can display changes over time. An interactive visualisation is also much more compact than a static visualisation that needs multiple plots to show the pattern for each year (e.g. a ternary plot or bubble plot faceted by year).