Motivation:
TikTok (https://www.tiktok.com/en/) is an immensely popular social networking and short-video sharing application, with more than 2 billion downloads worldwide as of April 2020 [1]. It is especially popular among Gen Z and millennials, but has also penetrated other age groups thanks to the virality, relevance and engagement of its content. Like other social media platforms that have reached massive scale, such as Facebook, Instagram, Twitter and WeChat, the platform has the potential to shape and propagate popular culture, politics and consumer trends. This information is partly captured in the video content, hashtags, descriptions and metadata such as “shares”, “view counts”, “likes” and “mentions”. Hence, I would like to build an interactive visualization to explore the key trends around these.
Data Challenges:
Design Challenge:
2. Provide a step-by-step description of how the data visualization was prepared using ggplot2 and other related R packages. (3 marks)
Step 0: Data prep. The data was first extracted using the TikTok API and prepared with Pandas, producing the file “cleaned_trending_5000”, which is then loaded into R.
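The Pandas prep code is not reproduced here. As a rough, hypothetical sketch of what such a prep step can look like, the Python below flattens nested video records into flat rows and writes them out as CSV. It uses only the standard library (not Pandas) so it runs on its own, and the input field names (`desc`, `author.uniqueId`, `stats.diggCount`, `stats.playCount`, `stats.shareCount`, `video.duration`) and output columns are assumed for illustration; they are not confirmed to match the TikTok API response or the columns of “cleaned_trending_5000”.

```python
import csv
import io

# Hypothetical output columns; the real "cleaned_trending_5000" columns may differ.
COLUMNS = ["author", "description", "likes", "views", "shares", "duration"]


def flatten_record(raw: dict) -> dict:
    """Flatten one nested video record (hypothetical schema) into a flat row.

    Missing fields fall back to empty strings or zeros so every row is complete.
    """
    stats = raw.get("stats", {})
    return {
        "author": raw.get("author", {}).get("uniqueId", ""),
        "description": raw.get("desc", ""),
        "likes": int(stats.get("diggCount", 0)),
        "views": int(stats.get("playCount", 0)),
        "shares": int(stats.get("shareCount", 0)),
        "duration": int(raw.get("video", {}).get("duration", 0)),
    }


def to_csv(raw_records: list) -> str:
    """Flatten every record and serialise the rows as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for raw in raw_records:
        writer.writerow(flatten_record(raw))
    return buffer.getvalue()


if __name__ == "__main__":
    # A toy record in the assumed schema, not real data.
    sample = [{
        "desc": "Hello #tiktok",
        "author": {"uniqueId": "someone"},
        "stats": {"diggCount": 10, "playCount": 100, "shareCount": 1},
        "video": {"duration": 14},
    }]
    print(to_csv(sample))
```

Running the file prints a header row followed by one row for the toy record. Because missing fields default to empty strings or zeros, a partially populated record still yields a complete row.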
Step 1: First, using ggplot2, I created the main visualization: a plot of hashtags by their respective counts, based on the static data. As shown in the plot below, this chart is not interactive and only shows the top 15 hashtags.
Step 2: Using Shiny's reactive() function and the “dplyr” library, I added logic that lets the UI elements respond to user filters and selections on the fly. The three filters bind the values of the “No. of likes” slider, the “Video Length” slider and the “Search by Hashtag” input box to filter the plots and data tables.
UI:
Server - Filter logic:
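In the app this filtering is done in R with reactive() and dplyr. The Python sketch below only illustrates the same idea outside Shiny: the three inputs are combined with a logical AND over a list of rows. The row keys (`likes`, `duration`, `description`), the function name `filter_videos` and the rule for the search box (case-insensitive substring match on the description, with an empty box meaning no filter) are assumptions, not taken from the original app.

```python
def filter_videos(videos, likes_range, length_range, hashtag_query=""):
    """Keep videos that pass all three filters (logical AND).

    likes_range and length_range are inclusive (low, high) tuples, mirroring
    two-ended sliders; an empty hashtag_query disables the text filter.
    """
    low_likes, high_likes = likes_range
    low_len, high_len = length_range
    query = hashtag_query.strip().lower()
    kept = []
    for video in videos:
        if not low_likes <= video["likes"] <= high_likes:
            continue
        if not low_len <= video["duration"] <= high_len:
            continue
        # Assumed rule: case-insensitive substring match on the description.
        if query and query not in video["description"].lower():
            continue
        kept.append(video)
    return kept


if __name__ == "__main__":
    # Toy rows, not real data.
    videos = [
        {"likes": 900, "duration": 14, "description": "Dance #comedy #clown"},
        {"likes": 50, "duration": 30, "description": "Recipe #food"},
        {"likes": 700, "duration": 60, "description": "Skit #Comedy"},
    ]
    print(filter_videos(videos, (100, 1000), (0, 30), "#comedy"))
```

Here only the first toy row survives: the second fails the likes range and the third fails the length range. A substring rule also matches longer tags (searching “#comedy” would match “#comedyskit”); matching on extracted hashtag tokens would be stricter.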
Step 3: Next, I added a data table below the plot using the “DT” library so that users can dive into the details of each video. The main logic of the table is shown below:
Server:
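DT renders the table and handles sorting and paging in the browser, so the app mainly has to pass it the filtered rows and the columns to show. Purely as an illustration of that selection, sorting and paging, here is a hypothetical Python helper, `table_page`; its column names and default page size of 10 are assumptions, and it is not the app's R code.

```python
def table_page(rows, columns, sort_by, page=1, page_size=10):
    """Return one page of rows, keeping only `columns`, sorted descending by `sort_by`.

    `page` is 1-based; a page past the end returns an empty list.
    """
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=True)
    start = (page - 1) * page_size
    return [{col: row[col] for col in columns} for row in ordered[start:start + page_size]]


if __name__ == "__main__":
    # Toy rows, not real data.
    rows = [
        {"author": "a", "likes": 5, "views": 9},
        {"author": "b", "likes": 7, "views": 1},
        {"author": "c", "likes": 6, "views": 3},
    ]
    print(table_page(rows, ["author", "likes"], "likes", page=1, page_size=2))
```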
Step 4: Then I made the original hashtag plot responsive. This was harder to achieve because the hashtags have to be re-counted as the user interacts with the dashboard. I experimented with the “stringr” and “tidytext” libraries for string processing but ended up writing the function in base R/Shiny code.
Server - logic to count and rank the top hashtags:
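The hashtag-counting function was written in base R and is not reproduced here. A Python sketch of one way to count and rank hashtags follows; it assumes a hashtag is `#` followed by word characters, lower-cases tags so “#Comedy” and “#comedy” merge, and counts each tag at most once per video. These are design assumptions and may differ from the original function.

```python
import re
from collections import Counter

# Assumed definition of a hashtag: "#" followed by word characters.
# TikTok's own tokenisation rules may differ.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(description):
    """Return the lower-cased hashtags in a description, de-duplicated, in order of appearance."""
    tags = (tag.lower() for tag in HASHTAG_PATTERN.findall(description))
    return list(dict.fromkeys(tags))


def top_hashtags(descriptions, n=15):
    """Count how many videos use each hashtag and return the n most common.

    Ties keep the order in which hashtags were first seen.
    """
    counts = Counter()
    for description in descriptions:
        counts.update(extract_hashtags(description))
    return counts.most_common(n)


if __name__ == "__main__":
    descriptions = [
        "Do you wanna dance with me?#laxedsirenbeat #laxedsirenbeatchallenge "
        "#tiktok #tiktokdance #comedy #clown",
        "A toy second description #comedy #cats",
    ]
    print(top_hashtags(descriptions, n=3))
```

With these two descriptions (the first taken from Observation (2), the second a toy), “comedy” ranks first with a count of 2, followed by “laxedsirenbeat” and “laxedsirenbeatchallenge”, since `Counter.most_common` keeps first-seen order for equal counts.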
Step 5: Last but not least, two more responsive plots, showing like counts and view counts, are drawn with “ggplot2” and arranged in the UI using tabs so that the interface is cleaner.
UI - showing tabset widget:
Server - Logic for Views and Likes:
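The source does not say which axis scale the likes and views plots use. Since counts like these are often heavily skewed, one option is a log scale (in ggplot2, for example, `scale_x_log10()`). Under that assumption, the Python sketch below shows how such counts fall into order-of-magnitude bins; `magnitude_bins` and `text_histogram` are hypothetical helpers, not part of the app.

```python
from collections import Counter


def magnitude_bins(counts):
    """Group positive integer counts by order of magnitude.

    Bin k holds values with k + 1 decimal digits, i.e. 10**k <= v < 10**(k + 1).
    Zero and negative values are skipped, as a log scale cannot show them.
    Digit counting avoids floating-point error at exact powers of ten.
    """
    bins = Counter(len(str(int(v))) - 1 for v in counts if v > 0)
    return dict(sorted(bins.items()))


def text_histogram(counts):
    """Render the bins as a rough text histogram, one row per order of magnitude."""
    lines = []
    for k, n in magnitude_bins(counts).items():
        lines.append(f"1e{k:<2} {'#' * n} ({n})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy like counts, not real data.
    print(text_histogram([3, 42, 420, 999, 12000, 0]))
```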
3. The final data visualization and a short description of not more than 350 words. The description must provide at least two useful pieces of information revealed by the data visualization. (4 marks)
The final visualization shows the top 15 hashtags under several filters that can be combined, allowing for interactive analysis. A data table below it gives users additional details and context. Let’s walk through this visualization:
At (A), “#comedy” is entered in the “Search by Hashtag” bar. This filters both the visualization at (B) and the data table at (C) by the comedy hashtag.
This leads us to Observation (1): alongside #comedy, we can see which other hashtags are often used with it in trending videos. Here we can see some more informative hashtags, such as:
Using this, we can identify some of the key social trends in the world, as well as what is generating buzz on this heavily used platform.
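The co-occurrence view in Observation (1) amounts to filtering on one hashtag and then counting the other hashtags in the remaining videos. A standalone Python sketch of that combination follows; the function name `co_occurring_hashtags`, the hashtag pattern and the alphabetical tie-break are assumptions made for illustration, not the app's R code.

```python
import re

# Assumed definition of a hashtag: "#" followed by word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def co_occurring_hashtags(descriptions, anchor, n=10):
    """Count the hashtags that appear alongside `anchor` (e.g. "#comedy").

    Only videos whose description contains the anchor hashtag are considered,
    each co-occurring hashtag is counted once per video, and the anchor itself
    is excluded. Results are sorted by count, then alphabetically for ties.
    """
    anchor = anchor.lstrip("#").lower()
    counts = {}
    for description in descriptions:
        tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(description)}
        if anchor in tags:
            for tag in tags - {anchor}:
                counts[tag] = counts.get(tag, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Toy descriptions, not real data.
    toy = ["Dance #comedy #clown #tiktok", "Skit #Comedy #clown", "Recipe #food #clown"]
    print(co_occurring_hashtags(toy, "#comedy"))
```

In the toy data, “clown” appears with “comedy” in two videos and “tiktok” in one, while the “#food” video is ignored because it lacks the anchor tag.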
Observation (2): Using the data table at (C), one can glean more insight into what a particular hashtag is about from the video descriptions alongside the video analytics. For example, the video created by the user arnaldomangini has the following description:
“Do you wanna dance with me?#laxedsirenbeat #laxedsirenbeatchallenge #tiktok #tiktokdance #comedy #clown”
From this we can see that it is a comedic take on a dance challenge, and dance challenges are indeed very interactive and popular on TikTok. This 14-second video has almost 1 billion views and 43 million shares.
Besides this, users can filter the videos by number of likes and by video length, and also see the distributions of likes and video views across popular trending videos.
The key limitation of this visualization is that I was not able to obtain time-series data because of API throttling and limits. Hence, I am not able to provide momentum or time-based views of hashtag and popularity trends.
Furthermore, TikTok only supports a small subset of queries via the API, which limits the analysis to top users, videos, hashtags, etc. As a result, we are not able to examine some subcultures such as “#gundam”, “#mimic”, etc.
Future work: