Background

Twist on Task

The assignment asked for at least six individuals to rate five movies. However, this did not suit my situation: the people I know watch movies so rarely that they have never seen Black Panther! Yes, this makes me the cool friend. Therefore, I opted for the one thing we all like watching: Korean pop (K-Pop) music videos.


Overview

K-Pop has become quite a phenomenon in Western society over the last decade. SHINee is one of the most distinguished boy bands and continues to greatly influence the trends of the K-Pop music industry. They debuted as a five-member group in 2008 and have since released over 35 music videos.

This assignment investigates which attribute fans find most appealing across five randomly selected SHINee music videos, and which video, based on fans’ rankings, is ranked most evenly across the attributes.


MVs

The five music videos that were selected from SHINee’s discography are:

Music Video         Release Year
Sherlock            2012
Everybody           2013
View                2015
Tell Me What To Do  2016
Good Evening        2018

Methodology

The Attributes

The music videos were ranked based on these attributes:

  1. Visual - the aspects of the music video that appeal strictly to the sense of sight.
  2. Theme/Story - how captivatingly the music video depicted its themes, or how clearly it told a story.
  3. Performance - how compelling the band/actors were at expressing the concept of the music video.

Survey

To recruit participants, I used my social media profile, which has over 1,000 followers, nearly 85% of whom are SHINee fans. On February 5th, 2019, from 9 AM to 2 PM EST, I announced the survey there and shared a link to a SurveyMonkey form.

Because the participants are devoted fans, who could easily give every video top marks on a rating scale, I tried to reduce bias by using a ‘ranking scale’ instead of a ‘rating scale’: participants rank the five music videos from 1st to 5th place within each of the 3 attributes. Once a participant places a music video in a spot, neither that spot nor that music video can be used again within the same attribute. For example, if someone ranks Sherlock 1st for Visual, the remaining four Visual places must be filled by the other four videos. A rank of 1 means the participant judged that video the best of the five for that attribute.
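As a minimal sketch (in R, the language used later in this analysis), the following checks that each response is a valid ranking and tallies responses into a music video by rank count table. The three responses and the helper name is_valid_ranking are hypothetical, invented for illustration; they are not survey data and not part of SurveyMonkey's export.

```r
# Hypothetical responses (NOT survey data): three respondents ranking the five
# music videos on one attribute. Each row is one respondent; each value is the
# place that respondent gave the video in that column (1 = best).
mvs <- c("Sherlock", "Everybody", "View", "Tell Me What To Do", "Good Evening")
responses <- rbind(
  c(3, 5, 2, 1, 4),
  c(4, 2, 1, 5, 3),
  c(5, 3, 4, 2, 1)
)
colnames(responses) <- mvs

# Hypothetical helper: a valid ranking uses each place 1..5 exactly once
# (no ties, no skipped places)
is_valid_ranking <- function(r) identical(sort(as.integer(r)), 1:5)
stopifnot(all(apply(responses, 1, is_valid_ranking)))

# Tally into a video-by-rank table: cell [v, k] counts the respondents
# who placed video v in place k
counts <- sapply(1:5, function(k) colSums(responses == k))
colnames(counts) <- c("first", "second", "third", "fourth", "fifth")
counts
```

Because each respondent places every video once and uses every place once, each row and each column of counts sums to the number of respondents. With 50 participants, every row and column of an attribute's table should therefore total 50.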

The following screenshots show the survey, which lets participants drag and drop their rankings for each attribute.

Screenshot

Screenshot


The Assignment Task

Data Summary

When the response window closed, the survey had a total of 50 participants. The responses were exported from SurveyMonkey as a .csv file and cleaned for upload to a MySQL database.


SQL Database

The following SQL creates the table in MySQL and loads the cleaned .csv file into it.

-- Samantha Deokinanan
-- CUNY MSDS DATA 607 Assignment 2
-- Database: SHINee Ranking 2/5/2019
-- Host: localhost
-- ------------------------------------------------------

-- Create and select the database that the R connection below uses
CREATE DATABASE IF NOT EXISTS shineeranking;
USE shineeranking;

-- Table structure for contingency table `SHINeeRanking`
-- One row per music video; each rank column counts how many participants
-- gave that video that place for that attribute, so counts are integers
DROP TABLE IF EXISTS `SHINeeRanking`;
CREATE TABLE `SHINeeRanking` (
  `ID` integer NOT NULL,
  `mvs` varchar(100) NOT NULL,
  `first_visual` integer NOT NULL,
  `second_visual` integer NOT NULL,
  `third_visual` integer NOT NULL,
  `forth_visual` integer NOT NULL,
  `fifth_visual` integer NOT NULL,
  `first_theme_story` integer NOT NULL,
  `second_theme_story` integer NOT NULL,
  `third_theme_story` integer NOT NULL,
  `forth_theme_story` integer NOT NULL,
  `fifth_theme_story` integer NOT NULL,
  `first_performance` integer NOT NULL,
  `second_performance` integer NOT NULL,
  `third_performance` integer NOT NULL,
  `forth_performance` integer NOT NULL,
  `fifth_performance` integer NOT NULL,
  PRIMARY KEY (`ID`)
);

-- Load data into table `SHINeeRanking`
-- Use forward slashes in the path: inside a MySQL string, '\t' (as in 'path\to') is read as a tab
LOAD DATA INFILE 'path/to/hw2dataset_SHINeeRanking.csv'
INTO TABLE SHINeeRanking
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

Connect to R

Next, R connects to the MySQL database and loads the table into a data frame.

# Load the required libraries
library(RMySQL)
library(knitr)
library(kableExtra)
# Connect to the MySQL server
db <- dbConnect(MySQL(), user = 'root', password = '', dbname = 'shineeranking',
                host = 'localhost', port = 3306)
# Retrieve the whole table as a data frame, then close the connection
SHINeeRanking <- dbGetQuery(db, "SELECT * FROM SHINeeRanking")
dbDisconnect(db)
# Drop the ID column, keeping the video names and the 15 rank-count columns
SHINeeRanking <- SHINeeRanking[, -1]
# Display the music video ranking table
kable(SHINeeRanking) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
mvs                 first_visual second_visual third_visual forth_visual fifth_visual first_theme_story second_theme_story third_theme_story forth_theme_story fifth_theme_story first_performance second_performance third_performance forth_performance fifth_performance
Sherlock            0            0             14           21           15           10                7                  14                13                6                 17                15                 7                 9                 2
Everybody           8            12            11           11           8            7                 4                  2                 13                24                16                9                  10                12                3
View                13           12            4            10           11           7                 14                 13                8                 8                 6                 10                 17                9                 8
Tell Me What To Do  15           6             14           4            11           10                9                  9                 12                10                3                 10                 7                 10                20
Good Evening        14           20            7            4            5            16                16                 12                4                 2                 8                 6                  9                 10                17
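Since each of the 50 participants ranked every video once per attribute, each row and each column of an attribute's five count columns should total 50. A minimal sanity check, sketched here for the Visual columns only, with the counts transcribed by hand from the table above (in the actual workflow they would come from the SHINeeRanking data frame; visual_counts and n_participants are illustrative names):

```r
# Visual counts transcribed from the SHINeeRanking table
# (rows = music videos, columns = rank place)
visual_counts <- matrix(
  c( 0,  0, 14, 21, 15,
     8, 12, 11, 11,  8,
    13, 12,  4, 10, 11,
    15,  6, 14,  4, 11,
    14, 20,  7,  4,  5),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Sherlock", "Everybody", "View", "Tell Me What To Do", "Good Evening"),
    c("first", "second", "third", "fourth", "fifth")
  )
)

n_participants <- 50
# Each participant ranks every video exactly once, so each row totals 50
stopifnot(all(rowSums(visual_counts) == n_participants))
# Each participant fills every place exactly once, so each column totals 50
stopifnot(all(colSums(visual_counts) == n_participants))
```

The same two checks apply unchanged to the Theme/Story and Performance columns.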

Format for Use

The SHINeeRanking data frame was split by attribute, and each part was converted to a numeric 5 x 5 matrix (music videos by rank place) for analysis.

# Visual Ranking
visual <- SHINeeRanking[, 2:6]
row.names(visual) <- SHINeeRanking$mvs
colnames(visual) <- c("first", "second", "third", "fourth", "fifth")
# Force numeric counts: data.matrix() would turn text columns into factor codes
visual[] <- lapply(visual, as.numeric)
visualm <- as.matrix(visual)

# Theme/Story Ranking
theme_story <- SHINeeRanking[, 7:11]
row.names(theme_story) <- SHINeeRanking$mvs
colnames(theme_story) <- c("first", "second", "third", "fourth", "fifth")
theme_story[] <- lapply(theme_story, as.numeric)
theme_storym <- as.matrix(theme_story)

# Performance Ranking
performance <- SHINeeRanking[, 12:16]
row.names(performance) <- SHINeeRanking$mvs
colnames(performance) <- c("first", "second", "third", "fourth", "fifth")
performance[] <- lapply(performance, as.numeric)
performancem <- as.matrix(performance)

Results

Statistics

To examine how strongly each attribute's rankings depend on the music video, I used Pearson's chi-squared test of independence on each attribute's contingency table (music videos as rows, rank places as columns). The test evaluates whether there is a significant association between these two categorical variables.

  • Null hypothesis (H0): the row variable (music video) and the column variable (rank) of the contingency table are independent.
  • Alternative hypothesis (H1): the row and column variables are dependent.
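To show what the test computes, here is a minimal R sketch that rebuilds Pearson's statistic by hand for the Visual table, assuming the counts transcribed from the SHINeeRanking table (the names O, E, and dof are illustrative). Under H0, each cell's expected count is its row total times its column total divided by the grand total; the statistic sums (observed - expected)^2 / expected over all cells, with (5 - 1) x (5 - 1) = 16 degrees of freedom.

```r
# Visual rank counts transcribed from the SHINeeRanking table
# (rows: Sherlock, Everybody, View, Tell Me What To Do, Good Evening;
#  columns: first to fifth place)
O <- matrix(c( 0,  0, 14, 21, 15,
               8, 12, 11, 11,  8,
              13, 12,  4, 10, 11,
              15,  6, 14,  4, 11,
              14, 20,  7,  4,  5), nrow = 5, byrow = TRUE)

# Expected counts under H0: row total x column total / grand total
E <- outer(rowSums(O), colSums(O)) / sum(O)

# Pearson's chi-squared statistic, degrees of freedom, and upper-tail p-value
x_squared <- sum((O - E)^2 / E)
dof <- (nrow(O) - 1) * (ncol(O) - 1)
p_value <- pchisq(x_squared, dof, lower.tail = FALSE)

# Cross-check against R's built-in test (no continuity correction is applied
# to tables larger than 2 x 2)
builtin <- chisq.test(O)
stopifnot(isTRUE(all.equal(x_squared, unname(builtin$statistic))))

c(x_squared = x_squared, dof = dof, p_value = p_value)
```

Because every row and column totals 50 out of 250, every expected count is 10, so here the statistic reduces to the sum of (O - 10)^2 / 10 over the 25 cells.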

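For visualization, corrplot() was used to plot the Pearson residuals from each test. Cells where a music video received a rank more often than expected under independence (positive residuals) are displayed in white, and cells where it received that rank less often than expected (negative residuals) are displayed in black. The size of each circle grows with the magnitude of the residual; note that these are residuals, not correlation coefficients.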
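The values plotted by corrplot() are the Pearson residuals that chisq.test() returns, (observed - expected) / sqrt(expected). The following minimal R sketch, assuming the Visual counts transcribed from the SHINeeRanking table (manual_resid is an illustrative name), recomputes them by hand and confirms they match chi_visual$residuals. The sign of each residual indicates whether a video received that rank more or less often than independence predicts.

```r
# Visual rank counts transcribed from the SHINeeRanking table
visualm <- matrix(c( 0,  0, 14, 21, 15,
                     8, 12, 11, 11,  8,
                    13, 12,  4, 10, 11,
                    15,  6, 14,  4, 11,
                    14, 20,  7,  4,  5),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(c("Sherlock", "Everybody", "View",
                                    "Tell Me What To Do", "Good Evening"),
                                  c("first", "second", "third", "fourth", "fifth")))
chi_visual <- chisq.test(visualm)

# Pearson residual for each cell: (observed - expected) / sqrt(expected)
manual_resid <- (visualm - chi_visual$expected) / sqrt(chi_visual$expected)
stopifnot(isTRUE(all.equal(manual_resid, chi_visual$residuals)))

# Positive: the video got that rank more often than independence predicts;
# negative: less often
round(chi_visual$residuals, 2)
```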

library(corrplot)

Visual

For Visual, the test indicates that music video and rank are significantly associated (X-squared = 70.6, df = 16, p-value = 7.832e-09 < 0.05). The plot below shows the most contributing cells.

Tell Me What To Do (15 participants) and Good Evening (14) were most often ranked first for Visual, and Good Evening was also ranked second by 20 participants. Sherlock, by contrast, was never ranked first or second for Visual.

chi_visual<-chisq.test(visualm)
chi_visual
## 
##  Pearson's Chi-squared test
## 
## data:  visualm
## X-squared = 70.6, df = 16, p-value = 7.832e-09
corrplot(chi_visual$residuals, col = c("black", "white"), bg = "lightblue", is.corr = FALSE, sig.level = .05)
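The phrase "most contributing cells" can be made precise: each cell's share of the statistic is its squared residual divided by X-squared. A minimal R sketch for Visual, again assuming the counts transcribed from the SHINeeRanking table (contrib and contrib_df are illustrative names):

```r
# Visual rank counts transcribed from the SHINeeRanking table
visualm <- matrix(c( 0,  0, 14, 21, 15,
                     8, 12, 11, 11,  8,
                    13, 12,  4, 10, 11,
                    15,  6, 14,  4, 11,
                    14, 20,  7,  4,  5),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(c("Sherlock", "Everybody", "View",
                                    "Tell Me What To Do", "Good Evening"),
                                  c("first", "second", "third", "fourth", "fifth")))
chi_visual <- chisq.test(visualm)

# Each cell's share of the X-squared statistic, in percent
contrib <- 100 * chi_visual$residuals^2 / unname(chi_visual$statistic)

# Long format (one row per cell), sorted from the largest contribution down
contrib_df <- as.data.frame(as.table(contrib))
names(contrib_df) <- c("mv", "rank", "percent")
contrib_df <- contrib_df[order(contrib_df$percent, decreasing = TRUE), ]
head(contrib_df, 5)
```

The shares sum to 100%, and the largest single share is Sherlock in fourth place (121 / 706, about 17%), which fits the observation that Sherlock was rarely ranked near the top for Visual.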


Theme/Story

Next, Theme/Story also shows a significant association between music video and rank (X-squared = 58.8, df = 16, p-value = 8.326e-07 < 0.05). The plot below shows the most contributing cells. Good Evening was most strongly associated with first place for theme or story (ranked first by 16 participants), while Everybody was frequently ranked fifth (by 24 participants).

chi_theme<-chisq.test(theme_storym)
chi_theme
## 
##  Pearson's Chi-squared test
## 
## data:  theme_storym
## X-squared = 58.8, df = 16, p-value = 8.326e-07
corrplot(chi_theme$residuals, col = c("black", "white"), bg = "lightblue", is.corr = FALSE, sig.level = .05)


Performance

Lastly, Performance is also significantly associated with music video (X-squared = 53.6, df = 16, p-value = 6.026e-06 < 0.05). The plot below shows the most contributing cells. First place for Performance was most associated with Sherlock (17 participants) and Everybody (16), whereas Tell Me What To Do (20) and Good Evening (17) were commonly ranked fifth.

chi_pref<-chisq.test(performancem)
chi_pref
## 
##  Pearson's Chi-squared test
## 
## data:  performancem
## X-squared = 53.6, df = 16, p-value = 6.026e-06
corrplot(chi_pref$residuals, col = c("black", "white"), bg = "lightblue", is.corr = FALSE, sig.level = .05)


Conclusion

In conclusion, View was the one music video not frequently ranked as outstanding on any of the three attributes. Its Visual ranks were spread fairly evenly across all five places, it was most often ranked second or third for Theme/Story, and it was most often ranked third for Performance. This suggests that fans see View as well balanced across Visual, Theme/Story, and Performance, with no attribute standing out over the others. By contrast, Good Evening was most strongly associated with top ranks for both Visual and Theme/Story, and Sherlock with top ranks for Performance.
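One possible way to make "well balanced" concrete, which is not the method used in this analysis, is to compare each video's rank counts with a perfectly even spread of 10 per place (50 participants / 5 places) using a goodness-of-fit style statistic, then add the three attributes together. A minimal R sketch, assuming the counts transcribed from the SHINeeRanking table; the helpers to_matrix and unevenness and the Total column are hypothetical:

```r
mvs <- c("Sherlock", "Everybody", "View", "Tell Me What To Do", "Good Evening")
# Hypothetical helper: 25 counts (row by row) -> 5 x 5 matrix with video names
to_matrix <- function(x) matrix(x, nrow = 5, byrow = TRUE, dimnames = list(mvs, NULL))

# Rank counts (columns: first to fifth) transcribed from the SHINeeRanking table
visual <- to_matrix(c( 0,  0, 14, 21, 15,
                       8, 12, 11, 11,  8,
                      13, 12,  4, 10, 11,
                      15,  6, 14,  4, 11,
                      14, 20,  7,  4,  5))
theme_story <- to_matrix(c(10,  7, 14, 13,  6,
                            7,  4,  2, 13, 24,
                            7, 14, 13,  8,  8,
                           10,  9,  9, 12, 10,
                           16, 16, 12,  4,  2))
performance <- to_matrix(c(17, 15,  7,  9,  2,
                           16,  9, 10, 12,  3,
                            6, 10, 17,  9,  8,
                            3, 10,  7, 10, 20,
                            8,  6,  9, 10, 17))

# Hypothetical helper: for each video (row), sum (count - mean)^2 / mean over
# the five places; the mean is 10 here, and 0 means perfectly even
unevenness <- function(m) apply(m, 1, function(row) sum((row - mean(row))^2 / mean(row)))

scores <- cbind(Visual = unevenness(visual),
                Theme_Story = unevenness(theme_story),
                Performance = unevenness(performance))
scores <- cbind(scores, Total = rowSums(scores))

# Most evenly ranked video first
round(scores[order(scores[, "Total"]), ], 1)
```

On these counts View has the lowest total (16.2), which agrees with the conclusion, although Everybody is more even than View on Visual alone and Tell Me What To Do is more even on Theme/Story alone.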