
Introduction

In this project we analyse the trends and patterns in the sentiment score (mood) of the lyrics of I-dle, formerly known as (G)I-dle, a South Korean girl group formed by Cube Entertainment in 2018. The project data is an Excel spreadsheet containing all of their timestamped lyrics, taken from Lrclib and Lyricstranslation. I collected the dataset manually and cleaned it to ensure that all lyrics are in English and that no lines of lyrics are missing.

Time Series

I started by installing and loading all the packages needed into RStudio. This project uses the readxl, syuzhet, dplyr, prophet and zoo packages.

library(readxl)
library(dplyr)
library(syuzhet)
library(prophet)
library(zoo)

I imported the Excel data with the timestamped lyrics into R and stored it as a data frame using the read_excel function.

Timestamp<-read_excel("Data/I-dle_Timestamp_Lyrics.xlsx")

Empty rows need to be removed, so I kept only the rows where the Lyrics column is neither missing nor an empty string.

Timestamp<-Timestamp%>%
    filter(!is.na(Lyrics)&Lyrics!="")
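As a small illustration of what this condition keeps and drops, here is a sketch on a made-up data frame (the `toy` data and its values are hypothetical, and base R subsetting stands in for the dplyr call so it runs without extra packages). It also shows a stricter variant, which is my own assumption about what might be useful, for lines that contain only spaces.

```r
# Toy data standing in for the spreadsheet (hypothetical values).
toy <- data.frame(
  Year   = c(2018, 2018, 2019, 2019),
  Lyrics = c("I love it", NA, "", "   "),
  stringsAsFactors = FALSE
)

# Same condition as the filter() call, written with base R subsetting.
kept <- toy[!is.na(toy$Lyrics) & toy$Lyrics != "", ]

# Stricter variant: also drop lines that contain only whitespace.
kept_strict <- toy[!is.na(toy$Lyrics) & trimws(toy$Lyrics) != "", ]

nrow(kept)         # the whitespace-only row survives the original condition
nrow(kept_strict)  # only the real lyric line remains
```

Whether the stricter check matters depends on how the spreadsheet was cleaned; if no cell holds stray spaces, both versions give the same result.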

Next, I used the get_sentiment function to calculate a sentiment score (mood) for each line of lyrics. I chose the “afinn” method, which scores text using the AFINN lexicon, a word list in which each word carries an integer rating from -5 (most negative) to 5 (most positive). It ran through the large dataset much faster than the “bing” method, which I had experimented with previously.

Timestamp$sentiment<-get_sentiment(Timestamp$Lyrics,method="afinn")
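To give a feel for how a lexicon method turns a line of lyrics into a number, here is a rough sketch. It is not syuzhet's actual implementation: `toy_lexicon` and `score_line` are hypothetical names, and the four word scores are only illustrative of the AFINN style.

```r
# Hypothetical mini-lexicon in the AFINN style (integer word scores).
# These entries are for illustration only.
toy_lexicon <- c(love = 3, happy = 3, cry = -1, hate = -3)

score_line <- function(line, lexicon) {
  # Lower-case, replace punctuation with spaces, split on whitespace.
  words <- strsplit(gsub("[^a-z' ]", " ", tolower(line)), "\\s+")[[1]]
  # Words missing from the lexicon give NA and are ignored in the sum.
  sum(lexicon[words], na.rm = TRUE)
}

score_line("I love you, I hate you", toy_lexicon)  # positive and negative words cancel
score_line("Don't cry", toy_lexicon)               # one negative word
score_line("La la la", toy_lexicon)                # no scored words
```

A line made up of words outside the lexicon scores zero, which is worth keeping in mind when many lyric lines are short or repetitive.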

I then grouped the data by year of release and calculated the average sentiment score for each year, ignoring any missing values. The result was sorted in chronological order from 2018 to 2025.

Yearly_Sentiment_Score<-Timestamp %>%
    group_by(Year) %>%
    summarise(Sentiment_Score=mean(sentiment, na.rm=TRUE)) %>%
    arrange(Year)

Yearly_Sentiment_Score
## # A tibble: 8 × 2
##    Year Sentiment_Score
##   <dbl>           <dbl>
## 1  2018           0.396
## 2  2019           0.434
## 3  2020           0.109
## 4  2021           0.593
## 5  2022           0.271
## 6  2023           0.687
## 7  2024           0.317
## 8  2025           0.775
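Each yearly value is a mean over however many lyric lines were released that year, so it can help to look at the line count next to the average. A minimal base R sketch on made-up numbers (the `toy` data frame is hypothetical, and tapply is used in place of the dplyr pipeline so the example runs without extra packages):

```r
# Hypothetical per-line sentiment scores for two years.
toy <- data.frame(
  Year      = c(2018, 2018, 2018, 2019, 2019),
  sentiment = c(1, 0, 2, -1, NA)
)

# Mean per year, ignoring missing scores (mirrors mean(..., na.rm = TRUE)).
means <- tapply(toy$sentiment, toy$Year, mean, na.rm = TRUE)

# Number of non-missing lines that each mean is based on.
counts <- tapply(!is.na(toy$sentiment), toy$Year, sum)

summary_table <- data.frame(
  Year            = as.numeric(names(means)),
  Sentiment_Score = as.vector(means),
  Lines           = as.vector(counts)
)
summary_table
```

A year with only a handful of scored lines will produce a noisier average than a year with several full albums.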

The time series plot shows how the sentiment score (mood) changes over the years across I-dle / (G)I-dle albums.

plot(Yearly_Sentiment_Score$Year, Yearly_Sentiment_Score$Sentiment_Score, 
    type="o", main="Mood Score Over I-dle/(G)I-dle Albums",
    xlab="Year", ylab="Sentiment Score(Mood)")

In the plot, there is a significant drop in the sentiment score from 2019 to 2020 (0.434 to 0.109). During this period the members were participating in Mnet’s Queendom, facing high pressure and public scrutiny. The darker tone in their songs reflects their experience with competition, stress and the need to prove themselves after the success of their debut mini album “I Am”.

From 2021 to 2022, the sentiment score drops again (0.593 to 0.271). In this period, the mix of defiance and vulnerability in their songs reflects personal growth and coping with change. One of the members faced public controversy and had to leave the group, which impacted the group emotionally.

From 2023 to 2024 the sentiment score falls again (0.687 to 0.317). “I Sway” was the last album released before the group’s contract was due to end. The whole album is quite melancholic and wistful, carrying the sadness of the group’s upcoming disbandment.

The sentiment score reached its highest value in 2025 (0.775), with the group announcing a new contract and a comeback under the name I-dle. The new album “We are” leans into confidence, self-worth and emotional resilience, resulting in higher sentiment scores.
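The drops and rebounds described above can be quantified as year-over-year changes. This sketch retypes the rounded yearly averages from the tibble printed earlier; the variable names `years`, `scores` and `change` are my own.

```r
# Yearly average sentiment scores, copied from the printed tibble.
years  <- 2018:2025
scores <- c(0.396, 0.434, 0.109, 0.593, 0.271, 0.687, 0.317, 0.775)

# Difference between each year and the previous one.
change <- diff(scores)
names(change) <- paste(head(years, -1), tail(years, -1), sep = "-")
round(change, 3)

# Periods in which the average score fell.
names(change)[change < 0]
```

From 2019 onward the score alternates between a fall and a rise every year, with each fall between roughly 0.32 and 0.37.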

Data Frame

To set up the time series as a data frame for the prophet function, I assigned the two columns of Yearly_Sentiment_Score to the time column “ds” and the value column “y”, which are the column names prophet expects.

Idle_Sentiment_Score<-data.frame(ds=Yearly_Sentiment_Score$Year, y=Yearly_Sentiment_Score$Sentiment_Score)

Idle_Sentiment_Score
##     ds         y
## 1 2018 0.3957219
## 2 2019 0.4341085
## 3 2020 0.1091954
## 4 2021 0.5928571
## 5 2022 0.2706767
## 6 2023 0.6866953
## 7 2024 0.3174905
## 8 2025 0.7752294

To convert the ds column from plain year numbers into dates, I used the as.yearmon() function from the zoo package, which maps each year to its January (for example, 2018 becomes Jan 2018). The y column already holds the cleaned yearly averages, so it is assigned back unchanged.

Idle_Sentiment_Score$ds<-as.yearmon(Idle_Sentiment_Score$ds)
Idle_Sentiment_Score$y<-Idle_Sentiment_Score$y
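As an alternative sketch that avoids the zoo dependency, the year numbers can be turned into Date values with base R. This is not what the analysis above does; it assumes, as the fitted model's history suggests, that each year should map to 1 January, and the names `years` and `ds` are only for illustration.

```r
# Year numbers as they appear in the Year column.
years <- c(2018, 2019, 2020)

# Build ISO date strings and parse them as Date objects.
ds <- as.Date(paste0(years, "-01-01"))

class(ds)   # a Date vector
format(ds)  # one date per year, on 1 January
```

Either route ends with one timestamp per year, which is what matters for the model fit that follows.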

Meta Prophet

For this analysis, I used Meta’s Prophet to forecast the yearly time series and capture its long-term trend.

Sentiment_score_forecast_model=prophet(Idle_Sentiment_Score)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## n.changepoints greater than number of observations. Using 5
Sentiment_score_forecast_model
## $growth
## [1] "linear"
## 
## $changepoints
## [1] "2019-01-01 GMT" "2020-01-01 GMT" "2021-01-01 GMT" "2022-01-01 GMT"
## [5] "2023-01-01 GMT"
## 
## $n.changepoints
## [1] 5
## 
## $changepoint.range
## [1] 0.8
## 
## $yearly.seasonality
## [1] "auto"
## 
## $weekly.seasonality
## [1] "auto"
## 
## $daily.seasonality
## [1] "auto"
## 
## $holidays
## NULL
## 
## $seasonality.mode
## [1] "additive"
## 
## $seasonality.prior.scale
## [1] 10
## 
## $changepoint.prior.scale
## [1] 0.05
## 
## $holidays.prior.scale
## [1] 10
## 
## $mcmc.samples
## [1] 0
## 
## $interval.width
## [1] 0.8
## 
## $uncertainty.samples
## [1] 1000
## 
## $backend
## [1] "rstan"
## 
## $specified.changepoints
## [1] FALSE
## 
## $start
## [1] "2018-01-01 GMT"
## 
## $y.scale
## [1] 0.7752294
## 
## $logistic.floor
## [1] FALSE
## 
## $t.scale
## [1] 220924800
## 
## $changepoints.t
## [1] 0.1427454 0.2854908 0.4286273 0.5713727 0.7141181
## 
## $seasonalities
## $seasonalities$yearly
## $seasonalities$yearly$period
## [1] 365.25
## 
## $seasonalities$yearly$fourier.order
## [1] 10
## 
## $seasonalities$yearly$prior.scale
## [1] 10
## 
## $seasonalities$yearly$mode
## [1] "additive"
## 
## $seasonalities$yearly$condition.name
## NULL
## 
## 
## 
## $extra_regressors
## list()
## 
## $country_holidays
## NULL
## 
## $stan.fit
## $stan.fit$par
## $stan.fit$par$k
## [1] 0.3279702
## 
## $stan.fit$par$m
## [1] -0.08378159
## 
## $stan.fit$par$delta
## [1] -3.737458e-10 -9.664025e-11 -1.682758e-10 -8.935243e-10 -2.186429e-10
## 
## $stan.fit$par$sigma_obs
## [1] 0.1786658
## 
## $stan.fit$par$beta
##  [1]  0.08910128 -0.33418296  0.17787260 -0.30286295  0.26598403 -0.25066829
##  [7]  0.35310568 -0.17760700  0.43890778 -0.08369031  0.52306066  0.03106733
## [13]  0.60523483  0.16664827  0.68510095  0.32303167  0.76232991  0.50019346
## [19]  0.83659284  0.69810640
## 
## $stan.fit$par$trend
## [1] -0.083781586 -0.036965352  0.009850883  0.056795381  0.103611615
## [6]  0.150427849  0.197244083  0.244188581
## 
## 
## $stan.fit$value
## [1] 9.628267
## 
## $stan.fit$return_code
## [1] 0
## 
## $stan.fit$theta_tilde
##              k           m      delta[1]      delta[2]      delta[3]
## [1,] 0.3279702 -0.08378159 -3.737458e-10 -9.664025e-11 -1.682758e-10
##           delta[4]      delta[5] sigma_obs    beta[1]   beta[2]   beta[3]
## [1,] -8.935243e-10 -2.186429e-10 0.1786658 0.08910128 -0.334183 0.1778726
##        beta[4]  beta[5]    beta[6]   beta[7]   beta[8]   beta[9]    beta[10]
## [1,] -0.302863 0.265984 -0.2506683 0.3531057 -0.177607 0.4389078 -0.08369031
##       beta[11]   beta[12]  beta[13]  beta[14] beta[15]  beta[16]  beta[17]
## [1,] 0.5230607 0.03106733 0.6052348 0.1666483 0.685101 0.3230317 0.7623299
##       beta[18]  beta[19]  beta[20]    trend[1]    trend[2]    trend[3]
## [1,] 0.5001935 0.8365928 0.6981064 -0.08378159 -0.03696535 0.009850883
##        trend[4]  trend[5]  trend[6]  trend[7]  trend[8]
## [1,] 0.05679538 0.1036116 0.1504278 0.1972441 0.2441886
## 
## 
## $params
## $params$k
## [1] 0.3279702
## 
## $params$m
## [1] -0.08378159
## 
## $params$delta
##               [,1]          [,2]          [,3]          [,4]          [,5]
## [1,] -3.737458e-10 -9.664025e-11 -1.682758e-10 -8.935243e-10 -2.186429e-10
## 
## $params$sigma_obs
## [1] 0.1786658
## 
## $params$beta
##            [,1]      [,2]      [,3]      [,4]     [,5]       [,6]      [,7]
## [1,] 0.08910128 -0.334183 0.1778726 -0.302863 0.265984 -0.2506683 0.3531057
##           [,8]      [,9]       [,10]     [,11]      [,12]     [,13]     [,14]
## [1,] -0.177607 0.4389078 -0.08369031 0.5230607 0.03106733 0.6052348 0.1666483
##         [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
## [1,] 0.685101 0.3230317 0.7623299 0.5001935 0.8365928 0.6981064
## 
## $params$trend
## [1] -0.083781586 -0.036965352  0.009850883  0.056795381  0.103611615
## [6]  0.150427849  0.197244083  0.244188581
## 
## 
## $history
##           ds         y floor         t  y_scaled
## 1 2018-01-01 0.3957219     0 0.0000000 0.5104579
## 2 2019-01-01 0.4341085     0 0.1427454 0.5599743
## 3 2020-01-01 0.1091954     0 0.2854908 0.1408556
## 4 2021-01-01 0.5928571     0 0.4286273 0.7647506
## 5 2022-01-01 0.2706767     0 0.5713727 0.3491569
## 6 2023-01-01 0.6866953     0 0.7141181 0.8857963
## 7 2024-01-01 0.3174905     0 0.8568635 0.4095440
## 8 2025-01-01 0.7752294     0 1.0000000 1.0000000
## 
## $history.dates
## [1] "2018-01-01 GMT" "2019-01-01 GMT" "2020-01-01 GMT" "2021-01-01 GMT"
## [5] "2022-01-01 GMT" "2023-01-01 GMT" "2024-01-01 GMT" "2025-01-01 GMT"
## 
## $train.holiday.names
## NULL
## 
## $train.component.cols
##    additive_terms yearly multiplicative_terms
## 1               1      1                    0
## 2               1      1                    0
## 3               1      1                    0
## 4               1      1                    0
## 5               1      1                    0
## 6               1      1                    0
## 7               1      1                    0
## 8               1      1                    0
## 9               1      1                    0
## 10              1      1                    0
## 11              1      1                    0
## 12              1      1                    0
## 13              1      1                    0
## 14              1      1                    0
## 15              1      1                    0
## 16              1      1                    0
## 17              1      1                    0
## 18              1      1                    0
## 19              1      1                    0
## 20              1      1                    0
## 
## $component.modes
## $component.modes$additive
## [1] "yearly"                    "additive_terms"           
## [3] "extra_regressors_additive" "holidays"                 
## 
## $component.modes$multiplicative
## [1] "multiplicative_terms"            "extra_regressors_multiplicative"
## 
## 
## $fit.kwargs
## list()
## 
## attr(,"class")
## [1] "prophet" "list"

The forecast timeline is created with the make_future_dataframe() function, which extends the observed dates by the next 3 years (3 periods at a yearly frequency). I chose to forecast only 3 years ahead because the group’s new contract will end in 2028.

Next_three_years=make_future_dataframe(Sentiment_score_forecast_model,3, freq="year")
Next_three_years
##            ds
## 1  2018-01-01
## 2  2019-01-01
## 3  2020-01-01
## 4  2021-01-01
## 5  2022-01-01
## 6  2023-01-01
## 7  2024-01-01
## 8  2025-01-01
## 9  2026-01-01
## 10 2027-01-01
## 11 2028-01-01
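The three appended rows can be reproduced by hand, which makes it clearer what "3 periods at a yearly frequency" means. A base R sketch, not prophet's internal code; `last_observed` and `future` are hypothetical names.

```r
# Last date in the observed history.
last_observed <- as.Date("2025-01-01")

# Step forward one year at a time; drop the first element,
# which is the last observed date itself.
future <- seq(last_observed, by = "year", length.out = 4)[-1]

format(future)  # the three forecast dates, 2026 to 2028
```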

The predict function then computes the forecast values for this timeline; in this case, it predicts the sentiment scores of their songs for the next 3 years.

Sentiment_score_predictions=predict(Sentiment_score_forecast_model,Next_three_years)
Sentiment_score_predictions
##            ds        trend additive_terms additive_terms_lower
## 1  2018-01-01 -0.064949945      0.4419083            0.4419083
## 2  2019-01-01 -0.028656626      0.3311921            0.3311921
## 3  2020-01-01  0.007636693      0.2187240            0.2187240
## 4  2021-01-01  0.044029446      0.5507387            0.5507387
## 5  2022-01-01  0.080322766      0.4419083            0.4419083
## 6  2023-01-01  0.116616085      0.3311921            0.3311921
## 7  2024-01-01  0.152909404      0.2187240            0.2187240
## 8  2025-01-01  0.189302157      0.5507387            0.5507387
## 9  2026-01-01  0.225595476      0.4419083            0.4419083
## 10 2027-01-01  0.261888795      0.3311921            0.3311921
## 11 2028-01-01  0.298182114      0.2187240            0.2187240
##    additive_terms_upper    yearly yearly_lower yearly_upper
## 1             0.4419083 0.4419083    0.4419083    0.4419083
## 2             0.3311921 0.3311921    0.3311921    0.3311921
## 3             0.2187240 0.2187240    0.2187240    0.2187240
## 4             0.5507387 0.5507387    0.5507387    0.5507387
## 5             0.4419083 0.4419083    0.4419083    0.4419083
## 6             0.3311921 0.3311921    0.3311921    0.3311921
## 7             0.2187240 0.2187240    0.2187240    0.2187240
## 8             0.5507387 0.5507387    0.5507387    0.5507387
## 9             0.4419083 0.4419083    0.4419083    0.4419083
## 10            0.3311921 0.3311921    0.3311921    0.3311921
## 11            0.2187240 0.2187240    0.2187240    0.2187240
##    multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper
## 1                     0                          0                          0
## 2                     0                          0                          0
## 3                     0                          0                          0
## 4                     0                          0                          0
## 5                     0                          0                          0
## 6                     0                          0                          0
## 7                     0                          0                          0
## 8                     0                          0                          0
## 9                     0                          0                          0
## 10                    0                          0                          0
## 11                    0                          0                          0
##    yhat_lower yhat_upper  trend_lower  trend_upper      yhat
## 1  0.19211376  0.5498841 -0.064949945 -0.064949945 0.3769584
## 2  0.12780973  0.4886202 -0.028656626 -0.028656626 0.3025354
## 3  0.04442483  0.3946003  0.007636693  0.007636693 0.2263607
## 4  0.41323838  0.7790636  0.044029446  0.044029446 0.5947682
## 5  0.33950700  0.6950944  0.080322766  0.080322766 0.5222311
## 6  0.26878278  0.6225325  0.116616085  0.116616085 0.4478082
## 7  0.17799950  0.5447289  0.152909404  0.152909404 0.3716334
## 8  0.55556676  0.9265615  0.189302157  0.189302157 0.7400409
## 9  0.49708938  0.8491114  0.225595475  0.225595476 0.6675038
## 10 0.42725327  0.7791596  0.261888793  0.261888797 0.5930809
## 11 0.35234645  0.6993363  0.298182110  0.298182119 0.5169061

The forecast plot shows the predicted sentiment scores for I-dle / (G)I-dle albums over the next 3 years.

plot(Sentiment_score_forecast_model, Sentiment_score_predictions)

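The black dots represent the actual sentiment scores from 2018 to 2025, which fluctuate up and down from year to year. The blue line is the forecast (yhat) generated by Meta Prophet, covering both the fitted years and the next 3 years. According to the model, we would expect the sentiment score to decrease steadily from 0.740 in 2025 to 0.517 in 2028. This decline comes from the yearly seasonal component rather than the trend component: in the prediction table the trend rises every year (0.189 in 2025 to 0.298 in 2028), while the yearly term falls from 0.551 to 0.219. The yearly term repeats in a four-year cycle in the table, which is probably an artifact of fitting yearly seasonality to one observation per year, so the forecast should be read with caution. The blue shaded area around the line represents the model’s uncertainty interval, showing where the actual values are likely to fall. The interval is quite wide, about 0.35 from lower to upper bound, which is large relative to observed scores that range from 0.109 to 0.775. Within the forecast period 2026 to 2028, the predicted score is lowest in 2028, which aligns with the end of the new contract in 2028.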
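The prediction table printed earlier can be used to check how the 2025 to 2028 forecast is built. The sketch below retypes the relevant rows (the data frame name `pred` and the `width` column are my own) and confirms that, in this additive model, yhat is the trend plus the yearly term, and that the uncertainty interval has a similar width in every year.

```r
# Rows for 2025 to 2028, copied from the printed prediction table.
pred <- data.frame(
  year       = 2025:2028,
  trend      = c(0.189302157, 0.225595476, 0.261888795, 0.298182114),
  yearly     = c(0.5507387, 0.4419083, 0.3311921, 0.2187240),
  yhat       = c(0.7400409, 0.6675038, 0.5930809, 0.5169061),
  yhat_lower = c(0.55556676, 0.49708938, 0.42725327, 0.35234645),
  yhat_upper = c(0.9265615, 0.8491114, 0.7791596, 0.6993363)
)

# Additive model: the forecast is trend plus the yearly seasonal term.
recon <- pred$trend + pred$yearly
max(abs(recon - pred$yhat))  # only rounding error remains

# Direction of each component from one year to the next.
diff(pred$trend)  # all positive: the trend component rises
diff(pred$yhat)   # all negative: the forecast itself falls

# Width of the uncertainty interval in each year.
pred$width <- pred$yhat_upper - pred$yhat_lower
round(pred$width, 3)
```

Reading the components separately like this shows that the falling forecast is driven by the seasonal term, not by the fitted trend.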

Conclusion

Analysing the sentiment scores of I-dle / (G)I-dle albums over the years, I noticed that each year’s sentiment score is strongly influenced by the members’ emotions, which are woven into the lyrics of their songs. The forecast indicates an overall decrease in sentiment from 2025 to 2028, which offers insight into possible evolving themes and artistic direction.