February 12, 2024

Introduction

  • This analysis explores the 10,000 Apple Music Tracks dataset, which records 24 attributes for each track, including artist name, track name, release date, primary genre, price, and duration. The goal is to describe how the tracks are distributed across genres and how attributes such as duration, release date, and price relate to one another.

  • Throughout this analysis, we’ll use R, with ggplot2 for visualization, and statistical modeling techniques to uncover patterns and relationships within the dataset.

Setup

To conduct our analysis, we’ll use the R programming language along with packages for data manipulation, visualization, and statistics. The key components are:

  • ggplot2 for creating visualizations that illustrate trends and patterns in the dataset.

  • dplyr for data manipulation tasks such as filtering, summarizing, and arranging data.

  • stats for statistical analysis and modeling.

Loading and Preparing Data

Before diving into the analysis, we need to load the dataset and perform necessary data preparation steps such as cleaning and organizing the data.

## 'data.frame':    10000 obs. of  24 variables:
##  $ artistId              : int  46087 20044 486597 156987 46087 20044 889327 4488522 138226712 20044 ...
##  $ artistName            : chr  "Erick Sermon" "Madonna" "Journey" "Jason Mraz" ...
##  $ collectionCensoredName: chr  "Music" "Music" "Greatest Hits (2024 Remaster)" "We Sing. We Dance. We Steal Things" ...
##  $ collectionId          : int  298321651 80815197 169003304 277635758 298429528 329064696 155658405 545398133 486040153 329043011 ...
##  $ collectionName        : chr  "Music" "Music" "Greatest Hits (2024 Remaster)" "We Sing. We Dance. We Steal Things" ...
##  $ collectionPrice       : num  9.99 9.99 10.99 11.99 9.99 ...
##  $ contentAdvisoryRating : chr  "Explicit" "" "" "" ...
##  $ country               : chr  "USA" "USA" "USA" "USA" ...
##  $ currency              : chr  "USD" "USD" "USD" "USD" ...
##  $ discCount             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ discNumber            : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ isStreamable          : chr  "True" "True" "True" "True" ...
##  $ kind                  : chr  "song" "song" "song" "song" ...
##  $ previewUrl            : chr  "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview125/v4/64/61/b5/6461b58b-c65e-a48b-30fe-1c73224277"| __truncated__ "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview115/v4/f7/3a/72/f73a728e-5e89-8c8d-69f2-578e6e2d18"| __truncated__ "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview126/v4/5c/72/97/5c72974f-6022-f760-ad82-35964fb636"| __truncated__ "https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview116/v4/31/eb/0d/31eb0dab-85d5-84db-009c-14b17db984"| __truncated__ ...
##  $ primaryGenreName      : chr  "Hip-Hop/Rap" "Pop" "Rock" "Pop" ...
##  $ releaseDate           : chr  "2001-08-27T12:00:00Z" "2000-08-21T07:00:00Z" "1981-06-03T07:00:00Z" "2008-02-12T08:00:00Z" ...
##  $ trackCensoredName     : chr  "Music (feat. Marvin Gaye)" "Music" "Don't Stop Believin' (2024 Remaster)" "I'm Yours" ...
##  $ trackCount            : int  16 10 16 12 16 68 13 15 11 37 ...
##  $ trackExplicitness     : chr  "explicit" "notExplicit" "notExplicit" "notExplicit" ...
##  $ trackId               : int  298321904 80815173 169003415 277635828 298429596 329064769 155658510 545398139 486040194 329043268 ...
##  $ trackName             : chr  "Music" "Music" "Don't Stop Believin' (2024 Remaster)" "I'm Yours" ...
##  $ trackNumber           : int  4 1 2 2 4 59 3 4 2 2 ...
##  $ trackPrice            : num  1.29 1.29 1.29 1.29 1.29 1.99 1.29 1.29 1.29 1.29 ...
##  $ trackTimeMillis       : int  223133 225973 250835 242947 223133 286687 218093 242721 277040 225813 ...
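
The structure listing above is the kind of output `str()` prints for a data frame. The loading code is not shown in this report, so the following is only a minimal sketch: the file name is hypothetical, and a two-row stand-in file is written first so the example runs without the real dataset.

```r
# Hypothetical file name; the report does not state where the data lives.
csv_path <- file.path(tempdir(), "apple_music_tracks_sample.csv")

# Write a two-row stand-in so the sketch runs without the real file.
sample_rows <- data.frame(
  artistName       = c("Madonna", "Journey"),
  primaryGenreName = c("Pop", "Rock"),
  releaseDate      = c("2000-08-21T07:00:00Z", "1981-06-03T07:00:00Z"),
  trackPrice       = c(1.29, 1.29),
  trackTimeMillis  = c(225973L, 250835L)
)
write.csv(sample_rows, csv_path, row.names = FALSE)

# Read text columns as character (not factor), matching the chr columns above.
songs_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Print column names, types, and the first few values of each column.
str(songs_data)
```

With the real file, only the `read.csv()` and `str()` calls would be needed.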

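Data Cleaning Steps

After cleaning, 9,919 of the original 10,000 observations remain. The summary below describes the cleaned data. Note that collectionPrice and trackPrice still have a minimum of -1, which is not a valid price and likely marks a missing value.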

##     artistId          artistName        collectionCensoredName
##  Min.   :1.196e+04   Length:9919        Length:9919           
##  1st Qu.:4.689e+05   Class :character   Class :character      
##  Median :6.766e+06   Mode  :character   Mode  :character      
##  Mean   :1.688e+08                                            
##  3rd Qu.:2.756e+08                                            
##  Max.   :1.669e+09                                            
##   collectionId       collectionName     collectionPrice  contentAdvisoryRating
##  Min.   :9.529e+05   Length:9919        Min.   : -1.00   Length:9919          
##  1st Qu.:2.598e+08   Class :character   1st Qu.:  9.99   Class :character     
##  Median :4.201e+08   Mode  :character   Median : 10.99   Mode  :character     
##  Mean   :5.837e+08                      Mean   : 11.05                        
##  3rd Qu.:9.096e+08                      3rd Qu.: 12.99                        
##  Max.   :1.728e+09                      Max.   :149.99                        
##    country            currency           discCount        discNumber    
##  Length:9919        Length:9919        Min.   : 1.000   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.: 1.000   1st Qu.: 1.000  
##  Mode  :character   Mode  :character   Median : 1.000   Median : 1.000  
##                                        Mean   : 1.094   Mean   : 1.041  
##                                        3rd Qu.: 1.000   3rd Qu.: 1.000  
##                                        Max.   :28.000   Max.   :14.000  
##  isStreamable           kind            previewUrl        primaryGenreName  
##  Length:9919        Length:9919        Length:9919        Length:9919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  releaseDate        trackCensoredName    trackCount     trackExplicitness 
##  Length:9919        Length:9919        Min.   :  1.00   Length:9919       
##  Class :character   Class :character   1st Qu.: 12.00   Class :character  
##  Mode  :character   Mode  :character   Median : 14.00   Mode  :character  
##                                        Mean   : 14.66                     
##                                        3rd Qu.: 17.00                     
##                                        Max.   :100.00                     
##     trackId           trackName          trackNumber       trackPrice    
##  Min.   :9.526e+05   Length:9919        Min.   : 1.000   Min.   :-1.000  
##  1st Qu.:2.598e+08   Class :character   1st Qu.: 2.000   1st Qu.: 1.290  
##  Median :4.201e+08   Mode  :character   Median : 6.000   Median : 1.290  
##  Mean   :5.837e+08                      Mean   : 6.878   Mean   : 1.196  
##  3rd Qu.:9.096e+08                      3rd Qu.:10.000   3rd Qu.: 1.290  
##  Max.   :1.728e+09                      Max.   :75.000   Max.   : 1.990  
##  trackTimeMillis 
##  Min.   :  8192  
##  1st Qu.:197728  
##  Median :225328  
##  Mean   :232864  
##  3rd Qu.:258867  
##  Max.   :943529
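
The report does not show which cleaning steps produced this summary. As a hedged sketch only, the code below illustrates three common steps on a toy data frame: recoding a negative price as missing (an assumption about what -1 means), dropping exact duplicate rows, and dropping rows that lack fields needed later. The data frame `raw` and its values are invented for illustration.

```r
# Toy stand-in for the raw data; values are illustrative only.
raw <- data.frame(
  trackName       = c("A", "A", "B", "C", NA),
  trackPrice      = c(1.29, 1.29, -1, 0.99, 1.29),
  trackTimeMillis = c(200000L, 200000L, 180000L, NA, 210000L)
)

# Assumption: a negative price is a placeholder for "no price", so mark it NA.
raw$trackPrice[raw$trackPrice < 0] <- NA

# Drop exact duplicate rows (the second "A" row here).
cleaned <- raw[!duplicated(raw), ]

# Drop rows missing the track name or the duration.
cleaned <- cleaned[!is.na(cleaned$trackName) & !is.na(cleaned$trackTimeMillis), ]

# The price summary now reports NA counts instead of a -1 minimum.
summary(cleaned$trackPrice)
```

Recoding the -1 values would change any later statistic that uses the price columns, so it is worth deciding on this step before computing correlations.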

Exploratory Data Analysis

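Let’s begin by counting the tracks in each primary genre. Pop is the largest genre with 2,185 tracks, followed by Country (1,365), Rock (1,325), Alternative (1,107), and Hip-Hop/Rap (1,012). Many of the 56 genres contain only a handful of tracks.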

## 
##   Adult Alternative  Adult Contemporary           Afrobeats         Alternative 
##                   1                   3                   1                1107 
##               Blues                 CCM    Children's Music           Christian 
##                   3                   3                  45                 174 
##           Christmas  Christmas: Classic  Christmas: Country      Christmas: Pop 
##                   1                   9                   6                  80 
##      Christmas: R&B     Christmas: Rock           Classical Classical Crossover 
##                   1                   1                  67                   1 
##              Comedy             Country               Dance         Dirty South 
##                  13                1365                 155                   3 
##      Easy Listening          Electronic   Fitness & Workout                Folk 
##                  11                  62                  14                   5 
##           Hard Rock             Hip-Hop         Hip-Hop/Rap             Holiday 
##                 545                  17                1012                  75 
##          Indie Rock                Jazz               Latin               Metal 
##                   3                   7                  12                 111 
##     Música tropical            Musicals            New Wave      Outlaw Country 
##                  24                  76                   5                   1 
##                 Pop          Pop Latino            Pop/Rock         Psychedelic 
##                2185                  15                   2                   2 
##                Punk            R&B/Soul                 Rap              Reggae 
##                   6                 444                  29                  13 
##                Rock  Rock y Alternativo   Singer/Songwriter           Soft Rock 
##                1325                   2                 156                  16 
##          Soundtrack       Southern Rock Traditional Country       Urbano latino 
##                 688                   1                   1                   6 
##          Video Game               Vocal           Vocal Pop           Worldwide 
##                   1                   2                   1                   5
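
A frequency table like the one above is what `table()` returns for a character column. The sketch below, which uses a short invented vector in place of the primaryGenreName column, shows how the counts could be sorted and converted to percentages so the dominant genres are easier to read.

```r
# Toy stand-in for songs_data$primaryGenreName; values are illustrative only.
genres <- c("Pop", "Rock", "Pop", "Country", "Pop", "Rock")

# Count tracks per genre and sort from most to least frequent.
genre_counts <- sort(table(genres), decreasing = TRUE)

# Show the three most frequent genres.
head(genre_counts, 3)

# Express each genre as a percentage of all tracks.
genre_share <- round(100 * prop.table(genre_counts), 1)
genre_share
```

Sorting also makes it easier to spot overlapping labels in the real table, such as Hip-Hop, Hip-Hop/Rap, and Rap, which may be worth merging before modeling.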

Visualizing Trends

Analyzing Relationships

We can analyze relationships between variables such as track duration, release date, and genre popularity.
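
The plots from this section are not reproduced here. As one possible way to look at the relationship between genre and track duration, the sketch below draws a box plot with ggplot2. The data frame `toy` and the column `durationMin` are invented for illustration; the column names primaryGenreName and trackTimeMillis match the dataset.

```r
library(ggplot2)

# Toy stand-in for songs_data; values are illustrative only.
toy <- data.frame(
  primaryGenreName = rep(c("Pop", "Rock", "Country"), each = 4),
  trackTimeMillis  = c(210000, 225000, 198000, 240000,
                       250000, 262000, 301000, 244000,
                       205000, 215000, 232000, 221000)
)

# Convert milliseconds to minutes so the axis is readable.
toy$durationMin <- toy$trackTimeMillis / 60000

# One box per genre shows the median and spread of track durations.
p <- ggplot(toy, aes(x = primaryGenreName, y = durationMin)) +
  geom_boxplot() +
  labs(x = "Primary genre", y = "Track duration (minutes)")
print(p)
```

With 56 genres in the real data, it would likely help to restrict the plot to the most frequent genres first.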

Apply Statistical Procedures

Hypothesis Testing

Let’s perform a hypothesis test to compare the means of track durations between different primary genres. We’ll use ANOVA (Analysis of Variance) to test whether there are significant differences in mean track durations among the genres.

##                    Df    Sum Sq   Mean Sq F value Pr(>F)    
## primaryGenreName   55 5.295e+12 9.626e+10   25.66 <2e-16 ***
## Residuals        9863 3.700e+13 3.751e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
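
The table above has the layout that `summary()` prints for a one-way `aov()` fit. The sketch below runs the same kind of test on an invented data frame and also computes eta squared, the share of total variance attributed to genre. Applying that formula to the sums of squares above gives 5.295e12 / (5.295e12 + 3.700e13), or about 0.125, so genre accounts for roughly one eighth of the variance in duration.

```r
# Toy stand-in for songs_data; values are illustrative only.
toy <- data.frame(
  primaryGenreName = rep(c("Pop", "Rock", "Country"), each = 4),
  trackTimeMillis  = c(210000, 225000, 198000, 240000,
                       250000, 262000, 301000, 244000,
                       205000, 215000, 232000, 221000)
)

# One-way ANOVA: does mean duration differ across genres?
fit <- aov(trackTimeMillis ~ primaryGenreName, data = toy)
anova_table <- summary(fit)[[1]]
anova_table

# Eta squared: between-genre sum of squares over total sum of squares.
sum_sq <- anova_table[["Sum Sq"]]
eta_sq <- sum_sq[1] / sum(sum_sq)
eta_sq
```

A significant F test only says that at least one genre mean differs. A follow-up such as `TukeyHSD(fit)` would be needed to see which pairs of genres differ.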

Regression Modeling

Next, let’s fit a regression model to examine the relationship between track duration and release date. We’ll use linear regression for this analysis.

## 
## Call:
## lm(formula = trackTimeMillis ~ as.Date(releaseDate), data = songs_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -223399  -34555   -6962   26098  713247 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.385e+05  1.603e+03 148.849  < 2e-16 ***
## as.Date(releaseDate) -4.551e-01  1.173e-01  -3.879 0.000105 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 65250 on 9917 degrees of freedom
## Multiple R-squared:  0.001515,   Adjusted R-squared:  0.001415 
## F-statistic: 15.05 on 1 and 9917 DF,  p-value: 0.0001054
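
Because the predictor is a date, the slope above is measured in milliseconds per day: -0.4551 ms per day is about -166 ms per year, and the R-squared of 0.0015 means release date explains roughly 0.15% of the variance in duration. The sketch below fits the same kind of model on an invented data frame and converts the slope to a per-year figure. The names `toy`, `releaseDay`, and `slope_per_year` are illustrative.

```r
# Toy stand-in for songs_data; values are illustrative only.
toy <- data.frame(
  releaseDate     = c("1981-06-03T07:00:00Z", "2000-08-21T07:00:00Z",
                      "2001-08-27T12:00:00Z", "2008-02-12T08:00:00Z",
                      "2015-05-01T07:00:00Z", "2021-03-19T07:00:00Z"),
  trackTimeMillis = c(250835, 225973, 223133, 242947, 214000, 198000)
)

# Keep only the date part (first 10 characters, YYYY-MM-DD) before parsing.
toy$releaseDay <- as.Date(substr(toy$releaseDate, 1, 10))

# Linear regression of duration on release date (dates count days).
fit <- lm(trackTimeMillis ~ releaseDay, data = toy)

# The slope is in milliseconds per day; rescale to milliseconds per year.
slope_per_day  <- coef(fit)[["releaseDay"]]
slope_per_year <- slope_per_day * 365.25
slope_per_year

# R-squared: share of variance in duration explained by release date.
summary(fit)$r.squared
```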

Correlation Analysis

We can also calculate correlation coefficients to quantify the strength and direction of relationships between variables. Let’s calculate the correlation coefficient between track duration and track price.

## Correlation between track duration and track price: -0.0524848
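
A coefficient of -0.052 indicates almost no linear relationship between duration and price. As a minimal sketch of how such a value can be computed, the code below uses the first seven duration and price values from the structure listing earlier in this report, so its result will differ from the full-data figure. `cor.test()` is added as an optional extra that reports a confidence interval and p-value.

```r
# First seven trackTimeMillis and trackPrice values from the str() listing.
toy <- data.frame(
  trackTimeMillis = c(223133, 225973, 250835, 242947, 223133, 286687, 218093),
  trackPrice      = c(1.29, 1.29, 1.29, 1.29, 1.29, 1.99, 1.29)
)

# Pearson correlation, ignoring any rows with missing values.
r <- cor(toy$trackTimeMillis, toy$trackPrice, use = "complete.obs")
cat("Correlation between track duration and track price:", r, "\n")

# Optional: test whether the correlation differs from zero.
ct <- cor.test(toy$trackTimeMillis, toy$trackPrice)
ct$conf.int
```

If the -1 price placeholders are still present in the data, they enter this calculation as real prices, so the coefficient should be read with that in mind.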

Conclusion

  • The analysis of the 10,000 Apple Music Tracks dataset covers the genre distribution and the relationships between track attributes. Pop, Country, Rock, Alternative, and Hip-Hop/Rap each account for more than 1,000 of the 9,919 tracks retained after cleaning.

  • The statistical procedures point to one clear effect and two weak ones. Mean track duration differs significantly across primary genres (ANOVA, F = 25.66, p < 2e-16). Release date has a statistically significant but very small negative association with duration (R-squared = 0.0015), and the correlation between duration and track price is close to zero (-0.052).

  • Further exploration and analysis can be conducted to delve deeper into specific aspects of the dataset and uncover more insights.