Data Explorer - Ejemplo “Most Streamed Spotify Songs 2024”

R Markdown

spotify <- read.csv("C:\\Users\\karee\\Downloads\\Most Streamed Spotify Songs 2024.csv")

#install.packages("DataExplorer")
library("DataExplorer")
#install.packages("nycflights13")
library(nycflights13)


create_report(spotify)

## 
## 
## processing file: report.rmd

##   |                                             |                                     |   0%  |                                             |.                                    |   2%                                   |                                             |..                                   |   5% [global_options]                  |                                             |...                                  |   7%                                   |                                             |....                                 |  10% [introduce]                       |                                             |....                                 |  12%                                   |                                             |.....                                |  14% [plot_intro]                      |                                             |......                               |  17%                                   |                                             |.......                              |  19% [data_structure]                  |                                             |........                             |  21%                                   |                                             |.........                            |  24% [missing_profile]                 |                                             |..........                           |  26%                                   |                                             |...........                          |  29% [univariate_distribution_header]  |                                             |...........                          |  31%                                   |                                             |............                         |  33% [plot_histogram]                  |                                             |.............                        |  36%                                   |                                             |..............                       |  38% [plot_density]                    |                                             |...............                      |  40%                                   |                                             |................                     |  43% [plot_frequency_bar]              |                                             |.................                    |  45%                                   |                                             |..................                   |  48% [plot_response_bar]               |                                             |..................                   |  50%                                   |                                             |...................                  |  52% [plot_with_bar]                   |                                             |....................                 |  55%                                   |                                             |.....................                |  57% [plot_normal_qq]                  |                                             |......................               |  60%                                   |                                             |.......................              |  62% [plot_response_qq]                |                                             |........................             |  64%                                   |                                             |.........................            |  67% [plot_by_qq]                      |                                             |..........................           |  69%                                   |                                             |..........................           |  71% [correlation_analysis]            |                                             |...........................          |  74%                                   |                                             |............................         |  76% [principal_component_analysis]    |                                             |.............................        |  79%                                   |                                             |..............................       |  81% [bivariate_distribution_header]   |                                             |...............................      |  83%                                   |                                             |................................     |  86% [plot_response_boxplot]           |                                             |.................................    |  88%                                   |                                             |.................................    |  90% [plot_by_boxplot]                 |                                             |..................................   |  93%                                   |                                             |...................................  |  95% [plot_response_scatterplot]       |                                             |.................................... |  98%                                   |                                             |.....................................| 100% [plot_by_scatterplot]

## output file: C:/Users/karee/OneDrive/Documentos/report.knit.md

## "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS "C:\Users\karee\OneDrive\Documentos\report.knit.md" --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc3a284b1e1cda.html --lua-filter "C:\Users\karee\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\karee\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 6 --template "C:\Users\karee\AppData\Local\R\win-library\4.4\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable theme=yeti --mathjax --variable "mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --include-in-header "C:\Users\karee\AppData\Local\Temp\Rtmpsd9PRG\rmarkdown-str3a2877135fa5.html"

## 
## Output created: report.html

introduce(spotify)

##   rows columns discrete_columns continuous_columns all_missing_columns
## 1 4600      29               22                  6                   1
##   total_missing_values complete_rows total_observations memory_usage
## 1                 7941             0             133400      5679272

plot_intro(spotify)

#plot_boxplot(spotify)
plot_missing(spotify)

plot_histogram(spotify)

plot_bar(spotify)

## 22 columns ignored with more than 50 categories.
## Track: 4370 categories
## Album.Name: 4005 categories
## Artist: 2000 categories
## Release.Date: 1562 categories
## ISRC: 4598 categories
## All.Time.Rank: 4577 categories
## Spotify.Streams: 4426 categories
## Spotify.Playlist.Count: 4208 categories
## Spotify.Playlist.Reach: 4479 categories
## YouTube.Views: 4291 categories
## YouTube.Likes: 4284 categories
## TikTok.Posts: 3319 categories
## TikTok.Likes: 3616 categories
## TikTok.Views: 3617 categories
## YouTube.Playlist.Reach: 3459 categories
## AirPlay.Spins: 3268 categories
## SiriusXM.Spins: 690 categories
## Deezer.Playlist.Reach: 3559 categories
## Pandora.Streams: 3492 categories
## Pandora.Track.Stations: 2976 categories
## Soundcloud.Streams: 1266 categories
## Shazam.Counts: 4003 categories

plot_correlation(spotify)

## Warning in dummify(data, maxcat = maxcat): Ignored all discrete features since
## `maxcat` set to 20 categories!

## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_text()`).

#Conclusiones Tras analizar las canciones más reproducidas en Spotify en 2024, se observan correlaciones fuertes entre varias variables, lo que sugiere posibles problemas de multicolinealidad en modelos predictivos. Es crucial revisar estas relaciones antes de modelar para garantizar la estabilidad y precisión del modelo. El análisis exploratorio sugiere que se deben considerar ajustes para evitar sesgos en los resultados futuros.

Data Explorer - Ejemplo “Most Streamed Spotify Songs 2024”

Karen Maldonado - A01384635

2024-08-12

R Markdown