Simulating song shuffle:
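We can simulate shuffling the playlist by drawing a random sample of songs with sample_n(). We ran three experiments, drawing 10, 30, and 75 songs.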
Experiment 1
experiment1 <- sample_n(playlist_data_clean, 10)
experiment1
## # A tibble: 10 × 3
## genre artist title
## <fct> <chr> <chr>
## 1 Genre 1 - hiphop/rap MF DOOM Doomsday
## 2 Genre 4 - rnb/soul Frank ocean pyramids
## 3 Genre 2 - pop/kpop/Latin Post Malone Goodbyes
## 4 Genre 4 - rnb/soul Lauryn Hill Tell Him
## 5 Genre 1 - hiphop/rap Kendrick Lamar Money Trees
## 6 Genre 1 - hiphop/rap Clipse Ma, I don't love her
## 7 Genre 4 - rnb/soul sza snooze
## 8 Genre 2 - pop/kpop/Latin olivia.R pretty isnt pretty
## 9 Genre 5 - alt/indie/folk Cigarettes after sex Sunsetz
## 10 Genre 1 - hiphop/rap Isaiah Rashad Headshots
experiment1 %>% count(genre)
## # A tibble: 4 × 2
## genre n
## <fct> <int>
## 1 Genre 1 - hiphop/rap 4
## 2 Genre 2 - pop/kpop/Latin 2
## 3 Genre 4 - rnb/soul 3
## 4 Genre 5 - alt/indie/folk 1
experiment1_count <- experiment1 %>% count(genre)
# Experimental probability = genre count / sample size (10 songs)
mutate(experiment1_count, genre_prob1 = n / 10)
## # A tibble: 4 × 3
## genre n genre_prob1
## <fct> <int> <dbl>
## 1 Genre 1 - hiphop/rap 4 0.4
## 2 Genre 2 - pop/kpop/Latin 2 0.2
## 3 Genre 4 - rnb/soul 3 0.3
## 4 Genre 5 - alt/indie/folk 1 0.1
ex1_prob <- mutate(experiment1_count, genre_prob1 = n / 10)
prob1 <- ex1_prob %>% select(genre, genre_prob1)
Experiment 2
experiment2 <- sample_n(playlist_data_clean, 30)
experiment2
## # A tibble: 30 × 3
## genre artist title
## <fct> <chr> <chr>
## 1 Genre 2 - pop/kpop/Latin Alec Benjamin Devil Doesn't Bargain
## 2 Genre 5 - alt/indie/folk Cherry Glazerr Soft Drink
## 3 Genre 4 - rnb/soul Drake Look what You’ve done
## 4 Genre 4 - rnb/soul SZA Kill Bill
## 5 Genre 4 - rnb/soul Queen Naija Medicine
## 6 Genre 6 - country/rock Dolly parton Jolene
## 7 Genre 5 - alt/indie/folk alkaline trio Scars
## 8 Genre 3 - house/ska KAYTRANDA BUS RIDE
## 9 Genre 5 - alt/indie/folk Fleetwood Mac Landslide
## 10 Genre 2 - pop/kpop/Latin Franglish Trop Parler
## # ℹ 20 more rows
experiment2 %>% count(genre)
## # A tibble: 6 × 2
## genre n
## <fct> <int>
## 1 Genre 1 - hiphop/rap 3
## 2 Genre 2 - pop/kpop/Latin 4
## 3 Genre 3 - house/ska 1
## 4 Genre 4 - rnb/soul 11
## 5 Genre 5 - alt/indie/folk 10
## 6 Genre 6 - country/rock 1
experiment2_count <- experiment2 %>% count(genre)
# Experimental probability = genre count / sample size (30 songs)
mutate(experiment2_count, genre_prob2 = n / 30)
## # A tibble: 6 × 3
## genre n genre_prob2
## <fct> <int> <dbl>
## 1 Genre 1 - hiphop/rap 3 0.1
## 2 Genre 2 - pop/kpop/Latin 4 0.133
## 3 Genre 3 - house/ska 1 0.0333
## 4 Genre 4 - rnb/soul 11 0.367
## 5 Genre 5 - alt/indie/folk 10 0.333
## 6 Genre 6 - country/rock 1 0.0333
ex2_prob <- mutate(experiment2_count, genre_prob2 = n / 30)
prob2 <- ex2_prob %>% select(genre, genre_prob2)
Experiment 3
experiment3 <- sample_n(playlist_data_clean, 75)
experiment3
## # A tibble: 75 × 3
## genre artist title
## <fct> <chr> <chr>
## 1 Genre 4 - rnb/soul Summer Walker To summer, from cole
## 2 Genre 5 - alt/indie/folk The Internet Under Control
## 3 Genre 2 - pop/kpop/Latin shinee Replay
## 4 Genre 2 - pop/kpop/Latin ariana grande everytime
## 5 Genre 1 - hiphop/rap Russ Handsomer
## 6 Genre 5 - alt/indie/folk The Shins New Slang
## 7 Genre 5 - alt/indie/folk Nirvana Come As You Are
## 8 Genre 5 - alt/indie/folk Ginger Root Loretta
## 9 Genre 4 - rnb/soul Mariah the Scientist From A Woman
## 10 Genre 4 - rnb/soul Fountains of Wayne Halley's Waitress
## # ℹ 65 more rows
experiment3 %>% count(genre)
## # A tibble: 6 × 2
## genre n
## <fct> <int>
## 1 Genre 1 - hiphop/rap 11
## 2 Genre 2 - pop/kpop/Latin 11
## 3 Genre 3 - house/ska 4
## 4 Genre 4 - rnb/soul 21
## 5 Genre 5 - alt/indie/folk 19
## 6 Genre 6 - country/rock 9
experiment3_count <- experiment3 %>% count(genre)
# Experimental probability = genre count / sample size (75 songs)
mutate(experiment3_count, genre_prob3 = n / 75)
## # A tibble: 6 × 3
## genre n genre_prob3
## <fct> <int> <dbl>
## 1 Genre 1 - hiphop/rap 11 0.147
## 2 Genre 2 - pop/kpop/Latin 11 0.147
## 3 Genre 3 - house/ska 4 0.0533
## 4 Genre 4 - rnb/soul 21 0.28
## 5 Genre 5 - alt/indie/folk 19 0.253
## 6 Genre 6 - country/rock 9 0.12
ex3_prob <- mutate(experiment3_count, genre_prob3 = n / 75)
prob3 <- ex3_prob %>% select(genre, genre_prob3)
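Each experiment above repeats the same three steps: draw songs, count genres, and divide by the sample size. Below is a minimal base-R sketch of wrapping those steps in one function. It assumes a hypothetical vector playlist standing in for playlist_data_clean$genre, rebuilt from the genre counts in the theoretical table (144 songs, labels shortened), and the function name shuffle_probs() is made up for illustration. Because playlist is a factor, genres that never come up get a probability of 0 instead of dropping out, which is what causes the NA values in the joined table later. In the dplyr version, count(genre, .drop = FALSE) should have a similar effect, assuming genre is a factor.

```r
# Hypothetical stand-in for playlist_data_clean$genre: one label per song,
# rebuilt from the genre counts in the theoretical table (144 songs).
genres <- c("hiphop/rap", "pop/kpop/Latin", "house/ska",
            "rnb/soul", "alt/indie/folk", "country/rock")
playlist <- factor(rep(genres, times = c(26, 22, 5, 43, 34, 14)),
                   levels = genres)

# Hypothetical helper: draw n songs without replacement (like sample_n())
# and return each genre's share of the sample. table() on a factor keeps
# every level, so genres that were never drawn show up as 0.
shuffle_probs <- function(playlist, n) {
  drawn <- sample(playlist, size = n)
  prop.table(table(drawn))
}

set.seed(1)  # fixes the random draws so the run can be repeated
exp1 <- shuffle_probs(playlist, 10)
exp2 <- shuffle_probs(playlist, 30)
exp3 <- shuffle_probs(playlist, 75)
round(exp3, 3)
```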
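Next, we combine the theoretical probabilities and the three sets of
experimental probabilities into one table by joining the tables on genre,
one at a time, with full_join(). Because the tables being joined each have
a column named n, full_join() adds suffixes such as .x and .y to tell them
apart. The last step uses select() to keep only the genre and probability
columns.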
full_join(theoretical, ex1_prob, by = "genre")
## # A tibble: 6 × 5
## genre n.x genre_probability n.y genre_prob1
## <fct> <int> <dbl> <int> <dbl>
## 1 Genre 1 - hiphop/rap 26 0.181 4 0.4
## 2 Genre 2 - pop/kpop/Latin 22 0.153 2 0.2
## 3 Genre 3 - house/ska 5 0.0347 NA NA
## 4 Genre 4 - rnb/soul 43 0.299 3 0.3
## 5 Genre 5 - alt/indie/folk 34 0.236 1 0.1
## 6 Genre 6 - country/rock 14 0.0972 NA NA
theory_1 <- full_join(theoretical, ex1_prob, by = "genre")
full_join(theory_1, ex2_prob, by = "genre")
## # A tibble: 6 × 7
## genre n.x genre_probability n.y genre_prob1 n genre_prob2
## <fct> <int> <dbl> <int> <dbl> <int> <dbl>
## 1 Genre 1 - hiphop/… 26 0.181 4 0.4 3 0.1
## 2 Genre 2 - pop/kpo… 22 0.153 2 0.2 4 0.133
## 3 Genre 3 - house/… 5 0.0347 NA NA 1 0.0333
## 4 Genre 4 - rnb/so… 43 0.299 3 0.3 11 0.367
## 5 Genre 5 - alt/ind… 34 0.236 1 0.1 10 0.333
## 6 Genre 6 - country… 14 0.0972 NA NA 1 0.0333
theory_1_2 <- full_join(theory_1, ex2_prob, by = "genre")
full_join(theory_1_2, ex3_prob, by = "genre")
## # A tibble: 6 × 9
## genre n.x genre_probability n.y genre_prob1 n.x.x genre_prob2 n.y.y
## <fct> <int> <dbl> <int> <dbl> <int> <dbl> <int>
## 1 Genre 1 - h… 26 0.181 4 0.4 3 0.1 11
## 2 Genre 2 - p… 22 0.153 2 0.2 4 0.133 11
## 3 Genre 3 - … 5 0.0347 NA NA 1 0.0333 4
## 4 Genre 4 - … 43 0.299 3 0.3 11 0.367 21
## 5 Genre 5 - a… 34 0.236 1 0.1 10 0.333 19
## 6 Genre 6 - c… 14 0.0972 NA NA 1 0.0333 9
## # ℹ 1 more variable: genre_prob3 <dbl>
theory_1_2_3 <- full_join(theory_1_2, ex3_prob, by = "genre")
theory_vs_experiments <- theory_1_2_3 %>%
  select(genre, genre_probability, genre_prob1, genre_prob2, genre_prob3)
theory_vs_experiments
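The three full_join() calls above could also be written as a single step that folds a list of tables together. Below is a base-R sketch using Reduce() with merge(..., all = TRUE), which performs a full (outer) join. The data frames are hypothetical standalone copies of theoretical, prob1, prob2, and prob3, rebuilt from the counts printed above with shortened genre labels. Unlike full_join(), merge() sorts the rows by genre, so the row order differs from the table above. Joining the prob tables (which have already dropped the n column) instead of the ex_prob tables also avoids the n.x, n.y, n.x.x, and n.y.y columns; the real theoretical table still has its own n column, so a select() would still be needed. With the tidyverse, purrr::reduce(list(theoretical, prob1, prob2, prob3), full_join, by = "genre") should do the same job, assuming purrr is installed.

```r
# Hypothetical standalone copies of theoretical, prob1, prob2 and prob3,
# rebuilt from the counts printed above (genre labels shortened).
genres <- c("hiphop/rap", "pop/kpop/Latin", "house/ska",
            "rnb/soul", "alt/indie/folk", "country/rock")
theoretical <- data.frame(genre = genres,
                          genre_probability = c(26, 22, 5, 43, 34, 14) / 144)
# Experiment 1 drew no house/ska or country/rock songs, so they are missing
prob1 <- data.frame(genre = genres[c(1, 2, 4, 5)],
                    genre_prob1 = c(4, 2, 3, 1) / 10)
prob2 <- data.frame(genre = genres,
                    genre_prob2 = c(3, 4, 1, 11, 10, 1) / 30)
prob3 <- data.frame(genre = genres,
                    genre_prob3 = c(11, 11, 4, 21, 19, 9) / 75)

# Full (outer) join of every table on genre in one pass;
# all = TRUE keeps genres missing from some tables and fills them with NA
theory_vs_experiments <- Reduce(
  function(x, y) merge(x, y, by = "genre", all = TRUE),
  list(theoretical, prob1, prob2, prob3)
)
theory_vs_experiments
```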
## # A tibble: 6 × 5
## genre genre_probability genre_prob1 genre_prob2 genre_prob3
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Genre 1 - hiphop/rap 0.181 0.4 0.1 0.147
## 2 Genre 2 - pop/kpop/Latin 0.153 0.2 0.133 0.147
## 3 Genre 3 - house/ska 0.0347 NA 0.0333 0.0533
## 4 Genre 4 - rnb/soul 0.299 0.3 0.367 0.28
## 5 Genre 5 - alt/indie/folk 0.236 0.1 0.333 0.253
## 6 Genre 6 - country/rock 0.0972 NA 0.0333 0.12
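In general, the bigger the sample, the closer the experimental
probabilities get to the theoretical ones. In the 75-song experiment,
every genre is within 0.035 of its theoretical probability, while in the
10-song experiment hip-hop/rap is off by more than 0.2 (0.4 vs. 0.181).
This is a tendency, not a guarantee: rnb/soul happened to land almost
exactly on its theoretical value (0.3 vs. 0.299) in the smallest sample.
The NA values in genre_prob1 mean that no house/ska or country/rock songs
came up in Experiment 1, so their experimental probability there is 0.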
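One way to put a number on "closer" is the largest gap, across all six genres, between an experiment's probabilities and the theoretical ones. The sketch below recomputes that gap from the counts printed above, treating the NA entries in Experiment 1 as 0 songs; the helper name max_gap() is hypothetical.

```r
# Genre counts copied from the tables above, in genre order 1-6
theory_n <- c(26, 22, 5, 43, 34, 14)   # whole playlist (144 songs)
exp1_n   <- c(4, 2, 0, 3, 1, 0)        # NA in the joined table = 0 songs drawn
exp2_n   <- c(3, 4, 1, 11, 10, 1)
exp3_n   <- c(11, 11, 4, 21, 19, 9)

theory_p <- theory_n / sum(theory_n)

# Hypothetical helper: largest absolute difference between a sample's
# genre shares and the theoretical shares
max_gap <- function(counts) max(abs(counts / sum(counts) - theory_p))

gaps <- sapply(list(exp1 = exp1_n, exp2 = exp2_n, exp3 = exp3_n), max_gap)
round(gaps, 3)
#  exp1  exp2  exp3
# 0.219 0.097 0.034
```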
Analyze and synthesize:
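To answer the unit question: yes, the songs you hear on shuffle ARE
representative of the genres on the playlist, and they become more
representative the more songs you play. With only 10 songs, hip-hop/rap
came up 40% of the time, even though it makes up about 18% of the playlist.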
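A single shuffle of each size could simply be lucky or unlucky. A minimal sketch of checking the trend more fairly is to repeat each shuffle size many times and average the largest gap between the sample's genre shares and the playlist's. It assumes the same hypothetical 144-song playlist rebuilt from the genre counts, and the 1,000 repetitions per size and the name one_gap() are arbitrary choices for illustration.

```r
# Hypothetical stand-in for the class playlist (same genre counts as above)
genres <- c("hiphop/rap", "pop/kpop/Latin", "house/ska",
            "rnb/soul", "alt/indie/folk", "country/rock")
playlist <- factor(rep(genres, times = c(26, 22, 5, 43, 34, 14)),
                   levels = genres)
theory_p <- as.numeric(prop.table(table(playlist)))

# Largest gap between one random shuffle's genre shares and the playlist's
one_gap <- function(size) {
  shuffle_p <- as.numeric(prop.table(table(sample(playlist, size))))
  max(abs(shuffle_p - theory_p))
}

set.seed(2024)
sizes <- c(10, 30, 75)
# Average the largest gap over 1000 shuffles of each size
avg_gap <- sapply(sizes, function(size) mean(replicate(1000, one_gap(size))))
names(avg_gap) <- sizes
round(avg_gap, 3)
```

If the conclusion above holds, the averages should shrink as the shuffle size grows.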
Theoretical probability is calculated from the whole data set, whereas
experimental probability is calculated from the songs in an actual
sample. In this project, we found the theoretical probability of each
genre by dividing the number of songs in that genre by the total number
of songs in the playlist (144). We found the experimental probabilities
the same way, but using only the songs drawn in each experiment (10, 30,
or 75).
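Written as formulas, with g standing for a genre and n for the number of songs in a sample (these symbols are just labels chosen for this sketch):

```latex
P_{\text{theoretical}}(g) = \frac{\text{number of genre } g \text{ songs in the playlist}}{144}
\qquad
P_{\text{experimental}}(g) = \frac{\text{number of genre } g \text{ songs in the sample}}{n}
```

For example, for rnb/soul the theoretical probability is 43/144 ≈ 0.299, while in Experiment 3 (n = 75) the experimental probability is 21/75 = 0.28.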
Here is how we used the data science process in this project:
Ask questions: We used the unit question to guide us.
Gather and organize data: We collected songs from everyone in the
class and sorted them by genre.
Model: We used RStudio to organize the data into tables and simulate
shuffles.
Analyze and synthesize: We looked at the results and considered what
they mean for the unit question.
Reflection:
Using a simulation to answer the question helped me get an idea of
what it would be like to play a random selection of songs without
actually playing them: a single line of code did the random selection.
You could use code the same way to simulate rolling dice, or any other
situation with a fixed set of possible outcomes.
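As a sketch of the dice idea, the same count-and-divide approach works for a fair six-sided die. The main difference is that rolls are independent, so the draws are made with replacement (replace = TRUE), whereas a shuffle never plays the same song twice. The function name roll_dice() and the 600 rolls are arbitrary choices for illustration.

```r
# Hypothetical helper: roll a fair six-sided die n times.
# Each roll is independent, so we sample WITH replacement.
roll_dice <- function(n) sample(1:6, size = n, replace = TRUE)

set.seed(42)
rolls <- roll_dice(600)

# Experimental probability of each face; factor() keeps faces that never
# came up as 0. The theoretical probability of each face is 1/6 (about 0.167).
experimental <- prop.table(table(factor(rolls, levels = 1:6)))
round(experimental, 3)
```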
The most difficult part to code for me was using mutate() to add
columns showing probability. It took me a while to get the syntax
right.
The part of my coding I’m most proud of is the table comparing the
theoretical probability with all the experimental probabilities. It was
my own idea, and I was determined to figure it out. It took a lot of
trial and error, but after a lot of editing I got it working.
I would like to learn more about creating nice-looking graphics such
as bar graphs and scatter plots, and how to adjust their aesthetics.
I don’t think I’d do this project any differently, because I learned a
lot along the way and I’m proud of what I learned.