Applying PCA to a Dataset of Fictional Character Personalities

In this post, I am going to apply Principal Component Analysis (PCA) to a dataset of fictional character personalities (213,600 rows and 11 columns).

PCA is a common technique for dimensionality reduction, which you might want to do if you are, say, trying to put together a classification model and you have a dataset with a lot of variables.
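
To make the idea of dimensionality reduction concrete, here is a minimal sketch on made-up data, not the code behind this post. It assumes base R's `prcomp()`; the toy variables `a` to `d` are invented for illustration. Three of the four columns are driven by one hidden factor, so most of the variance should land on the first component.

```r
# Toy data (invented): a, b, c follow one hidden factor, d is independent noise.
set.seed(42)
n <- 200
latent <- rnorm(n)
toy <- data.frame(
  a = latent + rnorm(n, sd = 0.1),
  b = latent + rnorm(n, sd = 0.1),
  c = -latent + rnorm(n, sd = 0.1),
  d = rnorm(n)
)

# Centre and scale each column, then rotate onto the principal components.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of the total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Keep only the first two components as the reduced representation.
reduced <- pca$x[, 1:2]
print(dim(reduced))
```

Four columns go in and two come out, while the first component alone keeps most of the information shared by `a`, `b`, and `c`.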

The dataset contains crowdsourced scores of personality traits for 800 fictional characters from books, movies, and TV shows such as Star Trek, Game of Thrones, Pride and Prejudice, and The Lion King.

I will focus on a TV series I really love: Two and a Half Men. It’s a series full of jokes. My favourite character is Berta, who is a real genius! I love watching her.

Packages loaded for this analysis: tidyverse 1.3.2, tidymodels 1.0.0, plotly, and kableExtra.

## Rows: 213600 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (7): character_code, fictional_work, character_name, gender, spectrum, s...
## dbl (3): mean, ratings, sd
## lgl (1): is_emoji
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
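
The message above reports a tab-delimited file with 7 character, 3 numeric, and 1 logical column. The post does not give the file name or path, so the sketch below writes a two-row stand-in file (values taken from the `str()` output further down) and reads it with base R's `read.delim()`. The original message looks like it came from readr, so treat this as an equivalent rather than the exact call used; `personalities` is a name I picked for illustration.

```r
# Write a small stand-in file so the example runs without the real data.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  paste("character_code", "fictional_work", "character_name", "gender",
        "spectrum", "spectrum_low", "spectrum_high", "is_emoji",
        "mean", "ratings", "sd", sep = "\t"),
  paste("A/4", "Alien", "Ash", "male", "BAP1", "playful", "serious",
        "FALSE", "41.4", "51", "10.9", sep = "\t"),
  paste("A/4", "Alien", "Ash", "male", "BAP2", "shy", "bold",
        "FALSE", "11.1", "63", "27.3", sep = "\t")
), tmp)

# Read the tab-delimited file; text columns stay as character vectors.
personalities <- read.delim(tmp, sep = "\t", stringsAsFactors = FALSE)

# Check that the column types match the specification reported above.
print(sapply(personalities, class))
```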

Data inspection

##  [1] "character_code" "fictional_work" "character_name" "gender"        
##  [5] "spectrum"       "spectrum_low"   "spectrum_high"  "is_emoji"      
##  [9] "mean"           "ratings"        "sd"
## [1] 213600     11
## spec_tbl_df [213,600 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ character_code: chr [1:213600] "A/4" "A/4" "A/4" "A/4" ...
##  $ fictional_work: chr [1:213600] "Alien" "Alien" "Alien" "Alien" ...
##  $ character_name: chr [1:213600] "Ash" "Ash" "Ash" "Ash" ...
##  $ gender        : chr [1:213600] "male" "male" "male" "male" ...
##  $ spectrum      : chr [1:213600] "BAP1" "BAP2" "BAP3" "BAP4" ...
##  $ spectrum_low  : chr [1:213600] "playful" "shy" "cheery" "masculine" ...
##  $ spectrum_high : chr [1:213600] "serious" "bold" "sorrowful" "feminine" ...
##  $ is_emoji      : logi [1:213600] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ mean          : num [1:213600] 41.4 11.1 22.4 -16.9 23.1 3.8 -30.4 -32.8 -24.8 28.9 ...
##  $ ratings       : num [1:213600] 51 63 78 71 72 60 76 74 72 70 ...
##  $ sd            : num [1:213600] 10.9 27.3 14 22.3 25.2 32.9 23.7 20.9 29.1 23.9 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   character_code = col_character(),
##   ..   fictional_work = col_character(),
##   ..   character_name = col_character(),
##   ..   gender = col_character(),
##   ..   spectrum = col_character(),
##   ..   spectrum_low = col_character(),
##   ..   spectrum_high = col_character(),
##   ..   is_emoji = col_logical(),
##   ..   mean = col_double(),
##   ..   ratings = col_double(),
##   ..   sd = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
##  character_code     fictional_work     character_name        gender         
##  Length:213600      Length:213600      Length:213600      Length:213600     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    spectrum         spectrum_low       spectrum_high       is_emoji      
##  Length:213600      Length:213600      Length:213600      Mode :logical  
##  Class :character   Class :character   Class :character   FALSE:188000   
##  Mode  :character   Mode  :character   Mode  :character   TRUE :25600    
##                                                                          
##                                                                          
##                                                                          
##       mean             ratings             sd       
##  Min.   :-49.0000   Min.   :   2.0   Min.   : 0.00  
##  1st Qu.:-17.4000   1st Qu.:  41.0   1st Qu.:21.10  
##  Median : -0.6000   Median : 131.0   Median :25.50  
##  Mean   : -0.3822   Mean   : 206.2   Mean   :24.76  
##  3rd Qu.: 16.7000   3rd Qu.: 292.0   3rd Qu.:28.90  
##  Max.   : 49.4000   Max.   :2459.0   Max.   :45.50
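
The output above has the shape of what `names()`, `dim()`, `str()`, and `summary()` print. Here is a small sketch of those calls on a four-row stand-in built from the Ash rows shown in the `str()` output (`personalities` is an assumed name). The last two lines are a sanity check on the reported counts, assuming every one of the 800 characters is scored on the same set of spectra, which the post does not state explicitly.

```r
# Toy stand-in: the first four rows shown in the str() output above.
personalities <- data.frame(
  character_code = "A/4",
  fictional_work = "Alien",
  character_name = "Ash",
  gender         = "male",
  spectrum       = c("BAP1", "BAP2", "BAP3", "BAP4"),
  spectrum_low   = c("playful", "shy", "cheery", "masculine"),
  spectrum_high  = c("serious", "bold", "sorrowful", "feminine"),
  is_emoji       = FALSE,
  mean           = c(41.4, 11.1, 22.4, -16.9),
  ratings        = c(51, 63, 78, 71),
  sd             = c(10.9, 27.3, 14, 22.3),
  stringsAsFactors = FALSE
)

print(names(personalities))    # column names
print(dim(personalities))      # number of rows and columns
str(personalities)             # type and first values of each column
print(summary(personalities))  # per-column summaries

# Rows per character in the full data, under the equal-coverage assumption.
rows_per_character  <- 213600 / 800   # all spectra
emoji_per_character <- 25600 / 800    # spectra flagged is_emoji = TRUE
print(c(rows_per_character, emoji_per_character))
```

Under that assumption each character would have 267 rows, 32 of them emoji spectra.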

The main fields we’re interested in are spectrum_low, spectrum_high, and mean.

The spectrum fields tell us which trait sits at each end of the spectrum being considered. mean is a score from -50 to +50: a score closer to -50 means the character is more like the spectrum_low trait, and a score closer to +50 means the character is more like the spectrum_high trait. Let’s see the actual results!
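
As a quick illustration of how to read one score, here is a hypothetical helper (`describe_score()` is my own name, not part of the dataset or any package). It picks the trait the score leans toward and expresses the distance from zero as a share of the 50-point half-scale. The example values are the Ash rows from the `str()` output above.

```r
# Hypothetical helper: turn one score into a plain-language reading.
describe_score <- function(mean_score, spectrum_low, spectrum_high) {
  # Negative scores lean toward the low trait; zero or positive toward the high one.
  trait <- ifelse(mean_score < 0, spectrum_low, spectrum_high)
  # 0 = exactly neutral, 1 = as extreme as the scale allows.
  strength <- abs(mean_score) / 50
  sprintf("%s (%.0f%% of the way to the extreme)", trait, 100 * strength)
}

print(describe_score(41.4, "playful", "serious"))
print(describe_score(-16.9, "masculine", "feminine"))
```

So Ash's 41.4 on playful/serious reads as strongly serious, while -16.9 on masculine/feminine is a mild lean toward masculine.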

Let’s look at an example character: Charlie Harper from the Warner TV series Two and a Half Men.

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/1 | Two and Half Men | Charlie Harper | male | BAP1 | playful | serious | FALSE | -36.9 | 114 | 14.8 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP4 | masculine | feminine | FALSE | -35.6 | 128 | 16.6 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP5 | charming | awkward | FALSE | -29.1 | 119 | 23.8 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP6 | lewd | tasteful | FALSE | -16.8 | 115 | 30.6 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP3 | cheery | sorrowful | FALSE | -3.1 | 98 | 29.0 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP7 | intellectual | physical | FALSE | 29.8 | 109 | 26.0 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP8 | strict | lenient | FALSE | 35.3 | 115 | 21.3 |
| THM/1 | Two and Half Men | Charlie Harper | male | BAP2 | shy | bold | FALSE | 44.1 | 116 | 9.5 |

Wow, excellent for a drunk, alcoholic sex addict!
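
Charlie's rows above run from the lowest mean to the highest. Here is a minimal sketch of producing that kind of view in base R, using a few rows copied from this post as toy data (`personalities` is an assumed name). With the tidyverse, dplyr's `filter()` and `arrange()` would do the same job; I am not claiming this is the exact code behind the table.

```r
# Toy data: two Ash rows and three Charlie Harper rows copied from this post.
personalities <- data.frame(
  character_name = c("Ash", "Ash",
                     "Charlie Harper", "Charlie Harper", "Charlie Harper"),
  spectrum_low   = c("playful", "shy", "playful", "shy", "cheery"),
  spectrum_high  = c("serious", "bold", "serious", "bold", "sorrowful"),
  mean           = c(41.4, 11.1, -36.9, 44.1, -3.1),
  stringsAsFactors = FALSE
)

# Keep one character's rows, then sort them from lowest to highest score.
charlie <- subset(personalities, character_name == "Charlie Harper")
charlie <- charlie[order(charlie$mean), ]
print(charlie)
```

Sorting by `mean` puts the strongest spectrum_low leanings at the top and the strongest spectrum_high leanings at the bottom.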

Data inspection

Charlie’s brother: Alan Harper

Let’s look at another example character: Alan Harper from the Warner TV series Two and a Half Men.

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/2 | Two and Half Men | Alan Harper | male | BAP7 | intellectual | physical | FALSE | -32.3 | 106 | 19.7 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP2 | shy | bold | FALSE | -19.5 | 115 | 23.8 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP8 | strict | lenient | FALSE | -6.6 | 122 | 30.3 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP6 | lewd | tasteful | FALSE | -3.2 | 107 | 30.4 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP3 | cheery | sorrowful | FALSE | 9.3 | 132 | 27.0 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP1 | playful | serious | FALSE | 15.9 | 135 | 27.3 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP4 | masculine | feminine | FALSE | 20.0 | 116 | 19.6 |
| THM/2 | Two and Half Men | Alan Harper | male | BAP5 | charming | awkward | FALSE | 35.9 | 96 | 16.7 |

Jake Harper, Charlie Harper’s nephew

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/3 | Two and Half Men | Jake Harper | male | BAP1 | playful | serious | FALSE | -28.6 | 113 | 22.9 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP6 | lewd | tasteful | FALSE | -17.3 | 81 | 26.7 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP4 | masculine | feminine | FALSE | -14.3 | 107 | 21.2 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP3 | cheery | sorrowful | FALSE | -6.0 | 110 | 28.4 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP2 | shy | bold | FALSE | 10.7 | 118 | 27.8 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP7 | intellectual | physical | FALSE | 20.0 | 120 | 27.1 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP5 | charming | awkward | FALSE | 22.7 | 115 | 24.5 |
| THM/3 | Two and Half Men | Jake Harper | male | BAP8 | strict | lenient | FALSE | 36.9 | 114 | 15.6 |

Let’s look at another example character: Berta from the Warner TV series Two and a Half Men.

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/4 | Two and Half Men | Berta | female | BAP1 | playful | serious | FALSE | -19.2 | 112 | 24.6 |
| THM/4 | Two and Half Men | Berta | female | BAP4 | masculine | feminine | FALSE | -17.6 | 104 | 19.8 |
| THM/4 | Two and Half Men | Berta | female | BAP6 | lewd | tasteful | FALSE | -16.3 | 93 | 23.9 |
| THM/4 | Two and Half Men | Berta | female | BAP5 | charming | awkward | FALSE | -1.2 | 119 | 29.9 |
| THM/4 | Two and Half Men | Berta | female | BAP3 | cheery | sorrowful | FALSE | 4.2 | 128 | 28.1 |
| THM/4 | Two and Half Men | Berta | female | BAP7 | intellectual | physical | FALSE | 4.3 | 111 | 31.2 |
| THM/4 | Two and Half Men | Berta | female | BAP8 | strict | lenient | FALSE | 5.8 | 87 | 29.7 |
| THM/4 | Two and Half Men | Berta | female | BAP2 | shy | bold | FALSE | 39.2 | 109 | 15.7 |

The mother: Evelyn Nora Harper

Let’s look at another example character: Evelyn Nora Harper from the Warner TV series Two and a Half Men.

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP5 | charming | awkward | FALSE | -19.5 | 109 | 24.9 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP8 | strict | lenient | FALSE | -19.0 | 111 | 29.3 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP7 | intellectual | physical | FALSE | -12.1 | 101 | 30.4 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP1 | playful | serious | FALSE | -3.3 | 103 | 27.6 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP3 | cheery | sorrowful | FALSE | -2.4 | 99 | 29.7 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP6 | lewd | tasteful | FALSE | 1.1 | 96 | 31.8 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP4 | masculine | feminine | FALSE | 22.7 | 117 | 28.5 |
| THM/5 | Two and Half Men | Evelyn Harper | female | BAP2 | shy | bold | FALSE | 43.0 | 90 | 8.7 |

Judith Harper-Melnick, Alan’s former wife

| character_code | fictional_work | character_name | gender | spectrum | spectrum_low | spectrum_high | is_emoji | mean | ratings | sd |
|---|---|---|---|---|---|---|---|---|---|---|
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP8 | strict | lenient | FALSE | -29.5 | 90 | 19.7 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP7 | intellectual | physical | FALSE | -9.7 | 74 | 26.5 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP5 | charming | awkward | FALSE | 2.9 | 104 | 27.3 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP6 | lewd | tasteful | FALSE | 8.7 | 86 | 24.2 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP3 | cheery | sorrowful | FALSE | 13.2 | 80 | 22.6 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP4 | masculine | feminine | FALSE | 20.0 | 102 | 26.0 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP2 | shy | bold | FALSE | 27.2 | 92 | 18.0 |
| THM/6 | Two and Half Men | Judith Harper-Melnick | female | BAP1 | playful | serious | FALSE | 29.6 | 102 | 21.1 |

Now we will look at each character’s eight strongest traits. I’ve included the trait on the opposite end of the spectrum in parentheses for reference.
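
One way to build such a "strongest traits" list is to rank spectra by the absolute value of mean and label each row with the trait the character leans toward. Here is a minimal base R sketch using Charlie Harper's eight rows from above. The column names `trait`, `opposite`, `strength`, and `label` are my own, and the sketch keeps the top 3 rather than 8 to keep the output short.

```r
# Charlie Harper's eight rows, copied from the table earlier in this post.
charlie <- data.frame(
  spectrum_low  = c("playful", "masculine", "charming", "lewd",
                    "cheery", "intellectual", "strict", "shy"),
  spectrum_high = c("serious", "feminine", "awkward", "tasteful",
                    "sorrowful", "physical", "lenient", "bold"),
  mean          = c(-36.9, -35.6, -29.1, -16.8, -3.1, 29.8, 35.3, 44.1),
  stringsAsFactors = FALSE
)

# The trait the character leans toward, and the one at the other end.
leans_low <- charlie$mean < 0
charlie$trait    <- ifelse(leans_low, charlie$spectrum_low, charlie$spectrum_high)
charlie$opposite <- ifelse(leans_low, charlie$spectrum_high, charlie$spectrum_low)

# Strength is the distance from the neutral score of 0.
charlie$strength <- abs(charlie$mean)
charlie$label    <- sprintf("%s (%s)", charlie$trait, charlie$opposite)

# Strongest first; head() keeps the top n rows (use 8 for the full list).
top <- head(charlie[order(-charlie$strength), c("label", "strength")], 3)
print(top)
```

For Charlie this ranks bold (shy) first, followed by playful (serious) and masculine (feminine).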