Do these “cluster”?

data <- read.csv("/home/laur/Desktop/vultology_signals_pcs/full_DB/full_DB_export-Energetic_part.csv", header=T, row.names = 1, sep="\t")
# Fill in any blank cells with zeroes
data[is.na(data)] <- 0
# Column names are the people's names
names(data)

##  [1] "R1_Rigid_Posture_Copy"   "R2_Face_Centric"        
##  [3] "R3_Punctuated_Motions"   "R4_Vertical_Movements"  
##  [5] "R5_Subordinate_Fluidity" "F1_Fluid_Posture"       
##  [7] "F2_Eye_Centric"          "F3_Gliding_Motions"     
##  [9] "F4_Horizontal_Movements" "F5_Subordinate_Rigidity"
## [11] "PF1_Restless_Momentum"   "PF2_Toggling_Eyes"      
## [13] "PF3_Body_Swaying"        "PF4_Casual_Hands"       
## [15] "PF5_Alert_Perk_Ups"      "RF1_Viscous_Inertia"    
## [17] "RF2_Fixed_Gaze"          "RF3_Diagonal_Eye_Drifts"
## [19] "RF4_Searching_Scowling"  "RF5_Narrow_Head_Zoning" 
## [21] "PR1_Head_Pushes"         "PR2_Head_Shakes"        
## [23] "PR3_Shoulder_Shrugs"     "PR4_Fluent_Articulation"
## [25] "PR5_Projecting_Hands"    "RR1_Poised_Receding"    
## [27] "RR2_Disengaging_Eyes"    "RR3_Exerted_Pushes"     
## [29] "RR4_Momentum_Halting"    "RR5_Meticulous_Hands"

Histograms give us a visual idea of the prevalences of energetic signals among the population in the database

There are really a lot of 0’s in all of the above, more than any other score actually.

Objective: Get a person’s “Rigid score” and “Fluid score”, which would be sum of R1-5, and sum of F1-5, respectively.

Use first person (Coer de Pirate) as example:

rigid_score_Coer <- sum(data[1,1:5])
fluid_score_Coer <- sum(data[1,6:10])
rigid_fluid_scores <- c(rigid_score_Coer, fluid_score_Coer)
df <- data.frame(name=c("rigid_score","fluid_score"), value=rigid_fluid_scores)
barplot(height=df$value, names=df$name, main="Coer de Pirate")

Visualize trends in tendencies for continuous trait distribution (i.e. bimodality)

That is, show that persons tend to be either predominant in Rigid signals or in Fluid signals. First, make a percent, like PctRigid. For example, Coer would be negative (or below 1, because they are fluid, i.e. more fluid than Rigid) e.g, pctRigid = 12/(12+18) and then pctFluid = 18/(12+18)

## [1] 0.4

## [1] 0.6

##       Cœur de Pirate      Michael Gervais            Joan Jett 
##                   12                   32                   12 
##            Ben Stein Billie Joe Armstrong       Dean Schneider 
##                    6                   21                    8

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   6.593  10.000  35.000

fluid_scores <- colSums(tdata[6:10, ])
percent_rigid <- rigid_scores/(rigid_scores + fluid_scores)
percent_fluid <- fluid_scores/(rigid_scores + fluid_scores)
hist(percent_fluid) ### Clear bimodality here

hist(percent_rigid) ### Clear bimoadality here

summary(percent_rigid)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.1875  0.3704  0.4685  0.7838  1.0000     553

The Summary stats and Boxplots clearly show that many entries are blank (no data for many persons in the database). It is better, then, to elimnate samples who have a total of 0 for their energetic signal total points. That is not realistic and therefore represents missing data. So remove them. Let nz be a cut of the dataframe for only non zero-sum people:

nz <- tdata[ ,colSums(tdata)!=0]

# Now redo stats for the nonzero cut
rigid_scores <- colSums(nz[1:5,])
hist(rigid_scores)

summary(rigid_scores)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00   10.00   13.77   23.50   35.00

boxplot(rigid_scores) #that looks a lot better

fluid_scores <- colSums(nz[6:10, ])
#
percent_rigid <- rigid_scores/(rigid_scores + fluid_scores)
percent_fluid <- fluid_scores/(rigid_scores + fluid_scores)
hist(percent_fluid) ### still look good

hist(percent_rigid) ### still look good

summary(percent_rigid)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.1875  0.3704  0.4685  0.7838  1.0000       1

boxplot(fluid_scores)

Now, to see “clusters”, begin by plotting the rigid scores for people in a simple x-y

##  [1] "R1_Rigid_Posture_Copy"   "R2_Face_Centric"        
##  [3] "R3_Punctuated_Motions"   "R4_Vertical_Movements"  
##  [5] "R5_Subordinate_Fluidity" "F1_Fluid_Posture"       
##  [7] "F2_Eye_Centric"          "F3_Gliding_Motions"     
##  [9] "F4_Horizontal_Movements" "F5_Subordinate_Rigidity"
## [11] "PF1_Restless_Momentum"   "PF2_Toggling_Eyes"      
## [13] "PF3_Body_Swaying"        "PF4_Casual_Hands"       
## [15] "PF5_Alert_Perk_Ups"      "RF1_Viscous_Inertia"    
## [17] "RF2_Fixed_Gaze"          "RF3_Diagonal_Eye_Drifts"
## [19] "RF4_Searching_Scowling"  "RF5_Narrow_Head_Zoning" 
## [21] "PR1_Head_Pushes"         "PR2_Head_Shakes"        
## [23] "PR3_Shoulder_Shrugs"     "PR4_Fluent_Articulation"
## [25] "PR5_Projecting_Hands"    "RR1_Poised_Receding"    
## [27] "RR2_Disengaging_Eyes"    "RR3_Exerted_Pushes"     
## [29] "RR4_Momentum_Halting"    "RR5_Meticulous_Hands"

As we can see in these histograms, people are scored at either 0, 2, 4, or 7 for these signals (none, lo, med, hi?). Now, let’s see the co-occurrence patterns of people’s scores in each of the signals.

Paired bar plots for each person for their rigid and fluid scores might be insightful. First for a small sample:

Try whole dataset (not done yet)

row.names(data) <- gsub(" ", "_", row.names(data))

#Add categorical variable for if signals or rigid or fluid
data$RigidScore <- rowSums(data[,1:5])
data$FluidScore <- rowSums(data[,6:10])
#Reshape to long form
#reshape(data, direction="long", varying = list(names(data)[31:32]) )

Principal Component Analysis, to explore co-occurrence of signals

nz.pca <- prcomp(nz)
library(ggfortify)
nz.pca.plot <- autoplot(nz.pca, data=nz)
nz.pca.plot

variety <- c(rep("rigid",5), rep("fluid",5), rep("PF",5), rep("RF",5), rep("PR",5), rep("RR",5) )

nzvar <- cbind.data.frame(nz, variety)

pca_res <- prcomp(nz, scale=TRUE)
p <- autoplot(pca_res, data=nzvar, color='variety', label=TRUE, shape=TRUE)
p

# Looks good. 
# TODO: Next do, one with only Rigid and Fluid, and also one with only PR, RR, PF, and RF.

Energetics_visualisations

laur

2025-06-04

Read in the Energetic part of the Vultology database and get a feeling for the data

Some questions to explore are, what is the distribution of the energetic signals among people in the database?