data <- read.csv("/home/laur/Desktop/vultology_signals_pcs/full_DB/full_DB_export-Energetic_part.csv", header=T, row.names = 1, sep="\t")
# Fill in any blank cells with zeroes
data[is.na(data)] <- 0
# Column names are the people's names
names(data)
## [1] "R1_Rigid_Posture_Copy" "R2_Face_Centric"
## [3] "R3_Punctuated_Motions" "R4_Vertical_Movements"
## [5] "R5_Subordinate_Fluidity" "F1_Fluid_Posture"
## [7] "F2_Eye_Centric" "F3_Gliding_Motions"
## [9] "F4_Horizontal_Movements" "F5_Subordinate_Rigidity"
## [11] "PF1_Restless_Momentum" "PF2_Toggling_Eyes"
## [13] "PF3_Body_Swaying" "PF4_Casual_Hands"
## [15] "PF5_Alert_Perk_Ups" "RF1_Viscous_Inertia"
## [17] "RF2_Fixed_Gaze" "RF3_Diagonal_Eye_Drifts"
## [19] "RF4_Searching_Scowling" "RF5_Narrow_Head_Zoning"
## [21] "PR1_Head_Pushes" "PR2_Head_Shakes"
## [23] "PR3_Shoulder_Shrugs" "PR4_Fluent_Articulation"
## [25] "PR5_Projecting_Hands" "RR1_Poised_Receding"
## [27] "RR2_Disengaging_Eyes" "RR3_Exerted_Pushes"
## [29] "RR4_Momentum_Halting" "RR5_Meticulous_Hands"
There are really a lot of 0’s in all of the above, more than any other
score actually.
Use first person (Coer de Pirate) as example:
rigid_score_Coer <- sum(data[1,1:5])
fluid_score_Coer <- sum(data[1,6:10])
rigid_fluid_scores <- c(rigid_score_Coer, fluid_score_Coer)
df <- data.frame(name=c("rigid_score","fluid_score"), value=rigid_fluid_scores)
barplot(height=df$value, names=df$name, main="Coer de Pirate")
That is, show that persons tend to be either predominant in Rigid signals or in Fluid signals. First, make a percent, like PctRigid. For example, Coer would be negative (or below 1, because they are fluid, i.e. more fluid than Rigid) e.g, pctRigid = 12/(12+18) and then pctFluid = 18/(12+18)
## [1] 0.4
## [1] 0.6
## Cœur de Pirate Michael Gervais Joan Jett
## 12 32 12
## Ben Stein Billie Joe Armstrong Dean Schneider
## 6 21 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 6.593 10.000 35.000
fluid_scores <- colSums(tdata[6:10, ])
percent_rigid <- rigid_scores/(rigid_scores + fluid_scores)
percent_fluid <- fluid_scores/(rigid_scores + fluid_scores)
hist(percent_fluid) ### Clear bimodality here
hist(percent_rigid) ### Clear bimoadality here
summary(percent_rigid)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.1875 0.3704 0.4685 0.7838 1.0000 553
The Summary stats and Boxplots clearly show that many entries are blank (no data for many persons in the database). It is better, then, to elimnate samples who have a total of 0 for their energetic signal total points. That is not realistic and therefore represents missing data. So remove them. Let nz be a cut of the dataframe for only non zero-sum people:
nz <- tdata[ ,colSums(tdata)!=0]
# Now redo stats for the nonzero cut
rigid_scores <- colSums(nz[1:5,])
hist(rigid_scores)
summary(rigid_scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 10.00 13.77 23.50 35.00
boxplot(rigid_scores) #that looks a lot better
fluid_scores <- colSums(nz[6:10, ])
#
percent_rigid <- rigid_scores/(rigid_scores + fluid_scores)
percent_fluid <- fluid_scores/(rigid_scores + fluid_scores)
hist(percent_fluid) ### still look good
hist(percent_rigid) ### still look good
summary(percent_rigid)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.1875 0.3704 0.4685 0.7838 1.0000 1
boxplot(fluid_scores)
## [1] "R1_Rigid_Posture_Copy" "R2_Face_Centric"
## [3] "R3_Punctuated_Motions" "R4_Vertical_Movements"
## [5] "R5_Subordinate_Fluidity" "F1_Fluid_Posture"
## [7] "F2_Eye_Centric" "F3_Gliding_Motions"
## [9] "F4_Horizontal_Movements" "F5_Subordinate_Rigidity"
## [11] "PF1_Restless_Momentum" "PF2_Toggling_Eyes"
## [13] "PF3_Body_Swaying" "PF4_Casual_Hands"
## [15] "PF5_Alert_Perk_Ups" "RF1_Viscous_Inertia"
## [17] "RF2_Fixed_Gaze" "RF3_Diagonal_Eye_Drifts"
## [19] "RF4_Searching_Scowling" "RF5_Narrow_Head_Zoning"
## [21] "PR1_Head_Pushes" "PR2_Head_Shakes"
## [23] "PR3_Shoulder_Shrugs" "PR4_Fluent_Articulation"
## [25] "PR5_Projecting_Hands" "RR1_Poised_Receding"
## [27] "RR2_Disengaging_Eyes" "RR3_Exerted_Pushes"
## [29] "RR4_Momentum_Halting" "RR5_Meticulous_Hands"
As we can see in these histograms, people are scored at either 0, 2, 4, or 7 for these signals (none, lo, med, hi?). Now, let’s see the co-occurrence patterns of people’s scores in each of the signals.
Paired bar plots for each person for their rigid and fluid scores
might be insightful. First for a small sample:
row.names(data) <- gsub(" ", "_", row.names(data))
#Add categorical variable for if signals or rigid or fluid
data$RigidScore <- rowSums(data[,1:5])
data$FluidScore <- rowSums(data[,6:10])
#Reshape to long form
#reshape(data, direction="long", varying = list(names(data)[31:32]) )
nz.pca <- prcomp(nz)
library(ggfortify)
nz.pca.plot <- autoplot(nz.pca, data=nz)
nz.pca.plot
variety <- c(rep("rigid",5), rep("fluid",5), rep("PF",5), rep("RF",5), rep("PR",5), rep("RR",5) )
nzvar <- cbind.data.frame(nz, variety)
pca_res <- prcomp(nz, scale=TRUE)
p <- autoplot(pca_res, data=nzvar, color='variety', label=TRUE, shape=TRUE)
p
# Looks good.
# TODO: Next do, one with only Rigid and Fluid, and also one with only PR, RR, PF, and RF.