So here’s an example. Like I said, I am trying to compare the STABILITY measure within couples. For instance, one plot of interest would show the STABILITY measure for all the couples, with one partner’s STABILITY on one axis and the other partner’s STABILITY on the other axis. Is there a quick-and-dirty way to do this? Thanks for your help!
Below, I’ve outlined the steps to analyze your data, from importing it and cleaning it up to making the plots you requested. You can copy and paste this code into R to replicate the results; let me know if you have any questions!
Josh
danData <- read.csv("NRI_toydata Josh.csv")
summary(danData)
## STABILITY_Romantic CoupleID
## Min. :-2.000 Min. : 4.0
## 1st Qu.: 1.267 1st Qu.:604.0
## Median : 2.233 Median :643.0
## Mean : 1.973 Mean :577.3
## 3rd Qu.: 2.767 3rd Qu.:675.0
## Max. : 4.000 Max. :900.0
In this section, I create a new variable called partnerNum, set to 1 or 2 according to the order in which each partner first appears in the dataset. It could just as well be some other binary characteristic, such as Male / Female. Either way, you need a variable that identifies the first and second partner (even if, as here, it’s created artificially).
# create new variable partnerNum and set all values to -1 (placeholder)
danData$partnerNum <- -1
# the first row for each CoupleID is partner 1; any later (duplicate) row is partner 2
danData[!duplicated(danData$CoupleID), "partnerNum"] <- 1
danData[duplicated(danData$CoupleID), "partnerNum"] <- 2
head(danData)
## STABILITY_Romantic CoupleID partnerNum
## 1 1.4000000 4 1
## 2 3.2000000 7 1
## 3 0.3333333 4 2
## 4 3.0000000 5 1
## 5 2.4333333 5 2
## 6 1.0000000 8 1
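The duplicated() approach assumes each CoupleID appears at most twice; a third row for the same couple would also get partnerNum 2 and collide with the real partner 2 when casting. Here’s a minimal sketch of an alternative, assuming the same column names: it numbers rows within each couple with base R’s ave() and checks that no couple has more than two rows. The toy data is just the first five rows printed above.

```r
# First five rows of danData as printed above (without partnerNum)
toy <- data.frame(
  STABILITY_Romantic = c(1.4, 3.2, 0.3333333, 3.0, 2.4333333),
  CoupleID = c(4, 7, 4, 5, 5)
)

# Number the rows within each CoupleID in order of appearance: 1, 2, 3, ...
toy$partnerNum <- ave(seq_along(toy$CoupleID), toy$CoupleID, FUN = seq_along)

# Sanity check: every couple should contribute at most two rows
rows_per_couple <- table(toy$CoupleID)
stopifnot(all(rows_per_couple <= 2))

toy
```

If the check fails on the real data, some couple has extra rows that need sorting out before reshaping.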
Melting and casting is a great way to reshape data sets, which is what you need to do here. I won’t give a full explanation since other people have already done this well. I’d read this article and then Google around for ‘reshape2 melt dcast’.
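# load the reshape2 library
library(reshape2)
# long format: one row per person, with the measurement in the "value" column
dan.melt <- melt(danData, id.vars = c("CoupleID", "partnerNum"))
# wide format: one row per couple, one column per partnerNum
dan.cast <- dcast(dan.melt, formula = CoupleID ~ partnerNum, value.var = "value")
# fix names of columns
names(dan.cast) <- c("CoupleID", "Partner1", "Partner2")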
head(dan.cast)
## CoupleID Partner1 Partner2
## 1 4 1.400000 0.3333333
## 2 5 3.000000 2.4333333
## 3 7 3.200000 -0.8000000
## 4 8 1.000000 2.2333333
## 5 12 2.300000 2.2333333
## 6 201 1.666667 2.8000000
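If you’d rather not load an extra package, base R’s reshape() can do the same long-to-wide step. This is a minimal sketch that swaps in stats::reshape for reshape2, run on the first five rows shown earlier with partnerNum already assigned. Couple 7 only has one row in this excerpt, so its Partner2 comes out as NA.

```r
# First five rows of danData as printed above, with partnerNum assigned
long <- data.frame(
  STABILITY_Romantic = c(1.4, 3.2, 0.3333333, 3.0, 2.4333333),
  CoupleID = c(4, 7, 4, 5, 5),
  partnerNum = c(1, 1, 2, 1, 2)
)

# One row per couple, one STABILITY column per partner
wide <- reshape(long,
                idvar = "CoupleID",
                timevar = "partnerNum",
                v.names = "STABILITY_Romantic",
                direction = "wide")

# reshape() returns CoupleID, STABILITY_Romantic.1, STABILITY_Romantic.2
names(wide) <- c("CoupleID", "Partner1", "Partner2")
wide
```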
plot(dan.cast[,c("Partner1","Partner2")])
library(ggplot2)
ggplot(dan.cast, aes(x = Partner1, y = Partner2)) +
  geom_point(size = 4, alpha = 0.7) +  # fixed size/opacity set outside aes(), so no legends are created
  theme_minimal() +
  ggtitle("Comparison of values by CoupleID")
## Warning: Removed 1 rows containing missing values (geom_point).
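That warning means ggplot dropped one couple with a missing Partner1 or Partner2. Since summary() showed no NAs in STABILITY_Romantic, the NA presumably comes from a couple that appears only once in the data. A quick way to list such couples, sketched on a small hypothetical table shaped like dan.cast (the CoupleID 999 and its values are made up):

```r
# Hypothetical wide table shaped like dan.cast; couple 999 has no second partner
wide <- data.frame(
  CoupleID = c(4, 5, 999),
  Partner1 = c(1.4, 3.0, 2.0),
  Partner2 = c(0.3333333, 2.4333333, NA)
)

# Couples that ggplot (and lm) will drop because a partner's value is missing
incomplete <- wide[!complete.cases(wide[, c("Partner1", "Partner2")]), ]
incomplete$CoupleID
```

On the real data, the same line with dan.cast in place of wide should show which CoupleID is behind the warning.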
# fit Partner2 stability as a linear function of Partner1 stability
m <- lm(Partner2 ~ Partner1, data = dan.cast)
a <- signif(coef(m)[1], digits = 2)  # intercept
b <- signif(coef(m)[2], digits = 2)  # slope
textlab <- paste0("y = ", b, "x + ", a)
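One small catch with the label above: a negative intercept prints as "+ -0.5". Here’s a minimal sketch of a helper that handles the sign; eq_label is a hypothetical name, and the demo data is made up so the fitted line is exactly y = 2x - 3.

```r
# Hypothetical helper: build "y = bx + a" from a fitted simple regression,
# writing "- |a|" instead of "+ -a" when the intercept is negative
eq_label <- function(model, digits = 2) {
  a <- signif(unname(coef(model)[1]), digits = digits)  # intercept
  b <- signif(unname(coef(model)[2]), digits = digits)  # slope
  op <- if (a < 0) "-" else "+"
  paste0("y = ", b, "x ", op, " ", abs(a))
}

# Usage on made-up data lying exactly on y = 2x - 3
d <- data.frame(x = c(1, 2, 3), y = c(-1, 1, 3))
eq_label(lm(y ~ x, data = d))
```

With the couples data, textlab <- eq_label(m) would replace the paste0() line.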
ggplot(dan.cast, aes(x = Partner1, y = Partner2)) +
  geom_point(size = 4, alpha = 0.7) +  # fixed size/opacity, so no legends to suppress
  theme_minimal() +
  geom_smooth(method = "lm", formula = y ~ x) +
  annotate("text", x = -0.6, y = 3.5, label = textlab, color = "black", size = 8) +
  labs(title = "Comparison of Stability by CoupleID",
       x = "Partner 1 Stability", y = "Partner 2 Stability")
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).