Code for DS

My code and notes

1) Import data

danData <- read.csv("NRI_toydata Josh.csv")
summary(danData)

##  STABILITY_Romantic    CoupleID    
##  Min.   :-2.000     Min.   :  4.0  
##  1st Qu.: 1.267     1st Qu.:604.0  
##  Median : 2.233     Median :643.0  
##  Mean   : 1.973     Mean   :577.3  
##  3rd Qu.: 2.767     3rd Qu.:675.0  
##  Max.   : 4.000     Max.   :900.0

2) Add variable for partner 1 or 2

In this section, I create a new variable called partnerNum. It is set to 1 or 2 according to the order in which each partner appears in the dataset: the first row for a given CoupleID gets 1 and the second gets 2. I could also imagine using a different binary characteristic here, such as Male / Female. In any case, you need some variable with which to choose the first and second partner (even if it’s artificially created, as it is here).

# create new variable partnerNum and set all values equal to -1
danData$partnerNum <- -1 

#set partner number equal to 1 or 2, based on whether or not it's a duplicate
danData[!duplicated(danData$CoupleID), "partnerNum"] <- 1
danData[duplicated(danData$CoupleID), "partnerNum"] <- 2

head(danData)

##   STABILITY_Romantic CoupleID partnerNum
## 1          1.4000000        4          1
## 2          3.2000000        7          1
## 3          0.3333333        4          2
## 4          3.0000000        5          1
## 5          2.4333333        5          2
## 6          1.0000000        8          1
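One caveat with the duplicated() approach: it assumes each CoupleID shows up at most twice, since a third row for the same couple would also be labeled 2. Here is a minimal sketch of a more general numbering with ave(), plus a quick check for IDs that occur too often. The ids vector is made up for illustration and is not taken from the dataset.

```r
# Toy IDs (made up for illustration); couple 9 appears three times
ids <- c(4, 7, 4, 5, 5, 9, 9, 9)

# Number the rows within each ID in order of appearance: 1, 2, 3, ...
partnerNum <- ave(seq_along(ids), ids, FUN = seq_along)

# IDs that occur more than twice and would need a closer look
tooMany <- names(which(table(ids) > 2))
```

If tooMany comes back empty on the real data, the simpler duplicated() version gives the same numbering.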

3) Melt and cast data set

Melting and casting is a great way to reshape data sets, which is what you need to do here. I don’t give a full explanation here since other people have already done this. I’d read this article and then Google around for ‘reshape2 melt cast’.

#load reshape library
library(reshape2)

dan.melt <- melt(danData, id.vars = c("CoupleID", "partnerNum"))
# name value.var explicitly so dcast does not have to guess the value column
dan.cast <- dcast(dan.melt, formula = CoupleID ~ partnerNum, value.var = "value")

#fix names of columns
names(dan.cast) <- c("CoupleID", "Partner1", "Partner2")

head(dan.cast)

##   CoupleID Partner1   Partner2
## 1        4 1.400000  0.3333333
## 2        5 3.000000  2.4333333
## 3        7 3.200000 -0.8000000
## 4        8 1.000000  2.2333333
## 5       12 2.300000  2.2333333
## 6      201 1.666667  2.8000000
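For comparison, the same long-to-wide step can be sketched with base R's reshape(), which needs no extra package. This is an alternative to the melt / dcast route above, not a replacement for it. The toy data frame is made up, and couple 7 deliberately has only one partner to show that an unmatched partner turns into an NA in the wide table.

```r
# Toy data (made up for illustration); couple 7 has only one row
toy <- data.frame(
  CoupleID = c(4, 7, 4, 5, 5),
  partnerNum = c(1, 1, 2, 1, 2),
  STABILITY_Romantic = c(1.4, 3.2, 0.33, 3.0, 2.43)
)

# One row per couple, one column per partner number
wide <- reshape(toy, idvar = "CoupleID", timevar = "partnerNum",
                direction = "wide")

# Rename the generated columns to match the names used in these notes
names(wide) <- c("CoupleID", "Partner1", "Partner2")

# Couples that are missing a partner show up as NA
incomplete <- wide$CoupleID[!complete.cases(wide)]
```

Rows like the one for couple 7 are the kind that a plot or a model would drop because of the missing value.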

4) Create scatter plot of data

plot(dan.cast[,c("Partner1","Partner2")])

5) Plot in ggplot

library(ggplot2)
ggplot(dan.cast, aes(x = Partner1, y = Partner2)) +
  geom_point(size = 4, alpha = 0.7) + # constants go outside aes(), so no legends are created
  theme_minimal() +
  ggtitle("Comparison of values by CoupleID")

## Warning: Removed 1 rows containing missing values (geom_point).

6) Example with estimation

# fit Partner2 as a linear function of Partner1
m <- lm(Partner2 ~ Partner1, data = dan.cast)
a <- signif(coef(m)[1], digits = 2) # intercept
b <- signif(coef(m)[2], digits = 2) # slope
textlab <- paste0("y = ", b, "x + ", a)

ggplot(dan.cast, aes(x = Partner1, y = Partner2)) +
  geom_point(size = 4, alpha = 0.7) + # constants go outside aes(), so no legends are created
  theme_minimal() +
  geom_smooth(method = "lm") +
  annotate("text", x = -.6, y = 3.5, label = textlab, color = "black", size = 8, parse = FALSE) +
  labs(title = "Comparison of Stability by CoupleID", x = "Partner 1 Stability", y = "Partner 2 Stability")

## Warning: Removed 1 rows containing missing values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).
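One small wrinkle with the equation label: pasting "x + " in front of the intercept prints something like "x + -1" whenever the intercept is negative. Below is a minimal sketch of a label that picks the sign itself. The toy x and y values are made up so that the fit is exact, and the name lineLabel is just for illustration.

```r
# Toy data (made up for illustration): y = 2x - 1 exactly
x <- 1:5
y <- 2 * x - 1

m <- lm(y ~ x)
a <- signif(unname(coef(m)[1]), digits = 2) # intercept
b <- signif(unname(coef(m)[2]), digits = 2) # slope

# Choose "+" or "-" based on the sign of the intercept
lineLabel <- paste0("y = ", b, "x ", ifelse(a < 0, "- ", "+ "), abs(a))
```

The resulting string can be passed to annotate() in the same way as textlab.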