Motivation

I downloaded iMessage data to make a text mining Valentine for my LDR SO in February. I’ve recently been wondering how frequent our texts are when we are apart vs. together. I figured I could visualize this by plotting daily counts of messages and shading time intervals when we were in the same city.

How to get a csv of iMessage data!

Here are the steps:

  1. Download the zip from here. You now have an iMessage-Export-master folder; put it wherever you want on your computer.

  2. On your Mac: Finder > Go > Go to Folder. Type in ~/Library/Messages. Copy the chat.db file into the iMessage-Export-master folder you got from step 1.

  3. Create a contacts.txt file listing the contact numbers you want to identify with names. For the purposes of this work, that is just my number and Jesse’s. The text file contains only the following (numbers partially X’d for privacy reasons #dataethics):

+1917XXXXXXX Alex

+1262XXXXXXX Jesse

  4. Go to Terminal. Navigate to wherever your iMessage-Export-master folder lives (via the cd command). Run php contacts.php >> contacts.txt. Then run php export-csv.php.

You should now have a folder messages in iMessage-Export-master. Within that folder is a messages.csv file. Names that you identified in contacts.txt will be coded in. TA DA!
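Before going further, it can help to confirm that the export has the columns the rest of this post relies on. Here is a minimal sketch, assuming the column names that read.csv gives me later (Date, Message, To.Name, From.Name). The helper missing_columns and the toy row are made up for illustration and are not part of the export tool; the date format in the toy row is also just a guess.

```r
# Hypothetical helper (not part of iMessage-Export): list expected columns that are absent
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Columns used later in this post
required <- c("Date", "Message", "To.Name", "From.Name")

# Toy stand-in for the exported csv (made-up row, assumed date format)
toy_export <- data.frame(
  Date = "2017-05-03 09:15:00",
  Message = "hi",
  To.Name = "Jesse",
  From.Name = "Alex",
  stringsAsFactors = FALSE
)

# An empty result means every expected column is present
missing_columns(toy_export, required)
```

On the real file, you would run the same check on the data frame returned by read.csv.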

Load and shape data

Pull in the csv.

# pull in the exported csv
messages<-read.csv('iMessage-Export-master/messages/messages.csv')
# make all messages lowercase
messages$Message<-as.character(messages$Message)
messages$Message<-tolower(messages$Message)

Subset to messages between me and Jesse.

messagesja <- messages[ which(messages$To.Name=='Jesse' | messages$From.Name=='Jesse'), ]
nrow(messagesja)
[1] 57608

This time period (5-3-17 to 4-26-18) includes 57,608 text messages! (That includes reactions to messages as well.)

messagesja<-messagesja[,c("Date", "Message")]
messagesja$Date <- as.Date(messagesja$Date)
# calculate how many messages per day and plot that over the year! 
library(dplyr)
#calculate total messages by day
jatot<-messagesja %>% 
  group_by(Date) %>% 
  summarise(n = n())
#some days have 0 and we want these to show up as 0
#so we create a dataframe of all days 5-3-17 to 4-26-18 and merge that with jatot
df <- data.frame(Date=seq(as.Date("2017/5/3"), as.Date("2018/4/26"), "days")) 
jatot<-merge(jatot,df,by="Date", all=TRUE)
jatot$n[is.na(jatot$n)] <- 0
#save this csv so I can manually add data (outside of R) about when we were together
write.csv(jatot, "jatot.csv", row.names = FALSE)
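To see what the merge-and-fill step does, here is a small sketch on made-up data. The toy counts skip one day; merging with a complete sequence of dates and replacing the resulting NA with 0 fills the gap. The names counts, all_days, and filled are only for this illustration.

```r
# Toy daily counts with a gap: there is no row for 2017-05-04
counts <- data.frame(
  Date = as.Date(c("2017-05-03", "2017-05-05")),
  n = c(12L, 7L)
)

# Every day in the range, whether or not any messages were sent
all_days <- data.frame(
  Date = seq(as.Date("2017-05-03"), as.Date("2017-05-05"), by = "day")
)

# all = TRUE keeps days that appear in only one of the two data frames
filled <- merge(counts, all_days, by = "Date", all = TRUE)

# Days missing from the counts come back as NA; set them to 0
filled$n[is.na(filled$n)] <- 0

print(filled)
```

The missing day now appears as its own row with n equal to 0, so it will show up on the plot instead of being silently skipped.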

Let’s graph

Use my regular theme.

library(ggplot2);library(ggrepel); library(extrafont); library(ggthemes);library(reshape);library(grid);
library(scales);library(RColorBrewer);library(gridExtra);
my_theme <- function() {
  # Define colors for the chart
  palette <- brewer.pal("Greys", n=9)
  color.background = palette[2]
  color.grid.major = palette[4]
  color.panel = palette[3]
  color.axis.text = palette[9]
  color.axis.title = palette[9]
  color.title = palette[9]
  # Create basic construction of chart
  theme_bw(base_size=9, base_family="Palatino") + 
  # Set the entire chart region to a light gray color
  theme(panel.background=element_rect(fill=color.panel, color=color.background)) +
  theme(plot.background=element_rect(fill=color.background, color=color.background)) +
  theme(panel.border=element_rect(color=color.background)) +
  # Format grid
  theme(panel.grid.major=element_line(color=color.grid.major,size=.25)) +
  theme(panel.grid.minor=element_blank()) +
  theme(axis.ticks=element_blank()) +
  # Format legend
  theme(legend.position="bottom") +
  theme(legend.background = element_rect(fill=color.background)) +
  theme(legend.text = element_text(size=8,color=color.axis.title)) + 
  theme(legend.title = element_blank()) + 
  
  #Format facet labels
  theme(strip.text.x = element_text(size = 8, face="bold"))+
  # Format title, axis labels, and tick labels
  theme(plot.title=element_text(color=color.title, size=28)) +
  theme(axis.text.x=element_text(size=8)) +
  theme(axis.text.y=element_text(size=8)) +
  theme(axis.title.x=element_text(size=8)) +
  theme(axis.title.y=element_text(size=8)) +
  #Format title and facet_wrap title
  theme(strip.text = element_text(size=8), plot.title = element_text(size = 16, colour = "black", vjust = 1, hjust=0))+
    
  # Plot margins
  theme(plot.margin = unit(c(.2, .2, .2, .2), "cm"))
}

Outside of R, I added a together column (1 on days we were in the same city) to the daily counts and saved the result as jatot_together.csv. Let’s pull this in.

jaall<-read.csv('jatot_together.csv')
jaall$Date<-as.Date(jaall$Date, "%m/%d/%y")
nrow(subset(jaall, together==1))
[1] 184

We were together 184 days out of the year!! I didn’t know that stat until now (thus the !!). Now, we plot!
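One aside before the plot: the together column is a per-day flag, but sometimes it is nicer to see the visits as date ranges. Here is a rough sketch of collapsing consecutive flagged days into intervals with rle, on a made-up week of data. It assumes the rows are sorted by date with no missing days, and the names toy and intervals are just for illustration.

```r
# Toy week of flags (made up): together on days 2-4 and 6-7
toy <- data.frame(
  Date = seq(as.Date("2017-05-03"), by = "day", length.out = 7),
  together = c(0, 1, 1, 1, 0, 1, 1)
)

# Run-length encode the flag to find stretches of identical values
runs <- rle(toy$together)
ends <- cumsum(runs$lengths)        # row index where each run ends
starts <- ends - runs$lengths + 1   # row index where each run starts
is_together <- runs$values == 1

# One row per stretch of consecutive together days
intervals <- data.frame(
  start = toy$Date[starts[is_together]],
  end = toy$Date[ends[is_together]]
)
intervals$days <- as.integer(intervals$end - intervals$start) + 1L

print(intervals)
```

The days column should add up to the same total you get from counting rows where together is 1, which makes for an easy sanity check on the manual data entry.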

ggplot(jaall,aes(x=Date,y=n, group =1)) + geom_point(size=1)+ 
  geom_line(size=.6)+
  my_theme()+ 
  ggtitle("Text Me Back: A Year of LDR Communication", subtitle="Daily Count of iMessages between Alex and Jesse [5/3/17 - 4/26/18]") + 
  scale_x_date(labels = date_format("%b %Y"), date_breaks = "1 month")+
  scale_y_continuous(breaks = seq(0,1000,100), limits = c(0, 1000))+
  geom_rect(data = subset(jaall, together == 1), 
            aes(ymin = -Inf, ymax = Inf, xmin = Date-0.5, xmax = Date+0.5), alpha = 0.2, fill="mediumseagreen")+
  labs(y = NULL, x=NULL, caption="\nShaded green areas mark time periods when Alex and Jesse were physically in the same city!\nAlex and Jesse spent 184 days together despite living in Cambridge/SF, respectively. Graph via Alex Albright [thelittledataset.com].") 
# save the last plot displayed
ggsave("LDR_year.png", width = 9, height = 5, dpi = 800)

Hypothesis confirmed!
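If you want a number to go with the picture, one option is to compare average daily counts for days together vs. apart. Here is a minimal sketch using aggregate. The toy values below are invented purely to show the mechanics and say nothing about our real data; on the real thing you would pass in jaall with its n and together columns.

```r
# Toy data (invented numbers): daily message count and together flag
toy <- data.frame(
  n = c(200, 180, 40, 30, 220, 50),
  together = c(0, 0, 1, 1, 0, 1)
)

# Mean daily message count for each value of the together flag
by_status <- aggregate(n ~ together, data = toy, FUN = mean)

# Add a readable label for each group
by_status$status <- ifelse(by_status$together == 1, "together", "apart")

print(by_status)
```

A mean is sensitive to a few very chatty days, so swapping FUN = mean for FUN = median is a reasonable variation.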

