---
title: 'Text Me Back: A Year of LDR Communication'
author: "Alex Albright"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  html_notebook: default
---

# Motivation

I downloaded iMessage data to make a text mining Valentine for my LDR SO in February. I've recently been wondering how frequent our texts are when we are apart vs. together. I figured I could visualize this by **plotting daily counts of messages and shading time intervals when we were in the same city.**  

# How to get a `csv` of iMessage data!

Here are the steps:

1. Download the zip from [here](https://github.com/aaronpk/iMessage-Export) and unzip it: now you have an `iMessage-Export-master` folder. Put that wherever you want on your computer.

2. On your Mac: `Finder > Go > Go to Folder`. Type in `~/Library/Messages`. Copy the `chat.db` file into the `iMessage-Export-master` folder you got from step 1.

3. Create a `contacts.txt` file listing the contact numbers you want to identify with names. E.g., for the purposes of this work, that is just my number and Jesse's. The text file just contains the following (numbers partially `X`'d for privacy reasons #dataethics):

+1917XXXXXXX Alex

+1262XXXXXXX Jesse

4. Go to Terminal. Navigate to wherever your `iMessage-Export-master` folder lives (via `cd` command). Run `php contacts.php >> contacts.txt`. Then run `php export-csv.php`. 

You should now have a folder `messages` in `iMessage-Export-master`. Within that folder is a `messages.csv` file. Names that you identified in `contacts.txt` will be coded in. **TA DA!**
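
A quick sanity check on the export can save some debugging later. The sketch below uses a hypothetical helper (`missing_columns` is my own name, not part of iMessage-Export) that reports which of the columns used later in this notebook (`Date`, `Message`, `To.Name`, `From.Name`) are absent from a csv. It runs against a throwaway file with invented contents so it works anywhere; pointing it at `iMessage-Export-master/messages/messages.csv` is the intended use.

```r
# Hypothetical helper: which expected columns are missing from the export?
missing_columns <- function(path,
                            expected = c("Date", "Message", "To.Name", "From.Name")) {
  stopifnot(file.exists(path))
  # read only the first row; we just need the column names
  header <- names(read.csv(path, nrows = 1))
  setdiff(expected, header)
}

# Throwaway file that deliberately lacks From.Name (invented data)
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Date = "2017-05-03", Message = "hi", To.Name = "Jesse"),
  tmp, row.names = FALSE
)
missing <- missing_columns(tmp)
missing
```

An empty result means every column the rest of the notebook relies on is present.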

# Load and shape data

Pull in the csv. 

```{r, echo=TRUE, message=FALSE, warning=FALSE}
# read in the exported messages (keep text as character, not factor)
messages <- read.csv('iMessage-Export-master/messages/messages.csv', stringsAsFactors = FALSE)
# make all messages lowercase
messages$Message <- tolower(as.character(messages$Message))
```

Subset to messages between me and Jesse.

```{r, echo=TRUE, message=FALSE, warning=FALSE}
messagesja <- messages[ which(messages$To.Name=='Jesse' | messages$From.Name=='Jesse'), ]
nrow(messagesja)
```
This time period (5-3-17 to 4-26-18) includes 57,608 text messages! (That includes reactions to messages as well.)
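
The `which()` wrapper in the subset matters when a name is missing. Here is a minimal sketch on invented toy data (the rows below are not from my export) showing what happens with and without it, assuming unnamed contacts come through as `NA`:

```r
# Toy data standing in for messages.csv (values invented for illustration)
toy <- data.frame(
  From.Name = c("Alex", "Jesse", NA, "Sam"),
  To.Name   = c("Jesse", "Alex", "Alex", NA),
  Message   = c("hi", "hey", "unknown sender", "unknown recipient"),
  stringsAsFactors = FALSE
)

# Comparing NA to "Jesse" gives NA, so the condition is TRUE, TRUE, NA, NA
is_jesse <- toy$To.Name == "Jesse" | toy$From.Name == "Jesse"

# Indexing with NA adds all-NA rows; which() keeps only the TRUE positions
without_which <- toy[is_jesse, ]
with_which    <- toy[which(is_jesse), ]

nrow(without_which)
nrow(with_which)
```

Without `which()`, the two `NA` conditions come back as empty rows and inflate the count, so the message total above would be off.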

```{r, echo=TRUE, message=FALSE, warning=FALSE}
messagesja<-messagesja[,c("Date", "Message")]
messagesja$Date <- as.Date(messagesja$Date)
# calculate how many messages per day and plot that over the year! 

library(dplyr)
#calculate total messages by day
jatot<-messagesja %>% 
  group_by(Date) %>% 
  summarise(n = n())

# some days have 0 messages and we want these to show up as 0,
# so we create a data frame of all days 5-3-17 to 4-26-18 and merge that with jatot
df <- data.frame(Date = seq(as.Date("2017/5/3"), as.Date("2018/4/26"), "days"))
jatot <- merge(jatot, df, by = "Date", all = TRUE)
jatot$n[is.na(jatot$n)] <- 0

# save this csv so I can manually add data (outside of R) about when we were together
write.csv(jatot, "jatot.csv", row.names = FALSE)
```
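
To see why the merge is needed, here is a small sketch of the same zero-filling step on three days of invented counts, where the middle day has no messages at all:

```r
# Toy daily counts with a gap: nothing sent on 2017-05-04 (invented data)
counts <- data.frame(
  Date = as.Date(c("2017-05-03", "2017-05-05")),
  n    = c(12L, 7L)
)

# Every calendar day in the window, whether or not any message was sent
all_days <- data.frame(
  Date = seq(as.Date("2017-05-03"), as.Date("2017-05-05"), by = "day")
)

# all = TRUE keeps days that are missing from `counts`; their n comes back as NA
filled <- merge(counts, all_days, by = "Date", all = TRUE)
filled$n[is.na(filled$n)] <- 0
filled
```

The missing day now shows up as a row with `n` equal to 0, so the line in the plot drops to zero instead of skipping straight from one active day to the next.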

# Let's graph

Use my regular theme.
```{r, message=FALSE, warning=FALSE}
library(ggplot2);library(ggrepel); library(extrafont); library(ggthemes);library(reshape);library(grid);
library(scales);library(RColorBrewer);library(gridExtra);

my_theme <- function() {

  # Define colors for the chart
  palette <- brewer.pal("Greys", n=9)
  color.background = palette[2]
  color.grid.major = palette[4]
  color.panel = palette[3]
  color.axis.text = palette[9]
  color.axis.title = palette[9]
  color.title = palette[9]

  # Create basic construction of chart
  theme_bw(base_size=9, base_family="Palatino") + 

  # Set the entire chart region to a light gray color
  theme(panel.background=element_rect(fill=color.panel, color=color.background)) +
  theme(plot.background=element_rect(fill=color.background, color=color.background)) +
  theme(panel.border=element_rect(color=color.background)) +

  # Format grid
  theme(panel.grid.major=element_line(color=color.grid.major,size=.25)) +
  theme(panel.grid.minor=element_blank()) +
  theme(axis.ticks=element_blank()) +

  # Format legend
  theme(legend.position="bottom") +
  theme(legend.background = element_rect(fill=color.background)) +
  theme(legend.text = element_text(size=8,color=color.axis.title)) + 
  theme(legend.title = element_blank()) + 
  
  #Format facet labels
  theme(strip.text.x = element_text(size = 8, face="bold"))+

  # Format title, axis labels, and tick marks
  theme(plot.title=element_text(color=color.title, size=28)) +
  theme(axis.text.x=element_text(size=8)) +
  theme(axis.text.y=element_text(size=8)) +
  theme(axis.title.x=element_text(size=8)) +
  theme(axis.title.y=element_text(size=8)) +

  #Format title and facet_wrap title
  theme(strip.text = element_text(size=8), plot.title = element_text(size = 16, colour = "black", vjust = 1, hjust=0))+
    
  # Plot margins
  theme(plot.margin = unit(c(.2, .2, .2, .2), "cm"))
}
```

I added in data about whether we were together or not in `jatot_together.csv`. Let's pull this in.
```{r}
jaall<-read.csv('jatot_together.csv')
jaall$Date<-as.Date(jaall$Date, "%m/%d/%y")

nrow(subset(jaall, together==1))
```
We were together 184 days out of the year!! I didn't know that stat until now (thus the !!).
Now, we plot!
```{r}
ggplot(jaall, aes(x = Date, y = n, group = 1)) + geom_point(size = 1) +
  geom_line(size = .6) +
  my_theme() +
  ggtitle("Text Me Back: A Year of LDR Communication", subtitle = "Daily Count of iMessages between Alex and Jesse [5/3/17 - 4/26/18]") +
  scale_x_date(labels = date_format("%b %Y"), date_breaks = "1 month") +
  scale_y_continuous(breaks = seq(0, 1000, 100), limits = c(0, 1000)) +
  # shade a one-day-wide rectangle for each day we were together
  geom_rect(data = subset(jaall, together == 1),
            aes(ymin = -Inf, ymax = Inf, xmin = Date - 0.5, xmax = Date + 0.5), alpha = 0.2, fill = "mediumseagreen") +
  labs(y = NULL, x = NULL, caption = "\nShaded green areas mark time periods when Alex and Jesse were physically in the same city!\nAlex and Jesse spent 184 days together despite living in Cambridge/SF, respectively. Graph via Alex Albright [thelittledataset.com].")

# save the plot above to a png
ggsave("LDR_year.png", width = 9, height = 5, dpi = 800)
```
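
The plot above draws one rectangle per day together. An alternative, sketched here on invented toy data, is to collapse consecutive days into intervals first, so that each visit becomes a single rectangle. This is not what I did above; it assumes the data are sorted by `Date` with no missing days, and the `intervals` data frame is a name I made up for illustration.

```r
# Toy data: together on days 2-3 and on day 6 (invented for illustration)
toy <- data.frame(
  Date     = seq(as.Date("2017-05-03"), by = "day", length.out = 7),
  together = c(0, 1, 1, 0, 0, 1, 0)
)

# Run-length encode the 0/1 flag, then find where each run starts and ends
runs   <- rle(toy$together)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where together == 1, padded by half a day on each side
keep <- runs$values == 1
intervals <- data.frame(
  xmin = toy$Date[starts[keep]] - 0.5,
  xmax = toy$Date[ends[keep]] + 0.5
)
intervals
```

A data frame like `intervals` could then be passed to `geom_rect(data = intervals, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf), inherit.aes = FALSE)` in place of the per-day version.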

## Hypothesis confirmed!
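
The plot makes the case visually. One way to put a number on it would be to compare average daily counts on days together vs. apart. Here is a minimal sketch on invented toy values (these are not my real counts); it assumes a data frame with the same `n` and `together` columns as `jaall`.

```r
# Toy data with the same columns as jaall (values invented for illustration)
toy <- data.frame(
  n        = c(300, 250, 40, 60, 280),
  together = c(0, 0, 1, 1, 0)
)

# Mean daily message count for apart (0) and together (1)
summary_by_status <- aggregate(n ~ together, data = toy, FUN = mean)
summary_by_status
```

Swapping `toy` for `jaall` would give the real averages for the year.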