In the following, we examine the utility of using GPS distance
between phones to predict physical proximity. If successful, this means
we can estimate if partners are together or not at any moment in time.
Alternatively, and to acquire even more precise estimation of proximity,
we can implement logging of “nearby interaction” based on iPhone ultra
wideband (radio) chip using the Apple Developer toolkit https://developer.apple.com/documentation/nearbyinteraction.
This will give precise distance measures up to 100 meters.
The data we use is collected by students on a 4-day trip to New York.
During the trip, the students participated in with various events
together and also had free time to explore New York by them selves or in
smaller groups (self-organized). More information about the data can be
found here: https://rpubs.com/jych/audioexplorers2023
Thus,
we will examine how well the GPS distance classify if the students are
together and participating in an event or not.
To do this, we will assess:
Average distance between all students.
As GPS is noisy (especially around and inside buildings), we expect
that the average distance will be a good group proximity measure.
Distance between selected pairs.
I.e., do the continuous distance between pairs predict togetherness
throughout the day (during events + spare time)?
Pre-processing
Prior to computing the distance between GPS coordinates in meters, we
remove erroneous GPS logs (values extreme compared to medians). Then, we
aggregate data across 1 minute intervals (i.e., across 3 data points).
Next, for each 1-min average, the mutual distance between all students
are computed (using “geosphere” R package) and averaged.
Data visualizations
Noise and heart rarte
First, we inspect the logged mean noise floor and heart rate.

Group proximity
Next, we inspect the group proximity. That is, the average mutual
distance (in meters) between all students. As seen, the distance between
students rise and is highest for “Free time”, “Bus”, “Coney Island”, and
“Walk”, which are all events where the students are scattered. At the
same time, proximity is highest (lowest mutual distance) for “Dinner”
and various presentations where all students were together.

Pair proximity
Finally, we focus on the distance between a selected pair of the
students. Again, the distance measure seems to be a good proxy for
proximity across the days.

---
title: "GPS as proximity detector"
output: html_notebook
---

```{r setup, include=FALSE} 
knitr::opts_chunk$set(warning = FALSE, message = FALSE,echo = FALSE,fig.align = 'center') 
getwd()

```

In the following, we examine the utility of using GPS distance between phones to predict physical proximity. If successful, this means we can estimate if partners are together or not at any moment in time. Alternatively, and to acquire even more precise estimation of proximity, we can implement logging of "nearby interaction" based on iPhone ultra wideband (radio) chip using the Apple Developer toolkit [https://developer.apple.com/documentation/nearbyinteraction](https://developer.apple.com/documentation/nearbyinteraction){target="_blank"}. This will give precise distance measures up to 100 meters.<br>   
The data we use is collected by students on a 4-day trip to New York. During the trip, the students participated in with various events together and also had free time to explore New York by them selves or in smaller groups (self-organized). More information about the data can be found here: [https://rpubs.com/jych/audioexplorers2023](https://rpubs.com/jych/audioexplorers2023){target="_blank"} 
<br>
Thus, we will examine how well the GPS distance classify if the students are together and participating in an event or not.  
<br>
To do this, we will assess:

#### Average distance between all students. 

As GPS is noisy (especially around and inside buildings), we expect that the average distance will be a good group proximity measure.  

#### Distance between selected pairs. 

I.e., do the continuous distance between pairs predict togetherness throughout the day (during events + spare time)? 

## Pre-processing

Prior to computing the distance between GPS coordinates in meters, we remove erroneous GPS logs (values extreme compared to medians). Then, we aggregate data across 1 minute intervals (i.e., across 3 data points). Next, for each 1-min average, the mutual distance between all students are computed (using "geosphere" R package) and averaged.  

```{r datamangling}
setwd('C:/Users/jych/Demant/+Team Audio Explorer 2021 - data and competitions - Documents/Code/')
library(ggplot2)
library(grid)
library(ggthemes)
library(ggpubr)
library(lubridate)
library(ggmap)
library(leaflet)
library(mapview)
library(plotrix)
library(RColorBrewer)
library(ggrepel)
library(patchwork)
library(emmeans)
library(geosphere)

theme_set(theme_pubr())
load('C:/Users/jych/Demant/+Team Audio Explorer 2021 - data and competitions - Documents/Data/August2023.Rdata')
df.garmin$Hour <- hour(df.garmin$Timestamp)
df.garmin$Date <- date(df.garmin$Timestamp)
df.garmin <- df.garmin[df.garmin$ID!='Franz',]
df.garmin$rTime <- ceiling_date(df.garmin$Timestamp,'1 minutes')
df.garmin$Days <- date(df.garmin$rTime)
cols = c('#2db7c5','#c7017f')
```


```{r timeseries comp,echo=FALSE}
datatmp <- df.garmin[date(df.garmin$rTime)>'2023-08-30' &
                     date(df.garmin$rTime)<'2023-09-04' & 
                       df.garmin$Hour>7 &
                       df.garmin$ID!='Vincent',]

df.plot.nf <- aggregate(NF.dBA~rTime+Events,
                         data=datatmp,
                         FUN=function(x)median(x,na.rm=T))

df.plot.spl <- aggregate(SPL.dBA~rTime+Events,
                         data=datatmp,
                         FUN=function(x)median(x,na.rm=T))

df.plot.hr <- aggregate(beatsPerMinute~rTime+Events,
                        data=datatmp,
                        FUN=function(x)mean(x,na.rm=T))


df.plot.nf <- df.plot.nf %>%
  mutate(x = hour(rTime) + minute(rTime)/60 + second(rTime)/3600)
df.plot.hr <- df.plot.hr %>%
  mutate(x = hour(rTime) + minute(rTime)/60 + second(rTime)/3600)

df.plot.nf$EventsN <- as.factor(df.plot.nf$Events)
levels(df.plot.nf$EventsN) <- 1:length(levels(df.plot.nf$EventsN))
df.plot.nf$EventsN <- as.numeric(as.character(df.plot.nf$EventsN))

df.plot.nf$EventsSel <- 0
df.plot.nf$EventsSel[which(diff(c(0,df.plot.nf$EventsN))!=0)+1] <- 1


p1 <- ggplot()+
  geom_point(data=df.plot.nf,aes(x=x,y=NF.dBA),alpha=0.5,color=cols[1],size=1)+
  geom_point(data=df.plot.hr,aes(x=x,y=beatsPerMinute),alpha=0.5,color=cols[2],size=1)+
  theme_light()+
  scale_y_continuous(
    # Features of the first axis
    name = "Noise floor level (dB)",limits = c(50,110),
    # Add a second axis and specify its features
    sec.axis = sec_axis( trans=~./1, name="Heart rate (bpm)")
  )+
  theme(axis.title.y.right = element_text(color = cols[2]))+
  theme(axis.title.y.left = element_text(color = cols[1]))+
  theme(legend.position="top",
        legend.justification="left", 
        legend.box.spacing = unit(0, "pt"),
        axis.ticks.y.right = element_line(color = cols[2]),
        axis.text.y.right = element_text(color = cols[2]),
        axis.ticks.y.left  = element_line(color = cols[1]),
        axis.text.y.left = element_text(color = cols[1]))+
  labs(x='')+
  scale_x_continuous(limits = c(8,24),breaks = seq(8,24,2),labels = paste(seq(8,24,2),':00',sep = ""))+
  grids('xy')+
 geom_text_repel(data = df.plot.nf[df.plot.nf$EventsSel==1&df.plot.nf$Events!='Not known',], aes(label = Events,x=x,y=NF.dBA,color=Events),fontface = "bold",max.overlaps = 80,force = 8,nudge_y = -10,alpha=1,size=3)+
  geom_vline(data = df.plot.nf[df.plot.nf$EventsSel==1&df.plot.nf$Events!='Not known',], aes(xintercept=x,color=Events),alpha=1,linetype='dashed')+
  facet_wrap(~date(rTime),scales='free_x',ncol=1)+
  guides(color='none')


```

```{r proxy dose comp,echo=FALSE}
datatmp <- df.garmin[date(df.garmin$rTime)>'2023-08-30' &
                     date(df.garmin$rTime)<'2023-09-04' & 
                       df.garmin$Hour>7 &
                       df.garmin$ID!='Vincent',]

# filter NA GPS
datatmp$latitude[datatmp$latitude<30] <- NA
datatmp$longitude[datatmp$longitude>(-10)] <- NA


df.gps <- aggregate(.~rTime+ID+Events,
                         data=datatmp[c('rTime','ID','latitude','longitude','Events')],
                         FUN=function(x)median(x,na.rm=T))


df.gps$Dm <- NA
df.gps$Vm <- NA
for(i in 1:dim(df.gps)[1]){
 
 tmp <- df.gps[df.gps$rTime==df.gps$rTime[i],] 
 longitude <- df.gps$longitude[i]
 latitude <- df.gps$latitude[i]
 
 d <- NULL
 for(ii in 1:dim(tmp)[1]){
  longitude2 <- tmp$longitude[ii]
  latitude2 <- tmp$latitude[ii]
  d <- c(d,distm(c(longitude, latitude), c(longitude2, latitude2), fun = distGeo))
 }
 
 d[d>30000] <- NA
 d[d==0] <- NA
 df.gps$Dm[i] <- mean(d,na.rm=T)

}

df.gps <- df.gps %>%
  mutate(x = hour(rTime) + minute(rTime)/60 + second(rTime)/3600)

df.gps.plot <- aggregate(Dm~rTime+Events,FUN=function(x)median(x,na.rm=T),data=df.gps[,c('rTime','Dm','Events')])
df.gps.plot$SE <- aggregate(Dm~rTime+Events,FUN=function(x)std.error(x,na.rm=T),data=df.gps[,c('rTime','Dm','Events')])$Dm

df.gps.plot <- df.gps.plot %>%
  mutate(x = hour(rTime) + minute(rTime)/60 + second(rTime)/3600)

df.gps.plot$EventsN <- as.factor(df.gps.plot$Events)
levels(df.gps.plot$EventsN) <- 1:length(levels(df.gps.plot$EventsN))
df.gps.plot$EventsN <- as.numeric(as.character(df.gps.plot$EventsN))
df.gps.plot$EventsSel <- 0
df.gps.plot$EventsSel[which(diff(c(0,df.gps.plot$EventsN))!=0)+1] <- 1


p2 <- ggplot()+
  geom_col(data=df.gps.plot,aes(x=x,y=Dm),fill='grey60',alpha=0.8)+
  geom_col(data=df.gps.plot,aes(x=x,y=(Dm+SE)),fill='grey60',alpha=0.2)+
  theme_light()+
  theme(axis.title.y.left = element_text())+
  theme(legend.position="top",
        legend.justification="left", 
        legend.box.spacing = unit(0, "pt"))+
  labs(x='',y='Group proximity (mutual distance, meters)')+
  scale_x_continuous(limits = c(8,24),breaks = seq(8,24,2),labels = paste(seq(8,24,2),':00',sep = ""))+
  geom_text_repel(data = df.gps.plot[df.gps.plot$EventsSel==1,], aes(label = Events,x=x,y=Dm,color=Events),fontface = "bold",max.overlaps = 80,force = 8,nudge_y = 9000,alpha=1,size=3)+
  geom_vline(data = df.gps.plot[df.gps.plot$EventsSel==1,], aes(xintercept=x,color=Events),alpha=1,linetype='dashed')+
  grids('xy')+
  facet_wrap(~date(rTime),scales='free_x',ncol=1)+
  guides(color='none',fill='none')+
  coord_cartesian(ylim = c(0,max(df.gps.plot$Dm)))

```

```{r proxy dose 2 comp,echo=FALSE}
df.garmin$rTime <- round_date(df.garmin$Timestamp,'1 minutes')
datatmp <- df.garmin[date(df.garmin$rTime)>'2023-08-30' &
                     date(df.garmin$rTime)<'2023-09-04' & 
                     df.garmin$Hour>7 & 
                     (df.garmin$ID=='George' |
                     df.garmin$ID=='Emil'),]

# filter NA GPS
datatmp$latitude[datatmp$latitude<30] <- NA
datatmp$longitude[datatmp$longitude>(-10)] <- NA


df.gps <- aggregate(.~rTime+ID+Events,
                         data=datatmp[c('rTime','ID','latitude','longitude','Events')],
                         FUN=function(x)median(x,na.rm=T))

df.gps$Dm <- NA
df.gps$Vm <- NA
for(i in 1:dim(df.gps)[1]){
 
 tmp <- df.gps[df.gps$rTime==df.gps$rTime[i],] 
 longitude <- df.gps$longitude[i]
 latitude <- df.gps$latitude[i]
 
 d <- NULL
 for(ii in 1:dim(tmp)[1]){
  longitude2 <- tmp$longitude[ii]
  latitude2 <- tmp$latitude[ii]
  d <- c(d,distm(c(longitude, latitude), c(longitude2, latitude2), fun = distGeo))
 }
 
 d[d>30000] <- NA
 d[d==0] <- NA
 df.gps$Dm[i] <- mean(d,na.rm=T)

}

df.gps <- df.gps %>%
  mutate(x = hour(rTime) + minute(rTime)/60 + second(rTime)/3600)


df.gps$EventsN <- as.factor(df.gps$Events)
levels(df.gps$EventsN) <- 1:length(levels(df.gps$EventsN))
df.gps$EventsN <- as.numeric(as.character(df.gps$EventsN))
df.gps$EventsSel <- 0
df.gps$EventsSel[which(diff(c(0,df.gps$EventsN))!=0)+1] <- 1


p3 <- ggplot()+
  geom_col(data=df.gps,aes(x=x,y=Dm),fill='grey60',alpha=0.8)+
  theme_light()+
  theme(axis.title.y.left = element_text())+
  theme(legend.position="top",
        legend.justification="left", 
        legend.box.spacing = unit(0, "pt"))+
  labs(x='',y='Pair proximity (mutual distance, meters)')+
  scale_x_continuous(limits = c(8,24),breaks = seq(8,24,2),labels = paste(seq(8,24,2),':00',sep = ""))+
  geom_text_repel(data = df.gps[df.gps$EventsSel==1&df.gps$Events!='Not known',], aes(label = Events,x=x,y=Dm,color=Events),fontface = "bold",max.overlaps = 80,force = 8,nudge_y = 10000,alpha=1,size=3)+
  geom_vline(data = df.gps[df.gps$EventsSel==1&df.gps$Events!='Not known',], aes(xintercept=x,color=Events),alpha=1,linetype='dashed')+
  grids('xy')+
  facet_wrap(~date(rTime),scales='free_x',ncol=1)+
  guides(color='none',fill='none')+
  coord_cartesian(ylim = c(0,15000))

```

```{r plot align,echo=FALSE}
p <- align_patches(p1,p2,p3)
```

## Data visualizations

### Noise and heart rarte

First, we inspect the logged mean noise floor and heart rate.  

```{r plot 1,echo=FALSE,fig.height=8,fig.width=10,dpi=1200}
plot(p[[1]])
```

### Group proximity

Next, we inspect the group proximity. That is, the average mutual distance (in meters) between all students. As seen, the distance between students rise and is highest for "Free time", "Bus", "Coney Island", and "Walk", which are all events where the students are scattered. At the same time, proximity is highest (lowest mutual distance) for "Dinner" and various presentations where all students were together.  

```{r plot 2,echo=FALSE,fig.height=8,fig.width=10,dpi=1200}
plot(p[[2]])
```

### Pair proximity

Finally, we focus on the distance between a selected pair of the students. Again, the distance measure seems to be a good proxy for proximity across the days. 

```{r plot 3,echo=FALSE,fig.height=8,fig.width=10,dpi=1200}
plot(p[[3]])
```
