I recently thought it would be interesting to map where and when my dog goes to the bathroom… I know, let’s not dive too deep into why. Nevertheless, I had noticed my dog, “Lucy”, tended to pee in the same spot every day, and so I wanted to test this quantitatively. For a week, I marked where (longitude and latitude) and “what” she did during our walks - referring to the “what” as either “#1” or “#2” to maintain at least a hint of maturity with this analysis.
One problem I encountered during data collection was that, occasionally, I walked Lucy on a different route than normal. These walks were too infrequent to give a reasonable sample size for testing the 1s and 2s of dog walking, so I started my analysis by subsetting the data to only our normal walking route (route 1).
# Import the data
gps.dat<-read.csv("/Users/scottmorello/Dropbox/Archives/Personal/Random Analysis/Random_Personal_Work/Dog_Walk_Data/GPS_data.csv")
dinner.dat<-read.csv("/Users/scottmorello/Dropbox/Archives/Personal/Random Analysis/Random_Personal_Work/Dog_Walk_Data/Dinner_data.csv")
walk.dat<-read.csv("/Users/scottmorello/Dropbox/Archives/Personal/Random Analysis/Random_Personal_Work/Dog_Walk_Data/Walk_data.csv")
# Make "Type" a factor, since it reflects going #1 or #2
gps.dat$Type<-factor(gps.dat$Type)
# Just take the data from route 1
gps.dat.sub<-subset(gps.dat,Route==1)
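Before dropping the other routes, it can help to confirm how lopsided the sample really is. Here is a minimal sketch of that check, using a made-up data frame (`toy.gps`, with invented values) in place of the real `gps.dat`:

```r
# Toy stand-in for gps.dat; the values are invented for illustration
toy.gps <- data.frame(Route = c(1, 1, 1, 2, 1, 3, 1),
                      Type  = factor(c(1, 1, 2, 1, 2, 1, 1)))

# Count observations per route and per event type
route.counts <- table(Route = toy.gps$Route, Type = toy.gps$Type)
route.counts

# Keep only the normal route, mirroring the subset() call above
toy.gps.sub <- subset(toy.gps, Route == 1)
nrow(toy.gps.sub)
```

If the off-route rows of the table hold only one or two observations each, subsetting to route 1 loses very little.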
My next step was plotting the actual locations of #1s and #2s on a map. The package “ggmap” is a wonderful tool that integrates easily accessible mapping services (e.g., Google maps) into “ggplot” visualizations. With “ggmap” you can easily constrain your map boundaries by providing an address and a zoom value (an integer from 3 to 21, running from continental scale down to the scale of a single building).
# Load ggplot and ggmap
library(ggplot2)
library(ggmap)
# Get the map surrounding my address (please note that I have hidden my actual address, even though you can gauge the general vicinity from the map below).
local.map <- get_map(location = myaddress, zoom = 18)
#Plot it with the gps data
ggmap(local.map) +
geom_point(data=gps.dat.sub,aes(x =Longitude, y =Latitude, colour=Type),size=3) +
scale_colour_manual(values=c("black","red"))
The GPS data seemed very clustered, as I expected. To test this quantitatively, however, we need the pairwise distances between all points. We calculate these using simple Euclidean distance, since at this small a spatial scale we don’t need to account for the curvature of the earth.
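# Calculate the pairwise distances between GPS points... we don't need to account for curvature of the earth at this resolution
# (selecting the coordinate columns by name rather than by position, so the call still works if the column order changes)
dog.dist<-dist(gps.dat.sub[,c("Longitude","Latitude")],method = "euclidean",diag=TRUE)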
dog.dist
5 6 7 8 9 10 11
5 0.000000e+00
6 4.897030e-04 0.000000e+00
7 8.440379e-05 5.310753e-04 0.000000e+00
8 3.036445e-05 4.632893e-04 8.792042e-05 0.000000e+00
9 8.444531e-04 3.561825e-04 8.799369e-04 8.172276e-04 0.000000e+00
10 5.661272e-05 4.530519e-04 8.220097e-05 2.941088e-05 8.048615e-04 0.000000e+00
11 1.041024e-03 6.635006e-04 1.037328e-03 1.010667e-03 4.816150e-04 9.870588e-04 0.000000e+00
15 7.126044e-04 8.501153e-04 6.386838e-04 6.975851e-04 1.057507e-03 6.708204e-04 8.972653e-04
16 8.741379e-04 3.866264e-04 9.085136e-04 8.467308e-04 3.178050e-05 8.339688e-04 4.659839e-04
17 1.610745e-04 4.096950e-04 1.446824e-04 1.336451e-04 7.482272e-04 1.047378e-04 8.933225e-04
18 7.483916e-04 2.586987e-04 7.882246e-04 7.219529e-04 1.050190e-04 7.112138e-04 5.448302e-04
19 1.032376e-04 3.875939e-04 1.518223e-04 7.570997e-05 7.416097e-04 6.963476e-05 9.464143e-04
20 5.524717e-04 7.353911e-05 5.983853e-04 5.272314e-04 3.038717e-04 5.189605e-04 6.613811e-04
21 1.101136e-04 3.812663e-04 1.564257e-04 8.220097e-05 7.350279e-04 7.443118e-05 9.391619e-04
22 5.708108e-04 1.010445e-04 6.193004e-04 5.462792e-04 2.962499e-04 5.391567e-04 6.739325e-04
23 2.024080e-04 3.388923e-04 2.081178e-04 1.721540e-04 6.775028e-04 1.480811e-04 8.390095e-04
27 6.203225e-05 5.037509e-04 2.973214e-05 5.948109e-05 8.539514e-04 5.269725e-05 1.021071e-03
28 1.206418e-03 8.202792e-04 1.202075e-03 1.176063e-03 5.975015e-04 1.152397e-03 1.654116e-04
15 16 17 18 19 20 21
5
6
7
8
9
10
11
15 0.000000e+00
16 1.072075e-03 0.000000e+00
17 5.905675e-04 7.757435e-04 0.000000e+00
18 1.016772e-03 1.367955e-04 6.611543e-04 0.000000e+00
19 7.028293e-04 7.711764e-04 1.127874e-04 6.462476e-04 0.000000e+00
20 9.181067e-04 3.353446e-04 4.809262e-04 2.010099e-04 4.518905e-04 0.000000e+00
21 7.002428e-04 7.645469e-04 1.097725e-04 6.398789e-04 7.280110e-06 4.458924e-04 0.000000e+00
22 9.472539e-04 3.279909e-04 5.047772e-04 1.916377e-04 4.713396e-04 2.915476e-05 4.655427e-04
23 6.183915e-04 7.052092e-04 7.117584e-05 5.900008e-04 1.181059e-04 4.105021e-04 1.117318e-04
27 6.519640e-04 8.827553e-04 1.280820e-04 7.614000e-04 1.223193e-04 5.704568e-04 1.270945e-04
28 1.024099e-03 5.759809e-04 1.058265e-03 6.756249e-04 1.111777e-03 8.121244e-04 1.104523e-03
22 23 27 28
5
6
7
8
9
10
11
15
16
17
18
19
20
21
22 0.000000e+00
23 4.347712e-04 0.000000e+00
27 5.910372e-04 1.864537e-04 0.000000e+00
28 8.220973e-04 1.004363e-03 1.186110e-03 0.000000e+00
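The distances above are in decimal degrees, which are hard to picture on the ground. A degree of longitude also covers less ground than a degree of latitude away from the equator, so raw degree distances overstate east-west separation relative to north-south. As a rough sketch only, the points can be projected to local meters with an equirectangular approximation. The coordinates below are invented, and the spherical Earth radius of 6,371 km is an assumption:

```r
# Hypothetical coordinates (decimal degrees); not the real walk data
toy.pts <- data.frame(Longitude = c(-71.1000, -71.1005, -71.1000),
                      Latitude  = c( 42.3500,  42.3500,  42.3505))

earth.radius.m <- 6371000  # mean Earth radius, assuming a sphere

# Project to local planar meters (equirectangular approximation):
# scale longitude by cos(mean latitude) so both axes share the same units
lat0 <- mean(toy.pts$Latitude) * pi / 180
toy.xy <- data.frame(
  x = earth.radius.m * (toy.pts$Longitude * pi / 180) * cos(lat0),
  y = earth.radius.m * (toy.pts$Latitude * pi / 180)
)

# Pairwise distances, now in meters
toy.dist.m <- dist(toy.xy, method = "euclidean")
round(toy.dist.m, 1)
```

In this toy example, the same 0.0005 degree offset comes out shorter east-west than north-south, which is exactly the distortion the cosine scaling corrects.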
Now that we have pairwise distances among all GPS points, we can compare the distributions of distances between each #1 and all other #1s, between each #2 and all other #2s, and between each #1 and all #2s. This partitions the variation in distance among all points into variation within each group (#1 or #2) and variation between the groups, telling us whether #1s and #2s cluster closer to their own kind than they do to each other.
We summarize the distance data in a figure below, showing the mean distance for each comparison (#1s vs #1s, #2s vs #2s, #1s vs #2s) along with 95% confidence intervals. If two confidence intervals do not overlap, we take the corresponding mean distances to be different; this is a conservative visual check rather than a formal test.
# Figure out which gps points correspond to Lucy going #1 or #2
dog.1<-which(gps.dat.sub$Type==1)
dog.2<-which(gps.dat.sub$Type==2)
# make a dataset for #1 vs #1, #2 vs #2, and #1 vs #2 distances
dog.dist.1.1<-as.matrix(dog.dist)[dog.1,dog.1]
dog.dist.1.1<-dog.dist.1.1[lower.tri(dog.dist.1.1,diag=FALSE)]
dog.dist.1.1<-as.vector(dog.dist.1.1)
dog.dist.2.2<-as.matrix(dog.dist)[dog.2,dog.2]
dog.dist.2.2<-dog.dist.2.2[lower.tri(dog.dist.2.2,diag=FALSE)]
dog.dist.2.2<-as.vector(dog.dist.2.2)
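# The #1 vs #2 block is rectangular (#1s in rows, #2s in columns) and every entry is a distinct pair,
# so keep all of it instead of taking a lower triangle, which would silently drop valid pairs
dog.dist.1.2<-as.matrix(dog.dist)[dog.1,dog.2]
dog.dist.1.2<-as.vector(dog.dist.1.2)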
dog.dist.all<-rbind(data.frame(Comparison=rep("#1 vs #1",times=length(dog.dist.1.1)),Distance=dog.dist.1.1),
data.frame(Comparison=rep("#2 vs #2",times=length(dog.dist.2.2)),Distance=dog.dist.2.2),
data.frame(Comparison=rep("#1 vs #2",times=length(dog.dist.1.2)),Distance=dog.dist.1.2))
# Now summarize the data by mean and standard error so we can calculate 95% confidence intervals
library(plyr)
dog.dist.all.sum<-ddply(dog.dist.all,.(Comparison),summarize,Mean_Distance=mean(Distance),SE=(sd(Distance)/sqrt(length(Distance))))
ggplot(dog.dist.all.sum,aes(x=Comparison,y=Mean_Distance,ymin=(Mean_Distance-(1.96*SE)),ymax=(Mean_Distance+(1.96*SE))))+
geom_pointrange(size=1.5)+
ylab("Distance (mean +/- 95% CI)")+
theme_bw()
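One caveat with the confidence intervals above is that each GPS point contributes to many pairwise distances, so the distances are not independent of one another. A possible alternative, sketched here rather than taken from the analysis above, is a permutation test that shuffles the #1/#2 labels while keeping the locations fixed. The data are simulated, and `perm.stat` is a hypothetical helper written for this illustration:

```r
set.seed(1)  # make the shuffles reproducible

# Simulated stand-in data: two loose clusters, invented for illustration
toy <- data.frame(
  Longitude = c(rnorm(8, 0, 0.1), rnorm(6, 1, 0.1)),
  Latitude  = c(rnorm(8, 0, 0.1), rnorm(6, 1, 0.1)),
  Type      = factor(rep(c(1, 2), times = c(8, 6)))
)
toy.d <- as.matrix(dist(toy[, c("Longitude", "Latitude")]))

# Hypothetical helper: mean between-group distance minus
# mean within-group distance for a given labelling of the points
perm.stat <- function(d, type) {
  type  <- as.character(type)
  same  <- outer(type, type, "==")   # TRUE where a pair shares a label
  pairs <- lower.tri(d)              # count each pair of points once
  mean(d[pairs & !same]) - mean(d[pairs & same])
}

obs <- perm.stat(toy.d, toy$Type)

# Null distribution: shuffle the labels, keep the locations fixed
null <- replicate(999, perm.stat(toy.d, sample(toy$Type)))

# One-sided p-value: how often shuffled labels separate at least as well as the real ones
p.value <- (sum(null >= obs) + 1) / (length(null) + 1)
p.value
```

A small p-value here says the real labels separate the points better than nearly all random relabelings, which is the same question the figure asks but without assuming independent distances.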
The results tell us that, indeed, Lucy did #1s and #2s close to the same place every time, although #2s were somewhat more dispersed (slightly larger distances between them). The locations for #1s and #2s differed though, and based on the map, it seems she does #1s on the southern end of the block and #2s on the northern end. The results could easily be influenced by the direction I walked Lucy, which was south to north. It’s conceivable that she just does #1s first and #2s second, and that the time it takes her to realize she needs to do either is pretty consistent. Dogs generally prefer to go #1 where other dogs have gone before, though, so Lucy could be stuck in a loop where she’s just smelling her #1 from the prior walk and deciding to mark the same spot.
OK… I think that’s more than enough discussion of my dog’s bathroom habits. Hope you enjoyed the analysis!