Overview

This report analyzes Xiaomi Mi Band data extracted from the Android app's SQLite database.

Data

To analyze your own data, get it from the Android phone’s path “/data/data/com.xiaomi.hm.health”. Use RootExplorer to copy the whole directory named “databases” to your computer. The data used for this report is available on GitHub.

Load Libraries

if(!"MiBand" %in% rownames(installed.packages())){
  # Current devtools expects the repository as "user/repo"
  devtools::install_github('BigBorg/MiBand_R_Package')
}
library(MiBand)
library(ggplot2)
library(plotly)

Load data

I’ve already packaged my code for reading and cleaning the data in the MiBand package. If you are interested in the code inside the package, you can find it in my GitHub repository.

MiData <- loadMiData("./data/databases","963276123")

Exploratory Data Analysis

str(MiData)
## List of 3
##  $ data_clean:'data.frame':  193 obs. of  6 variables:
##   ..$ date       : Date[1:193], format: "2015-11-03" ...
##   ..$ sleep.light: int [1:193] NA 286 247 238 290 278 284 214 251 229 ...
##   ..$ sleep.deep : int [1:193] NA 106 153 167 134 162 151 179 178 169 ...
##   ..$ step       : int [1:193] 1015 13743 11000 14548 10582 10334 10652 16382 9026 8471 ...
##   ..$ efficiency : num [1:193] NaN 0.27 0.383 0.412 0.316 ...
##   ..$ weekday    : Factor w/ 7 levels "Sunday","Monday",..: 6 7 5 1 3 4 2 6 7 5 ...
##  $ data_week :'data.frame':  193 obs. of  6 variables:
##   ..$ date       : Date[1:193], format: "2015-11-03" ...
##   ..$ sleep.light: num [1:193] 289 286 247 238 290 ...
##   ..$ sleep.deep : num [1:193] 126 106 153 167 134 ...
##   ..$ step       : int [1:193] 1015 13743 11000 14548 10582 10334 10652 16382 9026 8471 ...
##   ..$ efficiency : num [1:193] 0.288 0.27 0.383 0.412 0.316 ...
##   ..$ weekday    : Factor w/ 7 levels "Sunday","Monday",..: 6 7 5 1 3 4 2 6 7 5 ...
##  $ avg_week  :Classes 'tbl_df', 'tbl' and 'data.frame':  7 obs. of  5 variables:
##   ..$ weekday    : Factor w/ 7 levels "Sunday","Monday",..: 1 2 3 4 5 6 7
##   ..$ sleep.light: num [1:7] 271 272 307 319 293 ...
##   ..$ sleep.deep : num [1:7] 124 128 132 128 126 ...
##   ..$ step       : num [1:7] 7621 9002 6936 5757 7110 ...
##   ..$ efficiency : num [1:7] 0.313 0.325 0.303 0.298 0.304 ...
head(MiData$data_clean)
##         date sleep.light sleep.deep  step efficiency   weekday
## 1 2015-11-03          NA         NA  1015        NaN    Friday
## 2 2015-11-04         286        106 13743  0.2704082  Saturday
## 3 2015-11-05         247        153 11000  0.3825000  Thursday
## 4 2015-11-06         238        167 14548  0.4123457    Sunday
## 5 2015-11-07         290        134 10582  0.3160377   Tuesday
## 6 2015-11-08         278        162 10334  0.3681818 Wednesday

MiData is a list. Its element “data_clean” contains missing values; “data_week” is the same data with each missing value replaced by the mean for the same day of the week (the mean grouped by weekday).
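The imputation itself happens inside the package, so the following is only a minimal sketch of the idea on made-up data, not the package's actual code. The helper name `impute_by_group` and the toy values are assumptions for illustration.

```r
# Toy data (made up for illustration), with one missing night
toy <- data.frame(
  weekday     = factor(c("Monday", "Tuesday", "Monday", "Tuesday", "Monday")),
  sleep.light = c(280, 300, NA, 310, 260)
)

# Hypothetical helper: replace each NA with the mean of the
# non-missing values that share the same group (here, the weekday)
impute_by_group <- function(x, group) {
  ave(x, group, FUN = function(v) {
    v[is.na(v)] <- mean(v, na.rm = TRUE)
    v
  })
}

toy$sleep.light <- impute_by_group(toy$sleep.light, toy$weekday)
toy
```

The missing Monday value is filled with the mean of the other two Mondays, which is also why the imputed columns in “data_week” are numeric rather than integer.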

datedur<-range(MiData$data_week$date)
nrow<-nrow(MiData$data_week)
summary(MiData$data_week$sleep.light)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    61.0   257.0   288.0   292.3   325.0   468.0
summary(MiData$data_week$sleep.deep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    44.0   109.0   126.0   126.8   146.0   205.0
summary(MiData$data_week$step)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     642    4750    7872    7574   10410   19760

The data frame covers 2015-11-03 to 2016-05-13 and has 193 rows. Sleep duration is recorded in minutes.

Plotting:
Histograms, box plot and time series

ggplotly(miPlot(MiData,"hist","sleep"))

ggplotly(miPlot(MiData,"hist","step"))

ggplotly(miPlot(MiData,"box","sleep"))

ggplotly(miPlot(MiData,"ts","sleep"))

ggplotly(miPlot(MiData,"ts","step"))

Time Series Analysis

Time series analysis of steps:

weekly_ts_analysis <- function(data){
        tsobj <- ts(data,start=1,frequency=7)
        components <- decompose(tsobj)
        plot(components)
}
weekly_ts_analysis(MiData$data_week$step)

Time series analysis of deep sleep:

weekly_ts_analysis(MiData$data_week$sleep.deep)

Time series analysis of light sleep:

weekly_ts_analysis(MiData$data_week$sleep.light)
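The decomposition plots show the weekly pattern only graphically. As a sketch of how the seasonal effects could be read off as numbers, the object returned by `decompose` has a `figure` element with one value per position in the cycle. The series below is simulated, not the Mi Band data.

```r
# Toy series (simulated): 8 weeks of daily values with a weekly pattern
set.seed(1)
pattern <- c(9000, 7000, 6000, 7000, 8000, 5000, 6500)
steps <- rep(pattern, times = 8) + rnorm(56, sd = 300)

components <- decompose(ts(steps, start = 1, frequency = 7))

# `figure` holds one seasonal effect per position in the 7-day cycle,
# centered so that the effects sum to zero
round(components$figure)

# Position in the cycle with the largest seasonal effect
which.max(components$figure)
```

Position 1 corresponds to the weekday of the first observation in the series, not necessarily to Sunday.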

Which day of the week has the highest value?

ggplotly(miPlot(MiData,"week","sleep"))

ggplotly(miPlot(MiData,"week","step"))

Is the step count on school days different from that during vacation?

MiData$data_week$month<-months(MiData$data_week$date)
vacation<-MiData$data_week[MiData$data_week$month %in% c("January","February","July","August"),]
schoolday<-MiData$data_week[!MiData$data_week$month %in% c("January","February","July","August"),]
boxplot(vacation$step,schoolday$step,names = c("vacation","school"))
title(main="Step")

As shown in the boxplot, the step count on school days is typically higher than during vacation.
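One caveat about the vacation/school split: `months()` returns month names in the session locale, so the comparison against English names may select nothing on a non-English system. A minimal sketch of a locale-independent alternative, using made-up dates:

```r
# Made-up dates for illustration
dates <- as.Date(c("2016-01-15", "2016-03-02", "2016-02-28", "2016-05-13"))

# The numeric month (1-12) does not depend on the session locale
month_num <- as.integer(format(dates, "%m"))

# January, February, July and August count as vacation, as in the report
is_vacation <- month_num %in% c(1, 2, 7, 8)
is_vacation
```

The logical vector can then be used to subset the data frame in the same way as the month-name filter.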

set.seed(0)
schoolresample<-matrix(sample(schoolday$step,1000,replace=T),nrow=100)
schoolmean<-apply(schoolresample,1,mean)
vacationresample<-matrix(sample(vacation$step,1000,replace = T),nrow = 100)
vacationmean<-apply(vacationresample,1,mean)
testresult<-t.test(schoolmean,vacationmean)
difference<-mean(schoolmean)-mean(vacationmean)

The t-test on the resampled means gives a p-value of 3.7499339 × 10^{-57}, so the step count on school days differs significantly from that during vacation. The mean difference (school mean - vacation mean) is 3639.006 steps.
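A caveat on this approach: a t-test on resampled means gets a smaller p-value simply by drawing more resamples, so it can overstate the evidence. As a hedged sketch of two alternatives, the code below runs a Welch t-test on the daily observations themselves and computes a bootstrap percentile interval for the difference in means. The two vectors are simulated stand-ins, not the real step counts.

```r
set.seed(0)
# Simulated stand-ins for schoolday$step and vacation$step (not the real data)
school   <- round(rnorm(120, mean = 9000, sd = 3000))
vacation <- round(rnorm(60,  mean = 6000, sd = 3000))

# Welch two-sample t-test on the daily observations
welch <- t.test(school, vacation)
welch$p.value

# Bootstrap percentile interval for the difference in means:
# resample each group with replacement and recompute the difference
boot_diff <- replicate(2000, {
  mean(sample(school, replace = TRUE)) - mean(sample(vacation, replace = TRUE))
})
quantile(boot_diff, c(0.025, 0.975))
```

If the interval excludes zero, the data support a difference between the two groups at roughly the 5% level.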

Is there a correlation between sleep and steps?

MiData$data_week$efficiency<-with(MiData$data_week,sleep.deep/(sleep.deep+sleep.light))
cors<-with(MiData$data_week,c(
        cor(step,sleep.light),
        cor(step,sleep.deep),
        cor(step,sleep.light+sleep.deep),
        cor(step,efficiency)
        )
)
names(cors)<-c("step-sleep.light","step-sleep.deep","step-total sleep","step-efficiency")
cors
## step-sleep.light  step-sleep.deep step-total sleep  step-efficiency 
##     -0.271091602     -0.007771756     -0.252226057      0.139983026

The correlations indicate that the longer you sleep, the fewer steps you are likely to take, but the relationship is quite weak. Note that within one row, that is, on the same day, the steps are recorded after the sleep.
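Because each row pairs a night's sleep with the steps taken afterwards, it may also be worth checking the other direction, the previous day's steps against the following night's sleep, and testing whether the correlation differs from zero. This is a sketch on simulated data; the variable names are assumptions.

```r
set.seed(2)
# Simulated daily records (not the real data)
n <- 100
sleep_total <- round(rnorm(n, mean = 420, sd = 50))
step <- round(20000 - 30 * sleep_total + rnorm(n, sd = 3000))

# Same-day correlation with a significance test
same_day <- cor.test(step, sleep_total)
same_day$estimate
same_day$p.value

# Previous day's steps against the following night's sleep:
# drop the last step value and the first sleep value so the rows line up
prev_day <- cor(step[-n], sleep_total[-1])
prev_day
```

In the simulated data only the same-day relationship is built in, so the shifted correlation should be close to zero.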

At what total sleep duration is sleep efficiency highest?

# Use manipulate if you are copy-pasting the code into the RStudio environment
# library(manipulate)
# manipulate({
#        Y<-predict(loess(efficiency~I(sleep.light+sleep.deep),data=MiData$data_week),M)
#        ggplot(data=MiData$data_week,aes(sleep.light+sleep.deep,efficiency))+
#                geom_point()+geom_smooth(method="auto")+
#                geom_vline(xintercept=M)+labs(x="Total sleep")+
#                labs(title=paste("Efficiency: ",Y,sep=""))
#    },
#    M=slider(
#            min(MiData$data_week$sleep.light+MiData$data_week$sleep.deep),
#            max(MiData$data_week$sleep.light+MiData$data_week$sleep.deep),
#            initial = min(MiData$data_week$sleep.light+MiData$data_week$sleep.deep)
#            )
#)
ggplotly(ggplot(data=MiData$data_week,aes(sleep.light+sleep.deep,efficiency))+
        geom_point()+geom_smooth(method="auto")+
        labs(title="Efficiency"))

Efficiency is extremely high when total sleep is very short. The body may be trying to compensate for the lost sleep time by increasing the share of deep sleep. Although efficiency is high on short nights, the deep sleep duration is not sufficient. As total sleep increases, efficiency reaches a local maximum, then falls again when you sleep for too long.
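The location of the maximum can also be estimated numerically instead of being read off the plot. The sketch below fits a loess curve and evaluates it on a grid. It uses simulated nights with a single peak, so the data and the column name `total` are assumptions.

```r
set.seed(3)
# Simulated nights (not the real data): efficiency peaks near 420 minutes
total <- runif(150, min = 200, max = 600)
efficiency <- 0.35 - ((total - 420) / 500)^2 + rnorm(150, sd = 0.02)
toy <- data.frame(total, efficiency)

fit <- loess(efficiency ~ total, data = toy)

# Evaluate the fitted curve on a grid inside the observed range
grid <- data.frame(total = seq(min(toy$total), max(toy$total), length.out = 200))
grid$fitted <- predict(fit, newdata = grid)

# Total sleep (minutes) at the maximum of the fitted curve
peak <- grid$total[which.max(grid$fitted)]
peak
```

With the real data, the very short nights would probably dominate the global maximum, so the grid would need to be restricted to the range of interest to find the local peak.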

Use light sleep and deep sleep to predict steps

coefs<-summary(lm(step~sleep.light+sleep.deep,data=MiData$data_week))$coefficients

With deep sleep held fixed, each additional minute of light sleep is associated with a change of -16.3624989 steps. With light sleep held fixed, each additional minute of deep sleep is associated with a change of -1.9898671 steps.
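Point estimates alone do not show how precisely each slope is determined. As a sketch, the coefficient table and confidence intervals can be pulled from the fitted model as below. The data are simulated, so the numbers will not match the ones quoted above.

```r
set.seed(4)
# Simulated data (not the real data)
n <- 150
toy <- data.frame(
  sleep.light = round(rnorm(n, mean = 290, sd = 50)),
  sleep.deep  = round(rnorm(n, mean = 125, sd = 30))
)
toy$step <- round(12000 - 15 * toy$sleep.light + rnorm(n, sd = 3000))

fit <- lm(step ~ sleep.light + sleep.deep, data = toy)

# Estimate, standard error, t value and p-value for each term
coef_table <- summary(fit)$coefficients
coef_table

# 95% confidence intervals: an interval containing 0 means the data
# do not clearly separate that slope from "no linear association"
ci <- confint(fit)
ci
```

A slope whose interval spans zero, as might be the case for deep sleep here, should not be read as evidence of an effect.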

Conclusion

The subject sleeps longer on Wednesdays and walks more on Mondays. The step count on school days differs from that during vacation. There is a weak correlation between sleep and steps. Around 7 hours of sleep gives a relatively high sleep efficiency (deep sleep / total sleep).