---
title: "Data Visualization: Steps Analysis"
output:
  flexdashboard::flex_dashboard:
    source_code: embed
---
Summary
=========================================
Fewer than 50% of Americans meet the minimum guidelines for moderate physical activity. Walking is the easiest and most affordable way to address this, so we decided to run a project on the walking habits of our team members.

For this project, we analyze the daily steps of three group members from 2019-9-16 to 2019-12-16. We use data visualizations to explore the following questions.

- What is the trend of daily steps for each of the three members? Is there a big gap?
- Who has the highest average daily steps?
- Who has the most monthly steps?
- Are daily steps related to the distance from home to work?
- Does a holiday influence daily steps?

We acquired data through Health, an iPhone app that keeps track of our steps. All three members exported the raw XML data from their phones and converted it to CSV. Because the iPhone recorded steps on an hourly basis, we used a pivot table to calculate daily steps. We then removed unnecessary fields such as time stamps and data type. At this point, we were left with date, source (i.e., each member's name), and steps. Finally, we combined our files.
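
The hourly-to-daily aggregation was done with a pivot table, but the same step could be sketched in R roughly as follows. This is a minimal illustration on toy data: the column names `timestamp`, `source`, and `steps`, and the member label, are assumptions and not the actual export layout.

```r
library(dplyr)

# Toy hourly records. Column names and values are assumptions for
# illustration, not the real Health export.
hourly <- data.frame(
  timestamp = as.POSIXct(c("2019-09-16 08:00:00",
                           "2019-09-16 18:00:00",
                           "2019-09-17 09:00:00"), tz = "UTC"),
  source = "Member A",
  steps = c(1200, 800, 500)
)

# Collapse hourly rows into one row per member per calendar day.
daily <- hourly %>%
  mutate(date = as.Date(timestamp)) %>%
  group_by(source, date) %>%
  summarise(steps = sum(steps), .groups = "drop")

daily
```

Grouping by both `source` and `date` keeps the members separate, so the same code would work on the combined file as well as on a single member's export.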
To help readers understand the data clearly, we use a different chart for each question.

- What is the trend of daily steps for each of the three members? Is there a big gap?
Line chart, which is a good way to display time-series data.
- Who has the highest average daily steps?
Bar chart with error bars.
- Who has the most monthly steps?
Pie chart, which is a nice way to display percentages.
- Are daily steps related to the distance from home to work?
Bar chart and scatter plot. We use steps and distance as the two variables of interest.
- Does a holiday influence daily steps?
Bar chart. We select the daily steps of the three members on 9/28, 10/28, and 11/28 and compare them.

Analysis {.storyboard}
=========================================
```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(readr)
library(dygraphs)
library(dplyr)
library(xts)
library(wesanderson)
library(ggthemes)
library(lubridate)
library(ggpubr)
stepsall<-read_csv("C:/Users/Yuxuan Qiu/Desktop/HU/Homework/stepsall.csv")
```
---
### First and foremost, let's track the daily steps of each group member!
```{r}
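# Parse the date column using the month/day/year format
stepsall$date <- as.Date(stepsall$date, format = "%m/%d/%Y")

# One date-ordered data frame per member
q <- stepsall %>%
  filter(source == "Yuxuan Qiu") %>%
  arrange(date)
t <- stepsall %>%
  filter(source == "Yongting Tan") %>%
  arrange(date)
h <- stepsall %>%
  filter(source == "Ruting Huang") %>%
  arrange(date)

# Combine into one wide table; this relies on all three members
# having a row for the same set of dates
all <- data.frame(date = q$date, Yuxuan = q$steps, Yongting = t$steps, Ruting = h$steps)
allxts <- xts(x = all[, -1], order.by = all$date)

# Interactive time-series chart with a range selector
dygraph(allxts, main = "Daily Steps From 2019/9/16 to 2019/12/16", ylab = "Steps", xlab = "Date") %>%
  dyLegend(width = 400) %>%
  dyRangeSelector()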
```
***
- Yongting usually ranked first in daily steps. This is probably because she lives in New York, where public transportation is her primary way to commute.
- Yuxuan, however, had the fewest daily steps most of the time. She lives in Ohio and drives a lot.
- Ruting, who resides in Chicago, fell in the middle. According to her, she either walks or drives. We therefore suspect that daily steps are affected by how each member commutes.
### What do the average daily steps of each group member look like? Who takes first place?
```{r}
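avg <- ggplot(stepsall, aes(x = source, y = steps, fill = source))
# Bars show each member's mean; error bars show the normal-theory interval
# from mean_cl_normal, which needs the Hmisc package to be installed.
# `fun` replaces the `fun.y` argument deprecated in ggplot2 3.3.0.
avg <- avg +
  stat_summary(fun = mean, geom = "bar", position = "dodge") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",
               position = position_dodge(width = 0.5), width = 0.2) +
  theme_hc() +
  xlab("Member") +
  ylab("Average Daily Steps") +
  scale_fill_manual(values = wes_palette("Cavalcanti1")) +
  theme(legend.position = "none")
avg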
```
***
- Yongting Tan clearly ranks first in average daily steps, which matches what we observed in the preceding line chart.
- Furthermore, we calculated the precise average daily steps for each of us: 5293.88 for Yongting Tan, 2449.53 for Ruting Huang, and 1954.62 for Yuxuan Qiu.
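
To make the error bars concrete, here is a minimal sketch of the interval that `mean_cl_normal` draws around each bar: the mean plus or minus a t-distribution quantile times the standard error. The step counts below are hypothetical and are not taken from our data.

```r
# Hypothetical daily step counts for one member (not the real data).
steps <- c(1800, 2100, 1950, 2300, 1700, 2050)

n <- length(steps)
step_mean <- mean(steps)
step_se <- sd(steps) / sqrt(n)

# 95% interval based on the t distribution with n - 1 degrees of freedom.
half_width <- qt(0.975, df = n - 1) * step_se
ci <- c(lower = step_mean - half_width,
        mean = step_mean,
        upper = step_mean + half_width)
ci
```

A wider interval means the member's daily steps vary more from day to day, or that fewer days were recorded.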
### Let's view steps on a monthly basis. Who is the monthly steps champion?
```{r}
# Split the study period into three month-long windows
stepsall2 <- stepsall %>%
  filter(date >= "2019-9-16" & date < "2019-10-16") %>%
  mutate(month = "Sep to Oct")
stepsall3 <- stepsall %>%
  filter(date >= "2019-10-16" & date < "2019-11-16") %>%
  mutate(month = "Oct to Nov")
stepsall4 <- stepsall %>%
  filter(date >= "2019-11-16" & date <= "2019-12-16") %>%
  mutate(month = "Nov to Dec")
stepsall_2 <- rbind(stepsall2, stepsall3, stepsall4)
stepsall_2$month <- factor(stepsall_2$month, levels = c("Sep to Oct", "Oct to Nov", "Nov to Dec"))

# Bars filled to 100% and wrapped into polar coordinates give one pie per window
k <- ggplot(stepsall_2) +
  geom_col(aes(x = 1, y = steps, fill = source), position = "fill") +
  coord_polar(theta = "y") +
  scale_fill_manual(values = wes_palette("Moonrise3"))
k1 <- k +
  facet_wrap(~month) +
  theme_bw() +
  theme(axis.title = element_blank(), axis.text = element_blank(),
        axis.ticks = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), panel.border = element_blank()) +
  guides(fill = guide_legend(title = "Member")) +
  ggtitle("Monthly Steps for Each Group Member")
k1
```
***
- Now we view steps by month. We summed each member's daily steps over three time windows: September 16th, 2019 to October 15th, 2019; October 16th, 2019 to November 15th, 2019; and November 16th, 2019 to December 16th, 2019.
- As the line chart suggested, Yongting took the monthly championship in all three periods. In the second period, Yuxuan and Ruting seem to have equal shares. In the other two periods, Ruting comes second and Yuxuan last.
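
Each pie slice is a member's share of the total steps within one window. A minimal sketch of that calculation follows, assuming toy data: the member labels `A`, `B`, `C` and all numbers are placeholders.

```r
library(dplyr)

# Toy step records; member labels and numbers are placeholders.
toy <- data.frame(
  month = rep(c("Sep to Oct", "Oct to Nov"), each = 3),
  source = rep(c("A", "B", "C"), times = 2),
  steps = c(100, 300, 600, 200, 200, 600)
)

# Total steps per member per window, then each member's share of the window.
shares <- toy %>%
  group_by(month, source) %>%
  summarise(total = sum(steps), .groups = "drop") %>%
  group_by(month) %>%
  mutate(share = total / sum(total)) %>%
  ungroup()

shares
```

Printing a table like this next to the pies would make near-ties, such as two members with similar shares, easier to judge than reading slice sizes by eye.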
### Do daily steps have anything to do with the distance from home to work? If yes, how?
```{r}
# Home-to-work distance in miles for each member
distance <- data.frame("source" = c("Ruting Huang", "Yongting Tan", "Yuxuan Qiu"), "distance" = c(1.5, 11, 12))
averagesteps <- stepsall %>%
  group_by(source) %>%
  summarise(average = mean(steps))
distance1 <- merge(distance, averagesteps, by = "source")

# Bar chart of distances (the redundant first fill scale was dropped;
# the Darjeeling1 palette is the one that took effect)
dis <- ggplot(distance, aes(x = source, y = distance, fill = source)) +
  geom_bar(stat = "identity", width = 0.8) +
  theme_hc() +
  xlab("Member") + ylab("Distance (Miles)") + ylim(0, 12) +
  scale_fill_manual(values = wes_palette("Darjeeling1")) +
  theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))

# Reuse the average-steps chart with the same palette
avg1 <- avg +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = wes_palette("Darjeeling1"))

# Scatter plot of average steps against distance with a linear fit
dot <- ggplot(distance1, aes(x = distance, y = average)) +
  geom_point(shape = 8) +
  stat_smooth(method = "lm", se = FALSE, lty = 2) +
  xlab("Distance") + ylab("Average Daily Steps") +
  theme_hc() + ylim(1500, 5500)
figure <- ggarrange(avg1, dis, dot, ncol = 3, nrow = 1)
figure
```
***
- In the analysis of the first line chart, we suggested that daily steps might be affected by how far one lives from the workplace. To investigate this further, we collected each member's distance from home to work.
- From the bar charts, we can see that the distances for Yuxuan Qiu and Yongting Tan are almost the same (12 and 11 miles, respectively), while the distance for Ruting Huang is much shorter.
- The scatter plot displays the relationship between average daily steps and distance. With a sample size of only three, the regression line is not reliable. From the distribution of the dots, we conclude that, at least for the three group members, daily steps show no clear relationship with the distance from home to work.
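
As a rough numeric check on the scatter plot, the sketch below computes the Pearson correlation from the three distances and the three average step counts reported on this dashboard. The variable names are ours, and with only three points the test has a single degree of freedom, so it should be read as an illustration and not as evidence.

```r
# Distances (miles) and average daily steps as reported on this dashboard,
# in the order Ruting, Yongting, Yuxuan.
commute <- data.frame(
  distance = c(1.5, 11, 12),
  average = c(2449.53, 5293.88, 1954.62)
)

# Pearson correlation and its significance test (n = 3, so df = 1).
r <- cor(commute$distance, commute$average)
ct <- cor.test(commute$distance, commute$average)

r
ct$p.value
```

Two members with nearly identical commutes but very different step counts are enough to keep the correlation weak, which is what the dots in the scatter plot show.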
### Does a holiday have any influence on daily steps? Do we see a significant increase, since it's usually a time for family outings and shopping?
```{r}
# Keep only the three comparison dates
m <- stepsall %>%
  filter(date == "2019-9-28" | date == "2019-10-28" | date == "2019-11-28")
n <- ggplot(m, aes(x = source, y = steps, fill = source)) +
  geom_bar(stat = "identity", width = 0.6)
# One panel per date
n +
  scale_fill_manual(values = wes_palette("GrandBudapest1")) +
  xlab("Member") + ylab("Steps") +
  theme_hc() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(legend.position = "none") +
  facet_wrap(~date)
```
***
- We chose Thanksgiving Day, which fell on 11/28/2019, as an example. For comparison, we also pulled the data for 10/28/2019 and 9/28/2019.
- To our surprise, none of us saw the large increase in daily steps that we had expected. We realized later that we had all chosen to stay at home.
- Yuxuan had the most steps on that day, a bit over 2000, because she went grocery shopping for Thanksgiving dinner.
- Therefore, we conclude that, at least for Thanksgiving Day in 2019, the holiday did not raise our daily steps. On the contrary, steps tended to decrease because we preferred to stay indoors.
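
One way to put a number on the holiday effect would be to compare each member's steps on the holiday with their average on the other days. The sketch below does this on toy data: the member labels, dates around the holiday, and step counts are placeholders, not our records.

```r
library(dplyr)

# Toy records for two hypothetical members around the holiday.
toy <- data.frame(
  source = rep(c("A", "B"), each = 4),
  date = rep(as.Date(c("2019-11-26", "2019-11-27",
                       "2019-11-28", "2019-11-29")), times = 2),
  steps = c(2000, 2200, 1500, 2300, 5000, 5200, 3000, 4800)
)
holiday <- as.Date("2019-11-28")

# Holiday steps versus the mean of all non-holiday days, per member.
comparison <- toy %>%
  group_by(source) %>%
  summarise(
    holiday_steps = steps[date == holiday],
    typical_steps = mean(steps[date != holiday]),
    .groups = "drop"
  ) %>%
  mutate(difference = holiday_steps - typical_steps)

comparison
```

A negative `difference` means the member walked less on the holiday than on a typical day.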