Business Analytics Using R

Project: Activity Analysis

Objectives of this Assignment

Understand basic R concepts
Reading and writing a file
Data imputation

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Load Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Reading the activity CSV file

cat("\014")

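# Working directory that contains data/activity.csv (machine-specific path)
setwd("/Users/snehakshatriya/Desktop/R-BA/R-Scripts")
dfrActivity <- read.csv("./data/activity.csv", header=TRUE, stringsAsFactors=FALSE)
#dfrActivity <- data.table(dfrActivity)   # would require library(data.table)
nrow(dfrActivity)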
## [1] 17568
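A minimal sketch of the read step on a tiny hypothetical data frame: the values and the `date` column are made up for illustration, and a file under `tempdir()` stands in for the project's `data/` folder so the example runs without `setwd()`.

```r
# Hypothetical toy rows shaped like the activity data (values are made up).
toy <- data.frame(
  steps    = c(0L, NA, 12L, NA),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0L, 5L, 0L, 5L),
  stringsAsFactors = FALSE
)

# Write the toy data to a temporary CSV file instead of relying on setwd().
csv_path <- file.path(tempdir(), "activity_toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read it back with the same options used for activity.csv;
# the "NA" strings written by write.csv() come back as NA.
dfrToy <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(dfrToy)
nrow(dfrToy)
```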

Number of NA values in each column before imputation

dfrNA <- sapply(dfrActivity, function(x) sum(is.na(x)))
dfrNA <- as.data.frame(dfrNA)
class(dfrNA)
## [1] "data.frame"
# View() needs an X11 display (XQuartz on macOS) and fails when knitting; print() works everywhere
print(dfrNA)
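`colSums(is.na(x))` is a shorter base R way to get the same per-column NA counts as the `sapply()` call. The sketch below uses hypothetical toy data and wraps the counts in a two-column data frame (`column` and `n_na` are names chosen here for illustration) that prints cleanly in a knitted report.

```r
# Hypothetical toy data shaped like the activity data.
dfrToy <- data.frame(
  steps    = c(0L, NA, 12L, NA),
  interval = c(0L, 5L, 0L, 5L),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs
# per column and returns a named numeric vector.
na_counts <- colSums(is.na(dfrToy))
na_counts

# The same information as a two-column data frame for display.
dfrNAToy <- data.frame(column = names(na_counts), n_na = unname(na_counts),
                       stringsAsFactors = FALSE)
print(dfrNAToy)
```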

Mean steps for each interval, ignoring NA values

# One row per interval; the mean column is auto-named `mean(steps, na.rm = TRUE)`
dfrmean <- summarise(group_by(dfrActivity, interval), mean(steps, na.rm=TRUE))
print(dfrmean)
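In dplyr, naming the output inside `summarise()` (for example `summarise(group_by(dfrActivity, interval), MeanSteps = mean(steps, na.rm = TRUE))`, where `MeanSteps` is an illustrative name) avoids the backtick column name `mean(steps, na.rm = TRUE)`. The same per-interval mean can be sketched in base R with `aggregate()` on hypothetical toy data:

```r
# Hypothetical toy data: two intervals (0 and 5) over three days.
dfrToy <- data.frame(
  steps    = c(0L, NA, 12L, 4L, 6L, NA),
  interval = c(0L, 5L, 0L, 5L, 0L, 5L)
)

# The formula method of aggregate() drops rows with NA (na.action = na.omit),
# so each interval's mean uses only its non-missing steps.
dfrMeanToy <- aggregate(steps ~ interval, data = dfrToy, FUN = mean)
names(dfrMeanToy)[2] <- "mean_steps"
dfrMeanToy
```

One difference: because `aggregate()` drops NA rows before grouping, an interval whose steps are all NA disappears from the result, whereas `summarise()` with `na.rm = TRUE` keeps it with the value `NaN`.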

Creating a new data frame, dfr1, with a new column MEANVALUE (initially NA)

cat("\014")

dfr1 <- mutate(dfrActivity, MEANVALUE = NA)
# head() keeps the printed output short; print() avoids the X11 requirement of View()
print(head(dfr1))

Replacing NA values in steps with the mean for the same interval

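# Look up each missing row's interval in dfrmean with match(), so the correct
# mean is used regardless of the order in which the NA rows appear
idx <- is.na(dfr1$steps)
dfr1$steps[idx] <- dfrmean$`mean(steps, na.rm = TRUE)`[match(dfr1$interval[idx], dfrmean$interval)]
print(head(dfr1))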
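A positional assignment such as `x[is.na(x)] <- means` fills the NA slots in order (recycling the means if there are more slots), so it is only correct when the missing rows form complete days in interval order. The sketch below, on hypothetical toy data with two intervals, contrasts that with a lookup via `match()` on the `interval` column:

```r
# Hypothetical toy data: intervals 0 and 5 over three days,
# with NAs that do NOT form a complete day in interval order.
dfrToy <- data.frame(
  steps    = c(8L, NA, NA, 20L, 12L, 4L),
  interval = c(0L, 5L, 0L, 5L, 0L, 5L)
)

# Per-interval means of the non-missing values: interval 0 -> 10, interval 5 -> 12.
dfrMeanToy <- aggregate(steps ~ interval, data = dfrToy, FUN = mean)

idx <- is.na(dfrToy$steps)   # rows 2 (interval 5) and 3 (interval 0)

# Positional: the means are used in interval order, so row 2 receives
# interval 0's mean and row 3 receives interval 5's mean.
recycled <- as.numeric(dfrToy$steps)
recycled[idx] <- dfrMeanToy$steps

# Lookup: each NA row receives the mean of its own interval.
matched <- as.numeric(dfrToy$steps)
matched[idx] <- dfrMeanToy$steps[match(dfrToy$interval[idx], dfrMeanToy$interval)]

data.frame(interval = dfrToy$interval, recycled, matched)
```

With `match()`, the order of the NA rows no longer matters.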

Filling the NA values in the MEANVALUE column

# MEANVALUE takes the (now imputed) steps values
dfr1$MEANVALUE <- dfr1$steps
print(head(dfr1))

Checking again for NA values

dfrNA2 <- sapply(dfr1, function(x) sum(is.na(x)))
dfrNA2 <- as.data.frame(dfrNA2)
class(dfrNA2)
## [1] "data.frame"
print(dfrNA2)
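Instead of inspecting the counts by eye, the final check can be made to fail loudly. A minimal sketch on a hypothetical imputed data frame:

```r
# Hypothetical imputed toy data.
dfrImputed <- data.frame(
  steps     = c(8, 12, 10, 20),
  interval  = c(0L, 5L, 0L, 5L),
  MEANVALUE = c(8, 12, 10, 20)
)

# stopifnot() raises an error if any NA remains, so a knitted report
# stops at this point instead of silently continuing.
stopifnot(!anyNA(dfrImputed))

# Per-column counts, for reporting.
colSums(is.na(dfrImputed))
```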

Summary

Initially, the steps column had 2304 NA values.
After imputation, the data has zero NA values.
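The two counts are consistent with whole days of missing data, assuming the data set records 5-minute intervals (288 per day), an assumption not stated in this script. The arithmetic, using the row count (17568) and NA count (2304) reported above:

```r
# Assumption: 5-minute intervals, so one day has 24 * 60 / 5 = 288 intervals.
intervals_per_day <- 24 * 60 / 5
n_rows <- 17568   # nrow(dfrActivity)
n_na   <- 2304    # NA count in steps

n_rows / intervals_per_day   # 61 days in total
n_na / intervals_per_day     # 8 days' worth of missing intervals
```

If that assumption holds, 8 complete days without step data would explain why filling the NAs positionally with the per-interval means lines up; the counts alone do not prove that the missing rows are whole days.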

Approach Used
First, identifying the NA values in each column
Finding the mean value of steps for each interval
Creating a new data frame with a new column called MEANVALUE, filled with NA values
Replacing each NA value in steps with the corresponding interval mean from dfrmean
Filling the NA values in the MEANVALUE column of dfr1 by copying the values from the steps column
Finally, checking the data set again for NA values
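The steps above can be collected into one function, and the result written to disk, which also covers the writing-a-file objective. This is a sketch, not the author's code: `impute_steps_by_interval()` and the output file name are hypothetical, it uses base R `tapply()` instead of dplyr, and it assumes every interval has at least one non-missing value.

```r
# Hypothetical helper mirroring the approach listed above.
impute_steps_by_interval <- function(dfr) {
  # 1. Mean steps per interval, ignoring NA values (named by interval).
  means <- tapply(dfr$steps, dfr$interval, mean, na.rm = TRUE)
  # 2. New column MEANVALUE, initially NA.
  dfr$MEANVALUE <- NA_real_
  # 3. Replace NA steps with the mean of the same interval (lookup by name).
  idx <- is.na(dfr$steps)
  dfr$steps <- as.numeric(dfr$steps)
  dfr$steps[idx] <- means[as.character(dfr$interval[idx])]
  # 4. Fill MEANVALUE from the imputed steps column.
  dfr$MEANVALUE <- dfr$steps
  # 5. Final check: stop if any NA remains.
  stopifnot(!anyNA(dfr))
  dfr
}

# Hypothetical toy input.
dfrToy <- data.frame(
  steps    = c(8L, NA, NA, 20L, 12L, 4L),
  interval = c(0L, 5L, 0L, 5L, 0L, 5L)
)
dfrOut <- impute_steps_by_interval(dfrToy)
dfrOut

# Write the imputed data to a CSV file (temporary path as a placeholder).
out_path <- file.path(tempdir(), "activity_imputed.csv")
write.csv(dfrOut, out_path, row.names = FALSE)
nrow(read.csv(out_path))
```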