Understand Basic R Concepts
Reading A File / Writing A File
Data Imputation
knitr Global Options
# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
cat("\014")
setwd("/Users/snehakshatriya/Desktop/R-BA/R-Scripts")
dfrActivity <- read.csv("./data/activity.csv", header = TRUE, stringsAsFactors = FALSE)
#dfrActivity <- data.table(dfrActivity)
nrow(dfrActivity)
## [1] 17568
# Count the NA values in each column
dfrNA <- sapply(dfrActivity, function(x) sum(is.na(x)))
dfrNA <- as.data.frame(dfrNA)
class(dfrNA)
## [1] "data.frame"
# View() needs an interactive viewer (XQuartz on macOS), so print the result instead
print(dfrNA)
# Mean number of steps for each interval, ignoring NA values
dfrmean <- summarise(group_by(dfrActivity, interval), mean(steps, na.rm = TRUE))
head(dfrmean)
cat("\014")
# Add a placeholder column that is filled in after imputation
dfr1 <- mutate(dfrActivity, MEANVALUE = NA)
head(dfr1)
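# Look up each missing row's interval mean explicitly instead of relying on vector recycling
naRows <- is.na(dfr1$steps)
dfr1$steps[naRows] <- dfrmean$`mean(steps, na.rm = TRUE)`[match(dfr1$interval[naRows], dfrmean$interval)]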
head(dfr1)
dfr1$MEANVALUE <- dfr1$steps
head(dfr1)
# Recount the NA values in each column after imputation
dfrNA2 <- sapply(dfr1, function(x) sum(is.na(x)))
dfrNA2 <- as.data.frame(dfrNA2)
class(dfrNA2)
## [1] "data.frame"
print(dfrNA2)
Initially, the steps column had 2304 NA values.
At the end of the process, the data has no NA values.
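The before-and-after counts can also be checked without opening a viewer. The following is a minimal sketch, assuming a small made-up data frame `dfrToy` that stands in for the activity data; the column names mirror the original file, but the values are invented for illustration.

```r
# Toy stand-in for the activity data; values are invented for illustration
dfrToy <- data.frame(
  steps    = c(NA, 4, 10, NA, 6, 12),
  date     = rep(c("2012-10-01", "2012-10-02"), each = 3),
  interval = rep(c(0, 5, 10), times = 2),
  stringsAsFactors = FALSE
)

# Count missing values per column; colSums() over the logical matrix
# returned by is.na() gives one named count per column
naCounts <- colSums(is.na(dfrToy))
print(naCounts)

# A single logical check: TRUE only when no NA remains anywhere
noMissing <- !anyNA(dfrToy)
print(noMissing)
```

Running the same two checks before and after imputation gives a quick confirmation that the NA count has dropped to zero.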
Approach Used
First, identifying the NA values in each column
Finding the mean value of steps for each interval by grouping on interval
Creating a new data frame, dfr1, with a new column called MEANVALUE that initially holds NA values
Replacing each NA value in steps with the mean for the corresponding interval, taken from dfrmean
Filling the MEANVALUE column of dfr1 by copying the imputed values from the steps column
Finally, checking the data set again for any remaining NA values
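
The steps above can be sketched end to end on a small example. This is an illustrative variant, not the script used above: it assumes a made-up data frame `dfrToy` with invented values, uses base R's `ave()` in place of `group_by()` and `summarise()`, and leaves the original steps column untouched while writing the imputed values to MEANVALUE.

```r
# Toy stand-in for the activity data; values are invented for illustration
dfrToy <- data.frame(
  steps    = c(NA, 4, 10, 2, NA, 12, 6, 8, NA),
  interval = rep(c(0, 5, 10), times = 3)
)

# Identify the NA values before imputation
naBefore <- sum(is.na(dfrToy$steps))

# Mean of steps per interval, returned as a vector aligned row by row
# with the data, so no separate lookup table is needed
intervalMean <- ave(dfrToy$steps, dfrToy$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the rows where steps is NA; observed values are kept as they are
dfrToy$MEANVALUE <- ifelse(is.na(dfrToy$steps), intervalMean, dfrToy$steps)

# Check the imputed column again for missing values
naAfter <- sum(is.na(dfrToy$MEANVALUE))
print(dfrToy)
```

Because `ave()` returns one value per row, each missing entry is matched to its own interval regardless of how the rows are ordered or how many days are missing.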