Objective

To understand how functions can be interlinked

## set working directory ##
setwd("F:/R-BA/scripts/data")

read activity file

df <- read.csv("activity.csv", header = TRUE, stringsAsFactors = FALSE)
View(df)

counting NA in the data

df1count <- sum(is.na(df$steps))
df1count
View(df1count)
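The same count can be checked on a small made-up data frame. This is only a sketch: `toy` and its values are invented for illustration and stand in for activity.csv, and `na_total`, `na_by_col` and `na_share` are hypothetical names.

```r
# Toy data standing in for activity.csv (values are made up for illustration)
toy <- data.frame(
  steps    = c(NA, 10, 40, 30, 20, NA, 30, 20),
  interval = c(0, 5, 0, 5, 0, 5, 0, 5)
)

na_total  <- sum(is.na(toy$steps))   # number of missing step values
na_by_col <- colSums(is.na(toy))     # missing values in every column
na_share  <- mean(is.na(toy$steps))  # fraction of rows with missing steps

print(na_total)
print(na_by_col)
print(na_share)
```

Using colSums on is.na of the whole data frame is a quick way to confirm that steps is the only column with missing values before imputing.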

Method 2 to find the mean steps for each interval

library(dplyr)
# keep only the rows where steps is not missing
df1 <- subset(df, !is.na(steps))
# mean steps for each interval
df1 <- summarise(group_by(df1, interval), steps = mean(steps))
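As a cross-check on the dplyr result, the interval means can also be computed in base R. The sketch below assumes a made-up data frame `toy` in place of activity.csv; `interval_means` is a hypothetical name. The formula interface of aggregate drops rows with NA by default, so no separate filtering step is needed here.

```r
# Toy data standing in for activity.csv (values are made up for illustration)
toy <- data.frame(
  steps    = c(NA, 10, 40, 30, 20, NA, 30, 20),
  interval = c(0, 5, 0, 5, 0, 5, 0, 5)
)

# Mean steps per interval; rows with NA steps are omitted by the formula method
interval_means <- aggregate(steps ~ interval, data = toy, FUN = mean)

print(interval_means)
```

In this toy data, interval 0 has the non-missing values 40, 20 and 30, and interval 5 has 10, 30 and 20, so the two means are 30 and 20.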

Data Imputation process Method 3

# return the interval mean from df1 when step is NA, otherwise the step itself
replace_na <- function(step, interval) {
  ifelse(is.na(step), df1[df1$interval == interval, ]$steps, step)
}

dfnew <- df
dfnew$steps <- mapply(replace_na, dfnew$steps, dfnew$interval)
head(dfnew)
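The mapply call looks up the interval mean once for every row. A possible alternative, shown here only as a sketch on made-up data, is to fill all the missing values in one step with match. The names `toy`, `means`, `filled` and `missing` are hypothetical and are not part of the assignment code.

```r
# Toy data standing in for activity.csv (values are made up for illustration)
toy <- data.frame(
  steps    = c(NA, 10, 40, 30, 20, NA, 30, 20),
  interval = c(0, 5, 0, 5, 0, 5, 0, 5)
)

# Mean steps per interval, computed from the non-missing rows
means <- aggregate(steps ~ interval, data = toy, FUN = mean)

# Copy the data, then replace each NA with the mean of its interval
filled  <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- means$steps[match(filled$interval[missing], means$interval)]

print(filled)
print(sum(is.na(filled$steps)))
```

Both approaches should give the same imputed values. The match version avoids subsetting the means table once per row, which may matter on larger files.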

df2count after data imputation

df2count <- sum(is.na(dfnew$steps))
df2count
View(df2count)

summary

The file contained 17569 values, of which 2304 were NA. Using the code above, two data frames were created from it in R. The first, df1, was made by filtering out the NA rows and then using group by to find the mean number of steps for each interval. The second, dfnew, is a copy of the original df in which the replace_na function substitutes every NA with the mean for its interval. As a result, dfnew has the interval mean in place of each of the NA values.

Assignment 4

Yashvi