R Markdown file for Session 3 - Store Data dataset

This R Markdown document contains the commands used in session 3 for handling the store data set.

First, set the working directory to the folder that contains your dataset and R code. Adjust the path below to match your own machine.

setwd("/home/users/shikharkohli/code/DAM");

Next, read the dataset from the CSV file and store it in the data frame store.df.

store.df<-read.csv("datasets/StoreData.csv")
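After reading a file it helps to confirm that the columns and types came through as expected. The sketch below is only an illustration: csv.text and toy.df are hypothetical names, and the three rows are invented values that reuse some of the column names from this document, not an extract of StoreData.csv.

```r
# Hypothetical inline CSV standing in for StoreData.csv (values are invented)
csv.text <- "storeNum,Year,Week,p1sales,country
101,1,1,127,US
101,1,2,137,US
102,1,1,114,DE"

# read.csv() accepts text= in place of a file name
toy.df <- read.csv(text = csv.text)

str(toy.df)   # column names and types
head(toy.df)  # first rows of the data frame
dim(toy.df)   # number of rows and columns
```

The same three checks can be run on store.df once the real file has been read.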

Count the number of observations recorded for each store in the storeNum column

table(store.df$storeNum);

Display the distinct years and weeks that appear in the dataset

unique(store.df$Year);
unique(store.df$Week);

Summarise the sales of product 1 and draw a horizontal boxplot of the same values

summary(store.df$p1sales);
boxplot(store.df$p1sales, xlab="Product 1 Sale", ylab="p1", main="Sale of product 1", horizontal = TRUE);

Use the aggregate function to split the data into groups defined by the second argument (by), and apply the summary function given as the third argument to each group

## total sales of product 2 by country
aggregate(store.df$p2sales, by=list(country=store.df$country),sum);
## Average sales of product 1 by store
aggregate(store.df$p1sales, by=list(StoreID = store.df$storeNum),mean);
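The aggregate calls above can be tried out on a small example. The following is a minimal sketch, assuming a made-up data frame (toy.df and all of its values are hypothetical, not taken from StoreData.csv). It shows the by= form used in this document next to the formula interface, which is an alternative way of writing the same grouping.

```r
# Hypothetical toy data standing in for store.df (values are invented)
toy.df <- data.frame(
  storeNum = c(101, 101, 102, 102),
  country  = c("US", "US", "DE", "DE"),
  p1sales  = c(10, 20, 30, 50),
  p2sales  = c(5, 15, 25, 35)
)

# by= form: the list name becomes the group column, the result lands in column "x"
total.by <- aggregate(toy.df$p2sales, by = list(country = toy.df$country), FUN = sum)

# Formula form: "p2sales grouped by country", keeps the original column names
total.formula <- aggregate(p2sales ~ country, data = toy.df, FUN = sum)

print(total.by)
print(total.formula)
```

Both calls return a data frame with one row per group, so the result can be passed on to further steps such as sorting or plotting.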

Use the by function to split the data set into groups and apply a function to each group

# Average sales of product 1 by store
by(store.df$p1sales, store.df$storeNum, mean);
# Average sales of product 1 by store and year (2001,2002)
by(store.df$p1sales, list(store.df$storeNum, store.df$Year),mean);
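To see what by returns, it can be compared with tapply on a small example. This is a sketch under stated assumptions: toy.df and its values are hypothetical, and tapply is shown only as a related base R function, not as something used in the session.

```r
# Hypothetical toy data standing in for store.df (values are invented)
toy.df <- data.frame(
  storeNum = c(101, 101, 102, 102),
  Year     = c(2001, 2002, 2001, 2002),
  p1sales  = c(10, 20, 30, 50)
)

# by() returns an object of class "by", printed one group at a time
mean.by <- by(toy.df$p1sales, toy.df$storeNum, mean)

# tapply() computes the same group means as a named array
mean.tapply <- tapply(toy.df$p1sales, toy.df$storeNum, mean)

# Two grouping factors give a store-by-year table of means
mean.store.year <- tapply(toy.df$p1sales, list(toy.df$storeNum, toy.df$Year), mean)

print(mean.by)
print(mean.tapply)
print(mean.store.year)
```

The printed by object is easy to read, while the array from tapply is often more convenient when the group means are needed in later calculations.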

Apply a function, either built-in or user-defined, to each column of a subset of the data set

# Average of store.df columns 2-9
apply(store.df[, 2:9],MARGIN = 2, FUN = mean);
# Apply a user-defined function: difference between the mean and the median of each column
apply(store.df[, 2:9], 2, function(x) {mean(x) - median(x)});
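The effect of MARGIN and of the anonymous function above is easier to follow on a small example. The sketch below assumes a made-up data frame (toy.df, its values, and the helper name mean.median.gap are all hypothetical) and also shows sapply, which loops over data frame columns without first converting them to a matrix.

```r
# Hypothetical toy data (values are invented)
toy.df <- data.frame(
  p1sales = c(10, 20, 30, 100),
  p2sales = c(5, 15, 25, 35)
)

# Hypothetical named helper: gap between the mean and the median of a vector
mean.median.gap <- function(x) {
  mean(x) - median(x)
}

# MARGIN = 2 applies the function to each column; MARGIN = 1 would use rows
gap.apply <- apply(toy.df, MARGIN = 2, FUN = mean.median.gap)

# sapply() iterates over the columns of the data frame directly
gap.sapply <- sapply(toy.df, mean.median.gap)

print(gap.apply)
print(gap.sapply)
```

A gap well above zero, as in the toy p1sales column, is often read as a hint that a few large values are pulling the mean upwards.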

View(store.df);