First R Markdown Project

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. Here we work with the sales dataset of an online retailer and generate a few simple reports based on its summary statistics. These are the first six rows of the dataset:

head(sales.df)

##        age gender   income kids ownHome subscribe    Segment
## 1 47.31613   Male 49482.81    2   ownNo     subNo Suburb mix
## 2 31.38684   Male 35546.29    1  ownYes     subNo Suburb mix
## 3 43.20034   Male 44169.19    0  ownYes     subNo Suburb mix
## 4 37.31700 Female 81041.99    1   ownNo     subNo Suburb mix
## 5 40.95439 Female 79353.01    3  ownYes     subNo Suburb mix
## 6 43.03387   Male 58143.36    4  ownYes     subNo Suburb mix

Let's look at the structure of its variables with the “str” function before calculating the summary statistics.

str(sales.df)

## 'data.frame':    300 obs. of  7 variables:
##  $ age      : num  47.3 31.4 43.2 37.3 41 ...
##  $ gender   : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 2 1 1 ...
##  $ income   : num  49483 35546 44169 81042 79353 ...
##  $ kids     : int  2 1 0 1 3 4 3 0 1 0 ...
##  $ ownHome  : Factor w/ 2 levels "ownNo","ownYes": 1 2 2 1 2 2 1 1 1 2 ...
##  $ subscribe: Factor w/ 2 levels "subNo","subYes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Segment  : Factor w/ 4 levels "Moving up","Suburb mix",..: 2 2 2 2 2 2 2 2 2 2 ...
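
The `str` output shows each factor as a set of levels followed by integer codes (for example `2 2 2 1 1` for gender). A minimal sketch of that encoding, using a small hypothetical vector that mirrors the first five rows rather than the real `sales.df`:

```r
# Hypothetical toy vector for illustration; not read from sales.df
gender <- factor(c("Male", "Male", "Male", "Female", "Female"))

# Levels are sorted alphabetically by default: "Female" is 1, "Male" is 2
gender_levels <- levels(gender)

# The integer codes that str() prints after the level names
gender_codes <- as.integer(gender)

print(gender_levels)
print(gender_codes)
```

This is why a factor can be summarised numerically at all: any numeric summary of a factor is really a summary of these codes, not of the labels.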

Now for the summary statistics, using the “describe” function from the psych package:

library(psych)  # provides describe()
describe(sales.df)

##            vars   n     mean       sd   median  trimmed      mad      min
## age           1 300    41.20    12.71    39.49    40.40    10.43    19.26
## gender*       2 300     1.48     0.50     1.00     1.47     0.00     1.00
## income        3 300 50936.54 20137.55 52014.35 50661.88 16186.80 -5183.35
## kids          4 300     1.27     1.41     1.00     1.07     1.48     0.00
## ownHome*      5 300     1.47     0.50     1.00     1.46     0.00     1.00
## subscribe*    6 300     1.13     0.34     1.00     1.04     0.00     1.00
## Segment*      7 300     2.37     1.02     2.00     2.33     1.48     1.00
##                  max     range skew kurtosis      se
## age            80.49     61.23 0.56    -0.17    0.73
## gender*         2.00      1.00 0.09    -2.00    0.03
## income     114278.26 119461.61 0.14     0.36 1162.64
## kids            7.00      7.00 1.11     0.91    0.08
## ownHome*        2.00      1.00 0.12    -1.99    0.03
## subscribe*      2.00      1.00 2.15     2.62    0.02
## Segment*        4.00      3.00 0.17    -1.09    0.06
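
In this table, the variables marked with `*` are factors that `describe` has summarised through their integer codes, so their means and standard deviations are not directly interpretable. The `se` column is the standard error of the mean, sd / sqrt(n). A quick check of that relationship, using only values copied from the printed output (the variable names here are introduced for illustration):

```r
# Values copied from the describe() output above
n         <- 300
sd_income <- 20137.55
sd_age    <- 12.71

# Standard error of the mean: sd / sqrt(n)
se_income <- sd_income / sqrt(n)
se_age    <- sd_age / sqrt(n)

print(round(se_income, 2))
print(round(se_age, 2))
```

Both results agree with the `se` values printed for income and age once rounded to two decimals.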

Now for some plots, using the lattice package.

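library(lattice)  # provides histogram(), barchart() and bwplot()
# par(mfrow = ...) only affects base graphics, so it is not needed for lattice plots
# Counts of each subscribe level, one panel per segment
histogram(~ subscribe | Segment, data = sales.df,
          layout = c(4, 1), type = "count",
          col = c("burlywood", "darkolivegreen"))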

Histogram of subscriber and non-subscriber counts in each segment
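
Because `subscribe` is a factor, each bar is simply the number of rows with that level inside a segment, so the same numbers can be tabulated directly. A sketch with a small hypothetical data frame (`toy.df` and its values are invented for illustration and are not part of the sales data):

```r
# Hypothetical toy data; the real analysis would use sales.df
toy.df <- data.frame(
  subscribe = factor(c("subNo", "subNo", "subYes", "subNo", "subYes", "subNo")),
  Segment   = factor(c("Suburb mix", "Suburb mix", "Suburb mix",
                       "Moving up", "Moving up", "Moving up"))
)

# Cross-tabulate subscription status against segment
counts <- xtabs(~ subscribe + Segment, data = toy.df)
print(counts)
```

A table like this is a convenient way to read exact counts that are hard to judge from bar heights alone.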

A more complex plot with three discrete variables:

# Counts of each subscribe level, one panel per Segment x ownHome combination
histogram(~ subscribe | Segment + ownHome, data = sales.df,
          layout = c(4, 2), type = "count",
          col = c("burlywood", "red"))

Histogram of the number of customers with and without a cable TV subscription from our company, by segment and home ownership

Now working on continuous variables

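# Mean income for every Segment x ownHome combination
seg.mean1 <- aggregate(income ~ Segment + ownHome, data = sales.df, FUN = mean)

# Bar chart of mean income by segment, with bars grouped by home ownership
barchart(income ~ Segment, data = seg.mean1, groups = ownHome,
         auto.key = TRUE, par.settings = simpleTheme(col = c("gray95", "gray50")))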

Plot of mean income by segment, grouped by whether the customer owns a home
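
The `aggregate` call above produces one row per Segment and ownHome combination, and that small data frame is what `barchart` draws. A sketch of the shape of the result, using a hypothetical six-row data frame (`toy.df` and its incomes are invented for illustration):

```r
# Hypothetical toy data; the real analysis would use sales.df
toy.df <- data.frame(
  income  = c(40000, 60000, 50000, 70000, 30000, 50000),
  Segment = factor(c("Moving up", "Moving up", "Moving up",
                     "Suburb mix", "Suburb mix", "Suburb mix")),
  ownHome = factor(c("ownNo", "ownNo", "ownYes", "ownYes", "ownNo", "ownNo"))
)

# One mean income per Segment x ownHome combination
toy.mean <- aggregate(income ~ Segment + ownHome, data = toy.df, FUN = mean)
print(toy.mean)
```

Each row of `toy.mean` corresponds to one bar in a grouped bar chart of this kind.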

Finally, some box plots

# Income distribution per segment, one panel per ownHome level
bwplot(Segment ~ income | ownHome, data = sales.df, horizontal = TRUE,
       xlab = "INCOME")

Two panels of box plots showing the income distribution in each segment, one panel for customers who own a home and one for those who do not