This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. Here we work with the sales dataset of an online retailer and generate a few simple reports based on its summary statistics. The first six rows of the dataset are:
head(sales.df)
## age gender income kids ownHome subscribe Segment
## 1 47.31613 Male 49482.81 2 ownNo subNo Suburb mix
## 2 31.38684 Male 35546.29 1 ownYes subNo Suburb mix
## 3 43.20034 Male 44169.19 0 ownYes subNo Suburb mix
## 4 37.31700 Female 81041.99 1 ownNo subNo Suburb mix
## 5 40.95439 Female 79353.01 3 ownYes subNo Suburb mix
## 6 43.03387 Male 58143.36 4 ownYes subNo Suburb mix
Let's look at the structure of its variables using the “str” function before calculating the summary statistics.
str(sales.df)
## 'data.frame': 300 obs. of 7 variables:
## $ age : num 47.3 31.4 43.2 37.3 41 ...
## $ gender : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 2 1 1 ...
## $ income : num 49483 35546 44169 81042 79353 ...
## $ kids : int 2 1 0 1 3 4 3 0 1 0 ...
## $ ownHome : Factor w/ 2 levels "ownNo","ownYes": 1 2 2 1 2 2 1 1 1 2 ...
## $ subscribe: Factor w/ 2 levels "subNo","subYes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Segment : Factor w/ 4 levels "Moving up","Suburb mix",..: 2 2 2 2 2 2 2 2 2 2 ...
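Now for the summary statistics, computed with describe() from the psych package. Rows marked with * are factors that describe() has converted to integer codes.
library(psych)  # provides describe()
describe(sales.df)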
## vars n mean sd median trimmed mad min
## age 1 300 41.20 12.71 39.49 40.40 10.43 19.26
## gender* 2 300 1.48 0.50 1.00 1.47 0.00 1.00
## income 3 300 50936.54 20137.55 52014.35 50661.88 16186.80 -5183.35
## kids 4 300 1.27 1.41 1.00 1.07 1.48 0.00
## ownHome* 5 300 1.47 0.50 1.00 1.46 0.00 1.00
## subscribe* 6 300 1.13 0.34 1.00 1.04 0.00 1.00
## Segment* 7 300 2.37 1.02 2.00 2.33 1.48 1.00
## max range skew kurtosis se
## age 80.49 61.23 0.56 -0.17 0.73
## gender* 2.00 1.00 0.09 -2.00 0.03
## income 114278.26 119461.61 0.14 0.36 1162.64
## kids 7.00 7.00 1.11 0.91 0.08
## ownHome* 2.00 1.00 0.12 -1.99 0.03
## subscribe* 2.00 1.00 2.15 2.62 0.02
## Segment* 4.00 3.00 0.17 -1.09 0.06
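For the factor rows, the mean of the integer codes is hard to read directly. For example, a mean of 1.13 for subscribe means roughly 13% of rows are at the second level. A frequency table usually says the same thing more plainly. The table above also reports a negative minimum income, which may be worth inspecting. The sketch below illustrates both checks on a small made-up data frame, `toy.df`, which is a hypothetical stand-in for `sales.df`; the same calls should apply to the real data.

```r
# Hypothetical toy data standing in for sales.df (values are made up).
toy.df <- data.frame(
  income    = c(49482.81, 35546.29, -5183.35, 81041.99),
  subscribe = factor(c("subNo", "subNo", "subYes", "subNo"),
                     levels = c("subNo", "subYes"))
)

# Frequency and proportion tables are easier to read than the mean of
# the integer codes reported for a factor.
sub.counts <- table(toy.df$subscribe)
sub.props  <- prop.table(sub.counts)
print(sub.counts)
print(sub.props)

# Select the rows with a negative income so they can be inspected.
neg.income <- toy.df[toy.df$income < 0, ]
print(neg.income)
```

On the real data, replacing `toy.df` with `sales.df` would give the subscriber share and list any negative-income rows.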
Now for some plots, drawn with the lattice package
library(lattice)  # provides histogram(), barchart(), and bwplot()
histogram(~ subscribe | Segment, data = sales.df,
          layout = c(4, 1), type = "count",
          col = c("burlywood", "darkolivegreen"))
Histogram of the number of subscribers and non-subscribers in each segment
A more complex plot with three discrete variables
histogram(~ subscribe | Segment + ownHome, data = sales.df,
          layout = c(4, 2), type = "count",
          col = c("burlywood", "red"))
Histogram of the number of customers with and without a cable TV subscription from our company, by segment and home ownership
Now working with continuous variables
# Mean income for each Segment x ownHome combination
seg.mean1 <- aggregate(income ~ Segment + ownHome, data = sales.df, FUN = mean)
barchart(income ~ Segment, data = seg.mean1, groups = ownHome,
         auto.key = TRUE,
         par.settings = simpleTheme(col = c("gray95", "gray50")))
Bar chart of mean income by segment, grouped by home ownership
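The bar chart is drawn from an aggregated table, not from the raw rows. As a rough illustration of what the `aggregate()` step produces, the sketch below runs the same formula on a small made-up data frame. `toy.df` and `toy.mean` are hypothetical names and the values are invented; only the column names follow the dataset.

```r
# Hypothetical toy data standing in for sales.df (values are made up).
toy.df <- data.frame(
  income  = c(40000, 60000, 30000, 50000, 70000, 90000),
  Segment = factor(c("Moving up", "Moving up", "Suburb mix",
                     "Suburb mix", "Suburb mix", "Moving up")),
  ownHome = factor(c("ownNo", "ownNo", "ownNo",
                     "ownYes", "ownYes", "ownYes"))
)

# One row per Segment x ownHome combination, holding the mean income.
toy.mean <- aggregate(income ~ Segment + ownHome, data = toy.df, FUN = mean)
print(toy.mean)
```

Each row of the result becomes one bar: `Segment` sets the position on the axis and `ownHome` sets the colour group.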
Some box plots to end
bwplot(Segment ~ income | ownHome, data = sales.df, horizontal = TRUE,
       xlab = "INCOME")
Two panels of box plots showing the distribution of income across segments, one panel for each home-ownership status