title: “Business Analytics Lab Worksheet 04 (bsad_lab04)”
author: “Sahithya Mahadevan”
date: “February 19th, 2019”
output:
html_document: default
html_notebook: default
pdf_document: default
subtitle: CME Group Foundation Business Analytics Lab

About

Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.

Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computationaly techniques. It aims to quantify an event with metrics and numbers.

In this lab, we will explore both analytics using the data set provided.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

===========================================

PART I: QUANTITATIVE ANALYSIS

Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.

mydata = read.csv(file="data/marketing.csv")
head(mydata)
##   case_number sales radio paper  tv pos
## 1           1 11125    65    89 250 1.3
## 2           2 16121    73    55 260 1.6
## 3           3 16440    74    58 270 1.7
## 4           4 16876    75    82 270 1.3
## 5           5 13965    69    75 255 1.5
## 6           6 14999    70    71 255 2.1

Now calculate the Range, Min, Max, Mean, STDEV, and Variance for each variable. Below is an example of how to compute the items for the variable ‘radio’.

maxRadio = max(mydata$radio) #This computes the maximum value in radio data and put it in variable maxRadio.
maxRadio   #This calls variable maxRadio to display its value. 
## [1] 89
minRadio = min(mydata$radio) #This calculates the minimum value in radio data and put it in variable minRadio.
minRadio  #This calls variable minRadio to display its value.
## [1] 65
rangeRadio = maxRadio-minRadio #This calculates the rangeRadio which is the difference between max and min of the radio data.
rangeRadio #This displays the rangeRadio value.
## [1] 24
meanRadio = mean(mydata$radio)  #This calculates the mean of radio data.
meanRadio  #This displays the meanRadio value.
## [1] 76.1
sdRadio = sd(mydata$radio)  #This calculates the standard deviation of radio data.
sdRadio #This displays the sdRadio value.
## [1] 7.354912
varRadio = var(mydata$radio) #This calculates the variance of radio data.
varRadio  #This displays the varRadio value.
## [1] 54.09474

TASK 1A: Repeat the calculations of max, min, range, mean, standard deviation, and variance for the data in paper column. Name the variables as maxPaper, minPaper, rangePaper, meanPaper, sdPaper, and varPaper consecutively. Next, call all of these variables to display their values.

maxPaper = max(mydata$paper)
maxPaper
## [1] 89
minPaper = min(mydata$paper)
minPaper
## [1] 35
rangePaper = maxPaper - minPaper
rangePaper
## [1] 54
meanPaper = mean(mydata$paper)
meanPaper
## [1] 62.3
sdPaper = sd(mydata$paper)
sdPaper
## [1] 15.35921
varPaper = var(mydata$paper)
varPaper
## [1] 235.9053

TASK 1B: Repeat the calculations of max, min, range, mean, standard deviation, and variance for the data in tv column. Name the variables as maxTV, minTV, rangeTV, meanTV, sdTV, and varTV consecutively. Next, call all of these variables to display their values.

maxTV = max(mydata$tv)
maxTV
## [1] 280
minTV = min(mydata$tv)
minTV
## [1] 250
rangeTV = maxTV - minTV
rangeTV
## [1] 30
meanTV = mean(mydata$tv)
meanTV
## [1] 266.6
sdTV = sd(mydata$tv)
sdTV
## [1] 11.3388
varTV = var(mydata$tv)
varTV
## [1] 128.5684

TASK 2A: Repeat the calculations of max, min, range, mean, standard deviation, and variance for the data in pos column. Name the variables as maxPOS, minPOS, rangePOS, meanPOS, sdPOS, and varPOS consecutively. Next, call all of these variables to display their values.

maxPOS = max(mydata$pos)
maxPOS
## [1] 3
minPOS = min(mydata$pos)
minPOS
## [1] 0
rangePOS = maxPOS - minPOS
rangePOS
## [1] 3
meanPOS = mean(mydata$pos)
meanPOS
## [1] 1.535
sdPOS = sd(mydata$pos)
sdPOS
## [1] 0.7499298
varPOS = var(mydata$pos)
varPOS
## [1] 0.5623947

TASK 2B: Repeat the calculations of max, min, range, mean, standard deviation, and variance for the data in sales column. Name the variables as maxSales, minSales, rangeSales, meanSales, sdSales, and varSales consecutively. Next, call all of these variables to display their values.

maxSales = max(mydata$sales)
maxSales
## [1] 20450
minSales = min(mydata$sales)
minSales
## [1] 11125
rangeSales = maxSales - minSales
rangeSales
## [1] 9325
meanSales = mean(mydata$sales)
meanSales
## [1] 16717.2
sdSales = sd(mydata$sales)
sdSales
## [1] 2617.052
varSales = var(mydata$sales)
varSales
## [1] 6848961

An easy way to calculate some of the above statistics of all of these variables is with the summary() function. Below is an example.

summary(mydata)
##   case_number        sales           radio           paper      
##  Min.   : 1.00   Min.   :11125   Min.   :65.00   Min.   :35.00  
##  1st Qu.: 5.75   1st Qu.:15175   1st Qu.:70.00   1st Qu.:53.75  
##  Median :10.50   Median :16658   Median :74.50   Median :62.50  
##  Mean   :10.50   Mean   :16717   Mean   :76.10   Mean   :62.30  
##  3rd Qu.:15.25   3rd Qu.:18874   3rd Qu.:81.75   3rd Qu.:75.50  
##  Max.   :20.00   Max.   :20450   Max.   :89.00   Max.   :89.00  
##        tv             pos       
##  Min.   :250.0   Min.   :0.000  
##  1st Qu.:255.0   1st Qu.:1.200  
##  Median :270.0   Median :1.500  
##  Mean   :266.6   Mean   :1.535  
##  3rd Qu.:276.2   3rd Qu.:1.800  
##  Max.   :280.0   Max.   :3.000
summary(mydata$radio) #This is to get the summary of radio column
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   65.00   70.00   74.50   76.10   81.75   89.00

TASK 3A: Repeat the above command to get the summary of paper column.

summary(mydata$paper)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   53.75   62.50   62.30   75.50   89.00

TASK 3B: Repeat the above command to get the summary of tv column.

summary(mydata$tv)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   250.0   255.0   270.0   266.6   276.2   280.0

TASK 3C: Repeat the above command to get the summary of pos column.

summary(mydata$pos)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.200   1.500   1.535   1.800   3.000

TASK 3D: Repeat the above command to get the summary of sales column.

summary(mydata$sales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11125   15175   16658   16717   18874   20450

TASK 3E: There are some statistics not calculated with the summary() function. Specify which statistics you computed in TASKS 1 and 2 that are not calculated with the summary() function.

Answer: Standard Deviation, Range and Variance

================================================

PART II: QUALITATIVE ANALYSIS

Now, we will produce a basic blot of the ‘sales’ variable . Here we utilize the plot function and within the plot() function we call the variable we want to plot.

plot(mydata$sales)

We can customize the plot by adding labels to the x- and y- axis.

#xlab labels the x axis, ylab labels the y axis
plot(mydata$sales, type="b", xlab = "Case Number-sales", ylab = "Sales in $1,000") 

There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.

Now, lets plot the other variables’ graphs alongside each other.

layout(matrix(1:4,2,2)) #Layout allows us to see all the 4 graphs that you're creating below on one screen. Make sure to run the codes for TASKS 4A, 4B, 5A, and 5B in the same chunk so they are on the same layout.

##TASK 4A: Plot of radio data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$radio, type = "b", xlab = "Case number: Radio", ylab = "Amount spent in $1,000")

##TASK 4B: Plot of paper data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$paper, type = "b", xlab = "Case number: Paper", ylab = "Amount spent in $1000")

##TASK 5A: Plot of tv data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$tv, type = "b", xlab = "Case number: Tv", ylab = "Amount spent in $1000")


##TASK 5B: Plot of pos data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$pos, type = "b", xlab = "Case number: Pos", ylab = "Amount spent in $1,000")

When looking at these plots it is hard to see a particular trend. One way to observe any possible trend in the sales data would be to re-order the data from low to high. The 20 months case studies are in no particular chronological time sequence. The 20 case numbers are independent sequentially generated numbers. Since each case is independent, we can reorder them.

#Re-order sales data from low to high, and save re-ordered data in a new set called  newSales. As sales data is re-reorded associated other column fields follow.
newSales = mydata[order(mydata$sales),]
head(newSales)
##    case_number sales radio paper  tv pos
## 1            1 11125    65    89 250 1.3
## 19          19 12369    65    37 250 2.5
## 20          20 13882    68    80 252 1.4
## 5            5 13965    69    75 255 1.5
## 6            6 14999    70    71 255 2.1
## 11          11 15234    70    66 255 1.5
#Plot of sales data from the sorted newSales.
plot(newSales$sales, type="b", xlab = "Case Number-sales", ylab = "Sales in $1,000") 

layout(matrix(1:4,2,2)) #Layout allows us to see all the 4 graphs that you're creating below on one screen. Make sure to run the codes for TASKS 6A, 6B, 7A, and 7B in the same chunk so they are on the same layout.

##TASK 6A: Re-order radio data from low to high, and save re-ordered data in a new set called  newRadio. Plot the radio data from the sorted newRadio. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newRadio = mydata[order(mydata$sales),]
plot(newRadio$radio, type = "b",xlab = "Case number: Radio", ylab = "Amount spent in $1,000")


 

##TASK 6B: Re-order paper data from low to high, and save re-ordered data in a new set called  newPaper. Plot the paper data from the sorted newPaper. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newPaper = mydata[order(mydata$paper),]
plot(newPaper$paper, type = "b", xlab = "Case number: Paper", ylab = "Amount spent in $1,000")



##TASK 7A: Re-order tv data from low to high, and save re-ordered data in a new set called  newTV. Plot the tv data from the sorted newTV Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newTV = mydata[order(mydata$tv),]
plot(newTV$tv, type = "b", xlab = "Case number: TV", ylab = "Amount spent in $1000")


##TASK 7B: Re-order pos data from low to high, and save re-ordered data in a new set called  newPOS. Plot of pos data from the sorted newPOS. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newPOS = mydata[order(mydata$pos),]
plot(newPOS$pos, type = "b", xlab = "Case number: POS", ylab = "Amount spent in $1000")

The new plots reveal that there is a staedy increase in sales of Paper, TV, Radio and POS

PART III: STANDARIZED Z-VALUE

Given a sales value of $5000, calculate the corresponding z value or z score using the mean and standard deviation of sales data that you have calculated (in TASK 2B). We know that the formular to calculate z score is: zScore = (x - mean)/sd where in this case x is the sales value of $5000, mean is the mean of sales data, and sd is the standard deviation of the sales data.

TASK 9: Show your codes to calculate the z score for the sales value of $5000 and use the variable zScore to keep the z score value. Next, call the variable to display its value. Write your code below:

zScore = (5000 - meanSales)/sdSales
zScore
## [1] -4.477252

TASK 10: Answer the following question: Based on the z value above, how would you rate a $5000 sales value: very poor, poor, average, good, or very good performance? Explain your logic.

Answer:
Very Poor as zScore has a negative value