| title: “Business Analytics Lab Worksheet 04 (bsad_lab04)” |
| author: “Sahithya Mahadevan” |
| date: “February 19th, 2019” |
| output: |
| html_document: default |
| html_notebook: default |
| pdf_document: default |
| subtitle: CME Group Foundation Business Analytics Lab |
Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.
Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computationaly techniques. It aims to quantify an event with metrics and numbers.
In this lab, we will explore both analytics using the data set provided.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
===========================================
Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.
mydata = read.csv(file="data/marketing.csv")
head(mydata)
## case_number sales radio paper tv pos
## 1 1 11125 65 89 250 1.3
## 2 2 16121 73 55 260 1.6
## 3 3 16440 74 58 270 1.7
## 4 4 16876 75 82 270 1.3
## 5 5 13965 69 75 255 1.5
## 6 6 14999 70 71 255 2.1
Now calculate the Range, Min, Max, Mean, STDEV, and Variance for each variable. Below is an example of how to compute the items for the variable ‘radio’.
maxRadio = max(mydata$radio) #This computes the maximum value in radio data and put it in variable maxRadio.
maxRadio #This calls variable maxRadio to display its value.
## [1] 89
minRadio = min(mydata$radio) #This calculates the minimum value in radio data and put it in variable minRadio.
minRadio #This calls variable minRadio to display its value.
## [1] 65
rangeRadio = maxRadio-minRadio #This calculates the rangeRadio which is the difference between max and min of the radio data.
rangeRadio #This displays the rangeRadio value.
## [1] 24
meanRadio = mean(mydata$radio) #This calculates the mean of radio data.
meanRadio #This displays the meanRadio value.
## [1] 76.1
sdRadio = sd(mydata$radio) #This calculates the standard deviation of radio data.
sdRadio #This displays the sdRadio value.
## [1] 7.354912
varRadio = var(mydata$radio) #This calculates the variance of radio data.
varRadio #This displays the varRadio value.
## [1] 54.09474
maxPaper = max(mydata$paper)
maxPaper
## [1] 89
minPaper = min(mydata$paper)
minPaper
## [1] 35
rangePaper = maxPaper - minPaper
rangePaper
## [1] 54
meanPaper = mean(mydata$paper)
meanPaper
## [1] 62.3
sdPaper = sd(mydata$paper)
sdPaper
## [1] 15.35921
varPaper = var(mydata$paper)
varPaper
## [1] 235.9053
maxTV = max(mydata$tv)
maxTV
## [1] 280
minTV = min(mydata$tv)
minTV
## [1] 250
rangeTV = maxTV - minTV
rangeTV
## [1] 30
meanTV = mean(mydata$tv)
meanTV
## [1] 266.6
sdTV = sd(mydata$tv)
sdTV
## [1] 11.3388
varTV = var(mydata$tv)
varTV
## [1] 128.5684
maxPOS = max(mydata$pos)
maxPOS
## [1] 3
minPOS = min(mydata$pos)
minPOS
## [1] 0
rangePOS = maxPOS - minPOS
rangePOS
## [1] 3
meanPOS = mean(mydata$pos)
meanPOS
## [1] 1.535
sdPOS = sd(mydata$pos)
sdPOS
## [1] 0.7499298
varPOS = var(mydata$pos)
varPOS
## [1] 0.5623947
maxSales = max(mydata$sales)
maxSales
## [1] 20450
minSales = min(mydata$sales)
minSales
## [1] 11125
rangeSales = maxSales - minSales
rangeSales
## [1] 9325
meanSales = mean(mydata$sales)
meanSales
## [1] 16717.2
sdSales = sd(mydata$sales)
sdSales
## [1] 2617.052
varSales = var(mydata$sales)
varSales
## [1] 6848961
An easy way to calculate some of the above statistics of all of these variables is with the summary() function. Below is an example.
summary(mydata)
## case_number sales radio paper
## Min. : 1.00 Min. :11125 Min. :65.00 Min. :35.00
## 1st Qu.: 5.75 1st Qu.:15175 1st Qu.:70.00 1st Qu.:53.75
## Median :10.50 Median :16658 Median :74.50 Median :62.50
## Mean :10.50 Mean :16717 Mean :76.10 Mean :62.30
## 3rd Qu.:15.25 3rd Qu.:18874 3rd Qu.:81.75 3rd Qu.:75.50
## Max. :20.00 Max. :20450 Max. :89.00 Max. :89.00
## tv pos
## Min. :250.0 Min. :0.000
## 1st Qu.:255.0 1st Qu.:1.200
## Median :270.0 Median :1.500
## Mean :266.6 Mean :1.535
## 3rd Qu.:276.2 3rd Qu.:1.800
## Max. :280.0 Max. :3.000
summary(mydata$radio) #This is to get the summary of radio column
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 65.00 70.00 74.50 76.10 81.75 89.00
summary(mydata$paper)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35.00 53.75 62.50 62.30 75.50 89.00
summary(mydata$tv)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 250.0 255.0 270.0 266.6 276.2 280.0
summary(mydata$pos)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.200 1.500 1.535 1.800 3.000
summary(mydata$sales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11125 15175 16658 16717 18874 20450
Answer: Standard Deviation, Range and Variance
================================================
Now, we will produce a basic blot of the ‘sales’ variable . Here we utilize the plot function and within the plot() function we call the variable we want to plot.
plot(mydata$sales)
We can customize the plot by adding labels to the x- and y- axis.
#xlab labels the x axis, ylab labels the y axis
plot(mydata$sales, type="b", xlab = "Case Number-sales", ylab = "Sales in $1,000")
There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
Now, lets plot the other variables’ graphs alongside each other.
layout(matrix(1:4,2,2)) #Layout allows us to see all the 4 graphs that you're creating below on one screen. Make sure to run the codes for TASKS 4A, 4B, 5A, and 5B in the same chunk so they are on the same layout.
##TASK 4A: Plot of radio data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$radio, type = "b", xlab = "Case number: Radio", ylab = "Amount spent in $1,000")
##TASK 4B: Plot of paper data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$paper, type = "b", xlab = "Case number: Paper", ylab = "Amount spent in $1000")
##TASK 5A: Plot of tv data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$tv, type = "b", xlab = "Case number: Tv", ylab = "Amount spent in $1000")
##TASK 5B: Plot of pos data. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
plot(mydata$pos, type = "b", xlab = "Case number: Pos", ylab = "Amount spent in $1,000")
When looking at these plots it is hard to see a particular trend. One way to observe any possible trend in the sales data would be to re-order the data from low to high. The 20 months case studies are in no particular chronological time sequence. The 20 case numbers are independent sequentially generated numbers. Since each case is independent, we can reorder them.
#Re-order sales data from low to high, and save re-ordered data in a new set called newSales. As sales data is re-reorded associated other column fields follow.
newSales = mydata[order(mydata$sales),]
head(newSales)
## case_number sales radio paper tv pos
## 1 1 11125 65 89 250 1.3
## 19 19 12369 65 37 250 2.5
## 20 20 13882 68 80 252 1.4
## 5 5 13965 69 75 255 1.5
## 6 6 14999 70 71 255 2.1
## 11 11 15234 70 66 255 1.5
#Plot of sales data from the sorted newSales.
plot(newSales$sales, type="b", xlab = "Case Number-sales", ylab = "Sales in $1,000")
layout(matrix(1:4,2,2)) #Layout allows us to see all the 4 graphs that you're creating below on one screen. Make sure to run the codes for TASKS 6A, 6B, 7A, and 7B in the same chunk so they are on the same layout.
##TASK 6A: Re-order radio data from low to high, and save re-ordered data in a new set called newRadio. Plot the radio data from the sorted newRadio. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newRadio = mydata[order(mydata$sales),]
plot(newRadio$radio, type = "b",xlab = "Case number: Radio", ylab = "Amount spent in $1,000")
##TASK 6B: Re-order paper data from low to high, and save re-ordered data in a new set called newPaper. Plot the paper data from the sorted newPaper. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newPaper = mydata[order(mydata$paper),]
plot(newPaper$paper, type = "b", xlab = "Case number: Paper", ylab = "Amount spent in $1,000")
##TASK 7A: Re-order tv data from low to high, and save re-ordered data in a new set called newTV. Plot the tv data from the sorted newTV Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newTV = mydata[order(mydata$tv),]
plot(newTV$tv, type = "b", xlab = "Case number: TV", ylab = "Amount spent in $1000")
##TASK 7B: Re-order pos data from low to high, and save re-ordered data in a new set called newPOS. Plot of pos data from the sorted newPOS. Label the x axis properly, and label the y axis as "Amount spent in $1,000". Write your code below:
newPOS = mydata[order(mydata$pos),]
plot(newPOS$pos, type = "b", xlab = "Case number: POS", ylab = "Amount spent in $1000")
Given a sales value of $5000, calculate the corresponding z value or z score using the mean and standard deviation of sales data that you have calculated (in TASK 2B). We know that the formular to calculate z score is: zScore = (x - mean)/sd where in this case x is the sales value of $5000, mean is the mean of sales data, and sd is the standard deviation of the sales data.
zScore = (5000 - meanSales)/sdSales
zScore
## [1] -4.477252
Answer:
Very Poor as zScore has a negative value