Qualitative Descriptive Analytics aims to gain an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.
Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers.
In this lab, we will explore both types of analytics using the provided data set.
Remember to always set your working directory to the source file location: go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer the questions. Submit your work to RPubs as detailed in previous notes.
For your assignment you may be using a different data set than the one included here. Read the instructions on Sakai carefully.
Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.
mydata = read.csv(file="data/marketing.csv")
head(mydata)
## case_number sales radio paper tv pos
## 1 1 11125 65 89 250 1.3
## 2 2 16121 73 55 260 1.6
## 3 3 16440 74 58 270 1.7
## 4 4 16876 75 82 270 1.3
## 5 5 13965 69 75 255 1.5
## 6 6 14999 70 71 255 2.1
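Besides head(), two base R calls can confirm the structure of what read.csv() returned: str() shows each column's type and first values, and dim() shows the number of rows and columns. The sketch below is self-contained: it writes the six rows printed above to a temporary file (a stand-in for data/marketing.csv, used only for illustration) and reads them back. With the real file you would simply call str(mydata) and dim(mydata).

```r
# Stand-in for data/marketing.csv: the six rows printed by head(mydata) above
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "case_number,sales,radio,paper,tv,pos",
  "1,11125,65,89,250,1.3",
  "2,16121,73,55,260,1.6",
  "3,16440,74,58,270,1.7",
  "4,16876,75,82,270,1.3",
  "5,13965,69,75,255,1.5",
  "6,14999,70,71,255,2.1"
), tmp)

sample_data <- read.csv(file = tmp)

# str() lists each column with its type (int/num) and its first values
str(sample_data)

# dim() gives rows and columns; the full marketing file should have 20 rows
dim(sample_data)
```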
Now calculate the Range, Min, Max, Mean, Standard Deviation, and Variance for each variable. Below is an example of how to compute these for the variable ‘sales’.
Sales
sales = mydata$sales
#Max Sales
# Named maxS/minS so the variables do not mask the max() and min() functions
maxS = max(sales)
maxS
## [1] 20450
#Min Sales
minS = min(sales)
minS
## [1] 11125
#Range
maxS - minS
## [1] 9325
#Mean
mean(sales)
## [1] 16717.2
#Standard Deviation
sd(sales)
## [1] 2617.052
#Variance
var(sales)
## [1] 6848961
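Two base R facts relate to the calculations above. First, range() returns the minimum and maximum together rather than their difference, so diff(range(x)) gives the range as a single number. Second, var() is the square of sd() (both use the n - 1 denominator). A quick check on the six sales values shown by head(mydata), used here only for illustration:

```r
# First six sales values from head(mydata); illustration only
sales6 <- c(11125, 16121, 16440, 16876, 13965, 14999)

range(sales6)        # minimum and maximum: 11125 16876
diff(range(sales6))  # maximum minus minimum: 5751

# The variance equals the squared standard deviation
all.equal(var(sales6), sd(sales6)^2)  # TRUE
```

The same relationship holds on the full column: 2617.052 squared is about 6,848,961, matching the variance printed above.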
#Repeat the above calculations for radio, paper, tv, and pos.
Radio
radio= mydata$radio
#Max radio
maxR = max(radio)
maxR
## [1] 89
#Min radio
minR = min(radio)
minR
## [1] 65
#Range radio
RangeR = maxR-minR
RangeR
## [1] 24
#Standard Deviation Radio
sdR= sd(radio)
sdR
## [1] 7.354912
#Variance Radio
varR= var(radio)
varR
## [1] 54.09474
Paper
paper= mydata$paper
#Max Paper
maxP = max(paper)
maxP
## [1] 89
#Min Paper
minP = min(paper)
minP
## [1] 35
#Range Paper
RangeP = maxP - minP
RangeP
## [1] 54
#Standard Deviation Paper
sdP = sd(paper)
sdP
## [1] 15.35921
#Variance Paper
varP= var(paper)
varP
## [1] 235.9053
TV
tv= mydata$tv
#Max TV
maxTV = max(tv)
maxTV
## [1] 280
#Min TV
minTV = min(tv)
minTV
## [1] 250
#Range TV
RangeTV = maxTV - minTV
RangeTV
## [1] 30
#Standard Deviation TV
sdTV= sd(tv)
sdTV
## [1] 11.3388
#Variance TV
varTV= var(tv)
varTV
## [1] 128.5684
POS
pos= mydata$pos
#Max POS
maxPos= max(pos)
maxPos
## [1] 3
#Min POS
minPos= min(pos)
minPos
## [1] 0
#Range Pos
RangePos= maxPos - minPos
RangePos
## [1] 3
#Standard Deviation POS
sdPos = sd(pos)
sdPos
## [1] 0.7499298
#Variance POS
varPos = var(pos)
varPos
## [1] 0.5623947
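The same calculations are repeated above for every variable (the radio, paper, tv, and pos blocks skip the mean, which does appear in the summary() output below). Here is a minimal sketch of folding them into one helper, called describe() here. That is a hypothetical name, not a base R function. The sketch applies the helper to every column with sapply(). It uses the six rows from head(mydata) so it runs on its own. On the full data, the equivalent call would be sapply(mydata[, c("sales", "radio", "paper", "tv", "pos")], describe).

```r
# Hypothetical helper: the six descriptive statistics for one numeric vector
describe <- function(x) {
  c(Min = min(x), Max = max(x), Range = max(x) - min(x),
    Mean = mean(x), SD = sd(x), Variance = var(x))
}

# Six rows from head(mydata); illustration only
sample_data <- data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70),
  paper = c(89, 55, 58, 82, 75, 71),
  tv    = c(250, 260, 270, 270, 255, 255),
  pos   = c(1.3, 1.6, 1.7, 1.3, 1.5, 2.1)
)

# One column per variable, one row per statistic
stats <- sapply(sample_data, describe)
round(stats, 2)
```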
A quicker way to get several of these statistics for every variable at once is the summary() function. Below is an example.
summary(mydata)
## case_number sales radio paper
## Min. : 1.00 Min. :11125 Min. :65.00 Min. :35.00
## 1st Qu.: 5.75 1st Qu.:15175 1st Qu.:70.00 1st Qu.:53.75
## Median :10.50 Median :16658 Median :74.50 Median :62.50
## Mean :10.50 Mean :16717 Mean :76.10 Mean :62.30
## 3rd Qu.:15.25 3rd Qu.:18874 3rd Qu.:81.75 3rd Qu.:75.50
## Max. :20.00 Max. :20450 Max. :89.00 Max. :89.00
## tv pos
## Min. :250.0 Min. :0.000
## 1st Qu.:255.0 1st Qu.:1.200
## Median :270.0 Median :1.500
## Mean :266.6 Mean :1.535
## 3rd Qu.:276.2 3rd Qu.:1.800
## Max. :280.0 Max. :3.000
#Repeat the above for the variable sales. Some statistics are not calculated by the summary() function; specify which.
Summary Sales
summary(sales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11125 15175 16658 16717 18874 20450
## The statistics not calculated by summary() are the standard deviation, range, and variance (the range can still be derived from Min. and Max.).
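Because summary() of a single numeric vector returns a named numeric vector, the missing statistics can be appended to it with c(). Here is a short sketch on the six sales values from head(mydata). It is for illustration only; replace sales6 with sales to run it on the full column.

```r
sales6 <- c(11125, 16121, 16440, 16876, 13965, 14999)

# summary() gives Min., quartiles, Median, Mean and Max.; add what it leaves out
extended <- c(summary(sales6),
              Range    = diff(range(sales6)),
              SD       = sd(sales6),
              Variance = var(sales6))
round(extended, 1)
```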
Now, we will produce a basic plot of the ‘sales’ variable. Here we use the plot function and pass it the variable we want to plot.
plot(sales)
We can customize the plot by adding labels to the x- and y-axes.
#xlab labels the x axis, ylab labels the y axis
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
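As an example of the first two customizations, base R's plot() accepts col for the colour of points and lines and main for a heading. It also takes pch and lwd for the point symbol and line width. The sketch below uses the six sales values from head(mydata), and the colour and title are arbitrary choices. Interactive plots typically come from add-on packages such as plotly and are not shown here.

```r
sales6 <- c(11125, 16121, 16440, 16876, 13965, 14999)

plot(sales6, type = "b",
     col  = "steelblue",       # colour of points and connecting lines
     pch  = 19,                # filled circles as point symbols
     lwd  = 2,                 # thicker connecting line
     main = "Sales by Case",   # plot heading
     xlab = "Case Number", ylab = "Sales in $1,000")
```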
Now, let's plot the sales graph alongside radio, paper, and tv, which you will code. Make sure to run all four plot calls in the same chunk so they appear on the same layout.
#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))
#Example of how to plot the sales variable
plot(sales, type="b", xlab = "Case Number", ylab = "Total Sales in $1,000")
#Plot of Radio. Label properly
plot(radio, type="b", xlab = "Case Number", ylab = "Radio Sales in $10")
#Plot of Paper. Label properly
plot(paper, type="b", xlab = "Case Number", ylab = "Paper Sales in $10")
#Plot of TV. Label properly
plot(tv, type = "b", xlab = "Case Number", ylab = "TV Sales in $100")
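Note that matrix(1:4, 2, 2) is filled column by column. As a result, layout() places the first plot top-left, the second bottom-left, the third top-right, and the fourth bottom-right. If you prefer a row-by-row order, par(mfrow = c(2, 2)) is a common alternative. Here is a small sketch using the first six rows from head(mydata), for illustration only:

```r
# matrix() fills by column, so the layout grid is
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
print(matrix(1:4, 2, 2))

sales6 <- c(11125, 16121, 16440, 16876, 13965, 14999)
radio6 <- c(65, 73, 74, 75, 69, 70)
paper6 <- c(89, 55, 58, 82, 75, 71)
tv6    <- c(250, 260, 270, 270, 255, 255)

# par(mfrow) fills the 2 x 2 grid row by row instead
old_par <- par(mfrow = c(2, 2))
plot(sales6, type = "b", xlab = "Case Number", ylab = "Total Sales in $1,000")
plot(radio6, type = "b", xlab = "Case Number", ylab = "Radio Sales in $10")
plot(paper6, type = "b", xlab = "Case Number", ylab = "Paper Sales in $10")
plot(tv6,    type = "b", xlab = "Case Number", ylab = "TV Sales in $100")
par(old_par)  # restore the previous single-plot setting
```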
When looking at these plots, it is hard to see a particular trend. One way to reveal a possible trend in the sales data is to reorder the data from low to high sales. The 20 monthly case studies are not in chronological order; the 20 case numbers are independent, sequentially generated identifiers. Since each case is independent, we can reorder them.
#Re-order sales from low to high and save the re-ordered data in a new data set. Because whole rows are re-ordered, the other columns move with their sales values.
newdata = mydata[order(mydata$sales),]
head(newdata)
## case_number sales radio paper tv pos
## 1 1 11125 65 89 250 1.3
## 19 19 12369 65 37 250 2.5
## 20 20 13882 68 80 252 1.4
## 5 5 13965 69 75 255 1.5
## 6 6 14999 70 71 255 2.1
## 11 11 15234 70 66 255 1.5
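The reordering works because order() returns row positions rather than sorted values (sort() returns the values). Indexing the rows of the data frame by those positions moves every column together, so each case keeps its own radio, paper, tv, and pos values. Here is a self-contained check on the six rows from head(mydata), for illustration only:

```r
sample_data <- data.frame(
  case_number = 1:6,
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70)
)

sort(sample_data$sales)   # sorted values
order(sample_data$sales)  # row positions: 1 5 6 2 3 4

# Index rows by position; the columns stay aligned within each row
sorted_data <- sample_data[order(sample_data$sales), ]
sorted_data
```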
# Redefine the new variables
newsales = newdata$sales
newradio = newdata$radio
newtv = newdata$tv
newpaper = newdata$paper
#Repeat the 4 graphs layout with proper labeling using instead the four new variables for sales, radio, tv, and paper.
#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))
#Plot of newSales. The x-axis is now the position after sorting, not the case number
plot(newsales, type="b", xlab = "Cases Ordered by Sales", ylab = "Total Sales in $1,000")
#Plot of newRadio. Label properly
plot(newradio, type="b", xlab = "Cases Ordered by Sales", ylab = "Radio Sales in $10")
#Plot of newPaper. Label properly
plot(newpaper, type="b", xlab = "Cases Ordered by Sales", ylab = "Paper Sales in $10")
#Plot of newTV. Label properly
plot(newtv, type = "b", xlab = "Cases Ordered by Sales", ylab = "TV Sales in $100")
##Share your observations on the trending relationships the new plots reveal.
##There is a positive relationship between total sales and radio and TV sales: when TV and radio sales go up, total sales tend to increase. Paper sales, on the other hand, appear to behave independently of the others, so an increase in total sales does not necessarily mean that paper also increased. For example, moving from case number 1 to case number 19 (the two lowest sales values), total sales rose from 11125 to 12369 while paper dropped from 89 to 37. Paper may still contribute positively to total sales, but a drop in paper does not appear to hurt total sales significantly.
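The visual impression above can be backed up with a number. cor() gives the Pearson correlation between variables, which ranges from -1 to 1 and is positive when two variables tend to rise together. This sketch uses only the six rows from head(mydata), so its values are illustrative and will differ from the full data set. On the full data, the equivalent call would be cor(mydata[, c("sales", "radio", "paper", "tv")]). Correlation describes association only, not cause and effect.

```r
sample_data <- data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70),
  paper = c(89, 55, 58, 82, 75, 71),
  tv    = c(250, 260, 270, 270, 255, 255)
)

# Pearson correlation matrix, rounded for reading
r <- cor(sample_data)
round(r, 2)

# Correlation of each medium with sales only
round(r[-1, "sales"], 2)
```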
Given a sales value of $25,000, calculate the corresponding z-value (z-score) using the mean and standard deviation calculated in task 1. Recall that z-score = (x - mean)/sd.
# Show calculations here
sales= mydata$sales
#mean
meanSales= mean(sales)
meanSales
## [1] 16717.2
#SD
SDSales = sd(sales)
SDSales
## [1] 2617.052
x<- 25000
x
## [1] 25000
#Z-score
ZScore=(x-meanSales)/SDSales
ZScore
## [1] 3.164935
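Turning the z-score into a probability requires assuming that sales are approximately normally distributed, which this lab does not test. Under that assumption, pnorm() gives the share of the distribution below a z-score. With lower.tail = FALSE, it gives the share at or above that score. Here is a sketch using the mean and standard deviation computed above:

```r
meanSales <- 16717.2
SDSales   <- 2617.052
x <- 25000

z <- (x - meanSales) / SDSales   # about 3.165

pnorm(z)                      # share expected below $25,000, about 0.9992
pnorm(z, lower.tail = FALSE)  # share expected at or above $25,000, about 0.0008
```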
## Based on the z-value, how would you rate a `$25000` sales value: poor, average, good, or very good performance? Explain your logic. -- A z-value of 3.164935 means that $25,000 lies about 3.16 standard deviations above the mean. In other words, $25,000 is a very good performance, since it is far above the average. Assuming sales are approximately normally distributed, the probability of sales at or above this amount is only about 0.08% (1 - pnorm(3.164935) ≈ 0.0008), so roughly 99.92% of cases would fall below it.