About

Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.

Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers.

In this lab, we will explore both kinds of analytics using the data set provided.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment, you may be using different data sets from the one included here. Read the instructions on Sakai carefully.


Task 1: Quantitative Analysis

Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.

mydata = read.csv(file="data/marketing.csv")
head(mydata)
##   case_number sales radio paper  tv pos
## 1           1 11125    65    89 250 1.3
## 2           2 16121    73    55 260 1.6
## 3           3 16440    74    58 270 1.7
## 4           4 16876    75    82 270 1.3
## 5           5 13965    69    75 255 1.5
## 6           6 14999    70    71 255 2.1
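Beyond head(), a few quick checks can confirm that the file was parsed as expected. The sketch below is only illustrative: it rebuilds the six rows printed above from inline text (through the text argument of read.csv) so that it runs without the CSV file, and the names csv_text and check are hypothetical stand-ins, not part of the lab.

```r
# Illustrative only: the six rows shown by head(mydata), supplied inline
# so the example runs without data/marketing.csv.
csv_text = "case_number,sales,radio,paper,tv,pos
1,11125,65,89,250,1.3
2,16121,73,55,260,1.6
3,16440,74,58,270,1.7
4,16876,75,82,270,1.3
5,13965,69,75,255,1.5
6,14999,70,71,255,2.1"

check = read.csv(text = csv_text)

str(check)                       # column names and types
dim(check)                       # number of rows and columns
all(sapply(check, is.numeric))   # TRUE if every column was read as numeric
anyNA(check)                     # TRUE if any value is missing
```

The same four calls can be run on mydata itself once the file has been read.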

Now calculate the Range, Min, Max, Mean, STDEV, and Variance for each variable. Below is an example of how to compute the items for the variable ‘sales’.

Sales

sales = mydata$sales
#Max Sales
sales_max = max(sales)
sales_max
## [1] 20450
#Min Sales
sales_min = min(sales)
sales_min
## [1] 11125
#Range
sales_max-sales_min
## [1] 9325
#Mean
sales_mean = mean(sales)
sales_mean
## [1] 16717.2
#Standard Deviation
sales_sd = sd(sales)
sales_sd
## [1] 2617.052
#Variance
sales_var = var(sales)
sales_var
## [1] 6848961
#Repeat the above calculations for radio, paper, tv, and pos. 

Radio

radio = mydata$radio
#Max Radio
radio_max = max(radio)
radio_max
## [1] 89
#Min Radio
radio_min = min(radio)
radio_min
## [1] 65
#Range 
radio_max-radio_min
## [1] 24
#Mean 
radio_mean = mean(radio)
radio_mean
## [1] 76.1
#Standard Deviation
radio_sd = sd(radio)
radio_sd
## [1] 7.354912
#Variance
radio_var = var(radio)
radio_var
## [1] 54.09474

Paper

paper = mydata$paper
#Max Paper
paper_max = max(paper)
paper_max
## [1] 89
#Min Paper
paper_min = min(paper)
paper_min 
## [1] 35
#Range
paper_max-paper_min
## [1] 54
#Mean
paper_mean = mean(paper)
paper_mean
## [1] 62.3
#Standard Deviation 
paper_sd = sd(paper)
paper_sd
## [1] 15.35921
#Variance
paper_var = var(paper)
paper_var
## [1] 235.9053

TV

tv = mydata$tv
#Max TV
tv_max = max(tv)
tv_max
## [1] 280
#Min TV
tv_min = min(tv)
tv_min 
## [1] 250
#Range
tv_max-tv_min
## [1] 30
#Mean
tv_mean = mean(tv)
tv_mean
## [1] 266.6
#Standard Deviation
tv_sd = sd(tv)
tv_sd
## [1] 11.3388
#Variance
tv_var = var(tv)
tv_var
## [1] 128.5684

POS

pos = mydata$pos
#Max Pos
pos_max = max(pos)
pos_max
## [1] 3
#Min Pos
pos_min = min(pos)
pos_min
## [1] 0
#Range
pos_max-pos_min
## [1] 3
#Mean
pos_mean = mean(pos)
pos_mean
## [1] 1.535
#Standard Deviation 
pos_sd = sd(pos)
pos_sd
## [1] 0.7499298
#Variance
pos_var = var(pos)
pos_var
## [1] 0.5623947
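The same six statistics are computed five times above. One possible way to avoid the repetition is to wrap them in a small function and apply it to every column with sapply(). This is a sketch, not part of the original lab: describe_var is a hypothetical helper name, and the data frame demo holds only the six rows printed by head(mydata), so its results will differ from the full 20-case values above.

```r
# Hypothetical helper: the six Task 1 statistics for one numeric vector.
describe_var = function(x) {
  c(max      = max(x),
    min      = min(x),
    range    = max(x) - min(x),
    mean     = mean(x),
    sd       = sd(x),
    variance = var(x))
}

# Illustrative subset: the six rows shown by head(mydata).
demo = data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70),
  paper = c(89, 55, 58, 82, 75, 71),
  tv    = c(250, 260, 270, 270, 255, 255),
  pos   = c(1.3, 1.6, 1.7, 1.3, 1.5, 2.1)
)

# sapply() returns a matrix with one column of statistics per variable.
stats = sapply(demo, describe_var)
round(stats, 3)
```

With the full data, sapply(mydata[, -1], describe_var) would apply the helper to every column except case_number.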

An easy way to calculate several of these statistics for all of the variables at once is with the summary() function. Below is an example.

summary(mydata)
##   case_number        sales           radio           paper      
##  Min.   : 1.00   Min.   :11125   Min.   :65.00   Min.   :35.00  
##  1st Qu.: 5.75   1st Qu.:15175   1st Qu.:70.00   1st Qu.:53.75  
##  Median :10.50   Median :16658   Median :74.50   Median :62.50  
##  Mean   :10.50   Mean   :16717   Mean   :76.10   Mean   :62.30  
##  3rd Qu.:15.25   3rd Qu.:18874   3rd Qu.:81.75   3rd Qu.:75.50  
##  Max.   :20.00   Max.   :20450   Max.   :89.00   Max.   :89.00  
##        tv             pos       
##  Min.   :250.0   Min.   :0.000  
##  1st Qu.:255.0   1st Qu.:1.200  
##  Median :270.0   Median :1.500  
##  Mean   :266.6   Mean   :1.535  
##  3rd Qu.:276.2   3rd Qu.:1.800  
##  Max.   :280.0   Max.   :3.000
#Repeat the above for the variable sales. There are some statistics not calculated with the summary() function. Specify which.
summary(sales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11120   15180   16660   16720   18870   20450

The summary() function does not report the following statistics: range, standard deviation, and variance. The range is not given directly, although it can be computed from the minimum and maximum in the summary.
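The statistics that summary() leaves out can be appended to its output. A minimal sketch, using the six sales values printed by head(mydata) rather than the full data set; sales_demo, sales_range, and full_summary are hypothetical names.

```r
# Illustrative subset: the six sales values shown by head(mydata).
sales_demo = c(11125, 16121, 16440, 16876, 13965, 14999)

# range() returns c(min, max); diff() turns that pair into max - min.
sales_range = diff(range(sales_demo))

# Append the missing statistics to the six-number summary.
full_summary = c(summary(sales_demo),
                 Range = sales_range,
                 SD    = sd(sales_demo),
                 Var   = var(sales_demo))
full_summary
```

Note that the variance is the square of the standard deviation, so either one can be derived from the other.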

Task 2: Qualitative Analysis

Now we will produce a basic plot of the ‘sales’ variable. Here we use the plot() function and pass it the variable we want to plot.

plot(sales)

We can customize the plot by adding labels to the x- and y-axes.

#xlab labels the x axis, ylab labels the y axis
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000") 

There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
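As a sketch of such customizations, the call below sets a title with main, a color with col, a point symbol with pch, and a line width with lwd, all standard arguments in base R graphics. The six values are those printed by head(mydata), and the title text and colors are arbitrary choices made for illustration.

```r
# Illustrative subset: the six sales values shown by head(mydata).
sales_demo = c(11125, 16121, 16440, 16876, 13965, 14999)

plot(sales_demo,
     type = "b",               # draw both points and connecting lines
     col  = "blue",            # color of points and lines
     pch  = 19,                # filled circle as the point symbol
     lwd  = 2,                 # line width
     main = "Sales by Case",   # plot heading (arbitrary title)
     xlab = "Case Number",
     ylab = "Sales in $1,000")

# Add a dashed horizontal reference line at the mean of the plotted values.
abline(h = mean(sales_demo), lty = 2, col = "gray40")
```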

Now, let's plot the sales graph alongside radio, paper, and tv, which you will code. Make sure to run the code in the same chunk so the plots share the same layout.

#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))

#Example of how to plot the sales variable
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000") 

#Plot of Radio. Label properly
plot(radio, type="b", xlab = "Case Number", ylab = "Radio in $1,000")

#Plot of Paper. Label properly
plot(paper, type="b", xlab = "Case Number", ylab = "Paper in $1,000")

#Plot of TV. Label properly
plot(tv, type="b", xlab = "Case Number", ylab = "TV in $1,000")
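The position of each panel in the layout depends on how matrix() fills its cells. The short sketch below illustrates that behavior with placeholder plots; m_col and m_row are hypothetical names, and the row-wise variant is shown only as an alternative, not as something the lab requires.

```r
# matrix() fills column by column by default, so plot 2 lands below plot 1.
m_col = matrix(1:4, 2, 2)
m_col

# byrow = TRUE fills row by row, so plot 2 lands to the right of plot 1.
m_row = matrix(1:4, 2, 2, byrow = TRUE)
m_row

# Panels are drawn in the order 1, 2, 3, 4 given by the matrix entries.
layout(m_row)
for (i in 1:4) plot(1:10, main = paste("Panel", i))

# Reset the device to a single panel.
layout(1)
```

With the column-wise matrix used in the lab, sales and radio occupy the left column and paper and tv the right column.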

When looking at these plots it is hard to see a particular trend. One way to observe any possible trend in the sales data would be to re-order the data from low to high.

The 20 monthly case studies are in no particular chronological order. The 20 case numbers are independent, sequentially generated numbers. Since each case is independent, we can reorder them.

#Re-order sales from low to high, and save the re-ordered data in a new set. As the sales data is re-ordered, the associated columns follow.
newdata = mydata[order(sales),]
head(newdata)
##    case_number sales radio paper  tv pos
## 1            1 11125    65    89 250 1.3
## 19          19 12369    65    37 250 2.5
## 20          20 13882    68    80 252 1.4
## 5            5 13965    69    75 255 1.5
## 6            6 14999    70    71 255 2.1
## 11          11 15234    70    66 255 1.5
# Redefine the new variables 
newsales = newdata$sales
newradio = newdata$radio
newtv = newdata$tv
newpaper = newdata$paper
#Repeat the 4 graphs layout with proper labeling using instead the four new variables for sales, radio, tv, and paper.

#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))

#Example of how to plot the sales variable
#The x axis is now the position after sorting by sales, not the original case number
plot(newsales, type="b", xlab = "Case (ordered by sales)", ylab = "Sales in $1,000") 

#Plot of Radio. Label properly
plot(newradio, type="b", xlab = "Case (ordered by sales)", ylab = "Radio in $1,000")

#Plot of Paper. Label properly
plot(newpaper, type="b", xlab = "Case (ordered by sales)", ylab = "Paper in $1,000")

#Plot of TV. Label properly
plot(newtv, type="b", xlab = "Case (ordered by sales)", ylab = "TV in $1,000")
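The re-ordering step relies on order(), which returns row positions rather than sorted values. A minimal sketch on a subset of the rows printed by head(mydata); demo, idx, and demo_sorted are hypothetical names.

```r
# Illustrative subset: three columns of the six rows shown by head(mydata).
demo = data.frame(
  case_number = 1:6,
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70)
)

# order() gives the row positions that would sort sales from low to high.
idx = order(demo$sales)
idx

# Indexing rows with idx moves every column together, so each sales value
# keeps its own case_number and radio value.
demo_sorted = demo[idx, ]
demo_sorted
```

Because whole rows are moved, the re-ordered radio, paper, and tv series stay paired with the sales values they belong to.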

Share your observations on what the new plots reveal in terms of trends and relationships.

The new plots are much more revealing and make the relationship between sales and each advertising category easier to see.

Re-ordering the data (which is possible in this case because each case is independent) makes the relationships easier to see. The “Sales” plot now rises steadily by construction, and the “Radio” plot rises along with it: as sales increase, the budget spent on promotion through radio increases as well. This shared upward trend suggests a strong positive correlation between radio spending and sales, so there may be a strong relationship between the budget spent on this type of advertising and the behavior of sales.

Based on the new “TV” plot, there appears to be a low positive correlation between TV spending and sales. A change in the budget spent on promotion through TV might therefore affect sales, but will not necessarily do so.

The “Paper” plot shows a low negative correlation at the beginning and no correlation towards the middle and end, where the data are more spread out and show no obvious trend. Based on this plot, there may be no relationship between the dollars spent on promotion through paper and sales performance.
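The visual impressions above can be checked numerically with cor(), which returns the Pearson correlation coefficient by default. This is a sketch only: the data frame demo holds the six rows printed by head(mydata), so the coefficients it yields describe that small subset and not the full 20 cases. The names demo and cor_with_sales are hypothetical.

```r
# Illustrative subset: the six rows shown by head(mydata).
demo = data.frame(
  sales = c(11125, 16121, 16440, 16876, 13965, 14999),
  radio = c(65, 73, 74, 75, 69, 70),
  paper = c(89, 55, 58, 82, 75, 71),
  tv    = c(250, 260, 270, 270, 255, 255)
)

# Pearson correlation of each advertising variable with sales.
cor_with_sales = sapply(demo[, c("radio", "paper", "tv")],
                        function(x) cor(x, demo$sales))
round(cor_with_sales, 2)
```

On the full data, cor(mydata[, -1]) would return the complete correlation matrix for all variables except case_number.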


Task 3: Standardized Z-Value

Given a sales value of $25000, calculate the corresponding z-value, or z-score, using the mean and standard deviation calculated in Task 1. We know that z-score = (x - mean)/sd.

#  Show calculations here
# (x - mean(sales)) / sd(sales)
zscore = (25000 - mean(sales))/sd(sales)
zscore
## [1] 3.164935

Based on the z-value, how would you rate a $25000 sales value: poor, average, good, or very good performance? Explain your logic.

Based on the z-value of approximately 3.16, the $25000 sales value could be considered an outlier, and I would rate it as a very good performance because it lies at the high (right-hand) end of the distribution. Since 3.16 is greater than 3, the value is more than three standard deviations above the mean. It is also higher than every value in the sample, whose maximum is 20450. If sales are assumed to be approximately normally distributed, about 99.7% of values fall within three standard deviations of the mean, so a value with z ≈ 3.16 would outperform roughly 99.9% of sales values, which also makes a $25000 result rare.
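The rating above can be related to a percentile with pnorm(), under the assumption, which the lab itself does not state, that sales are approximately normally distributed. The mean and standard deviation are the Task 1 values; z_score is a hypothetical helper name.

```r
# Hypothetical helper: standardize x using mean m and standard deviation s.
z_score = function(x, m, s) (x - m) / s

# Values reported in Task 1.
sales_mean = 16717.2
sales_sd = 2617.052

z = z_score(25000, sales_mean, sales_sd)
z

# Share of a normal distribution lying below z (assumes normality).
pnorm(z)

# Share lying within three standard deviations of the mean.
pnorm(3) - pnorm(-3)
```

The last line is the source of the familiar 99.7% figure; it describes the band around the mean, while pnorm(z) gives the share of values that fall below a single point.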