Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.
Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers.
In this lab, we will explore both types of analytics using the data set provided.
Remember to always set your working directory to the source file location: go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
For your assignment you may be using different data sets than the one included here. Read the instructions on Sakai carefully.
Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.
mydata = read.csv(file="data/marketing.csv")
head(mydata)
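Beyond head(), a few quick checks can help confirm that a file was read in correctly. The sketch below is only an illustration: it writes a small hypothetical CSV to a temporary file (the name toy_path and all values are made up, not taken from marketing.csv) so that it runs on its own.

```r
# Hypothetical stand-in for marketing.csv so the sketch is self-contained
toy_path <- tempfile(fileext = ".csv")
writeLines(c("case_number,sales,radio",
             "1,11000,65",
             "2,16500,73",
             "3,19000,79"), toy_path)

# Confirm the path resolves before trying to read it
stopifnot(file.exists(toy_path))

toy <- read.csv(file = toy_path)

dim(toy)              # number of rows and columns
str(toy)              # column names and types
colSums(is.na(toy))   # count of missing values per column
```

With the real data, the same calls would be applied to mydata instead of toy.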
Now calculate the Range, Min, Max, Mean, STDEV, and Variance for each variable. Below is an example of how to compute the items for the variable ‘sales’.
Sales
sales = mydata$sales
radio=mydata$radio
paper=mydata$paper
tv=mydata$tv
pos=mydata$pos
#Max
max_s = max(sales)
max_r = max(radio)
max_pap = max(paper)
max_tv= max(tv)
max_pos=max(pos)
max_s
[1] 20450
max_r
[1] 89
max_pap
[1] 89
max_tv
[1] 280
max_pos
[1] 3
#Min
min_s = min(sales)
min_r = min(radio)
min_pap = min(paper)
min_tv= min(tv)
min_pos=min(pos)
min_s
[1] 11125
min_r
[1] 65
min_pap
[1] 35
min_tv
[1] 250
min_pos
[1] 0
#Range
max_s-min_s
[1] 9325
max_r-min_r
[1] 24
max_pap-min_pap
[1] 54
max_tv-min_tv
[1] 30
max_pos-min_pos
[1] 3
#Mean
mean(sales)
[1] 16717.2
mean(radio)
[1] 76.1
mean(paper)
[1] 62.3
mean(tv)
[1] 266.6
mean(pos)
[1] 1.535
#Standard Deviation
sd(sales)
[1] 2617.052
sd(radio)
[1] 7.354912
sd(paper)
[1] 15.35921
sd(tv)
[1] 11.3388
sd(pos)
[1] 0.7499298
#Variance
var(sales)
[1] 6848961
var(radio)
[1] 54.09474
var(paper)
[1] 235.9053
var(tv)
[1] 128.5684
var(pos)
[1] 0.5623947
#Repeat the above calculations for radio, paper, tv, and pos.
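The per-variable calculations above can also be written once and applied to every column. The following is a minimal sketch, assuming a small hypothetical data frame (toy) and a helper named describe, neither of which is part of the lab; with the real data, mydata would take the place of toy.

```r
# Hypothetical toy data; the lab itself uses mydata from marketing.csv
toy <- data.frame(
  sales = c(12, 15, 11, 18, 14),
  radio = c(70, 65, 80, 75, 85)
)

# Hypothetical helper: the six statistics for one numeric vector
describe <- function(x) {
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), sd = sd(x), var = var(x))
}

# sapply() applies the helper to each column; t() puts one variable per row
stats <- t(sapply(toy, describe))
stats
```

Two relationships are worth noting: the range computed here equals diff(range(x)), and the variance is the square of the standard deviation, which is a handy consistency check on the results.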
An easy way to calculate several of these statistics for all of the variables at once is the summary() function. Below is an example.
summary(mydata)
  case_number        sales           radio           paper            tv             pos
 Min.   : 1.00   Min.   :11125   Min.   :65.00   Min.   :35.00   Min.   :250.0   Min.   :0.000
 1st Qu.: 5.75   1st Qu.:15175   1st Qu.:70.00   1st Qu.:53.75   1st Qu.:255.0   1st Qu.:1.200
 Median :10.50   Median :16658   Median :74.50   Median :62.50   Median :270.0   Median :1.500
 Mean   :10.50   Mean   :16717   Mean   :76.10   Mean   :62.30   Mean   :266.6   Mean   :1.535
 3rd Qu.:15.25   3rd Qu.:18874   3rd Qu.:81.75   3rd Qu.:75.50   3rd Qu.:276.2   3rd Qu.:1.800
 Max.   :20.00   Max.   :20450   Max.   :89.00   Max.   :89.00   Max.   :280.0   Max.   :3.000
#Repeat the above for the variable sales. There are some statistics not calculated with the summary() function. Specify which.
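The "1st Qu.", "Median", and "3rd Qu." rows in the summary() output are quantiles of each column. As a rough illustration on a hypothetical vector x (not the lab data), the same values can be obtained directly with quantile():

```r
x <- 1:10  # hypothetical values, chosen so the quartiles are easy to check by hand

summary(x)

# The 25th, 50th, and 75th percentiles, using quantile()'s default method
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q
```

The same default method explains why case_number, which runs from 1 to 20, shows a first quartile of 5.75 in the output above.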
Now, we will produce a basic plot of the ‘sales’ variable. Here we use the plot function and pass it the variable we want to plot.
plot(sales)
We can customize the plot by adding labels to the x- and y-axes.
#xlab labels the x axis, ylab labels the y axis
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
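As one possible illustration of the color and heading options, the sketch below uses standard base-graphics arguments on a hypothetical vector (toy_sales is made up; the title text is only an example). Interactive plots need an add-on package and are not shown here.

```r
toy_sales <- c(12, 15, 11, 18, 14, 20)  # hypothetical values, not the lab data

plot(toy_sales, type = "b",
     col = "blue",             # color of the line and points
     lwd = 2,                  # line width
     pch = 19,                 # solid circle as the point symbol
     main = "Sales by Case",   # heading shown above the plot
     xlab = "Case Number", ylab = "Sales in $1,000")
```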
Now, let's plot the sales graph alongside radio, paper, and tv, which you will code. Make sure to run the code in the same chunk so they appear in the same layout.
#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))
#Example of how to plot the sales variable
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
#Plot of Radio. Label properly
plot(radio, type = "b",xlab = "Case Number", ylab = "Radio in 10")
#Plot of Paper. Label properly
plot(paper, type = "b",xlab = "Case Number", ylab = "Paper in 20")
#Plot of TV. Label properly
plot(tv, type = "b",xlab = "Case Number", ylab = "tv in 15")
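The order in which the four panels are filled depends on the matrix passed to layout(). Since matrix() fills column by column by default, the call used above places the second plot below the first rather than beside it. The sketch below, which uses placeholder plots rather than the lab data, compares that with a row-wise alternative:

```r
m_col <- matrix(1:4, 2, 2)                # column-wise fill, as in the layout() call above
m_row <- matrix(1:4, 2, 2, byrow = TRUE)  # row-wise fill, shown as an alternative

m_col  # plot 1 top-left, plot 2 bottom-left, plot 3 top-right, plot 4 bottom-right
m_row  # plot 1 top-left, plot 2 top-right, plot 3 bottom-left, plot 4 bottom-right

layout(m_row)
for (i in 1:4) {
  plot(1:5, type = "b", main = paste("Panel", i))  # placeholder plots
}
layout(1)  # return to a single panel for later plots
```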
When looking at these plots, it is hard to see a particular trend. One way to reveal a possible trend in the sales data is to re-order the data from low to high. The 20 monthly case studies are in no particular chronological sequence; the 20 case numbers are independent, sequentially generated numbers. Since each case is independent, we can reorder them.
#Re-order sales from low to high, and save the re-ordered data in a new set. As the sales data is re-ordered, the associated values in the other columns follow.
newdata = mydata[order(sales),]
head(newdata)
# Redefine the new variables
newsales = newdata$sales
newradio = newdata$radio
newtv = newdata$tv
newpaper = newdata$paper
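To see why the other columns follow when sales is re-ordered, it can help to look at what order() returns. This is a small sketch on a hypothetical data frame (toy, with made-up values):

```r
# Hypothetical toy data, not the lab data
toy <- data.frame(case  = 1:4,
                  sales = c(15, 11, 18, 14),
                  radio = c(70, 65, 80, 75))

# order() returns the row positions that would sort sales from low to high
idx <- order(toy$sales)
idx

# Indexing the rows by idx moves each whole row, so radio stays with its own sales value
sorted <- toy[idx, ]
sorted
```

Note that when a sorted column is plotted on its own, the x-axis shows the position in the sorted order rather than the original case number.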
#Repeat the 4-graph layout with proper labeling, this time using the four new variables for sales, radio, tv, and paper.
#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))
#Example of how to plot the sales variable
plot(newsales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
#Plot of Radio. Label properly
plot(newradio, type = "b",xlab = "Case Number", ylab = "Radio in 10")
#Plot of Paper. Label properly
plot(newpaper, type = "b",xlab = "Case Number", ylab = "Paper in 20")
#Plot of TV. Label properly
plot(newtv, type = "b",xlab = "Case Number", ylab = "tv in 15")
Share your observations on what the new plots reveal in terms of trending relationships.
Given a sales value of $25000, calculate the corresponding z-value or z-score using the mean and standard deviation calculations conducted in task 1. We know that z-score = (x - mean)/sd.
# Show calculations here
sales_mean= mean(sales)
sales_sd=sd(sales)
z_s_sales=(25000 - sales_mean)/sales_sd
z_s_sales
[1] 3.164935
Based on the z-value, how would you rate a $25000 sales value: poor, average, good, or very good performance? Explain your logic.
## This z-score indicates very good performance. The average sales value is 16717.2, and a $25000 value lies about 3.16 standard deviations above it. Looking up a z-score of 3.165 in a z-table gives a cumulative probability between .9990 and .9993, which places this value well within the top one percent of outcomes if sales are approximately normally distributed. That is why we consider it very good.
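The z-table lookup can be cross-checked in R with pnorm(), which returns the cumulative probability of the standard normal distribution. The sketch below plugs in the mean and standard deviation reported earlier rather than reading the data file, and it assumes sales are approximately normally distributed.

```r
sales_mean <- 16717.2    # mean(sales) as reported above
sales_sd   <- 2617.052   # sd(sales) as reported above

z <- (25000 - sales_mean) / sales_sd
z

p <- pnorm(z)  # probability that a standard normal value falls at or below z
p
1 - p          # share of outcomes expected to exceed a $25000 sales value
```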