Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts, and is more exploratory in nature.
Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers, and is more explanatory in nature.
In this lab, we will explore both types of analytics using the data set provided.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer any questions. Submit your work in Sakai as detailed in previous notes.
For your assignment, you may be using different data sets than the one included here. Always read the instructions carefully before executing any included code chunks or adding your own code. For clarity, the tasks and questions to be completed are highlighted in red and numbered according to their placement in the task section. The red color is only visible in Preview mode. Quite often you will need to add your own code chunk.
Execute all code chunks (both those already included and those you added), preview the result, check its integrity, and submit the final work (html file) in Sakai.
Begin by reading in the data from the ‘marketing.csv’ file, and viewing it to make sure it is read in correctly.
mydata = read.csv(file="marketing.csv")
head(mydata)
Now let’s calculate the Range, Min, Max, Mean, STDEV, and Variance for each variable. Below is an example of how to compute the items for the variable ‘sales’.
sales = mydata$sales
#Max Sales
max = max(sales)
max
[1] 20450
#Min Sales
min = min(sales)
min
[1] 11125
#Range
max-min
[1] 9325
#Mean
mean(sales)
[1] 16717.2
#Standard Deviation
sd(sales)
[1] 2617.052
#Variance
var(sales)
[1] 6848961
##### 1A) Repeat the above statistics for the variable radio (2pts)
radio = mydata$radio
#Max Radio
max = max(radio)
max
[1] 89
#Min Radio
min = min(radio)
min
[1] 65
#Range
max-min
[1] 24
#Mean
mean(radio)
[1] 76.1
#Standard Deviation
sd(radio)
[1] 7.354912
#Variance
var(radio)
[1] 54.09474
An easy way to calculate many of these statistics is with the summary() function. Below is an example.
summary(sales)
Min. 1st Qu. Median Mean 3rd Qu. Max.
11125 15175 16658 16717 18874 20450
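Because the task asks for these statistics for each variable, the individual steps can also be wrapped in a small helper and applied to every column at once. The sketch below is one possible approach and is not part of the original lab: `describe` is a hypothetical helper name, and `demo` is a made-up data frame standing in for `mydata`.

```r
# Made-up stand-in for mydata; the values are illustrative only
demo = data.frame(sales = c(10, 12, 15, 20), radio = c(3, 4, 6, 7))

# Hypothetical helper: the six statistics used in this lab for one numeric vector
describe = function(x) {
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), sd = sd(x), var = var(x))
}

# sapply() runs describe() on each column and returns one column of results per variable
stats = sapply(demo, describe)
stats
```

With the real data, `sapply(mydata, describe)` would produce the same table, assuming every column of `mydata` is numeric.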
##### 1B) Calculate the interquartile range for sales, and determine if there are any outliers (4pts)
quantile(sales)
0% 25% 50% 75% 100%
11125.00 15175.25 16658.00 18874.25 20450.00
lowerq = quantile(sales)[2]
upperq = quantile(sales)[4]
iqr = upperq - lowerq
#Interquartile Range
upperq-lowerq
75%
3699
upperthreshold = (iqr * 1.5) + upperq
upperthreshold
75%
24422.75
lowerthreshold = lowerq - (iqr * 1.5)
lowerthreshold
25%
9626.75
boxplot(sales)
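The thresholds only answer the outlier question once the data are compared against them. For sales, the minimum (11125) is above the lower threshold (9626.75) and the maximum (20450) is below the upper threshold (24422.75), so no observation is flagged. A minimal sketch of the same check in code, using a made-up vector `x` rather than the lab data:

```r
# Made-up values for illustration; 45 is deliberately extreme
x = c(12, 15, 16, 17, 18, 19, 20, 45)

# Quartiles and interquartile range (IQR() is the built-in shortcut)
lower_q = unname(quantile(x, 0.25))
upper_q = unname(quantile(x, 0.75))
iqr_x = IQR(x)

# 1.5 * IQR rule, as in the thresholds computed above
lower_limit = lower_q - 1.5 * iqr_x
upper_limit = upper_q + 1.5 * iqr_x

# Keep only the values that fall outside the limits
outliers = x[x < lower_limit | x > upper_limit]
outliers
```

Replacing `x` with `sales` should return an empty vector, which matches the boxplot showing no points beyond the whiskers.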
Now, we will produce a basic plot of the ‘sales’ variable. Here we call the plot function and, within it, refer to the variable we want to plot.
plot(sales)
We can customize the plot by connecting the dots and adding labels to the x- and y-axes.
#xlab labels the x axis, ylab labels the y axis
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
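As a small illustration of two of these options, the sketch below adds a heading and changes the line color using standard base R plotting arguments. The vector `demo_sales` and the title text are made up for the example and are not part of the lab data.

```r
# Made-up values standing in for the sales variable
demo_sales = c(12, 15, 14, 18, 17, 20)

# main adds a heading, col sets the line and point color, lwd sets the line width
plot(demo_sales, type = "b", col = "blue", lwd = 2,
     main = "Sales by Case",
     xlab = "Case Number", ylab = "Sales in $1,000")
```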
Now, let’s plot the sales graph alongside radio, paper, and tv, which you will code. Make sure to run the code in the same chunk so the plots appear in the same layout.
#Layout allows us to see all 4 graphs on one screen
layout(matrix(1:4,2,2))
sales=mydata$sales
plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")
radio=mydata$radio
plot(radio, type="b", xlab = "Case Number", ylab = "Cost of Radio Ads in $1,000")
paper=mydata$paper
plot(paper, type="b", xlab = "Case Number", ylab = "Cost of Paper Ads in $1,000")
tv=mydata$tv
plot(tv, type="b", xlab = "Case Number", ylab = "Cost of TV Ads in $1,000")
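One detail worth knowing about the layout call: `matrix()` fills its cells column by column, so the second plot is drawn below the first rather than beside it. The sketch below prints both arrangements; the names `by_col` and `by_row` are chosen only for this example.

```r
# Default fill is by column: plot 2 is placed below plot 1
by_col = matrix(1:4, 2, 2)
by_col

# byrow = TRUE fills across rows: plot 2 is placed to the right of plot 1
by_row = matrix(1:4, 2, 2, byrow = TRUE)
by_row

# Passing by_row to layout() instead would draw the plots left to right, top to bottom
```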
#Re-order sales from low to high, and save re-ordered data in a new set. As sales data is reordered, the associated other column fields follow.
newdata = mydata[order(sales),]
head(newdata)
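To see why the other columns follow the re-ordered sales values, it helps to look at what `order()` returns: the row positions sorted from the smallest value to the largest. A minimal sketch with a made-up three-row data frame (`demo`, `idx`, and `sorted` are names used only here):

```r
# Made-up data: each row pairs a sales value with a radio value
demo = data.frame(sales = c(30, 10, 20), radio = c(3, 1, 2))

# order() returns row positions, smallest sales value first
idx = order(demo$sales)
idx

# Indexing rows by idx moves whole rows, so radio stays paired with its sales value
sorted = demo[idx, ]
sorted
```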
# Redefine the new variables
newsales = newdata$sales
newradio = newdata$radio
newtv = newdata$tv
newpaper = newdata$paper
layout(matrix(1:4,2,2))
plot(newsales, type="b", xlab = "Sales Rank (by $ amount Low to High)", ylab = "Sales in $1,000")
plot(newradio, type="b", xlab = "Sales Rank (by $ amount Low to High)", ylab = "Cost of Radio Ads in $1,000")
plot(newpaper, type="b", xlab = "Sales Rank (by $ amount Low to High)", ylab = "Cost of News Ads in $1,000")
plot(newtv, type="b", xlab = "Sales Rank (by $ amount Low to High)", ylab = "Cost of TV Ads in $1,000")
##### 2B) Repeat the previous 4 graphs layout exercise using instead the above defined four new variables for sales, radio, tv, and paper (4pts) See Above
##### 2C) Explain what the new plots are revealing in terms of trending relationships (4pts)
The plots are ordered by sales in $1,000, ranked from lowest to highest. They show how spending on each marketing type (TV, paper, or radio) changes as sales increase. These relationships show correlation, not causation. The Sales graph shows a positive relationship between sales rank and total sales amount, which is expected because the data were sorted by sales. This means that as the sales rank increases, the total sales amount increases.
For the Radio expenditures graph, there is a positive and very linear relationship. This means that the sales rank increases as expenditures on radio ads go up. Because the relationship is linear, sales rank increases roughly in proportion to radio expenditures.
The cost of news ads graph shows a negative correlation. This means that as the sales rank decreases, the expenditure on news ads increases. This may mean that news ads are not as effective, but we cannot prove this; the plot shows only a negative correlation.
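The visual trends described above can also be quantified with the correlation coefficient, which base R computes with `cor()`. The sketch below is only an illustration on made-up numbers: `demo` is a hypothetical data frame, and its values are not taken from ‘marketing.csv’.

```r
# Made-up data: radio rises with sales, paper falls
demo = data.frame(sales = c(10, 20, 30, 40),
                  radio = c(1, 2, 2.5, 4),
                  paper = c(8, 5, 6, 2))

# Correlation of sales with each ad channel; values near +1 or -1 indicate a strong linear trend
r = cor(demo)["sales", c("radio", "paper")]
r
```

On the lab data, `cor(mydata)` would give the same kind of table for the real variables. A correlation coefficient still describes association only, not causation.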
You are given a sales value of $25000.
##### 3A) Calculate the z-value. Based on your result, would you rate a $25000 in sales as poor, average, good, or very good performance? Explain your logic (2pts)
# z-value = (value - mean)/ standard deviation
(25000-mean(sales))/sd(sales)
[1] 3.164935
# Dollars away from the mean
3.164935*sd(sales)
[1] 8282.799
The z-score shows how many standard deviations the observed value lies from the mean. For a sales value of $25,000, the z-value is 3.164935. This marks the sales performance as a potential outlier, because any value more than 3 standard deviations from the mean is a potential outlier. I am assuming this is a real outlier, not a data entry error. The value is about $8,282.80 above the average sale, so it is a very good performance. To put this in perspective, assuming sales are approximately normally distributed, a z-score of 3.164935 falls at roughly the 99.92nd percentile, so only about 0.08% of sales would be expected to exceed $25,000.
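The percentile can be computed directly in R rather than read from a z-table. The sketch below uses the mean and standard deviation of sales reported earlier in this lab and assumes sales follow a normal distribution, which the lab does not verify.

```r
# Summary values reported earlier in this lab for the sales variable
sales_mean = 16717.2
sales_sd = 2617.052

# z-value for a $25,000 sale
z = (25000 - sales_mean) / sales_sd
z

# Share of a normal distribution that lies below z (assumes normality)
pnorm(z)

# Share expected to exceed z
1 - pnorm(z)
```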