About

In this section, we will be using Tableau to learn concepts on data outliers, seasonality effect, and the relationships and impacts. There is no R coding in this lab session.

Setup

This worksheet will be used to capture your images from Tableau and to share your observations. Example of capturing and including an image is included at the end of this sheet for your reference. You will need to log onto Tableau and Connect/Import the file EuroStore.xls found in the ‘bsad_lab10’ folder.

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.


mydata = read.csv("data/EuroStore.csv")
head(mydata)
##   Week Sales   TV Radio Fuel.Volume Fuel.Price Temp Holiday
## 1   26 24864 74.5  66.5       61825     104.24 27.9       1
## 2   27 23809 74.5  66.5       62617     103.97 27.7       1
## 3   28 24476 90.0  75.0       60227     107.48 29.1       1
## 4   29 25279 90.0  75.0       63273     111.75 30.0       1
## 5   30 26263 90.0  75.0       65196     109.08 29.3       1
## 6   31 24299 90.0  75.0       64789     105.36 28.1       1

PART A: DATA OUTLIERS AND SEASONALITY EFFECT

First get familiar with the data and what each columns represent. A description of the data is provided in a seperate sheet called ‘Desc’ in the same Excel file. Refer to Lab05 for early exercise using Tableau.

In a new Tableau sheet

TASK 1: Plot Sales (Rows) versus Week (Columns). Include a snapshot here. (To save the image go to Worksheet, select Export, Image, Save.) Analyse the data source and explain in clear words the behavior you observe.

ANSWER TASK 1:

This data represents a line graph that shows the sum of sale over about a 55 week period. Between the weeks of 0-21 there was a steady increase of sales from 42k to 50k. But between weeks 21-23 there was a dramatic decrease in sales dropping all the way down to 22k. From week 23-25 there was only a slight increase in sales. But from weeks 25-26 there was a surge in sales and increased back up to 50k. Between weeks 26-37 there was a slight increase but then a continous decrease ending at 41k. With the final weeks 37-55 averaging about 42k.

TASK 2: Switch from SUM(Sales) to Average AVG(Sales) by clicking the arrow on the right of Sum(Sales), select Measure and Average. Change the Sales scale to be more reflective of the data (double click on the Avg. Sales axis, under Range select Fixed, change the Fixed Start and Fixed End then close the window). Include a snapshot here. Explain the new behavior relative to Task 1).

ANSWER TASK 2:

This data represetns a line graph as well but this time having the average of sales over about a 55 week peroid. There are two differences between the sum of sales and average of sales. The main difference betweent the two graphs with the sum of sales and the average of sales is there is no significant drop for the average sales. Between weeks 21-24 for the average sales there is minor dip in the line graph but nothing significant. In regards to the general data of the average sales. The second difference is the sales on the average graph is lower compared to the sum of sales. It’s normal data where the average sales slowly increases till week 30. After week 30 the average sales slowly decreases and averages around 23k.

TASK 3: Add Temp to the Color scale found in Marks. Change SUM(Temp) to AVG(Temp). Edit the color legend to be more reflective of hot and cold temperatures (by clicking on Color, Edit Color, select relevant color, Applu, OK). Include a snapshot here. Explain the combined behavior of sales and temperature.

ANSWER TASK 3:

This data represents a line graph based on the average sales to determine when the best time to buy somerhing on sales. The inclusion on the temperature colors help identitfy the best time buy. Looking at the graph the best time to buy is in the middle with the downward trending line represented with the red temperature color.


PART B: RELATIONSHIPS AND IMPACTS

In a seperate Tableau sheet

TASK 4: Plot Sales (Rows) versus TV (Columns). Switch both measures from SUM() to Dimension. The plot should look more like a scatter plot. Include a snapshot here. Explain the behavior of Sales versus TV. How much you think is the upper limit amount that should be invested in TV ads?

ASNWER TASK 4:

The data that is represented in this scatter plot data is the advertising spent on tvs. The data shows us that to get the variation of Tv’s is the 90 TV’s. The largest amoung of spending is related to no spending on no advertising compared to the most expensive spendings. Based off the given information, the upper limit should be about 90 TV’s, this is because the return on spending more money on advertising turns out to be more costly than profitable.

TASK 5: Overlay Radio to the previous plot using the Size scale found in Marks. Include a snapshot here. Explain how the additional Radio ads to Tv ads is impacting Sales.

ANSWER TASK 5:

With the addtion of Radio to the advertising of TV’s, the radio points are shown with larger bubble dot points. As shown, there is a corrolation between higher spending on advertising with relation to radio advertising. Higher amount of spending is more effective when as it relates to ratio with higher amount spent of advertising.

In a separate Tableau sheet

TASK 6: Plot Sales versus Fuel Volume. Switch both measures from SUM() to Dimension. Explain behavior.

ANSWER TASK 6:

The data represented in this graph is between the sales to the fuel volume. From the data points that is represented, in order to generate sales one must increase the amount of fuel sold. The best amount of spend fuel sold is between 63k-66k, with a return on sales between 23k-29k.

TASK 7: Overlay Temperature using the Color scale. Follow TASK 3 for temperature settings. Explain the new combined behavior and the impact of temperature.

ANSWER TASK 7:

With the inclusion of the temperature color scale, it is better understood where the best sales are for sold fuel. Having the red data points represent the best sales of fule ranging between 26k-28k. While the blue data points represent the lower end of sales of fuel raning betwene 19k-24k.

TASK 8: Overlay Holiday using the Label scale. Include a snapshot here. Explain the new combined behavior and the impact of Holiday.

ASNWER TASK 8:

The addtion of the sum of holidays represents the times in which fuel was bought on days that fell on a holiday. This is clearly represented with both the temperature coloring and the 0 and 1 points. 1 points representing the days that people bought gas. This can tell us that the sales of fuel is up more because more people decide to travel more when they have more time to.

In a separate sheet

TASK 9: Use a Tree Map to best show the combined effect of Sales, Fuel Volume, Temp, and Holiday. A sample view is shown below. Consider using the Quick Filter on Holiday and Temp to isolate and better view the impact of each. You can have more than one filter at a time. Include a snapshot here.
TASK 10: Write a small paragraph summarizing your final conclusions on what you think most affect Sales and under what conditions.

ASNWER TASK 10:

The things that affected sales the most under the conditions used were the colored temperature to demonstrate the best and worst times for sales, holidays to better figure out if holidays made an impact or not, and radio to demonstrate that the higher money spend on radio ads correlated to higher return. These conditions looked as if they had the biggest relationships to sales and demonstrated the best trends.