About

In this section, we will be using Tableau to learn concepts on data outliers, seasonality effect, and the relationships and impacts. There is no R coding in this lab session.

Setup

This worksheet will be used to capture your images from Tableau and to share your observations. Example of capturing and including an image is included at the end of this sheet for your reference. You will need to log onto Tableau and Connect/Import the file EuroStore.xls found in the ‘bsad_lab10’ folder.

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.


Task 1: Data Outliers and Seasonality Effect

First get familiar with the data and what each columns represent. A description of the data is provided in a seperate sheet called ‘Desc’ in the same Excel file. Refer to Lab05 for early exercise using Tableau.

In a new Tableau sheet

1A) Plot Sales (Rows) versus Week (Columns). Include a snapshot here. Analyse the data source and explain in clear words the behavior you observe.

Sum of Sales versus Week The line graph plots the sum of sales per week. Based on further analysis of the graph, it is clear that there is a gradual increase in sales until Week 22. Weeks 23-25 then encounter an extreme decrease in sales with the numbers dropping from 50,000 Euros at Week 22 to approximately 23,000 Euros at Week 23 (which is a stark contrast from the usual range of sales). Further analysis of the Excel data reveals that we are missing data for each Weeks 23-25. While all the other weeks have two data inputs, Weeks 23-25 only has one data point for the week, which explains why the sales is half the normal range of 40,000 - 50,000 Euros for the given data.

1B) Switch from SUM(Sales) to Average AVG(Sales). Change the Sales scale to be more reflective of the data. Include a snapshot here. Explain the new behavior relative to 1A).

Average Sales versus Week The graph above representing average sales per week is more reflective of the data in the Excel sheet than the previous graph plotting the sum of sales. In Weeks 23-25, for instance, it can be observed that there is no longer an extreem dip and that the sales during these weeks still remain withing the range. In fact, the sales are actually among the highest during these specific weeks. In conclusion, by using the average instead of the sum, we were able to work with the fact that there are some missing data and still produce a more accurate representation of the data. It can also be observed that Week 30 has the highest sales in both graphs.

1C) Add Temp to the Color scale found in Marks. Change SUM(Temp) to AVG(Temp). Edit the color legend to be more reflective of hot and cold temperatures. Include a snapshot here. Explain the combined behavior of sales and temperature.

Average Sales versus Week and Temperature The graph above, which represents the effect of temperature on sales, shows that there is a relationship between the two. We can observe that as temperatures are high (i.e. hot), which are represented by the red/pink color, sales tend to be high as well. In particular, towards the peak of the graph, we notice that shade/hue of red is darker than rest of the line graph. On other hand, as temperatures are cooler (blue part of line), sales tend to be decreasing or lower than when temperatures are warmer. In other words, we can assume that sales are higher during the summmer, which can be attributed to many factors, including vacations and travels.


Task 2: Relationships and Impacts

In a seperate Tableau sheet

2A) Plot Sales (Rows) versus TV (Columns). Switch both measures from SUM() to Dimension. The plot should look more like a scatter plot. Include a snapshot here. Explain the behavior of Sales versus TV. How much you think is the upper limit amount that should be invested in TV ads?

Sales vs. TV The scatter plot represents the numbers of sales that occur in relation to the total volume of target audience reached in GRP units (one unit represents one percent of target audience reached). Therefore, it can be concluded that there is no to little relationship between the two. Although sales can potentially be higher (as seen in the graph) when there exists a higher percentage of audience reached through TV, sales still occur with 0 percentage of audience reached and can be relatively high (highest being roughthly about 27,500 Euros, which is not far from 28,000 Euroes). Meaning that sales is not necessarily dependent or even related to the percentage reach through TV. Since reaching the audience above 100% would be advertising to the same people repeatedly, the upper limit should be 100. Moreover, one of the highest point of sales occur when 98% of audience targeted is reached, which would be worth reaching if the costs of doing so are not unreasonable.

2B) Overlay Radio to the previous plot using the Size. scale found in Marks. Include a snapshot here. Explain how the additional Radio ads to Tv ads is impacting Sales.

Sales vs. TV and Radio Overlaying radio to the previous plot shows that a greater percentage of audience reached per week might increase sales. It is somewhat clearer than the previous plot showing only the correlation between reach by TV and sales because it shows a greater correlation to a certain extent between advertisement and sales. It can be observed that the the size of the circles on the upper right side (higher sales) of the graph (which represents greater percentage reach through radio) tend to be larger than the lower left side (lower sales). Nevertheless, the relationship is still not very certain/strong because even when no percentage of audience is reached (TV and Radio = 0 GRP units), sales still occur. In other words, even though percentage of audience reached by TV or Radio could increase sales, sales still occur with 0 percentage of audience reached through either sources. Therefore, it can be concluded that percentage of audience reached has little to no relationship to sales based on the data given.

In a separate Tableau sheet

2C) Plot Sales versus Fuel Volume. Explain behavior.

Sales vs. Fuel Volume The correlation between Sales and Fuel Volume is a positive one. As more volume of fuel is sold, more sales happen.

2D) Overlay Temperature using the Color scale. Follow 1C) for temperature settings. Explain the new combined behavior and the impact of temperature.

Sales vs. Fuel Volume and Temperature Overlaying temperature shows that as temperatures are nearer to 30 degrees Celsius (closer to the red color), more volumes of fuel are sold and more sales occur. More specifically, when temperatures are high, sales are usually within 23,000 - 29,000 Euros and sell approximately 60,000 - 69,000 liters.

2E) Overlay Holiday using the Label scale. Include a snapshot here. Explain the new combined behavior and the impact of Holiday.

Sales vs. Fuel Volume, Temperature, and Holiday Overlaying holidays further shows that holidays certainly play a role in higher sales, especially during hotter weather (i.e.summer). It is quite obvious that the upper right hand, where the higher sales occur, is also clustered with 1s, which represents a public holiday during the week. This means that sales are particularly high during warm weather and weeks containing at least one holiday. It is also important to notice that even during cold weather with holiday days in the week, sales tend to be higher (although not as high as in summer) than cold days without holidays, with some exceptions.

In a separate sheet

2F) Use a Tree Map to best show the combined effect of Sales, Fuel Volume, Temp, and Holiday. A sample view is shown below. Consider using the Quick Filter on Holiday and Temp to isolate and better view the impact of each. You can have more than one filter at a time. Include a snapshot here.
Tree Map

Tree Map

2G) Write a small paragraph summarizing your final conclusions on what you think most affect Sales and under what conditions.

My final conclusion is that sales are most affected by temperature and holidays. This further supports previous conclusions and analyses from the plots in the tasks before. As we can see from the tree map the first couple of columns represent the highest sales. We then notice that a majority of these “boxes” are red, reflecting the high temperatures during the summer. Moreover, these boxes also mostly contain the number 1 to show that at least one public holiday occured during that particular week. Therefore, we can conclude that hot summer weeks with at least one public holiday will likely result in higher sales and more volume of fuel sold. We can also conclude that the audience reach through TV and radio do not have as much of an impact and direct relationship to sales. Although a higher reach can definitely help increase sales, no percentage of target audience reached also results in sales.