In this section, we will be using Tableau to learn concepts on data outliers, seasonality effect, and the relationships and impacts. There is no R coding in this lab session.
This worksheet will be used to capture your images from Tableau and to share your observations. Example of capturing and including an image is included at the end of this sheet for your reference. You will need to log onto Tableau and Connect/Import the file EuroStore.xls found in the ‘bsad_lab10’ folder.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.
First get familiar with the data and what each columns represent. A description of the data is provided in a seperate sheet called ‘Desc’ in the same Excel file. Refer to Lab05 for early exercise using Tableau.
In a new Tableau sheet
The line graph plots the sum of sales per week. Based on further analysis of the graph, it is clear that there is a gradual increase in sales until Week 22. Weeks 23-25 then encounter an extreme decrease in sales with the numbers dropping from 50,000 Euros at Week 22 to approximately 23,000 Euros at Week 23 (which is a stark contrast from the usual range of sales). Further analysis of the Excel data reveals that we are missing data for each Weeks 23-25. While all the other weeks have two data inputs, Weeks 23-25 only has one data point for the week, which explains why the sales is half the normal range of 40,000 - 50,000 Euros for the given data.
The graph above representing average sales per week is more reflective of the data in the Excel sheet than the previous graph plotting the sum of sales. In Weeks 23-25, for instance, it can be observed that there is no longer an extreem dip and that the sales during these weeks still remain withing the range. In fact, the sales are actually among the highest during these specific weeks. In conclusion, by using the average instead of the sum, we were able to work with the fact that there are some missing data and still produce a more accurate representation of the data. It can also be observed that Week 30 has the highest sales in both graphs.
The graph above, which represents the effect of temperature on sales, shows that there is a relationship between the two. We can observe that as temperatures are high (i.e. hot), which are represented by the red/pink color, sales tend to be high as well. In particular, towards the peak of the graph, we notice that shade/hue of red is darker than rest of the line graph. On other hand, as temperatures are cooler (blue part of line), sales tend to be decreasing or lower than when temperatures are warmer. In other words, we can assume that sales are higher during the summmer, which can be attributed to many factors, including vacations and travels.
In a seperate Tableau sheet
The scatter plot represents the numbers of sales that occur in relation to the total volume of target audience reached in GRP units (one unit represents one percent of target audience reached). Therefore, it can be concluded that there is no to little relationship between the two. Although sales can potentially be higher (as seen in the graph) when there exists a higher percentage of audience reached through TV, sales still occur with 0 percentage of audience reached and can be relatively high (highest being roughthly about 27,500 Euros, which is not far from 28,000 Euroes). Meaning that sales is not necessarily dependent or even related to the percentage reach through TV. Since reaching the audience above 100% would be advertising to the same people repeatedly, the upper limit should be 100. Moreover, one of the highest point of sales occur when 98% of audience targeted is reached, which would be worth reaching if the costs of doing so are not unreasonable.
Overlaying radio to the previous plot shows that a greater percentage of audience reached per week might increase sales. It is somewhat clearer than the previous plot showing only the correlation between reach by TV and sales because it shows a greater correlation to a certain extent between advertisement and sales. It can be observed that the the size of the circles on the upper right side (higher sales) of the graph (which represents greater percentage reach through radio) tend to be larger than the lower left side (lower sales). Nevertheless, the relationship is still not very certain/strong because even when no percentage of audience is reached (TV and Radio = 0 GRP units), sales still occur. In other words, even though percentage of audience reached by TV or Radio could increase sales, sales still occur with 0 percentage of audience reached through either sources. Therefore, it can be concluded that percentage of audience reached has little to no relationship to sales based on the data given.
In a separate Tableau sheet
The correlation between Sales and Fuel Volume is a positive one. As more volume of fuel is sold, more sales happen.
Overlaying temperature shows that as temperatures are nearer to 30 degrees Celsius (closer to the red color), more volumes of fuel are sold and more sales occur. More specifically, when temperatures are high, sales are usually within 23,000 - 29,000 Euros and sell approximately 60,000 - 69,000 liters.
Overlaying holidays further shows that holidays certainly play a role in higher sales, especially during hotter weather (i.e.summer). It is quite obvious that the upper right hand, where the higher sales occur, is also clustered with 1s, which represents a public holiday during the week. This means that sales are particularly high during warm weather and weeks containing at least one holiday. It is also important to notice that even during cold weather with holiday days in the week, sales tend to be higher (although not as high as in summer) than cold days without holidays, with some exceptions.
In a separate sheet
Tree Map
My final conclusion is that sales are most affected by temperature and holidays. This further supports previous conclusions and analyses from the plots in the tasks before. As we can see from the tree map the first couple of columns represent the highest sales. We then notice that a majority of these “boxes” are red, reflecting the high temperatures during the summer. Moreover, these boxes also mostly contain the number 1 to show that at least one public holiday occured during that particular week. Therefore, we can conclude that hot summer weeks with at least one public holiday will likely result in higher sales and more volume of fuel sold. We can also conclude that the audience reach through TV and radio do not have as much of an impact and direct relationship to sales. Although a higher reach can definitely help increase sales, no percentage of target audience reached also results in sales.