About

In this section, we will be using Tableau to learn concepts on data outliers, seasonality effect, and the relationships and impacts. There is no R coding in this lab session.

Setup

This worksheet will be used to capture your images from Tableau and to share your observations. Example of capturing and including an image is included at the end of this sheet for your reference. You will need to log onto Tableau and Connect/Import the file EuroStore.xls found in the ‘bsad_lab10’ folder.

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.


Task 1: Data Outliers and Seasonality Effect

First get familiar with the data and what each columns represent. A description of the data is provided in a seperate sheet called ‘Desc’ in the same Excel file. Refer to Lab05 for early exercise using Tableau.

Before using Tableau and starting analysis, it is a good idea to get familiar with the data you will be using. In this EuroStore excel sheet there are 101 records with 8 variables: week, sales in regards to the convenient stores (in euros), TV ads (in GRP units), Radio Ads (in GRP units), Fuel volume (in liters), fuel price (in cents per liters), temperature (in celsius), and holiday. The excel sheet is made up of two sheets- one sheet containing the actual data, and the other sheet includes a more detailed description of the variables (their units, how to interpret, etc.). By looking at this data, one can infer about the objective function and its decision variables. In this case, we would be looking to optimize sales in regards to the decision variables of TV ads and Radio Ads over many weeks. Other factors that might affect the objective function is the fuel volume, fuel price, temperature, and whether there was a public holiday that week.

In a new Tableau sheet

1A) Plot Sales (Rows) versus Week (Columns). Include a snapshot here. Analyse the data source and explain in clear words the behavior you observe.

Tablaeu automatically assumes which variables are dimensions (mostly categorical values) and which are measures (numerical). Though weeks is a numeric value, it would not make sense in an aggregate function which is why it is a dimension and appears in the x-axis. Sum(sales) appears in the y-axis. Looking at the line graph and conducting a qualitative analysis, one notices the strange behavior occurring in weeks 23-25: either the data is wrong or something is missing in the data set. To resolve this issue we must look back at the actual data set. The first record starts at Week 26, which makes it hard to analyze what exactly is going wrong, so we sort the data by week. When we do this we see that there are two records for week 1 and so on (ideas as to why there are two records: either two store or two years worth of data-more plausible), however, for weeks 23, 24, and 25, we only have 1 record, which explains the low sales in the graph (the graph shows the SUM of sales for the two records).

1B) Switch from SUM(Sales) to Average AVG(Sales). Change the Sales scale to be more reflective of the data. Include a snapshot here. Explain the new behavior relative to 1A).

By changing the y-axis from portraying the Sum of Sales for the two records of each week to portraying the average Sales, we resolve the issue in weeks 23-25. By changing the scale of the y-axis, the line graph is much easier to analyze because the data is magnified.

1C) Add Temp to the Color scale found in Marks. Change SUM(Temp) to AVG(Temp). Edit the color legend to be more reflective of hot and cold temperatures. Include a snapshot here. Explain the combined behavior of sales and temperature.

It is important to note that the temperature is represented in celsius, and the scale is represented respectively. Also keep in mind, that this data is taken from Marseille, France where 60 degrees celsius is very hot. By analyzing the data using the line graph above, one can come to the conclusion that in summer (~middle of the year), sales increase because the temperature is higher. The average sales per week per c-store location is at its max of 26,774 euros in week 30 with an average temperature of 30 degrees celsius. The min of 19,764 euros per week per c-store location occurs in week 3 with an average temperature of 11.25 celsius, which is cold.


Task 2: Relationships and Impacts

In a seperate Tableau sheet

2A) Plot Sales (Rows) versus TV (Columns). Switch both measures from SUM() to Dimension. The plot should look more like a scatter plot. Include a snapshot here. Explain the behavior of Sales versus TV. How much you think is the upper limit amount that should be invested in TV ads?

This scatter plot portrays Sales and TV ads as dimensions. In the case that there are 0 TV ads, there are so many instances of Sales (all varying in value). I think the upper limit should be 95 GRPs because after that point, the sales aren’t as high consistently, so it would not be cost effective to spend more on TV ads.

2B) Overlay Radio to the previous plot using the Size. scale found in Marks. Include a snapshot here. Explain how the additional Radio ads to Tv ads is impacting Sales.

When overlaying Radio to the previous plot, you see the effects that Radio ads has on the sales in relation to TV ads. After the upper limit of 95 decided in the previous question, the circles are very large showing that radio ads have a big impact on Sales (not as much so before that point).

In a separate Tableau sheet

2C) Plot Sales versus Fuel Volume. Explain behavior.

This scatter plot displays sales in relation to fuel volume (both as dimensions). By looking at the graph, one sees that generally as the fuel volume increases, sales increase.

2D) Overlay Temperature using the Color scale. Follow 1C) for temperature settings. Explain the new combined behavior and the impact of temperature.

This scatter plot is the same as above however the temperature is now overlayed and displayed, reinforcing that as fuel volume increases, sales increase. However now, we see that our analysis from before is true in that as the temperature is warmer (in the summer weeks of the year), the sales are higher. From this, one can make the connection that as the weather gets warmer, the fuel volume being sold is increasing meaning there is more travel.

2E) Overlay Holiday using the Label scale. Include a snapshot here. Explain the new combined behavior and the impact of Holiday.

This scatter plot is taken from above however with the addition of whether or not there was a holiday present during the week. As the fuel volume and sales increase, there seems to be a higher presence of holidays. One can make the conclusion: as the temperature grows warmer (with the presence of summer) and there is a holiday during the week, fuel volume being sold is increased possibly due to travel and with that, the sales of the local convenient stores increase.

In a separate sheet

2F) Use a Tree Map to best show the combined effect of Sales, Fuel Volume, Temp, and Holiday. A sample view is shown below. Consider using the Quick Filter on Holiday and Temp to isolate and better view the impact of each. You can have more than one filter at a time. Include a snapshot here.

With both filters in the upper right hand corner, one can move the scales and see the impact of what the tree map is portraying. Currently displayed is the full range of the temperature and the full range of the presence of a holiday during the week (0-1). As our conclusions made in previous questions, the warmer the weather, the higher the sales.

The above tree map portrays the variables with the filter of holiday on 1, and the full range of temperature. In other words, the data above is where there was at least one holiday during the week. The size of the boxes is determined on the sales volume and the the color is determined by the temperature. With holidays present in all cases above, warmer temperatures correlate with higher sales.

2G) Write a small paragraph summarizing your final conclusions on what you think most affect Sales and under what conditions.

Based on the graphs displayed throughout this lab, I would conclude that both TV ads and Radio ads influenced sales, however, many other factors play a significant role. I think the temperature played a big part in determining sales, followed then by the presence of a public holiday during the week, which I think influenced fuel volume. It would appear that during the summer months, there is a higher frequency of public holidays per week than during the other seasons, which may allow for more travel, which would result in the higher fuel volume, and with the more frequent stops at gas stations, the higher the sales at the local convenient store. Of course, even with analyzing the data, there may be other factors that need to be studied and data context is key, but the above could be a possible contender and plausible explanation.