About

In this section, we will be using Tableau to learn concepts on data outliers, seasonality effect, and the relationships and impacts. There is no R coding in this lab session.

Setup

This worksheet will be used to capture your images from Tableau and to share your observations. Example of capturing and including an image is included at the end of this sheet for your reference. You will need to log onto Tableau and Connect/Import the file EuroStore.xls found in the ‘bsad_lab10’ folder.

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.


Task 1: Data Outliers and Seasonality Effect

First get familiar with the data and what each columns represent. A description of the data is provided in a seperate sheet called ‘Desc’ in the same Excel file. Refer to Lab05 for early exercise using Tableau.

In a new Tableau sheet

1A) Plot Sales (Rows) versus Week (Columns). Include a snapshot here. Analyse the data source and explain in clear words the behavior you observe.

The above image is a snapshot of the variables sales vs the variable weeks. As one can observe, there is a dip in sales in the middle of the year, while all other variables/factors are still consistent regarding past and subsequent weeks.

The below image may provide insight as to why the dip is present.

When looking at the data in Tableau, every week has two records, while weeks 23, 24, and 25 only have one. This could be the source of the drastic downward change in the plotted relationship.

1B) Switch from SUM(Sales) to Average AVG(Sales). Change the Sales scale to be more reflective of the data. Include a snapshot here. Explain the new behavior relative to 1A).

The above photo depicts the relationship between average sales and the week of the year. This is a much better depiction of true sales because the average combines the values where there are duplicates (all weeks except 23, 24, and 25). Also, by scaling the axis to a minimum and maximum close to the minimum and maximum of the sales data, it provides a better scope than 1A. Additionally, this plot is a better representation of sales than the one in 1A.

1C) Add Temp to the Color scale found in Marks. Change SUM(Temp) to AVG(Temp). Edit the color legend to be more reflective of hot and cold temperatures. Include a snapshot here. Explain the combined behavior of sales and temperature.

This image is the same as the previous except the color of the line represents the temperature for the given week. The deeper the blue, the colder the temperature (measured in Celsius) and the deeper the red, the hotter the temperature. The legend to the right of the graph shows the colors and the transitions from cold to warm. The coloring of the line is accurate due to the fact that in the beginning and end of the year (winter tundra), the line is a deep blue, representing colder temperatures. Likewise, in the middle of the year, around week 26, the color shifts to a deep scarlet, signifying hot, summery weather. I chose the red and blue colors because that is what is usually the indicator for hot and cold. It appears that temperature does not affect sales. No matter the weather, sales are different along each point in the curve even during drastic changes in temperature. It seems like sales pick up during the summer months, and fall once the season transitions to Fall/Winter, then begin to pick up again in the Spring.


Task 2: Relationships and Impacts

In a seperate Tableau sheet

2A) Plot Sales (Rows) versus TV (Columns). Switch both measures from SUM() to Dimension. The plot should look more like a scatter plot. Include a snapshot here. Explain the behavior of Sales versus TV. How much you think is the upper limit amount that should be invested in TV ads?

Analyzing the data, it appears as if there is no correlation between Sales and TV ads. No matter the number of TV ads, Sales are scattered across the entire plot, thus establishing that there is little to no correlation. I believe the upper limit amount that should be invested in TV ads is about 95. The majority of the data sits inside of the 0 to 95 boundary.

2B) Overlay Radio to the previous plot using the Size. scale found in Marks. Include a snapshot here. Explain how the additional Radio ads to Tv ads is impacting Sales.

Looking at this data visualization, I still do not see a significant correlation, if at all any correlation between the added varibles and sales. It appears as if radio may have a slight impact on sales, being that the path the data sits on is a bit more linear.

In a separate Tableau sheet

2C) Plot Sales versus Fuel Volume. Explain behavior.

The above image reflects the relationship between sales and fuel volume. As one can observe, there is a strong, direct, positive relationship between the sales earned and the volume of fuel sold per week. While this is the strongest relationship yet, it is a bit intuitive. Sales at the store are derived from the product they sell, which happens to be fuel. So, of course, when fuel volume increases, so will sales.

2D) Overlay Temperature using the Color scale. Follow 1C) for temperature settings. Explain the new combined behavior and the impact of temperature.

After looking at the plot, it appears as if the volume of fuel sold per week increases as the weather gets warmer. Once again, a deeper red represents a hotter temperature, while a deeper blue represents the colder temperatures. Most of the red is gathered toward the top end of the graph, and the blue opposite, signaling that volume of fuel sales is dependent on the weather. Thus, since sales are dependent on fuel volume, sales are dependent on the temperature. With this, one would see that higher temperatures generally lead to higher sales and vice versa.

2E) Overlay Holiday using the Label scale. Include a snapshot here. Explain the new combined behavior and the impact of Holiday.

The vast number of 1’s at the top of the plot, compared to the majority of the 0’s being below, or at the bottom corner of the plot, would signal that holidays are influential to the fuel volume and sales.

In a separate sheet

2F) Use a Tree Map to best show the combined effect of Sales, Fuel Volume, Temp, and Holiday. A sample view is shown below. Consider using the Quick Filter on Holiday and Temp to isolate and better view the impact of each. You can have more than one filter at a time. Include a snapshot here.

Above is a tree map created to try and determine what factors determine the sales value. After shifting variables in and out of the map, I have concluded with the above visualization.

2G) Write a small paragraph summarizing your final conclusions on what you think most affect Sales and under what conditions.

After analyzing the data that we have visualized, the variables I believe to affect sales most are fuel volume, holidays, and temperature, in that order. Fuel volume will impact sales the most because the volume of fuel sales is what drives the EuroStore’s sales. Additionally, as the plots and tree map showed, people tend to buy more gas on holidays, and in warmer weather. The quantitiy of red to light blue data points, and spaces on the tree map emphasize this point. Looking at the plots and map, there is more red and light blue than dark blue. Lastly, it can be seen on the tree map as well as the 3rd plot, that there is a direct relationship with the number of holidays, being that the majority of data points where value is maximized are labeled with a 1 (holiday in that week).

Lab Example of tree map.