Demo of the Process
Each data file contains four columns, one per quarter, each recording the sales for that quarter. Note that, initially, the State and Year information is available only in the file names.
We can combine them all into one data file. The merged file has 10,000 rows, with State and Year added as data fields.
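The merge step can be sketched roughly as follows. This is a minimal illustration, not the method actually used to build the merged file: the file naming scheme (`<STATE>_<YEAR>.csv`), the function name `merge_sales_files`, and the use of CSV files with a header row are all assumptions made for the example.

```python
import csv
import re
from pathlib import Path

# Hypothetical naming scheme: "<STATE>_<YEAR>.csv", e.g. "NSW_2009.csv".
NAME_PATTERN = re.compile(r"^(?P<state>[A-Z]+)_(?P<year>\d{4})$")


def merge_sales_files(paths):
    """Read every file and return one list of rows with State and Year added."""
    merged = []
    for path in map(Path, paths):
        match = NAME_PATTERN.match(path.stem)
        if match is None:
            raise ValueError(f"unexpected file name: {path.name}")
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Keep the quarterly columns and attach the file-level fields.
                merged.append(
                    {"State": match["state"], "Year": int(match["year"]), **row}
                )
    return merged
```

Calling `merge_sales_files(sorted(Path("data").glob("*.csv")))` on a folder of such files (the `data` folder is also hypothetical) would return one row per original record, each carrying the State and Year recovered from its file name. Quarterly values stay as strings here and would need converting before any arithmetic.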
Now the data is ready for analysis and visualization. For example, we can look at the trend in sales across two states.
- Based on this hypothetical data, sales in both states increased steadily over the years.
- Sales in VIC increased at a higher rate.
- VIC’s sales have exceeded NSW’s since 2009.
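One way to get the numbers behind such a trend comparison is to total the four quarters for each state and year. The sketch below assumes the merged rows are dictionaries with `State`, `Year`, and quarterly columns named `Q1` to `Q4`; those column names, the function name `yearly_totals`, and the choice of a yearly total as the trend measure are assumptions, and the sample rows are made up purely to show the shape of the data.

```python
from collections import defaultdict

QUARTERS = ("Q1", "Q2", "Q3", "Q4")  # hypothetical column names


def yearly_totals(rows):
    """Sum all four quarters for each (State, Year) pair."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["State"], row["Year"])
        totals[key] += sum(float(row[q]) for q in QUARTERS)
    return dict(totals)


# Made-up rows, only to illustrate the input and output shapes.
sample = [
    {"State": "NSW", "Year": 2008, "Q1": 10, "Q2": 10, "Q3": 10, "Q4": 10},
    {"State": "VIC", "Year": 2008, "Q1": 8, "Q2": 9, "Q3": 9, "Q4": 9},
    {"State": "VIC", "Year": 2009, "Q1": 12, "Q2": 12, "Q3": 12, "Q4": 12},
]
print(yearly_totals(sample))
```

Plotting these totals against Year, with one line per State, would give a trend chart of the kind described above.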
We could also select recent years and examine the seasonal pattern by comparing mean sales across seasons in the two states.
- Based on this hypothetical data, the seasonal pattern differs between the two states.
- VIC’s sales peaked approaching the end of the financial year.
- NSW’s sales peaked approaching the end of the calendar year.
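The seasonal comparison boils down to filtering to recent years and averaging each quarter within each state. The following is a minimal sketch under the same assumptions as before: rows are dictionaries with `State`, `Year`, and hypothetical quarterly columns `Q1` to `Q4`. The function name `seasonal_means` and the `first_year` cut-off parameter are illustrative, not taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean

QUARTERS = ("Q1", "Q2", "Q3", "Q4")  # hypothetical column names


def seasonal_means(rows, first_year):
    """Mean sales per (State, quarter), using only rows from first_year onwards."""
    values = defaultdict(list)
    for row in rows:
        if row["Year"] < first_year:
            continue  # skip years outside the "recent" window
        for quarter in QUARTERS:
            values[(row["State"], quarter)].append(float(row[quarter]))
    return {key: mean(sales) for key, sales in values.items()}
```

The result maps each (State, quarter) pair to a single mean, which is what each bar in a grouped bar plot would represent.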
We could also zoom in to look at one particular year. This plot allows us to tell the difference in distribution and shows that VIC’s sales data fluctuates more than NSW’s.
- Based on this hypothetical data, we can see that data in VIC is more dispersed than that in NSW.
- Unlike the bar plot, which shows only the mean, the point plot shows all individual sales numbers.
- The boxplot, overlaid on the points, shows the median and the dispersion of the data. The “box” contains the middle 50% of the data, while outliers are highlighted (more on boxplot).
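To make the boxplot's summary concrete, the sketch below computes the median, the quartiles that bound the box, and the points that would be flagged as outliers. It assumes the common 1.5 × IQR rule for outliers and the "inclusive" quartile method from Python's `statistics` module. Plotting libraries differ in how they compute quartiles, so the plot shown here may use slightly different values. The function name `box_summary` is illustrative.

```python
from statistics import median, quantiles


def box_summary(values):
    """Return the median, box edges, and outliers for a list of numbers."""
    # Quartiles: the box spans q1 to q3, i.e. the middle 50% of the data.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median(values), "q1": q1, "q3": q3, "outliers": outliers}
```

A wider gap between `q1` and `q3` for one state than the other is the numerical counterpart of a taller box, which is how greater dispersion shows up in the plot.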