Analyzing Store Food Sales Using Python Programming Language

Uncovering Insights from Food Sales

Author
Affiliations

John Karuitha

Karatina University, School of Business

Graduate School of Business Administration, University of the Witwatersrand, Johannesburg, South Africa

Abstract

In this analysis, I analyse sales data in different cities and regions in the United States. The objective of this analysis is to illustrate basic data analysis using Python programming language. Python is the leading competitor for R, the other leading data science analysis programming language.

1 Background

In this analysis I analyze data from contextures regarding food sales to derive insights that could aid the management in making decisions. The data covers the sales of food products by different sales people in several cities from several regions. The food products are grouped into several categories. I do the analysis using the Pandas library in Python (Bogdanchikov, Zhaparov, and Suliyev 2013).

Tip

Please visit my rpubs site to see more data projects. Alternatively, copy and paste the link <www.rpubs.com/Karuitha> into your browser. You can also view my linkedin site for my skills and education.My Tableau public profile contains my data visualizations.

Important

Skills & Technologies Applied: Python, Quarto, Data Science, and Machine Learning.

Important

Python is a widely-used programming language known for its simplicity and readability. Created by Guido van Rossum, Python emphasizes code clarity and comes with an extensive standard library. It supports multiple programming paradigms and has a thriving ecosystem of third-party libraries and frameworks. Python is popular for web development, scientific computing, data analysis, and machine learning. Its clean syntax and active community make it an excellent choice for developers of all levels to build a variety of applications efficiently.

2 Objectives

The objective of this analysis is to illustrate basic data analysis using Python programming language. Python is the leading competitor for R, the other leading data science analysis programming language (VanderPlas 2016).

3 Data

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

The data consists of the following variables.

  1. ID: An identifier for the transaction.

  2. Date: Date of the sales transaction.

  3. Region: Region where the sale was made.

  4. City: City in which the sale was made.

  5. Category: Category of the product, for example, bars, crackers, cookies, etc.

  6. Product: The actual product sold, for example, carrot, arrow root.

  7. Qty: The quantity of the product that was sold for a given transaction.

  8. UnitPrice: The price per unit of the product sold duting the transaction.

  9. TotalPrice: The total price of the product sold in the transaction, which is the product of quantity and unit price.

To analyze this data, I start by loading the required python packages.

```{python}
my_data = pd.read_excel("sampledatafoodsales.xlsx", sheet_name='FoodSales')
```

I set the ID column as the index.

```{python}
my_data = my_data.set_index("ID")
```

4 Exploring the Data

I start by displaying the first few rows of the data below.

```{python}
my_data.head()
```
Date Region City Category Product Qty UnitPrice TotalPrice
ID
ID07351 2022-01-01 East Boston Bars Carrot 33 1.77 58.41
ID07352 2022-01-04 East Boston Crackers Whole Wheat 87 3.49 303.63
ID07353 2022-01-07 West Los Angeles Cookies Chocolate Chip 58 1.87 108.46
ID07354 2022-01-10 East New York Cookies Chocolate Chip 82 1.87 153.34
ID07355 2022-01-13 East Boston Cookies Arrowroot 38 2.18 82.84

We then create a pairs plot to visualize the entire data.

```{python}
# help(sns.pairplot)
sns.pairplot(my_data, hue="Region", palette="mako")
```

Next, I examine the total sales (Total Revenue) of each product by product category.

```{python}
my_data.groupby('Category').mean().sort_values('Qty')
```
Qty UnitPrice TotalPrice
Category
Crackers 36.807692 3.490000 128.458846
Snacks 40.689655 2.012069 77.161034
Bars 62.138298 1.814681 112.080532
Cookies 78.568421 2.287684 181.183263

Next, I examine the sales by product.

```{python}
my_data.groupby('Product').mean().sort_values('Qty')
```
Qty UnitPrice TotalPrice
Product
Banana 26.333333 2.27 59.776667
Pretzels 26.571429 3.15 83.700000
Whole Wheat 36.807692 3.49 128.458846
Potato Chips 45.181818 1.65 75.080455
Bran 58.333333 1.87 109.083333
Carrot 65.421875 1.77 115.796719
Chocolate Chip 74.090909 1.87 138.550000
Arrowroot 78.870968 2.18 171.938710
Oatmeal Raisin 83.032258 2.84 235.811613

Table () shows the sales of the products by region.

```{python}
my_data.groupby('Region').mean().sort_values('Qty')
```
Qty UnitPrice TotalPrice
Region
West 61.553191 2.063085 125.544894
East 64.373333 2.287133 143.495733

Finally, I summarise the sales by city.

```{python}
my_data.groupby('City').mean().sort_values('Qty')
```
Qty UnitPrice TotalPrice
City
San Diego 51.717949 2.099231 105.484615
Boston 64.204545 2.425909 150.744659
New York 64.612903 2.090161 133.206935
Los Angeles 68.527273 2.037455 139.769455

5 Research Questions

5.1 Question 1: Which Regions do Different Products Categories Sell the Most?

```{python}
region_sales = my_data.groupby(['Region', 'Category']).sum()[
    'TotalPrice'].sort_values

print(region_sales)
```
<bound method Series.sort_values of Region  Category
East    Bars         6355.20
        Cookies     10683.50
        Crackers     3025.83
        Snacks       1459.83
West    Bars         4180.37
        Cookies      6528.91
        Crackers      314.10
        Snacks        777.84
Name: TotalPrice, dtype: float64>
```{python}
pd.pivot_table(data=my_data, index="Region", columns='Category',
               values='TotalPrice', aggfunc='sum')
```
Category Bars Cookies Crackers Snacks
Region
East 6355.20 10683.50 3025.83 1459.83
West 4180.37 6528.91 314.10 777.84

5.2 Question 2: Which Regions do Different Products Sell the Most?

```{python}
region_sales = my_data.groupby('Region').sum()['TotalPrice']


region_sales = my_data.groupby('Region').sum()['TotalPrice'].plot(
    kind="bar", title="Sales by Region", color="steelblue")
```

5.3 Question 1: Which Cities do Different Products Categories Sell the Most?

```{python}
my_data.head()
my_data.groupby(['City', 'Category']).sum()
```
Qty UnitPrice TotalPrice
City Category
Boston Bars 1827 53.53 3319.99
Cookies 2706 73.53 6481.50
Crackers 726 59.33 2533.74
Snacks 391 27.09 930.30
Los Angeles Bars 1611 43.18 2886.37
Cookies 1753 50.14 4044.53
Crackers 42 6.98 146.58
Snacks 363 11.76 609.84
New York Bars 1683 46.92 3035.21
Cookies 1856 50.76 4202.00
Crackers 141 17.45 492.09
Snacks 326 14.46 529.53
San Diego Bars 720 26.95 1294.00
Cookies 1149 42.90 2484.38
Crackers 48 6.98 167.52
Snacks 100 5.04 168.00

5.4 Question 2: Which City Had the Highest Sales?

Here, we see that Boston had the highest sales.

```{python}
my_data.groupby("City").sum()["TotalPrice"]
```
City
Boston         13265.53
Los Angeles     7687.32
New York        8258.83
San Diego       4113.90
Name: TotalPrice, dtype: float64

Let us plot this on a graph.

```{python}
my_data.groupby("City").sum()["TotalPrice"].plot(
    kind="bar", title="Sales by City")
```
<AxesSubplot:title={'center':'Sales by City'}, xlabel='City'>

6 Conclusion

In this analysis, I examined the sales of products in regions and cities. Please feel free to comment.

References

Bogdanchikov, A, M Zhaparov, and R Suliyev. 2013. “Python to Learn Programming.” Journal of Physics: Conference Series 423 (April): 012027. https://doi.org/10.1088/1742-6596/423/1/012027.
VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol, CA: O’Reilly Media.